Java HtmlCleaner.clean Method Code Examples

This article collects typical usage examples of the org.htmlcleaner.HtmlCleaner.clean method in Java. If you are wondering how HtmlCleaner.clean is used in practice, the selected examples here may help. You can also look further into usage examples of the enclosing class, org.htmlcleaner.HtmlCleaner.

A total of 15 code examples of the HtmlCleaner.clean method are shown below, sorted by popularity by default.

Example 1: toXML

    import java.io.IOException;

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on
    import org.htmlcleaner.PrettyXmlSerializer;
    import org.htmlcleaner.TagNode;

    /**
     * Converts an HTML string to an XML string using HtmlCleaner.
     * @param source
     * @return
     */
    private String toXML(String source) {
        try {
            CleanerProperties props = new CleanerProperties();
            props.setTranslateSpecialEntities(true);
            props.setOmitComments(true);
            props.setPruneTags("script,style");
            // Ignore namespaces.
            props.setNamespacesAware(false);
            props.setAdvancedXmlEscape(true);
            HtmlCleaner cl = new HtmlCleaner(props);
            TagNode tagNode = cl.clean(source);
            source = new PrettyXmlSerializer(props).getXmlAsString(tagNode);
        } catch (IOException e) {
            logger.error("", e);
        }
        return source;
    }

Developer ID: gncloud, Project: fastcatsearch3, Lines of code: 24, Source file: ReadabilityExtractor.java

Example 2: processFollow

    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on

    /**
     * Parses the follow page: users followed and followers.
     *
     * @param followUrl
     */
    public static void processFollow(String followUrl) {
        String content = PageUtil.getContent(followUrl);
        HtmlCleaner htmlCleaner = new HtmlCleaner();
        TagNode tNode = htmlCleaner.clean(content);
        extractUserUrl(content);
        try {
            Object[] pageNumObj = tNode
                    .evaluateXPath("//*[@id=\"Profile-following\"]//div[@class=\"Pagination\"]/button");
            if (pageNumObj != null && pageNumObj.length > 0) {
                TagNode node = (TagNode) pageNumObj[pageNumObj.length - 2];
                int pagenum = Integer.parseInt(node.getText().toString());
                for (int i = 2; i

[The source text is cut off here. The rest of Example 2, Examples 3 to 6, and the beginning of Example 7 are missing. The surviving tail of Example 7 follows.]

            0) {
                TagNode[] textNode = ((TagNode) rootNode[rootNode.length - 1]).getElementsByName("td", true);
                for (TagNode tag : textNode) {
                    if (tag != null && tag.getText() != null) {
                        StringBuilder errorTextString = new StringBuilder();
                        errorTextString.append(errorText);
                        if (tag.getText().toString().trim().equals(";")) {
                            errorTextString.append(" ");
                            errorText = errorTextString.toString();
                        } else {
                            errorTextString.append(tag.getText());
                            errorText = errorTextString.toString();
                        }
                    }
                }
            }
        } catch (XPatherException e) {
            LOGGER.error("Error extracting table node from html." + e.getMessage());
        }
        return errorText;
    }

Developer ID: kuzavas, Project: ephesoft, Lines of code: 43, Source file: AbstractUploadFile.java

Example 8: htmlToWiki

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.DomSerializer;
    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on
    import org.htmlcleaner.TagNode;

    public static String htmlToWiki(String html, String contextPath, int projectId) throws Exception {
        // Strip the nbsp because it gets converted to unicode
        html = StringUtils.replace(html, "&nbsp;", " ");

        // Take the html create DOM for parsing
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        TagNode node = cleaner.clean(html);
        Document document = new DomSerializer(props, true).createDOM(node);
        if (LOG.isTraceEnabled()) {
            LOG.trace(html);
        }

        // Process each node and output the wiki equivalent
        StringBuffer sb = new StringBuffer();
        ArrayList nodeList = new ArrayList();
        for (int i = 0; i < document.getChildNodes().getLength(); i++) {
            Node n = document.getChildNodes().item(i);
            nodeList.add(n);
        }
        processChildNodes(nodeList, sb, 0, true, true, false, "", contextPath, projectId);
        if (sb.length() > 0) {
            String content = sb.toString().trim();
            if (content.contains("'")) {
                // Determine if this is where the ' is being introduced
                content = StringUtils.replace(content, "'", "'");
            }
            if (!content.endsWith(CRLF)) {
                return content + CRLF;
            } else {
                return content;
            }
        } else {
            return "";
        }
    }

Developer ID: Concursive, Project: concourseconnect-community, Lines of code: 38, Source file: HTMLToWikiUtils.java

Example 9: parseHhc

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.DomSerializer;
    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on
    import org.htmlcleaner.TagNode;

    public static List parseHhc(InputStream hhcFile, Resources resources)
            throws IOException, ParserConfigurationException, XPathExpressionException {
        HtmlCleaner htmlCleaner = new HtmlCleaner();
        CleanerProperties props = htmlCleaner.getProperties();
        // Clean the input stream into a TagNode tree, then convert it to a W3C DOM
        TagNode node = htmlCleaner.clean(hhcFile);
        Document hhcDocument = new DomSerializer(props).createDOM(node);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node ulNode = (Node) xpath.evaluate("body/ul", hhcDocument
                .getDocumentElement(), XPathConstants.NODE);
        List sections = processUlNode(ulNode, resources);
        return sections;
    }

Developer ID: DASAR, Project: epublib-android, Lines of code: 12, Source file: HHCParser.java

Example 10: HtmlXpathSelector

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.DomSerializer;
    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on
    import org.htmlcleaner.TagNode;

    public HtmlXpathSelector(String content) throws ParserConfigurationException, SAXException, IOException {
        HtmlCleaner htmlCleaner = new HtmlCleaner();
        TagNode rootTagNode = htmlCleaner.clean(content);
        // Convert the cleaned tree to a W3C DOM so the JDK XPath API can query it
        rootDocument = new DomSerializer(new CleanerProperties()).createDOM(rootTagNode);
        xPath = XPathFactory.newInstance().newXPath();
    }

Developer ID: hxt168, Project: webpasser, Lines of code: 10, Source file: HtmlXpathSelector.java

Example 11: toXhtml

    import org.htmlcleaner.HtmlCleaner; // import the package/class the method depends on

    public static String toXhtml(String htmlString) {
        String xhtmlString = null;
        if (StringUtils.isNotEmpty(htmlString)) {
            xhtmlString = XmlUtils.skipDocTypeDeclaration(htmlString.trim());
            if (xhtmlString.startsWith("") || xhtmlString.startsWith("

[The source text is cut off here. The rest of Example 11 and Examples 12 to 15 are missing.]
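
Examples 9 and 10 above serialize the cleaned TagNode to a W3C DOM and then query it with the JDK's XPath API. The sketch below isolates that second step using only JDK classes, so it runs without HtmlCleaner on the classpath. It assumes the markup is already well-formed, as it would be after cleaning. The class name CleanedDomXPath, its two methods, and the sample markup are hypothetical and do not come from the original projects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class CleanedDomXPath {

    // Stand-in for HtmlCleaner + DomSerializer: parses markup that is
    // ASSUMED to be well-formed already into a W3C DOM document.
    static Document parse(String wellFormedXhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedXhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed", e);
        }
    }

    // Evaluates "body/ul" relative to the root element, as Example 9 does,
    // then collects the trimmed text of each direct <li> child.
    static List<String> listItems(Document document) {
        List<String> items = new ArrayList<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node ul = (Node) xpath.evaluate(
                    "body/ul", document.getDocumentElement(), XPathConstants.NODE);
            if (ul == null) {
                return items; // no <ul> directly under <body>
            }
            NodeList lis = (NodeList) xpath.evaluate("li", ul, XPathConstants.NODESET);
            for (int i = 0; i < lis.getLength(); i++) {
                items.add(lis.item(i).getTextContent().trim());
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        return items;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><body><ul><li>Intro</li><li>Usage</li></ul></body></html>");
        System.out.println(listItems(doc)); // prints [Intro, Usage]
    }
}
```

With HtmlCleaner available, the parse method would be replaced by the clean and createDOM calls shown in Example 9; the XPath part stays the same.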
