Scraping and Modifying Web Page Data in Android

Source: compiled from the web

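In Android, a WebView is commonly used to load and display web pages. Sometimes, however, you need to extract data from a page dynamically and process it, or even modify the page's content so that it displays something different, and WebView on its own cannot do that. A recent project of mine had exactly this requirement: load local H5 (HTML5) pages, modify parts of their content dynamically, and then preview the result. The implementation steps are described below.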

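I. Introduction to WebView

WebView is a view for displaying web pages. Older Android versions used a WebKit-based engine; since Android 4.4, WebView has been based on Chromium. Besides the usual View properties and settings, WebView gives you fine-grained control over URL requests, page loading, rendering, and interaction between the page and the app.

1. Common settings

// Enable JavaScript
wvWebView.getSettings().setJavaScriptEnabled(true);
// Allow the page to be zoomed
wvWebView.getSettings().setSupportZoom(true);
// Hide the on-screen zoom buttons
wvWebView.getSettings().setDisplayZoomControls(false);
// Enable the built-in zoom mechanisms (pinch-to-zoom)
wvWebView.getSettings().setBuiltInZoomControls(true);
// Honor the page's viewport meta tag (use a wide viewport)
wvWebView.getSettings().setUseWideViewPort(true);
// Layout algorithm (SINGLE_COLUMN is deprecated; NORMAL is the usual choice today)
wvWebView.getSettings().setLayoutAlgorithm(WebSettings.LayoutAlgorithm.SINGLE_COLUMN);
// Zoom out so the page initially fits the screen width
wvWebView.getSettings().setLoadWithOverviewMode(true);

2. Ways to load a page

WebView has three common loading methods: loadUrl, loadData, and loadDataWithBaseURL.

(1) loadUrl(String url) loads a page directly from a URL.

(2) wvWebView.loadData(String data, String mimeType, String encoding);
Parameter 1: the page content as a string. Parameter 2: its MIME type, usually "text/html". Parameter 3: the encoding. Unless the encoding is "base64", the data is interpreted as URL-encoded text, so characters such as '%' and '#' must be percent-escaped (for apps targeting Android 10 or later, an unescaped '#' is treated as the end of the content). Take care when loading CSS or other content that contains such characters.

(3) wvWebView.loadDataWithBaseURL(String baseUrl, String data, String mimeType, String encoding, String historyUrl);
Parameter 1: the base URL against which relative links in the page are resolved, i.e. the root directory that contains the page's resources. Parameter 2: the page content as a string. Parameter 3: its MIME type, usually "text/html". Parameter 4: the encoding. Parameter 5: the URL to use as the history entry, usually null.

loadData is fine for a self-contained HTML string. When the page references other files, for example an HTML file that pulls in several JS, CSS, and image files through relative paths, loadData cannot resolve them, and the page is displayed incompletely, with missing images and styles. In that case use loadDataWithBaseURL, which resolves those relative paths against baseUrl and is therefore more capable than loadData.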
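As a minimal sketch of the "base64" route for loadData: the hypothetical helper HtmlDataEncoder.toBase64 below encodes an HTML string as UTF-8 and then Base64, so characters like '#' and '%' no longer need escaping. It uses java.util.Base64 so it can be tried on a plain JVM; on Android that class requires API 26, and the Android call itself appears only as a comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: prepares an HTML string for WebView.loadData(..., "base64"). */
final class HtmlDataEncoder {
    private HtmlDataEncoder() {}

    /** Encodes the HTML as UTF-8 bytes, then as Base64, so '#', '%' and '?' need no escaping. */
    static String toBase64(String html) {
        return Base64.getEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
    }

    // On Android the result would then be passed like this (not runnable on a plain JVM):
    // wvWebView.loadData(HtmlDataEncoder.toBase64(html), "text/html; charset=utf-8", "base64");
}
```

On API levels below 26, android.util.Base64.encodeToString(bytes, Base64.NO_WRAP) can produce the Base64 string instead. Declaring the charset in the MIME type is a common way to keep non-ASCII text such as Chinese characters from being misread.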

II. The jsoup parser

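jsoup is a powerful HTML parser with a convenient API for working with HTML documents. It can locate content in a page by text keyword, class selector, id selector, attribute, attribute value, and so on. It can also edit the document: set attributes, change text, insert content, or remove elements, and then serialize the edited page back to HTML.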

1. Initializing jsoup

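Add the jsoup jar to the project. The static method Jsoup.parse converts page data given as a string, an input stream, a file, or a URL into a Document object, which you then operate on, for example: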

Document document = Jsoup.parse(html);

2. Getting data. The commonly used methods are:

(1) Getting elements

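getElementById(String id): get the element with the given id
getElementsByTag(String tag): get elements by tag name
getElementsByClass(String className): get elements by class name
getElementsByAttribute(String key): get elements that have the given attribute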

(2) Getting the text of specific elements

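By CSS selector: Elements elementsBuyerName = document.select(".buyerName");
By keyword in the text: Elements elementsGoods = document.select(":containsOwn(货物)");

:containsOwn(text) matches elements whose own text contains the keyword (here 货物, "goods"), ignoring case; :contains(text) also searches the text of descendants, so it matches every ancestor of the element as well. The text of a match is read with, for example, elementsBuyerName.get(0).text().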

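The result is an Elements object, which is a list (Elements extends ArrayList<Element>); iterate over it to get the values you need.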

(3) Setting values

elementsBuyerName.get(0).text("This is a new value"); // replace the text of the first match
document.select(".code").remove(); // remove every element with class "code" from the document

With the methods above, you can easily extract data from a web page.

III. Implementing the use case

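1. In Android Studio, create an assets folder under the main directory (src/main), and copy the web page folder into it, together with its images, JS, CSS, and any other resources it uses.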

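2. At an appropriate point, copy the page resources from the assets folder to a local directory on the device. Files in assets are packaged read-only inside the APK, so they have to be copied to writable storage before they can be modified.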

import android.content.Context;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public static void copyAssetsToDst(Context context, String srcPath, String dstPath) {
    try {
        String[] fileNames = context.getAssets().list(srcPath);
        if (fileNames != null && fileNames.length > 0) {
            // srcPath is a directory: create the target directory, then recurse into it
            File file = new File(context.getFilesDir(), dstPath);
            if (!file.exists()) {
                file.mkdirs();
            } else {
                // Already copied earlier: skip, so previously edited files are not overwritten
                return;
            }
            for (String fileName : fileNames) {
                // Asset paths always use "/"; the root of assets is the empty string
                String childSrc = srcPath.isEmpty() ? fileName : srcPath + "/" + fileName;
                copyAssetsToDst(context, childSrc, dstPath + File.separator + fileName);
            }
        } else {
            // srcPath is a file: stream it to getFilesDir()/dstPath
            File outFile = new File(context.getFilesDir(), dstPath);
            try (InputStream is = context.getAssets().open(srcPath);
                 FileOutputStream fos = new FileOutputStream(outFile)) {
                byte[] buffer = new byte[8192];
                int byteCount;
                while ((byteCount = is.read(buffer)) != -1) {
                    fos.write(buffer, 0, byteCount);
                }
            } // both streams are closed here, even if an exception is thrown
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

3. Read the local HTML file into a string; once the data has been fetched from the network, find the relevant fields in the page and replace them.

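import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public static String readFile(String path) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
        String content;
        while ((content = bufferedReader.readLine()) != null) {
            // readLine() strips the line terminator, so add it back;
            // otherwise a "//" comment in inline JS would swallow the following line
            stringBuilder.append(content).append('\n');
        }
    }
    return stringBuilder.toString();
}

Once the file is in memory, parse it with Document document = Jsoup.parse(html); to obtain a Document object, then select and edit elements:

Elements elements = document.select(".className"); // any class selector
elements.get(0).text("replacement content");

Finally, String html = document.outerHtml(); produces the edited page as a string.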
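To see why the line terminators matter when the file is rebuilt into one string, here is a small self-contained sketch (the class LineJoinDemo and its method are illustrative names). It reads the same text from a Reader in two ways: appending lines exactly as readLine() returns them, and appending '\n' after each line. Inline JavaScript with a // comment shows the difference. IOException is wrapped in UncheckedIOException only to keep the sketch short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Illustrative comparison of two ways to turn a Reader into one String. */
final class LineJoinDemo {
    private LineJoinDemo() {}

    /**
     * Reads all lines from source. readLine() drops each line terminator,
     * so unless keepLineBreaks is true, adjacent lines run together.
     */
    static String readAll(Reader source, boolean keepLineBreaks) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
                if (keepLineBreaks) {
                    sb.append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With keepLineBreaks set to false, the input "var a = 1; // set a\nvar b = 2;" becomes "var a = 1; // set avar b = 2;", so the browser treats var b = 2; as part of the comment.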

4. After the replacements are done, write the string back to the corresponding file in the local directory.

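import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public static void writeFile(String str, String path) {
    // FileOutputStream truncates an existing file; the writer is closed automatically
    try (BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
        // Write the whole string, including its last character
        out.write(str);
    } catch (IOException e) {
        e.printStackTrace();
    }
}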
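A compact alternative for the read and write steps, assuming minSdk 26 (java.nio.file is available from Android 8.0): treat the file as UTF-8 bytes from start to finish. Working with the full content keeps line breaks and every character; by contrast, write(str, 0, str.length() - 1) writes one character fewer than the string holds, so a page ending in </html> would lose its closing '>'. The class name PageFiles is hypothetical, and IOException is wrapped in UncheckedIOException only to keep the sketch short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helpers for reading and rewriting a local HTML file as UTF-8. */
final class PageFiles {
    private PageFiles() {}

    /** Reads the entire file and decodes it as UTF-8, line breaks included. */
    static String readUtf8(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Replaces the file's contents with the full string encoded as UTF-8. */
    static void writeUtf8(Path file, String content) {
        try {
            // Default options: create the file if missing, truncate it if it exists
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Android, a File under context.getFilesDir() can be converted with file.toPath() (also API 26+) before calling these helpers.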

5. When previewing, load the page from the local directory.

wvWebView.loadUrl("file:///data/data/<package name>/<folder name>/<file name>/file.html");

Note: the argument must be a URL of the form file:///<file path>. Passing the bare file path does not load correctly: the other JS files, images, and resources in the page folder cannot be found, and the page renders in various broken ways. Because copyAssetsToDst copies into context.getFilesDir(), building the URL from that directory, for example with Uri.fromFile(file).toString(), is safer than hard-coding /data/data/.... For apps targeting Android 11 (API 30) or higher, WebView file access is off by default, so wvWebView.getSettings().setAllowFileAccess(true) is also needed.
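A minimal pure-Java sketch of building such a URL from a root directory and a relative path instead of concatenating strings by hand. java.nio.file.Path.toUri() produces the file:/// form for an absolute path and percent-encodes characters such as spaces; on Android, Uri.fromFile(file).toString() serves the same purpose. The class name LocalPageUrl and the example directories are hypothetical.

```java
import java.nio.file.Path;

/** Hypothetical helper: turns a page file under a root directory into a file:/// URL string. */
final class LocalPageUrl {
    private LocalPageUrl() {}

    /**
     * Resolves relativePage against rootDir and returns it as a URL string.
     * rootDir must be absolute (on Android, e.g. context.getFilesDir().toPath()).
     */
    static String toFileUrl(Path rootDir, String relativePage) {
        if (!rootDir.isAbsolute()) {
            throw new IllegalArgumentException("rootDir must be absolute: " + rootDir);
        }
        // normalize() removes "." and ".." segments before the path is converted
        return rootDir.resolve(relativePage).normalize().toUri().toString();
    }
}
```

For example, a root of /data/data/com.example.app/files and the page site/index.html give file:///data/data/com.example.app/files/site/index.html, which can be passed to loadUrl.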
