Java: parsing HTML page content with jsoup (2)


Posted by look_w on 2019-4-17 18:52


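Besides fetching the page content with HttpClient as described earlier, jsoup can itself parse strings, URLs, and document files. (You only need to reference jsoup-1.3.3.jar in your project.) Examples: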

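      import java.io.File;
      import java.io.IOException;
      import org.jsoup.Jsoup;
      import org.jsoup.nodes.Document;
      import org.jsoup.select.Elements;

      // Parse HTML held in a string.
      public void parseString() {
         String html = "<html><head><title>blog</title></head><body onload='test()'><p>Parsed HTML into a doc.</p></body></html>";
         Document doc = Jsoup.parse(html);
         System.out.println(doc);
         // getAllElements() includes <body> itself, so attr() can read its onload value.
         Elements es = doc.body().getAllElements();
         System.out.println(es.attr("onload"));
         System.out.println(es.select("p"));
      }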

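      // Fetch a page over HTTP and parse the response.
      public void parseUrl() {
         try {
            Document doc = Jsoup.connect("http://www.baidu.com/").get();
            // Every <a> element that has an href attribute.
            Elements hrefs = doc.select("a[href]");
            System.out.println(hrefs);
            System.out.println("------------------");
            // Of those, only the links whose href starts with "http".
            System.out.println(hrefs.select("[href^=http]"));
         } catch (IOException e) {
            e.printStackTrace();
         }
      }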

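      // Parse a local HTML file, decoding it as UTF-8.
      public void parseFile() {
         try {
            File input = new File("input.html");
            Document doc = Jsoup.parse(input, "UTF-8");
            // Extract all the codes: <a> children of <td title="IA..."> whose href starts with "javascript:view".
            Elements codes = doc.body().select("td[title^=IA] > a[href^=javascript:view]");
            System.out.println(codes);
            System.out.println("------------------");
            // html() prints only the inner HTML of the matched links.
            System.out.println(codes.html());
         } catch (IOException e) {
            e.printStackTrace();
         }
      }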
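
Since input.html is not included in the post, here is a small sketch of how the selector in parseFile() behaves on an inline sample. The markup, the class name SelectorDemo, and the helper viewLinks are made up for illustration; only the selector string comes from the code above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical stand-in for input.html; the real file is not shown in the post.
    static final String SAMPLE =
        "<table><tr>"
        + "<td title='IA001'><a href='javascript:view(1)'>IA001</a></td>"
        + "<td title='XB002'><a href='javascript:view(2)'>XB002</a></td>"
        + "<td title='IA003'><a href='http://example.com/'>IA003</a></td>"
        + "</tr></table>";

    // Applies the same selector as parseFile(): the cell title must start
    // with "IA" and the link must be a direct child whose href starts with
    // "javascript:view".
    static Elements viewLinks(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("td[title^=IA] > a[href^=javascript:view]");
    }

    public static void main(String[] args) {
        Elements codes = viewLinks(SAMPLE);
        System.out.println(codes.size());       // number of matching links
        System.out.println(codes.text());       // text of the matching links
        System.out.println(codes.attr("href")); // href of the first match
    }
}
```

In this sample only the first cell satisfies both conditions: the second has the wrong title prefix and the third has an ordinary http link.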

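jsoup offers many ways to reach the same result; which one you use depends on how you prefer to approach the document.

You can locate elements by id, by tag name, or by class.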
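
As a rough sketch of those three lookups, the example below uses getElementById, getElementsByTag, and getElementsByClass, followed by the equivalent CSS selectors. The sample markup and the class name LookupDemo are assumptions made for this illustration, not part of the original post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LookupDemo {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
        "<html><body>"
        + "<div id='main'><p class='note'>first</p><p>second</p></div>"
        + "<p class='note'>third</p>"
        + "</body></html>";

    static Document doc() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = doc();

        // Lookup by id: returns a single Element, or null if the id is absent.
        Element main = doc.getElementById("main");
        System.out.println(main.select("p").size());  // 2 paragraphs inside #main

        // Lookup by tag name: all <p> elements in the document.
        Elements paragraphs = doc.getElementsByTag("p");
        System.out.println(paragraphs.size());        // 3

        // Lookup by class: all elements carrying class "note".
        Elements notes = doc.getElementsByClass("note");
        System.out.println(notes.size());             // 2

        // The same three lookups written as CSS selectors.
        System.out.println(doc.select("#main").size());
        System.out.println(doc.select("p").size());
        System.out.println(doc.select(".note").size());
    }
}
```

The dedicated methods and the select() form should return the same elements here; select() becomes the more convenient choice once conditions need to be combined, as in the td/a selector used in parseFile().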
