
Time: 2022-01-26 19:35:42

I am trying to extract some data from a table by parsing the HTML using jsoup.


Here is an example:


String tableHtml =
                   THE TEXT I WANT TO GET

Document doc = Jsoup.parseBodyFragment(tableHtml);
Element table = doc.select("table").first();
Element r = table.select("tfoot").first(); // r is null here. Why?
System.out.println("-----------" + r.text());

I get a NullPointerException!


However, if I remove one of the inner tables, there is no exception and it works. It also works if I change the <th> tag to <td>. This is strange behavior. This is just an example of the real HTML that I am trying to parse. I would appreciate it if anyone could point out why I am getting this exception. Thank you.


NOTE: Please assume that I cannot modify the HTML. I just want to parse it as it is.


1 Solution



Instead of the HTML parser (which apparently doesn't fully support this kind of table nesting), maybe use the XML parser. Try:


Document doc = Jsoup.parse(tableHtml, "", Parser.xmlParser());
Element table = doc.select("table").first();
Element r = table.select("tfoot").first();
System.out.println("->" + r.text());
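For reference, here is a minimal, self-contained sketch of the same idea that also guards against the null result that caused the exception in the question. The class name `TfootExample`, the helper `footerText`, and the sample markup are hypothetical, since the real HTML from the question is not shown. It assumes jsoup is on the classpath.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

class TfootExample {

    // Hypothetical helper: returns the text of the first <tfoot> inside the
    // first <table>, or an empty Optional when either element is missing.
    static Optional<String> footerText(String html) {
        // The XML parser keeps the element nesting as written, without
        // applying the HTML DOM rules.
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());

        Element table = doc.select("table").first();   // null if there is no <table>
        if (table == null) {
            return Optional.empty();
        }
        Element tfoot = table.select("tfoot").first(); // null if there is no <tfoot>
        return Optional.ofNullable(tfoot).map(Element::text);
    }

    public static void main(String[] args) {
        // Stand-in markup (an assumption, not the HTML from the question).
        String tableHtml =
              "<table>"
            + "<tbody><tr><td>"
            + "<table><tbody><tr><td>inner</td></tr></tbody></table>"
            + "</td></tr></tbody>"
            + "<tfoot><tr><th>THE TEXT I WANT TO GET</th></tr></tfoot>"
            + "</table>";

        System.out.println("->" + footerText(tableHtml).orElse("(no tfoot found)"));
    }
}
```

Checking for null before calling `text()` means a missing `<tfoot>` shows up as an empty result rather than a NullPointerException, whichever parser is used.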


