Question: Java: read the text of one div (selected by class) from a webpage, ignore the rest, and print the top 10 words

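In Java, I want to read the text from a webpage. Only the div with a given class should be read; the rest of the page should be ignored. I cannot get it to work. The URL in the code is just an example webpage, not one of the pages I will actually be processing. The program needs to read the div and output the top 10 words and the number of times each is used.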

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextAnalyzer {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://www.oracle.com/corporate/features/jsoup-html-parsing-library.html").get();

        // "div.divClass" selects by class; "div#divClass" would select by id.
        Element div = doc.select("div.divClass").first();
        if (div == null) {
            System.err.println("No div with class \"divClass\" found");
            return;
        }

        // Count words, not lines: split on anything that is not a letter or apostrophe.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : div.text().toLowerCase().split("[^\\p{L}']+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // insert with 1 or increase frequency by 1
            }
        }

        // Sort by descending frequency and print at most the top 10.
        List<Map.Entry<String, Integer>> sortedList = new ArrayList<>(counts.entrySet());
        sortedList.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        for (Map.Entry<String, Integer> i : sortedList.subList(0, Math.min(10, sortedList.size()))) {
            System.out.println(i.getKey() + " -> " + i.getValue());
        }
    }
}
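The counting step can be tried on its own, without a network connection or jsoup. The following is a minimal sketch under a few assumptions: the class name `WordFrequency` and the method `topWords` are hypothetical helpers, not part of the question; a "word" is taken to be a run of letters or apostrophes, compared case-insensitively; and ties are broken alphabetically so the output order is stable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the original question.
class WordFrequency {

    // Returns up to n (word, count) entries, most frequent first.
    // Ties on count are ordered alphabetically (an assumption made for stable output).
    static List<Map.Entry<String, Integer>> topWords(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of characters that are neither letters nor apostrophes.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Sample text standing in for the text extracted from the div.
        String sample = "the cat and the hat and the bat";
        for (Map.Entry<String, Integer> e : topWords(sample, 10)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In the scraping program, the string passed to `topWords` would be the result of `div.text()`, with 10 as the second argument.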
