Question: In Java
Description:
In this lab, you will gain some experience with file I/O, text parsing, and URL connections. You will build an application that provides users with a guided web browsing capability that searches files for user-specified keywords.
A typical search engine reads many files off the web and saves information about them in a database that is used to answer the search queries posed by users. However, your application will not do any prefetching of data. Instead, it will search files in response to user requests, as described in the specification given below.
Specification:
The user enters a specific URL, such as http://www.bbc.com/, and a word to search for on the command line.
The program opens a URLConnection for the given URL. It should parse the file and display the following information:
The number of occurrences of the user-specified word in the HTML file
The URLs for all the links to other HTML files that are given in the user-selected file (things of the form href="xxxxx"), along with the number of occurrences of the keyword in each. To do this, open a URL connection for each of the HTML links and parse the file, counting the number of times the keyword occurs. Display all the URLs that were parsed, sorted by the number of occurrences of the keyword in decreasing order, omitting files that don't contain the keyword at all. For each URL, display the URL for the file, followed by the number of occurrences of the keyword in parentheses.
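The parsing steps above can be sketched roughly as follows. This is only one possible approach, not the lab's reference solution. The class name PageScanner and its method names are hypothetical. The sketch assumes that matching is case-insensitive and counts substrings rather than whole words (the specification does not say either way), and that links always use double quotes as in href="xxxxx".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative, not part of the lab.
class PageScanner {
    // Matches href="xxxxx" (double quotes only, an assumption taken from the spec's wording).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Opens a URLConnection and reads the whole document into one string. */
    static String fetch(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    /** Counts non-overlapping occurrences of keyword, ignoring case (an assumption). */
    static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    /** Returns the raw href values in the order they appear in the page. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    /** Formats entries as "url (count)", highest count first, omitting zero counts. */
    static List<String> rank(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 0)
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(e -> e.getKey() + " (" + e.getValue() + ")")
                .collect(Collectors.toList());
    }
}
```

A driver would typically call fetch on the start URL, pass the text to countOccurrences and extractLinks, then fetch each link, collect the counts in a map, and print the lines returned by rank. A link that cannot be opened throws an IOException from fetch; whether to skip or report such links is left to the caller.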
After each search, use FileOutputStream to save the results of the application to a file called "searchdata.www", overwriting the data from the previous search.
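As a minimal sketch of the save step, the code below writes one result line per row through a FileOutputStream. The class name ResultSaver and the method save are hypothetical, and the choice of UTF-8 text output is an assumption. Constructing a FileOutputStream from a file name without the append flag truncates any existing file, which gives the required overwrite behavior.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not part of the lab.
class ResultSaver {
    /** Writes each result line to fileName, replacing whatever the file held before. */
    static void save(String fileName, List<String> lines) throws IOException {
        // new FileOutputStream(name) opens in overwrite mode, so old results are discarded.
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(fileName), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

For the lab, the call would look like ResultSaver.save("searchdata.www", results), where results is the list of formatted lines that was printed to the user.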
Assumptions:
All of the actual URL files will end with the .html suffix. However, the link names may not show the suffix explicitly. If the link ends with a /, append the string index.html before processing. If a link does not end with a / and also does not end with .html, append the string /index.html before processing.
If you are currently looking at a page whose URL is http://www.example.com/abc/nonsense.html, then the path http://www.example.com/abc/ is considered to be the current directory URL.
If a link does not begin with http:// then it is a relative link, meaning that you should prefix it with the current directory URL before processing. For more information regarding absolute/relative links, please refer to http://www.scriptingok.com/tutorial/HTML-links-2
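The three assumptions above can be combined into a small normalization helper, sketched here. The class name LinkNormalizer and its methods are hypothetical. The rules are applied literally as stated, so any link that does not start with http:// (including one that starts with / or https://) is treated as relative. The handling of a page URL with no path at all, such as http://www.bbc.com, is an extra assumption not covered by the text.

```java
// Hypothetical helper class; the names are illustrative, not part of the lab.
class LinkNormalizer {
    /** Returns the page URL up to and including its last slash. */
    static String currentDirectory(String pageUrl) {
        int lastSlash = pageUrl.lastIndexOf('/');
        // Assumption: a URL like http://www.bbc.com has no path slash,
        // so the host root is used as the current directory.
        if (lastSlash < "http://".length()) {
            return pageUrl + "/";
        }
        return pageUrl.substring(0, lastSlash + 1);
    }

    /** Turns a raw href value into the URL that should actually be opened. */
    static String normalize(String pageUrl, String link) {
        // Relative links are prefixed with the current directory URL.
        String absolute = link.startsWith("http://")
                ? link
                : currentDirectory(pageUrl) + link;
        if (absolute.endsWith("/")) {
            return absolute + "index.html";
        }
        if (!absolute.endsWith(".html")) {
            return absolute + "/index.html";
        }
        return absolute;
    }
}
```

With the page http://www.example.com/abc/nonsense.html, the link docs becomes http://www.example.com/abc/docs/index.html, and the link http://www.bbc.com/ becomes http://www.bbc.com/index.html.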
