Question: JAVA/C program
The program should take 3 command line arguments:
$ java TableDownloader (link) (link prefix) 0
- The first argument is the link to the starting page, Wikipedia's Gymnastics page.
This is the first page in which the program looks for tables.
- The second is the domain prefix. The program should not visit any URL that does not start with that prefix.
- The third (0 above) is the total number of links that the program is allowed to crawl in a breadth-first manner from the starting page.
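The argument handling described above could be sketched as follows. This is only an illustration: the class name TableDownloaderArgs, its field names, and the sample URLs in main are assumptions, not part of the assignment.

```java
// Hypothetical helper: holds the three command line arguments after validation.
class TableDownloaderArgs {
    final String startUrl;  // first argument: page where the crawl starts
    final String prefix;    // second argument: only URLs starting with this are followed
    final int maxLinks;     // third argument: how far the crawl may go from the start

    TableDownloaderArgs(String startUrl, String prefix, int maxLinks) {
        this.startUrl = startUrl;
        this.prefix = prefix;
        this.maxLinks = maxLinks;
    }

    // Validates the raw argument array and converts the third value to an int.
    static TableDownloaderArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "Usage: java TableDownloader (link) (link prefix) (number of links)");
        }
        int maxLinks;
        try {
            maxLinks = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Third argument must be an integer: " + args[2]);
        }
        if (maxLinks < 0) {
            throw new IllegalArgumentException("Third argument must not be negative");
        }
        return new TableDownloaderArgs(args[0], args[1], maxLinks);
    }

    public static void main(String[] args) {
        // With no arguments, fall back to assumed sample values so the sketch runs on its own.
        String[] input = args.length == 0
                ? new String[] {"https://en.wikipedia.org/wiki/Gymnastics",
                                "https://en.wikipedia.org/wiki/", "0"}
                : args;
        TableDownloaderArgs parsed = parse(input);
        System.out.println("start  = " + parsed.startUrl);
        System.out.println("prefix = " + parsed.prefix);
        System.out.println("links  = " + parsed.maxLinks);
    }
}
```

Rejecting a wrong argument count or a non-numeric third argument up front keeps the crawling code free of input checks.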
A new file should be made for each table. The name of the file should be:
pageName_captionWithoutSpaces.txt
The page name comes from the first link, and the caption comes from the HTML source.
For example, the table with the caption Common Types of Skills in Tumbling appearing on the Gymnastics page should be named Gymnastics_CommonTypesofSkillsinTumbling.txt.
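One possible way to build that file name is sketched below. It assumes the page name is the last path segment of the URL, which the assignment does not state explicitly. The class name TableFileNamer and the Wikipedia URL used in main are also assumptions.

```java
// Hypothetical helper: derives pageName_captionWithoutSpaces.txt from a URL and a caption.
class TableFileNamer {

    // Assumption: the page name is the last path segment of the URL,
    // ignoring any query string, fragment, or trailing slash.
    static String pageName(String url) {
        String path = url;
        int cut = path.indexOf('?');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        cut = path.indexOf('#');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Removes all whitespace from the caption and joins the two parts with an underscore.
    static String fileName(String url, String caption) {
        return pageName(url) + "_" + caption.replaceAll("\\s+", "") + ".txt";
    }

    public static void main(String[] args) {
        // Prints Gymnastics_CommonTypesofSkillsinTumbling.txt
        System.out.println(fileName("https://en.wikipedia.org/wiki/Gymnastics",
                "Common Types of Skills in Tumbling"));
    }
}
```

This sketch does not handle captions containing characters that are invalid in file names (such as a slash); a fuller version would need to decide how to treat them.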
- The first line of the file should be:
"URL","(whatever the link was)"
Both should be wrapped in double-quotes. For example, for the Gymnastics page it is:
"URL","https://www.whatever.com"
- The second line of the file should be:
"Table","(whatever the table caption was)"
Both should be wrapped in double-quotes. For example, for the Gymnastics page it is:
"Table","Common Types of Skills in Tumbling"
- The third line should be blank.
- The fourth line should be:
"Headings","(1st Heading)","(2nd Heading)","(3rd Heading)" (etc.)
where Headings and all headings are wrapped in double-quotes. For example, for the Gymnastics page it is:
"Headings","Skill","Explained"
If there is no row with all headings then do not worry about handling the table.
- There should be one subsequent line for each table row that contains only table data (<td>Like this</td>). Each row should begin with
"",
to denote a blank first column. All values should be wrapped in double-quotes. For example, for the Gymnastics page the first data row is:
"","Round-off","A common entry skill seen in every type of gymnastics to turn horizontal speed into vertical speed."
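The file layout described above (URL line, Table line, blank line, Headings line, then one line per data row) might be produced along these lines. This is a sketch that assumes the headings and cell values have already been extracted from the HTML. The class name TableFileFormatter and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns an already-parsed table into the lines of the output file.
class TableFileFormatter {

    // Wraps a value in double quotes. Embedded quotes are left as-is,
    // because the assignment does not say how to treat them.
    static String quote(String value) {
        return "\"" + value + "\"";
    }

    // Builds one line: the quoted label followed by each quoted value, comma-separated.
    static String line(String label, List<String> values) {
        StringBuilder sb = new StringBuilder(quote(label));
        for (String value : values) {
            sb.append(',').append(quote(value));
        }
        return sb.toString();
    }

    static List<String> format(String url, String caption,
                               List<String> headings, List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        lines.add(line("URL", List.of(url)));        // line 1
        lines.add(line("Table", List.of(caption)));  // line 2
        lines.add("");                               // line 3 is blank
        lines.add(line("Headings", headings));       // line 4
        for (List<String> row : rows) {
            lines.add(line("", row));                // data rows start with ""
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = format(
                "https://www.whatever.com",
                "Common Types of Skills in Tumbling",
                List.of("Skill", "Explained"),
                List.of(List.of("Round-off",
                        "A common entry skill seen in every type of gymnastics "
                        + "to turn horizontal speed into vertical speed.")));
        lines.forEach(System.out::println);
    }
}
```

Returning the lines as a list keeps the formatting separate from file writing, so the result can be passed to something like java.nio.file.Files.write once the file name is known.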
- After visiting that page, it should consider visiting any linked page if:
- that page has not already been visited
- that page is not already scheduled to be visited
- the URL begins with the prefix URL
- the number of links followed to reach it from the starting page is less than or equal to the maximum search depth specified on the command line
Doing this requires keeping track of each URL and its number of remaining links in some linear data structure, such as a queue. Processing the entries in first-in, first-out order achieves a breadth-first search.
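A minimal sketch of that bookkeeping is shown below, using a queue of (URL, remaining links) entries. To keep it runnable without network access, link extraction is passed in as a function and main uses a small made-up link graph. The names BreadthFirstCrawl, Entry, and linksOf are all assumptions. The sketch also reads the third argument as the number of links that may still be followed from a page, which is one interpretation of the assignment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

class BreadthFirstCrawl {

    // One queue entry: a URL plus how many more links may be followed from it.
    static final class Entry {
        final String url;
        final int remaining;

        Entry(String url, int remaining) {
            this.url = url;
            this.remaining = remaining;
        }
    }

    // Returns the pages in the order they would be visited.
    // linksOf stands in for "download the page and extract its links".
    static List<String> crawl(String start, String prefix, int maxDepth,
                              Function<String, List<String>> linksOf) {
        Queue<Entry> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();  // pages visited or already scheduled
        List<String> order = new ArrayList<>();

        queue.add(new Entry(start, maxDepth));
        seen.add(start);

        while (!queue.isEmpty()) {
            Entry current = queue.remove();  // FIFO order gives breadth-first search
            order.add(current.url);          // a real program would save the tables here
            if (current.remaining == 0) {
                continue;                    // no links may be followed from this page
            }
            for (String link : linksOf.apply(current.url)) {
                // seen.add returns false if the link was already visited or scheduled
                if (link.startsWith(prefix) && seen.add(link)) {
                    queue.add(new Entry(link, current.remaining - 1));
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up link graph used only for demonstration.
        Map<String, List<String>> graph = Map.of(
                "p/A", List.of("p/B", "x/Z", "p/C"),
                "p/B", List.of("p/D", "p/A"),
                "p/C", List.of("p/D"),
                "p/D", List.of("p/E"));
        Function<String, List<String>> linksOf = url -> graph.getOrDefault(url, List.of());
        System.out.println(crawl("p/A", "p/", 1, linksOf));  // prints [p/A, p/B, p/C]
    }
}
```

A single set covers both the "already visited" and the "already scheduled" conditions, because a URL is added to it at the moment it is put on the queue.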
