Question:
Consider the following web graph:

Page A points to pages B, C, and D.
Page B points to C and D.
Page C points to A and E.
Page D points to E and F.
Page E points to G.
Page F points to G and H.

Consider a crawler that starts from page A.

(a) Give the order of indexing, assuming the crawler uses a URL frontier with duplicate detection and all the pages are on different web sites.

(b) Assume pages B, C, F, and H are on web site α, pages D, E, and G are on web site β, and page A is on a third web site. The politeness policies on these three web sites all specify at least 3 seconds between visits (i.e., if the crawler visits a web site at second i, the earliest time it can revisit that web site is second i + 3). We assume that (1) the crawler can fetch only one page per second, and all processing (including physically getting the page, extracting and processing the links, etc.) is completed before the next fetch; and (2) the crawler processes links in the order listed above. The crawler still uses a URL frontier with duplicate detection, and also uses back queues to adhere to the politeness policies. Give the order of indexing. (If two pages can be visited at the same time, always choose the one that comes first alphabetically.)
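Both parts can be checked with a small simulation. The following is a minimal sketch in Python, not an official solution, and it rests on several assumptions: the garbled "IH" is read as page H; the site labels "alpha", "beta", and "third" are placeholders for the three web sites; the clock starts at second 0; and for part (b) the crawler keeps one FIFO back queue per site and, each second, fetches the alphabetically smallest page among the queue heads whose site is currently allowed to be visited. The function names crawl_fifo and crawl_polite are made up for this illustration.

```python
from collections import deque

# Link structure as stated in the question ("IH" is read as page H).
GRAPH = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["A", "E"],
    "D": ["E", "F"],
    "E": ["G"],
    "F": ["G", "H"],
    "G": [],
    "H": [],
}

# Placeholder site labels (assumption): the question names three sites.
SITE = {
    "B": "alpha", "C": "alpha", "F": "alpha", "H": "alpha",
    "D": "beta", "E": "beta", "G": "beta",
    "A": "third",
}


def crawl_fifo(start):
    """Part (a): a single FIFO frontier with duplicate detection."""
    seen = {start}
    frontier = deque([start])
    order = []
    while frontier:
        page = frontier.popleft()
        order.append(page)
        for link in GRAPH[page]:  # links are processed in the listed order
            if link not in seen:  # duplicate detection
                seen.add(link)
                frontier.append(link)
    return order


def crawl_polite(start, delay=3):
    """Part (b): one back queue per site, at most one fetch per second."""
    seen = {start}
    queues = {site: deque() for site in set(SITE.values())}
    queues[SITE[start]].append(start)
    next_ok = {site: 0 for site in queues}  # earliest allowed visit per site
    schedule = []  # (second, page) pairs
    t = 0
    while any(queues.values()):
        # Heads of the back queues whose site may be visited at second t.
        ready = [q[0] for site, q in queues.items() if q and next_ok[site] <= t]
        if ready:
            page = min(ready)  # alphabetical tie-break
            site = SITE[page]
            queues[site].popleft()
            next_ok[site] = t + delay
            schedule.append((t, page))
            for link in GRAPH[page]:
                if link not in seen:
                    seen.add(link)
                    queues[SITE[link]].append(link)
        t += 1  # if no site was ready, the crawler idles for this second
    return schedule


if __name__ == "__main__":
    print("(a)", crawl_fifo("A"))
    print("(b)", crawl_polite("A"))
```

Under these assumptions the sketch gives A, B, C, D, E, F, G, H for part (a) and A, B, D, C, E, F, G, H for part (b), with the crawler idling at seconds 3, 6, and 9 while both pending sites are still inside their 3-second window. A different reading of the politeness rule or of the tie-break could change the part (b) order.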
