Question:
Your team must create a Python class called AIWebCrawler that fulfills the following requirements:
Web Crawling:
Your crawler must visit all pages within the given domain.
The crawler must not navigate to external domains.
Handle different types of web pages and links.
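A minimal sketch of the domain-restricted crawl, assuming the requests and beautifulsoup4 packages (the assignment does not prescribe libraries; all names below are illustrative):

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl_domain(start_url):
    """Yield (url, html) for every reachable page on start_url's domain."""
    domain = urlparse(start_url).netloc
    to_visit, seen = [start_url], {start_url}
    while to_visit:
        url = to_visit.pop()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that cannot be fetched
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # Follow the link only if it stays on the starting domain.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                to_visit.append(link)
        yield url, html
```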
Visiting Strategies:
Implement the visiting strategies: preorder, inorder, and postorder.
The visiting strategy must be specified as a parameter during class instantiation.
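A minimal sketch of the three visiting orders over the tree of discovered links; treating inorder on an n-ary tree as "first half of the children, then the node, then the rest" is one common convention and an assumption here, since the assignment does not define it:

```python
def traverse(node, children, order="preorder", visit=print):
    """Visit a link tree; children maps each URL to the URLs first found on it."""
    kids = children.get(node, [])
    if order == "preorder":
        visit(node)                              # node before its children
        for k in kids:
            traverse(k, children, order, visit)
    elif order == "postorder":
        for k in kids:
            traverse(k, children, order, visit)
        visit(node)                              # node after its children
    elif order == "inorder":
        mid = len(kids) // 2
        for k in kids[:mid]:                     # first half of the children
            traverse(k, children, order, visit)
        visit(node)                              # then the node itself
        for k in kids[mid:]:                     # then the remaining children
            traverse(k, children, order, visit)
    else:
        raise ValueError(f"unknown visiting strategy: {order}")
```

The chosen strategy can simply be stored by AIWebCrawler's constructor and passed to this function when the crawl starts.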
Output:
Generate a corpus of text documents containing the content of each visited page.
Ensure the text is free of HTML tags, JavaScript, menu items, and other nonessential elements.
The title of each text document must be the title of the corresponding page visited during the crawling phase.
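A minimal sketch of the cleaning and corpus-writing step, assuming beautifulsoup4; the list of tags treated as non-essential is an assumption:

```python
import os
import re

from bs4 import BeautifulSoup


def save_page_text(html, out_dir="corpus"):
    """Strip non-content markup and save the page text under its title."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove scripts, styles, menus and other non-essential elements.
    for tag in soup(["script", "style", "noscript", "nav", "header", "footer", "aside"]):
        tag.decompose()
    title = soup.title.get_text(strip=True) if soup.title else "untitled"
    text = soup.get_text(separator="\n", strip=True)
    safe_title = re.sub(r"[^\w\- ]", "_", title)[:100] or "untitled"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, safe_title + ".txt"), "w", encoding="utf-8") as f:
        f.write(title + "\n\n" + text)  # the page title doubles as the document title
```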
Handling Dynamic Content:
Use a JavaScript-capable engine, such as Chrome driven by the Selenium WebDriver, to crawl and extract content from dynamic pages.
Ensure the crawler can interpret and navigate JavaScript-rendered content.
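A minimal sketch of fetching a JavaScript-rendered page, assuming Selenium 4 with Chrome (Selenium Manager resolves the chromedriver automatically); the fixed sleep is a simplification:

```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def fetch_rendered_html(url, settle_seconds=2):
    """Load a page in headless Chrome and return the DOM after JavaScript has run."""
    opts = Options()
    opts.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        # Crude wait for client-side rendering; a WebDriverWait on a
        # known element would be more robust.
        time.sleep(settle_seconds)
        return driver.page_source        # HTML including JavaScript-rendered content
    finally:
        driver.quit()
```

The returned HTML can then go through the same link-extraction and text-cleaning steps as statically served pages.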
Integration with AI (ChatGPT or Google Colab):
Utilize AI capabilities in your crawler for tasks such as parsing, text extraction, or decision-making.
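A minimal sketch of one such AI-assisted step (boilerplate removal), assuming the openai Python package v1+ and an API key in the OPENAI_API_KEY environment variable; the model name is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ai_extract_main_text(raw_text):
    """Ask a language model to keep only the main content of a page."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute whichever model is available
        messages=[
            {"role": "system",
             "content": "Remove navigation menus, ads and other boilerplate. "
                        "Return only the main textual content of the page."},
            {"role": "user", "content": raw_text[:12000]},  # stay within the context window
        ],
    )
    return response.choices[0].message.content
```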
Document all the prompts used to generate the web crawler, and keep track of the number of times the generated code did not work and how you solved the issue (with a new prompt or manual intervention).
Keep track of this information using the following table:
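A minimal sketch of recording that information programmatically as a CSV log; the column names are an assumption derived from the requirement above, not the assignment's own table:

```python
import csv
import os

COLUMNS = ["Prompt #", "Prompt text", "Failed attempts", "Resolution (new prompt / manual fix)"]


def log_prompt_attempt(row, path="prompt_log.csv"):
    """Append one prompt record, writing the header when the file is created."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(COLUMNS)
        writer.writerow(row)
```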
