Question: Your team must create a Python class called AIWebCrawler that fulfills the following requirements:
Web Crawling:
Your crawler must visit all pages within the given domain.
The crawler must not navigate to external domains.
Handle different types of web pages and links.
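One way to keep the crawler inside the given domain is to resolve every link against the page it was found on and then compare host names. The sketch below uses only the standard library; the function name `resolve_internal_link` and its parameters are hypothetical, and it assumes that "same domain" means an exact host match (so `www.example.com` and `example.com` would count as different hosts unless you relax the check).

```python
from urllib.parse import urljoin, urldefrag, urlparse


def resolve_internal_link(page_url, href, domain):
    """Return an absolute same-domain URL for href, or None if it should be skipped.

    page_url: URL of the page where the link was found
    href:     raw value of the link's href attribute
    domain:   host name the crawl is restricted to (assumed exact match)
    """
    # Resolve relative links and drop the #fragment so the same page
    # is not queued twice under different anchors.
    absolute, _fragment = urldefrag(urljoin(page_url, href.strip()))
    parts = urlparse(absolute)

    # Skip non-web links such as mailto:, tel: and javascript:.
    if parts.scheme not in ("http", "https"):
        return None

    # Skip external domains.
    if parts.hostname != domain:
        return None

    return absolute
```

Links that come back as `None` are simply never added to the crawl queue.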
Visiting Strategies:
Implement the visiting strategies: preorder, inorder, and postorder.
The visiting strategy must be specified as a parameter during class instantiation.
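A minimal sketch of how the strategy parameter could drive the traversal. Pages form a tree in which a page's children are the internal links first discovered on it. Inorder is only defined for binary trees, so this sketch assumes one common convention for pages with many links: process the first child's subtree, then the page itself, then the remaining children. The `get_links` callable is a hypothetical hook standing in for the real fetch-and-parse step.

```python
class AIWebCrawler:
    STRATEGIES = ("preorder", "inorder", "postorder")

    def __init__(self, start_url, strategy="preorder", get_links=None):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"strategy must be one of {self.STRATEGIES}")
        self.start_url = start_url
        self.strategy = strategy
        # Hypothetical hook: maps a URL to the internal links found on it.
        self.get_links = get_links or (lambda url: [])

    def crawl_order(self):
        """Return the URLs in the order the chosen strategy processes them."""
        seen = {self.start_url}
        order = []

        def walk(url):
            # Children are links not yet claimed by another page.
            children = []
            for link in self.get_links(url):
                if link not in seen:
                    seen.add(link)
                    children.append(link)

            if self.strategy == "preorder":
                order.append(url)
            for index, child in enumerate(children):
                walk(child)
                # Assumed inorder convention: parent after its first subtree.
                if self.strategy == "inorder" and index == 0:
                    order.append(url)
            if self.strategy == "inorder" and not children:
                order.append(url)
            if self.strategy == "postorder":
                order.append(url)

        walk(self.start_url)
        return order


# Usage with a small in-memory site instead of real HTTP requests.
site = {"/": ["/a", "/b"], "/a": ["/a1", "/"], "/b": [], "/a1": []}
crawler = AIWebCrawler("/", strategy="postorder", get_links=lambda u: site.get(u, []))
print(crawler.crawl_order())
```

The recursion depth grows with the depth of the link tree, so a very deep site may need an explicit stack instead.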
Output:
Generate a corpus of text documents containing the content of each visited page.
Ensure the text is free of HTML tags, JavaScript, menu items, and other non-essential elements.
The title of each textual document is the title of the page visited during the crawling phase.
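The cleaning step can be sketched with the standard library's `html.parser`. The example collects the page title and the visible text while skipping a set of tags that usually hold scripts, styling, or navigation. The tag list in `SKIPPED_TAGS` and the helper names `page_to_document` and `safe_filename` are assumptions for illustration; real sites often need a richer heuristic or a dedicated extraction library.

```python
import re
from html.parser import HTMLParser

# Assumed list of tags whose content is treated as non-essential.
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_document(html):
    """Return (title, plain_text) for one page of HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split()) or "untitled"
    return title, " ".join(parser.chunks)


def safe_filename(title):
    """Turn a page title into a file name for the corpus."""
    return re.sub(r"[^\w\- ]+", "_", title).strip() + ".txt"
```

Each visited page would then be written to a file named `safe_filename(title)` containing the extracted text.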
Handling Dynamic Content:
Use a browser automation tool with a JavaScript engine, such as Selenium WebDriver driving Chrome, to crawl and extract content from dynamic pages.
Ensure the crawler can interpret and navigate JavaScript-rendered content.
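A hedged sketch of fetching a JavaScript-rendered page with Selenium and Chrome. It assumes the `selenium` package (version 4 or later) and a Chrome installation are available; the function names are hypothetical. Waiting for `document.readyState` to reach `complete` is a simple heuristic and does not guarantee that content loaded by later asynchronous requests is present.

```python
import time


def make_chrome_driver():
    """Create a headless Chrome driver (requires selenium and Chrome)."""
    from selenium import webdriver  # imported lazily so the module loads without selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def fetch_rendered(driver, url, timeout=10.0, poll=0.25):
    """Load url in the browser and return (title, html, hrefs) after rendering."""
    driver.get(url)

    # Poll until the browser reports the document as fully loaded,
    # or give up waiting once the timeout has passed.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            break
        time.sleep(poll)

    # Read links from the live DOM so that anchors created by scripts are included.
    hrefs = driver.execute_script(
        "return Array.from(document.querySelectorAll('a[href]'), a => a.href);"
    )
    return driver.title, driver.page_source, list(hrefs or [])
```

A typical call would be `driver = make_chrome_driver()` followed by `fetch_rendered(driver, start_url)`, with `driver.quit()` once the crawl ends.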
Integration with AI (ChatGPT or Google Colab):
Utilize AI capabilities in your crawler for tasks such as parsing, text extraction, or decision-making.
Document all the prompts used to generate the web crawler, and keep track of the number of times the generated code did not work and how you solved the issue (prompt or manual intervention).
Keep track of this information using the following table:
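The table itself is not reproduced in the question, so the columns below are assumptions: the prompt text, whether the generated code worked, and how a failure was resolved. A small helper like this hypothetical `PromptLog` could keep the record while the team works and print it in tabular form at the end.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    prompt: str
    worked: bool
    resolution: str = ""  # e.g. "new prompt" or "manual intervention"


@dataclass
class PromptLog:
    records: list = field(default_factory=list)

    def add(self, prompt, worked, resolution=""):
        self.records.append(PromptRecord(prompt, worked, resolution))

    def failure_count(self):
        """Number of prompts whose generated code did not work."""
        return sum(1 for record in self.records if not record.worked)

    def as_table(self):
        """Render the log as plain-text rows (assumed column layout)."""
        rows = ["# | Prompt | Worked | Resolution"]
        for number, record in enumerate(self.records, start=1):
            worked = "yes" if record.worked else "no"
            rows.append(f"{number} | {record.prompt} | {worked} | {record.resolution}")
        return "\n".join(rows)


log = PromptLog()
log.add("Write a same-domain link filter", worked=True)
log.add("Add inorder traversal", worked=False, resolution="manual intervention")
print(log.as_table())
```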