Question: New to Python and need help. Language used: Python 3. Also using requests and BeautifulSoup.

Build a web crawler function that starts with a URL representing a topic and outputs a list of at least 15 relevant URLs. The URLs can be pages within the original domain, but a few should be outside it.
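A minimal sketch of the crawler, assuming that "relevant" simply means the topic keyword appears in the link's URL. That is a crude heuristic chosen for illustration; the assignment leaves the relevance test up to you. The names extract_links and crawl, and the min_urls and max_pages parameters, are hypothetical. It crawls breadth-first and only follows links inside the starting domain, but it records matching links from any domain, so a few outside URLs can end up in the result.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links found in html, in page order, without #fragments."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs like "/wiki/Foo" against the page's own URL.
        url, _fragment = urldefrag(urljoin(base_url, anchor["href"]))
        if urlparse(url).scheme in ("http", "https") and url not in links:
            links.append(url)
    return links


def crawl(start_url: str, keyword: str, min_urls: int = 15, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl from start_url, keeping links whose URL contains keyword.

    Only links on the starting domain are followed, but matching links to other
    domains are still recorded, so the result can include outside URLs.
    """
    keyword = keyword.lower()
    start_domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    relevant = []
    fetched = 0
    while queue and len(relevant) < min_urls and fetched < max_pages:
        page = queue.popleft()
        try:
            resp = requests.get(page, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue  # skip pages that time out or return an error status
        fetched += 1
        for link in extract_links(resp.text, page):
            if link in seen:
                continue
            seen.add(link)
            if keyword in link.lower():
                relevant.append(link)
                if urlparse(link).netloc == start_domain:
                    queue.append(link)  # follow on-domain links only
    return relevant
```

Usage would look like urls = crawl(start_url, "your-topic-keyword"); what comes back depends entirely on the live pages. If the keyword filter turns up fewer than 15 links before max_pages is reached, a looser test (for example, also checking the anchor text) is an easy change.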

Write a function to loop through your URLs and scrape all text off each page. Store each page's text in its own file.
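One way to do the scraping step, sketched under a few assumptions: the function names page_text and scrape_to_files and the output folder raw_pages are hypothetical, and files are numbered page_01.txt, page_02.txt, and so on in the order of the URL list. The <script> and <style> tags are removed first so no JavaScript or CSS ends up in the saved text.

```python
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def page_text(html: str) -> str:
    """Return the text of an HTML page with <script> and <style> content removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # soup([...]) is shorthand for find_all
        tag.decompose()
    return soup.get_text(separator="\n")


def scrape_to_files(urls: list[str], out_dir: str = "raw_pages") -> list[Path]:
    """Fetch each URL and write its text to its own numbered file in out_dir."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for i, url in enumerate(urls, start=1):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException as err:
            print(f"Skipping {url}: {err}")
            continue
        path = folder / f"page_{i:02d}.txt"
        path.write_text(page_text(resp.text), encoding="utf-8")
        written.append(path)
    return written
```

Using separator="\n" keeps text from neighbouring tags on separate lines instead of running words together; the extra newlines are what the cleanup step removes.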

Write a function to clean up the text. You might need to delete newlines and tabs. Extract sentences with NLTK's sentence tokenizer. Write the sentences for each file to a new file. That is, if you have 15 files in, you have 15 files out.
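A minimal sketch of the cleanup step, assuming the per-page text files sit in a folder named raw_pages and the output goes to clean_pages (both hypothetical names, as are clean_text and write_sentences). clean_text collapses every run of whitespace (newlines, tabs, repeated spaces) into one space; write_sentences then runs NLTK's sent_tokenize and writes one sentence per line under the same file name, so 15 files in gives 15 files out.

```python
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def write_sentences(in_dir: str = "raw_pages", out_dir: str = "clean_pages") -> None:
    """Write each input file's sentences, one per line, to a same-named file in out_dir."""
    # Imported here so clean_text can be used even where NLTK isn't installed.
    # sent_tokenize needs the Punkt model downloaded once, e.g. nltk.download("punkt")
    # (recent NLTK versions may ask for "punkt_tab" instead).
    from nltk.tokenize import sent_tokenize

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.txt")):
        sentences = sent_tokenize(clean_text(src.read_text(encoding="utf-8")))
        (out / src.name).write_text("\n".join(sentences), encoding="utf-8")
```

One side effect of collapsing newlines first: headings and menu items that lack a final period get glued onto the following sentence. That is one reason the cleaned files may still need a manual pass.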

You might need to edit the cleaned-up files manually to delete irrelevant material.

Write a function to extract at least 10 important terms from the pages using an importance measure such as term frequency. First, it's a good idea to lower-case everything and remove stopwords and punctuation. Then build a vocabulary of unique terms. Create a dictionary of unique terms where the key is the token and the value is its count across all documents. Print the top 25 to 40 terms.
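A sketch of the term-frequency step using only the standard library. STOPWORDS here is a deliberately short, hypothetical list; NLTK's nltk.corpus.stopwords.words("english") (after nltk.download("stopwords")) is the usual fuller choice. The function names and the clean_pages folder are assumptions for illustration.

```python
import string
from collections import Counter
from pathlib import Path

# Hypothetical short stopword list for illustration only; a fuller list is
# nltk.corpus.stopwords.words("english") after nltk.download("stopwords").
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "in",
    "is", "it", "its", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "which", "with",
}

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Lower-case text, strip punctuation, and drop stopwords and non-alphabetic tokens."""
    words = text.lower().translate(_STRIP_PUNCT).split()
    return [w for w in words if w.isalpha() and w not in STOPWORDS]


def term_counts(docs: list[str]) -> dict[str, int]:
    """Vocabulary of unique terms mapped to their total count across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return dict(counts)


def top_terms(counts: dict[str, int], n: int = 40) -> list[tuple[str, int]]:
    """The n most frequent terms, highest count first."""
    return Counter(counts).most_common(n)


def read_docs(folder: str = "clean_pages") -> list[str]:
    """Read every .txt file in folder."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]
```

Printing the list is then a loop such as for term, count in top_terms(term_counts(read_docs()), 40): print(term, count). Raw frequency tends to favour words that appear on every page; if the top of the list looks generic, tf-idf is a common alternative importance measure.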

Manually determine the top 10 terms based on your domain knowledge.

Build a searchable knowledge base of facts related to the 10 terms that your bot can share.
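A minimal sketch of a searchable knowledge base, assuming a "fact" is simply a sentence from the cleaned files that mentions one of the 10 terms as a whole word. build_knowledge_base, search, and the _mentions helper are hypothetical names; the terms list stands in for whatever 10 terms you choose by hand.

```python
import re


def _mentions(text: str, term: str) -> bool:
    """True if term occurs in text as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def build_knowledge_base(sentences: list[str], terms: list[str]) -> dict[str, list[str]]:
    """Map each term to every sentence that mentions it."""
    return {term: [s for s in sentences if _mentions(s, term)] for term in terms}


def search(kb: dict[str, list[str]], query: str) -> list[str]:
    """Return the facts for every knowledge-base term found in the query, without duplicates."""
    hits = []
    for term, facts in kb.items():
        if _mentions(query, term):
            hits.extend(facts)
    return list(dict.fromkeys(hits))  # dedupe while keeping first-seen order
```

Because matching is on whole words, "crawler" will not match "Crawlers"; stemming or lemmatizing both sides would fix that. To keep the knowledge base between runs, the dictionary can be saved with json.dump and loaded back with json.load. You will likely still want to prune sentences that are not really facts.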
