Question:

Assessment - 1, CSE 3024: Web Mining

1. Write a Python program to:
   a. Extract the contents (excluding any tags) from two websites (https://en.wikipedia.org/wiki/Web_mining and https://en.wikipedia.org/wiki/Data_mining).
   b. Remove stopwords [using the spaCy module] (including the special characters/symbols) from the contents retrieved from those two URLs and save the contents in two separate .txt files. Additional stop words to be considered: dot, comma, single quote, double quote, question mark, brackets (square, parentheses, curly, angle), exclamation mark.
   c. Display the POS tag (sentence-wise) for all the stopwords (excluding the special characters/symbols) that are removed from the content, using a pandas dataframe as per the format given below:

      Original Sentence | List of Stopwords | POS Tags
      Web mining is the application of data mining techniques to discover patterns from the World Wide Web. | is the of to from the | VBZ DT IN TO IN DT

   d. Display the Term-Document incidence matrix using the Boolean, Bag-of-words and Complete representations (use a pandas dataframe). Prepare three separate tables, one for each type of representation, as per the format given below:

      Terms | DOC1 | DOC2
      Web   | 5    | 0
      Data  | 0    | 1

   e. Input a search query (preferably a sentence) and compare the contents of both pages with the processed query. Display the similarity result based on the highest frequency matching count of the terms.

2. Write a Python program to prepare the Word Cloud representation based on the content present in the two document files prepared in Q.No. 1. A sample Word Cloud representation is provided below for reference.

3. Write a Python program to show the implementation of sentence paraphrasing through synonyms (retaining semantic meaning) for the following four sentences. Display at least three other paraphrased sentences for each sentence mentioned below.
   a. The quick brown fox jumps over the lazy dog
   b. We can rewrite history as much as we like.
   c. Once you know all the elements, it’s not difficult to pull together a sentence.
   d. The incessant ticking and chiming echoed off the weathered walls of the clock repair shop.
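Parts 1(d) and 1(e) can be illustrated with a minimal standard-library sketch. It is not the expert's answer, which is not visible on this page. The helper names (tokenize, build_representations, rank_by_query) are hypothetical, and the "Complete" representation is assumed here to mean the list of positions at which each term occurs, which is one common reading of the term. The tokenizer is a simple regular expression rather than spaCy.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_representations(docs):
    """Build three term-document tables from a dict {doc_name: text}.

    Returns (boolean, bag_of_words, complete), each shaped as
    {term: {doc_name: value}}:
      boolean      -> 1 if the term occurs in the document, else 0
      bag_of_words -> number of occurrences of the term
      complete     -> list of token positions (assumed meaning of "Complete")
    """
    names = list(docs)
    tokens = {name: tokenize(text) for name, text in docs.items()}
    counts = {name: Counter(toks) for name, toks in tokens.items()}

    # Record the position of every token occurrence per document.
    positions = {name: defaultdict(list) for name in names}
    for name, toks in tokens.items():
        for index, term in enumerate(toks):
            positions[name][term].append(index)

    vocabulary = sorted(set().union(*tokens.values()))
    boolean = {t: {n: int(counts[n][t] > 0) for n in names} for t in vocabulary}
    bag_of_words = {t: {n: counts[n][t] for n in names} for t in vocabulary}
    complete = {
        t: {n: list(positions[n].get(t, [])) for n in names} for t in vocabulary
    }
    return boolean, bag_of_words, complete


def rank_by_query(query, docs, stopwords=frozenset()):
    """Rank documents by the summed frequency of the processed query terms.

    The query is tokenized and stripped of `stopwords` before matching.
    Returns a list of (doc_name, score) pairs, highest score first.
    """
    query_terms = {t for t in tokenize(query) if t not in stopwords}
    scores = {}
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        scores[name] = sum(counts[term] for term in query_terms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

To get the table layout the question asks for (terms as rows, DOC1 and DOC2 as columns), a table such as bag_of_words can be passed to pandas.DataFrame.from_dict(table, orient="index"). The stopwords argument can be filled from spaCy's built-in set (from spacy.lang.en.stop_words import STOP_WORDS) so that the query is processed the same way as the documents.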

Step by Step Solution

Step: 1

# import required modules
import requests
# get URL page
requests.get("https://en.wikipedia.org/wik...
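The visible part of the answer only shows the page being requested with requests. One possible way to continue from there, covering part 1(a), is sketched below. It assumes the tags are stripped with the standard-library html.parser module, which may differ from what the full answer uses (BeautifulSoup is a common alternative). The names TextExtractor, extract_text and fetch_text, and the User-Agent string, are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        """Return the collected text chunks joined by single spaces."""
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url):
    """Download a page with requests and return its tag-free text."""
    # Third-party dependency, imported here so that extract_text can be
    # used on its own without requests installed.
    import requests

    # Hypothetical User-Agent value; some sites reject the library default.
    headers = {"User-Agent": "web-mining-assignment-sketch/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return extract_text(response.text)
```

Calling fetch_text once per URL from the question would give the two raw documents. Note that this keeps all visible page text, including navigation and footer links, so a real solution may want to restrict extraction to the article body.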
