Question:

1. Write a Python program to collect text data from one of the following sources and save the data into a CSV file:
(1) Collect all the customer reviews of a product (you can choose any product) on Amazon.
(2) Collect the top 10000 user reviews of a recent film from 2023 or 2022 (you can choose any film) from IMDb.
(3) Collect all the reviews of the top 1000 most popular software products from G2 or Capterra.
(4) Collect the abstracts of the top 10000 research papers by using the query "machine learning", "data science", "artificial intelligence", or "information extraction" from Semantic Scholar.
(5) Collect all the information of the 904 narrators in the Densho Digital Repository.
(6) Collect the top 10000 tweets by using a hashtag (you can use any hashtag) from Twitter.
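As a starting point for option (4), here is a minimal sketch that pages through a search API and writes the results to a CSV file. It assumes the Semantic Scholar Graph API paper search endpoint and its `query`, `offset`, `limit`, and `fields` parameters; the function names, the `abstracts.csv` file name, and the one-second pause are illustrative choices, not part of the assignment. The API may cap how many results a single query can page through, so check its current documentation before relying on 10000 rows from one query.

```python
import csv
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoint: Semantic Scholar Graph API paper search.
API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def fetch_page(query, offset, limit=100):
    """Fetch one page of search results as a list of dicts (network call)."""
    params = urllib.parse.urlencode(
        {"query": query, "offset": offset, "limit": limit, "fields": "title,abstract"}
    )
    with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
        return json.load(resp).get("data", [])


def collect_abstracts(query, total, fetch=fetch_page, page_size=100, pause=1.0):
    """Page through results until `total` abstracts are collected or pages run out."""
    rows = []
    offset = 0
    while len(rows) < total:
        page = fetch(query, offset, page_size)
        if not page:
            break  # no more results
        for paper in page:
            if paper.get("abstract"):  # skip papers without an abstract
                rows.append({"title": paper.get("title") or "", "abstract": paper["abstract"]})
        offset += page_size
        time.sleep(pause)  # be polite to the server between requests
    return rows[:total]


def save_csv(rows, path):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "abstract"])
        writer.writeheader()
        writer.writerows(rows)


# Example use (performs real network requests):
# save_csv(collect_abstracts("machine learning", 10000), "abstracts.csv")
```

The `fetch` parameter is injectable so the paging and CSV logic can be tried offline with a stub before making real requests.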

2. Write a Python program to clean the text data you collected above and save the cleaned data in a new column in the CSV file. The data cleaning steps include:
(1) Remove noise, such as special characters and punctuation.
(2) Remove numbers.
(3) Remove stopwords by using a stopwords list.
(4) Lowercase all text.
(5) Stemming.
(6) Lemmatization.
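The cleaning steps above can be sketched with the standard library alone. In this sketch the tiny `STOPWORDS` set is only a stand-in for a real list (for example NLTK's English stopwords), and stemming and lemmatization are left as optional hooks, since they normally come from a library such as NLTK. The column names `text` and `clean_text` and all function names are assumptions for illustration.

```python
import csv
import re

# Illustrative stand-in; a real run would load a full stopwords list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it", "this"}


def clean_text(text, stopwords=STOPWORDS, stem=None, lemmatize=None):
    """Apply the cleaning steps to one string and return the cleaned string."""
    text = text.lower()                    # lowercase first so stopword matching works
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation/special characters (assumes English ASCII text)
    tokens = [t for t in text.split() if t not in stopwords]
    if stem is not None:                   # e.g. nltk.stem.PorterStemmer().stem
        tokens = [stem(t) for t in tokens]
    if lemmatize is not None:              # e.g. nltk.stem.WordNetLemmatizer().lemmatize
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


def add_clean_column(in_path, out_path, text_col="text", new_col="clean_text"):
    """Read a CSV, add a cleaned copy of `text_col` as `new_col`, and write it out."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fields = list(reader.fieldnames or []) + [new_col]
    for row in rows:
        row[new_col] = clean_text(row[text_col])
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `clean_text("The movie was GREAT!!! 10/10, loved it.")` returns `"movie great loved"`. In practice stemming and lemmatization are usually alternatives, so it may make sense to store their outputs in separate columns rather than chaining them.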

3. Write a Python program to conduct syntax and structure analysis of the clean text you just saved above. The syntax and structure analysis includes:
(1) Parts of Speech (POS) Tagging: Tag the part of speech of each word in the text, and calculate the total number of N(oun), V(erb), Adj(ective), and Adv(erb) tags, respectively.
(2) Constituency Parsing and Dependency Parsing: Print out the constituency parsing trees and dependency parsing trees of all the sentences. Use one sentence as an example to explain your understanding of the constituency parsing tree and the dependency parsing tree.
(3) Named Entity Recognition: Extract all the entities, such as person names, organizations, locations, product names, and dates, from the clean texts, and calculate the count of each entity.
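The counting parts of tasks (1) and (3) can be sketched independently of any particular tagger. The sketch below assumes the tagger produces Penn Treebank tags as `(word, tag)` pairs (as `nltk.pos_tag` does) and that the entity recognizer produces `(text, label)` pairs (as can be built from spaCy's `doc.ents` via `ent.text` and `ent.label_`). The function names are hypothetical.

```python
from collections import Counter

# Penn Treebank tag prefixes for the four coarse classes.
# "RB" (not "R") is used so that particles tagged "RP" are not counted as adverbs.
POS_PREFIXES = {"N": "NN", "V": "VB", "Adj": "JJ", "Adv": "RB"}


def count_pos(tagged_tokens):
    """Count nouns, verbs, adjectives, and adverbs in (word, tag) pairs."""
    counts = {name: 0 for name in POS_PREFIXES}
    for _word, tag in tagged_tokens:
        for name, prefix in POS_PREFIXES.items():
            if tag.startswith(prefix):
                counts[name] += 1
                break
    return counts


def count_entities(entities):
    """Count how often each (entity text, entity label) pair occurs."""
    return Counter(entities)


# Possible sources for the inputs (require the libraries and their models):
#   tagged = nltk.pos_tag(nltk.word_tokenize(text))
#   entities = [(ent.text, ent.label_) for ent in nlp(text).ents]
```

One caveat worth stating in the write-up: taggers, parsers, and entity recognizers are typically trained on ordinary sentences, so running them on lowercased, stemmed, stopword-free text will likely reduce their accuracy.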

Write your explanations of the constituency parsing tree and dependency parsing tree here (Question 3-2):
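One way to ground the explanation is a small hand-built example. The sketch below encodes both analyses of the sentence "The cat chased the mouse" by hand (it is not parser output): a constituency tree as nested tuples with Penn Treebank labels, and a dependency parse as head-relation-dependent triples with Universal Dependencies relation names. The helper names are hypothetical.

```python
TOKENS = ["The", "cat", "chased", "the", "mouse"]

# Constituency tree: each node is (label, child, child, ...); leaves are words.
# Phrases nest inside phrases: S -> NP VP, VP -> VBD NP.
CONSTITUENCY = (
    "S",
    ("NP", ("DT", "The"), ("NN", "cat")),
    ("VP", ("VBD", "chased"), ("NP", ("DT", "the"), ("NN", "mouse"))),
)

# Dependency parse: (head index, relation, dependent index), 1-based; 0 is ROOT.
# There are no phrase nodes, only word-to-word links.
DEPENDENCIES = [
    (2, "det", 1),    # cat -> The
    (3, "nsubj", 2),  # chased -> cat
    (0, "root", 3),   # ROOT -> chased
    (5, "det", 4),    # mouse -> the
    (3, "obj", 5),    # chased -> mouse
]


def bracket(node):
    """Render a constituency tree in bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


def leaves(node):
    """Return the words of a constituency tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def format_dependency(head, relation, dependent):
    """Render one dependency as relation(head-i, dependent-j)."""
    head_word = "ROOT" if head == 0 else TOKENS[head - 1]
    return f"{relation}({head_word}-{head}, {TOKENS[dependent - 1]}-{dependent})"


print(bracket(CONSTITUENCY))
for dep in DEPENDENCIES:
    print(format_dependency(*dep))
```

The contrast to draw out: the constituency tree groups words into nested phrases (the noun phrase "the mouse" sits inside the verb phrase), while the dependency parse links each word directly to exactly one head, with the main verb "chased" attached to the root.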
