Question:

A) Write a Python program to collect the abstracts of the top 100 research papers returned by the query "natural language processing" on CiteSeerX, and save the data into a CSV file.
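A minimal sketch of the saving step, assuming the search results have already been fetched and parsed into dicts with `title` and `abstract` keys, in CiteSeerX's result order. The fetching itself (for example with `requests` plus an HTML or JSON parser) depends on the current CiteSeerX site layout, which the question does not specify, so it is left out. The function name, input format and column names are hypothetical.

```python
import csv


def save_abstracts(records, path, limit=100):
    """Write the first `limit` records that have a non-empty abstract to a CSV file.

    `records` is assumed (hypothetical input format) to be an iterable of dicts
    with "title" and "abstract" keys, already in search-result order.
    Returns the number of rows written.
    """
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["rank", "title", "abstract"])
        writer.writeheader()
        for rec in records:
            abstract = (rec.get("abstract") or "").strip()
            if not abstract:
                continue  # skip results without an abstract so we still collect `limit` abstracts
            written += 1
            # "rank" is the position among the kept results, not the raw result position.
            writer.writerow({"rank": written, "title": rec.get("title", ""), "abstract": abstract})
            if written == limit:
                break
    return written
```

Usage would look like `save_abstracts(parsed_results, "citeseerx_nlp_abstracts.csv")`, where `parsed_results` is whatever your scraper produced. Skipping empty abstracts is a design choice; drop the `continue` if you want exactly the first 100 results regardless.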

B) Write a Python program to clean the text data collected above and save the cleaned text in a new column of the CSV file. The data cleaning steps include:

(1) Remove noise, such as special characters and punctuation.
(2) Remove numbers.
(3) Remove stopwords using a stopword list.
(4) Lowercase all text.
(5) Stemming.
(6) Lemmatization.
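The six steps can be sketched with the standard library alone, assuming the CSV has an `abstract` column (hypothetical name). The stopword set, the suffix-stripping stemmer and the lookup-plus-rules lemmatizer are small illustrative stand-ins, not standard algorithms; in practice you would swap in `nltk.corpus.stopwords.words("english")`, `nltk.stem.PorterStemmer` and `nltk.stem.WordNetLemmatizer` (the first and last need their NLTK data downloaded). Lowercasing is done before stopword removal so that the lowercase stopword list matches. Because lemmatizing a stem rarely changes anything, this sketch applies stemming and lemmatization separately to the same filtered tokens and writes each result to its own new column.

```python
import csv
import re

# Small hand-picked stopword subset (illustrative assumption, not a standard list).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "for", "to", "with", "by",
    "from", "as", "at", "is", "are", "was", "were", "be", "been", "this", "that",
    "these", "we", "our", "it", "its",
}

# Suffixes for the toy stemmer, tried in the order listed; the first match wins.
SUFFIXES = ("ations", "ation", "ings", "ing", "edly", "ed", "ies", "es", "s", "ly")

# A few irregular forms for the toy lemmatizer.
IRREGULAR_LEMMAS = {"better": "good", "children": "child", "mice": "mouse"}


def tokenize_clean(text):
    """Steps (1)-(4): remove noise and numbers, lowercase, drop stopwords."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # (1) keep only ASCII letters, digits, whitespace
    text = re.sub(r"\d+", " ", text)  # (2) remove numbers
    tokens = text.lower().split()  # (4) lowercase before (3) so the stopword list matches
    return [t for t in tokens if t not in STOPWORDS]  # (3) remove stopwords


def simple_stem(word):
    """(5) Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    if word.endswith("ss"):
        return word  # avoid "process" -> "proces"
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def simple_lemmatize(word):
    """(6) Toy lookup-plus-rules lemmatizer; a stand-in for a dictionary-based one."""
    if word in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # "studies" -> "study"
    if word.endswith("s") and not word.endswith(("ss", "is", "us")) and len(word) > 3:
        return word[:-1]  # "models" -> "model"; leaves "analysis", "corpus" alone
    return word


def add_clean_columns(in_path, out_path):
    """Read a CSV with an "abstract" column and write it back with two cleaned columns."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["clean_stemmed", "clean_lemmatized"]
        rows = list(reader)
    for row in rows:
        tokens = tokenize_clean(row["abstract"])
        row["clean_stemmed"] = " ".join(simple_stem(t) for t in tokens)
        row["clean_lemmatized"] = " ".join(simple_lemmatize(t) for t in tokens)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Since all rows are read before writing, `in_path` and `out_path` may be the same file, which adds the new columns in place.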

C) Write a Python program to conduct syntax and structure analysis of the cleaned text saved above. The analysis includes:

(1) Part-of-Speech (POS) Tagging: tag the part of speech of each word in the text, and calculate the total number of nouns (N), verbs (V), adjectives (Adj), and adverbs (Adv), respectively.
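The counting step can be sketched independently of the tagger. NLTK's `nltk.pos_tag` returns `(word, tag)` pairs using the Penn Treebank tag set, where noun tags start with `NN`, verb tags with `VB`, adjective tags with `JJ` and adverb tags with `RB`. Grouping by the first two letters of the tag is a simplifying assumption (for example, `WRB` wh-adverbs are not counted as adverbs), and the tagged list in the demo is hand-written for illustration, not real tagger output.

```python
# Penn Treebank tag prefixes mapped to the four word classes in the question.
POS_GROUPS = {"NN": "Noun", "VB": "Verb", "JJ": "Adjective", "RB": "Adverb"}


def count_pos(tagged_tokens):
    """Count nouns, verbs, adjectives and adverbs in a list of (word, Penn tag) pairs."""
    counts = {group: 0 for group in POS_GROUPS.values()}
    for _word, tag in tagged_tokens:
        group = POS_GROUPS.get(tag[:2])
        if group is not None:
            counts[group] += 1
    return counts


# With NLTK installed and its tokenizer/tagger data downloaded, the input could be
# produced with: tagged = nltk.pos_tag(nltk.word_tokenize(text))
if __name__ == "__main__":
    # Hand-written tags for illustration only (not actual tagger output).
    tagged = [("neural", "JJ"), ("models", "NNS"), ("quickly", "RB"),
              ("learn", "VBP"), ("useful", "JJ"), ("representations", "NNS")]
    print(count_pos(tagged))  # {'Noun': 2, 'Verb': 1, 'Adjective': 2, 'Adverb': 1}
```

To get corpus-wide totals, call `count_pos` on each abstract and add the per-class counts together.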

(2) Constituency Parsing and Dependency Parsing: print out the constituency parsing trees and dependency parsing trees of all the sentences. Use one sentence as an example to explain your understanding of the constituency parsing tree and the dependency parsing tree.
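Both kinds of tree can be illustrated on the single sentence "The model parses sentences." The analyses below are written by hand for illustration, not produced by a parser; in practice the bracketed constituency string would come from a constituency parser (for example Stanford CoreNLP or benepar), and the dependency heads and labels from a dependency parser such as spaCy (`token.head`, `token.dep_`). The constituency tree groups words into nested phrases: the sentence S splits into a noun phrase (NP "The model") and a verb phrase (VP "parses sentences"), and the VP contains the verb plus an object NP. The dependency tree instead links each word directly to the word it modifies: "parses" is the root, "model" is its nominal subject (nsubj), "sentences" is its direct object (dobj), and "The" is a determiner (det) attached to "model". NLTK's `Tree.fromstring(...).pretty_print()` can also draw the bracketed tree; the small parser below just avoids that dependency.

```python
# Hand-written analyses of one example sentence (illustrative, not parser output).
CONSTITUENCY = "(S (NP (DT The) (NN model)) (VP (VBZ parses) (NP (NNS sentences))) (. .))"

# (index, word, head index, relation); head 0 marks the root. spaCy-style labels.
DEPENDENCIES = [
    (1, "The", 2, "det"),
    (2, "model", 3, "nsubj"),
    (3, "parses", 0, "ROOT"),
    (4, "sentences", 3, "dobj"),
    (5, ".", 3, "punct"),
]


def parse_bracketed(s):
    """Parse a Penn-style bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume the closing ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def leaves(node):
    """Return the words of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def print_tree(node, depth=0):
    """Print a tree with one node per line, indented by depth."""
    if isinstance(node, str):
        print("  " * depth + node)
        return
    label, children = node
    print("  " * depth + label)
    for child in children:
        print_tree(child, depth + 1)


def dependency_arcs(deps):
    """Format each dependency as relation(head, dependent)."""
    words = {i: w for i, w, _, _ in deps}
    words[0] = "ROOT"
    return [f"{rel}({words[head]}, {word})" for _, word, head, rel in deps]


if __name__ == "__main__":
    print_tree(parse_bracketed(CONSTITUENCY))
    for arc in dependency_arcs(DEPENDENCIES):
        print(arc)
```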

(3) Named Entity Recognition: extract all entities, such as person names, organizations, locations, product names, and dates, from the cleaned texts, and calculate the count of each entity.
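A minimal sketch of the counting step, assuming entities arrive as `(text, label)` pairs such as those built from spaCy's `doc.ents` (`ent.text`, `ent.label_`). The labels PERSON, ORG, GPE, LOC, PRODUCT and DATE are the names used by spaCy's English models; mapping the question's "locations" to both GPE and LOC is an assumption. NER models lean heavily on capitalization and punctuation, so entities tend to be missed in lowercased, stopword-stripped text, and comparing against the original abstracts may be worthwhile. The sample entities in the demo are hand-written, not model output.

```python
from collections import Counter

# Requested entity types, using spaCy's English label names:
# PERSON, ORG, GPE (countries, cities, states), LOC (other locations), PRODUCT, DATE.
WANTED_LABELS = {"PERSON", "ORG", "GPE", "LOC", "PRODUCT", "DATE"}


def count_entities(entities):
    """Count (text, label) entity pairs of the requested types.

    Returns two Counters: one keyed by (text, label), one keyed by label.
    """
    kept = [(text, label) for text, label in entities if label in WANTED_LABELS]
    return Counter(kept), Counter(label for _text, label in kept)


# With spaCy and the en_core_web_sm model installed, the input could be built as:
#   nlp = spacy.load("en_core_web_sm")
#   entities = [(ent.text, ent.label_) for ent in nlp(text).ents]
if __name__ == "__main__":
    # Hand-written entities for illustration only (not actual model output).
    sample = [("Google", "ORG"), ("2019", "DATE"), ("Google", "ORG"),
              ("BERT", "PRODUCT"), ("three", "CARDINAL")]
    by_entity, by_label = count_entities(sample)
    print(by_entity.most_common())
    print(by_label)
```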

Please write your explanations of the constituency parsing tree and the dependency parsing tree.
