Question 1
Write a Python program to conduct N-gram analysis on the dataset from your assignment two:
(1) Count the frequency of all the N-grams (N=3).
(2) Calculate the probabilities of all the bigrams in the dataset using the formula count(w2 w1) / count(w2), i.e. the probability that w1 follows w2. For example, count(really like) / count(really) = 1 / 3 = 0.33.
(3) Extract all the noun phrases and calculate the relative probability of each review with respect to the other reviews (abstracts, or tweets) using the formula frequency(noun phrase) / max frequency(noun phrase) over the whole dataset. Print the result as a table with all the noun phrases as column names and all 100 reviews (abstracts, or tweets) as row names.
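A minimal sketch of all three parts of Question 1, using only the Python standard library. Several things here are assumptions rather than part of the assignment: the regex tokenizer, the function names, and the toy data are illustrative; noun phrase extraction itself is not shown (it would come from a chunker such as spaCy's `Doc.noun_chunks`); and the formula in (3) is read as a phrase's count in one review divided by that phrase's maximum count in any single review.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed cleaning step: lowercase, keep alphanumeric/apostrophe tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    # Slide a window of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def trigram_counts(docs):
    # (1) Frequency of every trigram (N=3) across all documents.
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), 3))
    return counts


def bigram_probabilities(docs):
    # (2) count(w2 w1) / count(w2), where w2 is the first word of the bigram.
    unigrams, bigrams = Counter(), Counter()
    for doc in docs:
        toks = tokenize(doc)
        unigrams.update(toks)
        bigrams.update(ngrams(toks, 2))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}


def noun_phrase_table(doc_nps):
    # (3) doc_nps holds one list of noun phrases per document, e.g. produced by
    # [chunk.text for chunk in nlp(text).noun_chunks] with spaCy (not shown).
    # Assumed reading: count in this document / max count in any document.
    per_doc = [Counter(nps) for nps in doc_nps]
    phrases = sorted(set().union(*per_doc))
    max_freq = {p: max(c[p] for c in per_doc) for p in phrases}
    rows = [[c[p] / max_freq[p] for p in phrases] for c in per_doc]
    return phrases, rows


# Toy example (hypothetical data standing in for the assignment-two reviews).
docs = ["I really like this movie", "I really do not like it", "really good acting"]
print(trigram_counts(docs).most_common(3))
print(round(bigram_probabilities(docs)[("really", "like")], 2))  # 1 / 3 -> 0.33

phrases, rows = noun_phrase_table([["movie", "movie", "plot"], ["plot"], ["acting"]])
print("review\t" + "\t".join(phrases))
for i, row in enumerate(rows, 1):
    print(f"review_{i}\t" + "\t".join(f"{v:.2f}" for v in row))
```

With the real dataset, the 100 reviews replace `docs`, and the printed table has one row per review and one column per noun phrase.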
Question 2
Understand TF-IDF and Document Representation
(40 points). Starting from the documents (all the reviews, or abstracts, or tweets) collected for assignment two, write a python program:
(1) To build the document-term weight (tf*idf) matrix.
(2) To rank the documents with respect to a query (design a query yourself, for example, "An Outstanding movie with a haunting performance and best character development") using cosine similarity.
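A minimal sketch of both parts of Question 2, using only the Python standard library. It assumes raw term counts for tf and idf = log(N / df) with no smoothing; scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its weights would differ. The tokenizer, function names, and the three toy documents are hypothetical stand-ins for the assignment-two dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed cleaning step: lowercase, keep alphanumeric/apostrophe tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_tfidf(docs):
    # (1) Document-term matrix with weight tf(t, d) * log(N / df(t)).
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    idf = {t: math.log(n / df[t]) for t in vocab}
    matrix = [[c[t] * idf[t] for t in vocab] for c in counts]
    return vocab, idf, matrix


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    # (2) Weight the query with the same idf, then sort documents by similarity.
    vocab, idf, matrix = build_tfidf(docs)
    q = Counter(tokenize(query))
    q_vec = [q[t] * idf[t] for t in vocab]  # query words outside vocab are dropped
    scores = [(i, cosine(row, q_vec)) for i, row in enumerate(matrix)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = [
    "an outstanding movie with a haunting performance",
    "a boring movie with weak characters",
    "the best character development I have seen",
]
query = "An Outstanding movie with a haunting performance and best character development"
for doc_id, score in rank(query, docs):  # ranking order: 0, 2, 1
    print(doc_id, round(score, 3))
```

Cosine similarity is used instead of a raw dot product so that long reviews do not score higher merely because they contain more words. Note that with this unsmoothed idf, a term occurring in every document gets weight 0.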
Question 3
Create your own training and evaluation data for sentiment analysis
(15 points). You don't need to write a program for this question! Read each review (abstract or tweet) you collected in detail, and annotate each review with a sentiment (positive, negative, or neutral). Save the annotated dataset into a CSV file with three columns (first column: document_id, second column: clean_text, third column: sentiment), upload the CSV file to GitHub, and submit the file link below. This dataset will be used for assignment four: sentiment analysis and text classification.
Note: I am adding this statement because you may otherwise ask me to update my question, as you have done twice before. The previous questions related to this one were answered by an expert just two days ago.
