Natural Language Processing - Programming Assignment

Dataset:

  1. Choose one category from http://jmcauley.ucsd.edu/data/amazon/ (Amazon product review data).
  2. Use at least 25,000 reviews (if the category has more than 25k reviews).
  3. Labeling rule for the dataset:
    1. [overall > 3.0] - positive
    2. [overall <= 3.0] - negative
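The dataset steps above can be sketched as follows. This is a minimal sketch, assuming the downloaded file has one review per line with `reviewText` and `overall` fields; check those field names and the one-record-per-line layout against your download. In case a file is stored as Python-dict literals rather than strict JSON, the parser falls back to `ast.literal_eval`. All helper names (`parse_reviews`, `split_train_test`, etc.) are hypothetical, and the held-out test split is an added convenience for computing the metrics later, not part of the assignment text.

```python
import ast
import gzip
import json
import random
from typing import Iterable, Iterator, List, Tuple


def label_from_overall(overall: float) -> int:
    """Assignment rule: overall > 3.0 -> positive (1); overall <= 3.0 -> negative (0)."""
    return 1 if overall > 3.0 else 0


def parse_record(line: str) -> dict:
    """Parse one review line: strict JSON first, Python-dict literal as a fallback."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return ast.literal_eval(line)


def parse_reviews(lines: Iterable[str], limit: int = 25_000) -> List[Tuple[str, int]]:
    """Collect up to `limit` (review text, label) pairs, skipping incomplete records."""
    data: List[Tuple[str, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = parse_record(line)
        text, overall = record.get("reviewText"), record.get("overall")
        if not text or overall is None:
            continue
        data.append((text, label_from_overall(float(overall))))
        if len(data) >= limit:
            break
    return data


def read_gzip_lines(path: str) -> Iterator[str]:
    """Stream text lines from a gzipped file (the path is whatever you downloaded)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from fh


def split_train_test(data, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle reproducibly and hold out `test_fraction` of the data for evaluation."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A typical call would be `train, test = split_train_test(parse_reviews(read_gzip_lines(path)))`, where `path` points at the category file you chose.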

Module - 2 (Sentiment Analysis using statistical NLP):

Tasks:

  1. Use the following vector space models:
    a. CountVectorizer.
    b. TF-IDF.
    c. Any external vectorizer (cite the original paper).
  2. Perform sentiment analysis with each vectorizer (a, b, c) using the following classical ML techniques:
    a. Naive Bayes.
    b. Decision Tree.
    c. Logistic Regression.
  3. Report metrics (accuracy, F1 score, confusion matrix) for every combination of vectorizer (task 1) and classifier (task 2).
  4. Analyse the results. Report clearly which vector space model gives the best results for each classifier.
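To make the first two vector space models concrete, here is a stdlib-only sketch of what scikit-learn's `CountVectorizer` and `TfidfVectorizer` compute with their default settings: lowercasing, the `(?u)\b\w\w+\b` token pattern, a vocabulary sorted by term, smoothed idf, and L2 row normalisation. For the assignment you would normally call those two classes directly; this version only illustrates the math, and the function names are hypothetical. The vocabulary and idf should be fitted on the training reviews only and then applied to the test reviews. The external vectorizer (1c) is not covered here.

```python
import math
import re
from typing import Dict, List

# Default token pattern of scikit-learn's text vectorizers: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and extract tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocab(train_docs: List[str]) -> Dict[str, int]:
    """Map each training-set term to a column index, in sorted term order."""
    terms = sorted({tok for doc in train_docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def count_vectors(docs: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Bag-of-words counts; terms missing from the vocabulary are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            idx = vocab.get(tok)
            if idx is not None:
                row[idx] += 1
        rows.append(row)
    return rows


def smooth_idf(train_counts: List[List[int]]) -> List[float]:
    """idf(t) = ln((1 + n) / (1 + df(t))) + 1, computed from the training rows."""
    n_docs = len(train_counts)
    n_terms = len(train_counts[0])
    df = [sum(1 for row in train_counts if row[j] > 0) for j in range(n_terms)]
    return [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]


def tfidf_vectors(counts: List[List[int]], idf: List[float]) -> List[List[float]]:
    """Weight raw counts by idf, then L2-normalise each non-zero row."""
    out = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        out.append([v / norm for v in weighted] if norm > 0 else weighted)
    return out
```

A term that occurs in every training review gets idf = ln(1) + 1 = 1, so it keeps its raw count weight before normalisation, while rarer terms are boosted.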
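For the classifiers, scikit-learn provides `MultinomialNB` (`sklearn.naive_bayes`), `DecisionTreeClassifier` (`sklearn.tree`) and `LogisticRegression` (`sklearn.linear_model`), all with `fit`/`predict`. To show what the Naive Bayes model does internally, here is a from-scratch multinomial Naive Bayes with additive (Laplace) smoothing, `alpha=1.0`, which is also `MultinomialNB`'s default. The class name is hypothetical. It accepts count rows or TF-IDF rows, since the multinomial model only needs non-negative feature values.

```python
import math
from typing import Dict, List, Sequence


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.classes: List[int] = []
        self.log_prior: Dict[int, float] = {}
        self.log_likelihood: Dict[int, List[float]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "MultinomialNaiveBayes":
        n_features = len(X[0])
        self.classes = sorted(set(y))
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # log P(c): fraction of training rows with this label
            self.log_prior[c] = math.log(len(rows) / len(X))
            # smoothed log P(feature j | c) from the per-class feature totals
            totals = [sum(row[j] for row in rows) for j in range(n_features)]
            denom = sum(totals) + self.alpha * n_features
            self.log_likelihood[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        preds = []
        for x in X:
            # score(c) = log P(c) + sum_j x_j * log P(j | c); highest score wins
            scores = {
                c: self.log_prior[c] + sum(v * ll for v, ll in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds
```

Working in log space avoids underflow when thousands of vocabulary terms are multiplied together.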
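The task 3 metrics can be spelled out as below. scikit-learn's `sklearn.metrics.accuracy_score`, `f1_score` and `confusion_matrix` compute the same quantities; `f1_for_label(..., positive=1)` corresponds to `f1_score` with its default binary averaging, and `macro_f1` to `average="macro"`. The function names are hypothetical. Because the rating rule can leave the two classes imbalanced, reporting macro F1 next to positive-class F1 may be informative (a suggestion, not an assignment requirement).

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> List[List[int]]:
    """Rows = true label, columns = predicted label (same layout as scikit-learn)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of one class; returns 0.0 when that class has no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, the confusion matrix is `[[1, 1], [1, 2]]`, accuracy is 0.6, and positive-class F1 is 2/3.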
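For task 4, once every (vectorizer, classifier) pair has been evaluated, a small helper can pick the best vectorizer for each classifier so the report states it explicitly. The results layout (a dict keyed by `(vectorizer, classifier)` name pairs) and the helper names are assumptions for illustration; the scores themselves come from your own runs. Ties keep the first vectorizer encountered.

```python
from typing import Dict, List, Tuple

# (vectorizer name, classifier name) -> {"accuracy": ..., "f1": ...}
Results = Dict[Tuple[str, str], Dict[str, float]]


def best_vectorizer_per_model(results: Results, metric: str = "f1") -> Dict[str, Tuple[str, float]]:
    """For each classifier, return (vectorizer, score) with the highest `metric`."""
    best: Dict[str, Tuple[str, float]] = {}
    for (vectorizer, model), scores in results.items():
        score = scores[metric]
        if model not in best or score > best[model][1]:
            best[model] = (vectorizer, score)
    return best


def summary_lines(results: Results) -> List[str]:
    """One plain-text line per combination, sorted by classifier, then vectorizer."""
    lines = []
    for vectorizer, model in sorted(results, key=lambda k: (k[1], k[0])):
        s = results[(vectorizer, model)]
        lines.append(f"{model:<20} {vectorizer:<12} acc={s['accuracy']:.4f} f1={s['f1']:.4f}")
    return lines
```

Ranking by F1 rather than accuracy is a design choice: on an imbalanced label split, accuracy can look high even when the minority class is handled poorly.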
