Question: Problem 2: Word embedding as features for classification
Task
Implement a sentiment classifier based on Twitter data to analyse the sentiments of COVID tweets.
Train and test multiple classification models using the necessary libraries, with sentence embeddings of the tweets as the features.
Report the accuracy and the micro- and macro-averaged F-scores of each classifier, and discuss the differences.
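A minimal sketch of the three requested metrics in plain Python. It assumes "F score" means the usual F1 (precision and recall weighted equally) and that every tweet carries exactly one sentiment label. The function name `classification_scores` is hypothetical; with scikit-learn the same numbers come from `accuracy_score` and `f1_score(y_true, y_pred, average="micro")` or `average="macro"`.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus micro- and macro-averaged F1 for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    accuracy = sum(tp.values()) / len(y_true)

    def f1(tp_c: int, fp_c: int, fn_c: int) -> float:
        # F1 = 2PR / (P + R), rewritten in terms of raw counts.
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro_f1 = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean over classes.
    macro_f1 = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return {"accuracy": accuracy, "micro_f1": micro_f1, "macro_f1": macro_f1}
```

Calling it once per classifier on the same test labels gives directly comparable rows for the report.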
Dataset
The dataset has been provided in the first code chunk of the assignment. You are required to use the original tweet text for this classification task.
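The dataset itself is not shown here, so the loader below is a sketch under assumptions: it supposes a CSV file with one column holding the original tweet text and one holding the sentiment label. The column names `OriginalTweet` and `Sentiment` are placeholders to be changed to whatever the provided file uses. The hypothetical helper `split_train_test` holds out a reproducible test set, which is only needed if the provided data does not already come with one (scikit-learn's `train_test_split` with `stratify=labels` is the library alternative).

```python
import csv
import random
from typing import Iterable, List, Tuple

# Hypothetical column names: change them to match the dataset provided with the assignment.
TEXT_COLUMN = "OriginalTweet"
LABEL_COLUMN = "Sentiment"


def load_tweets(lines: Iterable[str], text_col: str = TEXT_COLUMN,
                label_col: str = LABEL_COLUMN) -> Tuple[List[str], List[str]]:
    """Read (original tweet text, label) pairs from CSV lines, skipping rows with empty text."""
    reader = csv.DictReader(lines)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"CSV is missing expected columns: {sorted(missing)}")
    texts, labels = [], []
    for row in reader:
        text = (row[text_col] or "").strip()
        if text:
            texts.append(text)
            labels.append((row[label_col] or "").strip())
    return texts, labels


def split_train_test(texts: List[str], labels: List[str], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Shuffle with a fixed seed and hold out test_ratio of the tweets as a test set."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # fixed seed -> the same split on every run
    n_test = int(round(len(texts) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([texts[i] for i in train_idx], [texts[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])
```

With a real file, open it with `newline=""` (as the `csv` module documentation recommends, so quoted multi-line tweets parse correctly) and pass the file object, e.g. `with open("tweets.csv", newline="", encoding="utf-8") as f: texts, labels = load_tweets(f)`; the file name and encoding here are assumptions.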
Tweet representation
After the necessary preprocessing of the tweets, convert the words into their embeddings, then take the mean of all the word vectors in a tweet to obtain a single vector representing that tweet. This tweet vector is then used as the input for sentiment classification.
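The assignment leaves "necessary preprocessing" open. One plausible choice, sketched below, is to lower-case the tweet, remove URLs and @mentions, keep the word behind a hashtag, strip punctuation and emoji, and split on whitespace. The function name `preprocess_tweet` and the regular expressions are assumptions; lower-casing only makes sense if the chosen embedding vocabulary is lower-cased, and stop-word removal is deliberately left out.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s']")


def preprocess_tweet(text: str) -> List[str]:
    """Lower-case a tweet, drop URLs and @mentions, keep hashtag words, and split into tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links before mentions, since URLs may contain '@'
    text = MENTION_RE.sub(" ", text)    # remove @user handles
    text = text.replace("#", " ")       # "#covid19" -> "covid19"
    text = NON_WORD_RE.sub(" ", text)   # strip punctuation, emoji, and other symbols
    # Trim stray quote marks around words but keep inner apostrophes ("don't").
    return [tok.strip("'") for tok in text.split() if tok.strip("'")]
```

For example, `preprocess_tweet("Panic buying at @Tesco!! #COVID19 https://t.co/abc")` gives `["panic", "buying", "at", "covid19"]`.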
When looking up the embedding of each word, you can ignore out-of-vocabulary words.
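A sketch of the mean-of-word-vectors step that skips out-of-vocabulary tokens. It assumes `embeddings` supports `word in embeddings` and `embeddings[word]` (a plain dict does, and so does gensim's `KeyedVectors`) and that every vector has length `dim`. The name `tweet_vector` is hypothetical, and returning an all-zero vector when every token is out of vocabulary is an assumption rather than something the assignment specifies; dropping such tweets is another option.

```python
from typing import Any, List, Sequence


def tweet_vector(tokens: Sequence[str], embeddings: Any, dim: int) -> List[float]:
    """Mean of the word vectors of in-vocabulary tokens; out-of-vocabulary tokens are skipped."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        if tok not in embeddings:  # out-of-vocabulary: ignore, as the assignment permits
            continue
        vec = embeddings[tok]
        if len(vec) != dim:
            raise ValueError(f"vector for {tok!r} has length {len(vec)}, expected {dim}")
        for i in range(dim):
            total[i] += float(vec[i])
        count += 1
    if count == 0:
        # Assumption: a tweet with no known words gets an all-zero vector instead of being dropped.
        return total
    return [v / count for v in total]
```

With gensim, the dimension can be read from the model, e.g. `tweet_vector(tokens, kv, kv.vector_size)`.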
Classifier choice
You are required to implement the following TWO classifiers:
One traditional classification model (not a neural-network-based model).
One classifier based on any neural network model.
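For the traditional model, one option that works directly on dense, real-valued tweet vectors is Gaussian naive Bayes, written here from scratch (which the assignment allows). The class name and the `var_smoothing` default are illustrative, loosely mirroring scikit-learn's `GaussianNB`; logistic regression or an SVM would be equally valid choices. Note that it treats the embedding dimensions as independent given the class, which is a strong simplification, so it is best read as a baseline.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class GaussianNaiveBayes:
    """Non-neural baseline: models each class as a diagonal Gaussian over the embedding dimensions."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        self.var_smoothing = var_smoothing
        self.priors: Dict[str, float] = {}
        self.means: Dict[str, List[float]] = {}
        self.variances: Dict[str, List[float]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        if len(X) == 0 or len(X) != len(y):
            raise ValueError("X and y must be non-empty and the same length")
        n, dim = len(X), len(X[0])
        # Small variance floor, relative to the largest overall feature variance, avoids division by zero.
        overall_mean = [sum(x[d] for x in X) / n for d in range(dim)]
        overall_var = [sum((x[d] - overall_mean[d]) ** 2 for x in X) / n for d in range(dim)]
        eps = self.var_smoothing * (max(overall_var) or 1.0)
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        for label, rows in groups.items():
            m = len(rows)
            mean = [sum(r[d] for r in rows) / m for d in range(dim)]
            var = [sum((r[d] - mean[d]) ** 2 for r in rows) / m + eps for d in range(dim)]
            self.priors[label] = m / n
            self.means[label] = mean
            self.variances[label] = var
        return self

    def _log_joint(self, x: Sequence[float], label: str) -> float:
        # log P(class) + sum over dimensions of the Gaussian log-density log N(x_d | mean_d, var_d).
        log_lik = sum(
            -0.5 * math.log(2.0 * math.pi * v) - (xi - mu) ** 2 / (2.0 * v)
            for xi, mu, v in zip(x, self.means[label], self.variances[label])
        )
        return math.log(self.priors[label]) + log_lik

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        # Pick the class with the highest log joint probability for each tweet vector.
        return [max(self.priors, key=lambda c: self._log_joint(x, c)) for x in X]
```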
You can use PyTorch, TensorFlow, or scikit-learn to implement your classifiers. However, you are also free to develop a classifier from scratch.
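For the neural model, the sketch below is a from-scratch network with one hidden layer (tanh units, softmax output, cross-entropy loss, plain per-example SGD). `TinyMLP`, the hidden size, learning rate, and epoch count are hypothetical choices. Pure-Python loops are slow, so on the full dataset the same architecture is more practical with PyTorch `nn.Linear` layers or scikit-learn's `MLPClassifier`; the sketch mainly shows what those libraries compute.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinyMLP:
    """One-hidden-layer neural network (tanh hidden units, softmax output) trained with plain SGD."""

    def __init__(self, n_in: int, n_hidden: int, labels: Sequence[str],
                 lr: float = 0.1, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.labels = list(labels)
        self.lr = lr
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.W1 = [[self.rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[self.rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in self.labels]
        self.b2 = [0.0] * len(self.labels)

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
        total = sum(exps)
        return h, [e / total for e in exps]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str], epochs: int = 200) -> "TinyMLP":
        index = {label: k for k, label in enumerate(self.labels)}
        order = list(range(len(X)))
        for _ in range(epochs):
            self.rng.shuffle(order)
            for n in order:
                x, target = X[n], index[y[n]]
                h, probs = self._forward(x)
                # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target).
                d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
                # Backpropagate to the hidden pre-activations (tanh' = 1 - tanh^2) before touching W2.
                d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.W2[k][j] for k in range(len(d_out)))
                         for j in range(len(h))]
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.W2[k][j] -= self.lr * dk * hj
                    self.b2[k] -= self.lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.W1[j][i] -= self.lr * dj * xi
                    self.b1[j] -= self.lr * dj
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        out = []
        for x in X:
            _, probs = self._forward(x)
            out.append(self.labels[max(range(len(probs)), key=probs.__getitem__)])
        return out
```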
Your answer must include the following:
Code for data loading, data preprocessing, training, and testing of the models.
A discussion comparing the classifiers based on their accuracy and F-scores.
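For the discussion, it may help to write the two averages out. With K sentiment classes, N test tweets, and per-class counts TP_c, FP_c, FN_c (notation introduced here for illustration), and assuming each tweet has exactly one label:

```latex
\begin{aligned}
F_{1,c} &= \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}, \qquad
\text{macro-}F_1 = \frac{1}{K}\sum_{c=1}^{K} F_{1,c}, \\
\text{micro-}F_1 &= \frac{2\sum_c TP_c}{2\sum_c TP_c + \sum_c FP_c + \sum_c FN_c}, \\
\sum_c FP_c &= \sum_c FN_c = N - \sum_c TP_c \quad \text{(each wrong prediction is one FP and one FN)}, \\
\Rightarrow\ \text{micro-}F_1 &= \frac{2\sum_c TP_c}{2\sum_c TP_c + 2\left(N - \sum_c TP_c\right)} = \frac{\sum_c TP_c}{N} = \text{accuracy}.
\end{aligned}
```

So micro-F1 should match accuracy for both classifiers, and the more informative comparison is accuracy versus macro-F1: because macro-F1 weights every class equally, a noticeably lower macro-F1 suggests the classifier does worse on the less frequent sentiment classes.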
