Problem 2 Word embedding as features for classification
Task
Implement a sentiment classifier based on Twitter data to analyse the sentiments of COVID-19 tweets.
Train and test multiple classification models using the necessary libraries, with sentence embeddings of the tweets as features.
Report the accuracy and F1 score (micro- and macro-averaged) for each classifier and discuss the differences.
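To make the three reported numbers concrete, here is a minimal from-scratch sketch of accuracy, micro-averaged F1, and macro-averaged F1 for single-label predictions. The function name classification_scores and the label strings are illustrative assumptions, not part of the assignment. For single-label multi-class data, micro-averaged F1 equals accuracy, so the macro average is the one that shows weak performance on rare classes.

```python
def classification_scores(y_true, y_pred):
    """Accuracy, micro-F1 and macro-F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    accuracy = sum(tp.values()) / len(y_true)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return {"accuracy": accuracy, "micro_f1": micro, "macro_f1": macro}


# Hypothetical labels: the classifier never predicts the rare "neg" class.
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "pos", "pos", "neu"]
scores = classification_scores(y_true, y_pred)
print(scores)
```

In this toy example accuracy and micro-F1 are both 5/6, while macro-F1 drops to 17/27 because the missed "neg" class contributes an F1 of zero.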
Dataset
The dataset has been provided in the first code chunk with the assignment. You are required to use the original tweet text for this classification task.
Tweet representation
After necessary pre-processing of the tweets, convert the words into their embeddings, then take the mean of all the word vectors in a tweet to end up with a single vector representing each tweet. The tweet vector is then used for sentiment classification.
In the process of finding the embeddings for each word, you can ignore out-of-vocabulary words.
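The representation described above can be sketched as follows. This is a minimal illustration under stated assumptions: TOY_EMBEDDINGS is a made-up three-dimensional table standing in for real pretrained word vectors (the assignment does not say which embedding model to use), the cleaning rules in preprocess are one plausible choice, and returning a zero vector for a tweet with no in-vocabulary words is an assumption rather than a requirement.

```python
import re

# Hypothetical toy embedding table; in practice these vectors would come
# from a pretrained embedding model of your choice.
TOY_EMBEDDINGS = {
    "vaccine": [0.2, 0.8, 0.0],
    "good": [0.9, 0.1, 0.3],
    "lockdown": [-0.5, 0.4, 0.1],
    "bad": [-0.8, -0.2, 0.0],
}
DIM = 3


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.findall(r"[a-z]+", text)


def tweet_vector(tweet, embeddings=TOY_EMBEDDINGS, dim=DIM):
    """Mean of the word vectors in a tweet, ignoring out-of-vocabulary words."""
    vectors = [embeddings[tok] for tok in preprocess(tweet) if tok in embeddings]
    if not vectors:
        # Assumption: a tweet with no known words maps to the zero vector.
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(tweet_vector("Vaccine is GOOD! https://t.co/abc @who"))
```

Here "is" is out of vocabulary and is skipped, so the tweet vector is the mean of the vectors for "vaccine" and "good" only.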
Classifier choice
You are required to implement the following TWO classifiers:
One traditional classification model (not based on a neural network)
One classifier based on any neural network model.
You can use PyTorch/TensorFlow/scikit-learn to implement your classifier. However, you are free to develop a classifier from scratch.
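One possible pairing, sketched with scikit-learn, is logistic regression as the traditional model and a small multilayer perceptron as the neural one. Everything about the data here is an assumption: the feature matrix is synthetic (three well-separated Gaussian clusters standing in for 50-dimensional tweet vectors), the label names are hypothetical, and the hyperparameters are untuned starting points rather than recommended values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for tweet vectors: one Gaussian cluster per class.
rng = np.random.default_rng(0)
labels = ["negative", "neutral", "positive"]  # hypothetical label names
X = np.vstack(
    [rng.normal(loc=3.0 * i, scale=1.0, size=(60, 50)) for i in range(3)]
)
y = np.repeat(labels, 60)

# Stratified split keeps the class proportions equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "micro_f1": f1_score(y_test, pred, average="micro"),
        "macro_f1": f1_score(y_test, pred, average="macro"),
    }
    print(name, results[name])
```

With real tweet vectors, X and y would be replaced by the averaged embeddings and the dataset's sentiment labels; the scores on this synthetic data say nothing about how either model performs on the actual task.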
Your answer must include the following:
Code for data loading, data pre-processing, training, and testing of the models.
A discussion on the comparison between the classifiers based on classifier accuracy and F1 score.
