Question: Classifier Evaluation
Classifier Evaluation. For a more realistic evaluation, you would partition the data by date, training on older data and testing on newer data (i.e., the held-out data). This project does not require you to follow that procedure. However, we still want to use cross-validation over the whole dataset to get a more reliable estimate of classifier performance: it gives you both the mean and the standard deviation of a selected metric. The following code snippet shows how to use cross-validation in evaluation.
```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(clf, feature_vectors, targets, cv=5, scoring='f1_macro')
print("F1 (macro): %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

This example uses 5-fold cross-validation (cv=5) with the f1_macro metric, which is the macro-average of the per-class F1 scores (why macro-averaging?).
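For intuition on the macro question, the toy example below (illustrative only, not part of the assignment) compares micro- and macro-averaged F1 on an imbalanced label set; macro-averaging weights every class equally, so poor performance on a rare class is not masked by the dominant class.

```python
from sklearn.metrics import f1_score

# Toy imbalanced example: class 1 is rare and the classifier ignores it.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Micro-averaging is dominated by the majority class (prints 0.80).
print(f1_score(y_true, y_pred, average='micro'))
# Macro-averaging averages per-class F1, so the rare class's F1 of 0
# pulls the score down to ~0.44 (sklearn may warn about the undefined
# precision for the never-predicted class).
print(f1_score(y_true, y_pred, average='macro'))
```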
For each classifier, please report the mean and 2*std of 5-fold f1_macro, precision_macro, and recall_macro, respectively. Hint: good classifiers for this dataset reach an F1 score above 0.5; for SVM, normalizing feature values may help performance; for kNN, try a few different settings of n_neighbors to find the best one. Discuss what you observe and put the related code in "classification.py". A minimal sketch of such a reporting loop is given below.
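The sketch below shows one way the reporting loop could be organized; the particular classifiers, the StandardScaler pipeline for the SVM, and the n_neighbors values tried are all illustrative choices rather than required ones, and feature_vectors/targets are assumed to be the feature matrix and labels built earlier in the project.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# feature_vectors, targets: assumed to be built earlier in classification.py.

# Illustrative set of classifiers; swap in whatever models the project requires.
# with_mean=False keeps the scaler usable on sparse text-feature matrices.
classifiers = {
    "NaiveBayes": MultinomialNB(),
    "SVM (scaled features)": make_pipeline(StandardScaler(with_mean=False), SVC()),
    "kNN (n_neighbors=5)": KNeighborsClassifier(n_neighbors=5),
}

# Report mean and 2*std of each 5-fold macro metric for each classifier.
for name, clf in classifiers.items():
    for metric in ("f1_macro", "precision_macro", "recall_macro"):
        scores = cross_val_score(clf, feature_vectors, targets, cv=5, scoring=metric)
        print("%s %s: %0.2f (+/- %0.2f)" % (name, metric, scores.mean(), scores.std() * 2))

# Simple sweep over n_neighbors for kNN; the candidate values are arbitrary examples.
for k in (1, 3, 5, 10, 20):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             feature_vectors, targets, cv=5, scoring="f1_macro")
    print("kNN k=%d f1_macro: %0.2f (+/- %0.2f)" % (k, scores.mean(), scores.std() * 2))
```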