Question:

Q2. Is it a conference announcement?

The DBWorld e-mails data set contains 64 e-mails collected from the DBWorld mailing list, classified into two classes: "conference announcements" and "everything else". The data has 64 instances and 4702 attributes (note that the number of instances is much smaller than the number of attributes). Each word in the vocabulary of the e-mail collection defines an attribute.

a) Study the description of the data set and plan how to convert the data file into a plain csv file that you can read into an ndarray in sklearn. You may want to consider the loadarff reader in scipy.

b) Learn a MultinomialNB classification model on the dataset. Use 3-fold cross validation to evaluate the performance of the classifier.

c) Read the overview of Ensemble methods and use a Bagging classifier built with the MultinomialNB classifier as the base estimator. The Bagging classifier is quite powerful, as it allows sampling both instances of the labelled data and features (attributes). Experiment with different numbers of base estimators, numbers of samples to draw to train each base estimator, and numbers of features to draw to train each base estimator. Evaluate your choices of hyperparameters using 3-fold cross validation. Use the default values for the remaining BaggingClassifier hyperparameters.

d) Summarize your findings from parts (b) and (c). Which classifier and hyperparameter values performed best?
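One possible way to approach the loading, MultinomialNB, and Bagging parts is sketched below. It is a minimal sketch under several assumptions, not a reference solution: the ARFF file is assumed to be dense, the class label is assumed to be the last attribute and coded 0/1, and the hyperparameter grid values are arbitrary starting points. The helper names (load_arff_xy, evaluate_nb, tune_bagging), the tiny in-memory ARFF, and the synthetic 64-row matrix are illustrative stand-ins, not the real DBWorld data.

```python
import io

import numpy as np
from scipy.io import arff
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB


def load_arff_xy(f):
    """Load a dense ARFF file (path or file object) into (X, y).

    Assumption: the class label is the LAST attribute and is coded 0/1.
    """
    data, meta = arff.loadarff(f)
    names = meta.names()
    # Nominal values come back as bytes such as b'1'; float() accepts them.
    X = np.array([[float(row[n]) for n in names[:-1]] for row in data])
    y = np.array([int(float(row[names[-1]])) for row in data])
    return X, y


def evaluate_nb(X, y, cv):
    """Mean cross-validated accuracy of a plain MultinomialNB."""
    return cross_val_score(MultinomialNB(), X, y, cv=cv).mean()


def tune_bagging(X, y, cv):
    """Grid-search the three Bagging hyperparameters named in the question."""
    grid = {
        "n_estimators": [10, 50],         # number of base estimators
        "max_samples": [0.5, 1.0],        # fraction of instances per estimator
        "max_features": [0.1, 0.5, 1.0],  # fraction of attributes per estimator
    }
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn versions.
    bagging = BaggingClassifier(MultinomialNB(), random_state=0)
    search = GridSearchCV(bagging, grid, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Tiny in-memory ARFF, only to show that the loader works (not DBWorld data).
TINY_ARFF = """@relation demo
@attribute conference {0,1}
@attribute deadline {0,1}
@attribute CLASS {0,1}
@data
1,1,1
0,1,0
1,0,1
"""
X_demo, y_demo = load_arff_xy(io.StringIO(TINY_ARFF))

# Synthetic stand-in with the same number of rows as DBWorld (64),
# where the first 20 "words" are made informative for class 1.
rng = np.random.default_rng(0)
y_syn = np.repeat([0, 1], 32)
X_syn = rng.integers(0, 2, size=(64, 300))
X_syn[y_syn == 1, :20] = 1

cv3 = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
nb_score = evaluate_nb(X_syn, y_syn, cv3)
best_params, best_score = tune_bagging(X_syn, y_syn, cv3)
print("MultinomialNB 3-fold accuracy:", nb_score)
print("Best Bagging params:", best_params, "accuracy:", best_score)
```

With the real data, the arrays returned by load_arff_xy could be written to a plain csv file with np.savetxt("dbworld.csv", np.column_stack([X, y]), delimiter=",", fmt="%d") and read back with np.loadtxt. Because there are only 64 instances, the 3-fold scores will vary noticeably with the fold split, so it may be worth repeating the evaluation with several random_state values before drawing conclusions for part (d).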
