
Preface
For the final exam/project we will develop classification models using different approaches and compare their performance metrics. We will use a
cleaned-up and reduced subset of a real-world dataset used only a few years ago in a machine learning competition utilizing the "common task
framework" (CTF). The data for the final exam can be downloaded as a zip archive from Canvas, and they consist of two separate files:
training data final-data-train.csv and test (validation) data final-data-test.csv.
Only the "training" dataset has the outcome variable ( noyes column) available, and it is for fitting your models including all the cross-
validations, test error estimates, hyper-parameter tuning - as requested and/or as you may see fit. In that sense, those "training" data are
the full and only dataset available to you to come up with your models.
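For illustration only (this is not part of the official instructions), a minimal R sketch of loading the training data and estimating test error by k-fold cross-validation might look like the following. Only the file name final-data-train.csv and the outcome column noyes come from the assignment; the logistic-regression model and everything else are placeholder assumptions to be replaced by whatever approaches you actually use.

```r
# Sketch: load training data and estimate misclassification error by 5-fold CV
# (logistic regression is just a placeholder model; swap in your own approaches)
dtrain <- read.csv("final-data-train.csv", stringsAsFactors = TRUE)
dtrain$noyes <- factor(dtrain$noyes)   # outcome column named in the preface

set.seed(1234)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(dtrain)))
cv_err <- numeric(k)
for (i in 1:k) {
  # drop any identifier column from the predictors if one is present in the file
  fit  <- glm(noyes ~ ., data = dtrain[folds != i, ], family = binomial)
  prob <- predict(fit, newdata = dtrain[folds == i, ], type = "response")
  pred <- ifelse(prob > 0.5, levels(dtrain$noyes)[2], levels(dtrain$noyes)[1])
  cv_err[i] <- mean(pred != dtrain$noyes[folds == i])
}
mean(cv_err)   # cross-validated misclassification error estimate
```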
The "test" dataset has no outcomes provided to you. We are the only ones who have those. After you come up with your best model(s)
using the "training" data provided to you, you have to use those models to make predictions for this mystery "test" set and submit the
results. We will evaluate the accuracy of your predictions against the outcomes and let you know what accuracy scores you achieved and
also how you stand against your peers, just to make things a bit more fun. This is how data science competitions are often run (on
platforms like Kaggle, in particular).
Thus, in addition to the Rmarkdown and HTML files showing all your work, with this final exam/project you will also be asked to upload your
predictions on the test set to Canvas, as separate CSV file(s).
IMPORTANT: The submission file with predictions for the test dataset:
must have two columns and one row per observation in the test dataset, with comma-separated values.
The first column must contain the observation identifier (the value from the id column in the test dataset).
The second column must contain the predictions of your model for the observations in the test dataset (corresponding to the values of
observation 'id' in the first column, of course).
The predictions must be specified as "Y"/"N" labels.
Finally, the CSV file must contain a header line; please label the first column 'id'. The second column (predictions) can be labeled anything
you like - the name of this column will become the name of your model as it will appear on the leaderboard (you don't have to put your
name there, you can stay anonymous to your peers and just call your model 'myWinningModel3'). A minimal R sketch of writing such a file
follows this list.
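As one possible sketch of producing a submission file in this format in R, building on the placeholder model above: here final-data-test.csv and the id column come from the assignment, while the model, the probability cutoff, the mapping to "Y"/"N" labels, the column name myWinningModel3, and the output file name are all illustrative assumptions.

```r
# Sketch: make predictions for the mystery test set and write the submission CSV
dtest <- read.csv("final-data-test.csv", stringsAsFactors = TRUE)

# Placeholder final model fitted on the full training data (dtrain from above);
# the second factor level of noyes is assumed here to be the "yes"/"Y" class
final_fit   <- glm(noyes ~ ., data = dtrain, family = binomial)
pred_labels <- ifelse(predict(final_fit, newdata = dtest, type = "response") > 0.5,
                      "Y", "N")

# One row per test observation: 'id' first, predictions second; the second
# column's name ('myWinningModel3' here) becomes the model name on the leaderboard
submission <- data.frame(id = dtest$id, myWinningModel3 = pred_labels)
write.csv(submission, "predictions-myWinningModel3.csv",
          row.names = FALSE, quote = FALSE)
```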
Submissions that do not follow that exact format will be rejected without generating error messages or automatic follow-up emails
(we will try to send those as a courtesy if we do notice problems with the submission format, but we can't guarantee that).
For your reference, you can examine a few examples of prediction submission files following this format, which are provided to you in the same
zip archive you downloaded from Canvas (the predictions-*.csv files in the predictions-examples sub-folder).
Just to reiterate and emphasize, please note that this time your final submission must consist of the following three items (not just the two,
Rmd+HTML, as usual):
Rmarkdown *.Rmd file with all the calculations you want to receive credit for,
HTML version of the output generated by your *.Rmd file, and
final predictions for the test dataset in comma-separated values (CSV) format from the best model you could come up with (the file name
must have a *.csv extension for the file to load in Canvas).