Question: Preface For the final exam / project we will develop classification models using different approaches and compare their performance metrics. We will use a cleaned

Preface

For the final exam

/

project we will develop classification models using different approaches and compare their performance metrics. We will use a

cleaned

-

up and reduced subset of a real world dataset used only a few years ago in a machine learning competition utilizing the "common task

framework"

(

CTF

) .

The data for the final exam are can be downloaded as a zip archive from Canvas, and they consist of two separate files:

training data final

-

data

-

train.csv and test

(

validation

)

data final

-

data

-

test.csv

.

Only the "training" dataset has the outcome variable

(

noyes column

)

available, and it is for fitting your models including all the cross

-

validations, test error estimates, hyper

-

parameter tuning

-

as requested and

/

or as you may see fit. In that sense, those "training" data are

the full and only dataset available to you to come up with your models.

The "test" dataset has no outcomes provided to you. We are the only ones who have those. After you come up with your best model

(

)

using the "training" data provided to you, you have to use those models to make predictions for this mystery "test" set and submit the

results. We will evaluate the accuracy of your predictions against the outcomes and let you know what accuracy scores you achieved and

also how you stand against your peers, just to make things a bit more fun. This is how data science competitions are often run

(

platforms like Kaggle, in particular

) .

Thus, in addition to the Rmarkdown and HTML files showing all your work, with this final exam

/

project you will be also asked to upload your

predictions on the test set into Canvas, as separate CSV file

(

) .

IMPORTANT: The submission file with predictions for the test dataset:

must have two columns, one row per each observation in the test dataset, comma

-

separated values.

The first column must contain the observation identifier

(

the value from column id in test dataset

) .

The second column must contain predictions of your model for the observations in the test dataset

(

corresponding to the values of

observation

'

'

in the first column, or course

) .

The predictions must be specified as

"

" / "

"

labels.

Finally, the CSV file must contain a header line, please label the first column

'

',

and the second column

(

predictions

)

label can be anything

you like

-

the name of this column will become the name of your model as it will appear in the leaderboard

(

you don't have to put your

name there, you can stay anonymous to your peers and just call your model 'myWinningModel

3') .

The submissions that do not follow that exact format will be rejected without generating error messages or sending automatic followup emails

(

we will try to send those as a courtesy if we do notice some problems with the submission format, but we can't guarantee that

) .

For your reference, you can examine a few examples of prediction submission files following this format, which are provided to you in the same

zip archive you downloaded from Canvas

(

predictions

- * .

csv files in the predictions

-

examples sub

-

folder

) .

Just to re

-

iterate and to emphasize, please notice that this time your final submission must consist of the following three

(

not just two, Rmd

+

html

,

as usual

)

items:

Rmarkdown

* .

Rmd file with all the calculations you want to receive credit for,

HTML version of the output generated by your

* .

Rmd file, and

Final predictions for the test dataset in comma

-

separated values

(

CSV

)

format from the best model you could come up with

(

file name

must have

?^{* *} . c s v

extension for the file to load in Canvas

) .

Preface For the final exam / project we will

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Finance Questions!

* * PYTHON PLEASE * * Objective: Build a classification model to predict the quality of wines based on various features using both Decision Trees and Support Vector Machines. Dataset: You can use the...

Please read the questions and article and then answer about 2 or 3 pages answer. Please using paraphrasing, do not copy the article.Thank you. \fFinancial Crisis Advisory Group July 28, 2009 To the...

INTERNATIONAL REVIEW OF L AW C OMPUTERS & TECHNOLOGY , VOLUME 11, N UMBER 2, P AGES 251-261, 1997 The Data Mart: A New Approach to Data Warehousing PAM ELA PIPE Introduction Vendors have recently...

Pivotal Role of Customer Relationship Management The term CRM means very different things in different circumstances, a consequence of the rapid evolution and development of this approach to managing...

Amazon Marketplace Case Study: Deliverable for Submission. Write Step 4, The Problem and Step 5, The Solution of the Press Release for the proposed Seller Fulfilled Prime (SFP) initiative. In June...

MGT 3090 Prof. Si Ahn Mehng Conflict Styles Inventory Instructions: The Conflict Styles Inventory consists of eight common community conflict situations. After reading each situation, you will need...

MMH074.qxd 4/8/09 5:09 PM Page 238 Adam Darkins is the Chief Consultant for Care Coordination at the US Department of Veterans Affairs. He has a track record of developing the clinical, technology...

Rev.Confirming Pages C H A P T E R 7 Planning, Composing, and Revising Chapter Outline The Ways Good Writers Write Activities in the Composing Process Using Your Time Effectively Brainstorming,...

Budgeting for Nonprofit Organizations Although budgeting is just as important for nonprofit organizations as for for-profit companies, the approach taken toward budgeting can be very different. In...

As in most areas of life and work, advances in information technology have focused attention on the importance of good information systems to support logistics and distribution activities. The...

Use a 3-cube Q3 to represent each of the Boolean functions in Exercise 5 by displaying a black circle at each vertex that corresponds to a 3-tuple where this function has the value 1.

Write the converse, inverse, and contrapositive of each of the following implications. For each implication, determine its truth value as well as the truth values of its corresponding converse,...

Assume that actual overhead costs were $ 8 7 , 0 0 0 and cverhead allocated to jobs was $ 6 3 , 0 0 0 . The unadjusted balance in Manufacturing Overhead would be A . $ 2 4 , 0 0 0 debit B . $ 1 5 0 ,...

K 18. DEE Cleaning Company cleaned 33 offices and incurred operating costs of $1,782. What was the cost to clean each office? Select the formula labels, then enter the amounts and compute the cost of...