Question: Answer the following using Python code.

In this assignment, you are given the following 3 datasets. Each dataset has a training and a test file. Specifically, these files are:

dataset 1: train-100-10.csv, test-100-10.csv
dataset 2: train-100-100.csv, test-100-100.csv
dataset 3: train-1000-100.csv, test-1000-100.csv

Start the experiment by creating 3 additional training files from train-1000-100.csv by taking the first 50, 100, and 150 instances respectively. Call them: train-50(1000)-100.csv, train-100(1000)-100.csv, train-150(1000)-100.csv. The corresponding test file for these datasets is test-1000-100.csv, and no modification is needed.
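The file-creation step above can be sketched as follows. This is a minimal sketch, not the assignment's reference solution: the helper names `take_first_rows` and `make_subsets`, the `out_dir` parameter, and the assumption that each CSV starts with one header row are all illustrative. Pass `has_header=False` if the files have no header.

```python
import csv
import os

def take_first_rows(src_path, dst_path, n, has_header=True):
    """Copy the first n data rows of src_path to dst_path.

    Assumption: the file begins with a single header row, which is
    copied unchanged and not counted among the n instances.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        for i, row in enumerate(reader):
            if i >= n:
                break
            writer.writerow(row)

def make_subsets(src_path="train-1000-100.csv", sizes=(50, 100, 150), out_dir="."):
    """Write train-50(1000)-100.csv etc. into out_dir and return their paths."""
    paths = []
    for n in sizes:
        dst = os.path.join(out_dir, f"train-{n}(1000)-100.csv")
        take_first_rows(src_path, dst, n)
        paths.append(dst)
    return paths
```

Calling `make_subsets()` in the directory that holds train-1000-100.csv would produce the three additional training files.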

1. Implement the L2-regularized linear regression algorithm with λ ranging from 0 to 150 (integers only). For each of the 6 datasets, plot both the training set MSE and the test set MSE as a function of λ (x-axis) in one graph.

(a) For each dataset, which λ value gives the least test set MSE?

(b) For each of datasets 100-100, 50(1000)-100, 100(1000)-100, provide an additional graph with λ ranging from 1 to 150.

three datasets in (b).
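The L2-regularized fit and the MSE sweep asked for above can be sketched with the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy, where λ is the regularization strength. This is a sketch under stated assumptions: the function names are illustrative, X is assumed to already contain any bias column (so the bias is regularized along with the other weights), and the caller is assumed to have loaded each CSV into a feature matrix and a target vector.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: solve (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    # Note: at lam = 0, A can be singular or badly conditioned when the
    # number of rows is not larger than the number of features.
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))

def sweep(X_tr, y_tr, X_te, y_te, lams=range(0, 151)):
    """Return (train_mse, test_mse) arrays, one entry per value in lams."""
    train_mse, test_mse = [], []
    for lam in lams:
        w = ridge_fit(X_tr, y_tr, lam)
        train_mse.append(mse(X_tr, y_tr, w))
        test_mse.append(mse(X_te, y_te, w))
    return np.array(train_mse), np.array(test_mse)
```

Plotting both returned arrays against the λ values gives the requested graph, and `np.argmin` on the test array gives the λ with the least test set MSE. For the 1-to-150 plots, pass `lams=range(1, 151)`.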

2. From the plots in question 1, we can tell which value of λ is best for each dataset once we know the test data and its labels. This is not realistic in real-world applications. In this part, we use cross validation (CV) to set the value of λ. Implement the 10-fold CV technique discussed in class (pseudo code given in Appendix A) to select the best λ value from the training set.

(a) Using the CV technique, what is the best choice of λ value and the corresponding test set MSE for each of the six datasets?

(b) How do the values of λ and MSE obtained from CV compare to the choice of λ and MSE in question 1(a)?

(d) What are the factors affecting the performance of CV?
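The CV-based selection described above can be sketched as a generic k-fold procedure. Appendix A is not reproduced here, so this sketch may differ in detail from the pseudo code given in class: it assumes contiguous, unshuffled folds, an average of per-fold MSEs as the selection criterion, and illustrative function names.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))

def cv_select_lambda(X, y, lams=range(0, 151), k=10):
    """Pick the lambda with the lowest average validation MSE over k folds."""
    lams = list(lams)
    n = X.shape[0]
    # Assumption: folds are contiguous index blocks, with no shuffling.
    folds = np.array_split(np.arange(n), k)
    avg_err = []
    for lam in lams:
        errs = []
        for fold in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[fold] = False          # hold this fold out
            w = ridge_fit(X[train_mask], y[train_mask], lam)
            errs.append(mse(X[fold], y[fold], w))
        avg_err.append(np.mean(errs))
    avg_err = np.array(avg_err)
    return lams[int(np.argmin(avg_err))], avg_err
```

After selection, one would typically refit on the full training set with the chosen λ and report `mse(X_test, y_test, w)` as the corresponding test set MSE.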

3. Fix λ = 1, 25, 150. For each of these values, plot a learning curve for the algorithm using the dataset 1000-100.csv.

Note: a learning curve plots the performance (i.e., test set MSE) as a function of the size of the training set. To produce the curve, you need to draw random subsets (of increasing sizes) and record performance (MSE) on the corresponding test set when training on these subsets. In order to get smooth curves, you should repeat the process at least 10 times and average the results.
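The procedure in the note can be sketched as below. The function names, the choice of subset sizes, and the fixed random seed are illustrative assumptions. Subsets are drawn without replacement from the training set, and the test set MSE is averaged over the repeats.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))

def learning_curve(X_tr, y_tr, X_te, y_te, lam, sizes, repeats=10, seed=0):
    """Average test MSE for each training-subset size in sizes."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[0]
    curve = []
    for m in sizes:
        errs = []
        for _ in range(repeats):
            # Random subset of m training rows, drawn without replacement.
            idx = rng.choice(n, size=m, replace=False)
            w = ridge_fit(X_tr[idx], y_tr[idx], lam)
            errs.append(mse(X_te, y_te, w))
        curve.append(np.mean(errs))
    return np.array(curve)
```

Calling this once per value of λ (1, 25, 150) and plotting each returned array against `sizes` gives the three learning curves.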
