A data set containing wages and other information for a group of 3000 male workers in the Mid-Atlantic region is provided in the input file wages.csv.
Perform the following operations on this data using Python:
1. Load the data set from the input file wages.csv.
2. Generate polynomial models of the age column up to degree 4 (i.e., create 4 polynomial models of degrees 1 to 4).
Hint: Use PolynomialFeatures().fit_transform to create these models.
3. Perform linear regression as follows:
- Perform linear regression using each of these four models
- Fit each model on the wage column of the data set (i.e., wage is the target)
- Use cross-validation with cv=5 to compute the fitting scores of each model
- Note: There will be 5 scores (since cv=5) for each model
- Compute the mean score of each model
- Print the 4 mean scores to a file named output.csv
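Steps 1 to 3 can be sketched as follows. This is a minimal sketch, assuming pandas and scikit-learn are installed and that wages.csv has columns named `age` and `wage`, as the task implies; the helper names `load_wages` and `polynomial_cv_scores` are hypothetical. With `cv=5`, `cross_val_score` returns 5 scores per model; for a `LinearRegression` estimator these are R² values computed on un-shuffled `KFold` splits, so a mean score can be negative if a model predicts worse than a constant.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import PolynomialFeatures


def load_wages(path="res/wages.csv"):
    """Load the wages data set (assumes it has 'age' and 'wage' columns)."""
    return pd.read_csv(path)


def polynomial_cv_scores(df, max_degree=4, cv=5):
    """Mean cross-validated score of wage ~ poly(age, d) for d = 1..max_degree."""
    age = df[["age"]].to_numpy()   # 2-D array of shape (n_samples, 1)
    wage = df["wage"].to_numpy()   # 1-D target vector
    mean_scores = []
    for degree in range(1, max_degree + 1):
        # Columns: 1, age, age^2, ..., age^degree (include_bias=True by default)
        X_poly = PolynomialFeatures(degree=degree).fit_transform(age)
        # Integer cv on a regressor -> KFold without shuffling;
        # the score is LinearRegression's default, R^2
        scores = cross_val_score(LinearRegression(), X_poly, wage, cv=cv)
        mean_scores.append(float(scores.mean()))  # average of the 5 fold scores
    return mean_scores
```

Calling `polynomial_cv_scores(load_wages())` returns the four mean scores ordered by degree. Expanding the whole age column before cross-validation does not leak information between folds, because `PolynomialFeatures` is a fixed element-wise transform that learns no statistics from the data.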
Input Format:
Read the data from a file named wages.csv present at the location res/wages.csv
Output Format:
- You have to write a file named output.csv at the location output/output.csv
- This file should contain the mean scores of fitting the 4 models on 4 separate rows
- The mean scores need to be rounded to 4 decimal places and then printed, e.g. 0.2345
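One way to produce this file is sketched below, assuming the file has no header row and holds exactly one score per line; the helper `write_mean_scores` is hypothetical and not part of the task.

```python
import os


def write_mean_scores(mean_scores, path="output/output.csv"):
    """Write one score per row, formatted to exactly 4 decimal places."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create output/ if it is missing
    with open(path, "w") as f:
        for score in mean_scores:
            # "{:.4f}" rounds and pads: 0.2 is written as 0.2000,
            # whereas str(round(0.2, 4)) would give 0.2
            f.write(f"{score:.4f}\n")
```

Formatting with `:.4f` instead of calling `round()` alone guarantees four digits after the decimal point on every row, which matches the 0.2345 pattern even when trailing digits are zero.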
