Question 2. (8 points): (Open response) The following questions are based on the Assignment8-Q2-IBM-Data-Clean.xlsx data file. Please submit two files for all of the following questions:

1. The .ipynb file, clearly showing all the code you wrote and the results for each cell. For the data set you created in Python, use head() to show the first few lines of the data. Clearly mark which question each cell is written for.
2. A Word file with answers to the questions. Please clearly mark the question numbers for your answers. You can also write all the answers in the .ipynb file without submitting the Word file.

Question 2.1. (2 points): Use functions provided by pandas to process the Assignment8-Q2-IBM-Data-Clean.xlsx data file to get a subset of the data ready for numeric prediction. From all the attributes available in the file, pick one numeric attribute to be the dependent variable (y), and pick 5 (or more if you prefer) other reasonable numeric attributes as the independent variables (X). The final data after processing should be stored in a variable whose type is DataFrame.

Tip: Use the following way to select a subset of data: data1=data[["Attribute1","Attribute2","Attribute3"]]

Question 2.2. (0.5 points): Instead of using functions from pandas, use file reading/writing and while/for loops to process each line in the data file to perform the same task described in Question 2.1 (i.e. pick y and X, and store them as a DataFrame). You should first save Assignment8-Q2-IBM-Data-Clean.xlsx as a CSV (comma delimited) file before using open() to open it.

Tip: You can open one file for reading and one new CSV file for writing. For each line, split it using the comma as a delimiter to separate the attributes into a list. Select the desired attributes from the list using indexes. Use string concatenation to connect the desired attributes to construct the new line (don't forget to add '\n' at the end of the line), then write the new line into the new file. After the new file is closed, you can use read_csv() to convert it to a DataFrame.

Question 2.3. (1.5 points): Use the DataFrame you created in either Question 2.1 or Question 2.2 to create the correlation matrix. If you cannot finish Question 2.1 and Question 2.2, you can take the subset (y and X) of the data in Excel, read it using read_excel() from pandas, and use that data for this task and the following tasks. Briefly discuss what you can learn from the correlation matrix.

Question 2.4. (1 point): Split the data into Training (60%) and Testing (40%).

Question 2.5. (3 points): Build Lasso, Regression Tree and Random Forest models for the data. Report RMSE and RRSE for these three models. In addition, use the results from the correlation matrix, the coefficients from the Lasso linear regression model, and the feature importance generated by the Regression Tree model to discuss what you can learn from these results. For example, discuss which X variable has the strongest or steepest linear relationship with the y variable, etc.
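
The tip in Question 2.2 (split each line on commas, pick fields by index, concatenate them, write the new line) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name select_columns, the file names, and the sample column names (Age, Department, MonthlyIncome, YearsAtCompany) are hypothetical stand-ins, not taken from the assignment's data file.

```python
import os
import tempfile


def select_columns(in_path, out_path, wanted):
    """Copy only the `wanted` columns from in_path to out_path, line by line."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        # The first line is assumed to be the header row with attribute names.
        header = src.readline().rstrip("\n").split(",")
        # Look up the position of each wanted attribute in the header.
        indexes = [header.index(name) for name in wanted]
        dst.write(",".join(wanted) + "\n")
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split(",")
            # Build the new line by string concatenation, as the tip suggests.
            new_line = ""
            for pos, i in enumerate(indexes):
                if pos > 0:
                    new_line += ","
                new_line += fields[i]
            dst.write(new_line + "\n")


# Hypothetical sample rows standing in for the CSV exported from Excel.
sample = (
    "Age,Department,MonthlyIncome,YearsAtCompany\n"
    "41,Sales,5993,6\n"
    "49,Research,5130,10\n"
)
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "ibm_sample.csv")
out_path = os.path.join(workdir, "ibm_subset.csv")
with open(in_path, "w") as f:
    f.write(sample)

# The first name plays the role of y, the rest the role of X.
select_columns(in_path, out_path, ["MonthlyIncome", "Age", "YearsAtCompany"])
with open(out_path) as f:
    result = f.read()
print(result)
```

Once the new file is closed, pandas.read_csv(out_path) would load it as a DataFrame. Note that a plain split on commas assumes no field contains a quoted comma, and a CSV saved from Excel may start with a byte-order mark, in which case opening the input with encoding="utf-8-sig" may be needed.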
[Spreadsheet screenshot: columns A through H, rows 1 through 20; visible column header: EnvironmentSatisfaction]
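
For the two error measures requested in Question 2.5, a minimal sketch is given here. It assumes the common definitions: RMSE is the square root of the mean squared error, and RRSE is the square root of the model's squared error divided by the squared error of always predicting the mean of the actual values. The course may define RRSE against a different baseline mean (for example, the training mean), so treat this as an assumption. The function names and the toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))


def rrse(actual, predicted):
    """Root relative squared error, relative to predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(model_error / baseline_error)


# Toy values for illustration only, not results from the assignment data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

rmse_value = rmse(y_true, y_pred)
rrse_value = rrse(y_true, y_pred)
print(round(rmse_value, 4), round(rrse_value, 4))
```

An RRSE below 1 means the model does better than the mean-only baseline; RMSE is in the units of y, so it is comparable across the three models only when they predict the same y on the same test set.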