Question 2. (8 points) (Open response): The following questions are based on the Assignment 8-2-IBM-Data-Clean.xlsx data file. Please submit two files for all the following questions.

1. The .ipynb file clearly showing all the code you wrote and the results for each cell. For the data set you created in Python, use head() to show the first few lines in the data. Clearly mark which question each cell is written for.

2. A Word file with answers to the questions. Please clearly mark the question numbers for your answers. You can also write all the answers in the .ipynb file without submitting the Word file.

Question 2.1. (2 points): Use functions provided by pandas to process the Assignment 8-2-IBM-Data-Clean.xlsx data file to get a subset of the data ready for numeric prediction. From all the attributes available in the file, pick one numeric attribute to be the dependent variable (y), and pick 5 (or more if you prefer) other reasonable numeric attributes as the independent variables (X). The final data after processing should be stored in a variable whose type is DataFrame.

Tip: Use the following way to select a subset of data.

    data1 = data[["Attribute1", "Attribute2", "Attribute3"]]
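A minimal sketch of the selection step in Question 2.1 might look like the following. The column names (MonthlyIncome, Age, and so on) and the tiny made-up table are assumptions for illustration only; the helper name select_numeric_subset is hypothetical, and the real columns should be taken from the spreadsheet itself.

```python
import pandas as pd


def select_numeric_subset(data, y_col, x_cols):
    """Return a new DataFrame holding y_col followed by the x_cols."""
    columns = [y_col] + list(x_cols)
    # Guard against accidentally picking a text attribute.
    non_numeric = [c for c in columns if not pd.api.types.is_numeric_dtype(data[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns selected: {non_numeric}")
    # Indexing with a list of names returns a DataFrame; copy() detaches it.
    return data[columns].copy()


# Made-up stand-in for the spreadsheet. With the real file one would start from
# data = pd.read_excel("Assignment 8-2-IBM-Data-Clean.xlsx")
data = pd.DataFrame({
    "MonthlyIncome": [5993, 5130, 2090, 2909],
    "Age": [41, 49, 37, 33],
    "TotalWorkingYears": [8, 10, 7, 8],
    "YearsAtCompany": [6, 10, 0, 8],
    "JobLevel": [2, 2, 1, 1],
    "EnvironmentSatisfaction": [2, 3, 4, 4],
    "Department": ["Sales", "Research", "Research", "Research"],
})

y_col = "MonthlyIncome"  # assumed dependent variable (y)
x_cols = ["Age", "TotalWorkingYears", "YearsAtCompany",
          "JobLevel", "EnvironmentSatisfaction"]  # assumed independent variables (X)

data1 = select_numeric_subset(data, y_col, x_cols)
print(data1.head())
```

Putting y first is only a convention; it makes the first row and column of a later correlation matrix the ones that relate each X to y.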

Question 2.2. (0.5 points): Instead of using functions from pandas, use file reading/writing and while/for loops to process each line in the data file to perform the same task described in Question 2.1 (i.e., pick y & X, and store as a DataFrame). You should first save Assignment 8-2-IBM-Data-Clean.xlsx as a CSV (comma delimited) file before using open() to open it.

Tip: You can open one file for reading and one new CSV file for writing. For each line, split it using the comma as a delimiter to separate the attributes into a list. Select the desired attributes from the list using indexes. Use string concatenation to connect the desired attributes to construct the new line (don't forget to add '\n' at the end of the line), then write the new line into the new file. After the new file is closed, you can use read_csv() to convert it to a DataFrame.
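The tip for Question 2.2 can be sketched roughly as below. To stay self-contained, the example writes a tiny made-up CSV into a temporary folder instead of reading the real export; the file names, the column names, and the helper copy_selected_columns are all assumptions. It also assumes no field contains a comma inside quotes, since a plain split(",") would break on such a line.

```python
import os
import tempfile

import pandas as pd


def copy_selected_columns(src_path, dst_path, wanted):
    """Copy only the columns named in `wanted` from src_path to dst_path."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        indexes = None
        for line in src:
            fields = line.rstrip("\n").split(",")
            if indexes is None:
                # First line is the header: look up each wanted column's position.
                indexes = [fields.index(name) for name in wanted]
            # Build the new line by string concatenation, as the tip suggests.
            new_line = ""
            for position, index in enumerate(indexes):
                if position > 0:
                    new_line += ","
                new_line += fields[index]
            dst.write(new_line + "\n")


# Made-up stand-in for the CSV saved from Excel.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "ibm_sample.csv")
dst_path = os.path.join(workdir, "ibm_subset.csv")
with open(src_path, "w") as f:
    f.write("Age,Department,MonthlyIncome,YearsAtCompany\n")
    f.write("41,Sales,5993,6\n")
    f.write("49,Research,5130,10\n")

wanted = ["MonthlyIncome", "Age", "YearsAtCompany"]  # y first, then X
copy_selected_columns(src_path, dst_path, wanted)

data2 = pd.read_csv(dst_path)
print(data2.head())
```

Looking up positions from the header line avoids hard-coding index numbers. If the assignment expects literal indexes, the list comprehension can be replaced by a fixed list such as [2, 0, 3].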

Question 2.3. (1.5 points): Use the DataFrame you created in either Question 2.1 or Question 2.2 to create the correlation matrix. If you fail to successfully finish Question 2.1 and Question 2.2, you can just take the subset (y & X) of the data in Excel, read it using read_excel() from pandas, and use that data for this task and the following tasks. Briefly discuss what you can learn from the correlation matrix.
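One possible way to build and read the correlation matrix is sketched here. The data are synthetic (generated with NumPy so the example runs without the spreadsheet), and the column names are assumptions; only DataFrame.corr() itself is the step the question asks for.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: income depends on working years, satisfaction is unrelated.
rng = np.random.default_rng(0)
n = 200
years = rng.integers(0, 30, size=n)
df = pd.DataFrame({
    "MonthlyIncome": 2000 + 400 * years + rng.normal(0, 1500, size=n),
    "TotalWorkingYears": years,
    "EnvironmentSatisfaction": rng.integers(1, 5, size=n),
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))

# Correlation of each X with y, ordered by absolute strength.
y_col = "MonthlyIncome"
with_y = corr[y_col].drop(y_col).sort_values(key=lambda s: s.abs(), ascending=False)
print(with_y.round(2))
```

When discussing the matrix, two things are usually worth reading off: which X variables correlate most strongly with y, and which X variables correlate strongly with each other, since the latter can make individual linear coefficients harder to interpret.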

Question 2.4. (1 point): Split the data into Training (60%) and Testing (40%).
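A 60/40 split could be done with scikit-learn's train_test_split, as in this sketch. The question does not name a library, so using scikit-learn is an assumption, as are the placeholder column names and the random_state value, which is fixed only to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the y & X DataFrame from the earlier questions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["y", "x1", "x2"])

X = df.drop(columns="y")  # independent variables
y = df["y"]               # dependent variable

# test_size=0.4 keeps 60% of the rows for training and 40% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
print(X_train.shape, X_test.shape)
```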

Question 2.5. (3 points): Build Lasso, Regression Tree, and Random Forest models for the data. Report RMSE and RRSE for these three models. In addition, use results from the correlation matrix, the coefficients from the Lasso linear regression model, and the feature importance generated by the Regression Tree model to discuss what you can learn from these results. For example, discuss which X variable has the strongest or steepest linear relationship with the y variable, etc.
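The modelling step could be sketched as follows, assuming scikit-learn and synthetic data in place of the spreadsheet. RMSE is computed as the square root of the mean squared error. RRSE is taken here as the square root of the model's squared error divided by the squared error of always predicting the mean of the test targets; some tools use the training mean instead, so that choice is an assumption. The hyperparameters (alpha, max_depth, n_estimators) are arbitrary illustrative values, not ones given in the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rrse(y_true, y_pred):
    """Root relative squared error against the mean-of-y_true baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    model_error = np.sum((y_true - y_pred) ** 2)
    baseline_error = np.sum((y_true - y_true.mean()) ** 2)
    return float(np.sqrt(model_error / baseline_error))


# Synthetic data: y depends strongly on x1, weakly on x2, not at all on x3.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
y = 3.0 * X["x1"] - 1.0 * X["x2"] + rng.normal(scale=0.5, size=n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

models = {
    "Lasso": Lasso(alpha=0.1),
    "Regression Tree": DecisionTreeRegressor(max_depth=4, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (rmse(y_test, pred), rrse(y_test, pred))
    print(f"{name}: RMSE={results[name][0]:.3f}, RRSE={results[name][1]:.3f}")

# Inputs for the discussion: Lasso slopes and tree feature importances.
lasso_coefs = pd.Series(models["Lasso"].coef_, index=X.columns)
tree_importance = pd.Series(
    models["Regression Tree"].feature_importances_, index=X.columns
)
print(lasso_coefs.round(3))
print(tree_importance.round(3))
```

An RRSE below 1 means the model beats the mean baseline. Lasso coefficients are slopes in the units of each X, so comparing them across attributes is only fair when the attributes are on similar scales; standardizing X first is one common way to handle that.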

(Data file preview: the EnvironmentSatisfaction column, rows 1 to 20; the values were not captured.)
