Question: Jupyter DATA 3 5 5 0 _ Homework _ 1 _ U Last Checkpoint: Last Tuesday at 1 1 : 2 8 AM ( autosaved

Jupyter DATA

3550_

Homework

_1_

U Last Checkpoint: Last Tuesday at

11

28

(

autosaved

)

Logout

Trusted

|

Python

3 (

ipykernel

) 0

Import required packages

[]

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

import missingno as msno

%

matplotlib inline

import warnings

warnings.filterwarnings

('

ignore

')

2 .

Import the datasets

)

Import the dataset Hitters.csv and assign it to df

_

)

Print the info of the pandas DataFrame df

_

Hi and explain your observations.

3 .

Data Visualization

)

Create a visualization to observe the missing values and their patterns in the df

_

Hi and explain your observations.

)

Create a countplot of the all three categorical variables in the data and explain your observations.

)

Create a distribution plot of all numerical variables. Breifly explain what you observe from those plots.

)

Create the comparative boxplot of all the numerical variables. Explain your observation.

4 .

Data Exploration

)

Calculate the statistical summary of the data df

_

.

Explain your observations.Import the datasets

)

Import the dataset Hitters.csv and assign it to df

_

)

Print the info of the pandas DataFrame df

_

Hi and explain your observations.

)

Calculate the statistical summary of the data df

_

.

Explain your observations.

)

Observe the unique number of values, most repeated values, and least repeated values of the variables.

)

Check if there is any outliers in the dataset.

Imputing Missing Values

Check if there is any missing values in the Salary column of DataFrame df

_

.

What did you notice?

How do we impute the missing values for df

_

?

Why did you use that, write your reasoning.

Verify that you correctly imputed the missing values.

Create the Dummy Variables

)

Create the dummies of the variable NewLeague. What is the count of each category?

)

Create the dummies of the variable League. What is the count of each category?

)

Create the dummies of the variable Division. What is the count of each category?

Merge the Data and Perform the Correlation Analysis

)

Merge the dummies created with the DataFrame df

_

.

)

Look at the info of the data, how many variables are there now?

)

Create the heatmap that shows the pairwise correlation between all the variables.

)

Observe the correlation coefficients and identify the pairs that has a correlation of more than

0.8

in absolute value.

)

Successively drop those variables until there is no variable with the pairwise correlation higher than

0.8 .

)

Merge the dummies created with the DataFrame df

_

.

)

Look at the info of the data, how many variables are there now?

)

Create the heatmap that shows the pairwise correlation between all the variables.

)

Observe the correlation coefficients and identify the pairs that has a correlation of more than

0.8

in absolute value.

)

Successively drop those variables until there is no variable with the pairwise correlation higher than

0.8 .

Transform the data.

)

Transform the column Salary to the binary numerical variable as follows: If the Salary is above the median salary assign the value

1,

otherwise assign the

value

0 .

)

Verify that you correctly transformed the data. Observe the count of

1

and

0

in this column after the transformation.

Jupyter DATA 3 5 5 0 _ Homework _ 1 _ U Last

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

Downloads/ 5 Computing 6- In-Lab Noteboo X Computing 6 - Assignment - X Computing 6 - Assignment-Co X Computing 6 - Assignment + ta Not syncing...

Question 5 and 6 have me stuck because I do not understand what they are asking for, I sent photos of the pervious questions code to help you get some context. Thank you! Question 4 I am also not...

General rules This exam should be in class and proctored; please come to our regular classroom and bring your laptop. the submission by students who did not show up for the exam will not be graded....

jupyter Module One Discussion Last Checkpoint: 09/1 1/2019 (autosaved) File Edit View Insert Cell Kernel Widgets Help Not Trusted | Python 3 0 B + x 4 6 + + HRun C H Markdown v Step 1: Preparing the...

5. Scatterplot and Correlation for the Total Number of Wins and Average Points Scored You constructed a scatterplot of total number of wins and average points scored. You also calculated the Pearson...

python help using jupyter notebook http://www.filedropper.com/lab9-pythonrequests https://ufile.io/bwfzqv9d jupyter Lab9-PythonRequests Last Checkpoint: 13 minutes ago (autosaved) Logout File Edit...

Last Name, First Name Desc: This notebook serves as a template for binary classification problem. In [1]: ) # import packages import pandas as pd import seaborn as sns from sklearn import metrics...

science | IT... DG Object Oriented Pr... 1G My orders Metal Gear Solid 1.,, Apporto | App and... .Oop assignment | 1... Calorie Calculator -.. / 2019 Form 2962 & Order N Education Help Configure....

NEED HELP WRITING THE CODE FOR STEP 4. Trusted Python 3 ebook.tex x =ct Two Jupyter Script.ipynb jupyter Project Two Jupyter Script Last Checkpoint: 09/20/2019 (autosaved) File Edit View Insert Cell...

Define the null and alternative hypothesis in mathematical terms and in words. Report the level of significance. Include the test statistic and the P-value. See Step 2 in the Python script. Provide...

For each of the following items, identify whether the item is a product cost or period cost for a clothing retailer. ItemProduct Period a. Invoice cost of purchased inventor) b. Non-refundable...

What are the possible consequences of too little pressure during the resistance welding cycle? Too much pressure?

Sophia invested in a stock last year and sold it for a \ ( 2 5 \ % \ ) return. She has to pay taxes on the gain. ( It was a short - term gain so she pay tax based on her normal tax rates ) Based on...

(6)Headcount enrollments in American higher education have fallen nine years in a row. 1.If this pattern of falling demand occurred in a typical American industry, then what reactions would we expect...