Consider the attached dataset on 1302 American colleges and universities offering an undergraduate program
Question:
Consider the attached dataset on American colleges and universities offering an undergraduate program and answer the following questions by applying the Python code bits used in class. Feel free (in fact, you are encouraged) to consult me for any guidance.
You must demonstrate how you answered each question, in the order asked below, by submitting your Jupyter notebook file along with a Word file.
How many variables are there in the dataset?
Which variables are categorical, which are numerical?
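A minimal sketch of one way to approach the first two questions, assuming pandas. The small DataFrame is a made-up stand-in for the attached dataset, and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the attached dataset; the real column names will differ.
df = pd.DataFrame({
    "College Name": ["Alpha College", "Beta University", "Gamma Institute"],
    "State": ["AK", "AL", "AZ"],
    "Public (1)/ Private (2)": [1, 2, 1],
    "in-state tuition": [7560.0, 1742.0, None],
})

n_variables = df.shape[1]  # number of columns = number of variables

# Split columns by storage type: numeric dtypes vs. everything else.
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

print(n_variables)
print("numerical:", numerical)
print("categorical:", categorical)
```

Note that the dtype is only a first guess: an integer-coded column such as the public/private flag above is stored as a number but is categorical in meaning, so the split should be checked by eye.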
Clean the dataset by removing all missing/incomplete observations. How many complete observations are left?
Set the row names (indices) to the College Name column.
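The cleaning and indexing steps might look like the sketch below, assuming pandas and a made-up DataFrame (only the "College Name" column name comes from the question; the rest is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with some missing entries.
df = pd.DataFrame({
    "College Name": ["Alpha College", "Beta University", "Gamma Institute"],
    "State": ["AK", "AL", None],
    "in-state tuition": [7560.0, np.nan, 5000.0],
})

clean = df.dropna()          # drop every row that has at least one missing value
n_complete = len(clean)      # number of complete observations left

clean = clean.set_index("College Name")  # use the college name as the row index

print(n_complete)
print(clean.index.tolist())
```

`dropna()` with its default arguments removes a row if any column is missing, which matches "removing all incomplete observations".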
Clean up the variable names by:
a. Removing the following characters and spaces: # $
b. Replacing the following characters with :
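A sketch of the renaming pattern with pandas string methods. Part (a) removes `#`, `$` and spaces as stated. The characters and the replacement for part (b) are not given in the text above, so the choice of replacing `-` and `.` with `_` is purely a hypothetical illustration, as are the column names.

```python
import pandas as pd

# Hypothetical column names chosen to exercise both rules.
df = pd.DataFrame(columns=["# applications", "in-state tuition", "add. fees $"])

df.columns = (
    df.columns
    # Part (a): remove '#', '$' and whitespace.
    .str.replace(r"[#$\s]", "", regex=True)
    # Part (b): ASSUMPTION for illustration only, replace '-' and '.' with '_'.
    .str.replace(r"[-.]", "_", regex=True)
)

print(df.columns.tolist())
```

Doing the renaming on `df.columns` as a whole keeps the rule in one place, so swapping in the actual characters from the assignment only means editing the two patterns.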
Compute the summary statistics of the numerical variables in the dataset.
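For the summary statistics, a minimal sketch assuming pandas and hypothetical columns:

```python
import pandas as pd

# Hypothetical stand-in data: one categorical and two numerical columns.
df = pd.DataFrame({
    "State": ["AK", "AL", "AZ", "AR"],
    "tuition": [1000.0, 2000.0, 3000.0, 4000.0],
    "enrollment": [100, 200, 300, 400],
})

# By default describe() reports count, mean, std, min, quartiles and max
# for the numeric columns only.
summary = df.describe()
print(summary)
```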
Plot a histogram for each of the numerical variables, setting the axis labels in plain English to make them easy to understand.
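One possible way to draw labelled histograms, assuming matplotlib and randomly generated stand-in data. The column names and the plain-English labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical numerical columns.
df = pd.DataFrame({
    "tuition": rng.normal(8000, 2000, 200),
    "enrollment": rng.normal(3000, 800, 200),
})

# Map each short column name to a readable axis label.
labels = {
    "tuition": "In-state tuition (USD)",
    "enrollment": "Number of enrolled students",
}

fig, axes = plt.subplots(1, len(labels), figsize=(10, 4))
for ax, (col, label) in zip(axes, labels.items()):
    ax.hist(df[col], bins=20)
    ax.set_xlabel(label)
    ax.set_ylabel("Number of colleges")
fig.tight_layout()
```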
Construct a heatmap between all numerical variables and comment on the relationships among them.
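A sketch of a correlation heatmap using plain matplotlib on synthetic data (seaborn's `heatmap` is a common alternative if that is what was used in class). The columns `a`, `b` and `c` are hypothetical: `b` is built to depend on `a`, while `c` is independent noise.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = rng.normal(size=100)
df = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=100),  # strongly tied to a
    "c": rng.normal(size=100),                      # unrelated noise
})

# Pairwise Pearson correlations between the numerical columns.
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson correlation")
```

Fixing the colour range to [-1, 1] keeps zero correlation at the neutral midpoint of the colour map, which makes strong positive and negative relationships easier to compare.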
By observing the heatmap, select three numerical variables that you think would be interesting to include and draw a matrix scatter diagram between the three.
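The matrix scatter diagram could be sketched with `pandas.plotting.scatter_matrix`, as below. The data are random and the three chosen column names are hypothetical placeholders for whichever variables the heatmap suggests.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(2)
# Hypothetical numerical columns.
df = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["tuition", "room", "board", "fees"],
)

chosen = ["tuition", "room", "board"]  # the three variables picked from the heatmap

# Pairwise scatter plots off the diagonal, histograms on the diagonal.
axes = scatter_matrix(df[chosen], figsize=(6, 6), diagonal="hist")
```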
Convert the categorical variables into integer binary dummy variables.
a. Explain in words, for one observation, the values in the derived binary dummies.
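A small sketch of the dummy conversion, assuming pandas and a hypothetical `State` column with made-up college names:

```python
import pandas as pd

# Hypothetical stand-in data indexed by college name.
df = pd.DataFrame(
    {"State": ["AK", "AL", "AK"], "tuition": [7560, 1742, 5000]},
    index=["Alpha College", "Beta University", "Gamma Institute"],
)

# One 0/1 integer column per category level of State.
dummies = pd.get_dummies(df, columns=["State"], dtype=int)

print(dummies.loc["Beta University"])
```

In this made-up example, Beta University is located in AL, so its `State_AL` dummy is 1 and its `State_AK` dummy is 0: exactly one of the state dummies is switched on for each observation.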
Conduct a principal components analysis (PCA) using only the original numerical variables.
a. Make sure to display the 'Standard Deviation', 'Proportion of Variance' and 'Cumulative Proportion' info.
b. Comment on the results: How many principal components appear to be significant? Should the data be normalized beforehand?
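A sketch of how the three requested rows could be assembled, assuming scikit-learn's `PCA` and synthetic data. The column names and scales are hypothetical, chosen so that one variable has a much larger variance than the others.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Hypothetical numerical variables on very different scales.
X = pd.DataFrame({
    "tuition": rng.normal(8000, 2000, 100),
    "ratio": rng.normal(15, 3, 100),
    "pct_phd": rng.normal(70, 10, 100),
})

pca = PCA().fit(X)  # PCA centers the data but does not rescale it

summary = pd.DataFrame(
    {
        "Standard Deviation": np.sqrt(pca.explained_variance_),
        "Proportion of Variance": pca.explained_variance_ratio_,
        "Cumulative Proportion": np.cumsum(pca.explained_variance_ratio_),
    },
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],
).T

print(summary.round(4))
```

With unscaled data like this, the first component is dominated by the variable with the largest variance, which is the usual argument for normalizing beforehand.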
Normalize the numerical variables using the standard scaler and redo the PCA from the previous question. Comment on the difference in the PCA results after normalization.
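The before/after comparison might be sketched as follows, assuming scikit-learn's `StandardScaler` and `PCA` and the same kind of synthetic, hypothetical columns on very different scales:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical numerical variables on very different scales.
X = pd.DataFrame({
    "tuition": rng.normal(8000, 2000, 100),
    "ratio": rng.normal(15, 3, 100),
    "pct_phd": rng.normal(70, 10, 100),
})

# PCA on the raw data: the large-variance column dominates.
raw_ratio = PCA().fit(X).explained_variance_ratio_

# Standardize each column to mean 0 and standard deviation 1, then redo the PCA.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA().fit(X_scaled).explained_variance_ratio_

print("raw:   ", raw_ratio.round(4))
print("scaled:", scaled_ratio.round(4))
```

Because the three synthetic columns are generated independently, the variance is spread much more evenly across components once they are standardized; on the real dataset the pattern will depend on how correlated the variables are.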