Question:
1. Dummy code the "Private" dummy variable. (2 points)
2. Generate box plots of the accept (number of applications accepted) (2 points), top10perc (% of new students from the top 10% of their high school class) (2 points), and enroll (number of new students enrolled) (2 points) attributes, and identify the cutoff values for outliers. (4 points: remove outliers)
3. Try to fit an MLR to this dataset, with ENROLL as the dependent variable.
   P_UNDERGRAD has a somewhat long tail, so take a log transform (LP_UNDERGRAD = log(P_UNDERGRAD)) and use LP_UNDERGRAD as one of the predictors. (6 points)
   Keep the first 544 records as a training set (call it ENROLLTRAIN), which you will use to fit the model; the remaining 233 will be used as a test set (ENROLLTEST). (6 points)
   Use only the following variables in your model: ENROLL = ACCEPT + TOP10PERC + F_UNDERGRAD + LP_UNDERGRAD + ROOM_BOARD + GRADE_RATE + PRIVATEDUMMY (6 points)
   (a) Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)? (10 points)
   (b) Report the MSE obtained on ENROLLTRAIN. How much does this increase when you score your model on ENROLLTEST? (10 points)
   (c) Do you think your MLR model is reasonable for this problem? You may look at the distribution of residuals to provide an informed answer. (Bonus 2 points)
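The mechanics the question asks for can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a worked answer: the helper names (`dummy_code`, `iqr_fences`, `remove_outliers`, `fit_ols`, `predict`, `mse`) are hypothetical, the "Yes"/"No" coding of Private is assumed, the outlier cutoffs use the common 1.5 × IQR box-plot rule with one particular quartile convention (statistical packages differ slightly), and the toy values at the bottom are made up and are not the college data.

```python
import math
import statistics

def dummy_code(values, positive="Yes"):
    """Map a two-level column to 1/0 (assumes levels such as "Yes"/"No")."""
    return [1 if v == positive else 0 for v in values]

def iqr_fences(values, k=1.5):
    """Box-plot outlier cutoffs: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(rows, column):
    """Keep only the rows (dicts) whose `column` value lies within the fences."""
    lo, hi = iqr_fences([r[column] for r in rows])
    return [r for r in rows if lo <= r[column] <= hi]

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]          # prepend intercept column
    p = len(A[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coef, X):
    """Fitted values for rows of predictors, given intercept-first coefficients."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Toy stand-in values (made up for illustration, NOT the college data)
private = ["Yes", "No", "Yes", "No"]
print(dummy_code(private))                        # [1, 0, 1, 0]
p_undergrad = [120.0, 45.0, 900.0, 15.0]
lp_undergrad = [math.log(v) for v in p_undergrad]  # log transform of a long-tailed column

# Positional split, mirroring "first 544 records train, remainder test"
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.1, 4.9, 8.2, 10.8, 14.1, 17.0]
n_train = 4
coef = fit_ols(X[:n_train], y[:n_train])
print(mse(y[:n_train], predict(coef, X[:n_train])))  # training MSE
print(mse(y[n_train:], predict(coef, X[n_train:])))  # test MSE
```

This sketch returns coefficients and MSE only. The t-scores and p-values asked for in part (a) need standard errors, which a regression routine from a statistics package would normally report, and the residual check in part (c) would use a histogram or Q-Q plot of `y - predict(coef, X)`.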
