Question: Data Exploration and Multiple Linear Regression (MLR) using R

Data Exploration and Multiple Linear Regression (MLR) using R [1]

The "Boston Housing" data set, part of the MASS package, records properties of 506 housing zones in the Greater Boston area. For a description of the data (housing data and attribute information), visit https://archive.ics.uci.edu/ml/datasets/Housing. Typically one is interested in predicting MEDV (median home value) based on other attributes.

1. Generate box plots of the LSTAT (% of lower status in the population) and MEDV (median home value) attributes and identify the cutoff values for outliers. Generate a scatterplot of MEDV against LSTAT; comment on how inclusion of the outliers would affect a predictive model of median home value as a function of % of lower status in the population.

2. Try to fit an MLR to this dataset, with MEDV as the dependent variable. MEDV has a somewhat long tail and is not very Gaussian-like, so we will take a log transform (use LMEDV = - log(MEDV)) and then predict LMEDV instead. (You should convince yourself that this is a better idea by looking at the histograms and quantile plots to assess normality; however, there is no need to submit such plots.) Keep the first 300 records as a training set (call it Bostrain), which you will use to fit the model; the remaining 206 will be used as a test set (Bostest). Use only the following variables in your model: LMEDV = LSTAT + RM + CRIM + ZN + CHAS.

3. Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)?

4. Report the MSE obtained on Bostrain. How much does this increase when you score your model (i.e., predict) on Bostest?

5. (Bonus 1 point.) Use stepwise regression to reach your final model. Try different model selection criteria (i.e., AIC, Cp, BIC, adj R^2, R^2) and see if you arrive at the same model under each criterion. If different criteria give different models, determine the best one. We will consider the model that gives the highest accuracy (in terms of MSE) on the test set to be the best model.

[1] You must use R to run the regression, although use of other software is also encouraged for verification of your answers.
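One way to approach steps 1 to 5 in R is sketched below. It is only a sketch under several assumptions, not the expected answer. The `Boston` data frame in MASS uses lowercase column names (`lstat`, `medv`, and so on). The outlier cutoffs follow the common 1.5 * IQR box-plot convention, which the question does not specify. The response is taken as `log(medv)`; the question's formula shows a leading minus sign, which would only flip the sign of every coefficient and leave the MSE unchanged. The helper names `iqr_fences` and `mse` are made up for illustration, and the stepwise search starts from the five-variable model, so it can only drop terms.

```r
library(MASS)  # ships the Boston data frame (506 rows, lowercase column names)

# Step 1: outlier cutoffs under the 1.5 * IQR convention (an assumption).
# boxplot() itself uses Tukey hinges, which can differ slightly from these quartiles.
iqr_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c(lower = q[1] - 1.5 * IQR(x), upper = q[2] + 1.5 * IQR(x))
}
lstat_fences <- iqr_fences(Boston$lstat)
medv_fences  <- iqr_fences(Boston$medv)

boxplot(Boston$lstat, main = "LSTAT")
boxplot(Boston$medv, main = "MEDV")
plot(Boston$lstat, Boston$medv, xlab = "LSTAT", ylab = "MEDV")

# Step 2: log-transform the response, then split by row order as the question asks.
Boston$lmedv <- log(Boston$medv)  # assumption: plain log, see the note above
Bostrain <- Boston[1:300, ]
Bostest  <- Boston[301:506, ]
fit <- lm(lmedv ~ lstat + rm + crim + zn + chas, data = Bostrain)

# Step 3: estimates, standard errors, t values and p-values for each term.
coef_table <- summary(fit)$coefficients
print(coef_table)

# Step 4: mean squared error on the training and test sets.
mse <- function(model, data) {
  mean((data$lmedv - predict(model, newdata = data))^2)
}
train_mse <- mse(fit, Bostrain)
test_mse  <- mse(fit, Bostest)
print(c(train = train_mse, test = test_mse))

# Step 5: stepwise selection with AIC (default penalty k = 2) and BIC (k = log(n)).
# With no scope given, step() treats the starting model as the upper bound.
fit_aic <- step(fit, direction = "both", trace = 0)
fit_bic <- step(fit, direction = "both", k = log(nrow(Bostrain)), trace = 0)
print(formula(fit_aic))
print(formula(fit_bic))
print(c(aic_test_mse = mse(fit_aic, Bostest), bic_test_mse = mse(fit_bic, Bostest)))
```

Comparing `train_mse` with `test_mse` answers step 4 directly. Because the split is by row order rather than random, the two sets may differ systematically, which is worth keeping in mind when interpreting the gap.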
