PH245 Introduction to Multivariate Statistics
Homework Set 2
Due date: October 25, Monday

Problems:

1. The dataset "Data-HW2-Bodyfat.txt" contains the percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) for 252 men. Body fat, a measure of health, is estimated through an underwater weighing technique. Fitting body fat to the other measurements using multiple regression provides a convenient way of estimating body fat for men using only a scale and a measuring tape. The file "Data-HW2-Bodyfat-Readme.txt" has more information. Remove the two outliers as we discussed in class.

(a) Fit a linear regression model with percent body fat using Siri's equation as the response, age, weight, height, and the ten body circumference measurements as the predictors. Present the summary of the linear regression fit.

(b) Interpret the coefficient associated with the predictor, age. If one wishes to test the null hypothesis that this coefficient equals zero, what is the p-value of this test? If the significance level is set at 0.05, what is your conclusion of this hypothesis test?

(c) Draw a residual plot, with the fitted values on the x-axis, and the residuals on the y-axis. Does the plot suggest any violation of the key assumptions of the linear model? What are those key assumptions?

(d) Fit the model we discussed in class with only age, weight, height as the predictors. Test the null hypothesis that this reduced model is preferred versus the alternative hypothesis that the original full model is preferred, given the data. Use the significance level 0.05.

(e) (Optional) Draw a plot of the Lasso solution path for the regression on age, weight, height, and the ten body circumference measurements.

2. The dataset "Data-HW2-Carseats.Rdata" contains the sales of children car seats at 400 different stores. The data frame contains 11 variables. We are interested in estimating the unit sales (in thousands) at each store using the rest of the variables.

(a) Fit a linear regression model to predict Sales using Price, Urban, US, and write out the fitted model in equation form. Note that some of the variables are categorical.

(b) Present the summary of the linear regression fit, and provide an interpretation of each coefficient in the model.

(c) Is there any evidence of outliers or high leverage observations for this model
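
The reduced-versus-full comparison in Problem 1(d) is a partial F-test between nested linear models. The following is a minimal sketch of that test, not the course's own solution: it runs on synthetic data because the body fat file is not reproduced here, and the helper names `ols_rss` and `partial_f`, the sample size of 250, and the simulated coefficients are all assumptions made for illustration. It uses only NumPy.

```python
import numpy as np


def ols_rss(X, y):
    """Least-squares fit of y on X plus an intercept; returns (coefficients, RSS)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid)


def partial_f(X_full, reduced_cols, y):
    """Partial F statistic for H0: the reduced model (a column subset) suffices.

    Returns (F, q, df_resid), where q is the number of dropped predictors and
    df_resid is the residual degrees of freedom of the full model.
    """
    n, p_full = X_full.shape
    _, rss_full = ols_rss(X_full, y)
    _, rss_reduced = ols_rss(X_full[:, reduced_cols], y)
    q = p_full - len(reduced_cols)
    df_resid = n - p_full - 1  # minus 1 for the intercept
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
    return f_stat, q, df_resid


# Synthetic stand-in for the data (assumption): 250 rows and 13 predictors,
# where columns 0-2 play the role of age, weight, height.
rng = np.random.default_rng(0)
n = 250
X = rng.normal(size=(n, 13))
y = (1.0 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2]
     + 0.8 * X[:, 5] + rng.normal(size=n))

f_stat, q, df_resid = partial_f(X, [0, 1, 2], y)
print(f"F = {f_stat:.2f} on ({q}, {df_resid}) degrees of freedom")
```

Under the null hypothesis the statistic follows an F distribution with (q, df_resid) degrees of freedom, so the p-value is the upper-tail probability of that distribution at the observed F. The reduced model is rejected at the 0.05 level when that p-value is below 0.05.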
