Question: Please only answer the Part 3 and Part 4 questions. I uploaded the entire assignment because there are follow-up questions. Thank you very much.

This assignment follows from Regression Assignment 1. The following case study was obtained from Jansen et al. (2007).

Climate change is currently the most important threat facing the world's coastline. Marine coastal ecosystems are extremely vulnerable, as they constitute the most productive and diverse communities on Earth. The Dutch governmental institute RIKZ therefore started a research project on the relationship between some abiotic aspects (e.g., sediment composition, slope of the beach), as these might affect benthic fauna. The aim of the project was to find relationships between the benthic fauna of the intertidal area and abiotic variables.

The following predictor variables were available: NAP is the height of the sampling station relative to the mean tidal level. Humus constitutes the amount of organic material. Sampling took place in June 2002. A nominal variable week was introduced for each sample, which has the values 1, 2, 3 and 4, indicating in which week of June a beach was monitored. angle1 represents the angle of each station, whereas angle2 is the angle of the entire sampling area on the beach. Both variables were used. The variables angle2, grain size, penetrability, salinity and temperature were available at beach level.

The species data are analysed by converting them into a diversity index. For this analysis, the response variable is the density of the fauna, which was calculated using the Shannon-Weaver index (density).

The data set has now been split into a training set (RIKZ_train.csv (Regression > Projects)) and a test set (RIKZ_test.csv). Use the training data set to answer parts 1 and 2 and use the test set for part 3. You must answer the following questions:

1. Part 1: Variable selection (22)
(a) Consider the following two variable selection procedures: forward selection based on the AIC, and all subsets regression based on the adjusted R². Provide a brief description of these algorithms.
(b) Model 2A: Use the training data set and implement the methods in (a) to select a model for our analysis. You must use the following predictor variables: angle1, angle2, humus, penetrability, temperature, salinity, NAP. From the results, which variables must be included in the regression model? Explain why. Fit the model suggested by the variable selection procedures to the training dataset. (2)
(c) Model 2B: Fit a new regression model which includes the variables in model 2A and the Week variable. You must treat Week as a categorical variable.
(d) Use model 2B to test whether temperature and salinity should be included in the model. (Note: if your model 2B does not contain the variables temperature and salinity, use the full model with all predictor variables.) What are the null and alternative hypotheses? (1) Explain how the test statistic for this hypothesis is obtained. What is the distribution of the test statistic and what degrees of freedom are used? (4) Calculate the value of the test statistic and the corresponding p-value. What can you conclude?
(e) Model 2C: Fit a regression model with NAP and the categorical variable Week. By referring to, and reporting, at least two measures of goodness of fit, compare model 2B to model 2C. (2) Interpret the regression coefficients associated with the Week variable in model 2C. (3) Is there statistical evidence that the density of the fauna changes over the weeks? Explain why. (2)

2. Part 2: Model checking and residual analysis (16)
(a) Use model 2C to answer the following questions:
i. Produce a scatter plot of the residuals against the fitted values. Attach the plot and comment on your results. (2)
ii. Produce a scatter plot of the residuals against NAP. Attach the plot and comment on your results. (2)
iii. Produce a box plot of the residuals against week. Attach the plot and comment on your results. (2)
iv. Carry out tests of normality of residuals.
In your answer, you should include and attach a histogram and a quantile-quantile plot of the residuals. You should also perform one formal statistical test of normality. Comment on your findings. (6)
v. Investigate whether there are any outliers or influential observations. Comment on your results. Use two different techniques in your analysis and attach your plots. (4)

3. Part 3: Model performance on test set (5)
(a) Use the test set and model 2C. Predict responses using the observations in the test data set. Produce a scatter plot of the observed values, and superimpose on it a scatter plot of the predicted values. Add the prediction interval to this plot. Report the mean squared error of the predictions and comment on your findings.

4. Based on the analysis made, what affects fauna density, and how? You must provide and report any relevant statistical evidence (test statistics, confidence intervals, etc.) to support your answer. (4)

RIKZ_train
angle1 angle2 NAP grainsize humus temperature salinity penetrability week density
0.756 354.5 16 36 17.4 27.1 1723 3 0.578558006070426 13 31 -0.811 297 0 18.775 250.3 2 0.374676541412811 13 31 -0.03 361 0 18.775 25.1 253.4 2 0.335716858775699 20 21 1.117 251.5 0 19.5 299 2521 + 0.376777922831669 49 77 0.883 266 0.05 18.4 274 165.3 2 0.178111253971134 22 42 0.167 323.5 0.1 15.8 27.9 258.1 2 0.409691001300806 28 32 1.671 294.5 0 20 25.4 2963 3 0.439247291135819 48 36 -0.893 336 0.05 17.4 22.1 1293 3 0.58227814977452 22 21 0.729 275.5 0.1 4 0.626654770672766 55 32 -1.005 355 0 20 26.4 3828 3 0.73993117329964 55 96 -0.356 244.5 0.05 20.5 489 3 0.657375707585256 32 96 0.045 0.05 37.5 29.4 253.9 1 0.761906390200748 22 96 2.255 186 0.05 20.5 27.1 6065 3 0 27.1 222.5 52 89 0.635 211.5 0.1 20.8 29.6 247.9 1 22 32 0.17 362 0 20 26.4 415.8 3 2 30 77 0.367 284 0 18.4 27.4 2623 0.29160666732367 0.205251949698905 0.0848849498242823 0.0858033698146211 1.23972435378997 49 31 0.46 330.5 0.05 18.775 28.1 2585 2 22 21 -0.503 265 0 19.8 29.9 256.1 4 31 21 1.627 256.5 0 19.8 239.1 29.9 28.1 10 31 1.768 293.5 0.05 25.4 2 0 18.775 20 50 32 -0.375 316.5 0.1 25.4 3423 3 0.560655666301601 86 77 1.786 256.5 0.0125 18.4 22.4 1518 2 0 312 77 1.375 234 0 18.4 27.4 1933 2 0 65 96 -1.336 194.5 0.1 17.5 237.1 . 0.846735236743312 143 89 0 20.5 29.6 257.9 1 0.996640956926915 -1.334 197 -0.002 223 138 96 0 20.5 27.1 5685 3 0.457678508133062 26 77 0 27.4 225.8 2 0.120629093284051 -0.06 242 1.367 289.5 58 31 0 18.775 25.1 2703 2 0 126 89 0.82 205.5 0.1 20.5 29.6 257.1 1 23 96 -0.684 202 0.05 17.5 251.9 248.9 0.401920060043369 0.744139390692219 1.01888184740715 0.195915088161936 26 89 0.061 205.5 0.15 20.8 29.6 1 30 42 -0.201 295.5 0.1 15.8 27.9 2729 2 26 42 1.494 272.5 0 15.8 27.9 274.4 2 0
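
The Part 3 and Part 4 tasks above can be sketched in code. The assignment does not name any software, so Python with NumPy and SciPy is an assumption here, and every function name below (design_matrix, fit_ols, coef_table, predict_with_interval, mse) is hypothetical. The sketch fits model 2C (density on NAP plus week as a categorical variable) by ordinary least squares, reports t statistics and confidence intervals for the coefficients (the kind of evidence Part 4 asks for), and computes predictions, 95% prediction intervals and the mean squared error on held-out data (Part 3). The data in the example are synthetic placeholders, not the RIKZ values, so no results for the real data are implied.

```python
import numpy as np
from scipy import stats


def design_matrix(nap, week, levels):
    """Columns: intercept, NAP, and treatment-coded week dummies (levels[0] is the baseline)."""
    nap = np.asarray(nap, dtype=float)
    week = np.asarray(week)
    dummies = [(week == lev).astype(float) for lev in levels[1:]]
    return np.column_stack([np.ones_like(nap), nap] + dummies)


def fit_ols(X, y):
    """Least-squares fit; returns coefficients, residual variance, (X'X)^-1 and residual df."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    df = X.shape[0] - X.shape[1]
    resid = y - X @ beta
    sigma2 = resid @ resid / df
    return beta, sigma2, np.linalg.inv(X.T @ X), df


def coef_table(beta, sigma2, xtx_inv, df, alpha=0.05):
    """Per coefficient: estimate, std. error, t statistic, p-value, CI lower, CI upper."""
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df)
    half = stats.t.ppf(1 - alpha / 2, df) * se
    return np.column_stack([beta, se, t, p, beta - half, beta + half])


def predict_with_interval(X_new, beta, sigma2, xtx_inv, df, alpha=0.05):
    """Point predictions with (1 - alpha) prediction intervals for new observations."""
    pred = X_new @ beta
    # Variance of a new observation around its prediction:
    # sigma^2 * (1 + x0' (X'X)^-1 x0), computed row by row.
    leverage = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    half = stats.t.ppf(1 - alpha / 2, df) * np.sqrt(sigma2 * (1 + leverage))
    return pred, pred - half, pred + half


def mse(y_obs, y_pred):
    """Mean squared error of predictions."""
    return float(np.mean((np.asarray(y_obs, dtype=float) - y_pred) ** 2))


# Synthetic stand-in data (NOT the RIKZ values), only to show the call sequence.
rng = np.random.default_rng(1)
levels = [1, 2, 3, 4]
nap_train, week_train = rng.normal(size=32), np.resize(levels, 32)
y_train = 0.6 - 0.2 * nap_train + rng.normal(scale=0.2, size=32)
nap_test, week_test = rng.normal(size=12), np.resize(levels, 12)
y_test = 0.6 - 0.2 * nap_test + rng.normal(scale=0.2, size=12)

X_train = design_matrix(nap_train, week_train, levels)
beta, sigma2, xtx_inv, df = fit_ols(X_train, y_train)
print(coef_table(beta, sigma2, xtx_inv, df))  # rows: intercept, NAP, week 2, 3, 4

X_test = design_matrix(nap_test, week_test, levels)
pred, lower, upper = predict_with_interval(X_test, beta, sigma2, xtx_inv, df)
print("test MSE:", mse(y_test, pred))
```

With the real files, the NAP, week and density columns of RIKZ_train.csv and RIKZ_test.csv would replace the synthetic arrays. For the requested plot, one option is matplotlib: plot the observed and predicted test values with plt.scatter and draw lower and upper with plt.errorbar or plt.fill_between. A test MSE that is much larger than the training residual variance would suggest that model 2C generalises poorly. The sign and confidence interval of the NAP coefficient, together with the week coefficients, would carry the Part 4 argument.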
