Question: Regression Diagnostics with R
This assignment is due by Sunday, 11:59 pm EST.
PDF of this assignment: ALY6015 M1 Regression Diagnostics Assignment with Rubric.pdf
Downloadable Files for completing the assignment
AmesHousing.csv
AmesHousingDataDocumentation.txt
Pre-Assignment Lab
Before you begin your assignment, view the associated Lab video.
Lab: Regression Diagnostics & Feature Selection Video Transcript
Overview and Rationale
Purpose of Assignment (WHY)
It is important for you to be able to interpret and evaluate the models that you build. In this assignment, you will fit two regression models, interpret the results and implement diagnostic techniques to identify and correct issues with the model.
Program Competencies
Program Learning Outcomes (PLOs)
Statistics & Math: Demonstrate the foundational knowledge and skills critical to pursue data analytics as a profession in relation to statistics and math.
Analytics Systems Technology (Tools)/Advanced Analytics: Demonstrate the knowledge of advanced tools in data analytics.
Business Analytics Agility: Apply the principles, tools and methods of analytics to a comprehensive real-world problem or project related to data analyses for tactical and/or strategic decision making.
Business Process Management: Integrate the major theories, tools, and approaches in data analytics to identify data-driven insights for informed business process management.
Communicating with Data: Design and deliver presentations, reports, and recommendations that effectively translate technical results/data solutions and are coherent and persuasive to different audiences.
Course Learning Outcomes
This assignment is directly linked to the following key learning outcomes from the course syllabus:
CLO1: Fit, interpret, and evaluate regression models using standard functions and diagnostic techniques.
CLO2: Correct issues with overfitting, linearity, multicollinearity and outliers.
CLO3: Select best model from multiple predictors using automated techniques.
Assignment Description (WHAT)
For this exercise, you will need to download the attached AmesHousing dataset. In this assignment, you will implement the skills you have learned to fit, interpret and evaluate a regression model. Once you have completed steps 1 through 14, prepare a report to document your findings.
Criteria for Success
Refer to the attached rubric for more details on the report. The report should contain a well-written cover/title page, introduction, body, conclusion, and references. It must follow APA format and contain at least 1,000 words (excluding the title page and references page). All R code used for your report should be included in an appendix at the end of the report. Graphs, figures, charts, and tables are useful for communicating your results and impressing your readers, but they should not be included unless they are well described and interpreted. Please use subtitles to make your report more reader-friendly.
Format & Guidelines
The report should use the following format:
Title page
Introduction
Analysis
Conclusion/Interpretations
References
Deliverables (HOW)
1. Load the Ames housing dataset.
2. Perform Exploratory Data Analysis and use descriptive statistics to describe the data.
3. Prepare the dataset for modeling by imputing missing values with the variable's mean value or any other value that you prefer.
4. Use the "cor()" function to produce a correlation matrix of the numeric values.
5. Produce a plot of the correlation matrix, and explain how to interpret it. (Hint: check the corrplot or ggcorrplot libraries.)
6. Make a scatter plot of SalePrice against the continuous X variable that has the highest correlation with SalePrice. Do the same for the X variable that has the lowest correlation with SalePrice. Finally, make a scatter plot of SalePrice against the X variable whose correlation with SalePrice is closest to 0.5. Interpret the scatter plots and describe how the patterns differ.
7. Using at least 3 continuous variables, fit a regression model in R.
8. Report the model in equation form and interpret each coefficient of the model in the context of this problem.
9. Use the "plot()" function to plot your regression model. Interpret the four graphs that are produced.
10. Check your model for multicollinearity and report your findings. What steps would you take to correct multicollinearity if it exists?
11. Check your model for outliers and report your findings. Should these observations be removed from the model?
12. Attempt to correct any issues that you have discovered in your model. Did your changes improve the model? Why or why not?
13. Use the all subsets regression method to identify the "best" model. State the preferred model in equation form.
14. Compare the preferred model from step 13 with your model from step 12. How do they differ? Which model do you prefer and why?
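As a starting point, several of the deliverables above (loading, mean imputation, the correlation matrix, model fitting, diagnostic plots, multicollinearity, outliers, and all subsets regression) can be sketched in R as follows. This is a minimal sketch, not a model answer. The helper names (`impute_mean`, `vif_base`, `flag_outliers`, `run_diagnostics`) are hypothetical, the predictor names are assumptions that should be checked against AmesHousingDataDocumentation.txt, and the 4/n cutoff for Cook's distance is one common rule of thumb rather than something the assignment prescribes. The `corrplot` and `leaps` packages are used only if they are installed.

```r
# Step 3: replace missing values in numeric columns with the column mean.
impute_mean <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df
}

# Step 10: variance inflation factors using base R only.
# For numeric predictors, the VIFs are the diagonal of the inverse
# of the predictor correlation matrix.
vif_base <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  diag(solve(cor(X)))
}

# Step 11: flag observations whose Cook's distance exceeds 4/n
# (an assumed rule of thumb; other cutoffs are also in use).
flag_outliers <- function(model) {
  d <- cooks.distance(model)
  which(d > 4 / length(d))
}

# Steps 1, 3, 4, 5, 7, 9, 10, 11 and 13 in one pass.
# The default predictor names are assumptions about how read.csv()
# renders the column headers; adjust them to match your file.
run_diagnostics <- function(path = "AmesHousing.csv",
                            predictors = c("Gr.Liv.Area", "Total.Bsmt.SF",
                                           "Garage.Area")) {
  ames <- impute_mean(read.csv(path))                   # steps 1 and 3
  num  <- ames[vapply(ames, is.numeric, logical(1))]
  cors <- cor(num)                                      # step 4
  if (requireNamespace("corrplot", quietly = TRUE)) {
    corrplot::corrplot(cors)                            # step 5
  }
  fit <- lm(reformulate(predictors, response = "SalePrice"),
            data = ames)                                # step 7
  par(mfrow = c(2, 2))
  plot(fit)                                             # step 9: four diagnostic plots
  subsets <- NULL
  if (requireNamespace("leaps", quietly = TRUE)) {
    # Step 13: all subsets search, restricted here to the chosen predictors.
    subsets <- leaps::regsubsets(formula(fit), data = ames)
  }
  list(cor = cors, fit = fit, vif = vif_base(fit),
       outliers = flag_outliers(fit), subsets = subsets)
}
```

Calling `res <- run_diagnostics()` from the folder that contains AmesHousing.csv returns the pieces needed for the write-up: `summary(res$fit)` gives the coefficients for the equation in step 8, `res$vif` supports step 10, and `res$outliers` lists the row indices to examine in step 11.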
