This module has been concerned with multiple regression: including multiple terms in our regression models, be they higher-power terms, dummy variables, interaction terms, or just additional quantitative variable terms. Models can become very complex, and that complexity can itself be a problem. This discussion post is concerned with the issue of 'overfitting' a model to the data. Make your initial post by Thursday night and your response post by the end of the module.

Initial Post

Watch this video: https://www.youtube.com/watch?v=ls3XKoGntXg

This article illustrates the issue of overfitting as well: Model selection and overfitting - nature.pdf

When you are studying a response variable, you often have the opportunity to measure many predictor variables in your study. Every additional predictor variable, interaction term, power term, etc. will increase R² (strictly, it can never decrease it), and thus your model will appear to explain more of the variation in the response variable. Why, then, would we not want to include a predictor variable in our regression model? Please explain the reasoning for model simplification.
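
The point about R² can be illustrated with a small simulation. The sketch below is not taken from the video or the article; it assumes simulated data in which the response depends on a single predictor `x`, and it adds columns of pure random noise as extra "predictors". All names (`noise`, `train_test_r2`, the sample sizes) are hypothetical choices for illustration. Training R² cannot go down as noise columns are added, while R² on held-out data typically gets worse, which is the overfitting problem in miniature.

```python
import numpy as np

# Simulated data (an assumption for illustration): y depends only on x.
rng = np.random.default_rng(0)
n_train, n_test, max_noise = 30, 1000, 20
n = n_train + n_test
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Extra "predictors" that are pure noise, unrelated to y.
noise = rng.normal(size=(n, max_noise))


def r_squared(X, y_obs, coef):
    """R^2 = 1 - SSE / SST for the given design matrix and coefficients."""
    resid = y_obs - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y_obs - y_obs.mean()) ** 2)


def train_test_r2(k):
    """Fit least squares with intercept, x, and the first k noise columns."""
    X = np.column_stack([np.ones(n), x, noise[:, :k]])
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return r_squared(X_tr, y_tr, coef), r_squared(X_te, y_te, coef)


results = {k: train_test_r2(k) for k in (0, 5, 10, 20)}
for k, (r2_train, r2_test) in results.items():
    print(f"noise predictors: {k:2d}  train R^2: {r2_train:.3f}  test R^2: {r2_test:.3f}")
```

Because each larger model contains the smaller one, the training R² column is non-decreasing by construction; the held-out column is what reveals whether the extra terms actually help.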

Give an example of a population with a response variable you would be interested in modeling, and give examples of at least 5 predictor variables that you might consider when collecting data to create a model.
