Question: I have to do a regression analysis and then show the results in a visual form i.e. Tableau I am trying to show if certain
I have to do a regression analysis and then show the results in visual form, i.e. in Tableau.
I am trying to show whether certain MLB players increase attendance when they are playing. Here are my notes:
To conduct a regression analysis, we'll use the number of games played by each player as the independent variable (X) and the attendance at games as the dependent variable (Y). We'll fit a linear regression model to this data to see if there's a significant relationship between the number of games played by each player and the attendance at games.

Let's denote:
- X: Number of games played
- Y: Attendance at games

The data provided is as follows:

Player            | Games Played | Attendance (per game)
------------------|--------------|-----------------------
Adam Jones        | 137          | 29,374
David Ortiz       | 146          | 35,564
Mike Trout        | 159          | 37,194
Bryce Harper      | 153          | 32,343
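Since the goal is a visual in Tableau, one way to get the data there is a CSV file, which Tableau can open as a text-file data source. The sketch below is only an illustration: the function name `rows_to_csv` and the file name in the final comment are made up for this example. The fifth pair (119 games, 33,654 per game) is used in the calculations in these notes but has no player name in the table, so its label here is a placeholder.

```python
import csv
import io

# (player, games played, attendance per game), copied from the notes.
# The last pair appears in the calculations without a player name,
# so "(name not given)" is a placeholder, not a real player.
ROWS = [
    ("Adam Jones", 137, 29374),
    ("David Ortiz", 146, 35564),
    ("Mike Trout", 159, 37194),
    ("Bryce Harper", 153, 32343),
    ("(name not given)", 119, 33654),
]


def rows_to_csv(rows):
    """Return the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Player", "Games Played", "Attendance (per game)"])
    writer.writerows(rows)
    return buf.getvalue()


csv_text = rows_to_csv(ROWS)
print(csv_text)

# To save it for Tableau (hypothetical file name):
# with open("attendance.csv", "w", newline="") as f:
#     f.write(csv_text)
```

In Tableau, a scatter plot of Games Played against Attendance (per game) with a linear trend line added would show the same kind of fit as the hand calculation.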
Step 1: Calculate the means of X and Y.
- Mean of X: (137 + 146 + 159 + 153 + 119) / 5 = 142.8
- Mean of Y: (29,374 + 35,564 + 37,194 + 32,343 + 33,654) / 5 = 33,026.6

Step 2: Calculate the deviations from the mean for X and Y.
- Deviation of X: X - mean(X)
- Deviation of Y: Y - mean(Y)

Step 3: Calculate the sum of the products of deviations.
- Σ((X - mean(X)) * (Y - mean(Y)))

Step 4: Calculate the sum of the squares of deviations for X.
- Σ((X - mean(X))^2)

Step 5: Calculate the regression coefficients.
- Slope (b): Σ((X - mean(X)) * (Y - mean(Y))) / Σ((X - mean(X))^2)
- Intercept (a): mean(Y) - (slope * mean(X))

1. Sum of the products of deviations:
   Σ((X - mean(X)) * (Y - mean(Y))) = (137 - 142.8)*(29374 - 33026.6) + (146 - 142.8)*(35564 - 33026.6) + (159 - 142.8)*(37194 - 33026.6) + (153 - 142.8)*(32343 - 33026.6) + (119 - 142.8)*(33654 - 33026.6)
2. Sum of the squares of deviations for X:
   Σ((X - mean(X))^2) = (137 - 142.8)^2 + (146 - 142.8)^2 + (159 - 142.8)^2 + (153 - 142.8)^2 + (119 - 142.8)^2
3. Slope (b):
   b = Σ((X - mean(X)) * (Y - mean(Y))) / Σ((X - mean(X))^2)
4. Intercept (a):
   a = mean(Y) - (slope * mean(X))

After calculating these values, we'll have the equation of the regression line, and we can interpret the slope to determine if there's a significant relationship between games played and attendance.

The slope of the regression line, which in this case is approximately 98.12, indicates the average change in attendance for each additional game played by the player. In this analysis, the slope suggests that, on average, each additional game played by the player goes with an increase of approximately 98.12 attendees at the game.

Step 3: Σ((X - mean(X)) * (Y - mean(Y))): 133289.48 103420.04 95929.96
Step 4: Σ((X - mean(X))^2): 976.8 976.8 976.8
Step 5: Calculate the slope (b). Slope (b) = Σ((X - mean(X)) * (Y - mean(Y))) / Σ((X - mean(X))^2): 98.20839476 98.12
Step 6: Calculate the intercept (a). Intercept (a) = mean(Y) - (slope * mean(X)): 19015.064 19673.68

So, the equation of the regression line is: Y = 98.12X + 19673.68

Now we have the regression line equation. We can interpret the slope to determine if there's a significant relationship between games played and attendance.

Please check if this looks correct and please help me finish it. I need a visual aid.
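As a check on the arithmetic, the same formulas can be recomputed directly from the five (games played, attendance) pairs used in the steps. This is a minimal sketch in plain Python. The function name `fit_line` and the dictionary keys are chosen for this example, and the R-squared value is an extra not in the notes, added because the slope alone does not show whether the relationship is significant.

```python
# The five (games played, attendance per game) pairs used in the steps.
X = [137, 146, 159, 153, 119]
Y = [29374, 35564, 37194, 32343, 33654]


def fit_line(xs, ys):
    """Least-squares line y = a + b*x from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sxy": sxy,
        "sxx": sxx,
        "slope": slope,
        "intercept": intercept,
        "r_squared": sxy ** 2 / (sxx * syy),  # share of variance explained
    }


fit = fit_line(X, Y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Run on these five pairs, this gives a mean of Y of 33,625.8 (not 33,026.6), a sum of products of 74,911.8, a sum of squares of 976.8, a slope of about 76.69, and an intercept of about 22,674.32, so the line would be roughly Y = 76.69X + 22,674.32 rather than Y = 98.12X + 19,673.68. R-squared comes out near 0.16, which with only five points is weak evidence of a relationship.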
