AP Statistics SOLUTIONS

Set 1 (x1, y1)
a). Scatter plot
[Scatter plot of y1 against x1 with fitted line f(x) = 1.3x - 0.4]
b). Correlation coefficient: 0.99842114
c). The correlation coefficient is very close to 1, hence there is a strong positive linear correlation between x1 and y1.

Set 2 (x2, y2)
a). Scatter plot
[Scatter plot of y2 against x2 with fitted curve f(x) = x^2]
The data follow a second-order polynomial (quadratic) relationship perfectly. For the linear regression of the same data set we get:
[Scatter plot of y2 against x2 with the linear regression line]
b). Correlation coefficient: 0.963143
c). The correlation coefficient is close to 1, hence there is a strong positive correlation between x2 and y2, even though the underlying relationship is quadratic rather than linear.

Set 3 (x3, y3)
a). Scatter plot
[Scatter plot of y3 against x3 with fitted line f(x) = 0.75x + 4.77]
b). Correlation coefficient: 0.284715
c). The correlation coefficient is close to 0, hence there is only a very weak positive correlation between x3 and y3.

Set 4 (x4, y4)
a). Scatter plot
[Scatter plot of y4 against x4 with fitted line f(x) = -0.84x + 11.36]
b). Correlation coefficient: -0.3384
c). The correlation coefficient is negative and fairly close to 0, hence there is a weak negative correlation between x4 and y4.

AP Statistics 3.05 Super Bowl Ticket Prices

Directions: This assignment models the steps for performing exploratory data analysis. Complete the assignment. Clearly label each answer.

As popularity for the Super Bowl has increased, so have ticket prices. Let's take a look at the prices of Super Bowl tickets from 1985 to 2013 (data courtesy of http://www.krem.com/sports/football/Historical-Super-Bowl-ticket-prices-189247771.html).

Year    Ticket Price (in dollars)
2000    325
2001    325
2002    400
2003    450
2004    500
2005    550
2006    650
2007    650
2008    800
2009    750
2010    750
2011    900
2012    1000
2013    1000

1. Perform exploratory data analysis using year as the explanatory variable and ticket price as the response variable.
Using your graphing calculator, answer the following questions. During this exercise, you will need to graph a normal probability plot, a residual plot, and the least squares regression line.

a. Before we can consider using linear regression to model a data set, we need to check several conditions. The first: is the data quantitative? (1 point)
b. Graph the scatterplot of the data. (1 point)
c. The second condition we need to check before we can use linear regression is whether the data is roughly linear. Based on the scatterplot, is our data roughly linear? (1 point)
d. The third condition we need to check before we can use linear regression is that we do not have outliers that would impact our regression line. Are there any outliers that would strongly affect a regression line? (1 point)
e. Graph the scatterplot with the regression line. (1 point)
f. Provide the linear regression information from your calculator (including r and r^2). (1 point)
g. Write a statement regarding the correlation between the variables in our data set in the context of the problem. (3 points)
h. Write a statement interpreting the coefficient of determination relative to the data set in the context of the problem. (3 points)
i. Write the equation of the linear regression line. Define any variables used. (2 points)
j. Interpret the slope of the regression line in the context of the problem. (2 points)
k. Draw a graph of the normal probability plot. (1 point) Based on the normal probability plot, does the data appear normal? (3 points)
l. Draw a graph of the residual plot. (1 point)
m. Based on the residual plot, do you think a linear regression line is an appropriate model for this data? Why? (3 points)
n. What is the formula for calculating a residual? Calculate the residual for the following years: 2003, 2007, and 2010.
(4 points)

[Scatterplot of ticket price (in dollars) against year with fitted line f(x) = 54.5054945055x - 108718.846153848, R^2 = 0.9709752536]

a. Yes, the data are quantitative.
b. [Scatterplot of ticket price (in dollars) against year]
c. Yes, there is a strong positive linear relationship between year and ticket price.
d. There are no outliers in the data set. This is clearly evident from the scatterplot.
e. [Scatterplot of ticket price (in dollars) against year with regression line y = 54.505x - 108719, R^2 = 0.971]
f. r = 0.985381, r^2 = 0.971
g. There is a strong positive linear relationship between year and Super Bowl ticket price. As the year increases, the ticket price also tends to increase.
h. About 97.1% of the variation in Super Bowl ticket price is explained by the linear relationship with year.
i. y = 54.505x - 108719, where y = predicted ticket price (in dollars) and x = year.
j. For each additional year, the predicted ticket price increases by about $54.51.
k. [Normal probability plot: Y against sample percentile]
   The points in the normal probability plot fall close to a straight line with only a slight S-shaped bend, so the data appear approximately normal.
l. [Residual plot: residuals against X Variable 1 (year)]
m. Since the points are randomly scattered around zero in the residual plot, with no curved pattern, the variance of the error terms is roughly constant. Hence the assumption of homogeneity of error variance is satisfied, and a linear regression line is an appropriate model for this data.
n. Residual = actual value - predicted value

Year    Ticket Price (in dollars)    Predicted    Residual
2003    450                          455.6593     -5.65934
2007    650                          673.6813     -23.6813
2010    750                          837.1978     -87.1978
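
The regression values and residuals reported above can be checked without a graphing calculator. The following is a minimal sketch in plain Python, assuming the 2000 to 2013 ticket prices listed in the assignment; the helper names `least_squares` and `residual` are illustrative only and are not part of the original solution.

```python
from math import sqrt

# Data from the assignment table (year, ticket price in dollars).
years = list(range(2000, 2014))
prices = [325, 325, 400, 450, 500, 550, 650, 650,
          800, 750, 750, 900, 1000, 1000]


def least_squares(xs, ys):
    """Return (slope, intercept, r) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(years, prices)


def residual(year):
    """Residual = actual value - predicted value for one year in the data set."""
    actual = prices[years.index(year)]
    predicted = intercept + slope * year
    return actual - predicted


print(f"slope = {slope:.4f}, intercept = {intercept:.3f}")
print(f"r = {r:.6f}, r^2 = {r * r:.6f}")
for year in (2003, 2007, 2010):
    print(year, round(residual(year), 4))
```

The printed slope, intercept, r, and r^2 should agree with the values in parts e, f, and i, and the three residuals should agree with the table in part n, up to rounding.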