# Question: Statistician Frank J. Anscombe created a data set to illustrate

Statistician Frank J. Anscombe created a data set to illustrate the importance of doing more than just examining the standard regression output. These data are provided in the file P10_64.xlsx.

a. Regress Y1 on X. How well does the estimated equation fit the data? Is there evidence of a linear relationship between Y1 and X at the 5% significance level?

b. Regress Y2 on X. How well does the estimated equation fit the data? Is there evidence of a linear relationship between Y2 and X at the 5% significance level?

c. Regress Y3 on X. How well does the estimated equation fit the data? Is there evidence of a linear relationship between Y3 and X at the 5% significance level?

d. Regress Y4 on X4. How well does the estimated equation fit the data? Is there evidence of a linear relationship between Y4 and X4 at the 5% significance level?

e. Compare these four simple linear regression equations (1) in terms of goodness of fit and (2) in terms of overall statistical significance.

f. How do you explain these findings, considering that each of the regression equations is based on a different set of variables?

g. What role, if any, do outliers have on each of these estimated regression equations?
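Parts a through e can be sketched in a few lines of plain Python. This is a minimal sketch, not the textbook's solution: the file P10_64.xlsx is not reproduced here, so the values below are the published Anscombe quartet and are only assumed to match the file. The names `fit_line`, `DATASETS`, and `T_CRIT` are hypothetical. Instead of computing a p-value, the sketch compares the slope's t-statistic with 2.262, the two-sided 5% critical value of the t distribution with 9 degrees of freedom (n = 11 observations minus 2 estimated coefficients).

```python
import math

# Published Anscombe quartet values (ASSUMED to match P10_64.xlsx).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Hypothetical lookup: regression label -> (explanatory values, response values).
DATASETS = {
    "Y1 on X": (X, Y1),
    "Y2 on X": (X, Y2),
    "Y3 on X": (X, Y3),
    "Y4 on X4": (X4, Y4),
}

# Two-sided 5% critical value of the t distribution with 9 degrees of freedom.
T_CRIT = 2.262


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares, R-squared, and the slope's t-statistic.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_stat": t_stat,
        "significant_at_5pct": abs(t_stat) > T_CRIT,
    }


if __name__ == "__main__":
    for label, (x, y) in DATASETS.items():
        fit = fit_line(x, y)
        print(
            f"{label}: slope={fit['slope']:.3f}, "
            f"intercept={fit['intercept']:.3f}, "
            f"R^2={fit['r_squared']:.3f}, t={fit['t_stat']:.2f}, "
            f"significant={fit['significant_at_5pct']}"
        )
```

Under these assumed values, the four fits come out nearly identical in slope, intercept, R-squared, and significance, which is why parts f and g call for scatterplots and a check for outliers rather than relying on the regression output alone.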
