Statistics: The Art and Science of Learning from Data, 3rd Edition, Alan Agresti, Christine A. Franklin - Solutions
The table that follows shows the standardized residuals in parentheses for GSS data about the statement, Women should take care of running their homes and leave running the country up to men. The absolute value of the standardized residual is 13.2 in every cell. For
For the chi-squared distribution, the mean equals df and the standard deviation equals √(2df). a. Explain why, as a rough approximation, for a large df value, 95% of the chi-squared distribution falls within df ± 2√(2df). b. With df = 8, show that df ± 2√(2df) gives the interval (0, 16)
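As a quick numerical check of part b, the sketch below evaluates the interval of two standard deviations around the mean, using only the mean (df) and standard deviation (square root of 2df) stated in the exercise. The helper name `rough_chi2_interval` is hypothetical and introduced only for illustration.

```python
import math

def rough_chi2_interval(df):
    """Return (mean - 2 sd, mean + 2 sd) with mean = df and sd = sqrt(2 * df)."""
    sd = math.sqrt(2 * df)
    return df - 2 * sd, df + 2 * sd

# df = 8 is the value given in part b of the exercise.
low, high = rough_chi2_interval(8)
print(low, high)
```

With df = 8 the standard deviation is exactly 4, so the interval runs from 8 - 8 to 8 + 8.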
A pool of six candidates for three managerial positions includes three females and three males. Denote the three females by F1, F2, F3 and the three males by M1, M2, M3. The result of choosing three individuals for the managerial positions is (F2, M1, M3). a. Identify the 20 possible samples that
For testing independence, most software also reports another chi-squared statistic, called likelihood-ratio chi-squared. It equals G² = 2 ∑ [observed count × log (observed count / expected count)]. It has properties similar to those of the X² statistic, such as df = (r - 1) × (c - 1). a. Show that G² =
How large an X² test statistic value provides a P-value of 0.05 for testing independence for the following table dimensions? a. 2 × 2 b. 2 × 3 c. 2 × 5 d. 5 × 5 e. 3 × 9
For the 2 × 3 table on gender and happiness in Exercise 11.4 (shown again following), software tells us that X² = 0.46 and the P-value = 0.79. a. State the null and alternative hypothesis, in context, to which these results apply. b. Interpret the P-value.
The Car Weight and Mileage data file on the text CD shows the weight (in pounds) and mileage (miles per gallon) of 25 different model autos. a. Identify the natural response variable and explanatory variable. b. The regression of mileage on weight has MINITAB regression output. State the prediction
For the Georgia Student Survey file on the text CD, let y = exercise and x = watch TV (minutes per day). a. Construct a scatter-plot. Identify an outlier that could have an impact on the fit of the regression model. What would you expect its effect to be on the slope? b. Fit the model with and
The variables y = annual income (thousands of dollars), x1 = number of years of education, and x2 = number of years experience in job are measured for all the employees having city-funded jobs, in Knoxville, Tennessee. Suppose that the following regression equations and correlations apply: i) =
A study about the effect of the swing on putting in golf showed a very strong linear relationship between y = putting distance and the square of x = club’s impact velocity (r² is in the range 0.985 to 0.999). a. For the model μy = α + βx², explain why it is sensible to set α = 0. b. If the
Refer to the relationship r = (sx/sy)b between the slope and correlation, which is equivalently sxb = rsy. a. Explain why an increase in x of sx units relates to a change in the predicted value of y of sxb units. (For instance, if sx = 10, it corresponds to a change in ŷ of 10b.) b. Based on part
Suppose r² = 0.30. Since ∑(y - ȳ)² is used in estimating the overall variability of the y values and ∑(y - ŷ)² is used in estimating the residual variability at any fixed value of x, explain why approximately the estimated variance of the conditional distribution of y for a given x is 30%
The formula for the standard error of the sample slope b is se = s/√(∑(x - x̄)²), where s is the residual standard deviation of y. a. Show that the smaller the value of s, the more precisely b estimates β. b. Explain why a small s occurs when the data points show little variability about the
An alternative to the regression formula μy = α + βx expresses each y value, rather than the mean of the y values, in terms of x. This approach models an observation on y as y = mean + error = α + βx + ɛ, where the mean μy = α + βx and the error = ɛ. The error term denoted by ɛ (the Greek
You invest $1000 at 6% compound interest a year. How long does it take until your investment is worth $2000? a. Based on what you know about exponential regression, explain why the answer is the value of x for which 1000(1.06)^x = 2000. b. Using the property of logarithms that log(a^x) = x log(a),
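The logarithm step named in part b can be sketched numerically: dividing both sides of the growth equation by 1000 and taking logs gives x = log(2) / log(1.06). The variable names below are illustrative choices, not from the textbook.

```python
import math

principal = 1000.0   # initial investment from the exercise
rate = 0.06          # 6% compound interest per year
target = 2000.0      # value the investment should reach

# Solve principal * (1 + rate)**x = target using log(a**x) = x * log(a).
years = math.log(target / principal) / math.log(1 + rate)
print(round(years, 1))
```

Substituting the result back into the growth formula recovers the $2000 target, which is a useful sanity check on the algebra.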
Suppose we want to describe whether Internet use is more strongly associated with GDP or with unemployment rate. a. Can we compare the slopes when GDP and unemployment rate each predict Internet use in separate regression equations? Why or why not? b. According to the correlation matrix in Table
Although the slope does not measure association, it is useful for comparing effects for two variables that have the same units. For the Internet Use data file of 33 nations on the text CD, let x = GDP (thousands of dollars per capita). For predicting y = percentage Internet penetration (the
Sketch a scatter-plot, identifying quadrants relative to the sample means as in Figure 12.5, for which (a) The slope and correlation would be negative and (b) The slope and correlation would be approximately zero.
Is there a relationship between x = how many sit-ups you can do and y = how fast you can run 40 yards (in seconds)? The MINITAB output of a regression analysis for the female athlete strength study is shown here. a. Find the predicted time in the 40-yard dash for a subject who can do (i) 10 sit-ups
For the high school female athlete strength study, the output shows a correlation matrix for height, weight, body mass index (BMI), and percentage of body fat (BF%). a. Which pair of variables has the (i) Strongest association and (ii) Weakest association. b. Interpret the sign and the strength of
For the Male Athlete Strength data file on the text CD, the output shows correlations for height, weight, and percentage body fat (BF%). a. Compare the correlations for females (Exercise 12.16) to males for weight and height, weight and body fat percentage, and height and body fat percentage. b.
All students who attend Lake Wobegon College must take the math and verbal SAT exams. Both exams have a mean of 500 and a standard deviation of 100. The regression equation relating y = math SAT score and x = verbal SAT score is ŷ = 250 + 0.5x. a. Find the predicted math SAT score for a student who
Refer to the previous exercise. a. Predict the math SAT score for a student who has a verbal SAT = 800. b. The correlation is 0.5. Interpret the prediction in part a in terms of regression toward the mean. Previous exercise: All students who attend Lake Wobegon College must take the math and verbal SAT
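A minimal sketch of the Lake Wobegon prediction, using the regression equation (250 + 0.5x) and the common mean of 500 and standard deviation of 100 stated in the preceding exercise. The function name `predict_math` and the z-score variables are hypothetical names used only here.

```python
MEAN, SD = 500, 100  # both SAT exams, as stated in the exercise

def predict_math(verbal):
    """Predicted math SAT from the stated equation: 250 + 0.5 * verbal."""
    return 250 + 0.5 * verbal

pred = predict_math(800)
z_verbal = (800 - MEAN) / SD   # standard deviations above the mean on verbal
z_math = (pred - MEAN) / SD    # standard deviations above the mean on predicted math
print(pred, z_verbal, z_math)
```

Comparing the two z-scores shows regression toward the mean directly: with a correlation of 0.5, the predicted math score sits half as many standard deviations above the mean as the verbal score does.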
Refer to the previous exercise. a. Find the predicted mileage for the Toyota Corolla, which weighs 2590 pounds. b. Find the residual for the Toyota Corolla, which has observed mileage of 38. c. Sketch a graphical representation of the residual in part b.
For the Georgia Student Survey data file on the text CD, the table shows the correlation matrix for college GPA, high school GPA, and daily time spent watching TV. a. Interpret r and r² between time spent watching TV and college GPA. b. One student is 2 standard deviations above the mean on high
Refer to the association you investigated in Exercise 12.7 between study time and college GPA. Using software or a calculator with the data file you constructed for that exercise, a. Find and interpret the correlation. b. Find and interpret r².
Refer to the association you investigated in Exercise 12.8 between skipping class and college GPA. Using software or a calculator with the data file you constructed for that exercise, a. Find the mean and standard deviation of each variable. b. Report the slope of the prediction equation and the
A clinical trial admits subjects suffering from high cholesterol, who are then randomly assigned to take a drug or a placebo for a 12-week study. For the population, without taking any drug, the correlation between the cholesterol readings at times 12 weeks apart is 0.70. The mean cholesterol
For a class of 100 students, the teacher takes the 10 students who performed poorest on the midterm exam and enrolls them in a special tutoring program. Both the midterm and final have a class mean of 70 with standard deviation 10, and the correlation is 0.50 between the two exam scores. The mean
The Car Weight and Mileage data file on the text CD shows the weight and the mileage per gallon of gas of 25 cars of various models. The regression of mileage on weight has r² = 0.75. Explain how to interpret this in terms of how well you can predict a car’s mileage if you know its weight.
The owner of Bertha’s Restaurant is interested in whether an association exists between the amount spent on food and the amount spent on drinks for the restaurant’s customers. She decides to measure each variable for every customer in the next month. Each day she also summarizes the mean amount
For which student body do you think the correlation between high school GPA and college GPA would be higher: Yale University or the University of Connecticut? Explain why.
For the Male Athlete Strength data file on the text CD, the prediction equation relating y = 1 repetition maximum bench press (1RMBP) in kilograms to x = repetitions to fatigue bench press (RTFBP) is ŷ = 117.5 + 5.86x. a. Find the predicted 1RMBP for a male athlete with a RTFBP of 35, which was one
Use software to analyze the U.S. Statewide Crime data file on the text CD on y = violent crime rate and x = percentage of single parent families. a. Construct a scatter-plot. What does it show? b. One point is quite far removed from the others, having a much higher value on both variables than the
Refer to the High School Female Athlete and Male Athlete Strength data files on the text CD. a. Find the correlation between number of 60-pound bench presses before fatigue and bench press maximum for females and between bench presses before fatigue and bench press maximum for males. Interpret. b.
A regression analysis is conducted with 25 observations. a. What is the df value for inference about the slope β? b. Which two t test statistic values would give a P-value of 0.05 for testing H0: β = 0 against Ha: β ≠ 0? c. Which t-score would you multiply the standard error by in order to
For the House Selling Prices FL data file on the text CD, MINITAB results of a regression analysis are shown for 100 homes relating y = selling price (in dollars) to x = the size of the house (in square feet). a. Show all steps of a two-sided significance test of independence. Could the sample
Refer to the previous exercise. Of the 100 homes, 25 were in a part of town considered less desirable. For a regression analysis using y = selling price and x = size of house for these 25 homes, a. You plan to test H0: β = 0 against Ha: β > 0. Explain what H0 means, and explain why a data analyst
The high school female athlete strength study also considered prediction of y = maximum leg press (LP) using x = number of 200-pound leg presses (LP_200). MINITAB results of a regression analysis are shown. a. Show all steps of a two-sided significance test of the hypothesis of independence. b. Find
A study of 375 women who lived in pre-industrial Finland (by S. Helle et al., Science, vol. 296, p. 1085, 2002) using Finnish church records from 1640 to 1870 found that there was roughly a linear relationship between y = life length (in years) and x = number of sons the woman had, with a slope
Repeat the previous exercise using x = number of daughters the woman had, for which the slope estimate was 0.44 (se = 0.29). Previous exercise a. Interpret the sign of the slope. Is the effect of having more boys good, or bad? b. Show all steps of the test of the hypothesis that life length is
Refer to the previous two exercises. Using significance level 0.05, what decision would you make? Explain how that decision is in agreement with whether 0 falls in the confidence interval. Do this for the data for both the boys and the girls.
Each month, the owner of Café Gardens restaurant records y = monthly total sales receipts and x = amount spent that month on advertising, both in thousands of dollars. For the first four months of operation, the observations are as shown in the table. The correlation equals 0.857. Advertising
Suppose the regression line μy = -10,000 + 1000x models the relationship for the population of working adults in Canada between x = age and the mean of y = annual income (in Canadian dollars). The conditional distribution of y at each value of x is modeled as normal, with σ = 5000. Use this
Refer to the association you investigated in Exercises 12.7 and 12.21 between study time and college GPA. Using software with the data file you constructed, conduct a significance test of the hypothesis of independence, for the one-sided alternative of a positive population slope. Report the
Refer to the association you investigated in Exercises 12.8 and 12.22 between skipping class and college GPA. Using software with the data file you constructed, construct a 90% confidence interval for the slope in the population. Interpret.
Refer to the Georgia Student Survey data file on the text CD. Treat college GPA as the response variable and high school GPA as the explanatory variable, and suppose these students are a random sample of all University of Georgia students. a. Can you conclude that these variables are associated in
The MINITAB output shows the large standardized residuals for the female athlete strength study. Large standardized residuals for strength study: a. Explain how to interpret all the entries in the row of the output for athlete 10. b. Out of 57 observations, is it surprising that 3 observations would
For the Georgia Student Survey file on the text CD, let y = exercise and x = watch TV. One student reported watching TV an average of 180 minutes a day and exercising 60 minutes a day. This person’s residual was 48.8 and standardized residual was 6.41. a. Interpret the residual, and use it to find
The figure is a histogram of the standardized residuals for the regression of maximum bench press on number of 60-pound bench presses, for the high school female athletes. a. Which distribution does this figure provide information about? b. What would you conclude based on this figure?
The House Selling Prices FL data file on the text CD has several predictors of house selling prices. The table here shows the ANOVA table for a regression analysis of y = the selling price (in thousands of dollars) and x = the size of house (in thousands of square feet). The prediction equation is
For a random sample of children from a school district in South Carolina, a regression analysis is conducted of y = amount spent on clothes in the past year (dollars) and x = year in school. MINITAB reports the tabulated results for observations at x = 12. a. Interpret the value listed under
Using the context of the previous exercise, explain the difference between the purpose of a 95% prediction interval (PI) for an observation and a 95% confidence interval (CI) for the mean of y at a given value of x. Why would you expect the PI to be wider than the CI? Previous exercise: For a random
Exercise 12.35 referred to an analysis of leg strength for 57 female athletes, with y = maximum leg press and x = number of 200-pound leg presses until fatigue, for which ŷ = 233.89 + 5.27x. The table shows ANOVA results from SPSS for the regression analysis. a. Show that the residual standard
For a population regression equation, why is it more sensible to write μy = α + βx instead of y = α + βx? Explain with reference to the variables x = height and y = weight for the population of girls in elementary schools in your hometown.
Refer to the previous exercise. MINITAB reports the tabulated results for observations at x = 25. a. Show how MINITAB got the “Fit” of 365.66. b. Using the predicted value and se value, explain how MINITAB got the interval listed under “95% CI.” Interpret this interval. c. Interpret
Refer to the previous two exercises. a. In the ANOVA table, show how the Total SS breaks into two parts, and explain what each part represents. b. From the ANOVA table, explain why the overall sample standard deviation of y values is sy = √(192787/56) = 58.7. Explain the difference between
For prediction intervals, an important inference assumption is a constant residual standard deviation of y values at different x values. In practice, the residual standard deviation often tends to be larger when μy is larger. a. Sketch a hypothetical scatter-plot for which this happens, using
For a random sample of U.S. counties, the ANOVA table shown refers to hypothetical data on x = percentage of the population aged over 50 and y = per capita expenditure (dollars) on education. a. Fill in the blanks in the table. b. For what hypotheses can the F test statistic be used?
Refer to the Georgia Student Survey data file on the text CD. Regress y = college GPA on x = high school GPA. a. Stating the necessary assumptions, find a 95% confidence interval for the mean college GPA for all University of Georgia students who have high school GPA = 3.6. b. Find a 95% prediction
Report the ANOVA table for the previous exercise. a. Show how the Total SS breaks into two parts, and explain what each part represents. b. Find the estimated residual standard deviation of y. Interpret it. c. Find the sample standard deviation sy of y values. Explain the difference between the
You invest $100 in a savings account with interest compounded annually at 10%. a. How much money does the account have after one year? b. How much money does the account have after five years? c. How much money does the account have after x years? d. How many years does it take until your savings
You want your savings to double in a decade. a. Explain why 7.2% interest a year would do this. b. You might think that 10% interest a year would give 100% interest (that is, double your savings) over a decade. Explain why interest of 10% a year would actually cause your savings to multiply by 2.59
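The two growth factors mentioned in this exercise can be checked by compounding each annual rate over ten years. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Ten years of annual compounding at each rate named in the exercise.
growth_at_7_2 = 1.072 ** 10   # 7.2% a year
growth_at_10 = 1.10 ** 10     # 10% a year

print(round(growth_at_7_2, 3), round(growth_at_10, 2))
```

The first factor lands very close to 2 (a doubling), while the second is noticeably larger because each year's interest itself earns interest in later years.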
The table shows the approximate U.S. population size (in millions) at 10-year intervals beginning in 1900. Let x denote the number of decades since 1900. That is, 1900 is x = 0, 1910 is x = 1, and so forth. The exponential regression model fitted to y = population size and x gives ŷ = 81.14
Refer to the previous exercise, for which predicted population growth was 14.18% per decade. Suppose the growth rate is now 15% per decade. Explain why the population size will (a) Double after five decades, (b) Quadruple after 100 years (10 decades), and (c) Be 16 times its original size after 200
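A short sketch of the compounding behind parts (a) to (c), assuming growth of 15% per decade as stated. Part (c) is truncated in the text; the code reads it as 200 years, that is, 20 decades, which is an assumption. The helper `growth_factor` is a hypothetical name.

```python
def growth_factor(decades, rate=0.15):
    """Multiplicative growth after the given number of decades at a fixed per-decade rate."""
    return (1 + rate) ** decades

for decades in (5, 10, 20):  # 20 decades is an assumed reading of the truncated part (c)
    print(decades, round(growth_factor(decades), 2))
```

The factors come out only approximately equal to 2, 4, and 16, which is why the exercise describes the doubling loosely rather than exactly.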
Let y = number of parties attended in the past month and x = number of dates in the past month, measured for all single students at your school. Explain the mean and variability aspects of the regression model μy = α + βx, in the context of these variables. In your answer, explain why (a) It is
Let x denote a person’s age and let y be the death rate, measured as the number of deaths per thousand individuals of a fixed age within a period of a year. For men in the United States, these variables follow approximately the equation ŷ = 0.32(1.078)^x. a. Interpret 0.32 and 1.078 in this
Ecologists believe that organic material decays over time according to an exponential decay model. This is the case 0 a. Construct a scatter-plot. Why is a straight-line model inappropriate? b. Show that the ordinary regression model gives the fit ŷ = 54.98 - 3.59x. Find the predicted weight after x
Refer to the previous exercise. a. The correlation equals -0.890 between x and y, and -0.997 between x and log(y). What does this tell you about which model is more appropriate? b. The half-life is the time for the weight remaining to be one-half of the original weight. Use the equation =
Let y = number of parties attended in the past month and x = number of sports events watched in the past month, measured for all students at your school. Explain the mean and variability about the mean aspects of the regression model μy = α + βx, in the context of these variables. In your
A report summarizing scores for students on a verbal aptitude test x and a mathematics aptitude test y states that x̄ = 480, ȳ = 500, sx = 80, sy = 120, and r = 0.60. a. Find the slope of the regression line, based on its connection with the correlation. b. Find the y-intercept of the regression line,
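The connection between slope and correlation used in part a is b = r(sy/sx), and the least-squares line passes through the point of means, so the intercept is the mean of y minus b times the mean of x. The sketch below applies both, reading the two unlabeled means in the report as 480 for x and 500 for y; that reading is an assumption.

```python
x_bar, y_bar = 480, 500   # assumed: mean of x (verbal) and mean of y (math)
s_x, s_y = 80, 120        # standard deviations from the report
r = 0.60                  # correlation from the report

slope = r * s_y / s_x                 # b = r * (sy / sx)
intercept = y_bar - slope * x_bar     # line passes through (x_bar, y_bar)
print(round(slope, 2), round(intercept, 2))
```

Plugging the mean of x back into the fitted line returns the mean of y, which is a quick way to confirm the intercept.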
Do very short parents tend to have children who are even shorter, or short but not as short as they are? Explain, identifying the response and explanatory variables and the role of regression toward the mean.
The FL Crime data file on the text CD contains data for all counties in Florida on y = median annual income (thousands of dollars) for residents of the county and x = percent of residents with at least a high school education. The table shows some summary statistics and results of a regression
For the House Selling Prices FL data set on the text CD, when we regress y = selling price (in dollars) on x = number of bedrooms, we get the results shown in the printout. a. One home with three bedrooms sold for $338,000. Find the residual, and interpret. b. The home in part a had a standardized
Refer to the previous exercise. a. Explain what the regression parameter β means in this context. b. Construct and interpret a 95% confidence interval for β. c. Use the result of part b to form a 95% confidence interval for the difference in the mean selling prices for homes
Refer to the previous two exercises. a. Explain the difference between the residual standard deviation of 52,771.5 and the standard deviation of 56,357 reported for the selling prices. b. Since they’re not much different, explain why this means that number of bedrooms is not strongly
Exercise 3.39 in Chapter 3 showed data collected at the end of an introductory statistics course to investigate the relationship between x = study time per week (average number of hours) and y = college GPA. The table here shows the data for the eight males in the class on these variables and on
For the Georgia Student Survey file on the text CD, let y = exercise and x = college GPA. a. Construct a scatter-plot. Identify an outlier that could influence the regression line. What would you expect its effect to be on the slope and the correlation? b. Fit the model. Find the standardized
For the study of high school female athletes, when we use x = maximum bench press (BP) to predict y = maximum leg press (LP), we get the results that follow. The sample mean of BP was 80. a. Interpret the confidence interval listed under 95% CI. b. Interpret the interval
The analysis in the previous exercise has the ANOVA table shown. a. For those female athletes who had maximum bench press equal to the sample mean of 80 pounds, what is the estimated standard deviation of their maximum leg press values? b. Assuming that maximum leg press has a normal distribution,
You invest $1000 in an account having interest such that your principal doubles every 10 years. a. How much money would you have after 50 years? b. If you were still alive in 100 years, show that you’d be a millionaire. c. Give the equation that relates y = principal to x = number of decades for
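Since the principal doubles every decade, one natural form for the relation in part c is y = 1000 × 2^x with x in decades. The sketch below evaluates it for parts a and b; the function name `principal_after` is hypothetical.

```python
def principal_after(decades, start=1000):
    """Principal after the given number of decades when it doubles every decade."""
    return start * 2 ** decades

after_50_years = principal_after(5)     # 50 years = 5 decades
after_100_years = principal_after(10)   # 100 years = 10 decades
print(after_50_years, after_100_years)
```

Ten doublings multiply the starting amount by 1024, which is what pushes the balance past one million dollars.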
The population size of Florida (in thousands) since 1830 has followed approximately the exponential regression ŷ = 46(1.036)^x. Here, x = year - 1830 (so, x = 0 for 1830 and x = 170 for the year 2000). a. What has been the approximate rate of growth per year? b. Find the predicted population size
The table shows the world population size (in billions) since 1900. a. Let x denote the number of years since 1900. The exponential regression model fitted to y = population size and x gives ŷ = 1.424 × 1.014^x. Show that the predicted population sizes are 1.42 billion in 1900 and 6.57
Match each of the following scatter-plots to the description of its regression and correlation. The plots are the same except for a single point. Justify your answer for each scatter-plot. a. r = -0.46, ŷ = 142 - 15.6x b. r = -0.86, ŷ = 182 - 25.8x c. r = -0.74, ŷ = 165 - 21.8x
The Softball data file on the text CD contains the records of a University of Georgia coed intramural softball team for 277 games over a 20-year period. (The players changed, but the team continued.) The variables include, for each game, the team’s number of runs scored (RUNS), number of hits
Refer to the previous exercise. Conduct a regression analysis of y = RUNS and x = HIT. Does a straight-line regression model seem appropriate? Prepare a report a. Using graphical ways of portraying the individual variables and their relationship. b. Interpreting descriptive statistics for the
Using software with the FL Student Survey data file on the text CD, conduct regression analyses relating y = high school GPA and x = hours of TV watching. Prepare a two-page report, showing descriptive and inferential methods for analyzing the relationship.
Refer to the previous exercise. Now let x = number of classes skipped and y = college GPA. a. Construct a scatter-plot. Does the association seem to be positive or negative? b. Find the prediction equation and interpret the y-intercept and slope. c. Find the predicted GPA and residual for Student 1.
For the High School Female Athletes data set on the text CD, conduct a regression analysis using the time for the 40-yard dash as the response variable and weight as the explanatory variable. Prepare a two-page report, indicating why you conducted each analysis and interpreting the results.
For a football game in the National Football League, let y = difference between number of points scored by the home team and the away team (so, y > 0 if the home team wins). Let x be the predicted difference according to the Las Vegas betting spread. For the 768 NFL games played between 2003 and
A study by the Readership Institute at Northwestern University used survey data to analyze how newspaper reader behavior was influenced by the Iraq war. The response variable was a Reader Behavior Score (RBS), a combined measure summarizing newspaper use frequency, time spent with the newspaper,
One of your relatives is a big sports fan but has never taken a statistics course. Explain how you could describe the concept of regression toward the mean in terms of a sports application, without using technical jargon.
Does regression toward the mean imply that, over many generations, there are fewer and fewer very short people and very tall people? Explain your reasoning.
Suppose the correlation between height and weight is 0.50 for a sample of males in elementary school, and also 0.50 for a sample of males in middle school. If we combine the samples, explain why the correlation will probably be larger than 0.50.
Annual income, in dollars, was the response variable in a regression analysis. For a British version of a written report about the analysis, all responses were converted to British pounds sterling (£1 equaled $2.00, when this was done). a. How, if at all, does the slope of the prediction equation
The statistician George Box, who had an illustrious academic career at the University of Wisconsin, is often quoted as saying, “All models are wrong, but some models are useful.” Why do you think that, in practice, a. All models are wrong? b. Some models are not useful?
In regression modeling, for t tests about regression parameters, df = n - number of parameters in equation for the mean. a. Explain why df = n - 2 for the model μy = α + βx. b. Chapter 8 discussed how to estimate a single mean μ. Treating this as the parameter in a simpler regression model, μy =
For the Georgia Student Survey data file on the text CD, look at college GPA and high school GPA. a. Identify the response and explanatory variables and construct a scatter-plot. What is the effect on this plot of several students having high school GPAs of exactly 4.0? b. Find the sample
What assumptions are needed to use the regression equation μy = α + βx (a) to describe the relationship between two variables and (b) to make inferences about the relationship? In case (b), which assumption is least critical?
Refer to the previous exercise. In view of these assumptions, indicate why such a model would or would not be good in the following situations: a. x = year (from 1900 to 2005), y = percentage unemployed workers in the United States. b. x = age of subject, y = subject’s annual medical expenses. c.