Statistics The Art And Science Of Learning From Data 3rd Edition Alan Agresti, Christine A. Franklin - Solutions
Explain carefully the interpretations of the standard deviations (a) sy, (b) sx, (c) Residual standard deviation s, and (d) se of slope estimate b.
A Freddie Mac quarterly statement (May 2010) reported that U.S. home sales for one of the central regions (including Illinois, Indiana, Ohio, and Wisconsin) have shown that home values decreased by 3.4% in the previous year. What if someone interprets this information by saying, “The
Exercise 12.58 about U.S. population growth showed a predicted growth rate of 13% per decade. a. Show that this is equivalent to a 1.26% predicted growth per year. b. Explain why the predicted U.S. population size (in millions) x years after 1900 is 81.137(1.0126)^x
For a study of University of Georgia female athletes, the prediction equation relating y = total body weight (in pounds) to x1 = height (in inches) and x2 = percent body fat is ŷ = -121 + 3.50x1 + 1.35x2. a. Find the predicted total body weight for a female athlete at the mean values of 66 and 18
Using software with the House Selling Prices OR data file on the text CD, analyze y = selling price, x1 = house size, and x2 = lot size. a. Construct box plots for each variable and a scatterplot matrix or scatter plots between y and each of x1 and x2. Interpret. b. Find the multiple regression
Keeneland Racetrack in Lexington, Kentucky, has been a social gathering place since 1935. Every spring and fall thousands of people come to the racetrack to socialize, gamble, and enjoy the horse races that have become so popular in Kentucky. A study investigated the different factors that affect
Let's use multiple regression to predict total body weight (in pounds) using data from a study of University of Georgia female athletes. Possible predictors are HGT = height (in inches), % BF = percent body fat, and age. The display shows correlations among these explanatory variables. a.
For countries listed in the Twelve Countries data file on the text CD, y = Internet use (percent) is predicted by x1 = per capita GDP (gross domestic product, in thousands of dollars) with r2 = 0.88. Adding x2 = carbon dioxide emissions per capita to the model yields the results in the following
For the Softball data set on the text CD, for each game the variables are a team’s number of runs scored (RUNS), number of hits (HIT), number of errors (ERR), and the difference (DIFF) between the number of runs scored by that team and by the other team, which is the response variable. MINITAB
In Example 2 on y = house selling price, x1 = house size, and x2 = number of bedrooms, ŷ = 60,102 + 63.0x1 + 15,170x2, and R = 0.72. a. Interpret the value of the multiple correlation. b. Suppose house selling prices are changed from dollars to thousands of dollars. Explain why if each house price
Using software with the Georgia Student Survey data file from the text CD, find and interpret the multiple correlation and R2 for the relationship between y = college GPA, x1 = high school GPA, and x2 = study time.
For the 59 observations in the Georgia Student Survey data file on the text CD, the result of regressing college GPA on high school GPA and study time follows. (Table: College GPA, high school GPA, and study time.) a. Explain in nontechnical terms what it means if the population slope coefficient for high
For the Georgia Student Survey file on the text CD, the prediction equation relating y = college GPA to x1 = high school GPA and x2 = study time (hours per day) is ŷ = 1.13 + 0.643x1 + 0.0078x2. a. Find the predicted college GPA of a student who has a high school GPA of 3.5 and who studies three
Refer to the previous exercise. a. Report and interpret the P-value for testing the hypothesis that the population slope coefficient for study time equals 0. b. Find a 95% confidence interval for the true slope for study time. Explain how the result is in accord with the result of the test in part
Refer to the previous two exercises. a. Report the residual standard deviation. What does this describe? b. Interpret the residual standard deviation by predicting where approximately 95% of the Georgia college GPAs fall when high school GPA = 3.80 and study time = 5.0 hours per day, which are the
Chapter 12 analyzed strength data for 57 female high school athletes. Upper body strength was summarized by the maximum number of pounds the athlete could bench press (denoted BP below, 1RM Bench in file). This was predicted well by the number of times she could do a 60-pound bench press (denoted
The P-value of 0.17 in part a of the previous exercise suggests that LP_200 plausibly had no effect on BP, once BP_60 is in the model. Yet when LP_200 is the sole predictor of BP, the correlation is 0.58 and the significance test for its effect has a P-value of 0.000, suggesting very strong
Refer to the previous two exercises. The sample standard deviation of BP was 13.3. The residual standard deviation of BP when BP_60 and LP_200 are predictors in a multiple regression model is 7.9. a. Explain the difference between the interpretations of these two standard deviations. b. If the
Refer to the previous three exercises. a. State and interpret the null hypothesis tested with the F statistic in the ANOVA table given in Exercise 13.22. b. From the F table (Table D), which F statistic value would have a P-value of 0.05 for these data? c. Report the observed F test statistic and
Aunt Erma’s Pizza restaurant keeps monthly records of total revenue, amount spent on TV advertising, and amount spent on newspaper advertising. a. Specify notation and formulate a multiple regression equation for predicting the monthly revenue. Explain how to interpret the parameters in the
A study in Alachua County, Florida, investigated an index of mental health impairment, which had ȳ = 27.3 and s = 5.5. Two explanatory variables were x1 = life events score (mean = 44.4, s = 22.6) and x2 = SES (socioeconomic status, mean = 56.6, s = 25.3). Life events is a composite measure of the
Refer to the previous exercise. a. Report the test statistic and P-value for testing H0: β1 = β2 = 0. b. State the alternative hypothesis that is supported by the result in part a. c. Does the result in part a imply that necessarily both life events and SES are needed in the model? Explain.
The MINITAB results are shown for predicting selling price using x1 = size of home, x2 = number of bedrooms, and x3 = age. (Table: Regression of selling price on house size, number of bedrooms, and age.) a. State the null hypothesis for an F test, in the context of these variables. b. The F statistic equals
For all students at Walden University, the prediction equation for y = college GPA (range 0–4.0) and x1 = high school GPA (range 0–4.0) and x2 = college board score (range 200–800) is ŷ = 0.20 + 0.50x1 + 0.002x2. a. Find the predicted college GPA for students having (i) high school GPA = 4.0
Use software to do further analyses with the multiple regression model of y = selling price of home in thousands, x1 = size of home, and x2 = number of bedrooms, considered in Section 13.1. The data file House Selling Prices OR is on the text CD. a. Report the F statistic and state the hypotheses
In Chapter 12, we analyzed strength data for a sample of female high school athletes. The following figure is a residual plot for the multiple regression model relating the maximum number of pounds the athlete could bench press (BP) to the number of 60-pound bench presses (BP_60) and the number of
Suppose you fit a straight-line regression model to y = amount of time sleeping per day and x = age of subject. Values of y in the sample tend to be quite large for young children and for elderly people, and they tend to be lower for other people. Sketch what you would expect to observe for (a) The
Suppose you fit a straight-line regression model to x = age of subjects and y = driving accident rate. Sketch what you would expect to observe for (a) The scatterplot of x and y and (b) A plot of the residuals against the values of age.
The College Athletes data set on the text CD comes from a study of University of Georgia female athletes. The response variable BP = maximum bench press (1RM in data set) has explanatory variables LBM = lean body mass (which is weight times 1 minus the proportion of body fat) and REP_BP = number of
Use software with the House Selling Prices OR data file on the text CD to do residual analyses with the multiple regression model for y = house selling price (in thousands), x1 = lot size, and x2 = number of bathrooms. a. Find a histogram of the standardized residuals. What assumption does this
In the previous exercise, suppose house selling price tends to increase with a straight-line trend for small to medium size lots, but then levels off as lot size gets large, for a fixed value of number of bathrooms. Sketch the pattern you’d expect to get if you plotted the residuals against lot
Refer to the previous exercise. a. Explain why setting x2 at a variety of values yields a collection of parallel lines relating ŷ to x1. What is the value of the slope for those parallel lines? b. Since the slope 0.50 for x1 is larger than the slope 0.002 for x2, does this imply that x1 has a
Refer to Example 11 on winning Olympic high jumps. The prediction equation relating y = winning height (in meters) as a function of x1 = number of years since 1928 and x2 = gender (1 = male, 0 = female) is ŷ = 1.63 + 0.0057x1 + 0.348x2. a. Using this equation, find the prediction equations relating
The Mountain Bike data file on the text CD shows selling prices for mountain bikes. When y = mountain bike price ($) is regressed on x1 = weight of bike (lbs) and x2 = the type of suspension (0 = full, 1 = front end), ŷ = 2741.62 - 53.752x1 - 643.595x2. a. Interpret the estimated effect of the
For the House Selling Prices OR data set, when we regress y = selling price (in thousands) on x1 = house size and x2 = condition (1 = Good, 0 = Not Good), we get the results shown. (Table: Regression of selling price of house in thousands versus house size and condition.) a. Report the regression equation.
The table shows data from 27 automotive plants on y = number of assembly defects per 100 cars and x = time (in hours) to assemble each vehicle. The data are in the Quality and Productivity file on the text CD. (Table: Number of defects in assembling 100 cars and time to assemble each vehicle.) Source: Data
A chain restaurant that specializes in selling hamburgers wants to analyze how y = sales for a customer (the total amount spent by a customer on food and drinks, in dollars) depends on the location of the restaurant, which is classified as inner city, suburbia, or at an interstate exit. a. Construct
Use the House Selling Prices OR data file on the text CD to regress selling price in thousands on house size and whether the house has a garage. a. Report the prediction equation. Find and interpret the equations predicting selling price using house size, for homes with and without a garage. b.
Refer to the previous exercise. a. Explain what the no interaction assumption means for this model. b. Sketch a hypothetical scatter diagram, showing points identified by garage or no garage, suggesting that there is actually a substantial degree of interaction.
Refer to Example 11 and Exercise 13.40, with ŷ = predicted winning high jump and x1 = number of years since 1928. When equations are fitted separately for males and for females, we get ŷ = 1.98 + 0.0055x1 for males and ŷ = 1.60 + 0.0065x1 for females. a. In allowing the lines to have different slopes,
You own a gift shop that has a campus location and a shopping mall location. You want to compare the regressions of y = daily total sales on x = number of people who enter the shop, for total sales listed by day at the campus location and at the mall location. Explain how you can do this using
Example 12 used logistic regression to estimate the probability of having a travel credit card when x = annual income (in thousands of euros). At the mean income of €25,000, show that the estimated probability of having a travel credit card equals 0.29.
The FL Crime data file on the text CD has data for the 67 counties in Florida on y = crime rate: annual number of crimes in county per 1000 population; x1 = education: percentage of adults in county with at least a high school education; x2 = urbanization: percentage in county living in an urban
Baseball's highest honor is election to the Hall of Fame. The history of the election process, however, has been filled with controversy and accusations of favoritism. Most recently, there is also the discussion about players who used performance enhancement drugs. The Hall of Fame has failed
A study of horseshoe crabs by zoologist Dr. Jane Brockmann at the University of Florida used logistic regression to predict the probability that a female crab had a male partner nesting nearby. One explanatory variable was x = weight of the female crab (in kilograms). The results were (output not reproduced). The quartiles
Refer to the previous exercise. For what weight values do you estimate that a female crab has probability (a) 0.50, (b) greater than 0.50, and (c) less than 0.50, of having a male partner nesting nearby?
A logistic regression model describes how the probability of voting for the Republican candidate in a presidential election depends on x, the voter's total family income (in thousands of dollars) in the previous year. The prediction equation for a particular sample is (equation not reproduced). Find the estimated
Refer to the previous exercise. a. At which income level is the estimated probability of voting for the Republican candidate equal to 0.50? b. Over what region of income values is the estimated probability of voting for the Republican candidate (i) greater than 0.50 and (ii) less than 0.50? c. At
Refer to the previous two exercises. When the explanatory variables are x1 = family income, x2 = number of years of education, and x3 = gender (1 = male, 0 = female), suppose a logistic regression reports (output not reproduced). For this sample, x1 ranges from 6 to 157 with a standard deviation of 25, and x2 ranges from
The U.S. Census Bureau lists college graduation numbers by race and gender. The table shows the data for graduating 25-year-olds. Source: J. J. McArdle and F. Hamagami, J. Amer. Statist. Assoc., vol. 89 (1994), pp. 1107–1123. Data from U.S. Census Bureau, American Community Survey
The three-dimensional contingency table shown is from a study of the effects of racial characteristics on whether or not individuals convicted of homicide receive the death penalty. The subjects classified were defendants in indictments involving cases with multiple murders in Florida between 1976
Refer to the previous exercise. a. Based on the prediction equation, when the defendant is black and the victims were white, show that the estimated death penalty probability is 0.233. b. The model-estimated probabilities are 0.011 when the defendant is white and victims were black, 0.113 when the
This chapter has considered many aspects of regression analysis. Let’s consider several of them at once by using software with the House Selling Prices OR data file on the text CD to conduct a multiple regression analysis of y = selling price of home, x1 = size of home, x2 = number of bedrooms,
Refer to the previous exercise. MINITAB reports the results below for the multiple regression of y = crime rate on x1 = median income (in thousands of dollars) and x2 = urbanization. (Tables: Results of regression analysis; Correlations: crime, income, urbanization.) a. Report the prediction equations relating
In Chapter 12, we analyzed strength data for a sample of female high school athletes. When we predict the maximum number of pounds the athlete can bench press using the number of times she can do a 60-pound bench press (BP_60), we get r2 = 0.643. When we add the number of times an athlete can
Refer to the Softball data set on the text CD. Regress the difference (DIFF) between the number of runs scored by that team and by the other team on the number of hits (HIT) and the number of errors (ERR). a. Report the prediction equation, and interpret the slopes. b. From part a, approximately how
A MINITAB printout is provided from fitting the multiple regression model to U.S. crime data for the 50 states (excluding Washington, D.C.), on y = violent crime rate, x1 = poverty rate, and x2 = percent living in urban areas. a. Predict the violent crime rate for Massachusetts, which has violent
Refer to the previous exercise. Now we add x3 = percentage of single-parent families to the model. The SPSS table on the next page shows results. Without x3 in the model, poverty has slope 28.33, and when x3 is added, poverty has slope 14.95. Explain the differences in the interpretations of these
For the World Data for Fertility and Literacy data file on the text CD, a MINITAB printout follows that shows fitting a multiple regression model for y = fertility, x1 = adult literacy rate (both sexes), x2 = combined educational enrollment (both sexes). Report the value of each of the
Refer to the previous exercise. a. Show how to construct the F statistic for testing H0: β1 = β2 = 0 from the reported mean squares, report its P-value, and interpret. b. If these are the only nations of interest to us for this study, rather than a random sample of such nations, is this
What motivates someone to pursue the study of medicine? A student's interest and motivation to study medicine can depend on the strength of motivation and career-related values and approaches to learning. Validated and reliable questionnaires were used to obtain data from 116 first-year
A study of horseshoe crabs found a logistic regression equation for predicting the probability that a female crab had a male partner nesting nearby using x = width of the carapace shell of the female crab (in centimeters). The results were (output not reproduced). a. For width, Q1 = 24.9 and Q3 = 27.7. Find the estimated
In a study (reported in New York Times, February 15, 1991) on the effects of AZT in slowing the development of AIDS symptoms, 338 veterans whose immune systems were beginning to falter after infection with the AIDS virus were randomly assigned either to receive AZT immediately or to wait until
The earnings of a PGA Tour golfer are determined by performance in tournaments. A study analyzed tour data to determine the financial return for certain skills of professional golfers. The sample consisted of 393 golfers competing in one or both of the 2002 and 2008 seasons. The most significant
The table summarizes results of a logistic regression model for predictions about first home purchase by young married households. The response variable is whether the subject owns a home (1 = yes, 0 = no). The explanatory variables are husband's income, wife's income (each in
Refer to the FL Student Survey data file on the text CD. Using software, conduct a regression analysis using y = college GPA and predictors high school GPA and sports (number of weekly hours of physical exercise). Prepare a report, summarizing your graphical analyses, bivariate models and
The table shows results of fitting a regression model to data on Oklahoma State University salaries (in dollars) of 675 full-time college professors of different disciplines with at least two years of instructional employment. All of the predictors are categorical (binary), except for years as
For each of the following statements, indicate whether it is true or false. If false, explain why it is false. a. The multiple correlation is always the same as the ordinary correlation computed between the values of the response variable and the values predicted by the regression model. b. The
For each of the following statements, indicate whether it is true or false. If false, explain why it is false. In regression analysis: a. The estimated coefficient of x1 can be positive in the bivariate model but negative in a multiple regression model. b. When a model is refitted after y = income
For data on y = college GPA, x1 = high school GPA, and x2 = average of mathematics and verbal entrance exam score, we get ŷ = 2.70 + 0.45x1 for bivariate regression and ŷ = 0.3 + 0.40x1 + 0.003x2 for multiple regression. For each of the following statements, indicate whether it is true or false. Give
In Example 2, the prediction equation between y = selling price and x1 = house size and x2 = number of bedrooms was ŷ = 60,102 + 63.0x1 + 15,170x2. a. For fixed number of bedrooms, how much is the house selling price predicted to increase for each square foot increase in house size? Why? b. For a fixed
You want to include religious affiliation as a predictor in a regression model, using the categories Protestant, Catholic, Jewish, Other. You set up a variable x1 that equals 1 for Protestants, 2 for Catholics, 3 for Jewish, and 4 for Other, using the model μy = α + βx1. Explain why this is
Using its definition in terms of SS values, explain why R2 = 1 only when all the residuals are 0, and R2 = 0 when each ŷ = ȳ. Explain what this means in practical terms.
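The two boundary cases in this exercise can be checked numerically. The sketch below assumes the usual sum-of-squares definition, R2 = (total SS - residual SS) / total SS; the function name `r_squared` and the sample values are hypothetical and only illustrate the idea.

```python
def r_squared(y, y_hat):
    """R^2 from sums of squares: (total SS - residual SS) / total SS."""
    y_bar = sum(y) / len(y)
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return (total_ss - residual_ss) / total_ss


# Hypothetical response values, chosen only for illustration.
y = [2.0, 4.0, 6.0, 8.0]
y_bar = sum(y) / len(y)

# Case 1: every prediction equals the observation, so all residuals are 0.
perfect_fit = r_squared(y, y)

# Case 2: every prediction equals the sample mean, so the model explains nothing.
mean_only_fit = r_squared(y, [y_bar] * len(y))

print(perfect_fit, mean_only_fit)
```

With zero residuals the residual SS vanishes and the ratio is 1; when each prediction is the sample mean, the residual SS equals the total SS and the ratio is 0.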
When a model has a very large number of predictors, even when none of them truly have an effect in the population, one or two may look significant in t tests merely by random variation. Explain why performing the F test first can safeguard against getting such false information from t tests.
For the high school female athletes data file, regress the maximum bench press on weight and percent body fat. a. Show that the F test is statistically significant at the 0.05 significance level. b. Show that the P-values are both larger than 0.35 for testing the individual effects with t tests. (It
For binary response variables, one reason that logistic regression is usually preferred over straight-line regression is that a fixed change in x often has a smaller impact on a probability p when p is near 0 or near 1 than when p is near the middle of its range. Let y refer to the decision to rent
When we use R2 for a random sample to estimate a population R2, it’s a bit biased. It tends to be a bit too large, especially when n is small. Some software also reports Adjusted R2 = R2 - {p/[n - (p + 1)]}(1 - R2), where p = number of predictor variables in the model. This is slightly smaller
The least squares prediction equation provides predicted values ŷ with the strongest possible correlation with y, out of all possible prediction equations of that form. Based on this property, explain why the multiple correlation R cannot decrease when you add a variable to a multiple regression
Chapter 10 presented methods for comparing means for two groups. Explain how it’s possible to perform a significance test of equality of two population means as a special case of a regression analysis. (Hint: The regression model then has a single explanatory variable—an indicator variable for
Let y = death rate and x = average age of residents, measured for each county in Louisiana and in Florida. Draw a hypothetical scatterplot, identifying points for each state, such that the mean death rate is higher in Florida than in Louisiana when x is ignored, but lower when it is controlled.
Suppose that the correlation between x1 and x2 equals 0. Then, for multiple regression with those predictors, it can be shown that the slope for x1 is the same as in bivariate regression when x1 is the only predictor. Explain why you would expect this to be true. (Hint: If you don’t control x2,
A regression formula that gives a parabolic shape instead of a straight line for the relationship between two variables is μy = α + β1x + β2x^2. a. Explain why this is a multiple regression model, with x playing the role of x1 and x^2 (the square of
At the x value where the probability of success is some value p , the line drawn tangent to the logistic regression curve has slope βp(1 - p). a. Explain why the slope is β / 4 when p = 0.5. b. Show that the slope is weaker at other p values by evaluating this at p = 0.1, 0.3, 0.7, and 0.9. What
When α + βx = 0, so that x = -α / β, show that the logistic regression equation p = e^(α+βx) / (1 + e^(α+βx)) gives p = 0.50.
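A quick numerical check of this identity, as a minimal sketch: the parameter values α = -2.0 and β = 0.5 are hypothetical, picked only so that the arithmetic is exact, and `logistic_probability` is an illustrative helper name.

```python
import math


def logistic_probability(alpha, beta, x):
    """p = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    linear = alpha + beta * x
    return math.exp(linear) / (1 + math.exp(linear))


# Hypothetical parameters, for illustration only.
alpha, beta = -2.0, 0.5

# The x value at which alpha + beta*x = 0.
x_half = -alpha / beta

p_at_half = logistic_probability(alpha, beta, x_half)
print(x_half, p_at_half)
```

At x = -α / β the exponent is 0, so the numerator is e^0 = 1 and the denominator is 1 + 1 = 2, which is why the result is 0.50 whatever values α and β take (with β nonzero).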
The CEO of a company that owns five resort hotels wants to evaluate and compare satisfaction with the five hotels. The company’s research department randomly sampled 125 people who had stayed at any of the hotels during the past month and asked them to rate their expectations of the hotel before
Refer to Exercise 14.3. Using software, a. Create the data file and find the sample means and standard deviations. b. Find and report the ANOVA table. Interpret the P-value. c. Change an observation in Group 2 so that the P-value will be smaller. Specify the value you changed, and report the
The Anorexia data file on the text CD shows weight change for 72 anorexic teenage girls who were randomly assigned to one of three psychological treatments. Use software to analyze these data. (The change scores are given in the file for the control and cognitive therapy groups. You can create the
For the House Selling Prices OR data file on the text CD, the output shows the result of conducting an ANOVA comparing mean house selling prices (in $1000) by Age Category (New = 0 to 24 years old, Medium = 25 to 50 years old, Old = 51 to 74 years old, Very Old = 75+ years old). It also shows a
An extensive survey by the Pew Forum on Religion & Public Life, conducted in 2007, details views on religion in the United States. Based on interviews with a representative sample of more than 35,000 Americans age 18 and older, the U.S. Religious Landscape Survey found that religious
Examples 2 and 3 analyzed whether telephone callers to an airline would stay on hold different lengths of time, on average, if they heard (a) an advertisement about the airline, (b) Muzak, or (c) classical music. The sample means were 5.4, 2.8, and 10.4, with n1 = n2 = n3 = 5. The ANOVA test had F
Refer to the previous exercise. We could instead use the Tukey method to construct multiple comparison confidence intervals. The Tukey confidence intervals having overall confidence level 95% have margins of error of 5.7, compared to 4.7 for the separate 95% confidence intervals in the previous
A psychologist compares the mean amount of time of rapid-eye movement (REM) sleep for subjects under three conditions. She randomly assigns 12 subjects to the three groups, four per group. The sample means for the three groups were 18, 15, and 12. The table shows the ANOVA table from SPSS. REM
Refer to the previous exercise. a. Set up indicator variables for a regression model so that an F test for the regression parameters is equivalent to the ANOVA test comparing the three means. b. Express the null hypothesis both in terms of population means and in terms of regression parameters for
Exercise 14.5 showed an ANOVA for comparing mean customer satisfaction scores for three service centers. The sample means on a scale of 0 to 10 were 7.60 in San Jose, 7.80 in Toronto, and 7.10 in Bangalore. Each sample size = 100, MS error = 0.47, and the F test statistic = 27.6 has P-value <
Refer to the previous exercise. a. Set up indicator variables to represent the three service centers. b. The prediction equation is ŷ = 7.1 + 0.5x1 + 0.7x2. Show how the terms in this equation relate to the sample means of 7.6 for San Jose, 7.8 for Toronto, and 7.1 for Bangalore.
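One way to see how the coefficients line up with the sample means is sketched below. The indicator coding used here (x1 = 1 for San Jose, x2 = 1 for Toronto, Bangalore as the baseline with both set to 0) is an assumption inferred from the coefficients and means quoted in the exercise; the function name is hypothetical.

```python
def predicted_satisfaction(x1, x2):
    """Prediction equation from the exercise: 7.1 + 0.5*x1 + 0.7*x2."""
    return 7.1 + 0.5 * x1 + 0.7 * x2


# Assumed indicator coding (not stated in the excerpt):
# x1 = 1 for San Jose, x2 = 1 for Toronto, both 0 for Bangalore.
indicator_codes = {
    "San Jose": (1, 0),
    "Toronto": (0, 1),
    "Bangalore": (0, 0),
}

# Round to two decimals to hide floating-point noise in the sums.
predictions = {
    center: round(predicted_satisfaction(x1, x2), 2)
    for center, (x1, x2) in indicator_codes.items()
}
print(predictions)
```

Under that coding, the intercept is the baseline group's mean and each slope is the difference between a group's mean and the baseline mean.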
A bank conducts a survey in which it randomly samples 400 of its customers. The survey asks the customers which way they use the bank the most: (1) interacting with a teller at the bank, (2) using ATMs, or (3) using the bank’s Internet banking service. It also asks their level of satisfaction
Each of 100 restaurants in a fast-food chain is randomly assigned one of four media for an advertising campaign: A = radio, B = TV, C = newspaper, D = mailing. For each restaurant, the observation is the change in sales, defined as the difference between the sales for the month during which the
Refer to Exercise 14.3 about studying French, with data shown again below. Using software, a. Compare the three pairs of means with separate 95% confidence intervals. Interpret. b. Compare the three pairs of means with Tukey 95% multiple comparison confidence intervals. Interpret, and explain why the
An experiment randomly assigns 100 subjects suffering from high cholesterol to one of four groups: low-dose Lipitor, high-dose Lipitor, low-dose Zocor, high-dose Zocor. After three months of treatment, the change in cholesterol level is measured. a. Identify the response variable and the two
For the previous exercise, show a hypothetical set of population means for the four groups that would have a. A dose effect but no drug effect. b. A drug effect but no dose effect. c. A drug effect and a dose effect. d. No drug effect and no dose effect.