
An experiment was conducted to study the extrusion process of biodegradable packaging foam.

Source: Data extracted from W. Y. Koh, K. M. Eskridge, and M. A. Hanna, “Supersaturated Split-Plot Designs,” Journal of Quality Technology, 45, January 2013, pp. 61–72.

Among the factors considered for their effect on the unit density (mg/ml) were the die temperature (145°C versus 155°C) and the die diameter (3 mm versus 4 mm). The results were stored in PackagingFoam3. Develop a multiple regression model that uses die temperature and die diameter to predict the unit density (mg/ml). Be sure to perform a thorough residual analysis. Do you think that you need to use both independent variables in the model? Explain.

Is the number of calories in a beer related to the number of carbohydrates and/or the percentage of alcohol in the beer? Data concerning 158 of the best-selling domestic beers in the United States are stored in DomesticBeer. The values for three variables are included: the number of calories per 12 ounces, the alcohol percentage, and the number of carbohydrates (in grams) per 12 ounces.

Source: Data extracted from www.beer100.com/beercalories.htm, December 1, 2016.

a. Perform a multiple linear regression analysis, using calories as the dependent variable and percentage alcohol and number of carbohydrates as the independent variables.

b. Add quadratic terms for alcohol percentage and the number of carbohydrates.

c. Which model is better, the one in (a) or (b)?

d. What conclusions can you reach concerning the relationship between the number of calories in a beer and the alcohol percentage and number of carbohydrates?

Researchers wanted to investigate the relationship between employment and accommodation capacity in the European travel and tourism industry. The file EuroTourism contains a sample of 27 European countries. Variables included are the number of jobs generated in the travel and tourism industry in 2015 and the number of establishments that provide overnight accommodation for tourists.

Source: Data extracted from www.marketline.com.

a. Construct a scatter plot of the number of jobs generated in the travel and tourism industry in 2015 (Y) and the number of establishments that provide overnight accommodation for tourists (X).

b. Fit a quadratic regression model to predict the number of jobs generated and state the quadratic regression equation.

c. Predict the mean number of jobs generated in the travel and tourism industry for a country with 3,000 establishments that provide overnight accommodation for tourists.

d. Perform a residual analysis on the results and determine whether the regression model is valid.

e. At the 0.05 level of significance, is there a significant quadratic relationship between the number of jobs generated in the travel and tourism industry in 2015 and the number of establishments that provide overnight accommodation for tourists?

f. What is the p-value in (e)? Interpret its meaning.

g. At the 0.05 level of significance, determine whether the quadratic model is a better fit than the linear model.

h. Interpret the meaning of the coefficient of multiple determination.

i. Compute the adjusted r^{2}.

j. What conclusions can you reach concerning the relationship between the number of jobs generated in the travel and tourism industry in 2015 and the number of establishments that provide overnight accommodation for tourists?
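
Part (i) relies on the usual adjustment of r^{2} for sample size and number of independent variables, adjusted r^{2} = 1 - (1 - r^{2})(n - 1)/(n - k - 1). A minimal Python sketch of that arithmetic follows. The function name and the r^{2} value of 0.80 are illustrative assumptions, not results from EuroTourism; n = 27 comes from the problem statement, and k = 2 assumes the quadratic model counts X and X^{2} as two independent variables.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted r^2 for a model with k independent variables fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical r^2 = 0.80 (placeholder, not a result from the data file).
print(round(adjusted_r_squared(0.80, n=27, k=2), 4))
```

The adjusted value is never larger than r^{2} itself, and the gap widens as k grows relative to n.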

Using the data of Problem 15.4 on page 600, stored in DomesticBeer, perform either a square-root transformation on the dependent variable (calories) or a square-root transformation on each of the independent variables (percentage alcohol and number of carbohydrates) depending on whether the residuals are normally distributed or vary across X values.

a. State the regression equation.

b. At the 0.05 level of significance, is there a significant relationship between calories and the percentage of alcohol and the number of carbohydrates?

c. Interpret the meaning of the coefficient of determination, r^{2}, in this problem.

d. Compute the adjusted r^{2}.

e. Compare your results with those in Problem 15.4. Which model is better? Why?

If the coefficient of determination between two independent variables is 0.20, what is the VIF?

If the coefficient of determination between two independent variables is 0.50, what is the VIF?
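
Both VIF questions use the standard definition VIF_j = 1/(1 - R_j^{2}), where R_j^{2} is the coefficient of determination from regressing independent variable j on the other independent variables. A small Python sketch of that arithmetic is shown here; the function name `vif` is an illustrative choice.

```python
def vif(r_squared: float) -> float:
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r_squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


for r2 in (0.20, 0.50):
    print(f"R^2 = {r2:.2f} -> VIF = {vif(r2):.2f}")
```

With only two independent variables, R_j^{2} is the same for both, so a single VIF describes the pair.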

The file FTMBA contains data from a sample of full-time MBA programs offered by private universities. The variables collected for this sample are average starting salary upon graduation ($), the percentage of applicants to the full-time program who were accepted, the average GMAT test score of students entering the program, program per-year tuition ($), and percent of students with job offers at time of graduation.

Source: Data extracted from U.S. News & World Report Education, “Best Graduate Schools,” bit.ly/1E8MBcp.

Develop the most appropriate multiple regression model to predict the mean starting salary upon graduation. Be sure to include a thorough residual analysis. In addition, provide a detailed explanation of the results, including a comparison of the most appropriate multiple regression model to the best simple linear regression model.

How can you evaluate whether collinearity exists in a multiple regression model?

A specialist in baseball analytics has expanded his analysis, presented in Problem 14.77 on page 580, of which variables are important in predicting a team’s wins in a given baseball season. He has collected data in Baseball related to wins, ERA, saves, runs scored per game, batting average, home runs, and batting average against for a recent season.

Develop the most appropriate multiple regression model to predict a team’s wins. Be sure to include a thorough residual analysis. In addition, provide a detailed explanation of the results.

The file Cities contains a sample of 25 cities in the United States. Variables included are city average annual salary ($), unemployment rate (%), median home value ($thousands), number of violent crimes per 100,000 residents, average commuter travel time (minutes), and livability score, a rating on a scale of 0 to 100 that rates the overall livability of the city.

Source: Data extracted from “100 Best Places to Live in the USA,” available at bit.ly/2jYvtFz and “AARP Livability Index,” available at bit.ly/1Qbd6oj.

Develop the most appropriate multiple regression model to predict average annual salary ($). Be sure to perform a thorough residual analysis and provide a detailed explanation of the results as part of your answer.

In Problems 15.32–15.36 you developed multiple regression models to predict the fair market value of houses in Glen Cove, Roslyn, and Freeport. Now write a report based on the models you developed. Append all appropriate charts and statistical information to your report.

A baseball analytics specialist wants to determine which variables are important in predicting a team’s wins in a given season. He has collected data related to wins, earned run average (ERA), and runs scored per game for a recent season (stored in Baseball). Develop a model to predict the number of wins based on ERA and runs scored per game.

a. State the multiple regression equation.

b. Interpret the meaning of the slopes in this equation.

c. Predict the mean number of wins for a team that has an ERA of 4.50 and has scored 4.6 runs per game.

d. Perform a residual analysis on the model and determine whether the regression assumptions are valid.

e. Is there a significant relationship between the number of wins and the two independent variables (ERA and runs scored per game) at the 0.05 level of significance?

f. Determine the p-value in (e) and interpret its meaning.

g. Interpret the meaning of the coefficient of multiple determination in this problem.

h. Determine the adjusted r^{2}.

i. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Indicate the most appropriate regression model for this set of data.

j. Determine the p-values in (i) and interpret their meaning.

k. Construct a 95% confidence interval estimate of the population slope between wins and ERA.

l. Compute and interpret the coefficients of partial determination.

m. Which is more important in predicting wins—pitching, as measured by ERA, or offense, as measured by runs scored per game? Explain.

A sample of 61 houses recently listed for sale in Silver Spring, Maryland, was selected with the objective of developing a model to predict the taxes (in $) based on the asking price of houses (in $thousands) and the age of the houses (in years) (stored in SilverSpring):

a. State the multiple regression equation.

b. Interpret the meaning of the slopes in this equation.

c. Predict the mean taxes for a house that has an asking price of $400,000 and is 50 years old.

d. Perform a residual analysis on the model and determine whether the regression assumptions are valid.

e. Determine whether there is a significant relationship between taxes and the two independent variables (asking price and age) at the 0.05 level of significance.

f. Determine the p-value in (e) and interpret its meaning.

g. Interpret the meaning of the coefficient of multiple determination in this problem.

h. Determine the adjusted r^{2}.

i. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Indicate the most appropriate regression model for this set of data.

j. Determine the p-values in (i) and interpret their meaning.

k. Construct a 95% confidence interval estimate of the population slope between taxes and asking price. How does the interpretation of the slope here differ from that of Problem 13.77 on page 525?

l. Compute and interpret the coefficients of partial determination.

m. The real estate assessor’s office has been publicly quoted as saying that the age of a house has no bearing on its taxes. Based on your answers to (a) through (l), do you agree with this statement? Explain.

If you are using exponential smoothing for forecasting an annual time series of revenues, what is your forecast for next year if the smoothed value for this year is $32.4 million?

You are using exponential smoothing on an annual time series concerning total revenues (in $millions). You decide to use a smoothing coefficient of W = 0.20, and the exponentially smoothed value for 2017 is E_{2017} = (0.20)(12.1) + (0.80)(9.4).

a. What is the smoothed value of this series in 2017?

b. What is the smoothed value of this series in 2018 if the value of the series in that year is $11.5 million?
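
The recursion behind both parts is E_i = W·Y_i + (1 - W)·E_{i-1}. The following Python sketch applies it to the numbers given in the problem; the helper name `smooth_step` is an assumption made for illustration.

```python
def smooth_step(y_current: float, e_previous: float, w: float) -> float:
    """One step of exponential smoothing: E_i = W*Y_i + (1 - W)*E_{i-1}."""
    if not 0.0 < w < 1.0:
        raise ValueError("smoothing coefficient W must lie strictly between 0 and 1")
    return w * y_current + (1.0 - w) * e_previous


W = 0.20
e_2017 = smooth_step(12.1, 9.4, W)    # the expression stated in the problem
e_2018 = smooth_step(11.5, e_2017, W) # uses the 2018 value of $11.5 million
print(round(e_2017, 3), round(e_2018, 3))
```

Because the forecast for the following period is simply the most recent smoothed value, the same helper also answers "what is the forecast for next year" questions.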

The data below (stored in DesktopLaptop) represent the hours per day spent by American desktop/laptop users from 2008 to 2016.

a. Plot the time series.

b. Fit a three-year moving average to the data and plot the results.

c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.

d. What is your exponentially smoothed forecast for 2017?

e. Repeat (c) and (d), using W = 0.25.

f. Compare the results of (d) and (e).

g. What conclusions can you reach about desktop/laptop use by American users?
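
For part (b), a three-year moving average replaces each value with the mean of itself and its two neighbors, so the first and last years have no moving average. A minimal Python sketch is given below. The series used is a made-up placeholder, not the DesktopLaptop data, and the function name is an illustrative assumption.

```python
def moving_average(values, length=3):
    """Centered moving average of odd length; positions without a full
    window are returned as None."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    half = length // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / length
    return result


# Hypothetical values for nine consecutive years (placeholder data only).
series = [2.2, 2.3, 2.4, 2.6, 2.5, 2.3, 2.2, 2.2, 2.1]
print(moving_average(series))
```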

The following data, stored in CoreAppliances, provide the total number of shipments of core major household appliances in the U.S. from 2000 to 2016 (in millions).

a. Plot the time series.

b. Fit a three-year moving average to the data and plot the results.

c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.

d. What is your exponentially smoothed forecast for 2017?

e. Repeat (c) and (d), using W = 0.25.

f. Compare the results of (d) and (e).

g. What conclusions can you reach concerning the total number of shipments of core major household appliances in the U.S. from 2000 to 2016 (in millions)?

The data (stored in CoffeeExports) represent the coffee exports (in thousands of 60 kg bags) by Costa Rica from 2004 to 2016:

a. Plot the data.

b. Fit a three-year moving average to the data and plot the results.

c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.

d. What is your exponentially smoothed forecast for 2017?

e. Repeat (c) and (d), using a smoothing coefficient of W = 0.25.

f. Compare the results of (d) and (e).

g. What conclusions can you reach about the exports of coffee in Costa Rica?

The file IPOs contains the number of initial public offerings (IPOs) issued from 2001 through 2016.

Source: Data extracted from K.W. Hanley, “The Economics of Primary Markets,” available at bit.ly/2vWb6hv.

a. Plot the data.

b. Fit a three-year moving average to the data and plot the results.

c. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results.

d. What is your exponentially smoothed forecast for 2017?

e. Repeat (c) and (d), using a smoothing coefficient of W = 0.25.

f. Compare the results of (d) and (e).

The linear trend forecasting equation for an annual time series containing 22 values (from 1996 to 2017) on total revenues (in $millions) is

a. Interpret the Y intercept, b_{0}.

b. Interpret the slope, b_{1}.

c. What is the fitted trend value for the fifth year?

d. What is the fitted trend value for the most recent year?

e. What is the projected trend forecast three years after the last value?

The linear trend forecasting equation for an annual time series containing 42 values (from 1976 to 2017) on net sales (in $billions) is

a. Interpret the Y intercept, b_{0}.

b. Interpret the slope, b_{1}.

c. What is the fitted trend value for the tenth year?

d. What is the fitted trend value for the most recent year?

e. What is the projected trend forecast two years after the last value?

There has been much publicity about bonuses paid to workers on Wall Street. Just how large are these bonuses? The file Bonuses contains the bonuses paid (in $000) from 2000 to 2016.

Source: Data extracted from J. Spector, “Wall Street bonuses rise 1% to average $138,210,” USA Today, March 15, 2017.

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the results.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Using the forecasting equations in (b) through (d), what are your annual forecasts of the bonuses for 2017 and 2018?

f. How can you explain the differences in the three forecasts in (e)? What forecast do you think you should use? Why?
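
Parts (b) and (d) can be sketched in Python as follows, assuming the common convention of coding the first year as X = 0 and fitting the exponential trend as a straight line through log10(Y). The function names and the sample series are illustrative assumptions, not the Bonuses data. The quadratic trend in (c) needs a three-coefficient least-squares fit and is left out to keep the sketch short.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def linear_trend(y):
    """Linear trend with coded years X = 0, 1, 2, ..."""
    return fit_line(list(range(len(y))), y)


def exponential_trend(y):
    """Exponential trend: fit log10(Y) = b0 + b1*X with coded years."""
    return fit_line(list(range(len(y))), [math.log10(v) for v in y])


# Hypothetical annual series (placeholder values only).
y = [100.0, 108.0, 121.0, 130.0, 146.0, 160.0]
b0, b1 = linear_trend(y)
c0, c1 = exponential_trend(y)
next_x = len(y)  # coded value of the first year after the sample
print("linear forecast:", round(b0 + b1 * next_x, 2))
print("exponential forecast:", round(10 ** (c0 + c1 * next_x), 2))
```

The two forecasts diverge more the further ahead they are projected, which is part of what (f) asks you to explain.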

Gross domestic product (GDP) is a major indicator of a nation’s overall economic activity. It consists of personal consumption expenditures, gross domestic investment, net exports of goods and services, and government consumption expenditures. The file GDP contains the GDP (in billions of current dollars) for the United States from 1980 to 2016.

Source: Data extracted from Bureau of Economic Analysis, U.S. Department of Commerce, www.bea.gov.

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. What are your forecasts for 2017 and 2018?

d. What conclusions can you reach concerning the trend in GDP?

The data in FedReceipt represent federal receipts from 1978 through 2016, in billions of current dollars, from individual and corporate income tax, social insurance, excise tax, estate and gift tax, customs duties, and federal reserve deposits.

Source: Data extracted from “Historical Federal Receipt and Outlay Summary,” Tax Policy Center, tpc.io/1JMFKpo.

a. Plot the series of data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. What are your forecasts of the federal receipts for 2017 and 2018?

d. What conclusions can you reach concerning the trend in federal receipts?

The file HouseSales contains the number of new, single-family houses sold in the U.S. from 1992 through 2016.

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Which model is the most appropriate?

f. Using the most appropriate model, forecast the number of new, single-family houses sold in the U.S. in 2017.

The data shown in the following table and stored in SolarPower represent the yearly amount of solar power generated by utilities (in millions of kWh) in the United States from 2002 through 2016:

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Using the models in (b) through (d), what are your annual trend forecasts of the yearly amount of solar power generated by utilities (in millions of kWh) in the United States in 2017 and 2018?

The file CarProduction contains the number of passenger cars produced in the U.S. (in thousands) from 1999 to 2016.

Source: Data extracted from www.statista.com.

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Which model is the most appropriate?

f. Using the most appropriate model, forecast the U.S. car production for 2017.

The average salary of Major League Baseball players on opening day from 2000 to 2017 is stored in BBSalaries and shown below.

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Which model is the most appropriate?

f. Using the most appropriate model, forecast the average salary for 2018.

The file Silver contains the following prices in London for an ounce of silver (in US$) on the last day of the year from 1999 to 2016:

a. Plot the data.

b. Compute a linear trend forecasting equation and plot the trend line.

c. Compute a quadratic trend forecasting equation and plot the results.

d. Compute an exponential trend forecasting equation and plot the results.

e. Which model is the most appropriate?

f. Using the most appropriate model, forecast the price of silver at the end of 2017.

The data in CPI-U reflect the annual values of the consumer price index (CPI) in the United States over the 52-year period 1965 through 2016, using 1982 through 1986 as the base period. This index measures the average change in prices over time in a fixed “market basket” of goods and services purchased by all urban consumers, including urban wage earners (i.e., clerical, professional, managerial, and technical workers; self-employed individuals; and short-term workers), unemployed individuals, and retirees.

Source: Data extracted from Bureau of Labor Statistics, U.S. Department of Labor, www.bls.gov.

a. Plot the data.

b. Describe the movement in this time series over the 52-year period.

c. Compute a linear trend forecasting equation and plot the trend line.

d. Compute a quadratic trend forecasting equation and plot the results.

e. Compute an exponential trend forecasting equation and plot the results.

f. Which model is the most appropriate?

g. Using the most appropriate model, forecast the CPI for 2017 and 2018.

Although you should not expect a perfectly fitting model for any time-series data, you can consider the first differences, second differences, and percentage differences for a given series as guides in choosing an appropriate model.

For this problem, use each of the time series presented in the table above and stored in TSModel1:

a. Determine the most appropriate model.

b. Compute the forecasting equation.

c. Forecast the value for 2017.
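
The guideline stated above is usually applied as follows: roughly constant first differences suggest a linear trend, roughly constant second differences a quadratic trend, and roughly constant percentage differences an exponential trend. The Python sketch below computes the three sets of differences. The function names and the tiny example series are illustrative assumptions, not the TSModel1 data.

```python
def first_differences(y):
    """Y_i - Y_{i-1} for consecutive values."""
    return [b - a for a, b in zip(y, y[1:])]


def second_differences(y):
    """Differences of the first differences."""
    return first_differences(first_differences(y))


def percentage_differences(y):
    """100 * (Y_i - Y_{i-1}) / Y_{i-1} for consecutive values."""
    return [100.0 * (b - a) / a for a, b in zip(y, y[1:])]


# Hypothetical series that grows by a fixed percentage each year.
series = [100.0, 110.0, 121.0, 133.1]
print(first_differences(series))
print(second_differences(series))
print(percentage_differences(series))
```

Real series rarely give exactly constant differences, so the comparison is a guide to which model to try first rather than a formal test.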

A time-series plot often helps you determine the appropriate model to use. For this problem, use each of the time series presented in the following table and stored in TSModel2:

a. Plot the observed data Y over time X and plot the logarithm of the observed data (log Y) over time X to determine whether a linear trend model or an exponential trend model is more appropriate.

b. Compute the appropriate forecasting equations.

c. Forecast the values for 2017.

Using the data for Problem 16.15 on page 645 that represent the number of new, single-family houses sold in the U.S. from 1992 through 2016 (stored in HouseSales),

a. Fit a third-order autoregressive model to the new single-family homes sold and test for the significance of the third-order autoregressive parameter. (Use α = 0.05.)

b. If necessary, fit a second-order autoregressive model to the new single-family homes sold and test for the significance of the second-order autoregressive parameter. (Use α = 0.05.)

c. If necessary, fit a first-order autoregressive model to the new single-family homes sold and test for the significance of the first-order autoregressive parameter. (Use α = 0.05.)

d. If appropriate, forecast the new single-family homes sold in 2017.
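
The step-down procedure in (a) through (c) fits Y_i = a_0 + a_1·Y_{i-1} + ... + a_p·Y_{i-p} by least squares and tests the highest-order coefficient with a t statistic on n - 2p - 1 degrees of freedom. The following Python sketch uses only the standard library. The function names and the sample series are illustrative assumptions (not the HouseSales data), and the critical value still has to be looked up in a t table.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar(y, p):
    """Fit an order-p autoregressive model by least squares.

    Returns (coefficients [a0..ap], t statistic for ap, degrees of freedom).
    """
    rows = [[1.0] + [y[i - k] for k in range(1, p + 1)] for i in range(p, len(y))]
    target = y[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, target)]
    df = len(rows) - k  # equals n - 2p - 1
    mse = sum(e * e for e in resid) / df
    # Last diagonal element of (X'X)^-1 gives the variance factor for ap.
    inv_last = solve(xtx, [0.0] * (k - 1) + [1.0])[-1]
    return coef, coef[-1] / math.sqrt(mse * inv_last), df


# Hypothetical annual series (placeholder values only).
y = [4.0, 5.1, 4.7, 6.2, 5.9, 7.3, 6.8, 8.1, 7.7, 9.0, 8.8, 9.9]
coef, t_stat, df = fit_ar(y, 3)
print(coef, round(t_stat, 3), df)
```

If |t| does not exceed the critical value for the reported degrees of freedom, drop to order p - 1 and repeat; a forecast for the next year plugs the most recent p values into the retained equation.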

Using the data for Problem 16.12 on page 645 concerning the bonuses paid to workers on Wall Street from 2000 to 2016 (stored in Bonuses),

a. Fit a third-order autoregressive model to the bonuses paid and test for the significance of the third-order autoregressive parameter. (Use α = 0.05.)

b. If necessary, fit a second-order autoregressive model to the bonuses paid and test for the significance of the second-order autoregressive parameter. (Use α = 0.05.)

c. If necessary, fit a first-order autoregressive model to the bonuses paid and test for the significance of the first-order autoregressive parameter. (Use α = 0.05.)

d. If appropriate, forecast the bonuses paid in 2017 and 2018.

Using the data for Problem 16.17 on page 645 concerning the number of passenger cars produced in the United States from 1999 to 2016 (stored in CarProduction),

a. Fit a third-order autoregressive model to the number of passenger cars produced in the United States and test for the significance of the third-order autoregressive parameter. (Use α = 0.05.)

b. If necessary, fit a second-order autoregressive model to the number of passenger cars produced in the United States and test for the significance of the second-order autoregressive parameter. (Use α = 0.05.)

c. If necessary, fit a first-order autoregressive model to the number of passenger cars produced in the United States and test for the significance of the first-order autoregressive parameter. (Use α = 0.05.)

d. Forecast the U.S. car production for 2017.

Using the average baseball salary from 2000 through 2017 data for Problem 16.18 on page 645 (stored in BBSalaries),

a. Fit a third-order autoregressive model to the average baseball salary and test for the significance of the third-order autoregressive parameter. (Use α = 0.05.)

b. If necessary, fit a second-order autoregressive model to the average baseball salary and test for the significance of the second-order autoregressive parameter. (Use α = 0.05.)

c. If necessary, fit a first-order autoregressive model to the average baseball salary and test for the significance of the first-order autoregressive parameter. (Use α = 0.05.)

d. Forecast the average baseball salary for 2018.

Using the yearly amount of solar power generated by utilities (in millions of kWh) in the United States from 2002 through 2016 data for Problem 16.16 on page 645 (stored in SolarPower),

a. Fit a third-order autoregressive model to the amount of solar power installed and test for the significance of the third-order autoregressive parameter. (Use α = 0.05.)

b. If necessary, fit a second-order autoregressive model to the amount of solar power installed and test for the significance of the second-order autoregressive parameter. (Use α = 0.05.)

c. If necessary, fit a first-order autoregressive model to the amount of solar power installed and test for the significance of the first-order autoregressive parameter. (Use α = 0.05.)

d. Forecast the yearly amount of solar power generated by utilities (in millions of kWh) in the United States in 2017 and 2018.

In forecasting a monthly time series over a five-year period from January 2013 to December 2017, the exponential trend forecasting equation for January is

Take the antilog of the appropriate coefficient from this equation and interpret the

a. Y intercept, b̂_{0}.

b. Monthly compound growth rate.

c. January multiplier.

In forecasting a quarterly time series over the five-year period from the first quarter of 2013 through the fourth quarter of 2017, the exponential trend forecasting equation is given by

where quarter zero is the first quarter of 2013. Take the antilog of the appropriate coefficient from this equation and interpret the

a. Y intercept, b̂_{0}.

b. Quarterly compound growth rate.

c. Second-quarter multiplier.
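
The forecasting equation itself is not reproduced above, so the Python sketch below uses hypothetical coefficients only to show how the antilogs are taken. It assumes the model is written in base-10 logarithms as log10(Y) = b0 + b1·X + (seasonal dummy terms); if the equation uses natural logs, replace 10 ** with math.exp. All names and numbers are illustrative assumptions.

```python
def interpret_exponential_trend(b0: float, b1: float, b_season: float) -> dict:
    """Antilogs of coefficients from log10(Y) = b0 + b1*X + b_season*D + ..."""
    return {
        # Fitted value at X = 0 before any seasonal adjustment.
        "intercept": 10 ** b0,
        # Compound growth rate per period, in percent.
        "growth_rate_pct": (10 ** b1 - 1.0) * 100.0,
        # Seasonal multiplier relative to the omitted (base) season.
        "seasonal_multiplier": 10 ** b_season,
    }


# Hypothetical coefficients (placeholders, not taken from the problem).
result = interpret_exponential_trend(b0=3.0, b1=0.1, b_season=-0.25)
for name, value in result.items():
    print(name, round(value, 4))
```

A seasonal multiplier below 1 means that period runs below the base period after allowing for trend, and a multiplier above 1 means it runs above.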

Refer to the exponential model given in Problem 16.42.

a. What is the fitted value of the series in the fourth quarter of 2017?

b. What is the fitted value of the series in the first quarter of 2017?

c. What is the forecast in the fourth quarter of 2017?

d. What is the forecast in the first quarter of 2018?

The data in Toys R Us are quarterly revenues (in $millions) for Toys R Us from 1996-Q1 through 2017-Q1.

Source: Data extracted from Standard & Poor’s Stock Reports, November 1995, November 1998, and April 2002, and Toys R Us, Inc., www.toysrus.com.

a. Do you think that the revenues for Toys R Us are subject to seasonal variation? Explain.

b. Plot the data. Does this chart support your answer in (a)?

c. Develop an exponential trend forecasting equation with quarterly components.

d. Interpret the quarterly compound growth rate.

e. Interpret the quarterly multipliers.

f. What are the forecasts for 2017-Q2, 2017-Q3, 2017-Q4, and all four quarters of 2018?

Are gasoline prices higher during the height of the summer vacation season than at other times? The file GasPrices contains the mean monthly prices (in $/gallon) for unleaded gasoline in the United States from January 2006 to June 2017.

Source: Data extracted from U.S. Energy Information Administration, “Monthly Energy Review,” bit.ly/2wYUEtV.

a. Construct a time-series plot.

b. Develop an exponential trend forecasting equation with monthly components.

c. Interpret the monthly compound growth rate.

d. Interpret the monthly multipliers.

e. Write a short summary of your findings.

The file Freezer contains the number (in thousands) of freezer shipments in the United States from January 2012 to December 2016.

Source: Data extracted from www.statista.com and “Forecasts/Shipments Archives,” bit.ly/2fGtULf.

a. Plot the time-series data.

b. Develop an exponential trend forecasting equation with monthly components.

c. What is the fitted value in December 2016?

d. Interpret the monthly compound growth rate.

e. Interpret the July multiplier.

The file Silver-Q contains the price in London for an ounce of silver (in US$) at the end of each quarter from 2004 through 2016.

Source: Data extracted from USAGold, “Daily Silver Price History,” bit.ly/2w8iBSl.

a. Plot the data.

b. Develop an exponential trend forecasting equation with quarterly components.

c. Interpret the quarterly compound growth rate.

d. Interpret the first quarter multiplier.

e. What is the fitted value for the last quarter of 2016?

f. What are the forecasts for all four quarters of 2017?

The file Gold contains the price in London for an ounce of gold (in US$) at the end of each quarter from 2004 through 2016.

Source: Data extracted from USAGold, “Daily Gold Price History,” bit.ly/2w8iBSl.

a. Plot the data.

b. Develop an exponential trend forecasting equation with quarterly components.

c. Interpret the quarterly compound growth rate.

d. Interpret the first quarter multiplier.

e. What is the fitted value for the last quarter of 2016?

f. What are the forecasts for all four quarters of 2017?

g. Are the forecasts in (f) accurate? Explain.

Under what circumstances is the exponential trend model most appropriate?

How does the least-squares linear trend forecasting model developed in this chapter differ from the least-squares linear regression model considered in Chapter 13?

How does autoregressive modeling differ from the other approaches to forecasting?

What are the different approaches to choosing an appropriate forecasting model?

How does forecasting for monthly or quarterly data differ from forecasting for annual data?

The U.S. Department of Labor gathers and publishes statistics concerning the labor market. The file Workforce contains data on the size of the U.S. civilian noninstitutional population of people 16 years and over (in thousands) and the U.S. civilian noninstitutional workforce of people 16 years and over (in thousands) for 1984–2016. The workforce variable reports the number of people in the population who have a job or are actively looking for a job.

Source: Data extracted from Bureau of Labor Statistics, U.S. Department of Labor, www.bls.gov.

a. Plot the time series for the U.S. civilian noninstitutional population of people 16 years and older.

b. Compute the linear trend forecasting equation.

c. Forecast the U.S. civilian noninstitutional population of people 16 years and older for 2017 and 2018.

d. Repeat (a) through (c) for the U.S. civilian noninstitutional workforce of people 16 years and older.

The data stored in McDonalds represent the gross revenues (in billions of current dollars) of McDonald’s Corporation from 1975 through 2016:

a. Plot the data.

b. Compute the linear trend forecasting equation.

c. Compute the quadratic trend forecasting equation.

d. Compute the exponential trend forecasting equation.

e. Determine the best-fitting autoregressive model, using α = 0.05.

f. Perform a residual analysis for each of the models in (b) through (e).

g. Compute the standard error of the estimate (S_{YX}) and the MAD for each corresponding model in (f).

h. On the basis of your results in (f) and (g), along with a consideration of the principle of parsimony, which model would you select for purposes of forecasting? Discuss.

i. Using the selected model in (h), forecast gross revenues for 2017.
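
Part (g) compares models on two residual summaries: the standard error of the estimate, sqrt(SSE/(n - p - 1)) with p the number of independent variables, and the mean absolute deviation, the average of |Y - fitted Y|. A minimal Python sketch is shown here. The function names and the sample values are illustrative assumptions, not results for the McDonalds data.

```python
import math


def standard_error_of_estimate(actual, fitted, p):
    """sqrt(SSE / (n - p - 1)), where p is the number of independent variables."""
    n = len(actual)
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return math.sqrt(sse / (n - p - 1))


def mean_absolute_deviation(actual, fitted):
    """Average absolute residual."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


# Hypothetical observed and fitted values (placeholders only).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 1.5, 3.5, 3.5, 5.0]
print(round(standard_error_of_estimate(actual, fitted, p=1), 4))
print(round(mean_absolute_deviation(actual, fitted), 4))
```

For a fair comparison, fitted values from an exponential trend should be converted back to the original units before the residuals are computed, and an autoregressive model has fewer usable observations than the trend models.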

Teachers’ Retirement System of the City of New York offers several types of investments for its members. Among the choices are investments with fixed and variable rates of return. There are several categories of variable-return investments. The Diversified Equity Fund consists of investments that are primarily made in stocks, and the Stable-Value Fund consists of investments in corporate bonds and other types of lower-risk instruments. The data in TRSNYC represent the value of a unit of each type of variable-return investment at the beginning of each year from 1984 to 2017.

Source: Data extracted from “Historical Data-Unit Values, Teachers’ Retirement System of the City of New York,” bit.ly/SESJF5.

For each of the two time series,

a. Plot the data.

b. Compute the linear trend forecasting equation.

c. Compute the quadratic trend forecasting equation.

d. Compute the exponential trend forecasting equation.

e. Determine the best-fitting autoregressive model, using α = 0.05.

f. Perform a residual analysis for each of the models in (b) through (e).

g. Compute the standard error of the estimate (S_{YX}) and the MAD for each corresponding model in (f).

h. On the basis of your results in (f) and (g), along with a consideration of the principle of parsimony, which model would you select for purposes of forecasting? Discuss.

i. Using the selected model in (h), forecast the unit values for 2018.

j. Based on the results of (a) through (i), what investment strategy would you recommend for a member of the Teachers’ Retirement System of the City of New York? Explain.

As a consultant to an investment company trading in various currencies, you have been assigned the task of studying long-term trends in the exchange rates of the Canadian dollar, the Japanese yen, and the English pound. Data from 1980 to 2016 are stored in Currency, where the Canadian dollar, the Japanese yen, and the English pound are expressed in units per U.S. dollar.

Develop a forecasting model for the exchange rate of each of these three currencies and provide forecasts for 2017 and 2018 for each currency. Write an executive summary for a presentation to be given to the investment company. Append to this executive summary a discussion regarding possible limitations that may exist in these models.

In mining engineering, holes are often drilled through rock using drill bits. As a drill hole gets deeper, additional rods are added to the drill bit to enable additional drilling to take place. It is expected that drilling time increases with depth. This increased drilling time could be caused by several factors, including the mass of the drill rods that are strung together. The business problem relates to whether drilling is faster using dry drilling holes or wet drilling holes. Using dry drilling holes involves forcing compressed air down the drill rods to flush the cuttings and drive the hammer. Using wet drilling holes involves forcing water rather than air down the hole. Data have been collected from a sample of 50 drill holes that contains measurements of the time to drill each additional 5 feet (in minutes), the depth (in feet), and whether the hole was a dry drilling hole or a wet drilling hole. The data are organized and stored in Drill.

a. Using half the data as the training sample and the other half of the data as the test sample, develop a regression tree model to predict the drilling time.

b. What conclusions can you reach about the drilling time?

The file MobileSpeed contains the overall download and upload speeds in Mbps for nine carriers in the United States.

Source: Data extracted from “Best Mobile Network 2016,” bit.ly/1KGPrMm, accessed November 10, 2016.

a. Perform a cluster analysis using the complete linkage method on the U.S. carriers based on the download and upload speeds.

b. What conclusions can you reach about which carriers are most similar?
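The complete linkage method named in (a) can be sketched as follows. The carrier labels and speeds are hypothetical (they are not the MobileSpeed data), the function name `complete_linkage` is introduced only for illustration, and Euclidean distance on the raw speeds is an assumption; a statistics package may standardize the variables first.

```python
import math

def complete_linkage(points, labels):
    """Agglomerative clustering with complete linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []

    def dist(a, b):
        # Complete linkage: cluster distance is the farthest pair of members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        d, ia, ib = min(
            (dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        a, b = clusters[ia], clusters[ib]
        history.append(
            (tuple(labels[i] for i in a), tuple(labels[i] for i in b), d)
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(a + b)
    return history

# HYPOTHETICAL (download, upload) speeds in Mbps for four made-up carriers.
labels = ["A", "B", "C", "D"]
speeds = [(20.0, 8.0), (21.0, 9.0), (10.0, 4.0), (11.5, 5.0)]

history = complete_linkage(speeds, labels)
for left, right, d in history:
    print(f"merge {left} + {right} at distance {d:.3f}")
```

Carriers that merge at a small distance are the ones the method treats as most similar, which is the reading asked for in (b).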

Have you wondered how Internet connection speed varies around the globe? The file ConnectionSpeed contains the mean connection speed, the mean peak connection speed, the percent of the time the connection speed is above 4 Mbps, and the percent of the time the connection speed is above 10 Mbps for various countries.

Source: Data extracted from bit.ly/2vPmifV.

a. Perform a cluster analysis using the complete linkage method on the various countries based on the mean connection speed, the mean peak connection speed, the percent of the time the speed is above 4 Mbps, and the percent of the time the connection speed is above 10 Mbps.

b. What conclusions can you reach about which countries are most similar?

The restaurant owner in Problem 2.91 continues to learn more about the weekend patterns of patron demand. For each patron, the owner has collected and stored in Patrons the gender, the entrée ordered, the dessert ordered, and payment method.

a. Conduct a multiple correspondence analysis of the patron data.

b. What observations can you make about the weekend patron patterns?

The file Social Response contains the product category, sentiment rating, and customer type and frequency of posting (low, average, high) for 300 recently posted comments to a retailer’s community website.

a. Conduct a multiple correspondence analysis of the posted comments data.

b. What customer patterns does the analysis suggest?

The file MobileSpeed contains the overall download and upload speeds in Mbps for nine carriers in the United States.

Source: Data extracted from “Best Mobile Network 2016,” bit.ly/1KGPrMm, accessed November 10, 2016.

a. Perform a multidimensional scaling analysis on the United States carriers based on the download and upload speeds.

b. What conclusions can you reach about which carriers are most similar?

Have you wondered how Internet connection speed varies around the globe? The file ConnectionSpeed contains the mean connection speed, the mean peak connection speed, the percent of the time the connection speed is above 4 Mbps, and the percent of the time the connection speed is above 10 Mbps for various countries.

Source: Data extracted from bit.ly/2vPmifV.

a. Perform a multidimensional scaling analysis on the various countries based on the mean connection speed, the mean peak connection speed, the percent of the time the speed is above 4 Mbps, and the percent of the time the connection speed is above 10 Mbps.

b. What conclusions can you reach about which countries are most similar?

What is the difference between supervised and unsupervised analytics methods?

How does multiple correspondence analysis differ from multidimensional scaling?

The production of wine is a multibillion-dollar worldwide industry. In an attempt to develop a model of wine quality as judged by wine experts, data were collected from red and white wine variants of Portuguese “Vinho Verde” wine.

Source: Data extracted from P. Cortez et al., “Modeling Wine Preferences by Data Mining from Physicochemical Properties,” Decision Support Systems, 47, 2009, pp. 547–553 and bit.ly/9xKlEa.

The population of 6,497 wines is stored in VinhoVerde Population.

a. Using half the data as the training sample and the other half of the data as the validation sample, develop a classification tree model to predict the probability that the wine is red. (Consider the entire set of variables in your analysis.)

b. What conclusions can you reach about the probability that the wine is red?

Using the data in Problem 17.27,

a. Use half the data as the training sample and the other half of the data as the validation sample to develop a regression tree model to predict wine quality. (Consider the entire set of variables in your analysis.)

b. What conclusions can you reach about wine quality?

A specialist in baseball analytics is interested in determining which variables are important in predicting a team’s wins in a given baseball season. He has collected data in Baseball that includes the number of wins, ERA, saves, runs scored, hits allowed, walks allowed, and errors for a recent season.

a. Using all the data as the training sample, develop a regression tree model to predict the number of wins.

b. What conclusions can you reach about the number of wins?

A market research study has been conducted by a travel website that specializes in restaurants with the business objective to determine which food cuisines are perceived to be similar and which are perceived to be different. The following cuisine types were studied:

The mean values of each cuisine on the scales of

Bland (1) to Spicy (7)

Light (1) to Heavy (7)

Low calorie (1) to High calorie (7)

are stored in Foods.

a. Perform a cluster analysis on the types of cuisines.

b. Perform a multidimensional scaling analysis on the types of cuisines.

c. What conclusions can you reach about which types of cuisines are most similar?

A specialist in baseball analytics seeks to study which baseball teams were most similar in a recent season. The specialist has collected data in Baseball related to ERA, saves, runs scored, hits allowed, walks allowed, and errors for that recent season.

a. Perform a cluster analysis on the baseball teams.

b. Perform a multidimensional scaling analysis on the baseball teams.

c. What conclusions can you reach about which baseball teams were similar for that recent season?

Develop a model to predict the asking price of houses in Silver Spring, Maryland, based on living space, lot size, whether the house has a fireplace, the number of bedrooms, the number of bathrooms, age, whether it has central air conditioning, the number of parking spaces, and whether the house has a brick exterior. Use the sample of 61 houses that is stored in SilverSpring as the data for this analysis.

a. Using all the data as a training sample, develop a regression tree model to predict the asking price of the house.

b. What conclusions can you reach about the asking price of the house?

c. Using half the data as the training sample and the other half of the data as the validation sample, develop a regression tree model to predict the asking price of the house.

d. What differences exist in the results of (a) and (c)?

With an assist from Moneyball: The Art of Winning an Unfair Game, a book by Michael Lewis, published in 2003 (and later adapted for the movie Moneyball), the management of professional teams in sports such as baseball, football, basketball, and hockey has turned to business analytics to help support decision making. In football, the most important position is the quarterback. The file Quarterback contains various attributes of 35 quarterbacks in a recent season.

a. Perform a cluster analysis on the quarterbacks.

b. Perform a multidimensional scaling analysis on the quarterbacks.

c. What conclusions can you reach about the quarterbacks?

In recent years, the share of Greek yogurts in the U.S. yogurt market has grown from 1% to over 50%, greatly increasing the variety of Greek yogurts available for sale. The file Yogurt contains the attributes of 17 regular plain, Greek plain, and regular berry yogurts.

a. Perform a cluster analysis on the yogurts.

b. Perform a multidimensional scaling analysis on the yogurts.

c. What conclusions can you reach about the yogurts?

The file EuroTourism2 contains a sample of 28 European countries. Variables included are the number of jobs generated in the travel and tourism industry in 2015, the spending on business travel within the country by residents and international visitors in 2015, the total number of international visitors who visited the country in 2015, and the number of establishments that provide overnight accommodation for tourists.

Source: Data extracted from www.marketline.com.

Using the data, you seek to predict the number of jobs generated in the travel and tourism industry. Completely analyze the data.

The file Philly contains a sample of 25 neighborhoods in Philadelphia. Variables included are neighborhood population, median sales price of homes in the second quarter of 2017, mean number of days homes were on the market in the second quarter of 2017, number of homes sold in the second quarter of 2017, median neighborhood household income, percentage of residents in the neighborhood with a bachelor’s degree or higher, and whether the neighborhood is considered “hot” (coded as 1 = yes, 0 = no).

Source: Data extracted from bit.ly/2wlcJWs, bit.ly/2smOyVu, bit.ly/2v4mqZd, and bit.ly/2n0RNPW.

Using this data, you seek to predict median sales price of homes. Completely analyze the data.

The file HybridSales contains the number of domestic and imported hybrid vehicles sold in the United States from 1999 to 2016.

Source: Data extracted from Oak Ridge National Laboratory, “Vehicle Technologies Market Report,” bit.ly/2xrcrtO.

You want to be able to predict the number of domestic and imported hybrid vehicles sold in the United States in 2017 and 2018. Completely analyze the data.

Obtain a version of the red bead experiment for your class.

a. Conduct the experiment in the same way as described in this section.

b. Remove 400 red beads from the bead bowl before beginning the experiment. How do your results differ from those in (a)? What does this tell you about the effect of the process on the results?

What should you do to improve a process when special causes of variation are present?

What should you do to improve a process when only common causes of variation are present?

For a period of four weeks, record your pulse rate (in beats per minute) just after you get out of bed in the morning and then again before you go to sleep at night. Construct X̅ and R charts and determine whether your pulse rate is in a state of statistical control. Discuss.

Use the table of random numbers (Table E.1) to simulate the selection of different-colored balls from an urn, as follows:

a. Start in the row corresponding to the day of the month in which you were born plus the last two digits of the year in which you were born. For example, if you were born October 3, 1990, you would start in row 93 (3 + 90). If your total exceeds 100, subtract 100 from the total.

b. Select two-digit random numbers.

c. If you select a random number from 00 to 94, consider the ball to be white; if the random number is from 95 to 99, consider the ball to be red.

Each student is to select 100 two-digit random numbers and report the number of “red balls” in the sample. Construct a control chart for the proportion of red balls. What conclusions can you draw about the system of selecting red balls? Are all the students part of the system? Is anyone outside the system? If so, what explanation can you give for someone who has too many red balls? If a bonus were paid to the top 10% of the students (the 10% with the fewest red balls), what effect would that have on the rest of the students? Discuss.
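The exercise above can be approximated in code. This sketch substitutes Python's pseudorandom generator for Table E.1, assumes a class of 30 students, and uses the usual three-sigma limits for a p chart; the function names are introduced only for illustration.

```python
import random

def simulate_red_counts(n_students, n_draws=100, seed=0):
    """Each student draws n_draws two-digit numbers (00-99); 95-99 is red."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_draws) if rng.randint(0, 99) >= 95)
        for _ in range(n_students)
    ]

def p_chart_limits(counts, n):
    """Lower limit, center line, and upper limit for a p chart.

    counts holds the number of red balls per student; n is the subgroup size.
    """
    p_bar = sum(counts) / (n * len(counts))
    sigma = (p_bar * (1 - p_bar) / n) ** 0.5
    lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
    ucl = min(1.0, p_bar + 3 * sigma)
    return lcl, p_bar, ucl

counts = simulate_red_counts(n_students=30)  # 30 students is an assumption
lcl, p_bar, ucl = p_chart_limits(counts, n=100)
outside = [i for i, c in enumerate(counts) if not lcl <= c / 100 <= ucl]

print(f"LCL = {lcl:.4f}, p-bar = {p_bar:.4f}, UCL = {ucl:.4f}")
print("Students outside the control limits:", outside)
```

Because every simulated student draws from the same process, points inside the limits reflect only common cause variation, which bears on the question of paying a bonus to the students with the fewest red balls.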

**Table E.1**

An entrepreneur is planning to market a new brand of bottled unsweetened, organic iced tea. The profit on each bottle of iced tea to be sold has been set at $0.50. The entrepreneur needs to decide on the size of the bottling plant to produce the iced tea. A small bottling plant will have an annual operating cost of $100,000 and be able to fill 500,000 bottles per year. A large bottling plant will have an annual operating cost of $300,000 and be able to fill 1,000,000 bottles per year. Four levels of demand are considered likely: 10,000, 100,000, 500,000, and 1,000,000 bottles per year.

a. Determine the payoffs for the possible levels of production for a small bottling plant.

b. Determine the payoffs for the possible levels of production for a large bottling plant.

c. Based on the results of (a) and (b), construct a payoff table, indicating the events and alternative courses of action.

d. Construct a decision tree.

e. Construct an opportunity loss table.
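One way to organize parts (a) through (c) and (e) is sketched below. It assumes that bottles sold equal the smaller of demand and plant capacity and that the annual operating cost is incurred regardless of volume; neither assumption is stated outright in the problem, so treat the output as illustrative.

```python
PROFIT_PER_BOTTLE = 0.50

# Plant name: (annual operating cost in $, annual capacity in bottles)
PLANTS = {
    "small": (100_000, 500_000),
    "large": (300_000, 1_000_000),
}
DEMANDS = [10_000, 100_000, 500_000, 1_000_000]

def payoff(plant, demand):
    """Annual payoff, assuming sales are capped at plant capacity."""
    cost, capacity = PLANTS[plant]
    return PROFIT_PER_BOTTLE * min(demand, capacity) - cost

# Payoff table: one row per event (demand level), one column per action.
payoffs = {d: {p: payoff(p, d) for p in PLANTS} for d in DEMANDS}

# Opportunity loss: best payoff for the event minus the payoff of the action.
opportunity_loss = {
    d: {p: max(row.values()) - v for p, v in row.items()}
    for d, row in payoffs.items()
}

print(f"{'demand':>10} {'small':>12} {'large':>12}   loss(small) loss(large)")
for d in DEMANDS:
    row, loss = payoffs[d], opportunity_loss[d]
    print(
        f"{d:>10,} {row['small']:>12,.0f} {row['large']:>12,.0f}"
        f"   {loss['small']:>11,.0f} {loss['large']:>11,.0f}"
    )
```

Each opportunity loss is measured against the best action for that event, so it is zero for the best action and positive otherwise.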

The following are the returns ($) for two stocks:

Which stock would you choose and why?

The following are the returns ($) for two stocks:

Which stock would you choose and why?

A vendor at a local baseball stadium must determine whether to sell ice cream or soft drinks at today’s game. The vendor believes that the profit made will depend on the weather. The payoff table (in $) is as follows:

Based on her past experience at this time of year, the vendor estimates the probability of warm weather as 0.60.

a. Determine the optimal action based on the maximax criterion.

b. Determine the optimal action based on the maximin criterion.

c. Compute the expected monetary value (EMV) for selling soft drinks and selling ice cream.

d. Compute the expected opportunity loss (EOL) for selling soft drinks and selling ice cream.

e. Explain the meaning of the expected value of perfect information (EVPI) in this problem.

f. Based on the results of (c) or (d), which would you choose to sell, soft drinks or ice cream? Why?

g. Compute the coefficient of variation for selling soft drinks and selling ice cream.

h. Compute the return-to-risk ratio (RTRR) for selling soft drinks and selling ice cream.

i. Based on (g) and (h), what would you choose to sell, soft drinks or ice cream? Why?

j. Compare the results of (f) and (i) and explain any differences.
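The criteria in parts (a) through (h) can be sketched as below. The payoff values are hypothetical and were made up for illustration, since the problem's payoff table is not reproduced here; only the probability of warm weather (0.60) comes from the problem. The coefficient of variation is taken as the standard deviation divided by the EMV, and the return-to-risk ratio as the EMV divided by the standard deviation.

```python
import math

P = {"cool": 0.40, "warm": 0.60}  # P(warm) = 0.60 is from the problem

# HYPOTHETICAL payoffs in $ (made up; NOT the problem's payoff table)
payoffs = {
    "soft drinks": {"cool": 50, "warm": 60},
    "ice cream": {"cool": 30, "warm": 90},
}

def best_payoff(event):
    """Largest payoff available for a given event."""
    return max(row[event] for row in payoffs.values())

def emv(action):
    """Expected monetary value of an action."""
    return sum(P[e] * payoffs[action][e] for e in P)

def eol(action):
    """Expected opportunity loss of an action."""
    return sum(P[e] * (best_payoff(e) - payoffs[action][e]) for e in P)

def std_dev(action):
    """Standard deviation of the payoff for an action."""
    mean = emv(action)
    return math.sqrt(sum(P[e] * (payoffs[action][e] - mean) ** 2 for e in P))

maximax = max(payoffs, key=lambda a: max(payoffs[a].values()))
maximin = max(payoffs, key=lambda a: min(payoffs[a].values()))

# EVPI = expected profit under certainty minus the best EMV
profit_under_certainty = sum(P[e] * best_payoff(e) for e in P)
evpi = profit_under_certainty - max(emv(a) for a in payoffs)

for a in payoffs:
    sd = std_dev(a)
    print(
        f"{a:12s} EMV = {emv(a):6.2f}  EOL = {eol(a):6.2f}  "
        f"CV = {100 * sd / emv(a):5.1f}%  RTRR = {emv(a) / sd:5.2f}"
    )
print("maximax:", maximax, "| maximin:", maximin, f"| EVPI = {evpi:.2f}")
```

With any payoff table, the EVPI equals the smallest expected opportunity loss, which gives a quick consistency check on parts (d) and (e).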

Do you consider yourself a risk seeker, a risk averter, or a risk-neutral person? Explain.

Refer to Problems 20.3–20.5 and 20.12–20.14, respectively. In which problems do you think the expected monetary value (risk-neutral) criterion is inappropriate? Why?

What is the difference between an event and an alternative course of action?

What are the advantages and disadvantages of a payoff table as compared to a decision tree?

How are opportunity losses computed from payoffs?

Why can’t an opportunity loss be negative?

How does the expected value of perfect information differ from the expected profit under certainty?

How is Bayes’ theorem used to revise probabilities in light of sample information?

What is the difference between a risk averter and a risk seeker?

Why should you use utilities instead of payoffs in certain circumstances?

In Problem 9.18, how many degrees of freedom does the t test have?

**In Problem 9.18**

If, in a sample of n = 16 selected from a normal population, X̅ = 56 and S = 12, what is the value of t_{STAT} if you are testing the null hypothesis H_{0}: μ = 50?
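The computation behind both questions is short enough to write out. This sketch uses the standard one-sample t statistic, t_{STAT} = (X̅ − μ) / (S / √n), with n − 1 degrees of freedom; the variable names are chosen here for illustration.

```python
import math

n, x_bar, s, mu_0 = 16, 56, 12, 50  # values given in Problem 9.18

standard_error = s / math.sqrt(n)        # S / sqrt(n)
t_stat = (x_bar - mu_0) / standard_error
degrees_of_freedom = n - 1

print(f"t_STAT = {t_stat}, degrees of freedom = {degrees_of_freedom}")
```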

The file MobileSpeed contains the overall download and upload speeds in Mbps for nine carriers in the United States.

Source: Data extracted from “Best Mobile Network 2016,” bit.ly/1KGPrMm, accessed November 10, 2016.

a. Compute and interpret the coefficient of correlation, r.

b. At the 0.05 level of significance, is there a significant linear relationship between download and upload speed?

In Problems 13.8, 13.20, 13.30, 13.46, 13.62, 13.82, and 13.83, you developed regression models to predict franchise value of major league baseball, NBA basketball, and soccer teams. Now, write a report based on the models you developed. Append to your report all appropriate charts and statistical information.

**In Problem 13.8**

The value of a sports franchise is directly related to the amount of revenue that a franchise can generate. The file BBValues represents the value in 2017 (in $millions) and the annual revenue (in $millions) for the 30 Major League Baseball franchises.

Source: Data extracted from www.forbes.com/mlb-valuations/list.

A nonprofit analyst seeks to determine which variables should be used to predict nonprofit charitable commitment, a nonprofit organization's commitment to its charitable purpose. Two independent variables under consideration are Revenue, a measurement of total revenue, in billions of dollars, as a measure of nonprofit size X_{1} and Efficiency, a measurement of the percent of private donations remaining after fundraising expenses as a measure of nonprofit fundraising efficiency X_{2}. The dependent variable Y is Commitment, a measurement of the percent of total expenses that are allocated directly to charitable services. Data are collected from a random sample of 98 nonprofit organizations, with the following results:

a. State the multiple regression equation.

b. Interpret the meaning of the slopes, b_{1} and b_{2}, in this problem.

c. What conclusions can you reach concerning nonprofit charitable commitment?

Human resource managers face the business problem of assessing the impact of factors on full-time job growth. A human resource manager is interested in the impact of full-time voluntary turnover and total worldwide revenues on the number of full-time job openings at the beginning of a new year. Data are collected from a sample of 63 “best companies to work for.” The total number of full-time job openings as of February 2017, the full-time voluntary turnover in the past year (in %), and the total worldwide revenue (in $billions) are recorded and stored in BestCompanies.

Source: Data extracted from Best Companies to Work For, 2017, fortune.com/best-companies.

a. State the multiple regression equation.

b. Interpret the meaning of the slopes, b_{1} and b_{2}, in this problem.

c. Interpret the meaning of the regression coefficient, b_{0}.

d. Which factor has the greatest effect on the number of full-time jobs added in the last year? Explain.

A financial analyst engaged in business valuation obtained financial data on 60 drug companies (Industry Group SIC 3 code: 283).

The file BusinessValuation contains the following variables:

Company—Drug Company name

PB fye—Price-to-book-value ratio (fiscal year ending)

ROE—Return on equity

SGrowth—Growth (GS5)

a. Develop a regression model to predict price-to-book-value ratio based on return on equity.

b. Develop a regression model to predict price-to-book-value ratio based on growth.

c. Develop a regression model to predict price-to-book-value ratio based on return on equity and growth.

d. Compute and interpret the adjusted r^{2} for each of the three models.

e. Which of these three models do you think is the best predictor of price-to-book-value ratio?

In Problem 14.3 on page 541, you predicted nonprofit charitable commitment, based on nonprofit revenue and fundraising efficiency. The regression analysis resulted in this ANOVA table:

Determine whether there is a significant relationship between commitment and the two independent variables at the 0.05 level of significance.
