One way to see whether this procedure will be successful is to split the original data...
Question:
Transcribed Image Text:
One way to see whether this procedure will be successful is to split the original data set into two subsets: one subset for estimation and one subset for validation. A regression equation is estimated from the first subset. Then the values of explanatory variables from the second subset are substituted into this equation to obtain predicted values for the dependent variable. Finally, these predicted values are compared to the known values of the dependent variable in the second subset. If the agreement is good, there is reason to believe that the regression equation will predict well for new data. This procedure is called validating the fit.

This validation procedure is fairly simple to perform in Excel. We illustrate it for the Bendrix manufacturing data in Example 10.2. (See the file Overhead Costs Validation.xlsx.) There we used 36 monthly observations to regress Overhead on Machine Hours and Production Runs. For convenience, the regression output is repeated in Figure 10.46. In particular, it shows an R-square value of 86.6% and an s value of $4109. Now suppose that this data set is from one of Bendrix's two plants. The company would like to predict overhead costs for the other plant by using data on machine hours and production runs at the other plant.

The first step is to see how well the regression from Figure 10.46 fits data from the other plant. This validation on the 36 months of data is shown in Figure 10.47. To obtain the results in this figure, proceed as follows.

Procedure for Validating Regression Results
1. Copy old results. Copy the results from the original regression to the ranges A5:C5 and B9:B10.
2. Calculate fitted values and residuals. The fitted values are now the predicted values of overhead for the other plant, based on the original regression equation. Find these by substituting the new values of Machine Hours and Production Runs into the original equation. Specifically, enter the formula =$A$5+SUMPRODUCT($B$5:$C$5,B13:C13)

FUNDAMENTAL INSIGHT: Training and Validation Sets
This practice of partitioning a data set into a set for estimation and a set for validation is becoming much more common as larger data sets become available. It allows you to see how a given procedure such as regression works on a data set where you know the Ys. If it works well, you have more confidence that it will work well on a new data set where you do not know the Ys. This partitioning is a routine part of data mining, the exploration of large data sets. In data mining, the first data set is usually called the training set, and the second data set is called the validation or testing set.

Figure 10.46 StatTools Regression Output for Bendrix Example

Multiple Regression for Overhead

Summary
  Multiple R              0.9308
  R-Square                0.8664
  Adjusted R-square       0.8583
  Std. Err. of Estimate   4108.99309
  Rows Ignored            0

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F             p-Value
Explained      2                    3614020661       1807010330        107.0261279   <0.0001
Unexplained    33                   557166199.1      16883824.22

Regression Table   Coefficient   Standard Error   t-Value       p-Value   95% CI Lower    95% CI Upper
Constant           3996.678209   6603.650932      0.605222512   0.5492    -9438.550632    17431.90705
Machine Hours      43.53639812   3.5894837        12.12887472   <0.0001   36.23353862     50.83925761
Production Runs    883.6179252   82.25140753      10.74289124   <0.0001   716.2761784     1050.959672

1. In terms of the estimated data listed on the printout, explain what the R-square does.
2. What does the Constant under the Coefficient (3996.68) tell the analyst?
3. The relationship in the printout measures production costs based on machine hours and production runs. Focusing only on machine hours, provide an interpretation of the machine hours coefficient (43.54).
4. What does the Standard Error of machine hours tell us about this estimate?
5. What do the t-Value and the p-Value tell us about the machine hours coefficient?
6. Can we use the F Statistic to evaluate this model? Why or why not?
7. Should we use this model? Why or why not?
8. Using the machine hours estimate, compose a regression equation of the results on the printout.
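The fitted-value step described in the passage (the constant plus the sum of each coefficient times its explanatory value) can be sketched outside Excel as well. The following Python is only an illustration: the coefficients are copied from the printout, but the validation rows are hypothetical numbers invented for the example, since the data for the other plant is not reproduced here. The function names `predict_overhead` and `validate` are likewise assumptions, and the root-mean-square error is one common way to summarize the residuals, not necessarily the measure the textbook uses.

```python
# Coefficients copied from the regression printout (Figure 10.46).
CONSTANT = 3996.678209
COEF_MACHINE_HOURS = 43.53639812
COEF_PRODUCTION_RUNS = 883.6179252


def predict_overhead(machine_hours, production_runs):
    """Fitted value: constant + coefficient * value for each explanatory variable."""
    return (CONSTANT
            + COEF_MACHINE_HOURS * machine_hours
            + COEF_PRODUCTION_RUNS * production_runs)


def validate(rows):
    """Compare predictions with known values.

    rows: iterable of (machine_hours, production_runs, actual_overhead).
    Returns (fitted, residuals, rmse), where residual = actual - fitted.
    """
    rows = list(rows)
    fitted = [predict_overhead(mh, pr) for mh, pr, _ in rows]
    residuals = [actual - fit for (_, _, actual), fit in zip(rows, fitted)]
    rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return fitted, residuals, rmse


# HYPOTHETICAL validation rows, made up for illustration only.
validation_rows = [
    (1500, 35, 100000.0),
    (1300, 40, 97000.0),
    (1600, 30, 99500.0),
]

fitted, residuals, rmse = validate(validation_rows)
for (mh, pr, actual), fit, res in zip(validation_rows, fitted, residuals):
    print(f"hours={mh} runs={pr} actual={actual:.2f} fitted={fit:.2f} residual={res:.2f}")
print(f"RMSE on validation rows: {rmse:.2f}")
```

If the residuals on the validation rows are of roughly the same size as the standard error of estimate from the original fit (about $4109 here), that is some evidence the equation carries over to the new data.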
Expert Answer:
1. The R-square, or coefficient of determination, represents the proportion of the variance in the dependent variable that is predictable from the ind...
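The proportion-of-variance reading of R-square can be checked directly against the ANOVA figures on the printout. The short Python sketch below recomputes several summary values from the sums of squares, degrees of freedom, and the machine hours row; the variable names are illustrative, and the formulas are the standard textbook definitions rather than anything specific to StatTools.

```python
import math

# Values copied from the ANOVA and regression tables in Figure 10.46.
ss_explained = 3614020661.0
ss_unexplained = 557166199.1
df_explained = 2      # number of explanatory variables
df_unexplained = 33   # 36 observations - 2 variables - 1

# R-square: share of total variation in Overhead explained by the regression.
r_square = ss_explained / (ss_explained + ss_unexplained)

# Adjusted R-square penalizes for the number of explanatory variables.
n = df_explained + df_unexplained + 1
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_unexplained

# Mean squares, standard error of estimate, and the F-ratio.
ms_explained = ss_explained / df_explained
ms_unexplained = ss_unexplained / df_unexplained
std_err_estimate = math.sqrt(ms_unexplained)
f_ratio = ms_explained / ms_unexplained

# t-value for Machine Hours: coefficient divided by its standard error.
t_machine_hours = 43.53639812 / 3.5894837

print(f"R-square          = {r_square:.4f}")
print(f"Adjusted R-square = {adj_r_square:.4f}")
print(f"Std. err. of est. = {std_err_estimate:.2f}")
print(f"F-ratio           = {f_ratio:.3f}")
print(f"t (Machine Hours) = {t_machine_hours:.3f}")
```

Each recomputed value should agree with the corresponding printout entry up to rounding, which is a quick way to confirm how the summary statistics relate to one another.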