MUST SHOW WORK FOR PROBLEMS 40 - 50

Part A: Multiple Choice (1-16)

Use the following information to answer questions 1-8:

2 1 3 0 1 2 0 5 1 4

_____1. What is the mean?
a) 1.90 b) 2.11 c) 1.73 d) 1.66

_____2. What is the median?
a) 1.00 b) 1.50 c) 1.60 d) 1.90

_____3. What is the mode?
a) 0 b) 1 c) 2 d) 3

_____4. What is the range?
a) 1 b) 4 c) 5 d) 3

_____5. What is the variance?
a) 1.66 b) 2.77 c) 1.76 d) 3.11

_____6. What is the standard deviation?
a) 1.66 b) 2.77 c) 1.76 d) 3.11

_____7. What is the 1st quartile?
a) 0 b) 1 c) 2 d) 3

_____8. What is the 70th percentile?
a) 1 b) 1.5 c) 2 d) 2.5

Questions 9-16 refer to the following frequency distribution. The following information represents the number of tons of grain stored at the 60 grain elevators of Central Soya, Inc.

Class       Frequency
160-164     6
165-169     14
170-174     16
175-179     13
180-184     11
Total       60

_____9. What is the relative frequency for the class 170-174?
a) 16 b) 36 c) 0.27 d) 0.60

_____10. What is the mean?
a) 170.25 b) 12 c) 172.75 d) 167

_____11. What is the median?
a) 169.5 b) 170 c) 172.63 d) 173.13

_____12. What is the mode?
a) 16 b) 170 c) 172 d) 174

_____13. What is the variance?
a) 39.68 b) 12.33 c) 151.55 d) 149.02

_____14. What is the standard deviation?
a) 6.30 b) 3.51 c) 12.31 d) 12.21

_____15. What is the 1st quartile?
a) 168.21 b) 167.07 c) 167.57 d) 167.71

_____16. What is the 70th percentile?
a) 176.35 b) 176.85 c) 176.81 d) 177.31

Part B: True or False (17-30)

_____17. The random sample is the most important, because statistical theory applies to it alone.
True

_____18. In a frequency distribution, the class mark is the number of observations that fall within that class.
False

_____19. Original class interval frequencies can be obtained by multiplying the respective relative frequencies by the total number of observations.
True

_____20. The sum of the class frequencies is equal to the number of observations made.
True

_____21. A relative frequency distribution describes the proportion of data values that fall within each category.
True

_____22. There would be no need for statistical theory if a census, rather than a sample, was always used to obtain information about populations.
True

_____23. The arithmetic mean is the sum of the data values divided by the number of observations.
True

_____24. \( P(A \cap B) \) is \( P(B \mid A)\,P(A) \)
True

_____25. The median always exists in a set of numerical data.
False

_____26. \( P(A \cap B) = 0 \) means that A and B are mutually exclusive events.
True

_____27. Mutually exclusive events imply that if one event occurs, the other cannot occur. An event (e.g., A) and its complement are always mutually exclusive.
True

_____28. \( P(A) + P(\bar{A}) = 1 \)
True

_____29. Events are independent when the occurrence of one event has no effect on the probability that another will occur.
True

_____30. P(x) always satisfies \( 0 \le P(x) \le 1 \).
True

Part C: Answer the following questions (31-39)

31. Name 3 types of statistical samples
1. Convenience Sampling
2. Probability (Random) Sampling
3. Mixed Sampling

32. Name reasons why we use samples instead of an entire population
Less time and cost
More reliability and accuracy
Ease of data processing
Scope and flexibility
Feasibility
Chance to replace non-response

33. Name types of random samples.
Simple Random Sampling
Stratified Random Sampling
Systematic Sampling
Cluster Sampling

34. Name the measure of central tendency.
Arithmetic Mean (Mean)
Median
Mode

35. Name measures of spread or variability.
Range
Quartile Deviation
Mean Deviation
Standard Deviation

36. Show that you understand the difference between a sampling error and a sampling bias.
Sampling error is the difference between the estimate of a parameter and the actual value of the parameter. Sampling bias is the inaccuracy of the estimate caused by flaws in the sampling method, such as non-representative sampling or improper weighting.

37. Name 4 parts of the regression.
Independent variable (Predictor)
Dependent variable (Response)
Regression Coefficients
Random Error

38. When do we use the regression?
For building a model of the relationship between X and Y
For predicting Y for a given X

39. Explain the concept of error and uncertainty as it relates to decision making.
Error is the difference between an actual value and the value predicted for it by a regression model. It is due to unavoidable variation in the process. Uncertainty refers to not knowing which of the possible outcomes will happen in the next trial. This situation arises in almost all business models, and in such cases probabilistic modeling is the better option for decision making.

Part D: Must show all your work step by step in order to receive the full credit; Excel is not allowed. (40-50)

40. Train A and Train B are two services that people can use to travel from Boston to New York. Samples of travel times, recorded in minutes, are shown below for each train.

Train A   28 29 32 37 33 25 29 32 41 34
Train B   29 31 33 32 34 30 31 32 35 33

Please answer the following questions (a-i).

a) Mean (Train A)
\[ \bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \]

b) Mean (Train B)
\[ \bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \]

c) Median (Train A)
Ordered series: 25 28 29 29 32 32 33 34 37 41
The median is the mean of the 5th and 6th terms: Median = (32 + 32)/2 = 32

d) Median (Train B)
Ordered series: 29 30 31 31 32 32 33 33 34 35
The median is the mean of the 5th and 6th terms: Median = (32 + 32)/2 = 32

e) Variance (Train A)
\[ s^2 = \frac{\sum X^2 - (\sum X)^2/n}{n-1} = \frac{10434 - (320)^2/10}{10-1} = 21.5556 \]

f) Variance (Train B)
\[ s^2 = \frac{\sum X^2 - (\sum X)^2/n}{n-1} = \frac{10270 - (320)^2/10}{10-1} = 3.3333 \]

g) Standard deviation (Train A)
\[ s = \sqrt{\text{Variance}} = \sqrt{21.5556} = 4.6428 \]

h) Standard deviation (Train B)
\[ s = \sqrt{\text{Variance}} = \sqrt{3.3333} = 1.8257 \]

i) From above computed results from Train A and Train B, what train should be preferred and why?
The two trains have the same mean and the same median travel time (32 minutes), but Train B has a much lower variance (3.33 versus 21.56), so its travel times are more consistent. Train B should be preferred.

41. Use the information from the problem number 5 on page 3-21 to fill in the given table below and answer the following questions (a-h) (**Using D-method).

Miles flown (1,000)   Frequent flyers (f)   CF   MidPoint (x)   d    fd    d²   fd²
38 - 45               25                    25   41.5           -1   -25   1    25
45 - 52               32                    57   48.5           0    0     0    0
52 - 59               8                     65   55.5           1    8     1    8
59 - 66               12                    77   62.5           2    24    4    48
66 - 73               3                     80   69.5           3    9     9    27
73 - 80               5                     85   76.5           4    20    16   80
Total                 N = 85                                         36         188

Here A = 48.5 is the assumed mean (the midpoint of the class with d = 0) and C = 7 is the class width. Because d is measured in class widths, results in d units are scaled back by C (or C² for the variance).

a) Find mean
\[ \text{Mean} = A + \frac{\sum fd}{N} \times C = 48.5 + \frac{36}{85} \times 7 = 51.46 \]

b) Find median
\[ \text{Median} = l + \frac{N/2 - m}{f} \times C = 45 + \frac{42.5 - 25}{32} \times 7 = 48.83 \]

c) Find mode
\[ \text{Mode} = l + \frac{f_0 - f_1}{2f_0 - f_1 - f_2} \times C = 45 + \frac{32 - 25}{2(32) - 25 - 8} \times 7 = 46.58 \]

d) Find Range
Range = 80 - 38 = 42

e) Find variance
\[ \text{Variance} = C^2 \times \frac{\sum fd^2 - (\sum fd)^2/N}{N-1} = 49 \times \frac{188 - (36)^2/85}{84} = 49 \times 2.0566 = 100.77 \]

f) Find standard deviation
\[ \text{Standard deviation} = \sqrt{\text{Variance}} = 7 \times \sqrt{2.0566} = 7 \times 1.4341 = 10.04 \]

g) Find 1st Quartile
N/4 = 21.25 falls in the first class, 38 - 45 (CF = 25), so l = 38, m = 0 and f = 25.
\[ Q_1 = l_1 + \frac{N/4 - m_1}{f_1} \times C = 38 + \frac{21.25 - 0}{25} \times 7 = 43.95 \]

h) Find 99th percentile
99N/100 = 84.15 falls in the last class, 73 - 80 (preceding CF = 80), so l = 73, m = 80 and f = 5.
\[ P_{99} = l_{99} + \frac{99N/100 - m_{99}}{f_{99}} \times C = 73 + \frac{84.15 - 80}{5} \times 7 = 78.81 \]

42. Use the information from the problem number 1 on page 3-39 to fill in the given table below and answer the following questions (a-h) (**Using D-method).
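Dinner Check ($)   Frequency (f)   CF   MidPoint (x)   d    fd    d²   fd²
22 - 31            5               5    26.5           -2   -10   4    20
32 - 41            4               9    36.5           -1   -4    1    4
42 - 51            9               18   46.5           0    0     0    0
52 - 61            2               20   56.5           1    2     1    2
62 - 71            13              33   66.5           2    26    4    52
72 - 81            7               40   76.5           3    21    9    63
Total              N = 40                                   35         141

Here A = 46.5 is the assumed mean (the midpoint of the class with d = 0) and C = 10 is the class width. Because d is measured in class widths, results in d units are scaled back by C (or C² for the variance).

a) Find mean
\[ \text{Mean} = A + \frac{\sum fd}{N} \times C = 46.5 + \frac{35}{40} \times 10 = 55.25 \]

b) Find median
\[ \text{Median} = l + \frac{N/2 - m}{f} \times C = 62 + \frac{20 - 20}{13} \times 10 = 62 \]

c) Find mode
\[ \text{Mode} = l + \frac{f_0 - f_1}{2f_0 - f_1 - f_2} \times C = 62 + \frac{13 - 2}{2(13) - 2 - 7} \times 10 = 68.47 \]

d) Find Range
Range = 81 - 22 = 59

e) Find variance
\[ \text{Variance} = C^2 \times \frac{\sum fd^2 - (\sum fd)^2/N}{N-1} = 100 \times \frac{141 - (35)^2/40}{39} = 100 \times 2.8301 = 283.01 \]

f) Find standard deviation
\[ \text{Standard deviation} = \sqrt{\text{Variance}} = 10 \times \sqrt{2.8301} = 10 \times 1.6823 = 16.82 \]

g) Find 3rd Quartile
\[ Q_3 = l_3 + \frac{3N/4 - m_3}{f_3} \times C = 62 + \frac{30 - 20}{13} \times 10 = 69.69 \]

h) Find 80th percentile
\[ P_{80} = l_{80} + \frac{80N/100 - m_{80}}{f_{80}} \times C = 62 + \frac{32 - 20}{13} \times 10 = 71.23 \]

43. The mean GMAT score of 65 applicants who were accepted into the MBA program of Xavier Business School was 520 with variance of 225. About how many applicants scored between 470 and 570 on the GMAT?

Assuming the scores are approximately normally distributed, with standard deviation \( \sqrt{225} = 15 \):
\[ N \cdot P[470 \le X \le 570] = 65 \cdot P\left[\frac{470 - 520}{\sqrt{225}} \le Z \le \frac{570 - 520}{\sqrt{225}}\right] = 65 \cdot P[-3.33 \le Z \le 3.33] \]
\[ = 65\,\{\Phi(3.33) - \Phi(-3.33)\} = 65\,\{0.9996 - 0.0004\} = 65 \times 0.9992 = 64.95 \approx 65 \]

44. Work on problem number 11 on page 3-49 (a-b).
In a random sample of 200 automobile insurance claims obtained from All State Insurance Company, Mean = $615 and the Standard Deviation = $135. Calculate the following intervals:
a) An interval that has at least 75% of the data within it.
b) An interval that has at least 89% of the data within it.

By Chebyshev's inequality,
\[ P\{|X - \mu| \le k\sigma\} \ge 1 - \frac{1}{k^2} \]

a) If we select k = 2, then
\[ P\{|X - \mu| \le 2\sigma\} \ge 1 - \frac{1}{2^2} = 0.75 \]
Therefore the required interval is
\[ \{\mu - 2\sigma,\ \mu + 2\sigma\} = \{615 - 2 \times 135,\ 615 + 2 \times 135\} = \{345,\ 885\} \]

b) If we select k = 3, then
\[ P\{|X - \mu| \le 3\sigma\} \ge 1 - \frac{1}{3^2} = 0.89 \]
Therefore the required interval is
\[ \{\mu - 3\sigma,\ \mu + 3\sigma\} = \{615 - 3 \times 135,\ 615 + 3 \times 135\} = \{210,\ 1020\} \]

45.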
Use the information from the problem number 2 on page 4-8 and answer the following questions (a-h); Given that Y = Income rate and X = Average expense

The working below takes income rate as x (mean 73.67) and average response as y (mean 62.71).

        Income Rate (x)   Average Response (y)   (X − X̄)²    (Y − Ȳ)²    (X − X̄)(Y − Ȳ)
        80.7              70.5                   49.4008     60.6173     54.7224
        67.9              54.7                   33.3094     64.2288     46.2539
        72                61.1                   2.7937      2.6059      2.6982
        73.5              62.2                   0.0294      0.2645      0.0882
        73.4              65                     0.0737      5.2245      -0.6204
        78.7              66.8                   25.2865     16.6931     20.5453
        69.5              58.7                   17.4008     16.1145     16.7453
Total   515.7             439                    128.2943    165.7486    140.4329
Mean    73.67             62.71

a) Find b0 (using b1 from part b)
\[ b_0 = \bar{y} - b_1\bar{x} = 62.71 - 1.095 \times 73.67 = -17.928 \]

b) Find b1
\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{140.4329}{128.2943} = 1.095 \]

c) Interpret the meaning of b1
For every one unit increase in income rate, average response increases by 1.095 units.

d) Find the regression equation
\[ \hat{y} = -17.928 + 1.095x \]

e) Predict the income rate for an area with an average expense of $75
\[ \hat{y} = -17.928 + 1.095 \times 75 \approx 64.17 \quad \text{(using unrounded coefficients)} \]

f) Compute the coefficient of determination
\[ R^2 = \frac{\left(\sum (X - \bar{X})(Y - \bar{Y})\right)^2}{\left(\sum (X - \bar{X})^2\right)\left(\sum (Y - \bar{Y})^2\right)} = \frac{(140.4329)^2}{128.2943 \times 165.7486} = 0.9274 = 92.74\% \]

g) Interpret the coefficient of determination
92.74% of the variation in average response is explained by the regression on income rate.

h) Compute and interpret the coefficient of correlation
\[ r = +\sqrt{R^2} = +\sqrt{0.9274} = 0.9630 \quad \text{[since } b_1 \text{ is positive, } r \text{ is also positive]} \]
There is a strong positive correlation between X and Y, indicated by r > 0.70.

46. An educator wants to see how strong the relationship is between a student's score on a test and his or her grade-point average. The data obtained from the sample are shown.

        Test Score (X)   GPA (Y)   (X − X̄)²   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
        98               2.1       36         0.4389     3.975
        105              2.4       1          0.1314     -0.3625
        100              3.2       16         0.1914     -1.75
        100              2.7       16         0.0039     0.25
        106              2.2       4          0.3164     -1.125
        95               2.3       81         0.2139     4.1625
        116              3.8       144        1.0764     12.45
        112              3.4       64         0.4064     5.1
Total   832              22.1      362        2.7788     22.7
Mean    104              2.7625

a) Find b0 (using b1 from part b)
\[ b_0 = \bar{y} - b_1\bar{x} = 2.7625 - 0.0627 \times 104 = -3.759 \]

b) Find b1
\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{22.7}{362} = 0.0627 \]

c) Interpret the meaning of b1
For every one point increase in test score, GPA increases by 0.0627 points.

d) Find the regression equation
\[ \hat{y} = -3.759 + 0.0627x \]

e) Predict the GPA for a student who gets 145 in the test score.
\[ \hat{y}_{145} = -3.759 + 0.0627 \times 145 = 5.333 \]
A score of 145 lies well outside the observed range of 95 to 116, so this prediction is an extrapolation.

f) Compute the coefficient of determination
\[ R^2 = \frac{\left(\sum (X - \bar{X})(Y - \bar{Y})\right)^2}{\left(\sum (X - \bar{X})^2\right)\left(\sum (Y - \bar{Y})^2\right)} = \frac{(22.7)^2}{362 \times 2.7788} = 0.5123 = 51.23\% \]

g) Interpret the coefficient of determination
51.23% of the variation in GPA is explained by the regression on test score.

h) Compute and interpret the coefficient of correlation
\[ r = +\sqrt{R^2} = +\sqrt{0.5123} = 0.7157 \quad \text{[since } b_1 \text{ is positive, } r \text{ is also positive]} \]
There is a strong positive correlation between X and Y, indicated by r > 0.70.

47. Please use the given printout and answer the following questions.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.190885429
R Square            0.036437247
Adjusted R Square   -0.44534413
Standard Error      2.502630195
Observations        4

ANOVA
             df   SS            MS           F           Significance F
Regression   1    0.473684211   0.47368421   0.0756303   0.809114571
Residual     2    12.52631579   6.26315789
Total        3    13

             Coefficients   Standard Error   t Stat       P-value     Lower 95%      Upper 95%
Intercept    2.894736842    1.904216054      1.52017248   0.2678372   -5.298443561   11.08792
X Variable   -0.315789474   1.148285486      -0.2750095   0.8091146   -5.256463153   4.624884

a) What is b0
\( b_0 = 2.895 \)

b) What is b1
\( b_1 = -0.316 \)

c) Interpret the meaning of b1
Every one unit increase in X results in a decrease of 0.316 units in Y.

d) What is the regression equation
\[ \hat{y} = 2.895 - 0.316x \]

e) What is the coefficient of determination
\( R^2 = 0.0364 = 3.64\% \)

f) What is the coefficient of correlation
\( r = -0.191 \) (negative because b1 is negative)

g) Interpret the coefficient of determination
3.64% of the variation in Y is explained by the regression on X.

h) Interpret the coefficient of correlation
There is a weak negative correlation between X and Y, indicated by -0.20 < r = -0.191 < 0.

48. Use the cross classification below, which shows the frequencies of hair and eye color for a group of students at North Texas, to answer the following questions (a-d).

                         Eye Color
Hair Color     Brown   Blue   Green   Hazel   Total
Blond          30      18     7       2       57
Brown          74      28     10      7       119
Red/Auburn     17      15     5       3       40
Black          43      11     6       1       61
Total          164     72     28      13      277

a) Probability that a student will have blue eyes given he/she is blond.
\[ P(\text{Blue} \mid \text{Blond}) = \frac{P(\text{Blue} \cap \text{Blond})}{P(\text{Blond})} = \frac{18/277}{57/277} = \frac{18}{57} = 0.3158 \]

b) Probability of black hair.
\[ P(\text{Black}) = \frac{n(\text{Black})}{N} = \frac{61}{277} = 0.2202 \]

c) Probability that a student will have either red hair or green eyes.
\[ P(\text{Red hair or Green eyes}) = P(\text{Red hair}) + P(\text{Green eyes}) - P(\text{Red hair and Green eyes}) = \frac{40}{277} + \frac{28}{277} - \frac{5}{277} = \frac{63}{277} = 0.2274 \]

d) Probability that a student will have brown hair and brown eyes.
\[ P(\text{Brown hair and Brown eyes}) = \frac{n(\text{Brown hair} \cap \text{Brown eyes})}{N} = \frac{74}{277} = 0.2671 \]

49. Work on problem number 33 on page 5-19 (a-d)

Grade   Freshman   Sophomore   Junior   Senior   Total
A       0          8           9        10       27
B       0          6           8        11       25
C       1          7           9        12       29
D       2          4           1        4        11
F       0          6           2        1        9
Total   3          31          29       38       101

a)
\[ P(\text{Junior and B}) = \frac{n(\text{Junior} \cap \text{B})}{N} = \frac{8}{101} = 0.0792 \]

b)
\[ P(\text{Not A} \mid \text{Senior}) = \frac{P(\text{Not A} \cap \text{Senior})}{P(\text{Senior})} = \frac{28/101}{38/101} = \frac{28}{38} = 0.7368 \]

c)
\[ P(\text{D or F}) = P(\text{D}) + P(\text{F}) = \frac{11}{101} + \frac{9}{101} = \frac{20}{101} = 0.1980 \quad \text{(since D and F are disjoint)} \]

d) With A = the student is a sophomore and B = the student gets a C:
\[ P(A) = \frac{n(\text{Sophomore})}{N} = \frac{31}{101} = 0.3069 \]
\[ P(A \mid B) = \frac{P(\text{Sophomore} \cap \text{getting a C})}{P(\text{getting a C})} = \frac{7/101}{29/101} = \frac{7}{29} = 0.2414 \]
Since \( P(A \mid B) \ne P(A) \), the events A and B are not independent.
Since P(A and B) = 7/101 is not equal to 0, the events A and B are not mutually exclusive.

50. Work on problem number 17 on page 5-16 (a-e)

                 Ethnic Group
Gender    A    B    C    Total
Male      8    6    10   24
Female    2    9    15   26
Total     10   15   25   50

a)
\[ P(\text{A and F}) = \frac{n(A \cap F)}{N} = \frac{2}{50} = 0.04 \]

b)
\[ P(\text{A or F or both}) = P(A) + P(F) - P(\text{A and F}) = \frac{n(A)}{N} + \frac{n(F)}{N} - \frac{n(A \cap F)}{N} = \frac{10}{50} + \frac{26}{50} - \frac{2}{50} = \frac{34}{50} = 0.68 \]

c)
\[ P(F \mid B) = \frac{P(F \cap B)}{P(B)} = \frac{9/50}{15/50} = \frac{9}{15} = 0.6 \]

d)
\[ P(\text{A and B}) = \frac{n(A \cap B)}{N} = \frac{0}{50} = 0 \quad \text{[since the ethnic groups are disjoint]} \]

e)
\[ P(\text{F and C}) = \frac{n(F \cap C)}{N} = \frac{15}{50} = 0.30 \]
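The D-method (step-deviation) arithmetic used in problems 41 and 42 can be cross-checked with a short script. The sketch below is only an illustration: the function names `grouped_stats` and `grouped_percentile` are hypothetical, it assumes equal class widths, and it uses the sample (N − 1) form of the variance together with the lower class limit as l, as the worked solutions do.

```python
from math import sqrt


def grouped_stats(midpoints, width, freqs, assumed_index):
    """Mean, sample variance and SD of grouped data by the step-deviation method.

    d is counted in class widths from the class at assumed_index, so the
    mean is scaled back by the width C and the variance by C squared.
    """
    n = sum(freqs)
    a = midpoints[assumed_index]                      # assumed mean A
    d = [i - assumed_index for i in range(len(freqs))]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = a + width * sum_fd / n
    variance = width ** 2 * (sum_fd2 - sum_fd ** 2 / n) / (n - 1)
    return mean, variance, sqrt(variance)


def grouped_percentile(lower_limits, width, freqs, p):
    """p-th percentile by linear interpolation inside the containing class."""
    target = p * sum(freqs) / 100
    cumulative = 0                                    # CF of the preceding classes
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must be between 0 and 100")


# Problem 41: miles flown (1,000), class width 7, assumed mean 48.5.
miles_lower = [38, 45, 52, 59, 66, 73]
miles_mid = [41.5, 48.5, 55.5, 62.5, 69.5, 76.5]
miles_freq = [25, 32, 8, 12, 3, 5]

mean_41, var_41, sd_41 = grouped_stats(miles_mid, 7, miles_freq, assumed_index=1)
print("mean:", round(mean_41, 2), "variance:", round(var_41, 2), "sd:", round(sd_41, 2))
print("median:", round(grouped_percentile(miles_lower, 7, miles_freq, 50), 2))
print("Q1:", round(grouped_percentile(miles_lower, 7, miles_freq, 25), 2))
```

Passing the midpoints explicitly keeps the function usable for problem 42 as well, where the classes are written with inclusive limits (22 - 31, midpoint 26.5).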
The random sample is the most important, because statistical theory applies to it alone. True _____18. In a frequency distribution, the class mark is the number of observations that fall within that class. False _____19. Original class interval frequencies can be obtained by multiplying the respective relative frequencies by the total number of observations. True _____20. The sum of the class frequencies is equal to the number of observations made. True _____21. A relative frequency distribution describes the proportion of data values that fall within each category. True _____22. There would be no need for statistical theory if census, rather than a sample was always used to obtain information about populations. True _____23. The arithmetic mean is the sum of the data values divided by the number of observations. True _____24. P ( A B ) is P ( B| A )P ( A ) True _____25. The median always exists in a set of numerical data. _____26. False P ( A B ) =0 means that A and B are mutually exclusive events. True _____27. Mutually exclusive events imply that if one event occurs, the other cannot occur. An event (e.g., A ) and its complement are always mutually exclusive. _____28. P ( A )+ P ( A )=1 True True _____29. Events are independent when the occurrence of one event has no effect on the probability that another will occur. True _____30. The P(x) is always 0 P(x) 1. True Part C: Answer the following questions (31-39) 31. Name 3 types of statistical samples 1 Convenience Sampling 3 2 Probability (Random) Sampling Mixed Sampling 32. Name reasons why we use samples instead of an entire population Less time and cost More reliability and accuracy Ease of data processing Scope and Flexibility Feasibility Chance to replace non-response 33. Name types of random samples. Simple Random Sampling Stratified Random Sampling Systematic Sampling Cluster Sampling 34. Name the measure of central tendency. Arithmetic Mean or Mean Median Mode 35. Name measures of spread or variability. 
- Range
- Quartile Deviation
- Mean Deviation
- Standard Deviation

36. Show that you understand the difference between a sampling error and a sampling bias.
Sampling error is the difference between the estimate of a parameter and the actual value of that parameter. Sampling bias is the inaccuracy of the estimate caused by flaws in the sampling method, such as non-representative sampling or improper weighting.

37. Name 4 parts of the regression.
- Independent variable or Predictor
- Dependent variable or Response
- Regression Coefficients
- Random Error

38. When do we use the regression?
- For model building between X and Y
- For predicting Y for a given X

39. Explain the concept of error and uncertainty as it relates to decision making.
Error is the difference between the actual value and the value predicted by a regression model. It is due to unavoidable variation in the process. Uncertainty means not knowing which of the possible outcomes will happen in the next trial. Almost all business models involve this kind of situation, and in such cases probabilistic modeling is a better option for decision making.

Part D: Must show all your work step by step in order to receive the full credit; Excel is not allowed. (40-50)

40. Train A and Train B are two trains that people can take from Boston to New York. Samples of travel times, recorded in minutes, for each train are shown below.

Train A: 28 29 32 37 33 25 29 32 41 34
Train B: 29 31 33 32 34 30 31 32 35 33

Please answer the following questions (a-i).

a) Mean (Train A)
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$

b) Mean (Train B)
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$

c) Median (Train A)
Ordered series: 25 28 29 29 32 32 33 34 37 41
Median is the mean of the 5th and 6th terms.

d) Median (Train B)
Ordered series: 29 30 31 31 32 32 33 33 34 35
Median is the mean of the 5th and 6th terms.
Median (Train A) = (32 + 32)/2 = 32
Median (Train B) = (32 + 32)/2 = 32

e) Variance (Train A)
$s^2 = \dfrac{\sum X^2 - (\sum X)^2/n}{n-1} = \dfrac{10434 - (320)^2/10}{10-1} = 21.5556$

f) Variance (Train B)
$s^2 = \dfrac{\sum X^2 - (\sum X)^2/n}{n-1} = \dfrac{10270 - (320)^2/10}{10-1} = 3.3333$

g) Standard deviation (Train A)
$s = \sqrt{\text{Variance}} = \sqrt{21.5556} = 4.6428$

h) Standard deviation (Train B)
$s = \sqrt{\text{Variance}} = \sqrt{3.3333} = 1.8257$

i) From above computed results from Train A and Train B, what train should be preferred and why?
Both trains have the same mean and median travel time (32 minutes), but Train B has the lower variance, so its travel times are more consistent. Train B should be preferred.

41. Use the information from the problem number 5 on page 3-21 to fill in the given table below and answer the following questions (a-h) (**Using D-method).

Number of miles flown (1,000) | Number of frequent flyers (f) | CF | MidPoint (x) | d | fd | $d^2$ | $fd^2$
38 - 45 | 25 | 25 | 41.5 | -1 | -25 | 1 | 25
45 - 52 | 32 | 57 | 48.5 | 0 | 0 | 0 | 0
52 - 59 | 8 | 65 | 55.5 | 1 | 8 | 1 | 8
59 - 66 | 12 | 77 | 62.5 | 2 | 24 | 4 | 48
66 - 73 | 3 | 80 | 69.5 | 3 | 9 | 9 | 27
73 - 80 | 5 | 85 | 76.5 | 4 | 20 | 16 | 80
Total | N = 85 | | | | 36 | | 188

Here A = 48.5 is the assumed mean, C = 7 is the class width, and d = (x - A)/C.

a) Find mean
Mean $= A + \dfrac{\sum fd}{N} \times C = 48.5 + \dfrac{36}{85} \times 7 = 51.46$

b) Find median
Median $= l + \dfrac{N/2 - m}{f} \times C = 45 + \dfrac{42.5 - 25}{32} \times 7 = 48.83$

c) Find mode
Mode $= l + \dfrac{f_0 - f_1}{2f_0 - f_1 - f_2} \times C = 45 + \dfrac{32 - 25}{2 \times 32 - 25 - 8} \times 7 = 46.58$

d) Find range
Range = 80 - 38 = 42

e) Find variance
Variance $= C^2 \times \dfrac{\sum fd^2 - (\sum fd)^2/N}{N-1} = 49 \times \dfrac{188 - (36)^2/85}{84} = 49 \times 2.0566 = 100.77$

f) Find standard deviation
Standard deviation $= \sqrt{\text{Variance}} = \sqrt{100.77} = 10.04$

g) Find 1st quartile
N/4 = 21.25 falls in the first class (CF = 25), so
$Q_1 = l_1 + \dfrac{N/4 - m_1}{f_1} \times C = 38 + \dfrac{21.25 - 0}{25} \times 7 = 43.95$

h) Find 99th percentile [the work shown computes the 90th percentile, $P_{90}$]
$P_{90} = l_{90} + \dfrac{90N/100 - m_{90}}{f_{90}} \times C = 59 + \dfrac{76.5 - 65}{12} \times 7 = 65.71$

42. Use the information from the problem number 1 on page 3-39 to fill in the given table below and answer the following questions (a-h) (**Using D-method).
Dinner Check ($) | Frequency (f) | CF | MidPoint (x) | d | fd | d^2 | fd^2
22 - 31 | 5 | 5 | 26.5 | -2 | -10 | 4 | 20
32 - 41 | 4 | 9 | 36.5 | -1 | -4 | 1 | 4
42 - 51 | 9 | 18 | 46.5 | 0 | 0 | 0 | 0
52 - 61 | 2 | 20 | 56.5 | 1 | 2 | 1 | 2
62 - 71 | 13 | 33 | 66.5 | 2 | 26 | 4 | 52
72 - 81 | 7 | 40 | 76.5 | 3 | 21 | 9 | 63
Total | 40 | | | | 35 | | 141

Here A = 46.5 is the assumed mean, C = 10 is the class width, and d = (x - A)/C.

a) Find mean
Mean $= A + \dfrac{\sum fd}{N} \times C = 46.5 + \dfrac{35}{40} \times 10 = 55.25$

b) Find median
Median $= l + \dfrac{N/2 - m}{f} \times C = 62 + \dfrac{20 - 20}{13} \times 10 = 62$

c) Find mode
Mode $= l + \dfrac{f_0 - f_1}{2f_0 - f_1 - f_2} \times C = 62 + \dfrac{13 - 2}{2 \times 13 - 2 - 7} \times 10 = 68.47$

d) Find range
Range = 80.5 - 21.5 = 59

e) Find variance
Variance $= C^2 \times \dfrac{\sum fd^2 - (\sum fd)^2/N}{N-1} = 100 \times \dfrac{141 - (35)^2/40}{39} = 100 \times 2.8301 = 283.01$

f) Find standard deviation
Standard deviation $= \sqrt{\text{Variance}} = \sqrt{283.01} = 16.82$

g) Find 3rd quartile
$Q_3 = l_3 + \dfrac{3N/4 - m_3}{f_3} \times C = 62 + \dfrac{30 - 20}{13} \times 10 = 69.69$

h) Find 80th percentile
$P_{80} = l_{80} + \dfrac{80N/100 - m_{80}}{f_{80}} \times C = 62 + \dfrac{32 - 20}{13} \times 10 = 71.23$

43. The mean GMAT score of 65 applicants who were accepted into the MBA program of Xavier Business School was 520 with variance of 225. About how many applicants scored between 470 and 570 on the GMAT?

$N \cdot P[470 \le X \le 570] = 65 \cdot P[470 \le X \le 570]$
$= 65 \cdot P\left[\dfrac{470 - 520}{\sqrt{225}} \le Z \le \dfrac{570 - 520}{\sqrt{225}}\right]$
$= 65 \cdot P[-3.33 \le Z \le 3.33]$
$= 65\,\{\Phi(3.33) - \Phi(-3.33)\}$
$= 65\,\{0.9996 - 0.0004\}$
$= 65 \times 0.9992 = 64.95 \approx 65$

44. Work on problem number 11 on page 3-49 (a-b). (below)
In a random sample of 200 automobile insurance claims obtained from All State Insurance Company, Mean = $615 and the Standard Deviation = $135. Calculate the following intervals:
a) An interval that has at least 75% of the data within it.
b) An interval that has at least 89% of the data within it.

By Chebyshev's inequality,
$P\{|X - \mu| \le k\sigma\} \ge 1 - \dfrac{1}{k^2}$

a) If we select k = 2, then
$P\{|X - \mu| \le 2\sigma\} \ge 1 - \dfrac{1}{2^2} = 0.75$
Therefore the required interval is
$\{\mu - 2\sigma,\ \mu + 2\sigma\} = \{615 - 2 \times 135,\ 615 + 2 \times 135\} = \{345,\ 885\}$

b) If we select k = 3, then
$P\{|X - \mu| \le 3\sigma\} \ge 1 - \dfrac{1}{3^2} = 0.89$
Therefore the required interval is
$\{\mu - 3\sigma,\ \mu + 3\sigma\} = \{615 - 3 \times 135,\ 615 + 3 \times 135\} = \{210,\ 1020\}$

45.
Use the information from the problem number 2 on page 4-8 and answer the following questions (a-h); Given that Y = Income rate and X = Average expense

 | Income Rate | Average Response | $(X - \bar{X})^2$ | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$
 | 80.7 | 70.5 | 49.4008 | 60.6173 | 54.7224
 | 67.9 | 54.7 | 33.3094 | 64.2288 | 46.2539
 | 72 | 61.1 | 2.7937 | 2.6059 | 2.6982
 | 73.5 | 62.2 | 0.0294 | 0.2645 | 0.0882
 | 73.4 | 65 | 0.0737 | 5.2245 | -0.6204
 | 78.7 | 66.8 | 25.2865 | 16.6931 | 20.5453
 | 69.5 | 58.7 | 17.4008 | 16.1145 | 16.7453
Total | 515.7 | 439 | 128.2943 | 165.7486 | 140.4329
Mean | 73.67 | 62.71 | | |

a) Find $b_0$
$b_0 = \bar{y} - b_1 \bar{x} = 62.71 - 1.095 \times 73.67 \approx -17.928$ (computed from the unrounded means and slope)

b) Find $b_1$
$b_1 = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{140.4329}{128.2943} = 1.095$

c) Interpret the meaning of $b_1$
For every one unit increase in income rate, average response will increase by 1.095 units.

d) Find the regression equation
$\hat{y} = -17.928 + 1.095x$

e) Predict the income rate for an area with an average expense of $75
$\hat{y} = -17.928 + 1.095 \times 75 \approx 64.17$ (using the unrounded coefficients; the rounded ones give 64.20)

f) Compute the coefficient of determination
$R^2 = \dfrac{\left(\sum (X - \bar{X})(Y - \bar{Y})\right)^2}{\left(\sum (X - \bar{X})^2\right)\left(\sum (Y - \bar{Y})^2\right)} = \dfrac{(140.4329)^2}{128.2943 \times 165.7486} = 0.9274 = 92.74\%$

g) Interpret the coefficient of determination
92.74% of the variation in Average response is explained by the regression model with Income rate as the predictor.

h) Compute and interpret the coefficient of correlation
$r = +\sqrt{R^2} = +\sqrt{0.9274} = 0.9630$ [since $b_1$ is positive, r is also positive]
There is a strong positive correlation between X and Y, indicated by r > 0.70.

46. An educator wants to see how strong the relationship is between a student's score on a test and his or her grade-point average. The data obtained from the sample are shown.

Test Score (X): 98 105 100 100 106 95 116 112
GPA (Y): 2.1 2.4 3.2 2.7 2.2 2.3 3.8 3.4

Totals: $\sum X = 832$, $\sum Y = 22.1$. Means: $\bar{X} = 104$, $\bar{Y} = 2.7625$.

Interpretation of $b_1$ (part c): For every one point increase in test score, GPA will increase by 0.0627 points.
Test Score (X) | $(X - \bar{X})^2$ | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$
98 | 36 | 0.4389 | 3.975
105 | 1 | 0.1314 | -0.3625
100 | 16 | 0.1914 | -1.75
100 | 16 | 0.0039 | 0.25
106 | 4 | 0.3164 | -1.125
95 | 81 | 0.2139 | 4.1625
116 | 144 | 1.0764 | 12.45
112 | 64 | 0.4064 | 5.1
Total | 362 | 2.7788 | 22.7

a) Find $b_0$
$b_0 = \bar{y} - b_1 \bar{x} = 2.7625 - 0.0627 \times 104 \approx -3.759$

b) Find $b_1$
$b_1 = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{22.7}{362} = 0.0627$

c) Interpret the meaning of $b_1$
$b_1$ is the slope of the fitted line, measured in GPA points per test-score point.

d) Find the regression equation
$\hat{y} = -3.759 + 0.0627x$

e) Predict the GPA for a student who gets 145 in the test score.
$\hat{y}_{145} = -3.759 + 0.0627 \times 145 = 5.333$

f) Compute the coefficient of determination
$R^2 = \dfrac{\left(\sum (X - \bar{X})(Y - \bar{Y})\right)^2}{\left(\sum (X - \bar{X})^2\right)\left(\sum (Y - \bar{Y})^2\right)} = \dfrac{(22.7)^2}{362 \times 2.7788} = 0.5123 = 51.23\%$

g) Interpret the coefficient of determination
51.23% of the variation in GPA is explained by the regression model with Test Score as the predictor.

h) Compute and interpret the coefficient of correlation
$r = +\sqrt{R^2} = +\sqrt{0.5123} = 0.7157$ [since $b_1$ is positive, r is also positive]
There is a strong positive correlation between X and Y, indicated by r > 0.70.

47. Please use the given printout and answer the following questions.

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.190885429
R Square | 0.036437247
Adjusted R Square | -0.44534413
Standard Error | 2.502630195
Observations | 4

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 0.473684211 | 0.47368421 | 0.0756303 | 0.809114571
Residual | 2 | 12.52631579 | 6.26315789 | |
Total | 3 | 13 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 2.894736842 | 1.904216054 | 1.52017248 | 0.2678372 | -5.298443561 | 11.08792
X Variable | -0.315789474 | 1.148285486 | -0.2750095 | 0.8091146 | -5.256463153 | 4.624884

a) What is $b_0$?
$b_0 = 2.895$

b) What is $b_1$?
$b_1 = -0.316$

c) Interpret the meaning of $b_1$
Every one unit increase in X results in a decrease of 0.316 units in Y.

d) What is the regression equation?
$\hat{y} = 2.895 - 0.316x$

e) What is the coefficient of determination?
$R^2 = 0.0364 = 3.64\%$

f) What is the coefficient of correlation?
$r = -0.191$

g) Interpret the coefficient of determination
3.64% of the variation in Y is explained by the regression model with X as the predictor.

h) Interpret the coefficient of correlation
There is a weak negative correlation between X and Y, indicated by -0.20 < r = -0.191 < 0.

48. Use the cross classification below, which shows the frequencies of hair and eye color of a group of students at North Texas, to answer the following questions (a-d).

Hair Color | Eye Color: Brown | Blue | Green | Hazel | Total
Blond | 30 | 18 | 7 | 2 | 57
Brown | 74 | 28 | 10 | 7 | 119
Red/Auburn | 17 | 15 | 5 | 3 | 40
Black | 43 | 11 | 6 | 1 | 61
Total | 164 | 72 | 28 | 13 | 277

a) Probability that a student will have blue eyes given he/she is blond.
$P(\text{Blue} \mid \text{Blond}) = \dfrac{P(\text{Blue} \cap \text{Blond})}{P(\text{Blond})} = \dfrac{18/277}{57/277} = \dfrac{18}{57} = 0.3158$

b) Probability of black hair.
$P(\text{Black}) = \dfrac{n(\text{Black})}{N} = \dfrac{61}{277} = 0.2202$

c) Probability that a student will have either red hair or green eyes.
P(Red hair or Green eyes) = P(Red hair) + P(Green eyes) - P(Red hair and Green eyes)
$= \dfrac{40}{277} + \dfrac{28}{277} - \dfrac{5}{277} = \dfrac{63}{277} = 0.2274$

d) Probability that a student will have brown hair and brown eyes.
$P(\text{Brown hair and Brown eyes}) = \dfrac{n(\text{Brown hair} \cap \text{Brown eyes})}{N} = \dfrac{74}{277} = 0.2671$

49. Work on problem number 33 on page 5-19 (a-d)

Grade | Freshman | Sophomore | Junior | Senior | Total
A | 0 | 8 | 9 | 10 | 27
B | 0 | 6 | 8 | 11 | 25
C | 1 | 7 | 9 | 12 | 29
D | 2 | 4 | 1 | 4 | 11
F | 0 | 6 | 2 | 1 | 9
Total | 3 | 31 | 29 | 38 | 101

a) $P(\text{Junior and B}) = \dfrac{n(\text{Junior} \cap \text{B})}{N} = \dfrac{8}{101} = 0.0792$

b) $P(\text{Not A} \mid \text{Senior}) = \dfrac{P(\text{Not A} \cap \text{Senior})}{P(\text{Senior})} = \dfrac{28/101}{38/101} = \dfrac{28}{38} = 0.7368$

c) P(D or F) = P(D) + P(F) (since disjoint)
$= \dfrac{11}{101} + \dfrac{9}{101} = \dfrac{20}{101} = 0.1980$

d) Let A be the event that a student is a sophomore and B the event that a student gets a C.
$P(A) = \dfrac{n(\text{Sophomore})}{N} = \dfrac{31}{101} = 0.3069$
$P(A \mid B) = \dfrac{P(\text{Sophomore getting a C})}{P(\text{Getting a C})} = \dfrac{7/101}{29/101} = \dfrac{7}{29} = 0.2414$
Since $P(A \mid B) \ne P(A)$, the events A and B are not independent.
Since P(A and B) is not equal to 0, the events A and B are not mutually exclusive.

50. Work on problem number 17 on page 5-16 (a-e)

Gender | Ethnic Group A | B | C | Total
Male | 8 | 6 | 10 | 24
Female | 2 | 9 | 15 | 26
Total | 10 | 15 | 25 | 50

a) $P(\text{A and F}) = \dfrac{n(A \cap F)}{N} = \dfrac{2}{50} = 0.04$

b) P(A or F or both) = P(A) + P(F) - P(A and F)
$= \dfrac{n(A)}{N} + \dfrac{n(F)}{N} - \dfrac{n(A \cap F)}{N} = \dfrac{10}{50} + \dfrac{26}{50} - \dfrac{2}{50} = \dfrac{34}{50} = 0.68$

c) - e)
$P(F \mid B) = \dfrac{P(F \cap B)}{P(B)} = \dfrac{9/50}{15/50} = \dfrac{9}{15} = 0.6$
$P(\text{A and B}) = \dfrac{n(A \cap B)}{N} = \dfrac{0}{50} = 0$ [since disjoint]
$P(\text{F and C}) = \dfrac{n(F \cap C)}{N} = \dfrac{15}{50} = 0.30$
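The contingency-table probabilities in Problem 50 can be double-checked with a short script. The following is a minimal sketch, not part of the original solution: the dictionary `counts` and the helper `prob` are hypothetical names introduced here for illustration, and exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction

# Counts copied from the Problem 50 table (gender x ethnic group).
# `counts` and `prob` are illustrative names, not from the original solution.
counts = {
    ("Male", "A"): 8, ("Male", "B"): 6, ("Male", "C"): 10,
    ("Female", "A"): 2, ("Female", "B"): 9, ("Female", "C"): 15,
}
N = sum(counts.values())  # total number of people in the table


def prob(gender=None, group=None):
    """Return P(gender and group); an argument left as None is summed over."""
    total = sum(
        n for (g, e), n in counts.items()
        if (gender is None or g == gender) and (group is None or e == group)
    )
    return Fraction(total, N)


# Joint probability, read directly from one cell.
p_a_and_f = prob("Female", "A")
# Addition rule: P(A or F) = P(A) + P(F) - P(A and F).
p_a_or_f = prob(group="A") + prob(gender="Female") - p_a_and_f
# Conditional probability: P(F | B) = P(F and B) / P(B).
p_f_given_b = prob("Female", "B") / prob(group="B")
p_f_and_c = prob("Female", "C")

for name, p in [
    ("P(A and F)", p_a_and_f),
    ("P(A or F or both)", p_a_or_f),
    ("P(F | B)", p_f_given_b),
    ("P(F and C)", p_f_and_c),
]:
    print(f"{name} = {p} = {float(p):.2f}")
```

P(A and B) is not computed because each person belongs to exactly one ethnic group, so that joint count is zero by construction. The same pattern would apply to the tables in Problems 48 and 49 after swapping in their counts.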