Question: Need help solving. Please show all work on numbers 1 and 2 at the bottom of the assignment.

1 Analysis of Variance (ANOVA)

In Chapter 13 we discussed methods for testing H0: μ1 − μ2 = 0 (i.e., μ1 = μ2), where μ1 and μ2 are the means of two different populations or the true mean responses when two different treatments are applied. Many investigations involve a comparison of more than two population or treatment means. For example, an investigation was carried out to study possible consequences of the high incidence of head injuries among soccer players ("No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players," The American Journal of Sports Medicine [2002]: 157-162). Three groups of college students (soccer athletes, non-soccer athletes, and a control group consisting of students who did not participate in intercollegiate sports) were considered in the study, and the following information on scores from the Hopkins Verbal Learning Test (which measures immediate memory recall) was given in the paper:

Group                  Sample Size   Sample Mean Score   Sample Standard Deviation
Soccer athletes        86            29.90               3.73
Non-soccer athletes    95            30.94               5.14
Control                53            29.32               3.78

Let μ1, μ2, and μ3 denote the true average (i.e., population mean) scores on the Hopkins test for soccer athletes, non-soccer athletes, and the control group (the students who do not participate in collegiate athletics), respectively. Do the data support the claim that μ1 = μ2 = μ3, or does it appear that at least two of the μ's are different from one another? This is an example of a single-factor analysis of variance (ANOVA) problem, in which the objective is to decide whether the means for more than two populations or treatments are identical.

2 Single-Factor ANOVA and the F Test

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor under investigation. For example, an experiment might be carried out to compare three different methods for teaching reading (three different treatments), in which case the factor of interest would be teaching method, a qualitative factor. If the growth of fish raised in waters having different salinity levels (0%, 10%, 20%, and 30%) is of interest, the factor salinity level is quantitative.

A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means μ1, μ2, ..., μk. The objective is to test

H0: μ1 = μ2 = ... = μk

against

Ha: at least two of the μ's are different.

When comparing populations, the analysis is based on independently selected random samples, one from each population. When comparing treatment means, the data typically result from an experiment, and the analysis assumes random assignment of the experimental units (subjects or objects) to treatments. Whether the null hypothesis of a single-factor ANOVA should be rejected depends on how substantially the samples from the different populations or treatments differ from one another.
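To make this comparison concrete for the soccer study above, here is a minimal Python sketch (the variable names are my own, not from the source) that collects the published summary statistics and contrasts the spread among the three sample means with the within-sample standard deviations. The F test developed below formalizes exactly this comparison of between-sample and within-sample variability.

```python
# Published summary statistics from the soccer head-injury study
# (Hopkins Verbal Learning Test scores).
groups = {
    "soccer athletes":     {"n": 86, "mean": 29.90, "sd": 3.73},
    "non-soccer athletes": {"n": 95, "mean": 30.94, "sd": 5.14},
    "control":             {"n": 53, "mean": 29.32, "sd": 3.78},
}

means = [g["mean"] for g in groups.values()]

# How far apart are the sample means, compared with the spread within each sample?
print("largest difference between sample means:",
      round(max(means) - min(means), 2))          # 1.62
print("sample standard deviations:",
      [g["sd"] for g in groups.values()])         # 3.73, 5.14, 3.78
```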
2.1 Notation and Assumptions

Notation in single-factor ANOVA is a natural extension of the notation used in Chapter 11 for comparing two population or treatment means.

ANOVA Notation

k = number of populations or treatments being compared

Population or treatment             1      2      ...   k
Population or treatment mean        μ1     μ2     ...   μk
Population or treatment variance    σ1²    σ2²    ...   σk²
Sample size                         n1     n2     ...   nk
Sample mean                         x̄1     x̄2     ...   x̄k
Sample variance                     s1²    s2²    ...   sk²

N = n1 + n2 + ... + nk (the total number of observations in the data set)
T = grand total = sum of all N observations = n1x̄1 + n2x̄2 + ... + nkx̄k
x̄ = grand mean = T / N

A decision between H0 and Ha is based on examining the x̄ values to see whether observed discrepancies are small enough to be attributable simply to sampling variability or whether an alternative explanation for the differences is more plausible.

2.2 Example 15.1 An Indicator of Heart Attack Risk

The article "Could Mean Platelet Volume Be a Predictive Marker for Acute Myocardial Infarction?" (Medical Science Monitor [2005]: 387-392) described an experiment in which four groups of patients seeking treatment for chest pain were compared with respect to mean platelet volume (MPV, measured in fL). The four groups considered were based on the clinical diagnosis and were (1) noncardiac chest pain, (2) stable angina pectoris, (3) unstable angina pectoris, and (4) myocardial infarction (heart attack). The purpose of the study was to determine whether the mean MPV was different for the heart attack group, because then MPV could be used as an indicator of heart attack risk and an antiplatelet treatment could be administered in a timely fashion, potentially reducing the risk of heart attack.

To carry out this study, patients seen for chest pain were divided into groups according to diagnosis. The researchers then selected a random sample of 35 from each of the resulting k = 4 groups. The researchers believed that this sampling process would result in samples that were representative of the four populations of interest and that could be regarded as if they were random samples from these four populations. Table 15.1 presents summary values given in the paper.

Table 15.1 Summary Values for MPV Data of Example 15.1

Group Number   Group Description                       Sample Size   Sample Mean   Sample Standard Deviation
1              Noncardiac chest pain                   35            10.89         0.69
2              Stable angina pectoris                  35            11.25         0.74
3              Unstable angina pectoris                35            11.37         0.91
4              Myocardial infarction (heart attack)    35            11.75         1.07

With μi denoting the true mean MPV for group i (i = 1, 2, 3, 4), let's consider the null hypothesis H0: μ1 = μ2 = μ3 = μ4. Comparing the given sample means, the mean MPV for the heart attack sample is larger than for the other three samples, but that sample also has the largest standard deviation. So it is not obvious whether H0 is true or false. In situations such as this, we need a formal test procedure.

As with the inferential methods of previous chapters, the validity of the ANOVA test for H0: μ1 = μ2 = ... = μk requires some assumptions.

2.3 Assumptions for ANOVA

1. Each of the k population or treatment response distributions is normal.
2. σ1 = σ2 = ... = σk (The k normal distributions have identical standard deviations.)
3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.
4. When comparing population means, the k random samples are selected independently of one another. When comparing treatment means, treatments are assigned at random to subjects or objects (or subjects are assigned at random to treatments).

In practice, the test based on these assumptions works well as long as the assumptions are not too badly violated.
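The next portion of the text describes how these assumptions can be checked informally. As a preview, here is a minimal Python sketch (the group labels and variable names are my own) of the standard-deviation check applied to the Table 15.1 data, using the two-to-one guideline described below.

```python
# Sample standard deviations from Table 15.1 (MPV, in fL), one per group.
sds = {
    "noncardiac chest pain":    0.69,
    "stable angina pectoris":   0.74,
    "unstable angina pectoris": 0.91,
    "myocardial infarction":    1.07,
}

largest, smallest = max(sds.values()), min(sds.values())
ratio = largest / smallest
print(f"largest / smallest sample SD = {largest} / {smallest} = {ratio:.2f}")  # about 1.55

# Rule of thumb described in the text: the ANOVA F test is considered safe to use
# when the largest sample standard deviation is at most twice the smallest.
print("within the two-to-one guideline:", ratio <= 2)
```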
If the sample sizes are reasonably large, normal probability plots of the data in each sample are helpful in checking the assumption of normality. Often, however, sample sizes are so small that a separate normal probability plot for each sample is of little value in checking normality. There is a formal procedure for testing the equality of population standard deviations. Unfortunately, it is quite sensitive to even a small departure from the normality assumption, so we do not recommend its use. Instead, we suggest that the ANOVA F test (to be described subsequently) can safely be used if the largest of the sample standard deviations is at most twice the smallest one. The largest standard deviation in Example 15.1 is s4 = 1.07, which is only about 1.5 times the smallest standard deviation (s1 = 0.69).

The test procedure is based on the following measures of variation in the data.

Definition

A measure of disparity among the sample means is the treatment sum of squares, denoted by SSTr and given by

SSTr = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + ... + nk(x̄k − x̄)²

A measure of variation within the k samples, called the error sum of squares and denoted by SSE, is

SSE = (n1 − 1)s1² + (n2 − 1)s2² + ... + (nk − 1)sk²

Each sum of squares has an associated degrees of freedom:

treatment df = k − 1
error df = N − k

A mean square is a sum of squares divided by its df. In particular,

mean square for treatments = MSTr = SSTr / (k − 1)
mean square for error = MSE = SSE / (N − k)

The number of error degrees of freedom comes from adding the number of degrees of freedom associated with each of the sample variances:

(n1 − 1) + (n2 − 1) + ... + (nk − 1) = (n1 + n2 + ... + nk) − k = N − k

2.4 Heart Attack Calculations

Let's return to the mean platelet volume (MPV) data of Example 15.1. The grand mean x̄ was computed to be 11.315. Notice that because the sample sizes are all equal, the grand mean is just the average of the four sample means (this will not usually be the case when the sample sizes are unequal). With x̄1 = 10.89, x̄2 = 11.25, x̄3 = 11.37, x̄4 = 11.75, and n1 = n2 = n3 = n4 = 35,

SSTr = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + ... + nk(x̄k − x̄)²
     = 35(10.89 − 11.315)² + 35(11.25 − 11.315)² + 35(11.37 − 11.315)² + 35(11.75 − 11.315)²
     = 6.322 + 0.148 + 0.106 + 6.623
     = 13.199

Because s1 = 0.69, s2 = 0.74, s3 = 0.91, and s4 = 1.07,

SSE = (n1 − 1)s1² + (n2 − 1)s2² + ... + (nk − 1)sk²
    = (35 − 1)(0.69)² + (35 − 1)(0.74)² + (35 − 1)(0.91)² + (35 − 1)(1.07)²
    = 101.888

The numbers of degrees of freedom are

treatment df = k − 1 = 3
error df = N − k = (35 + 35 + 35 + 35) − 4 = 136

from which

MSTr = SSTr / (k − 1) = 13.199 / 3 = 4.400
MSE = SSE / (N − k) = 101.888 / 136 = 0.749

Both MSTr and MSE are quantities whose values can be calculated once sample data are available; i.e., they are statistics. Each of these statistics varies in value from data set to data set. Both MSTr and MSE have sampling distributions, and these sampling distributions have mean values.

2.5 The Single-Factor ANOVA F Test

Null hypothesis: H0: μ1 = μ2 = ... = μk

Test statistic: F = MSTr / MSE

When H0 is true and the ANOVA assumptions are reasonable, F has an F distribution with df1 = k − 1 and df2 = N − k. H0 should be rejected if the P-value is less than or equal to the selected significance level α.

2.6 Heart Attack Calculations Continued

The two mean squares for the MPV data given in Example 15.1 were calculated as

MSTr = 13.199 / 3 = 4.400 and MSE = 101.888 / 136 = 0.749

The value of the F statistic is then

F = MSTr / MSE = 4.400 / 0.749 = 5.87

with df1 = k − 1 = 3 and df2 = N − k = 140 − 4 = 136.
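These hand calculations can be reproduced directly from the summary statistics. The following is a minimal Python sketch (the variable names are my own) that recomputes SSTr, SSE, the mean squares, and the F statistic for the MPV data:

```python
# Reproducing the Example 15.1 (MPV) calculations from the Table 15.1 summaries.
n    = [35, 35, 35, 35]                  # sample sizes n1..n4
xbar = [10.89, 11.25, 11.37, 11.75]      # sample means
s    = [0.69, 0.74, 0.91, 1.07]          # sample standard deviations
k    = len(n)

N = sum(n)                                                    # 140
grand_mean = sum(ni * xi for ni, xi in zip(n, xbar)) / N      # 11.315

SSTr = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(n, xbar))  # about 13.199
SSE  = sum((ni - 1) * si ** 2 for ni, si in zip(n, s))              # about 101.888

MSTr = SSTr / (k - 1)       # about 4.400, with k - 1 = 3 df
MSE  = SSE / (N - k)        # about 0.749, with N - k = 136 df
F    = MSTr / MSE           # about 5.87

print(f"SSTr = {SSTr:.3f}, SSE = {SSE:.3f}")
print(f"MSTr = {MSTr:.3f}, MSE = {MSE:.3f}, F = {F:.2f}")
```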
Using df1 = 3 and df2 = 120 (the closest value to 136 that appears in the table), Appendix Table 6 shows that a value of 5.78 captures an upper-tail area of 0.001. Since 5.87 > 5.78, it follows that the P-value is less than 0.001. Because the P-value is smaller than any reasonable significance level, H0 is rejected; there is convincing evidence that the four population mean MPV values are not all the same.
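Rather than bounding the P-value from Appendix Table 6, the exact upper-tail area can be computed from the F distribution. A minimal sketch, assuming SciPy is available (scipy.stats.f is the F distribution):

```python
from scipy import stats   # assumes SciPy is installed

F = 5.87            # F statistic from the MPV calculations above
df1, df2 = 3, 136   # treatment and error degrees of freedom

# Exact upper-tail area under the F(3, 136) distribution.
p_value = stats.f.sf(F, df1, df2)
print(f"P-value = {p_value:.4f}")   # below 0.001, consistent with the table-based bound

# Critical value capturing an upper-tail area of 0.001 with df2 = 120,
# the table row used above; it matches the tabled 5.78.
print(round(stats.f.isf(0.001, df1, 120), 2))
```

Either way, the conclusion is the same as the one reached above from the table lookup.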
