
Question:

The data set Emission provides hydrocarbon emission in parts per million (ppm) at idling speed for cars, based on the year each car was manufactured. These data were randomly sampled from a much larger study on pollution control in Albuquerque, New Mexico.
a. Create individual value plots or side-by-side boxplots of Emission versus Year. Compare the mean and standard deviation of each group. Do the data within each group look consistent with data from a normal population?
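For the numeric part of (a), one way to get each Year group's mean and standard deviation is sketched below. It only covers the summary statistics, not the plots. It assumes the data have been loaded as two parallel lists, `years` and `emissions`; these names and the helper functions are hypothetical and do not come from the exercise. The ratio of the largest to the smallest group SD is a common informal check on the equal-variance assumption, with values well above 2 usually taken as a warning sign.

```python
from statistics import mean, stdev

def group_summaries(years, emissions):
    """Return {year: (n, mean, sd)} for each Year group (hypothetical helper).

    Each group needs at least two observations for stdev().
    """
    groups = {}
    for year, value in zip(years, emissions):
        groups.setdefault(year, []).append(value)
    return {
        year: (len(vals), mean(vals), stdev(vals))
        for year, vals in sorted(groups.items())
    }

def sd_ratio(summaries):
    """Largest group standard deviation divided by the smallest one."""
    sds = [sd for _, _, sd in summaries.values()]
    return max(sds) / min(sds)

if __name__ == "__main__":
    # Made-up illustration values, NOT the Emission data.
    years = ["A", "A", "A", "B", "B", "B"]
    emissions = [1, 2, 3, 10, 20, 30]
    s = group_summaries(years, emissions)
    print(s)            # per-group (n, mean, sd)
    print(sd_ratio(s))  # 10.0 here: group B is ten times as spread out
```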
b. Transform the response by taking the log of Emission. Create individual value plots or side-by-side boxplots of log Emission versus Year. Compare the plot of the transformed data to the plot in Part a. Which plot shows data that better fit the model assumptions?
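One reason the log helps here: if groups differ mainly by a multiplicative factor, so that higher-emitting groups are also more spread out, taking logs turns that factor into an additive shift and tends to equalize the group SDs. The sketch below shows this on made-up numbers where group B is exactly 10 times group A. The function names are hypothetical, it uses the natural log (so every Emission value must be positive), and the choice of base does not change the SD ratio because a different base only rescales every SD by the same constant.

```python
import math
from statistics import stdev

def sd_ratio_by_group(years, values):
    """Largest within-group SD divided by the smallest (hypothetical helper)."""
    groups = {}
    for year, v in zip(years, values):
        groups.setdefault(year, []).append(v)
    sds = [stdev(vals) for vals in groups.values()]
    return max(sds) / min(sds)

def log_values(values):
    """Natural log of each response; requires every value > 0."""
    return [math.log(v) for v in values]

if __name__ == "__main__":
    # Made-up illustration values, NOT the Emission data.
    years = ["A", "A", "A", "B", "B", "B"]
    emissions = [1, 2, 3, 10, 20, 30]
    print(sd_ratio_by_group(years, emissions))              # raw scale
    print(sd_ratio_by_group(years, log_values(emissions)))  # log scale, about 1
```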
c. Calculate an ANOVA table, F-test, and p-value to determine if the average log (Emission) varies based on Year. Note that the end-of-chapter exercises and Section 2.9 show that ANOVA can compare more than two groups. In this question, I = 5 groups instead of 2 groups. However, the model and calculations are identical except that now i = 1, 2, 3, 4, 5 instead of i = 1, 2. The null hypothesis is H0: µ1 = µ2 = µ3 = µ4 = µ5 versus the alternative Ha: at least one group mean is different from another.
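The one-way ANOVA table for I groups can be computed directly from its definitions: SS(between) = Σ nᵢ(ȳᵢ − ȳ)², SS(within) = Σ Σ (yᵢⱼ − ȳᵢ)², with I − 1 and N − I degrees of freedom, and F = MS(between) / MS(within). The minimal sketch below uses only the standard library; `one_way_anova` is a hypothetical name, and `groups` is assumed to hold one list of log(Emission) values per Year. It stops at the F statistic. If SciPy is available, the upper-tail p-value is `scipy.stats.f.sf(F, df_between, df_within)`, and `scipy.stats.f_oneway(*groups)` returns both F and the p-value in one call.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    n_total, k = len(all_values), len(groups)

    # Variation of group means around the grand mean (Year effect).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean (error).
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between, df_within = k - 1, n_total - k   # with I = 5: df_between = 4
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

if __name__ == "__main__":
    # Toy groups, NOT the Emission data.
    print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```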
d. Create residual plots to evaluate whether the model assumptions for the F-test are violated.
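In this model each fitted value is simply its group mean, so each residual is an observation minus its Year group's mean. The usual plots are residuals versus fitted values (look for unequal spread across groups) and a normal probability plot of the residuals (look for curvature or outliers). The sketch below only computes the points for those plots, which can then be drawn with any plotting tool. The helper names are hypothetical, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
from statistics import NormalDist, mean

def residuals_and_fitted(groups):
    """Fitted value = group mean; residual = observation - group mean."""
    fitted, resid = [], []
    for g in groups:
        m = mean(g)
        for y in g:
            fitted.append(m)
            resid.append(y - m)
    return fitted, resid

def normal_plot_points(resid):
    """(theoretical normal quantile, sorted residual) pairs for a normal probability plot."""
    n = len(resid)
    z = NormalDist()  # standard normal
    theo = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, sorted(resid)))

if __name__ == "__main__":
    # Toy log-scale groups, NOT the Emission data.
    fitted, resid = residuals_and_fitted([[1, 2, 3], [4, 5, 6]])
    print(list(zip(fitted, resid)))   # points for residuals vs fitted
    print(normal_plot_points(resid))  # roughly a straight line if residuals look normal
```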
Although the log transformation was helpful, the data still have outliers. In addition, the equal variance and normality assumptions are still slightly violated. Some statisticians would consider the log-transformed data appropriate for the standard ANOVA. Others would try another transformation, such as taking the log of the transformed data again; this is called a log log transformation. Still others would suggest using a nonparametric test. (Nonparametric tests, such as the Kruskal-Wallis test, are described in Chapter 1.) Nonparametric tests do not require the error terms to follow a normal distribution. While any of these analyses would be appropriate, it would not be appropriate to conduct several analyses on the same data and then report only the conclusions of the test that gave the smallest p-value. For example, if we tried three or four hypothesis tests, each at α = 0.10, and then simply picked the test with the smallest p-value, our chance of incorrectly rejecting a true null hypothesis would actually be greater than 10%.
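A rough sense of how much the error rate can grow comes from the simplifying case of k independent tests, each at level α, when the null hypothesis is true: the chance that at least one rejects is 1 − (1 − α)ᵏ. Tests run on the same data are not independent, so this is only an illustration. In general the true rate for "pick the smallest p-value" lies between α and the Bonferroni bound kα. The function name below is hypothetical.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) among k INDEPENDENT level-alpha tests under H0."""
    return 1 - (1 - alpha) ** k

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(familywise_error(0.10, k), 4))
    # 1 0.1
    # 2 0.19
    # 3 0.271
    # 4 0.3439
```

With three or four tests at α = 0.10, the independence figure is roughly 27% to 34%, well above the nominal 10%.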