
Question:

Several statistics are commonly used to detect nonnormality in underlying population distributions. Here we will study one that measures the amount of skewness in a distribution. Recall that any normally distributed random variable is symmetric about its mean; therefore, if we standardize a symmetrically distributed random variable, say $z = (y - \mu_y)/\sigma_y$, where $\mu_y = E(y)$ and $\sigma_y = \mathrm{sd}(y)$, then $z$ has mean zero, variance one, and $E(z^3) = 0$. Given a sample of data $\{y_i : i = 1, \ldots, n\}$, we can standardize $y_i$ in the sample by using $z_i = (y_i - \hat{\mu}_y)/\hat{\sigma}_y$, where $\hat{\mu}_y$ is the sample mean and $\hat{\sigma}_y$ is the sample standard deviation. (We ignore the fact that these are estimates based on the sample.) A sample statistic that measures skewness is $n^{-1}\sum_{i=1}^{n} z_i^3$, or where $n$ is replaced with $(n - 1)$ as a degrees-of-freedom adjustment.

If y has a normal distribution in the population, the skewness measure in the sample for the standardized values should not differ significantly from zero.
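The skewness measure described above, the average of the cubed standardized values, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sample_skewness` and the `dof_adjust` flag are assumptions made for this example, and the two short lists are toy inputs, not exercise data.

```python
import math


def sample_skewness(values, dof_adjust=False):
    """Return the average cubed standardized value.

    Each observation is standardized with the sample mean and the
    sample standard deviation. If dof_adjust is True, the sum of
    cubes is divided by (n - 1) instead of n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Sample standard deviation with divisor (n - 1).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("all observations are identical")
    cubed_sum = sum(((v - mean) / sd) ** 3 for v in values)
    return cubed_sum / ((n - 1) if dof_adjust else n)


# Toy inputs for illustration only.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]
print(sample_skewness(symmetric))     # essentially zero
print(sample_skewness(right_skewed))  # positive
```

A value near zero is consistent with symmetry; a clearly positive value indicates a long right tail, and a clearly negative value a long left tail.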

(i) First use the data set 401KSUBS, keeping only observations with fsize = 1. Find the skewness measure for inc. Do the same for log(inc). Which variable has more skewness and therefore seems less likely to be normally distributed?

(ii) Next use BWGHT2. Find the skewness measures for bwght and log(bwght). What do you conclude?
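Parts (i) and (ii) both compare the skewness of a positive variable with the skewness of its natural log, once after filtering rows. A possible workflow is sketched below, assuming the data sets have been exported to CSV files with a header row. The helper names and the file names in the commented usage are hypothetical and are not part of the exercise.

```python
import csv
import math


def skewness(values):
    """Average cubed standardized value (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sum(((v - mean) / sd) ** 3 for v in values) / n


def level_and_log_skewness(rows, column, keep=None):
    """Return (skewness of column, skewness of log(column)).

    rows   -- iterable of dicts, for example from csv.DictReader
    column -- variable name, for example "inc" or "bwght"
    keep   -- optional predicate that selects rows to retain
    """
    values = [float(r[column]) for r in rows if keep is None or keep(r)]
    logs = [math.log(v) for v in values]
    return skewness(values), skewness(logs)


def read_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage; the file names below are assumptions:
# rows = read_rows("401ksubs.csv")
# print(level_and_log_skewness(rows, "inc",
#                              keep=lambda r: float(r["fsize"]) == 1))
# rows = read_rows("bwght2.csv")
# print(level_and_log_skewness(rows, "bwght"))
```

Passing the filter as a predicate keeps the fsize = 1 restriction of part (i) separate from the skewness calculation, so the same function serves part (ii) without a filter.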

(iii) Evaluate the following statement: “The logarithmic transformation always makes a positive variable look more normally distributed.”
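One way to probe the statement in part (iii) is a small simulation. The sketch below uses simulated draws, not the exercise data, and the two distributions, the sample size, and the seed are all assumptions chosen for illustration. It compares the skewness of a variable with that of its log in two cases: a lognormal variable, and a roughly normal variable centered far enough from zero that every draw is positive.

```python
import math
import random


def skewness(values):
    """Average cubed standardized value (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sum(((v - mean) / sd) ** 3 for v in values) / n


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

# Case A: y is lognormal, so log(y) is normal by construction.
lognormal = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n)]

# Case B: y is roughly normal with mean 100 and sd 10, hence positive.
near_normal = [rng.gauss(100.0, 10.0) for _ in range(n)]

for name, y in [("lognormal", lognormal), ("near-normal", near_normal)]:
    logs = [math.log(v) for v in y]
    print(name, round(skewness(y), 3), round(skewness(logs), 3))
```

Because the log is a concave function, it compresses the right tail and stretches the left tail. In case A this removes skewness; in case B the level is already close to symmetric, so taking logs would be expected to push the skewness measure below zero rather than toward it.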

(iv) If we are interested in the normality assumption in the context of regression, should we be evaluating the unconditional distributions of y and log(y)? Explain.
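For part (iv), it may help to recall that the normality assumption in regression concerns the error term conditional on the explanatory variables. The simulated sketch below illustrates the distinction under assumed values: the regressor `x` is drawn from a skewed distribution, the error `u` is normal, and `log_y` is generated from a linear model with made-up coefficients. It then compares the skewness of `log_y` itself with the skewness of the OLS residuals.

```python
import math
import random


def skewness(values):
    """Average cubed standardized value (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sum(((v - mean) / sd) ** 3 for v in values) / n


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [b - intercept - slope * a for a, b in zip(x, y)]


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 5000
x = [rng.expovariate(1.0) for _ in range(n)]  # skewed regressor (assumed)
u = [rng.gauss(0.0, 0.5) for _ in range(n)]   # normal error (assumed)
log_y = [1.0 + 1.0 * a + e for a, e in zip(x, u)]

print(round(skewness(log_y), 3))                    # unconditional
print(round(skewness(ols_residuals(x, log_y)), 3))  # after conditioning on x
```

In this setup the unconditional distribution of `log_y` inherits skewness from `x` even though the error is normal, which suggests that residuals, rather than the raw dependent variable, are the more relevant object to examine.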
