We usually do not think of income as being normally distributed. Most histograms show the data to be skewed, with a long right tail reaching out toward Bill Gates. This sort of skewness is so common, and normality so useful, that the lognormal model has become popular. The lognormal model says that the logarithm of the data follows a normal distribution. (It does not matter which log you use because one log is just a constant multiple of another.)
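The claim that the choice of logarithm does not matter can be checked numerically. The sketch below is only an illustration under stated assumptions: the incomes are synthetic values generated for the example (not the survey sample), and `prob_below` is a hypothetical helper name. It fits a normal model to the logs in two different bases and computes the probability of an income falling below a threshold.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Synthetic, made-up incomes for illustration only (NOT the survey data).
incomes = [random.lognormvariate(10.8, 0.8) for _ in range(500)]

def prob_below(threshold, data, log):
    """Fit a normal model to log(data) and return P(income < threshold)."""
    logs = [log(x) for x in data]
    model = NormalDist(mean(logs), stdev(logs))
    return model.cdf(log(threshold))

# Same calculation with natural logs and with base-10 logs.
p_ln = prob_below(20_000, incomes, math.log)
p_log10 = prob_below(20_000, incomes, math.log10)

print(f"natural log: {p_ln:.6f}")
print(f"base-10 log: {p_log10:.6f}")
```

Changing the base rescales the mean and the standard deviation of the logs by the same constant, so the z-score of any threshold, and hence the probability, should agree up to rounding error.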
For this exercise, we’ll use a sample of household incomes from the 2010 U.S. Community Survey, which has replaced the decennial census as a source of information about U.S. households. This sample includes 392 households from coastal South Carolina.
(a) What advantage would there be in using a normal model for the logs rather than a model that described the skewness directly?
(b) If poverty in this area is defined as having a household income less than $20,000, how can you use a lognormal model to find the percentage of households in poverty?
(c) These data are reported to be a sample of households in coastal South Carolina. If the households are equally divided between just two communities in this region, would that cause problems in using these data?
(d) How do you plan to check whether the lognormal model is appropriate for these incomes?
(e) Does a normal model offer a good description of the household incomes? Explain.
(f) Does a normal model offer a good description of the logarithm of household incomes? Explain.
(g) Using the lognormal model with parameters set to match this sample, find the probability of finding a household with income less than $20,000.
(h) Is the lognormal model suitable for determining this probability?
(i) Describe these data using a lognormal model, pointing out strengths and any important weaknesses or limitations.
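One possible way to organize the checks asked for in parts (d) through (h) is sketched below in Python. Everything in it is an assumption made for illustration: the incomes are simulated rather than taken from the survey sample, and the names `pearson` and `qq_correlation` are hypothetical helpers. The sketch compares how closely the raw incomes and their logs follow a normal quantile plot, then compares the lognormal probability of an income below $20,000 with the observed fraction.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(values):
    """Correlation of the sorted data with normal quantiles.

    Values near 1 suggest the points in a normal quantile plot
    lie close to a straight line.
    """
    n = len(values)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(sorted(values), quantiles)

random.seed(1)
# Simulated stand-in for the sample (NOT the real survey data).
incomes = [random.lognormvariate(10.8, 0.8) for _ in range(392)]
logs = [math.log(x) for x in incomes]

# Parts (e) and (f): does a normal model fit the incomes? The logs?
r_raw = qq_correlation(incomes)
r_log = qq_correlation(logs)

# Part (g): probability of income below the threshold under the fitted model.
threshold = 20_000
model = NormalDist(mean(logs), stdev(logs))
p_model = model.cdf(math.log(threshold))

# Part (h): compare the model with the observed fraction in the sample.
p_observed = sum(x < threshold for x in incomes) / len(incomes)

print(f"quantile-plot correlation, incomes: {r_raw:.3f}")
print(f"quantile-plot correlation, logs:    {r_log:.3f}")
print(f"P(income < {threshold}) from model: {p_model:.3f}")
print(f"observed fraction below threshold:  {p_observed:.3f}")
```

With real data, a normal quantile plot of the logs is worth inspecting directly; the correlation is only a rough numeric summary of how straight that plot is.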