# Question: We usually do not think of the distribution of income

We usually do not think of the distribution of income as being normally distributed. Most histograms show the data to be skewed, with a long right tail reaching out toward Bill Gates. This sort of skewness is so common, but normality so useful, that the lognormal model has become popular. The lognormal model says that the logarithm of the data follows a normal distribution. (It does not matter which log you use because one log is just a constant multiple of another.)
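The parenthetical claim about the choice of log can be checked with a short derivation. This is a sketch in notation that is not in the original exercise: $X$ stands for income, $\mu$ and $\sigma$ for the mean and standard deviation of $\ln X$, and $\Phi$ for the standard normal cumulative distribution function.

$$
\ln X \sim N(\mu, \sigma^2)
\quad\Longrightarrow\quad
P(X \le x) = P(\ln X \le \ln x) = \Phi\!\left(\frac{\ln x - \mu}{\sigma}\right), \qquad x > 0 .
$$

$$
\log_{10} X = \frac{\ln X}{\ln 10}
\quad\Longrightarrow\quad
\frac{\log_{10} x - \mu/\ln 10}{\sigma/\ln 10} = \frac{\ln x - \mu}{\sigma} .
$$

Switching bases rescales the mean and the standard deviation by the same constant, so the z-score, and therefore every probability from the model, is unchanged.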

For this exercise, we’ll use a sample of household incomes from the 2010 U.S. Community Survey, which has replaced the decennial census as a source of information about U.S. households. This sample includes 392 households from coastal South Carolina.

Motivation

(a) What advantage would there be in using a normal model for the logs rather than a model that described the skewness directly?

(b) If poverty in this area is defined as having a household income less than $20,000, how can you use a lognormal model to find the percentage of households in poverty?

Method

(c) These data are reported to be a sample of households in coastal South Carolina. If the households are equally divided between just two communities in this region, would that cause problems in using these data?

(d) How do you plan to check whether the lognormal model is appropriate for these incomes?
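One common way to approach the check in (d) is a normal quantile plot of the log incomes. The sketch below computes the numerical counterpart of that plot, the correlation between the sorted values and the matching standard normal quantiles. It is an illustration under stated assumptions, not the exercise's prescribed method: the survey data are not included here, so `incomes` is a synthetic stand-in, and the function name and the parameters 10.8 and 0.9 are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def normal_quantile_correlation(values):
    """Correlation between the sorted values and standard normal quantiles.

    A value near 1 means the points of a normal quantile plot fall
    close to a straight line.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mq, mv = mean(quantiles), mean(ordered)
    sxy = sum((q - mq) * (v - mv) for q, v in zip(quantiles, ordered))
    sxx = sum((q - mq) ** 2 for q in quantiles)
    syy = sum((v - mv) ** 2 for v in ordered)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-in for the 392 household incomes (NOT the survey data).
rng = random.Random(0)
incomes = [rng.lognormvariate(10.8, 0.9) for _ in range(392)]

r_raw = normal_quantile_correlation(incomes)
r_log = normal_quantile_correlation([math.log(x) for x in incomes])
print(f"raw incomes: r = {r_raw:.3f}")
print(f"log incomes: r = {r_log:.3f}")
```

With real data, a clearly higher correlation for the logs than for the raw incomes would support the lognormal model, though a plot is still worth inspecting for outliers or clusters.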

Mechanics

(e) Does a normal model offer a good description of the household incomes? Explain.

(f) Does a normal model offer a good description of the logarithm of household incomes? Explain.

(g) Using the lognormal model with parameters set to match this sample, find the probability of finding a household with income less than $20,000.

(h) Is the lognormal model suitable for determining this probability?
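The calculation in (g) and the check in (h) can be sketched as follows. The idea is to estimate the mean and standard deviation of the log incomes, evaluate the normal cumulative distribution at the log of the threshold, and compare the result with the observed share of households below the threshold. Everything specific here is an assumption for illustration: `incomes` is synthetic rather than the survey sample, and `lognormal_prob_below` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def lognormal_prob_below(incomes, threshold):
    """P(income < threshold) under a lognormal model fitted to the sample."""
    logs = [math.log(x) for x in incomes]
    # Match the model's parameters to the sample mean and SD of the logs.
    mu, sigma = mean(logs), stdev(logs)
    return NormalDist(mu, sigma).cdf(math.log(threshold))


# Synthetic stand-in for the 392 household incomes (NOT the survey data).
rng = random.Random(1)
incomes = [rng.lognormvariate(10.8, 0.9) for _ in range(392)]

p_model = lognormal_prob_below(incomes, 20_000)
# Observed share below the threshold, for comparison with the model.
p_empirical = sum(x < 20_000 for x in incomes) / len(incomes)
print(f"model: {p_model:.3f}, observed: {p_empirical:.3f}")
```

A large gap between the model probability and the observed share would suggest that the lognormal model fits poorly in the lower tail, which is the region this question depends on.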

Message

(i) Describe these data using a lognormal model, pointing out strengths and any important weaknesses or limitations.
