# Question

Statistical analyses are often featured in lawsuits that allege discrimination. Two-sample methods are particularly common because they can be used to quantify differences between the average salaries of, for example, men and women.

A lawsuit filed against Wal-Mart in 2003 alleged that the retailer discriminated against women. As part of their argument, lawyers for the plaintiffs observed that men who managed Wal-Mart stores in 2001 made an average of $105,682 compared to $89,280 for women who were managers, a difference of $16,402 annually. At the higher level of district manager, men in 2001 made an average of $239,519 compared to $177,149 for women. Of the 508 district managers in 2003, 50 were women (9.8%).

The data used to obtain these numbers are private to the litigation, but we can guess reasonable numbers. Let’s focus on the smaller group, the district managers. All we need are standard deviations for the two groups. You’ll frequently find yourself in a situation where you only get to see the summary numbers as in this example. Let’s use $50,000 for the standard deviation of the pay to women and $60,000 for men. Using the Empirical Rule as a guide, we’re guessing that about two-thirds of the female district managers make between $125,000 and $225,000 and two-thirds of the male district managers make $180,000 to $300,000.
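The ranges quoted above can be checked with a few lines of arithmetic. This is only a sketch: it takes the guessed standard deviations at face value and assumes roughly bell-shaped pay distributions, which is what the Empirical Rule requires. The names `groups`, `ranges`, and `within_one_sd` are illustrative.

```python
from statistics import NormalDist

# Averages quoted in the question; the standard deviations are the text's guesses.
groups = {
    "women": {"mean": 177_149, "sd": 50_000},
    "men": {"mean": 239_519, "sd": 60_000},
}

# Share of a normal distribution that lies within one SD of its mean.
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

# Mean minus one SD and mean plus one SD for each group.
ranges = {}
for name, g in groups.items():
    ranges[name] = (g["mean"] - g["sd"], g["mean"] + g["sd"])
    low, high = ranges[name]
    print(f"{name}: ${low:,} to ${high:,} covers about {within_one_sd:.0%}")
```

The intervals come out close to the rounded figures in the text ($125,000 to $225,000 and $180,000 to $300,000), and the one-SD share is about two-thirds.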

Issues of guessing the variation aside, such comparisons have to deal with issues of confounding.

Motivation

(a) If a statistical analysis finds that Wal-Mart pays women statistically significantly less than men in the same position, how do you expect a jury to react to this finding?

(b) Do you think that Wal-Mart or the plaintiffs have more to gain from doing a statistical analysis that compares these salaries?

Method

(c) We don’t observe the salaries of either male or female managers. If the actual distribution of these salaries is not normal, does that mean that we cannot use t-tests or intervals?

(d) Explain why a confidence interval for the difference in salaries is the more natural technique to use to quantify the statistical significance of the difference in salaries between male and female district managers.

(e) We don’t have a sample; these are the average salaries of all district managers. Doesn’t that mean we know μ_M and μ_F? If so, why is a confidence interval useful?

(f) These positions at Wal-Mart are relatively high ranking. One suspects that most of these managers know each other. Does that suggest a problem with the usual analysis?

(g) Can you think of any lurking factors that might distort the comparison between the means of these two groups?

Mechanics

(h) Estimate the standard error of x̄_M - x̄_F.

(i) Find the 95% two-sample confidence interval. Estimate the degrees of freedom as n_M + n_F - 2 and round the endpoints of the confidence interval as needed to present in your message.

(j) If the two guessed values for the sample standard deviations are off by a factor of 2 (so that the standard deviation for women is $100,000 and for men is $120,000), what happens to the confidence interval?

(k) One way to avoid some types of lurking factors is to restrict the comparison to cases that are more similar, such as district managers who work in the same region. If restricting the managers to one region reduces the sample size to 100 (rather than 508), what effect will this have on the confidence interval?
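One way to set up the Mechanics calculations is sketched below. It rests on several assumptions: the standard deviations are the guessed values from the question, the standard error is the unpooled form sqrt(s_M²/n_M + s_F²/n_F), and the normal quantile stands in for the t quantile (with n_M + n_F - 2 = 506 degrees of freedom the t value is only slightly larger than 1.96). The function name `diff_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean_m, sd_m, n_m, mean_f, sd_f, n_f, level=0.95):
    """Return (difference, standard error, lower, upper) for mean_m - mean_f."""
    diff = mean_m - mean_f
    # Unpooled standard error of the difference between two sample means.
    se = sqrt(sd_m**2 / n_m + sd_f**2 / n_f)
    # Assumption: normal quantile used in place of the t quantile (large df).
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, se, diff - crit * se, diff + crit * se


n_f = 50
n_m = 508 - n_f  # 458 male district managers

# Guessed SDs from the question, then the same SDs doubled as in part (j).
base = diff_ci(239_519, 60_000, n_m, 177_149, 50_000, n_f)
doubled = diff_ci(239_519, 120_000, n_m, 177_149, 100_000, n_f)

for label, (diff, se, lo, hi) in [("guessed SDs", base), ("doubled SDs", doubled)]:
    print(f"{label}: diff=${diff:,.0f}, SE=${se:,.0f}, 95% CI=(${lo:,.0f}, ${hi:,.0f})")
```

Calling `diff_ci` with smaller values of `n_m` and `n_f` shows how the interval widens for part (k). The question does not say how the 100 regional managers split between men and women, so that split is left to the reader; for a group that small, the t quantile should replace the normal one.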

Message

(l) Write a one- or two-sentence summary that interprets for the court the 95% confidence interval.

(m) What important caveats should be mentioned along with this interval?
