Defining the Problem (1). Lead is an environmental pollutant especially worthy of attention because of its damaging

Question:

Defining the Problem (1). Lead is an environmental pollutant especially worthy of attention because of its damaging effects on the neurological and intellectual development of children. Morton et al. (1982) collected data on lead absorption by children whose parents worked at a factory in Oklahoma where lead was used in the manufacture of batteries. The concern was that children might be exposed to lead inadvertently brought home on the bodies or clothing of their parents. Levels of lead (in micrograms per deciliter) were measured in blood samples taken from 33 children who might have been exposed in this way. They constitute the exposed group.
Collecting the Data (2). The researchers formed a control group by making matched pairs. For each of the 33 children in the exposed group they selected a matching child of the same age, living in the same neighborhood, and with parents employed at a place where lead is not used. The data set LEADKIDS contains three variables, each with 33 cases. All involve measurements of lead in micrograms per deciliter of blood.

These data are listed next.


This is necessarily an observational study rather than a controlled experiment. There is no way that the researchers could have assigned children at random to parents in or out of lead-related occupations. Furthermore, the exposed subjects were all chosen from the small group of children whose parents worked at one particular plant. They were not chosen from the larger population of children everywhere who might be exposed to lead as a result of their parents' working conditions. If lead levels are unusually high in the exposed group, it might be argued that the lead in their blood came from some source other than their parents' place of work: from lead solder in water pipes at home, from lead-paint dust at school, from air pollution, and so on. For this reason, a properly chosen control group of children is crucial to the credibility of the study. In principle, the children in the control group should be subject to all of the same possible lead contaminants as those in the exposed group except for lead brought home from work by parents. In practice, the designers of this study chose to use two criteria in forming pairs: neighborhood and age. Neighborhood seems a reasonable choice because general environmental conditions, types of housing, and so on could vary greatly for children living in different neighborhoods. Controlling for age seems reasonable because lead poisoning is largely cumulative, so levels of lead might be higher in older children. Thus, for each child in the exposed group, researchers sought a paired child of the same age and living in the same neighborhood.
Summarizing the Data (3). We begin by looking at dot plots of the data for the exposed and control groups:

We can see that over half of the children in the exposed group have more lead in their blood than do any of the children in the control group. This graphical comparison is not the most effective one we could make because it ignores the pairing of exposed and control children. Even so, it presents clear evidence that, on average, the exposed children have more lead in their blood than do the control children. Notice that the lead levels of the exposed group are much more diverse than those of the control group. This suggests that some children in the exposed group are getting a lot more lead, presumably from their working parents, than are others in this group. Perhaps some parents at the battery factory do not work in areas where they come into direct contact with lead. Perhaps some parents wear protective clothing that is left at work, or they shower before they leave work. For this study, information on the exposure and hygiene of parents was collected by the investigators. Such factors were found to contribute to the diversity of the lead levels observed among the exposed children. Some toxicologists believe that any amount of lead may be detrimental to children, but all agree that the highest levels among the exposed children in our study are dangerously high. Specifically, it is generally agreed that children with lead levels above 40 micrograms per deciliter need medical treatment. Children above 60 on this scale should be immediately hospitalized for treatment (Miller and Keane, 1957). A quick glance at the dot plot shows that we are looking at some serious cases of lead poisoning in the exposed group. By plotting differences, we get an even sharper picture. For each matched pair of children, the variable Diff shows how much more lead the exposed child has than his or her control neighbor of the same age.

If we consider a hypothetical population of pairs of children, the difference measures the increased lead levels that may result from exposure via a parent working at the battery factory. If parents who work at the battery factory were not bringing lead home with them, we would expect about half of these values to be positive and half to be negative. The lead values in the blood would vary but in such a way that the exposed child would have only a 50-50 chance of having the higher value. Thus, we would expect the dot plot to be centered near 0. In contrast, look at the dot plot of the actual data. Almost every child in the exposed group has a higher lead value than does the corresponding control child. As a result, most of the differences are positive. The average of the differences is the balance point of the dot plot, located somewhat above 15. (In some respects, we can read the dot plot quite precisely. In 1 pair out of 33, both children have the same value, to the nearest whole number as reported. In only 4 pairs does the control child have the higher level of lead.) The dot plot of the differences displays strong evidence that the children in the exposed group have more lead than their control counterparts. It will be necessary to perform some formal statistical tests to check whether this effect is statistically significant, but we already suspect from this striking graph what the conclusion must be.
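The 50-50 reasoning above can be turned into a rough calculation. The sketch below uses only the counts quoted in the text (33 pairs, 1 tie, 4 pairs in which the control child is higher) and applies a one-sided sign test. The sign test is chosen here purely for illustration and is not necessarily the formal test the authors go on to use. Dropping the tied pair is a common convention and is an assumption of this sketch, and the function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(n_positive, n_negative):
    """Probability of seeing at least n_positive positive differences among
    the non-tied pairs if each difference were positive with probability 1/2."""
    n = n_positive + n_negative  # tied pairs are excluded (assumption)
    tail_count = sum(comb(n, k) for k in range(n_positive, n + 1))
    return tail_count / 2 ** n


# Counts read off the dot plot as described in the text.
n_pairs, n_ties, n_negative = 33, 1, 4
n_positive = n_pairs - n_ties - n_negative  # 28 pairs with the exposed child higher

p_value = sign_test_upper_tail(n_positive, n_negative)
print(f"{n_positive} of {n_positive + n_negative} non-tied differences are positive")
print(f"one-sided sign-test p-value: {p_value:.2e}")
```

Under these assumptions the tail probability is 41449 / 2^32, or roughly 1 in 100,000, which agrees with the impression given by the dot plot. Because the sign test ignores the size of each difference, a test that uses the magnitudes may be more informative.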
We have looked directly at the pairs of children around which the study was built. It may take a bit more thought to deal with differences than to look at the separate variables exposed and control as we did previously. But looking at pairs is best. If the effect had turned out to be weaker and if we had not thought to look at pairs, then we might have missed seeing the effect.
a. Obtain the mean, median, and standard deviation for each of the three variables in LEADKIDS.
1) Compare the median of the exposed children with the maximum of the control children. What statement in the discussion does this confirm?
2) Compare the difference between the individual means of the exposed and control groups with the mean of the differences. On average, how much higher are the lead values for exposed children?
b. In contrast to part (a), notice that the difference between the individual medians of the exposed and control groups is not the same as the median for Diff. Why not? Which figure based on medians would you use if you were trying to give the most accurate view of the increase in lead exposure due to a parent working at the battery factory?
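A minimal sketch of the computations in parts (a) and (b) is given below. The LEADKIDS values are not reproduced in this text, so the three pairs used here are made-up numbers chosen only to show the arithmetic; the names `exposed`, `control`, `diff`, and `summarize` are likewise assumptions. To work the exercise, the two lists would be replaced with the 33 actual values.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return the mean, median, and sample standard deviation of one variable."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


# Made-up values for three hypothetical matched pairs; NOT the LEADKIDS data.
exposed = [10, 20, 60]
control = [15, 12, 14]

# Diff is computed within each pair: exposed child minus matched control child.
diff = [e - c for e, c in zip(exposed, control)]

for name, column in [("Exposed", exposed), ("Control", control), ("Diff", diff)]:
    print(name, summarize(column))

# The mean is linear, so these two numbers agree (up to rounding).
gap_in_means = mean(exposed) - mean(control)
print("difference of means:", gap_in_means, "| mean of Diff:", mean(diff))

# The median is not linear, so these two numbers need not agree.
gap_in_medians = median(exposed) - median(control)
print("difference of medians:", gap_in_medians, "| median of Diff:", median(diff))
```

With these toy values the difference of the medians is 6 while the median of Diff is 8. The two group medians come from children in different pairs, whereas the median of Diff is computed from within-pair differences and so respects the matching.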