
The General Social Survey in 2012 asked whether we should allow antireligious books in the library. Of the 3077 people who responded, 23% said they should be removed. Identify

(a) The sample,

(b) The population,

(c) The sample statistic reported.

The Voice of the People poll asks a random sample of citizens from different countries around the world their opinion about global issues. Included in the list of questions is whether they feel that globalization helps their country. The reported results were combined by regions. In the most recent poll, 74% of Africans felt globalization helps their country, whereas 38% of North Americans believe it helps their country.

(a) Identify the samples and the populations.

(b) Are these percentages sample statistics or population parameters? Explain your answer.

The figure shown is a graph published by Statistics Sweden. It compares Swedish society in 1750 and in 2010 on the numbers of men and women of various ages, using age pyramids. Explain how this indicates that

a. In 1750, few Swedish people were old.

b. In 2010, Sweden had many more people than in 1750.

c. In 2010, of those who were very old, more were female than male.

d. In 2010, the largest five-year group included people born during the era of the first manned space flight.

Consider the population of all students at your school. A certain proportion support mandatory national service (MNS) following high school. Your friend randomly samples 20 students from the school and uses the sample proportion who support MNS to predict the population proportion at the school. You take your own, separate random sample of 20 students and find the sample proportion that supports MNS. For the two studies,

a. Are the populations the same?

b. How likely is it that the samples are the same? Explain.

c. How likely is it that the sample proportions are the same? Explain.

The following table shows the result of the 2012 presidential election along with the vote predicted by several organizations in the days before the election. The sample sizes were typically about 1000 to 2000 people. The percentages for each poll do not sum to 100 because of voters who indicated they were undecided or preferred another candidate.

**Predicted Vote**

| Poll | Obama (%) | Romney (%) |
|---|---|---|
| Rasmussen | 48 | 49 |
| CNN | 49 | 49 |
| Gallup | 48 | 49 |
| Pew Research | 50 | 47 |
| Rand | 54 | 43 |
| Fox | 46 | 46 |
| Actual Vote | 50.6 | 47.8 |

a. Treating the sample sizes as 1000 each, find the approximate margin of error.

b. Do most of the predictions fall within the margin of error of the actual vote percentages? Considering the relative sizes of the sample, the population, and the undecided factor, would you say that these polls had good accuracy?
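A minimal sketch for checking parts a and b, assuming the common rule of thumb that the approximate margin of error for a sample proportion is 1/√n (here expressed in percentage points). The poll numbers are copied from the table above; the helper names `approx_margin_of_error` and `within_margin` are illustrative only.

```python
import math

def approx_margin_of_error(n: int) -> float:
    """Rule-of-thumb margin of error, in percentage points: 100 / sqrt(n)."""
    return 100 / math.sqrt(n)

# Predicted (Obama, Romney) percentages from the table above.
polls = {
    "Rasmussen": (48, 49),
    "CNN": (49, 49),
    "Gallup": (48, 49),
    "Pew Research": (50, 47),
    "Rand": (54, 43),
    "Fox": (46, 46),
}
actual = (50.6, 47.8)

moe = approx_margin_of_error(1000)  # treating every sample size as 1000

def within_margin(predicted, actual, moe):
    """For each candidate, True if the prediction lies within +/- moe of the actual vote."""
    return tuple(abs(p - a) <= moe for p, a in zip(predicted, actual))

print(f"margin of error ≈ {moe:.1f} percentage points")
for name, predicted in polls.items():
    print(f"{name:13s} {within_margin(predicted, actual, moe)}")
```

With n = 1000 the rule gives about 3.2 points; under that rule, Rand misses on both candidates and Fox misses on Obama's share, while the other polls fall within the margin.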

A study published in 2010 in The New England Journal of Medicine investigated the effect of financial incentives on smoking cessation. As part of the study, 878 employees of a company, all of whom were smokers, were randomly assigned to one of two treatment groups. One group (442 employees) was to receive information about smoking cessation programs; the other (436 employees) was to receive the same information as well as a financial incentive to quit smoking. The outcome of interest was smoking cessation status six months after the initial cessation was reported. After implementation of the program, 14.7% of individuals in the financial incentive group reported cessation six months after the initial report, compared to 5.0% of the information-only group. Assume that the observed difference in cessation rates between the groups (14.7% - 5.0% = 9.7%) is statistically significant.

a. What does it mean to be statistically significant? (choose the best option from (i)–(iv))

i. The financial option was offered to 9.7% more smokers in the study than the nonsmokers who were employees of the company.

ii. 9.7% was calculated using statistical techniques.

iii. If there were no true impact of the financial incentive, the observed difference of 9.7% is unlikely to have occurred by chance alone.

iv. We know that if the financial incentive were given to all smokers, 9.7% would quit smoking.

b. Is the difference between the groups attributable to the financial incentive?

Refer to Activity 2 on page 21.

a. Repeat the activity using a population proportion of 0.60: Take at least five samples of size 10 each; observe how the sample proportions of successes vary around 0.60 and then do the same thing with at least five samples of size 1000 each.

b. In part a, what seems to be the effect of the sample size on the amount by which sample proportions tend to vary around the population proportion, 0.60?

c. What is the practical implication of the effect of the sample size summarized in part b with respect to making inferences about the population proportion when you collect data and observe only the sample proportion?
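The applet from Activity 2 is not reproduced here. As a stand-in, the sketch below simulates the same idea with Python's `random` module, assuming each sampled person is independently a "success" with probability 0.60 (as when sampling from a very large population). The seed and the function name `sample_proportions` are arbitrary choices for illustration.

```python
import random
import statistics

def sample_proportions(p, n, reps, rng):
    """Draw `reps` random samples of size n from a population whose success
    proportion is p, and return the sample proportion of successes in each."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2024)  # fixed seed so the run is reproducible
for n in (10, 1000):
    props = sample_proportions(0.60, n, reps=5, rng=rng)
    spread = statistics.stdev(props)  # how much the sample proportions vary
    print(f"n = {n:4d}: {props}  (s = {spread:.3f})")
```

Rerunning with more repetitions (say `reps=200`) makes the contrast clearer: the proportions from samples of 1000 cluster much more tightly around 0.60 than those from samples of 10.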

The General Social Survey asked, in 2012, whether you would commit suicide if you had an incurable disease. Of the 3112 people who had an opinion about this, 1862, or 59.8%, would commit suicide.

a. Describe the population of interest.

b. Explain how the sample data are summarized using descriptive statistics.

c. For what population parameter might we want to make an inference?

An article in the Journal of American College Health reports that, in a survey of 1845 college students from a large, southeastern public university, 27% were at risk for at least one sleep disorder, with a margin of error of 2%. Explain how this margin of error provides an inferential statistical analysis.

The dot plot shows the cereal sodium values from Example 4. What aspect of the distribution causes the mean to be less than the median?

The Energy Information Administration reported the CO₂ emissions (measured in gigatons, Gt) from fossil fuel combustion for the top 10 emitting countries in 2011. These are China (8 Gt), the United States (5.3 Gt), India (1.8 Gt), Russia (1.7 Gt), Japan (1.2 Gt), Germany (0.8 Gt), Korea (0.6 Gt), Canada (0.5 Gt), Iran (0.4 Gt), and Saudi Arabia (0.4 Gt).

a. Find the mean and median CO₂ emission.

b. The totals reported here do not take into account a nation’s population size. Explain why it may be more sensible to analyze per capita values, as was done in Example 11.
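For part a, a short check using Python's standard `statistics` module with the ten values listed above (per capita figures for part b would need population data that is not given here):

```python
import statistics

# 2011 CO2 emissions (Gt) for the top 10 emitting countries, as listed above.
emissions = {
    "China": 8.0, "United States": 5.3, "India": 1.8, "Russia": 1.7, "Japan": 1.2,
    "Germany": 0.8, "Korea": 0.6, "Canada": 0.5, "Iran": 0.4, "Saudi Arabia": 0.4,
}
values = list(emissions.values())

mean = statistics.mean(values)
median = statistics.median(values)  # average of the 5th and 6th ordered values
print(f"mean = {mean:.2f} Gt, median = {median:.2f} Gt")
```

The mean (about 2.07 Gt) is roughly double the median (1.0 Gt) because the two largest emitters, China and the United States, pull the mean upward.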

For each of the following variables, would you use the median or mean for describing the center of the distribution? Why?

a. Charges on a credit card

b. Weight of male students in your class this semester

c. Life expectancy of females in developed nations.

For each of the following variables, would you use the median or mean for describing the center of the distribution? Why?

a. Salary of employees of a university

b. Time spent on a difficult exam

c. Scores on a standardized test

The Animals data set at the book’s website contains data on the length of the gestational period (in days) of 21 animals.

a. Using software, plot a histogram of the gestational period.

b. Do you see any observation that is unusual? Which animal is it?

c. Is the distribution right-skewed or left-skewed?

d. Try to override the default setting and plot a histogram with only very few (e.g., 3 or 4) intervals and one with many (e.g., 30) intervals. Would you prefer either one to the histogram created in part a? Why or why not?

Repeat the preceding exercise for

a. The scores of students (out of 100 points) on a very easy exam in which most score perfectly or nearly so, but a few score very poorly

b. The weekly church contribution for all members of a congregation, in which the three wealthiest members contribute generously each week

c. Time needed to complete a difficult exam (maximum time is 1 hour)

d. Number of music CDs (compact discs) owned, for each student in your school.


The figure below shows the stem-and-leaf plot for the cereal sugar values from Example 5, using split stems.

Stem and Leaf Plot for Cereal Sugar Values with Leaf Unit = 1000


a. What were the smallest and largest amounts of sugar found in the 20 cereals?

b. What sugar values are represented on the 6th line of the plot?

c. How many cereals have a sugar content less than 5 g?

For a trip to Miami, Florida, over spring break in 2014, the data below (obtained from travelocity.com) show the price per night (in U.S. dollars) for various hotel rooms.

a. Construct a stem-and-leaf plot. Truncate the data to the first two digits for purposes of constructing the plot. For example, 239 becomes 23.

b. Reconstruct the stem-and-leaf plot in part a by using split stems; that is, two stems of 1, two stems of 2, and so on. Compare the two stem-and-leaf plots. Explain how one may be more informative than the other.

c. Sketch a histogram by hand (or use software), using 6 intervals of length 50, starting at 100 and ending at 400. What does the plot tell about the distribution of hotel prices? (Mention where most prices tend to fall and comment about the shape of the distribution.)

The data shown in Exercise 2.9 give frequencies of fatal shark attacks for different regions. Using software or sketching, construct a bar graph, ordering the regions

(i) Alphabetically

(ii) As in a Pareto chart. Which do you prefer? Why?


In 2012 in the United States, most electricity was generated from coal (37%), natural gas (30%), or nuclear power plants (19%). Hydropower accounted for 7% of the total electricity produced; other renewable sources such as wind or solar power accounted for 5%. Other non-renewable sources (such as petroleum) made up the remaining 2%.

a. Display this information in a bar graph.

b. Which is easier to sketch relatively accurately, a pie chart or a bar chart?

c. What is the advantage of using a graph to summarize the results instead of merely stating the percentages for each source?

d. What is the modal category?
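Statistical software would normally draw the bar graph for part a. To stay dependency-free, the sketch below prints a plain-text horizontal bar chart instead (one `#` per percentage point) and picks out the modal category; `text_bar_chart` is a hypothetical helper, not part of any library.

```python
# Shares of 2012 U.S. electricity generation (percent), as given above.
sources = {
    "Coal": 37, "Natural gas": 30, "Nuclear": 19,
    "Hydropower": 7, "Other renewables": 5, "Other non-renewables": 2,
}

def text_bar_chart(shares, scale=1):
    """Return a plain-text horizontal bar chart, one '#' per `scale` percentage points."""
    width = max(len(name) for name in shares)
    lines = [f"{name:<{width}} | {'#' * (pct // scale)} {pct}%" for name, pct in shares.items()]
    return "\n".join(lines)

print(text_bar_chart(sources))
modal_category = max(sources, key=sources.get)  # category with the largest share
print("Modal category:", modal_category)
```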

Few of the shark attacks listed in Table 2.1 are fatal. Overall, 63 fatal shark attacks were recorded in the ISAF from 2004 to 2013, with 2 reported in Florida, 2 in Hawaii, 4 in California, 15 in Australia, 13 in South Africa, 6 in Réunion Island, 4 in Brazil, and 6 in the Bahamas. The rest occurred in other regions.

a. Construct the frequency table for the regions of the reported fatal shark attacks.

b. Identify the modal category.

c. Describe the distribution of fatal shark attacks across the regions.
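A minimal sketch of the frequency table for part a, using the counts stated above and assigning the remaining attacks (63 minus the listed regions) to "Other". Sorting by count also gives the Pareto ordering asked for in the bar-graph exercise; note that some authors keep "Other" at the end of a Pareto chart regardless of its size, whereas this sketch simply sorts by count.

```python
fatal_attacks = {
    "Florida": 2, "Hawaii": 2, "California": 4, "Australia": 15,
    "South Africa": 13, "Réunion Island": 6, "Brazil": 4, "Bahamas": 6,
}
TOTAL = 63
# Attacks not attributed to a listed region fall into "Other".
fatal_attacks["Other"] = TOTAL - sum(fatal_attacks.values())

# Frequency table with proportions, ordered from most to least frequent (Pareto order).
for region, count in sorted(fatal_attacks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:15s} {count:3d} {count / TOTAL:6.1%}")

modal_region = max(fatal_attacks, key=fatal_attacks.get)
print("Modal category:", modal_region)
```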

Identify each of the following variables as either categorical or quantitative.

a. Choice of diet (vegan, vegetarian, neither)

b. Time spent shopping online per week

c. Ownership of a tablet (yes, no)

d. Number of siblings

According to a recent Current Population Survey of U.S. married-couple households, 13% are traditional (with children and with only the husband in the labor force), 31% are dual-income with children, 25% are dual-income with no children, and 31% are other (such as older married couples whose children no longer reside in the household). Is the variable “household type” categorical or quantitative? Explain.

According to the U.S. Bureau of the Census, Current Population Reports 2013, the mean and median incomes for households with health insurance were $86,431 and $66,000, respectively. For households without health insurance, the mean and median incomes were $53,373 and $39,000, respectively. Does this suggest that the distribution of income for households with health insurance is symmetric, skewed to the right, or skewed to the left? What about for households without health insurance? Explain.

The Current Population Survey (CPS) is a survey conducted by the U.S. Bureau of the Census for the Bureau of Labor Statistics. It provides a comprehensive body of data on the labor force, unemployment, wealth, poverty, and so on. The data can be found online at www.census.gov/cps/. Data from the 2012 CPS examined about 68,000 households, each consisting of at least one related person under the age of 18. The report indicated that 18% of white households, 37.5% of black households, and 13.4% of Asian households had annual incomes below the poverty line. Based on these results, the study authors concluded that the percentage of all such black households with annual incomes below the poverty line is between 35.6% and 39.4%. Specify the aspect of this study that pertains to

(a) Description

(b) Inference.

Suppose a liberal arts student is interested in exploring graduate school in his or her field. The student identifies a program in which he or she is interested and finds the names of a few students from that program to interview. In this context, identify what is meant by the

(a) Subject,

(b) Sample,

(c) Population.

We’ll see that the amount by which statistics vary from sample to sample always depends on the sample size. This important fact can be illustrated by thinking about what would happen in repeated flips of a fair coin.

a. Which case would you find more surprising—flipping the coin five times and observing all heads or flipping the coin 500 times and observing all heads?

b. Imagine flipping the coin 500 times, recording the proportion of heads observed, and repeating this experiment many times to get an idea of how much the proportion tends to vary from one sequence to another. Different sequences of 500 flips tend to result in proportions of heads observed which are less variable than the proportion of heads observed in sequences of only five flips each. Using part a, explain why you would expect this to be true.
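The two parts can be made concrete with exact arithmetic. The sketch below uses Python's `fractions.Fraction` for the all-heads probabilities in part a, and, for part b, the standard result that the sample proportion from n independent flips has standard deviation √(p(1 − p)/n); that formula is brought in here as an assumption and is usually developed later in a course.

```python
from fractions import Fraction
import math

def prob_all_heads(n):
    """Exact probability that n flips of a fair coin all come up heads."""
    return Fraction(1, 2) ** n

def sd_of_proportion(n, p=0.5):
    """Standard deviation of the sample proportion of heads in n independent flips."""
    return math.sqrt(p * (1 - p) / n)

print(prob_all_heads(5))           # 1/32
print(float(prob_all_heads(500)))  # vanishingly small
print(sd_of_proportion(5), sd_of_proportion(500))
```

Multiplying the number of flips by 100 divides the standard deviation of the proportion by √100 = 10, which is why sequences of 500 flips give much less variable proportions than sequences of 5.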

Your instructor will show you how to create data files by using the software for your course. Use it to create the data file you constructed by hand in Exercise 1.20 or 1.21.

**Data from the referenced exercise**

Construct a data file describing the purchasing behavior of the five people described below who visit a shopping mall. Enter the amounts each spent on clothes, sporting goods, books, and music CDs as the data. Customer 1 spent $49 on clothes and $16 on music CDs, customer 4 spent $92 on books, and the other three customers did not buy anything.
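If your course software is not available, the same data file can be sketched with Python's standard `csv` module: one row per customer (the subjects) and one column per purchase category (the variables). The file name `mall_purchases.csv` in the comment is hypothetical; the example writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io

# One column per purchase category (dollars), following the description above.
categories = ["Clothes", "Sporting goods", "Books", "Music CDs"]
purchases = {
    1: {"Clothes": 49, "Music CDs": 16},
    4: {"Books": 92},
}

rows = []
for customer in range(1, 6):
    spent = purchases.get(customer, {})  # customers 2, 3, 5 bought nothing
    rows.append([customer] + [spent.get(c, 0) for c in categories])

buffer = io.StringIO()  # swap for open("mall_purchases.csv", "w", newline="") to write a real file
writer = csv.writer(buffer)
writer.writerow(["Customer"] + categories)
writer.writerows(rows)
print(buffer.getvalue())
```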

Your university is interested in determining the proportion of students who would be interested in completing summer courses online, compared to on campus. A survey is taken of 100 students who intend to take summer courses.

a. Identify the sample and the population.

b. For the study, explain the purpose of using

(i) Descriptive statistics

(ii) Inferential statistics.

For the marketing study about sales in Example 5, identify (a) the sample and population and (b) the descriptive and inferential aspects.

The Gallup organization has asked about support for labor unions since its first poll in 1936, when 72% of the American population approved of them. In its 2014 poll, it found that support for labor unions had fallen to 53% of Americans, based on a sample of 1,540 adults.

a. Calculate an estimated margin of error for these data.

b. What is the range of likely values for Americans who support labor unions in 2014?

c. This analysis is an example of

i. Descriptive statistics

ii. Inferential statistics

iii. A data file

iv. Designing a study
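A quick sketch for parts a and b, again assuming the 1/√n rule of thumb for the approximate margin of error of a sample proportion:

```python
import math

n = 1540      # sample size
p_hat = 0.53  # sample proportion supporting labor unions in 2014

moe = 1 / math.sqrt(n)  # rule-of-thumb margin of error (as a proportion)
low, high = p_hat - moe, p_hat + moe
print(f"margin of error ≈ {moe:.3f}; likely range {low:.1%} to {high:.1%}")
```

This gives a margin of about 0.025, so plausible values for the population percentage run from roughly 50.5% to 55.5%.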

Inferential statistics are used

a. To describe whether a sample has more females or males.

b. To reduce a data file to easily understood summaries.

c. To make predictions about populations by using sample data.

d. When we can’t use statistical software to analyze data.

e. To predict the sample data we will get when we know the population.

True or false? In a particular study, you could use descriptive statistics, or you could use inferential statistics, but you would rarely need to use both.

Pick up a recent issue of a national newspaper, such as The New York Times or USA Today, or consult a news website, such as msnbc.com or cnn.com. Identify an article that used statistical methods. Did it use descriptive statistics, inferential statistics, or both? Explain.

The players on the New York Yankees baseball team in 2014 had a mean salary of $7,387,498 and a median salary of $5,500,000. What do you think causes these two values to be so different? (Data are in the NYY salary file on the book’s website.)

The table in the next column shows the number of times 20- to 24-year-old U.S. residents have been married, based on a Bureau of the Census report from 2004. The frequencies are actually thousands of people. For instance, 8,418,000 men never married, but this does not affect calculations about the mean or median.

a. Find the median and mean for each gender.

b. On average, have women or men been married more often? Which statistic do you prefer to answer this question? (The mean, as opposed to the median, uses the numerical values of all the observations, not just the ordering. For discrete data with only a few values such as the number of times married, it can be more informative.)

The Human Development Report 2013, published by the United Nations, showed life expectancies by country. For Western Europe, some values reported were Austria 81, Belgium 80, Denmark 80, Finland 81, France 83, Germany 81, Greece 81, Ireland 81, Italy 83, Netherlands 81, Norway 81, Portugal 80, Spain 82, Sweden 82, Switzerland 83.

For Africa, some values reported were

Botswana 47, Dem. Rep. Congo 50, Angola 51, Zambia 57, Zimbabwe 58, Malawi 55, Nigeria 52, Rwanda 63, Uganda 59, Kenya 61, Mali 55, South Africa 56, Madagascar 64, Senegal 63, Sudan 62, Ghana 61.

a. Which group (Western Europe or Africa) of life expectancies do you think has the larger standard deviation? Why?

b. Find the standard deviation for each group. Compare them to illustrate that s is larger for the group that shows more variability from the mean.
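For part b, a short sketch using `statistics.stdev`, which computes the sample standard deviation s (dividing by n − 1); the values are the ones listed above.

```python
import statistics

western_europe = [81, 80, 80, 81, 83, 81, 81, 81, 83, 81, 81, 80, 82, 82, 83]
africa = [47, 50, 51, 57, 58, 55, 52, 63, 59, 61, 55, 56, 64, 63, 62, 61]

for name, values in [("Western Europe", western_europe), ("Africa", africa)]:
    # statistics.stdev is the sample standard deviation s (divisor n - 1).
    print(f"{name:15s} mean = {statistics.mean(values):5.1f}  s = {statistics.stdev(values):.2f}")
```

The Western European values have s of roughly 1.0 year, while the African values have s of roughly 5.2 years, matching the much wider spread visible in the raw numbers.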

For Russia, the United Nations reported a life expectancy of 70. Suppose we add this observation to the data set for Western Europe in the previous exercise. Would you expect the standard deviation to be larger, or smaller, than the value for the Western European countries alone? Why?

According to the National Association of Home Builders, the median selling price of new homes in the United States in February 2014 was $261,400. Which of the following is the most plausible value for the standard deviation: −$15,000, $1000, $60,000, or $1,000,000? Why? Explain what’s unrealistic about each of the other values.

The table below shows data (from a 2004 Bureau of the Census report) on the number of times 20- to 24-year-old men have been married.

a. Verify that the mean number of times men have been married is 0.16 and that the standard deviation is 0.37.

b. Find the actual percentages of observations within 1, 2, and 3 standard deviations of the mean. How do these compare to the percentages predicted by the empirical rule?

c. How do you explain the results in part b?
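The marriage-count tables for these exercises are not reproduced here, so the sketch below only provides helpers that work directly on a frequency table `{value: count}`; you would fill in the counts from the table yourself. The function names are hypothetical. Counts may stay in thousands: the mean, median, and proportions are unaffected, and with counts this large the n − 1 divisor in s makes a negligible difference.

```python
import math

def grouped_mean_sd(freq):
    """Mean and sample standard deviation s (divisor n - 1) for a frequency table {value: count}."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    ss = sum(f * (x - mean) ** 2 for x, f in freq.items())
    return mean, math.sqrt(ss / (n - 1))

def grouped_median(freq):
    """Median of a frequency table, located from cumulative counts (no need to expand the data)."""
    n = sum(freq.values())

    def kth_smallest(k):  # k is 1-indexed
        running = 0
        for x in sorted(freq):
            running += freq[x]
            if running >= k:
                return x

    if n % 2 == 1:
        return kth_smallest(n // 2 + 1)
    return (kth_smallest(n // 2) + kth_smallest(n // 2 + 1)) / 2

def share_within(freq, k):
    """Proportion of observations within k standard deviations of the mean."""
    mean, s = grouped_mean_sd(freq)
    n = sum(freq.values())
    return sum(f for x, f in freq.items() if abs(x - mean) <= k * s) / n
```

Comparing `share_within(table, 1)`, `share_within(table, 2)`, and `share_within(table, 3)` with 68%, 95%, and nearly 100% shows how far a heavily skewed, discrete distribution departs from the empirical rule.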

The 2012 General Social Survey asked, “On the average day, about how many hours do you personally watch television?” Of 1298 responses, the mode was 2, the median was 2, the mean was 3.09, and the standard deviation was 2.87. Based on these statistics, what would you surmise about the shape of the distribution? Why? (Source: Data from CSM, UC Berkeley.)

Use the Mean versus Median web app on the book’s website to investigate how the standard deviation changes as the data change. When you start the app, you have a blank graph. Under “Options”, you can request to show the standard deviation of the points you create by clicking in the graph.

a. Create 3 observations (by clicking in the graph) that have a mean of about 50 and a standard deviation of about 20. (Clicking on an existing point deletes it.)

b. Create 3 observations (click Refresh to clear the previous points) that have a mean of about 50 and a standard deviation of about 40.

c. Placing 4 values between 0 and 100, what is the largest standard deviation you can get? What are the values that have that standard deviation?
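To check candidate point sets before clicking them into the app, the sketch below computes the mean and standard deviation with Python's `statistics` module. It assumes the app reports the sample standard deviation (dividing by n − 1); if it divides by n instead, `statistics.pstdev` would be the matching function and the values would be somewhat smaller. The point sets are one possible choice each, not the only answers.

```python
import statistics

# Candidate point sets for parts a-c (one possible choice each).
candidates = {
    "a: mean 50, s about 20": [30, 50, 70],
    "b: mean 50, s about 40": [10, 50, 90],
    "c: 4 values in [0, 100]": [0, 0, 100, 100],
}
for label, points in candidates.items():
    print(f"{label:25s} mean = {statistics.mean(points)}, s = {statistics.stdev(points):.1f}")
```

For part c, splitting the four points evenly between the two extremes pushes every observation as far from the mean as possible; a 3-to-1 split gives a smaller s.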

In recent years, many European nations have suffered from relatively high youth unemployment. For the 28 EU nations, the table below shows the unemployment rate among 15- to 24-year-olds in 2013, also available as a file on the book’s website. If you are computing the statistics below by hand, it may be easier to arrange the data in increasing order and write them in 4 rows of 7 values each.

a. Find and interpret the second quartile (= median).

b. Find and interpret the first quartile (Q1).

c. Find and interpret the third quartile (Q3).

d. Will the 10th percentile be around the value of 6 or 16? Explain.


The Energy Information Administration records per capita consumption of energy by country. The box plot below shows 2011 per capita energy consumption (in millions of BTUs) for 36 OECD countries, with a mean of 195 and a standard deviation of 120. Iceland had the largest per capita consumption at 665 million BTU.

a. Use the box plot to give approximate values for the five-number summary of energy consumption.

b. Italy had a per capita consumption of 139 million BTU. How many standard deviations from the mean was its consumption?

c. The United States was not included in the data, but its per capita consumption was 334 million BTU. Relative to the distribution for the included OECD nations, the United States is how many standard deviations from the mean?
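Parts b and c both ask for a z-score, z = (x − mean) / s. A minimal sketch using the mean (195) and standard deviation (120) given above:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

MEAN, SD = 195, 120  # OECD per capita energy consumption, million BTU (2011)
print(f"Italy: z = {z_score(139, MEAN, SD):.2f}")          # below the mean
print(f"United States: z = {z_score(334, MEAN, SD):.2f}")  # above the mean
```

Italy comes out a little under half a standard deviation below the mean (about −0.47), and the United States a little over one standard deviation above it (about 1.16).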


The 2013 unemployment rates of countries in the European Union shown in Exercise 2.63 ranged from 4.9 to 27.3, with Q1 = 7.15, median = 10.15, Q3 = 13.05, a mean of 11.1, and a standard deviation of 5.6.

a. In a box plot, what would be the values at the outer edges of the box, and what would be the values to which the whiskers extend?

b. Which two countries will show up as outliers in the box plot? Why?

c. Greece had the highest unemployment rate of 27.3. Is it an outlier according to the 3 standard deviation criterion? Explain.

d. What unemployment value for a country would have a z-score equal to 0?
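A sketch of the two outlier checks in parts b and c, assuming the usual box-plot convention that flags values more than 1.5 × IQR beyond the quartiles, alongside the 3-standard-deviation criterion. The whisker endpoints themselves depend on the individual data values, which are not repeated here.

```python
def fences(q1, q3):
    """1.5 x IQR fences used to flag potential outliers in a box plot."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

Q1, MEDIAN, Q3 = 7.15, 10.15, 13.05
MEAN, SD = 11.1, 5.6

lower, upper = fences(Q1, Q3)
greece_z = (27.3 - MEAN) / SD  # z-score for the largest observation

print(f"fences: {lower:.2f} to {upper:.2f}")
print(f"Greece: z = {greece_z:.2f}")
```

Any country above about 21.9 lies beyond the upper fence, yet Greece's z-score of about 2.89 stays below 3, so the two criteria can disagree. A value equal to the mean, 11.1, has z = 0.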

Example 17 discussed EU carbon dioxide emissions, which had a mean of 7.9 and standard deviation of 3.6.

a. Finland’s observation was 11.5. Find its z-score and interpret.

b. Sweden’s observation was 5.6. Find its z-score, and interpret.

c. The UK’s observation was 7.9. Find its z-score and interpret.

Explain what is wrong with the time plot shown of the annual license fee paid by British subjects for watching BBC programs.

Explain what is wrong with the following pie chart, which depicts the federal government breakdown by category for 2010.

The table shows the number of 18- to 24-year-old noncitizens living in the United States between 2010 and 2012.

**Noncitizens aged 18 to 24 in the United States**

| Region of Birth | Number (in Thousands) |
|---|---|
| Africa | 115 |
| Asia | 590 |
| Europe | 148 |
| Latin America & Caribbean | 1,666 |
| Other | 49 |
| Total | 2,568 |

a. Is Region of Birth quantitative or categorical? Show how to summarize results by adding a column of percentages to the table.

b. Which of the following is a sensible numerical summary for these data: Mode (or modal category), mean, median? Explain, and report whichever is/are sensible.

c. How would you order the Region of Birth categories for a Pareto chart? What’s its advantage over the ordinary bar graph?
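A minimal sketch of the percentage column for part a and the Pareto ordering for part c, using the counts from the table (in thousands):

```python
counts = {
    "Africa": 115, "Asia": 590, "Europe": 148,
    "Latin America & Caribbean": 1666, "Other": 49,
}
total = sum(counts.values())  # matches the table's total of 2,568 thousand
percentages = {region: 100 * c / total for region, c in counts.items()}

# Pareto order: categories sorted from most to least frequent.
pareto_order = sorted(counts, key=counts.get, reverse=True)
for region in pareto_order:
    print(f"{region:27s} {counts[region]:6,d} {percentages[region]:5.1f}%")

modal_category = pareto_order[0]
```

Because Region of Birth is categorical, the mode is the only sensible numerical summary of the three; here Latin America & Caribbean accounts for roughly 65% of the total.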

A recent survey asked 1200 university students in China to pick the personality trait that most defines a person as “cool.” The possible responses allowed, and the percentage making each, were individualistic and innovative (47%), stylish (13.5%), dynamic and capable (9.5%), easygoing and relaxed (7.5%), other (22.5%).

a. Identify the variable being measured.

b. Classify the variable as categorical or quantitative.

c. Which of the following methods could you use to describe these data: (i) bar chart, (ii) dot plot, (iii) box plot, (iv) median, (v) mean, (vi) mode (or modal category), (vii) IQR, (viii) standard deviation?

For the question “How many children have you ever had?” in the 2010 General Social Survey, the results were

a. Which is the most appropriate graph to display the data: dot plot, stem-and-leaf plot, or histogram? Why?

b. Based on sketching or using software to construct the graph, characterize this distribution as skewed to the left, skewed to the right, or symmetric. Explain.


The Animals data set on the book’s website has data on the average longevity (measured in years) for 21 animals.

a. Construct a stem-and-leaf plot of longevity.

b. Construct a histogram of longevity.

c. Summarize what you see in the histogram or stem-and-leaf plot. (Most animals live to be how old? Is the distribution skewed?)

The U.S. Bureau of the Census reported a median sales price of new houses sold in March 2014 of $290,000. Would you expect the mean sales price to have been higher or lower? Explain.

In a guidebook about interesting hikes to take in national parks, each hike is classified as easy, medium, or hard and by the length of the hike (in miles). Which classification is quantitative and which is categorical?

The histogram shows the distribution of the damage (in billion dollars) of the 30 most costly hurricanes hitting the U.S. mainland between 1900 and 2010. (Numbers are inflation-adjusted, in 2010 dollars.) The data are available in the Hurricane file on the book’s website.

a. Describe the shape of the distribution.

b. Would you use the mean or median to describe the center? Why?

c. Verify with technology that the mean is 13.6, Q1 = 5.7, median = 7.9, and Q3 = 11.8. (Note that different software uses slightly different definitions for the quartiles.)

d. Write a short paragraph describing the distribution of hurricane damage.

Refer to the previous exercise about hurricane damage and the histogram shown there.

a. For this data, 93% of damages (i.e., all but the two most expensive) fall within one standard deviation of the mean. Why is this so different from the 68% the empirical rule suggests?

b. How would removing the costliest hurricane (Katrina in 2005, shown on the far right in the histogram) from the data set affect the

(i) mean,

(ii) median,

(iii) standard deviation,

(iv) IQR,

(v) 10th percentile?


The data values below represent the closing prices of the 20 most actively traded stocks on the NASDAQ Stock Exchange (rounded to the nearest dollar) on May 2, 2014.

a. Sketch a dot plot or construct a stem-and-leaf plot.

b. Find the median, the first quartile, and the third quartile.

c. Sketch a box plot. What feature of the distribution displayed in the plot in part a is not obvious in the box plot?

According to the Statistical Abstract of the United States, 2012, the average salary (in dollars) of primary and secondary school classroom teachers in 2009 in the United States varied among states with a five-number summary of: minimum = 35,070, Q1 = 45,840, median = 48,630, Q3 = 55,820, maximum = 69,119. (Data available in the teacher_salary file.)

a. Find and interpret the range and interquartile range.

b. Sketch a box plot, marking the five-number summary on it.

c. Predict the direction of skew for this distribution. Explain.

d. If the distribution, although skewed, is approximately bell shaped, which of the following would be the most realistic value for the standard deviation:

(i) 100,

(ii) 1000,

(iii) 7000,

(iv) 25,000? Explain your reasoning.
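A short sketch of the arithmetic behind parts a, c, and d, using the five-number summary above. For part d it assumes the normal-distribution fact that the IQR spans about 1.35 standard deviations, so IQR / 1.35 gives only a ballpark value for s.

```python
minimum, q1, median, q3, maximum = 35_070, 45_840, 48_630, 55_820, 69_119

data_range = maximum - minimum
iqr = q3 - q1
# A longer gap between the median and Q3 than between Q1 and the median hints at right skew.
lower_gap, upper_gap = median - q1, q3 - median
# For a roughly bell-shaped distribution, IQR is about 1.35 standard deviations,
# so IQR / 1.35 gives a ballpark value for s.
rough_sd = iqr / 1.35

print(f"range = {data_range:,}, IQR = {iqr:,}")
print(f"median - Q1 = {lower_gap:,}, Q3 - median = {upper_gap:,}")
print(f"rough s from IQR: {rough_sd:,.0f}")
```

The ballpark value of roughly 7,400 points toward choice (iii) rather than the much smaller or larger options.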

In 2009, the five-number summary statistics for the distribution of statewide percentage of people without health insurance had a minimum of 4.4% (Massachusetts), Q1 = 12.35%, median = 14.8%, Q3 = 17.3%, and maximum of 26.1% (Texas) (Statistical Abstract of the United States, data available on the book’s website as health_coverage).

a. Do you think the distribution is symmetric, skewed right, or skewed left? Why?

b. Which is most plausible for the standard deviation: −16, 0, 4, 20, or 25? Why? Explain what is unrealistic about the other values.

For each of the following variables, sketch a box plot that would be plausible.

a. Exam score (min = 0, max = 100, mean = 87, standard deviation = 10)

b. IQ (mean = 100 and standard deviation = 16)

c. Weekly religious contribution (median = $10 and mean = $17)

The distribution of high school graduation rates in the United States in 2009 had a minimum value of 79.9 (Texas), first quartile of 84.0, median of 87.4, third quartile of 89.8, and maximum value of 91.8 (Wyoming) (Statistical Abstract of the United States, data available on the book’s website.)

a. Report the range and the interquartile range.

b. Would a box plot show any potential outliers? Explain.

c. The mean graduation rate is 86.9, and the standard deviation is 3.4. For these data, does any state have a z-score that is larger than 3 in absolute value? Explain.

In a study of graduate students who took the Graduate Record Exam (GRE), the Educational Testing Service reported that for the quantitative exam, U.S. citizens had a mean of 529 and standard deviation of 127, whereas the non-U.S. citizens had a mean of 649 and standard deviation of 129. Which of the following is true?

a. Both groups had about the same amount of variability in their scores, but non-U.S. citizens performed better, on the average, than U.S. citizens.

b. If the distribution of scores was approximately bell shaped, then almost no U.S. citizens scored below 400.

c. If the scores range between 200 and 800, then probably the scores for non-U.S. citizens were symmetric and bell shaped.

d. A non-U.S. citizen who scored 3 standard deviations below the mean had a score of 200.
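The sketch below checks statements b and d with Python's `statistics.NormalDist`. Treating the U.S. scores as exactly normal is an idealization of the "approximately bell shaped" wording in the exercise.

```python
from statistics import NormalDist

us = NormalDist(mu=529, sigma=127)       # U.S. citizens
non_us = NormalDist(mu=649, sigma=129)   # non-U.S. citizens

# Statement b: share of U.S. citizens below 400 under an exact normal model
share_below_400 = us.cdf(400)

# Statement d: the score 3 standard deviations below the non-U.S. mean
three_sd_below = non_us.mean - 3 * non_us.stdev

print(f"P(U.S. score < 400) = {share_below_400:.3f}")
print(f"non-U.S. mean - 3 SD = {three_sd_below}")
```

Roughly 15% of U.S. citizens would fall below 400, so b is false, and three standard deviations below the non-U.S. mean is 262 rather than 200, so d is false as well.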

The side-by-side box plots below show the unemployment rate among 15- to 24-year-olds in 28 European nations for each gender. The two outliers shown for each box plot refer to the same countries, Greece and Spain. Write a short paragraph comparing the distribution for the males to the one for the females. (Data in youth_unemployment on book's website.)

Which statement about the standard deviation s is false?

a. *s* can never be negative.

b. *s* can never be zero.

c. For bell-shaped distributions, about 95% of the data fall within X̅ ± 2s.

d. *s* is a nonresistant (sensitive to outliers) measure of variability, as is the range.

The mean GPA for all students at a community college in the fall semester was 2.77. A student with a GPA of 2.0 wants to know her relative standing in relation to the mean GPA. A numerical summary that would be useful for this purpose is the

a. Standard deviation

b. Median

c. Interquartile range

d. Number of students at the community college

A teacher summarizes grades on an exam by Min = 26, Q1 = 67, Q2 = 80, Q3 = 87, Max = 100, Mean = 76, Mode = 100, Standard deviation = 76, IQR = 20.

She incorrectly recorded one of these. Which one do you think it was? Why?
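A quick consistency check on the teacher's summary, written as a sketch: the IQR must equal Q3 - Q1, every measure of center must lie between the minimum and maximum, and the standard deviation can never exceed the range (in fact it is at most about half the range).

```python
summary = {"min": 26, "q1": 67, "q2": 80, "q3": 87, "max": 100,
           "mean": 76, "mode": 100, "sd": 76, "iqr": 20}

checks = {
    # The IQR must equal Q3 - Q1
    "iqr consistent": summary["iqr"] == summary["q3"] - summary["q1"],
    # Quartiles, mean and mode must all lie between min and max
    "centers in range": all(summary["min"] <= summary[k] <= summary["max"]
                            for k in ("q1", "q2", "q3", "mean", "mode")),
    # The standard deviation can never exceed the range (max - min)
    "sd below range": summary["sd"] <= summary["max"] - summary["min"],
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Only the standard-deviation check fails: 76 exceeds the range of 74.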

According to a story in the Guardian newspaper (football.guardian.co.uk), in the United Kingdom the mean wage for a Premiership player in 2006 was £676,000. True or false: If the income distribution is skewed to the right, then the median salary was even larger than £676,000.

For the following pairs of variables, which more naturally is the response variable and which is the explanatory variable?

a. Carat (= weight) and price of a diamond

b. Dosage (low/medium/high) and severity of adverse event (mild/moderate/strong/serious) of a drug

c. Top speed and construction type (wood or steel) of a roller coaster

d. Type of college (private/public) and graduation rate.

In recent election years, political scientists have analyzed whether a gender gap exists in political beliefs and party identification. The table shows data collected from the 2010 General Social Survey on gender and party identification (ID).

a. Identify the response and explanatory variables.

b. What proportion of sampled individuals is

(i) Male and Republican,

(ii) Female and Republican?

c. What proportion of the overall sample is

(i) Male,

(ii) Republican?

d. Are the proportions you computed in part c conditional or marginal proportions?

e. The two bar graphs, one for each gender, display the proportion of individuals identifying with each political party. What are these proportions called?

Is there a difference between males and females in the proportions that identify with a particular party?

Summarize whatever gender gap you observe.

For the 100 cars on the lot of a used-car dealership, would you expect a positive association, negative association, or no association between each of the following pairs of variables? Explain why.

a. The age of the car and the number of miles on the odometer

b. The age of the car and the resale value

c. The age of the car and the total amount that has been spent on repairs

d. The weight of the car and the number of miles it travels on a gallon of gas

e. The weight of the car and the number of liters it uses per 100 km

The previous problem discusses GDP, which is a commonly used measure of the overall economic activity of a nation. For this group of nations, the GDP data have a mean of 1909 and a standard deviation of 3136 (in billions of U.S. dollars).

a. The five-number summary of GDP is minimum = 204, Q1 = 378, median = 780, Q3 = 2015, and maximum = 16,245. Sketch a box plot.

b. Based on these statistics and the graph in part a, describe the shape of the distribution of GDP values.

c. The data set also contains per capita GDP, or the overall GDP divided by the nation’s population size. Construct a scatterplot of per capita GDP and GDP and explain why no clear trend emerges.

d. Your friend, Joe, argues that the correlation between the two variables must be 1 since they are both measuring the same thing. In reality, the actual correlation between per capita GDP and GDP is only 0.32. Identify the flaw in Joe’s reasoning.
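The sketch below applies the 1.5 × IQR rule to the GDP summary and compares the mean with the median; both are standard checks for skew, not anything specific to this data file.

```python
# GDP summary statistics (billions of U.S. dollars) from the exercise
mean, sd = 1909, 3136
minimum, q1, median, q3, maximum = 204, 378, 780, 2015, 16245

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as potential outliers

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"mean - median = {mean - median}")                 # a large positive gap suggests right skew
print(f"Q1 - min = {q1 - minimum}, max - Q3 = {maximum - q3}")
print(f"mean - 1 SD = {mean - sd}")                       # negative, although GDP cannot be
```

The long upper whisker, the maximum far beyond the upper fence, and a mean well above the median all point to a strongly right-skewed distribution.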

For the 32 nations in the Internet Use data file on the book's website, consider the following correlations:

a. Which pair of variables exhibits the strongest linear relationship?

b. Which pair of variables exhibits the weakest linear relationship?

c. In Example 7, we found the correlation between Internet use and Facebook use (measured in percentages of the population) to be 0.614. Why does the correlation between total number of Internet users and Facebook users differ from that of Internet use and Facebook use?

Match the following scatter plots with the correlation values.

1. r = -0.9

2. r = -0.5

3. r = 0

4. r = 0.6


Consider the data:

x 3 4 5 6 7

y 8 13 12 14 16

a. Sketch a scatter plot.

b. If one pair of (x, y) values is removed, the correlation for the remaining four pairs equals 1. Which pair is it?

c. If one y value is changed, the correlation for the five pairs equals 1. Identify the y value and how it must be changed for this to happen.

The following table shows data on gender (coded as 1 = female, 2 = male) and preferred type of chocolate (coded as 1 = white, 2 = milk, 3 = dark) for a sample of 10 students.

The students' teacher enters the data into software and reports a correlation of 0.640 between gender and type of preferred chocolate. He concludes that there is a moderately strong positive correlation between someone's gender and chocolate preference. What's wrong with this analysis?

Sketch a scatter plot for which r > 0, but r = 0 after one of the points is deleted.

Refer to the previous exercise. The correlation with the cost of a dinner is 0.68 for food quality rating, 0.69 for service rating, and 0.56 for décor rating. According to the definition of r^{2} as a measure of the reduction in prediction error, which of these three ratings can be used to make the most accurate predictions for the cost of a dinner: quality of food, service, or décor? Why?
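Since r^{2} is the proportional reduction in prediction error, squaring each correlation and comparing the results is enough. A minimal sketch:

```python
r_values = {"food quality": 0.68, "service": 0.69, "décor": 0.56}

# r^2 = proportional reduction in prediction error compared with predicting by y-bar
r_squared = {name: r ** 2 for name, r in r_values.items()}
best = max(r_squared, key=r_squared.get)

for name, r2 in r_squared.items():
    print(f"{name}: r^2 = {r2:.3f}")
print("largest reduction in prediction error:", best)
```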

For the 32 nations in Example 7, we found a correlation of 0.614 between Internet use and Facebook use (both as percentages of population). The regression equation is predicted Facebook use = 7.90 + 0.439(Internet use).

a. Based on the correlation value, the slope had to be positive. Why?

b. Indonesia had an Internet use of 15.4% and Facebook use of 20.7%. Find its predicted Facebook use based on the regression equation.

c. Find the residual for Indonesia. Interpret.
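A small sketch of parts b and c, with the regression equation from the exercise written as a function (the function name is illustrative):

```python
def predicted_facebook_use(internet_use: float) -> float:
    """Regression equation from the exercise: y-hat = 7.90 + 0.439 x (both in % of population)."""
    return 7.90 + 0.439 * internet_use

internet, facebook = 15.4, 20.7          # Indonesia
y_hat = predicted_facebook_use(internet)
residual = facebook - y_hat              # residual = observed - predicted

print(f"predicted = {y_hat:.2f}%, residual = {residual:.2f} percentage points")
```

A positive residual means Indonesia's actual Facebook use is higher than the equation predicts for its level of Internet use.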

Zagat restaurant guides publish ratings of restaurants for many large cities around the world (see www.zagat.com). The review for each restaurant gives a verbal summary as well as a 0- to 30-point rating of the quality of food, décor, service, and the cost of a dinner with one drink and tip. For 31 French restaurants in Boston in 2014, the food quality ratings had a mean of 24.55 and standard deviation of 2.08 points. The cost of a dinner (in U.S. dollars) had a mean of $50.35 and standard deviation of $14.92. The equation that predicts the cost of a dinner using the rating for the quality of food is ŷ = -70 + 4.9x. The correlation between these two variables is 0.68. (Data available in the Zagat_Boston file.)

a. Predict the cost of a dinner in a restaurant that gets the

(i) Lowest observed food quality rating of 21,

(ii) Highest observed food quality rating of 28.

b. Interpret the slope in context.

c. Interpret the correlation.

d. Show how the slope can be obtained from the correlation and other information given.
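Part d uses b = r(s_y/s_x), and the intercept follows from a = ȳ - b x̄. The sketch below recomputes both from the summary statistics and then evaluates the published equation for part a; small differences from -70 come from rounding the slope.

```python
r = 0.68
mean_x, sd_x = 24.55, 2.08    # food quality rating
mean_y, sd_y = 50.35, 14.92   # cost of a dinner (U.S. dollars)

# Slope from the correlation and the two standard deviations
b = r * sd_y / sd_x
a = mean_y - b * mean_x
# Using the rounded slope 4.9 gives an intercept of about -69.9, i.e., roughly -70
a_rounded = mean_y - 4.9 * mean_x
print(f"b = {b:.2f}, a = {a:.1f}, a (with b = 4.9) = {a_rounded:.2f}")

def predict_cost(rating: float) -> float:
    """Published equation: y-hat = -70 + 4.9 x."""
    return -70 + 4.9 * rating

for rating in (21, 28):
    print(rating, round(predict_cost(rating), 2))
```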

The Internet Use data file on the book’s website contains data on the number of individuals in a country with broadband access and the population size for each of 32 nations. When using population size as the explanatory variable, x, and broadband subscribers as the response variable, y, the regression equation is predicted broadband subscribers = 5,530,203 + 0.0761 population.

a. Interpret the slope of the regression equation. (Make sure to use a meaningful increase.) Is the association positive or negative? Explain what this means.

b. Predict broadband subscribers at the

(i) Minimum population size x value of 7,154,600,

(ii) Maximum population size x value of 1,350,695,000.

c. For the United States, broadband subscribers = 88,520,000, and population = 313,914,040. Find the predicted broadband use and the residual for the United States. Interpret the value of this residual.
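A minimal sketch of parts a through c. Using an increase of 1 million people in part a is one possible "meaningful increase", chosen here for illustration.

```python
def predicted_subscribers(population: float) -> float:
    """Regression equation from the exercise: y-hat = 5,530,203 + 0.0761 x."""
    return 5_530_203 + 0.0761 * population

# (a) 1 million more people -> about 76,100 more predicted broadband subscribers
per_million = 0.0761 * 1_000_000

# (b) predictions at the smallest and largest population sizes
for pop in (7_154_600, 1_350_695_000):
    print(f"{pop:,}: {predicted_subscribers(pop):,.0f}")

# (c) United States: residual = observed - predicted
us_pop, us_subs = 313_914_040, 88_520_000
us_pred = predicted_subscribers(us_pop)
print(f"U.S. predicted = {us_pred:,.0f}, residual = {us_subs - us_pred:,.0f}")
```

The large positive residual says the United States has far more broadband subscribers than the equation predicts for a country of its population.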

Refer to the previous exercise.

a. For this example, it seems that the average sodium content is as good a predictor as the one resulting from the regression equation. Do you expect r^{2} to be large or small? Why?

b. For this data, r = -0.017. Interpret r^{2}.

c. Show the algebraic relationship between the correlation of -0.017 and the slope of the regression equation b = -0.25, using the fact that the standard deviations are 5.32 g for sugar and 77.3 mg for sodium.

**Previous exercise**

The following figure shows the result of a regression analysis of the explanatory variable x = sugar and the response variable y = sodium for the breakfast cereal data set discussed in Chapter 2 (the Cereal data file on the book’s website).
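Part c of the preceding exercise relies on b = r(s_y/s_x) with x = sugar and y = sodium. A short numerical check:

```python
r = -0.017
sd_sugar, sd_sodium = 5.32, 77.3   # s_x (g) and s_y (mg)

slope = r * sd_sodium / sd_sugar   # b = r * s_y / s_x
r_squared = r ** 2                 # share of sodium variability explained by sugar

print(f"b = {slope:.3f} mg per g")
print(f"r^2 = {r_squared:.5f}")
```

The slope rounds to the reported -0.25, and r^{2} is well under 0.1%, so sugar explains essentially none of the variability in sodium.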

Most cars are fuel efficient when running at a steady speed of around 40 to 50 mph. A scatter plot relating fuel consumption (measured in mpg) and steady driving speed (measured in mph) for a mid-sized car is shown below. The data are available in the Fuel file on the book's website. (Source: Berry, I. M. (2010). The Effects of Driving Style and Vehicle Performance on the Real-World Fuel Consumption of U.S. Light-Duty Vehicles. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA.)

a. The correlation equals 0.106. Comment on the use of the correlation coefficient as a measure for the association between fuel consumption and steady driving speed.

b. Comment on the use of the regression equation as a tool for predicting fuel consumption from the velocity of the car.

c. Over what subrange of steady driving speed might fitting a regression equation be appropriate? Why?

Does the life expectancy of animals depend on the length of their gestational period? The data in the Animals file on the book's website show observations on the average longevity (in years) and average gestational period (in days) for 21 animals. (Source: Wildlife Conservation Society)

a. Use the scatter plot below to describe the association. Are there any unusual observations?

b. Using software, verify the given numbers for the correlation and slope of the regression line shown on the plot.

c. Are there any outliers? If so, what makes the observations unusual? Would you expect any of them to be influential observations? Why or why not?

d. Find the regression line and the correlation without one of the observations identified in part c. Compare your results to those in part b. Was the observation influential? Explain.


Consumer Reports magazine (June 2013) reported purchasing samples of ground turkey from different brands to test for the presence of bacteria. The table below shows the number of samples that tested positive for Enterococcus bacteria for packages that claimed no use of antibiotics in the processing of the meat and for packages in which no such claim was made.

a. Find the difference in the proportion of packages that tested positive and interpret.

b. Find the ratio of the proportion of packages that tested positive and interpret.
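The two comparisons in parts a and b can be written as a small helper. The counts in the usage line are hypothetical placeholders, since the table is not reproduced here; group A stands for packages claiming no antibiotics and group B for packages without that claim.

```python
def compare_proportions(pos_a: int, n_a: int, pos_b: int, n_b: int) -> tuple[float, float]:
    """Return (difference, ratio) of the proportions testing positive, group A vs. group B."""
    p_a = pos_a / n_a
    p_b = pos_b / n_b
    return p_a - p_b, p_a / p_b

# Hypothetical counts for illustration only; substitute the values from the table
diff, ratio = compare_proportions(pos_a=20, n_a=50, pos_b=40, n_b=50)
print(f"difference = {diff:.2f}, ratio = {ratio:.2f}")
```

The difference is read in proportion (percentage-point) units, while the ratio says how many times as likely a package in group A is to test positive compared with group B.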

The data in the Animals file on the book's website hold observations on the average longevity (in years) and gestational period (in days) for a variety of animals. Exercise 3.52 showed the scatter plot together with the regression equation with intercept 6.29 and slope 0.045 and an r^{2} value of 0.73.

a. Interpret the slope.

b. A leopard has a gestational period of about 98 days. What is its predicted average longevity?

c. Interpret the value of r^{2}.

d. Show that extrapolating from animals to humans (with gestational period of about 40 weeks) grossly underestimates average human longevity.
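A short sketch of parts b and d using the equation from Exercise 3.52, with the human gestational period of 40 weeks converted to 280 days:

```python
def predicted_longevity(gestation_days: float) -> float:
    """Equation from Exercise 3.52: y-hat = 6.29 + 0.045 x (x in days, y in years)."""
    return 6.29 + 0.045 * gestation_days

leopard = predicted_longevity(98)       # part b
human = predicted_longevity(40 * 7)     # part d: 40 weeks = 280 days

print(f"leopard: {leopard:.2f} years, human: {human:.2f} years")
```

A predicted human longevity of under 19 years is far below actual human life expectancy, which is the gross underestimate part d asks you to show.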

For the U.S. Statewide Crime data file on the book’s website, let y = violent crime rate and x = percent with a college education.

a. Construct a scatter plot. Identify any points that you think may be influential in a regression analysis.

b. Fit the regression line, using all 51 observations. Interpret the slope.

c. Fit the regression line after deleting the observation identified in part a. Interpret the slope and compare results to part b.

Repeat the previous exercise using x = percent with at least a high school education. This shows that an outlier is not especially influential if its x-value is not relatively large or small.


Example 11 discussed how the winning height in the Olympic high jump changed over time. Using the High Jump data file on the book's website, we get (see also Figure 3.16) Women_Meters = -10.94 + 0.0065 (Year_Women) for predicting the women's winning height (in meters) using the year number.

a. Predict the winning Olympic high jump height for women in (i) 2016 and (ii) 3000.

b. Do you feel comfortable making either prediction in part a? Explain.
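Plugging the two years into the equation from the text, as a sketch:

```python
def predicted_height(year: int) -> float:
    """Equation from the text: Women_Meters = -10.94 + 0.0065 * Year_Women."""
    return -10.94 + 0.0065 * year

for year in (2016, 3000):
    print(year, round(predicted_height(year), 2))
```

The year-3000 prediction of about 8.56 m requires extrapolating nearly a thousand years beyond any observed Olympics, which is why it deserves far less trust than the 2016 prediction of about 2.16 m.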


Figure 3.16

Access the Newnan GA Temps file on the book's website, which contains data on average annual temperatures for Newnan, Georgia, during the 20th century. Fit a regression line to these temperatures and interpret the trend. Compare the trend to the trend found in Example 12 for Central Park, New York, temperatures.

In a study of graduate students who took the Graduate Record Exam (GRE), the Educational Testing Service reported a correlation of 0.37 between undergraduate grade point average (GPA) and graduate first-year GPA.¹⁴ This means that

a. As undergraduate GPA increases by one unit, graduate first-year GPA increases by 0.37 unit.

b. Because the correlation is not 0, we can predict a person’s graduate first-year GPA perfectly if we know their undergraduate GPA.

c. The relationship between undergraduate GPA and graduate first-year GPA follows a curve rather than a straight line.

d. As one of these variables increases, there is a weak tendency for the other variable to increase also.

Which of the following is not a property of r?

a. r is always between -1 and 1.

b. r depends on which of the two variables is designated as the response variable.

c. r measures the strength of the linear relationship between x and y.

d. r does not depend on the units of y or x.

e. r has the same sign as the slope of the regression equation.

One can interpret r = 0.30 as

a. A weak, positive association

b. 30% of the time ŷ = y

c. ŷ changes 0.30 units for every one-unit increase in x

d. A stronger association than two variables with r = -0.70

Which one of the following statements is correct?

a. The correlation is always the same as the slope of the regression line.

b. The mean of the residuals from the least-squares regression line is 0 only when r = 0.

c. The correlation is the percentage of points that lie in the quadrants where x and y are both above the mean or both below the mean.

d. The correlation is inappropriate if a U-shaped relationship exists between x and y.

You can summarize the data for two categorical variables x and y by

a. drawing a scatter plot of the x- and y-values.

b. constructing a contingency table for the x- and y-values.

c. calculating the correlation between x and y.

d. constructing a box plot for each variable.

The slope of the regression equation and the correlation are similar in the sense that

a. They do not depend on the units of measurement.

b. They both must fall between -1 and +1.

c. They both have the same sign.

d. Neither can be affected by severe regression outliers.

An r^{2} measure of 0.85 implies that

a. The correlation between x and y equals 0.85.

b. For a one-unit increase in x, we predict y to increase by 85%.

c. 85% of the response variable y can be explained by the linear relationship between x and y.

d. 85% of the variability we observe in the response variable y can be explained by its linear relationship with x.

One hundred forty-eight men and women without heart disease or diabetes enrolled in a study. Half of the subjects were randomly assigned to a low-carb diet (<40 g/d), and the others were given a low-fat diet (<30% of daily energy intake from total fat). Subjects on the low-carb diet lost more weight after one year compared with those on the low-fat diet (an average of 8 pounds more). (L. A. Bazzano et al., Ann Intern Med 2014; 161(5): 309–318. doi: 10.7326/M14-0180)

a. Identify the response variable and the explanatory variable.

b. Was this study an observational study or an experimental study? Explain.

c. Based on this study, is it appropriate to recommend that everyone who wishes to lose weight should prefer a low-carb diet over a low-fat diet? Explain your answer.


A 2014 study (http://www.futurity.org/foreign-languages-make-us-less-moral/) examined the relationship between using a foreign language and acting for the common good. They examined two groups of people, one that used their native language and one that used a foreign language. Each group was presented with a moral dilemma that required the subject to make a choice about saving himself or herself versus sacrificing his or her life for the sake of saving others. Those using a foreign language chose saving others at a higher rate than those using their native language.

a. Identify the response variable and the explanatory variable.

b. Is this study an observational study or an experiment? Explain.

c. Can we conclude that speaking a foreign language causes one to act morally? Explain.

Andy once heard about a car crash victim who died because he was pinned in the wreckage by a seat belt he could not undo. As a result, Andy refuses to wear a seat belt when he rides in a car. How would you explain to Andy the fallacy behind relying on this anecdotal evidence?

Tony’s mother is extremely proud that her son will graduate college in a few months. She expresses concern, however, when Tony tells her that following graduation, he plans to move to Las Vegas to become a professional poker player. He mentions that his friend Nick did so and is now earning more than a million dollars per year. Should Tony’s anecdotal evidence about Nick soothe his mother’s concern?

In a study published in the July 7, 2014, edition of the American Journal of Medicine, it was suggested that lack of exercise contributed more to weight gain than eating too much. The study examined the current exercise habits and caloric intake of a sample of both males and females.

a. Was this an observational study or an experimental study? Explain why.

b. Identify the response variable and the explanatory variable(s).

c. Does this study prove that lack of exercise causes weight gain more often than eating too much?

d. It was reported that women younger than 40 are quite vulnerable to the risks of a sedentary lifestyle. Name a lurking variable that might explain this risk of a sedentary lifestyle for these younger women that in turn leads to little exercise and/or eating more.
