Introduction:
This week, the modules, e-text, and other readings covered many concepts related to correlation and regression. Please use your knowledge of these concepts, along with StatCrunch, to address the questions/problems below. Note that your grade on this assignment will depend on the accuracy of your conclusions, the "correctness" of your graphs and calculations, and how clearly you communicate about the specific concepts being discussed.
We will be addressing questions related to two different data sets this week!
About the first data set, "Enrollment Status and Hours of Sleep Per Night (SP19)":
https://www.statcrunch.com/app/index.html?dataid=2977973
We saw these data last week. This spreadsheet includes the results of a survey sent to students of PA College of Health Sciences in Spring 2019. These data were collected by students in MAT260 who were interested in addressing the question, "Do students who drink coffee get less sleep on average than those who do not drink coffee?"
This data set includes the following columns/variables:
- Current Enrollment: Full time or part time.
- Generation based on birth year: Baby boomer, Generation X, Millennial, Gen 2020.
- Coffee Drinker? Yes or no.
- Hours of sleep estimated per night
Problem 1: (3 points)
We will soon want to run a hypothesis test to compare the average hours of sleep between those that drink coffee and those that don't. Before addressing a specific claim, let's first calculate summary statistics for these two groups.
As a response to this problem, use StatCrunch to calculate summary statistics for the "Hours of Sleep" column and group by the "Do you drink coffee?" column. Comment on any differences you notice between these groups.
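As a rough illustration of what the grouped summary computes, here is a minimal Python sketch using only the standard library. The rows are made-up placeholder values, not the survey data, and the function name `grouped_summary` is hypothetical. StatCrunch remains the tool the assignment asks for.

```python
from statistics import mean, median, stdev

# Hypothetical placeholder rows: (coffee drinker?, hours of sleep).
# These are NOT the PA College survey responses.
rows = [
    ("Yes", 6.0), ("Yes", 7.0), ("Yes", 5.0),
    ("No", 8.0), ("No", 7.0), ("No", 9.0),
]

def grouped_summary(rows):
    """Return {group: {n, mean, median, std}} for (group, value) pairs."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return {
        label: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "std": stdev(values),  # sample standard deviation (n - 1 denominator)
        }
        for label, values in groups.items()
    }

summary = grouped_summary(rows)
for label, stats in summary.items():
    print(label, stats)
```

Comparing the two groups' means and standard deviations side by side is the kind of difference the problem asks you to comment on.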
Problem 2: (7 points)
Use StatCrunch and the summary statistics you calculated above to run a two-sample hypothesis test to address the following claim:
"On average, PA College students that drink coffee get fewer hours of sleep per night than those who do not drink coffee."
Test this claim with an alpha level of 0.05. In order to receive full credit for this problem, you must:
- State the claim in symbolic form (defining all parameters);
- Record your null and alternative hypotheses;
- Use StatCrunch to calculate the appropriate p-value;
- Include all StatCrunch printouts/outputs;
- Make a conclusion about the null and alternative hypotheses (remembering to include how you arrived at that conclusion);
- Make a final statement regarding the stated claim.
- (You can assume the level of significance is 0.05)
Please remember to show all your work!
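The mechanics of a left-tailed two-sample t-test from summary statistics can be sketched as follows. This is an illustration only. It assumes the unpooled (Welch) version of the test, takes group 1 to be coffee drinkers and group 2 to be non-drinkers, and approximates the t-distribution CDF by numerical integration so that no third-party library is needed. The summary numbers are placeholders, not values from the survey, and the function names are hypothetical.

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def welch_test_less(mean1, sd1, n1, mean2, sd2, n2):
    """Left-tailed Welch test. H0: mu1 = mu2, H1: mu1 < mu2."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_cdf(t, df)  # the p-value is the left-tail area

# Placeholder summary statistics (assumed for illustration only).
t_stat, df, p_value = welch_test_less(6.5, 1.2, 40, 7.0, 1.1, 30)
alpha = 0.05
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

If StatCrunch is run with pooled variances instead, the degrees of freedom and p-value will differ slightly from this sketch.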
For the second half of the quiz, we will be using a different data set: "Exercise and Life Expectancy":
This data set describes the life expectancy and percent of citizens that get sufficient exercise in each state for the years 2000 and 2010. The data are also broken down by gender.
The data can be found at this link:
https://www.statcrunch.com/app/index.php?dataid=3526186
Problem 3: (6 points)
Determine whether there is evidence of a linear relationship between sufficient exercise (2010) and life expectancy (2010). You can use sufficient exercise (2010) as the "X Variable."
Your response to this problem should include/address the following:
- Make a scatterplot to visually assess the relationship between these variables;
- Calculate the correlation coefficient (and two-sided p-value) for these two variables;
- Does the p-value suggest a linear correlation or not? Be sure to clearly state the null and alternative hypotheses involved in this test. You can assume a level of significance of 0.05;
- Use the scatterplot, correlation coefficient, and the accompanying p-value to make a final conclusion about the relationship between these variables.
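The correlation test described in the list above can be sketched in Python. This is a minimal illustration, not the StatCrunch procedure. It assumes the usual test of H0: rho = 0 against H1: rho ≠ 0, with test statistic t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The five (x, y) pairs are invented placeholders, not the state-level data, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0 (requires |r| < 1 and n > 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    # Simpson's rule for the area under the t density between 0 and t
    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1 - 2 * area)  # probability in both tails

# Placeholder data: percent with sufficient exercise, life expectancy in years.
exercise = [40, 45, 50, 55, 60]
life_exp = [76, 77, 79, 78, 80]

r = pearson_r(exercise, life_exp)
p = two_sided_p(r, len(exercise))
print(f"r = {r:.3f}, two-sided p = {p:.4f}")
```

A p-value at or below 0.05 would lead to rejecting H0 and concluding there is evidence of a linear correlation. The scatterplot should still be checked for curvature or outliers before relying on r.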
Problem 4: (4 points)
Calculate the line of best fit that could be used to make predictions for the life expectancy (2010) variable (i.e., sufficient exercise (2010) as the "X Variable"). Comment on whether the line of best fit should be used to estimate life expectancy (2010) or whether some other value/estimate would be more appropriate. Explain.
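The decision described here can be sketched in a few lines of Python. The sketch assumes ordinary least squares for the line of best fit and follows the common textbook rule that, when the linear correlation is not significant, the sample mean of the y values is the better estimate. The data are the same kind of invented placeholders, not the actual state data, and `fit_line` and `best_estimate` are hypothetical names.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def best_estimate(x_new, x, y, correlation_is_significant):
    """Use the regression line only if the correlation is significant."""
    if correlation_is_significant:
        slope, intercept = fit_line(x, y)
        return intercept + slope * x_new
    return sum(y) / len(y)  # otherwise fall back to the mean of y

# Placeholder data (assumed for illustration only).
exercise = [40, 45, 50, 55, 60]
life_exp = [76, 77, 79, 78, 80]

slope, intercept = fit_line(exercise, life_exp)
print(f"life_exp = {intercept:.2f} + {slope:.2f} * exercise")
print(best_estimate(52, exercise, life_exp, correlation_is_significant=True))
print(best_estimate(52, exercise, life_exp, correlation_is_significant=False))
```

Whether the flag should be true is decided by the p-value from the correlation test in Problem 3, not by the fitted line itself.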
