
Question 1

1/1 point (graded)

The following code was used in the video to plot RSS with $\beta_0 = 25$.

beta1 = seq(0, 1, len=nrow(galton_heights))
results <- data.frame(beta1 = beta1,
                      rss = sapply(beta1, rss, beta0 = 25))
results %>% ggplot(aes(beta1, rss)) + geom_line() +
  geom_line(aes(beta1, rss), col=2)

In a model for sons' heights vs fathers' heights, what is the least squares estimate (LSE) for $\beta_1$ if we assume $\hat{\beta}_0$ is 36?

Hint: modify the code above to do your analysis.

0.65

0.5

0.2

12

correct

Answer

Correct. You can tell from a plot of RSS vs $\beta_1$ that RSS reaches its minimum at about $\beta_1 = 0.5$.
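The hint can be followed with a sketch like the one below. It is not the course's own solution: the `galton_heights` construction (one randomly chosen son per family, arbitrary seed) and the body of `rss()` are assumptions, since neither is shown on this page. Only the grid search with `beta0 = 36` mirrors the code quoted in the question.

```r
library(dplyr)
library(HistData)
data("GaltonFamilies")

# Assumption: galton_heights holds one randomly chosen son per family.
# The seed is arbitrary and only makes the sample reproducible.
set.seed(1983)
galton_heights <- GaltonFamilies %>%
  filter(gender == "male") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(father, childHeight) %>%
  rename(son = childHeight)

# Assumption: rss() returns the residual sum of squares for a candidate
# intercept beta0 and slope beta1.
rss <- function(beta0, beta1) {
  resid <- galton_heights$son - (beta0 + beta1 * galton_heights$father)
  sum(resid^2)
}

# Same grid as in the question, but with the intercept fixed at 36.
beta1 <- seq(0, 1, len = nrow(galton_heights))
results <- data.frame(beta1 = beta1,
                      rss = sapply(beta1, rss, beta0 = 36))

# Grid value of beta1 with the smallest RSS.
best_beta1 <- results$beta1[which.min(results$rss)]
best_beta1

# Base-graphics version of the RSS curve.
plot(results$beta1, results$rss, type = "l", xlab = "beta1", ylab = "RSS")
```

The grid only has as many points as there are rows in the data, so `best_beta1` is the nearest grid value rather than an exact minimizer.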


Question 2

1/1 point (graded)

The least squares estimates for the parameters $\beta_0, \beta_1, \dots, \beta_n$ ______ the residual sum of squares.

Select an option

maximize

minimize

equal

correct
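For reference, the residual sum of squares can be written out explicitly. The notation below is an assumption, since the page does not define it: $N$ observations $y_i$ with predictors $x_{i,1}, \dots, x_{i,n}$.

```latex
\mathrm{RSS}(\beta_0, \beta_1, \dots, \beta_n)
  = \sum_{i=1}^{N} \Bigl( y_i - \bigl( \beta_0 + \sum_{j=1}^{n} \beta_j x_{i,j} \bigr) \Bigr)^2,
\qquad
(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_n)
  = \operatorname*{arg\,min}_{\beta_0, \dots, \beta_n} \mathrm{RSS}(\beta_0, \beta_1, \dots, \beta_n)
```

By definition, the least squares estimates are the parameter values at which this sum is smallest.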


Question 3

1 point possible (graded)

Load the Lahman library and filter the Teams data frame to the years 1961-2001. Run a linear model in R predicting the number of runs per game based on both the number of bases on balls per game and the number of home runs per game.

What is the coefficient for bases on balls?

0.39

1.56

1.74

0.027
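One possible way to set up this model is sketched below. The column names `R_per_game`, `BB_per_game` and `HR_per_game` are illustrative choices, and "per game" is assumed to mean dividing each season total by the number of games `G`.

```r
library(dplyr)
library(Lahman)

# Restrict to 1961-2001 and convert season totals to per-game rates.
# Assumption: "per game" means dividing by games played (G).
teams_small <- Teams %>%
  filter(yearID %in% 1961:2001) %>%
  mutate(R_per_game = R / G,
         BB_per_game = BB / G,
         HR_per_game = HR / G)

# Runs per game modeled on both predictors at once.
fit <- lm(R_per_game ~ BB_per_game + HR_per_game, data = teams_small)

# The BB_per_game entry is the coefficient for bases on balls.
coef(fit)
```

Because home runs are in the model too, the `BB_per_game` coefficient describes the change in runs per game for one extra walk per game with home runs per game held fixed.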


Question 4

1 point possible (graded)

We run a Monte Carlo simulation where we repeatedly take samples of N = 100 from the Galton heights data and compute the regression slope coefficients for each sample:

B <- 1000
N <- 100
lse <- replicate(B, {
  sample_n(galton_heights, N, replace = TRUE) %>%
    lm(son ~ father, data = .) %>%
    .$coef
})
lse <- data.frame(beta_0 = lse[1,], beta_1 = lse[2,])

What does the central limit theorem tell us about the variables beta_0 and beta_1?

Select ALL that apply.

They are approximately normally distributed.

The expected value of each is the true value of $\beta_0$ and $\beta_1$ (assuming the Galton heights data is a complete population).

The central limit theorem does not apply in this situation.

It allows us to test the hypothesis that $\beta_0 = 0$ and $\beta_1 = 0$.
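The simulated estimates can also be inspected empirically. The sketch below reruns the simulation from the question and compares the results with a fit on the full data. The `galton_heights` construction (one son per family) and both seeds are assumptions, not taken from this page.

```r
library(dplyr)
library(HistData)
data("GaltonFamilies")

# Assumption: galton_heights holds one randomly chosen son per family.
set.seed(1983)
galton_heights <- GaltonFamilies %>%
  filter(gender == "male") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(father, childHeight) %>%
  rename(son = childHeight)

# Monte Carlo simulation as in the question.
set.seed(1)  # arbitrary seed so the run is reproducible
B <- 1000
N <- 100
lse <- replicate(B, {
  sample_n(galton_heights, N, replace = TRUE) %>%
    lm(son ~ father, data = .) %>%
    .$coef
})
lse <- data.frame(beta_0 = lse[1, ], beta_1 = lse[2, ])

# Treat the full data as the population and compare its coefficients
# with the averages of the simulated estimates.
full_fit <- lm(son ~ father, data = galton_heights)
coef(full_fit)
colMeans(lse)

# A normal Q-Q plot shows how close the slope estimates are to normal.
qqnorm(lse$beta_1)
qqline(lse$beta_1)
```

Points lying close to the line in the Q-Q plot would be consistent with an approximately normal sampling distribution.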


Question 5

1/1 point (graded)

Which R code(s) below would properly plot the predictions and confidence intervals for our linear model of sons' heights?

NOTE: The function as.tibble() has been replaced by as_tibble() in a recent dplyr update.

Select ALL that apply.

galton_heights %>% ggplot(aes(father, son)) +
  geom_point() +
  geom_smooth()

galton_heights %>% ggplot(aes(father, son)) +
  geom_point() +
  geom_smooth(method = "lm")

model <- lm(son ~ father, data = galton_heights)
predictions <- predict(model, interval = c("confidence"), level = 0.95)
data <- as.tibble(predictions) %>% bind_cols(father = galton_heights$father)
ggplot(data, aes(x = father, y = fit)) +
  geom_line(color = "blue", size = 1) +
  geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=0.2) +
  geom_point(data = galton_heights, aes(x = father, y = son))

model <- lm(son ~ father, data = galton_heights)
predictions <- predict(model)
data <- as.tibble(predictions) %>% bind_cols(father = galton_heights$father)
ggplot(data, aes(x = father, y = fit)) +
  geom_line(color = "blue", size = 1) +
  geom_point(data = galton_heights, aes(x = father, y = son))

correct

Answer

Correct:

Correct. This is one way to plot predictions and confidence intervals for a linear model of sons' heights vs. fathers' heights. This is one of two correct answers.

Correct. This code uses the predict command to generate predictions and 95% confidence intervals for the linear model of sons' heights vs. fathers' heights. This is one of two correct answers.
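The difference between the two `predict()` calls in the options shows up in what they return. The sketch below is only an illustration: `sons_heights` is a hypothetical stand-in that uses all sons in `GaltonFamilies`, not the course's `galton_heights` sample.

```r
library(dplyr)
library(HistData)
data("GaltonFamilies")

# Assumption: all sons are used here for brevity.
sons_heights <- GaltonFamilies %>%
  filter(gender == "male") %>%
  select(father, son = childHeight)

model <- lm(son ~ father, data = sons_heights)

# Without an interval argument, predict() returns a plain numeric vector.
plain <- predict(model)
str(plain)

# With interval = "confidence" it returns a matrix with columns
# fit, lwr and upr, which geom_line() and geom_ribbon() can use.
with_ci <- predict(model, interval = "confidence", level = 0.95)
colnames(with_ci)
head(with_ci, 3)
```

Only the second form has `fit`, `lwr` and `upr` columns, so code that calls `predict(model)` alone has no interval columns to draw a ribbon from.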


In Questions 7 and 8, you'll look again at female heights from GaltonFamilies.

Define female_heights, a set of mother and daughter heights sampled from GaltonFamilies, as follows:

library(tidyverse) # provides %>% and the dplyr verbs used below
set.seed(1989) # if you are using R 3.5 or earlier
set.seed(1989, sample.kind="Rounding") # if you are using R 3.6 or later
library(HistData)
data("GaltonFamilies")
options(digits = 3) # report 3 significant digits

female_heights <- GaltonFamilies %>%
  filter(gender == "female") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(mother, childHeight) %>%
  rename(daughter = childHeight)

Question 7

0.0/2.0 points (graded)

Fit a linear regression model predicting the mothers' heights using daughters' heights.

What is the slope of the model?

What is the intercept of the model?


Question 8

0.0/2.0 points (graded)

Predict mothers' heights using the model.

What is the predicted height of the first mother in the dataset?


What is the actual height of the first mother in the dataset?
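Questions 7 and 8 can be approached together with a sketch like the following. It repeats the `female_heights` definition from the setup (using the R 3.6+ seed call), fits the model with mothers' heights as the outcome, and compares the first prediction with the observed value. The object names `fit` and `predicted` are illustrative.

```r
library(dplyr)
library(HistData)
data("GaltonFamilies")

# Same definition as in the setup, for R 3.6 or later.
set.seed(1989, sample.kind = "Rounding")
female_heights <- GaltonFamilies %>%
  filter(gender == "female") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(mother, childHeight) %>%
  rename(daughter = childHeight)

# Mothers' heights are the outcome, daughters' heights the predictor.
fit <- lm(mother ~ daughter, data = female_heights)
coef(fit)  # intercept first, then the slope for daughter

# Fitted values for the rows used in the fit.
predicted <- predict(fit)
predicted[1]               # predicted height of the first mother
female_heights$mother[1]   # actual height of the first mother
```

Note the direction of the formula: `mother ~ daughter` predicts mothers from daughters, which is the reverse of the usual parent-to-child setup.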


We have shown how BB and singles have similar predictive power for scoring runs. Another way to compare the usefulness of these baseball metrics is by assessing how stable they are across the years. Because we have to pick players based on their previous performances, we will prefer metrics that are more stable. In these exercises, we will compare the stability of singles and BBs.

Before we get started, we want to generate two tables: one for 2002 and another for the average of the 1999-2001 seasons. We want to define per-plate-appearance statistics, keeping only players with at least 100 plate appearances. Here is how we create the 2002 table:

library(tidyverse) # provides %>% and the dplyr verbs used below
library(Lahman)
bat_02 <- Batting %>%
  filter(yearID == 2002) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
  filter(pa >= 100) %>%
  select(playerID, singles, bb)

Question 9

0.0/2.0 points (graded)

Now compute a similar table but with rates computed over 1999-2001. Keep only rows from 1999-2001 where players have 100 or more plate appearances, calculate each player's single rate and BB rate per season, then calculate the average single rate (mean_singles) and average BB rate (mean_bb) per player over those three seasons.

How many players had a single rate mean_singles of greater than 0.2 per plate appearance over 1999-2001?

How many players had a BB rate mean_bb of greater than 0.2 per plate appearance over 1999-2001?
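A sketch of the 1999-2001 table, following the same definitions as `bat_02`, might look like this. The table name `bat_99_01` is an assumption. Each row of `Batting` is treated as one player-season, as in the 2002 code, so separate stints within a season are not combined.

```r
library(dplyr)
library(Lahman)

# Per-season rates for 1999-2001, keeping rows with at least
# 100 plate appearances, then averaged per player.
bat_99_01 <- Batting %>%
  filter(yearID %in% 1999:2001) %>%
  mutate(pa = AB + BB,
         singles = (H - X2B - X3B - HR) / pa,
         bb = BB / pa) %>%
  filter(pa >= 100) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles),
            mean_bb = mean(bb))

# Number of players above 0.2 for each rate.
sum(bat_99_01$mean_singles > 0.2)
sum(bat_99_01$mean_bb > 0.2)
```

Filtering on `pa >= 100` before grouping means a player's average uses only the seasons in which the threshold was met.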


Question 10

0.0/2.0 points (graded)

Use inner_join() to combine the bat_02 table with the table of 1999-2001 rate averages you created in the previous question.

What is the correlation between 2002 singles rates and 1999-2001 average singles rates?


What is the correlation between 2002 BB rates and 1999-2001 average BB rates?
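The join and the two correlations can be sketched as follows. Both tables are rebuilt here so that the block runs on its own. `bat_99_01` and `bat_joined` are assumed names, and joining on `playerID` is an assumption based on the columns kept in `bat_02`.

```r
library(dplyr)
library(Lahman)

# 2002 table, as defined in the text.
bat_02 <- Batting %>%
  filter(yearID == 2002) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100) %>%
  select(playerID, singles, bb)

# 1999-2001 averages per player (assumed name: bat_99_01).
bat_99_01 <- Batting %>%
  filter(yearID %in% 1999:2001) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))

# Keep only players that appear in both tables.
bat_joined <- inner_join(bat_02, bat_99_01, by = "playerID")

# Correlation between the 2002 rate and the 1999-2001 average.
cor_singles <- cor(bat_joined$singles, bat_joined$mean_singles)
cor_bb <- cor(bat_joined$bb, bat_joined$mean_bb)
cor_singles
cor_bb
```

The metric with the higher of the two correlations is the one that carries over more consistently from the earlier seasons to 2002.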


Question 11

0.0/1.0 point (graded)

Make scatterplots of mean_singles versus singles and mean_bb versus bb.

Are either of these distributions bivariate normal?

Neither distribution is bivariate normal.

singles and mean_singles are bivariate normal, but bb and mean_bb are not.

bb and mean_bb are bivariate normal, but singles and mean_singles are not.

Both distributions are bivariate normal.


Question 12

0.0/2.0 points (graded)

Fit a linear model to predict 2002 singles given 1999-2001 mean_singles.

What is the coefficient of mean_singles, the slope of the fit?

Fit a linear model to predict 2002 bb given 1999-2001 mean_bb.

What is the coefficient of mean_bb, the slope of the fit?
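The two fits can be sketched as below. The block rebuilds the joined table in compact form so that it runs on its own. The names `rates`, `bat_99_01` and `bat_joined` are assumptions. As a sanity check, it also recomputes the singles slope from the identity slope = correlation × sd(y) / sd(x), which holds for any simple linear regression.

```r
library(dplyr)
library(Lahman)

# Per-season rates for 1999-2002, rows with at least 100 plate appearances.
rates <- Batting %>%
  filter(yearID %in% 1999:2002) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100)

bat_02 <- rates %>%
  filter(yearID == 2002) %>%
  select(playerID, singles, bb)

bat_99_01 <- rates %>%
  filter(yearID %in% 1999:2001) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))

bat_joined <- inner_join(bat_02, bat_99_01, by = "playerID")

# 2002 rate predicted from the 1999-2001 average.
fit_singles <- lm(singles ~ mean_singles, data = bat_joined)
fit_bb <- lm(bb ~ mean_bb, data = bat_joined)

slope_singles <- coef(fit_singles)[["mean_singles"]]
slope_bb <- coef(fit_bb)[["mean_bb"]]
slope_singles
slope_bb

# Check: the regression slope equals cor(x, y) * sd(y) / sd(x).
slope_check <- with(bat_joined,
                    cor(singles, mean_singles) * sd(singles) / sd(mean_singles))
all.equal(slope_singles, slope_check)
```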
