Question: (Simulating Wald and Likelihood Ratio Tests) In this section we will investigate the distributions of hypothesis tests for logistic regression. For this exercise, we will use the following predictors.
sample_size = 150
set.seed(420)
x1 = rnorm(n = sample_size)
x2 = rnorm(n = sample_size)
x3 = rnorm(n = sample_size)
Recall that
$$p(x) = P[Y = 1 \mid X = x]$$
Consider the true model
$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1$$
where
- $\beta_0 = 0.4$
- $\beta_1 = 0.35$
(a) To investigate the distributions, simulate from this model 2500 times. To do so, calculate $P[Y = 1 \mid X = x]$ for an observation, and then make a random draw from a Bernoulli distribution with that success probability. (Note that a Bernoulli distribution is a Binomial distribution with parameter $n = 1$. There is no direct function in R for a Bernoulli distribution.)
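One way to make the Bernoulli draws is `rbinom()` with `size = 1`. The following is a minimal sketch, assuming the coefficient values exactly as printed in the question; the names `beta_0`, `beta_1`, `eta`, `p`, and `y` are illustrative and not part of the assignment.

```r
# Sketch: one Bernoulli draw per observation via rbinom() with size = 1.
set.seed(420)
sample_size = 150
x1 = rnorm(n = sample_size)

# Coefficients as printed in the question (assumed values)
beta_0 = 0.4
beta_1 = 0.35

eta = beta_0 + beta_1 * x1   # linear predictor (log-odds)
p   = 1 / (1 + exp(-eta))    # P[Y = 1 | X = x]; plogis(eta) gives the same values
y   = rbinom(n = sample_size, size = 1, prob = p)   # vector of 0/1 responses
```

Because `prob` is a vector, `rbinom()` draws each response with its own success probability.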
Each time, fit the model:
$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
Store the test statistics for two tests:
- The Wald test for $H_0: \beta_2 = 0$, which we say follows a standard normal distribution for "large" samples
- The likelihood ratio test for $H_0: \beta_2 = \beta_3 = 0$, which we say follows a $\chi^2$ distribution (with some degrees of freedom) for "large" samples
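The simulation in part (a) could be sketched as the loop below. This is one possible approach, not a reference solution: the coefficient values are taken as printed in the question, the object names (`wald_stat`, `lrt_stat`, `fit_full`, `fit_null`) are illustrative, and the likelihood ratio statistic is computed as the difference in deviance between the nested fits.

```r
set.seed(420)
sample_size = 150
x1 = rnorm(n = sample_size)
x2 = rnorm(n = sample_size)
x3 = rnorm(n = sample_size)

# True model uses x1 only; coefficients as printed in the question
beta_0 = 0.4
beta_1 = 0.35
p = plogis(beta_0 + beta_1 * x1)   # P[Y = 1 | X = x] for each observation

num_sims  = 2500
wald_stat = rep(NA_real_, num_sims)
lrt_stat  = rep(NA_real_, num_sims)

for (i in seq_len(num_sims)) {
  # New response vector on each replicate; predictors stay fixed
  y = rbinom(n = sample_size, size = 1, prob = p)

  fit_full = glm(y ~ x1 + x2 + x3, family = binomial)
  fit_null = glm(y ~ x1, family = binomial)   # model under H0: beta_2 = beta_3 = 0

  # Wald z statistic for H0: beta_2 = 0
  wald_stat[i] = coef(summary(fit_full))["x2", "z value"]

  # Likelihood ratio statistic: drop in deviance from null to full model
  lrt_stat[i] = deviance(fit_null) - deviance(fit_full)
}
```

The same likelihood ratio statistic can also be read from `anova(fit_null, fit_full, test = "LRT")`; the deviance difference is used here only because it is shorter inside a loop.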
(b) Plot a histogram of the empirical values for the Wald test statistic. Overlay the density of the true distribution assuming a large sample.
(c) Use the empirical results for the Wald test statistic to estimate the probability of observing a test statistic larger than 1. Also report this probability using the true distribution of the test statistic assuming a large sample.
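Parts (b) and (c) might be approached as in the sketch below. So that the block runs on its own, `wald_stat` is filled with placeholder draws from a standard normal; this is an assumption for illustration only, and in practice it would be the vector of 2500 Wald statistics stored in part (a).

```r
# Placeholder so this block runs on its own: replace with the stored
# Wald statistics from the simulation in part (a).
set.seed(1)
wald_stat = rnorm(2500)

# (b) Histogram on the density scale, with the N(0, 1) density overlaid
hist(wald_stat, freq = FALSE, breaks = 30,
     main = "Empirical Wald statistics", xlab = "z")
curve(dnorm(x), add = TRUE, col = "darkorange", lwd = 2)

# (c) Probability of a statistic larger than 1
p_empirical = mean(wald_stat > 1)            # proportion of simulated values above 1
p_theory    = pnorm(1, lower.tail = FALSE)   # P(Z > 1), about 0.1587
c(empirical = p_empirical, theory = p_theory)
```

Setting `freq = FALSE` matters here: it puts the histogram on the density scale so the overlaid curve is comparable.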
(d) Plot a histogram of the empirical values for the likelihood ratio test statistic. Overlay the density of the true distribution assuming a large sample.
(e) Use the empirical results for the likelihood ratio test statistic to estimate the probability of observing a test statistic larger than 5. Also report this probability using the true distribution of the test statistic assuming a large sample.
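For parts (d) and (e), the null hypothesis sets two coefficients to zero, so the large-sample reference distribution is $\chi^2$ with 2 degrees of freedom. The sketch below uses placeholder draws for `lrt_stat` so that it runs on its own; that is an assumption for illustration, and the real input would be the likelihood ratio statistics stored in part (a).

```r
# Placeholder so this block runs on its own: replace with the stored
# likelihood ratio statistics from the simulation in part (a).
set.seed(1)
lrt_stat = rchisq(2500, df = 2)

# (d) Histogram on the density scale, with the chi-square(2) density overlaid
hist(lrt_stat, freq = FALSE, breaks = 30,
     main = "Empirical likelihood ratio statistics", xlab = "LRT statistic")
curve(dchisq(x, df = 2), add = TRUE, col = "darkorange", lwd = 2)

# (e) Probability of a statistic larger than 5
p_empirical = mean(lrt_stat > 5)                        # proportion of simulated values above 5
p_theory    = pchisq(5, df = 2, lower.tail = FALSE)     # equals exp(-2.5), about 0.0821
c(empirical = p_empirical, theory = p_theory)
```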
(f) Repeat (a)-(e) but with simulation using a smaller sample size of 10. Based on these results, is this sample size large enough to use the standard normal and $\chi^2$ distributions in this situation? Explain.
sample_size = 10
set.seed(420)
x1 = rnorm(n = sample_size)
x2 = rnorm(n = sample_size)
x3 = rnorm(n = sample_size)
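For part (f), one option is to wrap the whole simulation in a function of the sample size. The sketch below is a hedged illustration: `sim_tests()` and `one_rep()` are hypothetical helper names, the coefficients are taken as printed in the question, and `glm()` warnings are silenced rather than resolved (with only 10 observations, fits can fail to converge or produce fitted probabilities of 0 or 1). A replicate where the `x2` coefficient cannot be estimated is recorded as `NA`.

```r
sim_tests = function(sample_size, num_sims = 2500,
                     beta_0 = 0.4, beta_1 = 0.35) {
  set.seed(420)
  x1 = rnorm(n = sample_size)
  x2 = rnorm(n = sample_size)
  x3 = rnorm(n = sample_size)
  p  = plogis(beta_0 + beta_1 * x1)

  one_rep = function() {
    y = rbinom(n = sample_size, size = 1, prob = p)
    # Warnings are suppressed, not fixed: small samples often separate
    fit_full = suppressWarnings(glm(y ~ x1 + x2 + x3, family = binomial))
    fit_null = suppressWarnings(glm(y ~ x1, family = binomial))
    z = coef(summary(fit_full))
    wald = if ("x2" %in% rownames(z)) unname(z["x2", "z value"]) else NA_real_
    c(wald = wald, lrt = deviance(fit_null) - deviance(fit_full))
  }

  # replicate() returns a 2 x num_sims matrix; transpose to one row per replicate
  as.data.frame(t(replicate(num_sims, one_rep())))
}

small = sim_tests(sample_size = 10)

# Histograms with the large-sample densities overlaid
par(mfrow = c(1, 2))
hist(small$wald, freq = FALSE, breaks = 30, main = "Wald, n = 10", xlab = "z")
curve(dnorm(x), add = TRUE, col = "darkorange", lwd = 2)
hist(small$lrt, freq = FALSE, breaks = 30, main = "LRT, n = 10", xlab = "LRT statistic")
curve(dchisq(x, df = 2), add = TRUE, col = "darkorange", lwd = 2)

# Empirical versus large-sample tail probabilities
c(empirical = mean(small$wald > 1, na.rm = TRUE), theory = pnorm(1, lower.tail = FALSE))
c(empirical = mean(small$lrt > 5, na.rm = TRUE), theory = pchisq(5, df = 2, lower.tail = FALSE))
```

Calling `sim_tests(sample_size = 150)` reproduces the larger-sample setup with the same code, which makes the two cases easy to compare side by side when judging whether the large-sample distributions are adequate.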
Looking for R coding help.
