Question: We considered experimental data to evaluate the effect of various factors on individuals' perceived health. We fit different regression models using the following variables:

Health status (1, 2, 3, 4, 5), age, exercise (1 if they do any exercise, 0 otherwise), smoke100 (1 if they have smoked more than 100 cigarettes, 0 otherwise), and gender (m = male, f = female).

> fit <- glm(Healthstatus ~ smoke100 + gender + age, family = poisson(link = log), data = cdc)
> summary(fit)

Call:
glm(formula = Healthstatus ~ smoke100 + gender + age, family = poisson(link = log), data = cdc)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.86501  -0.37191   0.06075   0.40344   1.13194

Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  1.4711043   0.0112173  131.146   < 2e-16 ***
smoke100    -0.0619199   0.0075249   -8.229   < 2e-16 ***
genderm      0.0199752   0.0074575    2.679   0.00739 **
age         -0.0035609   0.0002203  -16.162   < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 6648.1  on 19999  degrees of freedom
Residual deviance: 6273.7  on 19996  degrees of freedom
AIC: 68870

Using the R output above, answer the following questions:

a. In one sentence, explain why we chose to use Poisson regression instead of logistic regression. [2 Marks]

b. State the prediction equation and interpret the coefficient for smoke100. [2 Marks]

c. Find the 99% Wald confidence interval for the gender coefficient. [2 Marks]

d. Using this model with three main effects, predict the health status for a 52-year-old female who has smoked more than 100 cigarettes, and interpret the result. [3 Marks]

e. Your project manager does not wish to use Wald tests to decide which variable to remove, and prefers to use difference in deviance (likelihood ratio tests). She prepares the following summary after fitting all three two-variable models in addition to your three-variable model:

Model                Deviance   Difference compared to full model   P-value
Full model           6273.7
smoke100 + age       6380.9     17.2                                0.102
smoke100 + gender    6537.6     63.9                                0.012
age + gender         6341.6     8.9                                 0.260

Simply by considering the description of the 3 variables, explain whether you suspect multicollinearity to be an issue for this problem. [2 Marks]

By examining the difference in deviance between each of the subset models and the full model, explain which two-variable model(s) you would keep for consideration and which two-variable model(s) you would discard. [2 Marks]
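
The arithmetic behind parts b to d can be sketched in R using only the estimates and standard errors printed in the summary output. This is a minimal sketch, not a model answer: the vector names b, se and x are illustrative, the values are typed in by hand rather than taken from a fitted object, and female is assumed to be the reference level (genderm = 0), as the coefficient name genderm suggests.

```r
# Estimates and standard errors copied by hand from the summary(fit) output.
b  <- c(intercept = 1.4711043, smoke100 = -0.0619199,
        genderm = 0.0199752, age = -0.0035609)
se <- c(intercept = 0.0112173, smoke100 = 0.0075249,
        genderm = 0.0074575, age = 0.0002203)

# (b) With a log link, log(mu) = b0 + b1*smoke100 + b2*genderm + b3*age,
# so exp(b1) is the multiplicative change in the expected response
# for smoke100 = 1 versus smoke100 = 0, holding gender and age fixed.
rate_ratio_smoke <- unname(exp(b["smoke100"]))

# (c) 99% Wald interval on the log scale: estimate +/- z_{0.995} * SE.
z <- qnorm(0.995)
ci_gender <- unname(b["genderm"] + c(-1, 1) * z * se["genderm"])

# (d) Assumed covariate pattern: female (genderm = 0), age 52, smoke100 = 1.
x <- c(intercept = 1, smoke100 = 1, genderm = 0, age = 52)
pred <- exp(sum(b * x))

print(rate_ratio_smoke)
print(ci_gender)
print(pred)
```

With the fitted object available, coef(fit) and predict(fit, newdata, type = "response") would give the same quantities without retyping the numbers.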
