# A five-year follow-up study was carried out in a certain metropolitan area to assess the relationship of

## Question:

T = time (in months) until stomach cancer (SCA) was detected or time (in months) until either the subject was lost to follow-up or the study ended (often called the censoring time);

ST = event indicator status (1 if SCA detected, 0 if SCA not detected);

WTGP = weight group (1 = low, 2 = medlow, 3 = medhigh, 4 = high), with "low" the referent group;

DT = diet type (1 = high fiber diet, 2 = medium fiber diet, 3 = low fiber diet), with "high fiber diet" the referent group;

GEN = gender (0 = male, 1 = female);

AGEGP = age group (1 = 40-54 years, 2 = 55-69 years, 3 = 70+ years), with 40-54 years being the referent group.

Suppose that one considers doing a Poisson regression analysis to assess the effects of diet type and weight on the development of stomach cancer (SCA), controlling for age and gender. To carry out such an analysis, organize the data as follows:

Step 1 Form combinations of categories over all four predictors (WTGP, DT, GEN, AGEGP) being considered; these category combinations will define the subgroups to be analyzed using Poisson regression. Since there are four categories of WTGP, three categories of DT, two categories of GEN, and three categories of AGEGP, the total number of subgroups will be (4 × 3 × 2 × 3) = 72.

Step 2 For the 72 subgroups, count the number of persons who develop SCA in each subgroup, and denote this count variable as Y. Also, sum up the person-time information over all the persons in each subgroup, and call this variable PT.

Step 3 Use the 72 Y values as the counts and the 72 PT values as the person-time information to fit Poisson regression models to these data.
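The data organization in Steps 1 and 2 can be sketched in code. The following is a minimal illustration, not the study's actual data handling: the subject-level records are randomly generated placeholders, and the helper names `make_fake_subjects` and `aggregate` are hypothetical. Only the variable names (T, ST, WTGP, DT, GEN, AGEGP, Y, PT) and category codes come from the problem statement.

```python
import random
from itertools import product

# Category codes taken from the variable definitions in the problem.
WTGP_LEVELS = (1, 2, 3, 4)
DT_LEVELS = (1, 2, 3)
GEN_LEVELS = (0, 1)
AGEGP_LEVELS = (1, 2, 3)


def make_fake_subjects(n, seed=0):
    """Generate hypothetical subject-level records (NOT the study data)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        subjects.append({
            "WTGP": rng.choice(WTGP_LEVELS),
            "DT": rng.choice(DT_LEVELS),
            "GEN": rng.choice(GEN_LEVELS),
            "AGEGP": rng.choice(AGEGP_LEVELS),
            "T": rng.uniform(1.0, 60.0),           # months of follow-up (assumed range)
            "ST": 1 if rng.random() < 0.1 else 0,  # 1 = SCA detected (assumed 10% chance)
        })
    return subjects


def aggregate(subjects):
    """Steps 1-2: one entry per category combination, holding Y and PT."""
    # Step 1: enumerate every combination so that empty subgroups still appear.
    cells = {
        key: {"Y": 0, "PT": 0.0}
        for key in product(WTGP_LEVELS, DT_LEVELS, GEN_LEVELS, AGEGP_LEVELS)
    }
    # Step 2: count SCA cases (Y) and sum person-time (PT) within each subgroup.
    for s in subjects:
        key = (s["WTGP"], s["DT"], s["GEN"], s["AGEGP"])
        cells[key]["Y"] += s["ST"]
        cells[key]["PT"] += s["T"]
    return cells


subjects = make_fake_subjects(2000)
cells = aggregate(subjects)
print(len(cells))  # number of subgroups: 4 * 3 * 2 * 3 = 72
```

Each entry of `cells` then corresponds to one row of the aggregated data set used in Step 3. In real data, a subgroup with PT = 0 would need separate handling, since the log of person-time is undefined there.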

a. Based on the data organization just described, what is the "sample size" to be used for fitting a Poisson regression model to these data?

b. State a Poisson regression model (called model 1) that would model the natural log of the rate of development of stomach cancer as a linear function of the risk factors DT and WTGP, controlling for potential confounding and effect modification by the variables GEN and AGEGP. Consider only two-factor product terms involving exposure variables and control variables.

c. How would one modify the model in part (b) so that both the WTGP variable and the DT variable are treated as ordinal variables on a natural logarithmic scale? (In stating this modified model, called model 2, make sure to explicitly define the "transformed" WTGP and DT variables that would need to be used.)

d. Provide the model statement, including required options, that one would use with SAS's PROC GENMOD (or a program from a different computer package) to fit model 2, described in part (c) above.

e. Based on model 2 defined in part (c), give a formula for the rate ratio that compares a subject who has a low fiber diet and is in the high weight group to a subject who has a high fiber diet and is in the low weight group, controlling for GEN and AGEGP. (Assume nonzero interaction effects.)

f. Provide an expression for a 95% confidence interval for the rate ratio that compares a subject who has a low fiber diet and is in the high weight group to a subject who has a high fiber diet and is in the low weight group, controlling for GEN and AGEGP. (Assume nonzero interaction effects.)

g. Based on model 2, describe how one would carry out an overall test for significant interaction involving deviance statistics. (Make sure to state the null hypothesis, the test statistic, and the d.f. for the test statistic under the null hypothesis.)

h. Is the test described in part (g) for model 2 equivalent to carrying out a goodness-of-fit test for a no-interaction version of model 2 that does not contain any product terms? Explain briefly.

i. If model 1 is considered instead of model 2, is an overall test for significant interaction equivalent to carrying out a goodness-of-fit test for a no-interaction version of model 1 that does not contain any product terms? Explain briefly.

Suppose that the following Poisson ANOVA table resulted from fitting several different Poisson regression models to these data.

*(Ordinal) DT is represented by a single ordinal variable,

(Ordinal) WTGP is represented by a single ordinal variable,

(Nominal) DT is represented by 2 dummy variables,

(Nominal) WTGP is represented by 3 dummy variables.

j. Assuming no interaction of any kind between the risk factors (DT and WTGP) and either AGEGP and/or GEN, use the deviance values (e.g., a, b, c) in the above table to give an expression for the LR statistic that tests whether there is a significant difference between the (joint) effects of the nominal exposure variables, that is, (nominal) DT and (nominal) WTGP, controlling for AGEGP and GEN.

k. What are the degrees of freedom for the LR statistic described in part (j)?

l. Use the deviance scores in the above table to give an expression for the LR statistic for testing whether there is at least one significant interaction effect in model 1 (as defined in part (b)); that is, describe a chunk test for the interaction terms in model 1.

m. What are the degrees of freedom for the LR statistic described in part (l)?

n. Describe how one might carry out a (single) test of hypothesis to determine whether model 1 (as defined in part (b)) or model 2 (as defined in part (c)) fits the data better. In answering this question, state the null hypothesis, the test statistic, and its d.f. under the null hypothesis.

o. Assuming that a Poisson model is appropriate for these data, how could one criticize the significance testing method described in part (n) for comparing model 1 with model 2?

p. In what other way, using deviance statistic information (other than the test of hypothesis described in part (n)), can one evaluate whether model 1 or model 2 is better?
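Several of the parts above rest on two standard pieces of notation: the log-rate form of a Poisson regression model fitted to the 72 subgroups, and the likelihood ratio comparison of deviances for nested models. As a general reminder only, and not a solution to any particular part, these can be written as below. The symbols $X_{ij}$, $p$, $D_R$, $D_F$, $p_R$, and $p_F$ are generic placeholders introduced here, not notation from the problem.

```latex
\[
\ln\!\left(\frac{E(Y_i)}{PT_i}\right) = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij},
\qquad i = 1, \dots, 72,
\]
\[
LR = D_R - D_F \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,p_F - p_R}
\quad \text{under } H_0 ,
\]
```

Here $D_R$ and $D_F$ are the deviances of a reduced model and of a full model that contains it, $p_R$ and $p_F$ are their numbers of estimated parameters, and $H_0$ states that every coefficient present in the full model but absent from the reduced model equals zero.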


**Related Book For**

## Applied Regression Analysis and Other Multivariable Methods

**ISBN:** 978-1285051086

5th edition

**Authors:** David G. Kleinbaum, Lawrence L. Kupper, Azhar Nizam, Eli S. Rosenberg