Questions and Answers of Categorical Data Analysis

For Problem 3.14, obtain a 95% confidence interval for the odds ratio using (a) the Woolf (i.e., Wald) interval, (b) Cornfield’s “exact” approach, (c) the profile likelihood. In each case, note
For multinomial probabilities π = (π1, π2, ...) with a contingency table of arbitrary dimensions, suppose that a measure g(π) = ν/δ. Show that the asymptotic variance of √n [g(π̂)
Let yi, i = 1, ..., n, denote n independent binary random variables. a. Derive the log likelihood for the probit model Φ–1[π(xi)] = ∑j βj
When J = 3, suppose that πj(x) = exp(αj + βjx)/[1 + exp(α1 + β1x) + exp(α2 + β2x)], j = 1, 2. Show that π3(x) is (a) decreasing in x if β1 > 0 and β2 > 0.
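A minimal Python sketch of part (a), assuming only the usual Woolf form log θ̂ ± z·SE with SE = √(1/n11 + 1/n12 + 1/n21 + 1/n22). The counts are hypothetical placeholders, not the data of Problem 3.14, and the Cornfield and profile likelihood intervals are not attempted here.

```python
import math
from statistics import NormalDist

def woolf_interval(n11, n12, n21, n22, level=0.95):
    """Woolf (Wald) interval for the odds ratio of a 2 x 2 table with all counts > 0."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    log_or = math.log((n11 * n22) / (n12 * n21))     # log of the sample odds ratio
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    # Build the interval on the log scale, then exponentiate both endpoints.
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: sample odds ratio = (20 * 20) / (10 * 10) = 4.
lo, hi = woolf_interval(20, 10, 10, 20)
print(round(lo, 2), round(hi, 2))
```

Because the interval is symmetric on the log scale, the geometric mean of its endpoints equals the sample odds ratio.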
Suppose that model (7.24) holds for a 2 × J table with J > 2, and let x2 – x1 = 1. Explain why local log odds ratios are typically smaller in absolute value than the cumulative log odds
For the cumulative probit model Φ–1[P(Y ≤ j)] = αj – β’x, explain why a 1-unit increase in xi corresponds to a βi standard deviation increase in the expected underlying latent response,
A cumulative link model for an I × J contingency table with a qualitative predictor is G–1[P(Y ≤ j)] = αj + µi, i = 1, ..., I, j = 1, ..., J – 1. a. Show that the residual df = (I – 1)(J
Refer to the loglinear models for Table 8.8. a. Explain why the fitted odds ratios in Table 8.10 for model (GI, GL, GS, IL, IS, LS) suggest that the most likely accident case for injury is females not
For the loglinear model for an I × J table, log µij = λ + λiX, show that µ̂ij = ni+/J and residual df = I(J – 1).
Consider the loglinear model with symbol (XZ, YZ). a. For fixed k, show that {µ̂ijk} equal the fitted values for testing independence between X and Y within level k of Z. b. Show that the Pearson and
Consider loglinear model (X, Y, Z) for a 2 × 2 × 2 table. a. Express the model in the form log µ = Xβ. b. Show that the likelihood equations X’n = X’µ̂ equate {nijk} and {µ̂ijk} in the
Given target row totals {ri > 0} and column totals {cj > 0}: a. Explain how to use IPF to adjust sample proportions {pij} to have these totals but maintain the sample odds ratios. b. Show how to
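One way to carry out part (a) is ordinary raking, i.e., IPF on the two margins. Each step multiplies a whole row or a whole column by a constant, and those constants cancel in every cross-product ratio, so the odds ratios are preserved. The sketch below assumes the target row and column totals share the same grand total; the function name `rake` and the example table are hypothetical.

```python
def rake(p, row_targets, col_targets, tol=1e-12, max_iter=1000):
    """IPF for a two-way table: alternately rescale rows and columns to target totals."""
    fit = [row[:] for row in p]
    I, J = len(fit), len(fit[0])
    for _ in range(max_iter):
        # Row step: multiply row i by row_targets[i] / (current row total).
        for i in range(I):
            s = sum(fit[i])
            fit[i] = [x * row_targets[i] / s for x in fit[i]]
        # Column step: multiply column j by col_targets[j] / (current column total).
        for j in range(J):
            s = sum(fit[i][j] for i in range(I))
            for i in range(I):
                fit[i][j] *= col_targets[j] / s
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(fit[i]) - row_targets[i]) < tol for i in range(I)):
            break
    return fit

p = [[0.2, 0.1], [0.3, 0.4]]          # hypothetical sample proportions
fit = rake(p, [0.5, 0.5], [0.5, 0.5])  # hypothetical target margins
print(fit)
```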
Consider loglinear model (WX, XY, YZ). Explain why W and Z are independent given X alone or given Y alone or given both X and Y. When are W and Y conditionally independent? When are X and Z
Consider the log likelihood for the linear-by-linear association model. a. Differentiating with respect to β and evaluating at β = 0 and null estimates of parameters, show that
Table 10.17 shows subjects’ purchase choice of instant decaffeinated coffee at two times. a. Fit the symmetry model and use residuals to analyze changes. b. Test marginal homogeneity. Show
Calculate kappa for a 4 × 4 table having nii = 5 all i, ni, i+1 = 15, i = 1, 2, 3, n41 = 15, and nij = 0 otherwise. Explain why strong association does not imply strong agreement.
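A direct computation makes the point of this problem concrete. The sketch below uses the usual Cohen’s kappa, κ = (po − pe)/(1 − pe), with po the diagonal proportion and pe = ∑ pi+ p+i; the function name is illustrative.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square contingency table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row = [sum(table[i]) / n for i in range(k)]               # p_{i+}
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # p_{+j}
    p_exp = sum(row[i] * col[i] for i in range(k))            # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# n_ii = 5 for all i, n_{i,i+1} = 15 for i = 1, 2, 3, n_41 = 15, zeros elsewhere.
table = [
    [5, 15, 0, 0],
    [0, 5, 15, 0],
    [0, 0, 5, 15],
    [15, 0, 0, 5],
]
print(cohen_kappa(table))
```

Every row concentrates three quarters of its mass in one column, so the association is strong, yet the diagonal holds only 20 of 80 observations, exactly what the uniform margins predict by chance; hence κ = 0.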
Refer to Table 8.19. The two-way table relating responses for the environment (as rows) and cities (as columns) has cell counts, by row, (108, 179, 157 / 21, 55, 52 / 5, 6, 24). Analyze these data. Table
Identify loglinear models that correspond to the logit models, for a < b, log(πab/πba) = (a) 0, (b) τ, (c) αa – αb, and (d) β(b – a).
Consider the multiplicative model for a square table, a. Show that the model satisfies (i) symmetry, (ii) marginal homogeneity, (iii) quasi-symmetry, (iv) quasi-independence. b. Show that β
A model for agreement on an ordinal response partitions beyond-chance agreement into that due to a baseline association and a main-diagonal increment. For ordered scores {ua}, the model is [formula not shown]. a. Show
Refer to the Bradley-Terry model. a. Show that log(Πac/Πca) = log(Πab/Πba) + log(Πbc/Πcb). b. With this model, is it possible that a could be preferred to b (i.e., Πab > Πba) and b could be
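Under the Bradley-Terry model, Πab = exp(βa)/[exp(βa) + exp(βb)], so log(Πab/Πba) = βa − βb and part (a) follows by telescoping. The sketch below checks this numerically with hypothetical worth parameters; because preferences are governed by a single ordering of the {β}, it also illustrates the transitivity at stake in part (b).

```python
import math

def bt_prob(beta, a, b):
    """P(a preferred to b) under Bradley-Terry with worth parameters beta."""
    return math.exp(beta[a]) / (math.exp(beta[a]) + math.exp(beta[b]))

def log_odds(beta, a, b):
    """log(Pi_ab / Pi_ba), which should equal beta[a] - beta[b]."""
    return math.log(bt_prob(beta, a, b) / bt_prob(beta, b, a))

beta = {"a": 0.8, "b": 0.3, "c": -0.5}   # hypothetical worth parameters
lhs = log_odds(beta, "a", "c")
rhs = log_odds(beta, "a", "b") + log_odds(beta, "b", "c")
print(lhs, rhs)
```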
Find the log likelihood for the Bradley–Terry model. From the kernel, show that (given {Nab}) the minimal sufficient statistics are {na+}. Thus, explain how “victory totals” determine the
Refer to Table 8.3. Viewing the table as matched triplets, construct the marginal distribution for each substance. Find the sample proportions of students who used marijuana, alcohol, and cigarettes.
Refer to Problem 11.2. Further study shows evidence of an interaction between gender and substance type. Using GEE with exchangeable working correlation, the model fit for the probability
Refer to Table 11.13 on attitudes toward legalized abortion. For the response Yt(1 = support legalization, 0 = oppose) for question t (t = 1, 2, 3) and for gender g (1 = female, 0 = male), consider
Refer to Table 11.4. Fit the interaction model logit[P(Y2 ≤ j)] = αj + β1x + β2y1 + β3xy1 that constrains effects {β1x +
A first-order Markov chain has stationary (or time-homogeneous) transition probabilities if the one-step transition probability matrices are identical, that is, if for all i and j, πj|i(1) = πj|i(2)
For Table 8.3, let yit = 1 when subject i used substance t. Table 12.11 shows output for the logistic-normal model logit[P(Yit = 1 | ui)] = ui + βt. Interpret. Illustrate by comparing use of
How is the focus different for the model in Problem 12.3 than for the loglinear model (AC, AM, CM) used in Section 8.2.4? If σ̂ = 0, which loglinear model has the same fit as
For the crossover study in Table 11.10 (Problem 11.6), fit the model logit[P(Yi(k)t = 1 | ui(k))] = αk + βt + ui(k), where {ui(k)} are independent N(0, σ²). Interpret
For Problem 12.7, fit the more general GLMM having treatment effects {βtk} that vary by sequence. Test whether the fit is better. One could also consider period or carryover effects. Add
For Table 6.7 on admissions decisions for graduate school applicants, let yig = 1 denote a subject in department i of gender g (1 = females, 0 = males) being admitted. a. For the fixed effects model,
For the clinical trial in Table 9.16, let πit = P(Yit = 1 | ui) denote the probability of success for treatment t in center i. a. The random intercept model (12.11) has
A data set from the 1994 General Social Survey on subjects’ opinions on four items (the environment, health, law enforcement, education) related to whether they believed government spending on each
Landis and Koch (1977) showed ratings by seven pathologists who separately classified 118 slides regarding the presence and extent of carcinoma of the uterine cervix, using a five-point ordinal
Refer to Section 12.5.1 on boys’ attitudes toward the leading crowd. Table 12.15 shows results for a sample of schoolgirls. Fit model (12.16) and interpret. Summarize the estimated
Table 12.16 reports results from a study to estimate the number N of people infected during a 1995 hepatitis A outbreak in Taiwan. The 271 observed cases were reported from records based on a serum
Analyze the crossover data of Table 11.1 using a random effects approach. Interpret, and compare results to those in Section 11.1.2. Table 11.1: Responses to Three Drugs in a Crossover Study Drug A
For the 2³ table of opinions about legalized abortion (Table 10.13) collapsed over gender, fit a latent class model with two classes. Show that it is saturated. For each latent class, report the
Extend the various analyses of the teratology data (Table 4.5) in Section 13.3.3 as follows: a. Include a predictor for litter size (as well as group). Interpret, and compare results to those without
Refer to Problem 13.12. Fit the Poisson and negative binomial GLMs using identity link. Show that the estimated differences in means between males and females are identical for the two GLMs but the
Derive residual df for a latent class model with q latent classes. When I = 2, for q ≥ 2 show one needs T ≥ 4 for the model to be unsaturated. Then, find the maximum value for q when T = 4, 5.
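Counting free parameters, namely q − 1 latent class probabilities plus q·T·(I − 1) conditional response probabilities, against the I^T − 1 free cell probabilities gives residual df = I^T − 1 − (q − 1) − qT(I − 1). The sketch below assumes all T variables have the same number I of categories and relies only on this count; it does not check identifiability. Function names are hypothetical.

```python
def latent_class_df(I, T, q):
    """Residual df for a q-class latent class model on T variables with I categories each."""
    cells = I ** T - 1                    # free multinomial cell probabilities
    params = (q - 1) + q * T * (I - 1)    # class probabilities + conditional probabilities
    return cells - params

def max_classes(I, T):
    """Largest q whose residual df is still nonnegative (parameter counting only)."""
    q = 1
    while latent_class_df(I, T, q + 1) >= 0:
        q += 1
    return q

for T in (3, 4, 5):
    print(T, latent_class_df(2, T, 2), max_classes(2, T))
```

For I = 2 and q = 2, T = 3 gives df = 0 (saturated) and T = 4 gives df = 6, so T ≥ 4 is needed; the loop then reports q = 3 for T = 4 (df = 1) and q = 5 for T = 5 (df = 2).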
Express the log likelihood for latent class model (13.1) in terms of the model parameters. Derive likelihood equations. Model (13.1) is $P(Y_1 = y_1, \ldots, Y_T = y_T) = \sum_{z=1}^{q} \left[\prod_{t=1}^{T} P(Y_t = y_t \mid Z = z)\right] P(Z = z)$.
In Section 13.2.2, under the null that the ordinary logistic regression model holds, explain why it is inappropriate to treat the difference between the deviances for that model and the mixture of two
Show that the beta-binomial distribution (13.9) simplifies to the binomial when θ = 0, where $p(y; \mu, \theta) = \binom{n}{y} \dfrac{\left[\prod_{k=0}^{y-1} (\mu + k\theta)\right] \left[\prod_{k=0}^{n-y-1} (1 - \mu + k\theta)\right]}{\prod_{k=0}^{n-1} (1 + k\theta)}$, $y = 0, 1, \ldots, n$.
Suppose that πi = P(Yit = 1) = 1 – P(Yit = 0), for t = 1, ..., ni, and corr(Yit, Yis) = ρ for t ≠ s. Show that var(Yit) = πi(1 –
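With θ = 0 every kθ term vanishes, so the numerator products become μ^y and (1 − μ)^(n−y) and the denominator becomes 1. The sketch below codes the (μ, θ) form directly (Python 3.8+ for `math.comb` and `math.prod`) and compares it with the binomial pmf; parameter values are hypothetical. It also shows that for n = 1 the pmf is Bernoulli(μ) whatever θ is.

```python
from math import comb, prod

def beta_binomial_pmf(y, n, mu, theta):
    """Beta-binomial pmf in the (mu, theta) parameterization of (13.9)."""
    num1 = prod(mu + k * theta for k in range(y))           # k = 0, ..., y - 1
    num2 = prod(1 - mu + k * theta for k in range(n - y))   # k = 0, ..., n - y - 1
    den = prod(1 + k * theta for k in range(n))             # k = 0, ..., n - 1
    return comb(n, y) * num1 * num2 / den                   # empty products equal 1

def binomial_pmf(y, n, mu):
    return comb(n, y) * mu ** y * (1 - mu) ** (n - y)

n, mu = 5, 0.3   # hypothetical values
print([round(beta_binomial_pmf(y, n, mu, 0.0), 4) for y in range(n + 1)])
print(beta_binomial_pmf(1, 1, mu, 0.7))   # n = 1: equals mu for any theta
```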
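Summing every entry of the covariance matrix of (Yi1, ..., Yini) under this exchangeable structure gives var(∑t Yit) = ni πi(1 − πi)[1 + (ni − 1)ρ]. The hypothetical helpers below compute that variance both ways; with ni = 1 there are no covariance terms, so ρ drops out.

```python
def var_sum_from_cov(n, pi, rho):
    """Variance of sum_t Y_t obtained by adding up all n x n covariance entries."""
    v = pi * (1 - pi)                          # var(Y_t) for a Bernoulli(pi) variable
    total = 0.0
    for t in range(n):
        for s in range(n):
            total += v if t == s else rho * v  # cov(Y_t, Y_s) = rho * v when t != s
    return total

def var_sum_formula(n, pi, rho):
    return n * pi * (1 - pi) * (1 + (n - 1) * rho)

print(var_sum_from_cov(5, 0.3, 0.2), var_sum_formula(5, 0.3, 0.2))
```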
When n = 1, show that the beta-binomial distribution is no different from the binomial (i.e., Bernoulli). Explain why overdispersion cannot occur when n = 1.
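When y1, ..., yN are independent from the negative binomial distribution (13.13) with k fixed, show that µ̂ = ȳ, where $p(y; k, \mu) = \dfrac{\Gamma(y + k)}{\Gamma(k)\,\Gamma(y + 1)} \left(\dfrac{k}{\mu + k}\right)^{k} \left(1 - \dfrac{k}{\mu + k}\right)^{y}$, $y = 0, 1, 2, \ldots$.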
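Since 1 − k/(µ + k) = µ/(µ + k), the derivative of the log likelihood with respect to µ is ∑i [yi/µ − (yi + k)/(µ + k)], and setting it to zero gives µ = ȳ. The sketch below checks this numerically for a hypothetical sample and a fixed k.

```python
from math import lgamma, log

def nb_loglik(mu, ys, k):
    """Negative binomial log likelihood in the (13.13) form, with k held fixed."""
    return sum(
        lgamma(y + k) - lgamma(k) - lgamma(y + 1)
        + k * log(k / (mu + k)) + y * log(mu / (mu + k))
        for y in ys
    )

def nb_score(mu, ys, k):
    """Derivative of nb_loglik with respect to mu."""
    return sum(y / mu - (y + k) / (mu + k) for y in ys)

ys, k = [0, 2, 3, 1, 7, 4], 1.5     # hypothetical sample and fixed k
ybar = sum(ys) / len(ys)
print(ybar, nb_score(ybar, ys, k))
```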
Using E(Y) = E[E(Y | X)] and var(Y) = E[var(Y | X)] + var[E(Y | X)], derive the mean and variance of the(a) Beta-binomial distribution,(b) Negative binomial distribution.
An alternative negative binomial parameterization results from the gamma density formula, for which E(λ) = µ, var(λ) = µ/k. Show that this gamma mixture of
Consider a Poisson GLMM using the identity link. Relate the marginal mean and variance to the conditional mean and variance. Explain the structural problem that this model has.
Let ϕ²(T) = ∑i (Ti – πi0)²/πi0. Then ϕ²(p) = X²/n, where X² is the Pearson statistic (1.15) for testing H0: πi = πi0, i = 1, ..., N, and nϕ²(π) is the noncentrality for that test when π
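Because ϕ²(p) = ∑i (ni/n − πi0)²/πi0 = ∑i (ni − nπi0)²/(n²πi0), the identity ϕ²(p) = X²/n is just a rescaling of each Pearson term by 1/n. A quick numerical check with hypothetical counts and null probabilities:

```python
def phi2(t, pi0):
    """phi^2(t) = sum_i (t_i - pi_i0)^2 / pi_i0."""
    return sum((ti - p) ** 2 / p for ti, p in zip(t, pi0))

def pearson_x2(counts, pi0):
    """Pearson statistic for H0: pi_i = pi_i0, using expected counts n * pi_i0."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, pi0))

counts = [18, 30, 52]        # hypothetical counts
pi0 = [0.2, 0.3, 0.5]        # hypothetical null probabilities
n = sum(counts)
p_hat = [c / n for c in counts]
print(phi2(p_hat, pi0), pearson_x2(counts, pi0) / n)
```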
Suppose that log-log model (6.13) holds. Explain how to interpret β.
Consider model (6.12) with complementary log-log link. a. Find x at which π(x) = ½. b. Show the greatest rate of change of π(x) occurs at x = –α/β. What does π(x) equal at that point? Give the
Consider the choice between two options, such as two product brands. Let U0 denote the utility of outcome y = 0 and U1 the utility of y = 1. For y = 0 and 1, suppose that Uy = αy + βy x + εy,
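A numerical sketch, assuming the complementary log-log model takes the form π(x) = 1 − exp[−exp(α + βx)], which is consistent with the location x = −α/β given in part (b). Solving π(x) = ½ gives α + βx = log(log 2); the slope βe^η e^(−e^η), with η = α + βx, peaks at η = 0, where π = 1 − e^(−1) ≈ 0.632. The parameter values are hypothetical.

```python
import math

def pi_cloglog(x, alpha, beta):
    """pi(x) = 1 - exp(-exp(alpha + beta * x))."""
    return 1 - math.exp(-math.exp(alpha + beta * x))

def slope_cloglog(x, alpha, beta):
    """d pi / dx = beta * exp(eta) * exp(-exp(eta)), with eta = alpha + beta * x."""
    eta = alpha + beta * x
    return beta * math.exp(eta) * math.exp(-math.exp(eta))

def x_half(alpha, beta):
    """Solve pi(x) = 1/2, i.e. alpha + beta * x = log(log 2)."""
    return (math.log(math.log(2)) - alpha) / beta

alpha, beta = -2.0, 0.5          # hypothetical parameter values
x_star = -alpha / beta           # location of the steepest slope
print(x_half(alpha, beta), pi_cloglog(x_star, alpha, beta))
```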
A threshold model can also motivate the probit model. For it, there is an unobserved continuous response Y* such that the observed yi = 0 if yi* ≤ τ and yi = 1 if yi* > τ. Suppose that yi* =
Refer to Section 6.4.1. When Y is N(µi, σ²), consider the comparison of (µ1, ..., µI) based on independent samples at the I categories of X. When approximately µi = α + βxi, explain why the t
Suppose that {πijk} in a 2 × 2 × 2 table are, by row, (0.15, 0.10 / 0.10, 0.15) when Z = 1 and (0.10, 0.15 / 0.15, 0.10) when Z = 2. For testing conditional XY independence with Logit models
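Computing the odds ratios directly shows why this example is instructive: within Z = 1 the XY odds ratio is 2.25, within Z = 2 it is 1/2.25, and the XY marginal table is uniform with odds ratio 1. The conditional log odds ratios have equal magnitude and opposite sign, so X and Y are conditionally dependent even though a test aimed at a common XY effect across levels of Z would have essentially no power here. Plain-Python sketch:

```python
def odds_ratio(t):
    """Cross-product ratio of a 2 x 2 table given as [[a, b], [c, d]]."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

z1 = [[0.15, 0.10], [0.10, 0.15]]   # {pi_ij1}
z2 = [[0.10, 0.15], [0.15, 0.10]]   # {pi_ij2}
marginal = [[z1[i][j] + z2[i][j] for j in range(2)] for i in range(2)]

print(odds_ratio(z1), odds_ratio(z2), odds_ratio(marginal))
```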
For Table 7.13, let Y = belief in life after death, x1 = gender (1 = females, 0 = males), and x2 = race (1 = whites, 0 = blacks). Table 7.14 shows the fit of the model
A model fit predicting preference for U.S. President (Democrat, Republican, Independent) using x = annual income (in $10,000) is log(π̂D/π̂I) = 3.3 – 0.2x and log(π̂R/π̂I) = 1.0 + 0.3x. a.
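Subtracting the two fitted logits gives log(π̂R/π̂D) = (1.0 + 0.3x) − (3.3 − 0.2x) = −2.3 + 0.5x, so π̂R > π̂D exactly when x > 4.6, that is, income above $46,000. The sketch below, with a hypothetical function name, converts the two baseline-category logits into the three fitted probabilities.

```python
import math

def president_probs(x):
    """Fitted probabilities from the two baseline-category logits (baseline = Independent)."""
    e_d = math.exp(3.3 - 0.2 * x)    # pi_D / pi_I
    e_r = math.exp(1.0 + 0.3 * x)    # pi_R / pi_I
    denom = 1 + e_d + e_r
    return {"D": e_d / denom, "R": e_r / denom, "I": 1 / denom}

def log_r_vs_d(x):
    """log(pi_R / pi_D) = -2.3 + 0.5x."""
    return -2.3 + 0.5 * x

for x in (2, 4.6, 8):
    print(x, {k: round(v, 3) for k, v in president_probs(x).items()})
```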
Table 7.15 refers to the effect on political party identification of gender and race. Find a baseline-category logit model that fits well. Interpret estimated effects on the odds that party
For 63 alligators caught in Lake George, Florida, Table 7.16 classifies primary food choice as (fish, invertebrate, other) and shows length in meters. Alligators are called subadults if length <
For recent data from a General Social Survey, the cumulative logit model (7.5) with Y = political ideology (very liberal, slightly liberal, moderate, slightly conservative, very conservative) and x =
Refer to Problem 7.5. With adjacent-categories logits, β̂ = 0.435. Interpret using odds ratios for adjacent categories and for the (very liberal, very conservative) pair of categories.Data from
Table 7.17 is an expanded version of a data set analyzed in Section 8.4.2. The response categories are (1) not injured, (2) injured but not transported by emergency medical services, (3) injured and
Refer to the cumulative logit model for Table 7.8. a. Compare the estimated income effect β̂1 = –0.510 to the estimate after collapsing the response to three
Table 7.19 refers to a clinical trial for the treatment of small-cell lung cancer. Patients were randomly assigned to two treatment groups. The sequential therapy administered the same combination of
Table 9.7 displays associations among smoking status (S), breathing test results (B), and age (A) for workers in certain industrial plants. Treat B as a response.a. Specify a baseline-category logit
The book’s Web site (www.stat.ufl.edu/~aa/cda/cda.html) has a 7 × 2 table that refers to subjects who graduated from high school in 1965. They were classified as protestors if they took part in
For Table 7.5, the cumulative probit model has fit Φ–1[P̂(Y ≤ j)] = α̂j – 0.195x1 + 0.683x2, with
Table 7.21 shows results of fitting the mean response model to Table 7.8 using scores (3, 10, 20, 35) for income and (1, 3, 4, 5) for job satisfaction. Interpret the income effect, provide a
The book’s Web site (www.stat.ufl.edu/~aa/cda/cda.html) has a 3 × 4 × 4 table that cross-classifies dumping severity (Y) and operation (X) for
Table 7.23 refers to a study that randomly assigned subjects to a control or treatment group. Daily during the study, treatment subjects ate cereal containing psyllium. The study analyzed the effect
A multivariate generalization of the exponential dispersion family (4.14) is f(yi; θi, ϕ) = exp{[yi’ θi – b(θi)]/α(ϕ) + c(yi, ϕ)}, where θi is the natural parameter. Show that the
Is the proportional odds model a special case of a baseline-category logit model? Explain why or why not.
Prove factorization (7.15) for the multinomial distribution.
Show that for the model, logit[P(Y ≤ j)] = αj + βjx, cumulative probabilities may be misordered for some x values.
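When the slopes βj differ, the logistic curves for P(Y ≤ j) must cross, so the required ordering P(Y ≤ 1) ≤ P(Y ≤ 2) fails on one side of the crossing. The sketch below uses hypothetical values α = (0, 1) and β = (1, 0) for a three-category response, where the curves cross at x = 1.

```python
import math

def cumprob(alpha, beta, x):
    """P(Y <= j) = 1 / (1 + exp(-(alpha_j + beta_j * x))) for each cutpoint j."""
    return [1 / (1 + math.exp(-(a + b * x))) for a, b in zip(alpha, beta)]

alpha = [0.0, 1.0]   # hypothetical intercepts
beta = [1.0, 0.0]    # hypothetical cutpoint-specific slopes

print(cumprob(alpha, beta, 0))   # properly ordered
print(cumprob(alpha, beta, 2))   # P(Y <= 1) exceeds P(Y <= 2): misordered
```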
For an I × J contingency table with ordinal Y and scores {xi = i} for x, consider the model logit[P(Y ≤ j | X = xi)] = αj + βxi. a. Show that residual df = IJ – I – J. b. Show that
F1(y) = 1 – exp (– λy) for y > 0 is a negative exponential cdf with parameter λ, and F2(y) = 1 – exp(– µy) for y > 0. Show that the difference between the cdf’s on a complementary
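Since 1 − F1(y) = exp(−λy), log[−log(1 − F1(y))] = log λ + log y, and likewise for F2, so the difference on the complementary log-log scale is log λ − log µ for every y > 0. The sketch below checks this numerically with hypothetical rates λ = 2 and µ = 0.5.

```python
import math

def cloglog(p):
    """Complementary log-log transform log(-log(1 - p))."""
    return math.log(-math.log(1 - p))

lam, mu = 2.0, 0.5   # hypothetical rate parameters
diffs = []
for y in (0.1, 0.5, 1.0, 3.0):
    F1 = 1 - math.exp(-lam * y)
    F2 = 1 - math.exp(-mu * y)
    diffs.append(cloglog(F1) - cloglog(F2))

print(diffs, math.log(lam / mu))
```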
When X and Y are ordinal, explain how to test conditional independence by allowing a different trend in each partial table.
For an I × J table, let ηij = log µij, and let a dot subscript denote the mean for that index (e.g., ηi. = ∑j ηij/J). Then, let λ = η.., λiX = ηi. – η.., λjY = η.j – η.., and
Suppose that all µijk > 0. Let ηijk = log µijk, and consider model parameters with zero-sum constraints. a. For model (XY, XZ, YZ) with a 2 × 2 × 2
Two balanced coins are flipped, independently. Let X = whether the first flip resulted in a head (yes, no), Y = whether the second flip resulted in a head, and Z = whether both flips had the same
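Enumerating the four equally likely outcomes makes the structure concrete: each pair among X, Y, Z is independent, but the three are not mutually independent (for instance, two heads force Z = "same"). Exact fractions avoid rounding; the variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of (X, Y, Z): X = first flip head, Y = second flip head,
# Z = both flips had the same outcome. Each of the 4 flip pairs has probability 1/4.
joint = {}
for first, second in product([True, False], repeat=2):
    key = (first, second, first == second)
    joint[key] = joint.get(key, 0) + Fraction(1, 4)

def marg(indices, values):
    """P(variables at `indices` take `values`), summing the joint distribution."""
    return sum(p for k, p in joint.items()
               if all(k[i] == v for i, v in zip(indices, values)))

print(joint)
```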
For three categorical variables X, Y, and Z: a. Prove that mutual independence of X, Y, and Z implies that X and Y are both marginally and conditionally independent. b. When X is independent of Y and Y
A 2 × 2 × 2 table satisfies πi++ = π+j+ = π++k = ½, all i, j, k. Give an example of {πijk} that satisfies model (a) (X, Y, Z), (b) (XY, Z), (c) (XY, YZ), (d) (XY, XZ,
Show that the general loglinear model in T dimensions has $2^T$ terms. [It has an intercept, $\binom{T}{1}$ single-factor terms, $\binom{T}{2}$ two-factor terms, ....]
Each of T responses is binary. For dummy variables {z1, ..., zT}, the loglinear model of mutual independence has the form log µz1,...,zT = λ1z1 + ... + λTzT. Show how to express the general
Consider a cross-classification of W, X, Y, Z. a. Explain why (WXZ, WYZ) is the most general loglinear model for which X and Y are conditionally independent. b. State the model symbol for which X and Y
For a four-way table with binary response Y, give the equivalent loglinear and logit models that have: a. Main effects of A, B, and C on Y. b. Interaction between A and B in their effects on Y, and C
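Each term of the general (saturated) loglinear model corresponds to one subset of the T variables, so the count is ∑r C(T, r) = 2^T by the binomial theorem. A small sketch listing the subsets and checking the count; function names are hypothetical.

```python
from itertools import combinations
from math import comb

def loglinear_terms(variables):
    """One term per subset of the variables: the empty subset is the intercept,
    singletons are single-factor terms, pairs are two-factor terms, and so on."""
    return [c for r in range(len(variables) + 1) for c in combinations(variables, r)]

def count_terms(T):
    """Intercept + C(T,1) + C(T,2) + ... + C(T,T)."""
    return sum(comb(T, r) for r in range(T + 1))

print(loglinear_terms(["X", "Y", "Z"]))
```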
For a 3 × 3 table with ordered rows having scores {xi}, identify all terms in the generalized loglinear model (8.18) for models (a) logit[P(Y ≤ j)] = αj + βxi, and (b) log [P(Y = j)/P(Y = 3)]
For the independence model for a two-way table, derive minimal sufficient statistics, likelihood equations, fitted values, and residual df.
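For reference, the standard results are: minimal sufficient statistics {ni+} and {n+j}; likelihood equations µ̂i+ = ni+ and µ̂+j = n+j; fitted values µ̂ij = ni+n+j/n; and residual df = IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1). The sketch below applies them to the environment-by-cities counts quoted for Table 8.19 above; the function name is hypothetical.

```python
def independence_fit(table):
    """ML fitted values and residual df for the independence model in an I x J table."""
    I, J = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]                              # n_{i+}
    col_tot = [sum(table[i][j] for i in range(I)) for j in range(J)]  # n_{+j}
    fitted = [[row_tot[i] * col_tot[j] / n for j in range(J)] for i in range(I)]
    return fitted, (I - 1) * (J - 1)

counts = [[108, 179, 157], [21, 55, 52], [5, 6, 24]]
fitted, df = independence_fit(counts)
for row in fitted:
    print([round(v, 1) for v in row])
```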
For model (XY, Z), derive (a) minimal sufficient statistics, (b) likelihood equations, (c) fitted values, and (d) residual df for tests of fit.
Verify the df values shown in Table 8.14 for models (XY, Z), (XY, YZ), and (XY, XZ, YZ). Table 8.14: Residual Degrees of Freedom for Loglinear Models for Three-Way Tables Model Degrees of Freedom (X,
Table 8.22 shows fitted values for models for four-way tables that have direct estimates. a. Use Birch’s results to verify that the entry is correct for (W, X, Y, Z). Verify its residual
A T-dimensional table (nab...t) has Ii categories in dimension i. a. Find minimal sufficient statistics, ML estimates of cell probabilities, and residual df for the mutual independence model. b. Find
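Residual df equal the number of cells IJK minus the number of nonredundant parameters. The sketch counts parameters term by term under the usual identifiability constraints and compares the result with the standard closed forms (IJ − 1)(K − 1), J(I − 1)(K − 1), and (I − 1)(J − 1)(K − 1). The function name and dimensions are hypothetical.

```python
def residual_df(I, J, K, model):
    """Cells minus nonredundant parameters for a three-way loglinear model."""
    cells = I * J * K
    main = 1 + (I - 1) + (J - 1) + (K - 1)            # intercept + main effects
    xy, xz, yz = (I - 1) * (J - 1), (I - 1) * (K - 1), (J - 1) * (K - 1)
    extra = {"(XY, Z)": xy, "(XY, YZ)": xy + yz, "(XY, XZ, YZ)": xy + xz + yz}[model]
    return cells - main - extra

I, J, K = 3, 4, 2   # hypothetical dimensions
for m in ("(XY, Z)", "(XY, YZ)", "(XY, XZ, YZ)"):
    print(m, residual_df(I, J, K, m))
```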
Apply IPF to model (a) (X, YZ), and (b) (XZ, YZ). Show that the ML estimates result within one cycle.
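For (XZ, YZ), starting from µ = 1, fitting the XZ margin gives ni+k/J; fitting the YZ margin then gives ni+k n+jk/n++k, which still matches the XZ margin, so the cycle stops there at the closed-form ML estimate. The sketch below covers part (b) only and uses a hypothetical 2 × 2 × 2 table of positive counts; the (X, YZ) case works the same way.

```python
def ipf_xz_yz(n):
    """One IPF cycle for model (XZ, YZ) on an I x J x K table n[i][j][k] of positive counts."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    mu = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    # Step 1: rescale so the fitted XZ margin equals n_{i+k}.
    for i in range(I):
        for k in range(K):
            target = sum(n[i][j][k] for j in range(J))
            current = sum(mu[i][j][k] for j in range(J))
            for j in range(J):
                mu[i][j][k] *= target / current
    # Step 2: rescale so the fitted YZ margin equals n_{+jk}.
    for j in range(J):
        for k in range(K):
            target = sum(n[i][j][k] for i in range(I))
            current = sum(mu[i][j][k] for i in range(I))
            for i in range(I):
                mu[i][j][k] *= target / current
    return mu

n = [[[10, 5], [3, 8]], [[6, 9], [12, 4]]]   # hypothetical counts
mu = ipf_xz_yz(n)
print(mu)
```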
Table 9.17 summarizes a study with variables age of mother (A), length of gestation (G) in days, infant survival (I), and number of cigarettes smoked per day during the prenatal period (S). Treat G
Consider the loglinear model selection for Table 6.3. a. Why is it not sensible to consider models omitting the λGM term? Table 6.3: Marital Status by Report of Pre- and Extramarital
For Table 9.11, fit a model in which death rate depends only on age. Interpret the age effect. Table 9.11: Data on Heart Valve Replacement Operations Type of Heart Valve Aortic 4 1259 Mitral Age
Consider model (9.18). What is the effect on the model parameter estimates, their standard errors, and the goodness-of-fit statistics when (a) The times at risk are doubled, but the numbers of deaths
