Foundations of Linear and Generalized Linear Models, 1st Edition, Alan Agresti - Solutions
Exercise 7.9.14. Show that if all the roots x0 of Φ(x) have |x0| > 1, then there exists a polynomial Ψ(x) = Σ_{i=0}^∞ ψi x^i with Σ_{i=0}^∞ |ψi| < ∞ and, for x ∈ [−1, 1], Ψ(x)Φ(x) = 1. Hint: Do a Taylor expansion of 1/Φ(x) about 0.
Exercise 7.9.15. Generate 25 realizations of an ARMA(1,1) process for φ1 = −0.8, −0.2, 0, 0.2, 0.8 and θ1 = −0.8, −0.2, 0, 0.2, 0.8.
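Exercise 7.9.15 can be attacked by direct simulation. Below is a minimal sketch in Python/NumPy (the function name, burn-in length, and innovation variance are my own choices, not from the text), generating one length-25 realization for each (φ1, θ1) pair:

```python
import numpy as np

def simulate_arma11(n, phi1, theta1, sigma=1.0, burn=200, rng=None):
    """Simulate n observations from the ARMA(1,1) process
    y_t = phi1*y_{t-1} + e_t + theta1*e_{t-1}, discarding a burn-in."""
    rng = np.random.default_rng(rng)
    e = rng.normal(0.0, sigma, size=n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = phi1 * y[t - 1] + e[t] + theta1 * e[t - 1]
    return y[burn:]

# one realization of length 25 for each (phi1, theta1) combination
grid = [-0.8, -0.2, 0.0, 0.2, 0.8]
realizations = {(p, q): simulate_arma11(25, p, q, rng=1) for p in grid for q in grid}
```

Plotting the 25 series side by side makes the intended comparison: larger |φ1| produces visibly smoother or more persistent paths, while θ1 mainly changes the short-lag behavior.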
Exercise 7.9.16. Prove PA-V's Exercise 6.9.9, that
(a) ρ12·3 = (ρ12 − ρ13 ρ23) / √[(1 − ρ13²)(1 − ρ23²)]
(b) ρ12·34 = (ρ12·4 − ρ13·4 ρ23·4) / √[(1 − ρ13·4²)(1 − ρ23·4²)]
Exercise 9.10.1. Show that if W ∼ W(n, Σ, 0), then A′WA ∼ W(n, A′ΣA, 0).
Exercise 9.10.2. Show that if W1, …, Wr are independent with Wi ∼ W(ni, Σ, 0), then Σ_{i=1}^r Wi ∼ W(Σ_{i=1}^r ni, Σ, 0).
Exercise 9.10.3. Show that if W ∼ W(n, Σ, 0), then λ′Wλ / λ′Σλ ∼ χ²(n, 0).
Exercise 9.10.4. For i = 1, 2, 3, let yi ∼ N(μ + (i − 2)ξ, Σ), where Σ is known and y1, y2, and y3 are independent. Find the maximum likelihood estimates of μ and ξ.
Exercise 9.10.5. Based on the multivariate linear model Y = XB + e, E(e) = 0, Cov(εi, εj) = δij Σ, find a 99% prediction interval for y0′ξ, where y0 is an independent observation that is distributed N(B′x0, Σ).
Exercise 9.10.6. Let y1, y2, …, yn be i.i.d. N(Xβ, Σ), where the n × n matrix Σ is unknown. Show that the maximum likelihood estimate of Xβ is Xβ̂ = X(X′E⁻¹X)⁻X′E⁻¹ȳ·.
Exercise 9.10.7. Consider the multivariate linear model Y = XB + e and the parametric function Λ′BW, where W is a q × r matrix of rank r. Find simultaneous confidence intervals for all parameters of the form ζ′Λ′BWξ.
Exercise 9.10.8. Use Lemma 9.6.2 to show that if A is nonnegative definite and B is positive definite, then AB⁻¹ is nonnegative definite. Hint: Show that the nonnegative definite matrix B^{−1/2}AB^{−1/2} has the same eigenvalues as AB⁻¹.
Exercise 9.10.9. Rewrite the multivariate linear model in terms of Vec(Y). Write it similarly to both (9.1.2) with covariance (9.1.3) and using the Vec operator with Kronecker products as in (9.1.4) with (9.1.5).
Exercise 9.10.10. In the multivariate linear model Y = XB + e the likelihood equations reduce to
tr{Σ⁻¹[d_{θj}Σ]} = tr{Σ⁻¹[d_{θj}Σ]Σ⁻¹Σ̂}, for j = 1, …, s,
where Σ̂ ≡ Y′(I − M)Y/n and θ1, θ2, …, θs from Chap. 4 correspond to σ_gh for g ≤ h, so that s = q(q + 1)/2.
Exercise 9.10.11. Use the univariate linear model form of the multivariate linear model and the likelihood equations from Sect. 4.3 to obtain the MLEs for the multivariate linear model.
Exercise 10.6.1. Jolicoeur and Mosimann (1960) give data on the length, width, and height of painted turtle shells. The carapace dimensions of 24 females and 24 males are given in Table 10.3. Use Hotelling's T² statistic to test whether there is a sex difference in shell dimensions. Is there a
Exercise 10.6.2. Smith, Gnanadesikan, and Hughes (1962) provide data on characteristics of the urine of young men. The men are categorized into four groups based on their degree of obesity. The four variables given in Table 10.4 consist of a covariate x = 10³((specific gravity) − 1) and three
Exercise 10.6.3. Analyze the repeated measures data given by Danford, Hughes, and McNee (1960) in Biometrics on pages 562 and 563.
Exercise 10.6.4. Box (1950) gives data on the weights of three groups of rats. One group was given thyroxin in their drinking water, one thiouracil, and the third group
Exercise 11.8.1. Box (1950) gives data on the weights of three groups of rats. One group was given thyroxin in their drinking water, one thiouracil, and the third group was a control. Weights are measured in grams at weekly intervals. The data were given in Table 10.5. Analyze the data using the
Exercise 11.8.2. Box (1950) presents data on the weight loss of a fabric due to abrasion. Two fillers were used in three proportions. Some of the fabric was given a surface treatment. Weight loss was recorded after 1000, 2000, and 3000 revolutions of a machine designed to test abrasion resistance.
Exercise 12.9.1. Consider the data of Example 10.3.1. Suppose a person has heart rate measurements of y = (84, 82, 80, 69). (a) Using normal theory linear discrimination, what is the estimated maximum likelihood allocation for this person? (b) Using normal theory quadratic discrimination, what is the
Exercise 12.9.2. In the motion picture Diary of a Mad Turtle the main character, played by Richard Benjamin Kingsley, claims to be able to tell a female turtle by a quick glance at her carapace. Based on the data of Exercise 10.6.1, do you believe that it is possible to accurately identify a
Exercise 12.9.3. Using the data of Exercise 10.6.3, do a stepwise discriminant analysis to distinguish among the thyroxin, thiouracil, and control rat populations based on their weights at various times. To which group is a rat with the following series of weights most likely to belong:
Exercise 12.9.4. Lachenbruch (1975) presents information on four groups of junior technical college students from greater London. The information consists of summary statistics for the performance of the groups on arithmetic, English, and form relations tests that were given in the last year of
Exercise 12.9.6. Show that the Mahalanobis distance is invariant under affine transformations z = Ay+b of the random vector y when A is nonsingular.
Exercise 12.9.7. Let y be an observation from one of two normal populations that have means of μ1 and μ2 and common covariance matrix Σ. Define λ′ = (μ1 − μ2)′Σ⁻¹. (a) Show that, under linear discrimination, y is allocated to population 1 if and only if λ′y − (1/2)λ′(μ1 + μ2) > 0. (b)
Exercise 12.9.8. Consider a two group allocation problem in which the prior probabilities are π(1) = π(2) = 0.5 and the sampling distributions are exponential, namely f(y|i) = θi e^{−θi y}, y ≥ 0. Find the optimal allocation rule. Assume a cost structure where c(i|j) is zero for i = j and one
Exercise 12.9.9. Suppose that the distributions for two populations are bivariate normal with the same covariance matrix. For π(1) = π(2) = 0.5, find the value of the correlation coefficient that minimizes the total probability of misclassification. The total probability of misclassification is
Exercise 14.5.1. (a) Find the vector b that minimizes Σ_{i=1}^q [yi − μi − b′(x − μx)]². (b) For given weights wi, i = 1, …, q, find the vector b that minimizes Σ_{i=1}^q wi²[yi − μi − b′(x − μx)]². (c) Find the vectors bi that minimize Σ_{i=1}^q wi²[yi − μi − bi′(x − μx)]².
Exercise 14.5.2. In a population of large industrial corporations, the covariance matrix for y1 = assets/10⁶ and y2 = net income/10⁶ is
Σ = [75 5; 5 1].
(a) Determine the principal components. (b) What proportion of the total prediction variance is explained by a1′y? (c) Interpret a1′y. (d) Repeat (a),
Exercise 14.5.3. What are the principal components associated with
Σ = [5 0 0 0; 0 3 0 0; 0 0 3 0; 0 0 0 2]?
Discuss the problem of reducing the variables to a two-dimensional space.
Exercise 14.5.4. Let v1 = (2, 1, 1, 0)′, v2 = (0, 1, −1, 0)′, v3 = (0, 0, 0, 2)′, and Σ = Σ_{i=1}^3 vi vi′. (a) Find the principal components of Σ. (b) What is the predictive variance of each principal component? What percentage of the maximum prediction error is accounted for by the first two principal
Exercise 14.5.5. Do a principal components analysis of the female turtle carapace data of Exercise 10.6.1.
Exercise 14.5.6. The data in Table 14.1 are a subset of the Chapman data reported by Dixon and Massey (1983). It contains the age, systolic blood pressure, diastolic blood pressure, cholesterol, height, and weight for a group of men in the Los Angeles Heart Study. Do a principal components analysis
Exercise 14.5.7. Assume a two-factor model with
Σ = [0.15 0.00 0.05; 0.00 0.20 −0.01; 0.05 −0.01 0.05]
and
B = [0.3 0.2 0.1; 0.2 −0.3 0.1].
What is Ψ? What are the communalities?
Exercise 14.5.8. Using the vectors v1 and v2 from Exercise 14.5.4, let Λ = v1v1′ + v2v2′. Give the eigenvector solution for B and another set of loadings that generates Λ.
Exercise 14.5.9. Given that
Σ = [1.00 0.30 0.09; 0.30 1.00 0.30; 0.09 0.30 1.00]
and Ψ = D(0.1, 0.2, 0.3), find Λ and two choices of B.
Exercise 14.5.10. Find definitions for the well-known factor loading matrix rotations varimax, direct quartimin, quartimax, equamax, and orthoblique. What is each rotation specifically designed to accomplish? Apply each rotation to the covariance matrices of Exercise 14.5.9.
Exercise 14.5.11. Do a factor analysis of the female turtle carapace data of Exercise 10.6.1. Include tests for the numbers of factors and examine various factor-loading rotations.
Exercise 14.5.12. Do a factor analysis of the Chapman data discussed in Exercise 14.5.6.
Exercise 14.5.13. Show the following determinant equality: |Ψ + BB′| = |I + B′Ψ⁻¹B| |Ψ|.
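This identity (a form of the matrix determinant lemma) is easy to sanity-check numerically before proving it; the dimensions and random inputs below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 2))                    # loadings: 4 variables, 2 factors
Psi = np.diag(rng.uniform(0.5, 2.0, size=4))   # positive-definite diagonal

# |Psi + B B'| should equal |I + B' Psi^{-1} B| * |Psi|
lhs = np.linalg.det(Psi + B @ B.T)
rhs = np.linalg.det(np.eye(2) + B.T @ np.linalg.inv(Psi) @ B) * np.linalg.det(Psi)
```

The right-hand side is also the computational point of the identity: it trades a determinant of a q × q matrix for one of the much smaller r × r matrix when there are few factors.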
Exercise 14.5.14. Find the likelihood ratio test for H0: Σ = σ²[(1 − ρ)I + ρJJ′] against the general alternative.
Suppose that yi has a N(μi, σ²) distribution, i = 1, …, n. Formulate the normal linear model as a GLM, specifying the random component, linear predictor, and link function.
Show the exponential dispersion family representation for the gamma distribution (4.29). When do you expect it to be a useful distribution for GLMs?
Show that the t distribution is not in the exponential dispersion family. (Although GLM theory works out neatly for family (4.1), in practice it is sometimes useful to use other distributions, such as the Cauchy special case of the t.)
Show that an alternative expression for the GLM likelihood equations is
Σ_{i=1}^n [(yi − μi)/var(yi)] ∂μi/∂βj = 0, j = 1, 2, …, p.
Show that these equations result from the generalized least squares problem of minimizing Σi [(yi − μi)²/var(yi)], treating the variances as known
For a GLM with canonical link function, explain how the likelihood equations imply that the residual vector e = (y − μ̂) is orthogonal to C(X).
Suppose yi has a Poisson distribution with g(μi) = β0 + β1xi, where xi = 1 for i = 1, …, nA from group A and xi = 0 for i = nA + 1, …, nA + nB from group B, and with all observations being independent. Show that for the log-link function, the GLM likelihood equations imply that the
Refer to the previous exercise. Using the likelihood equations, show that the same result holds for (a) any link function for this Poisson model, (b) any GLM of the form g(μi) = β0 + β1xi with a binary indicator predictor.
For the two-way layout with one observation per cell, consider the model whereby yij ∼ N(μij, σ²) with μij = β0 + γi + δj + λγiδj. For independent observations, is this a GLM? Why or why not? (Tukey (1949) proposed a test of H0: λ = 0 as a way of testing for
Consider the expression for the weight matrix W in var(β̂) = (XᵀWX)⁻¹ for a GLM. Find W for the ordinary normal linear model, and show how var(β̂) follows from the GLM formula.
For the normal bivariate linear model, the asymptotic variance of the correlation r is (1 − ρ²)²/n. Using the delta method, show that the transform (1/2) log[(1 + r)/(1 − r)] is variance stabilizing. (Fisher (1921) noted this, showing that 1/(n − 3) is an improved variance for the
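For the variance-stabilizing claim in the Fisher-z exercise, the delta-method computation is short; a sketch:

```latex
g(r) = \tfrac{1}{2}\log\frac{1+r}{1-r}
\;\Longrightarrow\;
g'(\rho) = \frac{1}{1-\rho^2},
\qquad
\operatorname{var}\{g(r)\} \approx [g'(\rho)]^2 \operatorname{var}(r)
= \frac{1}{(1-\rho^2)^2}\cdot\frac{(1-\rho^2)^2}{n}
= \frac{1}{n}.
```

The approximate variance 1/n is free of ρ, which is exactly what "variance stabilizing" means.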
For a binomial random variable ny with parameter π, consider the null model. a. Explain how to invert the Wald, likelihood-ratio, and score tests of H0: π = π0 against H1: π ≠ π0 to obtain 95% confidence intervals for π. b. In teaching an introductory statistics class, one year I
For the normal linear model, Section 3.3.2 showed how to construct a confidence interval for E(y) at a fixed x0. Explain how to do this for a GLM.
For a GLM assuming yi ∼ N(μi, σ²), show that the Pearson chi-squared statistic is the same as the deviance. Find the form of the difference between the deviances for nested models M0 and M1.
In a GLM that uses a noncanonical link function, explain why it need not be true that Σi μ̂i = Σi yi. Hence, the residuals need not have a mean of 0. Explain why a canonical link GLM needs an intercept term in order to ensure that this happens.
For a binomial GLM, explain why the Pearson residual for observation i, ei = (yi − π̂i)/√(π̂i(1 − π̂i)/ni), does not have an approximate standard normal distribution, even for a large ni.
Find the form of the deviance residual (4.21) for an observation in (a) a binomial GLM, (b) a Poisson GLM.
Suppose x is uniformly distributed between 0 and 100, and y is binary with log[πi/(1 − πi)] = −2.0 + 0.04xi. Randomly generate n = 25 independent observations from this model. Fit the model, and find corr(y − π̂, π̂). Do the same for n = 100, n = 1000, and n = 10,000, and
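A sketch of this simulation in Python/NumPy. The Newton-Raphson fitter is hand-rolled so the example is self-contained (in practice one would use a packaged GLM routine), and the clipping of the linear predictor plus the tiny ridge on the Hessian are numerical guards of my own, not part of the exercise:

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Newton-Raphson for logistic regression on ungrouped binary data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = np.clip(X @ beta, -30, 30)   # guard against exp overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)                  # GLM iterative weights
        H = X.T @ (W[:, None] * X) + 1e-10 * np.eye(X.shape[1])
        step = np.linalg.solve(H, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

rng = np.random.default_rng(0)
results = {}
for n in [25, 100, 1000, 10000]:
    x = rng.uniform(0, 100, n)
    pi = 1.0 / (1.0 + np.exp(-(-2.0 + 0.04 * x)))
    y = rng.binomial(1, pi).astype(float)
    X = np.column_stack([np.ones(n), x])
    beta = fit_logistic(X, y)
    fitted = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    results[n] = np.corrcoef(y - fitted, fitted)[0, 1]   # corr(y - pi-hat, pi-hat)
```

The likelihood equations force the raw residuals to be orthogonal to the columns of X, so the computed correlations should sit near zero at every sample size.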
Derive the formula var(β̂j) = σ²/{(1 − Rj²)[Σi(xij − x̄j)²]}.
Consider the value θ̂ that maximizes a function L(θ). This exercise motivates the Newton–Raphson method by focusing on the single-parameter case. a. Using L′(θ̂) = L′(θ⁽⁰⁾) + (θ̂ − θ⁽⁰⁾)L″(θ⁽⁰⁾) + ⋯, argue that for an initial approximation θ⁽⁰⁾ close to
For n independent observations from a Poisson distribution with parameter μ, show that Fisher scoring gives μ⁽ᵗ⁺¹⁾ = ȳ for all t > 0. By contrast, what happens with the Newton–Raphson method?
For an observation y from a Poisson distribution, write a short computer program to use the Newton–Raphson method to maximize the likelihood. With y = 0, summarize the effects of the starting value on speed of convergence.
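A sketch for the Newton-Raphson exercise, parameterizing by the canonical θ = log μ (my choice; the exercise does not fix a parameterization). For a single observation the log-likelihood is ℓ(θ) = yθ − e^θ, so ℓ′(θ) = y − e^θ, ℓ″(θ) = −e^θ, and the update is θ ← θ + (y − e^θ)/e^θ:

```python
import math

def newton_raphson_poisson(y, theta0, max_iter=100, tol=1e-10):
    """Newton-Raphson for a single Poisson observation y, maximizing
    l(theta) = y*theta - exp(theta) where theta = log(mu).
    Update: theta <- theta - l'(theta)/l''(theta)
                   =  theta + (y - exp(theta)) / exp(theta)."""
    theta = theta0
    history = [theta]
    for _ in range(max_iter):
        step = (y - math.exp(theta)) / math.exp(theta)
        theta += step
        history.append(theta)
        if abs(step) < tol:
            break
    return theta, history

# y > 0: quadratic convergence to the MLE theta-hat = log(y)
theta, hist = newton_raphson_poisson(y=4, theta0=0.0)

# y = 0: the step is (0 - e^theta)/e^theta = -1 exactly at every iteration,
# so theta marches down by 1 per step toward the boundary MLE mu = 0;
# the starting value only shifts how long mu = e^theta takes to get small.
theta0, hist0 = newton_raphson_poisson(y=0, theta0=5.0, max_iter=20)
```

With y = 0 the iteration never satisfies a step-size stopping rule, which is one way to "summarize the effects of the starting value": larger θ⁽⁰⁾ means proportionally more iterations before e^θ drops below any given tolerance.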
For noncanonical link functions in a GLM, show that the observed information matrix may depend on the data and hence differs from the expected information matrix. Thus, the Newton–Raphson method and Fisher scoring may provide different standard errors.
The bias–variance tradeoff: Before an election, a polling agency randomly samples n = 100 people to estimate π = population proportion who prefer candidate A over candidate B. You estimate π by the sample proportion π̂. I estimate it by (1/2)π̂ + (1/2)(0.50). Which estimator is
In selecting explanatory variables for a linear model, what is inadequate about the strategy of selecting the model with largest R² value?
For discrete probability distributions {pj} for the "true" model and {pMj} for a model M, prove that the Kullback–Leibler divergence E{log[p(y)/pM(y)]} ≥ 0.
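A sketch of the standard argument, via Jensen's inequality applied to the concave logarithm (with expectation taken under the true distribution {pj}):

```latex
E\left[\log\frac{p(y)}{p_M(y)}\right]
= -E\left[\log\frac{p_M(y)}{p(y)}\right]
\ge -\log E\left[\frac{p_M(y)}{p(y)}\right]
= -\log \sum_j p_j \frac{p_{Mj}}{p_j}
= -\log \sum_j p_{Mj}
= -\log 1 = 0.
```

Equality holds exactly when pMj = pj for all j, which is why the divergence serves as a model-discrepancy measure in the AIC development.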
For a normal linear model M1 with p + 1 parameters, namely {βj} and σ², which has ML estimator σ̂² = [Σ_{i=1}^n (yi − μ̂i)²]/n, show that AIC = n[log(2πσ̂²) + 1] + 2(p + 1). Using this, when M2 has q additional terms, show that M2 has smaller AIC value if SSE2/SSE1 <
Section 4.7.2 mentioned that using a gamma GLM with log-link function gives similar results to applying a normal linear model to log(y). a. Use the delta method to show that when y has standard deviation σ proportional to μ (as does the gamma GLM), log(y) has approximately constant variance
Download the Houses.dat data file from www.stat.ufl.edu/~aa/glm/data. Summarize the data with descriptive statistics and plots. Using a forward selection procedure with all five predictors together with judgments about practical significance, select and interpret a linear model for selling
Refer to the previous exercise. Use backward elimination to select a model. a. Use an initial model containing the two-factor interactions. When you reach the stage at which all terms are statistically significant, adjusted R² should still be about 0.87. See whether you can simplify further without
Refer to the previous two exercises. Conduct a model-selection process assuming a gamma distribution for y, using (a) identity link, (b) log link. For each, interpret the final model.
For the Scottish races data of Section 2.6, the Bens of Jura Fell Race was an outlier for an ordinary linear model with main effects of climb and distance in predicting record times. Alternatively, the residual plots might merely suggest increasing variability at higher record times. Fit this model
Exercise 1.21 presented a study comparing forced expiratory volume after 1 hour of treatment for three drugs (a,b, and p = placebo), adjusting for a baseline measurement x1. Table 4.1 shows the results of fitting some normal GLMs (with identity link, except one with log link) and a GLM assuming a
Refer to Exercise 2.45 and the study for comparing instruction methods. Write a report summarizing a model-building process. Include instruction type in the chosen model, because of the study goals and the small n, which results in little power for finding significance for that effect. Check and
The horseshoe crab dataset Crabs2.dat at the text website comes from a study of factors that affect sperm traits of males. One response variable is ejaculate size, measured as the log of the amount of ejaculate (microliters)measured after 10 seconds of stimulation. Explanatory variables are the
The MASS package of R contains the Boston data file, which has several predictors of the median value of owner-occupied homes, for 506 neighborhoods in the suburbs near Boston. Describe a model-building process for these data, using the first 253 observations. Fit your chosen model to the other 253
For x between 0 and 100, suppose the normal linear model holds with E(y) = 45 + 0.1x + 0.0005x² + 0.0000005x³ + 0.0000000005x⁴ + 0.0000000000005x⁵ and σ = 10.0. Randomly generate 25 observations from the model, with x having a uniform distribution between 0 and 100. Fit the simple model E(y) =
What does the fit of the “correct” model in the previous exercise illustrate about collinearity?
Randomly generate 100 observations (xi, yi) that are independent uniform random variables over [0, 100]. Fit a sequence of successively more complex polynomial models for using x to predict y, of degree 1, 2, 3, …. In principle, even though the true model is E(y) = 50 with population R2 = 0, you
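A sketch of this pure-noise experiment in Python/NumPy (rescaling x to [0, 1] before fitting is my own addition, purely to keep the polynomial design well conditioned):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.uniform(0, 100, n)
y = rng.uniform(0, 100, n)   # x and y independent: E(y) = 50, population R^2 = 0

xs = x / 100.0               # rescale for numerical stability only
r2 = {}
for degree in range(1, 7):
    coeffs = np.polyfit(xs, y, degree)    # least-squares polynomial fit
    fitted = np.polyval(coeffs, xs)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2[degree] = 1.0 - ss_res / ss_tot
# sample R^2 never decreases as the degree grows, because each polynomial
# model nests the one before it, even though nothing real is being fit
```

Printing `r2` shows the point of the exercise: the sample R² creeps upward with model complexity while the population R² stays exactly 0.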
Other link functions: Other link functions for binary data include the inverse cdf of a t distribution (the probit being the limit as df → ∞); a log-gamma link (Genter and Farewell 1985), for which probit, complementary log–log and log–log are special cases; a family of link functions that
Conditional logistic: For more details about case-control studies and conditional logistic regression, see Breslow and Day (1980, Chapter 7). For more on “exact” inference using conditional distributions with logistic models, see Mehta and Patel (1995). Fisher’s exact test extends to r × c
Propensity scores: Rosenbaum and Rubin (1983) proposed methods of comparing E(y) for two groups in observational studies while adjusting for possibly confounding variables x. They defined the propensity as the probability of being in one group, as a function of x. They used logistic regression to
Binary GLM history: The probit model was presented by Bliss (1935) and popularized in three editions of Finney (1971). Logistic regression was proposed by Berkson (1944) as a model that has similar fit as a probit model but has closed form for the link function. Yates (1955) proposed the
For the population having value y on a binary response, suppose x has an N(μy, σ²) distribution, y = 0, 1. a. Using Bayes' theorem, show that P(y = 1 | x) satisfies the logistic regression model with β1 = (μ1 − μ0)/σ². b. Suppose that (x | y) ∼ N(μy, σy²) with
Refer to Note 1.5. For a logistic model, show that the average estimated rate of change in the response probability as a function of explanatory variable j, adjusting for the others, satisfies (1/n)Σi(∂π̂i/∂xij) = β̂j (1/n)Σi[π̂i(1 − π̂i)].
Construct the ROC curves for (a) the toy example in Section 5.4.2 with complete separation and (b) the dataset (n = 8) that adds two observations at x = 3.5, one with y = 1 and one with y = 0. In each case, report the area under the curve and summarize predictive power. For contrast, construct a
From the likelihood equation (5.5) for a logistic regression intercept parameter, show that the overall sample proportion of successes equals the sample mean of the fitted success probabilities. Is this true for other binary GLMs?
Suppose that niyi has a bin(ni, πi) distribution. Consider a binary GLM πi = F(Σj βj xij) with F the standard cdf of some family of continuous distributions. Find wi in wi = (∂μi/∂ηi)²/var(yi) and hence var(β̂).
Explain how expression (5.6) for var(β̂) in logistic regression suggests that the standard errors of {β̂j} tend to be smaller as you obtain more data. Answer this for (a) grouped data with {ni} increasing, (b) ungrouped data with N increasing.
Assuming the model logit[P(yi = 1)] = βxi, you take all n observations at x0. Find β̂ and the large-sample var(β̂). For the Wald test, explain why the chi-squared noncentrality is β²/var(β̂), and evaluate it as β → ∞. Explain how this illustrates that the Wald test in
For a 2 × 2 × K contingency table that cross classifies y with a binary treatment variable x and an adjustment factor z, specify a logistic model with a lack of interaction between x and z. Construct the likelihood function, and explain the conditioning required to generate an exact
To use conditional logistic regression to test H0: β1 = 0 against H1: β1 < 0 for the toy example in Section 5.4.2, find the conditional distribution of Σi xiyi, given Σi yi. Find the exact small-sample P-value.
The calibration problem is that of estimating x0 at which P(y = 1) = π0 for some fixed π0 such as 0.50. For the logistic model with a single explanatory variable, explain why a confidence interval for x0 is the set of x values for which |β̂0 + β̂1x − logit(π0)|/[var(β̂0) +
Construct the log-likelihood function for the model logit(πi) = β0 + β1xi with independent binomial proportions of y1 successes in n1 trials at x1 = 0 and y2 successes in n2 trials at x2 = 1. Derive the likelihood equations, and show that β̂1 is the sample log odds ratio.
Refer to the previous exercise. Denote the cell counts in the 2 × 2 table by {nij}. For the case β1 = 0 (the independence model), the fitted values in the cells of that table are {μ̂ij = ni+n+j/n}. These have a common value for the four |nij − μ̂ij|. a. Construct the Pearson
Suppose the logistic model holds in which x is uniformly distributed between 0 and 100, and logit(πi) = −2.0 + 0.04xi. Randomly generate 100 independent observations from this model. Plot the residuals against x and against the fitted values. Why do residual plots for binary data have this
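The band structure this exercise asks about can be seen even without fitting a model. A sketch using the true probabilities in place of fitted ones (a fitted model gives the same two-curve picture), with the plotting step itself omitted:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x = rng.uniform(0, 100, n)
pi = 1.0 / (1.0 + np.exp(-(-2.0 + 0.04 * x)))
y = rng.binomial(1, pi)

# Raw residuals. At any given x the residual can take only two values:
#   -pi(x)     when y = 0,
#   1 - pi(x)  when y = 1,
# so a residual plot traces two smooth curves rather than a random scatter.
resid = y - pi
bands = {0: resid[y == 0], 1: resid[y == 1]}
```

Plotting `resid` against `x` then shows the two monotone bands, one entirely below zero (the y = 0 cases) and one entirely above (the y = 1 cases), which is the answer the exercise is fishing for.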
Let niyi be a bin(ni, πi) variate for group i, i = 1, …, N, with {yi} independent. Consider the null model, for which π1 = ⋯ = πN. Show that π̂ = (Σi niyi)/(Σi ni). When all ni = 1, for testing goodness of fit of the null model in the N × 2 table, show that X² = N.