
The two-variable regression. For the regression model y = α + βx + ε:

(a) Show that the least squares normal equations imply Σi ei = 0 and Σi xi ei = 0. (b) Show that the solution for the constant term is a = ȳ − b x̄.

(c) Show that the solution for b is b = [Σ_{i=1}^n (xi − x̄)(yi − ȳ)]/[Σ_{i=1}^n (xi − x̄)²].

(d) Prove that these two values uniquely minimize the sum of squares by showing that the diagonal elements of the second derivatives matrix of the sum of squares with respect to the parameters are both positive and that the determinant is 4n[Σ_{i=1}^n xi² − n x̄²] = 4n[Σ_{i=1}^n (xi − x̄)²], which is positive unless all values of x are the same.
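Parts (a) through (c) can be checked numerically. The sketch below uses simulated data (the generating values 2.0 and 0.5 are arbitrary), computes b and a from the closed-form solutions, and verifies that the residuals satisfy both normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# Closed-form solutions: b from part (c), then a from part (b)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
e = y - a - b * x

# Part (a): both normal equations are satisfied by the residuals
print(e.sum(), (x * e).sum())  # both are zero up to rounding error
```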

Change in the sum of squares. Suppose that b is the least squares coefficient vector in the regression of y on X and that c is any other K × 1 vector. Prove that the difference in the two sums of squared residuals is (y − Xc)'(y − Xc) − (y − Xb)'(y − Xb) = (c − b)'X'X(c − b). Prove that this difference is positive.
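A quick numerical illustration of the identity, using simulated data and an arbitrary alternative vector c (both constructions are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares coefficient vector
c = b + np.array([0.3, -0.2, 0.1])      # any other K x 1 vector

lhs = (y - X @ c) @ (y - X @ c) - (y - X @ b) @ (y - X @ b)
rhs = (c - b) @ (X.T @ X) @ (c - b)
print(lhs, rhs)  # equal, and positive since X'X is positive definite
```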

Linear transformations of the data. Consider the least squares regression of y on K variables (with a constant), X. Consider an alternative set of regressors Z = XP, where P is a nonsingular matrix. Thus, each column of Z is a mixture of some of the columns of X. Prove that the residual vectors in the regressions of y on X and of y on Z are identical. What relevance does this have to the question of changing the fit of a regression by changing the units of measurement of the independent variables?
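The invariance of the residuals can be verified directly; the matrix P below is an arbitrary nonsingular example, standing in for, say, a change of units:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y = rng.normal(size=60)
P = np.array([[1.0, 0.0, 0.0],
              [0.5, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # any nonsingular P
Z = X @ P

e_X = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
e_Z = y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
print(np.max(np.abs(e_X - e_Z)))  # residual vectors are identical
```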

Partial Frisch and Waugh. In the least squares regression of y on a constant and X, to compute the regression coefficients on X, we can first transform y to deviations from the mean ȳ and, likewise, transform each column of X to deviations from the respective column mean; second, regress the transformed y on the transformed X without a constant. Do we get the same result if we transform only y? What if we transform only X?

A residual maker. What is the result of the matrix product M1M, where M1 is defined in (3-19) and M is defined in (3-14)?

Adding an observation. A data set consists of n observations on Xn and yn. The least squares estimator based on these n observations is bn = (X'nXn)⁻¹X'nyn. Another observation, xs and ys, becomes available. Prove that the least squares estimator computed using this additional observation is b_{n,s} = bn + [1/(1 + x's(X'nXn)⁻¹xs)](X'nXn)⁻¹xs(ys − x's bn). Note that the last term in parentheses is es, the residual from the prediction of ys using Xn and bn. Conclude that the new data change the results of least squares only if the new observation on y cannot be perfectly predicted using the information already in hand.

Deleting an observation a common strategy for handling a case in which an observation is missing data for one or more variables is to fill those missing variables with 0s and add a variable to the model that takes the value 1 for that one observation and 0 for all other observations. Show that this ‘strategy’ is equivalent to discarding the observation as regards the computation of b but it does have an effect on R2. Consider the special case in which X contains only a constant and one variable. Show that replacing missing values of x with the mean of the complete observations has the same effect as adding the new variable.
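A sketch of the claimed equivalence for b, on simulated data (which observation is "missing" and the generating values are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Pretend observation 0 is missing x: fill with 0 and add a one-shot dummy
d = np.zeros(n); d[0] = 1.0
x_fill = x.copy(); x_fill[0] = 0.0
X_dummy = np.column_stack([np.ones(n), x_fill, d])
b_dummy = np.linalg.solve(X_dummy.T @ X_dummy, X_dummy.T @ y)

# Versus simply discarding the observation
X_drop = np.column_stack([np.ones(n - 1), x[1:]])
b_drop = np.linalg.solve(X_drop.T @ X_drop, X_drop.T @ y[1:])
print(b_dummy[:2], b_drop)  # constant and slope agree
```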

Demand system estimation. Let Y denote total expenditure on consumer durables, nondurables, and services, and let Ed, En, and Es denote the expenditures on the three categories. As defined, Y = Ed + En + Es. Now, consider the expenditure system. Prove that if all equations are estimated by ordinary least squares, then the sum of the expenditure coefficients will be 1 and the four other column sums in the preceding model will be zero.

Change in adjusted R2. Prove that the adjusted R2 in (3-30) rises (falls) when variable xk is deleted from the regression if the square of the t ratio on xk in the multiple regression is less (greater) than 1.

Regression without a constant. Suppose that you estimate a multiple regression first with, then without, a constant. Whether the R² is higher in the second case than in the first will depend in part on how it is computed. Using the (relatively) standard method, R² = 1 − e'e/(y'M0y), which regression will have a higher R²?

Three variables, N, D, and Y, all have zero means and unit variances. A fourth variable is C = N + D. In the regression of C on Y, the slope is 0.8. In the regression of C on N, the slope is 0.5. In the regression of D on Y, the slope is 0.4. What is the sum of squared residuals in the regression of C on D? There are 21 observations, and all moments are computed using 1/(n − 1) as the divisor.

Using the matrices of sums of squares and cross products immediately preceding Section 3.2.3, compute the coefficients in the multiple regression of real investment on a constant, real GNP and the interest rate. Compute R2.

In the December 1969 American Economic Review (pp. 886–896), Nathaniel Leff reports the following least squares regression results for a cross-section study of the effect of age composition on savings in 74 countries in 1964: ln S/Y = 7.3439 + 0.1596 ln Y/N + 0.0254 ln G − 1.3520 ln D1 − 0.3990 ln D2 and ln S/N = 8.7851 + 1.1486 ln Y/N + 0.0265 ln G − 1.3438 ln D1 − 0.3966 ln D2, where S/Y = domestic savings ratio, S/N = per capita savings, Y/N = per capita income, D1 = percentage of the population under 15, D2 = percentage of the population over 64, and G = growth rate of per capita income. Are these results correct? Explain.

Suppose that you have two independent unbiased estimators of the same parameter θ, say θ1 and θ2, with different variances v1 and v2. What linear combination θ = c1θ1 + c2θ2 is the minimum variance unbiased estimator of θ?
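The minimum variance combination weights each estimator inversely to its variance, c1 = v2/(v1 + v2). A minimal check with arbitrary example variances, confirming that no other unbiased combination does better:

```python
import numpy as np

def min_var_weights(v1, v2):
    # Unbiasedness requires c1 + c2 = 1; minimizing c1^2 v1 + (1 - c1)^2 v2
    # over c1 gives weights inversely proportional to the variances.
    c1 = v2 / (v1 + v2)
    return c1, 1.0 - c1

v1, v2 = 2.0, 3.0   # arbitrary example variances
c1, c2 = min_var_weights(v1, v2)
var_opt = c1**2 * v1 + c2**2 * v2

# Any other unbiased combination has at least this variance
for a in np.linspace(0.0, 1.0, 11):
    assert a**2 * v1 + (1 - a)**2 * v2 >= var_opt - 1e-12
print(c1, c2, var_opt)
```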

Consider the simple regression yt = βxt + εt, where E[ε | x] = 0 and E[ε² | x] = σ².

(a) What is the minimum mean squared error linear estimator of β? Choose c to minimize Var[β̂] + [E(β̂ − β)]². (The answer is a function of the unknown parameters.)

(b) For the estimator in part a, show that the ratio of the mean squared error of β̂ to that of the ordinary least squares estimator b is τ/(1 + τ), where τ = β²(Σt xt²)/σ². Note that τ is the square of the population analog to the “t ratio” for testing the hypothesis that β = 0, which is given in (4-14). How do you interpret the behavior of this ratio as τ → ∞?

Suppose that the classical regression model applies but that the true value of the constant is zero. Compare the variance of the least squares slope estimator computed without a constant term with that of the estimator computed with an unnecessary constant term.

Suppose that the regression model is yi = α + βxi + εi, where the disturbances εi have f(εi) = (1/λ) exp(−εi/λ), εi ≥ 0. This model is rather peculiar in that all the disturbances are assumed to be positive. Note that the disturbances have E[εi | xi] = λ and Var[εi | xi] = λ². Show that the least squares slope is unbiased but that the intercept is biased.

Prove that the least squares intercept estimator in the classical regression model is the minimum variance linear unbiased estimator.

As a profit maximizing monopolist, you face the demand curve Q = α + β P + ε. In the past, you have set the following prices and sold the accompanying quantities: Suppose that your marginal cost is 10. Based on the least squares regression, compute a 95 percent confidence interval for the expected value of the profit maximizing output.

The following sample moments for x = [1, x1, x2, x3] were computed from 100 observations produced using a random number generator:

The true model underlying these data is y = x1 + x2 + x3 + ε.

a. Compute the simple correlations among the regressors.

b. Compute the ordinary least squares coefficients in the regression of y on a constant, x1, x2, and x3.

c. Compute the ordinary least squares coefficients in the regression of y on a constant, x1, and x2; on a constant, x1, and x3; and on a constant, x2, and x3.

d. Compute the variance inflation factor associated with each variable.

e. The regressors are obviously collinear. Which is the problem variable?
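For part d, the variance inflation factor for each regressor comes from the R² of the auxiliary regression of that column on the others. The sketch below uses artificially collinear data of my own construction (the exercise's moment matrix is not reproduced here), with x3 built as a near-exact combination of x1 and x2:

```python
import numpy as np

def vif(X, k):
    # Variance inflation factor: 1/(1 - R^2) from regressing column k
    # on a constant and the remaining columns.
    y = X[:, k]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, k, axis=1)])
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)  # nearly collinear by construction
X = np.column_stack([x1, x2, x3])
print([round(vif(X, k), 1) for k in range(3)])  # all strongly inflated
```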

Consider the multiple regression of y on K variables, X, and an additional variable, z. Prove that under the assumptions A1 through A6 of the classical regression model, the true variance of the least squares estimator of the slopes on X is larger when z is included in the regression than when it is not. Does the same hold for the sample estimate of this covariance matrix? Why or why not? Assume that X and z are nonstochastic and that the coefficient on z is nonzero.

For the classical normal regression model y = Xβ + ε with no constant term and K regressors, assuming that the true value of β is zero, what is the exact expected value of F[K, n − K] = (R2/K)/[(1 − R2)/(n − K)]?

Prove that E[b'b] = β'β + σ² Σ_{k=1}^K (1/λk), where b is the ordinary least squares estimator and λk is a characteristic root of X'X.

Data on U.S. gasoline consumption for the years 1960 to 1995 are given in Table F2.2.

a. Compute the multiple regression of per capita consumption of gasoline, G/pop, on all the other explanatory variables, including the time trend, and report all results. Do the signs of the estimates agree with your expectations?

b. Test the hypothesis that, at least in regard to demand for gasoline, consumers do not differentiate between changes in the prices of new and used cars.

c. Estimate the own price elasticity of demand, the income elasticity, and the cross-price elasticity with respect to changes in the price of public transportation.

d. Reestimate the regression in logarithms so that the coefficients are direct estimates of the elasticities. (Do not use the log of the time trend.) How do your estimates compare with the results in the previous question? Which specification do you prefer?

e. Notice that the price indices for the automobile market are normalized to 1967, whereas the aggregate price indices are anchored at 1982. Does this discrepancy affect the results? How? If you were to renormalize the indices so that they were all 1.000 in 1982, then how would your results change?

For the classical normal regression model y = Xβ + ε with no constant term and K regressors, what is plim F[K, n − K] = plim (R²/K)/[(1 − R²)/(n − K)], assuming that the true value of β is zero?

Let ei be the ith residual in the ordinary least squares regression of y on X in the classical regression model, and let εi be the corresponding true disturbance. Prove that plim (ei − εi) = 0.

For the simple regression model yi = μ + εi, εi ~ N[0, σ²], prove that the sample mean is consistent and asymptotically normally distributed. Now consider the alternative estimator μ̂ = Σi wi yi, where wi = i/(n(n + 1)/2) = i/Σi i. Note that Σi wi = 1. Prove that this is a consistent estimator of μ and obtain its asymptotic variance.
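The behavior of the alternative estimator can be illustrated by simulation (the values of n, μ, and σ below are arbitrary): the estimator is unbiased, but n·Var[μ̂] settles near 4σ²/3 rather than the sample mean's σ², consistent with Σi wi² = 2(2n + 1)/(3n(n + 1)) ≈ 4/(3n).

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu, sigma = 400, 5.0, 1.0
w = np.arange(1, n + 1) / (n * (n + 1) / 2)   # w_i = i / sum(i), sums to 1

reps = np.array([(w * (mu + sigma * rng.normal(size=n))).sum()
                 for _ in range(5000)])
print(reps.mean())     # close to mu: the estimator is unbiased
print(n * reps.var())  # close to 4*sigma^2/3, versus sigma^2 for the mean
```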

In the discussion of the instrumental variables estimator we showed that the least squares estimator b is biased and inconsistent. Nonetheless, b does estimate something: plim b = θ = β + Q−1γ. Derive the asymptotic covariance matrix of b, and show that b is asymptotically normally distributed.

For the model in (5-25) and (5-26), prove that when only x* is measured with error, the squared correlation between y and x is less than that between y* and x*. (Note the assumption that y* = y.) Does the same hold true if y* is also measured with error?

Christensen and Greene (1976) estimated a generalized Cobb–Douglas cost function of the form ln(C/Pf) = α + β ln Q + γ (ln² Q)/2 + δk ln(Pk/Pf) + δl ln(Pl/Pf) + ε. Pk, Pl, and Pf indicate unit prices of capital, labor, and fuel, respectively, Q is output, and C is total cost. The purpose of the generalization was to produce a U-shaped average total cost curve. (See Example 7.3 for discussion of Nerlove’s (1963) predecessor to this study.) We are interested in the output at which the cost curve reaches its minimum. That is the point at which (∂ ln C/∂ ln Q)|Q=Q* = 1, or Q* = exp[(1 − β)/γ]. The estimated regression model using the Christensen and Greene 1970 data is as follows, where estimated standard errors are given in parentheses: The estimated asymptotic covariance of the estimators of β and γ is −0.000187067, R² = 0.991538, and e'e = 2.443509. Using the estimates given above, compute the estimate of this efficient scale. Compute an estimate of the asymptotic standard error for this estimate, then form a confidence interval for the estimated efficient scale. The data for this study are given in Table F5.2. Examine the raw data and determine where in the sample the efficient scale lies. That is, how many firms in the sample have reached this scale, and is this scale large in relation to the sizes of firms in the sample?
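The delta-method computation can be sketched as follows. The values of beta, gamma, and their variances below are placeholders, not the exercise's estimates (those must be read off the regression output); only the covariance −0.000187067 is taken from the text.

```python
import numpy as np

# Placeholder values for illustration only; substitute the estimates and
# standard errors from the exercise's regression output.
beta, gamma = 0.40, 0.06
var_b, var_g = 0.0009, 0.0001          # hypothetical variances
cov_bg = -0.000187067                  # covariance given in the exercise

q_star = np.exp((1.0 - beta) / gamma)
# Delta method: gradient of Q* = exp[(1 - beta)/gamma] w.r.t. (beta, gamma)
g = np.array([-q_star / gamma, -q_star * (1.0 - beta) / gamma**2])
V = np.array([[var_b, cov_bg], [cov_bg, var_g]])
se = np.sqrt(g @ V @ g)
ci = (q_star - 1.96 * se, q_star + 1.96 * se)
print(q_star, se, ci)
```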

The consumption function used in Example 5.3 is a very simple specification. One might wonder if the meager specification of the model could help explain the finding in the Hausman test. The data set used for the example is given in Table F5.1. Use these data to carry out the test in a more elaborate specification ct = β1 + β2yt + β3it + β4ct−1 + εt where ct is the log of real consumption, yt is the log of real disposable income, and it is the interest rate (90-day T bill rate).

Suppose we change the assumptions of the model to AS5: (xi, εi) is an independent and identically distributed sequence of random vectors such that xi has a finite mean vector μx, finite positive definite covariance matrix Σxx, and finite fourth moments E[xj xk xl xm] = φjklm for all variables. How does the proof of consistency and asymptotic normality of b change? Are these assumptions weaker or stronger than the ones made in Section 5.2?

Now, assume only finite second moments of x; E[xi²] is finite. Is this sufficient to establish consistency of b? (Hint: the Cauchy–Schwarz inequality E[|xy|] ≤ {E[x²]}^{1/2}{E[y²]}^{1/2} will be helpful.) Is this assumption sufficient to establish asymptotic normality?

A multiple regression of y on a constant, x1, and x2 produces the following results: ŷ = 4 + 0.4x1 + 0.9x2, R² = 8/60, e'e = 520, n = 29.

Test the hypothesis that the two slopes sum to 1.

Using the results in Exercise 1, test the hypothesis that the slope on x1 is 0 by running the restricted regression and comparing the two sums of squared deviations.

The regression model to be analyzed is y = X1β1 + X2β2 + ε, where X1 and X2 have K1 and K2 columns, respectively. The restriction is β2 = 0.

a. Using (6-14), prove that the restricted estimator is simply [b1*, 0], where b1* is the least squares coefficient vector in the regression of y on X1.

b. Prove that if the restriction is β2 = β2⁰ for a nonzero β2⁰, then the restricted estimator of β1 is b1* = (X'1X1)⁻¹X'1(y − X2β2⁰).

The expression for the restricted coefficient vector in (6-14) may be written in the form b* = [I − CR] b + w, where w does not involve b. What is C? Show that the covariance matrix of the restricted least squares estimator is σ2 (X' X)−1 − σ2(X' X)−1 R' [R(X' X)−1R']−1 R(X' X)−1 and that this matrix may be written as Var[b |X]{[Var(b |X)]−1 − R' [Var(Rb) |X]−1R}Var[b |X].

Prove the result that the restricted least squares estimator never has a larger covariance matrix than the unrestricted least squares estimator.

Prove the result that the R2 associated with a restricted least squares estimator is never larger than that associated with the unrestricted least squares estimator. Conclude that imposing restrictions never improves the fit of the regression.

The Lagrange multiplier test of the hypothesis Rβ − q = 0 is equivalent to a Wald test of the hypothesis that λ = 0, where λ is defined in (6-14). Prove that χ² = λ̂'{Est.Var[λ̂]}⁻¹λ̂ = (n − K)[e*'e*/e'e − 1]. Note that the fraction in brackets is the ratio of two estimators of σ². By virtue of (6-19) and the preceding discussion, we know that this ratio is greater than 1. Finally, prove that the Lagrange multiplier statistic is equivalent to JF, where J is the number of restrictions being tested and F is the conventional F statistic given in (6-6).
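The identity (n − K)[e*'e*/e'e − 1] = JF can be confirmed numerically. The sketch below uses simulated data and the restriction that the last J coefficients are zero (all generating values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K, J = 80, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

# Unrestricted and restricted (last J coefficients zero) residuals
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
Xr = X[:, : K - J]
er = y - Xr @ np.linalg.lstsq(Xr, y, rcond=None)[0]

F = ((er @ er - e @ e) / J) / ((e @ e) / (n - K))
lm_like = (n - K) * (er @ er / (e @ e) - 1.0)
print(F, lm_like / J)  # the statistic equals J times F
```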

Use the Lagrange multiplier test to test the hypothesis in Exercise 1.

Using the data and model of Example 2.3, carry out a test of the hypothesis that the three aggregate price indices are not significant determinants of the demand for gasoline.

The full model of Example 2.3 may be written in logarithmic terms as ln G/pop = α + βp ln Pg + βy ln Y + γnc ln Pnc + γuc ln Puc + γpt ln Ppt + β year + δd ln Pd + δn ln Pn + δs ln Ps + ε. Consider the hypothesis that the microelasticities are a constant proportion of the elasticity with respect to their corresponding aggregate. Thus, for some positive θ (presumably between 0 and 1), γnc = θδd, γuc = θδd, and γpt = θδs. The first two imply the simple linear restriction γnc = γuc. By taking ratios, the first (or second) and third imply the nonlinear restriction γnc/γpt = δd/δs, or γnc δs = γpt δd.

a. Describe in detail how you would test the validity of the restriction.

b. Using the gasoline market data in Table F2.2, test the restrictions separately and jointly.

Prove that under the hypothesis that Rβ = q, the estimator s*² = (y − Xb*)'(y − Xb*)/(n − K + J), where J is the number of restrictions, is unbiased for σ².

Show that the multiple regression of y on a constant, x1, and x2, while imposing the restriction β1 + β2 = 1, leads to the regression of y − x1 on a constant and x2 − x1.
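A numerical check of the equivalence, comparing the substitution approach with the general restricted least squares formula (the data-generating values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n); x2 = rng.normal(size=n)
y = 0.5 + 0.3 * x1 + 0.7 * x2 + rng.normal(size=n)

# Impose beta1 + beta2 = 1 by substitution: regress y - x1 on a constant
# and x2 - x1, then recover beta1 = 1 - beta2.
Z = np.column_stack([np.ones(n), x2 - x1])
g = np.linalg.lstsq(Z, y - x1, rcond=None)[0]
b2 = g[1]; b1 = 1.0 - b2

# Cross-check against the general restricted least squares formula
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
R = np.array([[0.0, 1.0, 1.0]]); q = np.array([1.0])
XtXi = np.linalg.inv(X.T @ X)
b_star = b - XtXi @ R.T @ np.linalg.solve(R @ XtXi @ R.T, R @ b - q)
print(b1, b2, b_star)  # slopes match; restriction holds by construction
```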

In Solow’s classic (1957) study of technical change in the U.S. economy, he suggests the following aggregate production function: q(t) = A(t) f[k(t)], where q(t) is aggregate output per work hour, k(t) is the aggregate capital–labor ratio, and A(t) is the technology index. Solow considered four static models: q/A = α + β ln k, q/A = α − β/k, ln(q/A) = α + β ln k, and ln(q/A) = α + β/k. Solow’s data for the years 1909 to 1949 are listed in Appendix Table F7.2. Use these data to estimate the α and β of the four functions listed above. [Note: Your results will not quite match Solow’s. See the next exercise for resolution of the discrepancy.]

In the aforementioned study, Solow states: A scatter of q/A against k is shown in Chart 4. Considering the amount of a priori doctoring which the raw figures have undergone, the fit is remarkably tight. Except, that is, for the layer of points which are obviously too high. These maverick observations relate to the seven last years of the period, 1943–1949. From the way they lie almost exactly parallel to the main scatter, one is tempted to conclude that in 1943 the aggregate production function simply shifted.

a. Compute a scatter diagram of q/A against k.

b. Estimate the four models you estimated in the previous problem including a dummy variable for the years 1943 to 1949. How do your results change? [Note: These results match those reported by Solow, although he did not report the coefficient on the dummy variable.]

c. Solow went on to surmise that, in fact, the data were fundamentally different in the years before 1943 than during and after. Use a Chow test to examine the difference in the two subperiods using your four functional forms. Note that with the dummy variable, you can do the test by introducing an interaction term between the dummy and whichever function of k appears in the regression. Use an F test to test the hypothesis.

A regression model with K = 16 independent variables is fit using a panel of seven years of data. The sums of squares for the seven separate regressions and the pooled regression are shown below. The model with the pooled data allows a separate constant for each year. Test the hypothesis that the same coefficients apply in every year.

Reverse regression. A common method of analyzing statistical data to detect discrimination in the workplace is to fit the regression y = α + x'β + γd + ε, (1) where y is the wage rate and d is a dummy variable indicating either membership (d = 1) or nonmembership (d = 0) in the class toward which it is suggested the discrimination is directed. The regressors x include factors specific to the particular type of job as well as indicators of the qualifications of the individual. The hypothesis of interest is H0: γ ≥ 0 versus H1: γ < 0. The regression seeks to answer the question, “In a given job, are individuals in the class (d = 1) paid less than equally qualified individuals not in the class (d = 0)?” Consider an alternative approach. Do individuals in the class in the same job as others, and receiving the same wage, uniformly have higher qualifications? If so, this might also be viewed as a form of discrimination. To analyze this question, Conway and Roberts (1983) suggested the following procedure:

1. Fit (1) by ordinary least squares. Denote the estimates a, b, and c.

2. Compute the set of qualification indices, q = ai + Xb. (2) Note the omission of cd from the fitted value.

3. Regress q on a constant, y, and d. The equation is q = α* + β*y + γ*d + ε*. (3) The analysis suggests that if γ < 0, then γ* > 0.

a. Prove that, the theory notwithstanding, the least squares estimates c and c* are related as follows, where:

ȳ1 = mean of y for observations with d = 1,

ȳ = mean of y for all observations,

P = mean of d,

R² = coefficient of determination for (1),

r²yd = squared correlation between y and d.

b. Will the sample evidence necessarily be consistent with the theory? A symposium on Conway and Roberts’s paper appeared in the Journal of Business and Economic Statistics in April 1983.

Reverse regression continued. This and the next exercise continue the analysis of Exercise 4. In Exercise 4, interest centered on a particular dummy variable, and the regressors were accurately measured. Here we consider the case in which the crucial regressor in the model is measured with error. The paper by Kamlich and Polachek (1982) is directed toward this issue. Consider the simple errors-in-variables model, y = α + βx∗ + ε, x = x∗ + u, where u and ε are uncorrelated and x is the erroneously measured, observed counterpart to x∗.

a. Assume that x∗, u, and ε are all normally distributed with means μ∗, 0, and 0, variances σ2∗, σ2u , and σ2ε, and zero covariances. Obtain the probability limits of the least squares estimators of α and β.

b. As an alternative, consider regressing x on a constant and y, and then computing the reciprocal of the estimate. Obtain the probability limit of this estimator.

c. Do the “direct” and “reverse” estimators bound the true coefficient?

Reverse regression continued. Suppose that the model in Exercise 5 is extended to y = βx∗ + γd + ε, x = x∗ + u. For convenience, we drop the constant term. Assume that x∗, ε, and u are independent normally distributed with zero means. Suppose that d is a random variable that takes the values one and zero with probabilities π and 1 − π in the population and is independent of all other variables in the model. To put this formulation in context, the preceding model (and variants of it) has appeared in the literature on discrimination. We view y as a “wage” variable, x∗ as “qualifications,” and x as some imperfect measure such as education. The dummy variable d is membership (d = 1) or nonmembership (d = 0) in some protected class. The hypothesis of discrimination turns on γ < 0 versus γ ≥ 0.

a. What is the probability limit of c, the least squares estimator of γ, in the least squares regression of y on x and d? Now suppose that x∗ and d are not independent. In particular, suppose that E[x∗ | d = 1] = μ1 and E[x∗ | d = 0] = μ0. Repeat the derivation with this assumption.

b. Consider, instead, a regression of x on y and d. What is the probability limit of the coefficient on d in this regression? Assume that x∗ and d are independent.

c. Suppose that x∗ and d are not independent, but γ is, in fact, less than zero. Assuming that both preceding equations still hold, what is estimated by (ȳ | d = 1) − (ȳ | d = 0)? What does this quantity estimate if γ does equal zero?

Suppose the true regression model is given by (8-2). The result in (8-4) shows that if either P1.2 is nonzero or β2 is nonzero, then regression of y on X1 alone produces a biased and inconsistent estimator of β1. Suppose the objective is to forecast y, not to estimate the parameters. Consider regression of y on X1 alone to estimate β1 with b1 (which is biased). Is the forecast of y computed using X1b1 also biased? Assume that E[X2 | X1] is a linear function of X1. Discuss your findings generally. What are the implications for prediction when variables are omitted from a regression?

Compare the mean squared errors of b1 and b1.2 in Section 8.2.2.

The J test in Example 8.2 is carried out using over 50 years of data. It is optimistic to hope that the underlying structure of the economy did not change in 50 years. Does the result of the test carried out in Example 8.2 persist if it is based on data only from 1980 to 2000? Repeat the computation with this subset of the data.

The Cox test in Example 8.3 has the same difficulty as the J test in Example 8.2. The sample period might be too long for the test not to have been affected by underlying structural change. Repeat the computations using the 1980 to 2000 data.

Describe how to obtain nonlinear least squares estimates of the parameters of the model y = αxβ + ε.
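One standard approach is Gauss–Newton iteration on the pseudoregressors ∂f/∂α = x^β and ∂f/∂β = αx^β ln x, starting from the log-linearized model. A sketch on simulated data (the generating values 2.0 and 1.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(1.0, 5.0, size=200)
y = 2.0 * x**1.5 + rng.normal(scale=0.1, size=200)

# Starting values from the log-linearized model: ln y ~ ln(alpha) + beta ln x
c = np.linalg.lstsq(np.column_stack([np.ones_like(x), np.log(x)]),
                    np.log(y), rcond=None)[0]
alpha, beta = np.exp(c[0]), c[1]

for _ in range(50):
    f = alpha * x**beta
    # Pseudoregressors: partial derivatives of f with respect to alpha, beta
    Z = np.column_stack([x**beta, alpha * x**beta * np.log(x)])
    step = np.linalg.lstsq(Z, y - f, rcond=None)[0]   # Gauss-Newton step
    alpha, beta = alpha + step[0], beta + step[1]
    if np.max(np.abs(step)) < 1e-10:
        break
print(alpha, beta)  # close to the generating values
```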

Use MacKinnon, White, and Davidson’s PE test to determine whether a linear or loglinear production model is more appropriate for the data in Appendix Table F6.1. (The test is described in Section 9.4.3 and Example 9.8.)

Using the Box–Cox transformation, we may specify an alternative to the Cobb–Douglas model. Using Zellner and Revankar’s data in Appendix Table F9.2, estimate α, βk, βl, and λ by using the scanning method suggested in Section 9.3.2. (Do not forget to scale Y, K, and L by the number of establishments.) Use (9-16), (9-12), and (9-13) to compute the appropriate asymptotic standard errors for your estimates. Compute the two output elasticities, ∂ ln Y/∂ ln K and ∂ ln Y/∂ ln L, at the sample means of K and L.

For the model in Exercise 3, test the hypothesis that λ = 0 using a Wald test, a likelihood ratio test, and a Lagrange multiplier test. Note that the restricted model is the Cobb–Douglas log-linear model.

To extend Zellner and Revankar’s model in a fashion similar to theirs, we can use the Box–Cox transformation for the dependent variable as well. Use the method of Example 17.6 (with θ = λ) to repeat the study of the preceding two exercises. How do your results change?

Verify the following differential equation, which applies to the Box–Cox transformation: ∂^i x^(λ)/∂λ^i = (1/λ)[x^λ (ln x)^i − i ∂^(i−1) x^(λ)/∂λ^(i−1)].

Show that the limiting sequence for λ → 0 is lim_{λ→0} ∂^i x^(λ)/∂λ^i = (ln x)^(i+1)/(i + 1). These results can be used to great advantage in deriving the actual second derivatives of the log-likelihood function for the Box–Cox model.

What is the covariance matrix, Cov[β̂, β̂ − b], of the GLS estimator β̂ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹y and the difference between it and the OLS estimator, b = (X'X)⁻¹X'y? The result plays a pivotal role in the development of specification tests in Hausman (1978).

This and the next two exercises are based on the test statistic usually used to test a set of J linear restrictions in the generalized regression model, F[J, n − K] = [(Rβ̂ − q)'{R(X'Ω⁻¹X)⁻¹R'}⁻¹(Rβ̂ − q)/J] / [(y − Xβ̂)'Ω⁻¹(y − Xβ̂)/(n − K)], where β̂ is the GLS estimator. Show that if Ω is known, if the disturbances are normally distributed, and if the null hypothesis, Rβ = q, is true, then this statistic is exactly distributed as F with J and n − K degrees of freedom. What assumptions about the regressors are needed to reach this conclusion? Need they be nonstochastic?

Now suppose that the disturbances are not normally distributed, although Ω is still known. Show that the limiting distribution of the previous statistic is (1/J) times a chi-squared variable with J degrees of freedom. Conclude that in the generalized regression model, the limiting distribution of the Wald statistic W = (Rβ̂ − q)'{R(Est.Var[β̂])R'}⁻¹(Rβ̂ − q) is chi-squared with J degrees of freedom, regardless of the distribution of the disturbances, as long as the data are otherwise well behaved. Note that in a finite sample, the true distribution may be approximated with an F[J, n − K] distribution. It is a bit ambiguous, however, to interpret this fact as implying that the statistic is asymptotically distributed as F with J and n − K degrees of freedom, because the limiting distribution used to obtain our result is the chi-squared, not the F. In this instance, the F[J, n − K] is a random variable that tends asymptotically to the chi-squared variate.

Finally, suppose that Ω must be estimated, but that assumptions (10-27) and (10-31) are met by the estimator. What changes are required in the development of the previous problem?

a. Prove the result directly using matrix algebra.

b. Prove that if X contains a constant term and if the remaining columns are in deviation form (so that the column sum is zero), then the model of Exercise 8 below is one of these cases. (The seemingly unrelated regressions model with identical regressor matrices.)

In the generalized regression model, suppose that Ω is known.

a. What is the covariance matrix of the OLS and GLS estimators of β?

b. What is the covariance matrix of the OLS residual vector e = y − Xb?

c. What is the covariance matrix of the GLS residual vector ε̂ = y − Xβ̂?

d. What is the covariance matrix of the OLS and GLS residual vectors?

Suppose that y has the pdf f(y | x) = (1/(β'x)) e^(−y/(β'x)), y > 0. Then E[y | x] = β'x and Var[y | x] = (β'x)². For this model, prove that GLS and MLE are the same, even though this distribution involves the same parameters in the conditional mean function and the disturbance variance.

Suppose that the regression model is y = μ + ε, where ε has a zero mean, constant variance, and equal correlation ρ across observations. Then Cov [εi, εj] = σ2ρ if i ≠ j . Prove that the least squares estimator of μ is inconsistent. Find the characteristic roots of Ω and show that Condition 2 after Theorem 10.2 is violated.

Suppose that the regression model is yi = μ + εi, where E[εi | xi] = 0, Cov[εi, εj | xi, xj] = 0 for i ≠ j, but Var[εi | xi] = σ²xi², xi > 0.

a. Given a sample of observations on yi and xi, what is the most efficient estimator of μ? What is its variance?

b. What is the OLS estimator of μ, and what is the variance of the ordinary least squares estimator?

c. Prove that the estimator in part a is at least as efficient as the estimator in part b.

For the model in the previous exercise, what is the probability limit of s² = (1/n) Σ_{i=1}^n (yi − ȳ)²? Note that s² is the least squares estimator of the residual variance. It is also n times the conventional estimator of the variance of the OLS estimator.

How does this estimator compare with the true value you found in part b of Exercise 1? Does the conventional estimator produce the correct estimate of the true asymptotic variance of the least squares estimator?

Two samples of 50 observations each produce the following moment matrices. (In each case, X is a constant and one variable.)

a. Compute the least squares regression coefficients and the residual variances s2 for each data set. Compute the R2 for each regression.

b. Compute the OLS estimate of the coefficient vector assuming that the coefficients and disturbance variance are the same in the two regressions. Also compute the estimate of the asymptotic covariance matrix of the estimate.

c. Test the hypothesis that the variances in the two regressions are the same without assuming that the coefficients are the same in the two regressions.

d. Compute the two-step FGLS estimator of the coefficients in the regressions, assuming that the constant and slope are the same in both regressions. Compute the estimate of the covariance matrix and compare it with the result of part b.

Using the data in Exercise 3, use the Oberhofer–Kmenta method to compute the maximum likelihood estimate of the common coefficient vector.

This exercise is based on the following data set.

a. Compute the ordinary least squares regression of Y on a constant, X1, and X2. Be sure to compute the conventional estimator of the asymptotic covariance matrix of the OLS estimator as well.

b. Compute the White estimator of the appropriate asymptotic covariance matrix for the OLS estimates.

c. Test for the presence of heteroscedasticity using White’s general test. Do your results suggest the nature of the heteroscedasticity?

d. Use the Breusch–Pagan Lagrange multiplier test to test for heteroscedasticity.

e. Sort the data keying on X1 and use the Goldfeld–Quandt test to test for heteroscedasticity. Repeat the procedure, using X2. What do you find?
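Part e's Goldfeld–Quandt computation can be sketched in numpy. The data below are simulated stand-ins with disturbance variance rising in x (the exercise's actual data set is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: disturbance standard deviation proportional to x.
n = 100
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n) * 0.3 * x

def ssr(yv, Xv):
    # Sum of squared residuals from an OLS fit.
    b, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
    e = yv - Xv @ b
    return e @ e

# Goldfeld-Quandt: sort on x, drop the middle fifth, compare subsample SSRs.
order = np.argsort(x)
xs, ys = x[order], y[order]
drop = n // 5
m = (n - drop) // 2
X_lo = np.column_stack([np.ones(m), xs[:m]])
X_hi = np.column_stack([np.ones(m), xs[-m:]])

# F ratio of the two residual variances; compare with F(m-2, m-2).
F = (ssr(ys[-m:], X_hi) / (m - 2)) / (ssr(ys[:m], X_lo) / (m - 2))
```

With the variance increasing in x, F should be well above 1, leading to rejection of homoscedasticity.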

Using the data of Exercise 5, reestimate the parameters using a two-step FGLS estimator. Try the estimator used in Example 11.4.

For the model in Exercise 1, suppose that ε is normally distributed, with mean zero and variance σ^2[1 + (γx)^2]. Show that σ^2 and γ^2 can be consistently estimated by a regression of the squared least squares residuals on a constant and x^2. Is this estimator efficient?
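A minimal simulation sketch of the suggested estimator, using illustrative values σ^2 = 2 and γ = 0.5 (not from the text): regressing the squared OLS residuals on a constant and x^2 yields an intercept converging to σ^2 and a slope converging to σ^2·γ^2.

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustrative parameters (hypothetical, not from the exercise).
n, s2, g = 20000, 2.0, 0.5
x = rng.normal(size=n)
eps = rng.normal(size=n) * np.sqrt(s2 * (1 + (g * x) ** 2))
y = 1.0 + 1.0 * x + eps

# OLS residuals.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2

# Auxiliary regression of e^2 on [1, x^2]: intercept -> s2, slope -> s2*g^2.
Z = np.column_stack([np.ones(n), x ** 2])
c, *_ = np.linalg.lstsq(Z, e2, rcond=None)
s2_hat, g2_hat = c[0], c[1] / c[0]
```

The estimator is consistent but not efficient, since the auxiliary regression ignores the heteroscedasticity of e^2 itself.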

Derive the log-likelihood function, first-order conditions for maximization, and information matrix for the model y_i = x_i′β + ε_i, ε_i ~ N[0, σ^2(γ′z_i)^2].

In the discussion of Harvey’s model in Section 11.7, it is noted that the initial estimator of γ1, the constant term in the regression of ln e_i^2 on a constant and z_i, is inconsistent by the amount 1.2704. Harvey points out that if the purpose of this initial regression is only to obtain starting values for the iterations, then the correction is not necessary. Explain why this statement would be true.

(This exercise requires appropriate computer software. The computations required can be done with RATS, EViews, Stata, TSP, LIMDEP, and a variety of other software using only preprogrammed procedures.) Quarterly data on the consumer price index for 1950.1 to 2000.4 are given in Appendix Table F5.1. Use these data to fit the model proposed by Engle and Kraft (1983). The model is π_t = β0 + β1 π_{t−1} + β2 π_{t−2} + β3 π_{t−3} + β4 π_{t−4} + ε_t, where π_t = 100 ln[p_t/p_{t−1}] and p_t is the price index.

a. Fit the model by ordinary least squares, then use the tests suggested in the text to see if ARCH effects appear to be present.

b. The authors fit an ARCH(8) model with declining weights. Fit this model. If the software does not allow constraints on the coefficients, you can still do this with a two-step least squares procedure, using the least squares residuals from the first step. What do you find?

c. Bollerslev (1986) recomputed this model as a GARCH (1, 1). Use the GARCH (1, 1) form and refit your model.
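The ARCH tests in part a can be illustrated with Engle's LM test, sketched here on a simulated ARCH(1) series (the CPI data of Appendix Table F5.1 are not included): regress e_t^2 on q of its own lags and refer (number of observations)·R^2 to a chi-squared(q) distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated ARCH(1) disturbances with illustrative parameters a0, a1.
T = 2000
a0, a1 = 0.2, 0.5
e = np.zeros(T)
for t in range(1, T):
    h = a0 + a1 * e[t - 1] ** 2      # conditional variance
    e[t] = np.sqrt(h) * rng.normal()

# Engle's LM test: auxiliary regression of e_t^2 on q lags of itself.
q = 4
e2 = e ** 2
Y = e2[q:]
X = np.column_stack([np.ones(T - q)] +
                    [e2[q - j:T - j] for j in range(1, q + 1)])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ b
r2 = 1 - (resid @ resid) / ((Y - Y.mean()) @ (Y - Y.mean()))
lm = (T - q) * r2   # compare with chi-squared(q) critical value, e.g. 9.49 at 5%
```

With ARCH effects present, the LM statistic should far exceed the chi-squared(4) critical value.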

Does first differencing reduce autocorrelation? Consider the models y_t = β′x_t + ε_t, where ε_t = ρε_{t−1} + u_t and ε_t = u_t − λu_{t−1}. Compare the autocorrelation of ε_t in the original model with that of v_t in y_t − y_{t−1} = β′(x_t − x_{t−1}) + v_t, where v_t = ε_t − ε_{t−1}.
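For the AR(1) case the comparison can be checked by simulation (ρ = 0.8 is an illustrative value): the level disturbance has first autocorrelation ρ, while the differenced disturbance v_t = ε_t − ε_{t−1} has first autocorrelation −(1 − ρ)/2, since Var(v_t) = 2σ_ε^2(1 − ρ) and Cov(v_t, v_{t−1}) = −σ_ε^2(1 − ρ)^2.

```python
import numpy as np

rng = np.random.default_rng(1)

def autocorr1(z):
    # Sample first-order autocorrelation.
    z = z - z.mean()
    return (z[1:] @ z[:-1]) / (z @ z)

# AR(1) disturbances with illustrative rho = 0.8.
rho, n = 0.8, 100_000
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]

v = np.diff(eps)          # disturbance of the first-differenced model
r_eps = autocorr1(eps)    # approx rho = 0.8
r_v = autocorr1(v)        # approx -(1 - rho)/2 = -0.1
```

So differencing replaces strong positive autocorrelation with mild negative autocorrelation here; whether that is a "reduction" depends on ρ.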

Derive the disturbance covariance matrix for the model

What parameter is estimated by the regression of the OLS residuals on their lagged values?

The following regression is obtained by ordinary least squares, using 21 observations: y_t = 1.3 + 0.97 y_{t−1} + 2.31 x_t, with estimated asymptotic standard errors 0.3, 0.18, and 1.04, respectively, and Durbin–Watson statistic D−W = 1.21. Test for the presence of autocorrelation in the disturbances.
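Because a lagged dependent variable appears on the right-hand side, the Durbin–Watson test is biased toward 2 here; Durbin's h statistic is the usual alternative. It can be computed directly from the reported values:

```python
import math

# Reported values from the regression above.
T = 21        # observations
d = 1.21      # Durbin-Watson statistic
se_lag = 0.18 # standard error of the coefficient on y_{t-1}

# Durbin's h: valid because T * Var(b_lag) = 21 * 0.18^2 = 0.68 < 1.
var_b = se_lag ** 2
h = (1 - d / 2) * math.sqrt(T / (1 - T * var_b))
# h is asymptotically N(0, 1) under the null of no autocorrelation;
# h approx 3.20 > 1.96, so the null is rejected at the 5% level.
```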

It is commonly asserted that the Durbin–Watson statistic is only appropriate for testing for first-order autoregressive disturbances. What combination of the coefficients of the model is estimated by the Durbin–Watson statistic in each of the following cases: AR(1), AR(2), MA(1)? In each case, assume that the regression model does not contain a lagged dependent variable. Comment on the impact on your results of relaxing this assumption.

The data used to fit the expectations augmented Phillips curve in Example 12.3 are given in Table F5.1. Using these data, reestimate the model given in the example. Carry out a formal test for first order autocorrelation using the LM statistic. Then, reestimate the model using an AR(1) model for the disturbance process. Since the sample is large, the Prais–Winsten and Cochrane–Orcutt estimators should give essentially the same answer. Do they? After fitting the model, obtain the transformed residuals and examine them for first order autocorrelation. Does the AR(1) model appear to have adequately “fixed” the problem?
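The Cochrane–Orcutt step can be sketched as follows on simulated data (illustrative parameters; the Table F5.1 data are not reproduced): estimate ρ from the OLS residuals, quasi-difference, re-estimate, and iterate.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simple regression with AR(1) disturbances (illustrative rho = 0.7).
T, rho_true = 500, 0.7
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]
y = 1.0 + 0.5 * x + eps

def ols(Xv, yv):
    return np.linalg.lstsq(Xv, yv, rcond=None)[0]

X = np.column_stack([np.ones(T), x])
b = ols(X, y)
for _ in range(20):                     # Cochrane-Orcutt iteration
    e = y - X @ b
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    ys = y[1:] - rho * y[:-1]           # quasi-differenced data (drops obs 1)
    Xs = X[1:] - rho * X[:-1]
    b = ols(Xs, ys)
```

Prais–Winsten differs only in retaining the first observation, transformed by sqrt(1 − ρ^2); in large samples the two give essentially the same estimates.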

Data for fitting an improved Phillips curve model can be obtained from many sources, including the Bureau of Economic Analysis’s (BEA) own website, Economagic.com, and so on. Obtain the necessary data and expand the model of Example 12.3. Does adding additional explanatory variables to the model reduce the extreme pattern of the OLS residuals that appears in Figure 12.3?

The following is a panel of data on investment (y) and profit (x) for n = 3 firms over T = 10 periods.

a. Pool the data and compute the least squares regression coefficients of the model y_it = α + βx_it + ε_it.

b. Estimate the fixed effects model of (13-2), and then test the hypothesis that the constant term is the same for all three firms.

c. Estimate the random effects model of (13-18), and then carry out the Lagrange multiplier test of the hypothesis that the classical model without the common effect applies.

d. Carry out Hausman’s specification test for the random versus the fixed effect model.
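A minimal sketch of the pooled (part a) and fixed effects (part b) estimators on a hypothetical panel (the investment/profit table is not reproduced; the firm effects are made correlated with x so that the two estimates differ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical panel: n = 3 firms, T = 10 periods, true slope 0.5.
n, T = 3, 10
alpha = np.array([1.0, 3.0, 5.0])                  # firm effects
x = 2.0 * alpha[:, None] + rng.uniform(0, 5, size=(n, T))
y = alpha[:, None] + 0.5 * x + rng.normal(0, 0.3, size=(n, T))

# a. Pooled OLS: y_it = a + b*x_it, ignoring the firm effects.
X = np.column_stack([np.ones(n * T), x.ravel()])
b_pool, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

# b. Fixed effects (within): demean by firm, regress without a constant.
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
b_fe = (xd @ yd) / (xd @ xd)
```

Because the firm effects are positively correlated with x, the pooled slope is biased upward relative to the within estimate, which recovers the true slope.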

Suppose that the model of (13-2) is formulated with an overall constant term and n − 1 dummy variables (dropping, say, the last one). Investigate the effect that this supposition has on the set of dummy variable coefficients and on the least squares estimates of the slopes.

Use the data in Section 13.9.7 (the Grunfeld data) to fit the random and fixed effect models. There are five firms and 20 years of data for each. Use the F, LM, and/or Hausman statistics to determine which model, the fixed or random effects model, is preferable for these data.

Derive the log-likelihood function for the model in (13-18), assuming that εit and ui are normally distributed.

Unbalanced design for random effects. Suppose that the random effects model of Section 13.4 is to be estimated with a panel in which the groups have different numbers of observations. Let Ti be the number of observations in group i.

a. Show that the pooled least squares estimator in (13-11) is unbiased and consistent despite this complication.

b. Show that the estimator in (13-29) based on the pooled least squares estimator of β (or, for that matter, any consistent estimator of β) is a consistent estimator of σ_ε^2.

What are the probability limits of (1/n)LM, where LM is defined in (13-31), under the null hypothesis that σ_u^2 = 0 and under the alternative that σ_u^2 ≠ 0?

A two-way fixed effects model. Suppose that the fixed effects model is modified to include a time-specific dummy variable as well as an individual-specific variable. Then y_it = α_i + γ_t + β′x_it + ε_it. At every observation, the individual- and time-specific dummy variables sum to 1, so there are some redundant coefficients. The discussion in Section 13.3.3 shows that one way to remove the redundancy is to include an overall constant and drop one of the time-specific and one of the individual-specific dummy variables. The model is, thus, y_it = μ + (α_i − α_1) + (γ_t − γ_1) + β′x_it + ε_it. (Note that the respective time- or individual-specific variable is zero when t or i equals one.) Ordinary least squares estimates of β are then obtained by regression of y_it − ȳ_i. − ȳ.t + ȳ on x_it − x̄_i. − x̄.t + x̄. Then (α_i − α_1) and (γ_t − γ_1) are estimated using the expressions in (13-17), while m = ȳ − b′x̄. Using the following data, estimate the full set of coefficients for the least squares dummy variable model:

Test the hypotheses that (1) the “period” effects are all zero, (2) the “group” effects are all zero, and (3) both period and group effects are zero. Use an F test in each case.
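The double-demeaning step from the preceding exercise can be sketched on a hypothetical balanced panel (the exercise's data table is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical balanced panel with individual and time effects; the
# regressor is made correlated with both effects, true slope 0.5.
n, T = 6, 8
ai = rng.normal(size=(n, 1))               # individual effects
gt = rng.normal(size=(1, T))               # time effects
x = rng.normal(size=(n, T)) + ai + gt
y = 1.0 + ai + gt + 0.5 * x + rng.normal(0, 0.2, size=(n, T))

def within2(z):
    # y_it - ybar_i. - ybar_.t + ybar: removes both sets of effects.
    return z - z.mean(axis=1, keepdims=True) - z.mean(axis=0, keepdims=True) + z.mean()

yd, xd = within2(y).ravel(), within2(x).ravel()
b = (xd @ yd) / (xd @ xd)                  # two-way within estimator of beta
```

The individual and time effects can then be recovered from the group and period means, and m = ȳ − b·x̄.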

Two-way random effects model. We modify the random effects model by the addition of a time-specific disturbance. Thus, y_it = α + β′x_it + ε_it + u_i + v_t, where

Write out the full covariance matrix for a data set with n = 2 and T = 2.

The model satisfies the groupwise heteroscedastic regression model of Section 11.7.2. All variables have zero means. The following sample second-moment matrix is obtained from a sample of 20 observations:

a. Compute the two separate OLS estimates of β, their sampling variances, the estimates of σ_1^2 and σ_2^2, and the R^2’s in the two regressions.

b. Carry out the Lagrange multiplier test of the hypothesis that σ_1^2 = σ_2^2.

c. Compute the two-step FGLS estimate of β and an estimate of its sampling variance. Test the hypothesis that β equals 1.

d. Carry out the Wald test of equal disturbance variances.

e. Compute the maximum likelihood estimates of β, σ_1^2, and σ_2^2 by iterating the FGLS estimates to convergence.

f. Carry out a likelihood ratio test of equal disturbance variances.

g. Compute the two-step FGLS estimate of β, assuming that the model in (14-7) applies. (That is, allow for cross-sectional correlation.) Compare your results with those of part c.
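Part e's iteration of FGLS to the maximum likelihood estimates can be sketched on simulated two-group data (illustrative values β = 1, σ_1 = 1, σ_2 = 3; the exercise's 20-observation moment matrix is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two groups sharing slope beta = 1 but with different disturbance variances.
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y1 = 1.0 * x1 + rng.normal(0, 1.0, n)
y2 = 1.0 * x2 + rng.normal(0, 3.0, n)

beta = 0.0
for _ in range(50):                        # iterate FGLS to convergence (MLE)
    v1 = np.mean((y1 - beta * x1) ** 2)    # group 1 variance estimate
    v2 = np.mean((y2 - beta * x2) ** 2)    # group 2 variance estimate
    w1, w2 = 1 / v1, 1 / v2
    beta = (w1 * x1 @ y1 + w2 * x2 @ y2) / (w1 * x1 @ x1 + w2 * x2 @ x2)
```

At convergence, beta is the weighted (GLS) slope and v1, v2 are the maximum likelihood variance estimates; a likelihood ratio test of v1 = v2 follows from the restricted (pooled) and unrestricted log-likelihoods.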

Suppose that in the groupwise heteroscedasticity model of Section 11.7.2, Xi is the same for all i. What is the generalized least squares estimator of β? How would you compute the estimator if it were necessary to estimate σ_i^2?

Repeat Exercise 10 for the cross-sectionally correlated model of Section 13.9.1.

The following table presents a hypothetical panel of data:

a. Estimate the groupwise heteroscedastic model of Section 11.7.2. Include an estimate of the asymptotic variance of the slope estimator. Use a two-step procedure, basing the FGLS estimator at the second step on residuals from the pooled least squares regression.

b. Carry out the Wald, Lagrange multiplier, and likelihood ratio tests of the hypothesis that the variances are all equal. For the likelihood ratio test, use the FGLS estimates.

c. Carry out a Lagrange multiplier test of the hypothesis that the disturbances are uncorrelated across individuals.

A sample of 100 observations produces the following sample data: The underlying bivariate regression model is y1 = μ + ε1, y2 = μ + ε2.

a. Compute the OLS estimate of μ, and estimate the sampling variance of this estimator.

b. Compute the FGLS estimate of μ and the sampling variance of the estimator.

Consider estimation of the following two equation model: y1 = β1 + ε1, y2 = β2x + ε2. A sample of 50 observations produces the following moment matrix:

a. Write the explicit formula for the GLS estimator of [β1, β2]. What is the asymptotic covariance matrix of the estimator?

b. Derive the OLS estimator and its sampling variance in this model.

c. Obtain the OLS estimates of β1 and β2, and estimate the sampling covariance matrix of the two estimates. Use n instead of (n − 1) as the divisor to compute the estimates of the disturbance variances.

d. Compute the FGLS estimates of β1 and β2 and the estimated sampling covariance matrix.

e. Test the hypothesis that β2 = 1.

The model y1 = β1x1 + ε1, y2 = β2x2 + ε2 satisfies all the assumptions of the classical multivariate regression model. All variables have zero means. The following sample second-moment matrix is obtained from a sample of 20 observations:

a. Compute the FGLS estimates of β1 and β2.

b. Test the hypothesis that β1 = β2.

c. Compute the maximum likelihood estimates of the model parameters.

d. Use the likelihood ratio test to test the hypothesis in part b.
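The two-step FGLS computation for this kind of two-equation system can be sketched as follows (simulated data with an illustrative disturbance covariance matrix; the exercise's sample moments are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two seemingly unrelated regressions with correlated disturbances.
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])        # illustrative covariance
L = np.linalg.cholesky(Sigma)
e = rng.normal(size=(n, 2)) @ L.T
y1 = 1.0 * x1 + e[:, 0]
y2 = 2.0 * x2 + e[:, 1]

# Step 1: equation-by-equation OLS, then residual covariance S.
b1 = (x1 @ y1) / (x1 @ x1)
b2 = (x2 @ y2) / (x2 @ x2)
E = np.column_stack([y1 - b1 * x1, y2 - b2 * x2])
S = E.T @ E / n

# Step 2: FGLS on the stacked system with Omega = S kron I_n.
Si = np.linalg.inv(S)
A = np.array([[Si[0, 0] * x1 @ x1, Si[0, 1] * x1 @ x2],
              [Si[1, 0] * x2 @ x1, Si[1, 1] * x2 @ x2]])
c = np.array([Si[0, 0] * x1 @ y1 + Si[0, 1] * x1 @ y2,
              Si[1, 0] * x2 @ y1 + Si[1, 1] * x2 @ y2])
b_fgls = np.linalg.solve(A, c)                    # [beta1, beta2]
```

The inverse of A estimates the asymptotic covariance matrix of b_fgls, which can be used for the Wald test of β1 = β2.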
