I was given this question as an in-class exercise using Stata: The following questions concern the study by Gross et al. (1999) about the relationship between funding by the National Institutes of Health (NIH) and the burden of 29 diseases. The data are given in a Stata data file called 3.ex.Funding.dta. The variable names and definitions in this file are: disease = condition or disease; id = a numeric disease identification number; dollars = thousands of dollars of NIH research funds per year; incid = disease incidence rate per 1000; preval = disease prevalence rate per 1000; hospdays = thousands of hospital-days; mort = disease mortality rate per 1000; yrslost = thousands of life-years lost; disabil = thousands of disability-adjusted life-years lost.

My question is how to do the residual analysis in these questions, and how to identify the disease with the largest influence in Q3.
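A minimal sketch of what the residual analysis and influence statistics compute, written in standard-library Python because the real 3.ex.Funding.dta values are not reproduced here. The data below are randomly generated and hypothetical, so the printed numbers say nothing about the actual diseases. In Stata the same quantities come from `predict` after `regress` (assuming hypothetical log variables such as `logdollars` and `logmort` created with `generate`): `predict yhat, xb`, `predict rstu, rstudent`, `predict h, leverage` and `predict dfb, dfbeta(logmort)`. Then `scatter rstu yhat, yline(-2.07 0 2.07)` draws the Question 1 plot, `list disease dfb rstu h if abs(dfb) > 0.5` gives the Question 2 listing, and the row with the largest absolute `dfb` is the disease with the most influence on the log[mort] estimate. For Question 1, each externally studentized residual follows a t distribution with n - k - 2 = 29 - 4 - 2 = 23 degrees of freedom under the model, so about 95% should fall within roughly ±t(0.975, 23) ≈ ±2.07. That critical value is hard-coded from a t table rather than computed. The DFBETA below uses the Belsley, Kuh and Welsch scaling (deleted-fit standard error), which is assumed to match Stata's `dfbeta()` option; check it against Stata's output before relying on it.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; adequate for a small p x p matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(X, y):
    """OLS fit of y on the columns of X; returns (b, (X'X)^-1, residuals, s^2)."""
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = inverse([[sum(u * v for u, v in zip(ca, cb)) for cb in Xt] for ca in Xt])
    xty = [sum(u * v for u, v in zip(col, y)) for col in Xt]
    b = [sum(u * v for u, v in zip(row, xty)) for row in xtx_inv]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, xi)) for xi, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    return b, xtx_inv, resid, s2


def diagnostics(X, y, j):
    """Fitted values, leverage, externally studentized residuals and DFBETA for column j."""
    b, xtx_inv, resid, s2 = ols(X, y)
    n, p = len(X), len(b)
    fitted = [yi - e for yi, e in zip(y, resid)]
    lev, rstud, dfb = [], [], []
    for xi, e in zip(X, resid):
        g = [sum(xtx_inv[a][c] * xi[c] for c in range(p)) for a in range(p)]  # (X'X)^-1 x_i
        h = sum(u * v for u, v in zip(xi, g))                                 # leverage h_i
        s2_i = ((n - p) * s2 - e * e / (1 - h)) / (n - p - 1)                 # s^2 without obs i
        lev.append(h)
        rstud.append(e / math.sqrt(s2_i * (1 - h)))
        # b - b_(i) = (X'X)^-1 x_i e_i / (1 - h_i); scale element j by s_(i) * sqrt((X'X)^-1_jj)
        dfb.append(g[j] * e / (1 - h) / math.sqrt(s2_i * xtx_inv[j][j]))
    return fitted, lev, rstud, dfb


# HYPOTHETICAL data standing in for 3.ex.Funding.dta (not the real funding figures).
# Columns of X: intercept, log hospdays, log mort, log yrslost, log disabil.
random.seed(1)
n = 29
X = [[1.0] + [random.gauss(5.0, 1.0) for _ in range(4)] for _ in range(n)]
y = [2.0 + 0.3 * x[1] + 0.1 * x[2] + 0.2 * x[3] + 0.5 * x[4] + random.gauss(0.0, 0.8) for x in X]

MORT = 2        # column index of log mort (column 0 is the intercept)
T_CRIT = 2.069  # t(0.975) with 23 df, taken from a t table
fitted, lev, rstud, dfb = diagnostics(X, y, MORT)

print("outside 95% bounds:", [i for i, r in enumerate(rstud) if abs(r) > T_CRIT])
for i in range(n):
    if abs(dfb[i]) > 0.5:
        print(f"obs {i}: dfbeta={dfb[i]:.3f} rstudent={rstud[i]:.3f} leverage={lev[i]:.3f}")
print("largest |dfbeta|:", max(range(n), key=lambda i: abs(dfb[i])))
```

If the residuals scatter evenly about zero against the fitted values, with roughly 5% outside the bounds and no trend or funnel shape, the fit looks adequate. Curvature or a widening spread would suggest otherwise.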

  1. Regress log[dollars] against log[hospdays], log[mort], log[yrslost], and log[disabil]. Calculate the expected log[dollars] and studentized residuals for this regression. What bounds should contain 95% of the studentized residuals under this model? Draw a scatter plot of these residuals against expected log[dollars]. Draw horizontal lines at zero and the 95% bounds for the studentized residuals. What does this graph tell you about the quality of the fit of this model to these data?
  2. In the model from Question 1, calculate the delta beta influence statistic for log[mort]. List the values of this statistic together with the disease name, studentized residual, and leverage for all diseases for which the absolute value of this delta beta statistic is greater than 0.5. Which disease has the largest influence on the log[mort] parameter estimate?
  3. Draw scatter plots of log[dollars] against the other covariates in the model from Question 1. Identify the disease in these plots that had the most influence on log[mort] in Question 2. Does it appear to be particularly influential in any of these scatter plots?
  8. Regress log[dollars] against log[disabil] and log[hospdays]. What is the estimated expected amount of research funds budgeted for a disease that causes a million hospital-days a year and the loss of a million disability-adjusted life-years? Calculate a 95% confidence interval for this expected value. Calculate a 95% prediction interval for the funding that would be provided for a new disease that causes a million hospital-days a year and the loss of a million disability-adjusted life-years.
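For Question 8, the expected value and both intervals follow from the quantity x0'(X'X)^-1 x0, where x0 = (1, log 1000, log 1000). Because hospdays and disabil are recorded in thousands, a million hospital-days and a million DALYs both enter as log(1000) ≈ 6.91 (natural logs, which is what Stata's `log()` returns). The confidence interval uses the standard error of the fitted mean; the prediction interval also adds the residual variance. The following is a minimal standard-library Python sketch. The data are again hypothetical random numbers, and t(0.975, 26) ≈ 2.056 (29 observations minus 3 parameters) is hard-coded from a table. In Stata, the two standard errors correspond to `predict ..., stdp` and `predict ..., stdf` after `regress`.

```python
import math
import random


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; adequate for a 3 x 3 X'X."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def intervals(X, y, x0, t_crit):
    """Fitted value at x0, a confidence interval for its mean and a prediction interval."""
    n, p = len(X), len(X[0])
    xtx_inv = inverse([[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)])
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    b = [sum(xtx_inv[a][c] * xty[c] for c in range(p)) for a in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))
    s2 = sse / (n - p)
    yhat = sum(bj * xj for bj, xj in zip(b, x0))
    q = sum(x0[a] * xtx_inv[a][c] * x0[c] for a in range(p) for c in range(p))  # x0'(X'X)^-1 x0
    se_fit = math.sqrt(s2 * q)         # SE of the expected value (cf. Stata's stdp)
    se_pred = math.sqrt(s2 * (1 + q))  # SE for a new observation (cf. Stata's stdf)
    conf_int = (yhat - t_crit * se_fit, yhat + t_crit * se_fit)
    pred_int = (yhat - t_crit * se_pred, yhat + t_crit * se_pred)
    return yhat, conf_int, pred_int


# HYPOTHETICAL data standing in for the real file; columns: intercept, log disabil, log hospdays.
random.seed(2)
X = [[1.0, random.gauss(5.0, 1.5), random.gauss(6.0, 1.5)] for _ in range(29)]
y = [3.0 + 0.6 * x[1] + 0.3 * x[2] + random.gauss(0.0, 0.9) for x in X]

# One million = 1000 in the file's units (thousands), so both covariates equal log(1000).
x0 = [1.0, math.log(1000), math.log(1000)]
T_CRIT = 2.056  # t(0.975) with 26 df, taken from a t table
yhat, conf_int, pred_int = intervals(X, y, x0, T_CRIT)

# Exponentiate to go from log scale back to thousands of dollars.
print("expected funding:", math.exp(yhat))
print("95% CI:", [math.exp(v) for v in conf_int])
print("95% PI:", [math.exp(v) for v in pred_int])
```

The prediction interval is always wider than the confidence interval, because it includes the scatter of an individual disease about the regression line as well as the uncertainty in the line itself.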
