Question: PSTAT 175: Practice Final Question Solutions Final Exam: Thursday, December 8 8:00-11:00 Practice Questions 1. A multi-center trial of a new drug protocol decided to

PSTAT 175: Practice Final Question Solutions
Final Exam: Thursday, December 8 8:00-11:00

Practice Questions

1. A multi-center trial of a new drug protocol decided to perform the analysis stratifying on the clinics. The experimenters measured the time until relapse in a "drug" group and in a "control" group. The patient care occurred at 6 clinics in different parts of the country.

(a) What is the advantage of using a stratified model instead of a purely Cox proportional hazards model in this circumstance?

Stratifying lets us adjust for a confounding variable that does not satisfy the proportional hazards assumption. It is a more general procedure: each of the 6 clinics gets its own baseline hazard, so the clinics may differ in a variety of ways.

(b) Here is some R output

    coxph(formula = Surv(time, status) ~ treat + strata(clinic))

                coef  exp(coef)  se(coef)      z        p
    treatdrug   -1.1      0.334     0.262  -4.19  2.8e-05

    Likelihood ratio test=20 on 1 df, p=7.73e-06
    n= 203, number of events= 76

There is no estimate for the clinic coefficient because we have specified that we are stratifying on the clinic. The function estimates an entire baseline hazard function for each value of the clinic strata. As a result, there is no single coefficient that describes the effect of the clinic in the model.

(c) We would conclude that the drug has a statistically significant effect on the time until relapse (z = -4.19, p = 2.8e-05). We have controlled for the possibility that different clinics are getting different results. The estimated hazard ratio is exp(coef) = 0.334, so the hazard rate in the drug-treated group is about 66% lower than in the control group. This is a good outcome for our model.

2. An engineer collected the following data on the number of days until a system failure

    5, 1, 2+, 1, 2, 2, 3+, 3, 1, 4+

The observations marked with a "+" were censored at that day.
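As a minimal sketch, the risk sets and product-limit (Kaplan-Meier) estimates for these ten observations can be tabulated in base R. The variable names (`days`, `status`, `t_grid`) are illustrative and not from the original. The sketch assumes that a unit censored on a given day is still counted as at risk on that day.

```r
# Observed days and event indicators (1 = failure, 0 = censored "+")
days   <- c(5, 1, 2, 1, 2, 2, 3, 3, 1, 4)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)

t_grid <- 1:5  # days on which at least one unit is still at risk

# Number at risk at the start of each day and number of failures on that day
n_risk <- sapply(t_grid, function(t) sum(days >= t))
d_fail <- sapply(t_grid, function(t) sum(days == t & status == 1))

# Daily hazard estimate d/n and the Kaplan-Meier product-limit estimate
hazard <- d_fail / n_risk
surv   <- cumprod(1 - hazard)

km <- data.frame(t = t_grid, d = d_fail, n = n_risk,
                 hazard = round(hazard, 3), surv = round(surv, 3))
print(km)

# Greenwood's approximation to Var(log S(3)): sum over days t <= 3
idx <- t_grid <= 3
greenwood_3 <- sum(d_fail[idx] / (n_risk[idx] * (n_risk[idx] - d_fail[idx])))
print(round(greenwood_3, 4))
```

Day 6 is left out of `t_grid` because nobody is at risk by then, so the ratio d/n would be 0/0.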
(a) The Kaplan-Meier estimate of the survival function over the first 6 days is

    t_i   d_i   number at risk n_i   hazard d_i/n_i   S(t_i+)
    1     3     10                   0.3              0.7
    2     2     7                    0.286            0.5
    3     1     4                    0.25             0.375
    4     0     2                    0                0.375
    5     1     1                    1                0
    6     0     0                    0                0

(b) Greenwood's approximation is

$$\mathrm{Var}\big(\log \hat S(3)\big) \approx \sum_{i:\, t_i \le 3} \frac{d_i}{n_i (n_i - d_i)} = \frac{3}{10(7)} + \frac{2}{7(5)} + \frac{1}{4(3)} = 0.1833$$

3. A Weibull distribution has density

$$f(t \mid \lambda, \gamma) = \gamma \lambda^{\gamma} t^{\gamma - 1} \exp\left[-(\lambda t)^{\gamma}\right]$$

(a) If we have n independent observations $t_1, \ldots, t_n$ from a Weibull distribution with $\gamma = 1/2$ then the log likelihood is

$$\ell(\lambda) = n \log(1/2) + \frac{n}{2} \log \lambda - \frac{1}{2} \sum_{i=1}^{n} \log t_i - \lambda^{1/2} \sum_{i=1}^{n} t_i^{1/2}$$

$$\frac{d}{d\lambda} \ell(\lambda) = \frac{n}{2\lambda} - \frac{1}{2\lambda^{1/2}} \sum_{i=1}^{n} t_i^{1/2}$$

If we set the derivative to 0 then we get the equations

$$\frac{n}{2\lambda} = \frac{1}{2\lambda^{1/2}} \sum_{i=1}^{n} t_i^{1/2}$$

$$\lambda^{-1/2} = \frac{1}{n} \sum_{i=1}^{n} t_i^{1/2}$$

$$\hat\lambda = \left[ \frac{1}{n} \sum_{i=1}^{n} t_i^{1/2} \right]^{-2}$$

(b) If, in addition, we have 10 observations from this distribution that were all censored at time 100, then we have additional terms in the likelihood of the form

$$P\{T_{n+i} > 100\} = \exp\left[-(100\lambda)^{1/2}\right]$$

Each censored observation therefore adds $-(100\lambda)^{1/2} = -10\lambda^{1/2}$ to the log likelihood. The first derivative of the log likelihood becomes

$$\frac{d}{d\lambda} \ell(\lambda) = \frac{n}{2\lambda} - \frac{1}{2\lambda^{1/2}} \sum_{i=1}^{n} t_i^{1/2} - \frac{10}{2\lambda^{1/2}} (10)$$

Our MLE is therefore the solution of

$$\frac{n}{2\lambda} = \frac{1}{2\lambda^{1/2}} \sum_{i=1}^{n} t_i^{1/2} + 50\lambda^{-1/2}$$

Multiplying both sides by $2\lambda^{1/2}$ gives

$$\lambda^{-1/2} = \frac{1}{n} \left[ \sum_{i=1}^{n} t_i^{1/2} + 100 \right]$$

$$\hat\lambda = n^2 \left[ \sum_{i=1}^{n} t_i^{1/2} + 100 \right]^{-2}$$

4. There were 100 individuals that were seen at a clinic and deemed to be high risk for diabetes. A study followed up with these individuals with medical tests every two years to investigate their risk of developing diabetes. The following table gives the number of subjects who either showed clinical signs of diabetes ("Developed Diabetes") or were not available any longer for testing ("Lost to Follow Up").

    Year     Number Developed Diabetes   Number Lost to Follow Up
    0-2      2                           3
    2-4      1                           2
    4-6      4                           8
    6-8      3                           10
    8-10     2                           18
    10-12    2                           21
    12-14    3                           21

(a) Calculate an appropriate estimate of the conditional probability that a patient will develop diabetes between 8 and 10 years after the study began given that they had not developed diabetes before then.
(b) Calculate an appropriate estimate of the probability that a patient will go at least 5 years before developing diabetes.

5. We observed the time to completion for 4 3D printers on a particular project: 89, 97, 120, and 156 minutes. Unfortunately a fifth printer was turned off after 100 minutes before it was able to complete the project. Using this data from all 5 printers, we get the estimates

    t_i    n_i   d_i   h(t_i)   S(t_i)
    89     5     1     0.2      0.8
    97     4     1     0.25     0.6
    100    3     0     0        0.6
    120    2     1     0.5      0.3
    156    1     1     1        0

6. The MTD has a fleet of 500 buses that run hybrid gas-electric engines. They are interested in collecting data about battery life under typical working conditions in order to compare the durability of Brand A and Brand B batteries. The mileage on the odometer when a new battery is installed is recorded in the data column startmiles. Every time a battery fails or a bus is removed from use, the mileage is recorded in the column stopmiles. For some buses, more than one battery had to be replaced so there are multiple observations for that single bus. The number of the bus is recorded in the column BusNo. Also, the BatNo column records the number of the battery. In other words, 1 for the first battery, 2 for the second one installed in this bus, 3 for the third, etc. Buses that were removed from service for reasons other than a battery problem or that had functioning batteries at the end of the study were recorded as censored observations in the status column. Brand is a factor taking two levels: "A" and "B".

(a) We run a simple analysis of this data

    coxph(formula = Surv(startmiles, stopmiles, status) ~ Brand)

             coef  exp(coef)  se(coef)     z       p
    BrandB   0.78     2.1814     0.262  2.98  0.0028

    Likelihood ratio test=20 on 1 df, p=7.73e-06
    n= 203, number of events= 76

A 95% confidence interval for the coefficient is

    0.78 - 1.96(0.262) = 0.267
    0.78 + 1.96(0.262) = 1.29

The hazard ratio of Brand B relative to Brand A is thus between e^0.267 = 1.31 and e^1.29 = 3.63 with 95% confidence.
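The interval arithmetic above can be checked with a short base R sketch. The names `coef_hat`, `se_hat`, `ci_coef`, and `ci_hr` are illustrative and not from the original. The two inputs are copied from the printed coxph output rather than taken from a fitted model object, since the bus data are not available here.

```r
# Values read off the printed coxph output for BrandB
coef_hat <- 0.78   # estimated log hazard ratio
se_hat   <- 0.262  # standard error of the coefficient

# Normal critical value for a two-sided 95% interval (about 1.96)
z <- qnorm(0.975)

# Wald interval on the coefficient scale, then exponentiate
# to get an interval for the hazard ratio
ci_coef <- coef_hat + c(-1, 1) * z * se_hat
ci_hr   <- exp(ci_coef)

print(round(ci_coef, 3))
print(round(ci_hr, 2))
```

Exponentiating the unrounded upper limit gives a value slightly above 3.63, because the hand calculation rounds the exponent to 1.29 first. With a fitted model object, `confint()` applied to the fit followed by `exp()` would be the usual route.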
We would conclude that there is a significant difference between the two brands, with Brand B having a greater hazard rate and therefore a shorter lifetime.

(b) We decide to include the battery number in the analysis,

    coxph(formula = Surv(startmiles, stopmiles, status) ~ Brand + strata(BatNo))

             coef  exp(coef)  se(coef)     z       p
    BrandB   0.86      0.423      0.28  3.07  0.0021

    Likelihood ratio test=10.2 on 1 df, p=0.00137
    n= 203, number of events= 76

No, this model does not make any substantial additional assumptions because we are stratifying on the variable BatNo. Each battery number gets its own baseline hazard, so we do not have to assume proportional hazards for BatNo.

(c) Another way we could set up this model of battery lifetimes is to measure the gap times between failures. In order to do this we would replace the start and stop times with their difference. Something like this:

    coxph(formula = Surv(stopmiles - startmiles, status) ~ Brand + strata(BatNo))

This would be more appropriate in this case, where we are likely to be more interested in the time since the battery was installed than in the total age of the bus. Our assumption is that the age of the battery is the greatest factor in determining when it will fail.

(d) This -log-log graph compares Kaplan-Meier estimates of the two groups. We use this graph as a way to check our model assumptions. In this case, parallel curves would indicate that the proportional hazards assumption is appropriate. However, these curves appear to have different slopes and even intersect. Therefore, the Cox PH assumption is unlikely to be appropriate. We might have to use a time-varying covariate in order to model this factor.

PSTAT 175: Practice Final Questions
Final Exam: Wed, December 17 12:00-3:00

The final exam will be a closed-book three-hour exam. You will need

    A blue book to write your answers
    A calculator (not a laptop or smartphone)
    One page (one side of an 8.5 by 11 sheet) of formulas or notes that you may reference during the exam.

Practice Questions

The questions will be like the ones you had in the homework.
Here are a few questions to help you study.

1. A multi-center trial of a new drug protocol decided to perform the analysis stratifying on the clinics. The experimenters measured the time until relapse in a "drug" group and in a "control" group. The patient care occurred at 6 clinics in different parts of the country.

(a) What is the advantage of using a stratified model instead of a purely Cox proportional hazards model in this circumstance?

(b) Here is some R output

    coxph(formula = Surv(time, status) ~ treat + strata(clinic))

                coef  exp(coef)  se(coef)      z        p
    treatdrug   -1.1      0.334     0.262  -4.19  2.8e-05

    Likelihood ratio test=20 on 1 df, p=7.73e-06
    n= 203, number of events= 76

Why is there no estimate for the clinic coefficient?

(c) What would you conclude for this study given this output?

2. An engineer collected the following data on the number of days until a system failure

    5, 1, 2+, 1, 2, 2, 3+, 3, 1, 4+

The observations marked with a "+" were censored at that day.

(a) If we assume that this data was generated by a discrete geometric distribution, then what is our best estimate of p, the daily probability of failure?

(b) On the other hand, if we don't want to assume a parametric form, estimate from the data the probability of failing on day 1, 2, and 3 respectively. (I am looking for 3 estimates, one for each day.)

3. A Weibull distribution has density

$$f(t \mid \lambda, \gamma) = \gamma \lambda^{\gamma} t^{\gamma - 1} \exp\left[-(\lambda t)^{\gamma}\right]$$

(a) If we have n independent observations $t_1, \ldots, t_n$ from a Weibull distribution with $\gamma = 1/2$, find the formula for the maximum likelihood estimator of $\lambda$.

(b) If, in addition, we have 10 observations from this distribution that were all censored at time 100, find a new formula for the MLE of $\lambda$.

https://www.coursehero.com/file/16692801/P175FinalInfoandQuestionsF14pdf/

4. We observed the time to completion for 4 3D printers on a particular project: 89, 97, 120, and 156 minutes.
Unfortunately a fifth printer was turned off after 100 minutes before it was able to complete the project. Using this data from all 5 printers, calculate the Kaplan-Meier estimate of the survival function.

5. The MTD has a fleet of 500 buses that run hybrid gas-electric engines. They are interested in collecting data about battery life under typical working conditions in order to compare the durability of Brand A and Brand B batteries. The mileage on the odometer when a new battery is installed is recorded in the data column startmiles. Every time a battery fails or a bus is removed from use, the mileage is recorded in the column stopmiles. For some buses, more than one battery had to be replaced so there are multiple observations for that single bus. The number of the bus is recorded in the column BusNo. Also, the BatNo column records the number of the battery. In other words, 1 for the first battery, 2 for the second one installed in this bus, 3 for the third, etc. Buses that were removed from service for reasons other than a battery problem or that had functioning batteries at the end of the study were recorded as censored observations in the status column. Brand is a factor taking two levels: "A" and "B".

(a) We run a simple analysis of this data

    coxph(formula = Surv(startmiles, stopmiles, status) ~ Brand)

             coef  exp(coef)  se(coef)     z       p
    BrandB   0.78     2.1814     0.262  2.98  0.0028

    Likelihood ratio test=20 on 1 df, p=7.73e-06
    n= 203, number of events= 76

Calculate a 95% confidence interval for the hazard ratio of Brand B relative to Brand A. What do we conclude?
(b) We decide to include the battery number in the analysis,

    coxph(formula = Surv(startmiles, stopmiles, status) ~ Brand + strata(BatNo))

             coef  exp(coef)  se(coef)     z       p
    BrandB   0.86      0.423      0.28  3.07  0.0021

    Likelihood ratio test=10.2 on 1 df, p=0.00137
    n= 203, number of events= 76

Does this model require additional assumptions relative to the model in part (a)?

(c) Describe another way we could set up this model of battery lifetimes. Be specific as to how we should set up the call to the coxph function.

(d) This -log-log graph compares Kaplan-Meier estimates of the two groups. What do we use this graph for and what is it telling us in this situation?
