Question 1 - 22 marks
Note: Your solution should be contained in a Jupyter notebook. See the module website for guidance, and to download the required data file.
In the Belo Horizonte Caries Prevention (BELCAP) study, the effectiveness of different interventions for oral health in children was compared. A total of 797 children (all of whom were 7 years old at the beginning of the study) were recruited. They were then given one of six different treatments (one of four interventions, all four interventions together or none of the interventions). Children in the same school received the same treatment.
The oral health of the children was measured using the decayed, missing and filled teeth (DMFT) index. This index is the number of decayed, missing or filled teeth that a person has. In the BELCAP study, only eight teeth were considered for each child, so the index lies between 0 and 8. Note that it was possible for low-grade lesions on children's teeth (classified as 'decay' in this study) to improve, which may have resulted in a lower DMFT index by the end of the study.
The data are given in the data frame dmft and stored in the file dmft.RData. The variables are as follows:
startDMFT: the child's DMFT value at the beginning of the study
treat: the treatment given to the child:
- 1: oral health education at school
- 2: all four active interventions together
- 3: control (no active intervention)
- 4: enrichment of the school diet with rice bran
- 5: mouth wash
- 6: oral hygiene
endDMFT: the child's DMFT value at the end of the study. This is the response variable.
(a) Traditionally, in dental epidemiology, Poisson models are fitted to DMFT data. What other distribution would you consider, and why? [2]
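One way to explore part (a) is to ask how much probability a Poisson distribution places outside the range 0 to 8 that the DMFT index can take in this study. The sketch below is only an illustration: the means are hypothetical values chosen for demonstration, not estimates from the dmft data, and the helper names are made up here.

```python
import math


def poisson_pmf(k, mu):
    """Probability that a Poisson(mu) variable equals k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def poisson_mass_above(upper, mu):
    """Probability that a Poisson(mu) variable exceeds `upper`."""
    return 1.0 - sum(poisson_pmf(k, mu) for k in range(upper + 1))


# Hypothetical means, for illustration only (not fitted to dmft).
for mu in (1.0, 2.0, 4.0):
    print(f"mean {mu}: P(Y > 8) = {poisson_mass_above(8, mu):.4f}")
```

The larger the mean, the more Poisson mass falls on impossible values above 8, which is one argument for also considering a distribution with a fixed upper bound.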
(b) Preliminary analysis:
(i) Produce a table of the data in dmft showing the number of children in each of the six treatment groups, and comment on it. [1]
(ii) Provide appropriate visual summaries of the variables startDMFT and endDMFT and comment on your results. [3]
(iii) Generate appropriate plots that illustrate the relationship between endDMFT and treat, and the relationship between startDMFT and treat. Comment on these plots. Then use the plot showing the relationship between startDMFT and treat to explain why startDMFT should be a covariate in the model. [4]
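The tabulation in (b)(i) and the group comparisons in (b)(iii) can be sketched numerically as follows. This is a minimal standard-library Python illustration using made-up rows, not the real dmft data; the names `rows`, `group_counts` and `group_means` are hypothetical. The plots themselves would be drawn with whatever plotting tools the notebook uses.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up rows for illustration only: (startDMFT, treat, endDMFT).
rows = [(3, 1, 2), (5, 1, 4), (0, 3, 0), (4, 3, 4), (6, 4, 3), (2, 4, 1)]


def group_counts(rows):
    """Number of children in each treatment group, ordered by treatment code."""
    return dict(sorted(Counter(treat for _, treat, _ in rows).items()))


def group_means(rows, column):
    """Mean of one column (0 = startDMFT, 2 = endDMFT) within each treatment group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row[column])
    return {treat: mean(values) for treat, values in sorted(groups.items())}


print(group_counts(rows))
print(group_means(rows, 0))  # startDMFT by treatment
print(group_means(rows, 2))  # endDMFT by treatment
```

If the mean startDMFT differs noticeably between treatment groups, the groups were not comparable at baseline, which is the kind of evidence (b)(iii) asks for.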
(c) Traditionally, in dental epidemiology, the DMFT index measured at the start of a study is log-transformed.
Fit a Poisson model containing the main effects of both explanatory variables and their interaction to the data in dmft, where startDMFT is transformed to log(startDMFT + 0.5). Explain why there would be a problem if you used log(startDMFT), and why adding 0.5 to startDMFT solves this problem. Test if an interaction between log(startDMFT + 0.5) and treat is needed. Report your results. [4]
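The transformation in part (c) can be illustrated with a short Python sketch. It assumes only that startDMFT takes integer values from 0 to 8, as stated above; the function name `log_start` is hypothetical. Mathematically, log(0) is undefined (it tends to minus infinity), and Python's `math.log` raises a `ValueError` for it, whereas log(0 + 0.5) is a finite number.

```python
import math


def log_start(start_dmft, offset=0.5):
    """Log-transform a baseline DMFT value after adding a small offset."""
    return math.log(start_dmft + offset)


# A child with no decayed, missing or filled teeth at baseline has startDMFT = 0.
try:
    math.log(0)
except ValueError as err:
    print("log(0) is not usable as a covariate value:", err)

# With the offset, every possible baseline value maps to a finite number.
for value in range(0, 9):
    print(value, round(log_start(value), 3))
```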
(d) Check if the model you found in part (c) satisfies the model assumptions you have made. [5]
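One common check for a Poisson model is whether the Pearson chi-square statistic divided by the residual degrees of freedom is close to 1; values well above 1 suggest overdispersion. The sketch below shows only that calculation, as one possible ingredient of part (d). The observed and fitted values are made up for illustration, and `pearson_dispersion` is a hypothetical helper, not part of any library.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    For a Poisson model the variance equals the mean, so each squared
    residual is scaled by the fitted mean.
    """
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / (len(observed) - n_params)


# Made-up observed counts and fitted means, for illustration only.
observed = [2, 0, 3, 1]
fitted = [1.0, 1.0, 2.0, 2.0]
print(pearson_dispersion(observed, fitted, n_params=2))
```

Residual plots against the fitted values and against each explanatory variable would normally accompany a numeric check like this.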
(e) Consider a child who received treatment 4 and had a DMFT value of 5 at the beginning of the study. Use the coefficients from the model you found in part (c) to predict such a child's DMFT value at the end of the study (to the nearest integer). [3]
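The prediction in part (e) follows from the log link: the linear predictor is built from the coefficients and then exponentiated. The sketch below assumes that treat is coded as a factor with treatment 1 as the reference level, so that treatment 4 contributes its own intercept shift and, if the interaction is kept, its own slope adjustment. The coefficient names and values are placeholders chosen to make the arithmetic easy to follow; they are not estimates from the dmft data.

```python
import math


def predict_end_dmft(coefs, treat, start_dmft, offset=0.5):
    """Predicted mean endDMFT from a Poisson model with a log link.

    `coefs` maps hypothetical coefficient names to values; terms that are
    absent (the reference treatment, or a dropped interaction) count as 0.
    """
    x = math.log(start_dmft + offset)
    eta = coefs["intercept"] + coefs["log_start"] * x
    eta += coefs.get(f"treat{treat}", 0.0)
    eta += coefs.get(f"log_start:treat{treat}", 0.0) * x
    return math.exp(eta)


# Placeholder coefficients for illustration only (NOT fitted values).
placeholder = {"intercept": 0.0, "log_start": 1.0, "treat4": math.log(0.5)}
prediction = predict_end_dmft(placeholder, treat=4, start_dmft=5)
print(round(prediction))  # nearest integer, as the question requests
```

With real coefficients substituted for the placeholders, the same function gives the predicted mean for a child on treatment 4 with a baseline DMFT of 5.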
