Question: 5) The diabetes data set is a prospective study of the onset of adult diabetes, given a number of risk factors, among the Pima Indian tribe. Using the diabetes.csv data set:
a Separate the first half of the data from the second half; use the first half for training and the second half for testing
b Using the training data
i Construct the full logistic regression model for outcome
ii Using backwards selection construct the logistic regression model with every p value for the coefficients Show Steps!!!
c Predict the "response" (e.g. type = "response") for the full logistic regression model for
i the training data set,
ii the test data set,
d Predict the "response" for the smallest logistic model from the backwards selection exercise for
i the training data set,
ii the test data set,
e Using random forest, build a model on the training data
f You now have three models: full logistic, smallest logistic, and random forest. For the predictions of each, calculate and tabulate
i Number of correct positives
ii Number of false positives
iii Number of correct negatives
iv Number of false negatives
g Using the results of f, is there one method that appears best at modeling new results, or does it depend on whether it is more important to identify positives (predict diabetes) or negatives (predict health)?
h Now redo analysis twice using random selection of out of for training and the complement for testing. Is there anything you can conclude with this additional information about the merits of each approach?
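The split in part a and the tallies in part f can be sketched independently of the modeling library. The exercise itself appears to be R-oriented (it mentions type "response"), so the Python below is only an illustration of the bookkeeping, not the expected solution. The helper names `split_halves` and `confusion_counts` are hypothetical, and the 0.5 cut-off is an assumption, since the question does not say where to threshold the predicted probabilities.

```python
from typing import Dict, Sequence, Tuple


def split_halves(rows: Sequence) -> Tuple[Sequence, Sequence]:
    """Part a: first half for training, second half for testing.

    With an odd number of rows, the extra row goes to the test half.
    """
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def confusion_counts(
    actual: Sequence[int],
    predicted_prob: Sequence[float],
    threshold: float = 0.5,  # assumption: the exercise gives no cut-off
) -> Dict[str, int]:
    """Part f: tally correct/false positives and negatives for one model."""
    if len(actual) != len(predicted_prob):
        raise ValueError("actual and predicted_prob must have equal length")
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["true_pos"] += 1
        elif predicted == 1 and y == 0:
            counts["false_pos"] += 1
        elif predicted == 0 and y == 0:
            counts["true_neg"] += 1
        else:
            counts["false_neg"] += 1
    return counts


# Toy values for illustration only; these are NOT taken from diabetes.csv.
toy_actual = [1, 0, 1, 0, 1, 0]
toy_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
toy_counts = confusion_counts(toy_actual, toy_prob)
print(toy_counts)
```

Running `confusion_counts` once per model and per data set (training and test) gives the rows of the table requested in part f. Lowering the threshold trades false negatives for false positives, which bears on part g.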
