Question: Using RapidMiner, build a logistic regression model, using 70% of the entire data for training and the other 30% for testing (using shuffled sampling in

Using RapidMiner, build a logistic regression model, using 70% of the entire data for training and the other 30% for testing (using shuffled sampling in RM), to classify patients into those who are likely to have a stroke and those who are not. Use all the available variables as predictors except id and the default thresholds by RM to generate the classifications. [40 pts.] (a) Which independent variable(s) are significant at p=0.05? (b) How to interpret the coefficients of the following two variables in this logistic regression: (i) age, and (ii) hypertension? Explain in terms of the effect of a unit change of the independent variable on the odds of the dependent variable and show steps of your derivation to get full credits. For example, how does a unit increase in age affect the odds of stroke? Explain not just whether it affects the odds positively or negatively, but also by how much. Similarly, how does hypertension affect the odds of stroke? (c) Evaluate the predictive accuracy of the model using appropriate metrics (e.g., specificity, sensitivity, precision, false positive rate, false negative rate, AUC, etc.) that you learned in this week. (Do not just provide the numbers; offer your own

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!