Logistic Regression

This exercise relates to the diabetes dataset available on Blackboard as diabetes.csv. It contains demographic and medical data for 768 females over the age of 21. The variables are defined below:

| Variable Name | Description |
|---|---|
| Pregnancies | Number of times pregnant |
| Glucose | Plasma glucose concentration at 2 hours in an oral glucose tolerance test |
| BloodPressure | Diastolic blood pressure (mm Hg) |
| SkinThickness | Triceps skin fold thickness (mm) |
| Insulin | 2-hour serum insulin (μU/ml) |
| BMI | Body mass index (weight in kg / (height in m)²) |
| DiabetesPedigreeFunction | Diabetes pedigree function |
| Age | Age (in years) |
| Outcome | Class variable (0 if no diabetes, 1 if the individual has diabetes) |

Please answer the following questions:

Load the data into R. Print the structure of the dataset and explain the output.

Hint: Use the read.csv and str commands. This can be done in 2 lines of code.
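A minimal sketch of the two hinted lines, wrapped in a helper so the file path can be swapped. `load_diabetes` is a hypothetical name, and the default path assumes diabetes.csv sits in R's working directory. For this file, `str` should report 768 observations of 9 variables, each listed with its storage type (`int` or `num`) and its first few values.

```r
# Hypothetical helper wrapping the two hinted lines: read.csv() then str().
# Assumes the file is called diabetes.csv and sits in the working directory.
load_diabetes <- function(path = "diabetes.csv") {
  diabetes <- read.csv(path)  # one row per woman, one column per variable
  str(diabetes)               # prints row/column counts, then each column's type and first values
  invisible(diabetes)         # return the data frame without printing it a second time
}

# Usage:
# diabetes <- load_diabetes()
```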

Convert the variable Outcome into a factor variable. Print the frequency distribution of the Outcome variable using the table command and explain what it means.

Hint: Use the as.factor and table commands. You only need two lines of code for this.
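A sketch of the conversion and tabulation, assuming a data frame with a 0/1 column named `Outcome`; `outcome_table` is a hypothetical helper name. The `prop.table` line goes beyond the hint: it turns the counts into shares of all rows, which makes the class balance easier to describe.

```r
# Hypothetical helper: convert Outcome to a factor and tabulate it.
# Expects a data frame with a 0/1 column named Outcome (as in diabetes.csv).
outcome_table <- function(df) {
  df$Outcome <- as.factor(df$Outcome)   # 0/1 numbers become factor levels "0" and "1"
  counts <- table(df$Outcome)           # number of rows in each class
  print(counts)
  print(round(prop.table(counts), 3))   # the same counts as proportions of all rows
  invisible(df)                         # return the data frame with the converted column
}

# Usage:
# diabetes <- outcome_table(diabetes)
```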

Create your training set with a random selection of 70% of the rows in the dataset and your testing set with the other 30%. Use seed value 123 for this randomization. Print the frequency distribution of the outcome variable in both train and test data. Are the two datasets similar in terms of the distribution of the outcome variable? Explain.

Hint: You can use the sample command for the split. You will also need the set.seed command.
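A sketch of the split, assuming a data frame with an `Outcome` column; `split_70_30` is a hypothetical helper. `floor(0.7 * n)` rounds 537.6 down to 537 training rows for the 768-row file, leaving 231 rows for testing.

```r
# Hypothetical helper: random 70/30 split of the rows, seeded for reproducibility.
split_70_30 <- function(df, seed = 123) {
  set.seed(seed)
  n <- nrow(df)
  train_idx <- sample(seq_len(n), size = floor(0.7 * n))  # row numbers drawn without replacement
  list(train = df[train_idx, , drop = FALSE],   # 70% of rows
       test  = df[-train_idx, , drop = FALSE])  # the remaining 30%
}

# Usage:
# parts <- split_70_30(diabetes)
# table(parts$train$Outcome)
# table(parts$test$Outcome)
```

Because R 3.6.0 changed the default algorithm behind `sample`, the exact rows drawn with seed 123 can differ from those produced under older R versions.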

Train a logistic regression model on the training dataset. How many of the variables are significant?

Hint: Use the glm and summary commands for this part.
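A sketch of the model fit; `fit_logit` and `count_significant` are hypothetical helpers, and the 0.05 cut-off is an assumed convention for "significant", not something the exercise specifies. `family = binomial` makes `glm` fit a logistic regression (logit link).

```r
# Hypothetical helper: logistic regression of Outcome on every other column.
fit_logit <- function(train) {
  model <- glm(Outcome ~ ., data = train, family = binomial)  # logit link by default
  print(summary(model))  # the Pr(>|z|) column holds each coefficient's p-value
  model
}

# Hypothetical helper: count predictors (intercept excluded) with p-value < alpha.
count_significant <- function(model, alpha = 0.05) {
  p_values <- summary(model)$coefficients[-1, "Pr(>|z|)"]  # row 1 is the intercept
  sum(p_values < alpha)
}

# Usage, where train is the 70% training data frame:
# model <- fit_logit(train)
# count_significant(model)
```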

Generate predictions on the testing dataset using the model produced through logistic regression in the previous step. Report the confusion matrix of your logistic regression model on the test set when the threshold is set to 0.25. Compute the accuracy, true positive rate, and false positive rate for the model.

Hint: You can use predict function for generating testing predictions, an ifelse command to create binary predictions, and table to create a confusion matrix. This should take only 3 lines of code.
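A sketch of the prediction step; `threshold_metrics` is a hypothetical helper. It labels a row as diabetic when the predicted probability exceeds 0.25 (treating a probability of exactly 0.25 as negative is an assumption), builds the confusion matrix with predictions as rows and actual classes as columns, and derives accuracy = (TP + TN) / total, TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

```r
# Hypothetical helper: confusion matrix and rates at a given probability threshold.
# probs: predicted P(Outcome = 1); actual: true labels as 0/1 numbers or a "0"/"1" factor.
threshold_metrics <- function(probs, actual, threshold = 0.25) {
  pred <- ifelse(probs > threshold, 1, 0)       # binary predictions
  actual <- as.integer(as.character(actual))    # handles numeric or factor labels
  cm <- table(Predicted = factor(pred, levels = c(0, 1)),
              Actual = factor(actual, levels = c(0, 1)))
  tp <- cm["1", "1"]
  tn <- cm["0", "0"]
  fp <- cm["1", "0"]
  fn <- cm["0", "1"]
  list(confusion = cm,
       accuracy = (tp + tn) / sum(cm),
       tpr = tp / (tp + fn),   # sensitivity (recall)
       fpr = fp / (fp + tn))   # 1 - specificity
}

# Usage with the fitted model and the test set:
# probs <- predict(model, newdata = test, type = "response")  # probabilities, not log-odds
# threshold_metrics(probs, test$Outcome, 0.25)
```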

Generate ROC plots and precision-recall plots for both the training and the testing datasets. Report the area under the curve and attach the plots to your final submission. Provide brief explanations of what each curve and its AUC represent.

Hint: Use the ROCR library.
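A sketch using ROCR, assuming the package is installed; `roc_pr` is a hypothetical helper. It draws both curves for one set of predictions and returns the area under the ROC curve, so it would be called once with training predictions and once with testing predictions. The dashed diagonal marks a classifier no better than random guessing (AUC 0.5). The area under the precision-recall curve is not computed here.

```r
library(ROCR)

# Hypothetical helper: ROC and precision-recall plots plus ROC AUC via ROCR.
# probs: predicted probabilities; actual: true 0/1 labels (numeric or factor).
roc_pr <- function(probs, actual, label = "") {
  pred <- prediction(probs, actual)
  roc <- performance(pred, measure = "tpr", x.measure = "fpr")
  pr <- performance(pred, measure = "prec", x.measure = "rec")
  plot(roc, main = paste("ROC curve", label))
  abline(0, 1, lty = 2)  # diagonal: random guessing
  plot(pr, main = paste("Precision-recall curve", label))
  performance(pred, measure = "auc")@y.values[[1]]  # area under the ROC curve
}

# Usage with the fitted model and both data sets:
# roc_pr(predict(model, train, type = "response"), train$Outcome, "(train)")
# roc_pr(predict(model, test, type = "response"), test$Outcome, "(test)")
```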

An individual displays the following traits: pregnancies = 1, glucose = 130, blood pressure = 80, skin thickness = 22, insulin = 100, BMI = 25, diabetes pedigree function = 0.5, age = 50. According to your final model, what is the probability that the individual has diabetes? Show your working. Note: This is a manual calculation. Do not do this part with R. You can round the coefficient estimates to 2 decimal places for ease of work.
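The manual calculation follows the logistic model: substitute the traits into the linear predictor z, then apply the logistic function. The β̂ terms below are placeholders for the rounded coefficient estimates from your fitted model, not values given in this exercise, and DPF abbreviates DiabetesPedigreeFunction. If your final model keeps only some predictors, the terms for the dropped variables would simply be omitted.

```latex
z = \hat\beta_0 + \hat\beta_{\text{Pregnancies}}(1) + \hat\beta_{\text{Glucose}}(130)
  + \hat\beta_{\text{BloodPressure}}(80) + \hat\beta_{\text{SkinThickness}}(22)
  + \hat\beta_{\text{Insulin}}(100) + \hat\beta_{\text{BMI}}(25)
  + \hat\beta_{\text{DPF}}(0.5) + \hat\beta_{\text{Age}}(50)

P(\text{Outcome} = 1) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}}
```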
