Question: Problem 3 : Heart Disease Dataset In Problem 3 , you will be working with the Statlog Heart Disease Dataset. This dataset contains medical information

Problem

3

: Heart Disease Dataset

In Problem

3,

you will be working with the Statlog Heart Disease Dataset. This dataset contains medical information about

270

individuals, including a column that indicates whether or not the individual has heart disease. You can find more about

this dataset, including descriptions of its columns, here: Statlog Heart Dataset.

Load the data stored in the tab

-

delimited file heart

_

disease.txt into a DataFrame named hd

.

Use head

()

to display

the first

10

rows of this DataFrame.

Our goal in this problem will be to create a logistic regression model to predict the label heart

_

disease using the other

columns as features. Note that a value of

1

in the heart

_

disease column indicates an absence of heart disease, while a

value of

2

indicates the presence of heart disease.

Perform the following steps in a single code cell:

Create a

2

D feature array named X

3

containing the relevant features, as well as a

1

D label array named y

3

containing the labels.

(

Note: These should be arrays, and not DataFrames or Series.

)

Use train

_

test

_

split

()

to split the data into training and testing sets using an

80 / 20

split. Name the

resulting arrays X

_

train

_3,

_

test

_3,

_

train

_3,

and y

_

test

_3 .

Set random state

= 1 .

Use stratified

sampling.

Print the shapes of X

_

train

_3

and X

_

test

_3 .

Include text labeling the two results as shown below. Add

spacing to ensure that the shape tuples are left

-

aligned.

Training Features Shape: xxxx

Test Features Shape: xxxx

5

We will now create a logistic regression model that can be used to estimate the probability that an individual has heart

disease based on the feature values.

Create a logistic regression model named hd

_

mod with solver

='

lbfgs

'

and penalty

=

'none'. Then fit the model

to the training data. If you get a warning message stating that the model failed to converge, then increase the max

_

iter

parameter until it does converge.

Display the intercepts and coefficients for the final model with text labels as shown below. Note that the coefficients

array will not fit on a single line, so please display it BENEATH the line containing the "Coefficients:" label.

Intercept: xxxx

Coefficients:

xxxx

We will now calculate the accuracy score for the model on both the training set and the test set.

Calculate and print the training and testing accuracy scores for your model, rounded to four decimal places. Include the

text labels explaining which value is which, as shown below. Add spacing to ensure that the scores are left

-

aligned.

Training Accuracy: xxxx

Testing Accuracy: xxxx

We will now use the model to generate predictions regarding the presence or absence of heart disease for individuals in the

test set.

Use your model to generate label predictions for observations in the test set. Store the results in a variable named

test

_

pred

_3 .

Print the first

20

observed labels for the test set, and then the first

20

predicted labels. Include text

labels with your output as shown below. Each label array should be displayed on a single line, and the two arrays should

be left

-

aligned.

Observed Labels: xxxx

Predicted Labels: xxxx

As a final step, we will use our model to estimate the probability that each individual in the test set has heart disease.

Use the predict

_

proba

()

method of your model to estimate probabilities of being in each of the two classes for each

individual in the test set. This function returns a

2

D array. Display the predicted probabilities for the first

10

observations

in the test set as a DataFrame with the columns named according to the labels that they represent

(1

and

2) .

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

The instructions and background information for this case are in the file called Medical ABC_Case #2. The 8 questions that need to be answered are at the end of the document. The data used to answer...

BHA 3002, Health Care Management Course Syllabus Course Description Health Care Management contains an introduction to the field of modern healthcare management through a systematic analysis of the...

Assessment Task: Digital presentation (500 words + 10 minute video) Value (of total mark) :40% For this assignment you are required to analyse the following coroner's case: Inquest into the death of...

Table of Contents Introduction. Hypothesis. Methods ..5 148 194714) Results.. Table I Western Governor Township Race by Family History of Heart Disease. Table 3 Analysis of Variance Difference in...

This text was adapted by The Saylor Foundation under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License without attribution as requested by the work's original creator or licensee. 1...

Study Guide Healthcare Statistics By Jacqueline K. Wilson, RHIA About the Author Jacqueline K. Wilson is a Registered Health Information Administrator (RHIA) who has more than ten years of experience...

P A R T TW H R E E I L S O N , Providers of Health Services Q U A S H E 1 9 9 7 B U 141 Copyright 2008 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole...

Please read attached and Critique: Write critique of the paper. Do you agree with the author that statistics is important? Why? Feel free to provide your own examples of a valuable statistical study....

IHP 525 Final Project Part I Articles List Submit a draft of the conclusions section of the Final Project Part I: Articles review. Specifically address this critical element. What does your...

Board CHAPTER 1 Economics: Foundations and Models n this book, we use economics to answer questions such as the following What determines the prices of goods and services from bottled water to smart...

Rossie Equipment Manufacturing Co. acquired the assets of Alba Inc., a competitor, in 2016. It recorded goodwill of $70,000 at acquisition. Because of defective machinery Alba had produced prior to...

In problems, a slop field is given for a differential equation of the form y' = f(x,y). Use the slope field to sketch the solution that satisfies the given initial condition. In each case, find...

The primary focus of corporate governance is to ensure that management acts in the best interests of shareholders. True False

Consider two functions f and g on 1 8 such that ff x dx 14 6 Sommer 9f x dx Simplify your answer a b f f x g x dx 1 C d 6 f f x g x dx e f Simplify your answer Simplify your answer f g x f x g x f x...