Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set

Classification Exercise: Predict the Diagnosis of Breast Cancer Based on Diagnostic Features.

Description:

The Breast Cancer Wisconsin dataset contains data on various diagnostic features of breast tumors. The goal is to predict whether a tumor is benign or malignant based on features derived from a digitized image of a fine needle aspirate (FNA) of a breast mass.

Target Variable: Diagnosis (M = malignant, B = benign)

Import Libraries/Dataset

1. Download the dataset
2. Import the required libraries
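A minimal loading sketch, assuming scikit-learn's bundled copy of the WDBC data (`load_breast_cancer`) is acceptable in place of downloading the CSV from the UCI repository. scikit-learn encodes the target as 0 = malignant and 1 = benign, so the sketch maps it back to the M/B labels used above. The helper name `load_wdbc` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def load_wdbc() -> pd.DataFrame:
    """Hypothetical helper: WDBC as a DataFrame with a 'diagnosis' column of M/B labels."""
    frame = load_breast_cancer(as_frame=True).frame.copy()  # 30 features + 'target'
    # scikit-learn encodes 0 = malignant, 1 = benign; map to the M/B labels
    frame["diagnosis"] = frame.pop("target").map({0: "M", 1: "B"})
    return frame


df = load_wdbc()
print(df.shape)
print(df["diagnosis"].value_counts())
```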

1. Data Visualization and Exploration [4]

   1. Print 2 rows as a sanity check to identify all the features present in the dataset and whether the target matches them.
   2. Comment on class imbalance with an appropriate visualization method.
   3. Provide appropriate visualizations to gain insight into the dataset.
   4. Perform correlation analysis on the dataset and provide a visualization for it. Will this correlation analysis affect the feature selection that you will perform in the next step? Justify your answer. Answers without justification will not be awarded marks.
   5. Any other visualization specific to the problem statement.
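One way to cover items 1, 2 and 4 with pandas and matplotlib, again assuming the scikit-learn copy of the data. The output file names and the 0.9 threshold for "highly correlated" are arbitrary choices for illustration. Groups such as mean radius, mean perimeter and mean area are geometrically related and nearly collinear, which is the usual argument that correlation analysis does bear on feature selection: keeping every member of such a group adds redundancy and inflates coefficient variance in linear models.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame
df["diagnosis"] = df.pop("target").map({0: "M", 1: "B"})  # sklearn: 0 = malignant

# 1. Sanity check: first two rows, transposed so all 30 features are visible
print(df.head(2).T)

# 2. Class imbalance: counts, proportions and a bar chart
counts = df["diagnosis"].value_counts().reindex(["B", "M"])
print(counts.to_dict(), (counts / len(df)).round(3).to_dict())
fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(counts.index, counts.values, color=["tab:green", "tab:red"])
ax.set(xlabel="diagnosis", ylabel="count", title="Class balance")
fig.tight_layout()
fig.savefig("class_balance.png")  # hypothetical output file
plt.close(fig)

# 4. Correlation matrix of the 30 numeric features as a heatmap
corr = df.drop(columns="diagnosis").corr()
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90, fontsize=6)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns, fontsize=6)
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical output file
plt.close(fig)

# Feature pairs with |r| > 0.9 (upper triangle only, so each pair appears once)
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
high_pairs = corr.where(mask).stack().dropna()
high_pairs = high_pairs[high_pairs.abs() > 0.9].sort_values(ascending=False)
print(high_pairs)
```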

2. Data Pre-processing and Cleaning [4]

   1. Perform the appropriate pre-processing of the data, such as identifying NULL or missing values, handling outliers if present in the dataset, and treating skewed data. Describe the pre-processing steps performed in a markdown cell. Explore a few recent data-balancing techniques and their effect on the model evaluation parameters.
   2. Apply appropriate feature engineering techniques, including feature transformations such as standardization and normalization. You are free to choose the transformations that suit the structure and complexity of your dataset. Provide proper justification; techniques used without justification will not be awarded marks. Explore a few techniques for identifying feature importance for your feature engineering task.
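A sketch of the cleaning and feature-engineering checks listed above, under these assumptions: outliers are only counted with the 1.5 × IQR rule (not removed), features with |skew| > 1 get a `log1p` transform (valid here because every WDBC feature is non-negative), and feature importance is estimated two ways (mutual information and absolute logistic-regression coefficients). For balancing, the sketch swaps in class weighting (`class_weight="balanced"`) rather than resampling; SMOTE from the separate imbalanced-learn package is a common alternative not shown. The scaler is fitted on all rows here for exploration only; the cross-validation loop refits it per fold through the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
y = (y == 0).astype(int)  # recode: 1 = malignant (positive class), 0 = benign

# Missing values: total count of NaNs across all features
n_missing = int(X.isna().sum().sum())
print("missing values:", n_missing)

# Outliers: points outside 1.5 * IQR per feature (counted here, not removed)
q1, q3 = X.quantile(0.25), X.quantile(0.75)
iqr = q3 - q1
outliers = ((X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)).sum()
print(outliers.sort_values(ascending=False).head())

# Skew: log1p-transform strongly right-skewed features (all features are >= 0)
skew = X.skew()
skewed = skew[skew.abs() > 1].index
X_t = X.copy()
X_t[skewed] = np.log1p(X_t[skewed])
print(f"{len(skewed)} features log-transformed")

# Standardization: zero mean, unit variance per feature
X_std = pd.DataFrame(StandardScaler().fit_transform(X_t), columns=X.columns)

# Feature importance 1: mutual information between each feature and the target
mi = pd.Series(mutual_info_classif(X_std, y, random_state=0), index=X.columns)
print(mi.sort_values(ascending=False).head(10))

# Feature importance 2: |coefficients| of an L2 logistic regression on standardized data
coefs = pd.Series(LogisticRegression(max_iter=10000).fit(X_std, y).coef_[0], index=X.columns)
print(coefs.abs().sort_values(ascending=False).head(10))

# Imbalance handling: no weighting vs. class_weight="balanced", compared with 5-fold CV
for cw in (None, "balanced"):
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000, class_weight=cw))
    scores = cross_validate(pipe, X_t, y, cv=5, scoring=("recall", "precision", "f1"))
    print(cw, {k[5:]: round(v.mean(), 3) for k, v in scores.items() if k.startswith("test_")})
```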

3. Model Building [6]

   1. Split the dataset into training and test sets. Answers without justification will not be awarded marks.
      Case 1: Train = 80%, Test = 20%; [_train1, _train1] = 80%; [_test1, _test1] = 20%
      Case 2: Train = 10%, Test = 90%; [_train2, _train2] = 10%; [_test2, _test2] = 90%
   2. Explore k-fold cross-validation.
   3. Build model(s) using:
      1) Logistic Regression
      2) MLE
      3) Any other appropriate model.
   4. Explore the need for regularization and incorporate a few relevant techniques for the problem statement.
   5. Compare models with and without regularization in a tabular format and justify the findings.
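A sketch covering the two splits, k-fold cross-validation, the three model types and a regularization comparison table. Several choices here are assumptions: the variable names `X_train1`/`y_train1` etc. are ours; splits are stratified so both parts keep the roughly 37% malignant rate; "MLE" is read as logistic regression fitted from scratch by gradient ascent on the log-likelihood (adapt if your course means something else); `C=1e6` stands in for "no regularization" because the spelling for disabling the penalty has changed across scikit-learn versions; and a random forest is the "other" model.

```python
import numpy as np
import pandas as pd
from scipy.special import expit
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # recode: 1 = malignant, 0 = benign

# Case 1 (80/20) and Case 2 (10/90), stratified to preserve the class ratio
X_train1, X_test1, y_train1, y_test1 = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=42)

# Stratified 5-fold cross-validation, used on the Case 1 training set
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)


def fit_logistic_mle(X, y, lr=0.1, n_iter=3000):
    """Unpenalized logistic regression: gradient ascent on the mean log-likelihood."""
    Xb = np.c_[np.ones(len(X)), X]  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = expit(Xb @ w)  # P(malignant | x)
        w += lr * Xb.T @ (y - p) / len(y)  # gradient of the mean log-likelihood
    return w


scaler = StandardScaler().fit(X_train1)
w = fit_logistic_mle(scaler.transform(X_train1), y_train1)
p_test = expit(np.c_[np.ones(len(X_test1)), scaler.transform(X_test1)] @ w)
mle_test_acc = float(np.mean((p_test >= 0.5) == y_test1))
print(f"MLE from scratch, Case 1 test accuracy: {mle_test_acc:.3f}")

models = {
    "LogReg, C=1e6 (~no penalty)": LogisticRegression(C=1e6, max_iter=10000),
    "LogReg, L2, C=1": LogisticRegression(C=1.0, max_iter=10000),
    "LogReg, L1, C=1": LogisticRegression(penalty="l1", C=1.0, solver="liblinear"),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

rows = []
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    cv_acc = cross_val_score(pipe, X_train1, y_train1, cv=cv).mean()
    pipe.fit(X_train1, y_train1)
    case2 = make_pipeline(StandardScaler(), clone(clf)).fit(X_train2, y_train2)
    fitted = pipe[-1]  # the fitted classifier step
    rows.append({
        "model": name,
        "cv acc (1)": cv_acc,
        "train acc (1)": pipe.score(X_train1, y_train1),
        "test acc (1)": pipe.score(X_test1, y_test1),
        "test acc (2)": case2.score(X_test2, y_test2),
        "nonzero coefs": np.count_nonzero(fitted.coef_) if hasattr(fitted, "coef_") else np.nan,
    })

table = pd.DataFrame(rows).set_index("model").round(3)
print(table.to_string())
```

Reading the table: a large train/test gap for the near-unpenalized model, or fewer nonzero coefficients under L1 at similar accuracy, would be the kind of evidence the comparison asks you to justify. Case 2 shows how much accuracy is lost when only about 56 rows are available for training.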

4. Performance Evaluation [4]

   1. Predict on the test data and display the results for inference. Calculate all the evaluation metrics and choose the best one for your model. Justify your answer. Answers without justification will not be awarded marks.
   2. Comment on whether the model is underfitting, overfitting, or just right. Justify your comment. Answers without justification will not be awarded marks.
   3. Submission: Upload only two files to Canvas, without zipping them: the .ipynb file and an HTML or PDF export of the .ipynb file with its output.
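A sketch of the evaluation step, assuming an L2 logistic regression on the Case 1 split (any model from the previous step could be substituted). It reports the confusion matrix, per-class precision/recall/F1, accuracy and ROC-AUC, with malignant treated as the positive class. One common argument for metric choice in this setting is that a missed malignancy (false negative) is costlier than a false alarm, which favours recall on M alongside ROC-AUC. For the fit diagnosis, it compares train and test accuracy and prints a learning curve: high train with much lower validation accuracy suggests overfitting, low values on both suggest underfitting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # recode: 1 = malignant (positive class), 0 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # P(malignant)

cm = confusion_matrix(y_test, y_pred)  # rows = true B/M, columns = predicted B/M
print(cm)
print(classification_report(y_test, y_pred, target_names=["B", "M"]))
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision (M)": precision_score(y_test, y_pred),
    "recall (M)": recall_score(y_test, y_pred),
    "f1 (M)": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
print({k: round(v, 3) for k, v in metrics.items()})

# Under/overfitting check: train vs. test accuracy, then a learning curve
train_acc, test_acc = model.score(X_train, y_train), model.score(X_test, y_test)
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:+.3f}")
sizes, tr_scores, va_scores = learning_curve(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000)),
    X_train, y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5),
    shuffle=True, random_state=42)
for n, tr, va in zip(sizes, tr_scores.mean(axis=1), va_scores.mean(axis=1)):
    print(f"n_train={n:4d}  train acc={tr:.3f}  val acc={va:.3f}")
```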

5. Model Deployment [7]

   1. Study and compare 4-5 methods/tools for deploying ML models.
   2. Persist (save) and deploy the model you built in assignment 1, using one of the methods/tools studied in (1). The deployment solution should accept HTTP requests with new feature values, query the saved model, and return the result to the user.
   3. Submission: A PPT (max 5 slides) for Part 1 and the source code for Part 2.
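Common options worth comparing for Part 1 include a Flask or FastAPI web service, a Docker container wrapping such a service, managed cloud endpoints (for example AWS SageMaker or Google Vertex AI), and Streamlit for demo front ends. To stay dependency-free, the sketch below uses only Python's standard `http.server` plus `joblib` (installed with scikit-learn) for persistence; it is a teaching sketch, not a production server. The `/predict` route, the JSON shape `{"features": [30 numbers]}`, the response fields and the file name `wdbc_model.joblib` are all hypothetical choices, and the model is retrained on all rows here only to keep the example self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MODEL_PATH = "wdbc_model.joblib"  # hypothetical file name

# 1. Train and persist (save) the model; y is recoded so 1 = malignant
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000)).fit(X, 1 - y)
joblib.dump(model, MODEL_PATH)

# 2. Load the saved model once, when the service starts
loaded = joblib.load(MODEL_PATH)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))["features"]
            if len(features) != 30:
                raise ValueError("expected 30 feature values")
            proba = float(loaded.predict_proba([features])[0, 1])
        except (KeyError, ValueError, TypeError) as exc:
            self.send_error(400, str(exc))
            return
        body = json.dumps({"diagnosis": "M" if proba >= 0.5 else "B",
                           "p_malignant": proba}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def serve(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# 3. Query the running service the way an HTTP client would
server = serve()
url = f"http://127.0.0.1:{server.server_address[1]}/predict"
req = urllib.request.Request(url, data=json.dumps({"features": X[0].tolist()}).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)
server.shutdown()
server.server_close()
```

To run it as a standalone service instead, replace step 3 with `HTTPServer(("0.0.0.0", 8000), PredictHandler).serve_forever()` and send a POST with a 30-value `features` list to port 8000.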

6. Presentation and Viva [5]
