Question: Breast Cancer Wisconsin (Diagnostic) Data Set Classification Exercise

Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set
Classification Exercise: Predict the Diagnosis of Breast Cancer Based on Diagnostic Features.
Description:
The Breast Cancer Wisconsin dataset contains data on various diagnostic features of breast tumors. The goal is to predict whether a tumor is benign or malignant based on features derived from a digitized image of a fine needle aspirate (FNA) of a breast mass.
Target Variable: Diagnosis (M = malignant, B = benign)
Import Libraries/Dataset
1. Download the dataset
2. Import the required libraries
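A minimal sketch of the import step, assuming scikit-learn's bundled copy of the dataset (`load_breast_cancer`) is an acceptable stand-in for the downloaded file. scikit-learn encodes the target as 0 = malignant and 1 = benign, so the sketch maps it back to the M/B labels; the column name `diagnosis` is chosen here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: the bundled scikit-learn copy replaces the downloaded file.
data = load_breast_cancer(as_frame=True)
df = data.frame.copy()  # 30 feature columns plus an integer "target" column

# scikit-learn uses 0 = malignant, 1 = benign; restore the M/B labels.
df["diagnosis"] = df["target"].map({0: "M", 1: "B"})
df = df.drop(columns="target")

# Sanity check: dimensions and the first two rows.
print(df.shape)
print(df.head(2))
```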
1. Data Visualization and Exploration [4 M]
1. Print 2 rows as a sanity check to identify all the features present in the dataset and confirm that the target lines up with them.
2. Comment on class imbalance with appropriate visualization method.
3. Provide appropriate visualizations to get an insight about the dataset.
4. Perform a correlation analysis on the dataset and provide a visualization of it. Will this correlation analysis affect the feature selection that you perform in the next step? Justify your answer. Answers without justification will not be awarded marks.
5. Any other visualization specific to the problem statement.
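The class balance and correlation tasks above could be approached along these lines. This is a sketch, not a complete exploration: it assumes the scikit-learn copy of the dataset, uses plain matplotlib, and the 0.9 correlation cutoff is an arbitrary illustrative threshold.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target  # y: 0 = malignant, 1 = benign

# Class balance: count each diagnosis and compute majority/minority ratio.
counts = y.map({0: "M", 1: "B"}).value_counts()
imbalance_ratio = counts.max() / counts.min()

# Pearson correlation between features; keep only the upper triangle so
# each pair is listed once, then pick the strongly correlated pairs.
corr = X.corr()
upper = corr.abs().where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_pairs = upper.stack().loc[lambda s: s > 0.9].sort_values(ascending=False)

# Bar chart of the classes next to a heatmap of the correlation matrix.
fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(14, 6))
ax_bar.bar(counts.index, counts.values)
ax_bar.set_title("Diagnosis counts")
im = ax_heat.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax_heat.set_title("Feature correlation (Pearson)")
fig.colorbar(im, ax=ax_heat)
fig.tight_layout()

print(counts)
print(f"imbalance ratio: {imbalance_ratio:.2f}")
print(high_pairs.head())
```

Pairs that appear in `high_pairs` carry largely redundant information, which is one way to justify dropping one feature of each pair during feature selection.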
2. Data Pre-processing and cleaning [4 M]
1. Perform the appropriate pre-processing of the data, such as identifying NULL or missing values (if any), handling outliers if present in the dataset, correcting skewed data, etc. Mention the pre-processing steps performed in the markdown cell. Explore a few recent data balancing techniques and their effect on model evaluation parameters.
2. Apply appropriate feature engineering techniques to the features. Apply feature transformation techniques such as Standardization, Normalization, etc. You are free to apply the appropriate transformations depending upon the structure and the complexity of your dataset. Provide proper justification. Techniques used without justification will not be awarded marks. Explore a few techniques for identifying feature importance for your feature engineering task.
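One possible shape for the pre-processing checks is sketched below, again assuming the scikit-learn copy of the dataset. The skewness cutoff of 1, the 1.5 × IQR outlier rule, and the `log1p` transform are common rules of thumb chosen for illustration, not requirements of the assignment.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer(as_frame=True).data

# 1. Missing values per column.
missing_per_column = X.isna().sum()

# 2. Skewness per feature; treat skew > 1 as "strongly right-skewed".
skewness = X.skew().sort_values(ascending=False)
skewed_cols = skewness[skewness > 1].index

# 3. Outliers flagged with the 1.5 * IQR rule (counted, not removed).
q1, q3 = X.quantile(0.25), X.quantile(0.75)
iqr = q3 - q1
outlier_mask = (X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)
outliers_per_column = outlier_mask.sum()

# 4. log1p compresses the long right tails; it is defined here because
#    every feature in this dataset is non-negative.
X_t = X.copy()
X_t[skewed_cols] = np.log1p(X_t[skewed_cols])

# 5. Standardization to zero mean and unit variance. Fitted on all rows
#    only for illustration; in a real workflow fit on the training split.
X_scaled = StandardScaler().fit_transform(X_t)

print("missing values:", int(missing_per_column.sum()))
print("skewed features:", len(skewed_cols))
print(outliers_per_column.sort_values(ascending=False).head())
```

Standardization matters for models such as logistic regression with a penalty term, because the penalty treats all coefficients on the same scale.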
3. Model Building [6 M]
1. Split the dataset into training and test sets. Answers without justification will not be awarded marks.
Case 1: Train = 80%, Test = 20%: [x_train1, y_train1] = 80%; [x_test1, y_test1] = 20%
Case 2: Train = 10%, Test = 90%: [x_train2, y_train2] = 10%; [x_test2, y_test2] = 90%
2. Explore k-fold cross validation.
3. Build Model/s using 1) Logistic Regression 2) MLE 3) Any other appropriate model.
4. Explore the need for regularization and incorporate a few relevant techniques for the problem statement.
5. Compare models with and without regularization in a tabular format and justify the findings.
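The two split cases, k-fold cross validation, and the regularization comparison could be combined roughly as follows. This is a sketch under several assumptions: the scikit-learn copy of the dataset, stratified splits with a fixed `random_state`, 5 folds, and a very large `C` (weak L2 penalty) standing in for "no regularization" so the code does not depend on a particular scikit-learn version. An unpenalized logistic regression is a maximum likelihood fit, so the weak-penalty row approximates the MLE model.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Case 1: 80% train / 20% test. Case 2: 10% train / 90% test.
x_train1, x_test1, y_train1, y_test1 = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
x_train2, x_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.90, stratify=y, random_state=42)
splits = {
    "80/20": (x_train1, x_test1, y_train1, y_test1),
    "10/90": (x_train2, x_test2, y_train2, y_test2),
}

def make_model(C):
    # Scaling lives inside the pipeline so it is refit on each training fold.
    return make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=5000))

# Smaller C means a stronger L2 penalty; C=1e6 is effectively unpenalized.
settings = {"weak penalty (C=1e6)": 1e6, "L2 (C=1.0)": 1.0, "L2 (C=0.1)": 0.1}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
rows = []
for name, C in settings.items():
    cv_acc = cross_val_score(make_model(C), X, y, cv=cv).mean()
    for split_name, (x_tr, x_te, y_tr, y_te) in splits.items():
        model = make_model(C).fit(x_tr, y_tr)
        rows.append({
            "model": name,
            "split": split_name,
            "train_acc": model.score(x_tr, y_tr),
            "test_acc": model.score(x_te, y_te),
            "cv5_acc": cv_acc,
        })

results = pd.DataFrame(rows)
print(results.round(3).to_string(index=False))
```

A large gap between `train_acc` and `test_acc`, especially in the 10/90 case, is the kind of evidence that can support the justification for regularization.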
4. Performance Evaluation [4 M]
1. Run predictions on the test data and display the results. Calculate all the evaluation metrics and choose the best one for your model. Justify your answer. Answers without justification will not be awarded marks.
2. Comment on underfitting/overfitting/just right model. Justify your comment. Answers without justification will not be awarded marks.
3. Submission: Upload only two files to Canvas, without zipping them: the ipynb file, and an html or pdf file with the output of the ipynb file.
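The evaluation step might look like the sketch below. It assumes the scikit-learn copy of the dataset, the 80/20 split, and a default L2 logistic regression; the target is recoded so that malignant is the positive class (1), which makes recall read as "share of malignant tumors detected".

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = (y == 0).astype(int)  # recode: 1 = malignant (positive), 0 = benign

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
y_score = model.predict_proba(x_test)[:, 1]  # probability of malignant

# Rows are true classes, columns are predictions, in order [benign, malignant].
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}

# A train/test gap near zero suggests a reasonable fit; a large positive
# gap points to overfitting, and low scores on both to underfitting.
train_accuracy = model.score(x_train, y_train)
gap = train_accuracy - metrics["accuracy"]

print(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"train/test accuracy gap: {gap:.3f}")
```

For a diagnostic setting, recall on the malignant class is often argued to be the metric to prioritize, since a missed malignant tumor is usually costlier than a false alarm; that choice still needs to be justified in the submission.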
5. Model Deployment [7 M]
1. Study and compare 4-5 methods/tools for deploying ML models.
2. Persist (save) and deploy the model you have built in assignment 1, using one of the methods/tools studied in (1). The deployment solution should be capable of accepting HTTP requests with new feature values, querying the saved model and returning the result back to the user.
3. Submission: A ppt (max 5 slides) for Part 1 and source code for Part 2.
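As a rough illustration of the persist-and-serve requirement, the sketch below saves a model with `joblib` and exposes it through Python's standard-library `http.server`. This is a dependency-free stand-in for whichever deployment tool is chosen, not a recommendation. The `/predict` path, the JSON shape `{"features": [...]}`, and the names `train_and_save`, `PredictHandler`, and `make_server` are all hypothetical.

```python
import json
import os
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_save(path):
    """Fit a scaled logistic regression and persist it to disk."""
    X, y = load_breast_cancer(return_X_y=True)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    joblib.dump(model.fit(X, y), path)


class PredictHandler(BaseHTTPRequestHandler):
    model = None  # assigned by make_server before serving

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            features = [float(v) for v in payload["features"]]
            if len(features) != 30:
                raise ValueError("expected 30 feature values")
        except (KeyError, TypeError, ValueError) as exc:
            self.send_error(400, str(exc))
            return
        label = int(self.model.predict([features])[0])
        # scikit-learn encoding: 0 = malignant, 1 = benign.
        body = json.dumps({"diagnosis": "M" if label == 0 else "B"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def make_server(model_path, port=0):
    """Load the saved model and bind a server (port 0 = any free port)."""
    PredictHandler.model = joblib.load(model_path)
    return HTTPServer(("127.0.0.1", port), PredictHandler)


# Demo: persist, serve in a background thread, send one request, shut down.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.joblib")
    train_and_save(model_path)
    server = make_server(model_path)
threading.Thread(target=server.serve_forever, daemon=True).start()

sample = load_breast_cancer().data[0].tolist()  # first record of the dataset
request = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": sample}).encode(),
    headers={"Content-Type": "application/json"},
)
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
with opener.open(request) as response:
    result = json.loads(response.read())
server.shutdown()
server.server_close()
print(result)
```

Persisting the whole pipeline, rather than the classifier alone, keeps the scaling step attached to the model so the service can accept raw feature values.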
6. Presentation and Viva [5 M]
