Question:
Dataset: Breast Cancer Wisconsin Diagnostic Data Set
Classification Exercise: Predict the Diagnosis of Breast Cancer Based on Diagnostic Features.
Description:
The Breast Cancer Wisconsin dataset contains data on various diagnostic features of breast tumors. The goal is to predict whether a tumor is benign or malignant based on features derived from a digitized image of a fine needle aspirate (FNA) of a breast mass.
Target Variable: Diagnosis (M = malignant, B = benign)
Import Libraries/Dataset
Download the dataset
Import the required libraries
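One possible way to get started is sketched below. It assumes the copy of the dataset bundled with scikit-learn (loaded with load_breast_cancer) rather than a downloaded CSV, and the column name "diagnosis" is chosen here for illustration. Note that this copy encodes the target as 0 = malignant and 1 = benign, so the sketch maps it back to the M/B labels used in the exercise.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# scikit-learn ships a copy of the Wisconsin Diagnostic data
# (569 rows, 30 numeric features).
data = load_breast_cancer(as_frame=True)
df = data.frame.copy()

# In this copy the target is encoded 0 = malignant, 1 = benign.
# Map it back to the M/B labels used in the exercise.
# "diagnosis" is a column name chosen for this sketch.
df["diagnosis"] = df["target"].map({0: "M", 1: "B"})
df = df.drop(columns="target")

print(df.shape)
print(df.head())
```

If the CSV is downloaded instead, pd.read_csv on the local file would replace the loader call; the column names in that file may differ from the ones shown here.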
Data Visualization and Exploration M
Print rows for sanity check to identify all the features present in the dataset and if the target matches with them.
Comment on class imbalance with appropriate visualization method.
Provide appropriate visualizations to get an insight about the dataset.
Do the correlational analysis on the dataset. Provide a visualization for the same. Will this correlational analysis have an effect on feature selection that you will perform in the next step? Justify your answer. Answers without justification will not be awarded marks.
Any other visualization specific to the problem statement.
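The exploration tasks above could be approached along these lines. This is a sketch on the scikit-learn copy of the data: it computes the class counts that a bar chart would show and the correlation matrix that a heatmap would show, then lists strongly correlated feature pairs. The 0.9 cut-off is an arbitrary assumption for illustration, not a value given in the exercise.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target.map({0: "M", 1: "B"})  # 0 = malignant, 1 = benign in this copy

# Class balance: absolute counts and proportions per diagnosis.
counts = y.value_counts()
print(counts)
print((counts / counts.sum()).round(3))

# Absolute pairwise Pearson correlation between the 30 features.
corr = X.corr().abs()

# Keep only the upper triangle so each pair is listed once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# Pairs above an (assumed) threshold are candidates for dropping one member.
high_pairs = pairs[pairs > 0.9]
print(high_pairs.head(10))
```

The counts can be drawn with counts.plot(kind="bar") and the matrix with a heatmap (for example seaborn.heatmap(corr)), if those plotting libraries are available. Pairs such as radius, perimeter and area carry nearly the same information, which is one justification for letting the correlation analysis inform feature selection.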
Data Preprocessing and cleaning M
Do the appropriate preprocessing of the data, such as identifying NULL or missing values if any, handling outliers if present in the dataset, handling skewed data, etc. Mention the preprocessing steps performed in the markdown cell. Explore a few recent data balancing techniques and their effect on model evaluation parameters.
Apply appropriate feature engineering techniques for them. Apply the feature transformation techniques like Standardization, Normalization, etc. You are free to apply the appropriate transformations depending upon the structure and the complexity of your dataset. Provide proper justification. Techniques used without justification will not be awarded marks. Explore a few techniques for identifying feature importance for your feature engineering task.
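A minimal preprocessing sketch follows, again on the scikit-learn copy of the data. The choices in it are assumptions made for illustration: a skewness cut-off of 1, a log1p transform for skewed columns (valid here because all features are non-negative), standardization, and mutual information as one of several possible feature-importance measures.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# 1. Missing values: total count of NaNs across all columns.
missing = int(X.isna().sum().sum())
print("missing values:", missing)

# 2. Skewed features: log1p-transform columns with skewness above 1
#    (the threshold is an assumption for this sketch).
skew = X.skew().sort_values(ascending=False)
skewed_cols = skew[skew > 1].index
X_t = X.copy()
X_t[skewed_cols] = np.log1p(X_t[skewed_cols])

# 3. Standardization: zero mean, unit variance per feature.
#    For modelling, fit the scaler on the training split only
#    (for example inside a Pipeline) to avoid leakage.
X_scaled = StandardScaler().fit_transform(X_t)

# 4. Feature importance via mutual information with the target.
mi = mutual_info_classif(X_t, y, random_state=0)
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking.head(10))
```

For class balancing, a simple option that stays within scikit-learn is class_weight="balanced" on the classifier; resampling methods such as SMOTE live in the separate imbalanced-learn package and are not shown here.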
Model Building M
Split the dataset into training and test sets. Answers without justification will not be awarded marks.
Case : Train Test xtrain ytrain;
xtest ytest;
Case : Train Test xtrain ytrain;
xtest ytest
Explore kfold cross validation.
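The two split cases and the k-fold step might look like the sketch below. The split ratios are not legible in the text above, so 80/20 and 70/30 are used purely as assumed examples; the fold count of 5 and the random seed are also assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Two hold-out cases with assumed ratios; stratify keeps the
# benign/malignant proportions the same in both parts.
splits = {}
for name, test_size in {"80/20": 0.2, "70/30": 0.3}.items():
    splits[name] = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=42
    )
    x_train, x_test, y_train, y_test = splits[name]
    print(name, x_train.shape, x_test.shape)

# Stratified k-fold: the scaler sits inside the pipeline so it is
# re-fitted on the training part of every fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", scores.round(3))
print("mean:", scores.mean().round(3), "std:", scores.std().round(3))
```

Comparing the hold-out results with the cross-validated mean and spread is one way to justify the chosen split: a single split gives one estimate, while k-fold shows how much that estimate varies with the partition.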
Build models using Logistic Regression (MLE) and any other appropriate model.
Explore the need for regularization and incorporate a few relevant techniques for the problem statement.
Compare models with and without regularization in a tabular format and justify the findings.
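A comparison table of this kind could be produced as sketched below. Several things here are assumptions: the 80/20 split, the regularization strength C=1.0, and the use of a very large C (1e6) as a stand-in for "no regularization", since scikit-learn's LogisticRegression applies an L2 penalty by default and the way to switch it off has changed between versions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Very large C makes the penalty negligible (approximation).
    "no regularization (C=1e6)": LogisticRegression(C=1e6, max_iter=5000),
    "L2 (C=1.0)": LogisticRegression(C=1.0, max_iter=5000),
    "L1 (C=1.0)": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
}

rows = []
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(x_train, y_train)
    coef = pipe[-1].coef_.ravel()  # weights of the fitted classifier
    rows.append({
        "model": name,
        "train_acc": pipe.score(x_train, y_train),
        "test_acc": pipe.score(x_test, y_test),
        "coef_l2_norm": float(np.linalg.norm(coef)),
        "zero_coefs": int(np.sum(coef == 0)),
    })

table = pd.DataFrame(rows).set_index("model")
print(table.round(3))
```

The coefficient norm and the number of zeroed weights are included because they show what each penalty does: L2 shrinks all weights, while L1 can remove features entirely. A gap between training and test accuracy that narrows under regularization is the usual evidence for needing it.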
Performance Evaluation M
Do the prediction for the test data and display the results for the inference. Calculate all the evaluation metrics and choose best for your model. Justify your answer. Answers without justification will not be awarded marks.
Comment on whether the model is underfitting, overfitting, or just right. Justify your comment. Answers without justification will not be awarded marks.
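The evaluation step can be sketched as follows, assuming an 80/20 stratified split and a standardized logistic regression as the model under test. Because scikit-learn's copy of the data encodes malignant as 0, recall for the malignant class is computed with pos_label=0.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target  # 0 = malignant, 1 = benign
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
y_prob = model.predict_proba(x_test)[:, 1]  # probability of class 1 (benign)

# Rows are true classes, columns are predictions, ordered [malignant, benign].
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=data.target_names))

# A missed malignant tumor is the costly error, so report its recall.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
auc = roc_auc_score(y_test, y_prob)

# Train vs test accuracy: a large gap suggests overfitting,
# low values on both suggest underfitting.
train_acc = accuracy_score(y_train, model.predict(x_train))
test_acc = accuracy_score(y_test, y_pred)
print(f"malignant recall={malignant_recall:.3f}  AUC={auc:.3f}")
print(f"train acc={train_acc:.3f}  test acc={test_acc:.3f}")
```

In a diagnostic setting, recall on the malignant class is often argued to matter more than overall accuracy, since a false negative delays treatment; that argument is one way to justify the choice of metric.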
Submission: Only two files should be uploaded on Canvas, without zipping them. One is the ipynb file and the other is an html or pdf file with the output of the ipynb file.
Model Deployment M
Study and compare methods/tools for deploying ML models.
Persist (save) and deploy the model you have built in the assignment using one of the methods/tools studied. The deployment solution should be capable of accepting HTTP requests with new feature values, querying the saved model and returning the result back to the user.
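One lightweight way to meet the deployment requirement is sketched below, using only pickle and Python's built-in http.server alongside scikit-learn. This is an illustration, not a recommendation over frameworks such as Flask or FastAPI. The file name bc_model.pkl, the JSON shape {"features": [...]}, and the helper names are all hypothetical choices for this sketch.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MODEL_PATH = "bc_model.pkl"  # hypothetical file name


def train_and_save(path=MODEL_PATH):
    """Fit a scaled logistic regression and persist it to disk."""
    X, y = load_breast_cancer(return_X_y=True)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path=MODEL_PATH):
    """Load a saved model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def make_handler(model):
    """Build a request handler class bound to the given model."""
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                features = json.loads(self.rfile.read(length))["features"]
                label = int(model.predict([features])[0])
                # scikit-learn's copy uses 0 = malignant, 1 = benign.
                status, body = 200, {"diagnosis": "B" if label == 1 else "M"}
            except (ValueError, KeyError, TypeError) as exc:
                status, body = 400, {"error": str(exc)}
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return PredictHandler


def make_server(model, host="127.0.0.1", port=8000):
    """Create (but do not start) an HTTP server that serves predictions."""
    return HTTPServer((host, port), make_handler(model))
```

To run it, call train_and_save() once, then make_server(load_model()).serve_forever(), and POST a JSON body such as {"features": [30 numeric values]} to the server. The response is a JSON object with the predicted diagnosis, or an error message with status 400 if the input is malformed.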
Submission: A ppt max slides for Part and source code for Part
Presentation and Viva M
