Question: The black dot is the new instance we are going to predict with K-nearest Neighbour. Using Euclidean Distance, calculate its distances to its 3 nearest
- The black dot is the new instance to be classified with K-Nearest Neighbors. Using Euclidean distance, calculate its distances to its 3 nearest neighbors. What shape (class) is assigned to the dot if k = 3? What shape would it be if k = 5? 10 points
- Download the dataset for this assignment at https://drive.google.com/file/d/1HCvBMHk1Xc1deEFJF3TrT7GOEmAACbUN/view?usp=sharing. This dataset contains 100 patients' records with their diagnostic results. The variable 'diagnosis_result' is the target variable: malignant (M) or benign (B). Check the number of null values. Drop the column 'id'. Replace the target variable with numerical values: 1 for malignant, 0 for benign. 10 points
- Use boxplots and an ANOVA test to check the relationship between each feature and the target variable. Drop the columns that show no significant difference between group 'M' and group 'B'. 10 points
- Use correlation analysis to drop the columns that are strongly correlated with other columns. (Note: If two features are strongly correlated with each other, should you drop both or only one of the two?) 10 points
- Explain in your own words why feature scaling is important to the KNN algorithm. Normalize all the features. 10 points
- Build a K-Nearest Neighbors model with k = 3. Use 10-fold cross-validation to obtain the average accuracy score for the KNN model when k = 3. 10 points
- Set the k values to the odd numbers from 1 to 15. Explain why k should not be an even number in this case. 10 points
- Build KNN prediction models with the preset k values. Use cross-validation to obtain the average accuracy score for each KNN model. Find the optimal k. 20 points
- Besides accuracy, there are other metrics to evaluate a classification model, which are especially helpful when the data is imbalanced. Learn about precision, recall, and F1 score with the materials below. Which scoring metric fits this prediction problem best? Change the scoring option for cross-validation to f1. Did the optimal k value change? 10 points
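For the first question, the figure with the black dot is not reproduced here, so the coordinates and shape labels below are hypothetical, and `euclidean` and `knn_predict` are names made up for this illustration. The sketch only shows the mechanics: compute the Euclidean distance to every labelled point, keep the k closest, and take a majority vote.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k):
    """Return (majority label, the k nearest (point, label) pairs)."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical labelled points; NOT the points from the assignment's figure.
train = [
    ((1, 0), "triangle"),
    ((1, 1), "square"),
    ((0, 2), "triangle"),
    ((3, 0), "square"),
    ((0, -4), "square"),
    ((5, 5), "triangle"),
]
query = (0, 0)  # stands in for the black dot

for k in (3, 5):
    label, neighbors = knn_predict(train, query, k)
    distances = [round(euclidean(p, query), 3) for p, _ in neighbors]
    print(f"k={k}: predicted {label}; neighbor distances {distances}")
```

With these made-up points, the three nearest neighbors are two triangles and one square, so k = 3 predicts a triangle, while k = 5 brings in two more squares and flips the vote to a square. With two classes, an even k (here k = 4) can produce a 2-2 tie, which is the situation an odd k avoids.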
Precision, Recall and F1-Score: https://youtu.be/sJR-1yz7mnI
Precision, Recall and Predicting Cervical Cancer with Machine Learning: https://proclusacademy.com/blog/explainer/precision-recall-f1-score-classification-models/
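The scaling, cross-validation, and k-selection steps above can be sketched without any library as follows. This is a minimal illustration, not the assignment's solution: the data is synthetic (the real dataset's feature columns are not listed here), the helper names `min_max_scale`, `knn_predict`, `accuracy`, `f1_score`, and `cross_validate` are made up for this example, and the folds are formed by simple striding rather than random shuffling. In practice a library such as scikit-learn would normally do this work.

```python
import math
import random
from collections import Counter

def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training rows closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cross_validate(X, y, k, n_folds=10, metric=accuracy):
    """Average metric over n_folds; fold f holds rows f, f+n_folds, ..."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        test_rows = sorted(test_idx)
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test_rows]
        scores.append(metric([y[i] for i in test_rows], preds))
    return sum(scores) / n_folds

# Synthetic stand-in for the 100 patient records (assumption, not real data):
# two features on very different scales, label 1 = malignant, 0 = benign.
rng = random.Random(0)
y = [1 if i % 3 else 0 for i in range(100)]
raw_X = [[rng.gauss(10 + 3 * label, 2), rng.gauss(500 + 200 * label, 150)]
         for label in y]

# Scaling the whole dataset first follows the assignment's order of steps;
# fitting the scaler inside each training fold would be the stricter choice.
X = min_max_scale(raw_X)

k_values = list(range(1, 16, 2))  # odd k from 1 to 15
acc_by_k = {k: cross_validate(X, y, k, metric=accuracy) for k in k_values}
f1_by_k = {k: cross_validate(X, y, k, metric=f1_score) for k in k_values}
best_k_acc = max(k_values, key=acc_by_k.get)
best_k_f1 = max(k_values, key=f1_by_k.get)

for k in k_values:
    print(f"k={k:2d}  accuracy={acc_by_k[k]:.3f}  f1={f1_by_k[k]:.3f}")
print("best k by accuracy:", best_k_acc, "| best k by F1:", best_k_f1)
```

Without the scaling step, the second synthetic feature (values in the hundreds) would dominate every distance and the first feature would barely influence which neighbors are chosen, which is the reason the assignment asks for normalization before KNN. Whether the best k differs between accuracy and F1 depends on the data, so the two results should be compared on the real dataset rather than inferred from this sketch.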
