

Help with Exercise 2
Exercise 1 for Reference:
Exercise 1: a) Use the Machine Learning algorithms k-NN and Naive Bayes to classify multiphase flow patterns, using the database BDOShohamIML.csv, and evaluate the performance. b) Apply parameter optimization to (a) and evaluate the performance. c) Explain the Confusion Matrix and metrics obtained in (a) and (b), that is, before and after parameter optimization.
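As general context for this kind of exercise, the classify-and-tune workflow in (a)–(b) can be sketched end to end. This is a minimal sketch, not the graded solution: BDOShohamIML.csv is not available here, so synthetic stand-in data is generated, and the assumed layout (feature columns first, flow-pattern label last) is a guess.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Stand-in for the real database; in the exercise you would instead do
#   data = pd.read_csv('BDOShohamIML.csv')
#   X, y = data.iloc[:, :-1].values, data.iloc[:, -1].values   # assumed layout
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))             # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # two synthetic "flow patterns"

# Hold out a test set so reported metrics are not training accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# a) baseline classifiers
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
nb = GaussianNB().fit(X_tr, y_tr)
print('k-NN test accuracy:', accuracy_score(y_te, knn.predict(X_te)))
print('NB   test accuracy:', accuracy_score(y_te, nb.predict(X_te)))

# b) parameter optimization via cross-validated grid search
grid = GridSearchCV(
    KNeighborsClassifier(),
    {'n_neighbors': range(1, 10), 'weights': ['uniform', 'distance']},
    cv=5).fit(X_tr, y_tr)
print('best k-NN params:', grid.best_params_)
print('tuned test accuracy:', accuracy_score(y_te, grid.predict(X_te)))
print(confusion_matrix(y_te, grid.predict(X_te)))
```

The key design choice versus the worked code below is the held-out test split: scoring on unseen data gives an honest before/after comparison for part (c).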
Code for Exercise 1:



In [14]:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv('Data_Glioblastoma5Patients_SC.csv')
print('Shape:', data.shape)
data.head()
```

```
Shape: (430, 5949)
Out [14]:
       A2M      AAAS      AAK1      AAMP      AARS    AARSD1     AASDH  AASDHPPT      AA...
0 -3.80147 -3.889900 -3.985616  2.651558  2.170748 -2.550822  4.807330  3.961170  -0.1926
1 -3.80147 -3.889900 -3.158708  2.358992 -6.041792 -0.056092  3.606735 -2.632250   2.2493
2 -3.80147 -3.889900  1.733125 -5.820241 -6.041792 -0.576957 -2.473517 -4.354127   0.063
3 -3.80147 -3.889900 -1.665669  3.514271 -6.041792 -3.699171  4.509461 -4.354127   2.985
4 -3.80147  3.742495 -2.166992 -5.820241  2.094729  4.021873  5.535007  4.019633   2.5603
5 rows x 5949 columns
```

In [22]:

```python
# k-NN code
# a)
clf1 = KNeighborsClassifier(n_neighbors=3).fit(data.iloc[:, :-1].values,
                                               data.iloc[:, -1:].values.ravel())
y_pred = clf1.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.9506607929515418
Out [22]:
array([[ 973,    0,   26,    0,   34,    0],
       [   0,  121,    1,    3,    0,    0],
       [   0,    1,  550,   41,    0,    2],
       [  67,    9,   38, 2768,    4,   19],
       [   0,    0,    0,    2,  136,    2],
       [  20,    2,    1,    0,    8,  847]], dtype=int64)
```

In [15]:

```python
# b)
leaf_size = list(range(1, 10))
n_neighbors = list(range(1, 5))
hyperparameters = dict(leaf_size=leaf_size, n_neighbors=n_neighbors)
clf2 = KNeighborsClassifier()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
print('Best leaf_size:', best_model.best_estimator_.get_params()['leaf_size'])
print('Best n_neighbors:', best_model.best_estimator_.get_params()['n_neighbors'])
```

In [ ]:

```python
clf4 = KNeighborsClassifier(n_neighbors=3, leaf_size=1).fit(data.iloc[:, :-1].values,
                                                            data.iloc[:, -1:].values.ravel())
y_pred = clf4.predict(data.iloc[:, :-1].values)  # fixed: the original cell predicted
                                                 # with clf1, the unoptimized model
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
print('Confusion Matrix:')
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

In [26]:

```python
# Naive Bayes code
# a)
clf1 = GaussianNB().fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
y_pred = clf1.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.6754185022026432
Out [26]:
array([[ 879,    0,    0,  143,    1,   10],
       [   0,  121,    0,    4,    0,    0],
       [   1,    3,  471,  115,    4,    0],
       [ 124,   53,  192, 2228,  240,   68],
       [   0,    0,    0,   11,  129,    0],
       [ 307,    0,    9,  488,   69,    5]], dtype=int64)
```

In [27]:

```python
# b)
hyperparameters = {'var_smoothing': np.logspace(0, -9, num=100)}
clf2 = GaussianNB()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
print('Best var_smoothing:', best_model.best_estimator_.get_params()['var_smoothing'])
```

```
Best var_smoothing: 1.873817422860387e-09
```

In [20]:

```python
clf4 = GaussianNB(var_smoothing=1.2328467394420635e-09).fit(data.iloc[:, :-1].values,
                                                            data.iloc[:, -1:].values.ravel())
y_pred = clf4.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
print('Confusion Matrix:')
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.6755947136563877
Confusion Matrix:
Out [20]:
array([[ 879,    0,    0,  143,    1,   10],
       [   0,  121,    0,    4,    0,    0],
       [   1,    2,  471,  116,    4,    0],
       [ 124,   52,  192, 2229,  240,   68],
       [   0,    0,    0,   11,  129,    0],
       [ 307,    0,    9,  488,   69,    5]], dtype=int64)
```

c) Explain the Confusion Matrix and metrics before and after (a) and (b)

The accuracy score for k-NN was essentially identical before and after parameter optimization, high at about 0.95. The accuracy score for Naive Bayes was also very close before and after parameter optimization, around 0.68 in both cases. These results indicate that k-NN achieves considerably higher classification accuracy than Naive Bayes on this dataset. Note, however, that both models are scored on the same data they were trained on, so these accuracies are optimistic; evaluating on a held-out test set would give a fairer comparison.
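For part (c), accuracy alone hides per-class behavior. Per-class precision, recall, and F1 can be read directly off a confusion matrix: true positives sit on the diagonal, column sums count predictions per class, and row sums count true members per class. A small sketch with a made-up 3-class matrix (not the exercise's data):

```python
import numpy as np

# Made-up 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[50,  2,  3],
               [ 4, 40,  1],
               [ 6,  0, 44]])

tp = np.diag(cm)                   # true positives per class
precision = tp / cm.sum(axis=0)    # column sums = everything predicted as that class
recall = tp / cm.sum(axis=1)       # row sums = everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()     # diagonal over grand total

print('precision:', np.round(precision, 3))
print('recall:   ', np.round(recall, 3))
print('f1:       ', np.round(f1, 3))
print('accuracy: ', round(accuracy, 3))
```

With real labels and predictions in hand, `sklearn.metrics.classification_report(y_true, y_pred)` prints the same per-class metrics in one call.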
