Question 5 - Machine Learning General [20 pts.]

5.1 [10 pts.] For this question, please use the dataset provided below in Table 1. Assume that we are using Euclidean distance as the distance metric. The first column is the patient id, the second column is a clinical measurement denoted by X for that patient, and the last column is the label for the disease state Y of the patient. The + label indicates the patient has a particular disease, while the - label indicates the patient is disease-free.

Patient id  X  Y
1 1 2 7 10 + 4 16 5 6 25 - 7 32 8 9 41 + 10 49 +
Table 4: Patient data

What is the leave-one-out cross-validation error rate of a 1-Nearest Neighbour (1-NN) classifier that predicts the patient state Y given X on this dataset? Please list the ids of the patients that are misclassified and write the error rate clearly.

5.2 [5 pts.] Suppose you are given w = (w_0, ..., w_d), the feature weights for a logistic regression model that predicts whether the customer tone in an email is angry (y = 1) or not (y = 0). You get the feature values for a new customer email and find that w_0 + w_1 x_1 + ... + w_d x_d > 0. What can you say about the email?
a) The email is more likely to be an angry email.
b) The email is more likely to be NOT an angry email.
c) The email is an angry email.
d) The email is NOT an angry email.

5.3 [5 pts.] Given three clusters, A, B and C, containing a total of six points, where each point is defined by an integer value in one dimension:
A = {0, 20, 60}
B = {30, 90}
C = {110}
Which two clusters will be merged at the next iteration when using Euclidean distance and the single linkage (min) algorithm?
a) Merge A and B.
b) Merge B and C.
c) Merge A and C.
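The three parts can be checked with a short script. What follows is a minimal sketch, not a reference solution: the function names (`loocv_1nn`, `p_positive`, `single_linkage`) are made up for illustration, the 1-NN tie-breaking rule (lowest index wins) is an assumption, and the toy data and weights are invented because the table above did not survive intact. Only the clusters A, B and C are taken from the question.

```python
import math
from itertools import combinations


def loocv_1nn(xs, ys):
    """Leave-one-out 1-NN on 1-D data.

    Returns (1-based ids of misclassified points, error rate).
    Assumption: ties in distance go to the lowest-index neighbour.
    """
    wrong = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Nearest neighbour among all points except the held-out one.
        others = (k for k in range(len(xs)) if k != i)
        j = min(others, key=lambda k: abs(xs[k] - x))
        if ys[j] != y:
            wrong.append(i + 1)
    return wrong, len(wrong) / len(xs)


def p_positive(w, x):
    """P(y = 1 | x) for logistic regression; w[0] is the bias term."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def single_linkage(a, b):
    """Single-linkage (min) distance between two 1-D clusters."""
    return min(abs(p - q) for p in a for q in b)


# 5.1: hypothetical toy data, NOT the values from the patient table.
toy_x = [1, 2, 8, 9, 15]
toy_y = ["-", "-", "+", "-", "-"]
wrong_ids, error_rate = loocv_1nn(toy_x, toy_y)
print("misclassified ids:", wrong_ids, "error rate:", error_rate)

# 5.2: hypothetical weights and features chosen so the linear score is positive.
print("P(angry):", p_positive([-1.0, 2.0, 0.5], [1.0, 0.0]))

# 5.3: clusters exactly as given in the question.
clusters = {"A": [0, 20, 60], "B": [30, 90], "C": [110]}
dists = {
    (m, n): single_linkage(clusters[m], clusters[n])
    for m, n in combinations(clusters, 2)
}
closest = min(dists, key=dists.get)
print("pairwise single-linkage distances:", dists)
print("closest pair:", closest)
```

For 5.2 the point of the sketch is that the sigmoid maps a positive linear score to a probability above 0.5, which supports a "more likely" statement and not a certain one. For 5.1, replacing `toy_x` and `toy_y` with the full patient table would give the requested ids and error rate.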
