Question:

(iii) Consider a dataset with a nominal target attribute (i.e., a nominal CLASS) and several predicting attributes. Suppose that the dataset contains 1000 instances and that the data instances in the dataset have been clustered into 10 clusters, each one containing roughly 100 instances. Let c1, c2, c3, ..., c9, c10 be the cluster centroids. The clustering has been performed using Euclidean distance over the predicting attributes (without using the target attribute).

Consider the following classification method. Given a test instance t and an integer k (k is much smaller than 100):

  o Find the closest centroid to the test instance using Euclidean distance over the predicting attributes.
  o Use Euclidean distance to select the k-nearest neighbours of t among those instances that belong to the cluster represented by the closest centroid.
  o Use those k selected data instances to classify the test instance.

Will this classification method always make the same prediction (only faster) for a test instance t as the prediction made by the k-nearest neighbour classifier based on the same Euclidean distance, but in which the k-nearest neighbours are computed over the entire dataset? (3 marks)

You must provide an example to support your answer. (5 marks)

Hint: The easiest way to do this is to consider a small dataset with just one attribute and 1-NN.

Note: Marks will be deducted if your answer is not neat and clearly legible.
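
Following the hint, the two methods can be compared on a tiny one-attribute dataset with 1-NN. The sketch below is only an illustration under stated assumptions: the four data values, the class labels "x" and "y", the two-cluster assignment and all function names are made up here and do not come from the question. It suggests that the answer is no: a test instance near a cluster boundary can have its true nearest neighbour in a different cluster from the one whose centroid is closest.

```python
from collections import Counter

# Hypothetical 1-D toy data: (attribute value, class label, cluster id).
# Cluster 0 = {0, 4} with centroid 2; cluster 1 = {7, 13} with centroid 10.
DATA = [
    (0.0, "x", 0),
    (4.0, "x", 0),
    (7.0, "y", 1),
    (13.0, "y", 1),
]


def centroids(data):
    """Mean attribute value of each cluster (the class label is not used)."""
    sums = {}
    for value, _, cid in data:
        total, count = sums.get(cid, (0.0, 0))
        sums[cid] = (total + value, count + 1)
    return {cid: total / count for cid, (total, count) in sums.items()}


def knn_predict(candidates, t, k):
    """Majority class among the k candidates closest to t (1-D Euclidean)."""
    nearest = sorted(candidates, key=lambda row: abs(row[0] - t))[:k]
    votes = Counter(label for _, label, _ in nearest)
    return votes.most_common(1)[0][0]


def full_knn(data, t, k):
    """Standard k-NN: neighbours are searched over the entire dataset."""
    return knn_predict(data, t, k)


def cluster_knn(data, t, k):
    """Proposed method: search only the cluster with the closest centroid."""
    cents = centroids(data)
    closest = min(cents, key=lambda cid: abs(cents[cid] - t))
    members = [row for row in data if row[2] == closest]
    return knn_predict(members, t, k)


t = 5.8  # closer to centroid 2 than to centroid 10, but nearest point is 7
print("cluster-restricted 1-NN:", cluster_knn(DATA, t, 1))
print("full-dataset 1-NN:      ", full_knn(DATA, t, 1))
```

With t = 5.8, the closest centroid is 2 (distance 3.8 versus 4.2), so the restricted search only sees the instances 0 and 4 and returns the class of 4. Over the whole dataset the nearest instance is 7 (distance 1.2 versus 1.8), which carries the other class, so the two predictions differ. For test instances well inside a cluster, such as t = 1, both methods agree.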
