Question: (iii) Consider a dataset with a nominal target attribute (i.e., a nominal CLASS) and several predicting attributes. Suppose that the dataset contains 1000 instances and that the data instances have been clustered into 10 clusters, each containing roughly 100 instances. Let c1, c2, c3, ..., c9, c10 be the cluster centroids. The clustering has been performed using Euclidean distance over the predicting attributes (without using the target attribute).

Consider the following classification method:

- Given a test instance t and an integer k (k is much smaller than 100):
  - Find the centroid closest to t using Euclidean distance over the predicting attributes.
  - Use Euclidean distance to select the k nearest neighbours of t among the instances that belong to the cluster represented by that closest centroid.
  - Use those k selected data instances to classify t.

Will this classification method always make the same prediction for a test instance t (only faster) as the k-nearest-neighbour classifier that uses the same Euclidean distance but computes the k nearest neighbours over the entire dataset? (3 marks) You must provide an example to support your answer. (5 marks)

Hint: The easiest way to do this is to consider a small dataset with just one attribute and 1-NN.

Note: Marks will be deducted if your answer is not neat and clearly legible.
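The hint can be made concrete with a small script. The following is a minimal Python sketch, not an official solution. It assumes a hypothetical one-attribute training set of four instances split into two clusters (A and B) with made-up class labels "no" and "yes". The function names `centroid`, `knn_predict`, `cluster_knn_predict` and `is_stable` are illustrative only. The clusters are chosen so that each point is closer to its own centroid than to the other one, which makes them a plausible result of clustering with Euclidean distance.

```python
from collections import Counter
from statistics import mean

# Hypothetical 1-D training set of (x, class) pairs, already split into two
# clusters. With a single attribute, Euclidean distance is just |x - t|.
clusters = {
    "A": [(0.0, "no"), (8.0, "no")],     # centroid 4.0
    "B": [(13.0, "yes"), (27.0, "yes")], # centroid 20.0
}

def centroid(points):
    """Mean of the attribute values in one cluster."""
    return mean(x for x, _ in points)

def is_stable(clusters):
    """True if every point is closest to its own cluster's centroid,
    i.e. a k-means reassignment step would not move any point."""
    cents = {c: centroid(pts) for c, pts in clusters.items()}
    return all(
        min(cents, key=lambda c: abs(cents[c] - x)) == own
        for own, pts in clusters.items()
        for x, _ in pts
    )

def knn_predict(points, t, k):
    """Plain k-NN: majority class among the k points closest to t."""
    nearest = sorted(points, key=lambda p: abs(p[0] - t))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cluster_knn_predict(clusters, t, k):
    """Method from the question: pick the closest centroid, then run
    k-NN only among the instances of that cluster."""
    best = min(clusters, key=lambda c: abs(centroid(clusters[c]) - t))
    return knn_predict(clusters[best], t, k)

all_points = [p for pts in clusters.values() for p in pts]
t = 11.0
print(is_stable(clusters))                   # True
print(cluster_knn_predict(clusters, t, k=1)) # no
print(knn_predict(all_points, t, k=1))       # yes
```

With t = 11 and k = 1, the closest centroid is that of cluster A (distance 7 to 4.0, versus 9 to 20.0). The cluster-restricted method therefore returns the class of x = 8 ("no"). Plain 1-NN over all four instances instead returns the class of x = 13 ("yes"), because that point lies just across the cluster boundary and is closer to t (distance 2 versus 3). Under these assumptions the two methods disagree, so the cluster-restricted search is faster but is not guaranteed to reproduce the full k-NN prediction.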

