k-Nearest Neighbor
The learning technique we have seen so far (i.e., Naive Bayes) has involved the analysis of a training set of examples in order to build a model for the task at hand. New examples were then classified using the learned model. We now consider the k-nearest neighbor algorithm, which does not explicitly build a model from training data. Instead, it stores the training instances so that they can be used to analyze new instances later.

As its name implies, the k-nearest neighbor algorithm searches the training examples to find those that are "closest" to a new example to be classified. It then uses these to determine the appropriate output for the new instance. k-nearest neighbor is an example of a class of learning techniques called instance-based methods. These methods are sometimes referred to as "lazy", because they put off the processing of examples until a new test instance is considered.
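To make this concrete, here is a minimal sketch of a k-nearest neighbor classifier in Python, assuming real-valued feature vectors and standard Euclidean distance; the training points and query below are hypothetical and exist only to illustrate the mechanics.

    from collections import Counter
    import math

    def euclidean(a, b):
        # Standard Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_classify(train, query, k):
        # "Lazy" learning: no model is built in advance; we sort the
        # stored training instances by distance to the query at
        # prediction time and take a majority vote among the k nearest.
        neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    # Hypothetical training set: (features, label) pairs.
    train = [((1.0, 1.0), "+"), ((1.5, 2.0), "+"),
             ((4.0, 4.0), "-"), ((4.5, 3.5), "-")]
    print(knn_classify(train, (2.0, 2.0), k=3))  # prints "+"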
Consider the set of training examples in the diagram below. A plus indicates a positive example and a star indicates a negative example. Assume that we are using standard Euclidean distance to compute nearest neighbors.
(a) How will the point be classified by the nearest neighbor classifier?
(b) How will the point be classified by the nearest neighbor classifier?
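Since the exercise's diagram is not reproduced here, the sketch below uses hypothetical coordinates purely as a stand-in; it shows how the same query point can receive different labels from a 1-nearest and a 3-nearest neighbor classifier under Euclidean distance, which is the effect parts (a) and (b) probe.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical points: the query's single nearest neighbor is
    # positive, but two of its three nearest neighbors are negative.
    X = np.array([[1.4, 1.5], [1.8, 1.5], [1.2, 1.5], [5.0, 5.0]])
    y = np.array(["+", "-", "-", "+"])
    query = np.array([[1.5, 1.5]])

    for k in (1, 3):
        clf = KNeighborsClassifier(n_neighbors=k)  # Euclidean by default
        clf.fit(X, y)
        print(f"k={k}: {clf.predict(query)[0]}")   # k=1 -> "+", k=3 -> "-"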
One of the problems with k-nearest neighbor learning is selecting a value for k. For this exercise, you will empirically determine a reasonable value for k given a specific training set. Say you are given the data set shown below. This is a binary classification task in which the instances are described by two real-valued attributes.
(a) Find a value of k (using Weka, MATLAB, or Python) that minimizes leave-one-out cross-validation error. What we'll be doing here is using our training set to help us select a good value for k. We take our training instances and divide them into two groups, so that some are used for training and others serve as validation (i.e., "test") examples. Specifically, for a given training set of n examples, we divide it into a set of n - 1 for training and a set of 1 for testing. There are obviously n ways to choose the held-out example (i.e., "leave one out"), and we will consider all of them.
For each training/validation split of the data (remember that there are n of these), we run the nearest neighbor algorithm for every possible value of k, from 1 to n - 1. We then compute how well we did at classifying the validation instances for each value of k. The k that performs best is selected as the value to use for future unseen test instances.
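A sketch of this selection loop in Python using scikit-learn (one of the tools the exercise suggests); the small arrays X and y below are hypothetical placeholders for the given data set:

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Placeholder data: two real-valued attributes, binary labels.
    X = np.array([[0.5, 1.0], [1.0, 1.2], [1.2, 0.8], [1.1, 1.1],
                  [2.8, 3.0], [3.0, 3.2], [3.3, 2.9], [2.9, 3.1]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    best_k, best_acc = None, -1.0
    for k in range(1, len(X)):  # each LOO split trains on n-1 examples
        clf = KNeighborsClassifier(n_neighbors=k)
        # Mean accuracy over all n training/validation splits.
        acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    print(f"best k = {best_k} (leave-one-out accuracy {best_acc:.2f})")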
(b) Why might using too large a value of k be bad for this data set? Why might using too small a value of k be bad for this data set?
