Question:
(a) The K-means algorithm with Euclidean distances is a popular and widely used method for data clustering. What basic assumption does K-means clustering make about the distribution of the data?
(b) Answer the following questions in the context of the K-means algorithm.
What are the inputs? Which parameters are usually specified by the user?
What objective function does the K-means algorithm minimise?
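For reference, the usual K-means objective is the within-cluster sum of squared Euclidean distances, $J = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2$, where $\boldsymbol{\mu}_k$ is the mean of cluster $C_k$. The Python sketch below assumes Lloyd's algorithm with $K$ randomly chosen data points as the initial centroids (the question does not fix an initialisation scheme). It shows the inputs (the data and $K$) and the two alternating steps that lower $J$. The helper names `kmeans` and `kmeans_objective`, and the `max_iter` and `seed` parameters, are hypothetical.

```python
import math
import random


def kmeans_objective(points, centroids, labels):
    """Within-cluster sum of squared Euclidean distances (the quantity K-means minimises)."""
    return sum(math.dist(p, centroids[j]) ** 2 for p, j in zip(points, labels))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm (a sketch). Inputs: points (tuples of equal length) and k."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialise with k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the centroids are already the cluster means
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels, kmeans_objective(points, centroids, labels)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, objective = kmeans(pts, 2)
print(labels, objective)
```

Here `k` is the parameter the user must choose; the iteration cap and the random seed are typical extra knobs. On these two well-separated groups the sketch puts the first three points in one cluster and the last three in the other.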
(c) You are given a one-dimensional dataset, D = {0, 1, 1, 2, 3, 4, 4, 4, 5}. Compute the kernel density estimate at x = 2 and x = 4 with a bandwidth of 2 using the following triangle kernel:
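$$K(u) = (1 - |u|)\,\mathbb{1}(|u| \le 1),$$
where $\mathbb{1}(\cdot)$ is the indicator function:
$$\mathbb{1}(|u| \le 1) = \begin{cases} 1, & |u| \le 1, \\ 0, & \text{otherwise.} \end{cases}$$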
Justify your answers.
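A minimal Python check of the arithmetic, assuming the standard estimator $\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$ with $n = 9$ and $h = 2$ (the question does not spell out the estimator, and some textbooks scale the bandwidth differently). `Fraction` keeps every step exact; `triangle_kernel` and `kde` are illustrative names.

```python
from fractions import Fraction


def triangle_kernel(u):
    """Triangle kernel K(u) = (1 - |u|) * 1(|u| <= 1)."""
    return 1 - abs(u) if abs(u) <= 1 else 0


def kde(x, data, h):
    """Assumed estimator: f_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    n = len(data)
    # Fraction(x - xi, h) gives the exact scaled distance u for integer inputs.
    total = sum(triangle_kernel(Fraction(x - xi, h)) for xi in data)
    return Fraction(total) / (n * h)


D = [0, 1, 1, 2, 3, 4, 4, 4, 5]
for x in (2, 4):
    # Expected: 5/36 (about 0.139) at x = 2 and 2/9 (about 0.222) at x = 4.
    print(x, kde(x, D, 2), float(kde(x, D, 2)))
```

At x = 2 only the points 1, 1, 2 and 3 fall strictly inside the kernel's support, contributing 0.5 + 0.5 + 1 + 0.5 = 2.5, so the estimate is 2.5 / 18. At x = 4 the points 3, 4, 4, 4 and 5 contribute 0.5 + 1 + 1 + 1 + 0.5 = 4, giving 4 / 18.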
(d) Why do we want to use "weak" learners such as decision stumps when using the method of boosting?
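To make the setting of part (d) concrete, here is a small sketch of discrete AdaBoost (one common boosting algorithm; the question does not name a specific one) with decision stumps on 1-D data. Each stump on its own is a weak learner; boosting reweights the data so that later stumps concentrate on earlier mistakes, and the final prediction is a weighted vote. The function names, the threshold candidates, and the toy data are hypothetical choices for illustration.

```python
import math


def stump_predict(x, theta, s):
    """Decision stump: predict s if x > theta, else -s (labels are +1/-1)."""
    return s if x > theta else -s


def best_stump(xs, ys, w):
    """Return (theta, s, weighted_error) for the stump with the lowest weighted error."""
    values = sorted(set(xs))
    # Candidate thresholds: one below the minimum, then midpoints between neighbours.
    thresholds = [values[0] - 0.5] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for s in (1, -1):
        for theta in thresholds:
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, theta, s) != yi)
            if best is None or err < best[2]:
                best = (theta, s, err)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost with stumps; returns a list of (alpha, theta, s)."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        theta, s, err = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, s))
        # Up-weight misclassified points, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, theta, s))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, theta, s) for alpha, theta, s in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
for rounds in (1, 3):
    ensemble = adaboost(xs, ys, rounds)
    correct = sum(predict(ensemble, x) == y for x, y in zip(xs, ys))
    print(rounds, "round(s):", correct, "of", len(xs), "correct")
```

On this toy set no single stump labels more than four of the six points correctly, while the weighted vote of three boosted stumps labels all six correctly.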
