Question: Code in Python 3.7: Perform HAC

Code in Python 3.7

Perform HAC

For this function, we would like you to mimic the behaviour of SciPy's HAC function, linkage(). You may not use this function in your implementation, but we strongly recommend using it to verify your results!

Input: A collection of m observation vectors in n dimensions, passed as an m by n array (for us, this will be a list of tuples, not a NumPy array as for linkage()). All elements of the condensed distance matrix must be finite, i.e. no NaNs or infs. In our case, m is the number of Pokemon (here 20) and n is 2: the x and y features for each Pokemon. (If invalid data points exist, you need to remove them first. In that case, m is the number of valid data points.)
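The question does not define what makes a data point invalid, so the following is only a minimal sketch under one assumption: a point is invalid if any coordinate is non-numeric, NaN, or infinite. The helper name filter_valid is hypothetical and not part of the assignment.

```python
import math

def filter_valid(points):
    """Return only the points whose coordinates are all finite numbers.

    Assumption: "invalid" means a coordinate that is non-numeric, NaN, or inf.
    """
    valid = []
    for p in points:
        # isinstance is checked first so math.isfinite never sees a non-number
        if all(isinstance(c, (int, float)) and math.isfinite(c) for c in p):
            valid.append(tuple(p))
    return valid

raw = [(1, 2), (float("nan"), 3), (4, float("inf")), (5.5, 6), ("x", 7)]
clean = filter_valid(raw)
m = len(clean)  # m counts only the valid points
```

With this input, clean keeps the first and fourth points and m is 2.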

Using single linkage, perform the hierarchical agglomerative clustering algorithm as detailed on slide 19 of our class slides. Use a standard Euclidean distance function for calculating the distance between two points.
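As a rough illustration of the two distances mentioned here: the Euclidean distance between two points, and the single-linkage distance between two clusters, which is the smallest point-to-point distance across them. The function names euclidean and single_linkage are illustrative choices, and clusters are assumed to be plain lists of coordinate tuples.

```python
import math

def euclidean(p, q):
    """Standard Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(cluster_a, cluster_b):
    """Single-linkage distance: the minimum distance over all cross-cluster pairs."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)

d_points = euclidean((0, 0), (3, 4))
d_clusters = single_linkage([(0, 0), (0, 1)], [(3, 0), (0, 3)])
```

Here d_points is 5.0, and d_clusters is 2.0 because the closest cross-cluster pair is (0, 1) and (0, 3).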

Output: An (m-1) by 4 matrix Z. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster m + i. A cluster with an index less than m corresponds to one of the m original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.

That is:

  • Number each of your starting data points from 0 to m-1. These are their original cluster numbers.
  • Create an (m-1)x4 array or list. Iterate through the list row by row.
  • For each row, determine which two clusters you will merge and put their numbers into the first and second elements of the row. The first point listed should be the smaller of the two cluster indexes. The single-linkage distance between the two clusters goes into the third element of the row. The total number of points in the cluster goes into the fourth element.
  • If you merge a cluster containing more than one data point, its number (for the first or second element of the row) is given by m + the row index in which the cluster was created.
  • Before returning the data structure, convert it into a NumPy matrix.
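The steps above can be sketched roughly as follows. This is not the official solution, only a brute-force illustration that assumes the input has already been filtered and that ties are compared on cluster indices. The names hac_rows and euclidean are hypothetical. To stay dependency-free the sketch returns a plain list of rows and leaves out the final NumPy conversion.

```python
import math

def euclidean(p, q):
    """Standard Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hac_rows(points):
    """Single-linkage HAC over a list of tuples; returns (m-1) rows of 4 values."""
    m = len(points)
    # Active clusters: cluster index -> list of original point indices
    clusters = {i: [i] for i in range(m)}
    rows = []
    for step in range(m - 1):
        best = None  # (distance, smaller index, larger index)
        ids = sorted(clusters)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                i, j = ids[a], ids[b]
                # Single linkage: closest pair of points across the two clusters
                d = min(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                # Tuple comparison orders by distance, then i, then j
                if best is None or (d, i, j) < best:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[m + step] = merged  # new cluster is numbered m + row index
        rows.append([i, j, d, len(merged)])
    return rows

Z = hac_rows([(0, 0), (0, 1), (3, 0), (3, 1.5)])
```

For these four points, Z is [[0, 1, 1.0, 2], [2, 3, 1.5, 2], [4, 5, 3.0, 4]]. Passing the rows to numpy.array(...) would give the (m-1) by 4 array the assignment asks for. Recomputing every pairwise distance at each step is slow for large m but should be adequate for around 20 points.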

If you follow these guidelines for input and output, your result should match the result of scipy.cluster.hierarchy.linkage() and you can use that function to verify your results. Be aware that this function does not contain code to filter NaN values, so this filtering should be performed before calling the function.

Tie Breaking

In the event that there are multiple pairs of points with equal distance for the next cluster:

Given a set of pairs with equal distance {(xi, xj)} where i < j, we prefer the pair with the smallest first cluster index i. If there are still ties (xi, xj), ... (xi, xk) where i is that smallest first index, we prefer the pair with the smallest second cluster index.

Be aware that this tie breaking strategy may not produce identical results to scipy.cluster.hierarchy.linkage().
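One possible way to express this tie-breaking rule is to pick the minimum over a sort key of (distance, first index, second index). The sketch below assumes each candidate is a hypothetical (i, j, distance) tuple with i < j; the name pick_pair is illustrative.

```python
def pick_pair(candidates):
    """Pick the closest pair; break ties by smallest i, then smallest j.

    Assumption: candidates is a non-empty iterable of (i, j, distance), i < j.
    """
    return min(candidates, key=lambda c: (c[2], c[0], c[1]))

candidates = [(1, 4, 2.0), (0, 3, 2.0), (0, 2, 2.0), (5, 6, 3.0)]
chosen = pick_pair(candidates)
```

Three candidates tie at distance 2.0. The pairs starting with index 0 beat (1, 4), and (0, 2) beats (0, 3) on the second index, so chosen is (0, 2, 2.0).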
