Question: Modify the code from cluster.py on eClass to implement bisecting k-means, which is an extension of the basic k-means algorithm. Start with one cluster containing all data points, and then repeatedly split the largest cluster into two (k-means with k = 2) until you have the desired number of clusters. Use the program visual.py to see the results.

import math
import random

def euclidD(point1, point2):
    # straight-line distance between two points of equal dimension
    total = 0
    for index in range(len(point1)):
        diff = (point1[index]-point2[index]) ** 2
        total = total + diff
    return math.sqrt(total)

def manhattanD(point1, point2):
    # sum of absolute coordinate differences
    total = 0
    for i in range(len(point1)):
        diff = abs(point1[i]-point2[i])
        total = total + diff
    return total

def createCentroids(k, datadict):
    centroids = []
    centroidCount = 0
    centroidKeys = []
    random.seed(68)  # fixed seed so runs are repeatable

    keys = list(datadict.keys())

    while centroidCount < k:
        # randint includes both endpoints, so the upper bound must be
        # len(datadict) - 1 to stay a valid index into keys
        rkey = random.randint(0, len(datadict) - 1)
        if rkey not in centroidKeys:
            centroids.append(datadict[keys[rkey]])
            centroidKeys.append(rkey)
            centroidCount = centroidCount + 1
    return centroids

def showCentroids(centroids):
    print("CENTROIDS", end = " ")
    for cent in centroids:
        #print("%4.1f" % (cent[0]), end = " ")
        print(cent, end = " ")
    print()

def showClusters(clusters, datadict):
    for c in clusters:
        print("CLUSTER", end = " ")
        for key in c:
            print(datadict[key], end=" ")
        print()

def assignPointsToClusters(centroids, datadict):
    clusters = []
    k = len(centroids)
    for i in range(k):
        clusters.append([])
    for akey in datadict:
        # distance from this point to every centroid
        distances = []
        for clusterIndex in range(k):
            dist = euclidD(datadict[akey], centroids[clusterIndex])
            distances.append(dist)

        # the point joins the cluster of its nearest centroid
        mindist = min(distances)
        index = distances.index(mindist)
        clusters[index].append(akey)
    return clusters

def updateCentroids(clusters, centroids, datadict):
    dimensions = len(datadict[list(datadict.keys())[0]])
    k = len(centroids)
    for clusterIndex in range(k):
        # add up the coordinates of every point in this cluster
        sums = [0]*dimensions
        for akey in clusters[clusterIndex]:
            datapoints = datadict[akey]
            for ind in range(len(datapoints)):
                sums[ind] = sums[ind] + datapoints[ind]
        # divide by the cluster size to get the mean (skipped if empty)
        for ind in range(len(sums)):
            clusterLen = len(clusters[clusterIndex])
            if clusterLen != 0:
                sums[ind] = sums[ind]/clusterLen
        centroids[clusterIndex] = sums
    return centroids

def createClusters(centroids, datadict, repeats):
    k = len(centroids)
    doAgain = True
    apass = 0
    # initialise once, outside the loop, so prevClusters really holds the
    # previous pass and the convergence test below can succeed
    clusters = []
    while doAgain and apass < repeats:
        prevClusters = clusters

        clusters = assignPointsToClusters(centroids, datadict)

        if clusters == prevClusters:
            doAgain = False

        centroids = updateCentroids(clusters, centroids, datadict)
        apass = apass + 1

    return clusters

def readFile(filename):
    datadict = {}

    key = 0
    # the with block closes the file once reading finishes
    with open(filename, "r") as datafile:
        for aline in datafile:
            key = key + 1
            items = aline.split()
            datadict[key] = [int(k) for k in items]
    return datadict

def main():
    datadict = readFile("scores.txt")
    numClusters = 5  # number of clusters
    dataCentroids = createCentroids(numClusters, datadict)
    maxPass = 10
    dataClusters = createClusters(dataCentroids, datadict, maxPass)

if __name__ == "__main__":
    main()
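
The bisecting procedure described in the question could be sketched roughly as follows. This is a minimal, self-contained illustration, not the expected solution: the names euclid, two_means and bisecting_kmeans are hypothetical, and it deliberately does not import cluster.py so that it runs on its own. It assumes the same data layout as cluster.py (a dictionary mapping keys to lists of numbers) and a repeats value of at least 1. The fallback that cuts a cluster's key list in half when 2-means leaves one side empty is also an assumption, added only so the loop always makes progress.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def two_means(keys, datadict, repeats, rng):
    """Split the points named by keys into two key lists (k-means, k = 2)."""
    # start from two distinct data points as the initial centroids
    centroids = [list(datadict[key]) for key in rng.sample(keys, 2)]
    halves = None
    for _ in range(repeats):
        new_halves = [[], []]
        for key in keys:
            dists = [euclid(datadict[key], c) for c in centroids]
            new_halves[dists.index(min(dists))].append(key)
        if new_halves == halves:  # assignments stopped changing
            break
        halves = new_halves
        for i, members in enumerate(halves):
            if members:  # keep the old centroid if this half is empty
                dims = len(centroids[i])
                centroids[i] = [
                    sum(datadict[m][d] for m in members) / len(members)
                    for d in range(dims)
                ]
    return halves


def bisecting_kmeans(datadict, k, repeats=10, seed=68):
    """Return up to k clusters, each a list of keys into datadict."""
    rng = random.Random(seed)
    clusters = [list(datadict.keys())]  # one cluster holding every point
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break  # nothing left that can be split
        left, right = two_means(largest, datadict, repeats, rng)
        if not left or not right:
            # assumption: on a degenerate split (e.g. duplicate points),
            # fall back to cutting the key list in half
            mid = len(largest) // 2
            left, right = largest[:mid], largest[mid:]
        clusters.remove(largest)
        clusters.extend([left, right])
    return clusters
```

The result has the same shape as the list returned by createClusters (a list of lists of keys), so it could be passed to showClusters together with datadict. How visual.py expects its input is not shown in the question, so that step is left out here.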

