Question: In Python 3, modify the following k-means algorithm to analyze the earthquake data from the following CSV file (save it locally; you don't have to import it from the internet): https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv and create clusters for earthquakes that happen on the East Coast, the West Coast, and the Midwest of the United States. Once you have the clusters, use the turtle module to draw a dot on this map of the United States (http://i64.tinypic.com/2i739t4.jpg) where each earthquake occurred, with each cluster drawn in a different color. You can only use the math, random, and turtle modules. Here's the code that needs to be modified:

import math
import random
import turtle

def euclid(p1, p2):
    # Euclidean distance between two points of the same dimension
    total = 0
    for i in range(len(p1)):
        total += (p2[i] - p1[i])**2
    return math.sqrt(total)

def getData(afile):
    # Read one integer score per line; keys are consecutive integers from 1
    thedict = {}
    key = 1
    with open(afile, 'r') as datafile:
        for line in datafile:
            score = int(line)
            thedict[key] = [score]
            key += 1
    return thedict

def centroids(k, datadict):
    # Pick k distinct data points at random as the starting centroids
    centroidL = []
    centroidCount = 0
    centroidKeys = []
    while centroidCount < k:
        randomkey = random.randint(1, len(datadict))
        if randomkey not in centroidKeys:
            centroidL.append(datadict[randomkey])
            centroidKeys.append(randomkey)
            centroidCount += 1
    return centroidL

def createClusters(k, centroidL, datadict, repeat):
    for apass in range(repeat):
        clusterL = []
        for i in range(k):
            clusterL.append([])  # add an empty list for each cluster

        # assign every point to its nearest centroid
        for akey in datadict:
            distances = []
            for cindex in range(k):
                dist = euclid(datadict[akey], centroidL[cindex])
                distances.append(dist)
            minD = min(distances)  # smallest distance
            index = distances.index(minD)
            clusterL[index].append(akey)

        # recompute each centroid as the mean of its cluster
        dimension = len(datadict[1])
        for cindex in range(k):
            totals = [0]*dimension  # repeat 0 dimension times, in a list
            for item in clusterL[cindex]:
                points = datadict[item]  # get data from dictionary
                for ind in range(len(points)):
                    totals[ind] += points[ind]
            for ind in range(len(totals)):
                clusterLen = len(clusterL[cindex])
                if clusterLen != 0:
                    totals[ind] /= clusterLen
            centroidL[cindex] = totals

        # print the clusters
        for c in clusterL:
            print("Cluster", apass)
            for key in c:  # named key so the parameter k is not overwritten
                print(datadict[key], end=" ")
            print()  # newline
    return clusterL

#testing
point1 = [4, 6, 12]
point2 = [-3, 4, -2]
#print(euclid(point1,point2))
data = getData('scores.txt')
#print(data)
cent = centroids(5, data)
#print(cent)
CL = createClusters(5, cent, data, 3)
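
One possible starting point for the modification, offered as a rough sketch rather than a finished answer. The names getQuakeData, toScreen and drawClusters are hypothetical, as are the bounding-box constants (an approximate box around the contiguous United States). The sketch assumes the saved CSV has a header row with columns named latitude and longitude that come before any quoted field containing commas, that the map has been converted to a GIF (the format turtle's bgpic reliably accepts), and that the map image lines up roughly linearly with the bounding box, which is only an approximation for most map projections.

```python
# Approximate bounding box for the contiguous United States (assumed values).
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def getQuakeData(afile):
    """Read [longitude, latitude] pairs from a saved copy of the CSV feed.

    Keys are consecutive integers starting at 1, which is what
    centroids() and createClusters() in the question expect.
    """
    thedict = {}
    key = 1
    with open(afile, 'r') as datafile:
        header = datafile.readline().strip().split(',')
        latcol = header.index('latitude')
        loncol = header.index('longitude')
        for line in datafile:
            fields = line.strip().split(',')
            try:
                lat = float(fields[latcol])
                lon = float(fields[loncol])
            except (ValueError, IndexError):
                continue  # skip blank or malformed rows
            # keep only quakes inside the assumed bounding box
            if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX:
                thedict[key] = [lon, lat]
                key += 1
    return thedict


def toScreen(lon, lat, width, height):
    """Linearly map longitude/latitude to turtle coordinates centred on (0, 0)."""
    x = (lon - LON_MIN) / (LON_MAX - LON_MIN) * width - width / 2
    y = (lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * height - height / 2
    return x, y


def drawClusters(clusterL, datadict, width=800, height=500, mapfile=None):
    """Draw one dot per earthquake, using one color per cluster."""
    import turtle  # imported here so the helpers above load without a display
    screen = turtle.Screen()
    screen.setup(width, height)
    if mapfile is not None:
        screen.bgpic(mapfile)  # assumed to be a GIF version of the map
    pen = turtle.Turtle()
    pen.hideturtle()
    pen.penup()
    pen.speed(0)
    colors = ['red', 'blue', 'green']
    for cindex in range(len(clusterL)):
        pen.color(colors[cindex % len(colors)])
        for akey in clusterL[cindex]:
            lon, lat = datadict[akey]
            pen.goto(toScreen(lon, lat, width, height))
            pen.dot(6)
    screen.exitonclick()
```

With these helpers, the testing lines could become data = getQuakeData('all_month.csv'), cent = centroids(3, data), CL = createClusters(3, cent, data, 10), followed by drawClusters(CL, data). Because the starting centroids are random, the three clusters will not always correspond to the east coast, west coast and midwest; seeding centroidL with one hand-picked point per region is one way to steer them.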
