Question:

In Python 3, modify the following k-means algorithm to analyze the earthquake data from the following CSV file (save it locally; you don't have to import it from the internet): https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv and create clusters for earthquakes that happen on the east coast, the west coast, and the midwest of the United States. Once you have the clusters, use the turtle module to draw a dot on this map of the United States (http://i64.tinypic.com/2i739t4.jpg) where each earthquake occurred, with each cluster drawn in a different color. You can only use the math, random, and turtle modules. Here's the code that needs to be modified:

import math
import random
import turtle

def euclid(p1, p2):
    # Euclidean distance between two points of equal dimension
    total = 0
    for i in range(len(p1)):
        total += (p2[i] - p1[i])**2
    return math.sqrt(total)

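def getData(afile):
    # Read one integer score per line into {key: [score]}, keys starting at 1
    thedict = {}
    key = 1
    with open(afile, 'r') as datafile:  # the file is closed automatically
        for line in datafile:
            score = int(line)
            thedict[key] = [score]
            key += 1
    return thedict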

def centroids(k, datadict):
    # Pick k distinct data points at random as the initial centroids
    centroidL = []
    centroidCount = 0
    centroidKeys = []
    while centroidCount < k:
        randomkey = random.randint(1, len(datadict))
        if randomkey not in centroidKeys:
            centroidL.append(datadict[randomkey])
            centroidKeys.append(randomkey)
            centroidCount += 1
    return centroidL

def createClusters(k, centroidL, datadict, repeat):
    for apass in range(repeat):
        clusterL = []
        for i in range(k):
            clusterL.append([])  # add an empty list for each cluster

        # assign every data point to its nearest centroid
        for akey in datadict:
            distances = []
            for cindex in range(k):
                dist = euclid(datadict[akey], centroidL[cindex])
                distances.append(dist)
            minD = min(distances)  # smallest distance
            index = distances.index(minD)
            clusterL[index].append(akey)

        # move each centroid to the mean of the points assigned to it
        dimension = len(datadict[1])
        for cindex in range(k):
            totals = [0]*dimension  # repeat 0 dimension times, in a list
            for item in clusterL[cindex]:
                points = datadict[item]  # get data from dictionary
                for ind in range(len(points)):
                    totals[ind] += points[ind]
            for ind in range(len(totals)):
                clusterLen = len(clusterL[cindex])
                if clusterLen != 0:
                    totals[ind] /= clusterLen
            centroidL[cindex] = totals

        # print the clusters
        for c in clusterL:
            print("Cluster", apass)
            for key in c:  # renamed from k so the parameter k is not overwritten
                print(datadict[key], end=" ")
            print()  # newline
    return clusterL

#testing
point1 = [4, 6, 12]
point2 = [-3, 4, -2]
#print(euclid(point1,point2))
data = getData('scores.txt')
#print(data)
cent = centroids(5, data)
#print(cent)
CL = createClusters(5, cent, data, 3)
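
One possible starting point for the modification is sketched below. It is not a verified solution, only an outline under stated assumptions: the saved CSV is assumed to have a header row containing `latitude` and `longitude` columns that come before any quoted field (so a plain `split(',')` is enough without the csv module), the bounding box for the contiguous United States is a rough illustrative guess, and the map is assumed to span exactly that box with a simple linear projection. The names `get_quake_data`, `to_screen`, and `draw_clusters` are hypothetical helpers, not part of the original code.

```python
# Assumed bounding box for the contiguous United States (illustrative values).
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def get_quake_data(lines):
    """Build {key: [longitude, latitude]} from CSV lines, keys starting at 1."""
    lines = iter(lines)
    header = next(lines).strip().split(',')
    lat_col = header.index('latitude')   # assumed column name
    lon_col = header.index('longitude')  # assumed column name
    datadict = {}
    key = 1
    for line in lines:
        fields = line.strip().split(',')
        try:
            lat = float(fields[lat_col])
            lon = float(fields[lon_col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed rows
        # keep only earthquakes inside the assumed bounding box
        if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX:
            datadict[key] = [lon, lat]
            key += 1
    return datadict


def to_screen(lon, lat, width=800, height=500):
    """Linearly map (lon, lat) to turtle coordinates centred on (0, 0)."""
    x = (lon - LON_MIN) / (LON_MAX - LON_MIN) * width - width / 2
    y = (lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * height - height / 2
    return x, y


def draw_clusters(clusters, datadict, colors=('red', 'green', 'blue')):
    """Draw one dot per earthquake, coloured by cluster index."""
    import turtle  # imported here so the parsing code runs without a display
    turtle.penup()
    turtle.hideturtle()
    for index, cluster in enumerate(clusters):
        for key in cluster:
            lon, lat = datadict[key]
            turtle.goto(to_screen(lon, lat))
            turtle.dot(5, colors[index % len(colors)])
    turtle.done()
```

With these helpers, the test section could become `data = get_quake_data(open('all_month.csv'))`, then `cent = centroids(3, data)`, `CL = createClusters(3, cent, data, 10)`, and `draw_clusters(CL, data)`. Because the keys are still consecutive integers starting at 1, `centroids` and `createClusters` work unchanged. Note that k-means with random starting centroids is not guaranteed to split the points into east coast, west coast, and midwest; the result depends on where the earthquakes actually fall and on the initial picks.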

