Question: ***HERE IS THE EMBEDDED and SELF- TRANSCRIBED IMAGE TEXT, SO IT MUST NOT BE THAT UNCLEAR! THIS IS GETTING RIDICULOUS and I am going to
***HERE IS THE EMBEDDED and SELF- TRANSCRIBED IMAGE TEXT, SO IT MUST NOT BE THAT UNCLEAR! THIS IS GETTING RIDICULOUS and I am going to unsubscribe from the service, if the transcription service can read it , why can't you?
Show transcribed image text70-511
: Statistical Propramming ProgianmingAssignrien 2-k-Means Clustering You are to creale a prograrn usirng Python hat does the fullowing: Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details). 1. 2. Asks the user for the narme af the output file. 3 Asks the user for the number of dusters. This is the parameter kthat will be used for k means. 4. Read the Input file and stores the points Into a list 5. Applies the k-means alporithm to find the cluster for each point. G. Displays the polints that cach duster contains after cach itcration of the algorithm . Writes the final cluster assignments to the output file. Clustering is a process of identifying groupinsi.e. clusters within the data. For example, the fipure below shows three clusters of two dimensional data points (Xs]: YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, .) NO IMPORT STATEMENTS. The name of your source code file should he kMeans. py. All your code shauld be within a single file. Your code should follow good coding practices, including good usc of whitespacc and usc of hoth inline and block comments You need to use meaningful identifier names that conform to standard namine conventions. At the top of cach file, you need to put In a block comment with the following Information: your name, date, course name, semester, and assignment name. The output of your program should exactly match the sample program output given at the end. 1. 2. 3. 4. 5. xxX Data File Format Let N be the number of points and Pi to he the value of point i. The input file shauld be of the following format: P1 P2 Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting of customers for market research. PN One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the natian of distance hetween two points. For points with only one dimension (just single values), we can define the distance between two points p and q as Example: 1.2 2.1 4.56 2.113 What to Turn In You will turn in the single kMeans.py file using BlackBoard The k-means algorithm will work by placing points into clusters and computing their centroids, which is defined as the average of the data points in the cluster. Specifically, the alporithm works as follows: HINTS 1. Pick k, the number of clusters. 2. Initialize dusters by picking one point [centroid) per cluster. For this assigriment, you can pick the firstk Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats points as initial centroids for each corresponding dluster 3. For cach point, place it in the cluster whose current centrold it is nearest. . After all points are assigned, update the locations of centroids of the k clusters 5. Reassign all points to their closest centroid. This sometimes moves points between clusters 6. Repeat 4,5 untill convergence, Convergence occurs when points don't move between clusters and .Use pwdl to check the directory where you should place your input file. . Use a dict data structures for storing centrolds and clusters. The centrolds dict will be a mapping from cluster number to centroids. The clusters dict will be a mapping from cluster number to a list of points in the cluster centroids stabilize
the data set is- prog2-input-data.txt
1.8 4.5 1.1 2.1 9.8 7.6 11.32 3.2 0.5 6.5
Sample Output below:

70-511: Statistical Propramming ProgianmingAssignrien 2-k-Means Clustering You are to creale a prograrn usirng Python hat does the fullowing: Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details). 1. 2. Asks the user for the narme af the output file. 3 Asks the user for the number of dusters. This is the parameter kthat will be used for k means. 4. Read the Input file and stores the points Into a list 5. Applies the k-means alporithm to find the cluster for each point. G. Displays the polints that cach duster contains after cach itcration of the algorithm . Writes the final cluster assignments to the output file. Clustering is a process of identifying groupinsi.e. clusters within the data. For example, the fipure below shows three clusters of two dimensional data points (Xs]: YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, .) NO IMPORT STATEMENTS. The name of your source code file should he kMeans. py. All your code shauld be within a single file. Your code should follow good coding practices, including good usc of whitespacc and usc of hoth inline and block comments You need to use meaningful identifier names that conform to standard namine conventions. At the top of cach file, you need to put In a block comment with the following Information: your name, date, course name, semester, and assignment name. The output of your program should exactly match the sample program output given at the end. 1. 2. 3. 4. 5. xxX Data File Format Let N be the number of points and Pi to he the value of point i. The input file shauld be of the following format: P1 P2 Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting of customers for market research. PN One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the natian of distance hetween two points. For points with only one dimension (just single values), we can define the distance between two points p and q as Example: 1.2 2.1 4.56 2.113 What to Turn In You will turn in the single kMeans.py file using BlackBoard The k-means algorithm will work by placing points into clusters and computing their centroids, which is defined as the average of the data points in the cluster. Specifically, the alporithm works as follows: HINTS 1. Pick k, the number of clusters. 2. Initialize dusters by picking one point [centroid) per cluster. For this assigriment, you can pick the firstk Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats points as initial centroids for each corresponding dluster 3. For cach point, place it in the cluster whose current centrold it is nearest. . After all points are assigned, update the locations of centroids of the k clusters 5. Reassign all points to their closest centroid. This sometimes moves points between clusters 6. Repeat 4,5 untill convergence, Convergence occurs when points don't move between clusters and .Use pwdl to check the directory where you should place your input file. . Use a dict data structures for storing centrolds and clusters. The centrolds dict will be a mapping from cluster number to centroids. The clusters dict will be a mapping from cluster number to a list of points in the cluster centroids stabilize 70-511: Statistical Propramming ProgianmingAssignrien 2-k-Means Clustering You are to creale a prograrn usirng Python hat does the fullowing: Asks the user for a filename which contains the point data which is to be clustered (see Data File Format section for details). 1. 2. Asks the user for the narme af the output file. 3 Asks the user for the number of dusters. This is the parameter kthat will be used for k means. 4. Read the Input file and stores the points Into a list 5. Applies the k-means alporithm to find the cluster for each point. G. Displays the polints that cach duster contains after cach itcration of the algorithm . Writes the final cluster assignments to the output file. Clustering is a process of identifying groupinsi.e. clusters within the data. For example, the fipure below shows three clusters of two dimensional data points (Xs]: YOU CANNOT USE ANY PYTHON PACKAGES FOR THIS PROGRAM (NUMPY, PANDAS, .) NO IMPORT STATEMENTS. The name of your source code file should he kMeans. py. All your code shauld be within a single file. Your code should follow good coding practices, including good usc of whitespacc and usc of hoth inline and block comments You need to use meaningful identifier names that conform to standard namine conventions. At the top of cach file, you need to put In a block comment with the following Information: your name, date, course name, semester, and assignment name. The output of your program should exactly match the sample program output given at the end. 1. 2. 3. 4. 5. xxX Data File Format Let N be the number of points and Pi to he the value of point i. The input file shauld be of the following format: P1 P2 Clustering has many applications, including inferring population structures from genetic data, recognizing communities within social networks, or segmenting of customers for market research. PN One of the most popular algorithms for performing clustering is the k-means method. The algorithm depends on the natian of distance hetween two points. For points with only one dimension (just single values), we can define the distance between two points p and q as Example: 1.2 2.1 4.56 2.113 What to Turn In You will turn in the single kMeans.py file using BlackBoard The k-means algorithm will work by placing points into clusters and computing their centroids, which is defined as the average of the data points in the cluster. Specifically, the alporithm works as follows: HINTS 1. Pick k, the number of clusters. 2. Initialize dusters by picking one point [centroid) per cluster. For this assigriment, you can pick the firstk Make use of list comprehensions for reading lines from a file and then converting the strings into a list of floats points as initial centroids for each corresponding dluster 3. For cach point, place it in the cluster whose current centrold it is nearest. . After all points are assigned, update the locations of centroids of the k clusters 5. Reassign all points to their closest centroid. This sometimes moves points between clusters 6. Repeat 4,5 untill convergence, Convergence occurs when points don't move between clusters and .Use pwdl to check the directory where you should place your input file. . Use a dict data structures for storing centrolds and clusters. The centrolds dict will be a mapping from cluster number to centroids. The clusters dict will be a mapping from cluster number to a list of points in the cluster centroids stabilize
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
