PHASE 3 (30%)
The Phase 2 program, which implements the k-means algorithm, produces two clusters: one containing benign
cells (predicted class = 2) and one containing malignant cells (predicted class = 4). However, a malignant
cell may be assigned to the benign cluster, and vice versa.
In phase 3, you will analyze the quality of the clustering. To check how well your clustering worked, you
will calculate the error rates for your clusters. Assume that the column "Class" of the initial data set
contains the correct cluster for each data point.
INSTRUCTIONS
There are two parts in phase 3:
a) Write code to calculate the individual and total error rates of the predicted clusters.
b) Prepare and submit the final report.
a) Write code to calculate the individual and total error rates of the predicted clusters
Your phase 3 program will calculate the error rates from two inputs:
> the predicted clusters, calculated by your phase 2 program;
> the correct clusters, specified by the column "Class" of the initial data set.
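As a minimal sketch, assuming the Phase 2 program writes its results to a CSV file that keeps the "Class" column and adds a "Predicted_Class" column (the file layout, the file name, and the function name `load_classes` are hypothetical), the two inputs could be read with Python's standard `csv` module:

```python
import csv
from typing import Iterable, List, Tuple


def load_classes(rows: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Read the correct ("Class") and predicted ("Predicted_Class") labels.

    `rows` is any iterable of CSV text lines with a header row, such as an
    open file object. The CSV layout is an assumption about the Phase 2 output.
    """
    reader = csv.DictReader(rows)
    correct: List[int] = []
    predicted: List[int] = []
    for row in reader:
        correct.append(int(row["Class"]))
        predicted.append(int(row["Predicted_Class"]))
    return correct, predicted


# Hypothetical usage with a file written by the Phase 2 program:
# with open("phase2_output.csv", newline="") as f:
#     correct, predicted = load_classes(f)
```

Accepting any iterable of lines, rather than a path, keeps the function easy to test without touching the file system.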
Let's look at an example of the cluster assignment for the first 20 data points, listed on page
Column "Class" represents the correct clusters, and column "Predicted_Class" represents the
clusters calculated by the k-means algorithm. Marked data points represent the errors of the k-means clustering:
> Yellow data points are predicted as class 4 (malignant cells), while the correct class is 2 (benign cells).
> Gray data points are predicted as class 2 (benign cells), while the correct class is 4 (malignant cells).
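The two kinds of errors can be counted by comparing the columns pair by pair. In this sketch, `error24` counts the gray points (predicted 2, correct 4) and `error42` counts the yellow points (predicted 4, correct 2); that reading of the names used in the error-rate formulas, and the helper name `count_errors`, are assumptions:

```python
from typing import List, Tuple


def count_errors(correct: List[int], predicted: List[int]) -> Tuple[int, int]:
    """Return (error24, error42).

    error24: points predicted as class 2 whose correct class is 4 (gray).
    error42: points predicted as class 4 whose correct class is 2 (yellow).
    """
    if len(correct) != len(predicted):
        raise ValueError("both columns must have the same length")
    error24 = sum(1 for c, p in zip(correct, predicted) if p == 2 and c == 4)
    error42 = sum(1 for c, p in zip(correct, predicted) if p == 4 and c == 2)
    return error24, error42


print(count_errors([2, 2, 4, 4, 2], [2, 4, 2, 4, 2]))  # (1, 1)
```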
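Let's define the following notation:
> error24: number of data points predicted as class 2 whose correct class is 4 (gray points)
> error42: number of data points predicted as class 4 whose correct class is 2 (yellow points)
> pclass2, pclass4: number of data points predicted as class 2 and as class 4, respectively
> errorall = error24 + error42: number of all error data points
> classall: number of all data points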
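Use the following formulas to calculate and print the error rate for each cluster and the total error rate:
$$\mathrm{errorB} = \frac{\mathrm{error24}}{\mathrm{pclass2}} \times 100\%$$
$$\mathrm{errorM} = \frac{\mathrm{error42}}{\mathrm{pclass4}} \times 100\%$$
$$\mathrm{errorT} = \frac{\mathrm{errorall}}{\mathrm{classall}} \times 100\%$$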
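A minimal Python sketch of the three formulas, assuming pclass2 and pclass4 are the sizes of the predicted clusters, error24 and error42 are the gray and yellow error counts, and classall is the number of all data points. The function name `error_rates` and the choice to report 0% for an empty cluster are assumptions, not part of the assignment:

```python
from typing import List, Tuple


def error_rates(correct: List[int], predicted: List[int]) -> Tuple[float, float, float]:
    """Return (errorB, errorM, errorT) as percentages."""
    pairs = list(zip(correct, predicted))
    pclass2 = sum(1 for _, p in pairs if p == 2)
    pclass4 = sum(1 for _, p in pairs if p == 4)
    error24 = sum(1 for c, p in pairs if p == 2 and c == 4)  # gray points
    error42 = sum(1 for c, p in pairs if p == 4 and c == 2)  # yellow points
    errorall = error24 + error42
    classall = len(pairs)
    # Assumption: an empty cluster is reported as 0% instead of dividing by zero.
    error_b = 100.0 * error24 / pclass2 if pclass2 else 0.0
    error_m = 100.0 * error42 / pclass4 if pclass4 else 0.0
    error_t = 100.0 * errorall / classall if classall else 0.0
    return error_b, error_m, error_t


b, m, t = error_rates([2, 2, 4, 4, 2], [2, 4, 2, 4, 2])
print(f"{b:.1f}% {m:.1f}% {t:.1f}%")  # 33.3% 50.0% 40.0%
```

Note that the per-cluster denominators are the sizes of the predicted clusters, not the number of points in each correct class, so errorB and errorM measure how "pure" each predicted cluster is.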
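A total error rate above 50% indicates that your program swapped the predicted clusters. Because k-means
assigns cluster labels arbitrarily, swapping the two labels turns every correct assignment into an error and
vice versa, so the total error rate after the swap is 100% minus the original rate. Your program has to detect
this situation, swap the predicted clusters by replacing 2 with 4 and 4 with 2 in column "Predicted_Class",
and recalculate the error rates.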
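One way to implement the swap check, sketched in Python under the assumption that the only labels are 2 and 4; the helper names `swap_labels`, `total_error`, and `fix_swapped` are hypothetical:

```python
from typing import List


def swap_labels(predicted: List[int]) -> List[int]:
    """Replace 2 with 4 and 4 with 2 in a single pass; other values are unchanged."""
    mapping = {2: 4, 4: 2}
    return [mapping.get(p, p) for p in predicted]


def total_error(correct: List[int], predicted: List[int]) -> float:
    """Percentage of points whose predicted class differs from the correct class."""
    if not correct:
        return 0.0
    wrong = sum(1 for c, p in zip(correct, predicted) if c != p)
    return 100.0 * wrong / len(correct)


def fix_swapped(correct: List[int], predicted: List[int]) -> List[int]:
    """Return the predicted labels, swapped if the total error rate exceeds 50%."""
    if total_error(correct, predicted) > 50.0:
        return swap_labels(predicted)
    return list(predicted)


correct = [2, 2, 4, 4, 2]
predicted = [4, 2, 2, 2, 4]
print(total_error(correct, predicted))  # 80.0
fixed = fix_swapped(correct, predicted)
print(fixed, total_error(correct, fixed))  # [2, 4, 4, 4, 2] 20.0
```

After the swap, the per-cluster and total error rates are recalculated on the returned labels. Swapping through a dictionary lookup, rather than two sequential replacements, avoids the common bug where replacing 2 with 4 first and then 4 with 2 turns every label into 2.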
b) Prepare the final report, incorporating all results and your conclusions for phases 1 to 3.
SAMPLE OUTPUT
This is the output in the case where the clusters are swapped and the program swaps the predicted classes.
Error data points, Predicted Class 4:
Number of all data points: 699
Number of error data points: 28
Error rate for class 2: 3.7%
Error rate for class 4: 4.7%
Total error rate: 4.0%
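The summary lines can be produced with fixed one-decimal formatting. This sketch (hypothetical function `format_report`) reuses the numbers from the sample output; recomputing the total from those counts, 28 / 699 × 100% ≈ 4.006%, rounds to the 4.0% shown above:

```python
from typing import List


def format_report(classall: int, errorall: int,
                  error_b: float, error_m: float, error_t: float) -> List[str]:
    """Build the summary lines in the layout of the sample output (rates in percent)."""
    return [
        f"Number of all data points: {classall}",
        f"Number of error data points: {errorall}",
        f"Error rate for class 2: {error_b:.1f}%",
        f"Error rate for class 4: {error_m:.1f}%",
        f"Total error rate: {error_t:.1f}%",
    ]


# The per-class rates are copied from the sample output; the total is recomputed.
for line in format_report(699, 28, 3.7, 4.7, 100.0 * 28 / 699):
    print(line)
```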
SUBMISSION GUIDELINES
Prepare and submit a PDF with the final report that includes:
> Project statement
> Short description of the phase 1, 2, and 3 programs (algorithm, description of input data,
structure of the programs, and description of results)
> Phase 1, 2, and 3 results
> Conclusion
Submit the phase 1, 2, and 3 programs together with any data files needed to run them.
Provide a 'readme.txt' file that explains how to execute your code.
I did phases 1 and 2; I just need phase 3.