Question: Breast Cancer Data Analysis Q2-1, Finding good k, (25 points) 'breast cancer.txt' data set has 699 samples with 9 features. Each sample is classified

Breast Cancer Data Analysis Q2-1, Finding good k, (25 points) 'breast cancer.txt' data set has 699 samples with 9 features. Each sample is classified into 2 classes (index '2' for benign and index '4' for malignant). Note that the class information is represented at the last column in the data file i) Apply k-mean clustering with various k = (1, 2, 3,..., 8) and its corresponding J (For J, refer to HW2). (15 pt) ii) Apply Hierarchical clustering. (15 pt) Please make sure that you must ignore the last column when you run the clustering methods. For the additional information of the data, refer the included the excel file or visit https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+ (original) Q2-2 Comparison k-mean and GMM, (30 points) Since you know the actual class of the data, calculate ground-truth accuracy P with k-mean and GMM. Note that fix the number of clusters as 2. the number of correclty predicted samples total number of samples Hint) Use 'predict' function for GMM and 'fit predict' function for k-mean in scikit-learn package. P (1)

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

"For 1-year SPX options, we observe that the 25-delta put implied volatility is at 45%, the 50-delta call implied volatility is at 30%, the 25-delta call implied volatility is at 35%. These...

When using the CGE Assembler with Ion Torrent sequencing data, what assembler will do the assembly?

Let A, B be sets. Define: (a) the Cartesian product (A B) (b) the set of relations R between A and B (c) the identity relation A on the set A [3 marks] Suppose S, T are relations between A and B, and...

Design a Java class that represents a cache with a fixed size. It should support operations like add, retrieve, and remove, and it should evict the least recently used item when it reaches capacity.

Overview and Requirements For this programming assignment, we are going to implement the k-means clustering algorithm in Jupyter Notebook. Cluster analysis seeks to separate objects into groups (or...

Confirming Pages C H A P T E R 19 Analyzing Information and Writing Reports Chapter Outline Using Your Time Efficiently Analyzing Data and Information for Reports Identifying the Source of the Data...

Excelsior College PBH 321 CRITICAL EYE ON RESEARCH IN EPIDEMIOLOGY By the end of this activity, you will be able to illustrate the purposes, designs, weaknesses, and relevance of a randomized drug...

Research Recherche Family caregiver burden: results of a longitudinal study of breast cancer patients and their principal caregivers Eva Grunfeld, Doug Coyle, Timothy Whelan, Jennifer Clinch, Leonard...

Study Guide Healthcare Statistics By Jacqueline K. Wilson, RHIA About the Author Jacqueline K. Wilson is a Registered Health Information Administrator (RHIA) who has more than ten years of experience...

Commentary Beyond Black Box Epidemiology Douglas L. Weed, MD, PhD "Black box epidemiology" is the most recent addition to a long list of disciplinary subgroups although few practitioners would likely...

\fThis is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial review has deemed that any suppressed content does...

College football players trying out for the NFL are given the Wonderlic standardized intelligence test. The file contains Wonderlic the averageWonderlic score of football players trying out for the...

A company purchases equipment for $76,300. It is to pay $16,300 on the delivery date, $23,800 one year from the delivery date, and $38,800 two years from the delivery date. How much should the entity...

. 1 . 2 Discuss three critical integration challenges that are highly probable for this complex, multi - party system - of - systems ( SoS ) project, particularly concerning human factors,...

Analyze the primary accounting issues which form the crux of the litigation or fine for the firm, and indicate the impact to the firm as a result of litigation or fine. Provide support for your...

Analyze the corporate culture at various agencies and clients. Start with the statement on www.ogilvy.com/About/ Our-History/Corporate-Culture.aspx for an inside view of how this agency articulates...

Explain how management compensation can provide an incentive to unethical behavior. What methods can be used to reduce the chance of unethical activities resulting from compensation plans?

Identify three potential unethical actions or inactions related to decision analysis and the ethical principle each violates.

Corporate Ethics is it unfair or unethical for corporations to create classes of stock with unequal voting rights?

To be effective a project leader should which of the following? Delegate tasks to Agile team members as quickly as possible

For the 2008 GSS, the table shows cell counts and standardized residuals (in parentheses) for happiness and marital status. Summarize the association by indicating which marital statuses have strong...

Refer to the previous exercise. a. Based on the prediction equation, when the defendant is black and the victims were white, show that the estimated death penalty probability is 0.233. b. The...

The variable POSTLIFE in the 2008 General Social Survey asked, Do you believe in life after death? Of 1787 respondents, 1455 answered yes. A report based on these data stated that 81.4% of Americans...

Describe the role played by defensive behaviors in a conflict situation and explain how to engage and encourage supportive behaviors.

Explain why violence is not a fact of life.

Explain the key concepts and assumptions that identify factors that play an important role in interpersonal conflict according to each theory.