Question: In Python! I reduced the same list down tremendously, but just wanted to get a general idea of a concept. After a biopsy of a

In Python! I reduced the same list down tremendously, but just wanted to get a general idea of a concept.

After a biopsy of tumor tissue, tests are run on the tumor cells to determine a diagnosis of benign or malignant. The tests result in 30 different cell attribute measurement values. Some of the measured aspects are radius_mean, perimeter_mean, and area_mean, which measure the mean value of cell radius (distance from center point), perimeter, and area. Looking at the web page and the data set you can see the other 27 values. Based on these 30 values, a formula is applied to determine whether the tumor is malignant or benign. Note, the first column is the sample id. The second column is the diagnosis for the sample, where M means malignant and B means benign. At lunch one day, you and a medical technician come up with the idea that all this data and the complicated formula are not needed. Instead, you decide you just need to look at the means of the first four metrics: {radius, texture, perimeter, area}.

The process is as follows:

a) Strip the data to only consider those 4 values

b) Create four data files:

q3_gte_13: third attribute - those data samples whose radius value is >= 13

q4_gte_18: fourth attribute - those data samples whose texture value is >= 18

q5_gte_85: fifth attribute - those data samples whose perimeter value is >= 85

q6_gte_500: sixth attribute - those data samples whose area value is >= 500

c) Find the data ids that are in each of these four files. The idea is that if a data sample meets or exceeds the threshold (13, 18, 85, and 500) for each of these 4 attributes, then the tumor is malignant. If the sample does not exceed any of these thresholds, then the tumor is benign. If the sample exceeds some, but not all, of these thresholds, then the tumor could be either benign or malignant.

To do this, you need to take the intersection of the 4 files, where the files hold the ids of these data samples.
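Steps a) through c) can be sketched in plain Python with sets. This is a minimal sketch, not a definitive solution: the input lines below are invented toy rows (not values from the real data set), the helper names `strip_row` and `ids_at_or_above` are hypothetical, and the id sets are kept in memory rather than written out as files.

```python
# Toy input lines in the layout the question describes:
# id, diagnosis, radius, texture, perimeter, area, then further columns.
# All values are invented for illustration.
raw_lines = [
    "101,M,17.9,20.4,122.0,1001.0,0.118",
    "102,B,11.2,15.1,70.5,380.0,0.097",
    "103,M,14.5,19.0,95.0,640.0,0.105",
    "104,B,13.4,17.2,86.0,555.0,0.089",
    "105,B,12.0,21.3,78.0,450.0,0.101",
]


def strip_row(line):
    """Step a: keep id, diagnosis, and the four mean values (as floats)."""
    fields = line.strip().split(",")
    return fields[:2] + [float(value) for value in fields[2:6]]


def ids_at_or_above(rows, column, threshold):
    """Step b: ids of samples whose value in `column` (0-based) is >= threshold."""
    return {row[0] for row in rows if row[column] >= threshold}


rows = [strip_row(line) for line in raw_lines]

# The "third attribute" in the question is index 2 when counting from 0.
q3_gte_13 = ids_at_or_above(rows, 2, 13)
q4_gte_18 = ids_at_or_above(rows, 3, 18)
q5_gte_85 = ids_at_or_above(rows, 4, 85)
q6_gte_500 = ids_at_or_above(rows, 5, 500)

# Step c: ids present in all four sets.
new_result = q3_gte_13 & q4_gte_18 & q5_gte_85 & q6_gte_500
print(sorted(new_result))
```

Sets are used because the task only asks about membership of ids, and `&` gives the intersection directly. For the real data file, the same `strip_row` logic would be applied to each line read from disk.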

The four files (q3_gte_13, q4_gte_18, q5_gte_85, q6_gte_500) also have the diagnosis of B or M from the original test included. You want to test the quality of your process.

To do this, create 2 versions for each of the 4 dimensions. For column 3: q3_B, q3_M

Where B means the data has been diagnosed as Benign and M means it was diagnosed as Malignant.

Likewise for columns 4, 5, and 6 giving files:

q4_B, q4_M

q5_B, q5_M

q6_B, q6_M
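One way to build the B and M versions is to apply the threshold and the diagnosis check in a single pass. The sketch below assumes rows already stripped to id, diagnosis, and the four means; the rows are invented toy values and `split_by_diagnosis` is a hypothetical helper name.

```python
# Toy rows: (id, diagnosis, radius, texture, perimeter, area).
# Values are invented for illustration only.
rows = [
    ("101", "M", 17.9, 20.4, 122.0, 1001.0),
    ("102", "B", 11.2, 15.1, 70.5, 380.0),
    ("103", "M", 14.5, 19.0, 95.0, 640.0),
    ("104", "B", 13.4, 17.2, 86.0, 555.0),
    ("105", "B", 12.0, 21.3, 78.0, 450.0),
]


def split_by_diagnosis(rows, column, threshold):
    """Return (benign_ids, malignant_ids) for samples with row[column] >= threshold."""
    benign, malignant = set(), set()
    for row in rows:
        sample_id, diagnosis = row[0], row[1]
        if row[column] >= threshold:
            if diagnosis == "B":
                benign.add(sample_id)
            elif diagnosis == "M":
                malignant.add(sample_id)
    return benign, malignant


q3_B, q3_M = split_by_diagnosis(rows, 2, 13)
q4_B, q4_M = split_by_diagnosis(rows, 3, 18)
q5_B, q5_M = split_by_diagnosis(rows, 4, 85)
q6_B, q6_M = split_by_diagnosis(rows, 5, 500)

print(sorted(q3_B), sorted(q3_M))
```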

Let file NewResult contain the intersection of ids from the four files (q3_gte_13, q4_gte_18, q5_gte_85, q6_gte_500), i.e. regardless of whether the original method said M or B. Compare NewResult to the data found in the 4 files you created of ids of M data: q3_M, q4_M, q5_M, and q6_M. Let SubsetMResult contain the data that is the union of the four files (q3_M, q4_M, q5_M, and q6_M).

Then, calculate: Difference_1 = SubsetMResult - NewResult. If your new method captures all the same data, then Difference_1 should be the empty set.

Sort Difference_1.
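The intersection, union, difference, and sort map onto Python's built-in set operations. The id sets below are invented placeholders standing in for the contents of the files; ids are assumed to be numeric strings, which is why the sort uses `key=int`.

```python
# Placeholder id sets standing in for the four threshold files
# and their M-only versions. The ids are invented for illustration.
q3_gte_13 = {"99", "101", "103", "104", "106"}
q4_gte_18 = {"101", "103", "105"}
q5_gte_85 = {"101", "103", "104", "106"}
q6_gte_500 = {"101", "103", "104"}

q3_M = {"99", "101", "103", "106"}
q4_M = {"101", "103"}
q5_M = {"101", "103", "106"}
q6_M = {"101", "103"}

# NewResult: ids that appear in all four threshold files.
new_result = set.intersection(q3_gte_13, q4_gte_18, q5_gte_85, q6_gte_500)

# SubsetMResult: ids that appear in at least one of the M files.
subset_m_result = set.union(q3_M, q4_M, q5_M, q6_M)

# Difference_1: M ids that the four-threshold rule did not capture.
difference_1 = subset_m_result - new_result

# Sort numerically; a plain sorted() on strings would put "106" before "99".
sorted_difference_1 = sorted(difference_1, key=int)
print(sorted_difference_1)
```

An empty `sorted_difference_1` would mean every malignant sample that crossed at least one threshold also crossed all four.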

Written Responses:

What is the proportion of observations in Original Result with DIAGNOSIS = M?

What are the lengths of SubsetMResult and NewResult?
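Both written responses reduce to counting. The sketch below assumes "Original Result" means the full original data set and uses an invented list of diagnoses and invented id sets, so the printed numbers are not answers for the real data.

```python
# Invented diagnosis column for a toy data set of five samples.
diagnoses = ["M", "B", "M", "B", "B"]

# Proportion of observations diagnosed as malignant.
proportion_m = diagnoses.count("M") / len(diagnoses)

# Invented id sets standing in for the two results.
subset_m_result = {"99", "101", "103", "106"}
new_result = {"101", "103"}

print(f"Proportion with DIAGNOSIS = M: {proportion_m:.3f}")
print(f"len(SubsetMResult) = {len(subset_m_result)}")
print(f"len(NewResult) = {len(new_result)}")
```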
