Question: Problem Statement ( in natural language ) . You are given a set of 5 6 9 instances, each with information about a patient s

Problem Statement

(

in natural language

) .

You are given a set of

569

instances, each with information about a patient

s cell, that is or is not cancerous. The information includes the patient

s ID

(

which should not be relevant to his

/

her diagnosis

)

and a diagnosis

(

malignant or benign

),

as well as a set of attributes that are usually used for diagnosis. You are asked to design, implement, and train a machine learning model, and assess its quality in terms of

(

)

accuracy of diagnosis

(

of instances not used for training

)

and

(

)

computational efficiency and scalability of your solution. The machine learning model, which an expert recommended, is called k

-

nearest neighbour

(

or k

-

) .

You are required to use a tree data structure that would optimize both accuracy and efficiency

(

with accuracy taking precedence

) .

In answer to this problem, submit

1

Java program,

1

report

(

PDF

)

and

1

signed expectations of originality form

(

PDF

) .

The report must include the following

(

-

III

)

items.

(10 %)

I. A

1 -

pargraph characterization of the problem, in abstract and precise computer science terms, excluding all irrelevant details and superfluous language. Adherence to formatting instructions

(

see Formatting on page

2)

counts towards this part of the grade.

(20 %)

.

A description of the model you will use to solve the problem:

1)

The main data structure

(

)

as annotated diagram

(

)

;

2)

The main algorithm

(

)

operating on the data structure

(

),

as high

-

level flow chart

(

),

augmented as necessary by lower

-

level pseudo

-

codes of the main methods employed.

(40 %)

III. The results with analysis of the testing of the trained model, presented as

2

figures that show how much

running time

and

testing accuracy

change as a function of the number of training instances

(

) .

T is the number of test instances; each figure should have three series of points for the three

(3)

values of k

(

see below

) .

Guidelines. Use N

= 100, 200, 300, 400, 500 .

The ratio N

/

T must be maintained at

4 / 1 .

An instance used for training, in a given train

-

and

-

test run, cannot be used for testing. Also, you must retrain the model prior to each test. Further, the

(

+

)

instances must be chosen at random from the complete set. Though this is not the case in medical practice

*,

implement accuracy as the percentage of test instances

(

)

that are correctly predicted by your trained model

(

.

.,

predicted diagnosis

=

actual diagnosis

) .

For running time, just use actual execution time

(

of testing, not

Page

2

2

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Last Updated: July 18, 2016 American Mock Trial Association MIDLANDS RULES OF EVIDENCE Article I. Rule 101. Scope; Definitions (a) Scope. These rules apply to proceedings in the courts of the State...

Let A, B be sets. Define: (a) the Cartesian product (A B) (b) the set of relations R between A and B (c) the identity relation A on the set A [3 marks] Suppose S, T are relations between A and B, and...

Describe how to construct the function cpo ((D E), v) of two cpos (D, vD) and (E, vE). Prove that ((D E), v) is a cpo. (You may use facts about least upper bounds provided you state them clearly.)...

Let r and s be solutions to the quadratic equation x 2 b x + c = 0. For n N, define d0 = 0 d1 = r s dn = b dn1 c dn2 (n 2) Prove that dn = r n s n for all n N. [4 marks] (b) Recall that a commutative...

Please read the questions Question: Please describe your thoughts concerning in this Chapter 5. Key Points 5 . In the United States the number of bilinguals has steadily increased but monolingualism...

ret Electricity consumers are supplied with electricity from an electricity generating station. Electricity is distributed from the station to the various consumers through a network of transformers...

Predictive text entry systems are familiar on touch screens and mobile phones. This question asks you to consider how the same principles might be used in a programming editor for creating Java code....

MATHEMATICS FOR MACHINE LEARNING Marc Peter Deisenroth A. Aldo Faisal Cheng Soon Ong Contents Foreword 1 Part I Mathematical Foundations 9 1 Introduction and Motivation 11 1.1 Finding Words for...

ITM 309: Business Information Technology and Systems Spring 2016 Watson and the new era of cognitive systems Jerry Haan IBM Cloud Ecosystem Development January 27, 2016 2013 International Business...

Chapter 20 describes a blotting method known as Western blotting that can be used to detect a polypeptide that is translated from a particular mRNA. In this method, a particular polypeptide or...

What is the average difference between the price of name-brand soup and the price of store-brand soup? To obtain an estimate, an analyst randomly samples eight stores. Each store sells its own brand...

YearXY 1 1 8 % 1 5 % 2 2 1 3 3 3 1 2 1 4 4 1 4 1 9 5 1 0 2 3 Using the returns shown above, calculate the variances for X and Y . ( Do not round intermediate calculations. Enter your average return...

EI 1 5 " Nu ububu Vevi: Wobia tso sukuviwo si be woaxl nus sr a me nyawo nyuie, ase nukp susuawo g me eye emegbe woayi d dasi sia dzi. Nuw w aduadu ate u al hadome o ow w kple nudzralawo, asisiwo, ho...

=+6. How does a firm avoid the perception that its CSR report is greenwash?

=+Was the judge correct to rule the way he did? If the case came before the court today, would the outcome be any different?

=+1. What is your opinion of the verdict delivered in Dodge v. Ford Motor Company?