Question:

Problem Statement (in natural language). You are given a set of 569 instances, each with information about a patient's cell, which is or is not cancerous. The information includes the patient's ID (which should not be relevant to his/her diagnosis) and a diagnosis (malignant or benign), as well as a set of attributes that are usually used for diagnosis. You are asked to design, implement, and train a machine learning model, and assess its quality in terms of (a) accuracy of diagnosis (of instances not used for training) and (b) computational efficiency and scalability of your solution. The machine learning model, which an expert recommended, is called k-nearest neighbour (or k-NN). You are required to use a tree data structure that would optimize both accuracy and efficiency (with accuracy taking precedence). In answer to this problem, submit 1 Java program, 1 report (PDF) and 1 signed expectations of originality form (PDF). The report must include the following items (I-III).
(10%) I. A 1-paragraph characterization of the problem, in abstract and precise computer science terms, excluding all irrelevant details and superfluous language. Adherence to formatting instructions (see Formatting on page 2) counts towards this part of the grade.
(20%) II. A description of the model you will use to solve the problem: 1) The main data structure(s) as annotated diagram(s); 2) The main algorithm(s) operating on the data structure(s), as high-level flow chart(s), augmented as necessary by lower-level pseudocode of the main methods employed.
(40%) III. The results, with analysis, of testing the trained model, presented as 2 figures that show how running time and testing accuracy change as a function of the number of training instances (N). T is the number of test instances; each figure should have three series of points for the three (3) values of k (see below).
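As an illustration of the kind of tree-backed k-NN model the statement asks for, the following is a minimal sketch of a k-d tree with a pruned nearest-neighbour search. The choice of a k-d tree, the class name `KdTreeKnn`, the squared Euclidean distance, and the majority-vote rule are all assumptions made for this example; the assignment names only "a tree data structure" and k-NN.

```java
import java.util.PriorityQueue;

/** Illustrative k-d tree for k-NN classification; all names here are hypothetical. */
public class KdTreeKnn {
    /** One tree node: a training point, its label, and the axis it splits on. */
    private static final class Node {
        final double[] point;
        final boolean malignant;
        final int axis;
        Node left, right;
        Node(double[] point, boolean malignant, int axis) {
            this.point = point; this.malignant = malignant; this.axis = axis;
        }
    }

    private final int dims;
    private Node root;

    public KdTreeKnn(int dims) { this.dims = dims; }

    /** Inserts one labelled point, cycling the split axis with depth. */
    public void insert(double[] point, boolean malignant) {
        if (root == null) { root = new Node(point, malignant, 0); return; }
        Node cur = root;
        while (true) {
            boolean goLeft = point[cur.axis] < cur.point[cur.axis];
            Node next = goLeft ? cur.left : cur.right;
            if (next == null) {
                Node n = new Node(point, malignant, (cur.axis + 1) % dims);
                if (goLeft) cur.left = n; else cur.right = n;
                return;
            }
            cur = next;
        }
    }

    /** Squared Euclidean distance (assumed metric; the square root is not needed for ranking). */
    private static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    /** Predicts by majority vote among the k nearest stored points (true = malignant). */
    public boolean predict(double[] query, int k) {
        // Max-heap on distance: the head is the worst of the current candidates.
        // Each entry is {squared distance, 1.0 if malignant else 0.0}.
        PriorityQueue<double[]> best =
            new PriorityQueue<>((a, b) -> Double.compare(b[0], a[0]));
        search(root, query, k, best);
        int votes = 0;
        for (double[] e : best) votes += (e[1] == 1.0) ? 1 : -1;
        return votes > 0; // a tie (possible for even k) falls to benign in this sketch
    }

    private void search(Node node, double[] q, int k, PriorityQueue<double[]> best) {
        if (node == null) return;
        double d = sqDist(q, node.point);
        double label = node.malignant ? 1.0 : 0.0;
        if (best.size() < k) {
            best.add(new double[] {d, label});
        } else if (d < best.peek()[0]) {
            best.poll();
            best.add(new double[] {d, label});
        }
        double diff = q[node.axis] - node.point[node.axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, q, k, best);
        // Visit the far side only if the splitting plane is closer than the worst candidate.
        if (best.size() < k || diff * diff < best.peek()[0]) search(far, q, k, best);
    }
}
```

Because points are inserted one at a time, the tree's shape depends on insertion order; building it by repeated median splits would keep it balanced, which matters for the efficiency and scalability criterion. The pruning step only skips subtrees that cannot contain a closer point, so the prediction is the same as an exhaustive scan over the stored points would give.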
Guidelines. Use N = 100, 200, 300, 400, 500. The ratio N/T must be maintained at 4/1. An instance used for training, in a given train-and-test run, cannot be used for testing. Also, you must retrain the model prior to each test. Further, the (N+T) instances must be chosen at random from the complete set. Though this is not the case in medical practice*, implement accuracy as the percentage of test instances (T) that are correctly predicted by your trained model (i.e., predicted diagnosis = actual diagnosis). For running time, just use actual execution time (of testing, not
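The sampling and scoring rules in the guidelines can be sketched as follows. The class and method names (`SplitAndScore`, `randomSplit`, `accuracyPercent`, `timeNanos`) are hypothetical, and using `System.nanoTime()` for "actual execution time" is an assumption; the model being trained and tested is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative train-and-test helpers; names are hypothetical, not from the assignment. */
public class SplitAndScore {
    /**
     * Draws N + T distinct indices at random from [0, total), with T = N / 4.
     * Returns {trainIndices, testIndices}; the two arrays never share an index.
     */
    public static int[][] randomSplit(int total, int n, Random rng) {
        int t = n / 4; // keeps N/T at 4/1 when N is a multiple of 4
        if (n + t > total) {
            throw new IllegalArgumentException("need " + (n + t) + " instances, have " + total);
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < total; i++) idx.add(i);
        Collections.shuffle(idx, rng); // a random permutation, so any prefix is a random sample
        int[] train = new int[n];
        int[] test = new int[t];
        for (int i = 0; i < n; i++) train[i] = idx.get(i);
        for (int i = 0; i < t; i++) test[i] = idx.get(n + i);
        return new int[][] {train, test};
    }

    /** Accuracy as the percentage of test instances whose predicted label equals the actual one. */
    public static double accuracyPercent(boolean[] predicted, boolean[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return 100.0 * correct / actual.length;
    }

    /** Wall-clock time of one task in nanoseconds (assumed timing method). */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }
}
```

A run for one value of N would call `randomSplit`, train a fresh model on the training indices, and wrap only the prediction loop in `timeNanos`. Note that with 569 instances, N = 500 and T = 125 would need 625 distinct instances, so this sketch throws for that case rather than guessing how it is meant to be handled.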
