Question: Consider the dataset postoperative-patient-data simplified.arff available on Moodle. This dataset contains health-status attributes of post-operative patients in a hospital, with the target class being whether the patient should be discharged (S) or remain in the hospital (A). Additional documentation regarding these attributes appears in the arff file.
- Before you run the classifiers, use the WEKA visualization tool to analyze the data, and report briefly on the types of the different variables and on the variables that appear to be important. (4 marks)
- Run J48 (=C4.5, decision tree), Naïve Bayes and IBk (k-NN) to learn a model that predicts whether a patient should be discharged. Perform 10-fold cross validation, and analyze the results obtained by these algorithms as follows.
- Note: Click on the "Choose" bar to select relevant parameters. Explanations of the parameters you should try appear below. You should report on the performance of at least two variations of the operational parameters, e.g., minNumObj and unpruned for J48, and KNN and distanceWeighting for k-NN (the parameters debug and saveInstanceData are not operational). A programmatic sketch of setting these parameters appears after the parameter lists below.
J48
- binarySplits: whether you use binary splits on nominal attributes when building the trees.
- minNumObj: the minimum number of instances per leaf.
- unpruned: whether pruning is performed (try TRUE and FALSE).
- debug: if set to TRUE, the classifier may output additional information.
- saveInstanceData: whether to save the training data for visualization.
Naïve Bayes (parameter variations are not relevant to this lab)
k-NN (IBk) (under lazy in WEKA)
- KNN: the number of neighbours to use.
- crossValidate: whether leave-one-out cross-validation will be used to select the best k value between 1 and the value specified in the KNN parameter.
- distanceWeighting: specifies the distance weighting method used (when k > 1).
- debug: if set to TRUE, the classifier may output additional information.
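For reference, below is a minimal sketch of the same experiment driven through the WEKA Java API instead of the Explorer GUI. The ARFF file name, the assumption that the class is the last attribute, and the particular parameter values (minNumObj = 5, unpruned = TRUE, KNN = 3, inverse distance weighting) are illustrative choices, not values prescribed by the lab.

```java
// Sketch only: setting the J48 and IBk parameters named above via the WEKA Java API
// and running 10-fold cross-validation. File name and parameter values are assumptions.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class PostOpLab {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("postoperative-patient-data_simplified.arff");
        data.setClassIndex(data.numAttributes() - 1);   // assumes the class (A/S) is last

        // J48 variation: unpruned tree with a larger minimum number of instances per leaf.
        J48 tree = new J48();
        tree.setUnpruned(true);
        tree.setMinNumObj(5);
        tree.setBinarySplits(false);

        // IBk variation: k = 3 neighbours with inverse-distance weighting.
        IBk knn = new IBk();
        knn.setKNN(3);
        knn.setDistanceWeighting(new SelectedTag(IBk.WEIGHT_INVERSE, IBk.TAGS_WEIGHTING));

        for (weka.classifiers.Classifier c : new weka.classifiers.Classifier[] {tree, knn}) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));   // 10-fold cross-validation
            System.out.println(c.getClass().getSimpleName());
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toMatrixString());             // confusion matrix
        }
    }
}
```

The Explorer GUI is entirely sufficient for the lab; the sketch only makes explicit which setter corresponds to which parameter in the "Choose" dialog.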
- (a) J48 (=C4.5)
- i. Examine the decision tree and indicate which are the main variables.
- ii. What is the accuracy of the decision tree? Explain the results in the confusion matrix.
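As a reminder (not specific to this dataset), the accuracy WEKA reports is the sum of the diagonal of the confusion matrix divided by the total number of instances. With the two classes A (remain) and S (discharge), and WEKA's convention of rows = actual class and columns = predicted class:

```latex
% Accuracy from the 2x2 confusion matrix (rows = actual class, columns = predicted class).
\[
\text{Accuracy} \;=\;
  \frac{n_{A \to A} + n_{S \to S}}
       {n_{A \to A} + n_{A \to S} + n_{S \to A} + n_{S \to S}}
\]
% n_{x -> y} = number of instances whose actual class is x and predicted class is y.
% The off-diagonal counts n_{A -> S} and n_{S -> A} are the two kinds of
% misclassification to comment on when explaining the confusion matrix.
```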
- (b) Naïve Bayes
- Explain the meaning of the "probability distributions" in the output, illustrating it with reference to the BP-STBL attribute.
- Calculate (by hand) the probability that a person with the following attribute values would be discharged.
- L-CORE = mid
- L-SURF = low
- L-O2 = good
- L-BP = high
- SURF-STBL = stable
- CORE-STBL = stable
- BP-STBL = mod-stable
- What is the probability that a person with these attributes will remain in hospital, and what is the probability that s/he will be discharged? What would the Naïve Bayes classifier predict for this person? (The structure of this calculation is sketched after this part.)
- What is the accuracy of the Naïve Bayes classifier? Explain the results in the confusion matrix.
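A sketch of the structure of the hand calculation in part (b): the class prior and the conditional probabilities are exactly the "probability distributions" read from WEKA's Naïve Bayes output, so no numbers are filled in here.

```latex
% Naive Bayes hand calculation: multiply the class prior by the conditional
% probability of each observed attribute value (all values read from WEKA's output).
\[
\mathrm{score}(S) \;=\; P(S)\,
  P(\text{L-CORE}{=}\text{mid} \mid S)\,
  P(\text{L-SURF}{=}\text{low} \mid S)\,
  \cdots\,
  P(\text{BP-STBL}{=}\text{mod-stable} \mid S)
\]
% Compute score(A) the same way using the A-conditional probabilities, then
% normalise the two scores to obtain the posteriors:
\[
P(S \mid \mathbf{x}) = \frac{\mathrm{score}(S)}{\mathrm{score}(S) + \mathrm{score}(A)},
\qquad
P(A \mid \mathbf{x}) = 1 - P(S \mid \mathbf{x}).
\]
% The classifier predicts whichever class has the larger posterior probability.
```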
(c) k-NN
- Find three instances in the dataset that are similar to the above patient, and use the Jaccard coefficient to calculate (by hand) the predicted outcome for this patient. Show your calculations. (One common formulation of the Jaccard coefficient for nominal attributes is sketched after this part.)
- What is the accuracy of the k-NN classifier for different values of k? Explain the results in the confusion matrix.
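One common way to apply the Jaccard coefficient to instances with purely nominal attributes (an assumption; use the definition from your course notes if it differs) is to treat each instance as the set of its attribute = value pairs:

```latex
% Jaccard similarity between two instances viewed as sets X, Y of attribute=value pairs.
\[
J(\mathbf{x}, \mathbf{y}) \;=\; \frac{|X \cap Y|}{|X \cup Y|} \;=\; \frac{k}{2m - k}
\]
% m = number of attributes compared (the seven listed above, if those are all the
%     predictive attributes in the simplified dataset),
% k = number of attributes on which the two instances take the same value.
% Rank the training instances by J against the query patient, take the three most
% similar ones, and predict by majority vote over their classes (optionally
% weighting each vote by its J value).
```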
3. Draw a table to compare the performance of J48, Naïve Bayes and IBk using the summary measures produced by WEKA. Which algorithm does better? Explain in terms of WEKA's summary measures. Can you speculate why?
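If the programmatic sketch above was used, the measures for this table can be read directly from each Evaluation object; the helper below (a sketch meant to drop into the earlier class) prints one table row per classifier.

```java
// Sketch: one row of the comparison table, read from an Evaluation object
// produced by the 10-fold cross-validation in the earlier sketch.
static void printSummaryRow(String name, Evaluation eval) {
    System.out.printf("%-12s acc=%.2f%%  kappa=%.3f  precision=%.3f  recall=%.3f  F=%.3f  ROC=%.3f%n",
            name,
            eval.pctCorrect(),              // correctly classified instances (%)
            eval.kappa(),                   // agreement beyond chance
            eval.weightedPrecision(),       // class-weighted precision
            eval.weightedRecall(),          // class-weighted recall
            eval.weightedFMeasure(),        // class-weighted F-measure
            eval.weightedAreaUnderROC());   // class-weighted ROC area
}
```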
