Question: Classification Algorithms

Implement in Python two of the following four classifiers (your choice): Decision Tree, kNN, SVM, Backpropagation NN. Use the Heart Disease data set from the University of California Irvine Machine Learning Data Repository at archive.ics.uci.edu/ml .

Data set: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
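The presence/absence reduction described above can be sketched as follows. This is a minimal illustration, assuming the goal field has already been read into an integer array; the function name `binarize_goal` and the sample values are hypothetical and are not taken from the data set.

```python
import numpy as np

def binarize_goal(goal):
    """Map the integer goal field (0..4) to 0 = absence, 1 = presence."""
    goal = np.asarray(goal, dtype=int)
    # Values 1, 2, 3, 4 all count as "disease present"; only 0 is "absent".
    return (goal > 0).astype(int)

# Made-up goal values for illustration only (not real patient records).
sample_goal = np.array([0, 2, 1, 0, 4, 3])
print(binarize_goal(sample_goal))
```

For the SVM, the same array can be mapped to -1/+1 afterwards, for example with `2 * labels - 1`.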

The value functions for each attribute are described in the ML Data Repository. There are 13 input attributes and one output/decision attribute: heart disease present or absent. Partition the data into a training set (for learning the model) and a test set. For the tree classifier, use a top-down greedy algorithm with either the Gini index or Information Gain/Entropy as the node-splitting measure. It would be more elegant (but is not required) to guard against model overfitting by using the pessimistic error formula to decide whether or not to prune leaf nodes.
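The partitioning and node-splitting steps above can be sketched as follows. This is only one possible starting point, assuming numeric features held in NumPy arrays; the helper names (`gini`, `entropy`, `best_threshold`, `split_train_test`), the 30% test fraction, and the fixed seed are illustrative choices, not part of the assignment. It covers a single greedy split on one feature, not the full recursive tree or pruning.

```python
import numpy as np

def gini(labels):
    """Gini index of a label array: 1 - sum(p_k^2)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    """Entropy of a label array in bits: -sum(p_k * log2(p_k))."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(-np.sum(p * np.log2(p)))

def best_threshold(feature, labels, impurity=gini):
    """Greedy search over midpoints of one numeric feature.

    Returns (threshold, weighted child impurity); threshold is None
    when the feature has a single distinct value.
    """
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    values = np.unique(feature)  # sorted distinct values
    best_t, best_score = None, np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2.0
        left, right = labels[feature <= t], labels[feature > t]
        score = (left.size * impurity(left)
                 + right.size * impurity(right)) / labels.size
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

def split_train_test(X, y, test_fraction=0.3, seed=0):
    """Random partition into training and test sets (fraction is an assumption)."""
    X, y = np.asarray(X), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]
```

Minimizing the weighted child impurity is equivalent to maximizing the gain at that node, since the parent impurity is the same for every candidate threshold. Passing `impurity=entropy` switches the criterion to Information Gain.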

For SVM you can use either a linear SVM (at the risk that both classification errors, training and generalization, will be large) or, preferably, a nonlinear SVM using, e.g., a polynomial, Gaussian radial, or sigmoid kernel. Of course, your output class attribute should be modified: instead of 1 for the disease class use +1, and instead of 0 for the non-disease class use -1.

You can be inspired by existing code, but you are not allowed to use it; in other words, you write your own programs. You can use standard or other language libraries, including libraries for linear algebra, matrices, and Lagrangian nonlinear optimization with constraints (excluding libraries/software packages for data mining or machine learning that implement complete algorithms).

Please include both the sources and a sample run of your programs. Compare the performance of both classifiers, i.e., it is sufficient to provide both training accuracy and test/generalization accuracy for both your programs (of course, using the same training and test data). Based on that, state which classifier seems to be performing better for your programs and data. Comment: a more elegant approach would be to test, e.g., the confidence interval for the true accuracy (based on test accuracy) at the (1 - α) confidence level, or the hypothesis that the performance difference for the stochastic variable d = e1 - e2 (where e1 is the misclassification error of the first classifier, and e2 is the misclassification error of the second classifier) is statistically significant at the (1 - α) confidence level.
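The statistical comparison mentioned in the comment can be sketched as follows. This is a rough illustration using the simple normal approximation to the binomial, which is an assumption here; the assignment does not say which interval formula is expected. The function names and the default α = 0.05 are hypothetical. The variance formula for d treats the two error estimates as independent, so when both classifiers are scored on the same test set a paired test may be more appropriate.

```python
from math import sqrt
from statistics import NormalDist

def z_value(alpha):
    """Two-sided standard normal critical value for confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def accuracy_interval(acc, n, alpha=0.05):
    """Approximate (1 - alpha) interval for true accuracy from test accuracy on n records."""
    half = z_value(alpha) * sqrt(acc * (1.0 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)

def error_difference_interval(e1, n1, e2, n2, alpha=0.05):
    """Approximate (1 - alpha) interval for d = e1 - e2 (independence assumed)."""
    d = e1 - e2
    half = z_value(alpha) * sqrt(e1 * (1.0 - e1) / n1 + e2 * (1.0 - e2) / n2)
    return d - half, d + half

def is_significant(e1, n1, e2, n2, alpha=0.05):
    """True when the interval for d excludes zero."""
    lo, hi = error_difference_interval(e1, n1, e2, n2, alpha)
    return not (lo <= 0.0 <= hi)
```

For example, `accuracy_interval(0.8, 100)` gives roughly (0.72, 0.88), so a test accuracy of 80% on 100 records leaves considerable uncertainty about the true accuracy.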
