Question: Implement the ID3 decision tree classification algorithm and apply it to the dataset breast-cancer.arff. Some features contain the value ?
Implement the ID3 decision tree classification algorithm and apply it to the dataset breast-cancer.arff.
Some features contain the value ? indicating missing values. Solve the task by filling in these missing values using an approach of your choice, and justify your chosen approach.
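One common way to fill the missing values is mode imputation: replace each ? with the most frequent known value in the same column. The sketch below is in Python for brevity and is only one possible approach, not the required one. The function name `impute_mode` and the toy rows are hypothetical. It assumes every attribute is categorical, which is why the mode is used rather than a mean.

```python
from collections import Counter

MISSING = "?"  # missing-value marker used in the dataset


def impute_mode(rows, missing=MISSING):
    """Replace missing entries with the most common known value of their column.

    Returns the filled rows and the per-column modes, so the same modes
    can later be applied to unseen (test) rows.
    """
    n_cols = len(rows[0])
    modes = []
    for c in range(n_cols):
        known = [r[c] for r in rows if r[c] != missing]
        # If a column has no known values at all, leave the marker in place.
        modes.append(Counter(known).most_common(1)[0][0] if known else missing)
    filled = [[modes[c] if v == missing else v for c, v in enumerate(r)]
              for r in rows]
    return filled, modes


# Toy rows (hypothetical values, not taken from the real file).
rows = [
    ["premeno", "left", "no"],
    ["premeno", "?", "no"],
    ["ge40", "left", "?"],
    ["premeno", "right", "yes"],
]
filled, modes = impute_mode(rows)
print(filled)
print(modes)
```

A possible justification: the mode keeps the attribute's value set unchanged and needs no extra model. Computing the modes on the training rows only and reusing them for the test rows avoids leaking test information into training.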
To avoid overfitting the tree, use at least one approach for pre-pruning and one for post-pruning:
Pre-Pruning:
Constant N: Defines the maximum depth of the tree.
Constant K: Defines the minimum number of training examples required in a node to turn it into a leaf.
Constant G: Defines the minimum information gain required to split a node.
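The three constants can be checked in one stopping test that runs before a node is split. The Python sketch below shows the entropy and information gain used by ID3 together with such a test. The names `should_stop`, `max_depth`, `min_samples`, and `min_gain` are hypothetical. It also assumes K is read as "a node with fewer than K examples becomes a leaf", which is one interpretation of the wording above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on column index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def should_stop(labels, depth, best_gain,
                max_depth=None, min_samples=None, min_gain=None):
    """Return True if the node should become a leaf. None disables a rule."""
    if len(set(labels)) <= 1:
        return True  # pure node: nothing left to split
    if max_depth is not None and depth >= max_depth:
        return True  # constant N
    if min_samples is not None and len(labels) < min_samples:
        return True  # constant K (assumed reading)
    if min_gain is not None and best_gain < min_gain:
        return True  # constant G
    return False


# Toy data: one attribute that separates the two classes perfectly.
rows = [["a"], ["a"], ["b"], ["b"]]
labels = ["x", "x", "y", "y"]
gain = information_gain(rows, labels, 0)
print(gain)
print(should_stop(labels, depth=3, best_gain=gain, max_depth=3))
```

Passing `None` for a constant switches that rule off, which fits the requirement that each pre-pruning variant can be selected separately.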
Post-Pruning:
Model error estimation when pruning the tree (Error Estimation or Reduced Error Pruning).
Chi-squared test.
Minimal Cost-Complexity Pruning.
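Of the post-pruning options, Reduced Error Pruning is probably the simplest to sketch: walk the tree bottom-up and replace a subtree with a leaf whenever accuracy on a held-out validation set does not get worse. The Python sketch below assumes a hypothetical tree layout (nested dicts with `attr`, `children`, and a majority `label` stored at every node) and a validation set carved out of the training data. Neither is prescribed by the task.

```python
def predict(node, row):
    """Follow the tree; fall back to a node's majority label on unseen values."""
    while "attr" in node:
        child = node["children"].get(row[node["attr"]])
        if child is None:
            break
        node = child
    return node["label"]


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def reduced_error_prune(node, tree, val_rows, val_labels):
    """Bottom-up pruning: collapse a subtree if validation accuracy does not drop."""
    if "attr" not in node:
        return  # already a leaf
    for child in node["children"].values():
        reduced_error_prune(child, tree, val_rows, val_labels)
    before = accuracy(tree, val_rows, val_labels)
    # Temporarily turn this node into a leaf that predicts its majority label.
    attr, children = node.pop("attr"), node.pop("children")
    after = accuracy(tree, val_rows, val_labels)
    if after < before:
        node["attr"], node["children"] = attr, children  # pruning hurt: undo


# Toy tree: root splits on column 0; the "a" branch splits again on column 1.
tree = {
    "attr": 0, "label": "x",
    "children": {
        "a": {"attr": 1, "label": "x",
              "children": {"p": {"label": "x"}, "q": {"label": "y"}}},
        "b": {"label": "y"},
    },
}
val_rows = [["a", "p"], ["a", "q"], ["b", "p"]]
val_labels = ["x", "x", "y"]
reduced_error_prune(tree, tree, val_rows, val_labels)
print(tree)
```

In this sketch ties are resolved in favour of pruning, so the smaller tree wins when both score equally on the validation set.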
Any additional pruning approaches will be considered as a bonus.
For testing the algorithm, split the dataset into training and testing sets in an : ratio, ensuring the data is shuffled before splitting. The split should be stratified to maintain the class distribution (non-recurrence, recurrence) in the newly formed training and test sets.
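A stratified split can be built by shuffling the row indices of each class separately and cutting every class at the same fraction. The Python sketch below illustrates this. The function name `stratified_split`, the fixed seed, the toy class labels, and the 0.2 test fraction in the example are all placeholders rather than values taken from the task.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_indices, test_indices) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class before cutting
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    # Shuffle again so rows of one class are not grouped together.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy labels with a 70/30 class balance (placeholder data).
labels = ["no"] * 70 + ["yes"] * 30
train, test = stratified_split(labels, test_fraction=0.2)
print(Counter(labels[i] for i in train))
print(Counter(labels[i] for i in test))
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare the pruning variants on the same data.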
Input:
The program should take three possible values as input: and :
: Use only the pre-pruning approach.
: Use only the post-pruning approach.
: Use both pre-pruning and post-pruning approaches.
If multiple types of pruning approaches are available, they are specified with the corresponding letter after the approach type. For example:
For pre-pruning: N, K, and G
For post-pruning: E, X, and C
For example:
Input applies all implemented pre-pruning approaches.
Input K applies only the pre-pruning variant that defines the minimum number of training examples.
Other combinations follow a similar logic.
Output:
The program should output the following:
Train Set Accuracy:
The accuracy of the model on the training set when using the standard split.
Fold Cross-Validation Results:
Accuracy for each fold.
Average accuracy and standard deviation across the folds.
Test Set Accuracy:
Accuracy on the test set.
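The cross-validation part of the output needs fold indices plus a mean and standard deviation over the per-fold accuracies. The Python sketch below shows one way to produce both. The helper names, the fold count used in the example, and the accuracy values are placeholders. It uses the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) is an equally defensible choice.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def summarize(accuracies):
    """Return (mean, population standard deviation) of fold accuracies."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Show the fold sizes for 10 rows and 5 folds (placeholder numbers).
splits = list(k_fold_indices(10, 5))
for fold_no, (train, held_out) in enumerate(splits, start=1):
    print(f"Fold {fold_no}: {len(train)} train rows, {len(held_out)} held-out rows")

# Placeholder accuracies standing in for real per-fold results.
mean_acc, std_acc = summarize([0.70, 0.75, 0.80])
print(f"Average Accuracy: {mean_acc:.2%}")
print(f"Standard Deviation: {std_acc:.2%}")
```

These folds are shuffled but not stratified. Stratifying them the same way as the train/test split would keep the class balance stable across folds.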
Example Input:
Example Output:
Train Set Accuracy:
Accuracy:
Fold Cross-Validation Results:
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Accuracy Fold :
Average Accuracy:
Standard Deviation:
Test Set Accuracy:
Accuracy:
Notes:
Use data structures such as DataFrames where appropriate.
Compare the results achieved with different approaches to avoid overfitting.
As a bonus, try to implement the Random Forest algorithm.
Solve this in C. This is an AI task. The first input should be a string with the breast cancer record information, something like this:
norecurrenceevents,premeno,noleft,leftlow,no
norecurrenceevents,premeno,noright,rightupno
norecurrenceevents,premeno,noleft,leftlow,no
norecurrenceevents,genoright,leftupno then or
