mplement the ID 3 decision tree classification algorithm and apply it to the dataset breast cancer arff Some features contain the value , indicating missing values Solve the task by filling in these missing values using an approach of your choice, and justify your chosen approach To avoid overfitting the tree, use at least one approach for pre pruning and one for post pruning Pre Pruning Constant N Defines the maximum depth of the tree Constant K Defines the minimum number of training examples required in a node ( to turn it into a leaf ) Constant G Defines the minimum information gain required to split a node Post Pruning Model error estimation when pruning the tree ( Error Estimation or Reduced Error Pruning ) Chi Squared ( 2 ) test Minimal Cost Complexity Pruning Any additional pruning approaches will be considered as a bonus For testing the algorithm, split the dataset into training and testing sets in an 8 0 2 0 ratio, ensuring the data is shuffled before splitting The split should be stratified to maintain the class distribution ( 2 0 1 non recurrence, 8 5 recurrence ) in the newly formed training and test sets Input The program should take three possible values as input 0 , 1 , and 2 0 Use only the pre pruning approach 1 Use only the post pruning approach 2 Use both pre pruning and post pruning approaches If multiple types of pruning approaches are available, they are specified with the corresponding letter after the approach type For example For pre pruning N , K , and G For post pruning E , X , and C For example Input 0 applies all implemented pre pruning approaches Input 0 K applies only the pre pruning variant that defines the minimum number of training examples Other combinations follow a similar logic Output The program should output the following Train Set Accuracy The accuracy of the model on the training set when using the standard split 1 0 Fold Cross Validation Results Accuracy for each fold Average accuracy and standard deviation across the folds Test Set Accuracy Accuracy on the test set Example Input 0 Example Output mathematica 1 Train Set Accuracy Accuracy 7 0 9 0 1 0 Fold Cross Validation Results Accuracy Fold 1 7 1 0 0 Accuracy Fold 2 6 9 7 2 Accuracy Fold 3 7 1 3 0 Accuracy Fold 4 7 3 0 5 Accuracy Fold 5 6 9 5 3 Accuracy Fold 6 6 9 5 3 Accuracy Fold 7 7 3 1 6 Accuracy Fold 8 7 1 5 3 Accuracy Fold 9 6 9 0 6 Accuracy Fold 1 0 7 1 0 9 Average Accuracy 7 0 9 0 Standard Deviation 1 3 7 2 Test Set Accuracy Accuracy 6 9 0 7 Notes Use data structures such as DataFrames where appropriate Compare the results achieved with different approaches to avoid overfitting As a bonus, try to implement the Random Forest algorithm solve this in C AI task first input should be string of the information of breas cancer something like this no recurrence events, 3 0 3 9 , premeno, 3 0 3 4 , 0 2 , no , 3 , left,left low,no no recurrence events, 4 0 4 9 , premeno, 2 0 2 4 , 0 2 , no , 2 , right,right up , no no recurrence events, 4 0 4 9 , premeno, 2 0 2 4 , 0 2 , no , 2 , left,left low,no no recurrence events, 6 0 6 9 , ge 4 0 , 1 5 1 9 , 0 2 , no , 2 , right,left up , no , then 0 , 1 or 2

The Answer is in the image, click to view ...

Question: mplement the ID 3 decision tree classification algorithm and apply it to the dataset breast - cancer.arff Some features contain the value ?

mplement the ID

3

decision tree classification algorithm and apply it to the dataset breast

-

cancer.arff

Some features contain the value

" ? ",

indicating missing values. Solve the task by filling in these missing values using an approach of your choice, and justify your chosen approach.

To avoid overfitting the tree, use at least one approach for pre

-

pruning and one for post

-

pruning:

Pre

-

Pruning:

Constant N: Defines the maximum depth of the tree.Constant K: Defines the minimum number of training examples required in a node

(

to turn it into a leaf

) .

Constant G: Defines the minimum information gain required to split a node.

Post

-

Pruning:

Model error estimation when pruning the tree

(

Error Estimation or Reduced Error Pruning

) .

Chi

-

Squared

(

^2)

test.Minimal Cost

-

Complexity Pruning.

Any additional pruning approaches will be considered as a bonus.

For testing the algorithm, split the dataset into training and testing sets in an

80

20

ratio, ensuring the data is shuffled before splitting. The split should be stratified to maintain the class distribution

(201

non

-

recurrence,

85

recurrence

)

in the newly formed training and test sets.

Input:

The program should take three possible values as input:

0, 1,

and

2

0

: Use only the pre

-

pruning approach.

1

: Use only the post

-

pruning approach.

2

: Use both pre

-

pruning and post

-

pruning approaches.

If multiple types of pruning approaches are available, they are specified with the corresponding letter after the approach type. For example:

For pre

-

pruning: N

,

,

and G

.

For post

-

pruning: E

,

,

and C

.

For example:

Input

0

applies all implemented pre

-

pruning approaches.

Input

0

K applies only the pre

-

pruning variant that defines the minimum number of training examples.

Other combinations follow a similar logic.

Output:

The program should output the following:

Train Set Accuracy:

The accuracy of the model on the training set when using the standard split.

10 -

Fold Cross

-

Validation Results:

Accuracy for each fold.Average accuracy and standard deviation across the folds.

Test Set Accuracy:

Accuracy on the test set.

Example Input:

0

Example Output:

mathematica

1 .

Train Set Accuracy: Accuracy:

70.90 % 10 -

Fold Cross

-

Validation Results: Accuracy Fold

1

71.00 %

Accuracy Fold

2

69.72 %

Accuracy Fold

3

71.30 %

Accuracy Fold

4

73.05 %

Accuracy Fold

5

69.53 %

Accuracy Fold

6

69.53 %

Accuracy Fold

7

73.16 %

Accuracy Fold

8

71.53 %

Accuracy Fold

9

69.06 %

Accuracy Fold

10

71.09 %

Average Accuracy:

70.90 %

Standard Deviation:

1.37 % 2 .

Test Set Accuracy: Accuracy:

69.07 %

Notes:

Use data structures such as DataFrames where appropriate.

Compare the results achieved with different approaches to avoid overfitting.

As a bonus, try to implement the Random Forest algorithm. solve this in C

+ +

AI task first input should be string of the information of breas cancer something like this : no

-

recurrence

-

events,

30 - 39,

premeno,

30 - 34, 0 - 2,

, 3,

left,left

_

low,no

-

recurrence

-

events,

40 - 49,

premeno,

20 - 24, 0 - 2,

, 2,

right,right

_

,

-

recurrence

-

events,

40 - 49,

premeno,

20 - 24, 0 - 2,

, 2,

left,left

_

low,no

-

recurrence

-

events,

60 - 69,

40, 15 - 19, 0 - 2,

, 2,

right,left

_

,

,

then

0, 1

2

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

Implement the ID 3 decision tree classification algorithm and apply it to the breast cancer dataset ( breast - cancer.arff ) from UCI Machine Learning Repository. Some of the features contain the...

Data Science, Python, Jupyter Notebook I have a term project for my Capstone class in Data Science. Below is the syllabus, dataset, and the Jupiter Notebook. I am creating a Classification model to...

Exploring Data Analytics with Real-World Datasets The domain chosen for this project is the Ultimate Fighting Championship (UFC), a leading mixed martial arts organization that features some of the...

Follow the steps given in Machine Learning With R , Chapter 5, section "Example Identifying Risky Bank Loans Using C5.0 Decision Trees." download the credit. csv file from Packt Publishing's website...

Please provide the summary of the methodology and your understanding of this paper. Incluse necessary figures as well. Rapid Object Detection using a Boosted Cascade of Simple Features single feature...

INSTRUCTIONS ---> Python There are three parts to this project in Python. Please read all sections of the instructions carefully. I. Perceptron Learning Algorithm II. Linear Regression III....

INSTRUCTIONS There are three parts to this project in Python. Please read all sections of the instructions carefully. I. Perceptron Learning Algorithm II. Linear Regression III. Classification You...

Analytics Report Overview The purpose of this task is to provide students with practical experience in writing a data analytical report to provide useful insights, pattern and trends in a chosen...

1. Business Understanding Students are expected to identify a data analytics task of your choice. You have to detail the Business Understanding part of your problem under this heading which basically...

Implement in C + + 1 1 a Naive Bayes Classifier that classifies individuals as Democrats or Republicans, using the 1 6 attributes and two classes from the Congressional Voting Records dataset Some of...

a. If you are certain that interest rates will rise, should you consider purchasing a callable swap instead of the collar? Explain. b. Explain the conditions under which your purchase of an interest...

In your reading this week, you have learned about liability, negligence, and tort law. These legal concepts appear in a variety of situations in the sports industry, on the field, in the conference...

Data rejected by edit program are Select one or more:a . corrected and re - enteredb. removed from processingC. collected for later used. ignored during processing

Hi, I have an assignment and I'm unclear on the directions. The assignment is off of the Business 3 Economic Indicators Assignment(1).xls found on course hero. I'm unsure of what information it is...