Bagging Algorithms

The bagging-type machine learning algorithms that will be examined in this assignment are:

  • Bagged CART,
  • Random Forest

Stacking Algorithms

The base learners for the stacking-type machine learning algorithms that will be examined in this assignment are:

  • Classification and Regression Trees (CART),
  • K-Nearest Neighbors (KNN),
  • Naive Bayes (NB)

Main Question: How will you know how good your ensemble classifier is? Under which conditions is ensemble learning useful?

 

1st Task: Data Set Selection and Visualisation

You need to select a data set of your own choice (i.e. you may use a data set already used in the lab, or one from the literature review) for the purposes of building, training and validating the above types of classifiers (Bagging, Stacking). With the aid of an R package, visualise and justify the properties of the selected data set.

 

2nd Task: Formation of Training and Test Sets

Assuming we have collected one large data set of already-classified instances, you need to look into methods of forming training and test sets from this single data set in R, as described below.

Repeated k-fold Cross-Validation

The process of splitting the data into k folds can be repeated a number of times, each time with a different random partition; this is called Repeated k-fold Cross-Validation (repeatedcv). The final model accuracy is taken as the mean over all repeats.
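The repeated k-fold procedure described above can be sketched in a language-neutral way. The block below is a minimal illustration in plain Python, not the R workflow the assignment asks for: the helper names, the toy labels and the majority-class "model" are all assumptions made for the example. In R, the `repeatedcv` name matches the `method` argument of caret's `trainControl`.

```python
import random
from statistics import mean

def repeated_kfold_indices(n, k, repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # a fresh random partition per repeat
        folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test

def majority_class_accuracy(labels, train, test):
    """Hypothetical stand-in model: always predict the most common training label."""
    train_labels = [labels[i] for i in train]
    guess = max(set(train_labels), key=train_labels.count)
    return mean(1.0 if labels[i] == guess else 0.0 for i in test)

labels = [0] * 30 + [1] * 70   # toy labels, assumed for illustration only
k, repeats = 10, 3

fold_acc = {r: [] for r in range(repeats)}
for r, f, train, test in repeated_kfold_indices(len(labels), k, repeats):
    fold_acc[r].append(majority_class_accuracy(labels, train, test))

repeat_means = [mean(fold_acc[r]) for r in range(repeats)]  # one accuracy per repeat
final_accuracy = mean(repeat_means)                         # mean over the repeats
print(repeat_means, final_accuracy)
```

Each repeat reshuffles the rows before cutting them into folds, so the spread of the per-repeat means gives a rough sense of how much the accuracy estimate depends on the particular partition.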

 
3rd Task: Build, Train and Test a Bagging-Type Classifier

You need to construct, train and test a Bagging type classifier in R, based on Bagged CART and Random Forest base classifiers. Train and test the Bagging classifier using the training and test sets generated based on the method tried as part of the 2nd Task.
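For orientation, the core idea behind bagging (fit the same base learner on many bootstrap samples, then combine by majority vote) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the one-feature threshold rule stands in for a full CART tree, and the function names and toy data are hypothetical. A Random Forest additionally samples a random subset of features at each split, which this sketch does not do.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-feature threshold rule (a depth-1 tree) by minimising training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and return a majority-vote predictor."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]              # most frequent vote wins
    return predict

# Toy one-dimensional data, assumed for illustration only.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5, 8.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

bagged = fit_bagged(xs, ys)
predictions = [bagged(x) for x in (1.2, 7.2)]
print(predictions)
```

Individual stumps can land on a poor threshold when their bootstrap sample misses informative rows; the vote across many stumps smooths that variance out, which is the property the bagging task is meant to examine.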

 
 
4th Task: Build, Train and Test a Stacking-Type Classifier

You need to construct, train and test a Stacking type classifier in R, based on (CART, KNN, NB). Train and test your Stacking classifier using the training and test sets generated based on the method tried as part of the 2nd Task.
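As a rough picture of what stacking involves, the sketch below trains several base learners, collects their out-of-fold predictions, and fits a meta-learner on those predictions. It is written in plain Python under stated assumptions: the threshold rules stand in for CART, 1-nearest-neighbour stands in for KNN, Naive Bayes is omitted for brevity, and the meta-learner is a simple lookup table. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def make_stump_fitter(feature):
    """Return a fitter for a threshold rule on one feature (a stand-in for CART)."""
    def fit(X, y):
        best = None
        for t in sorted({row[feature] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                err = sum((left if row[feature] <= t else right) != lab
                          for row, lab in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, t, left, right)
        _, t, left, right = best
        return lambda row: left if row[feature] <= t else right
    return fit

def fit_1nn(X, y):
    """1-nearest-neighbour classifier (a stand-in for KNN)."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in X]
        return y[dists.index(min(dists))]
    return predict

def fit_stack(X, y, base_fitters, k=5):
    n = len(X)
    meta_rows = [None] * n
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in range(f, n, k):                           # rows held out in this fold
            meta_rows[i] = tuple(m(X[i]) for m in models)  # out-of-fold base predictions
    # Meta-learner: majority label for each observed pattern of base predictions.
    table = defaultdict(Counter)
    for row, lab in zip(meta_rows, y):
        table[row][lab] += 1
    fallback = Counter(y).most_common(1)[0][0]
    base_models = [fit(X, y) for fit in base_fitters]      # refit base learners on all rows

    def predict(row):
        key = tuple(m(row) for m in base_models)
        return table[key].most_common(1)[0][0] if key in table else fallback
    return predict

# Toy two-feature data, assumed for illustration only.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (6, 6), (6, 7), (7, 6), (7, 7), (8, 6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

stack = fit_stack(X, y, [make_stump_fitter(0), make_stump_fitter(1), fit_1nn])
stacked_predictions = [stack((2, 1.5)), stack((7, 6.5))]
print(stacked_predictions)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base learners have already seen, keeps it from simply rewarding whichever base learner overfits the most.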

 
 
5th Task: Measure Performance 

For each type of ensemble classifier, calculate and display the following performance-related metrics in R. Critically comment on the importance of each metric for each type of ensemble classifier. Use the ROCR package, loaded with library(ROCR).

  1. Confusion matrix
  2. Precision vs. Recall
  3. Accuracy
  4. ROC (receiver operating characteristic) curve
  5. RAUC (area under the ROC curve)
  6. Training time
  7. Testing time
  8. Based on the above metrics, briefly discuss how we can increase the reliability and consistency of the data classification task at hand.
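To make the definitions of the listed metrics concrete, here is a minimal sketch in plain Python that computes them for a binary classifier. The labels, scores and the 0.5 decision threshold are toy assumptions, and the helper names are hypothetical. In R, ROCR's `prediction()` and `performance()` functions cover the ROC curve and the area under it.

```python
import time

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and classifier scores, assumed for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

start = time.perf_counter()
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # the "testing" step: threshold the scores
testing_time = time.perf_counter() - start       # wall-clock seconds for prediction

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, scores)

print("confusion matrix (rows = actual, cols = predicted):")
print([[tn, fp], [fn, tp]])
print(precision, recall, accuracy, auc, testing_time)
```

Training time can be measured the same way by wrapping the model-fitting call between two `time.perf_counter()` readings. Note that accuracy, precision and recall depend on the chosen threshold, while the AUC summarises ranking quality across all thresholds.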
