Bagging Algorithms
The bagging algorithms examined in this assignment are:
- Bagged CART
- Random Forest
Stacking Algorithms
The base learners combined by the stacking ensemble in this assignment are:
- Classification and Regression Trees (CART)
- K-Nearest Neighbors (KNN)
- Naïve Bayes (NB)
Main question: How will you know how good your ensemble classifier is? Under which conditions is ensemble learning useful?
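For the second part of the main question, a standard idealised argument (Condorcet's jury theorem) shows when combining classifiers helps. Suppose each of n base classifiers is correct with probability p and their errors are independent. A majority vote then improves on a single model when p > 0.5 and degrades when p < 0.5. The Python sketch below (the assignment itself uses R) computes that majority-vote accuracy. Independence is an assumption that real ensembles never fully meet, which is why bagging and stacking try to keep their members diverse. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models classifiers is correct,
    ASSUMING each is correct with probability p, independently of the others."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models so votes cannot tie")
    need = n_models // 2 + 1  # smallest strict majority
    return sum(comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
               for k in range(need, n_models + 1))


# Compare a single model with ensembles of 5 and 25 voters.
for p in (0.45, 0.6, 0.7):
    print(p, [round(majority_vote_accuracy(n, p), 3) for n in (1, 5, 25)])
```

With p = 0.6, five independent voters already reach about 0.68. With p = 0.45, adding voters makes the ensemble worse, so weak-but-better-than-chance and diverse members are the condition under which ensembling pays off.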
1st Task: Data Set Selection and Visualisation
You need to select a data set of your own choice (e.g. one already used in the lab or one from the literature review) for building, training and validating the above types of classifier (Bagging, Stacking). Using R packages, visualise the selected data set and justify its properties.
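Class balance and feature scales are among the properties most worth justifying, because a heavily imbalanced data set makes accuracy alone misleading when the metrics of the 5th Task are compared. In R, base functions such as table(), prop.table(), summary(), boxplot() and pairs() cover these checks. The hypothetical Python helper below is only a language-neutral text sketch of the same idea, shown on made-up toy rows.

```python
from collections import Counter
from statistics import mean, stdev


def summarise(rows, labels, feature_names):
    """Text summary of a labelled data set: class balance plus per-feature
    mean and standard deviation (a stand-in for bar charts and box plots)."""
    n = len(labels)
    counts = Counter(labels)
    print("Class balance:")
    for cls, c in sorted(counts.items()):
        print(f"  {cls!s:>10} {c:5d} {c / n:6.1%} {'#' * round(30 * c / n)}")
    print("Features:")
    for j, name in enumerate(feature_names):
        col = [row[j] for row in rows]
        print(f"  {name:>10} mean={mean(col):.3f} sd={stdev(col):.3f}")
    return counts


# Toy example (made-up values, for illustration only).
summarise([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]], ["a", "b", "a"], ["x", "y"])
```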
2nd Task: Formation of Training and Test Sets
Assuming we have collected one large dataset of already-classified instances, you need to look into methods of forming training and test sets from this single dataset in R, as described below.
Repeated k-fold Cross-Validation
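In k-fold cross-validation the data is split into k folds, and each fold is used once as the test set while the remaining k-1 folds are used for training. This splitting process can be repeated a number of times, each time with a different random partition; this is called repeated k-fold cross-validation (repeatedcv). The final model accuracy is taken as the mean over all repeats.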
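The index bookkeeping behind repeated k-fold cross-validation can be sketched as follows. In R, caret requests this scheme with trainControl(method = "repeatedcv", number = k, repeats = r). The Python below is an illustrative, non-stratified stand-in whose function names and defaults (k = 10, 3 repeats, seed 42) are assumptions.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=3, seed=42):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.
    Indices are reshuffled before every repeat, so each repeat partitions differently."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]  # k folds of near-equal size
        for f in range(k):
            test_idx = sorted(folds[f])
            train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, train_idx, test_idx


def repeated_cv_accuracy(fold_accuracies):
    """Final estimate: mean accuracy over all k * repeats held-out folds."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

For 20 samples with k = 5 and 2 repeats this yields 10 train/test splits. Each test fold holds 4 samples, and within each repeat every sample appears exactly once as test data.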
3rd Task: Build, Train and Test a Bagging-type Classifier
You need to construct, train and test a Bagging-type classifier in R based on Bagged CART and Random Forest. Train and test it using the training and test sets formed with the method from the 2nd Task.
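Bagging trains each base model on a bootstrap sample (drawn with replacement) and combines the models by majority vote. A random forest additionally restricts each split to a random subset of features. The Python sketch below shows both ideas, using depth-1 trees (decision stumps) as a simplified stand-in for full CART trees, scored by misclassification count rather than CART's Gini impurity. In R these models are usually fitted with caret's method = "treebag" and method = "rf". All names here are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 tree over the candidate features, split chosen by
    misclassification count (a simplification of CART's Gini criterion)."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no usable split: predict the majority class everywhere
        maj = Counter(y).most_common(1)[0][0]
        best = (0, features[0], float("inf"), maj, maj)
    return best


def predict_stump(stump, row):
    _, j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_bagging(X, y, n_estimators=25, max_features=None, seed=0):
    """max_features=None: bagged trees on all features;
    max_features=m: random-forest-style, each stump sees m random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    models = []
    for _ in range(n_estimators):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = list(range(p)) if max_features is None else rng.sample(range(p), max_features)
        models.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return models


def predict_bagging(models, row):
    """Aggregate by majority vote over the ensemble."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]
```

Because each stump makes only one split, sampling features per tree is the same as sampling them per split here. That per-split sampling is what a random forest does for deeper trees.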
4th Task: Build, Train and Test a Stacking-type Classifier
You need to construct, train and test a Stacking-type classifier in R that combines CART, KNN and NB base learners. Train and test it using the training and test sets formed with the method from the 2nd Task.
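Stacking fits the base learners, collects their predictions on data they were not trained on, and trains a meta-learner on those predictions. The sketch below assumes Python stand-ins: a decision stump for CART, a Euclidean 3-nearest-neighbour classifier, and Gaussian naïve Bayes. For brevity it uses a simple decision-table meta-learner, which maps each pattern of base predictions to its most common true label. In R, the caretEnsemble package provides caretList() and caretStack() for this, with another caret model (for example a GLM) as the meta-learner. All names below are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Base learners: each fit_* takes (X, y) and returns a predict(row) function.


def fit_stump(X, y):
    """Depth-1 tree (stand-in for CART), split chosen by misclassification count."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({r[j] for r in X}):
            sides = ([lab for r, lab in zip(X, y) if r[j] <= t],
                     [lab for r, lab in zip(X, y) if r[j] > t])
            if not all(sides):
                continue  # skip splits that leave one side empty
            labs = [Counter(s).most_common(1)[0][0] for s in sides]
            err = sum(lab != labs[0] for lab in sides[0]) + sum(lab != labs[1] for lab in sides[1])
            if best is None or err < best[0]:
                best = (err, j, t, labs)
    if best is None:  # no usable split: always predict the majority class
        maj = Counter(y).most_common(1)[0][0]
        return lambda row: maj
    _, j, t, labs = best
    return lambda row: labs[0] if row[j] <= t else labs[1]


def fit_knn(X, y, k=3):
    """k-nearest neighbours with Euclidean distance and majority vote."""
    def predict(row):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))[:k]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict


def fit_gaussian_nb(X, y):
    """Gaussian naive Bayes: per-class priors, feature means and variances."""
    params = {}
    for c in set(y):
        cols = list(zip(*[r for r, lab in zip(X, y) if lab == c]))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
                     for col, m in zip(cols, means)]
        params[c] = (math.log(len(cols[0]) / len(y)), means, variances)

    def log_posterior(c, row):
        log_prior, means, variances = params[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                               for x, m, v in zip(row, means, variances))

    def predict(row):
        return max(params, key=lambda c: log_posterior(c, row))
    return predict


BASE_LEARNERS = {"cart": fit_stump, "knn": fit_knn, "nb": fit_gaussian_nb}


def fit_stacking(X, y, k=5, seed=0):
    """Stack the base learners using k-fold out-of-fold predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    patterns = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        models = {name: fit([X[i] for i in train], [y[i] for i in train])
                  for name, fit in BASE_LEARNERS.items()}
        for i in fold:  # level-1 features: predictions on rows the models did not see
            patterns[i] = tuple(models[name](X[i]) for name in BASE_LEARNERS)
    # Meta-learner: decision table from prediction pattern to most common true label.
    table = defaultdict(Counter)
    for pattern, label in zip(patterns, y):
        table[pattern][label] += 1
    # Refit each base learner on all rows for use at prediction time.
    final = {name: fit(X, y) for name, fit in BASE_LEARNERS.items()}

    def predict(row):
        pattern = tuple(final[name](row) for name in BASE_LEARNERS)
        if pattern in table:
            return table[pattern].most_common(1)[0][0]
        return Counter(pattern).most_common(1)[0][0]  # unseen pattern: plain majority vote
    return predict
```

Using out-of-fold predictions matters here. If the meta-learner were trained on base predictions for rows the base models had already seen, it would overtrust whichever learner overfits most, such as KNN.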
5th Task: Measure Performance
For each type of ensemble classifier, calculate and display the following performance metrics in R, and critically comment on the importance of each metric for each ensemble type. Use the ROCR package (library(ROCR)).
- Confusion matrix
- Precision vs. Recall
- Accuracy
- ROC (receiver operating characteristic) curve
- AUC (area under the ROC curve)
- Training time
- Testing time
- Based on the above metrics, briefly discuss how we can increase the reliability and consistency of the classification task at hand.
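The metric definitions can be made concrete with a small sketch for a binary task:

- The confusion matrix counts TP, FP, FN and TN at one threshold.
- Precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = (TP + TN) / total.
- The ROC curve traces (FPR, TPR) as the threshold sweeps over the scores, and AUC is the area under it.

In R, ROCR computes these from prediction(scores, labels) via performance(pred, "tpr", "fpr"), performance(pred, "prec", "rec") and performance(pred, "auc"). The Python functions below are hypothetical stand-ins for illustration. They include a timing helper for the training-time and testing-time rows.

```python
import time


def confusion_matrix(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task with the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision_recall_accuracy(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


def roc_points(y_true, scores, positive=1):
    """(FPR, TPR) pairs from sweeping the threshold down over every distinct score."""
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(s >= thr and t == positive for s, t in zip(scores, y_true))
        fp = sum(s >= thr and t != positive for s, t in zip(scores, y_true))
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Take the toy scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.2] with labels [1, 1, 0, 1, 0, 0] and a 0.5 threshold. The sketch gives TP = 3, FP = 1, FN = 0 and TN = 2 (precision 0.75, recall 1.0, accuracy 5/6), and an AUC of 8/9. Wrapping model fitting and prediction in timed() gives the training and testing times.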
