4. Ensemble Methods
In this question, you will implement several ensemble methods including Bagging and AdaBoost on a simple dataset. The methods will learn a binary classification of 2D datapoints in \([-1,1]^{2}\).
(a) Weak learner
To begin with, you will implement a weak learner to do the binary classification.
A decision stump is a one-level decision tree. It looks at a single feature, and then makes a classification by thresholding on this feature. Given a dataset with positive weights assigned to each datapoint, we can find a stump that minimizes the weighted error:
\[
L=\sum_{i=1}^{n} w^{(i)} \cdot \mathbf{1}\left(y^{(i)} \neq \widehat{y}^{(i)}\right)
\]
Here \( w^{(i)}\) is the weight of the \( i \)-th datapoint, and the prediction \(\widehat{y}^{(i)}\) is given by thresholding on the \( k \)-th feature of datapoint \(\boldsymbol{x}^{(i)}\) :
\[
\widehat{y}^{(i)}=\left\{\begin{array}{ll}
s, & \text { if } x_{k}^{(i)}\geq t \\
-s, & \text { otherwise }
\end{array}\right.
\]
For the 2D dataset we have, the parameters of this stump are the sign \( s \in\{+1,-1\}\), the feature dimension \( k \in\{1,2\}\), and the threshold \( t \in[-1,1]\). In this question, your task is to find the best stump given the dataset and weights.
Learning a decision stump requires learning a threshold in each dimension and then picking the best one. To learn a threshold in a dimension, you may simply sort the data in the chosen dimension, and calculate the loss on each candidate threshold. Candidates are midpoints between one point and the next, as well as the boundaries of our range of inputs.
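For example, with some made-up feature values (a minimal sketch; the data here is purely illustrative), the candidate thresholds in one dimension can be generated like this:

import numpy as np

# Hypothetical feature values in one dimension, sorted ascending
xs = np.sort(np.array([-0.8, -0.2, 0.5, 0.9]))
# Midpoints between consecutive points, plus the boundaries -1.0 and 1.0
candidates = np.concatenate(([-1.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
print(candidates)  # [-1.   -0.5   0.15  0.7   1.  ]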
Please implement the Stump class in hw5.py. You may define your own functions inside the class, but do not change the interfaces of __init__() and predict(). Please read the template file for further instructions.
(b) Weak learner's predictions
Now test your implementation of Stump on the dataset given by get_dataset_fixed(). Suppose all the datapoints are equally weighted. Please answer the following questions in your written submission:
- What is your decision function?
- How many datapoints are mis-classified?
- Using the helper function visualization(), include a visualization of your stump's predictions.
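A minimal sketch of how the first two answers might be computed, assuming get_dataset_fixed() returns a (data, labels) pair — that return format is an assumption, so check the template for the actual interface:

import numpy as np

data, labels = get_dataset_fixed()  # assumed return format: (data, labels)
stump = Stump(data, labels)         # no weights given, i.e. equal weighting
pred = stump.predict(data)
print(np.sum(pred != labels))       # number of misclassified datapoints
print(stump.dimension, stump.threshold, stump.sign)  # the decision function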
(c) Bagging
As we have learned in class, we can use ensemble methods to create a strong learner from the weak learner of part (a). Please complete bagging() in hw5.py. This function should take the whole dataset as input, and in each step sample a subset of it, to build a list of different weak learners.
Please do not change the random sampling of sample_indices, and use the default random seed (seed=0), so that your code behaves consistently in the autograder.
import numpy as np

class Stump():
    def __init__(self, data, labels, weights=None):
        '''
        Initializes a stump (one-level decision tree) which minimizes
        a weighted error function of the input dataset.

        In this function, you will need to learn a stump using the weighted
        datapoints. Each datapoint has 2 features, whose values are bounded
        in [-1.0, 1.0]. Each datapoint has a label in {+1, -1}, and its
        importance is weighted by a positive value.

        The stump will choose one of the features and pick the best
        threshold in that dimension, so that the weighted error is minimized.

        Arguments:
            data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
            labels: An ndarray with shape (n,). Values are +1 or -1.
            weights: An ndarray with shape (n,). The weights of each
                datapoint, all positive.
        '''
        # You may choose to use the following variables as a start

        # The feature dimension which the stump will decide on
        # Either 0 or 1, since the datapoints are 2D
        self.dimension = 0
        # The threshold in that dimension
        # May be midpoints between datapoints or the boundaries -1.0, 1.0
        self.threshold = -1.0
        # The predicted sign when the datapoint's feature in that dimension
        # is greater than the threshold
        # Either +1 or -1
        self.sign = 1
        pass
    def predict(self, data):
        '''
        Arguments:
            data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].

        Returns:
            prediction: An ndarray with shape (n,). Values are +1 or -1.
        '''
        pass
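One possible way to complete this template is sketched below. This is not the official solution, only a brute-force search over every (dimension, threshold, sign) combination that keeps the one with the smallest weighted error:

import numpy as np

class Stump():
    def __init__(self, data, labels, weights=None):
        n = data.shape[0]
        if weights is None:
            weights = np.ones(n) / n  # default to equal weighting
        best_loss = np.inf
        for k in (0, 1):  # each feature dimension
            xs = np.sort(data[:, k])
            # Candidate thresholds: midpoints plus the range boundaries
            candidates = np.concatenate(([-1.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
            for t in candidates:
                for s in (1, -1):
                    pred = np.where(data[:, k] >= t, s, -s)
                    loss = np.sum(weights * (pred != labels))
                    if loss < best_loss:
                        best_loss = loss
                        self.dimension, self.threshold, self.sign = k, t, s

    def predict(self, data):
        return np.where(data[:, self.dimension] >= self.threshold,
                        self.sign, -self.sign)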
def bagging(data, labels, n_classifiers, n_samples, seed=0):
    '''
    Arguments:
        data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
        labels: An ndarray with shape (n,). Values are +1 or -1.
        n_classifiers: Number of classifiers to construct.
        n_samples: Number of samples to train each classifier.
        seed: Random seed for NumPy.

    Returns:
        classifiers: A list of classifiers.
    '''
    classifiers = []
    n = data.shape[0]
    for i in range(n_classifiers):
        np.random.seed(seed + i)
        sample_indices = np.random.choice(n, size=n_samples, replace=False)
        # Complete the rest of the loop
        pass
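A minimal sketch of how the loop might be completed, assuming the Stump class sketched above; each classifier is a stump trained with equal weights on its sampled subset:

import numpy as np

def bagging(data, labels, n_classifiers, n_samples, seed=0):
    classifiers = []
    n = data.shape[0]
    for i in range(n_classifiers):
        np.random.seed(seed + i)
        sample_indices = np.random.choice(n, size=n_samples, replace=False)
        # Train one weak learner on the sampled subset (equal weights)
        classifiers.append(Stump(data[sample_indices], labels[sample_indices]))
    return classifiers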
def adaboost(data, labels, n_classifiers):
    '''
    Arguments:
        data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
        labels: An ndarray with shape (n,). Values are +1 or -1.
        n_classifiers: Number of classifiers to construct.

    Returns:
        classifiers: A list of classifiers.
        weights: A list of weights, one per classifier.
    '''
    pass
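For reference, a sketch of the standard AdaBoost loop, assuming the Stump class from part (a). The update formulas below are the textbook AdaBoost rules; the template may expect slightly different conventions:

import numpy as np

def adaboost(data, labels, n_classifiers):
    n = data.shape[0]
    w = np.ones(n) / n  # start from uniform datapoint weights
    classifiers, alphas = [], []
    for _ in range(n_classifiers):
        stump = Stump(data, labels, w)      # fit weak learner on weighted data
        pred = stump.predict(data)
        err = np.sum(w * (pred != labels))  # weighted error, assumed in (0, 1)
        alpha = 0.5 * np.log((1.0 - err) / err)  # classifier weight
        w = w * np.exp(-alpha * labels * pred)   # up-weight the mistakes
        w = w / np.sum(w)                        # renormalize
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas

Each alpha grows as the stump's weighted error shrinks, so more accurate stumps get a larger say in the final weighted vote.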