4. Ensemble Methods
In this question, you will implement several ensemble methods including Bagging and AdaBoost on a simple dataset. The methods will learn a binary classification of 2D datapoints in \([-1,1]^{2}\).
(a) Weak learner
To begin with, you will implement a weak learner to do the binary classification.
A decision stump is a one-level decision tree. It looks at a single feature, and then makes a classification by thresholding on this feature. Given a dataset with positive weights assigned to each datapoint, we can find a stump that minimizes the weighted error:
\[
L=\sum_{i=1}^{n} w^{(i)} \cdot \mathbf{1}\left(y^{(i)} \neq \widehat{y}^{(i)}\right)
\]
Here \( w^{(i)}\) is the weight of the \( i \)-th datapoint, and the prediction \(\widehat{y}^{(i)}\) is given by thresholding on the \( k \)-th feature of datapoint \(\boldsymbol{x}^{(i)}\) :
\[
\widehat{y}^{(i)}=\left\{\begin{array}{ll}
s, & \text { if } x_{k}^{(i)}\geq t \\
-s, & \text { otherwise }
\end{array}\right.
\]
For the 2D dataset we have, the parameters of this stump are the sign \( s \in\{+1,-1\}\), the feature dimension \( k \in\{1,2\}\), and the threshold \( t \in[-1,1]\). In this question, your task is to find the best stump given the dataset and weights.
Learning a decision stump requires learning a threshold in each dimension and then picking the best one. To learn a threshold in a dimension, you may simply sort the data in the chosen dimension, and calculate the loss on each candidate threshold. Candidates are midpoints between one point and the next, as well as the boundaries of our range of inputs.
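For example, with some made-up feature values (a minimal sketch; the data here is purely illustrative), the candidate thresholds in one dimension can be generated like this:

import numpy as np

# Hypothetical feature values in one dimension, sorted ascending
xs = np.sort(np.array([-0.8, -0.2, 0.5, 0.9]))
# Midpoints between consecutive points, plus the boundaries -1.0 and 1.0
candidates = np.concatenate(([-1.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
print(candidates)  # [-1.   -0.5   0.15  0.7   1.  ]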
Please implement the Stump class in hw5.py. You may define your own functions inside the class, but do not change the interfaces of __init__() and predict(). Please read the template file for further instructions.
(b) Weak learner's predictions
Now test your implementation of Stump on the dataset given by get_dataset_fixed(). Suppose all the datapoints are equally weighted. Please answer the following questions in your written submission:
- What is your decision function?
- How many datapoints are mis-classified?
- Using the helper function visualization(), include a visualization of your stump's predictions.
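A minimal sketch of how the first two answers might be computed, assuming get_dataset_fixed() returns a (data, labels) pair — that return format is an assumption, so check the template for the actual interface:

import numpy as np

data, labels = get_dataset_fixed()  # assumed return format: (data, labels)
stump = Stump(data, labels)         # no weights given, i.e. equal weighting
pred = stump.predict(data)
print(np.sum(pred != labels))       # number of misclassified datapoints
print(stump.dimension, stump.threshold, stump.sign)  # the decision function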
(c) Bagging
As we have learned in class, we can use ensemble methods to create a strong learner from the weak learner of part (a). Please complete bagging() in hw5.py. This function should take the whole dataset as input, and in each step sample a subset of it, to build a list of different weak learners.
Please do not change the random sampling of sample_indices, and use the default random seed (seed=0), so that your code behaves consistently in the autograder.
import numpy as np

class Stump():
    def __init__(self, data, labels, weights=None):
        '''
        Initializes a stump (one-level decision tree) which minimizes
        a weighted error function of the input dataset.

        In this function, you will need to learn a stump using the weighted
        datapoints. Each datapoint has 2 features, whose values are bounded
        in [-1.0, 1.0]. Each datapoint has a label in {+1, -1}, and its
        importance is weighted by a positive value.

        The stump will choose one of the features and pick the best
        threshold in that dimension, so that the weighted error is minimized.

        Arguments:
            data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
            labels: An ndarray with shape (n,). Values are +1 or -1.
            weights: An ndarray with shape (n,). The weights of each
                datapoint, all positive.
        '''
        # You may choose to use the following variables as a start

        # The feature dimension which the stump will decide on
        # Either 0 or 1, since the datapoints are 2D
        self.dimension = 0
        # The threshold in that dimension
        # May be midpoints between datapoints or the boundaries -1.0, 1.0
        self.threshold = -1.0
        # The predicted sign when the datapoint's feature in that dimension
        # is greater than the threshold
        # Either +1 or -1
        self.sign = 1
        pass
    def predict(self, data):
        '''
        Arguments:
            data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].

        Returns:
            prediction: An ndarray with shape (n,). Values are +1 or -1.
        '''
        pass
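One possible way to complete this template is sketched below. This is not the official solution, only a brute-force search over every (dimension, threshold, sign) combination that keeps the one with the smallest weighted error:

import numpy as np

class Stump():
    def __init__(self, data, labels, weights=None):
        n = data.shape[0]
        if weights is None:
            weights = np.ones(n) / n  # default to equal weighting
        best_loss = np.inf
        for k in (0, 1):  # each feature dimension
            xs = np.sort(data[:, k])
            # Candidate thresholds: midpoints plus the range boundaries
            candidates = np.concatenate(([-1.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
            for t in candidates:
                for s in (1, -1):
                    pred = np.where(data[:, k] >= t, s, -s)
                    loss = np.sum(weights * (pred != labels))
                    if loss < best_loss:
                        best_loss = loss
                        self.dimension, self.threshold, self.sign = k, t, s

    def predict(self, data):
        return np.where(data[:, self.dimension] >= self.threshold,
                        self.sign, -self.sign)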
def bagging(data, labels, n_classifiers, n_samples, seed=0):
    '''
    Arguments:
        data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
        labels: An ndarray with shape (n,). Values are +1 or -1.
        n_classifiers: Number of classifiers to construct.
        n_samples: Number of samples to train each classifier.
        seed: Random seed for NumPy.

    Returns:
        classifiers: A list of classifiers.
    '''
    classifiers = []
    n = data.shape[0]
    for i in range(n_classifiers):
        np.random.seed(seed + i)
        sample_indices = np.random.choice(n, size=n_samples, replace=False)
        # Complete the rest of the loop
        pass
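A minimal sketch of how the loop might be completed, assuming the Stump class sketched above; each classifier is a stump trained with equal weights on its sampled subset:

import numpy as np

def bagging(data, labels, n_classifiers, n_samples, seed=0):
    classifiers = []
    n = data.shape[0]
    for i in range(n_classifiers):
        np.random.seed(seed + i)
        sample_indices = np.random.choice(n, size=n_samples, replace=False)
        # Train one weak learner on the sampled subset (equal weights)
        classifiers.append(Stump(data[sample_indices], labels[sample_indices]))
    return classifiers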
def adaboost(data, labels, n_classifiers):
    '''
    Arguments:
        data: An ndarray with shape (n, 2). Values in [-1.0, 1.0].
        labels: An ndarray with shape (n,). Values are +1 or -1.
        n_classifiers: Number of classifiers to construct.

    Returns:
        classifiers: A list of classifiers.
        weights: A list of weights, one per classifier.
    '''
    pass
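For reference, a sketch of the standard AdaBoost loop, assuming the Stump class from part (a). The update formulas below are the textbook AdaBoost rules; the template may expect slightly different conventions:

import numpy as np

def adaboost(data, labels, n_classifiers):
    n = data.shape[0]
    w = np.ones(n) / n  # start from uniform datapoint weights
    classifiers, alphas = [], []
    for _ in range(n_classifiers):
        stump = Stump(data, labels, w)      # fit weak learner on weighted data
        pred = stump.predict(data)
        err = np.sum(w * (pred != labels))  # weighted error, assumed in (0, 1)
        alpha = 0.5 * np.log((1.0 - err) / err)  # classifier weight
        w = w * np.exp(-alpha * labels * pred)   # up-weight the mistakes
        w = w / np.sum(w)                        # renormalize
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas

Each alpha grows as the stump's weighted error shrinks, so more accurate stumps get a larger say in the final weighted vote.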