Question: Consider the following 2-armed bandit problem: the first arm has a fixed reward 0.3 and the second arm has a 0

Consider the following 2-armed bandit problem: the first arm has a fixed reward of 0.3, and
the second arm has a 0-1 reward following a Bernoulli distribution with probability 0.6, i.e.,
arm 2 yields reward 1 with probability 0.6. Assume we selected arm 1 at t = 1, and arm
2 four times at t = 2, 3, 4, 5 with rewards 0, 1, 0, 0, respectively. We use the sample-average
technique to estimate the action values, and then use the estimates to guide our choices starting from
t = 6.
1. [5 pts] Which arm will be played at t = 6 and t = 7, respectively, if the greedy method is used
to select actions?
2. [10 pts] What is the probability of playing arm 2 at t = 6 and t = 7, respectively, if the ε-greedy
method is used to select actions (ε = 0.1)?
3. [5 pts] Why could the greedy method perform significantly worse than the ε-greedy
method in the long run?
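
The arithmetic behind parts 1 and 2 can be checked with a short script. This is only a sketch under stated assumptions, not an official solution: it assumes that an exploration step picks uniformly between both arms (so the currently greedy arm can also be drawn while exploring), and it ignores ties because none occur with these numbers. The helper names sample_average and p_arm2 are made up for illustration.

```python
from fractions import Fraction

def sample_average(rewards):
    """Sample-average estimate: mean of the rewards observed for one arm."""
    return sum(rewards, Fraction(0)) / len(rewards)

def p_arm2(q1, q2, eps):
    """Probability of playing arm 2 under eps-greedy.

    Assumption: exploration is uniform over both arms, so each arm is
    drawn with probability eps/2 on an exploration step. Ties (q1 == q2)
    are not handled because they do not arise in this problem.
    """
    explore = eps / 2
    return (1 - eps) + explore if q2 > q1 else explore

eps = Fraction(1, 10)
arm1_rewards = [Fraction(3, 10)]   # t = 1
arm2_rewards = [0, 1, 0, 0]        # t = 2, 3, 4, 5

q1 = sample_average(arm1_rewards)  # 3/10
q2 = sample_average(arm2_rewards)  # 1/4

# Part 1: greedy picks the arm with the larger estimate. Arm 1 always
# returns 0.3, so playing it never changes q1 and the choice repeats.
greedy_t6 = 1 if q1 > q2 else 2
greedy_t7 = greedy_t6

# Part 2, t = 6: arm 2 is not greedy, so it is reached only by exploring.
p6 = p_arm2(q1, q2, eps)

# Part 2, t = 7: condition on what happened at t = 6.
# If arm 1 was played, the estimates are unchanged.
# If arm 2 was played, q2 is updated with reward 1 (prob 0.6) or 0 (prob 0.4).
outcomes = [(1, Fraction(3, 5)), (0, Fraction(2, 5))]
p7_after_arm2 = sum(
    pr * p_arm2(q1, sample_average(arm2_rewards + [r]), eps)
    for r, pr in outcomes
)
p7 = (1 - p6) * p_arm2(q1, q2, eps) + p6 * p7_after_arm2

print("Q1 =", q1, " Q2 =", q2)
print("greedy plays arm", greedy_t6, "at t = 6 and arm", greedy_t7, "at t = 7")
print("P(arm 2 at t = 6) =", float(p6))
print("P(arm 2 at t = 7) =", float(p7))
```

Under these assumptions the estimates are Q1 = 0.3 and Q2 = 0.25, so the greedy method plays arm 1 at both steps, and since arm 1's reward never changes its estimate, it would keep playing arm 1 indefinitely even though arm 2's true mean is 0.6. If the course defines the exploration step differently (for example, exploring only among non-greedy arms), the ε-greedy probabilities change accordingly.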
