Question: 4 Stochastic Bandit Algorithms [2+6=8 points]
Consider the stochastic bandit problem with 3 arms, where the (random) rewards associated with the 3 arms for the first 8
rounds are as follows. Note that these numbers are unknown to the bandit algorithm (see Slide 9 of Module 14 for details
of the model and assumptions):
time t:  1     2     3     4     5     6     7     8
arm 1:   0.3   0.2   0.5   0.3   0.2   0.4   0.6   0.3
arm 2:   0.2   0.3   0.5   0.8   0.5   0.3   0.7   0.2
arm 3:   0.1   0.05  0.02  0.1   0.03  0.02  0.01  0
A bandit algorithm A selected arms 1, 2, 3, and 1, respectively, in the first 4 rounds (t ∈ {1, 2, 3, 4}).
(a) Suppose A applies the ε-greedy algorithm with ε = 0.2 at round t = 5. Compute the probability that each arm is
selected. Show your work.
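For part (a), the selection probabilities can be checked with a short script. This is only a sketch under assumptions the question does not spell out: exploration (probability ε) picks uniformly among all 3 arms, and the greedy step splits its probability 1 − ε evenly among arms tied for the highest empirical mean. The course slides may use a different convention. The function name `epsilon_greedy_probs` is hypothetical. The observed rewards come from the table: arm 1 was pulled at t = 1 and t = 4, arm 2 at t = 2, arm 3 at t = 3.

```python
# Rewards A actually observed in rounds 1-4 (arm -> list of rewards).
observed = {1: [0.3, 0.3], 2: [0.3], 3: [0.02]}
epsilon = 0.2


def epsilon_greedy_probs(observed, epsilon):
    """Selection probability of each arm under epsilon-greedy.

    Assumptions (not stated in the question):
      * exploration picks uniformly among ALL arms,
      * ties for the best empirical mean are split evenly.
    """
    means = {arm: sum(r) / len(r) for arm, r in observed.items()}
    best = max(means.values())
    # Arms whose empirical mean equals the best one (up to float tolerance).
    greedy = [arm for arm, m in means.items() if abs(m - best) < 1e-12]
    k = len(means)
    probs = {}
    for arm in means:
        p = epsilon / k                      # exploration share
        if arm in greedy:
            p += (1 - epsilon) / len(greedy)  # exploitation share
        probs[arm] = p
    return probs


probs = epsilon_greedy_probs(observed, epsilon)
for arm, p in sorted(probs.items()):
    print(f"arm {arm}: {p:.4f}")
```

Under these assumptions, arms 1 and 2 tie with empirical mean 0.3, so each gets 0.8/2 + 0.2/3 ≈ 0.4667, and arm 3 gets 0.2/3 ≈ 0.0667. A different tie-breaking rule (for example, always preferring the lowest-indexed arm) would shift the 0.8 exploitation mass entirely onto one of the tied arms.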
(b) Suppose A intends to apply the UCB algorithm in the rounds that follow. We want to trace the algorithm for these
rounds. The steps of the algorithm at round t = 5 are shown below.
Follow the same steps for rounds t ∈ {6, 7, 8}.
At time t = 5, the scores are as follows:
Arm 1: 0.3 (exploitation) + 1.26864 (exploration) = 1.56864 (total score)
Arm 2: 0.3 (exploitation) + 1.79412 (exploration) = 2.09412 (total score)
Arm 3: 0.02 (exploitation) + 1.79412 (exploration) = 1.81412 (total score)
The selected arm is arm 2, with payoff 0.5.
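The t = 5 numbers above are consistent with a UCB score of (empirical mean) + sqrt(2 ln t / n_i), where n_i is the number of times arm i has been pulled so far: sqrt(2 ln 5 / 2) ≈ 1.26864 and sqrt(2 ln 5 / 1) ≈ 1.79412. The exact form of the bonus is inferred from those numbers, not stated in the question, so treat the following as a sketch under that assumption. The function name `ucb_trace` is hypothetical.

```python
import math

# Reward table from the question (index 0 corresponds to t = 1).
rewards = {
    1: [0.3, 0.2, 0.5, 0.3, 0.2, 0.4, 0.6, 0.3],
    2: [0.2, 0.3, 0.5, 0.8, 0.5, 0.3, 0.7, 0.2],
    3: [0.1, 0.05, 0.02, 0.1, 0.03, 0.02, 0.01, 0.0],
}
initial_picks = [1, 2, 3, 1]  # arms chosen by A at t = 1..4


def ucb_trace(rewards, initial_picks, last_round):
    """Trace UCB after the given initial picks.

    Assumed score: mean_i + sqrt(2 * ln(t) / n_i), inferred from the
    t = 5 numbers in the question. Ties (none occur here) would go to
    the lowest-numbered arm because of dict ordering.
    """
    counts = {arm: 0 for arm in rewards}
    totals = {arm: 0.0 for arm in rewards}
    # Replay the rounds that were already played.
    for t, arm in enumerate(initial_picks, start=1):
        counts[arm] += 1
        totals[arm] += rewards[arm][t - 1]

    trace = []
    for t in range(len(initial_picks) + 1, last_round + 1):
        scores = {
            arm: totals[arm] / counts[arm]
            + math.sqrt(2 * math.log(t) / counts[arm])
            for arm in rewards
        }
        chosen = max(scores, key=scores.get)
        payoff = rewards[chosen][t - 1]
        trace.append((t, scores, chosen, payoff))
        # Only the chosen arm's reward is revealed to the algorithm.
        counts[chosen] += 1
        totals[chosen] += payoff
    return trace


trace = ucb_trace(rewards, initial_picks, last_round=8)
for t, scores, chosen, payoff in trace:
    line = ", ".join(f"arm {a}: {s:.5f}" for a, s in sorted(scores.items()))
    print(f"t={t}: {line} -> select arm {chosen}, payoff {payoff}")
```

Under this assumed score, the trace reproduces the t = 5 values given in the question and then selects arm 3 at t = 6 (payoff 0.02), arm 2 at t = 7 (payoff 0.7), and arm 1 at t = 8 (payoff 0.3).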
