MC-Question 8. Consider the multi-armed bandit problem with 2 arms and adversarial losses (or equivalently adversarial rewards). We would like to use the Thompson sampling algorithm for this setting. What do you think about the normalized regret of this algorithm (R_T/T)?
(a) R_T/T will converge to zero as Thompson sampling is a randomized algorithm.
(b) R_T/T will not converge to zero if the losses are chosen carefully because Thompson sampling is designed for stochastic rewards.
(c) R_T/T will converge to zero as Thompson sampling has lower regret than EXP3.
(d) R_T/T will not converge to zero as Thompson sampling is a deterministic algorithm and we are considering adversarial losses.
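For reference, here is one standard way to define the regret in this two-armed adversarial setting. It assumes losses ℓ_t(i) ∈ [0, 1] chosen by the adversary and writes I_t for the (possibly random) arm the algorithm plays at round t. This notation is assumed here and is not given in the question:

$$
R_T = \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(I_t)\right] - \min_{i \in \{1, 2\}} \sum_{t=1}^{T} \ell_t(i), \qquad \text{normalized regret} = \frac{R_T}{T}.
$$

An algorithm has vanishing normalized regret when R_T/T → 0 as T → ∞ for every (oblivious) loss sequence. EXP3, for instance, guarantees R_T = O(√(T K log K)) with K arms, so its normalized regret goes to zero.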

Step by Step Solution

(a) The normalized regret will converge to zero as T → ∞ since Thompson sampling is a randomized algorithm. Why this is the case: the reason is that for any gi...
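To make the mechanics of Thompson sampling concrete, here is a minimal sketch on two arms. It assumes Bernoulli rewards, a Beta(1, 1) prior on each arm, and a stochastic environment with fixed unknown means, which is the setting its posterior update is built for. The function name thompson_sampling_2arms and its parameters are hypothetical. Each round it draws one sample from each arm's posterior and plays the larger one, so the arm choice is randomized. It returns the normalized pseudo-regret R_T/T measured against the better arm's mean.

```python
import random


def thompson_sampling_2arms(reward_probs, T, seed=0):
    """Hypothetical sketch: Beta-Bernoulli Thompson sampling, 2 stochastic arms.

    Assumes each arm i pays 1 with fixed probability reward_probs[i], else 0.
    Returns the normalized pseudo-regret R_T / T against the best arm's mean.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    rng = random.Random(seed)
    # Beta(1, 1) = uniform prior on each arm's success probability.
    alpha = [1, 1]
    beta = [1, 1]
    best_mean = max(reward_probs)
    regret = 0.0
    for _ in range(T):
        # Randomized step: one posterior sample per arm, play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
        arm = 0 if samples[0] >= samples[1] else 1
        reward = 1 if rng.random() < reward_probs[arm] else 0
        # Conjugate Beta update, valid because rewards are assumed i.i.d.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Pseudo-regret: gap between the best mean and the played arm's mean.
        regret += best_mean - reward_probs[arm]
    return regret / T


if __name__ == "__main__":
    print(thompson_sampling_2arms([0.3, 0.7], 10_000))
```

In this stochastic instance the returned value is small, because the worse arm is played less and less often as its posterior concentrates. The sketch does not model adversarially chosen losses, where the i.i.d. assumption behind the posterior update no longer holds.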