Question:

Consider a stochastic $n$-armed bandit, $n \geq 2$, in which the arms give 0-1 (Bernoulli) rewards. We restrict our attention to instances $I$ in which the means of the arms all lie in $(0, 1)$, and moreover, no two arms have the same mean. In any such instance $I$, let $a_2$ be the arm with the second highest mean, and let $u_T$ be a random variable denoting the number of pulls of $a_2$ over a horizon $T > 1$. Describe a deterministic algorithm $L$ which, for every qualifying bandit instance $I$, achieves

$$\lim_{T \to \infty} \frac{\mathbb{E}_{L,I}[u_T]}{T} = 1.$$

In other words, the number of pulls of arms other than $a_2$ under $L$ must be a vanishing fraction of the horizon. Provide a proof sketch that $L$ satisfies this property; no need for a detailed mathematical working. [4 marks]
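
One possible approach, offered only as a sketch and not as the expert answer this page refers to: force exploration on a sparse deterministic schedule, and otherwise pull the arm whose empirical mean is second highest. The code below assumes exploration happens at perfect-square rounds in round-robin order, with ties broken by arm index; the function name `second_best_pulls` and the `seed` parameter are hypothetical, and the random generator is used only to simulate the Bernoulli rewards, not to make decisions.

```python
import math
import random


def second_best_pulls(means, horizon, seed=0):
    """Simulate the sketch on one instance and return pull counts per arm."""
    rng = random.Random(seed)  # randomness lives in the rewards only
    n = len(means)
    pulls = [0] * n   # number of pulls of each arm
    wins = [0] * n    # total reward collected from each arm
    explored = 0      # forced-exploration rounds used so far

    for t in range(1, horizon + 1):
        if math.isqrt(t) ** 2 == t:
            # Assumed schedule: explore at perfect squares, round-robin.
            arm = explored % n
            explored += 1
        else:
            # Exploit: pick the arm with the second-highest empirical mean.
            # Unpulled arms get estimate 0.0; ties are broken by arm index.
            est = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(n)]
            order = sorted(range(n), key=lambda a: (-est[a], a))
            arm = order[1]

        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward

    return pulls


if __name__ == "__main__":
    counts = second_best_pulls([0.9, 0.6, 0.3], horizon=20000)
    print(counts, counts[1] / 20000)
```

Proof sketch under these assumptions: by round $t$ every arm has roughly $\sqrt{t}/n$ forced pulls, so by Hoeffding's inequality the probability that the empirical ordering of the arms differs from the true ordering decays exponentially in $\sqrt{t}$ (with a rate depending on the smallest gap between means), and these probabilities sum to a constant. Exploration itself costs at most $\sqrt{T}$ rounds, so the expected number of pulls of arms other than $a_2$ is $O(\sqrt{T})$, which is a vanishing fraction of $T$.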
