Consider a bandit problem in which you know the set of expected payoffs for pulling the arms, but not which arm maps to which expected payoff. For example, in a 5-armed bandit problem, you know that arms 1 through 5 have expected payoffs 3.1, 2.3, 4.6, 1.2, and 0.9, but not necessarily in that order.
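
To make the setting concrete, here is a minimal Python sketch of such an environment. It is only an illustration under stated assumptions: the class name `PermutedBandit`, the helper `known_gaps`, and the Gaussian reward noise are hypothetical choices, not part of the question. The learner sees `KNOWN_MEANS` (and therefore every gap to the best arm), while the arm-to-payoff assignment stays hidden inside the environment.

```python
import random

# Known to the learner: the set of expected payoffs from the example.
KNOWN_MEANS = [3.1, 2.3, 4.6, 1.2, 0.9]


class PermutedBandit:
    """Bandit whose arm means are a hidden permutation of a known list."""

    def __init__(self, means, noise_std=1.0, seed=0):
        self._rng = random.Random(seed)
        self._means = list(means)
        self._rng.shuffle(self._means)   # hidden arm-to-payoff assignment
        self._noise_std = noise_std      # Gaussian noise is an assumption
        self.best_mean = max(means)
        self.expected_regret = 0.0       # tracked for evaluation only

    @property
    def n_arms(self):
        return len(self._means)

    def pull(self, arm):
        """Return a noisy reward and accumulate the expected regret."""
        mean = self._means[arm]
        self.expected_regret += self.best_mean - mean
        return self._rng.gauss(mean, self._noise_std)


def known_gaps(means):
    """Gaps to the best payoff, sorted ascending; computable before any pull."""
    best = max(means)
    return sorted(best - m for m in means)


# Usage: pull every arm once and inspect what the learner knows up front.
bandit = PermutedBandit(KNOWN_MEANS)
rewards = [bandit.pull(arm) for arm in range(bandit.n_arms)]
gaps = known_gaps(KNOWN_MEANS)
print("gaps known in advance:", gaps)
print("expected regret after one pull per arm:", bandit.expected_regret)
```

The point of the sketch is that `known_gaps` needs no samples at all, whereas standard UCB has to work without knowing the gaps.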

a) Can you design a regret-minimizing algorithm that achieves better bounds than UCB? What makes you believe this is possible?

b) Which parts of the UCB analysis would you modify to achieve better bounds? Note that you are not asked for a complete algorithm, only the intuition.
