Question: The Frequentist Approach: Upper Confidence Bounds (UCB)

The first algorithm we will analyze is the frequentist take on multi-armed bandits, known as the Upper Confidence Bounds (UCB) algorithm. For each arm $a$, you keep track of:

$T_a(t)$: the number of times arm $a$ has been pulled up to and including iteration $t$.

$X_a^{(1)}, \dots, X_a^{(T_a(t))}$: the samples you have received from arm $a$.

Let $\hat{\mu}_a(t)$ be the mean of those samples:

$$\hat{\mu}_a(t) = \frac{1}{T_a(t)} \sum_{i=1}^{T_a(t)} X_a^{(i)}$$

Using this information, you compute an upper confidence bound $C_a(T_a(t), \delta)$ that encompasses the true mean $\mu_a$ with probability at least $1 - \delta$, for some $\delta \in (0, 1)$. $C_a(T_a(t), \delta)$ must therefore satisfy:

$$P\big(\mu_a < C_a(T_a(t), \delta)\big) \geq 1 - \delta$$

As an edge case, after $0$ samples, we simply set the upper bound on $\mu_a$ to $\infty$, since it is always true that $\mu_a < \infty$.

The algorithm then pulls, at each round $t$, the arm with the highest upper confidence bound based on the results we saw up to time $t - 1$.
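The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the question's intended solution: the question does not specify the form of the confidence bound, so the sketch assumes rewards lie in [0, 1] and uses a Hoeffding-style bound (sample mean plus `sqrt(log(1/delta) / (2 * n))`). The Bernoulli arms and the names `ucb_bound` and `run_ucb` are hypothetical choices made for the example.

```python
import math
import random


def ucb_bound(mean, n_pulls, delta):
    """Upper confidence bound on an arm's true mean.

    Assumption: rewards lie in [0, 1], so Hoeffding's inequality gives
    a bound that holds with probability at least 1 - delta.
    """
    if n_pulls == 0:
        # Edge case from the text: with no samples, the bound is infinity.
        return math.inf
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n_pulls))


def run_ucb(true_means, n_rounds, delta=0.05, seed=0):
    """Simulate UCB on Bernoulli arms (an assumed reward model).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # number of pulls of each arm so far
    sums = [0.0] * k   # running sum of rewards for each arm

    for _ in range(n_rounds):
        # Bounds use only the results observed in earlier rounds.
        bounds = []
        for a in range(k):
            mean = sums[a] / counts[a] if counts[a] > 0 else 0.0
            bounds.append(ucb_bound(mean, counts[a], delta))

        # Pull the arm with the highest upper confidence bound.
        arm = max(range(k), key=lambda a: bounds[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(run_ucb([0.2, 0.8], n_rounds=2000))
```

Because an unpulled arm has an infinite bound, every arm is tried once before any arm is pulled a second time, which is exactly what the edge case in the question is for.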
