Question: The Frequentist Approach: Upper Confidence Bounds (UCB)
The first algorithm we will analyze is the frequentist take on multi-armed bandits, known as the Upper Confidence Bounds (UCB) algorithm.

For each arm $a$, you keep track of:
- $T_a(t)$: the number of times arm $a$ has been pulled up to and including iteration $t$.
- $X_a^{(1)}, \dots, X_a^{(T_a(t))}$: the samples you have received from arm $a$.

Let $\hat{\mu}_a(t)$ be the mean of those samples:

$$\hat{\mu}_a(t) = \frac{1}{T_a(t)} \sum_{i=1}^{T_a(t)} X_a^{(i)}$$

Using this information, you compute an upper confidence bound $C_a(T_a(t), \delta)$ that encompasses the true mean $\mu_a$ with probability at least $1 - \delta$, for some $\delta \in (0, 1)$. $C_a(T_a(t), \delta)$ must therefore satisfy:

$$P\big(\mu_a < C_a(T_a(t), \delta)\big) \ge 1 - \delta$$

As an edge case, after $0$ samples, we simply set the upper bound on $\mu_a$ to $1$, since it's always true that $\mu_a \le 1$ when rewards lie in $[0, 1]$.

The algorithm then pulls, at each round $t$, the arm with the highest upper confidence bound based on the results we saw up to time $t - 1$.
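The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the assignment's reference solution: the question does not give a formula for the bound, so the sketch assumes rewards in $[0, 1]$ and uses a Hoeffding-style bound $\hat{\mu}_a + \sqrt{\log(1/\delta) / (2n)}$, where $n$ is the number of samples seen from the arm. The names `upper_bound` and `run_ucb`, and the simulated Bernoulli arms, are hypothetical.

```python
import math
import random


def upper_bound(samples, delta):
    """Upper confidence bound on an arm's mean.

    Assumption: rewards lie in [0, 1] and a Hoeffding-style bound is used;
    the original question does not specify the form of the bound.
    """
    n = len(samples)
    if n == 0:
        return 1.0  # edge case: no samples yet, so fall back to the trivial bound
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def run_ucb(true_means, rounds, delta=0.05, seed=0):
    """Simulate UCB on Bernoulli arms and return the pull count of each arm."""
    rng = random.Random(seed)
    samples = [[] for _ in true_means]  # rewards observed so far, per arm
    for _ in range(rounds):
        # Bounds use only the results seen before this round.
        bounds = [upper_bound(s, delta) for s in samples]
        # Pull the arm with the highest bound (ties go to the lowest index).
        arm = max(range(len(true_means)), key=lambda a: bounds[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        samples[arm].append(reward)
    return [len(s) for s in samples]


if __name__ == "__main__":
    # Hypothetical two-armed problem; the second arm has the higher mean.
    print(run_ucb([0.1, 0.9], rounds=1000))
```

Storing every sample keeps the code close to the notation in the question; a running count and a running sum per arm would carry the same information in constant memory.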
