Question:

Part 1:
Write code for a multi-arm bandit algorithm that has the following characteristics:
A: number of arms
P: distribution of rewards on [0, 1]. Use the beta distribution so that you can tune the reward distribution with two parameters. Choose your own parameter settings and graph the distributions of all arms in one plot.
r_i: reward (0 or 1) taken from probability distribution P_i
T: number of rounds played (gambles)
R: regret, the difference between the reward you would have earned by playing optimally and the reward you actually earned, calculated as a function of time (number of rounds T)
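
One possible reading of Part 1, sketched in Python with the standard library only. It assumes that pulling arm i first draws a success probability p from P_i = Beta(a_i, b_i) and then returns a reward of 1 with probability p, and that "playing optimally" means always pulling the arm with the highest expected reward a / (a + b). The parameter values in ARM_PARAMS and the names BetaBandit, beta_pdf, cumulative_regret and plot_distributions are illustrative choices, not part of the question.

```python
import math
import random

# Hypothetical parameter choices: one (a, b) pair per arm.
ARM_PARAMS = [(2.0, 8.0), (4.0, 6.0), (6.0, 4.0), (8.0, 2.0)]


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


class BetaBandit:
    """A-armed bandit in which arm i has P_i = Beta(a_i, b_i)."""

    def __init__(self, params, seed=None):
        self.params = list(params)
        self.n_arms = len(self.params)  # A
        self.rng = random.Random(seed)
        # The mean of Beta(a, b) is a / (a + b): the expected reward of the arm.
        self.means = [a / (a + b) for a, b in self.params]
        self.best_mean = max(self.means)

    def pull(self, arm):
        """Draw p from P_arm, then return a 0/1 reward with P(r = 1) = p."""
        a, b = self.params[arm]
        p = self.rng.betavariate(a, b)
        return 1 if self.rng.random() < p else 0


def cumulative_regret(bandit, rewards):
    """R(t) = t * best_mean - (sum of the first t actual rewards)."""
    regret, total = [], 0
    for t, r in enumerate(rewards, start=1):
        total += r
        regret.append(t * bandit.best_mean - total)
    return regret


def plot_distributions(params, points=200):
    """Plot every arm's density in one figure (requires matplotlib)."""
    import matplotlib.pyplot as plt

    xs = [(i + 0.5) / points for i in range(points)]
    for a, b in params:
        plt.plot(xs, [beta_pdf(x, a, b) for x in xs], label=f"Beta({a}, {b})")
    plt.xlabel("reward probability p")
    plt.ylabel("density")
    plt.legend()
    plt.show()


if __name__ == "__main__":
    bandit = BetaBandit(ARM_PARAMS, seed=0)
    T = 1000
    # Always pulling arm 0 is a deliberately poor policy, used only as a demo.
    rewards = [bandit.pull(0) for _ in range(T)]
    print("regret after", T, "rounds:", cumulative_regret(bandit, rewards)[-1])
```

Because the regret here compares realised rewards against the expected reward of the best arm, the curve is noisy and can dip below zero early on; averaging over several runs gives a smoother picture.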
Part 2:
Suppose you have 4 arms (A=4). Implement a random and a greedy approach to selecting the best arm to play.
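
A minimal sketch of the two baseline strategies for A = 4, under the same assumptions as the question's setup (Beta-distributed success probabilities, 0/1 rewards). "Greedy" is interpreted here as: pull each arm once, then always pull the arm with the highest empirical mean so far. That interpretation, the helper names (make_pull, run_random, run_greedy, regret_curve) and the ARM_PARAMS values are assumptions made for illustration.

```python
import random

# Hypothetical Beta(a, b) parameters for the A = 4 arms.
ARM_PARAMS = [(2.0, 8.0), (4.0, 6.0), (6.0, 4.0), (8.0, 2.0)]


def make_pull(params, rng):
    """Return pull(arm): draw p from Beta(a, b), then a 0/1 reward with P(1) = p."""
    def pull(arm):
        a, b = params[arm]
        return 1 if rng.random() < rng.betavariate(a, b) else 0
    return pull


def run_random(pull, n_arms, rounds, rng):
    """Pick an arm uniformly at random in every round."""
    arms, rewards = [], []
    for _ in range(rounds):
        arm = rng.randrange(n_arms)
        arms.append(arm)
        rewards.append(pull(arm))
    return arms, rewards


def run_greedy(pull, n_arms, rounds, rng):
    """Pull each arm once, then always play the highest empirical mean."""
    counts = [0] * n_arms
    sums = [0] * n_arms
    arms, rewards = [], []
    for t in range(rounds):
        if t < n_arms:
            arm = t  # initial pass: one pull per arm
        else:
            means = [sums[i] / counts[i] for i in range(n_arms)]
            best = max(means)
            # Break ties at random so that low-index arms are not favoured.
            arm = rng.choice([i for i, m in enumerate(means) if m == best])
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        arms.append(arm)
        rewards.append(r)
    return arms, rewards


def regret_curve(rewards, best_mean):
    """Cumulative regret: t * best_mean minus the reward earned up to round t."""
    curve, total = [], 0
    for t, r in enumerate(rewards, start=1):
        total += r
        curve.append(t * best_mean - total)
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    pull = make_pull(ARM_PARAMS, rng)
    best_mean = max(a / (a + b) for a, b in ARM_PARAMS)
    for name, run in [("random", run_random), ("greedy", run_greedy)]:
        _, rewards = run(pull, len(ARM_PARAMS), 1000, rng)
        print(name, "final regret:", regret_curve(rewards, best_mean)[-1])
```

With only one exploratory pull per arm, this greedy rule can lock onto a suboptimal arm whenever the best arm happens to return 0 on its first pull, which is the weakness the exploration-based strategies are meant to address.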
** NEW QUESTION **
Using the same code as before, implement the epsilon-greedy, the epsilon-first greedy, and the upper confidence bound (UCB1) approaches.
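
The three strategies could be sketched as follows. The block is self-contained, so it repeats a small environment helper rather than relying on earlier code. Assumptions: epsilon-greedy explores with probability epsilon in every round; epsilon-first explores uniformly at random for the first epsilon * T rounds and plays greedily afterwards (while still updating its estimates); UCB1 pulls each arm once and then maximises the empirical mean plus sqrt(2 ln t / n_i), where t is the number of pulls made so far and n_i the number of pulls of arm i. All function names and the ARM_PARAMS values are illustrative.

```python
import math
import random

# Hypothetical Beta(a, b) parameters for the four arms.
ARM_PARAMS = [(2.0, 8.0), (4.0, 6.0), (6.0, 4.0), (8.0, 2.0)]


def make_pull(params, rng):
    """Return pull(arm): draw p from Beta(a, b), then a 0/1 reward with P(1) = p."""
    def pull(arm):
        a, b = params[arm]
        return 1 if rng.random() < rng.betavariate(a, b) else 0
    return pull


def argmax(values, rng):
    """Index of the largest value, with ties broken at random."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def run_policy(select, pull, n_arms, rounds):
    """Play `rounds` rounds; select(t, counts, means) returns the arm to pull."""
    counts, means = [0] * n_arms, [0.0] * n_arms
    arms, rewards = [], []
    for t in range(rounds):
        arm = select(t, counts, means)
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
        arms.append(arm)
        rewards.append(r)
    return arms, rewards


def epsilon_greedy(epsilon, n_arms, rng):
    """Explore with probability epsilon in each round, otherwise exploit."""
    def select(t, counts, means):
        if rng.random() < epsilon:
            return rng.randrange(n_arms)
        return argmax(means, rng)
    return select


def epsilon_first(epsilon, rounds, n_arms, rng):
    """Explore for the first int(epsilon * rounds) rounds, then only exploit."""
    explore_rounds = int(epsilon * rounds)

    def select(t, counts, means):
        if t < explore_rounds:
            return rng.randrange(n_arms)
        return argmax(means, rng)
    return select


def ucb1(n_arms, rng):
    """Pull each arm once, then maximise mean_i + sqrt(2 ln t / n_i)."""
    def select(t, counts, means):
        if t < n_arms:
            return t  # initial pass, so every count is at least 1 afterwards
        scores = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
                  for i in range(n_arms)]
        return argmax(scores, rng)
    return select


if __name__ == "__main__":
    rng = random.Random(0)
    pull = make_pull(ARM_PARAMS, rng)
    A, T = len(ARM_PARAMS), 1000
    best_mean = max(a / (a + b) for a, b in ARM_PARAMS)
    policies = {
        "epsilon-greedy": epsilon_greedy(0.1, A, rng),
        "epsilon-first": epsilon_first(0.1, T, A, rng),
        "UCB1": ucb1(A, rng),
    }
    for name, select in policies.items():
        _, rewards = run_policy(select, pull, A, T)
        print(name, "final regret:", T * best_mean - sum(rewards))
```

The value epsilon = 0.1 is an arbitrary starting point; plotting the cumulative regret of each policy over several seeds is a reasonable way to compare them and to tune epsilon.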
