Question:
Part 1:
Write code for a multi-armed bandit algorithm that has the following characteristics:
A: number of arms
P: distribution of rewards. Use the beta distribution so you can tune the reward distribution with two parameters. Choose your own parameter settings and graph the distributions in one plot.
r_i: reward taken from probability distribution P_i
T: number of rounds played (gambles)
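One possible way to set up the environment described above is sketched here in Python, using only the standard library for sampling. The parameter pairs in `ARM_PARAMS`, the class name `BetaBandit`, and the helper names are illustrative assumptions, not part of the question; `plot_arm_densities` additionally assumes matplotlib is installed.

```python
import math
import random

# Hypothetical (alpha, beta) pairs, one per arm; chosen only for illustration.
ARM_PARAMS = [(2.0, 8.0), (4.0, 6.0), (5.0, 5.0), (8.0, 2.0)]


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


class BetaBandit:
    """A bandit with A arms; arm i pays a reward r_i drawn from Beta(a_i, b_i)."""

    def __init__(self, params, seed=None):
        self.params = list(params)
        self.n_arms = len(self.params)
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Draw one reward in [0, 1] from the chosen arm's distribution."""
        a, b = self.params[arm]
        return self.rng.betavariate(a, b)

    def expected_rewards(self):
        """Mean of Beta(a, b) is a / (a + b); used for evaluation only."""
        return [a / (a + b) for a, b in self.params]

    def best_arm(self):
        means = self.expected_rewards()
        return max(range(self.n_arms), key=lambda i: means[i])


def plot_arm_densities(params, n_points=200):
    """Graph every arm's reward density in one plot (assumes matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    # Midpoints of a uniform grid, so the endpoints 0 and 1 are never evaluated.
    xs = [(k + 0.5) / n_points for k in range(n_points)]
    for a, b in params:
        plt.plot(xs, [beta_pdf(x, a, b) for x in xs], label=f"Beta({a}, {b})")
    plt.xlabel("reward")
    plt.ylabel("density")
    plt.legend()
    plt.show()
```

Calling `plot_arm_densities(ARM_PARAMS)` draws the four densities on shared axes, and `BetaBandit(ARM_PARAMS, seed=0).pull(arm)` returns one reward for a round.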
Part 2:
Suppose you have A arms. Implement a random, a greedy, an epsilon-first greedy, an epsilon-greedy, and an upper confidence bound (UCB) approach to selecting the best arm to play. Ensure the strategies only use the rewards when determining
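The five selection strategies could be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: each policy sees only the pull counts and the running mean of observed rewards, the environment is any hypothetical `pull(arm)` callable returning a reward, the UCB variant is assumed to be UCB1 with bonus sqrt(2 ln t / n), and the greedy and UCB policies are assumed to try every arm once before exploiting.

```python
import math
import random


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def random_policy(t, counts, means, rng):
    return rng.randrange(len(counts))


def greedy_policy(t, counts, means, rng):
    # Assumption: try each arm once, then always play the best observed mean.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return argmax(means)


def make_epsilon_first(epsilon, horizon):
    n_explore = int(epsilon * horizon)  # pure exploration for the first rounds

    def policy(t, counts, means, rng):
        if t < n_explore:
            return rng.randrange(len(counts))
        return argmax(means)

    return policy


def make_epsilon_greedy(epsilon):
    def policy(t, counts, means, rng):
        if rng.random() < epsilon:  # explore with probability epsilon
            return rng.randrange(len(counts))
        return argmax(means)

    return policy


def ucb1_policy(t, counts, means, rng):
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + math.sqrt(2.0 * math.log(total) / n)
              for m, n in zip(means, counts)]
    return argmax(scores)


def run(policy, pull, n_arms, horizon, seed=None):
    """Play `horizon` rounds; returns the chosen arms and observed rewards."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    arms, rewards = [], []
    for t in range(horizon):
        arm = policy(t, counts, means, rng)
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        arms.append(arm)
        rewards.append(reward)
    return arms, rewards
```

For example, `run(make_epsilon_greedy(0.1), pull, n_arms=4, horizon=1000, seed=1)` plays 1000 rounds with a 10% exploration rate. Keeping the true distribution parameters out of the policy signature is one way to ensure the strategies rely on observed rewards alone.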
Part 3:
Evaluate the performance of the strategies by plotting the regret of each round (i.e., plot Regret(round #) versus round) and by plotting the expected regret averaged over rounds (i.e., plot average Regret(round #) versus round). Regret is the difference between the actual reward and the reward you would have received if you had played optimally.
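The regret calculation might look like the sketch below. It assumes that "the reward if you played optimally" means the expected reward of the best arm, so the regret in a round is that value minus the reward actually received; other readings (for example, comparing against the best arm's realized draw) are possible. The function names are illustrative, and `plot_regret` assumes matplotlib is installed.

```python
from itertools import accumulate


def per_round_regret(rewards, optimal_reward):
    """Regret in each round: optimal (expected) reward minus actual reward."""
    return [optimal_reward - r for r in rewards]


def running_average(values):
    """Average of values[0..t] for every round t."""
    return [total / (t + 1) for t, total in enumerate(accumulate(values))]


def plot_regret(regret_by_strategy):
    """Plot per-round and running-average regret for each strategy.

    `regret_by_strategy` maps a strategy name to its list of per-round regret.
    """
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (ax_round, ax_avg) = plt.subplots(1, 2, figsize=(10, 4))
    for name, regret in regret_by_strategy.items():
        rounds = range(1, len(regret) + 1)
        ax_round.plot(rounds, regret, label=name)
        ax_avg.plot(rounds, running_average(regret), label=name)
    ax_round.set_xlabel("round")
    ax_round.set_ylabel("regret")
    ax_avg.set_xlabel("round")
    ax_avg.set_ylabel("average regret")
    ax_avg.legend()
    plt.show()
```

With rewards drawn from continuous distributions, the per-round regret under this definition can be negative in an individual round; the running average is usually the easier curve to compare across strategies.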
