Question:
Part 1:
Write code for a multi-armed bandit algorithm that has the following characteristics:
A: number of arms
P: distribution of rewards. Use the beta distribution so that you can tune the reward distribution with two parameters. Choose your own parameter settings and graph the distributions in one plot.
r_i: reward drawn from probability distribution P_i
T: number of rounds played (gambles)
R: regret, the difference between the actual reward and the reward you would have received by playing optimally, calculated as a function of time (number of rounds T)
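One possible way to set up the pieces listed above is sketched here, using only the Python standard library. The names `BetaBandit`, `beta_pdf`, and `cumulative_regret`, as well as the three (alpha, beta) settings, are illustrative assumptions and not part of the question. Regret is taken here to mean the expected reward of always playing the best arm minus the reward actually collected, which is one common reading of the definition of R.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), via log-gamma for numerical stability."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)


class BetaBandit:
    """Bandit with A arms; arm i pays a reward r_i drawn from Beta(alpha_i, beta_i)."""

    def __init__(self, params, seed=None):
        self.params = list(params)            # one (alpha, beta) pair per arm
        self.rng = random.Random(seed)
        # The mean of Beta(a, b) is a / (a + b).
        self.means = [a / (a + b) for a, b in self.params]
        self.best_mean = max(self.means)

    @property
    def n_arms(self):
        return len(self.params)

    def pull(self, arm):
        """Draw one reward in [0, 1] from the chosen arm's distribution."""
        a, b = self.params[arm]
        return self.rng.betavariate(a, b)


def cumulative_regret(bandit, rewards):
    """Regret after each round t: t * best_mean minus the reward collected so far."""
    regret, total = [], 0.0
    for t, r in enumerate(rewards, start=1):
        total += r
        regret.append(t * bandit.best_mean - total)
    return regret


# Assumed parameter settings: a poor arm, a middling arm, and a good arm.
params = [(2, 8), (5, 5), (8, 2)]
bandit = BetaBandit(params, seed=0)

T = 500
rewards = [bandit.pull(0) for _ in range(T)]   # always play the poor arm
regret = cumulative_regret(bandit, rewards)
```

Evaluating `beta_pdf` on a grid of x values strictly between 0 and 1 for each (alpha, beta) pair gives the curves needed to draw all reward distributions in one plot with whatever plotting library is available.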
Part 2:
Suppose you have A arms. Implement a random and a greedy approach to selecting the best arm to play.
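The two selection strategies could be sketched as follows. This is a minimal illustration under stated assumptions: the helper `run_policy`, the policy function names, and the (alpha, beta) settings are hypothetical, and "greedy" is interpreted as trying each arm once and then always playing the arm with the highest observed average reward.

```python
import random


def random_policy(rng, counts, sums):
    """Pick an arm uniformly at random, ignoring past rewards."""
    return rng.randrange(len(counts))


def greedy_policy(rng, counts, sums):
    """Try every arm once, then always play the arm with the best empirical mean."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    means = [s / c for s, c in zip(sums, counts)]
    return means.index(max(means))


def run_policy(params, T, policy, seed=0):
    """Play T rounds; return (cumulative regret per round, pull count per arm)."""
    rng = random.Random(seed)
    n_arms = len(params)
    best_mean = max(a / (a + b) for a, b in params)  # mean of Beta(a, b)
    counts = [0] * n_arms     # how often each arm was played
    sums = [0.0] * n_arms     # total reward observed per arm
    total, regret = 0.0, []
    for t in range(1, T + 1):
        arm = policy(rng, counts, sums)
        a, b = params[arm]
        r = rng.betavariate(a, b)
        counts[arm] += 1
        sums[arm] += r
        total += r
        regret.append(t * best_mean - total)
    return regret, counts


# Assumed settings: three arms with means 0.2, 0.5, and 0.8.
params = [(2, 8), (5, 5), (8, 2)]
T = 1000
random_regret, random_counts = run_policy(params, T, random_policy)
greedy_regret, greedy_counts = run_policy(params, T, greedy_policy)
```

Plotting `random_regret` and `greedy_regret` against the round number shows R as a function of T for each approach. Pure greedy selection can lock onto a suboptimal arm after an unlucky first draw, so averaging over several seeds gives a fairer comparison.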
