Question: Assume that we now need to solve a long-run average reward problem for the following matrices

i.e., there is no discount factor. Write a MATLAB program to perform relative value iteration. Show me the MATLAB code and also the output from your code after it is used to solve the MDP. Use the max norm for termination. Please show the final policy, how many iterations the algorithm took to converge, and the final value of the average reward. Use ε = 0.001. Note: MDP stands for Markov decision process.
12 9 0 0.3 0.7 0.2 0.8 12 4 0.6 0.4 0.1 0.9 7-13 6 20
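
A minimal sketch of relative value iteration in MATLAB follows, as one possible starting point for the program the question asks for. The matrices in the question did not survive extraction cleanly, so the values of `P` and `R` below are hypothetical placeholders, and the layout (one transition probability matrix and one transition reward matrix per action) is an assumption. The names `refState`, `maxIter`, `rbar`, and `rho` are illustrative choices. Each sweep subtracts the previous value of the reference state, and the loop stops when the max norm of the change in the value vector falls below ε.

```matlab
% Relative value iteration for an average-reward MDP (sketch, no discounting).
% PLACEHOLDER DATA: replace P and R with the matrices from the question.
% Assumed layout: P(:,:,a) is the transition probability matrix under
% action a, and R(:,:,a) is the matching transition reward matrix.
P = cat(3, [0.5 0.5; 0.5 0.5], [0.9 0.1; 0.2 0.8]);   % hypothetical values
R = cat(3, [1 2; 3 4], [5 0; -1 2]);                  % hypothetical values

epsilon  = 0.001;   % termination tolerance given in the question
refState = 1;       % reference state (arbitrary choice)
maxIter  = 10000;   % safety cap, not part of the algorithm itself

[nStates, ~, nActions] = size(P);

% Expected immediate reward of taking action a in state i.
rbar = reshape(sum(P .* R, 2), nStates, nActions);

J = zeros(nStates, 1);
for iter = 1:maxIter
    % One-step lookahead value of every (state, action) pair.
    Q = zeros(nStates, nActions);
    for a = 1:nActions
        Q(:, a) = rbar(:, a) + P(:, :, a) * J;
    end
    % Subtract the reference state's previous value to keep iterates bounded.
    [Jnew, policy] = max(Q - J(refState), [], 2);
    delta = norm(Jnew - J, Inf);   % max norm of the change
    J = Jnew;
    if delta < epsilon
        break;
    end
end

rho = J(refState);   % estimate of the average reward at convergence

fprintf('Iterations: %d\n', iter);
fprintf('Policy (action per state): %s\n', mat2str(policy'));
fprintf('Estimated average reward: %.4f\n', rho);
```

With the real matrices substituted for the placeholders, the three `fprintf` lines report the iteration count, the final policy, and the average reward estimate that the question asks for. The value of the reference state serves as that estimate because, at the fixed point of this update, it equals the optimal average reward.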
