Question: Assume that we now need to solve a long-run average reward problem for the following matrices, i.e., there is no discount factor. Write a MATLAB program to perform relative value iteration. Show the MATLAB code and the output from your code after it is used to solve the MDP. Use the max norm for termination. Please show the final policy, how many iterations the algorithm took to converge, and the final value of the average reward. Use ε = 0.001 as the termination tolerance. Note: MDP stands for Markov decision process.

12 9 0 0.3 0.7 0.2 0.8 12 4 0.6 0.4 0.1 0.9 7-13 6 20
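
A minimal sketch of relative value iteration in MATLAB is given here to show the shape of the program the question asks for. It is not a verified solution to this exact problem: the matrices P and R in the script are hypothetical placeholder values for a 2-state, 2-action MDP, not the matrices from the question, and must be replaced with the real data. The array layout P(i,j,a) and R(i,j,a), the reference state refState = 1, and the iteration cap maxIter are also assumptions. Each sweep applies the Bellman update without a discount factor, subtracts the value of the reference state to keep the iterates bounded, and stops when the max norm of the change falls below epsilon. The subtracted quantity serves as the running estimate of the average reward.

```matlab
% Relative value iteration (RVI) sketch for an average-reward MDP.
% ASSUMPTION: P and R are hypothetical placeholders, not the question's
% matrices. P(i,j,a) is the probability of moving from state i to state j
% under action a; R(i,j,a) is the reward earned on that transition.
P = zeros(2, 2, 2);
R = zeros(2, 2, 2);
P(:, :, 1) = [0.7 0.3; 0.4 0.6];
P(:, :, 2) = [0.9 0.1; 0.2 0.8];
R(:, :, 1) = [6 -5; 7 12];
R(:, :, 2) = [10 17; -14 13];

epsilon  = 0.001;   % termination tolerance from the question
maxIter  = 10000;   % safety cap on sweeps (assumption)
refState = 1;       % state whose relative value is pinned to 0 (assumption)

nStates  = size(P, 1);
nActions = size(P, 3);

% Expected one-step reward: rbar(i,a) = sum_j P(i,j,a) * R(i,j,a)
rbar = reshape(sum(P .* R, 2), nStates, nActions);

J = zeros(nStates, 1);          % relative values, started at zero
Q = zeros(nStates, nActions);   % action values for the current sweep

for k = 1:maxIter
    % Undiscounted Bellman update for every state-action pair
    for a = 1:nActions
        Q(:, a) = rbar(:, a) + P(:, :, a) * J;
    end
    [TJ, policy] = max(Q, [], 2);   % greedy value and action per state

    rho  = TJ(refState);            % running estimate of the average reward
    Jnew = TJ - rho;                % subtract the reference-state value

    delta = max(abs(Jnew - J));     % max norm of the change in J
    J = Jnew;
    if delta < epsilon
        break;
    end
end

fprintf('Iterations: %d\n', k);
fprintf('Policy (action per state): %s\n', mat2str(policy'));
fprintf('Average reward estimate: %.4f\n', rho);
```

To apply the sketch to the actual problem, overwrite the four placeholder assignments with the question's transition probability and reward matrices. If the rewards are given per state-action pair rather than per transition, set rbar directly instead of computing it from P and R.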
