Question: A decision maker observes a discrete-time system which moves between states {S1, S2, S3, S4} according to the following transition probability matrix:

P =
    0.3  0.2  0.1  0.4
    0.4  0.2  0.1  0.5
    0.0  0.3  0.0  0.8
    0.1  0.0  0.0  0.6

At each point in time, the decision maker may leave the system and receive a reward of R = 20 units, or alternatively remain in the system and receive a reward of r(S_i) units if the system occupies state S_i. If the decision maker decides to remain in the system, its state at the next decision epoch is determined by the matrix P. Assume a discount rate of 0.9 and that r(S_i) = i.

a) Formulate this model as an MDP.
b) Use both policy iteration and linear programming to find a stationary policy which maximizes the expected total discounted reward. Compare the results, and report the optimal policy and the optimal value function for both methods.
c) Find the smallest value of R so that it is optimal to leave the system in state S2.
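This is an optimal-stopping MDP: in every state the decision maker chooses between "remain" (collect r(S_i) and move according to P) and "leave" (collect R once, after which the process stops). As a starting point for the policy-iteration half of part (b), below is a minimal Python sketch built on those modeling assumptions. All names (policy_iteration, policy_evaluation, STAY, LEAVE, and so on) are illustrative rather than the textbook's notation, and the matrix entries are copied exactly as transcribed in the question, so check them against the original source before trusting any numbers (each row of a transition probability matrix must sum to 1).

```python
import numpy as np

# Problem data. The matrix entries are copied as transcribed in the question;
# verify them against the original source (each row must sum to 1).
P = np.array([[0.3, 0.2, 0.1, 0.4],
              [0.4, 0.2, 0.1, 0.5],
              [0.0, 0.3, 0.0, 0.8],
              [0.1, 0.0, 0.0, 0.6]])
r = np.array([1.0, 2.0, 3.0, 4.0])  # r(S_i) = i for remaining in state S_i
R = 20.0                            # one-time reward for leaving the system
beta = 0.9                          # discount rate

STAY, LEAVE = 0, 1

def policy_evaluation(policy):
    """Solve the linear system v = r_pi + beta * P_pi v for a fixed policy.

    Leaving is modeled as a terminal action: it pays R once and the process
    stops, so v(s) = R in every state where the policy says LEAVE.
    """
    n = len(policy)
    A = np.eye(n)
    b = np.empty(n)
    for s in range(n):
        if policy[s] == LEAVE:
            b[s] = R                   # v(s) = R
        else:
            A[s, :] -= beta * P[s, :]  # v(s) - beta * P[s, :] @ v = r(s)
            b[s] = r[s]
    return np.linalg.solve(A, b)

def policy_iteration():
    policy = np.full(len(r), STAY)
    while True:
        v = policy_evaluation(policy)
        # Greedy improvement: compare continuing with the one-time reward R.
        stay_value = r + beta * P @ v
        new_policy = np.where(stay_value >= R, STAY, LEAVE)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

if __name__ == "__main__":
    policy, v = policy_iteration()
    for i, (a, val) in enumerate(zip(policy, v), start=1):
        print(f"S{i}: {'remain' if a == STAY else 'leave'}, v = {val:.3f}")
```

Treating "leave" as a terminal action keeps the policy-evaluation step a plain 4 x 4 linear solve, so no absorbing dummy state has to be added.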

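For the linear-programming half of part (b), the standard primal LP for a discounted maximization MDP is: minimize sum_s v(s) subject to v(s) >= r(s, a) + 0.9 * sum_j p(j | s, a) v(j) for every state-action pair, which here reduces to one "remain" constraint and one "leave" constraint (v(s) >= R) per state. The sketch below sets this up with scipy.optimize.linprog; it reuses the same transcribed data and illustrative names as the policy-iteration sketch, and the recovered greedy policy is only a sanity check against the policy-iteration output.

```python
import numpy as np
from scipy.optimize import linprog

# Same transcribed data and modeling assumptions as the policy-iteration sketch.
P = np.array([[0.3, 0.2, 0.1, 0.4],
              [0.4, 0.2, 0.1, 0.5],
              [0.0, 0.3, 0.0, 0.8],
              [0.1, 0.0, 0.0, 0.6]])
r = np.array([1.0, 2.0, 3.0, 4.0])
R = 20.0
beta = 0.9
n = len(r)

# Primal LP:   minimize sum_s v(s)
# subject to   v(s) >= r(s) + beta * P[s, :] @ v    (remain)
#              v(s) >= R                            (leave)
# linprog expects A_ub @ v <= b_ub, so both constraint blocks are negated.
A_ub = np.vstack([-(np.eye(n) - beta * P),  # -(I - beta * P) v <= -r
                  -np.eye(n)])              # -v <= -R
b_ub = np.concatenate([-r, -np.full(n, R)])
c = np.ones(n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
v = res.x
print("value function from the LP:", np.round(v, 3))

# Recover a greedy policy from v: remain wherever the continuation value
# attains v(s), otherwise leave and collect R.
stay_value = r + beta * P @ v
policy = np.where(stay_value >= R - 1e-8, "remain", "leave")
print("greedy policy:", dict(zip(["S1", "S2", "S3", "S4"], policy)))
```

Fed the same transition matrix (and assuming that matrix is a valid stochastic matrix), the LP and policy iteration should report the same optimal value function and policy, which is exactly the comparison part (b) asks for.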
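Part (c) asks for the smallest R at which leaving the system in state S2 becomes optimal. One way to cross-check a hand-derived answer is to re-solve the problem for varying R and bisect on the value where the optimal action at S2 switches from remain to leave; this assumes the set of R values for which leaving is optimal in S2 is an interval of the form [R*, infinity), which is the usual situation in a stopping problem because the value of continuing grows more slowly in R than R itself. The sketch below reuses the transcribed data and the same illustrative modeling as above.

```python
import numpy as np

# Transcribed problem data; verify the matrix against the original source.
P = np.array([[0.3, 0.2, 0.1, 0.4],
              [0.4, 0.2, 0.1, 0.5],
              [0.0, 0.3, 0.0, 0.8],
              [0.1, 0.0, 0.0, 0.6]])
r = np.array([1.0, 2.0, 3.0, 4.0])
beta = 0.9
n = len(r)

def optimal_policy(R):
    """Policy iteration for the stopping problem with leave-reward R."""
    policy = np.zeros(n, dtype=int)            # 0 = remain, 1 = leave
    while True:
        # Evaluate: v(s) = R if leaving, else r(s) + beta * P[s, :] @ v.
        A, b = np.eye(n), np.empty(n)
        for s in range(n):
            if policy[s]:
                b[s] = R
            else:
                A[s, :] -= beta * P[s, :]
                b[s] = r[s]
        v = np.linalg.solve(A, b)
        new_policy = (r + beta * P @ v < R).astype(int)  # leave only if strictly better
        if np.array_equal(new_policy, policy):
            return policy
        policy = new_policy

# Bisect on R for the point where the optimal action in S2 (index 1) switches.
lo, hi = 0.0, 100.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if optimal_policy(mid)[1] == 1:
        hi = mid        # leaving is already optimal in S2: try a smaller R
    else:
        lo = mid
print(f"smallest R making it optimal to leave in S2: about {hi:.3f}")
```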