Question: (30) 2. A decision maker observes a discrete-time system which moves between states {1, 2, 3, 4} according to the following transition probability matrix:

        | 0.3  0.4  0.2  0.1 |
    P = | 0.2  0.3  0.5  0.0 |
        | 0.1  0.0  0.8  0.1 |
        | 0.4  0.0  0.0  0.6 |

At each point in time the decision maker may leave the system and receive a reward of R = 20 units, or alternatively remain in the system and receive a reward of r(i) units if the system occupies state i. If the decision maker decides to remain in the system, its state at the next decision epoch is determined by P. On the other hand, if the decision maker leaves the system, he can never come back. Assume a discount rate of α = 0.9 and r(i) = i for i = 1, 2, 3, 4.

(15) (a) Formulate this problem as a Markov decision process if the objective is to maximize the expected infinite-horizon discounted reward.

(15) (b) Carry out three iterations of the value iteration algorithm to find the optimal policy.
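For part (a), one standard formulation (a sketch, not the site's locked expert solution) takes state space S = {1, 2, 3, 4} plus an absorbing state Δ entered when the decision maker leaves; action set {stay, quit} in each state i; reward r(i) = i for staying and R = 20 for quitting (with reward 0 forever in Δ); and transitions given by P under "stay" and a sure move to Δ under "quit". The optimality equation is then

v(i) = max{ R, r(i) + α Σ_j p(j | i) v(j) }.

For part (b), below is a minimal Python sketch of value iteration under these assumptions. NumPy, the starting point v_0 = 0, and all variable names are my own choices; the exercise does not fix an initial value.

```python
import numpy as np

# Transition matrix under the "stay" action (from the problem statement).
P = np.array([
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.5, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.4, 0.0, 0.0, 0.6],
])

r = np.array([1.0, 2.0, 3.0, 4.0])  # r(i) = i for remaining in state i
R = 20.0                            # lump-sum reward for leaving the system
alpha = 0.9                         # discount rate

# Value iteration: v_{n+1}(i) = max{ R, r(i) + alpha * sum_j P[i, j] * v_n(j) }.
# Quitting moves to an absorbing state worth R now and 0 thereafter, so its
# action value is the constant R.
v = np.zeros(4)                     # v_0 = 0 (an assumed starting point)
for n in range(1, 4):
    quit_value = np.full(4, R)
    stay_value = r + alpha * P @ v
    v = np.maximum(quit_value, stay_value)
    policy = np.where(stay_value >= quit_value, "stay", "quit")
    print(f"v_{n} = {np.round(v, 4)}, greedy actions = {policy}")
```

Run as written, the sketch gives v_1 = (20, 20, 20, 20) (quitting dominates everywhere, since r(i) ≤ 4 < 20), v_2 = (20, 20, 21, 22), and v_3 = (20, 20.45, 21.9, 23.08), with staying becoming the greedy action in states 3 and 4 from the second iteration onward. Note the tie-breaking choice: when staying and quitting are worth the same (state 2 at n = 2), the sketch reports "stay".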
