
Problem 2 (16 marks). Consider a Markov Decision Process (MDP) with states S = {4, 3, 2, 1, 0}, where 4 is the starting state. In states k ≥ 1 you can walk (W), with T(k, W, k−1) = 1. In states k ≥ 2 you can also jump (J), with T(k, J, k−2) = 3/4 and T(k, J, k) = 1/4. State 0 is a terminal state. The reward is R(s, a, s') = (s − s')² for all (s, a, s'). Use a discount of γ = 1/2. Compute both V*(2) and Q*(3, J). Clearly show how you computed these values.
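One way to obtain these two values is to solve the Bellman optimality equations numerically by value iteration. The sketch below is my own reconstruction, not the original poster's solution; it assumes the reading of the statement above (walk available for k ≥ 1 with reward 1² = 1, jump available for k ≥ 2 landing in k−2 with probability 3/4 and reward 2² = 4, staying put with probability 1/4 and reward 0, and "Q*(3,7)" read as Q*(3, J)). Helper names such as `q_value` are arbitrary.

```python
# Value iteration for the 5-state walk/jump MDP described above.
# Assumptions: walk (W) is available in states k >= 1, jump (J) in
# states k >= 2, rewards are (s - s')^2, and gamma = 1/2.

GAMMA = 0.5

def q_value(V, k, action):
    """One-step lookahead Q(k, action) given current value estimates V."""
    if action == "W":
        # Walk: deterministic step to k-1, reward (k - (k-1))^2 = 1.
        return 1 + GAMMA * V[k - 1]
    # Jump: lands in k-2 w.p. 3/4 (reward 4), stays in k w.p. 1/4 (reward 0).
    return 0.75 * (4 + GAMMA * V[k - 2]) + 0.25 * (0 + GAMMA * V[k])

def value_iteration(iters=100):
    V = [0.0] * 5                      # state 0 is terminal, so V(0) = 0
    for _ in range(iters):
        new_V = V[:]
        for k in range(1, 5):
            actions = ["W"] + (["J"] if k >= 2 else [])
            new_V[k] = max(q_value(V, k, a) for a in actions)
        V = new_V
    return V

V = value_iteration()
print(f"V*(2)    = {V[2]:.4f}")            # converges to 24/7 ~ 3.4286
print(f"Q*(3, J) = {q_value(V, 3, 'J'):.4f}")  # converges to 27/7 ~ 3.8571
```

The same values fall out in closed form: jumping is optimal in state 2, so V*(2) = 3/4·4 + 1/4·(γ·V*(2)) = 3 + V*(2)/8, giving V*(2) = 24/7. Then Q*(3, J) = 3/4·(4 + γ·V*(1)) + 1/4·γ·V*(3) with V*(1) = 1 and V*(3) = Q*(3, J) (jump beats walk's 1 + γ·24/7 = 19/7), which solves to 27/7.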
