Question:

Problem 1. Consider an MDP with two states S = {0, 1}, two actions A = {1, 2}, and the following reward function
R_s(a) = \begin{cases} 1, & (s,a) = (0,1) \\ 4, & (s,a) = (0,2) \\ 3, & (s,a) = (1,1) \\ 2, & (s,a) = (1,2) \end{cases}
and the transition probabilities P_{ss'}(a) as follows:

\begin{bmatrix} P_{00}(1) & P_{00}(2) \\ P_{10}(1) & P_{10}(2) \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{2}{3} \end{bmatrix}
The other probabilities can be deduced; for example,

P_{01}(1) = 1 - P_{00}(1) = 1 - \tfrac{1}{3} = \tfrac{2}{3}.
The discount factor is γ = 3/4.
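For concreteness, this MDP can be written down numerically. The sketch below is not part of the problem statement; it assumes Python with NumPy, and the array names P and R and the 0-based action indexing are my own choices.

```python
import numpy as np

# Transition matrices, one 2x2 matrix per action (actions indexed 0 and 1
# here, corresponding to actions 1 and 2 in the problem statement).
# P[a][s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[1/3, 2/3],    # action 1: row for state 0, row for state 1
     [1/4, 3/4]],
    [[1/2, 1/2],    # action 2
     [2/3, 1/3]],
])

# Rewards R[s, a]: rows are states 0 and 1, columns are actions 1 and 2.
R = np.array([
    [1.0, 4.0],
    [3.0, 2.0],
])

gamma = 3/4  # discount factor
```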
(a) For the policy π that chooses action 1 in state 0 and action 2 in state 1, find the state value function v_π(s) by writing out the Bellman expectation equation and solving it explicitly.
(b) For the same policy π, obtain the state value function using iterative updates based on the Bellman expectation equation. List the first 5 iteration values of v_π(s). (A numerical sketch of these updates is given after part (f).)
(c) For the policy π, calculate the action value function q_π(s, a).
(d) Based on the value function v_π(s), obtain an improved policy π' given by

\pi'(s) = \arg\max_a q_\pi(s, a).
(e) Obtain the optimal value function v_*(s) using value iteration based on the Bellman optimality equation, with all initial values set to 0.
(f) Obtain the optimal policy.
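The parts above can be cross-checked numerically. The following is a minimal sketch, continuing the NumPy encoding given earlier; the helper variables (pi, P_pi, r_pi, and so on) and the 0-based indexing are assumptions of this sketch, not notation from the problem.

```python
# Policy from part (a): action 1 in state 0, action 2 in state 1
# (0-based action indices).
pi = np.array([0, 1])

# (a) Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
P_pi = np.array([P[pi[s], s] for s in range(2)])   # 2x2 transition matrix under pi
r_pi = np.array([R[s, pi[s]] for s in range(2)])   # reward vector under pi
v_exact = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# (b) Iterative policy evaluation from v = 0: repeat v <- r_pi + gamma * P_pi v.
v = np.zeros(2)
for k in range(5):
    v = r_pi + gamma * P_pi @ v
    print(f"iteration {k + 1}: v = {v}")

# (c) Action values: q(s, a) = R(s, a) + gamma * sum_s' P(s, s'; a) v_pi(s').
q = np.array([[R[s, a] + gamma * P[a, s] @ v_exact for a in range(2)]
              for s in range(2)])

# (d) Greedy policy improvement: pi'(s) = argmax_a q(s, a).
pi_improved = q.argmax(axis=1)

# (e) Value iteration from v = 0:
#     v(s) <- max_a [ R(s, a) + gamma * sum_s' P(s, s'; a) v(s') ].
v_star = np.zeros(2)
for _ in range(200):
    v_star = np.max([[R[s, a] + gamma * P[a, s] @ v_star for a in range(2)]
                     for s in range(2)], axis=1)

# (f) Optimal policy: greedy with respect to v_star.
pi_star = np.argmax([[R[s, a] + gamma * P[a, s] @ v_star for a in range(2)]
                     for s in range(2)], axis=1)
print("v* =", v_star, "optimal policy (1-based actions):", pi_star + 1)
```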