Question: Consider the following Markov Decision Process (MDP) with \(\gamma = 0.5\) and three states A, B, C. The arcs represent state transitions; the lower-case labels ab, ba, bc, ca, cb represent actions; signed integers represent rewards; and fractions represent transition probabilities.
1. (1%) Write down the Bellman expectation equation for state-value functions (a reference form is given after this list).
2. (3%) Consider the uniform random policy \(\pi_1(s,a)\) that takes all actions from state s with equal probability. Starting with an initial value function \(V_1(A)=V_1(B)=V_1(C)=4\), apply one synchronous iteration of iterative policy evaluation (i.e., one backup for each state) to compute a new value function \(V_2(s)\) (i.e., \(V_2(A)\), \(V_2(B)\), and \(V_2(C)\)) by applying/expanding the equation given in part 1 (a code sketch of this sweep follows the list).
3. (1%) Write the Bellman equation that characterizes the optimal state-value function (i.e., \(V^*(s)\)); a reference form is given after this list.
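For part 1, a reference form of the Bellman expectation equation for state-value functions (the standard textbook formula, written with generic transition probabilities \(P(s'\mid s,a)\) and rewards \(R(s,a,s')\), since the diagram's specific values are not reproduced here) is:

\[
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s,a,s') + \gamma\, V^{\pi}(s') \,\bigr].
\]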
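For part 2, the following is a minimal Python sketch of one synchronous sweep of iterative policy evaluation under the uniform random policy. The transition table is hypothetical placeholder data: the actual (probability, next state, reward) triples must be read off the MDP diagram, which is not reproduced here.

GAMMA = 0.5  # discount factor given in the problem

# mdp[state][action] -> list of (probability, next_state, reward) triples.
# These numbers are HYPOTHETICAL placeholders: the real probabilities and
# rewards must be read off the diagram's arcs. An action whose arc splits
# with fractional probabilities would list several triples summing to 1.
mdp = {
    "A": {"ab": [(1.0, "B", 1)]},
    "B": {"ba": [(1.0, "A", -1)], "bc": [(1.0, "C", 2)]},
    "C": {"ca": [(1.0, "A", 3)], "cb": [(1.0, "B", 0)]},
}

def uniform_policy(state):
    """pi_1(s, a): every action available in `state` with equal probability."""
    actions = list(mdp[state])
    return {a: 1.0 / len(actions) for a in actions}

def one_sweep(v):
    """One synchronous backup: every state is updated from the OLD v."""
    v_new = {}
    for s in mdp:
        total = 0.0
        for a, pi_sa in uniform_policy(s).items():
            for prob, s_next, reward in mdp[s][a]:
                total += pi_sa * prob * (reward + GAMMA * v[s_next])
        v_new[s] = total
    return v_new

v1 = {"A": 4.0, "B": 4.0, "C": 4.0}  # initial value function from part 2
v2 = one_sweep(v1)
print(v2)  # V2(A), V2(B), V2(C) under the placeholder dynamics

Because v_new is built entirely from the old values in v, every state is backed up from \(V_1\); this is what makes the sweep synchronous, as opposed to an in-place (Gauss-Seidel style) update that reads freshly updated values immediately.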
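For part 3, the standard Bellman optimality equation for the state-value function (again in generic form, since the optimal values themselves depend on the diagram) is:

\[
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s,a,s') + \gamma\, V^{*}(s') \,\bigr].
\]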
