Question: Problem 1. Consider an MDP with two states S = {0, 1} and two actions A = {1, 2}, with the reward function and the transition probabilities as follows:
Since there are only two states, the remaining transition probabilities can be deduced; for example, p(1 | s, a) = 1 - p(0 | s, a).
The discount factor is γ.
(a) For the policy that chooses a fixed action in state 0 and a fixed action in state 1, find the state-value function by writing out the Bellman expectation equation and solving it explicitly (a generic form of these equations is sketched after this list).
(b) For the same policy, obtain the state-value function using iterative updates based on the Bellman expectation equation, and list the first few iteration values (see the iterative policy-evaluation sketch after this list).
(c) For the same policy, calculate the action-value function q_π (see the sketch after this list).
(d) Based on the value function, obtain an improved policy based on q_π.
(e) Obtain the optimal value function using value iteration based on the Bellman optimality equation, with all initial values set to 0 (see the value-iteration sketch after this list).
(f) Obtain the optimal policy.
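Since the numerical rewards, transition probabilities, and discount factor are not reproduced above, the following is only a generic sketch of the Bellman expectation equations asked for in part (a), written for a deterministic policy π on the two states; r(s, a) and p(s' | s, a) stand in for whatever values the problem specifies.

```latex
% Bellman expectation equations for a deterministic policy \pi in a
% two-state MDP (generic symbols; substitute the numbers given in the problem).
\begin{aligned}
v_\pi(0) &= r\bigl(0,\pi(0)\bigr)
  + \gamma\Bigl[p\bigl(0 \mid 0,\pi(0)\bigr)\,v_\pi(0) + p\bigl(1 \mid 0,\pi(0)\bigr)\,v_\pi(1)\Bigr],\\
v_\pi(1) &= r\bigl(1,\pi(1)\bigr)
  + \gamma\Bigl[p\bigl(0 \mid 1,\pi(1)\bigr)\,v_\pi(0) + p\bigl(1 \mid 1,\pi(1)\bigr)\,v_\pi(1)\Bigr].
\end{aligned}
```

These are two linear equations in the two unknowns v_π(0) and v_π(1), so plugging in the given numbers and solving by substitution (or as v_π = (I - γ P_π)^{-1} r_π) gives the explicit solution for part (a).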
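For part (b), a minimal iterative policy-evaluation sketch is shown below. The arrays P and R, the discount factor gamma, and the policy pi are placeholder values invented for illustration, since the actual numbers are not visible in the question; actions are indexed 0 and 1 here, corresponding to the problem's actions 1 and 2.

```python
import numpy as np

# Placeholder MDP -- substitute the rewards, transition probabilities and
# discount factor given in the original problem (these numbers are made up).
gamma = 0.9                                # discount factor (placeholder)
P = np.array([[[0.7, 0.3],                 # P[a, s, s']: action 0 from state 0
               [0.4, 0.6]],                # action 0 from state 1
              [[0.2, 0.8],                 # action 1 from state 0
               [0.5, 0.5]]])               # action 1 from state 1
R = np.array([[1.0, 0.0],                  # R[a, s]: reward for action 0 in states 0 and 1
              [0.0, 2.0]])                 # reward for action 1 in states 0 and 1
pi = np.array([0, 1])                      # deterministic policy: action taken in each state (placeholder)

# Iterative policy evaluation: repeatedly apply the Bellman expectation backup
#   v_{k+1}(s) = R[pi(s), s] + gamma * sum_{s'} P[pi(s), s, s'] * v_k(s')
v = np.zeros(2)
for k in range(1, 101):
    v_new = np.array([R[pi[s], s] + gamma * P[pi[s], s] @ v for s in range(2)])
    if k <= 5:                             # list the first few iteration values, as part (b) asks
        print(f"iteration {k}: v = {v_new}")
    if np.max(np.abs(v_new - v)) < 1e-10:  # stop once the values stop changing
        v = v_new
        break
    v = v_new

print("converged v_pi =", v)
```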
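For parts (c) and (d), the sketch below first solves the Bellman expectation equation explicitly as a 2x2 linear system (the same calculation part (a) asks for by hand), then forms the action-value function q_π and the greedy, improved policy. As before, gamma, P, R, and pi are made-up placeholders, not the problem's actual values.

```python
import numpy as np

# Placeholder MDP (made-up numbers; use the ones given in the problem).
gamma = 0.9
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])          # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])            # R[a, s]
pi = np.array([0, 1])                             # deterministic policy (placeholder)

# Explicit solution of (I - gamma * P_pi) v_pi = r_pi for the two states.
P_pi = np.array([P[pi[s], s] for s in range(2)])  # 2x2 transition matrix under pi
r_pi = np.array([R[pi[s], s] for s in range(2)])  # reward vector under pi
v_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Part (c): q_pi(s, a) = R[a, s] + gamma * sum_{s'} P[a, s, s'] * v_pi(s')
q_pi = np.array([[R[a, s] + gamma * P[a, s] @ v_pi for a in range(2)]
                 for s in range(2)])

# Part (d): the improved policy acts greedily with respect to q_pi.
pi_improved = q_pi.argmax(axis=1)

print("v_pi =", v_pi)
print("q_pi =\n", q_pi)
print("improved policy (action index per state):", pi_improved)
```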
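For parts (e) and (f), a value-iteration sketch based on the Bellman optimality backup follows, starting from all-zero values and then reading off the greedy policy. The same caveat applies: gamma, P, and R are placeholders, not the values from the problem.

```python
import numpy as np

# Placeholder MDP (made-up numbers; use the ones given in the problem).
gamma = 0.9
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])          # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])            # R[a, s]

# Part (e): value iteration with all initial values set to zero.
#   v_{k+1}(s) = max_a [ R[a, s] + gamma * sum_{s'} P[a, s, s'] * v_k(s') ]
v = np.zeros(2)
for k in range(1000):
    q = np.array([[R[a, s] + gamma * P[a, s] @ v for a in range(2)]
                  for s in range(2)])
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new

# Part (f): the optimal policy is greedy with respect to the optimal action values.
q_star = np.array([[R[a, s] + gamma * P[a, s] @ v for a in range(2)]
                   for s in range(2)])
pi_star = q_star.argmax(axis=1)

print("optimal values v* =", v)
print("optimal policy (action index per state):", pi_star)
```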
