Question: Consider an undiscounted MDP having three states, (1, 2, 3), with rewards 1, 2, and 0, respectively.
State 3 is a terminal state. In states 1 and 2 there are two possible actions: A and B. The transition model is as follows:
In state 1:
action A moves the agent to state with probability and chance do not move
action B moves the agent to state with probability and chance do not move
In state 2:
action A moves the agent to state with probability and chance do not move
action B moves the agent to state with probability and chance do not move
Let us apply policy iteration to determine the optimal policy and the values of states 1 and 2, working through each step.
We call the utility at state 1 u1, the utility at state 2 u2, and the utility at state 3 u3.
The whole process consists of several iterations, and each iteration includes three major steps:
initialization
value determination
policy update
Assume that the initial policy chooses action B in both states. Let us calculate the first iteration.
First, initialization is easy, because we have already assumed that the initial policy chooses action B in both states.
After initialization, we do value determination. We have a set of three linear equations in u1, u2, and u3.
Find u1, u2, and u3.
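As a hedged sketch of the value-determination step: for a fixed policy, each non-terminal utility satisfies u(s) = R(s) + sum over s' of P(s' | s, policy(s)) * u(s'), and the terminal utility equals its reward. The transition numbers below (action B reaches state 3 with probability 1/10 and otherwise stays put) are assumed placeholders for illustration only, so substitute the probabilities from the question. The helper names `solve_linear` and `evaluate_policy` are hypothetical.

```python
from fractions import Fraction as F


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def evaluate_policy(policy, transitions, rewards, terminal):
    """Build and solve the linear equations for a fixed policy."""
    states = sorted(rewards)
    a, b = [], []
    for s in states:
        # Row of (I - P): start from the identity row, subtract transition probabilities.
        row = [F(1) if t == s else F(0) for t in states]
        if s != terminal:
            for t, p in transitions[(s, policy[s])].items():
                row[states.index(t)] -= p
        a.append(row)
        b.append(rewards[s])
    return dict(zip(states, solve_linear(a, b)))


# Rewards as stated in the question; state 3 is assumed to be the terminal state.
REWARDS = {1: F(1), 2: F(2), 3: F(0)}
# ASSUMED placeholder probabilities, not taken from the question.
TRANSITIONS = {
    (1, "B"): {3: F(1, 10), 1: F(9, 10)},
    (2, "B"): {3: F(1, 10), 2: F(9, 10)},
}

u = evaluate_policy({1: "B", 2: "B"}, TRANSITIONS, REWARDS, terminal=3)
print(u)
```

Exact fractions are used instead of floats so that the solved utilities can be compared without rounding error.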
Which action is preferred for state 1 at this iteration?
Which action is preferred for state 2 at this iteration?
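The policy-update step can be sketched as a one-step lookahead: because the reward depends only on the state, the preferred action in each state is the one with the largest expected utility of the next state. Everything numeric below is an assumption for illustration: the transition model is a placeholder (action A swaps between states 1 and 2 with probability 4/5, action B reaches state 3 with probability 1/10), and the `utilities` values are hypothetical inputs, not the answer to the question. `expected_utility` and `improve_policy` are hypothetical names.

```python
from fractions import Fraction as F

# ASSUMED placeholder model: TRANSITIONS[(state, action)] = {next_state: probability}
TRANSITIONS = {
    (1, "A"): {2: F(4, 5), 1: F(1, 5)},
    (1, "B"): {3: F(1, 10), 1: F(9, 10)},
    (2, "A"): {1: F(4, 5), 2: F(1, 5)},
    (2, "B"): {3: F(1, 10), 2: F(9, 10)},
}


def expected_utility(state, action, utilities):
    """Expected utility of the successor state when taking `action` in `state`."""
    return sum(p * utilities[t] for t, p in TRANSITIONS[(state, action)].items())


def improve_policy(utilities, states=(1, 2), actions=("A", "B")):
    """Pick, for each non-terminal state, the action with the highest expected utility."""
    return {
        s: max(actions, key=lambda a: expected_utility(s, a, utilities))
        for s in states
    }


# Hypothetical utilities from a previous value-determination step.
utilities = {1: F(10), 2: F(20), 3: F(0)}
preferred = improve_policy(utilities)
print(preferred)
```

With these placeholder numbers, state 1 compares 18 (action A) against 9 (action B), and state 2 compares 12 against 18; different probabilities or rewards can change the preference.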
Now we start the second iteration. Using the preferred actions found in the previous iteration as the new policy, we initialize again. Then, in the value determination of this second iteration, the set of equations is updated accordingly. Solve them again and find u1, u2, and u3.
Which action is preferred for state 1 at this iteration?
Which action is preferred for state 2 at this iteration?
Now, what happens to policy iteration if we let the initial policy choose action A in both states?
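One way to explore this question is to check whether the value-determination equations still have a unique solution. The sketch below assumes, purely for illustration, that action A only moves the agent between states 1 and 2 (probability 4/5 to switch, 1/5 to stay), so that a policy choosing A everywhere never reaches the terminal state. Under that assumed model the coefficient matrix of the equations for u1 and u2 is singular.

```python
from fractions import Fraction as F

# ASSUMED placeholder model for action A: P_A[state] = {next_state: probability}
P_A = {
    1: {1: F(1, 5), 2: F(4, 5)},
    2: {1: F(4, 5), 2: F(1, 5)},
}
rewards = {1: F(1), 2: F(2)}  # rewards of states 1 and 2 as stated in the question

# Coefficient matrix (I - P) of the equations u(s) = R(s) + sum P(s'|s) u(s').
m = [
    [(F(1) if s == t else F(0)) - P_A[s].get(t, F(0)) for t in (1, 2)]
    for s in (1, 2)
]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]

# For this symmetric model, adding the two equations gives 0 = R(1) + R(2),
# so a solution exists only if the rewards cancel.
has_solution = det != 0 or rewards[1] + rewards[2] == 0
print(det, has_solution)
```

A zero determinant means value determination cannot assign finite utilities under this policy, which is the situation to look for when answering the question with the actual transition probabilities.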
