Question:

Consider an undiscounted MDP having three states, (1, 2, 3), with rewards 1, 2, and 0, respectively. State 3 is a terminal state. In states 1 and 2 there are two possible actions: A and B. The transition model is as follows:
In state 1,
- action A moves the agent to state 2 with probability 0.8, and with probability 0.2 the agent does not move;
- action B moves the agent to state 3 with probability 0.1, and with probability 0.9 the agent does not move.
In state 2,
- action A moves the agent to state 1 with probability 0.8, and with probability 0.2 the agent does not move;
- action B moves the agent to state 3 with probability 0.1, and with probability 0.9 the agent does not move.
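One way to encode this transition model for computation is sketched below in Python. The names `R`, `P`, and `TERMINAL` are our own choices, not part of the question. `P[s][a]` maps each successor state to its probability, and exact fractions are used so that the numbers match the question without rounding noise.

```python
from fractions import Fraction as F

# Rewards R(s) as given in the question; state 3 is terminal (no actions).
R = {1: F(1), 2: F(2), 3: F(0)}
TERMINAL = {3}

# P[s][a] maps successor state -> transition probability.
P = {
    1: {
        "A": {2: F(8, 10), 1: F(2, 10)},  # to state 2 w.p. 0.8, stay w.p. 0.2
        "B": {3: F(1, 10), 1: F(9, 10)},  # to state 3 w.p. 0.1, stay w.p. 0.9
    },
    2: {
        "A": {1: F(8, 10), 2: F(2, 10)},  # to state 1 w.p. 0.8, stay w.p. 0.2
        "B": {3: F(1, 10), 2: F(9, 10)},  # to state 3 w.p. 0.1, stay w.p. 0.9
    },
}

# Sanity check: every (state, action) distribution sums to 1.
for s, actions in P.items():
    for a, dist in actions.items():
        assert sum(dist.values()) == 1, (s, a)
print("transition model OK")
```

Note that action A never leads to state 3 from either state; only action B can end an episode.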
Let us apply policy iteration. We determine the optimal policy and compute the values of states 1 and 2 at each step.
We denote the utility of state 1 by u1, the utility of state 2 by u2, and the utility of state 3 by u3.
The whole process consists of several iterations, and each iteration includes three major steps:
1. Initialization
2. Value determination
3. Policy update
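For reference, the two computational steps can be written in the usual undiscounted form. The notation here is ours, not the question's: U_i(s) is the utility of state s under the policy π_i of iteration i (so U_i(1) = u1, U_i(2) = u2, U_i(3) = u3), and the terminal state's utility is taken to be its reward.

```latex
\begin{aligned}
&\text{Value determination:} && U_i(s) = R(s) + \sum_{s'} P(s' \mid s, \pi_i(s))\, U_i(s'), \quad s \in \{1, 2\}, \qquad U_i(3) = R(3) = 0,\\
&\text{Policy update:} && \pi_{i+1}(s) = \operatorname*{arg\,max}_{a \in \{A, B\}} \sum_{s'} P(s' \mid s, a)\, U_i(s').
\end{aligned}
```

R(s) is left out of the update because it does not depend on the action. A common convention, assumed here, is to keep the current action when two actions tie; iteration stops once the update leaves the policy unchanged.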
Assume that the initial policy chooses action B in both states. Let us calculate the first iteration.
Initialization is immediate, because the initial policy is already given: action B in both states.
After initialization, we perform value determination: under the current policy, we have a set of three linear equations in u1, u2, and u3.
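A minimal Python sketch of value determination follows, assuming the rewards and transitions exactly as stated and the convention that the terminal state's utility equals its reward (u3 = R(3) = 0). The names `R`, `P`, and `value_determination` are our own. Since u3 is fixed, only the two equations for u1 and u2 remain; the sketch moves every unknown to the left-hand side and solves the resulting 2x2 system exactly with Cramer's rule.

```python
from fractions import Fraction as F

# Model as stated in the question (exact fractions avoid rounding noise).
R = {1: F(1), 2: F(2), 3: F(0)}
P = {
    1: {"A": {2: F(8, 10), 1: F(2, 10)}, "B": {3: F(1, 10), 1: F(9, 10)}},
    2: {"A": {1: F(8, 10), 2: F(2, 10)}, "B": {3: F(1, 10), 2: F(9, 10)}},
}


def value_determination(policy):
    """Solve u_s = R(s) + sum_{s'} P(s'|s, policy[s]) * u_{s'} for s in {1, 2}.

    State 3 is terminal, so u3 = R(3) is known and goes to the right-hand side.
    The remaining 2x2 system M u = b is solved with Cramer's rule.
    Raises ValueError if M is singular (no unique solution).
    """
    u3 = R[3]
    M = [[F(1), F(0)], [F(0), F(1)]]  # identity: the u_s on the left-hand side
    b = [R[1], R[2]]
    for i, s in enumerate((1, 2)):
        for s2, p in P[s][policy[s]].items():
            if s2 == 3:
                b[i] += p * u3        # terminal successor: known utility
            else:
                M[i][s2 - 1] -= p     # move p * u_{s2} to the left-hand side
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("singular system: no unique utilities for this policy")
    u1 = (b[0] * M[1][1] - M[0][1] * b[1]) / det
    u2 = (M[0][0] * b[1] - b[0] * M[1][0]) / det
    return {1: u1, 2: u2, 3: u3}


u = value_determination({1: "B", 2: "B"})  # initial policy: B in both states
print(f"u1 = {u[1]}, u2 = {u[2]}, u3 = {u[3]}")
```

Calling `value_determination` with the initial policy evaluates the first iteration; working with exact fractions keeps the later comparison between actions free of floating-point ties.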
1. Find u1, u2, and u3.
2. Which action is preferred for state 1 at this iteration?
3. Which action is preferred for state 2 at this iteration?
4. Now we start the second iteration. Based on the preferred actions found in the previous iteration, we initialize the policy again. Then, in the value determination of this second iteration, the set of equations is updated accordingly. Solve them again, and find u1, u2, and u3.
5. Which action is preferred for state 1 at this iteration?
6. Which action is preferred for state 2 at this iteration?
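The two iterations asked about above can be run end to end with the Python sketch below. It assumes the rewards and transitions exactly as stated, u3 = R(3) = 0 for the terminal state, and the tie-breaking convention of keeping the current action; the helper names (`value_determination`, `expected_utility`, `policy_update`) are our own.

```python
from fractions import Fraction as F

# Model as stated in the question.
R = {1: F(1), 2: F(2), 3: F(0)}
P = {
    1: {"A": {2: F(8, 10), 1: F(2, 10)}, "B": {3: F(1, 10), 1: F(9, 10)}},
    2: {"A": {1: F(8, 10), 2: F(2, 10)}, "B": {3: F(1, 10), 2: F(9, 10)}},
}


def value_determination(policy):
    """Solve the 2x2 system for u1, u2 (u3 = R(3), terminal) by Cramer's rule."""
    u3 = R[3]
    M = [[F(1), F(0)], [F(0), F(1)]]
    b = [R[1], R[2]]
    for i, s in enumerate((1, 2)):
        for s2, p in P[s][policy[s]].items():
            if s2 == 3:
                b[i] += p * u3
            else:
                M[i][s2 - 1] -= p
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("singular system: no unique utilities for this policy")
    return {1: (b[0] * M[1][1] - M[0][1] * b[1]) / det,
            2: (M[0][0] * b[1] - b[0] * M[1][0]) / det,
            3: u3}


def expected_utility(s, a, u):
    """sum_{s'} P(s'|s,a) * u[s']; R(s) is the same for both actions, so it is left out."""
    return sum(p * u[s2] for s2, p in P[s][a].items())


def policy_update(policy, u):
    """Switch to the best action in each state; on a tie, keep the current action."""
    new_policy = dict(policy)
    for s in (1, 2):
        best = max("AB", key=lambda a: expected_utility(s, a, u))
        if expected_utility(s, best, u) > expected_utility(s, policy[s], u):
            new_policy[s] = best
    return new_policy


policy = {1: "B", 2: "B"}  # initial policy from the question
history = []
for iteration in (1, 2):
    u = value_determination(policy)
    policy = policy_update(policy, u)
    history.append((u, policy))
    print(f"iteration {iteration}: u1 = {u[1]}, u2 = {u[2]}, preferred: {policy}")
```

With the rewards taken as stated (1, 2, 0), the first iteration gives u1 = 10, u2 = 20, u3 = 0, with A preferred in state 1 and B still preferred in state 2. The second iteration gives u1 = 85/4 = 21.25, u2 = 20, after which A becomes preferred in state 2 as well, so the next value determination would face the (A, A) policy from the final question.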
Now, what happens to policy iteration if the initial policy chooses action A in both states?
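Assuming the rewards and transition model exactly as stated, the value-determination equations for the policy (A, A) can be written out directly. Action A never moves the agent to state 3, so u3 does not appear:

```latex
\begin{aligned}
u_1 &= 1 + 0.2\,u_1 + 0.8\,u_2 &&\Longrightarrow\quad 0.8\,u_1 - 0.8\,u_2 = 1,\\
u_2 &= 2 + 0.8\,u_1 + 0.2\,u_2 &&\Longrightarrow\quad -0.8\,u_1 + 0.8\,u_2 = 2.
\end{aligned}
```

Adding the two rearranged equations gives 0 = 3, so the system has no solution and value determination fails for this initial policy. Intuitively, under (A, A) the agent never reaches the terminal state and collects a reward of at least 1 on every step, so its undiscounted utility grows without bound.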
