Let us apply policy iteration. We determine the optimal policy and the values of states 1 and 2 at each step.
We call the utility at state 1 u1, the utility at state 2 u2, and the utility at state 3 u3.
The whole process consists of several iterations, and each iteration includes three major steps:
1. initialization
2. value determination
3. policy update.
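The third step, policy update, can be sketched as follows. This is only an illustration: the transition model, the action names other than a and b, and the utilities passed in are hypothetical placeholders, since the problem's actual probabilities and rewards are not reproduced here. The sketch assumes the reward depends only on the state, so picking the action with the highest expected next-state utility is enough.

```python
from fractions import Fraction

def update_policy(u, transitions, states, actions):
    """For each listed state, pick the action with the highest expected utility.

    u           -- dict: state -> current utility estimate
    transitions -- dict: (state, action) -> {next_state: probability}
    """
    policy = {}
    for s in states:
        policy[s] = max(
            actions,
            key=lambda a: sum(p * u[t] for t, p in transitions[(s, a)].items()),
        )
    return policy

# HYPOTHETICAL model, not the problem's data:
# action a moves to the other non-terminal state w.p. 3/4, stays w.p. 1/4;
# action b moves to state 3 w.p. 1/4, stays w.p. 3/4.
q, t = Fraction(1, 4), Fraction(3, 4)
TRANSITIONS = {
    (1, "a"): {2: t, 1: q},
    (2, "a"): {1: t, 2: q},
    (1, "b"): {3: q, 1: t},
    (2, "b"): {3: q, 2: t},
}

# HYPOTHETICAL utilities, as if produced by a previous value-determination step.
U = {1: Fraction(-4), 2: Fraction(-8), 3: Fraction(0)}

new_policy = update_policy(U, TRANSITIONS, states=[1, 2], actions=["a", "b"])
print(new_policy)
```

With these placeholder numbers the updated policy keeps b in state 1 and switches to a in state 2; the real problem's data may give a different result.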
Assume that the initial policy chooses action b in both states. Let us calculate the first iteration.
First, initialization is easy, because the initial policy is already given: choose action b in both states.
After initialization, we do value determination. With the policy fixed, the utilities satisfy a set of three linear equations in u1, u2, and u3. After solving these equations, we have:
u1 = __ (with error margin 0)
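As a rough sketch of how value determination works in general: for a fixed policy, the utilities satisfy u = r + gamma * P u, which rearranges to the linear system (I - gamma * P) u = r. The transition probabilities, rewards, and discount factor of this problem are not reproduced here, so the numbers below are hypothetical placeholders (state 3 is assumed terminal, gamma is assumed to be 1), and the values they produce are not the answer to the blank above.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on an augmented matrix."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def evaluate_policy(p, r, gamma=1):
    """Value determination: solve (I - gamma * P) u = r for a fixed policy."""
    n = len(r)
    a = [
        [(1 if i == j else 0) - Fraction(gamma) * Fraction(p[i][j]) for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(a, r)

# HYPOTHETICAL data, not taken from the problem statement:
# under action b, states 1 and 2 stay put w.p. 3/4 and reach state 3 w.p. 1/4;
# state 3 is terminal, so its row has no outgoing transitions.
P_B = [
    [Fraction(3, 4), 0, Fraction(1, 4)],
    [0, Fraction(3, 4), Fraction(1, 4)],
    [0, 0, 0],
]
R = [-1, -2, 0]  # hypothetical per-state rewards

u1, u2, u3 = evaluate_policy(P_B, R)
print(u1, u2, u3)  # -4 -8 0 for these placeholder numbers
```

Exact fractions are used instead of floats because the question asks for an answer with error margin 0. Substituting the problem's own probabilities and rewards for P_B and R would give the requested u1.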
