Question: Let us apply policy iteration. We determine the optimal policy and the values of states 1 and 2 at each step. We call the utility at state 1 u1, the utility at state 2 u2, and the utility at state 3 u3.
The whole process consists of several iterations, and each iteration includes three major steps:
initialization
value determination
policy update.
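The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the transition probabilities, the rewards, the terminal state 3, and the lack of discounting are hypothetical placeholder values, not numbers taken from the question, and all function names are made up for this sketch. Value determination is approximated here by repeated sweeps rather than by solving the linear equations exactly.

```python
# Hypothetical 3-state MDP. Every number below is an illustrative assumption,
# not data from the question. State 3 is assumed terminal, with no discounting.
STATES = [1, 2, 3]
ACTIONS = ["a", "b"]
REWARD = {1: -1.0, 2: -2.0, 3: 0.0}

# TRANSITIONS[(state, action)] maps each next state to its probability.
TRANSITIONS = {
    (1, "a"): {2: 0.8, 1: 0.2},
    (1, "b"): {3: 0.1, 1: 0.9},
    (2, "a"): {1: 0.8, 2: 0.2},
    (2, "b"): {3: 0.1, 2: 0.9},
}


def expected_utility(state, action, u):
    """Expected utility of the successor state when taking `action` in `state`."""
    return sum(p * u[nxt] for nxt, p in TRANSITIONS[(state, action)].items())


def value_determination(policy, tol=1e-12, max_sweeps=100_000):
    """Approximate the utilities of a fixed policy by repeated in-place sweeps."""
    u = {s: 0.0 for s in STATES}  # the terminal state keeps utility 0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in policy:  # only non-terminal states have an action
            new = REWARD[s] + expected_utility(s, policy[s], u)
            delta = max(delta, abs(new - u[s]))
            u[s] = new
        if delta < tol:
            break
    return u


def policy_update(u, policy):
    """Switch to a different action only if it is strictly better."""
    new_policy = {}
    for s in policy:
        best = policy[s]  # keep the current action on ties
        for a in ACTIONS:
            if expected_utility(s, a, u) > expected_utility(s, best, u) + 1e-9:
                best = a
        new_policy[s] = best
    return new_policy


def policy_iteration(initial_policy):
    """Alternate value determination and policy update until the policy is stable."""
    policy = dict(initial_policy)  # initialization
    while True:
        u = value_determination(policy)
        improved = policy_update(u, policy)
        if improved == policy:
            return policy, u
        policy = improved


# Initial policy: action b in both states, as assumed in the text.
policy, utilities = policy_iteration({1: "b", 2: "b"})
print(policy, utilities)
```

The loop stops once a policy update leaves every action unchanged. The `max_sweeps` cap matters because, without discounting, a policy that never reaches the terminal state would make the sweeps diverge.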
Assume that the initial policy chooses action b in both states. Let us calculate the first iteration.
First, initialization is easy, because we have already assumed that the initial policy chooses action b in both states.
After initialization, we do value determination. We have a set of three linear equations in u1, u2, and u3. After solving these equations, we have:
uwith error margin
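For reference, value determination for a fixed policy is commonly written as one linear equation per state. The form below is a generic sketch, not the question's own equations: the reward function R, the transition model P, the policy symbol π, and the discount factor γ are notation assumed here (γ = 1 would correspond to the undiscounted case).

```latex
u_i = R(i) + \gamma \sum_{j} P\bigl(j \mid i, \pi(i)\bigr)\, u_j , \qquad i = 1, 2, 3
```

With the policy fixed, π(i) is a known action in every state, so the three equations are linear in u1, u2, and u3 and can be solved directly.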
