Question:
We follow the steps of the Policy Iteration algorithm as explained in class.
1. Write down the Bellman equation.
2. The initial policy is $\pi(A)$ and $\pi(B)$. That means that the same action is taken in state A and in state B. Calculate the values $V^{\pi}(A)$ and $V^{\pi}(B)$ from two iterations of policy evaluation (Bellman equation) after initializing both $V^{\pi}(A)$ and $V^{\pi}(B)$ to
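
For orientation, the policy evaluation step asked for here is commonly written as the backup $V_{k+1}^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s))\,[R(s, \pi(s), s') + \gamma V_k^{\pi}(s')]$. The sketch below runs two synchronous sweeps of that backup. The question as posted does not include the transition model, rewards, discount factor, or initial value, so everything in the code is a hypothetical placeholder: the action name `a1`, the probabilities, the rewards, `GAMMA = 0.9`, and the starting value 0.0. The reward form $R(s, a, s')$ and the synchronous update are also assumptions; the class may define them differently.

```python
# Hypothetical two-state MDP. Every number and the action name "a1" are
# placeholders for illustration; none of them come from the question.
GAMMA = 0.9  # assumed discount factor

# P[state][action] -> list of (probability, next_state, reward)
P = {
    "A": {"a1": [(0.5, "A", 1.0), (0.5, "B", 0.0)]},
    "B": {"a1": [(1.0, "A", 2.0)]},
}

# The policy picks the same action in both states, as in the question.
policy = {"A": "a1", "B": "a1"}


def evaluate_policy(policy, P, gamma, v_init, sweeps):
    """Run a fixed number of synchronous Bellman expectation backups."""
    v = dict(v_init)
    for _ in range(sweeps):
        # Build the new value table from the old one (synchronous update).
        v = {
            s: sum(p * (r + gamma * v[s_next])
                   for p, s_next, r in P[s][policy[s]])
            for s in P
        }
    return v


# Assumed starting values of 0.0 for both states.
values = evaluate_policy(policy, P, GAMMA, {"A": 0.0, "B": 0.0}, sweeps=2)
print(values)
```

With the real transition probabilities, rewards, and discount factor from the class problem substituted into `P` and `GAMMA`, the same two sweeps would give the requested $V^{\pi}(A)$ and $V^{\pi}(B)$.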
