Question: Q 5 Value Iteration Convergence We will consider a simple MDP that has six states, A , B , C , D , E ,
Q Value Iteration Convergence
We will consider a simple MDP that has six states, A B C D E and F Each state has a single action, go An arrow from a state x to a state y indicates that it is possible to transition from state x to next state y when g o is taken. If there are multiple arrows leaving a state x transitioning to each of the next states is equally likely. The state F has no outgoing arrows: once you arrive in F you stay in F for all future times. The reward is one for all transitions, with one exception: staying in F gets a reward of zero. Assume a discount factor We assume that we initialize the value of each state to Note: you should not need to explicitly run value iteration to solve this problem.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
