Question: Q 5 Va 1 ue Iteration Convergence We will consider a simple MDP that has six states, A , B , C , D ,
Q Vaue Iteration Convergence
We will consider a simple MDP that has six states, A B C D E and F Each state has a
single action, go An arrow from a state x to a state y indicates that it is possible to
transition from state x to next state y when is taken. If there are multiple arrows
leaving a state transitioning to each of the next states is equally ikey The state
F has no outgoing arrows: once you arrive in F you stay in F for all future times. The
reward is one for all transitions, with one exception: staying in F gets a reward of zero.
Assume a discount factor We assume that we initialize the value of each state to
Note: you shoud not need to explicitly run value iteration to solve this problem.
Q
After how many iterations of value iteration will the value for state E have become exactly
equal to the true optimum? Enter inf if the values will never become equal to the true
optimal but only converge to the true optimal.
Q
How many iterations of value iteration will it take for the values of all states to converge
to the true optimal values? Enter inf if the values will never become equal to the true
optimal but only converge to the true optima
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
