Question: I'm really struggling with value iteration and the discount factor on these problems. Please help me solve these with steps so that I can learn how to

Consider Pacman, who uses MDPs to maximize his expected utility. In each environment:
- Pacman has the standard actions (North, East, South, West) unless blocked by an outer wall.
- There is a reward of 1 point when eating the dot (for example, in the grid below, R(A, South, B) = 1).
- The game ends when the dot (blue circle) is eaten.

(a) Consider the following grid, where there is a single food pellet in the bottom right corner (B). The discount factor is 0.2. There is no living reward. The states are simply the grid locations.
a) What is the optimal policy for each state?
b) What is the optimal value for the state of being in the upper left corner (E)? Reminder: the discount factor is 0.2.
c) Using value iteration with the value of all states equal to zero at k = 0, for which iteration k will V_k(F) = V*(F)? Explain.

Consider the following grid world MDP for the rest of this question. Shaded cells represent walls. In all states, the agent has available actions ↑, ↓, ←, →. Performing an action that would transition to an invalid state (outside the grid or into a wall) results in the agent remaining in its original state. In states with an arrow coming out, the agent has an additional action EXIT. If the EXIT action is taken, the agent receives the labeled reward and ends the game in the terminal state T. Unless otherwise stated, all other transitions receive no reward, and all transitions are deterministic. For all parts of the problem, assume that value iteration begins with all states initialized to zero, i.e., V_0(s) = 0 for all s. Let the discount factor be γ = 0.25 for all following parts.
a) Suppose that we are performing value iteration on the grid world MDP below. What are the optimal values V*(A) and V*(B)? Explain your answer (show how you compute them).
b) After how many iterations k will we have V_k(s) = V*(s) for all states s? If it never occurs, write "never". Show your computation.
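Since the grid pictures did not come through with the text, here is a rough sketch of value iteration on a made-up 2x3 grid (an assumption, not the grid from the question) with the dot in the bottom right cell and a discount factor of 0.2. With deterministic moves, the update is V_{k+1}(s) = max over actions of [R(s, a, s') + γ V_k(s')]. The names `legal_moves`, `q_value`, `value_iteration`, `first_exact_iteration`, and `greedy_policy` are hypothetical helpers written for this illustration. The sketch models only the first setup (moves blocked by outer walls, reward on eating the dot); it does not model the EXIT action or interior walls from the second grid world.

```python
# Hypothetical 2x3 grid, NOT the grid from the question.
# Cells are (row, col); the dot is in the bottom right cell.
ROWS, COLS = 2, 3
DOT = (1, 2)
GAMMA = 0.2  # discount factor from part (a) of the question
ACTIONS = {"North": (-1, 0), "South": (1, 0), "East": (0, 1), "West": (0, -1)}


def legal_moves(state):
    """Return (action, next_state) pairs that stay inside the outer walls."""
    r, c = state
    moves = []
    for name, (dr, dc) in ACTIONS.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            moves.append((name, (nr, nc)))
    return moves


def q_value(values, nxt):
    """Reward of 1 for stepping onto the dot, plus the discounted next value."""
    reward = 1.0 if nxt == DOT else 0.0
    return reward + GAMMA * values[nxt]


def value_iteration(max_iters=50):
    """Return [V_0, V_1, ...], stopping once an iteration changes nothing."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    history = [{s: 0.0 for s in states}]  # V_0(s) = 0 for all s
    for _ in range(max_iters):
        prev = history[-1]
        new = {}
        for s in states:
            if s == DOT:
                new[s] = 0.0  # game is over once the dot is eaten
            else:
                new[s] = max(q_value(prev, nxt) for _, nxt in legal_moves(s))
        history.append(new)
        if new == prev:
            break
    return history


def first_exact_iteration(history, state, tol=1e-12):
    """Smallest k with V_k(state) equal to the converged value."""
    target = history[-1][state]
    for k, values in enumerate(history):
        if abs(values[state] - target) < tol:
            return k


def greedy_policy(values):
    """Pick, in each non-terminal state, an action with the highest Q-value."""
    policy = {}
    for s in values:
        if s != DOT:
            best = max(legal_moves(s), key=lambda m: q_value(values, m[1]))
            policy[s] = best[0]
    return policy


history = value_iteration()
for k, values in enumerate(history):
    print(k, {s: round(v, 4) for s, v in values.items()})
```

In this toy grid the top left cell is three moves from the dot, so its value is 0.2^2 = 0.04, and `first_exact_iteration(history, (0, 0))` gives 3. The same pattern applies to any grid of this kind: a state d moves from the dot has optimal value γ^(d-1), and value iteration first gets it right at iteration k = d.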
