Q4. Construct an MDP to find an optimal policy for the following problem. The agent operates in a discrete, two-dimensional space where each state is an element of $I^2$ for $I = \{1, \dots, 10\} \subset \mathbb{N}$, the natural numbers. The agent can move one step in any one of four directions $\{N, S, E, W\}$. However, the dynamics of the environment are such that if an agent moves in any direction, it has a probability of 0.2 of slipping back one step in the opposite direction, 0.1 of slipping to its left, and 0.1 of slipping to its right.
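For concreteness, here is a minimal Python sketch of the transition function these dynamics describe. It assumes that action N increases the row index and that an agent which would step off the grid stays in place; the question fixes neither convention, and all names are illustrative.

```python
# Slip dynamics: 0.6 intended, 0.2 opposite, 0.1 left, 0.1 right.
I = range(1, 11)  # I = {1, ..., 10}

MOVES = {"N": (1, 0), "S": (-1, 0), "E": (0, 1), "W": (0, -1)}
LEFT = {"N": "W", "W": "S", "S": "E", "E": "N"}  # 90 degrees left of heading
RIGHT = {v: k for k, v in LEFT.items()}          # 90 degrees right of heading
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def transition(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    outcomes = [
        (action, 0.6),            # intended direction: 1 - 0.2 - 0.1 - 0.1
        (OPPOSITE[action], 0.2),  # slip back one step
        (LEFT[action], 0.1),      # slip to the agent's left
        (RIGHT[action], 0.1),     # slip to the agent's right
    ]
    probs = {}
    for direction, p in outcomes:
        dr, dc = MOVES[direction]
        r, c = state[0] + dr, state[1] + dc
        nxt = (r, c) if r in I and c in I else state  # assumed: blocked at walls
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs
```

For example, `transition((1, 1), "N")` gives probability 0.6 for $(2, 1)$, 0.3 for staying at $(1, 1)$, and 0.1 for $(1, 2)$ (up to floating-point rounding), since both the backward and leftward slips would leave the grid.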
(a) Write down the (i) State, (ii) Action and (iii) Transition function for the MDP and explain
each using examples. Include at least one figure.
(b) The agent can start anywhere on the first row of states (where the row is the first index or coordinate of the state). There is a negative reward associated with every state except for states in the last row. Write descriptions of the MDP's initial state and reward functions more formally (mathematically) in terms of the state space $I^2$.
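A sketch of what such a formalisation might look like, assuming a uniform distribution over the first row and a constant per-step penalty $c$ (the question specifies neither the start distribution nor the penalty's magnitude):

```latex
P\big(S_0 = (1, j)\big) = \tfrac{1}{10} \quad \text{for } j \in I,
\qquad
R\big((i, j)\big) =
\begin{cases}
0 & \text{if } i = 10,\\
c & \text{if } i < 10,
\end{cases}
\quad c < 0.
```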
(c) The discount factor $\gamma$ is a real number such that $0 \le \gamma < 1$. Prove that we can expect $U(s_0)$,
$$U(s_0) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \dots$$
i.e., the sum of discounted rewards starting in state $s_0$, to converge, and show an expression for what it will converge to.
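As a pointer to the shape of the argument, the standard geometric-series bound applies, assuming rewards are bounded in magnitude by some $R_{\max}$:

```latex
|U(s_0)|
= \Big| \sum_{t=0}^{\infty} \gamma^{t} R(s_t) \Big|
\le \sum_{t=0}^{\infty} \gamma^{t} R_{\max}
= \frac{R_{\max}}{1 - \gamma},
\qquad R_{\max} = \max_{s} |R(s)|.
```

Because $0 \le \gamma < 1$, the series converges absolutely; if every state returned the same reward $R$, the sum would converge to exactly $R / (1 - \gamma)$.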