Question:

Consider the simple MDP shown below. Starting from state s1, the agent can move to the right (a0) or left (a1) from any state si. Actions are deterministic (e.g. choosing a1 at state s2 results in a transition to state s1). Taking any action from the goal state G earns a reward of r = +1 and the agent stays in state G. Otherwise, each move has zero reward (r = 0). Assume a discount factor \gamma < 1.
[Figure: a chain of states s1, s2, ... with the goal state G at the right end; every transition is labeled r = 0 except transitions out of G, which are labeled r = +1.]
(a) What is the optimal action at any state si ≠ G? Find the optimal value function for all states si and the goal state G. [5 pts]
(b) Does the optimal policy depend on the value of the discount factor \gamma? Explain your answer. [5 pts]
(c) Consider adding a constant c to all rewards. Find the new optimal value function for all states
si and the goal state G. Does adding a constant reward c change the optimal policy? Explain
your answer. [5 pts]
(d) After adding a constant c to all rewards, now consider scaling all the rewards by a constant a (i.e. rnew = a(c + rold)). Find the new optimal value function for all states si and the goal state G. Does that change the optimal policy? Explain your answer; if yes, give an example of a and c that changes the optimal policy. [5 pts]
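As a minimal sketch for checking answers to parts (a)-(d) numerically, the value-iteration code below (in Python) models the chain MDP described above. The chain length n = 4, the discount gamma = 0.9, and the reward-transform parameters a and c are illustrative assumptions, not values given in the problem; the behavior of "left" at the leftmost state s1 (staying put) is also an assumption about the chain's left edge.

    def value_iteration(n=4, gamma=0.9, a=1.0, c=0.0, tol=1e-10):
        """Compute V* for the chain MDP under the transform rnew = a * (c + rold).

        States 0..n-1 are s1..sn; state n is the goal G. "Right" moves one
        state toward G, "left" moves one state away (s1 stays put when
        moving left -- an assumption). Any action taken in G earns base
        reward +1 and stays in G; every other move earns base reward 0.
        """
        goal = n
        V = [0.0] * (n + 1)
        while True:
            V_new = [0.0] * (n + 1)
            for s in range(n + 1):
                if s == goal:
                    # Both actions loop at G with base reward +1.
                    V_new[s] = a * (c + 1.0) + gamma * V[goal]
                else:
                    right = a * (c + 0.0) + gamma * V[s + 1]
                    left = a * (c + 0.0) + gamma * V[max(s - 1, 0)]
                    V_new[s] = max(right, left)
            if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
                return V_new
            V = V_new

    if __name__ == "__main__":
        # With a = 1 and c = 0, a state d steps from G should approach
        # gamma**d / (1 - gamma), and G itself 1 / (1 - gamma).
        print(value_iteration())

Rerunning with different a and c (e.g. a negative a) is one way to probe whether the transforms in parts (c) and (d) can change which action is optimal.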
