Value Function
1 point possible (graded)
As above, we are working with the 3×3 grid example with a +1 reward at the top-right corner and a -1 reward at the
cell below it. The agent also receives a reward of -10 for every action that it takes. The action outcomes are
deterministic. The agent continues to act until it reaches the +1 cell, at which point it stops.
The following figures show states s1, s2, s3, in which the letter "A" marks the current location of the agent.
[Figures of states s1, s2, s3 omitted.]
A value function V(s) of a given state s is the expected reward (i.e., the expectation of the utility function) if
the agent acts optimally starting at state s. In the given MDP, since the action outcomes are deterministic, the
expected reward simply equals the utility function.
Which of the following should hold true for a good value function V(s) under the reward structure in the given
MDP?
Note: You may want to watch the video on the next page before submitting this question.
[Answer choices garbled in extraction: each compares values among V(s1), V(s2), V(s3) (e.g. V(s3) vs. V(s1)); the relational operators were lost.]
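
Although the exact answer choices were lost above, the ordering of the values can be reasoned out directly: every action costs 10, so a state k steps from the +1 cell has value -10k + 1 along an optimal path (e.g., two steps away gives 2·(-10) + 1 = -19). The following minimal Python sketch, not part of the original problem, checks this with undiscounted value iteration; the grid coordinates and the reward-on-entering convention are assumptions.

# A minimal sketch (not part of the original problem): undiscounted value
# iteration on the 3x3 grid described above. Assumed coordinates: row 0 is
# the top row, so (0, 2) is the +1 terminal cell and (1, 2) is the -1 cell.

STEP_REWARD = -10.0                      # reward for every action taken
BONUS = {(0, 2): 1.0, (1, 2): -1.0}      # extra reward on entering these cells (assumed convention)
TERMINAL = (0, 2)                        # the agent stops at the +1 cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def value_iteration(n=3, sweeps=20):
    V = {(r, c): 0.0 for r in range(n) for c in range(n)}
    for _ in range(sweeps):
        for s in V:
            if s == TERMINAL:
                continue                 # terminal state keeps value 0
            candidates = []
            for dr, dc in ACTIONS:
                nxt = (s[0] + dr, s[1] + dc)
                if nxt not in V:         # moves off the grid leave the agent in place
                    nxt = s
                candidates.append(STEP_REWARD + BONUS.get(nxt, 0.0) + V[nxt])
            V[s] = max(candidates)       # deterministic Bellman optimality backup
    return V

V = value_iteration()
print(V[(0, 1)])   # one step from the goal:   -10 + 1       = -9.0
print(V[(2, 0)])   # four steps from the goal: 4 * (-10) + 1 = -39.0

Running the sketch confirms that, because each extra step costs 10, states closer to the +1 cell have strictly higher value, which is the relationship the exercise asks about.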