Question: (5 p) Consider the DiscountGrid layout, shown below. This grid has two terminal states with positive
payoff in the middle row: a close exit and a distant exit, each with its own payoff. The bottom
row of the grid consists of terminal states with negative payoff; these states form the "cliff" region.
The starting state is the white square. We distinguish between two types of paths:
Paths that "risk the cliff" travel near the bottom row of the grid. These paths are shorter but
risk earning a large negative payoff; they are represented by the red arrow in the figure below.
Paths that "avoid the cliff" travel along the top edge of the grid. These paths are longer but are less
likely to incur huge negative payoffs.
In this question, you will choose settings of the discount, noise, and living reward parameters for this
MDP to produce optimal policies of several different types. Your setting of the parameter values for
each part should have the property that, if your agent followed its optimal policy without being subject
to any noise, it would exhibit the given behavior. If a particular behavior is not achieved for any setting
of the parameters, assert that the policy is impossible by returning the string 'NOT POSSIBLE'.
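For reference, the three parameters enter the optimal values through the Bellman optimality equation. The block below is a sketch under assumptions not stated in the question: the noise n is taken to be the probability that a move slips to one of the two perpendicular directions (n/2 each), the living reward r is taken to be paid on every non-terminal transition, and the symbols s_a, s_left and s_right (the squares reached by the intended move and by the two slips) are introduced here only for illustration.

```latex
% Bellman optimality equation for a general MDP with discount \gamma
\[
V^*(s) = \max_a \sum_{s'} T(s,a,s')\,\bigl[\, R(s,a,s') + \gamma\, V^*(s') \,\bigr]
\]

% Assumed gridworld noise model: intended move with probability 1-n,
% each perpendicular slip with probability n/2, living reward r per move
\[
Q^*(s,a) = (1-n)\bigl[\, r + \gamma V^*(s_a) \,\bigr]
         + \tfrac{n}{2}\bigl[\, r + \gamma V^*(s_{\mathrm{left}}) \,\bigr]
         + \tfrac{n}{2}\bigl[\, r + \gamma V^*(s_{\mathrm{right}}) \,\bigr],
\qquad
V^*(s) = \max_a Q^*(s,a)
\]
```

Under these assumptions the discount scales how much future payoffs count, the noise decides how much of a neighbouring cliff state's negative value leaks into a square next to it, and the living reward shifts the value of every extra step taken.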
Here are the optimal policy types you should attempt to produce:
Prefer the close exit risking the cliff
Prefer the close exit but avoiding the cliff
Prefer the distant exit risking the cliff
Prefer the distant exit avoiding the cliff
Avoid both exits and the cliff (so an episode should never terminate)
HINT: Consider which of the three parameters influence the desired behavior.
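One way to reason about the hint with formulas is to compare the noise-free discounted return of the two exits. The sketch below assumes that an exit with payoff R is reached after k moves, that each of those moves earns the living reward r, and that the exit payoff is collected on the step after arrival. The symbols G, k and R, and the subscripts "close" and "far", are hypothetical names, not taken from the question.

```latex
% Noise-free return of a path that reaches an exit with payoff R after k moves
\[
G(k, R) = r \sum_{t=0}^{k-1} \gamma^{t} + \gamma^{k} R
        = r\,\frac{1-\gamma^{k}}{1-\gamma} + \gamma^{k} R
\qquad (0 \le \gamma < 1)
\]

% The close exit is preferred over the distant exit when
\[
G(k_{\mathrm{close}}, R_{\mathrm{close}}) > G(k_{\mathrm{far}}, R_{\mathrm{far}})
\]
```

In this comparison only the discount and the living reward appear, so they govern which exit is preferred. The noise enters only through the transition probabilities of the MDP, which is why it mainly governs whether the shorter path beside the cliff is worth the risk.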
Solve the problem using formulas instead of code, and give a detailed explanation.
