Question: Consider the 3 × 3 world shown in Figure 17.14(a). The transition model is the same as in the 4 × 3 Figure 17.1: 80%
Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.
a. r = 100
b. r = -3
c. r = 0
d. r = +3
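
The value iteration asked for above could be sketched as follows. This is a minimal sketch, not the reference solution. Figure 17.14(a) is not reproduced here, so the layout is an assumption: reward r in the upper-left cell, a terminal cell worth +10 in the upper-right, and -1 in every other cell. The 0.1 sideways probabilities are taken from the Figure 17.1 residue, the update U(s) = R(s) + gamma * max_a sum P(s'|s,a) U(s') is one common textbook form, and all names (make_rewards, step, q_value, value_iteration) are hypothetical.

```python
# Sketch of value iteration on a 3 x 3 grid world.
# ASSUMPTION: the layout (r upper-left, +10 terminal upper-right, -1 elsewhere)
# is not stated in the text; adjust make_rewards and TERMINAL to match the figure.
GAMMA = 0.99
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
# The two directions at right angles to each intended move.
SIDEWAYS = {"U": "LR", "D": "LR", "L": "UD", "R": "UD"}
TERMINAL = (0, 2)


def make_rewards(r):
    """Build the assumed 3 x 3 reward grid for a given r."""
    grid = [[-1.0] * 3 for _ in range(3)]
    grid[0][0] = float(r)
    grid[0][2] = 10.0
    return grid


def step(state, move):
    """Result of a move; bumping into a wall leaves the agent in place."""
    row = state[0] + MOVES[move][0]
    col = state[1] + MOVES[move][1]
    return (row, col) if 0 <= row < 3 and 0 <= col < 3 else state


def q_value(state, action, U):
    """Expected next-state utility: 0.8 intended direction, 0.1 each sideways."""
    outcomes = [(0.8, action)] + [(0.1, a) for a in SIDEWAYS[action]]
    return sum(p * U[step(state, a)] for p, a in outcomes)


def value_iteration(r, eps=1e-6):
    """Return (utilities, greedy policy) for the assumed world."""
    R = make_rewards(r)
    states = [(i, j) for i in range(3) for j in range(3)]
    U = {s: 0.0 for s in states}
    while True:
        new_U = {}
        for s in states:
            reward = R[s[0]][s[1]]
            if s == TERMINAL:
                new_U[s] = reward  # no future reward after the terminal cell
            else:
                new_U[s] = reward + GAMMA * max(q_value(s, a, U) for a in MOVES)
        delta = max(abs(new_U[s] - U[s]) for s in states)
        U = new_U
        if delta < eps:
            break
    policy = {s: max(MOVES, key=lambda a: q_value(s, a, U))
              for s in states if s != TERMINAL}
    return U, policy
```

Calling value_iteration(r) once for each r in parts a to d and printing the returned policy as a 3 × 3 grid gives the four policies to compare. Because the discount factor is close to 1, convergence can take a few thousand sweeps when r is large.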
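[Figure 17.1: panel (a) shows the grid world with a START cell and a +1 cell; panel (b) shows the transition model with probabilities 0.8, 0.1, 0.1.]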
Step by Step Solution
a. r = 100. See the comments for part d. This should have been r = -100 to illustrate an alternative behavio...
