Consider the 3 Ã 3 world shown in Figure 17.14(a). The transition model is the same as
Question:
Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.
a. r = 100
b. r = 3
c. r = 0
d. r = +3
Figure 17.1
Fantastic news! We've Found the answer you've been seeking!
Step by Step Answer:
Related Book For
Artificial Intelligence A Modern Approach
ISBN: 978-0136042594
3rd edition
Authors: Stuart Russell, Peter Norvig
Question Posted: