Question: Consider the 3 Ã 3 world shown in Figure 17.14(a). The transition model is the same as in the 4 Ã 3 Figure 17.1: 80%

Consider the 3 × 3 world shown in Figure 17.14(a). The transition model is the same as in the 4 × 3 Figure 17.1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.

Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.

a. r = 100

b. r = ˆ’3

c. r = 0

d. r = +3


Figure 17.1

+1 0.8 0.1 0.1 2 START 2 3 (a) (b)

+1 0.8 0.1 0.1 2 START 2 3 (a) (b)

Step by Step Solution

3.47 Rating (170 Votes )

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock

a r 100 See the comments for part d This should have been r 100 to illustrate an alternative behavio... View full answer

blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Artificial Intelligence Modern Questions!