Exercise 9.17 Consider a grid world where the action “up” has the following dynamics:

That is, it goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1. Suppose we have the following states:
s12 s13 s14
s17 s18 s19

There is a reward of +10 upon entering state s14, and a reward of −5 upon entering state s19. All other rewards are 0.
The discount is 0.9.
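The dynamics and rewards above can be encoded as a small model. This is a minimal Python sketch that rests on one assumption: the state numbers are read as row-major indices in a grid 5 cells wide, so s12, s13 and s14 sit directly above s17, s18 and s19. The names `ROW_WIDTH`, `up_successors` and `reward` are hypothetical, and walls and grid edges are not modelled.

```python
# Hypothetical encoding of the grid fragment in the exercise.
# Assumption: states are numbered row by row, 5 cells per row, so
# s12 is above s17, s13 is above s18 and s14 is above s19.
ROW_WIDTH = 5  # inferred from the numbering, not stated in the exercise

# Dynamics of "up": (row offset, column offset) and probability.
UP_OUTCOMES = [
    ((-1, 0), 0.8),   # straight up
    ((-1, -1), 0.1),  # up-left
    ((-1, 1), 0.1),   # up-right
]

# Reward received upon entering a state; all other states give 0.
ENTRY_REWARD = {14: 10.0, 19: -5.0}


def up_successors(state: int) -> dict[int, float]:
    """Return {next_state: probability} for the action "up" from `state`.

    Only valid for cells that have neighbours above, above-left and
    above-right; grid edges and walls are not modelled in this sketch.
    """
    row, col = divmod(state, ROW_WIDTH)
    dist: dict[int, float] = {}
    for (dr, dc), p in UP_OUTCOMES:
        nxt = (row + dr) * ROW_WIDTH + (col + dc)
        dist[nxt] = dist.get(nxt, 0.0) + p
    return dist


def reward(next_state: int) -> float:
    """Reward for entering `next_state` (0 unless it is s14 or s19)."""
    return ENTRY_REWARD.get(next_state, 0.0)
```

Under this layout, `up_successors(18)` gives `{13: 0.8, 12: 0.1, 14: 0.1}`: up to s13, up-left to s12 and up-right to s14.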

Suppose we are doing asynchronous value iteration, storing Q[S,A], and we have the following values for these states:
V(s12) = 5, V(s13) = 7, V(s14) = −3, V(s17) = 2, V(s18) = 4, V(s19) = −6

Suppose, in the next step of asynchronous value iteration, we select state s18 and action up. What is the resulting updated value for Q[s18, up]? Give the numerical formula, but do not evaluate or simplify it.
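A worked sketch of the backup, not an official solution: asynchronous value iteration that stores Q[S,A] updates one pair with Q[s,a] ← Σ_{s'} P(s'|s,a)(R(s,a,s') + γ max_{a'} Q[s',a']), and here the given V(s') values stand in for max_{a'} Q[s',a']. Assuming the states form two rows, with s12 s13 s14 directly above s17 s18 s19 (as the numbering suggests), "up" from s18 enters s13 with probability 0.8, s12 with 0.1 and s14 with 0.1, and only entering s14 earns a nonzero reward (+10). With γ = 0.9:

```latex
\begin{aligned}
Q[s_{18}, \mathit{up}] &= \sum_{s'} P(s' \mid s_{18}, \mathit{up})\,\bigl(R(s_{18}, \mathit{up}, s') + \gamma\, V(s')\bigr) \\
&= 0.8\,(0 + 0.9 \times 7) + 0.1\,(0 + 0.9 \times 5) + 0.1\,(10 + 0.9 \times (-3))
\end{aligned}
```

The three terms correspond to s13, s12 and s14 respectively. Under this layout s19 cannot be reached by "up" from s18, so its −5 reward does not appear.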
