
Exercise 9.17 Consider a grid world where the action “up” has the following dynamics:

That is, it goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1. Suppose we have the following states:

s12 s13 s14
s17 s18 s19

There is a reward of +10 upon entering state s14, and a reward of −5 upon entering state s19. All other rewards are 0. The discount is 0.9.

Suppose we are doing asynchronous value iteration, storing Q[S,A], and we have the following values for these states:
V(s12) = 5
V(s13) = 7
V(s14) = −3
V(s17) = 2
V(s18) = 4
V(s19) = −6

Suppose, in the next step of asynchronous value iteration, we select state s18 and action up. What is the resulting updated value for Q[s18, up]? Give the numerical formula, but do not evaluate or simplify it.
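
One way to set up the update, offered as a sketch rather than an official solution: asynchronous value iteration replaces Q[s, a] with the sum over successor states s' of P(s' | s, a) × (R(s') + discount × V(s')). Assuming the states form two rows with s13 directly above s18 (so up-left of s18 is s12 and up-right is s14), and that the reward is received on entering s', this gives

Q[s18, up] = 0.8 × (0 + 0.9 × 7) + 0.1 × (0 + 0.9 × 5) + 0.1 × (10 + 0.9 × (−3))

The Python below encodes the same sum. The names `q_update`, `ENTRY_REWARD`, and `UP_FROM_S18` are illustrative and do not come from the exercise.

```python
# Discount factor and current value estimates, taken from the exercise.
GAMMA = 0.9
V = {"s12": 5, "s13": 7, "s14": -3, "s17": 2, "s18": 4, "s19": -6}

# Reward received on entering a state; every state not listed gives 0.
ENTRY_REWARD = {"s14": 10, "s19": -5}

# ASSUMPTION: s13 lies directly above s18, with s12 up-left and s14 up-right.
# Probabilities are the dynamics of the action "up" stated in the exercise.
UP_FROM_S18 = {"s13": 0.8, "s12": 0.1, "s14": 0.1}


def q_update(transitions, values, entry_reward, gamma):
    """One asynchronous value-iteration backup for a single (state, action).

    transitions maps each successor state to its probability.
    """
    return sum(
        prob * (entry_reward.get(succ, 0) + gamma * values[succ])
        for succ, prob in transitions.items()
    )


q_s18_up = q_update(UP_FROM_S18, V, ENTRY_REWARD, GAMMA)
print(round(q_s18_up, 2))
```

Only the three states reachable from s18 under "up" appear in the sum, so V(s17), V(s18), and V(s19) play no part in this particular backup.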
