A soccer robot R is on a fast break toward the goal, starting in position 1....
Question:
A soccer robot R is on a fast break toward the goal, starting in position 1. From positions 1 through 3, it can either shoot (S) or dribble the ball forward (D). From 4 it can only shoot. If it shoots, it either scores a goal (state G) or misses (state M). If it dribbles, it either advances a square or loses the ball, ending up in state M.

[Figure: the robot R and positions 1 through 4 in a row leading to the goal.]

In this Markov Decision Process (MDP), the states are 1, 2, 3, 4, G, and M, where G and M are terminal states. The transition model depends on the parameter y, which is the probability of dribbling successfully (i.e., advancing a square). Assume a discount of \(\gamma = 1\). For \(k \in \{1, 2, 3, 4\}\), we have

\(P(G \mid k, S) = \frac{k}{6}\), \(P(M \mid k, S) = 1 - \frac{k}{6}\), \(P(k+1 \mid k, D) = y\), \(P(M \mid k, D) = 1 - y\), \(R(k, S, G) = 1\),

and rewards are 0 for all other transitions.

(a) (3 points) Denote by \(V^{\pi}\) the value function for the specific policy \(\pi\). What is \(V^{\pi}(1)\) for the policy that always shoots?

(b) (4 points) Denote by \(Q^{*}(s, a)\) the value of a q-state (s, a), which is the expected utility when starting with action a at state s, and thereafter acting optimally. What is \(Q^{*}(3, D)\) in terms of y?

(c) (5 points) Denote by \(V_t(s)\) the value of a state s at iteration t, which is the expected utility when starting in state s and acting optimally. Using y, complete the first two iterations (t = 1, 2) of value iteration. Iteration 0 corresponds to having value 0 in every state: \(V_0(1) = V_0(2) = V_0(3) = V_0(4) = 0\).
Hint: Recall that \(V_{t+1}(s) = \max_{a \in A} \sum_{s'} P(s' \mid s, a) \left( R(s, a, s') + V_t(s') \right)\).

(d) (3 points) For what range of values of y is \(Q^{*}(3, S) \ge Q^{*}(3, D)\)?
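The value-iteration update in the hint can be checked numerically with a short script. The following is a minimal sketch, not a posted solution: it assumes the transition model reads \(P(G \mid k, S) = k/6\) and that the discount is 1 (both are readings of the partly garbled problem statement), and the function names `q_shoot`, `q_dribble`, `value_iteration`, and `optimal_q_values` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

POSITIONS = (1, 2, 3, 4)


def q_shoot(k):
    """Expected reward of shooting from k, assuming P(G|k,S) = k/6 and reward 1 for a goal."""
    return Fraction(k, 6)


def q_dribble(k, values, y):
    """Expected value of dribbling from k: advance with probability y, else end in M (value 0)."""
    return y * values[k + 1]


def value_iteration(y, iterations):
    """Return [V_0, V_1, ..., V_iterations] as dicts keyed by position (discount assumed to be 1)."""
    values = {k: Fraction(0) for k in POSITIONS}
    history = [values]
    for _ in range(iterations):
        updated = {}
        for k in POSITIONS:
            options = [q_shoot(k)]
            if k < 4:  # dribbling is not available from position 4
                options.append(q_dribble(k, values, y))
            updated[k] = max(options)
        values = updated
        history.append(values)
    return history


def optimal_q_values(y):
    """Return (Q*(3, S), Q*(3, D)); four sweeps suffice because the chain has four positions."""
    v_star = value_iteration(y, 4)[-1]
    return q_shoot(3), q_dribble(3, v_star, y)


if __name__ == "__main__":
    y = Fraction(1, 2)  # example value only; the exercise keeps y symbolic
    for t, values in enumerate(value_iteration(y, 2)):
        print(t, {k: str(v) for k, v in values.items()})
    print(optimal_q_values(y))
```

Under these assumptions, `q_shoot(1)` is the value of position 1 for the always-shoot policy in part (a), and comparing the two entries returned by `optimal_q_values(y)` for several values of y gives a numerical check on the range asked for in part (d). Exact fractions are used so that the comparison at the boundary value is not affected by floating-point rounding.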
Expert Answer: