You're trying to get to work. Home is at state 1 and work is at state...
Question:
Transcribed Image Text:
You're trying to get to work. Home is at state 1 and work is at state N = 10. In between home and work are states [2, 9]. Naturally, you're trying to get from state 1 to state 10. At each state, you have the option to walk or to take the subway.

If you walk, you get to the next sequential state with probability 1.0, and it takes 1 minute; e.g. from state 3 you reach state 4 with probability 1.0 and lose 1 minute. You can also take the subway, but this is rife with complexities: taking the subway takes 2 minutes (because it is presumably more time-consuming to go underground and get to the station); it has a 0.5 probability of failing (e.g. the train isn't available), in which case you stay in the same state; BUT it also has a 0.5 probability of doubling your state, e.g. from state 4 you jump to state 8. Note that an action can only be taken if it results in a valid state, e.g. you can't take the subway at state 6 and beyond (because you would end up at non-existent states).

a) Write down the transition function \(T(s, a, s')\) and reward function \(R(s, a, s')\) for this MDP. [3 marks]

b) There are too many states (almost all of which have a non-zero reward) in this problem to feasibly carry out (manual) value iteration for a meaningful number of iterations. Instead, we'll use policy iteration. Starting with the policy below, carry out one step of policy evaluation, then a subsequent step of policy extraction using the policy values to update the policy. [9 marks]

State | \(\pi_t(s)\) | \(V_t(s)\) | \(\pi_{t+1}(s)\)
1 | walk | |
2 | walk | |
3 | subway | |
4 | subway | |
5 | subway | |
6 | walk | |
7 | walk | |
8 | walk | |
9 | walk | |
10 | - | 0 |
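The two steps in part (b) can be checked with a short Python sketch. It assumes an undiscounted problem (gamma = 1) with reward equal to minus the minutes spent, and it treats "policy evaluation" as iterative evaluation run to convergence rather than a single Bellman backup; if your course means one backup from V = 0, run only one sweep. The helper names (`outcomes`, `q_value`, `evaluate_policy`, `extract_policy`) are hypothetical.

```python
N = 10          # state 10 is "work" (terminal)
GAMMA = 1.0     # assumption: undiscounted, reward = -(minutes spent)

def actions(s):
    """Actions that lead only to valid states; none at the terminal state."""
    if s == N:
        return []
    return ["walk", "subway"] if 2 * s <= N else ["walk"]

def outcomes(s, a):
    """(probability, next_state, reward) triples for taking action a in state s."""
    if a == "walk":
        return [(1.0, s + 1, -1.0)]
    # subway: 0.5 success (state doubles), 0.5 failure (stay); 2 minutes either way
    return [(0.5, 2 * s, -2.0), (0.5, s, -2.0)]

def q_value(V, s, a):
    """Expected reward plus (discounted) value of the successor state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes(s, a))

def evaluate_policy(policy, tol=1e-12, max_sweeps=10_000):
    """Iterative policy evaluation: repeat V(s) <- Q(s, pi(s)) until V stops changing."""
    V = {s: 0.0 for s in range(1, N + 1)}
    for _ in range(max_sweeps):
        new_V = {s: (q_value(V, s, policy[s]) if s != N else 0.0) for s in V}
        delta = max(abs(new_V[s] - V[s]) for s in V)
        V = new_V
        if delta < tol:
            break
    return V

def extract_policy(V):
    """Greedy policy extraction: pi'(s) = argmax_a Q(s, a) under the evaluated V."""
    return {s: max(actions(s), key=lambda a: q_value(V, s, a)) for s in range(1, N)}

# Starting policy pi_t from the table in part (b)
pi_t = {1: "walk", 2: "walk", 3: "subway", 4: "subway", 5: "subway",
        6: "walk", 7: "walk", 8: "walk", 9: "walk"}

V = evaluate_policy(pi_t)
pi_next = extract_policy(V)
for s in range(1, N + 1):
    print(s, pi_t.get(s, "-"), round(V[s], 6), pi_next.get(s, "-"))
```

Under these assumptions the evaluated values are V_t = -10, -9, -8, -6, -4, -4, -3, -2, -1, 0 for states 1 to 10, and the extracted policy walks everywhere except state 5, where the subway (expected -4) still beats walking (-5).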
Expert Answer:
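Transition function \(T(s, a, s')\):
\(P(s' = s + 1 \mid s, a = \text{walk}) = 1.0\)
\(P(s' = 2s \mid s, a = \text{subway}) = 0.5\)
\(P(s' = s \mid s, a = \text{subway}) = 0.5\)
\(P(s' = s \mid s, a = \text{walk}) = 0.0\)
\(P(s' = s \mid s \geq 6, a = \text{subway}) = 1.0\)
Note that the subway action is not available at states 6 and beyond, so the t...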
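The transition and reward functions can also be encoded directly, which makes it easy to check that each available action defines a valid probability distribution. This is a sketch under assumptions: reward is taken as minus the minutes spent (-1 for walking, -2 for any subway attempt, including a failed one), and, following the problem statement, the subway is simply not offered at states 6 to 9 rather than modelled as a self-loop. The names `actions`, `T` and `R` are illustrative.

```python
N = 10  # state N is "work"; states 1..N-1 are home and the stops in between

def actions(s):
    """Actions that lead only to valid states (illustrative helper)."""
    if s == N:
        return []  # work is terminal: no further actions
    return ["walk", "subway"] if 2 * s <= N else ["walk"]

def T(s, a, s_next):
    """Transition probability T(s, a, s')."""
    if a not in actions(s):
        raise ValueError(f"action {a!r} is not available in state {s}")
    if a == "walk":
        return 1.0 if s_next == s + 1 else 0.0
    # subway: succeeds (state doubles) or fails (stay put), each with probability 0.5
    return 0.5 if s_next in (2 * s, s) else 0.0

def R(s, a, s_next):
    """Reward = minus the minutes spent; a failed subway attempt still costs 2 minutes."""
    if a not in actions(s):
        raise ValueError(f"action {a!r} is not available in state {s}")
    return -1.0 if a == "walk" else -2.0

# Sanity check: every available action defines a proper probability distribution
for s in range(1, N):
    for a in actions(s):
        total = sum(T(s, a, s2) for s2 in range(1, N + 1))
        assert abs(total - 1.0) < 1e-12, (s, a, total)
```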