In this example, we model an autonomous vacuum cleaner operating in a room as an MDP...
Question:
In this example, we model an autonomous vacuum cleaner operating in a room as an MDP with four states. We assume that dirt accumulates at a predefined rate (the rate itself is irrelevant to the current problem), so over time the room can become dirty. The autonomous vacuum cleaner has to develop a policy.

1. State G: Clean/Good
In this state, the room is clean and in good condition.
Actions available: Wait/Do nothing (W); Clean the room using the vacuum (C).
Rewards: If the vacuum cleaner cleans the room, it receives a positive reward for performing the task. If it idles, it also receives a positive reward; think of this as saving energy, charging the battery, and avoiding overheating. However, if it does not clean, there is a chance that the state changes to 'Dirty'.

2. State D: Dirty
In this state, the room is dirty.
Actions available: Wait/Do nothing (W); Clean the room using the vacuum (C).
Rewards: If the vacuum cleaner cleans the room, it receives a positive reward for performing the task, and the room either transitions to the 'Good' state or stays in the 'Dirty' state. If it idles, it incurs a penalty (negative reward) for not cleaning, and the room either stays in the 'Dirty' state or transitions to the 'Very Dirty' state.

3. State VD: Very Dirty
In this state, the room is very dirty.
Actions available: Wait/Do nothing (W); Clean the room using the vacuum (C).
Rewards: If the vacuum cleaner cleans the room, it receives a higher positive reward for performing the task because of the increased dirtiness, and there is a chance that the room reverts to 'Dirty'. If it idles, it incurs a larger penalty for not cleaning, and there is a chance that it runs into the termination state.

4. State T: Termination
This is a termination state. The vacuum cleaner enters this state when a certain termination condition is met (e.g., a predefined cleaning time limit, or a level of dirtiness that is not reversible).
Actions available: None (no actions can be taken in this state).
Rewards: No reward.

Figure 1: State diagram for the autonomous vacuum (legend: CLEAN, WAIT; γ = 0.5).

Transition probability: The transition probabilities T(s,a,s') and the associated rewards R(s,a,s') are shown in Table 1.

Table 1

| Transition (s,a,s') | Probability T(s,a,s') | Reward R(s,a,s') |
|---|---|---|
| (G,C,G) | 1 | +1 |
| (G,W,G) | 0.5 | +3 |
| (G,W,D) | 0.5 | -1 |
| (D,C,G) | 0.5 | +2 |
| (D,C,D) | 0.5 | +2 |
| (D,W,D) | 0.5 | -2 |
| (D,W,VD) | 0.5 | -2 |
| (VD,C,D) | 0.5 | +3 |
| (VD,C,VD) | 0.5 | +3 |
| (VD,W,VD) | 0.5 | -3 |
| (VD,W,T) | 0.5 | -10 |

Problem 1: Perform two iterations of the value iteration procedure. Assume the following starting values: V(G) = 1, V(D) = 2, V(VD) = 3, V(T) = 0. Show each step clearly, explaining how you reached each value. The following steps may be helpful.

1. Perform the first round of value iteration using Bellman's equation, starting from the starting values. (15 points)
2. Perform the second round of value iteration using Bellman's equation and the values from the first iteration. (15 points)
3. Complete Table 2. (5 points)

Table 2

| State | V1(s) | V2(s) | V3(s) |
|---|---|---|---|
| G | 1 | | |
| D | 2 | | |
| VD | 3 | | |
| T | 0 | | |
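For reference, the Bellman update that each round of value iteration applies can be written as below, followed by the first-round backup for state G as one worked instance. This is a sketch under two assumptions: the "Y = 0.5" label in Figure 1 is read as the discount factor γ = 0.5, and Table 2 is indexed so that V1 holds the starting values and V2 the result of the first round.

```latex
% Bellman update for one synchronous round of value iteration.
% The termination state T has no actions, so its value stays 0.
\begin{align*}
V_{k+1}(s) &= \max_{a \in \{C,\,W\}} \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma\,V_k(s')\bigr],
\qquad V_{k+1}(T) = 0
\end{align*}

% Worked instance: state G, first round, V_1(G) = 1, V_1(D) = 2,
% assuming \gamma = 0.5.
\begin{align*}
Q_1(G,C) &= 1 \cdot (1 + 0.5 \cdot 1) = 1.5 \\
Q_1(G,W) &= 0.5 \cdot (3 + 0.5 \cdot 1) + 0.5 \cdot (-1 + 0.5 \cdot 2) = 1.75 \\
V_2(G)   &= \max(1.5,\ 1.75) = 1.75
\end{align*}
```

The same pattern applies to D and VD: compute one Q-value per action from the rows of Table 1 that start in that state, then take the maximum.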
Expert Answer:
Solution, Step 1: The Markov Decision Process (MDP) is a mathematical framework used to describe decision-making in scenarios where outcomes are partially determined by chance and partially contr...
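Since the posted answer is cut off, the two requested rounds can be checked with a short script. The following is a minimal sketch, not the expert's solution: the dictionary layout and the names `MDP`, `GAMMA`, and `bellman_sweep` are hypothetical, the transition and reward entries are taken from Table 1, and the discount factor 0.5 assumes that the "Y = 0.5" label in Figure 1 denotes γ.

```python
# Assumption: the "Y = 0.5" label in Figure 1 is the discount factor gamma.
GAMMA = 0.5

# Hypothetical encoding of Table 1:
# (state, action) -> list of (next_state, probability, reward).
MDP = {
    ("G", "C"): [("G", 1.0, 1)],
    ("G", "W"): [("G", 0.5, 3), ("D", 0.5, -1)],
    ("D", "C"): [("G", 0.5, 2), ("D", 0.5, 2)],
    ("D", "W"): [("D", 0.5, -2), ("VD", 0.5, -2)],
    ("VD", "C"): [("D", 0.5, 3), ("VD", 0.5, 3)],
    ("VD", "W"): [("VD", 0.5, -3), ("T", 0.5, -10)],
}


def bellman_sweep(values, mdp=MDP, gamma=GAMMA):
    """Apply one synchronous Bellman update to every state."""
    new_values = {}
    for state in values:
        # One Q-value per action available in this state.
        q_values = [
            sum(p * (r + gamma * values[nxt]) for nxt, p, r in outcomes)
            for (s, _action), outcomes in mdp.items()
            if s == state
        ]
        # The termination state T has no actions, so its value stays 0.
        new_values[state] = max(q_values) if q_values else 0.0
    return new_values


# Starting values from the problem statement (first column of Table 2).
v1 = {"G": 1.0, "D": 2.0, "VD": 3.0, "T": 0.0}
v2 = bellman_sweep(v1)  # first round
v3 = bellman_sweep(v2)  # second round

for state in v1:
    print(state, v1[state], v2[state], v3[state])
```

Under these assumptions the first round gives V(G) = 1.75, V(D) = 2.75, V(VD) = 4.25 and the second gives V(G) = 2.125, V(D) = 3.125, V(VD) = 4.75, with V(T) = 0 throughout. A different discount factor would change every number, so the value of γ should be confirmed against the original figure.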