Question:
In this example, we model an autonomous vacuum cleaner operating in a room as an MDP with four states. We assume that dirt accumulates at a predefined rate (the rate itself is irrelevant to the problem), so over time the room can become dirty. The autonomous vacuum cleaner has to develop a policy.

1. State G: Clean/Good
   In this state, the room is clean and in good condition.
   Actions available: Wait/do nothing (W); Clean the room using the vacuum (C).
   Rewards: If the vacuum cleaner cleans the room, it receives a positive reward for performing the task. If it idles, it also receives a positive reward; think of this as saving energy, charging the battery, and preventing overheating. However, if it does not clean, there is a chance that the state changes to 'Dirty'.

2. State D: Dirty
   In this state, the room is dirty.
   Actions available: Wait/do nothing (W); Clean the room using the vacuum (C).
   Rewards: If the vacuum cleaner cleans the room, it receives a positive reward for performing the task; it can either transition to the 'Good' state or stay in the 'Dirty' state. If it idles, it incurs a penalty (negative reward) for not cleaning; as a result it may stay in the 'Dirty' state or transition to the 'Very Dirty' state.

3. State VD: Very Dirty
   In this state, the room is very dirty.
   Actions available: Wait/do nothing (W); Clean the room using the vacuum (C).
   Rewards: If the vacuum cleaner cleans the room, it receives a higher positive reward for performing the task due to the increased dirtiness; there is a chance that the room only reverts to 'Dirty'. If it idles, it incurs a larger penalty for not cleaning and there is a chance that it will run into the termination state.

4. State T: Termination
   This is a termination state. The vacuum cleaner enters it when a certain termination condition is met (e.g., a predefined cleaning time limit, or a level of dirtiness that is not reversible).
   Actions available: None (no actions can be taken in this state).
   Rewards: No reward.

[Figure 1: State diagram for the autonomous vacuum. Edges are labelled with the CLEAN and WAIT actions, their transition probabilities, and rewards; the legend gives the discount factor γ = 0.5.]

Transition probability: The transition probabilities T(s,a,s') and the associated rewards R(s,a,s') are shown in Table 1.

Table 1
Transition probability    Probability value    Reward          Reward value
T(G,C,G)                  1                    R(G,C,G)        +1
T(G,W,G)                  0.5                  R(G,W,G)        +3
T(G,W,D)                  0.5                  R(G,W,D)        -1
T(D,C,G)                  0.5                  R(D,C,G)        +2
T(D,C,D)                  0.5                  R(D,C,D)        +2
T(D,W,D)                  0.5                  R(D,W,D)        -2
T(D,W,VD)                 0.5                  R(D,W,VD)       -2
T(VD,C,D)                 0.5                  R(VD,C,D)       +3
T(VD,C,VD)                0.5                  R(VD,C,VD)      +3
T(VD,W,VD)                0.5                  R(VD,W,VD)      -3
T(VD,W,T)                 0.5                  R(VD,W,T)       -10

Problem 1: Perform two iterations of the value iteration procedure. Assume that the starting value estimates are as follows: V(G) = 1, V(D) = 2, V(VD) = 3, V(T) = 0. You need to show clearly, step by step, how you reached each of the updated values. The following steps may be helpful:
1. Perform the first round of value iteration using Bellman's equation, starting from the given values. (15 points)
2. Perform the second round of value iteration using Bellman's equation and the values from the first iteration. (15 points)
3. Complete Table 2. (5 points)

Table 2
State    V1(s)    V2(s)    V3(s)
G        1
D        2
VD       3
T        0
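To make the model and the update rule concrete, here is a minimal Python sketch of the value-iteration backup for this MDP. The probabilities and rewards are taken directly from Table 1 and γ = 0.5 from the figure legend; the dictionary layout, the function name bellman_backup, and the variable names are illustrative choices, not part of the assignment.

```python
# Minimal sketch of the value-iteration backup for the vacuum-cleaner MDP.
# Model data comes from Table 1; gamma = 0.5 comes from the figure legend.

GAMMA = 0.5                      # discount factor
STATES = ["G", "D", "VD", "T"]

# (state, action) -> list of (next_state, probability, reward)
MODEL = {
    ("G", "C"):  [("G", 1.0, +1)],
    ("G", "W"):  [("G", 0.5, +3), ("D", 0.5, -1)],
    ("D", "C"):  [("G", 0.5, +2), ("D", 0.5, +2)],
    ("D", "W"):  [("D", 0.5, -2), ("VD", 0.5, -2)],
    ("VD", "C"): [("D", 0.5, +3), ("VD", 0.5, +3)],
    ("VD", "W"): [("VD", 0.5, -3), ("T", 0.5, -10)],
    # T is terminal: no actions, its value stays 0.
}

def bellman_backup(V):
    """One round of value iteration:
    V'(s) = max_a sum_s' T(s,a,s') * [R(s,a,s') + GAMMA * V(s')]."""
    V_new = {}
    for s in STATES:
        actions = [a for (st, a) in MODEL if st == s]
        if not actions:          # terminal state
            V_new[s] = 0.0
            continue
        V_new[s] = max(
            sum(p * (r + GAMMA * V[s2]) for s2, p, r in MODEL[(s, a)])
            for a in actions
        )
    return V_new

V1 = {"G": 1.0, "D": 2.0, "VD": 3.0, "T": 0.0}  # given starting values
V2 = bellman_backup(V1)                          # first round (Problem 1, step 1)
V3 = bellman_backup(V2)                          # second round (Problem 1, step 2)
print(V2, V3)
```

Running bellman_backup twice from the given starting values produces exactly the two rounds asked for in Problem 1, and the two resulting dictionaries fill in the V2(s) and V3(s) columns of Table 2.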
Step by Step Solution
First round of value iteration: We start by applying Bellman's equation to each state,

V_2(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V_1(s') ],   with γ = 0.5.

For state G the possible actions are W (wait) and C (clean). If the agent takes action C, it ...
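As a worked illustration of this backup for state G (using only the Table 1 entries and the given starting values V_1(G) = 1 and V_1(D) = 2; the action-value notation Q_2 is introduced here for clarity):

Q_2(G, C) = 1 · (1 + 0.5 · V_1(G)) = 1 + 0.5 · 1 = 1.5
Q_2(G, W) = 0.5 · (3 + 0.5 · V_1(G)) + 0.5 · (-1 + 0.5 · V_1(D)) = 1.75 + 0 = 1.75
V_2(G)   = max(1.5, 1.75) = 1.75

The same pattern (one Q-value per action, then the max) is repeated for states D and VD, while V_2(T) remains 0.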
