Question: Utility, Policy, and Their Calculation

Consider the 4 x 3 environment discussed in the lecture. Let $\pi$ be the following policy:

Right if you can;
Else, Up if you can;
Otherwise, Left.

For example, $\pi(1,1) = \text{Right}$, $\pi(1,2) = \text{Up}$, and $\pi(4,1) = \text{Left}$ (???).

Assume that the discount factor $\gamma = 1$ and the transition is deterministic, i.e. $P(s' \mid s, a)$ is either 0 or 1. E.g., $P((2,1) \mid (1,1), \text{Right}) = 1$, while $P((1,2) \mid (1,1), \text{Right}) = 0$.

[Figure: 4 x 3 grid environment]

Q.3) Value Iteration / 10

Calculate $U^\pi(s)$ for every $s$ (excluding (4, 1)) using the Bellman Equation and the reward function discussed in class.

$$U^\pi(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U^\pi(s')$$

(For example, $U^\pi(3,3) = 1$ and $U^\pi(3,2) = -1$.)

Q.4) Policy Iteration

What would $\pi(1,1)$ be if, using the $U^\pi$ calculated in Q.3), one step of the following policy update rule is applied on (1, 1)?

$$\pi(s) \leftarrow \arg\max_{a \in A(s)} \Big( R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, U^\pi(s') \Big)$$

where $A(s)$ is the set of actions available to the state $s$.
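The two formulas in the question can be tried out with a short Python sketch. It is not the lecture's solution, and it rests on assumptions the question does not state: the grid is taken to be the common textbook layout with a wall at (2, 2) and terminal states (4, 3) = +1 and (4, 2) = -1; the reward of every non-terminal state is taken to be 0, which is what the question's examples U(3, 3) = 1 and U(3, 2) = -1 imply when the discount factor is 1; R(s, a) is treated as R(s); and the policy at (4, 1) is set to Left, copied from the question's own example even though the question flags it with "???". All function names here (step, reward, policy, evaluate, improve) are hypothetical helpers for illustration.

```python
# Sketch: policy evaluation and one policy-improvement step on a 4 x 3 grid.
# ASSUMPTIONS (not given in the question): wall at (2, 2), terminals
# (4, 3) = +1 and (4, 2) = -1, R(s) = 0 for non-terminals, R(s, a) = R(s).

GAMMA = 1.0
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
MOVES = {"Right": (1, 0), "Up": (0, 1), "Left": (-1, 0), "Down": (0, -1)}


def step(s, a):
    """Deterministic successor of s under a, or None if the move is blocked."""
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return nxt if nxt in STATES else None


def reward(s):
    """Terminal reward, or 0 for non-terminal states (assumed)."""
    return TERMINALS.get(s, 0.0)


def policy(s):
    """Right if possible, else Up if possible, otherwise Left."""
    if s == (4, 1):
        return "Left"  # copied from the question's example, marked "???" there
    for a in ("Right", "Up"):
        if step(s, a) is not None:
            return a
    return "Left"


def evaluate(pi, sweeps=50):
    """Iterate U(s) = R(s) + gamma * U(s') with s' the successor under pi."""
    U = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s in TERMINALS:
                U[s] = reward(s)  # terminals have no successor
            else:
                U[s] = reward(s) + GAMMA * U[step(s, pi(s))]
    return U


def improve(s, U):
    """One greedy policy update at s over the actions that are not blocked."""
    actions = [a for a in MOVES if step(s, a) is not None]
    return max(actions, key=lambda a: reward(s) + GAMMA * U[step(s, a)])


U = evaluate(policy)
for s in sorted(U):
    print(s, U[s])
print("updated policy at (1, 1):", improve((1, 1), U))
```

Under these assumptions the sketch reproduces the question's two examples, U(3, 3) = 1 and U(3, 2) = -1, and the greedy update at (1, 1) picks Up, because the successor (1, 2) has utility 1. The bottom-row values are the fragile part: with the policy at (4, 1) set to Left, states (3, 1) and (4, 1) cycle forever and simply keep their initial value of 0, which may be why the question excludes (4, 1). A different reward function or a different action at (4, 1) would change those numbers.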
