Let be the following Markovian decision process of Figure 4, which transitions are labeled by the names

Fantastic news! We've Found the answer you've been seeking!

Question:

Let be the following Markovian decision process of Figure 4, which transitions are labeled by the names of the actions and the probabilities of transitions, the states are labeled with the corresponding rewards. It models the possible actions of a character being chased by others. In the Tranquil State, nothing happens - no enemies on the horizon. In this state he receives a reward of +2. If it waits and does nothing, it risks being located and caught by the enemy (probability 0.2). If caught he has a -5 penalty. In the Tranquil state, if it moves, it has a 50% chance to ambush and be surrounded by the enemy. Once surrounded or caught, it may attempt to flee or defend itself.

Move 0.5 Wait 0.8 Tranquil Defend 0.3 Move 0.5 Escape 0.4 Defend 0.25 Wait 0.2 Figure 4 Caught -5 Escape 0.3

In the following table, indicate the utility values of the states at the end of each of the first two iterations of the value-iteration algorithm, assuming that an attenuation factor is used (discount factor) of 0.9 and starting initially (iteration 0) with state values all equal to 0. Indicate only the values, do not detail the calculations.

Tranquil Surrounded Caught After Iteration 1 After Iteration 2