Question:
Consider the Markov decision process (MDP) of Figure 4, whose transitions are labeled with the names of the actions and the transition probabilities, and whose states are labeled with the corresponding rewards. It models the possible actions of a character being chased by others. In the Tranquil state nothing happens: there are no enemies on the horizon, and the character receives a reward of +2. If it waits and does nothing, it risks being located and caught by the enemy (probability 0.2). If caught, it receives a penalty of -5. If it moves from the Tranquil state, it has a 50% chance of running into an ambush and being surrounded by the enemy. Once surrounded or caught, it may attempt to flee or to defend itself.
1. In the following table, give the utility values of the states at the end of each of the first two iterations of the value-iteration algorithm, assuming a discount (attenuation) factor of 0.9 and starting (iteration 0) with all state values equal to 0. Give only the values; do not detail the calculations.
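The value-iteration update asked for in question 1 can be sketched in Python as follows. This is a minimal sketch, assuming the state-reward form of the Bellman update, V_{k+1}(s) = R(s) + γ · max_a Σ_{s'} P(s'|s,a) · V_k(s'); if your course attaches rewards to transitions instead, the update changes accordingly. Because Figure 4 is not reproduced here, only a few numbers come from the text: reward +2 in Tranquil, -5 when caught, probability 0.2 of being caught when waiting, and probability 0.5 of being surrounded when moving. The complementary outcomes (staying Tranquil with probability 0.8 or 0.5), the reward of the Surrounded state, all flee/defend transitions, and the names `value_iteration`, `rewards` and `model` are hypothetical placeholders. The printed numbers are therefore not the answer to the exercise until these placeholders are replaced with the values from the figure.

```python
from typing import Dict, List, Tuple

# Transition model: state -> action -> list of (probability, next_state).
Model = Dict[str, Dict[str, List[Tuple[float, str]]]]


def value_iteration(rewards: Dict[str, float], model: Model,
                    gamma: float = 0.9,
                    iterations: int = 2) -> List[Dict[str, float]]:
    """Return [V_0, V_1, ..., V_iterations], starting from V_0(s) = 0.

    State-reward Bellman update (an assumed convention):
        V_{k+1}(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) * V_k(s')
    A state with no actions keeps V(s) = R(s).
    """
    history = [{s: 0.0 for s in rewards}]
    for _ in range(iterations):
        prev = history[-1]
        new = {}
        for s, r in rewards.items():
            actions = model.get(s, {})
            # Best expected next-state value over the available actions.
            best = max((sum(p * prev[s2] for p, s2 in outcomes)
                        for outcomes in actions.values()), default=0.0)
            new[s] = r + gamma * best
        history.append(new)
    return history


# Known from the text: R(Tranquil) = +2, R(Caught) = -5,
# P(Caught | Tranquil, wait) = 0.2, P(Surrounded | Tranquil, move) = 0.5.
rewards = {
    "Tranquil": 2.0,
    "Caught": -5.0,
    "Surrounded": -3.0,  # PLACEHOLDER: read the real reward from Figure 4
}
model: Model = {
    "Tranquil": {
        "wait": [(0.8, "Tranquil"), (0.2, "Caught")],      # 0.8 assumed = 1 - 0.2
        "move": [(0.5, "Tranquil"), (0.5, "Surrounded")],  # other 0.5 assumed
    },
    # PLACEHOLDER transitions below: Figure 4 is not reproduced here.
    "Caught": {
        "flee": [(0.5, "Tranquil"), (0.5, "Caught")],
        "defend": [(1.0, "Caught")],
    },
    "Surrounded": {
        "flee": [(0.5, "Tranquil"), (0.5, "Caught")],
        "defend": [(0.5, "Tranquil"), (0.5, "Surrounded")],
    },
}

history = value_iteration(rewards, model, gamma=0.9, iterations=2)
for k, values in enumerate(history):
    print(f"iteration {k}: " + ", ".join(f"{s} = {v:.2f}" for s, v in values.items()))
```

Under this convention the first iteration simply returns V_1(s) = R(s), because every V_0 term is 0; only the second iteration depends on the transition probabilities.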
2. Give the action plan (policy) resulting from iteration 2.
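For question 2, a policy can be read off a value table by a greedy one-step lookahead. The sketch below assumes that the policy "resulting from iteration 2" means, in each state, the action maximizing Σ_{s'} P(s'|s,a) · V_2(s'); some courses instead report the argmax actions chosen while computing V_2 (that is, greedy with respect to V_1), so check which convention yours uses. The function name `greedy_policy`, the transition model and the `v2` numbers are hypothetical placeholders, not values read from Figure 4.

```python
from typing import Dict, List, Tuple

# Transition model: state -> action -> list of (probability, next_state).
Model = Dict[str, Dict[str, List[Tuple[float, str]]]]


def greedy_policy(values: Dict[str, float], model: Model) -> Dict[str, str]:
    """Pick, in each state, the action maximizing sum_{s'} P(s'|s,a) * V(s').

    With state rewards and gamma > 0, R(s) and gamma are identical for all
    actions of s, so they do not affect the argmax. Ties go to the first
    action listed.
    """
    policy = {}
    for s, actions in model.items():
        best_action, _ = max(
            actions.items(),
            key=lambda item: sum(p * values[s2] for p, s2 in item[1]),
        )
        policy[s] = best_action
    return policy


# HYPOTHETICAL V_2 table and transition model: substitute your own V_2
# values and the transitions of Figure 4.
v2 = {"Tranquil": 2.54, "Caught": -6.35, "Surrounded": -3.45}
model: Model = {
    "Tranquil": {
        "wait": [(0.8, "Tranquil"), (0.2, "Caught")],
        "move": [(0.5, "Tranquil"), (0.5, "Surrounded")],
    },
    "Caught": {
        "flee": [(0.5, "Tranquil"), (0.5, "Caught")],
        "defend": [(1.0, "Caught")],
    },
    "Surrounded": {
        "flee": [(0.5, "Tranquil"), (0.5, "Caught")],
        "defend": [(0.5, "Tranquil"), (0.5, "Surrounded")],
    },
}

policy = greedy_policy(v2, model)
print(policy)
```

Dropping R(s) and γ from the comparison is safe only because rewards are attached to states here; with transition rewards R(s, a, s') they would have to stay inside the sum.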
Probability and Random Processes With Applications to Signal Processing and Communications
ISBN: 978-0123869814
2nd edition
Authors: Scott Miller, Donald Childers