Question: Artificial Intelligence Question
5. (15 points) V(s), Q(s, a), π(s)

The Q-values of a gridworld problem after many iterations are shown on the diagram. 1.00 is the positive exit (escape from the gridworld), and -1.00 is the negative exit (death).

[Diagram: gridworld Q-values. Values as extracted, grid layout not preserved: 0.59 0.67 0.77 0.57 0.6 0.60 0.78 0.66 0.85 1.00 0.67 0.53 0.57 0.57 0.57 0.51 0.51 0.53 -0.60 -1.00 0.86 0.89 0.30 0.88 0.00 -0.65 10.45 0.41 0.83 0.42 0.80 0.29 0.28 0.13 0.44 0.00 0.41 0.27]

a) What are the V-values? Show them on a similar diagram with possible direction symbols.
b) Write the policies that can be derived from the final V-values. The agent will start from one of the bottom squares.
c) Why is it better to use discounted utility when calculating rewards for an agent?
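For parts a) and b), the standard relationships are V(s) = max over a of Q(s, a) and π(s) = argmax over a of Q(s, a). The sketch below illustrates both on two made-up states. The state names, the assignment of values to actions, and the helper names `state_values` and `greedy_policy` are assumptions for illustration only; they are not read off the original diagram, whose layout was not preserved.

```python
# Hypothetical Q-values for two states. The action-to-value assignment is
# illustrative and is NOT taken from the original gridworld diagram.
Q = {
    "s1": {"up": 0.59, "right": 0.67, "down": 0.53, "left": 0.57},
    "s2": {"up": 0.41, "right": 0.13, "down": 0.27, "left": 0.44},
}

# Direction symbols for drawing the policy on a grid.
ARROWS = {"up": "^", "right": ">", "down": "v", "left": "<"}


def state_values(q):
    """V(s) is the largest Q(s, a) over the actions available in s."""
    return {s: max(actions.values()) for s, actions in q.items()}


def greedy_policy(q):
    """pi(s) is the action whose Q(s, a) is largest in s."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}


V = state_values(Q)
pi = greedy_policy(Q)

for s in Q:
    print(s, V[s], ARROWS[pi[s]])
```

Applied to the real diagram, the same two steps would be repeated for each square: take the largest of its four Q-values as V(s) and draw the arrow for the action that achieves it.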
