Question 2 (RL) [50 points - each part 12.5 points]: Consider the following grid world with five different states. The actions are move east, west, south, north, and exit if the agent is in a terminal state.

(a) We would like to use model-based learning with the following four observations. What are the estimated transition and reward functions based on these observations?

(b) Implement direct evaluation, a model-free learning method, on the same four observations and calculate the value of each state. Assume γ = 0.9.

(c) We would like to use TD learning and Q-learning to find the values of these states. Suppose that we have observed the following transitions (s, a, s', r): (B, East, C, 3), (C, South, E, 3), (C, East, E, 4), (D, West, C, 1), (A, South, C, 3). The initial value of each state is 0. Assume that γ = 0.9 and α = 0.4. What are the learned values from TD learning after all five observations? Show the process of computing these values.

(d) What are the learned Q-values from Q-learning after all five observations? Show the process of computing these values.

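Parts (a) and (b) depend on four observed episodes that appear in the original figure and are not reproduced here, so what follows is only a minimal Python sketch of the two computations, assuming each episode is a list of (s, a, s', r) tuples. The function names `estimate_model` and `direct_evaluation` are hypothetical. Model-based learning estimates T(s, a, s') as the fraction of times taking a in s led to s', and R(s, a, s') as the average reward observed on that transition. Direct evaluation averages, over every visit to a state, the discounted return observed from that visit to the end of the episode (every-visit averaging is an assumption here).

```python
from collections import Counter, defaultdict


def estimate_model(episodes):
    """Model-based learning: estimate T(s, a, s') and R(s, a, s') from counts.

    Each episode is assumed to be a list of (s, a, s_next, r) tuples.
    """
    sa_counts = Counter()            # how often action a was taken in state s
    sas_counts = Counter()           # how often (s, a) led to s_next
    reward_sums = defaultdict(float)
    for episode in episodes:
        for s, a, s_next, r in episode:
            sa_counts[(s, a)] += 1
            sas_counts[(s, a, s_next)] += 1
            reward_sums[(s, a, s_next)] += r
    # T_hat(s, a, s') = count(s, a, s') / count(s, a); k[:2] is the (s, a) pair
    transitions = {k: n / sa_counts[k[:2]] for k, n in sas_counts.items()}
    # R_hat(s, a, s') = average reward observed on that transition
    rewards = {k: reward_sums[k] / n for k, n in sas_counts.items()}
    return transitions, rewards


def direct_evaluation(episodes, gamma=0.9):
    """Model-free direct evaluation: average the observed discounted returns per state."""
    return_sums = defaultdict(float)
    visits = Counter()
    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so g = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
        for s, _a, _s_next, r in reversed(episode):
            g = r + gamma * g
            return_sums[s] += g
            visits[s] += 1  # every visit to s contributes one return sample
    return {s: return_sums[s] / visits[s] for s in visits}
```

Passing the four episodes from the figure as `episodes` would yield the T and R tables asked for in (a) and, with gamma=0.9, the state values asked for in (b).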

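For part (c), TD learning updates only the state that was just left, using V(s) ← (1 − α)V(s) + α[r + γV(s')]. The sketch below, assuming γ = 0.9 and α = 0.4 as read from the question, replays the five transitions in the given order and prints each intermediate value. The function name `td_learning` is illustrative.

```python
# TD(0) value learning on the five observed transitions from part (c).
# Assumes gamma = 0.9 (discount) and alpha = 0.4 (learning rate).

GAMMA = 0.9
ALPHA = 0.4

# Observed transitions as (s, a, s', r), in the order given in the question.
TRANSITIONS = [
    ("B", "East", "C", 3),
    ("C", "South", "E", 3),
    ("C", "East", "E", 4),
    ("D", "West", "C", 1),
    ("A", "South", "C", 3),
]


def td_learning(transitions, gamma=GAMMA, alpha=ALPHA):
    """Apply V(s) <- (1 - alpha) V(s) + alpha [r + gamma V(s')] once per transition."""
    values = {s: 0.0 for s in "ABCDE"}  # every state starts at 0
    for step, (s, _a, s_next, r) in enumerate(transitions, start=1):
        # The action is not used: TD learning here estimates state values only.
        sample = r + gamma * values[s_next]  # one-step target from this observation
        values[s] = (1 - alpha) * values[s] + alpha * sample
        print(f"step {step}: sample = {sample:.4f}, V({s}) = {values[s]:.4f}")
    return values


if __name__ == "__main__":
    final = td_learning(TRANSITIONS)
    print({s: round(v, 4) for s, v in final.items()})
```

Worked by hand, the updates are V(B) = 0.4 · 3 = 1.2; V(C) = 0.4 · 3 = 1.2; V(C) = 0.6 · 1.2 + 0.4 · 4 = 2.32; V(D) = 0.4 · (1 + 0.9 · 2.32) = 1.2352; V(A) = 0.4 · (3 + 0.9 · 2.32) = 2.0352. V(E) stays 0 because E is never the source state of an observed transition.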

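For part (d), Q-learning updates the value of the (state, action) pair that was taken, bootstrapping from the best action available in the next state: Q(s, a) ← (1 − α)Q(s, a) + α[r + γ max_a' Q(s', a')]. This sketch makes the same assumptions (γ = 0.9, α = 0.4), treats every unvisited pair as its initial value 0, and takes the action set from the question; the name `q_learning` is illustrative.

```python
# Q-learning on the five observed transitions from part (d).
# Assumes gamma = 0.9 (discount) and alpha = 0.4 (learning rate).

GAMMA = 0.9
ALPHA = 0.4
ACTIONS = ["North", "South", "East", "West", "Exit"]

# Observed transitions as (s, a, s', r), in the order given in the question.
TRANSITIONS = [
    ("B", "East", "C", 3),
    ("C", "South", "E", 3),
    ("C", "East", "E", 4),
    ("D", "West", "C", 1),
    ("A", "South", "C", 3),
]


def q_learning(transitions, gamma=GAMMA, alpha=ALPHA):
    """Apply Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')] per transition."""
    q = {}  # (state, action) -> value; missing pairs count as the initial value 0
    for step, (s, a, s_next, r) in enumerate(transitions, start=1):
        best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
        sample = r + gamma * best_next  # one-step target using the greedy next action
        q[(s, a)] = (1 - alpha) * q.get((s, a), 0.0) + alpha * sample
        print(f"step {step}: max Q({s_next}, .) = {best_next:.4f}, "
              f"Q({s}, {a}) = {q[(s, a)]:.4f}")
    return q


if __name__ == "__main__":
    final = q_learning(TRANSITIONS)
    print({k: round(v, 4) for k, v in final.items()})
```

By hand: Q(B, East) = 1.2, Q(C, South) = 1.2, Q(C, East) = 1.6, Q(D, West) = 0.4 · (1 + 0.9 · max(1.2, 1.6)) = 0.976, and Q(A, South) = 0.4 · (3 + 0.9 · 1.6) = 1.776; every other Q-value stays 0. The difference from TD learning shows up in steps 4 and 5: Q-learning bootstraps from the best action in C (East, 1.6), whereas TD learning used the single estimate V(C) = 2.32 built from both observed actions.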