Question:

Consider the following grid world in which you will implement TD learning and Q-learning techniques to find the values of these states.
Suppose that we have the following observed transitions: (A, East, C, 3), (C, South, B, 4), (C, East, G, 1), (C, East, E, 5), (E, North, D, 3), (E, North, F, 6), (E, North, H, 4). The initial value of each state is 0. Assume that \(\gamma = 1\) and \(\alpha = 0.5\).
(a) What are the learned values from TD learning after all seven observations?
(b) What are the learned Q-values from Q-learning after all seven observations?
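One way to check parts (a) and (b) is to replay the seven observations in order. The sketch below is a minimal illustration, not the expert solution. It assumes each tuple reads as (state, action, next state, reward), that TD learning uses the update \(V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]\), and that Q-learning uses \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\), with every unseen state or state-action pair valued at 0. The names `TRANSITIONS`, `td_learning`, and `q_learning` are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Observed transitions, assumed to be (state, action, next_state, reward).
TRANSITIONS = [
    ("A", "East", "C", 3),
    ("C", "South", "B", 4),
    ("C", "East", "G", 1),
    ("C", "East", "E", 5),
    ("E", "North", "D", 3),
    ("E", "North", "F", 6),
    ("E", "North", "H", 4),
]


def td_learning(transitions, alpha=0.5, gamma=1.0):
    """Replay transitions with the TD(0) update; all values start at 0."""
    V = defaultdict(float)
    for s, _action, s_next, r in transitions:
        # Move V(s) a fraction alpha toward the sample r + gamma * V(s').
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return dict(V)


def q_learning(transitions, alpha=0.5, gamma=1.0):
    """Replay transitions with the Q-learning update; all Q-values start at 0."""
    Q = defaultdict(float)
    for s, a, s_next, r in transitions:
        # Best known Q-value at the next state; assumed 0 if nothing was seen there.
        best_next = max(
            (q for (state, _a), q in Q.items() if state == s_next),
            default=0.0,
        )
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return dict(Q)


V = td_learning(TRANSITIONS)
Q = q_learning(TRANSITIONS)
print("TD values:", {s: v for s, v in V.items() if v != 0})
print("Q-values:", Q)
```

Under these assumptions the replay gives V(A) = 1.5, V(C) = 3.25, and V(E) = 3.875 for TD learning, with every other state left at 0. For Q-learning it gives Q(A, East) = 1.5, Q(C, South) = 2, Q(C, East) = 2.75, and Q(E, North) = 3.875. The two methods differ at C because TD learning pools all three observations from C into one value, while Q-learning keeps the South and East actions separate.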