Question: Consider a reinforcement learning setup in which the agent can take two actions, a ∈ {0, 1}. There are two states, s ∈ {0, 1}, and there is no discounting (gamma = 1). Over an episode of three time steps, the agent visited the state-action pairs {(0,0), (0,1), (1,0)} in that order and received the rewards {0, -2, 1}. Our previous estimate of the value of the state-action pair (0, 0) is Q(0, 0) = 0.1, and we are now in the second episode. Using Monte Carlo updating, what is Q(0, 0)?

Step by Step Solution
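
The arithmetic can be checked with a short sketch. It rests on one assumption that the question does not state outright: "second episode" is read as a sample-average Monte Carlo update with visit count N = 2, i.e. Q <- Q + (1/N)(G - Q). If a constant step size were intended instead, the result would differ. The helper names mc_return and mc_update are made up for this illustration.

```python
# Minimal sketch of a first-visit Monte Carlo update for one state-action pair.
# Assumption (not stated in the question): sample-average step size 1/N with N = 2.

def mc_return(rewards, start, gamma=1.0):
    """Discounted return G from time step `start` to the end of the episode."""
    g = 0.0
    for r in reversed(rewards[start:]):
        g = r + gamma * g
    return g

def mc_update(q_old, g, n):
    """Incremental sample-average update: Q <- Q + (G - Q) / n."""
    return q_old + (g - q_old) / n

trajectory = [(0, 0), (0, 1), (1, 0)]   # (state, action) pairs from the question
rewards = [0, -2, 1]                    # reward received after each pair

t = trajectory.index((0, 0))            # first visit to (0, 0) is at t = 0
g = mc_return(rewards, t, gamma=1.0)    # 0 + (-2) + 1 = -1
q_new = mc_update(0.1, g, n=2)          # 0.1 + (-1 - 0.1) / 2

print(f"G = {g}, Q(0,0) = {q_new:.2f}")
```

Under that assumption the return following the first visit to (0, 0) is G = -1, and the updated estimate is Q(0, 0) = 0.1 + 0.5 * (-1 - 0.1) = -0.45.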
