Question: Consider a reinforcement learning setup in which the agent can take two actions, a ∈ {0, 1}. There are two states, s ∈ {0, 1}, and there is no discounting (gamma = 1). Over an episode of three time steps, the agent visited the sequence of state-action pairs {(1,1), (0,1), (0,0)}. The associated rewards were {1, -1, 1}. Our previous estimate of the value of the state-action pair (1, 0) is Q(1, 0) = 0.125, and we are in the second episode. We follow "first-visit" Monte Carlo. Given the new experience from this episode, we would have (choose one of the following):

a) Q(1,0) = 0

b) Q(1,0) = 0.25

c) Q(1,0) = 0.125

d) none of the above
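
One way to check the options is to compute the first-visit returns for the episode and see which state-action pairs actually receive an update. The sketch below is only an illustration, not an official solution: the helper name first_visit_returns is hypothetical, and it assumes that the t-th reward is the one received after taking the t-th state-action pair. Under that assumption, first-visit Monte Carlo only updates pairs that appear in the episode, so the estimate for a pair that was never visited is left at its previous value.

```python
def first_visit_returns(pairs, rewards, gamma=1.0):
    """Map each (state, action) pair to the return following its first visit.

    Assumption: rewards[t] is the reward received after taking pairs[t].
    """
    returns = {}
    g = 0.0
    # Walk backwards so g accumulates the (discounted) return from step t onward.
    for t in range(len(pairs) - 1, -1, -1):
        g = rewards[t] + gamma * g
        # Overwriting while moving backwards leaves the value of the earliest visit.
        returns[pairs[t]] = g
    return returns


episode = [(1, 1), (0, 1), (0, 0)]   # state-action pairs from the question
rewards = [1, -1, 1]                 # rewards from the question
q_10 = 0.125                         # previous estimate of Q(1, 0)

returns = first_visit_returns(episode, rewards, gamma=1.0)

# Q(1, 0) is only updated if the pair (1, 0) was visited in this episode.
q_10_was_updated = (1, 0) in returns

print(returns)
print(q_10_was_updated, q_10)
```

With the numbers from the question, the returns are 1 for (1,1), 0 for (0,1), and 1 for (0,0). The pair (1, 0) does not occur in the episode, so Q(1, 0) remains 0.125, which corresponds to option c).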
