Question: Consider the prediction problem on the MDP shown below, with transitions according to policy $\pi$. The sole non-terminal state $s$ has a self-loop with probability $1 - \epsilon$, yielding reward 1. With probability $\epsilon$, the episode terminates with a reward of 0. Assume $\epsilon \in (0, 1)$ and no discounting.

[Figure: state $s$ with a self-loop labeled "$1 - \epsilon$, 1" and a transition to the terminal state labeled "$\epsilon$, 0".]

Suppose at some time step $t \geq 0$, we are in state $s$. Let our current estimate of $V^{\pi}(s)$ be $V_t \in \mathbb{R}$. This question examines the variance of 1-step and Monte Carlo returns from $s$. Recall that for a real-valued random variable $X$, $\mathrm{Var}[X] = E[X^2] - (E[X])^2$.

5a. What is $\mathrm{Var}[G_{t:t+1}]$, where $G_{t:t+1}$ is the 1-step return? [2 marks]

5b. What is $\mathrm{Var}[G_{t:\infty}]$, where $G_{t:\infty}$ is the Monte Carlo return? [2 marks]

5c. Does $V_t$ play a role in determining which among these two returns is preferable? If so, how? If not, why not? [1 mark]
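One way to sanity-check an answer to 5a and 5b is to simulate both returns and compare their empirical variances with candidate closed forms. The sketch below is not from the original page. It assumes the usual definitions: the 1-step return is $R_{t+1} + V_t(S_{t+1})$ with the terminal state valued at 0, and the Monte Carlo return is the undiscounted sum of rewards until termination. Under those assumptions the 1-step return equals $1 + V_t$ with probability $1 - \epsilon$ and 0 otherwise, and the Monte Carlo return is the number of self-loops taken before termination. All function names here are hypothetical.

```python
import random


def variance(samples):
    """Population variance, E[X^2] - (E[X])^2, of a list of floats."""
    n = len(samples)
    mean = sum(samples) / n
    mean_sq = sum(x * x for x in samples) / n
    return mean_sq - mean * mean


def one_step_return(eps, v_t, rng):
    """Sample R_{t+1} + V_t(S_{t+1}) from s (assumed: terminal value is 0)."""
    if rng.random() < 1 - eps:
        return 1.0 + v_t  # self-loop: reward 1, bootstrap from V_t(s)
    return 0.0            # termination: reward 0, nothing to bootstrap from


def monte_carlo_return(eps, rng):
    """Sample the undiscounted sum of rewards until the episode ends."""
    g = 0.0
    while rng.random() < 1 - eps:
        g += 1.0  # each self-loop contributes reward 1
    return g


def empirical_variances(eps, v_t, n=100_000, seed=0):
    """Empirical variances of the 1-step and Monte Carlo returns."""
    rng = random.Random(seed)
    one_step = [one_step_return(eps, v_t, rng) for _ in range(n)]
    mc = [monte_carlo_return(eps, rng) for _ in range(n)]
    return variance(one_step), variance(mc)


def closed_form_variances(eps, v_t):
    """Candidate closed forms under the assumptions stated above."""
    one_step = eps * (1 - eps) * (1 + v_t) ** 2  # scaled Bernoulli
    mc = (1 - eps) / eps ** 2                    # geometric count of loops
    return one_step, mc


if __name__ == "__main__":
    print(empirical_variances(eps=0.5, v_t=1.0))
    print(closed_form_variances(eps=0.5, v_t=1.0))
```

Varying `v_t` while holding `eps` fixed changes only the first variance, which is a useful starting point for 5c: the Monte Carlo return does not depend on $V_t$ at all, whereas both the spread and the bias of the 1-step return do.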
