Question: (a) [3 points (Written)] Consider a fixed stochastic policy $\pi$ and imagine running several rollouts of this policy within the environment. Naturally, depending on the stochasticity of the MDP $M$ and of the policy itself, some trajectories are more likely than others. Write down an expression for $p^\pi(\tau)$, the likelihood of sampling a trajectory $\tau = (s_0, a_0, s_1, a_1, \ldots)$ by running $\pi$ in $M$.

Note: Having an expression for this likelihood is very useful in practice. For further context, consider the following equation, which can be used to calculate the value of a particular state $s_0$ under a policy $\pi$:

$V^\pi(s_0) = \mathbb{E}_{\tau \sim p^\pi}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \,\middle|\, s_0\right]$

In practice, we need the distribution of trajectories to evaluate this expectation. The likelihood expression derived in this question describes that distribution.
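For a finite trajectory $\tau = (s_0, a_0, s_1, \ldots, a_{T-1}, s_T)$, the Markov property lets the likelihood factor into the initial-state probability, the policy's action probabilities, and the environment's transition probabilities: $p^\pi(\tau) = \mu(s_0) \prod_{t=0}^{T-1} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t)$, where $\mu$ denotes the initial-state distribution (a symbol the question does not name). The Python sketch below is illustrative only: it assumes a small tabular MDP with made-up numbers stored in hypothetical arrays `MU`, `P`, `PI`, and `R`, a discount $\gamma = 0.9$, and a finite horizon. It evaluates this product for a given trajectory and, following the note about $V^\pi(s_0)$, estimates the value by averaging discounted returns over sampled rollouts truncated at that horizon.

```python
import itertools
import random

# Hypothetical 2-state, 2-action tabular MDP; every number here is illustrative.
MU = [1.0, 0.0]  # initial-state distribution mu(s_0)
# P[s][a][s2] = probability of landing in s2 after taking action a in state s
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
]
# PI[s][a] = pi(a | s) for a fixed stochastic policy
PI = [
    [0.5, 0.5],
    [0.3, 0.7],
]
# R[s][a] = reward for taking action a in state s
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]


def trajectory_likelihood(states, actions):
    """Return mu(s_0) * prod_t pi(a_t | s_t) * P(s_{t+1} | s_t, a_t).

    The trajectory is (s_0, a_0, s_1, ..., a_{T-1}, s_T), so `states`
    holds exactly one more entry than `actions`.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("need exactly one more state than actions")
    prob = MU[states[0]]
    for t, a in enumerate(actions):
        s, s_next = states[t], states[t + 1]
        prob *= PI[s][a] * P[s][a][s_next]
    return prob


def total_probability(horizon):
    """Sum the likelihood over every trajectory with `horizon` actions (should be 1)."""
    n_states, n_actions = len(MU), len(PI[0])
    return sum(
        trajectory_likelihood(list(ss), list(aa))
        for ss in itertools.product(range(n_states), repeat=horizon + 1)
        for aa in itertools.product(range(n_actions), repeat=horizon)
    )


def sample_index(probs, rng):
    """Draw index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def mc_value(s0, gamma=0.9, horizon=50, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of V^pi(s0), with each rollout truncated at `horizon` steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s, ret = s0, 0.0
        for t in range(horizon):
            a = sample_index(PI[s], rng)    # a_t ~ pi(. | s_t)
            ret += gamma**t * R[s][a]
            s = sample_index(P[s][a], rng)  # s_{t+1} ~ P(. | s_t, a_t)
        total += ret
    return total / n_rollouts


print(trajectory_likelihood([0, 1, 1], [1, 0]))  # approximately 0.12
print(total_probability(2))                      # approximately 1.0
print(mc_value(0))
```

Summing the likelihood over every state and action sequence of a fixed length gives 1, a quick check that the factorization defines a proper distribution over trajectories. The Monte Carlo estimator draws its rollouts from exactly this distribution (conditioned on the start state), which is why the likelihood expression is what makes the expectation in $V^\pi(s_0)$ computable.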
