Question: (a) [3 points (Written)] Consider a fixed stochastic policy and imagine running several rollouts of this policy within the environment. Naturally, depending on the stochasticity
![(a) [3 points (Written)] Consider a fixed stochastic policy and imagine](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/10/670a496e9169b_406670a496e801aa.jpg)
(a) [3 points (Written)] Consider a fixed stochastic policy $\pi$ and imagine running several rollouts of this policy within the environment. Naturally, depending on the stochasticity of the MDP $M$ and the policy itself, some trajectories are more likely than others. Write down an expression for $\rho^\pi(\tau)$, the likelihood of sampling a trajectory $\tau = (s_0, a_0, s_1, a_1, \ldots)$ by running $\pi$ in $M$.

Note: Having an expression for this likelihood is very useful in practice. For further context, consider the following equation, which can be used to calculate the value of a particular state $s_0$ for a policy $\pi$:

$$V^\pi(s_0) = \mathbb{E}_{\tau \sim \rho^\pi}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \,\middle|\, s_0\right]$$

In practice, we require the distribution of trajectories to evaluate the above expectation. The likelihood expression we derive in this question is useful in describing this distribution.
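To illustrate why the trajectory distribution matters for the expectation in the note, here is a minimal sketch that estimates the value of a start state by sampling rollouts and averaging their discounted returns. Everything in it is an assumption made for illustration and not part of the question: the MDP is tabular, the arrays `P`, `R`, and `pi` and the function names are hypothetical, and the infinite sum is truncated at a finite `horizon`.

```python
import numpy as np

def sample_trajectory(P, pi, s0, horizon, rng):
    """Roll out the policy from s0 and return the visited (state, action) pairs."""
    traj, s = [], s0
    for _ in range(horizon):
        a = rng.choice(pi.shape[1], p=pi[s])    # a_t ~ pi(. | s_t)
        traj.append((s, a))
        s = rng.choice(P.shape[2], p=P[s, a])   # s_{t+1} ~ P(. | s_t, a_t)
    return traj

def discounted_return(traj, R, gamma):
    """Sum of gamma^t * R(s_t, a_t) along one (truncated) trajectory."""
    return sum(gamma ** t * R[s, a] for t, (s, a) in enumerate(traj))

def mc_value(P, R, pi, s0, gamma, horizon=100, n_rollouts=500, seed=0):
    """Monte Carlo estimate of the value of s0: average return over sampled rollouts."""
    rng = np.random.default_rng(seed)
    returns = [
        discounted_return(sample_trajectory(P, pi, s0, horizon, rng), R, gamma)
        for _ in range(n_rollouts)
    ]
    return float(np.mean(returns))

# Hypothetical 2-state, 2-action MDP (all numbers invented for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])   # P[s, a, s'] = transition probability
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])                 # R[s, a] = reward
pi = np.array([[0.7, 0.3],
               [0.4, 0.6]])                # pi[s, a] = probability of action a in state s

if __name__ == "__main__":
    print(mc_value(P, R, pi, s0=0, gamma=0.9))
```

Each rollout is one draw from the trajectory distribution, so the average converges to the expectation as the number of rollouts grows. Truncating at `horizon` introduces an error of order `gamma ** horizon`.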
Step by Step Solution
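One common way to write the requested likelihood is sketched below. It assumes an initial-state distribution $\mu$ and a transition function $P(s' \mid s, a)$ for the MDP; neither symbol appears in the question, so treat both as notation introduced here. The factorization follows from the chain rule of probability together with the Markov property: each action depends only on the current state through the policy, and each next state depends only on the current state and action.

```latex
\rho^\pi(\tau)
  = \mu(s_0) \prod_{t=0}^{\infty} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t)

% Conditioning on a fixed start state s_0, as in the value-function
% expectation, the initial-state factor drops out:
\rho^\pi(\tau \mid s_0)
  = \prod_{t=0}^{\infty} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t)
```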
