Problem 2. Consider an MDP with two states S = {0, 1}, two actions A = {1, 2}, and the
following reward function
$$R_s(a) = \begin{cases} 1, & (s,a) = (0,1) \\ 4, & (s,a) = (0,2) \\ 3, & (s,a) = (1,1) \\ 2, & (s,a) = (1,2) \end{cases}$$

and the transition probabilities $P_{ss'}(a)$ as follows:

$$\begin{bmatrix} P_{00}(1) & P_{00}(2) \\ P_{10}(1) & P_{10}(2) \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{2}{3} \end{bmatrix}$$

The other probabilities can be deduced, for example:

$$P_{01}(1) = 1 - P_{00}(1) = 1 - \tfrac{1}{3} = \tfrac{2}{3}.$$

The discount factor is $\gamma = \tfrac{3}{4}$.
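For concreteness, the data above can be written down directly in code. The following is a minimal sketch in Python; the names `R`, `P_to_0`, and `step` are chosen here for illustration and are not part of the problem statement.

```python
import numpy as np

# Rewards R_s(a), indexed as R[s][a], from the problem statement.
R = {0: {1: 1.0, 2: 4.0},
     1: {1: 3.0, 2: 2.0}}

# P_to_0[s][a] = P_{s0}(a); the probability of moving to state 1 is the complement,
# e.g. P_{01}(1) = 1 - 1/3 = 2/3 as noted above.
P_to_0 = {0: {1: 1/3, 2: 1/2},
          1: {1: 1/4, 2: 2/3}}

gamma = 3/4  # discount factor

def step(s, a, rng):
    """Take action a in state s; return the reward and the sampled next state."""
    s_next = 0 if rng.random() < P_to_0[s][a] else 1
    return R[s][a], s_next
```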
Exercise on model-free prediction:
(a) For the policy that chooses action 1 in state 0 and action 2 in state 1, starting from
state 0, generate one episode E of 10000 triplets (R_i, S_i, A_i), i = 0, 1, ..., 9999, with
R_0 = 0, S_0 = 0 (a possible simulation of parts (a)-(c) is sketched after this list).
(b) Based on the episode E, use Monte Carlo policy evaluation to estimate the value
function v(s).
(c) Based on the episode E, use n-step temporal difference policy evaluation to estimate
the value function v(s).
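One minimal way parts (a)-(c) could be simulated is sketched below, reusing `step` and `gamma` from the encoding above. The every-visit Monte Carlo averaging, the choice n = 4, and the step size alpha = 0.05 are illustrative assumptions, not values prescribed by the problem.

```python
policy = {0: 1, 1: 2}   # action 1 in state 0, action 2 in state 1
T = 10_000              # number of triplets (R_i, S_i, A_i)

def generate_episode(rng):
    """Part (a): one episode of T triplets (R_i, S_i, A_i) with R_0 = 0, S_0 = 0."""
    episode, r, s = [], 0.0, 0
    for _ in range(T):
        a = policy[s]
        episode.append((r, s, a))
        r, s = step(s, a, rng)          # R_{i+1} and S_{i+1} follow (S_i, A_i)
    return episode

def mc_evaluate(episode):
    """Part (b): every-visit Monte Carlo; average the truncated discounted return after each visit."""
    returns = {0: [], 1: []}
    G = 0.0
    for i in range(len(episode) - 2, -1, -1):   # G_i = R_{i+1} + gamma * G_{i+1}
        G = episode[i + 1][0] + gamma * G
        returns[episode[i][1]].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

def td_n_evaluate(episode, n=4, alpha=0.05):
    """Part (c): n-step TD prediction (Sutton & Barto, Ch. 7) with a fixed step size."""
    V = {0: 0.0, 1: 0.0}
    for t in range(len(episode) - n):
        s_t = episode[t][1]
        # n-step return: R_{t+1} + ... + gamma^{n-1} R_{t+n} + gamma^n V(S_{t+n})
        G = sum(gamma ** (k - 1) * episode[t + k][0] for k in range(1, n + 1))
        G += gamma ** n * V[episode[t + n][1]]
        V[s_t] += alpha * (G - V[s_t])
    return V

rng = np.random.default_rng(0)
E = generate_episode(rng)
print(mc_evaluate(E))    # Monte Carlo estimate of v(s)
print(td_n_evaluate(E))  # n-step TD estimate of v(s)
```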
Exercise on model-free control:
(a) Use the SARSA algorithm to estimate the optimal action-value function q*(s,a), by
running the algorithm in Sutton and Barto's book (2nd edition, available online).
(b) Use the Q-learning algorithm to estimate the optimal action-value function q*(s,a), by
running the algorithm in Sutton and Barto's book (2nd edition, available online).
You only need to simulate one episode. In both cases, you will need to decide on an appropriate
fixed step-size α, exploration probability ε, and number of time steps in the episode (one possible setup is sketched below).
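A possible setup for both control exercises, again reusing `step` and `gamma` from the encoding above. The step size alpha = 0.05, exploration probability epsilon = 0.1, the episode length of 10000 steps, and the `eps_greedy` helper are illustrative choices to be tuned, not values given by the problem.

```python
def eps_greedy(Q, s, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return int(rng.choice([1, 2]))
    return max((1, 2), key=lambda a: Q[(s, a)])

def sarsa(steps=10_000, alpha=0.05, epsilon=0.1, seed=0):
    """Part (a): tabular SARSA (on-policy TD control) over one long episode."""
    rng = np.random.default_rng(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (1, 2)}
    s = 0
    a = eps_greedy(Q, s, epsilon, rng)
    for _ in range(steps):
        r, s_next = step(s, a, rng)
        a_next = eps_greedy(Q, s_next, epsilon, rng)
        Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
        s, a = s_next, a_next
    return Q

def q_learning(steps=10_000, alpha=0.05, epsilon=0.1, seed=0):
    """Part (b): tabular Q-learning (off-policy TD control) over one long episode."""
    rng = np.random.default_rng(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (1, 2)}
    s = 0
    for _ in range(steps):
        a = eps_greedy(Q, s, epsilon, rng)
        r, s_next = step(s, a, rng)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, 1)], Q[(s_next, 2)]) - Q[(s, a)])
        s = s_next
    return Q

print(sarsa())       # estimate of q*(s,a) from SARSA
print(q_learning())  # estimate of q*(s,a) from Q-learning
```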