Question:

Consider an MDP with a single non-terminal state (NT) and two terminal states, T1
and T2. When in state NT, the next state is NT itself with probability 0.6, T1 with
probability 0.2, or T2 with probability 0.2. If the transition
from NT is to either NT itself or to T1, the single-stage reward is 1. On the other
hand, if the transition from NT is to T2, the single-stage reward is 0. We observe two
episodes of this MDP both starting in state NT. The first episode terminates in T1
and gives a total reward of 16. The second episode terminates in T2 and gives a total
reward of 6.
(a) Write down both episodes completely, giving the sequence of states visited
and the single-stage rewards obtained. (2)
(b) Write down the first-visit and every-visit estimates of the value of NT. (2)
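
One way to work through parts (a) and (b) is sketched below. Because every transition out of NT earns a reward of 1 except the final step into T2, the total reward pins down each episode's length: the first episode must consist of 15 NT-to-NT transitions followed by NT-to-T1 (16 rewards of 1), and the second of 6 NT-to-NT transitions followed by NT-to-T2 (six rewards of 1, then a 0). The sketch assumes undiscounted returns (gamma = 1), which the question does not state explicitly, and the helper names build_episode and returns_per_visit are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def build_episode(total_reward, terminal):
    """Reconstruct the state and reward sequences from the total reward."""
    # The final transition pays 1 when it ends in T1 and 0 when it ends in T2.
    last = 1 if terminal == "T1" else 0
    # Every earlier transition is NT -> NT and pays 1.
    n_self = total_reward - last
    states = ["NT"] * (n_self + 1) + [terminal]
    rewards = [1] * n_self + [last]
    return states, rewards

def returns_per_visit(rewards, gamma=1):
    """Return G_t for every time step; each step is one visit to NT."""
    g, out = 0, []
    for r in reversed(rewards):
        g = r + gamma * g  # assumption: gamma = 1 (undiscounted)
        out.append(g)
    out.reverse()
    return out

episodes = [build_episode(16, "T1"), build_episode(6, "T2")]
all_returns = [returns_per_visit(rewards) for _, rewards in episodes]

# First-visit: average the return from the first visit to NT in each episode.
first_visit = Fraction(sum(g[0] for g in all_returns), len(all_returns))

# Every-visit: average the returns over all visits to NT in all episodes.
every_visit = Fraction(sum(sum(g) for g in all_returns),
                       sum(len(g) for g in all_returns))

print(first_visit)   # (16 + 6) / 2
print(every_visit)   # (136 + 21) / (16 + 7)
```

Under these assumptions the first-visit estimate is (16 + 6) / 2 = 11. For the every-visit estimate, the returns in the first episode are 16, 15, ..., 1 (sum 136 over 16 visits) and in the second 6, 5, ..., 0 (sum 21 over 7 visits), giving 157 / 23, or roughly 6.83.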
