
Question: Consider a two-state Markov decision process (MDP) with states s1 and s2. In state s1, the decision maker chooses either action a1 or action a2; in state s2, only action a3 is available. The immediate rewards and transition probabilities are as follows.
r(s1, a1) = 4, r(s1, a2) = 10, r(s2, a3) = 2; p(s1 | s1, a1) = p(s2 | s1, a1) = 0.5, p(s2 | s1, a2) = 1, p(s1 | s2, a3) = 0.2, p(s2 | s2, a3) = 0.8.
(a) Solve the three-period problem with terminal rewards r4(s1) = r4(s2) = 0 to maximize the expected total reward, and find the optimal decision rule in each period.
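Part (a) is a backward-induction (dynamic programming) computation, which can be checked numerically. A minimal Python sketch using the rewards and transition probabilities stated above (the variable names and data-structure layout are this snippet's own, not part of the problem):

```python
# Backward induction for the 3-period problem with terminal reward 0.
# States s1, s2; actions a1, a2 available in s1 and a3 in s2,
# with rewards and transition probabilities as given in the problem.

r = {("s1", "a1"): 4.0, ("s1", "a2"): 10.0, ("s2", "a3"): 2.0}
p = {  # p[(s_next, s, a)] = probability of moving to s_next from s under a
    ("s1", "s1", "a1"): 0.5, ("s2", "s1", "a1"): 0.5,
    ("s2", "s1", "a2"): 1.0,
    ("s1", "s2", "a3"): 0.2, ("s2", "s2", "a3"): 0.8,
}
actions = {"s1": ["a1", "a2"], "s2": ["a3"]}
states = ["s1", "s2"]

v = {s: 0.0 for s in states}      # v4(s1) = v4(s2) = 0 (terminal reward)
policy = {}
for t in (3, 2, 1):               # decision epochs, computed backwards
    q = {s: {a: r[(s, a)] + sum(p.get((sp, s, a), 0.0) * v[sp]
                                for sp in states)
             for a in actions[s]} for s in states}
    v = {s: max(q[s].values()) for s in states}
    policy[t] = {s: max(q[s], key=q[s].get) for s in states}
    print(t, v, policy[t])
```

Running this gives v3(s1) = 10, v2(s1) = 12, v1(s1) = 15.6 (and v1(s2) = 8.88), with a2 chosen in s1 at every decision epoch, so the optimal decision rule in each period is δ(s1) = a2, δ(s2) = a3.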
(b) Consider the infinite-horizon discounted MDP with discount factor \lambda = 0.5. Calculate the expected total discounted reward of the stationary policy \delta^\infty with \delta(s1) = a1 and \delta(s2) = a3. Then use the optimality equations to check whether it is the optimal policy.
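For part (b), the value of the stationary policy solves the linear system v = r_δ + λ P_δ v, i.e. (I − λ P_δ) v = r_δ. A small numpy sketch (the s1-then-s2 ordering of rows and columns is this snippet's own convention):

```python
import numpy as np

# Policy evaluation for the stationary policy δ(s1) = a1, δ(s2) = a3
# with discount factor λ = 0.5, followed by a one-step
# optimality-equation check at state s1.

lam = 0.5
P = np.array([[0.5, 0.5],    # transitions from s1 under a1
              [0.2, 0.8]])   # transitions from s2 under a3
r_delta = np.array([4.0, 2.0])

# v_δ solves (I - λ P_δ) v = r_δ
v = np.linalg.solve(np.eye(2) - lam * P, r_delta)
print(v)   # ≈ [6.8235, 4.4706], i.e. [116/17, 76/17]

# Optimality equation at s1: compare the two actions under v_δ
q_a1 = 4 + lam * (0.5 * v[0] + 0.5 * v[1])   # equals v_δ(s1)
q_a2 = 10 + lam * (1.0 * v[1])
print(q_a1, q_a2)   # ≈ 6.8235 vs ≈ 12.2353
```

Since Q(s1, a2) exceeds v_δ(s1), the value of δ^∞ does not satisfy the optimality equations, so δ^∞ is not optimal: action a2 is strictly better in s1.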
