Problem 4. (25 points) Consider the following MDP with two states S = {s1, s2} and three actions A = {0, 1/2, 1}. (The expressions on the arrows of the transition diagram, not reproduced here, indicate the probability of the corresponding transition on taking an action a in A.) The transition probabilities are

    Pr(s1 | s1, a) = a^2/2,    Pr(s2 | s2, a) = a^2/4,

and rewards are given by

    r(s1, a) = -a,    r(s2, a) = -1 + a/2.

Solve this MDP (i.e., find a stationary policy that maximizes expected discounted reward) for discount factor gamma = 0.5, using policy iteration and value iteration. For policy iteration, start with pi_0(s1) = 0, pi_0(s2) = 0. For value iteration, start with v_0(s1) = 0, v_0(s2) = 0. You may execute these algorithms either by hand or using a computer program. (Approximate answers rounded to two decimal places will be accepted.)

In your solution, copy the code, and provide the value vector v_k for at least 4 iterations of value iteration, and the policy pi_k for at least 4 iterations of policy iteration. You are required to implement value iteration and policy iteration in your code yourself, and not use a built-in tool like the MDP toolbox.
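One way to approach this is the following minimal Python sketch of both algorithms for this 2-state MDP. Since the transition diagram is not reproduced, it assumes the probability mass not retained in a state goes to the other state, i.e. Pr(s2 | s1, a) = 1 - a^2/2 and Pr(s1 | s2, a) = 1 - a^2/4; check that against the original figure before relying on the numbers.

```python
# Value iteration and policy iteration for the 2-state MDP above.
# States are indexed 0 (= s1) and 1 (= s2); actions are the real values {0, 1/2, 1}.
# ASSUMPTION: leftover probability mass transitions to the other state.
gamma = 0.5
actions = [0.0, 0.5, 1.0]

def P(s, a):
    """Transition distribution from state s under action a: {next_state: prob}."""
    p_stay = a**2 / 2 if s == 0 else a**2 / 4
    return {s: p_stay, 1 - s: 1 - p_stay}

def r(s, a):
    """Reward: r(s1, a) = -a, r(s2, a) = -1 + a/2."""
    return -a if s == 0 else -1 + a / 2

def q(s, a, v):
    """One-step lookahead value of action a in state s given value vector v."""
    return r(s, a) + gamma * sum(p * v[t] for t, p in P(s, a).items())

def value_iteration(k):
    """Run k Bellman-optimality updates starting from v_0 = (0, 0)."""
    v = [0.0, 0.0]
    for _ in range(k):
        v = [max(q(s, a, v) for a in actions) for s in (0, 1)]
    return v

def policy_evaluation(pi, tol=1e-10):
    """Iteratively evaluate the fixed policy pi to within tol."""
    v = [0.0, 0.0]
    while True:
        nv = [q(s, pi[s], v) for s in (0, 1)]
        if max(abs(nv[s] - v[s]) for s in (0, 1)) < tol:
            return nv
        v = nv

def policy_iteration(k):
    """Run k evaluate-then-improve rounds starting from pi_0 = (0, 0)."""
    pi = [0.0, 0.0]
    for _ in range(k):
        v = policy_evaluation(pi)
        pi = [max(actions, key=lambda a: q(s, a, v)) for s in (0, 1)]
    return pi

if __name__ == "__main__":
    for k in range(1, 5):
        print(f"v_{k} =", [round(x, 2) for x in value_iteration(k)])
    print("policy after 4 policy-iteration rounds:", policy_iteration(4))
```

Under the assumed transitions, the first value-iteration update gives v_1 = (0, -0.5) (a = 0 is best in s1, a = 1 in s2), and policy iteration converges to the stationary policy pi(s1) = 0, pi(s2) = 1.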
