Question:

Run algorithm PI (policy iteration) on the problem of Figure 6.15, starting from the following policy: π0(s1) = π0(s2) = a, π0(s3) = b, π0(s4) = c.
(a) Compute V^π0(s) for the four nongoal states.
(b) What is the greedy policy for V^π0?
(c) Iterate on the above two steps until reaching a fixed point.

Figure 6.15. An SSP problem with five states and four actions a, b, c, and d; only action a is nondeterministic, with the probabilities shown in the figure; the cost of a and b is 1, the cost of c and d is 100; the initial state is s1; the goal is s5.
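The evaluate-then-improve loop that parts (a) to (c) ask for can be sketched in Python as below. This is a minimal sketch and not a solution to the exercise: the transition probabilities of Figure 6.15 are not given in the text, so the model here is a made-up two-state placeholder (states u1 and u2, goal g) that only reuses the stated costs (a and b cost 1, c and d cost 100). The function names and the dictionary layout are assumptions chosen for illustration, and policy evaluation is done by iterative sweeps, which assumes the policy being evaluated is proper (reaches the goal with probability 1).

```python
# Hypothetical toy SSP, NOT the model of Figure 6.15.
# TRANS[state][action] maps successor states to probabilities.
TRANS = {
    "u1": {"a": {"u2": 0.5, "u1": 0.5}, "c": {"g": 1.0}},
    "u2": {"b": {"g": 1.0}, "d": {"u1": 1.0}},
}
COST = {"a": 1.0, "b": 1.0, "c": 100.0, "d": 100.0}  # costs as stated in the exercise
GOAL = "g"


def q_value(V, s, a):
    """Expected cost of applying action a in state s, then following V."""
    return COST[a] + sum(p * V[t] for t, p in TRANS[s][a].items())


def evaluate(policy, tol=1e-10, max_sweeps=100_000):
    """Step (a): compute V^pi by repeated sweeps (assumes a proper policy)."""
    V = {s: 0.0 for s in TRANS}
    V[GOAL] = 0.0  # the goal has zero cost-to-go
    for _ in range(max_sweeps):
        delta = 0.0
        for s, a in policy.items():
            new = q_value(V, s, a)
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < tol:
            break
    return V


def greedy(V, policy, eps=1e-9):
    """Step (b): pick a cost-minimizing action per state.

    The current action is kept on ties so that the loop cannot
    oscillate between equally good policies.
    """
    improved = {}
    for s, actions in TRANS.items():
        best = min(actions, key=lambda a: q_value(V, s, a))
        keep = q_value(V, s, policy[s]) <= q_value(V, s, best) + eps
        improved[s] = policy[s] if keep else best
    return improved


def policy_iteration(policy):
    """Step (c): alternate (a) and (b) until the policy stops changing."""
    while True:
        V = evaluate(policy)
        improved = greedy(V, policy)
        if improved == policy:
            return policy, V
        policy = improved


final_policy, final_V = policy_iteration({"u1": "c", "u2": "d"})
print(final_policy, final_V)
```

To apply the same loop to the exercise, TRANS would have to be filled in with the five states and the probabilities read off Figure 6.15, and the starting policy replaced by π0.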
