Question 3: Trajectories, returns, and values (10 Bonus points)

Consider the following fragment of an MDP graph. The fractional numbers indicate the world's transition probabilities and the whole numbers indicate the expected rewards. The three numbers at the bottom indicate what you can take to be the value of the corresponding states. The discount is 0.8. What is the value of the top node for the equiprobable random policy (all actions equally likely) and for the optimal policy? Show your work.

[Figure: fragment of an MDP graph; surviving labels: 0.75, 0.25, 0.2]
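The calculation the question asks for is a one-step Bellman backup from the top node, and it can be sketched as below. The figure's labels are not fully recoverable, so every probability, reward, and successor value in `ACTIONS` is a hypothetical placeholder rather than a number from the graph. The action names and the helper names (`action_value`, `random_policy_value`, `optimal_value`) are likewise assumptions made for illustration. Only the discount of 0.8 comes from the question.

```python
# One-step Bellman backup for a single state.
# The discount is given in the question; everything in ACTIONS is a
# HYPOTHETICAL placeholder, not data read from the original figure.
GAMMA = 0.8

# Each action maps to a list of (probability, expected reward, value of next state).
ACTIONS = {
    "left": [(0.5, 1.0, 10.0), (0.5, 2.0, 20.0)],  # placeholder numbers
    "right": [(1.0, 3.0, 5.0)],                    # placeholder numbers
}


def action_value(outcomes, gamma=GAMMA):
    """q(s, a) = sum over outcomes of p * (r + gamma * v(s'))."""
    return sum(p * (r + gamma * v_next) for p, r, v_next in outcomes)


def random_policy_value(actions, gamma=GAMMA):
    """Value under the equiprobable random policy: the mean of the action values."""
    q_values = [action_value(outcomes, gamma) for outcomes in actions.values()]
    return sum(q_values) / len(q_values)


def optimal_value(actions, gamma=GAMMA):
    """Value under the optimal policy: the largest action value."""
    return max(action_value(outcomes, gamma) for outcomes in actions.values())


if __name__ == "__main__":
    for name, outcomes in ACTIONS.items():
        print(f"q(top, {name}) = {action_value(outcomes):.4f}")
    print(f"v_random(top)  = {random_policy_value(ACTIONS):.4f}")
    print(f"v_optimal(top) = {optimal_value(ACTIONS):.4f}")
```

To answer the actual question, replace the placeholder tuples with the transition probabilities, rewards, and bottom-row state values read from the graph. The random-policy value averages the action values with equal weight, while the optimal value takes their maximum.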
