Question: Problem 2: Policy Evaluation (25 points)
In problem 2 you will implement policy evaluation as follows: V(s) =
This time we have discounting, and we also introduce a new variable for the number of iterations. Here is the first test case. Note that there is no randomness involved this time and that we use discounting. As usual, your first task is to implement the parsing of this grid MDP in the function readgridmdpproblempfilepath of the file parse.py. You may use any appropriate data structure.
Next, you implement value iteration for policy evaluation as discussed in class. Your policyevaluationproblem function in ppy should return the evolution of values as follows.
This example should look familiar. We have covered it in chapter of our lecture slides.
Hint: The output of an individual floating point value was done as follows
returnvalue : formatv
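To make the iterative update concrete, here is a minimal sketch of policy evaluation for a deterministic, discounted problem. It is not the assignment's reference solution: the data layout (`next_state`, `reward`), the function names `evaluate_policy` and `format_value`, the treatment of exit states, and the `{:7.2f}` format width are all assumptions chosen for illustration, since the question's exact formula, file format, and output format are not shown. The sketch assumes the fixed policy is already folded into `next_state`, so each sweep applies V(s) = R(s) + discount * V(s') to every state.

```python
from typing import Dict, List, Optional

# Hypothetical model: next_state[s] is the state reached by following the
# fixed policy from s (None marks an exit state), reward[s] is the reward
# collected when acting in s. This layout is an assumption, not the
# assignment's grid format.


def evaluate_policy(
    next_state: Dict[str, Optional[str]],
    reward: Dict[str, float],
    discount: float,
    iterations: int,
) -> List[Dict[str, float]]:
    """Return the value table after each sweep, starting with all zeros."""
    values = {s: 0.0 for s in next_state}
    history = [dict(values)]
    for _ in range(iterations):
        new_values = {}
        for s, succ in next_state.items():
            if succ is None:
                # Exit state: only the immediate reward counts.
                new_values[s] = reward[s]
            else:
                # Deterministic backup using the previous sweep's values.
                new_values[s] = reward[s] + discount * values[succ]
        values = new_values
        history.append(dict(values))
    return history


def format_value(v: float) -> str:
    """Format one value; the width and precision here are assumed."""
    return '{:7.2f}'.format(v)


# Tiny three-state chain: a -> b -> exit, with reward 1 on exit.
chain = {"a": "b", "b": "exit", "exit": None}
rewards = {"a": 0.0, "b": 0.0, "exit": 1.0}
history = evaluate_policy(chain, rewards, discount=0.9, iterations=3)

for k, table in enumerate(history):
    row = "".join(format_value(table[s]) for s in ("a", "b", "exit"))
    print("k={}: {}".format(k, row))
```

Each sweep reads only the previous sweep's table, so the exit reward moves back one state per iteration, shrinking by the discount factor at each step.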
Finally, check the correctness of your implementation via
