Problem 2: Policy Evaluation (25 points)
In Problem 2 you will implement policy evaluation as follows:
$$V^{\pi}_{k+1}(s) = \sum_{s'} T(s, \pi(s), s') \left[ R(s, \pi(s), s') + \gamma V^{\pi}_{k}(s') \right]$$
This time we have discounting, and we also introduce a new variable for the number of iterations. Here is the first test case. Note that there is no randomness involved this time and that we use discounting. As usual, your first task is to implement the parsing of this grid MDP in the function read_grid_mdp_problem_p2(file_path) of the file parse.py. You may use any appropriate data structure.
Next, you implement value iteration for policy evaluation as discussed in class. Your policy_evaluation(problem) function in p2.py should return the evolution of values as follows.
This example should look familiar. We have covered it in chapter 2 of our lecture slides.
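As a rough illustration of the iterative update, here is a minimal sketch on a tiny hand-built MDP. It is not the assignment's required interface: the function name evaluate_policy, the dictionary layout of the transitions, and the three-state example are all assumptions made for illustration, and the real policy_evaluation(problem) is expected to return a formatted string rather than a list of dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not the assignment's data structure):
# transitions[(state, action)] -> list of (next_state, probability, reward)
Transitions = Dict[Tuple[str, str], List[Tuple[str, float, float]]]


def evaluate_policy(
    states: List[str],
    policy: Dict[str, str],
    transitions: Transitions,
    gamma: float,
    iterations: int,
) -> List[Dict[str, float]]:
    """Return the value tables V_0, V_1, ..., V_iterations for a fixed policy."""
    values = {s: 0.0 for s in states}  # V_0 is zero everywhere
    history = [dict(values)]
    for _ in range(iterations):
        new_values = {}
        for s in states:
            # States without a policy entry (e.g. terminal states) have no outcomes.
            outcomes = transitions.get((s, policy.get(s)), [])
            # Each new value is computed from the previous table only.
            new_values[s] = sum(
                p * (r + gamma * values[nxt]) for nxt, p, r in outcomes
            )
        values = new_values
        history.append(dict(values))
    return history


# Made-up deterministic chain: A -> B -> END, reward 1 on entering END.
states = ["A", "B", "END"]
policy = {"A": "go", "B": "go"}
transitions = {
    ("A", "go"): [("B", 1.0, 0.0)],
    ("B", "go"): [("END", 1.0, 1.0)],
}
history = evaluate_policy(states, policy, transitions, gamma=0.9, iterations=2)
```

Keeping a separate new_values table means every state in iteration k+1 is computed from the values of iteration k, which matches the "evolution of values" the function is asked to report.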
Hint: The output of an individual floating-point value v was done as follows:
return_value += '|{:7.2f}|'.format(v)
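To show what the format string in the hint produces, here is a small sketch that applies it to a row of values. The helper name format_row and the sample numbers are hypothetical. How non-numeric cells and line breaks appear in the expected output is not shown here, so this sketch covers only the per-value formatting.

```python
def format_row(values):
    """Format a row of floats, each as a 7-wide, 2-decimal cell between bars."""
    return_value = ""
    for v in values:
        return_value += "|{:7.2f}|".format(v)
    return return_value


row = format_row([0.0, -1.0, 10.0])
print(row)  # prints: |   0.00||  -1.00||  10.00|
```

The width 7 counts the sign and the decimal point, so values are right-aligned in a fixed-width column and the bars line up across rows.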
Finally, check the correctness of your implementation via