Question: Implement a general policy iteration algorithm in Python to determine the optimal policy for an MDP

(4 screenshots of Python code are attached; please use them.)
(a) (30 points) Implement a general policy iteration algorithm in Python to determine
the optimal policy for an MDP problem. For this, write three functions: (1) policy
evaluation, which takes the MDP and a policy as input and returns the state values;
(2) policy improvement, which takes the MDP, a policy, and the state values as input
and returns an improved policy; and (3) general policy iteration, which calls functions
(1) and (2) iteratively until the convergence criterion is met. In the Python template
Scheduling MDP DP HW4.py you will find the core structure of these three
functions, with missing code sections marked as #CODE HERE.
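The three functions described in part (a) could be sketched roughly as follows. This is a minimal illustration, not the contents of the template: the dictionary layout of `mdp` (keys `"P"`, `"R"`, `"gamma"`), the helper `q_value`, the tolerance `theta`, and the two-state `toy_mdp` are all assumptions made for this example, and the real template may represent the MDP differently.

```python
def q_value(mdp, V, s, a):
    """Expected return of taking action a in state s, then following V.
    Assumed layout: mdp["P"][s][a][s2] is a transition probability,
    mdp["R"][s][a] is the expected immediate reward."""
    expected_next = sum(p * V[s2] for s2, p in enumerate(mdp["P"][s][a]))
    return mdp["R"][s][a] + mdp["gamma"] * expected_next


def policy_evaluation(mdp, policy, theta=1e-8, max_sweeps=10_000):
    """(1) Return the state values of a deterministic policy."""
    V = [0.0] * len(mdp["R"])
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(len(V)):
            v_new = q_value(mdp, V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update, reused within the same sweep
        if delta < theta:  # convergence criterion for the evaluation step
            break
    return V


def policy_improvement(mdp, policy, V):
    """(2) Return a policy that is greedy with respect to V."""
    n_actions = len(mdp["R"][0])
    new_policy = []
    for s in range(len(V)):
        q = [q_value(mdp, V, s, a) for a in range(n_actions)]
        best = max(q)
        # Keep the current action on ties so the policy cannot oscillate.
        if abs(q[policy[s]] - best) < 1e-12:
            new_policy.append(policy[s])
        else:
            new_policy.append(q.index(best))
    return new_policy


def policy_iteration(mdp, max_iter=1_000):
    """(3) Alternate (1) and (2) until the policy stops changing."""
    policy = [0] * len(mdp["R"])
    for _ in range(max_iter):
        V = policy_evaluation(mdp, policy)
        new_policy = policy_improvement(mdp, policy, V)
        if new_policy == policy:  # stable policy: stop iterating
            break
        policy = new_policy
    return policy, V


# Hypothetical toy MDP (not from the assignment): action 0 stays in the
# current state, action 1 moves to the other state; staying in state 1 pays 1.
toy_mdp = {
    "P": [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]],
    "R": [[0.0, 0.0],
          [1.0, 0.0]],
    "gamma": 0.9,
}

policy, V = policy_iteration(toy_mdp)
print(policy)  # [1, 0]
```

In the toy problem the resulting policy moves from state 0 to state 1 and then stays there, which is a quick sanity check that evaluation and improvement cooperate as intended.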
(b) (10 points) Solve the biopharmaceutical batch fermentation problem from Question 3
in Python using policy or value iteration. The Python template Scheduling MDP Biopharma Case provides the parameters and some pre-filled code sections for this case. You have
to define the state space, action space, and reward function.
What is the optimal harvest policy for this problem, and how can it be implemented
in practice? How does the policy change if the batch harvest parameter CH is doubled, from
CH = 350 to CH = 700? Why does the harvest policy change in this way?
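Since part (b) allows value iteration as an alternative, a minimal sketch of that algorithm may help for comparison. As before, the `mdp` dictionary layout, the tolerance `theta`, and the two-state `toy_mdp` are assumptions made for illustration; the numbers are not the fermentation parameters, and the state space, action space, and reward function for that case still have to be defined from Question 3.

```python
def value_iteration(mdp, theta=1e-8, max_sweeps=10_000):
    """Return (greedy policy, state values) for a finite MDP.
    Assumed layout: mdp["P"][s][a][s2] is a transition probability,
    mdp["R"][s][a] is the expected immediate reward."""
    P, R, gamma = mdp["P"], mdp["R"], mdp["gamma"]
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states

    def q(s, a):
        # One-step lookahead using the current value estimates.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            v_new = max(q(s, a) for a in range(n_actions))  # Bellman optimality backup
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # values have (approximately) stopped changing
            break

    # Extract the policy that is greedy with respect to the final values.
    policy = [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]
    return policy, V


# Hypothetical toy MDP (not from the assignment): action 0 stays in the
# current state, action 1 moves to the other state; staying in state 1 pays 1.
toy_mdp = {
    "P": [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]],
    "R": [[0.0, 0.0],
          [1.0, 0.0]],
    "gamma": 0.9,
}

vi_policy, vi_values = value_iteration(toy_mdp)
print(vi_policy)  # [1, 0]
```

To study a parameter change such as doubling CH, one option is to rebuild the reward table with the new value and rerun the solver, then compare the two policies state by state.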