Question: The aim of this problem is to program value iteration and policy iteration for Markov decision processes in Python. Consider this MDP example (γ = 0.9):

[MDP diagram: states Poor & Unknown (+0), Poor & Famous (+0), Rich & Unknown (+10), Rich & Famous (+10); actions A (Advertising) and S (Saving money); transition probabilities of 1 or 1/2.]

You own a company. In every state you must choose between Saving money or Advertising.

Write a program in Python to implement value iteration and policy iteration specifically for this simple MDP example. Start by creating a simple MDP class (class MDP). This class should include the following members:

- A constructor def __init__(self, T, R, discount) with the following parameters:
  T -- transition function: |A| x |S| x |S'| array
  R -- reward function: |A| x |S| array
  discount -- discount factor γ: scalar in [0, 1)
  The constructor should verify that the inputs are valid (using the assert command) and set the corresponding variables in an MDP object.

- A procedure for value iteration, def valueIteration(self, initialV, nIterations, tolerance). Set nIterations and tolerance to np.inf and 0.01 as default values, respectively.
  initialV -- initial value function: array of |S| entries
  nIterations -- limit on the number of iterations: scalar (default: infinity)
  tolerance -- threshold on ||Vn - Vn+1||∞ that will be compared to a variable epsilon (initialized to np.inf): scalar (default: 0.01)
  This procedure should return a new value function:
  newV -- new value function: array of |S| entries
  iteration -- the number of iterations performed: scalar
  epsilon -- ||Vn - Vn+1||∞: scalar

- A procedure to extract a policy from a value function, def extractPolicy(self, V).
  V -- value function: array of |S| entries
  This procedure should return a policy:
  policy -- policy: array of |S| entries

- A procedure to evaluate a policy by solving a system of linear equations, def evaluatePolicy(self, policy).
  policy -- policy: array of |S| entries
  This procedure should return a value function:
  V -- value function: array of |S| entries

- A procedure for policy iteration, def policyIteration(self, initialPolicy, nIterations). Set nIterations to np.inf as a default value.
  initialPolicy -- initial policy: array of |S| entries
  nIterations -- limit on the number of iterations: scalar (default: infinity)
  This procedure should return a new policy:
  newPolicy -- new policy: array of |S| entries
  iteration -- the number of iterations performed: scalar

- A procedure for partial policy evaluation, def evaluatePolicyPartially(self, policy, initialV, nIterations, tolerance). Set nIterations and tolerance to np.inf and 0.01 as default values, respectively.
  policy -- policy: array of |S| entries
  initialV -- initial value function: array of |S| entries
  nIterations -- limit on the number of iterations: scalar (default: infinity)
  tolerance -- threshold on ||Vn - Vn+1||∞ that will be compared to a variable epsilon (initialized to np.inf): scalar (default: 0.01)
  This procedure should return a new value function:
  newV -- new value function: array of |S| entries
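A minimal sketch of such a class, assuming NumPy is used throughout, is shown below. Only the parameter names come from the problem statement; the attribute names nStates and nActions, the iterId counter, and the extra return values are my own choices, and the method bodies are one possible implementation rather than a reference solution.

import numpy as np

class MDP:
    def __init__(self, T, R, discount):
        # T -- transition function: |A| x |S| x |S'| array
        assert T.ndim == 3, "T must be a 3-dimensional array"
        self.nActions, self.nStates = T.shape[0], T.shape[1]
        assert T.shape == (self.nActions, self.nStates, self.nStates), "T must be |A| x |S| x |S'|"
        assert np.allclose(T.sum(axis=2), 1), "each row of T must sum to 1"
        # R -- reward function: |A| x |S| array
        assert R.shape == (self.nActions, self.nStates), "R must be |A| x |S|"
        # discount -- scalar in [0, 1)
        assert 0 <= discount < 1, "discount must lie in [0, 1)"
        self.T, self.R, self.discount = T, R, discount

    def valueIteration(self, initialV, nIterations=np.inf, tolerance=0.01):
        # Repeated Bellman optimality backups:
        # V(s) <- max_a [ R(a,s) + discount * sum_s' T(a,s,s') V(s') ]
        V, iterId, epsilon = initialV.copy(), 0, np.inf
        while iterId < nIterations and epsilon > tolerance:
            newV = np.max(self.R + self.discount * (self.T @ V), axis=0)
            epsilon = np.max(np.abs(newV - V))  # ||Vn - Vn+1||_inf
            V = newV
            iterId += 1
        return V, iterId, epsilon

    def extractPolicy(self, V):
        # Greedy policy with respect to V
        return np.argmax(self.R + self.discount * (self.T @ V), axis=0)

    def evaluatePolicy(self, policy):
        # Solve the linear system (I - discount * T^pi) V = R^pi exactly
        Rpi = self.R[policy, np.arange(self.nStates)]
        Tpi = self.T[policy, np.arange(self.nStates), :]
        return np.linalg.solve(np.eye(self.nStates) - self.discount * Tpi, Rpi)

    def policyIteration(self, initialPolicy, nIterations=np.inf):
        # Alternate exact policy evaluation and greedy improvement
        # (the value function is returned as well for convenience)
        policy, iterId = initialPolicy.copy(), 0
        while iterId < nIterations:
            V = self.evaluatePolicy(policy)
            newPolicy = self.extractPolicy(V)
            iterId += 1
            if np.array_equal(newPolicy, policy):
                break
            policy = newPolicy
        return policy, V, iterId

    def evaluatePolicyPartially(self, policy, initialV, nIterations=np.inf, tolerance=0.01):
        # Iterative policy evaluation:
        # V(s) <- R(pi(s),s) + discount * sum_s' T(pi(s),s,s') V(s')
        # (iteration count and epsilon are returned alongside newV for convenience)
        Rpi = self.R[policy, np.arange(self.nStates)]
        Tpi = self.T[policy, np.arange(self.nStates), :]
        V, iterId, epsilon = initialV.copy(), 0, np.inf
        while iterId < nIterations and epsilon > tolerance:
            newV = Rpi + self.discount * (Tpi @ V)
            epsilon = np.max(np.abs(newV - V))
            V = newV
            iterId += 1
        return V, iterId, epsilon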
After defining your MDP class with all its members, you should instantiate an MDP object to construct the simple MDP described in the given network:

mdp = MDP(T, R, discount)

T -- transition function: |A| x |S| x |S'| array
R -- reward function: |A| x |S| array
discount -- discount factor: scalar in [0, 1)

Finally, you should test each procedure on the given example and report your findings. You can verify that your code is running properly by adding print statements to check that the output of each function makes sense and matches the results reported in the lecture "Markov Decision Processes" slides for value iteration and in the lecture "Policy Iteration" slides for policy iteration. Report your findings in tables like those in the aforementioned slides. You should also do the following:

- Report the policy, value function, and the number of iterations needed by value iteration when using a tolerance of 0.01 and starting from a value function set to 0 for all states.
- Report the policy, value function, and the number of iterations needed by policy iteration to find an optimal policy when starting from the policy that chooses action 0 in all states. Note: action 0 corresponds to "A: Advertising" whereas action 1 corresponds to "S: Saving money".
- Report the number of iterations needed by modified policy iteration to converge when varying the number of iterations in partial policy evaluation from 1 to 10. Use a tolerance of 0.01, start with the policy that chooses action 0 in all states, and start with the value function that assigns 0 to all states. Discuss the impact of the number of iterations in partial policy evaluation on the results and relate the results to value iteration and policy iteration.
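A rough test script along these lines could look like the following. The state ordering [Poor & Unknown, Poor & Famous, Rich & Unknown, Rich & Famous] and the numeric entries of T are my reading of the transition diagram (only the 1/2 probabilities and the +0/+10 rewards are visible in the text), so verify them against the figure and the lecture slides before trusting the printed numbers.

import numpy as np

# State order assumed: [Poor&Unknown, Poor&Famous, Rich&Unknown, Rich&Famous].
# Action 0 = "A: Advertising", action 1 = "S: Saving money" (as stated above).
# The entries of T below are my reading of the diagram -- check them against the figure.
T = np.array([[[0.5, 0.5, 0.0, 0.0],    # Advertise from Poor&Unknown
               [0.0, 1.0, 0.0, 0.0],    # Advertise from Poor&Famous
               [0.5, 0.5, 0.0, 0.0],    # Advertise from Rich&Unknown
               [0.0, 1.0, 0.0, 0.0]],   # Advertise from Rich&Famous
              [[1.0, 0.0, 0.0, 0.0],    # Save from Poor&Unknown
               [0.5, 0.0, 0.0, 0.5],    # Save from Poor&Famous
               [0.5, 0.0, 0.5, 0.0],    # Save from Rich&Unknown
               [0.0, 0.0, 0.5, 0.5]]])  # Save from Rich&Famous
R = np.array([[0., 0., 10., 10.],       # rewards under Advertise
              [0., 0., 10., 10.]])      # rewards under Save
mdp = MDP(T, R, discount=0.9)

# Value iteration: tolerance 0.01, starting from V = 0 in every state
V, nIter, eps = mdp.valueIteration(initialV=np.zeros(mdp.nStates))
print("value iteration:", V, "policy:", mdp.extractPolicy(V), "iterations:", nIter)

# Policy iteration: starting from the policy that chooses action 0 everywhere
policy, Vpi, nIter = mdp.policyIteration(np.zeros(mdp.nStates, dtype=int))
print("policy iteration:", policy, "value:", Vpi, "iterations:", nIter)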

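Modified policy iteration is not one of the prescribed class members, so the helper below, a hypothetical modifiedPolicyIteration function reusing the mdp object built above, is only one way to assemble it from evaluatePolicyPartially and extractPolicy; convergence is measured here by how much a full Bellman backup still changes the value function, mirroring the stopping rule of value iteration.

import numpy as np

def modifiedPolicyIteration(mdp, policy, V, nEvalIterations, tolerance=0.01):
    # Alternate nEvalIterations steps of partial policy evaluation with one
    # greedy improvement step; stop once a Bellman optimality backup changes
    # the value function by less than the tolerance.
    iterId, epsilon = 0, np.inf
    while epsilon > tolerance:
        # tolerance=0 forces evaluatePolicyPartially to run all nEvalIterations backups
        V, _, _ = mdp.evaluatePolicyPartially(policy, V, nIterations=nEvalIterations, tolerance=0)
        policy = mdp.extractPolicy(V)
        newV = np.max(mdp.R + mdp.discount * (mdp.T @ V), axis=0)
        epsilon = np.max(np.abs(newV - V))
        V = newV
        iterId += 1
    return policy, V, iterId

# Sweep the number of partial-evaluation iterations from 1 to 10,
# starting from action 0 everywhere and V = 0 in every state.
for k in range(1, 11):
    policy, V, nIter = modifiedPolicyIteration(mdp, np.zeros(mdp.nStates, dtype=int), np.zeros(mdp.nStates), k)
    print(f"nEvalIterations={k}: {nIter} outer iterations, policy {policy}")

With a single partial-evaluation iteration the scheme behaves essentially like value iteration, and as the number of inner iterations grows it approaches policy iteration; this is the relationship the last reporting item asks you to discuss.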