Question: Carry out policy iteration over the MDP example covered in class, with R given in Table 2 and a discount factor of 0.9. For a state s, if |R(s)| = 1, then s is a terminal state. For the transition model, assume the agent moves in the intended direction with probability 0.9 and slips to the left of the intended direction with probability 0.1. For example, if the agent is at the lower-left corner (coordinates (1, 1)) and intends to go right, it reaches (2, 1) with probability 0.9 and (1, 2) with probability 0.1. If a target cell is not reachable (off the grid or an obstacle), the corresponding probability mass stays at the current cell. For example, if the agent is at (3, 3) and tries to go up, it moves to (2, 3) with probability 0.1 and remains stuck at (3, 3) with probability 0.9.

For your answer you should provide:

a) [15 points] The first two iterations of your computation.

b) [15 points] The converged values and the extracted policy. For this part, show the last two iterations, demonstrating that the value changes are within 0.001 for all cells.

Table 2: Reward R for a 4 x 3 grid world (rows shown top to bottom; OBS marks the obstacle)

         x=1      x=2      x=3      x=4
  y=3   -0.05    -0.05    -0.05     +1
  y=2   -0.05     OBS     -0.05     -1
  y=1   -0.05    -0.05    -0.05    -0.05

As a suggestion, you should complete part a) manually to make sure you are able to do so, for obvious reasons :). For part b), it is probably easier to use a program, perhaps in Python or Excel.
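Since the question points to a programmatic solution, the following minimal Python sketch encodes just the grid and transition model described above. It assumes the Table 2 layout (obstacle at (2, 2), terminals at (4, 3) and (4, 2)); helper names such as step and transitions are illustrative, not from the course.

    # Grid assumed from Table 2: 4 columns, 3 rows, obstacle at (2,2),
    # terminals at (4,3) (+1) and (4,2) (-1). Coordinates are (x, y),
    # with (1,1) the lower-left corner.
    GRID_W, GRID_H = 4, 3
    OBSTACLES = {(2, 2)}
    TERMINALS = {(4, 3), (4, 2)}

    # Unit moves per action; LEFT_OF maps an action to the direction 90
    # degrees counterclockwise, i.e., the 0.1-probability slip direction.
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    LEFT_OF = {"up": "left", "left": "down", "down": "right", "right": "up"}

    def step(state, direction):
        """One unit move; bounce back if the target is off-grid or blocked."""
        x, y = state
        dx, dy = MOVES[direction]
        nx, ny = x + dx, y + dy
        if not (1 <= nx <= GRID_W and 1 <= ny <= GRID_H) or (nx, ny) in OBSTACLES:
            return state                      # unreachable target: stay put
        return (nx, ny)

    def transitions(state, action):
        """Return {next_state: probability} for taking `action` in `state`."""
        if state in TERMINALS:
            return {state: 1.0}               # terminal states absorb
        probs = {}
        for direction, p in ((action, 0.9), (LEFT_OF[action], 0.1)):
            nxt = step(state, direction)
            probs[nxt] = probs.get(nxt, 0.0) + p
        return probs

    # Sanity checks against the two examples in the question:
    print(transitions((1, 1), "right"))   # {(2, 1): 0.9, (1, 2): 0.1}
    print(transitions((3, 3), "up"))      # {(3, 3): 0.9, (2, 3): 0.1}

Both examples from the question ((1, 1) going right, (3, 3) going up) reproduce the stated probabilities, which is a useful check that the slip direction is wired up correctly before building the rest of the solver on top of it.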
Step-by-Step Solution
To carry out policy iteration on the given MDP, start from an arbitrary initial policy and alternate two phases: policy evaluation (compute the value of every cell under the current policy) and policy improvement (make the policy greedy with respect to those values), repeating until the policy no longer changes.
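Below is a sketch of that full loop under the same assumed layout as the earlier snippet; it is written to be self-contained so it runs on its own. The initial all-"up" policy and the 1e-6 evaluation tolerance are arbitrary choices for illustration, not part of the question.

    # Policy iteration on the 4x3 grid (assumed layout: obstacle at (2,2),
    # terminals +1 at (4,3) and -1 at (4,2); every other cell rewards -0.05).
    GRID_W, GRID_H = 4, 3
    OBSTACLES = {(2, 2)}
    REWARD = {(4, 3): 1.0, (4, 2): -1.0}
    TERMINALS = set(REWARD)
    GAMMA = 0.9
    ACTIONS = ("up", "down", "left", "right")
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    LEFT_OF = {"up": "left", "left": "down", "down": "right", "right": "up"}
    STATES = [(x, y) for x in range(1, GRID_W + 1)
              for y in range(1, GRID_H + 1) if (x, y) not in OBSTACLES]

    def R(s):
        return REWARD.get(s, -0.05)

    def transitions(s, a):
        """{next_state: prob}: 0.9 intended move, 0.1 slip to its left."""
        if s in TERMINALS:
            return {s: 1.0}
        probs = {}
        for d, p in ((a, 0.9), (LEFT_OF[a], 0.1)):
            nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
            if not (1 <= nxt[0] <= GRID_W and 1 <= nxt[1] <= GRID_H) \
                    or nxt in OBSTACLES:
                nxt = s                       # bounce back off walls/obstacle
            probs[nxt] = probs.get(nxt, 0.0) + p
        return probs

    def evaluate(policy, V, tol=1e-6):
        """Sweep V(s) = R(s) + gamma * sum_s' P(s'|s, pi(s)) V(s') to a fixpoint."""
        while True:
            delta = 0.0
            for s in STATES:
                v = R(s) if s in TERMINALS else R(s) + GAMMA * sum(
                    p * V[s2] for s2, p in transitions(s, policy[s]).items())
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                return V

    def improve(V):
        """Make the policy greedy with respect to the current values."""
        return {s: max(ACTIONS, key=lambda a: sum(
                    p * V[s2] for s2, p in transitions(s, a).items()))
                for s in STATES if s not in TERMINALS}

    policy = {s: "up" for s in STATES if s not in TERMINALS}  # arbitrary start
    V = {s: 0.0 for s in STATES}
    while True:
        V = evaluate(policy, V)
        new_policy = improve(V)
        if new_policy == policy:
            break                             # policy stable: converged
        policy = new_policy
    print(V)
    print(policy)

For part a), print V and the policy after each of the first two improvement rounds; for part b), loosen the evaluation tolerance to 0.001 and print the last two sweeps to show that every cell's change falls within that bound.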
