Carry out policy iteration over the MDP example covered in class with R given in...
Question:
![](https://dsd5zvtm8ll6.cloudfront.net/questions/2024/03/65ec38e90f89d_1709983399289.jpg)
Transcribed Image Text:
Carry out policy iteration over the MDP example covered in class, with R given in Table 2 and γ = 0.9. For a state s, if R(s) = 1, s is a terminal state.

For the transition model, assume that the agent has 0.9 probability of moving in the intended direction and 0.1 probability of moving to the left of the intended direction. For example, if the agent is at the lower left corner (coordinates (1,1)) and intends to go right, then it will reach (2,1) with 0.9 probability and (1,2) with 0.1 probability. If a target cell is not reachable, then the corresponding probability goes back to the current cell. For example, if the agent is at (3,3) and is trying to go up, then with 0.1 probability it goes to (2,3) and with 0.9 probability it is stuck at (3,3).

For your answer you should provide:
a) [15 points] The first two iterations of your computation.
b) [15 points] The converged rewards and the extracted policy. For this problem, you need to provide the last two iterations, showing that the value changes are within 0.001 for all cells.

Table 2: Reward R for a 4 × 3 grid world
-0.05 -0.05 -0.05 -0.05 OBS -0.05 +1 -1 -0.05 -0.05 -0.05 -0.05

As a suggestion, you should complete the first question manually to make sure you will be able to do so, for obvious reasons :). For the second, it is perhaps better to use a program, for example in Python or Excel.
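The procedure the question asks for can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference solution. The flattened Table 2 does not pin down which cell holds which value, so the code assumes the classic 4 × 3 layout: obstacle at (2,2), +1 at (4,3), and -1 at (4,2), which is consistent with both transition examples in the question. It also assumes that both the +1 and -1 cells are terminal with their value fixed at their reward, that the initial policy is "up" everywhere, and that policy evaluation is done by repeated sweeps. All names (`transitions`, `evaluate`, `policy_iteration`, and so on) are hypothetical.

```python
GAMMA = 0.9
STEP_REWARD = -0.05

# ASSUMPTION: classic 4x3 layout; Table 2 as transcribed does not fix positions.
OBSTACLE = (2, 2)
# ASSUMPTION: both the +1 and the -1 cell are terminal.
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}

MOVES = {"U": (0, 1), "L": (-1, 0), "D": (0, -1), "R": (1, 0)}
# Direction the agent slips to: 90 degrees to the left of the intended move.
LEFT_OF = {"U": "L", "L": "D", "D": "R", "R": "U"}

# Cells are (x, y) with (1, 1) the lower left corner.
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != OBSTACLE]


def reward(s):
    return TERMINALS.get(s, STEP_REWARD)


def step(s, a):
    """Cell reached by moving in direction a, or s itself if blocked."""
    dx, dy = MOVES[a]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """0.9 to the intended cell, 0.1 to the cell on the left of that direction."""
    return [(0.9, step(s, a)), (0.1, step(s, LEFT_OF[a]))]


def q_value(s, a, V):
    return reward(s) + GAMMA * sum(p * V[t] for p, t in transitions(s, a))


def evaluate(policy, V, tol=1e-6):
    """Sweep V(s) = R(s) + gamma * sum P V(s') until changes fall below tol."""
    while True:
        new_V = {
            s: reward(s) if s in TERMINALS else q_value(s, policy[s], V)
            for s in STATES
        }
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            return V


def policy_iteration(tol=1e-6, min_gain=1e-4):
    policy = {s: "U" for s in STATES if s not in TERMINALS}  # assumed start
    V = {s: TERMINALS.get(s, 0.0) for s in STATES}
    while True:
        V = evaluate(policy, V, tol)
        stable = True
        for s in policy:
            best = max(MOVES, key=lambda a: q_value(s, a, V))
            # Switch only on a clear gain so evaluation noise cannot cause cycling.
            if q_value(s, best, V) > q_value(s, policy[s], V) + min_gain:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


if __name__ == "__main__":
    V, policy = policy_iteration()
    for y in (3, 2, 1):  # print the top row first
        print("  ".join(
            "  OBS    " if (x, y) == OBSTACLE
            else f"{V[(x, y)]:+.3f} {policy.get((x, y), '.')}"
            for x in range(1, 5)
        ))
```

The evaluation tolerance here is tighter than the 0.001 the question asks for; that is a choice made so the improvement step is not misled by approximation error, and `tol` can be loosened if only the 0.001 criterion matters. For part (a), printing `V` and `policy` after each pass through the outer loop gives the per-iteration tables to compare against a hand computation.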