Question 1 (50 points)

Consider a Pacman agent that uses MDPs to maximize his expected utility. In each environment:

- Pacman has the standard actions {North, East, South, West} unless blocked by an outer wall.
- There is a reward of 1 point when eating the dot (for example, in the grid below, $R(s, a, s') = 1$ when $s'$ is the square containing the dot).
- The game ends when the dot (blue circle) is eaten.

Consider the following grid, where there is a single food pellet in the bottom right corner (B). The discount factor is $\gamma = 0.2$. There is no living reward. The states are simply the grid locations.

a) What is the optimal policy for each state?

State	$\pi^*(\text{State})$
A
C
D
E
F

b) What is the optimal value for the state of being in the upper left corner (E)? Reminder: the discount factor is 0.2.

c) Using value iteration with the value of all states equal to zero at $k = 0$, for which iteration $k$ will $V_k(E) = V^*(E)$? Explain.
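
For reference, the update that value iteration applies at each step can be sketched as follows. This assumes, although the question does not say so explicitly, that moves are deterministic, so each action $a$ taken in state $s$ leads to a single successor $s'$. The symbols $R$, $V_k$, and $\gamma$ are the usual MDP notation rather than names taken from the question.

\[
V_0(s) = 0, \qquad V_{k+1}(s) = \max_{a} \left[ R(s, a, s') + \gamma \, V_k(s') \right], \qquad \gamma = 0.2
\]

Here $V_k(B) = 0$ for every $k$, because the game ends once the dot at B is eaten and no further reward can be collected.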

\begin{tabular}{|r|r|r|}
\hline
E & C & A \\ \hline
F & D & B \\ \hline
\end{tabular}
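
One way to check answers to parts a) through c) is to run value iteration on this grid directly. The sketch below is a minimal illustration, not an official solution. It assumes deterministic moves (the question does not state a transition model), and all names in it (`GRID`, `successors`, `value_iteration`, `greedy_policy`) are hypothetical helpers introduced here.

```python
# Grid from the question: top row E C A, bottom row F D B (dot at B).
GRID = [["E", "C", "A"],
        ["F", "D", "B"]]
GAMMA = 0.2      # discount factor given in the question
TERMINAL = "B"   # the game ends when the dot at B is eaten

# Assumption: each action moves Pacman exactly one square (deterministic).
MOVES = {"North": (-1, 0), "East": (0, 1), "South": (1, 0), "West": (0, -1)}
POS = {name: (r, c) for r, row in enumerate(GRID) for c, name in enumerate(row)}


def successors(state):
    """Yield (action, next_state) pairs, skipping moves blocked by the outer wall."""
    r, c = POS[state]
    for action, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
            yield action, GRID[nr][nc]


def reward(next_state):
    """Reward of 1 for moving onto the dot, 0 otherwise (no living reward)."""
    return 1.0 if next_state == TERMINAL else 0.0


def value_iteration(num_iterations):
    """Return the list [V_0, V_1, ..., V_k], each a dict mapping state to value."""
    values = {s: 0.0 for s in POS}
    history = [values]
    for _ in range(num_iterations):
        new_values = {}
        for s in POS:
            if s == TERMINAL:
                new_values[s] = 0.0  # no reward can be collected after the game ends
            else:
                new_values[s] = max(reward(s2) + GAMMA * values[s2]
                                    for _, s2 in successors(s))
        values = new_values
        history.append(values)
    return history


def greedy_policy(values):
    """Pick, for each non-terminal state, an action that maximizes the one-step backup.

    Ties are broken by the order of MOVES, so other optimal actions may exist.
    """
    policy = {}
    for s in POS:
        if s != TERMINAL:
            best = max(successors(s),
                       key=lambda pair: reward(pair[1]) + GAMMA * values[pair[1]])
            policy[s] = best[0]
    return policy


if __name__ == "__main__":
    history = value_iteration(4)
    for k, values in enumerate(history):
        print(k, {s: round(v, 4) for s, v in sorted(values.items())})
    print(greedy_policy(history[-1]))
```

Printing the full history makes it easy to see at which iteration each state's value stops changing, which is what part c) asks about for a single state.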
