Question:

We would like to use a Q-learning agent for Pacman, but the size of the state
space for a large grid is too massive to hold in memory. To solve this, we will
switch to a feature-based representation of Pacman's state.
1. We will have two features, F_g and F_p, defined as follows:
F_g(s, a) = A(s) + B(s, a) + C(s, a)
F_p(s, a) = D(s) + 2 E(s, a)
where
A(s) = number of ghosts within 1 step of state s
B(s, a) = number of ghosts Pacman touches after taking action a from state s
C(s, a) = number of ghosts within 1 step of the state Pacman ends up in after taking action a
D(s) = number of food pellets within 1 step of state s
E(s, a) = number of food pellets eaten after taking action a from state s
For this Pacman board, the ghosts will always be stationary, and the action
space is {left, right, up, down, stay}. Calculate the features for the actions in {left, right, up, stay} from the current state.
2. After a few episodes of Q-learning, the weights are w_g = -10 and w_p = 100. Calculate the Q value for each action in {left, right, up, stay} from the current state.
3. We observe a transition that starts from the state above, s, takes action up, ends in state s' (the state with the food pellet above), and receives a reward R(s, a, s') = 250. The available actions from state s' are down and stay. Assuming a discount of \gamma = 0.5, calculate the new estimate of the Q value Q(s, up) based on this episode.
4. With this new estimate and a learning rate \alpha = 0.5, update the weights for each feature (see the sketch after this list for the general form of the update).
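
The general machinery behind parts 2-4 is linear (feature-based) Q-learning: Q(s, a) = w_g F_g(s, a) + w_p F_p(s, a), a sample target of R(s, a, s') + \gamma max_{a'} Q(s', a'), and the per-feature update w_i <- w_i + \alpha (sample - Q(s, a)) f_i(s, a). Below is a minimal sketch of that machinery. The feature values used in the usage example are placeholders for illustration only; the real values of F_g and F_p depend on the Pacman board, which is not reproduced here.

```python
# Sketch of the feature-based (approximate) Q-learning mechanics used in
# parts 2-4. Feature values in the usage example are placeholders; the real
# values depend on the board, which is not shown here.

GAMMA = 0.5  # discount from part 3
ALPHA = 0.5  # learning rate from part 4


def q_value(weights, features):
    """Linear Q-function: Q(s, a) = sum_i w_i * f_i(s, a)."""
    return sum(w * f for w, f in zip(weights, features))


def q_update(weights, features_sa, reward, next_state_features):
    """One approximate Q-learning update for an observed transition (s, a, r, s').

    difference = [r + gamma * max_{a'} Q(s', a')] - Q(s, a)
    w_i       <- w_i + alpha * difference * f_i(s, a)
    """
    best_next_q = max(q_value(weights, f) for f in next_state_features)
    difference = (reward + GAMMA * best_next_q) - q_value(weights, features_sa)
    return [w + ALPHA * difference * f for w, f in zip(weights, features_sa)]


# Hypothetical usage with the weights from part 2 and made-up feature values.
weights = [-10, 100]                 # [w_g, w_p]
features_s_up = [1, 2]               # (F_g, F_p) for (s, up) -- placeholder values
features_s_prime = [[0, 0], [0, 1]]  # features of (s', down) and (s', stay) -- placeholders
new_weights = q_update(weights, features_s_up, 250, features_s_prime)
print(new_weights)
```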
