Question 2) Reinforcement Learning (10 marks)

Consider the following environment of PacMan.

[Figure: PacMan grid, with coordinates (0,0) and (6,4) marked]

For this environment, design a Reinforcement Learning agent (Pacman). The objective of the agent is to figure out the best action it can take in any given state. The rules of the game are as follows:

- Every move has a reward of -1.
- Consuming a food pellet has a reward of +10.
- If Pacman collides with a ghost, the reward is -500.
- If Pacman has eaten all the food pellets without colliding with the ghosts, the reward is +500.
- Assume a discount factor of 0.8.
- The action noise is 0.3 (the consequences are the same as in the grid world example).
- The environment is static, i.e. no ghosts are moving.
- The actions for Pacman are Up, Down, North and Right.
- You can cross the walls.

Use Q-Learning to figure out the best action at every state. Show your working for every iteration of Q-Learning.
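The rules above map onto the standard tabular Q-learning update, Q(s,a) ← Q(s,a) + α [r + γ max Q(s',a') − Q(s,a)]. What follows is a minimal sketch of one update step, not a full solution: the grid layout (walls, ghosts, pellets) is not recoverable from the figure, so no environment is modelled. The discount factor 0.8 and the noise 0.3 come from the question. Everything else is an assumption: the learning rate `ALPHA = 0.5`, the four action names (up, down, left, right), the reading of the noise as "70% intended move, 15% each perpendicular move" as in the usual grid world example, and the helper names `noisy_action` and `q_update`, which are hypothetical.

```python
import random
from collections import defaultdict

GAMMA = 0.8   # discount factor, given in the question
NOISE = 0.3   # action noise, given in the question
ALPHA = 0.5   # learning rate: ASSUMPTION, the question does not state one

# ASSUMPTION: four compass-style moves; the question's own list is ambiguous.
ACTIONS = ("up", "down", "left", "right")
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def noisy_action(action, rng=random):
    """Return the action actually executed.

    ASSUMPTION (grid world convention): the intended action happens with
    probability 1 - NOISE, and each perpendicular action with NOISE / 2.
    """
    r = rng.random()
    if r < 1 - NOISE:
        return action
    first, second = PERPENDICULAR[action]
    return first if r < 1 - NOISE / 2 else second


def q_update(Q, state, action, reward, next_state, terminal=False):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    # A terminal transition (ghost collision, last pellet) has no future value.
    best_next = 0.0 if terminal else max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return Q[(state, action)]


# Q-table initialised to zero; states can be any hashable value,
# e.g. (x, y, frozenset_of_remaining_pellets).
Q = defaultdict(float)

# Plain move from (0, 0) to (1, 0): reward -1, all Q-values still zero.
plain_move = q_update(Q, (0, 0), "right", -1, (1, 0))

# Ghost collision: reward -500, episode ends.
ghost_hit = q_update(Q, (1, 0), "up", -500, (1, 1), terminal=True)

print(plain_move, ghost_hit)
```

With a zero-initialised table, the first update for a plain move is 0 + 0.5 × (−1 + 0.8 × 0 − 0) = −0.5, and a ghost collision gives 0 + 0.5 × (−500 − 0) = −250. Writing out each iteration in this form is one way to show the working the question asks for, once the actual grid is known.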
