Question:

The goal of this assignment is to implement the Q-learning method on the Taxi-v3 environment in the OpenAI Gym framework.

Your task in this environment is to pick up the passenger at one location and drop him off at another. There are 4 possible locations, labeled by different letters.

In the example given below, you are expected to pick him up at Y and drop him off at G.
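
For reference, Taxi-v3 has 500 discrete states (5 x 5 taxi positions, 5 passenger locations counting "in the taxi", and 4 destinations) and 6 actions (south, north, east, west, pickup, dropoff). That is why `env.observation_space.n` and `env.action_space.n` give a 500 x 6 Q-table. The sketch below re-implements, in plain Python, the state packing that I understand Gymnasium's `TaxiEnv.encode`/`decode` to use, with R, G, Y, B at grid cells (0, 0), (0, 4), (4, 0), (4, 3). Treat those details as assumptions and compare against `env.unwrapped.decode(observation)` on your installed version. The taxi position in the example is made up for illustration.

```python
# Hypothetical re-implementation of how Taxi-v3 packs its state into one integer.
LOCS = {"R": (0, 0), "G": (0, 4), "Y": (4, 0), "B": (4, 3)}  # assumed grid cells
LOC_NAMES = list(LOCS)  # index 0..3 -> "R", "G", "Y", "B"
IN_TAXI = 4  # passenger location index 4 means "already in the taxi"


def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """5 rows x 5 cols x 5 passenger locations x 4 destinations -> 0..499."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(i):
    """Inverse of encode: returns (taxi_row, taxi_col, pass_loc, dest_idx)."""
    i, dest_idx = divmod(i, 4)
    i, pass_loc = divmod(i, 5)
    taxi_row, taxi_col = divmod(i, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


# The question's example: passenger waiting at Y, destination G, taxi (hypothetically) at (2, 2).
s = encode(2, 2, LOC_NAMES.index("Y"), LOC_NAMES.index("G"))
print(s, decode(s))  # 249 (2, 2, 2, 1)
```

Because the packing is a mixed-radix number, every state index from 0 to 499 corresponds to exactly one (row, col, passenger, destination) tuple, so a plain 2-D array indexed by the observation is enough for a tabular method.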

You receive +20 points for a successful dropoff and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.
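
A quick worked example of this reward scheme, assuming (as in Gymnasium's Taxi implementation, to my understanding) that the +20 replaces the usual -1 on the dropoff step: an episode of n legal steps that ends in a delivery earns -(n - 1) + 20. The helper names below are illustrative, and gamma = 0.9 matches the starter code.

```python
def clean_episode_rewards(n_steps):
    """Per-step rewards for an episode with no illegal actions that ends in a
    successful dropoff. Assumes the +20 replaces the -1 on the final step."""
    if n_steps < 1:
        raise ValueError("an episode needs at least one step")
    return [-1] * (n_steps - 1) + [20]


def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = clean_episode_rewards(13)
print(discounted_return(rewards, 1.0))  # 8.0  (12 * -1 + 20)
print(discounted_return(rewards, 0.9))  # roughly -1.53
```

Summing the geometric series gives the closed form -10 + 30 * 0.9^(n - 1) for an n-step clean delivery with gamma = 0.9. Even a 13-step delivery is negative, but never delivering tends toward -1 / (1 - 0.9) = -10, and the closed form shows that shorter routes always score strictly higher.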

Note that the dynamics of the model are assumed to be unknown.
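
Because the transition model is unknown, the Bellman optimality equation, which needs P(s' | s, a), cannot be evaluated directly. Q-learning is model-free: it replaces the expectation over next states with the single transition (s, a, r, s') returned by `env.step`. The second line below is the same update that the starter code writes for `Q[state, action]`, with alpha = `alpha` and gamma = `gamma`.

```latex
% Bellman optimality equation: requires the model P(s' | s, a), assumed unknown here
Q^*(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[\, r(s,a,s') + \gamma \max_{a'} Q^*(s',a') \,\bigr]

% Q-learning: one sampled transition (s, a, r, s') stands in for the expectation
% (when s' is terminal, the max term is taken as 0)
Q(s,a) \leftarrow Q(s,a) + \alpha \,\bigl[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\bigr]
```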

Below is the original code; implement the Q-learning method accordingly.

import gymnasium as gym
import time
import numpy as np
import os
import random

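def qLearning(env):
    states = env.observation_space.n
    actions = env.action_space.n
    # Q-values are fractional, so the table must be float (np.int32 would truncate every update)
    Q = np.zeros([states, actions], dtype=np.float64)

    alpha = 0.8
    gamma = 0.9
    epsilon = 1  # declared but unused: actions are sampled uniformly at random below
    num_iter = 10000

    for i in range(num_iter):
        state, info = env.reset()  # Gymnasium's reset() returns (observation, info)
        for step in range(100):
            action = env.action_space.sample()
            #action = np.argmax(Q[state])
            # Gymnasium's step() returns five values: obs, reward, terminated, truncated, info
            next_state, reward, done, truncated, info = env.step(action)
            # Off-policy target: bootstrap from the best action in the next state
            Q[state, action] = Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]) - Q[state, action])
            state = next_state
            if done or truncated:
                break
        if i % 1000 == 0:
            print(f"Episode {i}")
    return Q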

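def SARSA(env):
    states = env.observation_space.n
    actions = env.action_space.n
    # Float table, for the same reason as in qLearning
    Q = np.zeros([states, actions], dtype=np.float64)

    alpha = 0.8
    gamma = 0.9
    epsilon = 1  # declared but unused
    num_iter = 1000

    for i in range(num_iter):
        state, info = env.reset()
        action = env.action_space.sample()  # only the first action is random
        for step in range(100):
            next_state, reward, done, truncated, info = env.step(action)
            next_action = np.argmax(Q[next_state])  # next action chosen greedily
            # SARSA bootstraps from the action actually taken next, not from the max
            Q[state, action] = Q[state, action] + alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action])
            state = next_state
            action = next_action
            if done or truncated:
                break
        if i % 1000 == 0:
            print(f"Episode {i}")
    return Q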

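# Train without rendering (rendering every training step is very slow), then watch the greedy policy
train_env = gym.make('Taxi-v3')
Q = qLearning(train_env)  # or SARSA(train_env)
train_env.close()

env = gym.make('Taxi-v3', render_mode="human")
observation, info = env.reset()
done = False
sumreward = 0
while not done:
    os.system('cls' if os.name == 'nt' else 'clear')  # clear the console on Windows or Unix
    env.render()
    action = np.argmax(Q[observation])
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # also stop if the environment's time limit is hit
    sumreward += reward
    time.sleep(0.5)
    if done:
        observation, info = env.reset()
        print('done with reward:', sumreward)
env.close()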
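
A minimal sketch of what the assignment seems to ask for: tabular Q-learning whose behaviour policy is epsilon-greedy with a decaying epsilon, instead of the uniformly random actions in the starter code. The decay schedule (`eps_start`, `eps_min`, `eps_decay`), the seeding, and the small `Corridor` environment used as a smoke test are my own illustrative choices, not part of the assignment or of Gymnasium. `q_learning` relies only on the Gymnasium-style `reset()`/`step()` interface and on the `.n` attributes of the two spaces.

```python
from types import SimpleNamespace

import numpy as np


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    best_actions = np.flatnonzero(Q[state] == Q[state].max())
    return int(rng.choice(best_actions))


def q_learning(env, num_episodes=10000, max_steps=100, alpha=0.8, gamma=0.9,
               eps_start=1.0, eps_min=0.05, eps_decay=0.999, seed=0):
    """Tabular Q-learning with a decaying epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.observation_space.n, env.action_space.n))  # float64 Q-table
    epsilon = eps_start
    for episode in range(num_episodes):
        # Seed the environment once, on the first reset only.
        state, info = env.reset(seed=seed if episode == 0 else None)
        for step in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, terminated, truncated, info = env.step(action)
            # Off-policy target: best next action, no bootstrap past a true terminal state.
            target = reward if terminated else reward + gamma * Q[next_state].max()
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
            if terminated or truncated:
                break
        epsilon = max(eps_min, epsilon * eps_decay)  # explore less as learning progresses
    return Q


class Corridor:
    """Hypothetical toy environment with the Gymnasium API, used only as a smoke test.

    Cells 0..length-1; start at 0, goal at the last cell. Action 0 = left, 1 = right.
    Reward -1 per step and +20 on reaching the goal, loosely mimicking Taxi's rewards.
    """

    def __init__(self, length=5):
        self.length = length
        self.observation_space = SimpleNamespace(n=length)
        self.action_space = SimpleNamespace(n=2)
        self.pos = 0

    def reset(self, seed=None, options=None):
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        terminated = self.pos == self.length - 1
        reward = 20 if terminated else -1
        return self.pos, reward, terminated, False, {}


Q = q_learning(Corridor(), num_episodes=500, max_steps=50)
print(np.argmax(Q, axis=1))  # greedy action per cell
```

To train on the real environment (assuming gymnasium is installed), pass an environment created without `render_mode`, e.g. `Q = q_learning(gym.make("Taxi-v3"))`, and then reuse the greedy rendering loop above with `np.argmax(Q[observation])`. Dropping the bootstrap term only when `terminated` is True, and not when the episode is merely truncated, keeps the time limit from being mistaken for a real terminal state.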
