Question: Assignment: The goal of this assignment is to implement one of the Function Approximation or Policy Gradient methods on the Taxi-v3 environment at

Assignment

The goal of this assignment is to implement one of the Function Approximation or Policy Gradient methods on the Taxi-v3 environment in the OpenAI Gym framework. You are expected to use only linear functions for your Value or Policy functions.
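A minimal sketch of one way to satisfy the linear-function requirement, assuming one-hot state features (an assumption, not something the assignment prescribes). Q(s, a) is the dot product of a per-action weight vector with the feature vector, and the weights are trained with semi-gradient Q-learning. The class name `LinearQ` and the hyperparameters `alpha` and `gamma` are illustrative choices.

```python
import random


class LinearQ:
    """Q(s, a) = w[a] . x(s), where x(s) is a one-hot vector over states.

    With one-hot features the dot product reduces to a single weight
    lookup, but the model is still linear in its weights.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99):
        self.n_actions = n_actions
        self.alpha = alpha  # step size
        self.gamma = gamma  # discount factor
        self.w = [[0.0] * n_states for _ in range(n_actions)]

    def q(self, state, action):
        return self.w[action][state]

    def greedy(self, state):
        values = [self.q(state, a) for a in range(self.n_actions)]
        return values.index(max(values))

    def act(self, state, epsilon, rng=random):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, s, a, r, s_next, done):
        # Bootstrapped target; no bootstrapping from a terminal state.
        if done:
            target = r
        else:
            target = r + self.gamma * max(
                self.q(s_next, b) for b in range(self.n_actions)
            )
        td_error = target - self.q(s, a)
        # The gradient of Q(s, a) w.r.t. w[a] is x(s): 1 at index s, else 0.
        self.w[a][s] += self.alpha * td_error
        return td_error
```

With one-hot features this coincides with tabular Q-learning. Swapping in a richer feature function would change only how `q` and the gradient step are computed.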

Your task in this environment is to pick up the passenger at one location and drop him off at another. There are 4 possible locations (labeled by different letters). You are expected to pick him up at Y and drop him off at G.
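To restrict training or evaluation to the Y-to-G task, it helps to know what a state index encodes. The sketch below assumes the state layout documented for Gym's Taxi environment: taxi row, taxi column, passenger location and destination packed into one integer, with locations indexed R=0, G=1, Y=2, B=3 and passenger index 4 meaning "in the taxi". The helper name `is_y_to_g` is hypothetical.

```python
LOCATIONS = ("R", "G", "Y", "B")  # passenger index 4 means "in the taxi"


def encode(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into a single index in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination


def decode(state):
    """Inverse of encode: return (taxi_row, taxi_col, passenger, destination)."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger, destination


def is_y_to_g(state):
    """True if the passenger waits at Y and wants to go to G."""
    _, _, passenger, destination = decode(state)
    return (
        passenger == LOCATIONS.index("Y")
        and destination == LOCATIONS.index("G")
    )
```

One possible use is to reset the environment repeatedly until the sampled start state passes `is_y_to_g`. Exactly 25 of the 500 state indices qualify, one per taxi position.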

You receive +20 points for a successful drop-off, and lose 1 point for every timestep it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.
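As a small worked example of how these rewards add up over an episode, the sketch below computes the (optionally discounted) return from a list of per-step rewards. It assumes each step yields exactly one of the three rewards above. The constant names and the example episode are illustrative, not taken from the assignment.

```python
STEP_REWARD = -1.0      # ordinary timestep
ILLEGAL_REWARD = -10.0  # illegal pick-up or drop-off
DROPOFF_REWARD = 20.0   # successful drop-off


def discounted_return(rewards, gamma=1.0):
    """Return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: 9 moves, one illegal action, 2 more moves, drop-off.
episode = (
    [STEP_REWARD] * 9 + [ILLEGAL_REWARD] + [STEP_REWARD] * 2 + [DROPOFF_REWARD]
)
print(discounted_return(episode))  # -1.0
```

A single illegal action costs as much as ten ordinary steps, so an agent that guesses at pick-up or drop-off can easily end an otherwise successful episode with a negative return.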

Note that the dynamics of the model are assumed to be unknown.

You can access the environment information from the environment variable.

env.env.nS : number of states

env.env.nA : number of possible actions
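Whether `nS` and `nA` are exposed can depend on the installed Gym version, so the following sketch reads them when present and otherwise falls back to the `n` attribute of the discrete observation and action spaces. The helper name `space_sizes` is hypothetical, and the object used in the demonstration is a stand-in rather than a real environment.

```python
from types import SimpleNamespace


def space_sizes(env):
    """Return (number of states, number of actions) for a discrete env."""
    base = getattr(env, "unwrapped", env)
    n_states = getattr(base, "nS", None)
    n_actions = getattr(base, "nA", None)
    if n_states is None:
        n_states = env.observation_space.n
    if n_actions is None:
        n_actions = env.action_space.n
    return n_states, n_actions


# Stand-in object for illustration only; with Gym installed this would be
# the environment returned by gym.make("Taxi-v3").
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(n=500),
    action_space=SimpleNamespace(n=6),
)
print(space_sizes(fake_env))  # (500, 6)
```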

There are four designated pick-up and drop-off locations (Red, Green, Yellow and Blue) in the 5×5 grid world. The taxi starts off at a random square and the passenger at one of the designated locations.
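The grid structure suggests an alternative to one-hot state features: a compact, factored feature vector. The sketch below assumes the landmark coordinates used by Gym's Taxi environment, given as (row, column), and concatenates one-hot codes for the taxi row, taxi column, passenger location and destination, plus a bias term. The function name `factored_features` is hypothetical.

```python
# Landmark coordinates as (row, column) in the 5x5 grid.
LANDMARKS = {"R": (0, 0), "G": (0, 4), "Y": (4, 0), "B": (4, 3)}
GRID_SIZE = 5
N_PASSENGER = 5    # four landmarks plus "in the taxi"
N_DESTINATION = 4  # four landmarks


def factored_features(taxi_row, taxi_col, passenger, destination):
    """Return a 20-dimensional vector: row, column, passenger, destination, bias."""
    x = [0.0] * (2 * GRID_SIZE + N_PASSENGER + N_DESTINATION + 1)
    x[taxi_row] = 1.0
    x[GRID_SIZE + taxi_col] = 1.0
    x[2 * GRID_SIZE + passenger] = 1.0
    x[2 * GRID_SIZE + N_PASSENGER + destination] = 1.0
    x[-1] = 1.0  # bias term
    return x
```

A linear function of these 20 features generalizes across states but cannot express interactions such as "the taxi is standing on the passenger's square", so it is unlikely to represent the optimal values exactly. A one-hot code over all states is the lossless, but non-generalizing, alternative.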

The goal is to move the taxi to the passenger's location, pick up the passenger, move to the passenger's desired destination, and drop off the passenger. Once the passenger is dropped off, the episode ends.
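Because the task is episodic, a Monte Carlo policy gradient method such as REINFORCE is one option for the Policy Gradient route. The sketch below uses a softmax policy whose action preferences are linear in one-hot state features. To stay self-contained it trains on `Corridor`, a hypothetical stand-in environment rather than Taxi-v3, whose `reset` and `step` signatures are simplified. With Gym, the return values of `reset` and `step` would have to be unpacked according to the installed version.

```python
import math
import random


class Corridor:
    """Hypothetical stand-in environment: walk right to reach the goal."""

    def __init__(self, length=5, max_steps=50):
        self.length, self.max_steps = length, max_steps

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):
        self.t += 1
        if action == 0:
            self.pos = max(0, self.pos - 1)  # move left, blocked at the wall
        else:
            self.pos += 1                    # move right
        if self.pos == self.length - 1:
            return self.pos, 20.0, True
        return self.pos, -1.0, self.t >= self.max_steps


def softmax(prefs):
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(env, n_states, n_actions, episodes=500,
              alpha=0.05, gamma=0.99, seed=0):
    rng = random.Random(seed)
    # theta[s][a] is the linear preference of action a under one-hot x(s).
    theta = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done, trajectory = env.reset(), False, []
        while not done:
            a = rng.choices(range(n_actions), weights=softmax(theta[s]))[0]
            s_next, r, done = env.step(a)
            trajectory.append((s, a, r))
            s = s_next
        g = 0.0
        for t in reversed(range(len(trajectory))):
            s, a, r = trajectory[t]
            g = r + gamma * g  # return from step t onward
            probs = softmax(theta[s])
            for b in range(n_actions):
                # d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s)
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha * (gamma ** t) * g * grad
    return theta
```

This version has no baseline, so its gradient estimates have high variance. Subtracting a learned linear state-value estimate from `g` is a common refinement.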

The player receives positive rewards for successfully dropping off the passenger at the correct location. Negative rewards are given for incorrect attempts to pick up or drop off the passenger and for each step where another reward is not received.

What to submit:

Your source file and a report explaining the method you used and your implementation.

A 5-10 minute video recording that presents your work.
