Question: Task 2: Reinforcement Learning Q-Learning with Smart Taxi (Self-Driving Cab)


In the lab, you have been asked to develop a Smart Taxi using the Q-Learning algorithm in the following environment: a 5 × 5 grid.

In this task, you are asked to extend this environment into a bigger grid (that is, implement the environment yourself rather than using OpenAI's Gym package).

There are still four (4) locations where we can pick up and drop off a passenger: R, G, Y, and B, at coordinates you set.

The actions and rewards are still the same. The actions are: north, south, east, west, pickup, dropoff.

All the movement actions (north, south, east, west) have a -1 reward, and the pickup/dropoff actions have a -10 reward in a state with no passengers. If we are in a state where the taxi has a passenger and is on top of the right destination, we would see a reward of 20 at the dropoff action.

Implement the Q-Learning algorithm and solve the Smart Taxi Problem in a language of your choice.

(1) Initialize the Q-table.
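A minimal Python sketch of step (1). It assumes a 10 × 10 grid (the task only asks for a "bigger" grid, so the size is our choice) and a state made of taxi row, taxi column, passenger location (waiting at R, G, Y, or B, or riding in the taxi) and destination (R, G, Y, or B), which gives 10 × 10 × 5 × 4 = 2,000 states. Plain lists keep it free of third-party packages; a NumPy array of zeros would work equally well.

```python
# Assumed sizes: the task only says "a bigger grid", so 10 x 10 is our choice.
GRID_SIZE = 10
N_PASSENGER_STATES = 5   # waiting at R, G, Y or B, or riding in the taxi
N_DESTINATIONS = 4       # R, G, Y, B
N_ACTIONS = 6            # north, south, east, west, pickup, dropoff

n_states = GRID_SIZE * GRID_SIZE * N_PASSENGER_STATES * N_DESTINATIONS

# One row per state, one column per action, all starting at 0.0.
# Each row is built separately so rows never share the same list object.
q_table = [[0.0] * N_ACTIONS for _ in range(n_states)]

print(n_states, len(q_table[0]))  # 2000 6
```

Starting at zero is common; because almost every reward here is negative, zeros are also mildly optimistic, which nudges a greedy agent toward actions it has not tried yet.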

(2) Set the hyperparameters: choose the learning rate ($\alpha$), the discount factor ($\gamma$), and the exploration rate ($\epsilon$).
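One way to keep the hyperparameters together is a small frozen dataclass. The values below are illustrative starting points, not values given in the task, and `epsilon_min`, `epsilon_decay` and `episodes` are extra knobs we assume for an exploration-decay schedule; `Hyperparameters` itself is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    alpha: float = 0.1             # learning rate: step size toward the TD target
    gamma: float = 0.95            # discount factor: weight on future rewards
    epsilon: float = 1.0           # initial exploration rate (start fully random)
    epsilon_min: float = 0.01      # assumed floor for the decayed exploration rate
    epsilon_decay: float = 0.9995  # assumed per-episode multiplicative decay
    episodes: int = 20_000         # assumed training budget for a 10 x 10 grid

    def __post_init__(self):
        # Reject obviously invalid settings early.
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must be in [0, 1]")
        if not 0.0 <= self.epsilon_min <= self.epsilon <= 1.0:
            raise ValueError("need 0 <= epsilon_min <= epsilon <= 1")
```

Usage: `Hyperparameters(alpha=0.5, gamma=0.9)` overrides individual values, which is convenient for the experiments at the end of the task.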

(3) Start training the agent by iterating through episodes:

Initialize the environment: place the taxi at a coordinate, randomly select a passenger location (R, G, Y, or B), and select a destination different from the passenger's location.
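A sketch of the episode reset, assuming the four landmarks sit in the grid's corners and that each state is packed into one integer so it can index a Q-table row. The corner coordinates, the `encode`/`decode` layout and the `IN_TAXI` marker are our own illustrative choices.

```python
import random

GRID_SIZE = 10
# Assumed landmark coordinates (row, col) for R, G, Y, B; the task lets you set them.
LOCATIONS = [(0, 0), (0, GRID_SIZE - 1), (GRID_SIZE - 1, 0), (GRID_SIZE - 1, GRID_SIZE - 1)]
IN_TAXI = 4  # passenger index meaning "already in the taxi"


def encode(row, col, passenger, destination):
    """Pack (row, col, passenger 0-4, destination 0-3) into one Q-table row index."""
    return ((row * GRID_SIZE + col) * 5 + passenger) * 4 + destination


def decode(state):
    """Inverse of encode()."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    row, col = divmod(state, GRID_SIZE)
    return row, col, passenger, destination


def reset(rng):
    """Random taxi cell, random waiting passenger, and a different destination."""
    row, col = rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE)
    passenger = rng.randrange(len(LOCATIONS))
    destination = rng.choice([d for d in range(len(LOCATIONS)) if d != passenger])
    return encode(row, col, passenger, destination)
```

Passing a seeded `random.Random` instance, for example `reset(random.Random(42))`, makes episodes reproducible, which helps when comparing runs in the report.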

Loop until the passenger is dropped off at the right destination:

Choose an action: either explore (choose a random action) with probability $\epsilon$, or exploit (choose the action with the highest Q-value for the current state) with probability $1 - \epsilon$.
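The action-selection rule is the standard ε-greedy policy. This sketch works on one row of the Q-table; `choose_action` is a hypothetical helper name, and the random tie-breaking is our design choice.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: any action, uniformly
    best = max(q_row)
    # Exploit; break ties at random so an all-zero row does not always give action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])
```

The random branch can also land on the greedy action, so the greedy action's overall probability is slightly above $1 - \epsilon$; this is the usual way ε-greedy is implemented.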

Perform the action and observe the reward and new state.
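A sketch of the environment's transition and reward function. The -1 / -10 / +20 rewards follow the task text; the following are our assumptions: there are no internal walls, a move into the outer edge leaves the taxi in place, a legal pickup costs the ordinary -1 (as in Gym's Taxi environment), and dropping the passenger anywhere other than the destination counts as an illegal dropoff (-10). The landmark coordinates and state encoding are illustrative.

```python
GRID_SIZE = 10
# Assumed landmark coordinates (row, col) for R, G, Y, B.
LOCATIONS = [(0, 0), (0, GRID_SIZE - 1), (GRID_SIZE - 1, 0), (GRID_SIZE - 1, GRID_SIZE - 1)]
IN_TAXI = 4
NORTH, SOUTH, EAST, WEST, PICKUP, DROPOFF = range(6)
MOVES = {NORTH: (-1, 0), SOUTH: (1, 0), EAST: (0, 1), WEST: (0, -1)}  # row 0 is the top


def encode(row, col, passenger, destination):
    return ((row * GRID_SIZE + col) * 5 + passenger) * 4 + destination


def decode(state):
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    row, col = divmod(state, GRID_SIZE)
    return row, col, passenger, destination


def step(state, action):
    """Apply one action; return (next_state, reward, done)."""
    row, col, passenger, destination = decode(state)
    reward, done = -1, False  # default: -1 per action
    if action in MOVES:
        dr, dc = MOVES[action]
        # Assumed: bumping into the outer edge leaves the taxi where it is.
        row = min(max(row + dr, 0), GRID_SIZE - 1)
        col = min(max(col + dc, 0), GRID_SIZE - 1)
    elif action == PICKUP:
        if passenger != IN_TAXI and (row, col) == LOCATIONS[passenger]:
            passenger = IN_TAXI  # legal pickup: ordinary -1
        else:
            reward = -10  # no passenger here to pick up
    elif action == DROPOFF:
        if passenger == IN_TAXI and (row, col) == LOCATIONS[destination]:
            passenger, reward, done = destination, 20, True
        else:
            reward = -10  # no passenger on board, or not at the destination
    return encode(row, col, passenger, destination), reward, done
```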

Update the Q-table using the formula:

$$Q_{\text{new}}(\text{state}, \text{action}) \leftarrow Q(\text{state}, \text{action}) + \alpha \left[ \text{reward} + \gamma \max_{a} Q(\text{new state}, a) - Q(\text{state}, \text{action}) \right]$$
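The Q-learning update translates almost term for term into code. One detail the formula leaves implicit: when the action ends the episode (a successful dropoff) there is no next state to look ahead from, so a common convention, assumed here, is to drop the $\gamma \max_a Q$ term. `q_update` is a hypothetical helper name.

```python
def q_update(q_table, state, action, reward, next_state, done, alpha, gamma):
    """Apply one Q-learning update in place and return the TD error."""
    # Look-ahead term: best Q-value in the new state, or 0 if the episode ended.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error
```

A worked example: with $\alpha = 0.5$, $\gamma = 0.9$, reward $-1$ and $\max_a Q(\text{new state}, a) = 4$, an entry that starts at 0 becomes $0 + 0.5 \times (-1 + 0.9 \times 4 - 0) = 1.3$.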

Update the current state to the new state.

Decay the exploration rate ($\epsilon$) over time to reduce random exploration and focus on exploiting the learned Q-values.
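The task does not fix a decay schedule. Two common choices are sketched below: exponential decay with a floor, and linear decay to a floor. The function names and default values are illustrative assumptions.

```python
def decayed_epsilon(episode, eps_start=1.0, eps_min=0.01, decay=0.9995):
    """Exponential decay with a floor: max(eps_min, eps_start * decay**episode)."""
    return max(eps_min, eps_start * decay ** episode)


def linear_epsilon(episode, eps_start=1.0, eps_min=0.01, decay_episodes=10_000):
    """Linear decay from eps_start to eps_min over decay_episodes, then flat."""
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_min - eps_start)
```

With `decay=0.9995`, the exponential schedule reaches the 0.01 floor after about ln(0.01) / ln(0.9995) ≈ 9,200 episodes, so the decay rate should be tuned together with the total number of training episodes.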

(4) After enough episodes, the Q-table should converge, and the agent will have learned the optimal policy to solve the taxi problem.
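Putting steps (1) to (3) together gives a complete, dependency-free sketch. Assumptions beyond the task text: no internal walls, R, G, Y, B in the corners, edge bumps leave the taxi in place, a legal pickup costs the ordinary -1, exponential ε decay, and a per-episode step cap. `TaxiGrid` and `train` are illustrative names, not part of any library.

```python
import random


class TaxiGrid:
    """Smart Taxi on a size x size grid without internal walls (an assumption)."""

    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}  # north, south, east, west
    IN_TAXI = 4

    def __init__(self, size=10):
        self.size = size
        last = size - 1
        # Assumed R, G, Y, B coordinates: the four corners.
        self.locations = [(0, 0), (0, last), (last, 0), (last, last)]
        self.n_states = size * size * 5 * 4
        self.n_actions = 6  # north, south, east, west, pickup, dropoff

    def encode(self, row, col, passenger, destination):
        return ((row * self.size + col) * 5 + passenger) * 4 + destination

    def decode(self, state):
        state, destination = divmod(state, 4)
        state, passenger = divmod(state, 5)
        row, col = divmod(state, self.size)
        return row, col, passenger, destination

    def reset(self, rng):
        passenger = rng.randrange(4)
        destination = rng.choice([d for d in range(4) if d != passenger])
        return self.encode(rng.randrange(self.size), rng.randrange(self.size),
                           passenger, destination)

    def step(self, state, action):
        """Return (next_state, reward, done) using the task's reward scheme."""
        row, col, passenger, destination = self.decode(state)
        reward, done = -1, False
        if action in self.MOVES:
            dr, dc = self.MOVES[action]
            row = min(max(row + dr, 0), self.size - 1)
            col = min(max(col + dc, 0), self.size - 1)
        elif action == 4:  # pickup
            if passenger != self.IN_TAXI and (row, col) == self.locations[passenger]:
                passenger = self.IN_TAXI
            else:
                reward = -10
        else:  # dropoff
            if passenger == self.IN_TAXI and (row, col) == self.locations[destination]:
                passenger, reward, done = destination, 20, True
            else:
                reward = -10
        return self.encode(row, col, passenger, destination), reward, done


def train(env, episodes=20_000, alpha=0.1, gamma=0.95, eps_start=1.0,
          eps_min=0.01, eps_decay=0.9995, max_steps=500, seed=0):
    """Tabular Q-learning; returns the Q-table and each episode's total reward."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    epsilon, returns = eps_start, []
    for _ in range(episodes):
        state, total = env.reset(rng), 0
        for _ in range(max_steps):  # cap so early random episodes cannot run forever
            if rng.random() < epsilon:  # explore
                action = rng.randrange(env.n_actions)
            else:  # exploit, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            nxt, reward, done = env.step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state, total = nxt, total + reward
            if done:
                break
        returns.append(total)
        epsilon = max(eps_min, epsilon * eps_decay)  # decay once per episode
    return q, returns
```

`q, returns = train(TaxiGrid(size=10))` trains on a 10 × 10 grid. Comparing the average of `returns[:100]` with `returns[-100:]` is a quick convergence check, and plotting `returns` gives a learning curve for the report. The number of episodes needed grows with the grid size, so treat the defaults as a starting point rather than tuned values.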

(5) Find the best sequence of actions for any given state by using the learned Q-table and choosing the action with the highest Q-value for that state.
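Step (5) amounts to a greedy rollout: start from a state, repeatedly take the highest-valued action, and stop when the episode ends. This sketch assumes a `step(state, action)` function returning `(next_state, reward, done)` and a Q-table indexed as `q_table[state][action]`; `best_action_sequence` is a hypothetical name. The step cap guards against loops in states the agent has rarely visited.

```python
ACTION_NAMES = ("north", "south", "east", "west", "pickup", "dropoff")


def best_action_sequence(q_table, step, state, max_steps=200):
    """Follow argmax_a Q(state, a) from `state`; return (action names, reached_goal).

    `step(state, action)` must return (next_state, reward, done).
    """
    names = []
    for _ in range(max_steps):
        row = q_table[state]
        action = max(range(len(row)), key=row.__getitem__)  # greedy action
        names.append(ACTION_NAMES[action])
        state, _reward, done = step(state, action)
        if done:
            return names, True
    # Hitting the cap usually means the table has not converged for this state.
    return names, False
```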

Compare the performance of your Q-Learning agent with a random agent.
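A sketch of the comparison: run both agents on the same sequence of start states and report mean reward, mean episode length and success rate. It assumes an environment exposed as `reset(rng)` and `step(state, action) -> (next_state, reward, done)`; `evaluate`, `random_policy` and `greedy_policy` are hypothetical names, and the metrics are our suggestion.

```python
import random
from statistics import mean


def evaluate(policy, reset, step, episodes=500, max_steps=200, seed=123):
    """Run `policy` for several episodes; return mean reward, mean steps, success rate."""
    start_rng = random.Random(seed)       # same start states for every agent
    policy_rng = random.Random(seed + 1)  # separate stream for the policy's own randomness
    rewards, lengths, successes = [], [], 0
    for _ in range(episodes):
        state, total, done, t = reset(start_rng), 0, False, 0
        while t < max_steps and not done:
            state, reward, done = step(state, policy(state, policy_rng))
            total += reward
            t += 1
        rewards.append(total)
        lengths.append(t)
        successes += done
    return {"mean_reward": mean(rewards), "mean_steps": mean(lengths),
            "success_rate": successes / episodes}


def random_policy(n_actions):
    """Baseline agent: uniform random action, ignores the state."""
    return lambda state, rng: rng.randrange(n_actions)


def greedy_policy(q_table):
    """Trained agent: highest-valued action in the Q-table, no exploration."""
    return lambda state, rng: max(range(len(q_table[state])), key=q_table[state].__getitem__)
```

Using a separate random stream for start states means the random agent's own coin flips do not change which episodes it is tested on, so both agents face exactly the same start states.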

Experiment with different values of the learning rate ($\alpha$), the discount factor ($\gamma$), and the exploration rate ($\epsilon$).
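A simple way to organize the experiments is a grid search over $(\alpha, \gamma, \epsilon)$, averaging each setting over a few random seeds because single Q-learning runs are noisy. `train_and_score(alpha, gamma, epsilon, seed)` is a hypothetical callback you would write around your own training loop (for example, train a fresh agent, then return the greedy agent's mean reward); `epsilon` can stand for a fixed exploration rate or the start of a decay schedule, depending on your design.

```python
import itertools
from statistics import mean


def grid_search(train_and_score, alphas, gammas, epsilons, seeds=(0, 1, 2)):
    """Score every (alpha, gamma, epsilon) combination; best score first."""
    results = []
    for alpha, gamma, epsilon in itertools.product(alphas, gammas, epsilons):
        # Average over seeds: single Q-learning runs are noisy.
        score = mean(train_and_score(alpha, gamma, epsilon, seed) for seed in seeds)
        results.append({"alpha": alpha, "gamma": gamma, "epsilon": epsilon, "score": score})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


def print_table(results):
    """Plain-text results table for the report."""
    print(f"{'alpha':>6} {'gamma':>6} {'epsilon':>8} {'score':>8}")
    for r in results:
        print(f"{r['alpha']:>6} {r['gamma']:>6} {r['epsilon']:>8} {r['score']:>8.2f}")
```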

You need to submit the code and a report on your program design and the experimental results.

The marking will be based on the clarity and rationality of your report and the correctness of your code.
