Question: Task 2: Reinforcement Learning, Q-Learning with Smart Taxi (Self-Driving Cab)

Task 2: Reinforcement Learning
Q-Learning with Smart Taxi (Self-Driving Cab). In the lab, you have been asked to develop a Smart Taxi using the Q-Learning algorithm in the following environment: a 5x5 grid.
In this task, you are asked to extend this environment to a bigger grid (so that you do not use OpenAI's Gym package). There are still four (4) locations where a passenger can be picked up and dropped off: R, G, Y, B, at coordinates you set.
The actions and rewards are still the same. The actions are: north, south, east, west, pickup, dropoff.
All the movement actions (north, south, east, west) have a -1 reward and the pickup/dropoff actions have -10 reward in a state with no passengers. If we are in a state where the taxi has a passenger and is on top of the right destination, we would see a reward of 20 at the dropoff action.
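The environment described above can be sketched without Gym as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the grid size, the R/G/Y/B coordinates, the absence of interior walls, and the class name `TaxiEnv` are all illustrative choices that the task leaves open. It also assumes that a legal pickup costs -1 (as in the Gym Taxi environment) and that a drop-off anywhere other than the destination is penalised with -10 while the passenger stays in the taxi.

```python
import random

GRID = 10  # assumed grid size; the task only asks for something larger than 5x5
# Assumed (row, col) coordinates for the four pickup/drop-off locations.
LOCS = {"R": (0, 0), "G": (0, 9), "Y": (9, 0), "B": (9, 6)}
NAMES = list(LOCS)       # index 0..3 -> R, G, Y, B
IN_TAXI = len(NAMES)     # passenger index 4 means "riding in the taxi"
ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}


class TaxiEnv:
    """Hypothetical wall-free taxi grid; state = (row, col, passenger, destination)."""

    def reset(self, rng=random):
        # Sampling two distinct indices guarantees destination != passenger location.
        passenger, dest = rng.sample(range(len(NAMES)), 2)
        self.state = (rng.randrange(GRID), rng.randrange(GRID), passenger, dest)
        return self.state

    def step(self, action):
        row, col, passenger, dest = self.state
        reward, done = -1, False  # default cost of any legal, non-terminal action
        if action in MOVES:
            dr, dc = MOVES[action]
            # Moving into the boundary leaves the taxi in place (still costs -1).
            row = min(max(row + dr, 0), GRID - 1)
            col = min(max(col + dc, 0), GRID - 1)
        elif action == 4:  # pickup
            if passenger != IN_TAXI and (row, col) == LOCS[NAMES[passenger]]:
                passenger = IN_TAXI
            else:
                reward = -10  # no passenger to pick up here
        elif action == 5:  # dropoff
            if passenger == IN_TAXI and (row, col) == LOCS[NAMES[dest]]:
                reward, done = 20, True
            else:
                reward = -10  # no passenger on board, or wrong location
        self.state = (row, col, passenger, dest)
        return self.state, reward, done
```

With these assumptions there are 10 x 10 x 5 x 4 = 2000 possible states and 6 actions, which determines the size of the Q-table in part (a).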
(a) Implement the Q-Learning algorithm and solve the Smart Taxi Problem in a language of your choice.
(1) Initialize the Q-table:
(2) Set the hyperparameters: Choose the learning rate ($\alpha$), the discount factor ($\gamma$), and the exploration rate ($\epsilon$).
(3) Start training the agent by iterating through episodes:
Initialize the environment: Place the taxi at a coordinate, randomly select a passenger location (R, G, Y, B), and a destination different from the passenger's location.
Loop until the passenger is dropped off at the right destination:
Choose an action: Either explore (choose a random action) with probability $\epsilon$ or exploit (choose the action with the highest Q-value for the current state) with probability $1 - \epsilon$.
Perform the action and observe the reward and new state.
Update the Q-table using the formula:
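$$Q^{\text{new}}(\text{state}, \text{action}) \leftarrow Q(\text{state}, \text{action}) + \alpha \left[ \text{reward} + \gamma \max_{a} Q(\text{new state}, a) - Q(\text{state}, \text{action}) \right]$$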
Update the current state to the new state.
Decay the exploration rate ($\epsilon$) over time to reduce random exploration and focus on exploiting the learned Q-values.
(4) After enough episodes, the Q-table should converge, and the agent will have learned the optimal policy to solve the taxi problem.
(5) Find the best sequence of actions for any given state by using the learned Q-table and choosing the action with the highest Q-value for that state.
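Steps (1) to (5) can be sketched in plain Python as below. This is one possible sketch, not a prescribed solution: the function names (`step`, `reset`, `train`, `greedy_rollout`), the default hyperparameter values, the multiplicative epsilon decay, and the per-episode step cap are all assumptions. The environment is the same assumed wall-free grid, written as a compact function so the block is self-contained, with `locs` given as a list of (row, col) pairs for R, G, Y, B. The Q-table is a dictionary whose rows are created lazily and initialised to zero.

```python
import random

N_ACTIONS = 6  # 0 north, 1 south, 2 east, 3 west, 4 pickup, 5 dropoff
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}


def step(state, action, size, locs):
    """Assumed wall-free dynamics; returns (next_state, reward, done)."""
    row, col, passenger, dest = state
    in_taxi = len(locs)  # passenger index meaning "riding in the taxi"
    if action in MOVES:
        dr, dc = MOVES[action]
        row = min(max(row + dr, 0), size - 1)
        col = min(max(col + dc, 0), size - 1)
        return (row, col, passenger, dest), -1, False
    if action == 4 and passenger != in_taxi and (row, col) == locs[passenger]:
        return (row, col, in_taxi, dest), -1, False
    if action == 5 and passenger == in_taxi and (row, col) == locs[dest]:
        return state, 20, True
    return state, -10, False  # illegal pickup or drop-off


def reset(size, locs, rng):
    passenger, dest = rng.sample(range(len(locs)), 2)  # always different
    return (rng.randrange(size), rng.randrange(size), passenger, dest)


def train(size, locs, episodes=5000, alpha=0.5, gamma=0.95,
          eps=1.0, eps_min=0.05, eps_decay=0.999, max_steps=500, seed=0):
    rng = random.Random(seed)
    q = {}  # (1) Q-table: state -> list of six Q-values, initialised to 0

    def values(s):
        return q.setdefault(s, [0.0] * N_ACTIONS)

    for _ in range(episodes):                       # (3) iterate through episodes
        state = reset(size, locs, rng)
        for _ in range(max_steps):                  # cap guards against endless episodes
            if rng.random() < eps:
                action = rng.randrange(N_ACTIONS)   # explore
            else:
                row_q = values(state)
                action = row_q.index(max(row_q))    # exploit
            nxt, reward, done = step(state, action, size, locs)
            # A terminal transition has no future value, so it is not bootstrapped.
            target = reward if done else reward + gamma * max(values(nxt))
            values(state)[action] += alpha * (target - values(state)[action])
            state = nxt
            if done:
                break
        eps = max(eps_min, eps * eps_decay)         # decay exploration per episode
    return q


def greedy_rollout(q, state, size, locs, max_steps=100):
    """(5) Follow argmax-Q actions from `state`; returns (actions, total_reward, done)."""
    actions, total, done = [], 0, False
    for _ in range(max_steps):
        row_q = q.get(state, [0.0] * N_ACTIONS)
        action = row_q.index(max(row_q))
        state, reward, done = step(state, action, size, locs)
        actions.append(action)
        total += reward
        if done:
            break
    return actions, total, done
```

For a larger grid, for example `train(size=10, locs=[(0, 0), (0, 9), (9, 0), (9, 6)], episodes=20000)` with assumed coordinates, the number of episodes needed for the greedy rollout to succeed from every start state is something to determine experimentally rather than take from this sketch.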
(b) Compare the performance of your Q-Learning agent with a random agent.
(c) Experiment with different values of the learning rate ($\alpha$), the discount factor ($\gamma$), and the exploration rate ($\epsilon$).
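For parts (b) and (c), a small evaluation harness makes the comparison repeatable. The sketch below is environment-agnostic and entirely illustrative: `evaluate`, `random_policy`, `greedy_policy`, and `hyperparameter_grid` are hypothetical helper names, and the harness assumes the environment is exposed as two callables, `reset(rng)` returning a start state and `step(state, action)` returning `(next_state, reward, done)`. A Q-table is assumed to be a dictionary mapping each state to a list of Q-values.

```python
import itertools
import random
import statistics


def evaluate(policy, reset, step, episodes=100, max_steps=200, seed=0):
    """Run `policy` for several episodes; report mean return, mean steps, success rate."""
    rng = random.Random(seed)
    returns, lengths, successes = [], [], 0
    for _ in range(episodes):
        state, total, done, t = reset(rng), 0, False, 0
        while not done and t < max_steps:   # the cap stops a policy that never finishes
            state, reward, done = step(state, policy(state, rng))
            total += reward
            t += 1
        returns.append(total)
        lengths.append(t)
        successes += done                   # True counts as 1
    return {"mean_return": statistics.mean(returns),
            "mean_steps": statistics.mean(lengths),
            "success_rate": successes / episodes}


def random_policy(n_actions):
    """Baseline for part (b): ignore the state and act uniformly at random."""
    return lambda state, rng: rng.randrange(n_actions)


def greedy_policy(q, n_actions):
    """Exploit a learned Q-table; unseen states fall back to all-zero values."""
    def act(state, rng):
        row = q.get(state, [0.0] * n_actions)
        return row.index(max(row))
    return act


def hyperparameter_grid(alphas, gammas, epsilons):
    """Part (c): every (alpha, gamma, epsilon) combination to train and evaluate."""
    return list(itertools.product(alphas, gammas, epsilons))
```

Evaluating both agents with the same seed gives them the same sequence of start states, so differences in mean return, episode length, and success rate reflect the policies rather than the sampling. For part (c), training one agent per tuple from `hyperparameter_grid` and tabulating the `evaluate` results is one straightforward way to structure the report.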
You need to submit the code and a report on your program design and the experimental results.
The marking will be based on the clarity and rationality of your report and the correctness of your code.