Question: In this code, the robot explores the whole maze with epsilon-greedy. Then it finds the shortest path according to the Q values. Change the shortest-path search to use flood fill:

clear all;
clc;
% Maze layout: 0 = empty, 1 = obstacle, 2 = goal
% (the original cell values were lost in extraction; this is an example layout)
maze = [0 0 0 0 0;
        0 1 1 1 0;
        0 0 0 1 0;
        1 1 0 1 0;
        0 0 0 0 0;
        0 1 1 1 0;
        0 0 0 0 2];
% Parameters (example values; the originals were lost in extraction)
alpha = 0.1;      % Learning rate
gamma = 0.9;      % Discount factor
epsilon = 0.1;    % Exploration rate
episodes = 1000;  % Number of training episodes
% Learn the Q values
Q = learn_q_values(maze, alpha, gamma, epsilon, episodes);
% Simulate the shortest path after training
simulate_shortest_path(Q, maze);
% Function to learn Q values
function Q = learn_q_values(maze, alpha, gamma, epsilon, episodes)
    [rows, cols] = size(maze);       % Size of the Q-table to create
    Q = rand(rows, cols, 4) * 0.01;  % Initialize Q values for each cell and each of the four actions with low random values
    % Learning process
    for episode = 1:episodes         % Run the specified number of episodes
        % Set the initial position
        y = 1;
        x = 1;
        done = false;
        while ~done                  % Continue until the goal is reached
            % Epsilon-greedy action selection
            if rand < epsilon
                action = randi(4);   % Select a random action
            else
                [~, action] = max(Q(y, x, :));
            end
            % Apply the action to get the new state
            [newY, newX, reward, done] = my_step(y, x, action, maze);
            % Update the Q-table
            old_value = Q(y, x, action);
            next_max = max(Q(newY, newX, :)); % Maximum Q value over all actions in the new state
            Q(y, x, action) = old_value + alpha * (reward + gamma * next_max - old_value);
            % Update the current position along with the Q-table
            y = newY;
            x = newX;
        end
        % Display the Q values obtained at the end of each episode
        disp(['Episode ' num2str(episode) ' Q values:']);
        disp(Q);
    end
end
% Function to simulate the shortest path
function simulate_shortest_path(Q, maze)
    % Start at the initial position
    y = 1;
    x = 1;
    % Move through the maze using the learned Q values
    done = false;
    disp('Starting simulation of the shortest path');
    % Create a figure for visualization
    figure;
    while ~done
        disp_maze(y, x, maze);                               % Show the current state of the maze
        [~, action] = max(Q(y, x, :));                       % Choose the action with the maximum Q value
        [newY, newX, ~, done] = my_step(y, x, action, maze); % Apply the action and get the new state
        % Update the position
        y = newY;
        x = newX;
        pause(0.5);                                          % Wait for visualization purposes
    end
    % Show the final state of the maze
    disp_maze(y, x, maze);
    if done
        disp('Goal reached!');
    end
end
% Function to display the current state of the maze
function disp_maze(y, x, maze)
    vis_maze = maze;     % Create a copy of the maze for visualization
    vis_maze(y, x) = 3;  % Use a distinct value to represent the robot
    % Display the maze using imagesc and a color map
    imagesc(vis_maze);
    colormap([1 1 1; 0 0 0; 0 1 0; 1 0 0]); % White = empty, black = obstacle, green = goal, red = robot
    axis off;
    pause(0.1);          % Wait for visualization purposes
end
% Function to simulate a step
function [newY, newX, reward, done] = my_step(y, x, action, maze)
    % Move according to the chosen action
    switch action
        case 1 % Up
            newY = y - 1;
            newX = x;
        case 2 % Down
            newY = y + 1;
            newX = x;
        case 3 % Left
            newY = y;
            newX = x - 1;
        case 4 % Right
            newY = y;
            newX = x + 1;
    end
    % Keep the position within the bounds of the maze
    newY = max(1, min(size(maze, 1), newY));
    newX = max(1, min(size(maze, 2), newX));
    % Set reward and done (reward values are examples; the originals were lost)
    if maze(newY, newX) == 1
        reward = -10;  % Punish the agent for hitting an obstacle
        done = false;
    elseif maze(newY, newX) == 2
        reward = 100;  % Goal reached
        done = true;
    else
        reward = -1;   % Small step cost on empty cells
        done = false;
    end
end
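The question asks to replace the Q-value walk with flood fill. A minimal sketch of the flood-fill idea, shown in Python for brevity (the function names `flood_fill_distances` and `shortest_path` are illustrative, not from the original code): a breadth-first search from the goal assigns every reachable free cell its step distance to the goal, and the shortest path is then read off from the start by repeatedly moving to the neighbor with the smallest distance.

```python
from collections import deque

def flood_fill_distances(maze, goal):
    """BFS from the goal: label each free cell with its distance to the goal."""
    rows, cols = len(maze), len(maze[0])
    dist = [[None] * cols for _ in range(rows)]
    gy, gx = goal
    dist[gy][gx] = 0
    queue = deque([goal])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            # Expand into in-bounds, non-obstacle, not-yet-labeled cells
            if (0 <= ny < rows and 0 <= nx < cols
                    and maze[ny][nx] != 1 and dist[ny][nx] is None):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def shortest_path(maze, start, goal):
    """Walk downhill on the distance field from start to goal."""
    dist = flood_fill_distances(maze, goal)
    if dist[start[0]][start[1]] is None:
        return None  # goal unreachable from start
    path = [start]
    y, x = start
    while (y, x) != goal:
        # Step to the labeled neighbor with the smallest distance;
        # on a BFS field this always decreases the distance by 1
        y, x = min(
            ((y + dy, x + dx)
             for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if 0 <= y + dy < len(maze) and 0 <= x + dx < len(maze[0])
             and dist[y + dy][x + dx] is not None),
            key=lambda c: dist[c[0]][c[1]],
        )
        path.append((y, x))
    return path
```

In the MATLAB program above, this routine would take the place of `simulate_shortest_path`: instead of following the argmax of the learned Q values, the robot follows the decreasing distance labels, which guarantees a shortest path regardless of how well the Q values converged.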