Question:

1. **Task 1:** Complete the `get_next_state(current_state_pos, action, grid_size)` function to return the next state's grid position (`row, column`) based on the given `current_state_pos` and `action`.
    * Complete Tasks 1.1-1.4 to update the `row` and/or `column` value as needed
2. **Task 2:** Complete the `q_learning(...)` function by implementing the Q-learning algorithm (following the epsilon-greedy policy). It should return the final Q-table as `q_table`.
    * To help you, partial code has been given
    * Complete the code for Tasks 2.1-2.4
    * Note: do not change the function header; your solution should be such that the function must not use any global variable anywhere else in the code

```python
STUDENTID =   # Enter your student ID (you may change this to try different start and goal positions)
GRID_SIZE = 5
ACTIONS = 4
EPISODES =    # Set the number of episodes needed to train the agent
ALPHA = 0.1
EPSILON = 0.2
GAMMA = 0.9
```
```python
# TASK 1 - Complete the function to get the next state based on the given action
def get_next_state(current_state_pos, action, grid_size=5):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos
    if action == 0 and row > 0:  # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # [Task 1.3] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 3 and column < grid_size - 1:  # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE
```
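
For reference, a sketch of how Tasks 1.1-1.4 could be filled in, following the movement convention already given in the skeleton's comments (0 = up, 1 = down, 2 = left, 3 = right, with row 0 at the top):

```python
# Sketch: the boundary checks are already in the skeleton's conditions,
# so each branch only needs to move one step in the indicated direction.
def get_next_state(current_state_pos, action, grid_size=5):
    row, column = current_state_pos
    if action == 0 and row > 0:                    # Move up
        row -= 1                                   # [Task 1.1]
    elif action == 1 and row < grid_size - 1:      # Move down
        row += 1                                   # [Task 1.2]
    elif action == 2 and column > 0:               # Move left
        column -= 1                                # [Task 1.3]
    elif action == 3 and column < grid_size - 1:   # Move right
        column += 1                                # [Task 1.4]
    return row, column  # an out-of-bounds action leaves the position unchanged
```

With this completion, `get_next_state((0, 0), 1)` returns `(1, 0)`, while `get_next_state((0, 0), 0)` returns `(0, 0)` unchanged because the `row > 0` guard blocks moves off the grid.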
```python
# TASK 2.1
# Complete the get_action function (in Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on epsilon-greedy decision making policy, should be either 0, 1, 2, or 3
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using epsilon-greedy policy
    return action
```
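
A sketch of the epsilon-greedy choice for Task 2.1, assuming `q_table` is a NumPy array with one row per state and one column per action (the table's initialization code is not shown, so that shape is an assumption):

```python
import numpy as np

def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Epsilon-greedy: explore with probability epsilon,
    # otherwise pick the currently best-valued action for this state.
    if np.random.rand() < epsilon:
        action = np.random.randint(0, 4)  # uniform random action from {0, 1, 2, 3}
    else:
        action = int(np.argmax(q_table[current_state_index]))  # greedy action
    return action
```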
```python
# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
def update_q_table(q_table, r_table, current_state_index, action,
                   next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q-learning equations taught in class
    # YOUR CODE HERE
    return q_table
```
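
The standard tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). A sketch of Task 2.3 under the assumptions that `q_table` is a NumPy array indexed as `q_table[state_index, action]` and that `r_table` holds the reward for taking `action` in the current state; the tables' actual layout is not shown in the source, so both indexings are assumptions:

```python
import numpy as np

def update_q_table(q_table, r_table, current_state_index, action,
                   next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Tabular Q-learning update:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    reward = r_table[current_state_index, action]  # assumed r_table layout
    best_next = np.max(q_table[next_state_index])  # max over a' of Q(s', a')
    q_table[current_state_index, action] += alpha * (
        reward + gamma * best_next - q_table[current_state_index, action]
    )
    return q_table
```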
```python
# TASKS 2.2 and 2.4: Q-learning algorithm (following epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes
#                 needed to train the agent to find the optimal path)
#   grid_size: To try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g,
               num_episodes=EPISODES, alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos  # current_state_pos has the current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1]
```
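
The given skeleton breaks off inside the `while` loop. Below is one way the remaining body could wire Tasks 2.1-2.4 together; it is a sketch, not the assignment's expected solution. The termination test (the episode ends when the agent reaches `goal_pos`) is an assumption, and the `q_table_g`/`r_table_g` defaults from the skeleton's header are dropped here only because their initialization code is not shown.

```python
def q_learning(start_pos, goal_pos, q_table, r_table, num_episodes=1000,
               alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    # num_episodes=1000 is a hypothetical default; the skeleton uses the global EPISODES
    for episode in range(num_episodes):
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos
        done = False
        while not done:
            # [Task 2.1] choose an action with the epsilon-greedy policy
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] take the action and compute the next state's index
            next_state_pos = get_next_state(current_state_pos, action, grid_size)
            next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]
            # [Task 2.3] update the Q-table from the observed transition
            q_table = update_q_table(q_table, r_table, current_state_index,
                                     action, next_state_index, alpha, gamma)
            # [Task 2.4] advance the agent; end the episode at the goal
            current_state_pos = next_state_pos
            current_state_index = next_state_index
            done = tuple(current_state_pos) == tuple(goal_pos)  # assumed termination condition
    return q_table
```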
