Question: For the Markov Decision Process (MDP), this method is called in a loop and is supposed to update the state value of each cell. Since it is already called in a loop, I did not think it needs another loop over iterations inside the method itself. I was also not sure whether I am calling computeQValue correctly.
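In other words (this is my reading of the assignment, written with the constants defined below): each call should apply the standard value-iteration backup Q(s, a) = ACTION_REWARD + GAMMA * (TRANSITION_SUCCEED * V(intended neighbor) + TRANSITION_FAIL/2 * V(one perpendicular neighbor) + TRANSITION_FAIL/2 * V(other perpendicular neighbor)) for each of the four actions, and then set the cell's state value to the maximum of its four Q-values.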
ACTION_EAST=0
ACTION_SOUTH=1
ACTION_WEST=2
ACTION_NORTH=3
TRANSITION_SUCCEED=0.8 #The probability that taking action A moves the agent to the intended destination state S' (the state the action aims to reach)
TRANSITION_FAIL=0.2 #The probability that taking action A moves the agent to an unintended neighboring state. For example, taking action East may move you to the north or south neighbor instead, each with probability 0.1; we assume the two perpendicular directions split TRANSITION_FAIL (0.2) evenly
GAMMA=0.9 #the discount factor
ACTION_REWARD=-0.1 #The instantaneous reward for taking each action (we assume all four actions (N/E/W/S) have the same reward)
CONVERGENCE=0.0000001 #The convergence threshold used as the stopping criterion
cur_convergence=100
#the function that calculates the Q-value for (s, action) and updates the state's data with it
#s is the state (cell)
#action is an integer 0-3: 0-east, 1-south, 2-west, 3-north
def computeQValue(s,action):
    pass #body not shown in the question
def valueIteration():
    print('Value Iteration.')
    #called in a loop
    #use computeQValue and update the state value of each cell
    #ideally the policy should be obtained in fewer than 100 iterations if possible
    #use cur_convergence and CONVERGENCE
    for i in range(3):
        states.q_value[i] = computeQValue(states, states.q_value)
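For reference, here is a minimal sketch of what computeQValue could look like, written against the constants above. It assumes the cells live in a global 2-D list states indexed as states[x][y] (the question's code also refers to a states variable but never shows how it is built), and it uses a helper neighbor_value and an offset table DELTAS that are not part of the original code; moves that would leave the grid are treated as leaving the agent in place.

#--- sketch, not from the original question ---
#(dx, dy) offsets for the four actions; assumes x grows to the east and y grows to the north
DELTAS = {ACTION_EAST: (1, 0), ACTION_SOUTH: (0, -1), ACTION_WEST: (-1, 0), ACTION_NORTH: (0, 1)}

def neighbor_value(s, direction):
    #hypothetical helper: state value of the neighbor in 'direction',
    #or of s itself when the move would leave the grid (the agent stays put)
    dx, dy = DELTAS[direction]
    x, y = s.location[0] + dx, s.location[1] + dy
    if 0 <= x < len(states) and 0 <= y < len(states[0]):
        return states[x][y].state_value
    return s.state_value

def computeQValue(s, action):
    #Bellman backup for one (state, action) pair:
    #TRANSITION_SUCCEED (0.8) to the intended neighbor,
    #TRANSITION_FAIL/2 (0.1) to each of the two perpendicular neighbors
    side_a = (action + 1) % 4   #with the E/S/W/N numbering, +/-1 are the perpendicular directions
    side_b = (action - 1) % 4
    expected = (TRANSITION_SUCCEED * neighbor_value(s, action)
                + (TRANSITION_FAIL / 2) * neighbor_value(s, side_a)
                + (TRANSITION_FAIL / 2) * neighbor_value(s, side_b))
    q = ACTION_REWARD + GAMMA * expected
    s.q_values[action] = q   #store the Q-value on the cell, as the original comment describes
    return q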
Here is the Cell class used for each grid cell:
class Cell:
    def __init__(self,x,y):
        self.q_values=[0.0,0.0,0.0,0.0]
        self.location=(x,y)
        self.state_value=max(self.q_values)
        self.policy=0
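And a sketch of how valueIteration and the surrounding convergence loop could fit together, again only as one possible reading of the assignment: the 4x3 grid size is a placeholder, and the sweep updates values in place, so later cells in a sweep already see their neighbors' new values.

#--- sketch, not from the original question ---
GRID_WIDTH, GRID_HEIGHT = 4, 3   #placeholder grid size
states = [[Cell(x, y) for y in range(GRID_HEIGHT)] for x in range(GRID_WIDTH)]

def valueIteration():
    #one sweep over all cells: recompute all four Q-values, then take the max
    global cur_convergence
    largest_change = 0.0
    for column in states:
        for s in column:
            old_value = s.state_value
            for action in range(4):              #all four actions, not range(3)
                computeQValue(s, action)         #fills s.q_values[action]
            s.state_value = max(s.q_values)
            s.policy = s.q_values.index(s.state_value)   #greedy action for this cell
            largest_change = max(largest_change, abs(s.state_value - old_value))
    cur_convergence = largest_change             #how much the values moved in this sweep

#outer loop: keep sweeping until the largest change drops below CONVERGENCE
iteration = 0
while cur_convergence > CONVERGENCE and iteration < 100:
    valueIteration()
    iteration += 1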
