Question: For a Markov Decision Process (MDP), this method is called in a loop and is supposed to update the state value of each cell. Since it is already called in a loop, I did not think it needed another loop inside it. I am not sure whether I am using computeQValue correctly.

ACTION_EAST = 0
ACTION_SOUTH = 1
ACTION_WEST = 2
ACTION_NORTH = 3

# Probability that taking action A moves to the expected destination state S',
# i.e. the state that action A aims to move to.
TRANSITION_SUCCEED = 0.8

# Probability that taking action A moves to an unexpected destination state.
# For example, taking action East may move to the neighboring direction North
# or South instead. The two directions evenly split TRANSITION_FAIL, so each
# has probability 0.1.
TRANSITION_FAIL = 0.2

GAMMA = 0.9  # the discount factor
ACTION_REWARD = -0.1  # instantaneous reward for taking each action (the same for all four actions N/E/W/S)
CONVERGENCE = 0.0000001  # convergence threshold used as the stopping criterion
cur_convergence = 100
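For reference, these constants suggest the usual Bellman backup for a single action. The following is only a sketch under assumptions: the reward R is taken to be ACTION_REWARD paid on every action, and s_a, s_l and s_r are names introduced here for the cells reached by the intended move and by the two perpendicular slips.

```latex
Q(s, a) = R + \gamma \left[ 0.8\, V(s_a) + 0.1\, V(s_l) + 0.1\, V(s_r) \right],
\qquad
V(s) = \max_{a} Q(s, a),
\qquad
R = -0.1,\ \gamma = 0.9
```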

# Computes the Q-value for a state-action pair and updates the state's data with it.
# s is the state (the cell).
# action is a value 0-3: 0 = east, 1 = south, 2 = west, 3 = north.

def computeQValue(s,action):

def valueIteration():
    print('Value Iteration.')
    # Called in a loop.
    # Use computeQValue to update the state value of each cell.
    # Ideally the policy should be obtained in fewer than 100 iterations.
    # Use cur_convergence and CONVERGENCE.
    for i in range(3):
        states.q_value[i] = computeQvalue(states, states.q_value)

Here is the Cell class:

class Cell:
    def __init__(self, x, y):
        self.q_values = [0.0, 0.0, 0.0, 0.0]
        self.location = (x, y)
        self.state_value = max(self.q_values)
        self.policy = 0
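One possible way to wire these pieces together is sketched below. It is not the assignment's reference solution, and it rests on several assumptions not stated in the question: the cells live in a hypothetical `grid` dict keyed by `(x, y)`, `y` grows toward the south, a move that would leave the grid keeps the agent in place, and there are no terminal cells. `computeQValue` and `valueIteration` take `grid` as an extra argument here, and `neighbor` and `MOVES` are helper names introduced for illustration. The sketch loops over all four actions with `range(4)` and passes a single cell and a single action to `computeQValue`.

```python
ACTION_EAST, ACTION_SOUTH, ACTION_WEST, ACTION_NORTH = 0, 1, 2, 3
TRANSITION_SUCCEED = 0.8
TRANSITION_FAIL = 0.2
GAMMA = 0.9
ACTION_REWARD = -0.1
CONVERGENCE = 0.0000001

# (dx, dy) per action. Assumption: y grows toward the south.
MOVES = {ACTION_EAST: (1, 0), ACTION_SOUTH: (0, 1),
         ACTION_WEST: (-1, 0), ACTION_NORTH: (0, -1)}


class Cell:
    def __init__(self, x, y):
        self.q_values = [0.0, 0.0, 0.0, 0.0]
        self.location = (x, y)
        self.state_value = max(self.q_values)
        self.policy = 0


def neighbor(grid, s, action):
    """Cell reached by moving in `action`; stay put if that leaves the grid."""
    dx, dy = MOVES[action]
    x, y = s.location
    return grid.get((x + dx, y + dy), s)


def computeQValue(grid, s, action):
    """One Bellman backup for a single (cell, action) pair."""
    # With the ordering E, S, W, N the two perpendicular actions are
    # one step before and one step after `action`, modulo 4.
    side_a, side_b = (action + 1) % 4, (action + 3) % 4
    expected = (TRANSITION_SUCCEED * neighbor(grid, s, action).state_value
                + TRANSITION_FAIL / 2 * neighbor(grid, s, side_a).state_value
                + TRANSITION_FAIL / 2 * neighbor(grid, s, side_b).state_value)
    return ACTION_REWARD + GAMMA * expected


def valueIteration(grid):
    """One sweep over all cells; returns the largest change in state value."""
    # Compute every new Q-value from the old state values first,
    # so the sweep is a synchronous update.
    new_q = {loc: [computeQValue(grid, s, a) for a in range(4)]
             for loc, s in grid.items()}
    max_change = 0.0
    for loc, s in grid.items():
        s.q_values = new_q[loc]
        new_value = max(s.q_values)
        s.policy = s.q_values.index(new_value)
        max_change = max(max_change, abs(new_value - s.state_value))
        s.state_value = new_value
    return max_change


# Outer loop: repeat sweeps until the largest change drops below CONVERGENCE.
grid = {(x, y): Cell(x, y) for x in range(3) for y in range(3)}
cur_convergence = 100
while cur_convergence > CONVERGENCE:
    cur_convergence = valueIteration(grid)
```

In this arrangement the loop over actions sits inside the sweep, while the outer convergence loop only decides when to stop. That is one way to read "called in a loop" without duplicating the outer loop.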
