Question: Edit the following code.


# -*- coding: utf-8 -*-
"""
Created on Sat May 16 13:24:11 2020

@author: ACAN
"""

# The value iteration algorithm
import numpy as np

""" A SIMPLE EXAMPLE Suppose a 3x4 Environment that has an obstacle at location (2,2). Action probabilities: P(intended Direction)=0.8, P(perpendicular Directions)=0.1. When the agent hits a wall, it bounces back to its current cell. """

# Immediate reward of every state (all zeros in this example)
R = np.array([[0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

# Environment grid padded with a border of walls:
# -1 = wall, obstacle or the -1 terminal cell; +1 = goal; 0 = free cell
E = np.array([[-1, -1, -1, -1, -1, -1],
              [-1,  0,  0,  0, +1, -1],
              [-1,  0, -1,  0, -1, -1],
              [-1,  0,  0,  0,  0, -1],
              [-1, -1, -1, -1, -1, -1]])

# Initial values; the terminal cells are fixed at +1 and -1
V0 = np.array([[0., 0., 0., +1.],
               [0., 0., 0., -1.],
               [0., 0., 0., 0.]])
V = V0.copy()   # copy so that updates to V do not also overwrite V0

Rows = 5
Cols = 6
Goal_Pos = [(0, 3)]
Value_in_GoalPos = +1

# Actions: go North, South, East or West probabilistically, such that
# moving in the intended direction happens with probability 0.7 and
# moving in each of the other three directions with probability 0.1.

epsilon = 1e-14   # Termination threshold
gamma = 0.9       # Discount factor

iter_count = 0    # Number of iterations until convergence

is_done = False
while not is_done:
    iter_count += 1
    Vprev = np.array(V)   # Vprev keeps the previous V values

    for i in range(Rows):
        for j in range(Cols):
            if E[i, j] == 0:   # For each free state
                # Discounted one-step value of each neighbor cell; if the
                # neighbor is a wall the agent bounces back to (i, j).
                # V indices are offset by (-1, -1) from E indices because
                # of the wall border around E.
                if not (E[i-1, j] == -1):
                    U_North = R[i-1, j-1] + gamma*Vprev[i-2, j-1]
                else:
                    U_North = R[i-1, j-1] + gamma*Vprev[i-1, j-1]
                if not (E[i+1, j] == -1):
                    U_South = R[i-1, j-1] + gamma*Vprev[i, j-1]
                else:
                    U_South = R[i-1, j-1] + gamma*Vprev[i-1, j-1]
                if not (E[i, j-1] == -1):
                    U_West = R[i-1, j-1] + gamma*Vprev[i-1, j-2]
                else:
                    U_West = R[i-1, j-1] + gamma*Vprev[i-1, j-1]
                if not (E[i, j+1] == -1):
                    U_East = R[i-1, j-1] + gamma*Vprev[i-1, j]
                else:
                    U_East = R[i-1, j-1] + gamma*Vprev[i-1, j-1]

                # Expected value of each action: probability 0.7 for the
                # intended direction, 0.1 for each of the other three
                North_Move = 0.7*U_North + 0.1*U_West + 0.1*U_East + 0.1*U_South
                South_Move = 0.7*U_South + 0.1*U_West + 0.1*U_East + 0.1*U_North
                West_Move = 0.7*U_West + 0.1*U_South + 0.1*U_North + 0.1*U_East
                East_Move = 0.7*U_East + 0.1*U_South + 0.1*U_North + 0.1*U_West

                if (i-1, j-1) not in Goal_Pos:
                    # Bellman optimality backup (greedy over actions):
                    # V[i-1,j-1]=max([North_Move,South_Move,West_Move,East_Move])
                    # Evaluation of the uniformly random policy instead:
                    V[i-1, j-1] = 0.25*(North_Move+South_Move+West_Move+East_Move)

    # Terminate once successive value functions are essentially identical
    if sum(sum((Vprev-V)**2)) < epsilon:
        is_done = True

print('Converged after', iter_count, 'iterations:')
print(V)
print(' ')

# Derive a greedy strategy: from each cell of the 3x4 grid, move towards
# the in-grid neighbor with the highest converged value.
V_Strategy = [[]]
for i in range(Rows-2):
    for j in range(Cols-2):
        # Collect the neighbors of (i, j) that lie inside the 3x4 grid
        if i == 0:
            if j == 0:
                NBList = [(i, j+1), (i+1, j)]
            elif j == Cols-3:
                NBList = [(i, j-1), (i+1, j)]
            else:
                NBList = [(i, j-1), (i, j+1), (i+1, j)]
        elif i == Rows-3:
            if j == 0:
                NBList = [(i, j+1), (i-1, j)]
            elif j == Cols-3:
                NBList = [(i, j-1), (i-1, j)]
            else:
                NBList = [(i, j-1), (i, j+1), (i-1, j)]
        elif j == 0:
            NBList = [(i-1, j), (i+1, j), (i, j+1)]
        elif j == Cols-3:
            NBList = [(i-1, j), (i+1, j), (i, j-1)]
        else:
            NBList = [(i, j-1), (i, j+1), (i-1, j), (i+1, j)]

        # Find the neighbor with the maximum value
        Max = V[NBList[0][0], NBList[0][1]]
        Max_Tuple = NBList[0]
        for k in range(1, len(NBList)):
            if V[NBList[k][0], NBList[k][1]] > Max:
                Max = V[NBList[k][0], NBList[k][1]]
                Max_Tuple = NBList[k]

        # Store an arrow towards the best neighbor, unless it is a wall
        # (E indices are offset by (+1, +1) from V indices)
        if not (E[Max_Tuple[0]+1, Max_Tuple[1]+1] == -1):
            if Max_Tuple[0] == i:
                if Max_Tuple[1] > j:
                    V_Strategy[i].append([u'\u2192'])   # right
                else:
                    V_Strategy[i].append([u'\u2190'])   # left
            elif Max_Tuple[1] == j:
                if Max_Tuple[0] > i:
                    V_Strategy[i].append([u'\u2193'])   # down
                else:
                    V_Strategy[i].append([u'\u2191'])   # up
        else:
            V_Strategy[i].append([" "])
    V_Strategy.append([])

# Print the 3x4 strategy grid
for i in range(Rows-2):
    for j in range(Cols-2):
        print(V_Strategy[i][j][0], end='')
    print()
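For comparison, here is a minimal, self-contained sketch of the same grid world solved with the greedy (max) Bellman backup from the commented-out line above, rather than the uniform-policy average. The function name value_iteration and its parameters are illustrative, not part of the original assignment; the immediate reward R is zero everywhere in this example and is therefore omitted.

import numpy as np

def value_iteration(E, V0, gamma=0.9, p_main=0.7, p_side=0.1, eps=1e-14):
    # Bellman optimality backups on the wall-bordered grid E, where
    # E == 0 marks free cells and E == -1 cells bounce the agent back.
    V = V0.copy()
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # N, S, W, E offsets
    while True:
        Vprev = V.copy()
        for i in range(1, E.shape[0] - 1):
            for j in range(1, E.shape[1] - 1):
                if E[i, j] != 0:
                    continue   # skip walls, the obstacle and terminal cells
                # Discounted value of landing in each neighbor; bounce back
                # to (i, j) when the neighbor is a wall (R is all zeros here)
                u = []
                for di, dj in moves:
                    ni, nj = (i + di, j + dj) if E[i + di, j + dj] != -1 else (i, j)
                    u.append(gamma * Vprev[ni - 1, nj - 1])
                # Expected return of each action, then the greedy backup
                q = [p_main*u[a] + p_side*(sum(u) - u[a]) for a in range(4)]
                V[i - 1, j - 1] = max(q)
        if np.sum((Vprev - V)**2) < eps:
            return V

print(value_iteration(E, V0))

With the max backup the values describe the best the agent can achieve from each cell, which is what the greedy arrow-printing strategy at the end of the script implicitly assumes.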

Q.1. Consider the following MDP model of a state-space system (discount factor γ = 0.99).

[Figure: a six-state transition diagram (S1-S6) with state rewards r = -5, r = -100, r = 10 and r = 100 and transition probabilities including 0.5, 0.95, 0.05, 0.2, 0.8, 0.9 and 0.1 on the edges; the exact structure is not recoverable from the extracted text.]

i. Use the value iteration algorithm and compute the state values.
ii. Based on the state values, determine the optimal policies from each state.
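The transition diagram for Q.1 did not survive extraction, so the exact probabilities and rewards cannot be reproduced here. The sketch below only shows how value iteration for such a six-state MDP could be set up once P and R are read off the figure; the arrays P and R and the action count n_actions are placeholders, not the actual model.

import numpy as np

n_states, n_actions = 6, 2    # six states S1..S6; the action count is assumed
gamma = 0.99                  # discount factor given in the question

# Placeholder model -- fill these in from the figure:
#   P[a, s, s'] = probability of moving from s to s' under action a
#   R[s]        = immediate reward of state s
P = np.zeros((n_actions, n_states, n_states))
R = np.zeros(n_states)

V = np.zeros(n_states)
for _ in range(100000):                 # value iteration until convergence
    Q = R[None, :] + gamma * (P @ V)    # Q[a, s] = R(s) + gamma*E[V(s') | s, a]
    V_new = Q.max(axis=0)               # greedy (Bellman optimality) backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=0)               # optimal action index for each state
print(V, policy)

Part (ii) then follows by reading the optimal action for each state off the argmax over the converged Q values.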
