Question: Please help modify this code, cart.py, so that it passes the test case .test_cart.py in Python:


#!/usr/bin/python

import argparse
import logging
import sys

import numpy as np

import gym
#import gym.scoreboard.scoring
from gym import wrappers, logger


def discretize_state(x, xdot, theta, thetadot):
    # Thresholds in radians (fifty_degrees is used for the angular velocity, rad/s)
    one_degree = 0.0174532
    six_degrees = 0.1047192
    twelve_degrees = 0.2094384
    fifty_degrees = 0.87266

    box = 0
    # Cart off the track or pole past 12 degrees: failure state
    if x < -2.4 or x > 2.4 or theta < -twelve_degrees or theta > twelve_degrees:
        return -1

    # Cart position: 3 buckets
    if x < -0.08:
        box = 0
    elif x < 0.08:
        box = 1
    else:
        box = 2

    # Cart velocity: 3 buckets
    box *= 3
    if xdot < -0.5:
        box += 0
    elif xdot < 0.5:
        box += 1
    else:
        box += 2

    # Pole angle: 6 buckets (elif chain, so every angle lands in exactly one bucket)
    box *= 6
    if theta < -six_degrees:
        box += 0
    elif theta < -one_degree:
        box += 1
    elif theta < 0:
        box += 2
    elif theta < one_degree:
        box += 3
    elif theta < six_degrees:
        box += 4
    else:
        box += 5

    # Pole angular velocity: 3 buckets
    box *= 3
    if thetadot < -fifty_degrees:
        box += 0
    elif thetadot < fifty_degrees:
        box += 1
    else:
        box += 2

    return box
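The repeated box *= 3, box *= 6, box *= 3 steps in discretize_state treat the four bucket indices as digits of a mixed-radix number, which is why Q is allocated with 3 x 3 x 6 x 3 = 162 rows. The sketch below makes that explicit, assuming only NumPy; the helper names manual_box, buckets_to_box and box_to_buckets are hypothetical, and box_to_buckets is mainly handy for reading a saved Q-table row by row.

```python
import numpy as np

# Buckets per variable, in the order discretize_state combines them:
# x (3), xdot (3), theta (6), thetadot (3) -> 3 * 3 * 6 * 3 = 162 boxes
BUCKETS = (3, 3, 6, 3)


def manual_box(x_i, xdot_i, theta_i, thetadot_i):
    """The box *= 3 / *= 6 / *= 3 steps from discretize_state, written out."""
    box = x_i
    box = box * 3 + xdot_i
    box = box * 6 + theta_i
    box = box * 3 + thetadot_i
    return box


def buckets_to_box(x_i, xdot_i, theta_i, thetadot_i):
    """Same mapping via NumPy's row-major (C-order) index flattening."""
    return int(np.ravel_multi_index((x_i, xdot_i, theta_i, thetadot_i), BUCKETS))


def box_to_buckets(box):
    """Inverse mapping: which bucket of each variable a Q-table row stands for."""
    return tuple(int(i) for i in np.unravel_index(box, BUCKETS))


# Every bucket combination gets its own row, so the table needs exactly 162 rows
rows = [manual_box(*b) for b in np.ndindex(*BUCKETS)]
print(len(rows), min(rows), max(rows), len(set(rows)))  # 162 0 161 162
print(box_to_buckets(82))  # (1, 1, 3, 1)
```

Row 82, for instance, is the "centred" box: middle position bucket, middle velocity bucket, pole angle in [0, 1 degree), and middle angular-velocity bucket.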

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument('env_id', nargs='?', default='CartPole-v0',
                        help='Select the environment to run')
    args = parser.parse_args()

    logger = logging.getLogger()
    formatter = logging.Formatter('[%(asctime)s] %(message)s')
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # You can set the level to logging.DEBUG or logging.WARN if you
    # want to change the amount of output.
    logger.setLevel(logging.INFO)

    env = gym.make(args.env_id)
    outdir = '/tmp/' + 'qagent' + '-results'
    # wrappers.Monitor and env.seed() exist only in older gym releases
    env = wrappers.Monitor(env, outdir, write_upon_reset=True, force=True)

    env.seed(0)

    # 162 discrete states (3 x 3 x 6 x 3 buckets) x number of actions
    Q = np.zeros([162, env.action_space.n])

    alpha = 0.7   # learning rate
    gamma = 0.97  # discount factor

    n_episodes = 50001
    for episode in range(n_episodes):
        tick = 0
        reward = 0
        done = False
        state = env.reset()
        s = discretize_state(state[0], state[1], state[2], state[3])
        while not done:
            tick += 1
            # Greedy action: the action with the largest Q-value in state s
            action = 0
            ri = -999
            for q in range(env.action_space.n):
                if Q[s][q] > ri:
                    action = q
                    ri = Q[s][q]
            state, reward, done, info = env.step(action)
            #print(reward, done)
            sprime = discretize_state(state[0], state[1], state[2], state[3])
            predicted_value = np.max(Q[sprime])
            if sprime < 0:
                # Failure state: no future value and a penalty of -5
                predicted_value = 0
                reward = -5
            #Q[s, action] += 0
            #Q[s,action] += (1-alpha)*Q[s,action] + alpha*(ri + gamma*predicted_value)
            #implement equation here.
            Q[s, action] += alpha * (reward + gamma * predicted_value - Q[s, action])
            #print(Q[s,action], ri, sprime, Q[s][action])
            s = sprime

        if episode % 1000 == 0:
            alpha *= .99  # decay rate for alpha, each 1000

        if tick < 199:
            if episode % 1000 == 0:
                print("fail ", tick)
        else:
            print("success")

Test case:

#!/usr/bin/env python3
from cart import CartPole
import unittest
import numpy as np


class TestTicTacToe(unittest.TestCase):
    # def test_init_board(self):
    #     ttt = TicTacToe3D()
    #     # brd,winner = ttt.play_game()
    #     self.assertEqual(ttt.board.shape, (3,3,3))

    def test_1(self):
        player_first = 1
        expected_winner = 1
        env_id = 'CartPole-v1'
        cartpole = CartPole(env_id, False, True, 'cart.npy')
        all_states = cartpole.run()
        max_ = np.max(all_states, axis=0)
        print("max = {}".format(max_))
        result_1 = max_[0] <= 2.4
        result_2 = max_[2] <= 0.226893
        result = result_1 and result_2
        print("Your max cart position = {}".format(max_[0]))
        print("Your max pole angle = {}".format(max_[2]))
        print("Cart position for success <= {}".format(2.4))
        print("Pole angle for success <= {} radians".format(0.226893))
        self.assertEqual(result, True)


unittest.main()
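The test imports a CartPole class from cart.py, constructs it as CartPole(env_id, False, True, 'cart.npy'), and expects run() to return the visited states so that np.max(all_states, axis=0) can be checked against 2.4 (cart position) and 0.226893 radians (pole angle). The class below is a minimal sketch of one way to fit that interface, not a verified solution. The meanings of the positional arguments (render, test, q_file), the method names discretize, act and update, and the alpha/gamma defaults are assumptions; the buckets, greedy choice, -5 penalty and alpha decay are carried over from cart.py. gym is imported only inside run(), and both the 4-tuple env.step() of older gym and the 5-tuple of gym 0.26+ are handled.

```python
import numpy as np

# Bucket edges copied from the thresholds in discretize_state (angles in radians)
X_EDGES = [-0.08, 0.08]
XDOT_EDGES = [-0.5, 0.5]
THETA_EDGES = [-0.1047192, -0.0174532, 0.0, 0.0174532, 0.1047192]
THETADOT_EDGES = [-0.87266, 0.87266]


class CartPole:
    """Hypothetical wrapper matching CartPole(env_id, False, True, 'cart.npy').

    Assumed argument meanings: render (stored but unused in this sketch),
    test (play greedily with a saved Q-table instead of training) and
    q_file (path used with np.save / np.load).
    """

    def __init__(self, env_id, render=False, test=False, q_file="cart.npy",
                 alpha=0.7, gamma=0.97):
        self.env_id = env_id
        self.render = render
        self.test = test
        self.q_file = q_file
        self.alpha = alpha
        self.gamma = gamma
        # Test mode loads the table written by a training run (raises if missing);
        # otherwise start from zeros: 162 boxes x 2 CartPole actions
        self.Q = np.load(q_file) if test else np.zeros((162, 2))

    @staticmethod
    def discretize(obs):
        """Map (x, xdot, theta, thetadot) to a box in 0..161, or -1 on failure."""
        x, xdot, theta, thetadot = obs
        if abs(x) > 2.4 or abs(theta) > 0.2094384:
            return -1
        idx = (np.digitize(x, X_EDGES), np.digitize(xdot, XDOT_EDGES),
               np.digitize(theta, THETA_EDGES),
               np.digitize(thetadot, THETADOT_EDGES))
        return int(np.ravel_multi_index(idx, (3, 3, 6, 3)))

    def act(self, s):
        """Greedy action; np.argmax breaks ties toward action 0."""
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, reward, s_next):
        """One Q-learning step, using the -5 failure target from cart.py."""
        if s_next < 0:
            target = -5.0
        else:
            target = reward + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

    def run(self, n_episodes=1):
        """Play episodes and return every observation as an (N, 4) array."""
        import gym  # imported here so the methods above work without gym

        env = gym.make(self.env_id)
        states = []
        for episode in range(n_episodes):
            obs = env.reset()
            if isinstance(obs, tuple):  # gym >= 0.26 returns (obs, info)
                obs = obs[0]
            states.append(obs)
            s = self.discretize(obs)
            done = False
            while not done:
                a = self.act(s)
                result = env.step(a)
                if len(result) == 5:  # gym >= 0.26: obs, reward, terminated, truncated, info
                    obs, reward, terminated, truncated, _ = result
                    done = terminated or truncated
                else:  # older gym: obs, reward, done, info
                    obs, reward, done, _ = result
                states.append(obs)
                s_next = self.discretize(obs)
                if not self.test:
                    self.update(s, a, reward, s_next)
                s = s_next
            if not self.test and episode % 1000 == 0:
                self.alpha *= 0.99  # same alpha decay as cart.py
        env.close()
        if not self.test:
            np.save(self.q_file, self.Q)
        return np.array(states)
```

Under these assumptions, a possible workflow is to train once with CartPole('CartPole-v1', False, False, 'cart.npy').run(n_episodes=50001), which writes cart.npy, and then run the unit test, which loads that table and plays one greedy episode. Whether a given trained table keeps the pole within the test's limits is not guaranteed by this sketch.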

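The active update line in the episode loop of cart.py is the standard tabular Q-learning rule. Written out, with alpha = 0.7 and gamma = 0.97 as in the code, and the bracketed target replaced by -5 whenever discretize_state returns -1 for the next state:

```latex
\begin{aligned}
Q(s,a) &\leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \\
       &= (1-\alpha)\,Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a')\right]
\end{aligned}
```

For example, with Q(s,a) = 0, r = 1 and a best next-state value of 2, the new value is 0.7 x (1 + 0.97 x 2) = 0.7 x 2.94 = 2.058. The second form appears to be what the commented-out line was aiming for, but as written that line uses += instead of = and ri (the greedy Q-value of s) instead of the reward, so it would not compute this update.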