Question:
For this homework I have to create the code described in the pseudocode. I wrote 22 lines of code following the pseudocode, its inline instructions, and the expectations of the print calls that follow training (epoch, loss, n_episodes, win count, and win history).

```python
def qtrain(model, maze, **opt):
    # exploration factor
    global epsilon

    # number of epochs
    n_epoch = opt.get('n_epoch', 15000)

    # maximum memory to store episodes
    max_memory = opt.get('max_memory', 1000)

    # maximum data size for training
    data_size = opt.get('data_size', 50)

    # start time
    start_time = datetime.datetime.now()

    # Construct environment/game from numpy array: maze (see above)
    qmaze = TreasureMaze(maze)

    # Initialize experience replay object
    experience = GameExperience(model, max_memory=max_memory)

    win_history = []              # history of win/lose game
    hsize = qmaze.maze.size // 2  # history window size
    win_rate = 0.0

    # pseudocode:
    # For each epoch:
    #    Agent_cell = randomly select a free cell
    #    Reset the maze with agent set to above position
    #      Hint: Review the reset method in the TreasureMaze.py class.
    #    envstate = Environment.current_state
    #      Hint: Review the observe method in the TreasureMaze.py class.
    #    While state is not game over:
    #        previous_envstate = envstate
    #        Action = randomly choose action (left, right, up, down) either by exploration or by exploitation
    #        envstate, reward, game_status = qmaze.act(action)
    #          Hint: Review the act method in the TreasureMaze.py class.
    #        episode = [previous_envstate, action, reward, envstate, game_status]
    #        Store episode in Experience replay object
    #          Hint: Review the remember method in the GameExperience.py class.
    #        Train neural network model and evaluate loss
    #          Hint: Call GameExperience.get_data to retrieve training data (input and target) and pass to
    #                model.fit to train the model. You can call model.evaluate to determine loss.
    #    If the win rate is above the threshold and your model passes the completion check, that would be your epoch.
```
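The pseudocode's "either by exploration or by exploitation" line is typically realized as an epsilon-greedy choice. A minimal sketch under that assumption (the helper name `choose_action` and the stub Q-values are illustrative, not part of the assignment code):

```python
import numpy as np

def choose_action(q_values, epsilon, num_actions=4):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if np.random.rand() < epsilon:
        # exploration: pick one of the 4 moves (left, right, up, down) at random
        return np.random.randint(num_actions)
    # exploitation: pick the move with the largest predicted Q-value
    return int(np.argmax(q_values))

# With epsilon = 0.0 the choice is purely greedy:
q = np.array([0.1, 0.9, 0.3, 0.2])
print(choose_action(q, epsilon=0.0))  # 1 (index of the largest Q-value)
```

Inside `qtrain`, `q_values` would come from `experience.predict(envstate)` and `epsilon` is the global exploration factor.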
```python
    for epoch in range(n_epoch):
        Agent_cell = qmaze.free_cells[np.random.randint(len(qmaze.free_cells))]
        qmaze.reset(Agent_cell)
        envstate = qmaze.observe()
        n_episodes = 0
        loss = 0.0
        while qmaze.game_status() == 'not_over':
            n_episodes += 1
            previous_envstate = envstate
            # Choose action by exploration or by exploitation (epsilon-greedy)
            if np.random.rand() < epsilon:
                action = np.random.randint(4)  # 4 actions: left, right, up, down
            else:
                action = np.argmax(experience.predict(envstate))
            envstate, reward, game_status = qmaze.act(action)
            episode = [previous_envstate, action, reward, envstate, game_status]
            experience.remember(episode)
            if qmaze.game_status() == 'win':
                win_history.append(1)
            if qmaze.game_status() == 'lose':
                win_history.append(0)
            win_rate = sum(win_history) / len(win_history)
            model_input, model_target = experience.get_data(data_size=data_size)
            model.fit(model_input, model_target, verbose=0, batch_size=data_size)
            loss = model.evaluate(model_input, model_target, verbose=0, batch_size=data_size)

        # Print the epoch, loss, episodes, win count, and win rate for each epoch
        dt = datetime.datetime.now() - start_time
        t = format_time(dt.total_seconds())
        template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
        print(template.format(epoch, n_epoch - 1, loss, n_episodes, sum(win_history), win_rate, t))

        # We simply check if training has exhausted all free cells and if in all
        # cases the agent won.
        if win_rate > 0.9:
            epsilon = 0.05
        if sum(win_history[-hsize:]) == hsize and completion_check(model, qmaze):
            print("Reached 100%% win rate at epoch: %d" % (epoch,))
            break

    # Determine the total time for training
    dt = datetime.datetime.now() - start_time
    seconds = dt.total_seconds()
    t = format_time(seconds)
    print("n_epoch: %d, max_mem: %d, data: %d, time: %s" % (epoch, max_memory, data_size, t))
    return seconds

# This is a small utility for printing readable time strings:
def format_time(seconds):
    if seconds < 400:
        s = float(seconds)
        return "%.1f seconds" % (s,)
    elif seconds < 4000:
        m = seconds / 60.0
        return "%.2f minutes" % (m,)
    else:
        h = seconds / 3600.0
        return "%.2f hours" % (h,)

model = build_model(maze)
# Note: qtrain reads opt.get('n_epoch'), so pass n_epoch= rather than epochs=
qtrain(model, maze, n_epoch=1000, max_memory=8 * maze.size, data_size=32)
```

This part of the code is throwing me off:

```python
# We simply check if training has exhausted all free cells and if in all
# cases the agent won.
if win_rate > 0.9:
    epsilon = 0.05
if sum(win_history[-hsize:]) == hsize and completion_check(model, qmaze):
    print("Reached 100%% win rate at epoch: %d" % (epoch,))
    break
```

It would seem that they want me to play more than one game per epoch, or to check each move as a win or loss; otherwise I'm not sure how I am expected to "reach 100% at epoch X." If the win rate rolls over with each epoch, one loss will deny a 100% win rate, but one win will also automatically achieve a 100% win rate. Alternatively, if the win rate does not carry over from epoch to epoch, then it must be calculated within each epoch, either by counting each action as a win or loss or by playing more than one game to accumulate a win rate.
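For reference, the slice `win_history[-hsize:]` in the snippet above looks only at the most recent `hsize` games (one entry is appended per epoch, since each epoch plays a single game to completion), so a single win neither grants nor a single old loss denies the 100% check. A small sketch with hypothetical values:

```python
hsize = 4  # assumed window size (the real code uses qmaze.maze.size // 2)
win_history = [0, 1, 1, 0, 1, 1, 1, 1]  # hypothetical record: one game per epoch

last_window = win_history[-hsize:]   # only the most recent hsize games
print(last_window)                   # [1, 1, 1, 1]
print(sum(last_window) == hsize)     # True: all of the last hsize games were wins
```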