
Question: For this homework I have to create the code described in the pseudocode. I wrote roughly 22 lines of code following the instructions in the pseudocode and the expectations set by the print calls during training (epoch, loss, n_episodes, win count, and win rate).

    import datetime

    import numpy as np

    from TreasureMaze import TreasureMaze
    from GameExperience import GameExperience

    # build_model and completion_check are provided elsewhere in the assignment notebook.
    epsilon = 0.1  # exploration factor (set globally in the starter notebook)


    def qtrain(model, maze, **opt):
        # exploration factor
        global epsilon

        # number of epochs
        n_epoch = opt.get('n_epoch', 15000)

        # maximum memory to store episodes
        max_memory = opt.get('max_memory', 1000)

        # maximum data size for training
        data_size = opt.get('data_size', 50)

        # start time
        start_time = datetime.datetime.now()

        # Construct environment/game from numpy array: maze (see above)
        qmaze = TreasureMaze(maze)

        # Initialize experience replay object
        experience = GameExperience(model, max_memory=max_memory)

        win_history = []               # history of win/lose game
        hsize = qmaze.maze.size // 2   # history window size
        win_rate = 0.0

        # pseudocode:
        # For each epoch:
        #    Agent_cell = randomly select a free cell
        #    Reset the maze with agent set to above position
        #    Hint: Review the reset method in the TreasureMaze.py class.
        #    envstate = Environment.current_state
        #    Hint: Review the observe method in the TreasureMaze.py class.
        #    While state is not game over:
        #        previous_envstate = envstate
        #        Action = randomly choose action (left, right, up, down) either by exploration or by exploitation
        #        envstate, reward, game_status = qmaze.act(action)
        #        Hint: Review the act method in the TreasureMaze.py class.
        #        episode = [previous_envstate, action, reward, envstate, game_status]
        #        Store episode in Experience replay object
        #        Hint: Review the remember method in the GameExperience.py class.
        #        Train neural network model and evaluate loss
        #        Hint: Call GameExperience.get_data to retrieve training data (input and target) and pass to
        #              the model.fit method to train the model. You can call model.evaluate to determine loss.
        #    If the win rate is above the threshold and your model passes the completion check, that would be your epoch.
        for epoch in range(n_epoch):
            Agent_cell = qmaze.free_cells[np.random.randint(len(qmaze.free_cells))]
            qmaze.reset(Agent_cell)
            envstate = qmaze.observe()
            n_episodes = 0   # was "n_episodes, loss = 0", which fails to unpack
            loss = 0.0

            while qmaze.game_status() == 'not_over':
                n_episodes += 1
                previous_envstate = envstate
                # Epsilon-greedy choice per the pseudocode: explore with probability
                # epsilon, otherwise exploit the model's current Q-value estimates.
                # valid_actions() is assumed to be provided by TreasureMaze.py
                # alongside reset/observe/act.
                if np.random.rand() < epsilon:
                    action = np.random.choice(qmaze.valid_actions())
                else:
                    action = np.argmax(experience.predict(envstate))
                envstate, reward, game_status = qmaze.act(action)

                episode = [previous_envstate, action, reward, envstate, game_status]
                experience.remember(episode)

            # One game is played per epoch, so each epoch appends one win/lose entry.
            if qmaze.game_status() == 'win':
                win_history.append(1)
            if qmaze.game_status() == 'lose':
                win_history.append(0)
            win_rate = sum(win_history) / len(win_history)

            # Train on a batch sampled from experience replay and evaluate loss
            model_input, model_target = experience.get_data(data_size=data_size)
            model.fit(model_input, model_target, verbose=0, batch_size=data_size)
            loss = model.evaluate(model_input, model_target, verbose=0, batch_size=data_size)

            # Print the epoch, loss, episodes, win count, and win rate for each epoch
            dt = datetime.datetime.now() - start_time
            t = format_time(dt.total_seconds())
            template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
            print(template.format(epoch, n_epoch - 1, loss, n_episodes, sum(win_history), win_rate, t))

            # We simply check if training has exhausted all free cells and if in all
            # cases the agent won.
            if win_rate > 0.9:
                epsilon = 0.05
            if sum(win_history[-hsize:]) == hsize and completion_check(model, qmaze):
                print("Reached 100%% win rate at epoch: %d" % (epoch,))
                break

        # Determine the total time for training
        dt = datetime.datetime.now() - start_time
        seconds = dt.total_seconds()
        t = format_time(seconds)
        print("n_epoch: %d, max_mem: %d, data: %d, time: %s" % (epoch, max_memory, data_size, t))
        return seconds


    # This is a small utility for printing readable time strings:
    def format_time(seconds):
        if seconds < 400:
            s = float(seconds)
            return "%.1f seconds" % (s,)
        elif seconds < 4000:
            m = seconds / 60.0
            return "%.2f minutes" % (m,)
        else:
            h = seconds / 3600.0
            return "%.2f hours" % (h,)


    model = build_model(maze)
    # The option key is n_epoch; the original call passed epochs=1000, which
    # opt.get('n_epoch', 15000) silently ignores.
    qtrain(model, maze, n_epoch=1000, max_memory=8 * maze.size, data_size=32)

This block of code is throwing me off:

            # We simply check if training has exhausted all free cells and if in all
            # cases the agent won.
            if win_rate > 0.9:
                epsilon = 0.05
            if sum(win_history[-hsize:]) == hsize and completion_check(model, qmaze):
                print("Reached 100%% win rate at epoch: %d" % (epoch,))
                break

It would seem that they want me to play more than one game per epoch, or to count each move as a win or loss; otherwise I'm not sure how I am expected to "reach 100% at epoch X." If the win rate carries over from epoch to epoch, a single loss will deny a 100% win rate, but a single win will also automatically achieve a 100% win rate.
Alternatively, if the win rate does not carry over from epoch to epoch, then it must be calculated within each epoch, either by counting each action as a win or loss or by playing more than one game to accumulate a win rate.
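To make my confusion concrete, here is a minimal standalone sketch (the numbers are invented for illustration; win_history and hsize mirror the variables of the same names in qtrain) of how I read the windowed check: one game runs per epoch, each epoch appends one 0/1 entry, and the check looks only at the last hsize entries rather than the cumulative win_rate.

    # Minimal sketch with made-up values, not part of the assignment code.
    win_history = [0, 1, 0, 1, 1, 1, 1]   # one 0/1 entry per completed game/epoch
    hsize = 4                             # rolling window size (maze.size // 2 in qtrain)

    win_rate = sum(win_history) / len(win_history)         # cumulative rate: 5/7 ~ 0.714
    window_all_wins = sum(win_history[-hsize:]) == hsize   # last 4 entries all 1 -> True
    print(win_rate, window_all_wins)

If that reading is right, an early loss eventually ages out of the window instead of permanently denying the "Reached 100% win rate" message, and completion_check(model, qmaze) only runs once the last hsize games were all wins. Is that the intended behavior?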

 

