Question: I WILL GIVE POSITIVE FEEDBACK!!
Modify the values for the exploration factor, discount factor, and learning rates in the code to understand how those values affect the performance of the algorithm. Be sure to place each experiment in a different code block so that your instructor can view all of your changes. Note: Discount factor = GAMMA, learning rate = LEARNING_RATE, exploration factor = combination of EXPLORATION_MAX, EXPLORATION_MIN, and EXPLORATION_DECAY.
- Create a Markdown cell in your Jupyter Notebook after the code and its outputs. In this cell, you will be asked to analyze the code and relate it to the concepts from your readings. You are expected to include resources to support your answers, and must include citations for those resources. Specifically, you must address the following rubric criteria:
- Explain how reinforcement learning concepts apply to the cartpole problem.
- What is the goal of the agent in this case?
- What are the various state values?
- What are the possible actions that can be performed?
- What reinforcement algorithm is used for this problem?
- Analyze how experience replay is applied to the cartpole problem.
- How does experience replay work in this algorithm?
- What is the effect of introducing a discount factor for calculating the future rewards?
- Analyze how neural networks are used in deep Q-learning.
- Explain the neural network architecture that is used in the cartpole problem.
- How does the neural network make the Q-learning algorithm more efficient?
- What difference do you see in the algorithm performance when you increase or decrease the learning rate?
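The state values, actions, reward signal, and discount factor asked about in the rubric above can be checked directly against the environment. The short sketch below is illustrative only and is not part of the assignment code; it assumes the classic Gym API (gym < 0.26), the same API the DQN code further down uses, and the max_next_q number near the end is a made-up value used only to show how GAMMA down-weights future rewards.

import gym

env = gym.make("CartPole-v1")

# State: 4 continuous values - cart position, cart velocity,
# pole angle, and pole angular velocity.
print(env.observation_space.shape)   # (4,)
print(env.observation_space.high)
print(env.observation_space.low)

# Actions: 2 discrete choices - 0 = push cart left, 1 = push cart right.
print(env.action_space.n)            # 2

# Goal: keep the pole balanced as long as possible; the environment
# returns a reward of +1 for every step the pole stays up.
state = env.reset()
next_state, reward, done, info = env.step(env.action_space.sample())
print(reward, done)

# Effect of the discount factor on the Q-learning target:
#   target = reward + GAMMA * max_a' Q(next_state, a')
GAMMA = 0.95                          # same value as in the code below
max_next_q = 10.0                     # hypothetical value, for illustration only
print(1.0 + GAMMA * max_next_q)       # 10.5 - future reward counts less than immediate reward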
...CODE...
import random
import gym
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from scores.score_logger import ScoreLogger

ENV_NAME = "CartPole-v1"

# Hyperparameters: discount factor, learning rate, replay memory,
# and exploration (epsilon) schedule.
GAMMA = 0.95
LEARNING_RATE = 0.001

MEMORY_SIZE = 1000000
BATCH_SIZE = 20

EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.01
EXPLORATION_DECAY = 0.995


class DQNSolver:

    def __init__(self, observation_space, action_space):
        self.exploration_rate = EXPLORATION_MAX

        self.action_space = action_space
        self.memory = deque(maxlen=MEMORY_SIZE)

        # Q-network: two hidden layers of 24 ReLU units, linear output
        # with one Q-value per action.
        self.model = Sequential()
        self.model.add(Dense(24, input_shape=(observation_space,), activation="relu"))
        self.model.add(Dense(24, activation="relu"))
        self.model.add(Dense(self.action_space, activation="linear"))
        self.model.compile(loss="mse", optimizer=Adam(lr=LEARNING_RATE))

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        # Epsilon-greedy action selection: explore with probability exploration_rate.
        if np.random.rand() < self.exploration_rate:
            return random.randrange(self.action_space)
        q_values = self.model.predict(state)
        return np.argmax(q_values[0])

    def experience_replay(self):
        if len(self.memory) < BATCH_SIZE:
            return
        # Sample a random minibatch of stored transitions and fit the
        # network toward the Bellman targets.
        batch = random.sample(self.memory, BATCH_SIZE)
        for state, action, reward, state_next, terminal in batch:
            q_update = reward
            if not terminal:
                # Target: reward plus discounted best Q-value of the next state.
                q_update = (reward + GAMMA * np.amax(self.model.predict(state_next)[0]))
            q_values = self.model.predict(state)
            q_values[0][action] = q_update
            self.model.fit(state, q_values, verbose=0)
        # Decay exploration toward its minimum after each replay step.
        self.exploration_rate *= EXPLORATION_DECAY
        self.exploration_rate = max(EXPLORATION_MIN, self.exploration_rate)


def cartpole():
    # Written for the classic gym API (gym < 0.26): reset() returns the state
    # and step() returns (state, reward, done, info).
    env = gym.make(ENV_NAME)
    score_logger = ScoreLogger(ENV_NAME)
    observation_space = env.observation_space.shape[0]
    action_space = env.action_space.n
    dqn_solver = DQNSolver(observation_space, action_space)
    run = 0
    while True:
        run += 1
        state = env.reset()
        state = np.reshape(state, [1, observation_space])
        step = 0
        while True:
            step += 1
            #env.render()
            action = dqn_solver.act(state)
            state_next, reward, terminal, info = env.step(action)
            reward = reward if not terminal else -reward
            state_next = np.reshape(state_next, [1, observation_space])
            dqn_solver.remember(state, action, reward, state_next, terminal)
            state = state_next
            if terminal:
                print("Run: " + str(run) + ", exploration: " + str(dqn_solver.exploration_rate) + ", score: " + str(step))
                score_logger.add_score(step, run)
                break
            dqn_solver.experience_replay()
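As a starting point for the modification task in the prompt, the cell below sketches how one experiment block might look. The specific values (GAMMA = 0.80, EXPLORATION_DECAY = 0.9) are arbitrary examples, not recommended settings, and the cell assumes the code above has already been run in the same notebook kernel so that cartpole() and the hyperparameter globals exist. Repeating this pattern in separate cells, one changed value per cell, keeps every experiment visible to the instructor.

# Experiment cell sketch: override the module-level hyperparameters defined
# above, then retrain. Because DQNSolver reads GAMMA, LEARNING_RATE, and the
# EXPLORATION_* constants from the notebook's global namespace, reassigning
# them here takes effect for the new training run.
GAMMA = 0.80               # default above is 0.95; future rewards matter less
LEARNING_RATE = 0.001      # unchanged here; try 0.01 or 0.0001 in other cells
EXPLORATION_DECAY = 0.9    # default above is 0.995; exploration shrinks sooner

# Note: cartpole() runs an endless training loop, so interrupt the kernel
# (or modify the loop to stop after a fixed number of runs) once enough
# scores have been logged for comparison.
cartpole()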
