Question: Below I have attached the training.py code. Give the DQN Architecture 1 code, which takes the state and the action as input and returns only one Q-value. Integrate that code with training.py and give the result. Make sure you are getting an increasing reward trend as episodes increase, and do not use the offline data, as mentioned in the image. The other guidelines are in the image.
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras.optimizers import Adam
import keras.backend as K
from collections import deque
import random
# Constants
BATCH_SIZE = 64
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_MIN = 0.01
EPSILON_DECAY = 0.995
LEARNING_RATE = 0.001

# Dueling DQN Model Architecture
def create_dueling_dqn_model(input_shape, action_space):
    state_input = Input(shape=(input_shape,))
    x = Dense(512, activation='relu')(state_input)
    x = Dense(256, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    # State-value stream: a single scalar V(s), broadcast across actions when added to the advantages
    state_value = Dense(1)(x)
    state_value = Lambda(lambda s: K.expand_dims(s[:, 0], -1), output_shape=(action_space,))(state_value)
    # Advantage stream: one advantage per action, centred by subtracting the mean advantage
    action_advantage = Dense(action_space)(x)
    action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    q_values = Lambda(lambda w: w[0] + w[1], output_shape=(action_space,))([state_value, action_advantage])
    model = Model(inputs=state_input, outputs=q_values)
    model.compile(loss='mse', optimizer=Adam(lr=LEARNING_RATE))
    return model
# DQN Training Function
def DQN_training(env, offline_data, use_offline_data=False):
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    model = create_dueling_dqn_model(state_size, action_size)
    replay_buffer = deque(maxlen=2000)
    epsilon = EPSILON_START
    total_reward_per_episode = []
    for episode in range(1000):  # Number of episodes
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        total_reward = 0
        for time_step in range(500):  # Max steps in an episode
            if np.random.rand() <= epsilon:
                action = env.action_space.sample()  # Explore action space
            else:
                q_values = model.predict(state)
                action = np.argmax(q_values[0])  # Exploit learned values
            next_state, reward, done, _ = env.step(action)
            next_state = np.reshape(next_state, [1, state_size])
            total_reward += reward
            if not use_offline_data:  # Only save and learn if not using offline data
                replay_buffer.append((state, action, reward, next_state, done))
                if len(replay_buffer) > BATCH_SIZE:
                    minibatch = random.sample(replay_buffer, BATCH_SIZE)
                    for s, a, r, n_s, d in minibatch:
                        target = r
                        if not d:
                            target = r + GAMMA * np.amax(model.predict(n_s)[0])
                        target_f = model.predict(s)
                        target_f[0][a] = target
                        model.fit(s, target_f, epochs=1, verbose=0)
            state = next_state
            if done:
                break
        total_reward_per_episode.append(total_reward)
        # Update epsilon
        epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)
    return model, np.array(total_reward_per_episode)
# Replace this line with any initialization of the environment required before training
# env = gym.make('LunarLander-v2')
# Do not load offline data
use_offline_data = False
# Now you would call DQN_training like this:
# final_model, total_reward_per_episode = DQN_training(env, None, use_offline_data)
# After training, you'd save your model and plot the rewards.
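Note that the attached create_dueling_dqn_model above is an Architecture type 2 network: it takes only the state as input and returns one Q-value per action. The question (and the assignment text below) asks for Architecture type 1, where the state and the action are both inputs and the output is a single Q-value. A minimal sketch of such a network follows; the function name create_architecture1_dqn_model, the one-hot action encoding and the layer sizes are illustrative assumptions, not something fixed by the given skeleton.

from keras.models import Model
from keras.layers import Input, Dense, Concatenate
from keras.optimizers import Adam

def create_architecture1_dqn_model(state_size, action_size, learning_rate=0.001):
    # Two inputs: the state vector and a one-hot encoding of the chosen action
    state_input = Input(shape=(state_size,))
    action_input = Input(shape=(action_size,))
    x = Concatenate()([state_input, action_input])
    x = Dense(512, activation='relu')(x)
    x = Dense(256, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    # A single output neuron: the Q-value of the (state, action) pair
    q_value = Dense(1, activation='linear')(x)
    model = Model(inputs=[state_input, action_input], outputs=q_value)
    model.compile(loss='mse', optimizer=Adam(lr=learning_rate))
    return model

Such a model is fitted against a single scalar target per sample, e.g. model.fit([state_batch, one_hot_action_batch], q_target_batch), rather than against a full vector of Q-values.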
Section 3: Train DQN Model

In this section you will train two DQN models of Architecture type 1, i.e. the DQN model should accept the state and the action as input, and the output of the model should be the Q-value of the state-action pair given in the input. The first DQN model should be trained without the data collected in step 3 and the second one uses that data.

VERY IMPORTANT: If you code a DQN model of Architecture type 2 (i.e. a DQN model that accepts the state as input and outputs the Q-values of all state-action pairs), you will get a ZERO for this section. There will be NO MERCY in this regard.
Deliverables (75 marks): You are given a Python script training.py. This script contains the bare basic skeleton of the DQN training code along with a function that loads the data collected in step 3. You must NOT change the overall structure of the skeleton. There are two functions in training.py: DQN_training and plot_reward. Your task is to write the code for these two functions. A few additional instructions: this function MUST train a DQN of Architecture 1 (the DQN model should accept the state and the action as input and the output of the model should be the Q-value of the state-action pair given in the input).
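With an Architecture-1 network, the epsilon-greedy action selection and the TD target inside DQN_training can no longer come from a single model.predict(state) call followed by argmax; the network has to be evaluated once per candidate action. The sketch below shows one way this could be wired into the loop above, under the same old-Gym API assumptions; the helpers one_hot and q_values_for_all_actions are hypothetical names, not part of the provided training.py.

import numpy as np

def one_hot(action, action_size):
    # Hypothetical helper: one-hot encode a discrete action index as a (1, action_size) batch
    v = np.zeros((1, action_size))
    v[0, action] = 1.0
    return v

def q_values_for_all_actions(model, state, action_size):
    # Evaluate the Architecture-1 model once per candidate action and
    # collect the scalar Q-values into one vector of length action_size.
    return np.array([
        model.predict([state, one_hot(a, action_size)])[0, 0]
        for a in range(action_size)
    ])

# Inside the time-step loop of DQN_training (sketch only):
# if np.random.rand() <= epsilon:
#     action = env.action_space.sample()
# else:
#     action = int(np.argmax(q_values_for_all_actions(model, state, action_size)))
#
# TD target for a sampled transition (s, a, r, n_s, d):
# target = r if d else r + GAMMA * np.max(q_values_for_all_actions(model, n_s, action_size))
# model.fit([s, one_hot(a, action_size)], np.array([[target]]), epochs=1, verbose=0)

Rewards per episode can still be collected exactly as in the skeleton above, so the increasing reward trend can be checked by plotting total_reward_per_episode from plot_reward.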