Question: Deep Reinforcement Learning, Assignment 01

Problem Statement (13 Marks)

Title: Propose a suitable title

Problem Statement: Define a problem statement of your own with a well-defined objective, gaming environment, and game controls. [1 Mark]

Concept Sketch: A pen-and-paper game concept sketch to illustrate the proposed gaming problem statement. [1 Mark]

Additional Information: Provide any necessary information assumed/considered for the game implementation.

Requirements and Deliverables:

Elaborate on how the described problem could be solved using a deep neural network, and explain the action plan to create a gaming environment. [1 Mark]

Prepare a Colab sheet, with outputs saved, that satisfies the following requirements. The implementation should be in OpenAI Gym with Python. Develop a deep neural network architecture and training procedure that effectively learns the optimal policy for the spaceship to avoid collisions with asteroids and maximize its survival time in the game environment.

i. Environment Setup: Define the game environment, including the state space, action space, rewards, and terminal conditions. [1.5 Marks]
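One possible shape for such an environment is sketched below. It is only an illustration under stated assumptions: the class name `AsteroidDodgeEnv`, the grid size, the spawn probability, and the reward values (+1 per surviving step, -10 on collision) are all hypothetical choices, not part of the assignment. The class mirrors the classic Gym `reset`/`step` interface without importing Gym so that it runs on its own; a real submission would likely subclass `gym.Env` and declare `observation_space` and `action_space`. Newer Gym and Gymnasium releases return `(obs, info)` from `reset` and a five-element tuple from `step`, so the signatures may need adjusting.

```python
import random


class AsteroidDodgeEnv:
    """Gym-style sketch: a ship on the bottom row dodges falling asteroids."""

    ACTIONS = (-1, 0, 1)  # action 0 = move left, 1 = stay, 2 = move right

    def __init__(self, width=5, height=6, spawn_prob=0.3, max_steps=200, seed=None):
        self.width, self.height = width, height
        self.spawn_prob = spawn_prob      # assumed chance of a new asteroid per step
        self.max_steps = max_steps        # assumed time limit (terminal condition)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode and return the initial state."""
        self.ship_x = self.width // 2
        self.asteroids = set()            # (row, col) pairs; row 0 is the top
        self.steps = 0
        return self._observe()

    def _observe(self):
        """State space: a height x width grid, 0 = empty, 1 = asteroid, 2 = ship."""
        grid = [[0] * self.width for _ in range(self.height)]
        for row, col in self.asteroids:
            grid[row][col] = 1
        grid[self.height - 1][self.ship_x] = 2
        return grid

    def step(self, action):
        """Apply one action; return (state, reward, done, info)."""
        # Move the ship, clamped to the grid.
        self.ship_x = min(self.width - 1, max(0, self.ship_x + self.ACTIONS[action]))
        # Asteroids fall one row; those leaving the grid disappear.
        self.asteroids = {(r + 1, c) for r, c in self.asteroids if r + 1 < self.height}
        # Possibly spawn a new asteroid in the top row.
        if self.rng.random() < self.spawn_prob:
            self.asteroids.add((0, self.rng.randrange(self.width)))
        self.steps += 1
        crashed = (self.height - 1, self.ship_x) in self.asteroids
        done = crashed or self.steps >= self.max_steps
        reward = -10.0 if crashed else 1.0
        return self._observe(), reward, done, {"crashed": crashed}
```

With this reward scheme the episode return grows with survival time, which matches the stated objective of maximizing survival.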

ii. Replay Buffer: Implement a replay buffer to store experiences (state, action, reward, next state, terminal flag). [1.5 Marks]
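A minimal sketch of such a buffer, using only the Python standard library, could look like the following. The names `Transition` and `ReplayBuffer` and the method names are illustrative assumptions; the five stored fields follow the tuple listed in the requirement.

```python
import random
from collections import deque, namedtuple

# One stored experience, matching the fields named in the requirement.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement, then regroup field by field so that
        # batch.state, batch.action, ... are tuples of length batch_size.
        batch = random.sample(list(self.memory), batch_size)
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from past experience breaks the correlation between consecutive frames, which is the usual motivation for using a replay buffer with DQN.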

iii. Deep Q-Network Architecture: Design the neural network architecture for the DQN using Convolutional Neural Networks. The input to the network is the game state, and the output is the Q-values for each possible action. [2 Marks]
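Before writing the network in a framework, it can help to work out the layer shapes on paper. The sketch below does that arithmetic in plain Python. It assumes the state is rendered as a stack of four 84x84 grayscale frames and borrows the three-convolution layout commonly used for Atari DQN agents; neither choice comes from the assignment, and `describe_dqn` is a hypothetical helper. Each entry would map to a convolution or linear layer (for example `torch.nn.Conv2d` and `torch.nn.Linear` in PyTorch) followed by a ReLU, except for the final layer, which outputs one Q-value per action.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# (out_channels, kernel_size, stride) per convolution. Assumed values.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def describe_dqn(in_channels, height, width, n_actions, hidden=512):
    """Return ([(layer name, output shape), ...], trainable parameter count)."""
    layers, params = [], 0
    c, h, w = in_channels, height, width
    for out_c, k, s in CONV_LAYERS:
        params += out_c * (c * k * k + 1)          # weights plus one bias per filter
        c, h, w = out_c, conv_out(h, k, s), conv_out(w, k, s)
        layers.append(("conv+relu", (c, h, w)))
    flat = c * h * w                               # input size of the first linear layer
    layers.append(("flatten", (flat,)))
    params += hidden * (flat + 1)
    layers.append(("linear+relu", (hidden,)))
    params += n_actions * (hidden + 1)
    layers.append(("linear (Q-values)", (n_actions,)))
    return layers, params


if __name__ == "__main__":
    layers, params = describe_dqn(in_channels=4, height=84, width=84, n_actions=3)
    for name, shape in layers:
        print(f"{name:20s} {shape}")
    print("trainable parameters:", params)
```

For a small grid state such as a 6x5 board, these kernel sizes would be too large; smaller kernels with stride 1, or a fully connected network, would be a more natural fit.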

iv. Epsilon-Greedy Exploration: Implement an exploration strategy such as epsilon-greedy to balance exploration (trying new actions) and exploitation (using learned knowledge). [1 Mark]
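As a rough illustration, epsilon-greedy selection with a linearly decaying epsilon can be written in a few lines. The function names and the schedule parameters (`eps_start`, `eps_end`, `decay_steps`) are assumptions chosen for the example, and `q_values` stands for whatever list of per-action Q-values the network produces.

```python
import random


def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    fraction = min(1.0, step / decay_steps)
    return eps_start + fraction * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```

Starting with a high epsilon lets the agent gather varied experience early on, while the low final value keeps a small amount of exploration once the Q-values become reliable.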

v. Training Loop: Initialize the DQN and the target network (a separate network used to stabilize training). In each episode, reset the environment and observe the initial state. [2 Marks]
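The overall structure of the loop can be sketched without any deep learning framework. In the example below a lookup table of Q-values stands in for the CNN, purely so that the code runs with the standard library alone; in the actual assignment the table update would be replaced by a gradient step on the network. The toy environment `TinyDodgeEnv`, the hyperparameters, and the reward values are all hypothetical.

```python
import copy
import random
from collections import deque


class TinyDodgeEnv:
    """Hypothetical 3-lane toy: one asteroid at a time falls toward the ship."""
    LANES, START_DIST, MAX_STEPS = 3, 3, 50

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.ship, self.steps = 1, 0
        self._spawn()
        return self._state()

    def _spawn(self):
        self.rock, self.dist = self.rng.randrange(self.LANES), self.START_DIST

    def _state(self):
        # Encode (ship lane, asteroid lane, distance) as one integer in 0..35.
        return (self.ship * self.LANES + self.rock) * (self.START_DIST + 1) + self.dist

    def step(self, action):  # 0 = left, 1 = stay, 2 = right
        self.ship = min(self.LANES - 1, max(0, self.ship + action - 1))
        self.dist -= 1
        self.steps += 1
        if self.dist == 0:
            if self.rock == self.ship:
                return self._state(), -10.0, True, {}   # collision ends the episode
            self._spawn()
        return self._state(), 1.0, self.steps >= self.MAX_STEPS, {}


def train(episodes=300, gamma=0.95, lr=0.1, batch_size=32, sync_every=100, seed=0):
    rng = random.Random(seed)
    env = TinyDodgeEnv(seed)
    n_states, n_actions = 36, 3
    q_net = [[0.0] * n_actions for _ in range(n_states)]   # online "network"
    target_net = copy.deepcopy(q_net)                      # target network
    buffer = deque(maxlen=5000)                            # replay buffer
    step_count, returns = 0, []
    for episode in range(episodes):
        state, done, total = env.reset(), False, 0.0       # observe initial state
        epsilon = max(0.05, 1.0 - episode / (0.8 * episodes))
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q_net[state][a])
            next_state, reward, done, _ = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state, total, step_count = next_state, total + reward, step_count + 1
            if len(buffer) >= batch_size:
                for s, a, r, s2, d in rng.sample(list(buffer), batch_size):
                    # TD target uses the frozen target network, not q_net.
                    target = r if d else r + gamma * max(target_net[s2])
                    q_net[s][a] += lr * (target - q_net[s][a])
            if step_count % sync_every == 0:
                target_net = copy.deepcopy(q_net)          # periodic sync
        returns.append(total)
    return q_net, returns
```

Calling `q_net, returns = train()` gives the learned table and the per-episode returns, which can be plotted to check that training is making progress. Note that this sketch treats the time limit as a terminal state, which is a simplification.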

vi. Testing and Evaluation: After training, evaluate the DQN by running it in the environment without exploration (set epsilon to 0). Monitor metrics such as average reward per episode, survival time, etc., to assess the performance. [2 Marks]
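A greedy evaluation run might be organized as in the sketch below. The function `evaluate`, the metric names, and the `StubEnv` class are hypothetical and exist only to make the example self-contained; `q_values_fn` stands for a call into the trained network, and the environment is assumed to follow the classic Gym `reset`/`step` interface that returns `(state, reward, done, info)`.

```python
from statistics import mean


def evaluate(env, q_values_fn, episodes=20):
    """Run greedy episodes (epsilon = 0) and summarize reward and survival time."""
    rewards, lengths = [], []
    for _ in range(episodes):
        state, done, total, steps = env.reset(), False, 0.0, 0
        while not done:
            q = q_values_fn(state)
            action = max(range(len(q)), key=lambda a: q[a])   # always exploit
            state, reward, done, _info = env.step(action)
            total += reward
            steps += 1
        rewards.append(total)
        lengths.append(steps)
    return {
        "avg_reward": mean(rewards),
        "avg_survival_steps": mean(lengths),
        "max_survival_steps": max(lengths),
    }


class StubEnv:
    """Hypothetical stand-in: action 1 survives (+1), action 0 crashes (-10)."""

    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        if action == 0:
            return self.t, -10.0, True, {}
        return self.t, 1.0, self.t >= self.horizon, {}


if __name__ == "__main__":
    # A fixed Q-function that always prefers action 1, for demonstration only.
    print(evaluate(StubEnv(), lambda state: [0.0, 1.0], episodes=5))
```

Reporting both average reward and survival steps is useful because the two can diverge when the reward includes collision penalties.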
