Question: How to train a differential robot using reinforcement learning?

I need my robot to reach a specific point in a given area. Currently, the algorithm (DDPG) consists of the following parameters:

- State: Robot's x and y position, its orientation, and the reading of an ultrasonic sensor, for a total of 4 parameters.
- Goal: The goal is composed of the positions to be reached.
- Actions: The actions generated are the linear and angular velocity of the robot.
- Reward: It is given by the following function:

    def reward_function(self, objective_distance, obstacle_distance, a_velocity, l_velocity, time__objective):
        # Objective distance penalization
        reward_objective_distance = -1 * (objective_distance) ** 2 / (4.0) ** 2
        # Obstacle distance penalization
        reward_obstacle_distance = (-1 * ((1 / (obstacle_distance + 0.06))) / 16) + 0.07
        # High speeds penalization (angular and linear velocity)
        reward_velocity = -1 * (a_velocity ** 2 + l_velocity ** 2) / 8.0
        # Reward for staying on target
        reward_time = time__objective
        reward = reward_objective_distance + reward_obstacle_distance + \
            reward_velocity + reward_time
        return reward

- Neural network parameters:
  * Actor: Two hidden layers of 256 neurons, learning rate of 0.0001
  * Critic: Two hidden layers of 256 neurons, learning rate of 0.001
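To make the state and action definitions above concrete, here is a minimal sketch of how a linear and angular velocity command could move the robot's pose under the standard unicycle model for a differential-drive robot. The names `RobotState`, `step_pose`, and `objective_distance`, the time step `dt`, and the units (metres, radians) are assumptions for illustration and do not come from the original setup.

```python
import math
from dataclasses import dataclass


@dataclass
class RobotState:
    """The 4-value state from the question: pose plus one range reading."""
    x: float           # position along x (metres assumed)
    y: float           # position along y (metres assumed)
    theta: float       # orientation (radians assumed)
    ultrasonic: float  # ultrasonic sensor reading

    def as_vector(self):
        """Flatten the state into the 4-element list fed to the networks."""
        return [self.x, self.y, self.theta, self.ultrasonic]


def step_pose(state, l_velocity, a_velocity, dt=0.1):
    """Advance the pose by one time step using the unicycle model.

    The ultrasonic reading is copied unchanged here; a real simulator
    would recompute it from the environment.
    """
    x = state.x + l_velocity * math.cos(state.theta) * dt
    y = state.y + l_velocity * math.sin(state.theta) * dt
    theta = state.theta + a_velocity * dt
    # Wrap the heading into [-pi, pi] so it stays bounded.
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return RobotState(x, y, theta, state.ultrasonic)


def objective_distance(state, goal_x, goal_y):
    """Euclidean distance from the robot to the goal position."""
    return math.hypot(goal_x - state.x, goal_y - state.y)
```

A call such as `step_pose(RobotState(0.0, 0.0, 0.0, 1.0), 1.0, 0.0, dt=0.5)` moves the robot straight along x. Note that in this sketch the goal position is passed separately and is not part of the 4-value state.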

However, in all simulations I get very bad results.

Could you give me some tips to improve the training of the robot?
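One possible first step in diagnosing the poor results is to check how large each term of the reward is relative to the others. The sketch below is a standalone rewrite of the reward function that returns the terms separately. It assumes the velocity penalty uses `a_velocity` and `l_velocity` (the prefixes are missing in the posted code), and the sample inputs are hypothetical values chosen only for illustration.

```python
def reward_terms(objective_distance, obstacle_distance,
                 a_velocity, l_velocity, time__objective):
    """Return each term of the question's reward separately."""
    return {
        # Quadratic penalty on distance to the goal, scaled by 4.0 ** 2.
        "objective_distance": -1 * objective_distance ** 2 / 4.0 ** 2,
        # Inverse-distance penalty that grows as an obstacle gets closer.
        "obstacle_distance": (-1 * (1 / (obstacle_distance + 0.06)) / 16) + 0.07,
        # Penalty on high speeds (assumed: angular and linear velocity).
        "velocity": -1 * (a_velocity ** 2 + l_velocity ** 2) / 8.0,
        # Bonus for time spent on the target, passed through unchanged.
        "time": time__objective,
    }


# Hypothetical situations, not measurements from the original simulation.
samples = {
    "far from goal, fast": dict(objective_distance=4.0, obstacle_distance=2.0,
                                a_velocity=1.0, l_velocity=1.0, time__objective=0.0),
    "near goal, slow": dict(objective_distance=0.2, obstacle_distance=2.0,
                            a_velocity=0.1, l_velocity=0.1, time__objective=0.0),
    "touching obstacle": dict(objective_distance=2.0, obstacle_distance=0.0,
                              a_velocity=0.0, l_velocity=0.5, time__objective=0.0),
}

for name, args in samples.items():
    terms = reward_terms(**args)
    total = sum(terms.values())
    print(name, {k: round(v, 3) for k, v in terms.items()}, "total:", round(total, 3))
```

Printing the terms side by side shows which one dominates the total in each situation, which may help in deciding whether the terms need rescaling.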
