Question: Compare the different parameter settings for Q-learning for the game of Example 13.2 (page 585) (the “monster game” in AIPython (aipython.org)).

In particular, compare the following situations:

(i) step size(c) = 1/c, and the Q-values are initialized to 0.0.

(ii) step size(c) = 10/(9 + c), and the Q-values are initialized to 0.0.

(iii) α varies (using whichever of (i) and (ii) is better) and the Q-values are initialized to 5.0.

(iv) α is fixed to 0.1 and the Q-values are initialized to 0.0.

(v) α is fixed to 0.1 and the Q-values are initialized to 5.0.

(vi) Some other parameter settings.
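The step-size settings above can be sketched in Python as follows. This is only an illustrative sketch, not AIPython's own code: the function names, the `q_update` helper, the discount of 0.9, and the reading of c as the number of times a (state, action) pair has been updated are all assumptions made for this example.

```python
from collections import defaultdict

def step_size_inverse(c):
    """Setting (i): step size 1/c, with c assumed to be a visit count (c >= 1)."""
    return 1 / c

def step_size_slow_decay(c):
    """Setting (ii): step size 10/(9 + c), which decays more slowly than 1/c."""
    return 10 / (9 + c)

def make_fixed_step_size(alpha):
    """Settings (iv) and (v): a constant step size, ignoring the count."""
    return lambda c: alpha

def make_q_table(q_init):
    """Q-table whose unseen entries start at q_init (0.0 or 5.0 in the question)."""
    return defaultdict(lambda: q_init)

def q_update(Q, visits, state, action, reward, next_state, actions,
             step_size, discount=0.9):
    """One tabular Q-learning update; discount=0.9 is an assumed value."""
    visits[(state, action)] += 1
    alpha = step_size(visits[(state, action)])
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + discount * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]
```

Both schedules give a step size of 1 on the first update, so the first observed return overwrites the initial Q-value; they differ in how quickly later experience is down-weighted. Initializing to 5.0 instead of 0.0 changes the bootstrapped term `best_next` for unvisited successor states.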

For each of these, carry out multiple runs and compare

(a) the distributions of minimum values

(b) the zero crossing

(c) the asymptotic slope for the policy that includes exploration

(d) the asymptotic slope for the policy that does not include exploration (to test this, after the algorithm has explored, set the exploitation parameter to 100% and run additional steps).
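One way to extract measures (a) to (d) from a single run is sketched below. It assumes each run is recorded as a list of cumulative rewards, one entry per step; the function names, the `tail` window, and the definition of the zero crossing as the first step at or after the minimum where the cumulative reward is non-negative are assumptions for illustration, not definitions taken from AIPython.

```python
def minimum_value(cumulative):
    """(a) Lowest cumulative reward reached during the run."""
    return min(cumulative)

def zero_crossing(cumulative):
    """(b) First step at or after the minimum where the cumulative reward
    is non-negative, or None if the run never recovers."""
    i_min = cumulative.index(min(cumulative))
    for i in range(i_min, len(cumulative)):
        if cumulative[i] >= 0:
            return i
    return None

def asymptotic_slope(cumulative, tail=1000):
    """(c), (d) Average reward per step over the last `tail` steps."""
    if len(cumulative) < 2:
        raise ValueError("need at least two points to estimate a slope")
    tail = min(tail, len(cumulative) - 1)
    return (cumulative[-1] - cumulative[-1 - tail]) / tail

# Small made-up series, only to show the calls.
run = [0, -1, -3, -4, -2, 0, 3, 6, 9]
print(minimum_value(run), zero_crossing(run), asymptotic_slope(run, tail=3))
```

For (d), the same `asymptotic_slope` call would be applied to the extra steps recorded after exploration is switched off. Collecting these values over multiple runs per setting gives the distributions the question asks to compare.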

Which of these settings would you recommend? Why?

