Question: Exercise 4 (8 points) - Do NOT use code please, and instead solve manually

Training a large language model depends on various architectural choices. However, in recent papers such as Kaplan et al. (2020) and the "Chinchilla" paper (Hoffmann et al., 2022), people noticed that the performance of an LLM can be predicted quite accurately by just two quantities, i.e., (1) the number N of model parameters, and (2) the total number D of tokens the model is trained on.

The table below contains data from the training of various LLM systems.

LLM          N - Parameters (billions)   D - Tokens (billions)   Loss
GPT-2        1                           21                      2.527663
GPT-3        175                         300                     2.001097
Gopher       280                         300                     1.994691
Chinchilla   70                          1400                    1.936333
PaLM         540                         780                     1.923154

(a) Determine a power law expressing the relation between the loss, the number of parameters N and the number of tokens D used during training. (Hint: use least squares and a model of the form: a N^{-0.34} + b D^{-0.28} + c)
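
One way to set up the hint by hand, given as a sketch rather than a full solution: because the exponents -0.34 and -0.28 are fixed, the model is linear in the unknowns a, b, c, so ordinary least squares applies. The symbols x_i, z_i, y_i, A and \theta are notation introduced here for illustration and do not appear in the exercise. Whether N and D are entered in billions or as raw counts is also an assumption; it rescales a and b but leaves c and the fitted losses unchanged.

```latex
% One row per LLM in the table, i = 1, ..., 5
x_i = N_i^{-0.34}, \qquad z_i = D_i^{-0.28}, \qquad y_i = \text{Loss}_i

% Model, linear in (a, b, c)
y_i \approx a\,x_i + b\,z_i + c

% Matrix form: 5 equations, 3 unknowns (overdetermined)
A = \begin{pmatrix} x_1 & z_1 & 1 \\ \vdots & \vdots & \vdots \\ x_5 & z_5 & 1 \end{pmatrix},
\qquad
\theta = \begin{pmatrix} a \\ b \\ c \end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_5 \end{pmatrix}

% Least-squares problem and its normal equations
\min_{\theta} \; \lVert A\theta - y \rVert_2^2
\quad\Longrightarrow\quad
A^{\top} A \,\theta = A^{\top} y

% Normal equations written out (all sums run over i = 1, ..., 5)
\begin{pmatrix}
\sum x_i^2   & \sum x_i z_i & \sum x_i \\
\sum x_i z_i & \sum z_i^2   & \sum z_i \\
\sum x_i     & \sum z_i     & 5
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} \sum x_i y_i \\ \sum z_i y_i \\ \sum y_i \end{pmatrix}
```

Solving this 3 x 3 linear system, for instance by Gaussian elimination, gives the least-squares values of a, b and c, provided A has full column rank.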


