
LLMs generate responses to user queries in a two-step process. The first is pre-fill, where the tokens in the input prompt are processed in parallel; the second is decoding, where the response text is generated one token at a time. A user sees the first token of the response once the pre-fill step is finished. From then on, the time taken for the entire response depends on how long it takes to generate each subsequent output token. The latency of an LLM service is defined as the time the model takes to generate the full response after a user submits their query.
Question 2
The pre-fill step for a user query (prompt) takes 2500 milliseconds, while the time taken per output token is 50 milliseconds. A typical response consists of 1001 tokens. What is the latency of the LLM service?
a. 0.05 seconds
b. 52 seconds
c. 102 seconds
d. 105 seconds
e. 55 seconds

Step by Step Solution

There are 3 steps involved:

Step 1: Convert the pre-fill time. Pre-fill takes 2500 ms = 2.5 s, at the end of which the user sees the first output token.

Step 2: Compute the decoding time. The remaining 1000 tokens of the 1001-token response each take 50 ms, so decoding takes 1000 × 50 ms = 50,000 ms = 50 s.

Step 3: Add the two. Total latency = 2.5 s + 50 s = 52.5 s, which is closest to option b (52 seconds).
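The latency calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming (per the passage above) that the first token appears the moment pre-fill completes; the function name and parameters are illustrative, not from any library.

```python
def llm_latency_ms(prefill_ms: int, per_token_ms: int, response_tokens: int) -> int:
    """Total response latency in milliseconds.

    The first token arrives when pre-fill finishes; each of the
    remaining (response_tokens - 1) tokens adds one decode step.
    """
    return prefill_ms + per_token_ms * (response_tokens - 1)

# Values from the question: 2500 ms pre-fill, 50 ms/token, 1001 tokens.
latency_ms = llm_latency_ms(prefill_ms=2500, per_token_ms=50, response_tokens=1001)
print(latency_ms / 1000)  # 52.5 (seconds)
```

Note the design choice encoded here: counting the first token as part of pre-fill gives 52.5 s; counting all 1001 tokens as decode steps would give 52.55 s instead. Either way, the nearest answer choice is 52 seconds.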
