Question:

Suppose you are programming a processor with an add latency of 3 clock cycles and a multiply latency of 5 clock cycles. When instructions are fully pipelined, the processor can also complete one add and one multiply every clock cycle. Consider the following loop:

for (i=0; i
    A[i] = B[i] * C[i] + D[i] + i;
}

1a) Assuming the program is executed as-is (i.e., with no pipelining), what is the lower bound on execution time, in clock cycles, based on the math performed?
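
A minimal sketch of the 1a arithmetic, assuming the statement is evaluated left to right as ((B[i] * C[i]) + D[i]) + i, that each operation must finish before the next one starts, and that loads, stores, and loop-control instructions are ignored. Under those assumptions each iteration costs 5 + 3 + 3 = 11 cycles, so N iterations take at least 11N cycles. The loop bound is cut off in the question, so the iteration count n is a parameter here; the names MUL_LATENCY, ADD_LATENCY, and unpipelined_cycles are hypothetical.

```c
/* Latencies from the question, in clock cycles. */
#define MUL_LATENCY 5UL
#define ADD_LATENCY 3UL

/* Hypothetical helper: lower bound for n iterations when nothing overlaps.
 * Each iteration runs one multiply followed by two dependent adds:
 *   t1   = B[i] * C[i];   5 cycles
 *   t2   = t1 + D[i];     3 cycles (waits for t1)
 *   A[i] = t2 + i;        3 cycles (waits for t2)
 * Loads, stores, and loop overhead are ignored (assumption). */
unsigned long unpipelined_cycles(unsigned long n)
{
    const unsigned long per_iteration = MUL_LATENCY + 2 * ADD_LATENCY; /* 11 */
    return n * per_iteration;
}
```

For example, a 100-iteration loop gives unpipelined_cycles(100) == 1100 under these assumptions.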

1b) How can you exploit more instruction-level parallelism in this program? What changes do you propose?
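
One possible set of changes, sketched in C under the assumption that A, B, C, and D are double arrays of length n (the element types and the loop bound are not given, so both are hypothetical, as are the function names loop_original and loop_ilp). The sketch reassociates the expression so that D[i] + i no longer waits for the multiply, and unrolls by two so independent iterations sit side by side for the scheduler. Under the 1a assumptions the per-iteration dependency chain drops from 5 + 3 + 3 = 11 cycles to max(5, 3) + 3 = 8, and unrolling lets operations from different iterations overlap. Software pipelining is a further option in the same spirit.

```c
#include <stddef.h>

/* Original form: ((B[i] * C[i]) + D[i]) + i, an 11-cycle dependent chain. */
void loop_original(double *A, const double *B, const double *C,
                   const double *D, size_t n)
{
    for (size_t i = 0; i < n; i++)
        A[i] = B[i] * C[i] + D[i] + (double)i;
}

/* Hypothetical transformed form (a sketch, not the course's answer):
 * 1. Reassociate so D[i] + i is independent of the multiply; that add can
 *    run while the multiply is in flight, shortening the chain to 8 cycles.
 * 2. Unroll by 2 so two independent iterations are exposed at once. */
void loop_ilp(double *A, const double *B, const double *C,
              const double *D, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        double s0 = D[i] + (double)i;            /* add, overlaps with m0 */
        double s1 = D[i + 1] + (double)(i + 1);  /* add, overlaps with m1 */
        double m0 = B[i] * C[i];
        double m1 = B[i + 1] * C[i + 1];
        A[i] = m0 + s0;
        A[i + 1] = m1 + s1;
    }
    for (; i < n; i++)  /* leftover iteration when n is odd */
        A[i] = B[i] * C[i] + (D[i] + (double)i);
}
```

Because floating-point addition is not associative, the reassociated version can differ from the original in the last bit for general inputs; compilers usually apply this rewrite only when allowed, for example under GCC's -fassociative-math, which -ffast-math enables.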

1c) Assuming you can pipeline the adds and multiplies, what would be the lower bound on execution time in clock cycles for the arithmetic?
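
A back-of-the-envelope sketch for 1c, assuming adds and multiplies issue on separate fully pipelined units that each accept one new operation per cycle, and again ignoring loads, stores, and the loop-counter increment. Each iteration needs one multiply but two adds, so the add unit is the bottleneck: issuing all 2N adds takes at least 2N cycles, compared with 11N cycles when every iteration's 5 + 3 + 3 cycle chain runs with no overlap. The constant names, ceil_div, and pipelined_issue_bound are hypothetical.

```c
/* Resources from the question: one multiply and one add can start each cycle. */
#define MULS_PER_CYCLE 1UL
#define ADDS_PER_CYCLE 1UL

/* Work per iteration of A[i] = B[i] * C[i] + D[i] + i. */
#define MULS_PER_ITER 1UL
#define ADDS_PER_ITER 2UL

/* Ceiling division for unsigned operands (divisor must be nonzero). */
static unsigned long ceil_div(unsigned long a, unsigned long b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: cycles needed just to issue every operation when the
 * units are fully pipelined. The busier unit sets the bound; here the add
 * unit needs 2 cycles per iteration, so the result is 2n. Pipeline fill and
 * drain (e.g. the last add's 3-cycle latency) add only a small constant. */
unsigned long pipelined_issue_bound(unsigned long n)
{
    unsigned long mul_cycles = ceil_div(n * MULS_PER_ITER, MULS_PER_CYCLE);
    unsigned long add_cycles = ceil_div(n * ADDS_PER_ITER, ADDS_PER_CYCLE);
    return mul_cycles > add_cycles ? mul_cycles : add_cycles;
}
```

For a 100-iteration loop, pipelined_issue_bound(100) returns 200, versus 11 × 100 = 1100 cycles with no overlap, a speedup of roughly 5.5x before fill and drain are counted.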
