[10/15] <4.4> Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each nondiverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel that has divergent branches that cause, on average, 80% of threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Because not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.
a. [10] <4.4> Compute the throughput, in GFLOP/s, for this kernel on this GPU.
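A minimal sketch of the part (a) arithmetic, assuming the usual simple linear model: the peak rate is SIMD processors × lanes × clock (8 lanes give 32 results every 4 cycles, i.e. 8 results per cycle), only the 70% of instructions that are single-precision arithmetic count as FLOPs (one FLOP per active lane), and the 80% active-thread fraction and the 0.85 issue rate scale that peak linearly. The helper name `gflops` is hypothetical and not part of the problem.

```python
def gflops(processors: int, lanes: int, clock_ghz: float,
           active_fraction: float, fp_fraction: float,
           issue_rate: float) -> float:
    """Estimated kernel throughput in GFLOP/s under a simple linear model.

    Assumption: each lane yields one single-precision result per cycle,
    only the FP-arithmetic share of instructions produces FLOPs, and
    divergence and issue stalls reduce throughput proportionally.
    """
    # Peak results per second, in billions (clock given in GHz).
    peak = processors * lanes * clock_ghz
    # Scale by divergence, instruction mix and average issue rate.
    return peak * active_fraction * fp_fraction * issue_rate


# Parameters taken from the problem statement.
base = gflops(processors=10, lanes=8, clock_ghz=1.5,
              active_fraction=0.80, fp_fraction=0.70, issue_rate=0.85)
print(f"{base:.2f} GFLOP/s")  # prints 57.12 GFLOP/s
```

With these inputs the model gives 10 × 8 × 1.5 × 0.8 × 0.7 × 0.85 = 57.12 GFLOP/s.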
b. [15] <4.4> Assume that you have the following choices:
(1) Increasing the number of single-precision lanes to 16
(2) Increasing the number of SIMD processors to 15 (assume this change doesn't affect any other performance metrics and that the code scales to the additional processors)
(3) Adding a cache that will effectively reduce memory latency by 40%, which will increase the instruction issue rate to 0.95
What is the speedup in throughput for each of these improvements?
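The three options can be compared with the same simple linear model used for part (a). The sketch below (the helper `gflops` and the dictionary names are hypothetical) recomputes throughput with one parameter changed at a time and divides by the baseline. It assumes, as option (2) states explicitly, that each change leaves every other parameter untouched, including the 0.85 issue rate for options (1) and (2).

```python
def gflops(processors: int, lanes: int, clock_ghz: float,
           active_fraction: float, fp_fraction: float,
           issue_rate: float) -> float:
    # Linear model: peak results/s (in billions) scaled by divergence,
    # FP instruction share and average issue rate.
    return (processors * lanes * clock_ghz
            * active_fraction * fp_fraction * issue_rate)


# Baseline parameters from the problem statement.
baseline = dict(processors=10, lanes=8, clock_ghz=1.5,
                active_fraction=0.80, fp_fraction=0.70, issue_rate=0.85)

# Each option overrides exactly one baseline parameter.
options = {
    "(1) 16 single-precision lanes": {"lanes": 16},
    "(2) 15 SIMD processors": {"processors": 15},
    "(3) cache, issue rate 0.95": {"issue_rate": 0.95},
}

base = gflops(**baseline)
for name, change in options.items():
    improved = gflops(**{**baseline, **change})
    print(f"{name}: {improved:.2f} GFLOP/s, speedup {improved / base:.2f}x")
```

Under these assumptions the script reports 114.24, 85.68 and 63.84 GFLOP/s, i.e. speedups of 2.00, 1.50 and 0.95/0.85 (about 1.12). Because every factor enters the product linearly, each speedup reduces to the ratio of the one changed parameter; the model does not capture any extra memory stalls that more lanes or more processors might expose.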
