Question: [10/15] <4.4> Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of ___ and each SIMD processor contains ___ lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce ___ results every ___ cycles. Assume a kernel that has divergent branches that cause, on average, ___ of threads to be active. Assume that ___ of all SIMD instructions executed are single-precision arithmetic and ___ are load/store. Because not all memory latencies are covered, assume an average SIMD instruction issue rate of ___. Assume that the GPU has a clock speed of ___ GHz.
a. Compute the throughput, in GFLOP/s, for this kernel on this GPU.
b. Assume that you have the following choices:
(1) Increasing the number of single-precision lanes to ___
(2) Increasing the number of SIMD processors to ___ (assume this change doesn't affect any other performance metrics and that the code scales to the additional processors)
(3) Adding a cache that will effectively reduce memory latency by ___, which will increase the instruction issue rate to ___
What is the speedup in throughput for each of these improvements?
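One way to set up both parts is to treat throughput as a product of independent factors: clock rate, number of SIMD processors, lanes per processor, fraction of threads active, fraction of instructions that are single-precision arithmetic, and issue rate. The sketch below assumes each active lane contributes one floating-point operation per cycle and that the lane count equals the SIMD width divided by the cycles per instruction; neither assumption is stated outright in the question. The function names and every number in the usage lines are illustrative placeholders, not the values from the exercise.

```python
def kernel_gflops(clock_ghz, num_simd_processors, lanes_per_processor,
                  active_thread_fraction, fp_instruction_fraction, issue_rate):
    """Estimate single-precision throughput in GFLOP/s.

    Assumption: one FLOP per active lane per issued arithmetic instruction
    cycle, so throughput is a simple product of the factors.
    """
    return (clock_ghz
            * num_simd_processors
            * lanes_per_processor
            * active_thread_fraction
            * fp_instruction_fraction
            * issue_rate)


def speedup(baseline_gflops, improved_gflops):
    """Speedup is the ratio of improved to baseline throughput."""
    return improved_gflops / baseline_gflops


# Placeholder configuration (hypothetical values, NOT the exercise's numbers).
baseline = kernel_gflops(clock_ghz=1.0, num_simd_processors=10,
                         lanes_per_processor=4, active_thread_fraction=0.5,
                         fp_instruction_fraction=0.5, issue_rate=0.5)

# Each improvement changes exactly one factor, so it is evaluated by
# recomputing the product with that factor replaced.
more_lanes = kernel_gflops(clock_ghz=1.0, num_simd_processors=10,
                           lanes_per_processor=8, active_thread_fraction=0.5,
                           fp_instruction_fraction=0.5, issue_rate=0.5)

print(baseline, speedup(baseline, more_lanes))
```

Because the model is a pure product, the speedup for any single change reduces to the ratio of the new factor to the old one (for example, new lane count over old lane count), provided the other factors really do stay fixed.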
