Question:
a) In a supercomputer, there are 16 processors, each generating 3 loads and 1 store per cycle. The processor cycle time is 2 ns and the SRAM cycle time is 16 ns. Compute the number of memory banks required for undelayed execution for all processors.

b) A kernel of 16,384 threads is to be executed on a Fermi GPU, where each thread is a single-precision FLOP. Would a single grid suffice for the execution of the kernel? In any case, compute the: [4]
i. number of grids required to completely execute the kernel
ii. dimension of the grid/grids
iii. dimension of a thread block
iv. number of warps each streaming multiprocessor on the GPU needs to schedule in order to completely execute its thread block
v. time required (in cycles) for execution of a thread block on a streaming multiprocessor, if one FLOP takes 3 cycles on a SIMD lane

c) In a vector processor where all units are fully pipelined, including the VLSU, there are 16 memory banks with a bank busy time of 5 clock cycles. An LVWS instruction is executed with a stride of 10. Determine (using the formula) whether bank stalls will occur. [2]

a) Consider the following loop written in C: [CLO-3, C4 Analysis, PLO-4 Investigation] [2]
for(i=0;i
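The arithmetic for parts a) and c) of the first question can be checked with a short sketch. It is not an official solution: the function names are made up for illustration, and it assumes the standard textbook relations. For part a), every memory reference is assumed to need its own bank for as long as the SRAM is busy, so banks = processors × references per cycle × (SRAM cycle time / processor cycle time). For part c), a strided access is assumed to stall when banks / gcd(stride, banks) is less than the bank busy time. Part b) is left out because its answer depends on which Fermi limits (threads per block, number of streaming multiprocessors, SIMD lanes) the course assumes.

```python
from math import ceil, gcd


def banks_required(processors, refs_per_cycle, cpu_cycle_ns, sram_cycle_ns):
    """Banks needed so that no memory reference ever waits (assumed model)."""
    # Number of processor cycles during which one bank stays busy.
    busy_cycles = ceil(sram_cycle_ns / cpu_cycle_ns)
    # Each reference issued during that window needs a distinct bank.
    return processors * refs_per_cycle * busy_cycles


def stride_causes_stalls(num_banks, busy_cycles, stride):
    """True if a strided access returns to a bank before it is free again."""
    # Number of accesses before the same bank is touched a second time.
    revisit_interval = num_banks // gcd(stride, num_banks)
    return revisit_interval < busy_cycles


# Part a): 16 processors, 3 loads + 1 store per cycle, 2 ns CPU, 16 ns SRAM.
print(banks_required(16, 3 + 1, 2, 16))   # 512

# Part c): 16 banks, bank busy time 5 cycles, stride 10.
print(stride_causes_stalls(16, 5, 10))    # False
```

Under these assumptions, part a) needs 16 × 4 × 8 = 512 banks, and in part c) each bank is revisited only every 16 / gcd(10, 16) = 8 accesses, which is not less than the busy time of 5 cycles, so no bank stalls are expected.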
