For the attached CUDA code dot.cu, which computes a dot product, please answer the following questions:
What is the purpose of the function __syncthreads()?
In the kernel dot(), the second __syncthreads() is executed by every thread, even though only some of the threads execute the addition inside the if statement. Some smart people suggest the following update (moving the __syncthreads() into the if statement) to optimize the code:
    int i = blockDim.x/2;
    while (i != 0) {
        if (cacheIndex < i) {
            cache[cacheIndex] += cache[cacheIndex + i];
            __syncthreads();
        }
        i /= 2;
    }
Do you think this change will work?
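
For reference when reasoning about both questions: __syncthreads() is a block-wide barrier. Every thread in the block waits there until all threads of that block have arrived, and shared-memory writes made before the barrier are visible to the whole block afterwards. The CUDA programming guide allows it in conditional code only when the condition evaluates identically across the entire block. In the proposed update, threads with cacheIndex >= i never reach the barrier, so the behavior is undefined and the kernel may hang or produce wrong sums. The following is a minimal sketch of the usual pattern with the barrier kept outside the if. Since dot.cu is not reproduced here, the sizes N, THREADS and BLOCKS, the partial array and the host code are assumptions made for illustration, not the attached file.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical sizes chosen for this sketch (not taken from dot.cu).
constexpr int N = 33 * 1024;   // vector length
constexpr int THREADS = 256;   // threads per block, must be a power of two
constexpr int BLOCKS = 32;     // blocks in the grid

__global__ void dot(const float *a, const float *b, float *partial) {
    __shared__ float cache[THREADS];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;

    // Each thread accumulates a strided slice of the products.
    float temp = 0.0f;
    for (; tid < N; tid += blockDim.x * gridDim.x)
        temp += a[tid] * b[tid];
    cache[cacheIndex] = temp;

    // First barrier: all per-thread sums must be in shared memory
    // before any thread starts reading its neighbours' entries.
    __syncthreads();

    // Tree reduction. Only the lower half adds, but EVERY thread
    // reaches the barrier, so the condition never splits the block.
    for (int i = blockDim.x / 2; i != 0; i /= 2) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if (cacheIndex == 0)
        partial[blockIdx.x] = cache[0];
}

int main() {
    float *a, *b, *partial;
    cudaMallocManaged(&a, N * sizeof(float));
    cudaMallocManaged(&b, N * sizeof(float));
    cudaMallocManaged(&partial, BLOCKS * sizeof(float));

    for (int k = 0; k < N; ++k) { a[k] = 1.0f; b[k] = 2.0f; }

    dot<<<BLOCKS, THREADS>>>(a, b, partial);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Finish the reduction on the host by adding the per-block sums.
    float sum = 0.0f;
    for (int k = 0; k < BLOCKS; ++k) sum += partial[k];

    float expected = 2.0f * N;
    std::printf("dot = %.1f, expected = %.1f\n", sum, expected);

    cudaFree(a); cudaFree(b); cudaFree(partial);
    return std::fabs(sum - expected) < 1e-3f ? 0 : 1;
}
```

The barrier inside the loop is what makes each halving step safe: a thread must not read cache[cacheIndex + i] in the next iteration until the thread that owns that entry has finished updating it. Moving the barrier into the if removes the waiting threads from it without removing that dependency, so there is nothing gained to offset the undefined behavior.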
