10. To manipulate tiles, a new CUDA programmer has written the following device kernel, which will transpose each tile in a matrix. The tiles are of size BLOCK_WIDTH by BLOCK_WIDTH, and each of the dimensions of matrix A is known to be a multiple of BLOCK_WIDTH. The kernel invocation and code are shown below. BLOCK_WIDTH is known at compile time, but could be set anywhere from 1 to 20.

dim3 blockDim(BLOCK_WIDTH, BLOCK_WIDTH);
dim3 gridDim(A_width / blockDim.x, A_height / blockDim.y);
BlockTranspose<<<gridDim, blockDim>>>(A, A_width, A_height);

__global__ void
BlockTranspose(float* A_elements, int A_width, int A_height) {
    __shared__ float blockA[BLOCK_WIDTH][BLOCK_WIDTH];
    int baseIdx = blockIdx.x * BLOCK_WIDTH + threadIdx.x;
    baseIdx += (blockIdx.y * BLOCK_WIDTH + threadIdx.y) * A_width;
    blockA[threadIdx.y][threadIdx.x] = A_elements[baseIdx];
    A_elements[baseIdx] = blockA[threadIdx.x][threadIdx.y];
}

A. Of the possible values of BLOCK_WIDTH (1 to 20), for which values will this kernel execute correctly on the device?

B. If the code does not execute correctly for all BLOCK_WIDTH values, suggest a fix that makes it work for all of them.
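
One way to approach part B, offered as a sketch rather than the official solution: the kernel reads blockA[threadIdx.x][threadIdx.y], an element written by a different thread, without any barrier between the write and the read. Under the common textbook assumption that the 32 threads of a warp run in lockstep, this is only safe when the whole block fits in one warp, that is, when BLOCK_WIDTH * BLOCK_WIDTH <= 32, so BLOCK_WIDTH from 1 to 5. On GPUs with independent thread scheduling even that is not guaranteed. Adding __syncthreads() between the two statements removes the dependence on block size. The kernel name BlockTransposeSynced, the host helper checkBlockTranspose, and the default BLOCK_WIDTH of 16 are assumptions made for illustration, and BLOCK_SIZE in the question is treated as the same constant as BLOCK_WIDTH.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

#ifndef BLOCK_WIDTH
#define BLOCK_WIDTH 16  // assumed value for illustration; the exercise allows 1 to 20
#endif

// Transposes each BLOCK_WIDTH x BLOCK_WIDTH tile of A in place.
// A_height is unused in the body; it is kept to match the original signature.
__global__ void BlockTransposeSynced(float* A_elements, int A_width, int A_height) {
    __shared__ float blockA[BLOCK_WIDTH][BLOCK_WIDTH];

    int baseIdx = blockIdx.x * BLOCK_WIDTH + threadIdx.x;
    baseIdx += (blockIdx.y * BLOCK_WIDTH + threadIdx.y) * A_width;

    blockA[threadIdx.y][threadIdx.x] = A_elements[baseIdx];

    // Barrier: every thread in the block must finish writing its element
    // of the tile before any thread reads an element written by another.
    __syncthreads();

    A_elements[baseIdx] = blockA[threadIdx.x][threadIdx.y];
}

// Hypothetical host-side check: runs the kernel and returns true if every
// tile came back transposed. Both dimensions must be multiples of BLOCK_WIDTH.
bool checkBlockTranspose(int A_width, int A_height) {
    std::vector<float> in(static_cast<std::size_t>(A_width) * A_height);
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) in[i] = static_cast<float>(i);

    float* dA = nullptr;
    std::size_t bytes = in.size() * sizeof(float);
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    cudaMemcpy(dA, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_WIDTH, BLOCK_WIDTH);
    dim3 grid(A_width / BLOCK_WIDTH, A_height / BLOCK_WIDTH);
    BlockTransposeSynced<<<grid, block>>>(dA, A_width, A_height);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(out.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);

    for (int r = 0; ok && r < A_height; ++r) {
        for (int c = 0; c < A_width; ++c) {
            int tileRow = r / BLOCK_WIDTH * BLOCK_WIDTH;  // first row of this tile
            int tileCol = c / BLOCK_WIDTH * BLOCK_WIDTH;  // first column of this tile
            // Element (r, c) should hold the input from the mirrored
            // position inside the same tile.
            int srcRow = tileRow + (c - tileCol);
            int srcCol = tileCol + (r - tileRow);
            if (out[r * A_width + c] != in[srcRow * A_width + srcCol]) ok = false;
        }
    }
    return ok;
}
```

Calling checkBlockTranspose(64, 32) from a host main, compiled with nvcc, would exercise a 4 by 2 grid of tiles at the default tile size. Because each thread only touches its own element of global memory, the single barrier after the shared-memory write is enough, and no second barrier is needed after the read.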
