Question:
The following kernel is executed on a large matrix, which is tiled into submatrices. To manipulate tiles, a new CUDA programmer has written the following device kernel to transpose each tile in the matrix. The tiles are of size BLOCK_SIZE by BLOCK_SIZE, and each of the dimensions of matrix A is known to be a multiple of BLOCK_SIZE. The kernel invocation and code are shown below. BLOCK_SIZE is known at compile time but could be set anywhere from 1 to 20.
dim3 blockDim(BLOCK_SIZE,BLOCK_SIZE);
dim3 gridDim(A_width/blockDim.x,A_height/blockDim.y);
BlockTranspose<<<gridDim, blockDim>>>(A, A_width, A_height);
__global__ void BlockTranspose(float* A_elements, int A_width, int A_height)
{
    __shared__ float blockA[BLOCK_SIZE][BLOCK_SIZE];

    int baseIdx = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    baseIdx += (blockIdx.y * BLOCK_SIZE + threadIdx.y) * A_width;

    blockA[threadIdx.y][threadIdx.x] = A_elements[baseIdx];
    A_elements[baseIdx] = blockA[threadIdx.x][threadIdx.y];
}
Out of the possible range of values for BLOCK_SIZE, for what values of BLOCK_SIZE will this kernel function correctly when executing on the device?
If the code does not execute correctly for all BLOCK_SIZE values, suggest a fix to the code to make it work for all BLOCK_SIZE values.
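For reference, below is a sketch of one possible fix. The hazard is that a thread can write its transposed value back to A_elements[baseIdx] before another thread in the same block has read that element into shared memory, so the kernel as written is only guaranteed to be correct when each block holds a single thread (BLOCK_SIZE = 1). On hardware where a whole warp happens to execute in lockstep, block sizes whose thread count fits in one warp may appear to work, but the programming model does not guarantee it. Inserting a __syncthreads() barrier between the shared-memory load and the write-back removes the race and makes the kernel correct for every BLOCK_SIZE from 1 to 20:

__global__ void BlockTranspose(float* A_elements, int A_width, int A_height)
{
    __shared__ float blockA[BLOCK_SIZE][BLOCK_SIZE];

    int baseIdx = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    baseIdx += (blockIdx.y * BLOCK_SIZE + threadIdx.y) * A_width;

    // Each thread loads one element of its tile into shared memory.
    blockA[threadIdx.y][threadIdx.x] = A_elements[baseIdx];

    // Wait until every thread in the block has finished its load before
    // any thread overwrites the tile with transposed values.
    __syncthreads();

    A_elements[baseIdx] = blockA[threadIdx.x][threadIdx.y];
}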