Consider the following tiled matrix multiplication code:
#define TILE_WIDTH
__global__ void MatrixMulKernel(float* d_M, float* d_N, float* d_P, int Width)
{
    __shared__ float ds_M[TILE_WIDTH][TILE_WIDTH];
    __shared__ float ds_N[TILE_WIDTH][TILE_WIDTH];
    int bx = blockIdx.x; int by = blockIdx.y;
    int tx = threadIdx.x; int ty = threadIdx.y;
    // Identify the row and column of the d_P element to work on
    int Row = by * TILE_WIDTH + ty;
    int Col = bx * TILE_WIDTH + tx;
    float Pvalue = 0;
    // Loop over the d_M and d_N tiles required to compute the d_P element
    for (int m = 0; m

Question 2. (10 points) Consider the multiplication of the input matrices M and N. Both M and N are square matrices of size width * width, where width is a power of 2 (i.e., width = 2^i). How many times is each element of the input matrices requested from global memory when:
(1) There is no tiling.
(2) Tiles of size T*T are used (suppose T is a power of 2, i.e., T = 2^j).
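
One way to check the counting question (no tiling versus T*T tiles) is to replay the memory accesses on the host and tally the requests per element. The sketch below is not part of the original question. The function names `countUntiledLoadsM` and `countTiledLoadsM` are hypothetical, and because the kernel's loop body is truncated above, the tiled version assumes the usual pattern in which each thread loads exactly one element of d_M per tile phase. It also assumes T divides width, which holds when both are powers of 2 and T <= width.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: per-element global-memory request counts for M
// when there is no tiling. The thread that computes P[row][col] reads
// M[row][k] from global memory for every k.
std::vector<long> countUntiledLoadsM(int width) {
    std::vector<long> loads(static_cast<std::size_t>(width) * width, 0);
    for (int row = 0; row < width; ++row)
        for (int col = 0; col < width; ++col)      // one thread per P element
            for (int k = 0; k < width; ++k)
                ++loads[static_cast<std::size_t>(row) * width + k];
    return loads;
}

// Hypothetical helper: per-element request counts for M with tile x tile
// thread blocks. ASSUMPTION (loop body is truncated in the question):
// in phase m, thread (ty, tx) loads M[Row][m * tile + tx] into shared
// memory once, and all later reads hit shared memory.
std::vector<long> countTiledLoadsM(int width, int tile) {
    std::vector<long> loads(static_cast<std::size_t>(width) * width, 0);
    const int numTiles = width / tile;             // assumes tile divides width
    for (int by = 0; by < numTiles; ++by)
        for (int bx = 0; bx < numTiles; ++bx)      // every thread block
            for (int m = 0; m < numTiles; ++m)     // every tile phase
                for (int ty = 0; ty < tile; ++ty)
                    for (int tx = 0; tx < tile; ++tx) {
                        const int row = by * tile + ty;
                        const int col = m * tile + tx;
                        ++loads[static_cast<std::size_t>(row) * width + col];
                    }
    return loads;
}
```

Under these assumptions, every element of M is requested width times without tiling and width/T times with T*T tiles. For width = 8 and T = 2, that is 8 versus 4. The same counts hold for N by symmetry, with the roles of rows and columns swapped.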
