Question:

Consider the problem of multiplying a dense n x n matrix A with an n x 1 vector B to generate an n x 1 vector C. The ith element, C[i], is the dot product of the ith row of A and the input vector B, as illustrated in Figure 1.
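Written out, the dot product described above looks as follows. The 0-based indexing is an assumption chosen to match C-style arrays; the question itself does not fix an index origin.

```latex
C[i] = \sum_{j=0}^{n-1} A[i][j] \, B[j], \qquad 0 \le i < n
```

Each C[i] depends only on row i of A and on all of B, so the n output elements can be computed independently of one another.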
Part 1: Describe how you partition the computation tasks, organize threads, and map threads to the tasks.
Part 2: Write a matrix-vector multiplication CUDA kernel matrixVectorMulKernel by completing the following code:
__global__ void matrixVectorMulKernel(float *A, float *B, float *C, int vectorLen){
Part 3: Write a host function matrixVectorMul that can be called from the main function with four parameters: a pointer to the input matrix, a pointer to the input vector, a pointer to the output vector, and the number of elements in each dimension. This function should include statements for memory allocation, data transfer, thread organization, the kernel launch, and freeing memory. Complete the following code:
void matrixVectorMul(float *h_A, float *h_B, float *h_C, int vectorLen){
Part 4: If matrix-vector multiplication is implemented on a distributed memory system using multiple CPUs instead of GPUs and CUDA, which collective communication operations (e.g., one-to-all broadcast, all-to-all broadcast, all-to-one reduction, all-to-all reduction, scatter, gather) can be utilized to enhance performance? Describe how these operations can be applied effectively.
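One possible way to approach Parts 1 to 3 is sketched below; it is a minimal illustration, not the expert answer referenced on this page. It assumes that A is stored in row-major order as a flat array of vectorLen * vectorLen floats, that one thread computes one output element C[row], and that a 1D grid of 256-thread blocks is used (the block size is an arbitrary choice). Error checking on the CUDA runtime calls is left out to keep the sketch short.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One thread per output element: the thread with global index `row`
// computes the dot product of row `row` of A with B.
// Assumption: A is row-major, so A[row][j] lives at A[row * vectorLen + j].
__global__ void matrixVectorMulKernel(float *A, float *B, float *C, int vectorLen) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < vectorLen) {  // guard: the last block may have spare threads
        float sum = 0.0f;
        for (int j = 0; j < vectorLen; ++j) {
            sum += A[(size_t)row * vectorLen + j] * B[j];
        }
        C[row] = sum;
    }
}

void matrixVectorMul(float *h_A, float *h_B, float *h_C, int vectorLen) {
    size_t matBytes = (size_t)vectorLen * vectorLen * sizeof(float);
    size_t vecBytes = (size_t)vectorLen * sizeof(float);
    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;

    // Memory allocation on the device.
    cudaMalloc((void **)&d_A, matBytes);
    cudaMalloc((void **)&d_B, vecBytes);
    cudaMalloc((void **)&d_C, vecBytes);

    // Host-to-device transfer of the inputs.
    cudaMemcpy(d_A, h_A, matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, vecBytes, cudaMemcpyHostToDevice);

    // Thread organization: 1D grid, enough blocks to cover all rows
    // (ceiling division). 256 threads per block is an assumed value.
    int threadsPerBlock = 256;
    int blocks = (vectorLen + threadsPerBlock - 1) / threadsPerBlock;
    matrixVectorMulKernel<<<blocks, threadsPerBlock>>>(d_A, d_B, d_C, vectorLen);

    // Device-to-host transfer of the result; this blocking copy on the
    // default stream also waits for the kernel to finish.
    cudaMemcpy(h_C, d_C, vecBytes, cudaMemcpyDeviceToHost);

    // Free device memory.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
}
```

With this mapping, consecutive threads read addresses in A that are vectorLen floats apart, so the loads from A are not coalesced. A column-major layout, or assigning a warp per row with a reduction, are common alternatives when that matters.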