Question: Matrix Multiplication using CUDA and implementation on an Nvidia GPU

1) Declare matrix A of width 480 and height 800.
2) Declare matrix B of width 1280 and height 480.
3) Declare an answer matrix C of width 1280 and height 800.
4) Fill matrices A and B with random float values.
5) Start a stopwatch.
6) Multiply the two matrices and store the answer in matrix C.
7) Stop the stopwatch.
8) Report the time it took to perform the matrix multiplication operation.


Sequential C Implementation

The standard sequential C implementation of the matrix multiplier is straightforward. Five time trials of the matrix multiplication operation yielded 3920, 3870, 3980, 3900, and 3920 milliseconds, for an average time of 3918 milliseconds. Please reference the attached MatrixMul.c source file for more details on the implementation (a sketch of a comparable implementation appears after the listings below).

Final Results

The sequential C implementation yielded an average time trial result of 3918 milliseconds, and the CUDA implementation yielded an average time trial result of 48.74 milliseconds, giving an average speedup of roughly 80.4:

speedup = CPU avg. time / CUDA avg. time
        = [(3920 + 3870 + 3980 + 3900 + 3920) ms / 5] / [(49.2 + 48.7 + 48.6 + 48.6 + 48.6) ms / 5]
        = 3918 ms / 48.74 ms
        ≈ 80.4

As demonstrated, CUDA can deliver drastic performance increases for certain operations. This is just one example demonstrating a CUDA-enabled GPU's ability to handle matrix multiplication.

MatrixMul.h

#ifndef _MATRIXMUL_H_
#define _MATRIXMUL_H_

// Thread block size
#define BLOCK_SIZE 16

// Matrix dimensions
// (chosen as multiples of the thread block size for simplicity)
#define WA (3*10 * BLOCK_SIZE)   // Matrix A width  (480)
#define HA (5*10 * BLOCK_SIZE)   // Matrix A height (800)
#define WB (8*10 * BLOCK_SIZE)   // Matrix B width  (1280)
#define HB WA                    // Matrix B height
#define WC WB                    // Matrix C width
#define HC HA                    // Matrix C height

#endif // _MATRIXMUL_H_

MatrixMul.cu and Matrix_Mul_Kernal.cu (attached source listings; not reproduced here)
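Since the attached MatrixMul.c is not reproduced above, here is a minimal sketch of what a straightforward sequential implementation of steps 1-8 could look like. It is an illustration under assumptions, not the author's actual code: the randomInit helper, the use of clock() for the stopwatch, and row-major storage are choices made only for this sketch.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WA 480    /* Matrix A width  (= Matrix B height) */
#define HA 800    /* Matrix A height */
#define WB 1280   /* Matrix B width  */
#define WC WB     /* Matrix C width  */
#define HC HA     /* Matrix C height */

/* Fill a matrix with random float values in [0, 1]. */
static void randomInit(float *data, int size)
{
    for (int i = 0; i < size; ++i)
        data[i] = (float)rand() / (float)RAND_MAX;
}

int main(void)
{
    float *A = (float *)malloc(HA * WA * sizeof(float));
    float *B = (float *)malloc(WA * WB * sizeof(float));
    float *C = (float *)malloc(HC * WC * sizeof(float));

    srand((unsigned)time(NULL));
    randomInit(A, HA * WA);     /* step 4: random fill */
    randomInit(B, WA * WB);

    clock_t start = clock();    /* step 5: start stopwatch */

    /* step 6: C = A * B, row-major triple loop */
    for (int i = 0; i < HA; ++i)
        for (int j = 0; j < WB; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < WA; ++k)
                sum += A[i * WA + k] * B[k * WB + j];
            C[i * WC + j] = sum;
        }

    clock_t stop = clock();     /* step 7: stop stopwatch */

    /* step 8: report elapsed time in milliseconds */
    printf("Sequential multiply took %.0f ms\n",
           1000.0 * (double)(stop - start) / CLOCKS_PER_SEC);

    free(A); free(B); free(C);
    return 0;
}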
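Likewise, the attached Matrix_Mul_Kernal.cu listing is not reproduced above. The BLOCK_SIZE define in MatrixMul.h suggests the classic tiled approach, in which each 16x16 thread block computes one 16x16 tile of C while staging tiles of A and B through shared memory. The sketch below shows that pattern; the kernel name and parameter order are assumptions of this sketch.

#include "MatrixMul.h"   // for BLOCK_SIZE (16)

// Tiled matrix multiplication: each BLOCK_SIZE x BLOCK_SIZE thread block
// computes one tile of C, staging tiles of A and B through shared memory.
// wA is the width of A (and height of B); wB is the width of B (and of C).
__global__ void matrixMulKernel(float *C, const float *A, const float *B,
                                int wA, int wB)
{
    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * BLOCK_SIZE + ty;   // row of C this thread computes
    int col = blockIdx.x * BLOCK_SIZE + tx;   // column of C this thread computes

    float sum = 0.0f;

    // Walk across A's width (= B's height) one BLOCK_SIZE tile at a time.
    for (int t = 0; t < wA / BLOCK_SIZE; ++t) {
        As[ty][tx] = A[row * wA + t * BLOCK_SIZE + tx];
        Bs[ty][tx] = B[(t * BLOCK_SIZE + ty) * wB + col];
        __syncthreads();                      // tile fully loaded

        for (int k = 0; k < BLOCK_SIZE; ++k)
            sum += As[ty][k] * Bs[k][tx];
        __syncthreads();                      // done with this tile
    }

    C[row * wB + col] = sum;
}

Because WA, HA, WB, and HB are all multiples of BLOCK_SIZE, a kernel of this shape needs no bounds checks on the tile loads.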
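A host-side driver in the spirit of the attached MatrixMul.cu would allocate device memory, copy A and B to the GPU, launch the kernel over a (WC/BLOCK_SIZE) x (HC/BLOCK_SIZE) grid, and time the operation. The sketch below assumes the kernel above is compiled into the same program and uses CUDA events for the stopwatch; whether the reported 48.74 ms includes the host-device copies is not stated in the write-up, and this sketch times the kernel only.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include "MatrixMul.h"

// Kernel from the sketch above (same .cu file, or linked with relocatable
// device code); the name and signature are this sketch's assumption.
__global__ void matrixMulKernel(float *C, const float *A, const float *B,
                                int wA, int wB);

int main(void)
{
    size_t bytesA = (size_t)HA * WA * sizeof(float);
    size_t bytesB = (size_t)HB * WB * sizeof(float);
    size_t bytesC = (size_t)HC * WC * sizeof(float);

    // Steps 1-4: host matrices filled with random float values
    float *hA = (float *)malloc(bytesA);
    float *hB = (float *)malloc(bytesB);
    float *hC = (float *)malloc(bytesC);
    for (int i = 0; i < HA * WA; ++i) hA[i] = (float)rand() / RAND_MAX;
    for (int i = 0; i < HB * WB; ++i) hB[i] = (float)rand() / RAND_MAX;

    // Device copies of A, B, and C
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytesA);
    cudaMalloc(&dB, bytesB);
    cudaMalloc(&dC, bytesC);
    cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytesB, cudaMemcpyHostToDevice);

    // One thread per element of C, grouped into BLOCK_SIZE x BLOCK_SIZE blocks
    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(WC / BLOCK_SIZE, HC / BLOCK_SIZE);

    // Steps 5-7: time the kernel with CUDA events
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);

    matrixMulKernel<<<grid, block>>>(dC, dA, dB, WA, WB);

    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);
    printf("CUDA multiply took %.2f ms\n", ms);   // step 8: report the time

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}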
