
How would you solve the following problem?


Design a CUDA program to perform matrix multiplication C = A × B. The size of each matrix is 1K × 1K, and each element is 1 byte. The matrices are initially stored in the global memory of the GPU. The GPU has one streaming multiprocessor (SM) with 1K CUDA cores. Each CUDA core runs at 1 GHz and can perform one floating-point operation per clock cycle. The peak bandwidth between the GPU and the global memory is 100 GB/s, and the GPU cache has been disabled.
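A rough roofline-style reading of these figures shows why a straightforward kernel would be limited by global memory rather than by the cores. It assumes 1K = 1024 and 1 GB/s = 10^9 bytes/s:

$$
P_{\text{peak}} = 1024 \times 10^{9}\,\tfrac{\text{cycle}}{\text{s}} \times 1\,\tfrac{\text{FLOP}}{\text{cycle}} = 1.024\times 10^{12}\ \text{FLOP/s},
\qquad
\frac{P_{\text{peak}}}{B_{\text{global}}} = \frac{1.024\times 10^{12}\ \text{FLOP/s}}{100\times 10^{9}\ \text{B/s}} \approx 10.24\ \text{FLOP/B}.
$$

With the cache disabled, a naive kernel fetches one byte of A and one byte of B from global memory for every multiply-add (2 FLOPs). That is an intensity of about 1 FLOP/B, so such a kernel would run roughly ten times below the compute peak. Reusing data from on-chip storage is what closes that gap.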

Assume the GPU has an on-chip buffer (shared memory) that can store 3 × 32 × 32 elements and has a peak access bandwidth of 1 TB/s.
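One plausible reading of the 3 × 32 × 32 buffer, which the problem does not state, is one square tile each of A, B and C with edge T. Taking N = 1024, the tile geometry works out as follows:

$$
3T^2 \le 3 \cdot 32^2 \;\Rightarrow\; T = 32,
\qquad
T^2 = 1024\ \text{threads per block (one per CUDA core)},
\qquad
\left(\tfrac{N}{T}\right)^2 = 1024\ \text{tiles of } C,
\qquad
\tfrac{N}{T} = 32\ \text{tile pairs per tile of } C.
$$

Once a tile sits in shared memory, each of its bytes is read by T = 32 threads. Global traffic therefore drops by a factor of T relative to a naive kernel, and the repeated reads are served by shared memory, which is 10× faster than global memory here.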

Write the pseudocode for an optimized CUDA program using the block matrix multiplication approach, including both the kernel function and the host function.
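Below is a minimal CUDA sketch of the tiled (block) approach. It is one possible answer, not a reference solution, and it rests on several assumptions. First, 1K = 1024. Second, A and B are row-major `unsigned char` matrices. Third, C is widened to `int` so that the 1024-term sums of byte products (at most 255 · 255 · 1024) cannot overflow. The tile edge `TILE = 32` lets one 32 × 32 tile of A and one of B fit in the shared buffer, and a 32 × 32 thread block occupies all 1K cores. Each thread keeps its partial sum of C in a register rather than in a third shared-memory tile. The names `matmulTiled`, `matmulHost` and `selfCheck` are hypothetical. Because the matrices already live in global memory, the host function only configures and launches the grid.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Tile edge: a 32 x 32 tile of A and of B fit in the shared buffer, and a
// 32 x 32 thread block uses all 1K CUDA cores (assumption: 1K = 1024).
constexpr int TILE = 32;

// Kernel: each block computes one TILE x TILE tile of C; each thread owns one
// element of that tile and keeps its running sum in a register.
// A and B are row-major n x n byte matrices; C is widened to int (assumption)
// so that, for n = 1024, sums of n products of at most 255 * 255 fit.
__global__ void matmulTiled(const unsigned char* A, const unsigned char* B,
                            int* C, int n) {
    __shared__ unsigned char As[TILE][TILE];  // current tile of A
    __shared__ unsigned char Bs[TILE][TILE];  // current tile of B

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int acc = 0;

    // Walk along the shared dimension one tile pair at a time.
    for (int t = 0; t < n / TILE; ++t) {
        // Cooperative load: every thread copies one byte of each tile
        // from global memory into shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads done before the next pass overwrites
    }
    C[row * n + col] = acc;  // one global write per element of C
}

// Host function: dA, dB and dC are device pointers already in global memory,
// so the host only configures and launches the grid.
cudaError_t matmulHost(const unsigned char* dA, const unsigned char* dB,
                       int* dC, int n) {
    if (n <= 0 || n % TILE != 0) return cudaErrorInvalidValue;  // full tiles only
    dim3 block(TILE, TILE);         // 1024 threads per block
    dim3 grid(n / TILE, n / TILE);  // one block per tile of C
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

// Hypothetical self-check: runs the kernel on an n x n problem and compares
// every element of C with a straightforward CPU reference.
bool selfCheck(int n) {
    size_t elems = static_cast<size_t>(n) * n;
    std::vector<unsigned char> a(elems), b(elems);
    for (size_t i = 0; i < elems; ++i) {
        a[i] = static_cast<unsigned char>(i * 7 % 256);
        b[i] = static_cast<unsigned char>(i * 13 % 256);
    }
    unsigned char *dA = nullptr, *dB = nullptr;
    int* dC = nullptr;
    cudaMalloc(&dA, elems);
    cudaMalloc(&dB, elems);
    cudaMalloc(&dC, elems * sizeof(int));
    cudaMemcpy(dA, a.data(), elems, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), elems, cudaMemcpyHostToDevice);
    bool ok = (matmulHost(dA, dB, dC, n) == cudaSuccess);
    std::vector<int> c(elems);
    cudaMemcpy(c.data(), dC, elems * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    for (int i = 0; ok && i < n; ++i)
        for (int j = 0; ok && j < n; ++j) {
            int ref = 0;
            for (int k = 0; k < n; ++k) ref += a[i * n + k] * b[k * n + j];
            ok = (c[static_cast<size_t>(i) * n + j] == ref);
        }
    return ok;
}
```

With n = 1024 the grid has 32 × 32 blocks. If at most one 1024-thread block is resident on the single SM at a time, the blocks run one after another, and each block makes n / TILE = 32 passes over a tile pair. The two `__syncthreads()` calls per pass make it safe to reuse a single pair of shared tiles. `selfCheck` is only a correctness smoke test on small sizes, not a performance measurement.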

Derive a lower bound on the computation time, the local data access time in the SM, and the global data access time of your optimized kernel function. Explain.
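The three bounds for a 32 × 32 tiled kernel can be sketched as below. The derivation assumes:

- N = 1024 and T = 32.
- 1 GB/s = 10^9 B/s and 1 TB/s = 10^12 B/s.
- A multiply and an add count as 2 FLOPs.
- C is written at 1 byte per element, as the problem states.
- Every per-thread shared-memory read counts as one byte, with no credit taken for broadcast.

$$
t_{\text{comp}} \ge \frac{2N^3\ \text{FLOP}}{1024 \times 10^{9}\ \text{FLOP/s}} = \frac{2^{31}}{1.024\times10^{12}}\ \text{s} \approx 2.10\ \text{ms}
$$

$$
Q_{\text{global}} = \left(\frac{N}{T}\right)^2 \cdot 2\,\frac{N}{T}\,T^2 + N^2 = \frac{2N^3}{T} + N^2 = 2^{26} + 2^{20} = 68{,}157{,}440\ \text{B},
\qquad
t_{\text{global}} \ge \frac{68{,}157{,}440\ \text{B}}{100\times10^{9}\ \text{B/s}} \approx 0.68\ \text{ms}
$$

$$
Q_{\text{shared}} = \underbrace{N^2 \cdot \frac{N}{T} \cdot 2T}_{\text{operand reads}} + \underbrace{\frac{2N^3}{T}}_{\text{tile stores}} = 2^{31} + 2^{26} = 2{,}214{,}592{,}512\ \text{B},
\qquad
t_{\text{shared}} \ge \frac{2{,}214{,}592{,}512\ \text{B}}{10^{12}\ \text{B/s}} \approx 2.21\ \text{ms}
$$

The global term counts each of the (N/T)^2 tiles of C loading N/T tiles of A and N/T tiles of B of T^2 bytes each, plus one write of C. The shared term counts each of the N^2 threads reading T bytes of A and T bytes of B in each of its N/T passes, plus storing every loaded tile once. If the C tile is also staged through the third shared buffer, that adds 2N^2 bytes, about 2 µs, which is negligible.

If computation, shared-memory access and global access overlap perfectly, the kernel cannot finish faster than the largest of the three, about 2.2 ms. Tiling has moved the bottleneck off global memory, which now needs about 0.68 ms, and onto the cores and the shared memory. Counting a multiply-add as a single operation would halve the computation bound to about 1.05 ms.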
