Question: Does cudaMemcpyHostToDevice copy a variable's value from CPU memory to GPU global memory?

I am seeking your assistance in solving the second bullet point correctly. I am not clear on whether cudaMemcpyHostToDevice copies a variable's value from the CPU (system memory?) to the GPU (global memory on the device?). My understanding is that both the matrix A and the vector b need to be copied over to the GPU in this scenario, that the copy destination is global memory, and that b then needs to be copied into the shared memory on each streaming multiprocessor. The host code may include these actions (see the sketch after this list):
// Allocate device memory for matrix A, vector b, and result vector c using cudaMalloc
// Copy A and b to the GPU via cudaMemcpy (c only needs device space to hold the result)
// Launch the kernel threads by invoking __global__ matrixVectorMultiply, which I need to write; dimGrid and dimBlock need to be set before invoking matrixVectorMultiply (should the argument to dimGrid be 1K, meaning 1024, and dimBlock 1 here?)
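To make the question concrete, here is the sketch I have so far. matrixVectorMultiply is the kernel named above; N, the float element type, the row-major layout of A, and the <<<1, N>>> launch (one block of 1K threads, so all of them can share a single copy of b) are my own guesses rather than anything given in the assignment.

#include <cuda_runtime.h>
#include <stdlib.h>

#define N 1024  // "1K"

__global__ void matrixVectorMultiply(const float *A, const float *b, float *c) {
    __shared__ float b_s[N];      // vector b cached in per-block shared memory (4 KB)
    int row = threadIdx.x;        // one thread per row of A

    b_s[row] = b[row];            // each thread loads one element of b from global memory
    __syncthreads();              // wait until all of b is staged in shared memory

    float sum = 0.0f;
    for (int j = 0; j < N; ++j)   // dot product of row `row` of A with b
        sum += A[row * N + j] * b_s[j];
    c[row] = sum;
}

int main(void) {
    size_t bytesA = N * N * sizeof(float), bytesV = N * sizeof(float);
    float *hA = (float *)malloc(bytesA);
    float *hb = (float *)malloc(bytesV);
    float *hc = (float *)malloc(bytesV);
    // ... initialize hA and hb here ...

    float *dA, *db, *dc;
    cudaMalloc(&dA, bytesA);      // allocations land in GPU global memory
    cudaMalloc(&db, bytesV);
    cudaMalloc(&dc, bytesV);      // c only needs space; nothing to copy in

    cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);  // host memory -> device global memory
    cudaMemcpy(db, hb, bytesV, cudaMemcpyHostToDevice);

    matrixVectorMultiply<<<1, N>>>(dA, db, dc);          // 1 block of 1K threads

    cudaMemcpy(hc, dc, bytesV, cudaMemcpyDeviceToHost);  // result back to the host

    cudaFree(dA); cudaFree(db); cudaFree(dc);
    free(hA); free(hb); free(hc);
    return 0;
}

My reasoning for <<<1, N>>> rather than dimGrid = 1K and dimBlock = 1 is that a __shared__ array is visible only to threads within the same block, so one-thread blocks would get no benefit from staging b in shared memory; please correct me if that is wrong.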
Please advise.
--------- Attached Problem --------------
6. [15 points] Matrix-vector multiplication using CUDA

- Design a CUDA program to perform matrix-vector multiplication \( c = A \times b \). The size of matrix \( A \) is \( 1\mathrm{K} \times 1\mathrm{K} \). The size of vectors \( b \) and \( c \) is \( 1\mathrm{K} \times 1 \). Your program should use \( 1\mathrm{K} \) threads in total. Assume the shared memory is large enough to hold the entire vector \( b \). The input matrix and the vector are initially stored in the host memory. Write pseudocode for the host function and the kernel function. Note that your kernel function must use shared memory to store vector \( b \).
- Assume each element of \( A \), \( b \), \( c \) is 4 bytes; data transfer between CPU and GPU is through PCIe, whose bandwidth is \( 16\ \mathrm{GB/s} \) in each direction; the clock rate of the GPU is 1 GHz; the access latency to global memory and shared memory is 100 clock cycles and 10 clock cycles, respectively; and multiply-add operations are overlapped with memory access operations. What is the execution time of your CUDA program in the best case?
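For what it's worth, my partial arithmetic for the second bullet covers only the PCIe transfers so far (assuming \( 1\mathrm{K} = 1024 \) and \( 16\ \mathrm{GB/s} = 16 \times 10^{9}\ \mathrm{B/s} \), which are my own readings of the problem):

\[
t_A = \frac{1024 \times 1024 \times 4\ \mathrm{B}}{16 \times 10^{9}\ \mathrm{B/s}} \approx 262\ \mu\mathrm{s}, \qquad
t_b = t_c = \frac{1024 \times 4\ \mathrm{B}}{16 \times 10^{9}\ \mathrm{B/s}} \approx 0.26\ \mu\mathrm{s}.
\]

What I still cannot work out is how the 100-cycle global-memory latency combines with the overlapped multiply-adds inside the kernel, so the kernel time is the part I most need help with.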