
Objective

The objective of this problem is to implement a tiled image convolution using both shared and constant memory. We will have a constant 5 × 5 convolution mask but arbitrarily sized images. We will also keep the output tile size at 16 × 16.
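As a worked check of the tile sizes, assume (hypothetically) that all three channels of the input tile are staged in shared memory as 4-byte floats. A 5 × 5 mask has radius 2, so each 16 × 16 output tile needs a 2-pixel halo on every side:

$$
\begin{aligned}
r &= \lfloor 5/2 \rfloor = 2,\\
w_{\text{in}} &= 16 + 2r = 16 + 5 - 1 = 20,\\
\text{shared memory per block} &= 20 \times 20 \times 3 \times 4\ \text{B} = 4800\ \text{B}.
\end{aligned}
$$

If only one channel is staged per pass, the footprint drops to 20 × 20 × 4 B = 1600 B.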

To use the constant memory for the convolution mask, you should first transfer the mask data to the global memory of the device (using cudaMalloc() and cudaMemcpy()). Then (given that the pointer to the device array for the mask is named M) the signature of your kernel function should include const float * __restrict__ M as one of the parameters, as indicated in the class notes. This informs the compiler that the contents of the mask array are constant and not aliased by any other pointer, which allows the SM hardware to aggressively cache the mask data at runtime.

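Convolution is used in many fields, such as image processing for image filtering. A standard image convolution formula for a 5 × 5 convolution mask (filter) M with an image I is:

$$P_{i,j,c} = \sum_{y=-2}^{2} \sum_{x=-2}^{2} I_{i+y,\,j+x,\,c} \cdot M_{y,x}$$

where $P_{i,j,c}$ is the output pixel at position (i, j) in channel c, $I_{i,j,c}$ is the input pixel at (i, j) in channel c (the number of channels will always be 3 for this problem, corresponding to the RGB values), and $M_{y,x}$ is the mask value at horizontal offset x and vertical offset y from the mask center. Input pixels that fall outside the image contribute 0.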
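To make the formula concrete, here is a minimal host-side C++ sketch that evaluates a single output value $P_{i,j,c}$ before clamping. The function name convolvePixel and its parameter order are hypothetical, not part of the assignment template. It assumes the interleaved layout described in the Input Data section and a row-major 5 × 5 mask, and it uses the same const float *__restrict__ qualifier on M that the kernel signature calls for (nvcc, GCC, and Clang accept __restrict__; MSVC host code spells it __restrict).

```cpp
#include <cassert>

// Hypothetical helper (not from the template): computes one unclamped output
// value P[i][j][c] from the formula above.
// I is interleaved (height x width x channels); M is the 5x5 mask, row-major.
// Input pixels outside the image contribute 0, matching the bounds check in
// the sequential pseudo code. __restrict__ promises that M is not aliased.
float convolvePixel(const float *I, const float *__restrict__ M,
                    int height, int width, int channels,
                    int i, int j, int c) {
    const int maskWidth = 5;
    const int maskRadius = maskWidth / 2;  // integer division -> 2
    float accum = 0.0f;
    for (int y = -maskRadius; y <= maskRadius; ++y) {      // vertical offset
        for (int x = -maskRadius; x <= maskRadius; ++x) {  // horizontal offset
            const int row = i + y;
            const int col = j + x;
            if (row >= 0 && row < height && col >= 0 && col < width) {
                accum += I[(row * width + col) * channels + c] *
                         M[(y + maskRadius) * maskWidth + (x + maskRadius)];
            }
        }
    }
    return accum;  // clamping to [0, 1] happens when P is written
}
```

With an all-ones mask on an all-ones 4 × 4 image, the corner pixel sees only 3 × 3 in-bounds neighbors and returns 9, which is a quick way to confirm the zero-padding behavior at the border.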

Input Data

The input is an interleaved image of height × width × channels. By interleaved, we mean that the element I[y][x] contains three values representing the RGB channels. This means that to index a particular element's value, you will have to do something like:

    index = (yIndex * width + xIndex) * channels + channelIndex;

For this problem, the channelIndex is 0 for R, 1 for G, and 2 for B. So, to access the G value of I[y][x], you should use the linearized expression I[(yIndex * width + xIndex) * channels + 1]. As mentioned above, the value of channels is always set to 3.
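The indexing rule can be wrapped in a small helper. This is a sketch with a hypothetical name (pixelIndex); the static_assert lines simply evaluate the expression from the text at compile time for a 4-pixel-wide image.

```cpp
#include <cassert>

// Hypothetical helper: linear offset of channel channelIndex (0 = R, 1 = G, 2 = B)
// of pixel I[yIndex][xIndex] in an interleaved height x width x channels buffer.
constexpr int pixelIndex(int yIndex, int xIndex, int channelIndex,
                         int width, int channels = 3) {
    return (yIndex * width + xIndex) * channels + channelIndex;
}

// Layout checks for a 4-pixel-wide image: the R, G, B values of one pixel are
// adjacent, and horizontally neighboring pixels are `channels` floats apart.
static_assert(pixelIndex(0, 0, 0, 4) == 0, "R of I[0][0] comes first");
static_assert(pixelIndex(0, 1, 0, 4) == 3, "next pixel starts 3 floats later");
static_assert(pixelIndex(1, 2, 1, 4) == 19, "G of I[1][2] is (1*4+2)*3+1");
```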

Pseudo Code and Approach

Sequential pseudo code would look something like this:

    maskWidth := 5
    maskRadius := maskWidth / 2   # this is integer division, so the result is 2
    # i, j, k loops exclude their upper bound; y, x loops include both ends
    # K is the 5 x 5 mask (M above), stored row-major
    for i from 0 to height do
      for j from 0 to width do
        for k from 0 to channels do
          accum := 0
          for y from -maskRadius to maskRadius do
            for x from -maskRadius to maskRadius do
              xOffset := j + x
              yOffset := i + y
              if xOffset >= 0 && xOffset < width &&
                 yOffset >= 0 && yOffset < height then
                imagePixel := I[(yOffset * width + xOffset) * channels + k]
                maskValue := K[(y + maskRadius) * maskWidth + x + maskRadius]
                accum += imagePixel * maskValue
              end
            end
          end
          # pixels are in the range of [0, 1]
          P[(i * width + j) * channels + k] := clamp(accum, 0, 1)
        end
      end
    end

where clamp is defined as:

    def clamp(x, lower, upper)
      return min(max(x, lower), upper)
    end
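For checking a kernel's output, a direct C++ transcription of the pseudo code and clamp above may be handy. This is a sketch: the names convolveSequential and clampf are hypothetical, the mask K is assumed to be a row-major vector of 25 floats, and std::vector stands in for whatever buffers the template actually provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// clamp as defined above: min(max(x, lower), upper)
inline float clampf(float x, float lower, float upper) {
    return std::min(std::max(x, lower), upper);
}

// Sequential reference following the pseudo code line by line.
// I and the returned P are interleaved height x width x channels;
// K is the maskWidth x maskWidth mask, row-major.
std::vector<float> convolveSequential(const std::vector<float> &I, const std::vector<float> &K,
                                      int height, int width, int channels) {
    const int maskWidth = 5;
    const int maskRadius = maskWidth / 2;  // integer division -> 2
    std::vector<float> P(static_cast<std::size_t>(height) * width * channels);
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j)
            for (int k = 0; k < channels; ++k) {
                float accum = 0.0f;
                for (int y = -maskRadius; y <= maskRadius; ++y)
                    for (int x = -maskRadius; x <= maskRadius; ++x) {
                        const int xOffset = j + x;
                        const int yOffset = i + y;
                        if (xOffset >= 0 && xOffset < width && yOffset >= 0 && yOffset < height) {
                            float imagePixel = I[(yOffset * width + xOffset) * channels + k];
                            float maskValue = K[(y + maskRadius) * maskWidth + x + maskRadius];
                            accum += imagePixel * maskValue;
                        }
                    }
                // pixels are in the range [0, 1]
                P[(i * width + j) * channels + k] = clampf(accum, 0.0f, 1.0f);
            }
    return P;
}
```

Because it mirrors the pseudo code term by term, comparing its result against the GPU output (after copying it back to the host) is one way to catch indexing or halo mistakes in the tiled kernel.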

In this problem, you are expected to write a kernel that performs the same operation as the sequential code above (e.g., you also need to clamp your output values).

To implement your kernel, you can either make your block size as big as the output tile (Design 1) or make your block size as big as the input tile (Design 2).
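The following CPU emulation sketches Design 2 (block size equal to the input tile) so the tiling logic can be run and checked without a GPU. It is illustrative only: the constants TILE_WIDTH, MASK_WIDTH, MASK_RADIUS, and BLOCK_WIDTH and the function name are hypothetical; the loops over (tx, ty) stand in for CUDA threads; the tile vector stands in for a __shared__ array; and a comment marks where __syncthreads() would separate the load and compute phases on a real device. Ghost cells outside the image are loaded as 0, which reproduces the bounds check of the sequential version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants for this sketch (not names from the template).
constexpr int TILE_WIDTH = 16;                            // output tile width
constexpr int MASK_WIDTH = 5;
constexpr int MASK_RADIUS = MASK_WIDTH / 2;               // 2
constexpr int BLOCK_WIDTH = TILE_WIDTH + MASK_WIDTH - 1;  // input tile width, 20

// CPU emulation of Design 2: one "block" per 16x16 output tile and
// BLOCK_WIDTH x BLOCK_WIDTH "threads" per block.
// I and the result are interleaved height x width x channels; M is 5x5, row-major.
std::vector<float> convolveTiledEmulated(const std::vector<float> &I, const float *__restrict__ M,
                                         int height, int width, int channels) {
    std::vector<float> P(I.size());
    const int gridX = (width + TILE_WIDTH - 1) / TILE_WIDTH;
    const int gridY = (height + TILE_WIDTH - 1) / TILE_WIDTH;
    // Stand-in for a __shared__ input tile of BLOCK_WIDTH x BLOCK_WIDTH pixels.
    std::vector<float> tile(static_cast<std::size_t>(BLOCK_WIDTH) * BLOCK_WIDTH * channels);
    for (int by = 0; by < gridY; ++by) {
        for (int bx = 0; bx < gridX; ++bx) {
            // Load phase: thread (tx, ty) copies one input pixel, or 0 for a ghost cell.
            for (int ty = 0; ty < BLOCK_WIDTH; ++ty) {
                for (int tx = 0; tx < BLOCK_WIDTH; ++tx) {
                    const int row = by * TILE_WIDTH + ty - MASK_RADIUS;
                    const int col = bx * TILE_WIDTH + tx - MASK_RADIUS;
                    const bool inside = row >= 0 && row < height && col >= 0 && col < width;
                    for (int c = 0; c < channels; ++c)
                        tile[(ty * BLOCK_WIDTH + tx) * channels + c] =
                            inside ? I[(row * width + col) * channels + c] : 0.0f;
                }
            }
            // On a GPU, __syncthreads() goes here so the whole tile is loaded first.
            // Compute phase: only threads with tx, ty < TILE_WIDTH write an output pixel.
            for (int ty = 0; ty < TILE_WIDTH; ++ty) {
                for (int tx = 0; tx < TILE_WIDTH; ++tx) {
                    const int row = by * TILE_WIDTH + ty;
                    const int col = bx * TILE_WIDTH + tx;
                    if (row >= height || col >= width) continue;  // partial edge tile
                    for (int c = 0; c < channels; ++c) {
                        float accum = 0.0f;
                        for (int y = 0; y < MASK_WIDTH; ++y)
                            for (int x = 0; x < MASK_WIDTH; ++x)
                                accum += tile[((ty + y) * BLOCK_WIDTH + (tx + x)) * channels + c] *
                                         M[y * MASK_WIDTH + x];
                        // clamp to [0, 1] as in the sequential version
                        P[(row * width + col) * channels + c] = std::min(std::max(accum, 0.0f), 1.0f);
                    }
                }
            }
        }
    }
    return P;
}
```

In Design 2 every thread loads exactly one input element (20 × 20 = 400 threads per block), but only the top-left 16 × 16 threads write output. Design 1 keeps 256 threads per block, so some threads must load more than one element to fill the 20 × 20 halo tile.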

Coding

Edit the code in the code tab to perform the following:

- allocate device memory
- copy host memory to device
- initialize thread block and kernel grid dimensions
- invoke CUDA kernel
- copy results from device to host
- deallocate device memory
- implement the tiled 2D convolution kernel with adjustments for channels and make sure to:
  - use the constant memory for the convolution mask
  - use shared memory to reduce the number of global accesses and handle the boundary conditions when loading input image elements into the shared memory
  - clamp your output values

Instructions about where to place each part of the code are demarcated by the // comment lines.
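For the "initialize thread block and kernel grid dimensions" step, the arithmetic can be sketched on the host. Dim3 below is a hypothetical stand-in for CUDA's dim3 so the snippet compiles as plain C++; gridFor and blockFor are illustrative names, not template functions.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the arithmetic runs as plain C++.
struct Dim3 { unsigned x, y, z; };

constexpr int TILE_WIDTH = 16;                            // output tile (fixed by the problem)
constexpr int MASK_WIDTH = 5;
constexpr int BLOCK_WIDTH = TILE_WIDTH + MASK_WIDTH - 1;  // input tile width, 20

// One block per 16x16 output tile; ceiling division covers partial tiles
// at the right and bottom edges of the image.
Dim3 gridFor(int width, int height) {
    return {static_cast<unsigned>((width + TILE_WIDTH - 1) / TILE_WIDTH),
            static_cast<unsigned>((height + TILE_WIDTH - 1) / TILE_WIDTH), 1u};
}

// Design 1: block = output tile (16x16 threads).
// Design 2: block = input tile (20x20 threads).
Dim3 blockFor(bool design2) {
    const unsigned w = design2 ? BLOCK_WIDTH : TILE_WIDTH;
    return {w, w, 1u};
}
```

In either design the grid is the same, one block per 16 × 16 output tile, because the block count is set by the output tiles; only the threads per block differ (256 for Design 1, 400 for Design 2). With the real dim3 type, these values would be passed to the kernel launch as <<<grid, block>>>.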

Project Setup

To test your program on test_case_0, right-click on the Convolution_Template project -> Properties -> Configuration Properties -> Debugging -> Command Arguments and enter the following:

    -e .\Convolution\Dataset\0\output.ppm
    -i .\Convolution\Dataset\0\input0.ppm,.\Convolution\Dataset\0\input1.raw
    -o .\Convolution\Dataset\0\myoutput.ppm
    -t image
    > .\Convolution\Dataset\0\result.txt

(all in one line; do not forget to put spaces between the sub-lines; above you see five sub-lines)

Here, input0.ppm is the input image and input1.raw is the mask.

You will see the output of your execution in result.txt, which resides under build\Convolution\Dataset\0\.

Command-line Execution

The executable generated as a result of compiling the project (build\Debug\Convolution_Template.exe) can be run from the command line using the following command (make sure you are in the build directory):

    .\Debug\Convolution_Template
    -e .\Convolution\Dataset\0\output.ppm
    -i .\Convolution\Dataset\0\input0.ppm,.\Convolution\Dataset\0\input1.raw
    -o .\Convolution\Dataset\0\myoutput.ppm
    -t image
    > .\Convolution\Dataset\0\result.txt

(all in one line; do not forget to put spaces between the sub-lines; above you see six sub-lines)
