Question: Objective The objective of this problem is to implement a tiled image convolution using both shared and constant memory. We will have a constant 5
Objective
The objective of this problem is to implement a tiled image convolution using both shared
and constant memory. We will have a constant 5 x 5 convolution mask, but the image may be
arbitrarily sized. We will also keep the output tile size fixed.
To use the constant memory for the convolution mask, you should first transfer the mask
data to the global memory of the device using cudaMalloc() and cudaMemcpy().
Then, given that the pointer to the device array for the mask is named M, the signature of
your kernel function should include const float * __restrict__ M as one
of the parameters, as indicated in the class notes. This informs the compiler that the
contents of the mask array are constant and allows the SM hardware to aggressively cache
the mask data at runtime.
Convolution is used in many fields, such as image processing for image filtering. A standard
image convolution formula for a 5 x 5 convolution mask (filter) M with an image I is:

$$P_{i,j,c} = \sum_{y=-2}^{2} \sum_{x=-2}^{2} I_{i+y,\,j+x,\,c} \cdot M_{x,y}$$

where $P_{i,j,c}$ is the output pixel at position (i, j) in channel c, $I_{i,j,c}$ is the input pixel at
(i, j) in channel c (the number of channels will always be 3 for this problem, corresponding
to the RGB values), and $M_{x,y}$ is the mask at position (x, y).
Input Data
The input is an interleaved image of height x width x channels. By interleaved, we
mean that the element I[y][x] contains three values representing the RGB channels. This
means that to index a particular element's value, you will have to do something like:
    index = (yIndex * width + xIndex) * channels + channelIndex;
For this problem, the channelIndex is 0 for R, 1 for G, and 2 for B. So to access the G
value of I[y][x], you should use the linearized expression
    I[(yIndex * width + xIndex) * channels + 1]
As mentioned above, the value of channels is always set to 3.
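The linearized indexing above can be wrapped in a small helper. This is only a sketch: the function name `pixelIndex` and its default argument are illustrative assumptions, not part of the assignment template.

```cpp
#include <cstddef>

// Linear offset of channel `channelIndex` of pixel (yIndex, xIndex) in an
// interleaved image that is `width` pixels wide with `channels` values per pixel.
// `pixelIndex` is a hypothetical helper name used for illustration only.
inline std::size_t pixelIndex(std::size_t yIndex, std::size_t xIndex,
                              std::size_t channelIndex,
                              std::size_t width, std::size_t channels = 3) {
    return (yIndex * width + xIndex) * channels + channelIndex;
}
```

For example, in an image 4 pixels wide, the G value of the pixel in row 1, column 2 sits at offset (1 * 4 + 2) * 3 + 1.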
Pseudo Code and Approach
A sequential pseudo code would look something like this:
maskWidth := 5
maskRadius := maskWidth/2 # this is integer division, so the result is 2
for i from 0 to height do
  for j from 0 to width do
    for k from 0 to channels
      accum := 0
      for y from -maskRadius to maskRadius do
        for x from -maskRadius to maskRadius do
          xOffset := j + x
          yOffset := i + y
          if xOffset >= 0 && xOffset < width &&
             yOffset >= 0 && yOffset < height then
            imagePixel := I[(yOffset * width + xOffset) * channels + k]
            maskValue := K[(y+maskRadius)*maskWidth+x+maskRadius]
            accum += imagePixel * maskValue
          end
        end
      end
      # pixels are in the range of 0 to 1
      P[(i * width + j)*channels + k] = clamp(accum, 0, 1)
    end
  end
end
where clamp is defined as
def clamp(x, lower, upper)
  return min(max(x, lower), upper)
end
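A host-side version of the sequential pseudocode can serve as a reference when checking kernel output. The sketch below makes some assumptions: the names `clampValue` and `convolveReference` are illustrative, the image and mask are stored as flat `std::vector<float>` arrays, and output is clamped to [0, 1] on the assumption that pixel values are normalized floats.

```cpp
#include <algorithm>
#include <vector>

// Clamp x into [lower, upper], mirroring the pseudocode's clamp().
inline float clampValue(float x, float lower, float upper) {
    return std::min(std::max(x, lower), upper);
}

// Sequential reference: convolve an interleaved image with a square mask.
// Neighbours that fall outside the image are skipped, i.e. treated as zero.
// Hypothetical helper; the [0, 1] clamp range is an assumption.
std::vector<float> convolveReference(const std::vector<float>& image,
                                     const std::vector<float>& mask,
                                     int height, int width, int channels,
                                     int maskWidth) {
    const int maskRadius = maskWidth / 2;  // integer division
    std::vector<float> output(image.size(), 0.0f);
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            for (int k = 0; k < channels; ++k) {
                float accum = 0.0f;
                for (int y = -maskRadius; y <= maskRadius; ++y) {
                    for (int x = -maskRadius; x <= maskRadius; ++x) {
                        const int xOffset = j + x;
                        const int yOffset = i + y;
                        if (xOffset >= 0 && xOffset < width &&
                            yOffset >= 0 && yOffset < height) {
                            accum += image[(yOffset * width + xOffset) * channels + k] *
                                     mask[(y + maskRadius) * maskWidth + x + maskRadius];
                        }
                    }
                }
                output[(i * width + j) * channels + k] = clampValue(accum, 0.0f, 1.0f);
            }
        }
    }
    return output;
}
```

A mask that is zero everywhere except for a 1 at its centre should return the input unchanged, which makes a convenient sanity check.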
In this problem, you are expected to write a kernel that performs the same operation as
the sequential code above (e.g., you also need to clamp your output values).
To implement your kernel, you can either make your block size as big as the output tile
(Design 1) or make your block size as big as the input tile (Design 2).
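The two block-size choices lead to different launch dimensions. The following sketch works them out on the host. `LaunchDims`, `ceilDiv`, `outputTileDesign`, `inputTileDesign`, and the `tileWidth` parameter are all hypothetical names; in a CUDA program the results would typically be placed in `dim3` values.

```cpp
// Hypothetical container for square-block launch dimensions.
struct LaunchDims {
    int blockSide;  // threads per block along x and along y
    int gridX;      // number of blocks along the image width
    int gridY;      // number of blocks along the image height
};

// Ceiling division for positive integers.
inline int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Block as big as the output tile: every thread computes one output pixel,
// and the block needs more than one step to load the larger input tile.
inline LaunchDims outputTileDesign(int width, int height, int tileWidth) {
    return {tileWidth, ceilDiv(width, tileWidth), ceilDiv(height, tileWidth)};
}

// Block as big as the input tile (output tile plus halo): every thread loads
// one input element, and only the inner tileWidth x tileWidth threads write output.
inline LaunchDims inputTileDesign(int width, int height, int tileWidth,
                                  int maskWidth) {
    return {tileWidth + maskWidth - 1,
            ceilDiv(width, tileWidth), ceilDiv(height, tileWidth)};
}
```

In both cases the grid is sized by the output tile, since each block is responsible for one output tile; only the number of threads per block differs.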
Coding
Edit the code in the code tab to perform the following:
- allocate device memory
- copy host memory to device
- initialize thread block and kernel grid dimensions
- invoke CUDA kernel
- copy results from device to host
- deallocate device memory
- implement the tiled 2D convolution kernel with adjustments for channels and make
  sure to:
  - use the constant memory for the convolution mask
  - use shared memory to reduce the number of global accesses and handle the boundary
    conditions when loading input list elements into the shared memory
  - clamp your output values
Instructions about where to place each part of the code are demarcated by the //@@
comment lines.
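The boundary handling during the shared-memory load can be prototyped on the host before writing the kernel. The sketch below simulates what one block would store for a single channel: each input-tile position maps back to an image coordinate shifted by the mask radius, and positions outside the image are stored as 0. The function name `loadInputTile` and all of its parameters are assumptions made for illustration.

```cpp
#include <vector>

// Fill one single-channel input tile (output tile plus halo) for the block at
// (blockX, blockY). Positions that fall outside the image are stored as 0, so
// the convolution loop over the tile needs no further bounds checks.
// Hypothetical host-side simulation of a kernel's shared-memory load.
std::vector<float> loadInputTile(const std::vector<float>& image,
                                 int height, int width, int channels, int channel,
                                 int blockX, int blockY,
                                 int tileWidth, int maskWidth) {
    const int maskRadius = maskWidth / 2;
    const int inputTileWidth = tileWidth + maskWidth - 1;
    std::vector<float> tile(inputTileWidth * inputTileWidth, 0.0f);
    for (int ty = 0; ty < inputTileWidth; ++ty) {
        for (int tx = 0; tx < inputTileWidth; ++tx) {
            // In a kernel, (tx, ty) would play the role of the thread coordinates.
            const int row = blockY * tileWidth + ty - maskRadius;
            const int col = blockX * tileWidth + tx - maskRadius;
            if (row >= 0 && row < height && col >= 0 && col < width) {
                tile[ty * inputTileWidth + tx] =
                    image[(row * width + col) * channels + channel];
            }
        }
    }
    return tile;
}
```

With a 2 x 2 image, a 2 x 2 output tile, and a 3 x 3 mask, the resulting 4 x 4 tile holds the four image values in its centre surrounded by a ring of zeros.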
Project Setup
To test your program on a test case:
Right click on the ConvolutionTemplate project -> Properties -> Configuration Properties
-> Debugging -> Command Arguments and enter the following:
-e ConvolutionDataset\output.ppm
-i
ConvolutionDataset\input.ppm,ConvolutionDataset\input.raw
-o ConvolutionDataset\myoutput.ppm -t image >
ConvolutionDataset\result.txt
(All on one line. Do not forget to put spaces between the sublines; above you see five
sublines.)
Here, input.ppm is the input image and input.raw is the mask.
You will see the output of your execution in result.txt, which resides under
build\ConvolutionDataset.
Command-line Execution
The executable generated as a result of compiling the project (build\Debug\
ConvolutionTemplate.exe) can be run from the command line using the following
command (make sure you are in the build directory):
Debug\ConvolutionTemplate
-e ConvolutionDataset\output.ppm
-i
ConvolutionDataset\input.ppm,ConvolutionDataset\input.raw
-o ConvolutionDataset\myoutput.ppm -t image >
ConvolutionDataset\result.txt
(All on one line. Do not forget to put spaces between the sublines; above you see six
sublines.)
