Question:
Grayscale
Host:
Allocate device memory.
Copy host memory (the bitmap pixel data) to device.
Create a width-by-height grid of 1-by-1 blocks. Each block corresponds to an individual pixel, whose index into the pixel array is blockIdx.x + blockIdx.y * gridDim.x. (Remember that access to global memory is only in the form of 1-D arrays.)
Invoke a CUDA kernel, which you will write. Insert this kernel code prior to imgProc().
Copy results from device to host.
Deallocate device memory.
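The host steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the assignment's reference solution: the pixel layout (interleaved 8-bit R, G, B, 3 bytes per pixel, no row padding), the luma weights 0.299/0.587/0.114, and the names grayscaleKernel and grayscaleOnDevice are all assumptions, since the question does not give the bitmap format, the conversion formula, or the signature of imgProc(). If the bitmap stores its channels in B, G, R order, the weights on p[0] and p[2] would need to be swapped.

```cuda
#include <cuda_runtime.h>

// Assumed layout: interleaved 8-bit R,G,B, 3 bytes per pixel, no row padding.
__global__ void grayscaleKernel(unsigned char *pixels)
{
    // One 1-by-1 block per pixel, so the block position is the pixel index.
    int idx = blockIdx.x + blockIdx.y * gridDim.x;
    unsigned char *p = pixels + 3 * idx;

    // Assumed luma weights; the question does not state the formula.
    float gray = 0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2];
    unsigned char g = (unsigned char)(gray + 0.5f);  // round to nearest

    // Write the gray value back to all three channels, in place.
    p[0] = g;
    p[1] = g;
    p[2] = g;
}

// Hypothetical host wrapper (not part of the assignment); returns 0 on success.
int grayscaleOnDevice(unsigned char *hostPixels, int width, int height)
{
    size_t bytes = (size_t)width * height * 3;
    unsigned char *devPixels = NULL;

    // Allocate device memory.
    if (cudaMalloc((void **)&devPixels, bytes) != cudaSuccess) return -1;

    // Copy host memory (the bitmap pixel data) to device.
    cudaError_t err = cudaMemcpy(devPixels, hostPixels, bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Width-by-height grid of 1-by-1 blocks.
        dim3 grid(width, height);
        dim3 block(1, 1);
        grayscaleKernel<<<grid, block>>>(devPixels);
        err = cudaGetLastError();  // catches launch-configuration errors
    }

    // Copy results from device to host (waits for the kernel to finish).
    if (err == cudaSuccess)
        err = cudaMemcpy(hostPixels, devPixels, bytes, cudaMemcpyDeviceToHost);

    // Deallocate device memory.
    cudaFree(devPixels);
    return err == cudaSuccess ? 0 : -1;
}
```

Because the grid's y dimension is limited to 65535 blocks, this launch shape only works for images up to that height.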
(1) How many floating-point operations are being performed in your color conversion kernel? EXPLAIN.
(2) How many global memory reads are being performed by your kernel? EXPLAIN.
(3) How many global memory writes are being performed by your kernel? EXPLAIN.
(4) Describe what possible optimizations can be applied to your kernel to achieve a performance speedup.
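As one illustration of questions (1) through (4), here is a hedged sketch of a kernel with its per-pixel operation counts marked in comments, using the most common optimization for this problem: replacing 1-by-1 blocks with multi-thread blocks. A 1-by-1 block occupies a whole 32-thread warp while using only one of its lanes, so most of the hardware sits idle. The layout (interleaved 8-bit R, G, B, no row padding), the luma weights, the 16-by-16 block size, and the names grayscaleTiled and grayscaleTiledOnDevice are assumptions for illustration only. The counts apply to this particular sketch; a kernel that uses a different formula or pixel format will have different numbers.

```cuda
#include <cuda_runtime.h>

#define TILE 16  // assumed block edge: 16x16 = 256 threads, a multiple of 32

// Assumed layout: interleaved 8-bit R,G,B, 3 bytes per pixel, no row padding.
__global__ void grayscaleTiled(unsigned char *pixels, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // the grid may overhang the image

    unsigned char *p = pixels + 3 * (y * width + x);

    // Global memory reads per pixel: 3 (one byte per channel).
    float r = p[0];
    float g = p[1];
    float b = p[2];

    // Floating-point operations per pixel: 3 multiplies + 3 adds = 6
    // (5 without the rounding offset; the compiler may fuse some into FMAs).
    unsigned char gray =
        (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);

    // Global memory writes per pixel: 3 (gray stored to every channel).
    p[0] = gray;
    p[1] = gray;
    p[2] = gray;
}

// Hypothetical host wrapper (not part of the assignment); returns 0 on success.
int grayscaleTiledOnDevice(unsigned char *hostPixels, int width, int height)
{
    size_t bytes = (size_t)width * height * 3;
    unsigned char *devPixels = NULL;
    if (cudaMalloc((void **)&devPixels, bytes) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpy(devPixels, hostPixels, bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Round the grid up so that partial tiles at the edges are covered.
        dim3 block(TILE, TILE);
        dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
        grayscaleTiled<<<grid, block>>>(devPixels, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostPixels, devPixels, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devPixels);
    return err == cudaSuccess ? 0 : -1;
}
```

Further options worth discussing, depending on the actual data format, include writing a single-channel output to cut the writes from three to one per pixel, padding pixels to 4 bytes so that each thread issues one aligned load and store, and using pinned host memory to speed up the host-device copies.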
