Question: ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Collatz.cpp to convert to cuda #include #include #include static int collatz(const long range) { // compute sequence lengths int maxlen = 0; for (long
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Collatz.cpp to convert to cuda
#include
static int collatz(const long range) { // compute sequence lengths int maxlen = 0; for (long i = 1; i
return maxlen; }
int main(int argc, char *argv[]) { printf("Collatz v1.0 ");
// check command line if (argc != 2) {fprintf(stderr, "usage: %s range ", argv[0]); exit(-1);} const long range = atol(argv[1]); if (range
// start time timeval start, end; gettimeofday(&start, NULL);
const int maxlen = collatz(range);
// end time gettimeofday(&end, NULL); const double runtime = end.tv_sec - start.tv_sec + (end.tv_usec - start.tv_usec) / 1000000.0; printf("compute time: %.3f s ", runtime);
// print result printf("longest sequence: %d elements ", maxlen);
return 0; }
Parallelize the for loop that iterates over the range in the Collatz code in such a way that each thread runs one of the iterations. Follow these steps when writing the CUDA code 1. 2. 3. 4. Base your code on the serial code from Project 1 and change it as little as possible Include the CUDA header file Use 512 threads per block. Turn the collatz function into a GPU kernel. 5. Eliminate the for loop entirely as we will launch one thread for each iteration. 6. Make sure any excess threads that you create do not contribute to the result. 7. Compute the unique global identifier like this const long idx -threadldx.x + blockldx.x * (long)blockDim.x 8. In lieu of a return value, pass an int* parameter called maxlen to the kernel. 9. Each thread should update maxlen using atomicMax(...) but only if the thread's result is longer than the current maxlen 10. Allocate space for maxlen before the kernel call, initialize the CPU counterpart of this var- iable to zero, and copy its value to the GPU. Do all of this before the timed code section. 11. When calling the kernel, be sure to round up the number of thread blocks but do not launch more thread blocks than necessary 12. After the kernel call, but still inside the timed code section, call cudaDeviceSynchronizel 13. Insert a copy of the CheckCuda function from the vector-addition code and call it after the timed code section. 14. Before printing the result, copy maxlen from the GPU to the CPU 15. Free all dynamically allocated memory 16. Make sure the code produces the same results as the serial CPU code Parallelize the for loop that iterates over the range in the Collatz code in such a way that each thread runs one of the iterations. Follow these steps when writing the CUDA code 1. 2. 3. 4. Base your code on the serial code from Project 1 and change it as little as possible Include the CUDA header file Use 512 threads per block. Turn the collatz function into a GPU kernel. 5. Eliminate the for loop entirely as we will launch one thread for each iteration. 6. Make sure any excess threads that you create do not contribute to the result. 7. Compute the unique global identifier like this const long idx -threadldx.x + blockldx.x * (long)blockDim.x 8. In lieu of a return value, pass an int* parameter called maxlen to the kernel. 9. Each thread should update maxlen using atomicMax(...) but only if the thread's result is longer than the current maxlen 10. Allocate space for maxlen before the kernel call, initialize the CPU counterpart of this var- iable to zero, and copy its value to the GPU. Do all of this before the timed code section. 11. When calling the kernel, be sure to round up the number of thread blocks but do not launch more thread blocks than necessary 12. After the kernel call, but still inside the timed code section, call cudaDeviceSynchronizel 13. Insert a copy of the CheckCuda function from the vector-addition code and call it after the timed code section. 14. Before printing the result, copy maxlen from the GPU to the CPU 15. Free all dynamically allocated memory 16. Make sure the code produces the same results as the serial CPU code
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
