Question:


Collatz.cpp to convert to CUDA

#include #include #include

static int collatz(const long range)
{
  // compute sequence lengths
  int maxlen = 0;
  for (long i = 1; i

  return maxlen;
}

int main(int argc, char *argv[])
{
  printf("Collatz v1.0 ");

  // check command line
  if (argc != 2) {fprintf(stderr, "usage: %s range ", argv[0]); exit(-1);}
  const long range = atol(argv[1]);
  if (range

  // start time
  timeval start, end;
  gettimeofday(&start, NULL);

  const int maxlen = collatz(range);

  // end time
  gettimeofday(&end, NULL);
  const double runtime = end.tv_sec - start.tv_sec + (end.tv_usec - start.tv_usec) / 1000000.0;
  printf("compute time: %.3f s ", runtime);

  // print result
  printf("longest sequence: %d elements ", maxlen);

  return 0;
}

Parallelize the for loop that iterates over the range in the Collatz code in such a way that each thread runs one of the iterations. Follow these steps when writing the CUDA code:

1. Base your code on the serial code from Project 1 and change it as little as possible.
2. Include the CUDA header file.
3. Use 512 threads per block.
4. Turn the collatz function into a GPU kernel.
5. Eliminate the for loop entirely, as we will launch one thread for each iteration.
6. Make sure any excess threads that you create do not contribute to the result.
7. Compute the unique global identifier like this: const long idx = threadIdx.x + blockIdx.x * (long)blockDim.x;
8. In lieu of a return value, pass an int* parameter called maxlen to the kernel.
9. Each thread should update maxlen using atomicMax(...), but only if the thread's result is longer than the current maxlen.
10. Allocate space for maxlen before the kernel call, initialize the CPU counterpart of this variable to zero, and copy its value to the GPU. Do all of this before the timed code section.
11. When calling the kernel, be sure to round up the number of thread blocks, but do not launch more thread blocks than necessary.
12. After the kernel call, but still inside the timed code section, call cudaDeviceSynchronize().
13. Insert a copy of the CheckCuda function from the vector-addition code and call it after the timed code section.
14. Before printing the result, copy maxlen from the GPU to the CPU.
15. Free all dynamically allocated memory.
16. Make sure the code produces the same results as the serial CPU code.
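
The index arithmetic in steps 6, 7, 9, and 11 can be checked on the CPU before any CUDA is written. The following is a minimal C++ sketch, not the CUDA solution itself: it emulates the launch geometry with two nested loops standing in for blocks and threads. The names collatzLength, blocksFor, and emulateGrid are hypothetical. Because the loop body and upper bound of the serial code are truncated above, the sketch assumes the usual Collatz rule (halve if even, 3n + 1 if odd), a length that counts both the start value and the final 1, and a serial loop that runs from 1 through range inclusive.

```cpp
// Step 3: fixed block size taken from the assignment.
constexpr int ThreadsPerBlock = 512;

// Hypothetical helper (assumption): number of elements in the Collatz
// sequence starting at val, counting both val and the final 1.
int collatzLength(long val) {
  int len = 1;
  while (val != 1) {
    len++;
    val = (val % 2 == 0) ? val / 2 : 3 * val + 1;
  }
  return len;
}

// Step 11: round the block count up, but never launch an extra block.
long blocksFor(long range) {
  return (range + ThreadsPerBlock - 1) / ThreadsPerBlock;
}

// CPU emulation of the grid: the two loops play the role of
// blockIdx.x and threadIdx.x in the real kernel.
int emulateGrid(long range) {
  int maxlen = 0;
  const long blocks = blocksFor(range);
  for (long blockIdx = 0; blockIdx < blocks; blockIdx++) {
    for (int threadIdx = 0; threadIdx < ThreadsPerBlock; threadIdx++) {
      // Step 7: unique global identifier, computed in 64-bit arithmetic.
      const long idx = threadIdx + blockIdx * (long)ThreadsPerBlock;
      // Assumption: the serial loop starts at 1, so thread idx handles idx + 1.
      const long i = idx + 1;
      // Step 6: excess threads in the last block must not contribute.
      if (i > range) continue;
      const int len = collatzLength(i);
      // Step 9: only update when longer; atomicMax does this on the GPU.
      if (len > maxlen) maxlen = len;
    }
  }
  return maxlen;
}
```

In an actual kernel the two loops disappear, since the hardware supplies threadIdx.x and blockIdx.x, and the final comparison becomes a check followed by atomicMax(maxlen, len) on the int* parameter from step 8.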

