Question:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

fractal.cpp code to change:

#include <cstdio>
#include <cstdlib>
#include <sys/time.h>
#include "cs43805351.h"

static const double Delta = 0.004;
static const double xMid = 0.2389;
static const double yMid = 0.55267;
static void fractal(const int width, const int frames, unsigned char* pic)
{
  // compute frames
  double delta = Delta;
  for (int frame = 0; frame < frames; frame++) {
    delta *= 0.98;
    const double xMin = xMid - delta;
    const double yMin = yMid - delta;
    const double dw = 2.0 * delta / width;
    for (int row = 0; row < width; row++) {
      const double cy = yMin + row * dw;
      for (int col = 0; col < width; col++) {
        const double cx = xMin + col * dw;
        double x = cx;
        double y = cy;
        int depth = 256;
        double x2, y2;
        do {
          x2 = x * x;
          y2 = y * y;
          y = 2 * x * y + cy;
          x = x2 - y2 + cx;
          depth--;
        } while ((depth > 0) && ((x2 + y2) < 5.0));
        pic[frame * width * width + row * width + col] = (unsigned char)depth;
      }
    }
  }
}
int main(int argc, char *argv[])
{
  printf("Fractal v1.7\n");

  // check command line
  if (argc != 3) {fprintf(stderr, "usage: %s frame_width num_frames\n", argv[0]); exit(-1);}
  const int width = atoi(argv[1]);
  if (width < 10) {fprintf(stderr, "error: frame_width must be at least 10\n"); exit(-1);}
  const int frames = atoi(argv[2]);
  if (frames < 1) {fprintf(stderr, "error: num_frames must be at least 1\n"); exit(-1);}
  printf("computing %d frames of %d by %d fractal\n", frames, width, width);
  // allocate picture array
  unsigned char* pic = new unsigned char[frames * width * width];
  // start time
  timeval start, end;
  gettimeofday(&start, NULL);
fractal(width, frames, pic);
  // end time
  gettimeofday(&end, NULL);
  const double runtime = end.tv_sec - start.tv_sec + (end.tv_usec - start.tv_usec) / 1000000.0;
  printf("compute time: %.3f s\n", runtime);
  // verify result by writing frames to BMP files
  if ((width <= 256) && (frames <= 100)) {
    for (int frame = 0; frame < frames; frame++) {
      char name[32];
      sprintf(name, "fractal%d.bmp", frame);
      writeBMP(width, width, &pic[frame * width * width], name);
    }
  }
  delete [] pic;
  return 0;
}
Parallelize the fractal computation using CUDA. To obtain enough parallelism, simultaneously parallelize all three for loops in the fractal function. In other words, all three of these loops should be removed, as you will have to launch a thread for each iteration of the loop nest (i.e., for each pixel), meaning that only the loop body remains. Follow these steps when writing the CUDA code:

1. Base your code on the serial code from Project 1 and change it as little as possible.
2. Include the CUDA header file.
3. Use 512 threads per block.
4. Turn the fractal function into a GPU kernel without changing the parameters.
5. Remove the three for loops and replace them with the following code, where idx is the globally unique index computed by each thread:

   const int col = idx % width;
   const int row = (idx / width) % width;
   const int frame = idx / (width * width);

6. Make sure any excess threads that you create do not contribute to the result.
7. Allocate space for the equivalent of the pic array on the GPU but do not initialize it.
8. When calling the kernel, be sure to round up the number of thread blocks, but do not launch more thread blocks than necessary.
9. After the kernel call, but still inside the timed code section, call cudaDeviceSynchronize().
10. Insert a copy of the CheckCuda function from the vector-addition code and call it after the timed code section.
11. Before outputting the BMPs, copy the result from the GPU to the CPU.
12. Free all dynamically allocated memory.
13. Check the code by producing and carefully looking at some BMPs.
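Putting the steps together, a sketch of the changed parts of the code might look like the following. This is not the official solution: the device-pointer name pic_d is an assumption, the CheckCuda body is a typical version of the one handed out with vector-addition examples (verify against your own handout), and the elided kernel body is simply the serial loop body with the three loops removed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>  // step 2: include the CUDA header file

static const int ThreadsPerBlock = 512;  // step 3: 512 threads per block

static __global__ void fractal(const int width, const int frames, unsigned char* pic)  // step 4
{
  // step 5: one thread per pixel; idx is the globally unique thread index
  const int idx = threadIdx.x + blockIdx.x * blockDim.x;
  const int col = idx % width;
  const int row = (idx / width) % width;
  const int frame = idx / (width * width);

  if (frame < frames) {  // step 6: excess threads must not contribute
    // ... serial loop body here, writing pic[frame * width * width + row * width + col] ...
  }
}

// step 10: a typical CheckCuda, as distributed with the vector-addition code
static void CheckCuda()
{
  cudaError_t e;
  cudaDeviceSynchronize();
  if (cudaSuccess != (e = cudaGetLastError())) {
    fprintf(stderr, "CUDA error %d: %s\n", e, cudaGetErrorString(e));
    exit(-1);
  }
}

// inside main, the changed parts would be roughly:
//
//   // step 7: allocate (but do not initialize) the pic array on the GPU
//   unsigned char* pic_d;
//   cudaMalloc((void**)&pic_d, frames * width * width * sizeof(unsigned char));
//
//   // step 8: round up the block count; launch no more blocks than necessary
//   const int pixels = frames * width * width;
//   fractal<<<(pixels + ThreadsPerBlock - 1) / ThreadsPerBlock, ThreadsPerBlock>>>(width, frames, pic_d);
//   cudaDeviceSynchronize();  // step 9: still inside the timed section
//
//   // ... end of timed section, then:
//   CheckCuda();  // step 10
//
//   // step 11: copy the result back before writing the BMP files
//   cudaMemcpy(pic, pic_d, pixels * sizeof(unsigned char), cudaMemcpyDeviceToHost);
//
//   // step 12: free all dynamically allocated memory
//   cudaFree(pic_d);
//   delete [] pic;
```

Note that (pixels + ThreadsPerBlock - 1) / ThreadsPerBlock rounds up without overshooting: for any pixel count it launches exactly ceil(pixels / 512) blocks, so the only excess threads are the ones the step-6 guard discards.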
