Question: For this assignment youre going to use OpenMP to implement a parallel prefix sum that works with an arbitrary sized array and the default number

For this assignment youre going to use OpenMP to implement a parallel prefix sum that works with an arbitrary sized array and the default number of OpenMP threads.

Use the flat version of the algorithm presented in class that involves the following steps:

1. local prefix sum on each partition

2. a serial prefix sum of the maximum values of each partition

3. a second iteration over each partition, adding the maximum from the partition to the left.

Remember, this code evenly distributes size indices across nt threads:

int nt = omp_get_num_threads ();

int rank = omp_get_thread_num ();

int count = size / nt;

int start = rank * count ;

int mod = size % nt;

if ( rank < mod )

{ count ++; start += rank ; }

else { start += mod; }

Use omp_get_wtime() to calculate the wall clock times for various runs. How close to linear speedup do you get? You will be given accounts on the CSE Qbert cluster where you can test on 1-16 cores.

Your program should be compiled like this:

g++ -fopenmp -O3 -Wall source.cpp -o executable

Your program can be run with different numbers of threads like this:

OMP_NUM_THREADS=16 ./executable

This algorithm requires a large array and relatively many threads to see a performance improvement.

Heres a place to start:

# include

# include

# include

# include

// replace the contents of data with the prefix sum of the contents

void prefix_sum (std :: vector & data )

{

# pragma omp parallel default ( none ) shared ( data )

{

int rank = omp_get_thread_num ();

int nt = omp_get_num_threads ();

// Your code goes here .

}

}

int main (int argc , char * argv [])

{

// test with a vector of 1s

std :: vector data (10000 , 1);

double start = omp_get_wtime ();

prefix_sum ( data );

double end = omp_get_wtime ();

printf (" took %lf to prefix sum %d elements on %d ", (end - start ), (int ) data . size (), omp_get_max_threads ());

}

Hints

Use #pragma omp barrier to synchronize threads within a parallel section.

Iterate over the prefix sum results to see if it works.

Comment out extraneous print statements in the final code.

Turn in ..

The source code file.

A screenshot of your code being compiled and run.

A plot of the wall clock time as a function of number of threads, for a large enough array that the values are meaningful.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!