Question:


1. Introduction

This assignment is an introduction to parallel programming, i.e., software designed to execute on multiple processors of a computer, using OpenMP. The assignment involves writing two versions of a parallel C program that performs matrix multiplication, i.e., computes C = A*B, where A, B, and C are matrices and * is matrix multiplication. You may assume the matrices are all square, with N rows and N columns.

Parallelization can be accomplished by observing that C[i][j] is simply the dot product of row i of A and column j of B. The N*N dot products could all be performed in parallel by mapping each to a different thread. Here, you will instead use the computation of a single row of the result matrix (N dot products) as the unit of work distributed among the threads, resulting in a maximum of N-fold parallelism.

You will develop two different versions of the matrix multiplication code and compare the performance achieved by each with a sequential execution, i.e., execution using a single thread. The first version uses a static mapping approach, while the second uses a dynamic "worker" approach. Your code must be designed so that the number of threads can be easily changed (e.g., using #define or a command-line argument) without making significant changes to the code.

2. Static Mapping

Assume there are K threads performing the computation. In this approach, each thread is statically assigned a set of rows of the result matrix C for which it is responsible for computing results. Specifically, thread 0 is responsible for computing rows 0, K, 2K, ... of C. Thread 1 computes rows 1, K+1, 2K+1, ... In other words, the rows of the matrix are assigned in round-robin fashion to the different threads.

3. Dynamic Mapping

In the dynamic approach, a pool of "worker threads" is used. This means you create K threads to perform the computation. Each worker thread repeatedly accesses a global data structure to allocate a single row of the result matrix to compute.
It then performs this computation, and then goes back to the global data structure to allocate another piece of work to do. This process continues until the entire matrix computation is complete. Each row of the result matrix must be assigned to exactly one thread. In other words, each worker thread executes the following loop:

    while (rows of the result matrix remain to be computed) {
        allocate a row of the result matrix to compute
        compute results for this row of the result matrix
    }
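As a hedged starting point for the sequential baseline, the sketch below computes the result one row at a time, since a row (N dot products) is the unit of work the assignment distributes among threads. It assumes the matrices are stored as flat row-major arrays of n*n doubles and computes C = A*B, with C[i][j] formed as the dot product of row i of A and column j of B, matching the element description above. The names compute_row and matmul_seq are hypothetical, not part of the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes row i of C = A * B, i.e. the n dot
   products C[i][j] = sum over p of A[i][p] * B[p][j].
   Matrices are flat row-major arrays: element (r, c) is at r * n + c. */
static void compute_row(int n, const double *A, const double *B,
                        double *C, int i)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int p = 0; p < n; p++)
            sum += A[i * n + p] * B[p * n + j];
        C[i * n + j] = sum;
    }
}

/* Sequential baseline: a single thread computes every row in order. */
void matmul_seq(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0 && A != NULL && B != NULL && C != NULL);
    for (int i = 0; i < n; i++)
        compute_row(n, A, B, C, i);
}
```

For the performance comparison, one option is to time each version with omp_get_wtime() from omp.h when building with -fopenmp; the sequential run then serves both as the timing reference and as a correctness check for the parallel results.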
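A minimal sketch of the static round-robin mapping, under the same assumptions (flat row-major storage, hypothetical function names). Each thread starts at its own thread number and strides by the team size, so thread 0 gets rows 0, K, 2K, ..., thread 1 gets rows 1, K+1, 2K+1, and so on. The thread count k is an ordinary parameter, so it can come from a #define or a command-line argument without touching the function.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the file still compiles and runs on one thread
   when built without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: computes row i of C = A * B
   (flat row-major n*n arrays). */
static void compute_row(int n, const double *A, const double *B,
                        double *C, int i)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int p = 0; p < n; p++)
            sum += A[i * n + p] * B[p * n + j];
        C[i * n + j] = sum;
    }
}

/* Static round-robin mapping with (up to) k threads:
   thread t computes rows t, t + K, t + 2K, ... */
void matmul_static(int n, int k, const double *A, const double *B,
                   double *C)
{
    assert(k > 0);
    #pragma omp parallel num_threads(k)
    {
        int tid = omp_get_thread_num();
        /* K is the team size actually granted, which can be less than k. */
        int nthreads = omp_get_num_threads();
        for (int i = tid; i < n; i += nthreads)
            compute_row(n, A, B, C, i);   /* threads write disjoint rows */
    }   /* implicit barrier: every row is finished here */
}
```

Build with -fopenmp (GCC or Clang). The stride uses omp_get_num_threads() rather than k because the runtime may grant fewer threads than requested. OpenMP's schedule(static, 1) clause on a parallel for loop assigns iterations in the same round-robin order, but the explicit loop makes the mapping visible.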
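One way to sketch the dynamic worker pool, assuming the "global data structure" is simply a shared counter holding the index of the next unallocated row (the assignment does not specify the structure, so this is an assumption). Each worker claims a row with an OpenMP atomic capture, so every row index is handed out exactly once, and a worker exits once the counter passes the last row. Storage layout and function names are the same hypothetical ones as above.

```c
#include <assert.h>

/* Hypothetical helper: computes row i of C = A * B
   (flat row-major n*n arrays). */
static void compute_row(int n, const double *A, const double *B,
                        double *C, int i)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int p = 0; p < n; p++)
            sum += A[i * n + p] * B[p * n + j];
        C[i * n + j] = sum;
    }
}

/* Dynamic "worker pool": k threads repeatedly claim the next
   unallocated row from a shared counter until none remain. */
void matmul_dynamic(int n, int k, const double *A, const double *B,
                    double *C)
{
    assert(k > 0);
    int next_row = 0;   /* shared work counter: next row to hand out */

    #pragma omp parallel num_threads(k) shared(next_row)
    {
        for (;;) {
            int row;
            /* Read and increment the counter as one atomic step, so
               each row index is given to exactly one thread. */
            #pragma omp atomic capture
            row = next_row++;

            if (row >= n)
                break;          /* every row has been allocated */
            compute_row(n, A, B, C, row);
        }
    }
}
```

An omp critical section around the read-and-increment would also work, with heavier synchronization. OpenMP's schedule(dynamic, 1) clause gives similar behavior automatically, but writing the loop out shows the worker structure the assignment describes.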
