Question: Parallel computing

Problem 1 - Chunk Matrix Multiply - Consider the Chunk Matrix multiplication depicted in the diagram below, where example chunks distributed are shown with shading in the diagram (unshaded diagrams are similarly distributed).

Problem 1.1 - Write expressions, using values from the A and B matrices (i.e. A(1,i) and B(1,j)), to show the expressions that would be computed and the results for the two positions indicated in the C matrix, labeling your answers Box1 and Box2. Be sure to show in detail each A(i,j) and B(i,j) in your expressions.

Answer:
Box1: A(1,0)B(0,2) + A(1,1)B(1,2) + A(1,2)B(2,2) + A(1,3)B(3,2) = 670
Box2: A(3,0)B(0,1) + A(3,1)B(1,1) + A(3,2)B(2,1) + A(3,3)B(3,1) = 1412

Problem 1.2 - Briefly describe advantages and disadvantages of Matrix Chunk Distribution, compared to a distribution by row and column, and when you would use a Chunk approach instead of Row and Column.

Answer: Matrix Chunk Distribution is used to reduce the cost of matrix calculations and column-row expansions, and it appears throughout computer science applications, including VLSI chip design. (For comparison, the Strassen algorithm is used for faster multiplication, and Hamming encoding is used for error detection in data transmission.) Partitioning brings useful advantages through its greater flexibility. It can also be viewed as a data-management tool that organizes the transfer of information between main memory and auxiliary devices. In sparse matrix technology, partitioning plays an important role because many algorithms designed for matrices of numbers generalize to operate on matrices of matrices; most algorithms are blockable, and a major software effort has produced blocked algorithms for linear algebra. For example, codes for solving differential equations by the difference and variational methods have been rewritten as block methods to make better use of the small main memory and the large SSD on Cray supercomputers. Such examples show that blocking is advantageous for squeezing the best possible performance out of advanced architectures. The disadvantage is that a blocked code is much more difficult to write and to understand than a point code: writing it is a large, error-prone job, and it introduces block-size parameters that must be tuned for each computer and each algorithm even though they have nothing to do with the problem being solved. Unfortunately, unblocked code performs poorly on even the most powerful computers available.

Problem 1.3 - How would the sizes of the matrices involved affect whether you would use a Chunk Multiply approach or Row and Column?

Answer: The time complexity for row and column is O(N^3). With chunk multiply, each individual chunk multiplication is only O(M^3), where M < N is the chunk dimension, so the larger the matrices, the more attractive the chunk approach becomes.
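To make the chunk computation concrete, here is a minimal sketch of a blocked (chunk) matrix multiply in C. It is not the exam's figure: the size N, the chunk size M, and the example matrix values are assumptions for illustration. Each C chunk is accumulated from M x M chunk products, so every entry ends up as the same dot-product sum written out for Box1 and Box2 above.

    #include <stdio.h>

    #define N 4   /* full matrix dimension (assumed) */
    #define M 2   /* chunk dimension, M < N (assumed) */

    /* Blocked (chunk) matrix multiply: C = A*B.
     * The three outer loops walk over M x M chunks; the three
     * inner loops do an ordinary row-by-column multiply inside
     * each chunk. */
    static void chunk_matmul(double A[N][N], double B[N][N], double C[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                C[i][j] = 0.0;
        for (int ib = 0; ib < N; ib += M)
            for (int jb = 0; jb < N; jb += M)
                for (int kb = 0; kb < N; kb += M)
                    for (int i = ib; i < ib + M; i++)
                        for (int j = jb; j < jb + M; j++)
                            for (int k = kb; k < kb + M; k++)
                                C[i][j] += A[i][k] * B[k][j];
    }

    int main(void) {
        double A[N][N], B[N][N], C[N][N];
        /* Example values (assumed, not the exam's matrices). */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = i + j;
                B[i][j] = i * j + 1;
            }
        chunk_matmul(A, B, C);
        /* C[1][2] is A(1,0)B(0,2)+A(1,1)B(1,2)+A(1,2)B(2,2)+A(1,3)B(3,2),
         * the Box1 pattern above. */
        printf("C(1,2) = %g\n", C[1][2]);
        return 0;
    }

Note that the total work is unchanged: (N/M)^3 chunk multiplications of O(M^3) each is still O(N^3). The gain is that M can be chosen so that one chunk each of A, B, and C fits in cache or in a single process's local memory, which is exactly the trade-off Problem 1.2 describes.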
Problem 3.1 - Use gprof for performance analysis of the implementation shown in the figure. Briefly describe how the gprof output helps in your performance analysis.

Answer: Compile with profiling enabled, run the program to produce gmon.out, and then generate the profiles:

    % gprof ./exeFile gmon.out > profile.txt
    % gprof -l ./exeFile gmon.out > profile_line.txt
    % gprof -A ./exeFile gmon.out > profile_annotated.txt

In the flat profile we can identify the most expensive parts of the code (in this case, the calls to matsqrt, matcube, and syscube). Below are the steps taken to profile:
- Generate the call-graph profile.
- View the visual call graph (syssqrt, matsqrt, vecsqrt, etc.).

Steps followed to run gprof:

    [root@thakkah] gprof -l ./trap gmon.out > gprof.out
    [root@thakkah] more gprof.out
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ts/call  Ts/call  name
    94.39     18.06     18.06                             main (matmul_par.c:68 @ 400aea)
     5.72     19.15      1.09                             main (matmul_par.c:66 @ 400b22)
     0.05     19.16      0.01                             main (matmul_par.c:50 @ 4009f8)
     0.05     19.17      0.01                             main (matmul_par.c:54 @ 400a57)
     0.05     19.18      0.01                             main (matmul_par.c:70 @ 400b30)
     0.05     19.19      0.01                             main (matmul_par.c:85 @ 400c39)

Problem 3.2 - Use MPE for performance analysis for the implementation shown in the figure. Briefly describe how the MPE output helps in your performance analysis. Note: You are expected to use gprof with the given code in this problem and show the outputs.

Answer: When using MPE, instead of the long command line you can use a wrapper script called mpecc/mpef77, e.g.:

    mpecc -mpilog -o trap.mpe trap.c

- Run the application as usual: mpirun -np 2 ./trap.mpe
- After the run, there is a file called trap.mpe.clog2 in the same directory.
- The viewer (jumpshot) unfortunately needs a different file format, so you have to convert it: clog2TOslog2 trap.mpe.clog2
- This generates a file called trap.mpe.slog2.
- You can load an slog2 file into jumpshot.

In jumpshot, the per-rank timeline of MPI events shows where each process spends its time computing, communicating, or waiting, which exposes load imbalance and communication overhead.

Problem 3.3 - Describe any significant performance problems in the implementation given in the figure and how you would expect any problems that you find to affect performance.

Answer: The code computes each process's local integral serially, and the integrals calculated by the individual processes must then be combined. The combination step can easily be parallelized in MPI using the MPI_Reduce function, so the accumulation does not become a serial bottleneck as the number of processes grows.

Problem 3.4 - Implement a fix for any problems that you find in the given implementation. You can make changes to the code below or just show any changes you would make in your exam solution file, being careful to show exactly what code you would modify, replace, or add.

    /* Length of each process' interval of integration = local_n*h. */
    local_a = a + my_rank*local_n*h;
    local_b = local_a + local_n*h;

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    /* Calculate each process' local integral using local endpoints. */
    local_int = Trap(local_a, local_b, local_n, h);
    finish = MPI_Wtime();
    loc_elapsed = finish - start;
    MPI_Reduce(&loc_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    /* Add up the integrals calculated by each process. */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    #include "mpi.h"

    float f(float x) {
        float return_val;
        /* Calculate f(x); store the result in return_val. */
        return_val = x*x;
        return return_val;
    }  /* f */
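For completeness, a minimal self-contained version of the trapezoidal-rule program with the MPI_Reduce accumulation in place might look like the sketch below. The interval endpoints, the trapezoid count n, the body of Trap, and the use of double throughout (matching the MPI_DOUBLE reductions in the figure) are assumptions for illustration; the exam's full source was not shown.

    /* Sketch of the parallel trapezoidal rule (assumed details marked). */
    #include <stdio.h>
    #include "mpi.h"

    double f(double x) { return x * x; }   /* assumed integrand */

    /* Serial trapezoidal rule on [local_a, local_b] with local_n trapezoids. */
    double Trap(double local_a, double local_b, int local_n, double h) {
        double integral = (f(local_a) + f(local_b)) / 2.0;
        for (int i = 1; i < local_n; i++)
            integral += f(local_a + i * h);
        return integral * h;
    }

    int main(int argc, char *argv[]) {
        int my_rank, comm_sz, local_n;
        double a = 0.0, b = 1.0;   /* assumed interval */
        int n = 1024;              /* assumed trapezoid count */
        double h, local_a, local_b, local_int, total_int;
        double start, finish, loc_elapsed, elapsed;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

        h = (b - a) / n;          /* width of each trapezoid */
        local_n = n / comm_sz;    /* trapezoids per process */

        /* Length of each process' interval of integration = local_n*h. */
        local_a = a + my_rank * local_n * h;
        local_b = local_a + local_n * h;

        MPI_Barrier(MPI_COMM_WORLD);
        start = MPI_Wtime();
        local_int = Trap(local_a, local_b, local_n, h);
        finish = MPI_Wtime();
        loc_elapsed = finish - start;
        MPI_Reduce(&loc_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);

        /* Add up the integrals calculated by each process. */
        MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (my_rank == 0)
            printf("Integral = %f (max elapsed = %.6f s)\n", total_int, elapsed);

        MPI_Finalize();
        return 0;
    }

Compile and run with, e.g., mpicc -o trap trap.c and mpirun -np 2 ./trap. Reducing loc_elapsed with MPI_MAX reports the slowest process's time, which is the quantity that determines overall speedup.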
