Question: Write an OpenMP program that optimizes matrix multiplication (matmul) code to run fast on a single processor core.
We consider a special case of matmul:
C := C + A*B
where A, B, and C are n x n matrices. This can be performed using 2n^3 floating-point operations (n^3 adds, n^3 multiplies), as in the following pseudocode:
for i = 1 to n
  for j = 1 to n
    for k = 1 to n
      C(i,j) = C(i,j) + A(i,k) * B(k,j)
    end
  end
end
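One possible starting point in C is sketched here. It is not a definitive solution and has not been tuned. It assumes row-major storage and 0-based indexing, neither of which the question specifies. The function names matmul_naive and matmul_blocked and the tile-size parameter bs are hypothetical, chosen only for this illustration. The first function is a direct translation of the pseudocode. The second applies two common single-core techniques: cache blocking (tiling) and an i-k-j loop order inside each tile, so that the innermost loop walks B and C with unit stride and can be vectorized with the OpenMP simd directive.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: direct translation of the i-j-k pseudocode.
   Assumption: matrices are stored row-major, element (i,j) at [i*n + j]. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double cij = C[i * n + j];
            for (size_t k = 0; k < n; ++k)
                cij += A[i * n + k] * B[k * n + j];
            C[i * n + j] = cij;
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked version: computes C := C + A*B one bs x bs tile at a time.
   bs is a hypothetical tuning parameter; a good value depends on cache size.
   restrict tells the compiler that A, B, and C do not overlap. */
void matmul_blocked(size_t n, size_t bs,
                    const double *restrict A,
                    const double *restrict B,
                    double *restrict C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        const size_t i_end = min_sz(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            const size_t k_end = min_sz(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                const size_t j_end = min_sz(jj + bs, n);
                /* i-k-j order inside the tile: the j loop is unit stride. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        const double aik = A[i * n + k];
                        #pragma omp simd
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

With GCC, something like gcc -O3 -march=native -fopenmp-simd enables the simd directive without starting any threads, which matches the single-core setting. Because the blocked version adds the products in a different order, its results can differ from the naive version in the last few bits for general floating-point inputs. The best value of bs has to be found by measurement on the target machine.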
