Question: Optimize matrix multiplication (matmul) code to run fast on a single processor core of XSEDE's Bridges cluster. We consider a special case of matmul: C := C + A*B, where A, B, and C are n x n matrices. This can be performed using 2n^3 floating-point operations (n^3 additions and n^3 multiplications), as in the following pseudocode:
    for i = 1 to n
      for j = 1 to n
        for k = 1 to n
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        end
      end
    end
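As a starting point, here is a direct C translation of the pseudocode. It is a sketch under assumptions the question does not state: the matrices are stored in column-major order (element (i,j) at index i + j*n, as in Fortran and BLAS), indices are 0-based, and the function name `square_dgemm_naive` is hypothetical. It performs the same 2n^3 operations and is useful as a correctness and timing baseline for any optimized version.

```c
#include <assert.h>

/* Hypothetical name. Computes C := C + A*B for n x n matrices.
   Assumed layout: column-major, so element (i,j) is at index i + j*n (0-based). */
void square_dgemm_naive(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            /* Accumulate C(i,j) in a local variable; same additions, same order. */
            double cij = C[i + j * n];
            for (int k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}
```

For example, with A = [1 2; 3 4], B = [5 6; 7 8], and C initialized to all ones, the call leaves C = [20 23; 44 51].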
The task is to optimize this code in C.
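One common way to speed this up on a single core, sketched here rather than presented as the expected solution, combines two standard techniques: reordering the loops to j-k-i so the innermost loop walks contiguous memory in column-major storage, and cache blocking (tiling), so that tiles of A, B, and C are reused while they are still in cache. The column-major layout, the `square_dgemm_blocked` name, and `BLOCK_SIZE = 32` are assumptions; the tile size in particular is an untuned guess that should be chosen by measuring on the target core.

```c
#include <assert.h>

#ifndef BLOCK_SIZE
#define BLOCK_SIZE 32 /* hypothetical, untuned tile edge; tune on the target core */
#endif

static int min_int(int a, int b) { return a < b ? a : b; }

/* Multiply one tile: C[0:M, 0:N] += A[0:M, 0:K] * B[0:K, 0:N], where each
   pointer already points at the tile's top-left element and the leading
   dimension of every matrix is n (column-major). Loop order j-k-i keeps the
   innermost loop on contiguous columns of A and C. restrict assumes A, B,
   and C do not overlap. */
static void do_block(int n, int M, int N, int K,
                     const double *restrict A, const double *restrict B,
                     double *restrict C)
{
    for (int j = 0; j < N; ++j)
        for (int k = 0; k < K; ++k) {
            const double bkj = B[k + j * n]; /* reused for the whole column */
            for (int i = 0; i < M; ++i)
                C[i + j * n] += A[i + k * n] * bkj;
        }
}

/* Hypothetical name. Computes C := C + A*B for n x n column-major matrices
   by iterating over BLOCK_SIZE x BLOCK_SIZE tiles; edge tiles are smaller
   when n is not a multiple of BLOCK_SIZE. */
void square_dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0);
    for (int j0 = 0; j0 < n; j0 += BLOCK_SIZE)
        for (int k0 = 0; k0 < n; k0 += BLOCK_SIZE)
            for (int i0 = 0; i0 < n; i0 += BLOCK_SIZE) {
                int M = min_int(BLOCK_SIZE, n - i0);
                int N = min_int(BLOCK_SIZE, n - j0);
                int K = min_int(BLOCK_SIZE, n - k0);
                do_block(n, M, N, K,
                         A + i0 + k0 * n,  /* tile A(i0, k0) */
                         B + k0 + j0 * n,  /* tile B(k0, j0) */
                         C + i0 + j0 * n); /* tile C(i0, j0) */
            }
}
```

To check a change, compare its output against a straightforward triple loop on sizes that are not multiples of `BLOCK_SIZE`, and report speed as 2n^3 / (seconds x 10^9) GFLOP/s. Further gains typically come from explicit SIMD vectorization, register blocking of the inner kernel, and compiler flags such as `-O3 -march=native`; those are not shown here.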
