Question: Improve your implementation by using cache blocking and register blocking at the same time. Optimize your block sizes to achieve the best performance for n = 1024. Your test matrices must contain random 64-bit double-precision floating-point numbers. Always verify the correctness of your code.
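The question does not show the implementation it refers to, so the following is only a minimal sketch under stated assumptions: the operation is taken to be square matrix multiplication C = A * B on row-major arrays, the register block is a fixed 2x2 tile, and n is a multiple of the cache block size bs. The function names (matmul_naive, matmul_blocked, run_check) are hypothetical and chosen for illustration. The sketch combines both techniques and checks the blocked result against a plain triple loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference triple loop, row-major: C = A * B. Used only for verification. */
void matmul_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Cache blocking (bs x bs tiles) combined with 2x2 register blocking.
   Assumptions: n is a multiple of bs, bs is even, and C is zeroed on entry. */
void matmul_blocked(int n, int bs, const double *A, const double *B, double *C) {
    assert(bs > 0 && bs % 2 == 0 && n % bs == 0);
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs)
                /* Inside one cache tile, update a 2x2 patch of C at a time. */
                for (int i = ii; i < ii + bs; i += 2)
                    for (int j = jj; j < jj + bs; j += 2) {
                        /* Four accumulators the compiler can keep in registers. */
                        double c00 = C[i * n + j];
                        double c01 = C[i * n + j + 1];
                        double c10 = C[(i + 1) * n + j];
                        double c11 = C[(i + 1) * n + j + 1];
                        for (int k = kk; k < kk + bs; k++) {
                            double a0 = A[i * n + k];
                            double a1 = A[(i + 1) * n + k];
                            double b0 = B[k * n + j];
                            double b1 = B[k * n + j + 1];
                            c00 += a0 * b0;
                            c01 += a0 * b1;
                            c10 += a1 * b0;
                            c11 += a1 * b1;
                        }
                        C[i * n + j] = c00;
                        C[i * n + j + 1] = c01;
                        C[(i + 1) * n + j] = c10;
                        C[(i + 1) * n + j + 1] = c11;
                    }
}

/* Fills A and B with random doubles in [0, 1], multiplies both ways, and
   returns the largest absolute difference between the two results.
   Returns -1.0 if memory allocation fails. */
double run_check(int n, int bs) {
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof(double));
    double *B = malloc(count * sizeof(double));
    double *C_ref = malloc(count * sizeof(double));
    double *C_blk = calloc(count, sizeof(double)); /* zeroed accumulator */
    double max_diff = -1.0;
    if (A && B && C_ref && C_blk) {
        for (size_t idx = 0; idx < count; idx++) {
            A[idx] = (double)rand() / RAND_MAX;
            B[idx] = (double)rand() / RAND_MAX;
        }
        matmul_naive(n, A, B, C_ref);
        matmul_blocked(n, bs, A, B, C_blk);
        max_diff = 0.0;
        for (size_t idx = 0; idx < count; idx++) {
            double d = C_ref[idx] - C_blk[idx];
            if (d < 0.0) d = -d;
            if (d > max_diff) max_diff = d;
        }
    }
    free(A); free(B); free(C_ref); free(C_blk);
    return max_diff;
}
```

To tune for n = 1024, one option is to call matmul_blocked with several even block sizes that divide 1024 (for example 16, 32, 64, 128), time each run, and keep the fastest. The best value depends on the cache sizes of the machine, so it has to be measured rather than assumed. Because the blocked version sums in a different order than the reference, the results should be compared with a small tolerance and not for exact equality.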
