Question: I would like to ask how to write a function to transpose a 64 x 64 matrix with fewer cache misses.
This is the function I'm about to write:
void transpose_submit(size_t M, size_t N, double A[N][M], double B[M][N], double *tmp){}
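One common starting point for this kind of problem is a blocked (tiled) transpose that dispatches on the matrix size. The sketch below is only that, a sketch: the helper name `transpose_blocked` and the tile edges (4 for 64 x 64, 8 otherwise) are assumptions made for illustration and have not been checked against the reference simulator. The `tmp` buffer is left unused here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the assignment): transpose the
 * N x M matrix A into the M x N matrix B, one tile x tile block at a time. */
static void transpose_blocked(size_t M, size_t N,
                              double A[N][M], double B[M][N], size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < N; ii += tile) {
        for (size_t jj = 0; jj < M; jj += tile) {
            /* Clamp the tile at the matrix edge so odd sizes also work. */
            size_t i_end = (ii + tile < N) ? ii + tile : N;
            size_t j_end = (jj + tile < M) ? jj + tile : M;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    B[j][i] = A[i][j];
        }
    }
}

void transpose_submit(size_t M, size_t N, double A[N][M], double B[M][N], double *tmp)
{
    (void)tmp; /* scratch space, unused in this sketch */

    /* Tile edges are assumed starting points, to be tuned per size. */
    if (M == 64 && N == 64)
        transpose_blocked(M, N, A, B, 4);
    else
        transpose_blocked(M, N, A, B, 8);
}
```

Calling `transpose_submit(64, 64, A, B, tmp)` takes the 64 x 64 branch; any other size falls through to the 8-wide tile. Reading a tile into local variables or into `tmp` before writing it out is a possible refinement, but it is not shown here.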
Performance (26 pts)

For each matrix size, the performance of your transpose_submit function is evaluated by using LLVM-based instrumentation to extract the address trace for your function, and then using the reference simulator to replay this trace on a cache with parameters (s = 5, E = 1, b = 6).

Using the reference cache simulator, each transpose function will be assigned some number of clock cycles m. A cache miss is worth 100 clock cycles, while a cache hit is worth 4. Your performance score for each matrix size will scale linearly with m up to some threshold. The scores are computed as:

32 x 32: 10 points if m < 35,000, 0 points if m > 45,000
64 x 64: 10 points if m < 150,000, 0 points if m > 200,000
63 x 65: 6 points if m < 280,000, 0 points if m > 350,000

For example, a solution for the 32 x 32 matrix with 1764 hits and 284 misses (m = 1764 x 4 + 284 x 100 = 35,456) would receive 9.5 of the possible 10 points.

You can optimize your code specifically for the three cases in the performance evaluation. In particular, it is perfectly OK for your function to explicitly check for the matrix sizes and implement separate code optimized for each case.
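The cost model and cache geometry above can be written down as a few small helpers. This is a sketch under stated assumptions: the function names `cycles`, `score`, and `set_index` are hypothetical, and `score` assumes a plain linear interpolation between the two thresholds, which matches the 9.5-point example but may not be the grader's exact rounding. With s = 5 and b = 6 the cache has 32 sets of 64-byte blocks (2048 bytes in total, one line per set), so each block holds 8 doubles and two addresses 2048 bytes apart land in the same set.

```c
#include <assert.h>
#include <stdint.h>

enum {
    S_BITS = 5,      /* s: 2^5 = 32 sets */
    B_BITS = 6,      /* b: 2^6 = 64-byte blocks */
    HIT_COST = 4,    /* cycles per cache hit */
    MISS_COST = 100  /* cycles per cache miss */
};

/* Clock cycles m for a given hit/miss count. */
unsigned long cycles(unsigned long hits, unsigned long misses)
{
    return hits * HIT_COST + misses * MISS_COST;
}

/* Assumed linear scaling: max_pts at or below `full`, 0 at or above `zero`. */
double score(unsigned long m, unsigned long full, unsigned long zero, double max_pts)
{
    assert(full < zero);
    if (m <= full)
        return max_pts;
    if (m >= zero)
        return 0.0;
    return max_pts * (double)(zero - m) / (double)(zero - full);
}

/* Set index of a byte address: drop the block offset, keep the next s bits. */
unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr >> B_BITS) & ((1u << S_BITS) - 1u));
}
```

For the example in the text, `cycles(1764, 284)` gives 35,456 and `score(35456, 35000, 45000, 10.0)` gives about 9.54. In a 64 x 64 matrix of doubles each row is 512 bytes, so rows four apart are exactly 2048 bytes apart and `set_index` returns the same set for both, which is why large square tiles tend to evict their own rows at this size.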
