Question:

The transpose of a matrix interchanges its rows and columns; this is illustrated below:
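\[
\left[\begin{matrix}
A_{11} & A_{12} & A_{13} & A_{14} \\
A_{21} & A_{22} & A_{23} & A_{24} \\
A_{31} & A_{32} & A_{33} & A_{34} \\
A_{41} & A_{42} & A_{43} & A_{44}
\end{matrix}\right]
\Longrightarrow
\left[\begin{matrix}
A_{11} & A_{21} & A_{31} & A_{41} \\
A_{12} & A_{22} & A_{32} & A_{42} \\
A_{13} & A_{23} & A_{33} & A_{43} \\
A_{14} & A_{24} & A_{34} & A_{44}
\end{matrix}\right]
\]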
Here is a simple C loop to show the transpose:
for (i = 0; i < 3; i++) {
    for (j = 0; j < 3; j++) {
        output[j][i] = input[i][j];  /* element (i, j) moves to (j, i) */
    }
}
Assume that both the input and output matrices are stored in the row major order (row major order means that the row index changes fastest).
Assume that you are executing a 256 × 256 double-precision transpose on a processor with a 16 KB fully associative L1 data cache that uses least recently used (LRU) replacement and 64-byte blocks. Because the cache is fully associative, it has a single set, so cache conflicts can be ignored.
Assume that L1 cache misses and prefetches each require 16 cycles and always hit in the L2 cache, and that the L2 cache can process a request every two processor cycles.
Assume that each iteration of the inner loop above requires four cycles if the data are present in the L1 cache.
Assume that the cache has a write-allocate fetch-on-write policy for write misses.
Unrealistically, assume that writing back dirty cache blocks requires 0 cycles.
For the simple implementation given above, the execution order is nonideal for the input matrix; applying a loop interchange optimization would instead create a nonideal order for the output matrix. Because loop interchange alone is not sufficient to improve performance, the transpose must be blocked instead.
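To make "blocked" concrete, here is a minimal sketch of a tiled transpose. The names `transpose_blocked`, `N`, and `TILE` are illustrative assumptions and do not come from the original problem. The tile edge of 8 is also an assumption, chosen because a 64-byte block holds eight 8-byte doubles.

```c
#include <assert.h>
#include <stddef.h>

#define N 256   /* matrix dimension from the problem statement */
#define TILE 8  /* assumed tile edge: 64-byte block / 8-byte double */

/* Transpose an N x N matrix one TILE x TILE sub-block at a time, so the
   cache blocks touched in both matrices are fully used while resident. */
void transpose_blocked(double input[N][N], double output[N][N])
{
    assert(N % TILE == 0);  /* this sketch assumes tiles divide N evenly */

    for (size_t ii = 0; ii < N; ii += TILE) {          /* tile row origin */
        for (size_t jj = 0; jj < N; jj += TILE) {      /* tile column origin */
            for (size_t i = ii; i < ii + TILE; i++) {
                for (size_t j = jj; j < jj + TILE; j++) {
                    output[j][i] = input[i][j];
                }
            }
        }
    }
}
```

Within one tile, the inner two loops touch only `TILE` rows of `input` and `TILE` rows of `output`, so whichever matrix is traversed with a large stride still reuses every element of each cache block it fetches before moving on.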
What should be the minimum size of the cache to take advantage of blocked execution?
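One way to reason about this question, offered as a sketch and not as the official solution: a 64-byte block holds eight doubles, so the matrix traversed with a large stride brings in a new block on each of eight consecutive accesses. To use all eight elements of each of those blocks before eviction, the tile must be 8 × 8 elements, and the matching tile of the other matrix must be resident at the same time. The helper `min_cache_bytes` below is a hypothetical name that encodes this arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum cache capacity (in bytes) under the assumption that one square
   tile of the input and one of the output must be resident together, with
   a tile edge equal to the number of elements per cache block. */
size_t min_cache_bytes(size_t block_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0 && block_bytes % elem_bytes == 0);

    size_t elems_per_block = block_bytes / elem_bytes;      /* 64 / 8 = 8 */
    size_t tile_elems = elems_per_block * elems_per_block;  /* 8 x 8 = 64 */

    return 2 * tile_elems * elem_bytes;  /* input tile + output tile */
}
```

With the problem's parameters, `min_cache_bytes(64, 8)` evaluates to 2 × 64 × 8 = 1024 bytes, that is, 16 blocks of 64 bytes, or 1 KB. Under this reading, the 16 KB cache in the problem is comfortably large enough, and the fully associative LRU organization means the 16 blocks of a tile pair cannot evict one another.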
