Question: 4 . 3 Part B: Optimizing Matrix Transpose In Part B you will write a transpose function in trans.c that causes as few cache misses

4.3 Part B: Optimizing Matrix Transpose
In Part B you will write a transpose function in trans.c that causes as few cache misses as possible.
Let A denote a matrix, and Aij denote the component on the ith row and jth column. The transpose of A,
denoted AT
, is a matrix such that Aij = AT
ji.
To help you get started, we have given you an example transpose function in trans.c that computes the
transpose of N \times M matrix A and stores the results in M \times N matrix B:
char trans_desc[]= "Simple row-wise scan transpose";
void trans(int M, int N, int A[N][M], int B[M][N])
The example transpose function is correct, but it is inefficient because the access pattern results in relatively
many cache misses.
Your job in Part B is to write a similar function, called transpose_submit, that minimizes the number
of cache misses across different sized matrices:
char transpose_submit_desc[]= "Transpose submission";
void transpose_submit(int M, int N, int A[N][M], int B[M][N]);
Do not change the description string (Transpose submission) for your transpose_submit
function. The autograder searches for this string to determine which transpose function to evaluate for
credit.
Programming Rules for Part B
Include your name and email in the header comment for trans.c.
Your code in trans.c must compile without warnings to receive credit.
You are allowed to define at most 12 local variables of type int per transpose function.1
You are not allowed to side-step the previous rule by using any variables of type long or by using
any bit tricks to store more than one value to a single variable.
1The reason for this restriction is that our testing code is not able to count references to the stack. We want you to limit your
references to the stack and focus on the access patterns of the source and destination arrays.
4
Your transpose function may not use recursion.
If you choose to use helper functions, you may not have more than 12 local variables on the stack
at a time between your helper functions and your top level transpose function. For example, if your
transpose declares 8 variables, and then you call a function which uses 4 variables, which calls another
function which uses 2, you will have 14 variables on the stack, and you will be in violation of the rule.
Your transpose function may not modify array A. You may, however, do whatever you want with the
contents of array B.
You are NOT allowed to define any arrays in your code or to use any variant of malloc.5.2 Evaluation for Part B
For Part B, we will evaluate the correctness and performance of your transpose_submit function on
three different-sized output matrices:
32\times 32(M =32, N =32)
64\times 64(M =64, N =64)
61\times 67(M =61, N =67)We have provided you with an autograding program, called test-trans.c, that tests the correctness and
performance of each of the transpose functions that you have registered with the autograder.
7
You can register up to 100 versions of the transpose function in your trans.c file. Each transpose version
has the following form:
/* Header comment */
char trans_simple_desc[]= "A simple transpose";
void trans_simple(int M, int N, int A[N][M], int B[M][N])
{
/* your transpose code here */
}
Register a particular transpose function with the autograder by making a call of the form:
registerTransFunction(trans_simple, trans_simple_desc);
in the registerFunctions routine in trans.c. At runtime, the autograder will evaluate each reg-
istered transpose function and print the results. Of course, one of the registered functions must be the
transpose_submit function that you are submitting for credit:
registerTransFunction(transpose_submit, transpose_submit_desc);
See the default trans.c function for an example of how this works.
The autograder takes the matrix size as input. It uses valgrind to generate a trace of each registered trans-
pose function. It then evaluates each trace by running the reference simulator on a cache with parameters
(s =5, E =1, b =5).
For example, to test your registered transpose functions on a 32\times 32 matrix, rebuild test-trans, and
then run it with the appropriate values for M and N:
linux> make
linux>./test-trans -M 32-N 32
Step 1: Evaluating registered transpose funcs for correctness:
func 0(Transpose submission): correctness: 1
func 1(Simple row-wise scan transpose): correctness: 1
func 2(column-wise scan transpose): correctness: 1
func 3(using a zig-zag access pattern): correctness: 1
Step 2: Generating memory traces for registered transpose funcs.
Step 3: Evaluating performance of registered transpose funcs (s=5, E=1, b=5)
func 0(Transpose submission): hits:1766, misses:287, evictions:255
func 1(Simple row-wise scan transpose): hits:870, misses:1183, evictions:1151
func 2(column-wise scan transpose): hits:870, misses:1183, evictions:1151
func 3(using a zig-zag access pattern): hits:1076, misses:977, evictions:945

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!