Question: 4. (12 pts) Matrix Multiplication Cache Performance

Register usage:

R10  running sum
R12  A element value
R3   A element address
R4   A row base (fixed in inner loop)
R5   A column offset (variable in inner loop)
R16  B element value
R7   B element address
R8   B row base (variable in inner loop)
R9   B column offset (fixed in inner loop)
R0 = 0; A_base, B_base, C_base = base addresses of the three matrices

main:    addi  R4, R0, 400       ; set to last column, R4 = R0 + 400
Outer2:  addi  R4, R4, -40       ; decrement A's row base
         addi  R9, R0, 40        ; set to last element of row
Outer1:  addi  R9, R9, -4        ; decrement B's column offset
         sub   R10, R10, R10     ; clear sum
         addi  R5, R0, 40        ; set to last element of row
         addi  R8, R0, 400       ; set to last column
Inner:   addi  R5, R5, -4        ; decrement A's column offset
         addi  R8, R8, -40       ; decrement B's row base
         add   R3, R4, R5        ; form A's element address, R3 = R4 + R5
         add   R7, R8, R9        ; form B's element address
         lw    R12, A_base(R3)   ; load A's element into R12, R3 is offset
         lw    R16, B_base(R7)   ; load B's element into R16, R7 is offset
         mul   R12, R12, R16     ; compute product
         add   R10, R10, R12     ; sum products
         bnez  R5, Inner         ; loop across all elements
         add   R3, R4, R9        ; compute result address
         sw    C_base(R3), R10   ; store result (R10) to C matrix, R3 = offset
         bnez  R9, Outer1        ; loop across all B's columns
         bnez  R4, Outer2        ; loop across all A's rows

Consider the 10 by 10 matrix multiplication algorithm used in the MIPS code above. Two 10 by 10 matrices A and B are multiplied, leaving the result in C. The system running the application employs a data cache which is initially empty. The data cache has the following properties:

cache organization
  number of sets ............... 1
  number of lines/set .......... 1024
  number of words/line ......... 4
  T_cache ...................... 20 ns
  T_main ....................... 100 ns per line
  write update policy .......... copy-back
  write allocation policy ...... insert in cache
  replacement policy ..........

(a) For this application, what is the expected number of misses of each type?
    compulsory:
    capacity:
    conflict:

(b) What is the hit rate of the cache? (show work)

(c) What is the average data access time (T_effective) in ns? (show work)

(d) If the cycle time of a pipelined system is 20 ns, how many stalls are introduced for each miss? (show work)
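One way to check parts (a) through (d) is to replay the loop's data accesses against a small model of the cache. The Python sketch below does that under several assumptions that the question does not state: words are 4 bytes, the three matrices sit back to back on a line boundary (`A_BASE = 0`, `B_BASE = 400`, `C_BASE = 800`), replacement is LRU (the policy is left blank in the question), T_effective is modeled as T_cache + miss rate x T_main, and the stall count per miss is T_main divided by the cycle time. The names `trace` and `simulate` are hypothetical helpers, not part of the original problem.

```python
from collections import OrderedDict

WORD_BYTES = 4                       # assumed word size
WORDS_PER_LINE = 4
LINE_BYTES = WORD_BYTES * WORDS_PER_LINE
NUM_LINES = 1024                     # one set of 1024 lines: fully associative
T_CACHE_NS = 20
T_MAIN_NS = 100                      # per line fetched from main memory
CYCLE_NS = 20

# Assumed placement (not given in the question): line-aligned, back to back.
A_BASE, B_BASE, C_BASE = 0, 400, 800


def trace():
    """Yield the byte address of every data access, in program order."""
    for r4 in range(360, -1, -40):            # A row base: 360, 320, ..., 0
        for r9 in range(36, -1, -4):          # B column offset: 36, 32, ..., 0
            for k in range(9, -1, -1):        # inner loop, 10 iterations
                yield A_BASE + r4 + 4 * k     # lw R12, A_base(R3)
                yield B_BASE + 40 * k + r9    # lw R16, B_base(R7)
            yield C_BASE + r4 + r9            # sw C_base(R3), R10


def simulate(addresses):
    """Return (accesses, compulsory misses, capacity misses)."""
    cache = OrderedDict()    # resident line numbers, least recently used first
    seen = set()             # every line ever brought into the cache
    accesses = compulsory = capacity = 0
    for addr in addresses:
        accesses += 1
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)           # hit: refresh recency
            continue
        if line in seen:
            capacity += 1    # evicted earlier; one set means no conflict misses
        else:
            compulsory += 1
            seen.add(line)
        cache[line] = None                    # allocate on load and store misses
        if len(cache) > NUM_LINES:
            cache.popitem(last=False)         # evict the LRU line (assumed policy)
    return accesses, compulsory, capacity


accesses, compulsory, capacity = simulate(trace())
misses = compulsory + capacity
hit_rate = (accesses - misses) / accesses
t_effective = T_CACHE_NS + (misses / accesses) * T_MAIN_NS   # assumed model
stalls_per_miss = T_MAIN_NS // CYCLE_NS                      # assumed model

print(accesses, compulsory, capacity)     # 2100 75 0
print(round(hit_rate, 4))                 # 0.9643
print(round(t_effective, 2))              # 23.57
print(stalls_per_miss)                    # 5
```

Under these assumptions the 300 words of matrix data occupy 75 lines, far fewer than the 1024 available, so every miss is compulsory. A different alignment of the base addresses, or a different formula for T_effective, would change the numbers, so the printed values are a check on the reasoning rather than a definitive answer key.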
