Question: You experiment with an embedded device having a one-level data cache (128 Bytes) and a main memory (1

You experiment with an embedded device that has a one-level data cache (128 bytes) and a main
memory (1K bytes). Focus exclusively on data accesses, not instruction accesses. The
latencies (in CPU cycles) of the different kinds of accesses are as follows:
Cache hit: 1 cycle; cache miss: 110 cycles; main memory access with the cache disabled: 80 cycles.
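For reference, the average memory access time asked about later is conventionally computed from the cache hit rate. A sketch, assuming the 110-cycle figure is the total cost of an access that misses; the symbol h for the hit rate is introduced here and is not part of the original statement:

$$\text{AMAT} = h \cdot 1 + (1 - h) \cdot 110 \quad \text{(cycles per access)}$$

With the cache disabled every access costs 80 cycles, so under this assumption the cache only pays off when AMAT < 80, that is, when h > 30/109 (about 0.275).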
Now, considering the following matrix multiplication C = AB, please answer the following
questions (please show detailed steps):
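$$A = \begin{bmatrix} x_{0,0} & \cdots & x_{0,l} \\ \vdots & \ddots & \vdots \\ x_{m,0} & \cdots & x_{m,l} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} y_{0,0} & \cdots & y_{0,n} \\ \vdots & \ddots & \vdots \\ y_{l,0} & \cdots & y_{l,n} \end{bmatrix}$$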
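As a reminder of what the product means entry by entry, here is a sketch that uses the symbol c_{i,j} for the entries of C (a name not used in the original statement):

$$c_{i,j} = \sum_{k=0}^{l} x_{i,k} \, y_{k,j}$$

Each entry of C therefore combines one row of A with one column of B, which is the access pattern the cache questions depend on.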
1) What is the dimension of the matrix C?
2) For multiplication using the ALU, assume you pre-load both A and B into the main memory from
the storage and reserve space for C in the main memory to speed up the performance. The
pre-loaded A and B take 50% of the main memory space, and the reserved space for C takes 12.5% of
the main memory. All elements in A, B, and C have the same bit width. Assuming we know the
value of m and (m+n) is a constant, what is the maximum value of l?
3) Following the result from 2), if x is a 16-bit integer, what are the dimensions of A, B, and C?
4) Following the result from 3), assume that the cache is a fully associative cache with the least
recently used (LRU) cache replacement policy (ask me or check Wikipedia if you have forgotten it) and that
the result can be written directly back to the main memory without sacrificing the memory read. What is
the total memory access time (total cycles) for the matrix multiplication if we strictly follow the
instruction order below?
for (int i = 0; i < m; i++)
{
    for (int j = 0; j < n; j++)
    {
        C(i,j) = A(i,:) * B^T(j,:)   // dot product of row i of A and column j of B
    }
}
5) How would you change the computation to leverage the locality of the data in the cache? How
much can you improve with your method?
6) After the computation, if we randomly access elements from A, B, and C continuously, what
would the average memory access time (cycles per access) be?
7) Based on the results of 6) and the memory access times given at the
beginning, what can you conclude?