Question:
A uniprocessor system accesses memory over a 32-bit bus. The processor has a write-back (as opposed to write-through), direct-mapped data cache employing a write-allocate policy. The data cache operates in look-through mode and contains 65536 lines, each of which is 1024 bytes in size. Each data cache access takes 2 CPU clock cycles, whether it detects a cache miss, performs a cache read, or performs a cache write. Loading a memory block into the cache or writing a cache line back to memory takes 400 CPU clock cycles. Recall that on a cache miss, the data item that is needed is read in parallel with loading the memory block containing that data item into the cache. The operating system flushes the contents of the cache to memory after a program terminates.
The code shown below reads and updates each of the 134217728 four-byte elements in an integer array. The address of the array in memory is 0x100840A0.
In an attempt to speed up the processing of the array, the code is executed on an 8-core system in which each core has a separate data cache identical to the data cache of the uniprocessor. A speedup factor of 8 for the 8-core system compared to the uniprocessor would correspond to what is called linear speedup.
What would be the actual speedup provided by the 8-core system based on just the time required to read and update the array elements, ignoring the time required by the instructions that do not reference memory? Each of the 8 cores executes a separate copy of the code but with the appropriate starting address and loop limit that would correspond to 1/8th of the array. Assume that all data caches are initially empty.
      lui   $8, 0x1008        # point to first element to be processed
      ori   $8, $8, 0x40A0
      lui   $4, 0x0800        # number of elements (134217728 = 0x08000000)
loop: lw    $12, 0($8)        # get next element
      addi  $4, $4, -1        # decrement loop control variable
      sub   $12, $0, $12      # negate by subtracting from 0
      addi  $8, $8, 4         # point to next element
      sw    $12, -4($8)       # update element that was read
      bgez  $4, loop          # repeat if more elements remain
      nop
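As a starting point for working the problem, the cache geometry and the per-core slices can be checked with a short script. This is only a sketch under stated assumptions: the constants are copied from the question, the helper names (`split_address`, `blocks_spanned`, `slice_start`) are made up for illustration, and each core is assumed to receive one contiguous eighth of the array in address order. It does not apply the timing rules or compute the speedup.

```python
# Figures taken directly from the problem statement.
LINE_BYTES = 1024          # bytes per cache line
NUM_LINES = 65536          # lines in the direct-mapped cache
BASE_ADDR = 0x100840A0     # address of the first array element
NUM_ELEMS = 134217728      # number of array elements
ELEM_BYTES = 4             # bytes per element
NUM_CORES = 8

# Both sizes are powers of two, so bit_length() - 1 gives log2.
OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_LINES.bit_length() - 1


def split_address(addr):
    """Hypothetical helper: split an address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def blocks_spanned(start, nbytes):
    """Hypothetical helper: count memory blocks touched by [start, start + nbytes)."""
    first = start >> OFFSET_BITS
    last = (start + nbytes - 1) >> OFFSET_BITS
    return last - first + 1


ARRAY_BYTES = NUM_ELEMS * ELEM_BYTES
CACHE_BYTES = NUM_LINES * LINE_BYTES
SLICE_BYTES = ARRAY_BYTES // NUM_CORES


def slice_start(core):
    """Assumed layout: core k processes the k-th contiguous eighth of the array."""
    return BASE_ADDR + core * SLICE_BYTES


if __name__ == "__main__":
    print("cache capacity (bytes):", CACHE_BYTES)
    print("array size (bytes):    ", ARRAY_BYTES)
    print("base (tag, index, offset):", split_address(BASE_ADDR))
    print("blocks spanned by whole array:", blocks_spanned(BASE_ADDR, ARRAY_BYTES))
    for core in range(NUM_CORES):
        start = slice_start(core)
        first_index = split_address(start)[1]
        last_index = split_address(start + SLICE_BYTES - 1)[1]
        print(f"core {core}: start={start:#010x}, "
              f"blocks={blocks_spanned(start, SLICE_BYTES)}, "
              f"first/last line index={first_index}/{last_index}")
```

The script shows that the array does not start on a block boundary, so each one-eighth slice is exactly as large as the cache yet touches one more block than the cache has lines. How that affects the miss and write-back counts depends on how the timing rules in the question are applied.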
