Question: The chime approximation that we discussed on Slide 14 in Part 1 of Data Level Parallelism is reasonably accurate for long vectors. However, another source of overhead is far more significant than the issue limitation: the most important source of overhead ignored by the chime model is vector start-up time.

Consider the DAXPY example on Slide 12 in Part 1 of Data Level Parallelism. Assume that the functional units are fully pipelined and that the start-up overhead for the functional units, including the load/store units, is the same as the Cray-1, which is shown on Slide 13 in Part 1 of Data Level Parallelism. Also assume that the functional units can be chained, that there is one lane, three load/store units, no memory bank conflicts, and that the memory bandwidth is sufficient to supply or store three vector elements each clock cycle.

a) Draw a timing diagram similar to the example on Slide 14 which shows how the functional units can be chained, taking into account the start-up latencies of all of the functional units, including the load and store units, and including the operation of storing the results back into main memory. (5 points)

b) What is the total number of cycles needed to process a vector of length 64, including the operation of storing the results back into memory, and the number of cycles per FLOP? (5 points)
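Since the start-up values from Slide 13 are not reproduced here, the following is a minimal sketch of the part (b) calculation, assuming the commonly cited Cray-1 start-up latencies (about 12 cycles for a vector load, 7 for floating-point multiply, 6 for floating-point add) and assuming the store start-up equals the load start-up. Substitute the slide's actual values if they differ. It totals the chained critical path LV X -> MULVS -> ADDV -> SV, which is the same chain the timing diagram in part (a) would show.

```python
# Sketch of the DAXPY (Y = a*X + Y) cycle count under the chaining model in the
# question. The start-up latencies are ASSUMED from the classic Cray-1 figures;
# replace them with the values on Slide 13 if they differ.

VLEN = 64            # vector length
STARTUP = {
    "load":     12,  # assumed Cray-1 vector load start-up
    "multiply":  7,  # assumed Cray-1 FP multiply start-up
    "add":       6,  # assumed Cray-1 FP add start-up
    "store":    12,  # assumed equal to the load start-up
}

# With full chaining, one lane, and three load/store units, LV X and LV Y can
# issue in parallel; LV Y finishes before the multiply needs it, so the
# critical dependence chain is LV X -> MULVS -> ADDV -> SV and its start-up
# latencies simply add up.
critical_chain = ["load", "multiply", "add", "store"]
chain_startup = sum(STARTUP[unit] for unit in critical_chain)

# The first result reaches memory after the chained start-up; the remaining
# VLEN - 1 results follow at one element per cycle (one lane, no bank
# conflicts, sufficient memory bandwidth).
total_cycles = chain_startup + (VLEN - 1)

flops = 2 * VLEN     # one multiply and one add per element
cycles_per_flop = total_cycles / flops

print(f"chained start-up on critical path: {chain_startup} cycles")
print(f"total cycles for n={VLEN}: {total_cycles}")
print(f"cycles per FLOP: {cycles_per_flop:.2f}")
```

Under these assumed latencies the chained start-up is 37 cycles, the full vector of 64 elements (including the stores) takes about 100 cycles, and the rate is roughly 0.78 cycles per FLOP; the exact numbers depend on the latencies given on Slide 13.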
