Question:

Ben moves on to consider using a vector machine. Ben's vector processor has these features:

- Single-issue, in-order execution.
- Scalar instructions execute on a 5-stage, fully bypassed pipeline.
- 32 vector registers, named V0 through V31. Each vector register holds 32 floating-point elements.
- The register files have enough ports to keep all lanes busy.
- Four vector lanes, each with one floating-point ALU and one load-store unit.
- Vector loads and arithmetic take four cycles to produce results and one cycle for writeback.
- No support for vector chaining.

This schematic shows a simplified view of the processor:

[Figure: simplified processor schematic showing the PC and its adder, instruction memory, scalar register file, scalar ALU, scalar load-store unit, vector register file, vector load-store unit (stages M1 to M4), and vector ALU (stages X1 to X4).]

The processor can issue a single (scalar or vector) instruction per cycle. Once it issues, a vector instruction uses either all lanes' ALUs or all lanes' load-store units for as many cycles as needed to produce all its results. Vector units are pipelined, so independent operations can be issued in sequence such that each stage in each vector unit operates on different values every cycle. A vector load or store can execute in parallel with independent operations that use the vector ALUs, and vector operations can execute in parallel with scalar operations. If a vector instruction depends on the result of a prior instruction, it stalls until the prior instruction finishes writing back all of its results.

The processor implements MIPS plus the following vector instructions:

Ben wants to analyze the performance of this vector processor on the same loop as in Part A: for (i = 0; i
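To make the timing rules concrete, here is a minimal Python sketch of how a sequence of vector instructions would be scheduled. The parameters come from the problem statement: 32 elements per register over 4 lanes means each vector instruction occupies its unit for 32 / 4 = 8 cycles, with a 4-cycle latency, a 1-cycle writeback, and no chaining. Everything else is an assumption made for illustration: cycles are counted from the first instruction's start, issue/decode overhead and scalar instructions are ignored, only read-after-write hazards and unit conflicts are modeled, and a dependent instruction is assumed to start in the cycle after the producer's last writeback. The names `VInstr`, `schedule`, and the load/load/add/store sequence are hypothetical and are not Ben's actual loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Parameters stated in the problem.
VLEN = 32        # elements per vector register
LANES = 4        # vector lanes
LATENCY = 4      # cycles for a vector load or ALU op to produce a result
WRITEBACK = 1    # cycles for writeback
GROUPS = VLEN // LANES  # element groups per instruction = cycles a unit stays busy (8)


@dataclass
class VInstr:
    name: str                   # hypothetical label, e.g. "load_a"
    unit: str                   # "mem" (load-store units) or "alu" (vector ALUs)
    srcs: Tuple[str, ...]       # vector registers read (scalar base addresses not modeled)
    dst: Optional[str] = None   # vector register written (None for stores)


def schedule(program: List[VInstr]) -> List[Tuple[str, int, Optional[int]]]:
    """Return (name, start cycle, first cycle the full result is readable).

    Assumed model: in-order, one instruction start per cycle; a unit accepts
    a new instruction once the previous one has fed in all its element
    groups; without chaining, a consumer waits for the producer's last
    writeback to complete.
    """
    unit_free = {"mem": 0, "alu": 0}  # first cycle each unit can accept a new instruction
    reg_ready: Dict[str, int] = {}    # first cycle each register may be read
    earliest = 0                      # in-order single issue: starts strictly increase
    result = []
    for ins in program:
        start = max([earliest, unit_free[ins.unit]] +
                    [reg_ready.get(r, 0) for r in ins.srcs])
        # Element group g enters the unit in cycle start + g, spends LATENCY
        # cycles producing its result, then WRITEBACK cycles writing it back.
        last_wb = start + (GROUPS - 1) + LATENCY + WRITEBACK - 1
        ready = None
        if ins.dst is not None:
            ready = last_wb + 1
            reg_ready[ins.dst] = ready
        unit_free[ins.unit] = start + GROUPS  # pipelined: next instruction follows directly
        earliest = start + 1
        result.append((ins.name, start, ready))
    return result


# Hypothetical example: two loads, a dependent add, and a dependent store.
prog = [
    VInstr("load_a", "mem", (), "V1"),
    VInstr("load_b", "mem", (), "V2"),
    VInstr("add", "alu", ("V1", "V2"), "V3"),
    VInstr("store_c", "mem", ("V3",)),
]

for row in schedule(prog):
    print(row)
# ('load_a', 0, 12)
# ('load_b', 8, 20)
# ('add', 20, 32)
# ('store_c', 32, None)
```

Under these assumptions the second load waits for the load-store units (cycle 8), and because there is no chaining the add cannot start until cycle 20, after the second load's last element has been written back. With chaining, the add could instead overlap with the loads.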
