Question: Problem 4

Consider the following computation loop, which updates and relocates data from Array 1 into Array 2 conditionally, based on values loaded from Array 3.

    Loop: LD   R3, 1000(R1)    ; integer variable load from array 3
          BEZ  R3, TAG         ; branch if variable is zero
          LF   F0, 0(R1)       ; float load from array 1
          ADDF F4, F0, F2      ; float add
          SD   (R2), F4        ; store into array 2
    TAG:  SUB  R2, R2, #8      ; update array 2 pointer
          SUB  R1, R1, #8      ; update loop counter
          BNZ  R1, Loop        ; loop back

We assume a Load/Store instruction set architecture augmented for floating point operations. The machine is one-instruction issue and has an ADD/SUB integer unit, an ADDF/SUBF floating point unit, and a branch predict unit. Branch instructions are delayed. There are 2 latency cycles associated with the execution of the floating point ADDF/SUBF operations. Also, LD and LF need one load slot, and the branches have one branch delay slot. The above loop is supposed to execute 10 iterations in total.

We assume that the branch prediction buffer is initially reset. In all questions below you should include the effect of the branch prediction buffer. Recall that there is no penalty for a correct prediction; however, there is a 2-cycle penalty for a misprediction. If an instruction is not in the buffer and turns out to be a branch, the penalty depends on whether it is taken or not.

a) Without any loop body optimizations, what is the minimum possible execution time of the above loop for 10 iterations?

b) What is the maximum possible execution time of the loop for 10 iterations?

c) Perform loop body optimizations, without loop unfolding, and then repeat a) above. How are your results affected?

d) If you consider loop unfolding in a) and b), how are your results in this problem going to change?
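As a starting point for parts a) and b), the following Python sketch counts issue slots per unoptimized iteration. It is only one possible reading of the problem, not a verified solution. It assumes that every load slot, branch delay slot, and ADDF latency cycle is filled with a no-op, that the integer SUB feeding BNZ causes no stall, and that predictor penalties are added separately. The names `base_cycles`, `loop_cycles`, and `penalty_cycles` are hypothetical. The penalty for a branch that is not yet in the buffer is not given in the question, so it is left as an input rather than guessed.

```python
# Sketch under stated assumptions: unfilled slots cost one cycle each,
# and branch prediction penalties are supplied by the caller.
LOAD_SLOT = 1          # one load slot after LD / LF (from the problem)
BRANCH_DELAY_SLOT = 1  # one delay slot after each branch (from the problem)
FP_LATENCY = 2         # two latency cycles after ADDF (from the problem)


def base_cycles(bez_taken: bool) -> int:
    """Issue slots for one unoptimized iteration, excluding predictor penalties."""
    cycles = 1 + LOAD_SLOT            # LD R3, then the load slot before BEZ reads R3
    cycles += 1 + BRANCH_DELAY_SLOT   # BEZ and its delay slot
    if not bez_taken:
        cycles += 1 + LOAD_SLOT       # LF F0, then the load slot before ADDF reads F0
        cycles += 1 + FP_LATENCY      # ADDF, then two cycles before SD reads F4
        cycles += 1                   # SD
    cycles += 2                       # SUB R2 and SUB R1
    cycles += 1 + BRANCH_DELAY_SLOT   # BNZ and its delay slot
    return cycles


def loop_cycles(bez_taken_per_iteration, penalty_cycles=0):
    """Total cycles for a run, given the BEZ outcome of each iteration.

    penalty_cycles is the caller's own total of branch prediction penalties.
    """
    return sum(base_cycles(t) for t in bez_taken_per_iteration) + penalty_cycles


if __name__ == "__main__":
    print("BEZ taken (short path):", base_cycles(True))
    print("BEZ not taken (long path):", base_cycles(False))
    print("10 short iterations, no penalties:", loop_cycles([True] * 10))
    print("10 long iterations, no penalties:", loop_cycles([False] * 10))
```

Under these assumptions the short path costs 8 slots and the long path 14, so the data in Array 3 determines the base count, and the predictor penalties for BEZ and BNZ are added on top of that.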
