Pipeline Datapath & Performance

CS 2506 Computer Organization II
MIPS 3: Forwarding and Hazard Detection
You may work in pairs for this assignment. If you choose to work with a partner, make sure that only one of you submits
a solution and that the file lists names and PIDs for both of you as described in the assignment below.
Prepare your answers to the following questions in a plain text file. Submit your file to the Curator system by the posted
deadline for this assignment. No late submissions will be accepted.
You will submit your answers to the Curator System (www.cs.vt.edu/curator) under the heading MIPS03.
For questions 1 through 3, refer to the pipeline design with forwarding, shown below, which supports execution of any
sequence of the following MIPS instructions: add, sub, and, or, slt, and sw (and lw, so long as no stalls are needed to
resolve load-use hazards).
Remember: this pipeline design does not include (load-use) hazard detection hardware, so it can forward operands but it
cannot introduce stalls to deal with situations that forwarding alone will not handle.
[Figure: pipeline datapath with forwarding; the diagram carries the labels 1, 2, and 3.]
1. Consider the following sequence of MIPS32 assembly instructions (which you may have seen in an earlier
assignment):
    lw   $t3, 0($t0)      # 1.1
    add  $t0, $t3, $t3    # 1.2
    sub  $t1, $t0, $t3    # 1.3
    sw   $t3, 4($t0)      # 1.4
    lw   $t0, 0($t1)      # 1.5
    add  $t4, $t3, $t1    # 1.6
A data dependency occurs when a later instruction requires an input value that is set by an earlier instruction. A data
hazard occurs when one instruction writes a value into a register that will be used as input by a later instruction, but
that value does not actually appear in the register by the cycle on which the later instruction attempts to read it. Note
that a data hazard always implies a data dependency, but some data dependencies do not imply a data hazard. Also
remember that this pipeline design does include hardware for forwarding operands, but not for inserting stalls.
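The distinction between a dependency and a hazard can be sketched in a few lines of Python. This is a minimal illustration, not part of the required answer format. It uses a hypothetical five-instruction sequence (not the one above) and assumes the usual five-stage timing: the register file is written in the first half of a cycle and read in the second half, and no stalls are inserted, so the distance between writer and reader is fixed by program order. The tuple layout and the function name `classify` are choices made for this sketch only.

```python
# Illustrative sketch only. The sequence below is hypothetical and is NOT the
# sequence from question 1.
# Each entry: (opcode, destination register or None, source registers).
program = [
    ("lw",  "$s1", ["$s0"]),         # 0: lw  $s1, 0($s0)
    ("add", "$s2", ["$s1", "$s1"]),  # 1: add $s2, $s1, $s1
    ("sub", "$s3", ["$s2", "$s1"]),  # 2: sub $s3, $s2, $s1
    ("or",  "$s4", ["$s0", "$s0"]),  # 3: or  $s4, $s0, $s0
    ("and", "$s5", ["$s2", "$s4"]),  # 4: and $s5, $s2, $s4
]


def classify(program):
    """List (writer, reader, register, verdict) for each read-after-write pair.

    Assumed timing: a reader 3 or more instructions after the writer sees the
    value in the register file; distance 2 needs the value from MEM/WB;
    distance 1 needs it from EX/MEM, which a lw has not produced yet.
    """
    results = []
    for r, (_, _, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            # Find the most recent earlier instruction that writes reg.
            for w in range(r - 1, -1, -1):
                if program[w][1] == reg:
                    break
            else:
                continue  # no earlier writer, so no dependency on reg
            distance = r - w
            if distance >= 3:
                verdict = "dependency only"
            elif distance == 2:
                verdict = "forward from MEM/WB"
            elif program[w][0] == "lw":
                verdict = "load-use: needs a stall"
            else:
                verdict = "forward from EX/MEM"
            results.append((w, r, reg, verdict))
    return results


for row in classify(program):
    print(row)
```

Every row reported is a data dependency; only the rows whose verdict is not "dependency only" are data hazards under the stated assumptions.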
a) [8 points] Identify the data hazards that would not prevent the given sequence of instructions from executing
correctly on the given hardware design above, even though we do not have the necessary hardware to insert stalls.
For each such hazard, list the writing instruction, the reading instruction, the register involved, and which
interstage buffer the forwarded value will be taken from. Do not list data dependencies unless they would require
forwarding of an operand.
Write your answers in the following form (answer below is NOT correct for this question):
    writer    reader    register    forward from
    ------------------------------------------
    4.1       4.2       $t5         MEM/WB
b) [8 points] Identify the data hazards that would prevent the given sequence of instructions from executing correctly
on the given hardware design above, even though we do have the necessary hardware to carry out forwarding. For
each such hazard, list the writing instruction, the reading instruction, and the register involved.
Write your answers in the following form (answer below is NOT correct for this question):
    writer    reader    register
    ---------------------------
    4.1       4.2       $t5
2. [10 points] Why doesn't the Forwarding unit need the output from the MUX labelled 2 in the diagram? Be precise.
3. [12 points] The Forwarding unit does receive the write-to register number taken from the MEM/WB interstage buffer
(labelled 3 in the diagram). This is the write-to register for the instruction that has just entered the WB stage. Since
that instruction will write a value into the appropriate register while that instruction is in the WB stage, why does the
Forwarding unit need to see its write-to register number?
For questions 4 and 5, refer to the pipeline design with forwarding and (load-use) hazard detection, shown below, which
supports execution of any sequence of the following MIPS instructions: add, sub, and, or, slt, lw, and sw.
4. [15 points] How many stalls would the load-use Hazard Detection unit trigger if we executed each of the following
sequences of instructions? (The parts are independent.)
a)  lw   $t1, ($t0)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1
b)  lw   $t1, ($t0)
    lw   $t2, ($t0)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1
c)  lw   $t1, ($t0)
    lw   $t2, ($t1)
    add  $t3, $t2, $t1
    add  $t4, $t3, $t1
5. [30 points] Suppose we executed the following sequence of instructions, and suppose that the registers have the
indicated initial values:
    lw   $t1, ($t0)        # $t0 initially 0x08004000, $t1 initially 1,
                           # Mem[0x08004000] initially 10
    add  $t3, $t2, $t1     # $t2 initially 2, $t3 initially 3
    sub  $t4, $t4, $t3     # $t4 initially 4
a) Suppose that we executed the instructions above on the pipeline with forwarding and load-use hazard detection.
What would be the final values in the registers $t1, $t3 and $t4?
b) Suppose that we executed the instructions above and, when a load-use hazard occurred, we did not prevent updates
to the PC register, but we did prevent updates to the IF/ID buffer. What would be the final values in the registers
$t1, $t3 and $t4?
c) Suppose that we executed the instructions above and, when a load-use hazard occurred, we did not prevent updates
to the IF/ID buffer, but we did prevent updates to the PC register. What would be the final values in the registers
$t1, $t3 and $t4?
6. MAD Corporation currently produces three different processors, all executing the same machine language:
    P1 has a 2.4 GHz clock rate and an advertised CPI of 1.2
    P2 has a 3.2 GHz clock rate and an advertised CPI of 1.5
    P3 has a 3.8 GHz clock rate and an advertised CPI of 1.8
a) [5 points] Using IPS (instructions per second) as your criterion, and accepting the information given above, which
processor offers the best performance? Justify your conclusion precisely.
b) [6 points] It takes 12 seconds (of CPU time) to execute a certain benchmark on P2. How many machine
instructions are executed when that benchmark is run on P2? Justify your conclusion precisely.
c) [6 points] MAD would like to reduce the execution time of that benchmark on P2 by 25%, but the redesign they've
come up with would entail increasing the CPI by 15%. What clock rate must they apply in order to achieve their
goal? State the clock rate to the nearest hundredth of a GHz. Justify your conclusion precisely.
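The quantities used in question 6 are related by the standard CPU performance equation: CPU time = instruction count x CPI / clock rate, and IPS = clock rate / CPI. The following Python sketch restates those relations. The function names are chosen for this sketch only, and the processor used in the example (2.0 GHz, CPI 1.25) is hypothetical and is not one of the MAD processors.

```python
def ips(clock_hz, cpi):
    """Instructions per second for a given clock rate (Hz) and CPI."""
    return clock_hz / cpi


def instruction_count(cpu_seconds, cpi, clock_hz):
    """Instructions executed in cpu_seconds at the given CPI and clock rate."""
    return cpu_seconds * clock_hz / cpi


def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds: instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


def required_clock_hz(instructions, cpi, target_seconds):
    """Clock rate needed to finish the given work in target_seconds."""
    return instructions * cpi / target_seconds


# Hypothetical processor, for illustration only.
clock = 2.0e9   # 2.0 GHz
cpi = 1.25

rate = ips(clock, cpi)                     # instructions per second
count = instruction_count(5.0, cpi, clock) # instructions in a 5-second run
print(rate, count, cpu_time(count, cpi, clock))
```

Note that `required_clock_hz` is just the CPU time relation solved for the clock rate, so a change in CPI and a change in target time can both be substituted into it directly.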