Instructions for Assignment 7

ENCM 501 Winter 2015 Assignment 7
for the Week of March 16
Steve Norman
Department of Electrical & Computer Engineering
University of Calgary
March 2015
Assignment instructions and other documents for ENCM 501 can be found at
http://people.ucalgary.ca/~norman/encm501winter2015/
1 Administrative details
1.1 Group work is permitted
Here are the options:
• You may do your work entirely individually.
• A group of two or three students may hand in a single assignment for the
whole group.
• Collaboration at the level of individual exercises is acceptable. In that case,
submissions of complete, individual assignments are required, with explicit
acknowledgments given as needed on an exercise-by-exercise basis.
Informal discussion of assignment exercises between students is encouraged, and
does not need to be acknowledged.
Please be aware that all students are expected to understand all assignment exercises! Collaboration is of course not allowed on quizzes, the midterm test, and the
final exam.
1.2 Due Dates
The Due Date for this assignment is 3:30pm, Thursday, March 19.
The Late Due Date is 3:30pm, Friday, March 20.
The penalty for handing in an assignment after the Due Date but before the Late
Due Date is 3 marks. In other words, X/10 becomes (X–3)/10 if the assignment
is late. There will be no credit for assignments turned in after the Late Due Date;
they will be returned unmarked.
1.3 Marking scheme
A      16 marks
B       3 marks
total  19 marks
1.4 How to package and hand in your assignments
Please see the instructions in Assignment 1.
2 Exercise A: Tracing instructions in a simple pipeline
2.1 Read This First
To get a feel for how pipelining really works, nothing beats studying a pipelined
circuit and tracing the flow of chunks of instructions and data through the circuit.
Figure 1 is taken from the current textbook for ENEL 353 and ENCM 369. It
shows a 5-stage pipeline for processing of the 32-bit MIPS instructions listed in
Figure 2.
What is especially awesome about Figure 1, in addition to its outstanding attention to detail, is the helpful naming scheme for all the signals in the circuit. Here is
an important example: WriteRegE is a 5-bit signal indicating the destination GPR
of the instruction that is currently in the Execute stage, while WriteRegM indicates
the destination GPR of an earlier instruction that has made it to the Memory stage.
Carefully note the “bubble” on the CLK input to the Register File: Writes to
GPRs happen on negative clock edges, while writes to the PC and pipeline registers
happen on positive clock edges. So data that makes it to the pipeline register at
the end of the Memory stage can be copied into the Register File half a clock cycle
later—in other words, the Writeback step only takes half a clock cycle.
Figure 1: Pipelined processor design to support eight MIPS32 instructions. (Image is
Figure 7.47 from Harris D. M. and Harris S. L., Digital Design and Computer Architecture,
2nd ed., © 2013, Elsevier, Inc.) Note that this circuit does not have elements to
deal correctly with data hazards and control hazards, so is capable of processing a lot
of instructions incorrectly! (Figure 7.58 in the same book shows how to handle hazards
correctly.)
[Figure 1 image omitted: circuit schematic with Fetch, Decode, Execute, Memory, and
Writeback stages, containing the Control Unit, Instruction Memory, Register File,
Sign Extend unit, ALU, and Data Memory. Signal names carry a suffix for their stage,
for example RegWriteD, ALUSrcE, ALUOutM, and ResultW.]
Figure 2: Machine code format for the 32-bit MIPS instructions supported by the processor of Figure 1.

R-type
bits 31:26   bits 25:21     bits 20:16     bits 15:11   bits 10:6   bits 5:0
000000       source GPR 1   source GPR 2   dest. GPR    00000       “funct” field

funct is 100000 for ADD, 100010 for SUB, 100100 for AND,
100101 for OR, and 101010 for SLT.

LW
bits 31:26   bits 25:21     bits 20:16     bits 15:0
100011       pointer GPR    dest. GPR      offset

SW
bits 31:26   bits 25:21     bits 20:16     bits 15:0
101011       pointer GPR    source GPR     offset

BEQ
bits 31:26   bits 25:21     bits 20:16     bits 15:0
000100       source GPR 1   source GPR 2   offset
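
The field layout of Figure 2 can be sketched in Python as a small decoder. This is only an illustration: the function name decode, the dictionary keys, and the example words are assumptions made here, not part of the assignment. The offset is sign-extended, as the Sign Extend block in Figure 1 does for bits 15:0.

```python
def decode(word):
    """Split a 32-bit MIPS machine-code word into the fields of Figure 2."""
    fields = {
        "op": (word >> 26) & 0x3F,  # bits 31:26
        "rs": (word >> 21) & 0x1F,  # bits 25:21
        "rt": (word >> 16) & 0x1F,  # bits 20:16
    }
    if fields["op"] == 0:
        # R-type: destination GPR and funct field
        fields["rd"] = (word >> 11) & 0x1F  # bits 15:11
        fields["funct"] = word & 0x3F       # bits 5:0
    else:
        # LW, SW, BEQ: 16-bit offset, sign-extended to a Python int
        imm = word & 0xFFFF                 # bits 15:0
        fields["offset"] = imm - 0x10000 if imm & 0x8000 else imm
    return fields


# Hypothetical example words, assembled by hand from Figure 2:
slt_word = (1 << 21) | (2 << 16) | (3 << 11) | 0b101010       # SLT R3, R1, R2
lw_word = (0b100011 << 26) | (1 << 21) | (2 << 16) | 0xFFFC   # LW R2, -4(R1)
print(decode(slt_word))
print(decode(lw_word))
```

The same function can be applied to any machine code word in the exercises below to check which GPR numbers appear in bits 25:21, 20:16, and 15:11.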
Figure 3: Hint about how to draw pipeline diagrams for Exercise A.

       cycle 1   cycle 2   cycle 3   cycle 4   cycle 5   cycle 6
LW     F         D         E         M         W
ADD              F         D         E         M         W
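
A diagram in the style of Figure 3 can also be generated mechanically. The sketch below assumes one instruction is fetched per cycle with no stalls and no cancelled instructions; the names STAGES and pipeline_diagram are made up for this illustration.

```python
STAGES = "FDEMW"  # Fetch, Decode, Execute, Memory, Writeback


def pipeline_diagram(names):
    """Return text rows for instructions fetched in consecutive cycles,
    assuming no stalls and no cancelled instructions."""
    n_cycles = len(names) + len(STAGES) - 1
    header = " " * 6 + "".join(f"{c:>3}" for c in range(1, n_cycles + 1))
    rows = [header]
    for i, name in enumerate(names):
        cells = ["  ."] * n_cycles          # "." marks a cycle with no stage
        for s, stage in enumerate(STAGES):
            cells[i + s] = f"{stage:>3}"    # instruction i starts in cycle i+1
        rows.append(f"{name:<6}" + "".join(cells))
    return rows


for row in pipeline_diagram(["LW", "ADD"]):
    print(row)
```

Passing a longer list of instruction names extends the diagram by one row and one cycle per instruction.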
2.2 What to Do, Part I
Consider the following sequence of MIPS32 instructions:
address      code         assembly language
0x00408000   0x8d280014   LW  R8, 20(R9)
0x00408004   0x01485820   ADD R11, R10, R8
0x00408008   0x01886825   OR  R13, R12, R8
0x0040800c   0x01c87822   SUB R15, R14, R8
Let’s use “Cycle 1” as the name for the clock cycle in which LW is in the Fetch
stage, “Cycle 2” for the very next clock cycle, and so on. Assume:
• At the beginning of Cycle 1,
R8 = 0x00000051, R9 = 0x10010000, R10 = 0x00000009,
R12 = 0x00000007, R14 = 0x00000074,
and the Data Memory word at address 0x10010014 is 0x00000098.
• Whatever five or so instructions get into the pipeline just before LW, none
of them write to any of the registers R8–R15, and none of them are BEQ
instructions.
• The Control Unit “does the right thing” with all of its outputs. For example,
this multiplexer . . .
[Figure: 2-to-1 multiplexer with 5-bit inputs RtE (input 0) and RdE (input 1) and 5-bit output WriteRegE]
. . . will copy RtE to WriteRegE when the instruction in Execute is LW, but will
copy RdE to WriteRegE when the instruction is R-Type.
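
The multiplexer behaviour in the last bullet can be written as a one-line function. This is a sketch that assumes the select input is the RegDstE control signal, as the signal names in Figure 1 suggest; the function name and the register numbers in the example are hypothetical.

```python
def write_reg_e(reg_dst_e, rt_e, rd_e):
    """2-to-1 multiplexer: input 0 is RtE, input 1 is RdE.
    Assumes the select input is RegDstE (0 for LW, 1 for R-type)."""
    return rd_e if reg_dst_e else rt_e


# Hypothetical register numbers: RtE = 5, RdE = 6
print(write_reg_e(0, 5, 6))  # LW in Execute: selects RtE
print(write_reg_e(1, 5, 6))  # R-type in Execute: selects RdE
```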
Answer all the questions in Items 2–14 below. An answer is given for Item 1 as a
model to follow. Before you start, drawing an extended version of the diagram in
Figure 3 is highly recommended.
1. In Cycle 1, what values do the signals PCF and PCPlus4F take on?
In Cycle 2, what value does InstrD take on, and what is the bit pattern for
bits 20:16 of InstrD?
In Cycle 1, PCF is 0x00408000, the address of the LW instruction, and PCPlus4F is 0x00408004, the address of the next instruction to be fetched.
In Cycle 2, InstrD is 0x8d280014, the machine code for the LW instruction;
bits 20:16 of that are 01000 to specify R8 as the destination of the LW.
2. In Cycle 3, what are the ALU inputs SrcAE and SrcBE? (Hint: The job of the
ALU in this cycle is to compute the memory address for LW.)
3. In Cycle 3, what value does InstrD take on?
4. In Cycle 3, what are the values of the RD1 and RD2 outputs of the Register
File?
5. In Cycle 4, what are the values of WriteRegE and WriteRegM?
6. In Cycle 4, what are the values of the ALU inputs SrcAE and SrcBE?
7. In Cycle 4, what is the value of the Data Memory RD output?
8. In Cycle 5, what are the values of WriteRegE, WriteRegM, and WriteRegW?
9. In Cycle 5, what are the values of the ALU inputs SrcAE and SrcBE?
10. In Cycle 5, what is the value of ResultW?
11. In Cycle 5, what are the values of the RD1 and RD2 outputs of the Register
File, nearing the end of the clock cycle? (Remember, Register File updates
occur on negative clock edges.)
12. In Cycle 6, what are the values of WriteRegE, WriteRegM, and WriteRegW?
13. In Cycle 6, what are the values of the ALU inputs SrcAE and SrcBE?
14. In Cycle 6, what is the value of ResultW?
2.3 What to Do, Part II
Now let’s look at this sequence of instructions:
address           assembly language
0x00409000        BEQ R0, R0, L1
0x00409004        SW  R8, 40(R9)
0x00409008        ADD R11, R10, R8
0x0040900c        SUB R12, R10, R8
0x00409010        AND R13, R10, R8
0x00409014        OR  R14, R10, R8
0x00409018   L1:  LW  R15, 12(R29)
Obviously the branch will be taken, because it is guaranteed that R0 is equal
to R0.
But note that the decision to branch is made in the Memory stage. The ALU
will subtract R0 from R0 in the Execute stage. In the Memory stage of BEQ, the
signal ZeroM will have a value of 1 because the subtraction result was zero in the
Execute stage, and the signal BranchM will have a value of 1, because the Control
Unit “did the right thing” back in the Decode stage for BEQ.
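
The branch logic described above can be sketched as follows. The function names are invented for this illustration, and the sketch assumes the usual MIPS rule, matching the <<2 shifter and adder in Figure 1, that the branch target is PCPlus4 plus the sign-extended offset shifted left by 2. The offset value 5 for BEQ is derived from the listing: (0x00409018 - 0x00409004) / 4.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit field to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def pc_branch(pc_plus4, imm16):
    """Branch target: PCPlus4 + (SignImm << 2), truncated to 32 bits."""
    return (pc_plus4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc(pc_plus4_f, pc_branch_m, branch_m, zero_m):
    """Fetch-stage multiplexer: PCSrcM = BranchM AND ZeroM selects PCBranchM."""
    pc_src_m = branch_m & zero_m
    return pc_branch_m if pc_src_m else pc_plus4_f


# BEQ at 0x00409000 has PCPlus4 = 0x00409004 and an offset field of 5.
print(hex(pc_branch(0x00409004, 5)))
# Hypothetical PC values, showing only the multiplexer selection:
print(hex(next_pc(0x00400010, 0x00400040, 1, 1)))
print(hex(next_pc(0x00400010, 0x00400040, 1, 0)))
```

The first call reproduces the address of label L1; the other two show that PC' follows PCBranchM only when both BranchM and ZeroM are 1.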
How many instructions get into the pipeline after BEQ before the LW instruction
is fetched? Make a diagram in the style of Figure 3 to determine which of the
instructions SW, ADD, SUB, AND, and OR get into the pipeline before the branch target
LW is fetched.
2.4 What to Do, Part III
If the clock period is long enough to allow it, the circuit of Figure 1 can be modified
as follows:
• the AND gate in the Memory stage can be moved to the Execute stage, so that
it computes the AND of BranchE and a “Zero” signal that comes straight from
the ALU without passing through a pipeline register;
• the output of the adder in the Execute stage can be passed directly to the
multiplexer in the Fetch stage, again without passing through a pipeline register.
Repeat Part II, assuming the above design change to the circuit.
2.5 What to Hand In
Briefly and clearly explained answers for Part I; clear answers with supporting
diagrams for Parts II and III.
3 Exercise B: Program run time calculation
3.1 Read This First
A simple but not very high-performance method for taking care of the control
hazards in the circuit of Figure 1 by stalling works as follows:
    If it is detected in the Memory stage that a branch will be taken, cancel
    all the instructions that have entered the pipeline after the branch
    instruction. (Note: This assumes a version of the MIPS ISA in which
    delay-slot instructions should not be executed if a branch is taken.)
    The cancellation is done, for example, by changing RegWrite signals
    from 1 to 0 for LW and R-type instructions, and MemWrite signals from 1
    to 0 for SW instructions.
The amount of extra logic that needs to be added to Figure 1 for this method is
quite modest in scale.
3.2 What to Do, Part I
Suppose that some poor, tortured assembly language programmer has been asked
by the boss to write a program that has correct work-arounds for all of the data
hazards of the Figure 1 computer, but assumes that control hazards are solved with
stalling as described in “Read This First.”
Simulation shows that the instruction count for the program is 1,000,000 when
it runs. So with an ideal CPI of 1, the program would run in 1,000,004 cycles:
1 million fetches, plus four more cycles to complete the last instruction. Assume
the following instruction mix:
80% of instructions are not BEQ;
12% of instructions are untaken branches;
8% of instructions are taken branches.
How many clock cycles will it take to run the program?
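
One way to organize the calculation is sketched below. It assumes that every taken branch wastes the same number of fetch slots, that untaken branches cost nothing extra, and that the instruction count does not include cancelled instructions. The function name, its parameters, and the numbers in the first example are hypothetical; the number of cancelled instructions per taken branch is left as a parameter, since it comes from the pipeline diagrams of Exercise A.

```python
def run_cycles(instr_count, taken_percent, cancelled_per_taken, drain_cycles=4):
    """Total cycles: one fetch per executed instruction, plus fetch slots
    wasted on cancelled instructions, plus cycles to drain the pipeline."""
    taken = instr_count * taken_percent // 100
    return instr_count + taken * cancelled_per_taken + drain_cycles


# Hypothetical program: 1000 instructions, 10% taken branches,
# 2 cancelled instructions per taken branch.
print(run_cycles(1000, 10, 2))
# With no cancelled instructions, the ideal-CPI figure is recovered.
print(run_cycles(1_000_000, 8, 0))
```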
3.3 What to Do, Part II
Repeat Part I, assuming the circuit modification of Exercise A, Part III.
3.4 What to Hand In
For each of Parts I and II, answers with detailed and clear justifications.