ENCM 501 Winter 2015
Assignment 7 for the Week of March 16

Steve Norman
Department of Electrical & Computer Engineering
University of Calgary
March 2015

Assignment instructions and other documents for ENCM 501 can be found at
http://people.ucalgary.ca/~norman/encm501winter2015/

1 Administrative details

1.1 Group work is permitted

Here are the options:
• You may do your work entirely individually.
• A group of two or three students may hand in a single assignment for the whole group.
• Collaboration at the level of individual exercises is acceptable. In that case, submissions of complete, individual assignments are required, with explicit acknowledgments given as needed on an exercise-by-exercise basis.

Informal discussion of assignment exercises between students is encouraged, and does not need to be acknowledged.

Please be aware that all students are expected to understand all assignment exercises! Collaboration is of course not allowed on quizzes, the midterm test, and the final exam.

1.2 Due Dates

The Due Date for this assignment is 3:30pm, Thursday, March 19.
The Late Due Date is 3:30pm, Friday, March 20.

The penalty for handing in an assignment after the Due Date but before the Late Due Date is 3 marks. In other words, X/10 becomes (X - 3)/10 if the assignment is late. There will be no credit for assignments turned in after the Late Due Date; they will be returned unmarked.

1.3 Marking scheme

A        16 marks
B         3 marks
total    19 marks

1.4 How to package and hand in your assignments

Please see the instructions in Assignment 1.

2 Exercise A: Tracing instructions in a simple pipeline

2.1 Read This First

To get a feel for how pipelining really works, nothing beats studying a pipelined circuit and tracing the flow of chunks of instructions and data through the circuit. Figure 1 is taken from the current textbook for ENEL 353 and ENCM 369.
It shows a 5-stage pipeline for processing of the 32-bit MIPS instructions listed in Figure 2.

What is especially awesome about Figure 1, in addition to its outstanding attention to detail, is the helpful naming scheme for all the signals in the circuit. Here is an important example: WriteRegE is a 5-bit signal indicating the destination GPR of the instruction that is currently in the Execute stage, while WriteRegM indicates the destination GPR of an earlier instruction that has made it to the Memory stage.

Carefully note the “bubble” on the CLK input to the Register File: Writes to GPRs happen on negative clock edges, while writes to the PC and pipeline registers happen on positive clock edges. So data that makes it to the pipeline register at the end of the Memory stage can be copied into the Register File half a clock cycle later; in other words, the Writeback step only takes half a clock cycle.

Figure 1: Pipelined processor design to support eight MIPS32 instructions. (Image is Figure 7.47 from Harris D. M. and Harris S. L., Digital Design and Computer Architecture, 2nd ed., © 2013, Elsevier, Inc.) Note that this circuit does not have elements to deal correctly with data hazards and control hazards, so is capable of processing a lot of instructions incorrectly! (Figure 7.58 in the same book shows how to handle hazards correctly.)
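As a small illustration of where 5-bit register numbers such as WriteRegE come from, the following Python sketch splits a 32-bit instruction word into fields by shifting and masking. The field positions are those of the formats in Figure 2. The function name decode_fields and the dictionary keys are illustrative choices, not names taken from the textbook or the course. The example word 0x8d280014 is the LW instruction used in Part I of this exercise. This is a minimal sketch, not a model of the Control Unit.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its bit fields.

    Field positions follow Figure 2. The key names are illustrative only.
    """
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31:26
        "rs":    (word >> 21) & 0x1F,   # bits 25:21
        "rt":    (word >> 16) & 0x1F,   # bits 20:16
        "rd":    (word >> 11) & 0x1F,   # bits 15:11 (meaningful for R-type only)
        "funct": word & 0x3F,           # bits 5:0   (meaningful for R-type only)
        "imm":   word & 0xFFFF,         # bits 15:0  (offset for LW, SW, BEQ)
    }


fields = decode_fields(0x8d280014)      # LW R8, 20(R9)
print(format(fields["op"], "06b"))      # prints 100011
print(format(fields["rt"], "05b"))      # prints 01000
print(fields["rs"], fields["rt"], fields["imm"])  # prints 9 8 20
```

The sketch leaves the 16-bit offset as an unsigned number; the Sign Extend block in Figure 1 treats it as a two's-complement value, which matters only when bit 15 is 1.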
[Figure 1 circuit diagram not reproduced. Blocks shown: Control Unit, Instruction Memory, Register File, Sign Extend, ALU, Data Memory. Labelled signals include PC', PCF, PCPlus4F, PCPlus4D, PCPlus4E, InstrD, RD1, RD2, RtE, RdE, SignImmE, SrcAE, SrcBE, WriteDataE, WriteDataM, WriteRegE, WriteRegM, WriteRegW, ALUOutM, ALUOutW, ZeroM, BranchM, PCSrcM, PCBranchM, ReadDataW, ResultW, and the control signals RegWrite, MemtoReg, MemWrite, Branch, ALUControl, ALUSrc, and RegDst with their stage suffixes.]

Figure 2: Machine code format for the 32-bit MIPS instructions supported by the processor of Figure 1.

R-type
  bits 31:26   bits 25:21     bits 20:16     bits 15:11   bits 10:6   bits 5:0
  000000       source GPR 1   source GPR 2   dest. GPR    00000       “funct” field

  funct is 100000 for ADD, 100010 for SUB, 100100 for AND, 100101 for OR, and 101010 for SLT.

LW
  bits 31:26   bits 25:21     bits 20:16     bits 15:0
  100011       pointer GPR    dest. GPR      offset

SW
  bits 31:26   bits 25:21     bits 20:16     bits 15:0
  101011       pointer GPR    source GPR     offset

BEQ
  bits 31:26   bits 25:21     bits 20:16     bits 15:0
  000100       source GPR 1   source GPR 2   offset

Figure 3: Hint about how to draw pipeline diagrams for Exercise A.

        cycle 1   cycle 2   cycle 3   cycle 4   cycle 5   cycle 6
  LW    F         D         E         M         W
  ADD             F         D         E         M         W

2.2 What to Do, Part I

Consider the following sequence of MIPS32 instructions:

  address      code         assembly language
  0x00408000   0x8d280014   LW  R8, 20(R9)
  0x00408004   0x01485820   ADD R11, R10, R8
  0x00408008   0x01886825   OR  R13, R12, R8
  0x0040800c   0x01c87822   SUB R15, R14, R8

Let’s use “Cycle 1” as the name for the clock cycle in which LW is in the Fetch stage, “Cycle 2” for the very next clock cycle, and so on. Assume:
• At the beginning of Cycle 1, R8 = 0x00000051, R9 = 0x10010000, R10 = 0x00000009, R12 = 0x00000007, R14 = 0x00000074, and the Data Memory word at address 0x10010014 is 0x00000098.
• Whatever five or so instructions get into the pipeline just before LW, none of them write to any of the registers R8–R15, and none of them are BEQ instructions.
• The Control Unit “does the right thing” with all of its outputs. For example, the multiplexer in the Execute stage that has RtE on input 0, RdE on input 1, and the 5-bit signal WriteRegE as its output will copy RtE to WriteRegE when the instruction in Execute is LW, but will copy RdE to WriteRegE when the instruction is R-type.

Answer all the questions in Items 2–14 below. An answer is given for Item 1 as a model to follow. Before you start, drawing an extended version of the diagram in Figure 3 is highly recommended.

1. In Cycle 1, what values do the signals PCF and PCPlus4F take on? In Cycle 2, what value does InstrD take on, and what is the bit pattern for bits 20:16 of InstrD?
   In Cycle 1, PCF is 0x00408000, the address of the LW instruction, and PCPlus4F is 0x00408004, the address of the next instruction to be fetched. In Cycle 2, InstrD is 0x8d280014, the machine code for the LW instruction; bits 20:16 of that are 01000 to specify R8 as the destination of the LW.
2. In Cycle 3, what are the ALU inputs SrcAE and SrcBE? (Hint: The job of the ALU in this cycle is to compute the memory address for LW.)
3. In Cycle 3, what value does InstrD take on?
4. In Cycle 3, what are the values of the RD1 and RD2 outputs of the Register File?
5. In Cycle 4, what are the values of WriteRegE and WriteRegM?
6. In Cycle 4, what are the values of the ALU inputs SrcAE and SrcBE?
7. In Cycle 4, what is the value of the Data Memory RD output?
8. In Cycle 5, what are the values of WriteRegE, WriteRegM, and WriteRegW?
9. In Cycle 5, what are the values of the ALU inputs SrcAE and SrcBE?
10. In Cycle 5, what is the value of ResultW?
11. In Cycle 5, what are the values of the RD1 and RD2 outputs of the Register File near the end of the clock cycle? (Remember, Register File updates occur on negative clock edges.)
12. In Cycle 6, what are the values of WriteRegE, WriteRegM, and WriteRegW?
13. In Cycle 6, what are the values of the ALU inputs SrcAE and SrcBE?
14. In Cycle 6, what is the value of ResultW?

2.3 What to Do, Part II

Now let’s look at this sequence of instructions:

  address      assembly language
  0x00409000        BEQ R0, R0, L1
  0x00409004        SW  R8, 40(R9)
  0x00409008        ADD R11, R10, R8
  0x0040900c        SUB R12, R10, R8
  0x00409010        AND R13, R10, R8
  0x00409014        OR  R14, R10, R8
  0x00409018   L1:  LW  R15, 12(R29)

Obviously the branch will be taken, because it is guaranteed that R0 is equal to R0. But note that the decision to branch is made in the Memory stage. The ALU will subtract R0 from R0 in the Execute stage. In the Memory stage of BEQ, the signal ZeroM will have a value of 1 because the subtraction result was zero in the Execute stage, and the signal BranchM will have a value of 1, because the Control Unit “did the right thing” back in the Decode stage for BEQ.

How many instructions get into the pipeline after BEQ before the LW instruction is fetched? Make a diagram in the style of Figure 3 to determine which of the instructions SW, ADD, SUB, AND, and OR get into the pipeline before the branch target LW is fetched.

2.4 What to Do, Part III

If the clock period is long enough to allow it, the circuit of Figure 1 can be modified as follows:
• the AND gate in the Memory stage can be moved to the Execute stage, so that it performs the AND of BranchE and a “Zero” signal that comes straight from the ALU without passing through a pipeline register;
• the output of the adder in the Execute stage can be passed directly to the multiplexer in the Fetch stage, again without passing through a pipeline register.

Repeat Part II, assuming the above design change to the circuit.

2.5 What to Hand In

Briefly and clearly explained answers for Part I; clear answers with supporting diagrams for Parts II and III.
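Diagrams in the style of Figure 3 can also be produced mechanically. The Python sketch below assumes an ideal pipeline in which one instruction is fetched per cycle and nothing is stalled or cancelled, so the instruction in row i (counting from 0) occupies stage number s (F = 0 through W = 4) in cycle i + s + 1. The function name pipeline_diagram is a hypothetical helper, not part of any course tooling. It does not decide which instructions follow a taken branch; that depends on when the branch decision reaches the Fetch stage, which is what Parts II and III ask about.

```python
STAGES = "FDEMW"  # Fetch, Decode, Execute, Memory, Writeback


def pipeline_diagram(names):
    """Return a Figure 3 style diagram as a list of text lines.

    Assumes an ideal pipeline: one fetch per cycle, no stalls, no cancellation.
    """
    n_cycles = len(names) + len(STAGES) - 1
    width = max(len(n) for n in names)
    # Header row: cycle numbers, each right-aligned in a 4-character column.
    header = " " * width + "".join(f" {c:>3}" for c in range(1, n_cycles + 1))
    lines = [header]
    for i, name in enumerate(names):
        cells = []
        for cycle in range(1, n_cycles + 1):
            s = cycle - 1 - i  # stage index occupied in this cycle
            cells.append(STAGES[s] if 0 <= s < len(STAGES) else ".")
        lines.append(name.ljust(width) + "".join(f" {c:>3}" for c in cells))
    return lines


for line in pipeline_diagram(["LW", "ADD"]):
    print(line)
```

Called with the two instruction names from Figure 3, the sketch reproduces that figure's layout, with a dot marking cycles in which an instruction is not in the pipeline.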
3 Exercise B: Program run time calculation

3.1 Read This First

A simple but not very high-performance method for taking care of the control hazards in the circuit of Figure 1 by stalling works as follows: If it is detected in the Memory stage that a branch will be taken, cancel all the instructions that have entered the pipeline after the branch instruction. (Note: This assumes a version of the MIPS ISA in which delay-slot instructions should not be executed if a branch is taken.)

The cancellation is done, for example, by changing RegWrite signals from 1 to 0 for LW and R-type instructions, and MemWrite signals from 1 to 0 for SW instructions. The amount of extra logic that needs to be added to Figure 1 for this method is quite modest in scale.

3.2 What to Do, Part I

Suppose that some poor, tortured assembly language programmer has been asked by the boss to write a program that has correct work-arounds for all of the data hazards of the Figure 1 computer, but assumes that control hazards are solved with stalling as described in “Read This First.”

Simulation shows that the instruction count for the program is 1,000,000 when it runs. So with an ideal CPI of 1, the program would run in 1,000,004 cycles: 1 million fetches, plus four more cycles to complete the last instruction.

Assume the following instruction mix:
• 80% of instructions are not BEQ;
• 12% of instructions are untaken branches;
• 8% of instructions are taken branches.

How many clock cycles will it take to run the program?

3.3 What to Do, Part II

Repeat Part I, assuming the circuit modification of Exercise A, Part III.

3.4 What to Hand In

For each of Parts I and II, answers with detailed and clear justifications.
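The cycle-count bookkeeping in Part I can be organized as a small formula. The Python sketch below is one possible way to do that, under the assumption that every taken branch wastes the same fixed number of fetch slots and nothing else disturbs the ideal CPI of 1. The function name total_cycles and the parameter slots_lost_per_taken_branch are hypothetical. Finding the right value of that parameter for each circuit is the point of Exercise A, Parts II and III, so the example only checks the ideal case stated in Part I.

```python
def total_cycles(instruction_count, taken_fraction, slots_lost_per_taken_branch,
                 drain_cycles=4):
    """Estimate run time in clock cycles for the Figure 1 pipeline.

    Assumption (not from the handout): each taken branch costs a fixed number
    of cancelled fetch slots, and every other instruction costs one cycle.
    """
    taken_branches = round(instruction_count * taken_fraction)
    wasted = taken_branches * slots_lost_per_taken_branch
    # One fetch per instruction, plus wasted slots, plus cycles for the
    # last instruction to pass through the remaining four stages.
    return instruction_count + wasted + drain_cycles


# Ideal case from the text: no cycles lost to branches.
print(total_cycles(1_000_000, 0.08, 0))  # prints 1000004
```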