Lecture 10: Programmable Machines
http://xkcd.com/722/
• Programmable datapaths
• Von Neumann Model
• Intro to Beta Instruction Set
Today’s handouts:
• Lecture slides
Notes:
• Quiz 1 Wed. 19:30-21:30 in 50-340
6.004 Computation Structures L10: Programmable Machines, Slide #1

Computing Devices Then…
[Image]

Computing Devices Now
[Image]

6.004 Lectures 10-19: Computer Organization and Design
• The Hardware-Software Interface (L10-11)
• From High-Level Programming Languages to the Language of Hardware (L12-13)
• Modern Computer Design (L14-19)
  – Simple processor implementation
  – The memory hierarchy
  – Performance through parallelism (pipelining & multicore)

6.004 So Far
[Figure: stack of abstraction layers: Finite State Machines, Sequential Elements, Combinational Logic, CMOS Gates, Transistors, Information]
• What can you do with these?
  – Take a (solvable) problem
  – Design a procedure (recipe) to solve the problem
  – Design a finite state machine that implements the procedure and solves the problem
• What you’ll be able to do after this week:
  – Design a machine that can solve any solvable problem, given enough time and memory (a general-purpose computer)

Example 1: Multiplier
• Last lecture was all about this!
[Figure: combinational array multiplier built from half adders (HA) and full adders (FA), with inputs a3..a0 and b3..b0 and outputs z7..z0, next to a sequential multiplier datapath (labels P, B, A, LSB)]
• Combinational, pipelined and sequential implementations with different performance, area, and energy tradeoffs

Example 2: Factorial
factorial(N) = N!
             = N*(N-1)*…*1

Step 1: Design a procedure to compute factorial(N)

Python:
  a = 1
  b = N
  while b != 0:
      a = a * b
      b = b - 1

C:
  int a = 1;
  int b = N;
  while (b != 0) {
      a = a * b;
      b = b - 1;
  }

High-level FSM:
  start: a ← 1, b ← N           (then go to loop)
  loop:  a ← a * b, b ← b - 1   (stay in loop while b != 0; go to done when b == 0)
  done:  a ← a, b ← b

Helpful to translate into hardware:
  – Registers (a, b)
  – States (start, loop, done)
  – Boolean transitions (b==0, b!=0)
  – Register assignments in states (e.g., a ← a * b)

Datapath for Factorial
• Draw registers
• Draw combinational circuit for each assignment
• Connect to input muxes
[Figure: 32-bit registers a and b, each fed by a 3-input mux (inputs 0, 1, 2; 2-bit selects waSEL and wbSEL); a multiplier computes a * b and an adder computes b + (-1); the constants 1 and N are mux inputs]

Control FSM for Factorial
• Draw combinational logic for transition conditions
• Implement control FSM:
  – States: High-level FSM states
  – Inputs: Transition logic outputs
  – Outputs: Mux select signals
[Figure: control FSM with input z (output of the == 0 comparator) and outputs waSEL (2 bits) and wbSEL (2 bits) driving the datapath muxes]

State encoding: start = 0, loop = 1, done = 2

  S   z   waSEL   wbSEL   S’
  0   0     2       0     1
  0   1     2       0     1
  1   0     1       1     1
  1   1     1       1     2
  2   0     0       2     2
  2   1     0       2     2

So Far: Single-Purpose Hardware
• Problem → Procedure (High-level FSM) → Implementation
• Systematic way to implement high-level FSM as a datapath + control FSM
  – Is this implementation an FSM itself?
  – If so, can you draw the truth table?
• Is there an alternative to building a datapath + control FSM to solve each possible problem?

Programming the Datapath
• We can use our factorial datapath and change the control FSM to solve other problems! Examples:
  – Multiplication
  – Squaring
• But very limited problems. Reasons:
  – Limited storage (only two registers!)
  – Limited set of operations, and inputs to those operations
  – Limited inputs to the control FSM

A Simple Programmable Datapath
• Each cycle, this datapath:
  – Reads two operands (a, b) from 4 registers (R0-R3)
  – Performs one operation of +, -, *, NAND on operands
  – Optionally writes result to a register
• Control FSM:
  – Input: z
  – Outputs: aSEL, bSEL, opSEL, wSEL, wEN
[Figure: registers R0-R3 with load enables (LE); aSEL and bSEL select the two operands; opSEL selects among +, -, *, NAND; an ==? comparator produces z; wSEL and wEN control the register write]

A Control FSM for Factorial
• Assume initial register contents:
    R0 value = 1
    R1 value = N
    R2 value = -1
    R3 value = 0
• Control FSM:

  State      asel  bsel  opsel   wen  wsel   Effect
  loop mul    0     1    1 (*)    1    0     R0 ← R0 * R1
  loop sub    1     2    0 (+)    1    1     R1 ← R1 + R2
  loop beq    1     3    X        0    X     (no write)
  done        1     3    X        0    X     N! in R0

  Transitions: loop mul → loop sub → loop beq; from loop beq, z == 0 goes back to loop mul and z == 1 goes to done.

New Problem → New Control FSM
• You can solve many more problems with this datapath!
  – Exponentiation, division, square root, …
  – But nothing that requires more than four registers
• By designing a control FSM, we are programming the datapath
• Early digital computers were programmed this way!
  – ENIAC (1943):
    • First general-purpose digital computer
    • Programmed by setting huge array of dials and switches
    • Reprogramming it took about 3 weeks

The von Neumann Model
• Many approaches to build a general-purpose computer
• Almost all modern computers are based on the von Neumann model (John von Neumann, 1945)
• Components:
  – Main memory: Array of W words of N bits each
  – Central processing unit: Performs operations on memory values
  – Input/output devices to communicate with the outside world
[Figure: Main Memory, Central Processing Unit, and Input/Output blocks]

Main Memory = Random-Access Memory
• Array of bits, organized in W words of N bits each
  – Typically, W is a power of two: W = 2^k
  – Example: W=8 (k=3 bits), N=32 bits per word
• Can read from and write to individual words
  – Compare with ROM (read-only memory)
  – Many possible implementations (later in the course)
[Figure: memory with 3-bit addresses 000 through 111, each holding one 32-bit word]

Key Idea: Stored-Program Computer
• Express program as a sequence of coded instructions
• Memory holds both data and instructions
• CPU fetches, interprets, and executes successive instructions of the program
[Figure: Central Processing Unit connected to Main Memory, which holds both instructions and data]

Anatomy of a von Neumann Computer
[Figure: a Control Unit (holding the PC) and a Datapath (internal storage: registers; operations: ALU; condition codes) connected to Main Memory by address, data, and instruction paths. The control unit sends control signals (dest, asel, bsel, fn) to the datapath and receives status. An example instruction word 1101000111011 … is annotated R1 ← R2+R3]
• Instructions coded as binary data
• Program Counter or PC: Address of the instruction to be executed
• Logic to translate instructions into control signals for datapath

Instructions
• Instructions are the fundamental unit of work
• Each instruction specifies:
  – An operation or opcode to be performed
  – Source and destination operands
• In a von Neumann machine, instructions are executed sequentially
  – CPU logically implements this loop:
    Fetch instruction → Decode instruction → Read src operands → Execute → Write dst operand → Compute next PC → (repeat)
  – By default, the next PC is current PC + size of current instruction unless the instruction says otherwise

Instruction Set Architecture (ISA)
• ISA: The contract between software and hardware
  – Functional definition of operations and storage locations
  – Precise description of how software can invoke and access them
• The ISA is a new layer of abstraction:
  – ISA specifies what the hardware provides, not how it implements it
  – Hides the complexity of CPU implementation
  – Enables fast innovation in hardware (no need to change software!)
    • 8086 (1978): 29 thousand transistors, 5 MHz, 0.33 MIPS
    • Pentium 4 (2003): 44 million transistors, 4 GHz, ~5000 MIPS
    • Both implement x86 ISA
  – Dark side: Commercially successful ISAs last for decades
    • Today’s x86 CPUs carry baggage of design decisions from the 70’s

Instruction Set Architecture Design
• Designing an ISA is hard:
  – How many operations?
  – What types of storage, and maximum storage we can have?
  – How to encode instructions?
  – How to future-proof?
• How to decide?
  Take a quantitative approach
  – Take a set of representative benchmark programs
  – Evaluate versions of your ISA and implementation with and without each feature
  – Pick what works best overall (performance, energy, area…)
• Corollary: Optimize the common case

The Beta ISA
• We’ll study the Beta, a simple, modern, clean ISA
• Next steps:
  – See a concrete example of an ISA
  – Explore some of the tradeoffs in ISA and implementation design

Beta ISA: Storage
CPU State:
• PC
• General registers r0, r1, r2, ..., r31: 32-bit “words” (4 bytes)
• r31 hardwired to 0
Main Memory:
• 32-bit “words”
• Up to 2^32 bytes (4 GB of memory) = 2^30 4-byte words
• Each memory word is 32 bits wide, but for historical reasons the β uses byte memory addresses. Since each word contains four 8-bit bytes, addresses of consecutive words differ by 4.
Why separate registers and main memory?
• Tradeoff: Size vs speed and energy

Beta ISA: Instructions
• Three types of instructions:
  – Arithmetic and logical: Perform operations on general registers
  – Branches: Conditionally change the program counter
  – Loads and stores: Move data between general registers and main memory
• All instructions have a fixed length: 32 bits (4 bytes)
  – Tradeoff (vs variable-length instructions):
    • Simpler decoding logic, next PC is easy to compute
    • Larger code size

Beta ALU Instructions
Format: OPCODE (6 bits) | rc (5) | ra (5) | rb (5) | unused (11)
Example coded instruction: ADD
  100000 00011 00001 00010 00000000000
  – OPCODE = 100000, encodes ADD
  – rc=3, encodes R3 as destination
  – ra=1, rb=2, encodes R1 and R2 as source locations
  – remaining 11 bits are unused
  32-bit hex: 0x80611000
We prefer to write a symbolic representation: ADD(r1,r2,r3)
ADD(ra,rb,rc): Reg[rc] ←
Reg[ra] + Reg[rb]
“Add the contents of ra to the contents of rb; store the result in rc”

Similar instructions for other ALU operations:
  arithmetic: ADD, SUB, MUL, DIV
  compare: CMPEQ, CMPLT, CMPLE
  boolean: AND, OR, XOR, XNOR
  shift: SHL, SHR, SAR

Beta ALU Instructions with Constant
Format: OPCODE (6 bits) | rc (5) | ra (5) | 16-bit signed constant
Example instruction: ADDC adds register contents and constant:
  110000 00011 00001 1111111111111101
  – OPCODE = 110000, encoding ADDC
  – rc=3, encoding R3 as destination
  – ra=1, encoding R1 as first operand
  – constant field, encoding -3 as second operand (sign-extended!)
Symbolic version: ADDC(r1,-3,r3)
ADDC(ra,const,rc): Reg[rc] ← Reg[ra] + sext(const)
“Add the contents of ra to const; store the result in rc”

Similar instructions for other ALU operations:
  arithmetic: ADDC, SUBC, MULC, DIVC
  compare: CMPEQC, CMPLTC, CMPLEC
  boolean: ANDC, ORC, XORC, XNORC
  shift: SHLC, SHRC, SARC

Why Have Instructions with Constants?
• Many programs use small constants frequently
  – e.g., our factorial example: 0, 1, -1
  – Tradeoff:
    • When used, they save registers and instructions
    • More opcodes → more complex control logic and datapath
[Figure: percentage of operations that use a constant operand (Hennessy & Patterson)]

Can We Solve Factorial With These?
• No! Recall high-level FSM:
    start: a ← 1, b ← N
    loop:  a ← a * b, b ← b - 1   (repeat while b != 0; go to done when b == 0)
    done:  a ← a, b ← b
• Factorial needs to loop
• So far we can only encode sequences of operations on registers
• Need a way to change the PC based on data values!

Beta Branch Instructions
The Beta’s branch instructions provide a way to conditionally change the PC to point to a nearby location and, optionally, to remember (in Rc) where we came from (useful for procedure calls).
Format: OPCODE | rc | ra | 16-bit signed constant

BEQ(ra,label,rc): Branch if equal
  NPC ← PC + 4
  Reg[rc] ← NPC
  if (Reg[ra] == 0)
      PC ← NPC + 4*offset
  else
      PC ← NPC

BNE(ra,label,rc): Branch if not equal
  NPC ← PC + 4
  Reg[rc] ← NPC
  if (Reg[ra] != 0)
      PC ← NPC + 4*offset
  else
      PC ← NPC

“offset” is a SIGNED CONSTANT encoded as part of the instruction!
  offset = (label - <addr of BNE/BEQ>)/4 - 1
         = up to 32767 instructions before/after BNE/BEQ

Can We Solve Factorial Now?
C:
  int a = 1;
  int b = N;
  while (b != 0) {
      a = a * b;
      b = b - 1;
  }

Beta:
                      | Assume r1 = N
  ADDC(r31, 1, r0)    | r0 = 1
  MUL(r0, r1, r0)     | r0 = r0 * r1
  SUBC(r1, 1, r1)     | r1 = r1 - 1
  BNE(r1, -3, r31)    | if r1 != 0, run MUL next
                      | at this point, r0 = N!

• Remember control FSM for our simple programmable datapath?
    loop mul → loop sub → loop beq; z == 0 goes back to loop mul, z == 1 goes to done
• Control FSM states → instructions!
  – Not the case with general datapaths
  – Happens here because our simple programmable datapath is similar to basic von Neumann datapath

Beta Load and Store Instructions
Loads and stores move data between general registers and main memory
Format: OPCODE | rc | ra | 16-bit signed constant

LD(ra,const,rc): Reg[rc] ← Mem[Reg[ra] + sext(const)]
  Load into rc the contents of the memory location whose address is const plus the contents of ra

ST(rc,const,ra): Mem[Reg[ra] + sext(const)] ← Reg[rc]
  Store the contents of rc into the memory location whose address is const plus the contents of ra

BYTE ADDRESSES, but only 32-bit word accesses to word-aligned addresses are supported.
Low two address bits are ignored
Tradeoff (vs allowing unaligned accesses): Simple implementation, but harder to use

Storage Conventions
• Variables live in memory
• Registers hold temporary values
• To operate with memory variables
  – Load them
  – Compute on them
  – Store the results

Memory layout:
  1000: n
  1004: r
  1008: x
  100C: y
  1010:

C:
  int x, y;
  y = x * 37;

Beta:
  LD(r31, 0x1008, r0)
  MULC(r0, 37, r0)
  ST(r0, 0x100C, r31)

Beta ISA Summary
• Storage:
  – Processor: 32 registers (r31 hardwired to 0) and PC
  – Main memory: Up to 4 GB, 32-bit words, 32-bit byte addresses, 4-byte-aligned accesses
• Instruction formats (32 bits each):
    OPCODE | rc | ra | rb | unused
    OPCODE | rc | ra | 16-bit signed constant
• Instruction types:
  – ALU: Two input registers, or register and constant
  – Branches
  – Loads and stores

Summary
• Programmable datapaths provide some algorithmic flexibility by changing control structure
• The von Neumann model → general-purpose, stored-program computation
• ISA design often requires tradeoffs → optimize the common case
• Example von Neumann ISA: Beta

Backup: Datapath Cheatsheet
[Figure: symbol legend: 1-bit wire; N-bit wire; multi-bit wire (N omitted); 4-to-1 multi-bit MUX with 2-bit SEL; 1-to-4 decoder with EN and 2-bit SEL; 32-bit register; 32-bit register with load enable (LE)]
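
Backup: Beta Encoding and Factorial, Sketched in Python
The encodings and branch semantics from the Beta slides can be checked with a small model. This is a minimal sketch, not the course's simulator. All helper names (sext16, wrap32, encode_op, encode_opc, run, beta_factorial) are hypothetical. Instructions are executed as symbolic tuples because the slides give binary opcodes only for ADD and ADDC. Stopping when the PC runs past the last instruction is an assumption, since the slides define no halt instruction. The loop-closing branch is written as BNE, matching the "if r1 != 0" comment on the factorial slide, and, like that program, the sketch assumes N ≥ 1.

```python
MASK32 = 0xFFFFFFFF

def sext16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def wrap32(value):
    """Wrap to a signed 32-bit value, as a 32-bit register would."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value

def encode_op(opcode, ra, rb, rc):
    """Pack OPCODE(6) | rc(5) | ra(5) | rb(5) | unused(11)."""
    return (opcode << 26) | (rc << 21) | (ra << 16) | (rb << 11)

def encode_opc(opcode, ra, const, rc):
    """Pack OPCODE(6) | rc(5) | ra(5) | 16-bit signed constant."""
    return (opcode << 26) | (rc << 21) | (ra << 16) | (const & 0xFFFF)

# Only the ALU operations needed for the factorial example.
ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}

def run(program, regs, max_steps=100_000):
    """Run symbolic instructions; program[i] sits at byte address 4*i."""
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < 4 * len(program):
            return regs                      # assumption: falling off the end halts
        name, ra, src, rc = program[pc // 4]
        npc = pc + 4
        if name in ("BEQ", "BNE"):
            # Test Reg[ra] before writing Reg[rc], then pick the next PC.
            taken = (regs[ra] == 0) == (name == "BEQ")
            regs[rc] = npc
            pc = npc + 4 * src if taken else npc
        elif name.endswith("C"):             # ADDC, SUBC, MULC: constant operand
            regs[rc] = wrap32(ALU[name[:-1]](regs[ra], sext16(src)))
            pc = npc
        else:                                # ADD, SUB, MUL: register operand
            regs[rc] = wrap32(ALU[name](regs[ra], regs[src]))
            pc = npc
        regs[31] = 0                         # r31 is hardwired to 0
    raise RuntimeError("step limit exceeded")

FACTORIAL = [
    ("ADDC", 31, 1, 0),   # r0 = 1
    ("MUL", 0, 1, 0),     # r0 = r0 * r1
    ("SUBC", 1, 1, 1),    # r1 = r1 - 1
    ("BNE", 1, -3, 31),   # if r1 != 0, run MUL next
]

def beta_factorial(n):
    """Place N in r1, run the program, and return r0."""
    regs = [0] * 32
    regs[1] = n
    return run(FACTORIAL, regs)[0]

if __name__ == "__main__":
    print(hex(encode_op(0b100000, 1, 2, 3)))    # ADD(r1,r2,r3)
    print(hex(encode_opc(0b110000, 1, -3, 3)))  # ADDC(r1,-3,r3)
    print(beta_factorial(5))
```

The branch offset -3 follows from the slide formula: with BNE at byte address 12 and MUL at address 4, offset = (4 - 12)/4 - 1 = -3.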
© Copyright 2024