1/page - 6.004

Lecture 10:
Programmable Machines
http://xkcd.com/722/
•  Programmable datapaths
•  Von Neumann Model
•  Intro to Beta Instruction Set
6.004 Computation Structures
Today’s handouts:
•  Lecture slides
Notes:
•  Quiz 1 Wed. 19:30-21:30
in 50-340
L10: Programmable Machines, Slide #1
Computing Devices Then…
6.004 Computation Structures
L10: Programmable Machines, Slide #2
Computing Devices Now
6.004 Computation Structures
L10: Programmable Machines, Slide #3
6.004 Lectures 10-19:
Computer Organization and Design
•  The Hardware-Software Interface (L10-11)
•  From High-Level Programming Languages to the
Language of Hardware (L12-13)
•  Modern Computer Design (L14-19)
–  Simple processor implementation
–  The memory hierarchy
–  Performance through parallelism (pipelining & multicore)
6.004 Computation Structures
L10: Programmable Machines, Slide #4
6.004 So Far
Finite State
Machines
Sequential
Elements
Combinational
Logic
CMOS Gates
Transistors
•  What can you do with these?
–  Take a (solvable) problem
–  Design a procedure (recipe)
to solve the problem
–  Design a finite state machine
that implements the procedure
and solves the problem
•  What you’ll be able to do after this week:
–  Design a machine that can solve any solvable
problem, given enough time and memory (a
general-purpose computer)
Information
6.004 Computation Structures
L10: Programmable Machines, Slide #5
Example 1: Multiplier
•  Last lecture was all about this!
a3
a3
a2
a1
a0
a2
a1
a0
HA
HA
HA
b0
b1
SN
b2
a3
a2
FA
a1
FA
a0
FA
FA
FA
HA
HA
FA
HA
z6
z5
FA
SN-1…S0
P
B
NC
A
M bits
b3
N
HA
LSB
+
xN
1
N
N+1
HA
z7
z4
z3
z2
z1
z0
•  Combinational, pipelined and sequential
implementations with different performance, area,
and energy tradeoffs
6.004 Computation Structures
L10: Programmable Machines, Slide #6
Example 2: Factorial
factorial(N) = N! = N*(N-­‐1)*…*1 Step 1: Design a procedure to compute factorial(N)
Python:
High-level FSM:
b != 0 a = 1 b = N while b != 0: a = a * b b = b – 1 C:
int a = 1; int b = N; while (b != 0) { a = a * b; b = b – 1; } 6.004 Computation Structures
– 
– 
– 
– 
– 
b == 0 start
loop
a ß 1 b ß N a ß a * b b ß b -­‐ 1 done
a ß a b ß b Helpful to translate into hardware
Registers (a, b)
States (start, loop, done)
Boolean transitions (b==0, b!=0)
Register assignments in states
(e.g., a ß a * b)
L10: Programmable Machines, Slide #7
Datapath for Factorial
b != 0 b == 0 start
loop
a ß 1 b ß N a ß a * b b ß b -­‐ 1 done
a ß a b ß b •  Draw registers
•  Draw combinational
circuit for each
assignment
•  Connect to input muxes
1
N
32
waSEL
2
0
1
32
wbSEL
2
2
0
1
32
2
32
a
b
32
32
-1
*
+
32
6.004 Computation Structures
32
L10: Programmable Machines, Slide #8
Control FSM for Factorial
b != 0 loop b == 0 done
1
2
start
0
a ß 1 b ß N a ß a b ß b a ß a * b b ß b -­‐ 1 1
•  Draw combinational logic for
transition conditions
•  Implement control FSM:
–  States: High-level FSM states
–  Inputs: Transition logic outputs
–  Outputs: Mux select signals
N
waSEL(2 bits)
wbSEL(2 bits)
Control
FSM
waSEL
0
1
2
wbSEL
0
1
2
z
a
b
0
-1
*
+
==
z
6.004 Computation Structures
S
z
waSEL wbSEL
S’
0
0
2
0
1
0
1
2
0
1
1
0
1
1
1
1
1
1
1
2
2
0
0
2
2
2
1
0
2
2
L10: Programmable Machines, Slide #9
So Far: Single-Purpose Hardware
•  Problemà Procedure (High-level FSM)à Implementation
•  Systematic way to implement high-level FSM as a
datapath + control FSM
–  Is this implementation an FSM itself?
–  If so, can you draw the truth table?
•  Is there an alternative to building a datapath + control
FSM to solve each possible problem?
6.004 Computation Structures
L10: Programmable Machines, Slide #10
Programming the Datapath
•  We can use our factorial datapath and change the
control FSM to solve other problems! Examples:
–  Multiplication
–  Squaring
•  But very limited problems. Reasons:
–  Limited storage (only two registers!)
–  Limited set of operations, and inputs to those operations
–  Limited inputs to the control FSM
6.004 Computation Structures
L10: Programmable Machines, Slide #11
A Simple Programmable Datapath
aSEL
wSEL
wEN
LE
R0
LE
R1
LE
LE
•  Each cycle, this datapath:
–  Reads two operands (a, b)
from 4 registers (R0-R3)
–  Performs one operation of
+, -, *, NAND on operands
–  Optionally writes result to
a register
R2
•  Control FSM:
R3
bSEL
+
-
*
Control
FSM
NAND
==?
aSEL
bSEL
opSEL
wSEL
wEN
z
opSEL
6.004 Computation Structures
z
L10: Programmable Machines, Slide #12
A Control FSM for Factorial
•  Assume initial register contents:
R0 value = 1 R1 value = N R2 value = -­‐1 R3 value = 0 •  Control FSM:
z == 0
loop
mul
loop
sub
asel = 0
bsel = 1
opsel = 1 (*)
wen = 1
wsel = 0
asel = 1
bsel = 2
opsel = 0 (+)
wen = 1
wsel = 1
R0 ß R0 * R1 R1 ß R1 + R2 6.004 Computation Structures
loop
beq
asel = 1
bsel = 3
opsel = X
wen = 0
wsel = X
z == 1
done
asel = 1
bsel = 3
opsel = X
wen = 0
wsel = X
N! in R0 L10: Programmable Machines, Slide #13
New Problem ! New Control FSM
•  You can solve many more problems with this
datapath!
–  Exponentiation, division, square root, …
–  But nothing that requires more than four registers
•  By designing a control FSM, we are programming
the datapath
•  Early digital computers were programmed this way!
–  ENIAC (1943):
•  First general-purpose digital computer
•  Programmed by setting huge array of dials and switches
•  Reprogramming it took about 3 weeks
6.004 Computation Structures
L10: Programmable Machines, Slide #14
The von Neumann Model
•  Many approaches to build a general-purpose computer
•  Almost all modern computers are based on the von
Neumann model (John von Neumann, 1945)
•  Components:
Main
Memory
Central
Processing
Unit
Input/
Output
–  Main memory: Array of W words of N bits each
–  Central processing unit: Performs operations on memory values
–  Input/output devices to communicate with the outside world
6.004 Computation Structures
L10: Programmable Machines, Slide #15
Main Memory = Random-Access Memory
•  Array of bits, organized in W words of N bits each
–  Typically, W is a power of two: W =2k
–  Example: W=8 (k=3 bits),
N=32 bits per word
Address
•  Can read from and write
to individual words
6.004 Computation Structures
11101000 10111010 01011010 10010101
001
10111010 00000000 11110101 00000000
010
011
100
101
…
–  Compare with ROM
(read-only memory)
–  Many possible
implementations
(later in the course)
000
110
111
00000000 00000000 11110101 11011000
L10: Programmable Machines, Slide #16
Key Idea: Stored-Program Computer
•  Express program as a sequence of coded instructions
•  Memory holds both data and instructions
•  CPU fetches, interprets, and executes successive
instructions of the program
Main
Memory
Central
Processing
Unit
instruction
instruction
instruction
data
data
data
6.004 Computation Structures
L10: Programmable Machines, Slide #17
Internal storage
Anatomy of a von Neumann Computer
control
Datapath
address
data
Control
Unit
status
address
instructions
Main Memory
PC
dest
1101000111011
R1 ←R2+R3
…
registers
asel
• Instructions coded as binary data
bsel
fn
operations
6.004 Computation Structures
ALU
Cc’s
• Program Counter or PC: Address
of the instruction to be executed
• Logic to translate instructions into
control signals for datapath
L10: Programmable Machines, Slide #18
Instructions
•  Instructions are the fundamental unit of work
•  Each instruction specifies:
–  An operation or opcode to be performed
–  Source and destination operands
•  In a von Neumann machine, instructions
are executed sequentially
Fetch instruction
–  CPU logically implements this loop:
–  By default, the next PC is current
PC + size of current instruction
unless the instruction says otherwise
Decode instruction
Read src operands
Execute
Write dst operand
Compute next PC
6.004 Computation Structures
L10: Programmable Machines, Slide #19
Instruction Set Architecture (ISA)
•  ISA: The contract between software and hardware
–  Functional definition of operations and storage locations
–  Precise description of how software can invoke and access
them
•  The ISA is a new layer of abstraction:
–  ISA specifies what the hardware provides, not how it
implements it
–  Hides the complexity of CPU implementation
–  Enables fast innovation in hardware (no need to change
software!)
•  8086 (1978): 29 thousand transistors, 5 MHz, 0.33 MIPS
•  Pentium 4 (2003): 44 million transistors, 4 GHz, ~5000 MIPS
•  Both implement x86 ISA
–  Dark side: Commercially successful ISAs last for decades
•  Today’s x86 CPUs carry baggage of design decisions from the
70’s
6.004 Computation Structures
L10: Programmable Machines, Slide #20
Instruction Set Architecture Design
•  Designing an ISA is hard:
–  How many operations?
–  What types of storage, and maximum storage we can
have?
–  How to encode instructions?
–  How to future-proof?
•  How to decide? Take a quantitative approach
–  Take a set of representative benchmark programs
–  Evaluate versions of your ISA and implementation with
and without feature
–  Pick what works best overall (performance, energy, area…)
•  Corollary: Optimize the common case
6.004 Computation Structures
L10: Programmable Machines, Slide #21
The Beta ISA
•  We’ll study the Beta, a simple, modern, clean ISA
•  Next steps:
–  See a concrete example of an ISA
–  Explore some of the tradeoffs in ISA and implementation
design
6.004 Computation Structures
L10: Programmable Machines, Slide #22
Beta ISA: Storage
CPU State
Main Memory
PC
31
3
r0
r1
r2
...
r31
0
2
1 0
32-bit “words”
(4 bytes)
32-bit “words”
Up to 232 bytes (4GB of memory)
230 4-byte words
Each memory word is 32-bits wide,
but for historical reasons the β
uses byte memory addresses. Since
each word contains four 8-bit bytes,
addresses of consecutive words
differ by 4.
000000....0
General Registers
r31 hardwired to 0
6.004 Computation Structures
Why separate registers and main memory?
Tradeoff: Size vs speed and energy
L10: Programmable Machines, Slide #23
Beta ISA: Instructions
•  Three types of instructions:
–  Arithmetic and logical: Perform operations on general
registers
–  Branches: Conditionally change the program counter
–  Loads and stores: Move data between general registers and
main memory
•  All instructions have a fixed length: 32 bits (4 bytes)
–  Tradeoff (vs variable-length instructions):
•  Simpler decoding logic, next PC is easy to compute
•  Larger code size
6.004 Computation Structures
L10: Programmable Machines, Slide #24
Beta ALU Instructions
Format:
OPCODE
rc
ra
rb
unused
Example coded instruction: ADD
00000000
1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0unused
OPCODE =
ra=1, rb=2
rc=3,
100000, encodes
encodes R1 and R2 as
encodes
R3
as
ADD
source locations
destination
32-bit hex: 0x80611000
We prefer to write a symbolic representation:
ADD(r1,r2,r3) ADD(ra,rb,rc): Reg[rc] ! Reg[ra] + Reg[rb] “Add the contents of ra
to the contents of rb;
store the result in rc”
6.004 Computation Structures
Similar instructions for
other ALU operations:
arithmetic: ADD, SUB, MUL, DIV
compare: CMPEQ, CMPLT, CMPLE
boolean: AND, OR, XOR, XNOR
shift: SHL, SHR, SAR
L10: Programmable Machines, Slide #25
Beta ALU Instructions with Constant
Format:
OPCODE
rc
ra
16-bit signed constant
Example instruction: ADDC adds register contents and constant:
11000000011000011111111111111101
OPCODE =
rc=3,
110000, encoding
encoding R3
ADDC
as destination
ra=1,
encoding R1
as first
operand
constant field,
encoding -3 as
second operand
(sign-extended!)
Symbolic version: ADDC(r1,-­‐3,r3) ADDC(ra,const,rc): Reg[rc] ß Reg[ra] + sext(const) “Add the contents of ra to
const; store the result in rc”
6.004 Computation Structures
Similar instructions for other
ALU operations:
arithmetic: ADDC, SUBC, MULC, DIVC
compare: CMPEQC, CMPLTC, CMPLEC
boolean: ANDC, ORC, XORC, XNORC
shift: SHLC, SHRC, SARC
L10: Programmable Machines, Slide #26
Why Have Instructions with Constants?
•  Many programs use small constants frequently
–  e.g., our factorial example: 0, 1, -1
–  Tradeoff:
•  When used, they save registers and instructions
•  More opcodes à more complex control logic and datapath
[Hennessy & Patterson]
Percentage of operations that use a constant operand
6.004 Computation Structures
L10: Programmable Machines, Slide #27
Can We Solve Factorial With These?
•  No! Recall high-level FSM:
b != 0 b == 0 start
loop
a ß 1 b ß N a ß a * b b ß b -­‐ 1 done
a ß a b ß b •  Factorial needs to loop
•  So far we can only encode sequences of operations
on registers
•  Need a way to change the PC based on data values!
6.004 Computation Structures
L10: Programmable Machines, Slide #28
Beta Branch Instructions
The Beta’s branch instructions provide a way to conditionally
change the PC to point to a nearby location...
... and, optionally, remembering (in Rc) where we came from
(useful for procedure calls).
OPCODE
rc
ra
16-bit signed constant
BEQ(ra,label,rc): Branch if equal
NPC ß PC + 4 Reg[rc] ß NPC if (Reg[ra] == 0) PC ß NPC + 4*offset else PC ß NPC “offset” is a SIGNED
CONSTANT encoded as
part of the instruction!
BNE(ra,label,rc): Branch if not equal
NPC ß PC + 4 Reg[rc] ß NPC if (Reg[ra] != 0) PC ß NPC + 4*offset else PC ß NPC offset = (label -­‐ <addr of BNE/BEQ>)/4 – 1 = up to 32767 instructions before/after BNE/BEQ 6.004 Computation Structures
L10: Programmable Machines, Slide #29
Can We Solve Factorial Now?
int a = 1; int b = N; while (b != 0) { a = a * b; b = b – 1; } | Assume r1 = N ADDC(r31, 1, r0) | r0 = 1 MUL(r0, r1, r0) | r0 = r0 * r1 SUBC(r1, 1, r1) | r1 = r1 – 1 BEQ(r1, -­‐3, r31) | if r1 != 0, run MUL next | at this point, r0 = N! •  Remember control FSM for our simple programmable datapath?
z == 0
loop
mul
loop
sub
loop
beq
z == 1
done
•  Control FSM states à instructions!
–  Not the case with general datapaths
–  Happens here because our simple programmable datapath is similar to
basic von Neumann datapath
6.004 Computation Structures
L10: Programmable Machines, Slide #30
Beta Load and Store Instructions
Loads and stores move data between general registers and
main memory
OPCODE
rc
ra
16-bit signed constant
address
LD(ra,const,rc) Reg[rc] ß Mem[Reg[ra] + sext(const)] Fetch into the contents of rc the contents of the memory location
whose address is C plus the contents of ra
ST(rc,const,ra) Mem[Reg[ra] + sext(const)] ß Reg[rc] Store the contents of rc into the memory location whose address
is C plus the contents of ra
BYTE ADDRESSES, but only 32-bit word accesses to word-aligned
addresses are supported. Low two address bits are ignored
Tradeoff (vs allowing unaligned accesses): Simple implementation,
but harder to use
6.004 Computation Structures
L10: Programmable Machines, Slide #31
Storage Conventions
•  Variables live in memory
•  Registers hold temporary values
•  To operate with memory variables
–  Load them
–  Compute on them
–  Store the results
1000:
1004:
1008:
100C:
1010:
6.004 Computation Structures
n
r
x
y
int x, y; y = x * 37; LD(r31, 0x1008, r0) MULC(r0, 37, r0) ST(r0, 0x100C, r31) L10: Programmable Machines, Slide #32
Beta ISA Summary
•  Storage:
–  Processor: 32 registers (r31 hardwired to 0) and PC
–  Main memory: Up to 4 GB, 32-bit words, 32-bit byte
addresses, 4-byte-aligned accesses
•  Instruction formats:
OPCODE
rc
ra
OPCODE
rc
ra
rb
unused
16-bit signed constant
32 bits
•  Instruction types:
–  ALU: Two input registers, or register and constant
–  Branches
–  Loads and stores
6.004 Computation Structures
L10: Programmable Machines, Slide #33
Summary
•  Programmable datapaths provide
some algorithmic flexibility by
changing control structure
•  The von Neumann model à generalpurpose, stored-program computation
•  ISA design often requires tradeoffs à
optimize the common case
•  Example von Neumann ISA: Beta
6.004 Computation Structures
L10: Programmable Machines, Slide #34
Backup: Datapath Cheatsheet
1-bit wire
N-bit wire
Multi-bit wire (N omitted)
EN SEL
2
2
SEL
N
0
1
2
3
4-to-1 multi-bit MUX
0
32
6.004 Computation Structures
2
3
1-to-4 decoder
32
32-bit register
32
1
LE
32
32-bit register
with load enable
L10: Programmable Machines, Slide #35