William Harrison
Introduction to Compiler Construction
University of Missouri
Spring 2013

These slides were graciously provided by Helmut Seidl.

0  Introduction
Principle of Interpretation:

    Program + Input  →  Interpreter  →  Output

Advantage: no precomputation on the program text ==⇒ no/short startup time.

Disadvantages: program parts are repeatedly analyzed during execution, and access to program variables is less efficient ==⇒ slower execution speed.
Principle of Compilation:

    Program  →  Compiler  →  Code
    Code + Input  →  Output

Two Phases (at two different times):

• Translation of the source program into a machine program (at compile time);
• Execution of the machine program on input data (at run time).
Preprocessing of the source program provides for

• efficient access to the values of program variables at run time,
• global program transformations to increase execution speed.

Disadvantage: compilation takes time.

Advantage: program execution is sped up ==⇒ compilation pays off for long-running or often-run programs.
Structure of a compiler:

    Source program
        ↓
    Frontend
        ↓
    Internal representation (syntax tree)
        ↓
    Optimizations
        ↓
    Internal representation
        ↓
    Code generation
        ↓
    Program for target machine
Subtasks in code generation:

The goal is a good exploitation of the hardware resources:

1. Instruction Selection: selection of efficient, semantically equivalent instruction sequences;
2. Register Allocation: best use of the available processor registers;
3. Instruction Scheduling: reordering of the instruction stream to exploit intra-processor parallelism.

For several reasons, e.g. modularization of code generation and portability, code generation may be split into two phases:
    Intermediate representation  →  Code generation  →  abstract machine code

    abstract machine code  →  Compiler  →  concrete machine code

    alternatively:

    abstract machine code + Input  →  Interpreter  →  Output
A virtual machine is

• an idealized architecture,
• with simple code generation,
• easily implemented on real hardware.

Advantages:

• Porting the compiler to a new target architecture is simpler,
• Modularization makes the compiler easier to modify,
• Translation of program constructs is separated from the exploitation of architectural features.
Virtual (or: abstract) machines for some programming languages:

    Pascal        →  P-machine
    Smalltalk     →  Bytecode
    Prolog        →  WAM  (“Warren Abstract Machine”)
    SML, Haskell  →  STGM
    Java          →  JVM
We will consider the following languages and virtual machines:

    C                 →  CMa           // imperative
    PuF               →  MaMa          // functional
    Proll             →  WiM           // logic-based
    C±                →  OMa           // object-oriented
    multi-threaded C  →  threaded CMa  // concurrent
The Translation of C

1  The Architecture of the CMa
• Each virtual machine provides a set of instructions
• Instructions are executed on the virtual hardware
• This virtual hardware can be viewed as a set of data structures, which the
instructions access
• ... and which are managed by the run-time system
For the CMa we need:
The Data Store:

    S:  | 0 | 1 | ... | ← SP

• S is the (data) store, onto which new cells are allocated in a LIFO discipline ==⇒ Stack.
• SP (= Stack Pointer) is a register which contains the address of the topmost allocated cell.

Simplification: all types of data fit into one cell of S.
The Code/Instruction Store:

    C:  | 0 | 1 | ... | ← PC

• C is the code store, which contains the program. Each cell of C can store exactly one virtual instruction.
• PC (= Program Counter) is a register which contains the address of the instruction to be executed next.
• Initially, PC contains the address 0 ==⇒ C[0] contains the instruction to be executed first.
Execution of Programs:

• The machine loads the instruction in C[PC] into an Instruction Register IR and executes it.
• PC is incremented by 1 before the execution of the instruction:

    while (true) {
        IR = C[PC]; PC++;
        execute (IR);
    }

• The execution of the instruction may overwrite the PC (jumps).
• The main cycle of the machine is halted by executing the instruction halt, which returns control to the environment, e.g. the operating system.
• More instructions will be introduced on demand.
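The main cycle above can be made concrete in a minimal Python sketch. The instruction set here (loadc, jump, halt) is only a placeholder for the real CMa instructions introduced in the following sections.

```python
# Minimal sketch of the CMa main cycle. Instructions are tuples
# (opcode, argument...); only a tiny placeholder instruction set is modeled.
def run(C):
    S, SP, PC = [], -1, 0          # data store, stack pointer, program counter
    while True:
        IR = C[PC]                 # load the instruction into the register IR
        PC += 1                    # PC is incremented before execution
        op = IR[0]
        if op == "halt":           # return control to the environment
            return S[SP] if SP >= 0 else None
        elif op == "loadc":        # push a constant (see the next section)
            S.append(IR[1]); SP += 1
        elif op == "jump":         # jumps overwrite the PC
            PC = IR[1]

result = run([("loadc", 42), ("halt",)])
```

Running the two-instruction program leaves the constant on top of the stack, which halt then reports.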
2  Simple Expressions and Assignments

Problem: evaluate the expression (1 + 7) ∗ 3 !

This means: generate an instruction sequence which

• determines the value of the expression and
• pushes it on top of the stack.

Idea:

• first compute the values of the subexpressions,
• save these values on top of the stack,
• then apply the operator.
The general principle:

• instructions expect their arguments on top of the stack,
• execution of an instruction consumes its operands,
• results, if any, are stored on top of the stack.

    loadc q:    SP++;
                S[SP] = q;

The instruction loadc q needs no operand on top of the stack and pushes the constant q onto the stack.

Note: the content of register SP is only implicitly represented, namely through the height of the stack.
    mul:    SP--;
            S[SP] = S[SP] ∗ S[SP+1];

mul expects two operands on top of the stack, consumes both, and pushes their product onto the stack (e.g., with 3 and 8 on top, mul leaves 24).

... the other binary arithmetic and logical instructions add, sub, div, mod, and, or and xor work analogously, as do the comparison instructions eq, neq, le, leq, gr and geq.
Example: with 7 and 3 as operands on the stack, leq leaves the result 1.

Remark: 0 represents false, all other integers true.

Unary operators neg and not consume one operand and produce one result.

    neg:    S[SP] = – S[SP];

Example: neg turns 8 into −8.
Example: code for 1 + 7:

    loadc 1
    loadc 7
    add

Executing this code sequence pushes 1, then 7, and finally replaces both by 8.
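The stack discipline for constants and binary operators can be sketched as a small Python interpreter, assuming the same tuple encoding of instructions as before:

```python
# Hypothetical model of the stack discipline: each instruction pops its
# operands and pushes its result, mirroring loadc/add/mul above.
def execute(code):
    S = []
    for instr in code:
        if instr[0] == "loadc":
            S.append(instr[1])                 # push a constant
        elif instr[0] == "add":
            b = S.pop(); S[-1] = S[-1] + b     # consume two, push the sum
        elif instr[0] == "mul":
            b = S.pop(); S[-1] = S[-1] * b     # consume two, push the product
    return S

# (1 + 7) * 3  ==>  loadc 1; loadc 7; add; loadc 3; mul
stack = execute([("loadc", 1), ("loadc", 7), ("add",), ("loadc", 3), ("mul",)])
```

After the sequence runs, only the final value 24 remains on the stack.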
Variables are associated with cells in S:

    | ... | x: | y: | z: |

Code generation will be described by the translation functions code, codeL, and codeR.

Arguments: a program construct and a function ρ. ρ delivers for each variable x the relative address of x and is called the Address Environment.
Variables can be used in two different ways.

Example: in x = y + 1 we are interested in the value of y, but in the address of x.

The syntactic position determines whether the L-value or the R-value of a variable is required.

    L-value of x  =  address of x
    R-value of x  =  content of x

    codeR e ρ   produces code to compute the R-value of e in the address environment ρ
    codeL e ρ   analogously for the L-value

Note: not every expression has an L-value (e.g., x + 1).
We define:

    codeR (e1 + e2) ρ  =  codeR e1 ρ
                          codeR e2 ρ
                          add

... analogously for the other binary operators

    codeR (−e) ρ  =  codeR e ρ
                     neg

... analogously for the other unary operators

    codeR q ρ  =  loadc q

    codeL x ρ  =  loadc (ρ x)
    ...
    codeR x ρ  =  codeL x ρ
                  load

The instruction load loads the contents of the cell whose address is on top of the stack.

    load:    S[SP] = S[S[SP]];
    codeR (x = e) ρ  =  codeR e ρ
                        codeL x ρ
                        store

store writes the contents of the second topmost stack cell into the cell whose address is on top of the stack, and leaves the written value on top of the stack.

Note: this differs from the code generated by gcc.

    store:    S[S[SP]] = S[SP-1];
              SP--;
Example: code for e ≡ x = y − 1 with ρ = {x ↦ 4, y ↦ 7}.

codeR e ρ produces:

    loadc 7
    load
    loadc 1
    sub
    loadc 4
    store

Improvements: introduction of special instructions for frequently used instruction sequences, e.g.,

    loada q   =  loadc q
                 load

    storea q  =  loadc q
                 store
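The effect of load/store and the derived instructions loada/storea can be sketched in Python, assuming a store of ten cells and the environment ρ = {x ↦ 4, y ↦ 7} from the example:

```python
# Sketch of load/store on an explicit store S; loada/storea are the
# abbreviations loadc q; load and loadc q; store from the slides.
S = [0] * 10
S[7] = 10                          # initialize y with some value
stack = []

def loadc(q): stack.append(q)
def loada(q): stack.append(S[q])   # loadc q; load
def storea(q): S[q] = stack[-1]    # loadc q; store (value stays on top)
def sub():
    b = stack.pop(); stack[-1] -= b
def pop(): stack.pop()

# x = y - 1  ==>  loada 7; loadc 1; sub; storea 4; pop
loada(7); loadc(1); sub(); storea(4); pop()
```

Afterwards the cell of x holds 9 and the stack is empty again, as required for a statement.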
3  Statements and Statement Sequences

If e is an expression, then e; is a statement.

Statements do not deliver a value: the contents of the SP before and after the execution of the generated code must therefore be the same.

    code e; ρ  =  codeR e ρ
                  pop

The instruction pop eliminates the top element of the stack.

    pop:    SP--;
The code for a statement sequence is the concatenation of the code for the statements of the sequence:

    code (s ss) ρ  =  code s ρ
                      code ss ρ

    code ε ρ       =  // empty sequence of instructions
4  Conditional and Iterative Statements

We need jumps to deviate from the serial execution of consecutive statements:

    jump A:     PC = A;

    jumpz A:    if (S[SP] == 0) PC = A;
                SP--;
For ease of comprehension, we use symbolic jump targets. They will later be replaced by absolute addresses.

Instead of absolute code addresses, one could generate relative addresses, i.e., relative to the current PC.

Advantages:

• smaller addresses suffice most of the time;
• the code becomes relocatable, i.e., can be moved around in memory.
4.1  One-sided Conditional Statement

Let us first regard s ≡ if (e) s′.

Idea:

• Put the code for the evaluation of e and s′ consecutively in the code store,
• Insert a conditional jump (jump on zero) in between.

    code s ρ  =     codeR e ρ
                    jumpz A
                    code s′ ρ
                 A: ...
4.2  Two-sided Conditional Statement

Let us now regard s ≡ if (e) s1 else s2. The same strategy yields:

    code s ρ  =     codeR e ρ
                    jumpz A
                    code s1 ρ
                    jump B
                 A: code s2 ρ
                 B: ...
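The scheme for the two-sided conditional can be sketched as a small code generator: symbolic labels are emitted first and replaced by absolute addresses in a second pass, as described earlier. The tuple encoding of instructions and the helper names are hypothetical.

```python
import itertools

_labels = itertools.count()          # fresh-label supply (hypothetical helper)

def code_if(cond, then_code, else_code):
    n = next(_labels)
    A, B = f"A{n}", f"B{n}"
    return (cond + [("jumpz", A)]            # jump over s1 if e yields 0
            + then_code + [("jump", B)]      # skip the else part
            + [("label", A)] + else_code
            + [("label", B)])

def resolve(code):
    # pass 1: record the absolute address of every symbolic label
    addr, out = {}, []
    for instr in code:
        if instr[0] == "label":
            addr[instr[1]] = len(out)
        else:
            out.append(instr)
    # pass 2: patch symbolic jump targets with absolute addresses
    return [(i[0], addr.get(i[1], i[1])) if len(i) == 2 else i for i in out]

# if (1) ... else ..., with both branches abbreviated to a single loadc:
flat = resolve(code_if([("loadc", 1)], [("loadc", 10)], [("loadc", 20)]))
```

The resolved sequence contains jumpz and jump with absolute addresses, exactly as the later remark about replacing symbolic targets requires.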
Example: let ρ = {x ↦ 4, y ↦ 7} and

    s ≡ if (x > y)       (i)
            x = x − y;   (ii)
        else y = y − x;  (iii)

code s ρ produces:

        loada 4     (i)
        loada 7
        gr
        jumpz A
        loada 4     (ii)
        loada 7
        sub
        storea 4
        pop
        jump B
    A:  loada 7     (iii)
        loada 4
        sub
        storea 7
        pop
    B:  ...
4.3  while-Loops

Let us regard the loop s ≡ while (e) s′. We generate:

    code s ρ  =  A: codeR e ρ
                    jumpz B
                    code s′ ρ
                    jump A
                 B: ...
Example: let ρ = {a ↦ 7, b ↦ 8, c ↦ 9} and s the statement:

    while (a > 0) { c = c + 1; a = a − b; }

code s ρ produces the sequence:

    A:  loada 7
        loadc 0
        gr
        jumpz B
        loada 9
        loadc 1
        add
        storea 9
        pop
        loada 7
        loada 8
        sub
        storea 7
        pop
        jump A
    B:  ...
4.4  for-Loops

The for-loop s ≡ for (e1; e2; e3) s′ is equivalent to the statement sequence e1; while (e2) { s′ e3; }, provided that s′ contains no continue-statement. We therefore translate:

    code s ρ  =     codeR e1 ρ
                    pop
                 A: codeR e2 ρ
                    jumpz B
                    code s′ ρ
                    codeR e3 ρ
                    pop
                    jump A
                 B: ...
4.5  The switch-Statement

Idea:

• Multi-target branching in constant time!
• Use a jump table, which contains at its i-th position the jump to the beginning of the i-th alternative.
• Realized by indexed jumps.

    jumpi B:    PC = B + S[SP];
                SP--;
Simplification: we only regard switch-statements of the following form:

    s  ≡  switch (e) {
              case 0:      ss0  break;
              case 1:      ss1  break;
              ...
              case k − 1:  ssk−1  break;
              default:     ssk
          }

s is then translated into the instruction sequence:
    code s ρ  =      codeR e ρ
                     check 0 k B
                 C0: code ss0 ρ
                     jump D
                     ...
                 Ck: code ssk ρ
                     jump D
                 B:  jump C0
                     ...
                     jump Ck
                 D:  ...

• The macro check 0 k B checks whether the R-value of e is in the interval [0, k], and executes an indexed jump into the table B.
• The jump table contains direct jumps to the respective alternatives.
• At the end of each alternative is an unconditional jump out of the switch-statement.
    check 0 k B  =     dup
                       loadc 0
                       geq
                       jumpz A
                       dup
                       loadc k
                       le
                       jumpz A
                       jumpi B
                   A:  pop
                       loadc k
                       jumpi B

• The R-value of e is still needed for indexing after the comparison. It is therefore copied before the comparison. This is done by the instruction dup.
• The R-value of e is replaced by k before the indexed jump is executed if it is less than 0 or greater than k.
    dup:    S[SP+1] = S[SP];
            SP++;

(dup duplicates the topmost stack cell, e.g. turning 3 into 3, 3.)
Note:

• The jump table could be placed directly after the code for the macro check. This would save a few unconditional jumps, but may require traversing the switch-statement twice.
• If the table starts with u instead of 0, we have to decrease the R-value of e by u before using it as an index.
• If all potential values of e are definitely in the interval [0, k], the macro check is not needed.
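The net effect of the check macro can be summarized in a few lines of Python: values inside [0, k] are used directly as jump-table indices, everything else falls through to the default entry k. The comments map each case back to the instruction sequence.

```python
# Sketch of the effect of check 0 k B: compute the index with which
# jumpi B is finally executed.
def check(v, k):
    # dup; loadc 0; geq; jumpz A   -- v < 0 jumps to A
    # dup; loadc k; le; jumpz A    -- v not below k jumps to A
    # jumpi B                      -- in range: indexed jump with v
    # A: pop; loadc k; jumpi B     -- out of range: use index k (default)
    return v if 0 <= v <= k else k

# jumpi B then continues at address B + check(v, k), i.e. at the jump
# table entry for the selected alternative.
```

Note that v = k itself also ends up at index k, which is consistent: entry k of the table is the default alternative.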
5  Storage Allocation for Variables

Goal: associate statically, i.e. at compile time, with each variable x a fixed (relative) address ρ x.

Assumptions:

• variables of basic types, e.g. int, ..., occupy one storage cell;
• variables are allocated in the store in the order in which they are declared, starting at address 1.

Consequently, we obtain for the declaration d ≡ t1 x1; ... tk xk; (ti basic) the address environment ρ such that

    ρ xi = i,    i = 1, ..., k
5.1  Arrays

Example: int[11] a;

The array a consists of 11 components and therefore needs 11 cells. ρ a is the address of the component a[0]; the components a[0], ..., a[10] occupy consecutive cells.
We need a function sizeof (notation: | · |), computing the space requirement of a type:

    |t|  =  1         if t basic
            k · |t′|  if t ≡ t′[k]

Accordingly, we obtain for the declaration d ≡ t1 x1; ... tk xk;

    ρ x1  =  1
    ρ xi  =  ρ xi−1 + |ti−1|    for i > 1

Since | · | can be computed at compile time, ρ can also be computed at compile time.
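Both | · | and ρ are compile-time computations, which a short Python sketch makes explicit. The type encoding is hypothetical: the string "int" for the basic type, a pair (element_type, k) for an array t′[k].

```python
# Sketch of | . | (sizeof) and of the address environment for a
# declaration list, following the two equations above.
def sizeof(t):
    if t == "int":
        return 1                  # basic types occupy one cell
    elem, k = t
    return k * sizeof(elem)       # |t'[k]| = k * |t'|

def addresses(decls):
    # rho x1 = 1;  rho xi = rho x(i-1) + |t(i-1)|
    rho, a = {}, 1
    for t, x in decls:
        rho[x] = a
        a += sizeof(t)
    return rho

# int i; int[11] a; int j;
rho = addresses([("int", "i"), (("int", 11), "a"), ("int", "j")])
```

Here i gets address 1, the array a starts at address 2 and occupies 11 cells, so j ends up at address 13.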
Task: extend codeL and codeR to expressions with accesses to array components.

Let t[c] a; be the declaration of an array a.

To determine the start address of a component a[i], we compute ρ a + |t| ∗ (R-value of i). In consequence:

    codeL a[e] ρ  =  loadc (ρ a)
                     codeR e ρ
                     loadc |t|
                     mul
                     add

... or more generally:
    codeL e1[e2] ρ  =  codeR e1 ρ
                       codeR e2 ρ
                       loadc |t|
                       mul
                       add

Remark:

• In C, an array is a pointer. A declared array a is a pointer constant, whose R-value is the start address of the array.
• Formally, we define for an array e:  codeR e ρ = codeL e ρ
• In C, the following are equivalent (as L-values):

    2[a]    a[2]    a + 2

Normalization: array names and expressions evaluating to arrays occur in front of index brackets, index expressions inside the index brackets.
5.2  Structures

In Modula and Pascal, structures are called records.

Simplification: names of structure components are not used elsewhere. Alternatively, one could manage a separate environment ρst for each structure type st.

Let struct { int a; int b; } x; be part of a declaration list.

• x has as relative address the address of the first cell allocated for the structure.
• The components have addresses relative to the start address of the structure. In the example, these are a ↦ 0, b ↦ 1.
Let t ≡ struct { t1 c1; ... tk ck; }. We have

    |t|  =  Σ (i = 1..k) |ti|

    ρ c1  =  0
    ρ ci  =  ρ ci−1 + |ti−1|    for i > 1

We thus obtain:

    codeL (e.c) ρ  =  codeL e ρ
                      loadc (ρ c)
                      add
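The component offsets follow the same accumulation pattern as variable addresses, only starting at 0. A minimal sketch, assuming fields are given as (size, name) pairs:

```python
# Sketch of component offsets for t = struct { t1 c1; ...; tk ck; }.
def struct_offsets(fields):
    offsets, off = {}, 0          # rho c1 = 0, rho ci = rho c(i-1) + |t(i-1)|
    for size, name in fields:
        offsets[name] = off
        off += size
    return offsets, off           # offsets and |t| = sum of the |ti|

# struct { int a; int b; }  with |int| = 1:
offsets, size = struct_offsets([(1, "a"), (1, "b")])
```

For the two-int structure this reproduces a ↦ 0, b ↦ 1 and |t| = 2.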
Example: let struct { int a; int b; } x; such that ρ = {x ↦ 13, a ↦ 0, b ↦ 1}.

This yields:

    codeL (x.b) ρ  =  loadc 13
                      loadc 1
                      add
6  Pointers and Dynamic Storage Management

Pointers allow access to anonymous, dynamically generated objects, whose lifetime is not subject to the LIFO principle.

==⇒ We need another potentially unbounded storage area H, the Heap.

    0                                      MAX
    | S  →  | ← SP    EP →    NP → | ←  H |

NP  =  New Pointer; points to the lowest occupied heap cell.
EP  =  Extreme Pointer; points to the uppermost cell, to which SP can point (during execution of the current function).
Idea:

• Stack and heap grow toward each other in S, but must not collide (stack overflow).
• A collision may be caused by an increment of SP or a decrement of NP.
• EP saves us the check for collision at stack operations.
• The checks at heap allocations are still necessary.
What can we do with pointers (pointer values)?

• set a pointer to a storage cell,
• dereference a pointer, i.e. access the value in the storage cell it points to.

There are two ways to set a pointer:

(1) A call malloc(e) reserves a heap area of the size of the value of e and returns a pointer to this area:

    codeR malloc(e) ρ  =  codeR e ρ
                          new

(2) The application of the address operator & to a variable returns a pointer to this variable, i.e. its address (= L-value). Therefore:

    codeR (&e) ρ  =  codeL e ρ
    new:    if (NP - S[SP] ≤ EP)
                S[SP] = NULL;
            else {
                NP = NP - S[SP];
                S[SP] = NP;
            }

• NULL is a special pointer constant, identified with the integer constant 0.
• In the case of a collision of stack and heap the NULL pointer is returned.
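The new instruction can be sketched directly in Python, assuming S is indexed 0..MAX−1 with NP at the lower end of the heap and EP bounding the stack:

```python
# Sketch of the new instruction; the requested size lies on top of the stack.
NULL = 0

def new_instr(S, SP, NP, EP):
    n = S[SP]
    if NP - n <= EP:           # stack and heap would collide
        S[SP] = NULL           # ... so the NULL pointer is returned
        return NP
    NP = NP - n                # claim n cells at the heap end
    S[SP] = NP                 # a pointer to the new area replaces the size
    return NP

S = [0] * 20
S[0] = 5                       # e.g. malloc(e) computed the size 5
NP = new_instr(S, 0, 20, 10)   # heap currently empty: NP = MAX = 20, EP = 10
```

With five cells requested, NP drops from 20 to 15 and that address is returned; a request of twelve cells would collide with EP = 10 and yield NULL instead.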
Dereferencing of Pointers:

The application of the operator ∗ to the expression e returns the contents of the storage cell whose address is the R-value of e:

    codeL (∗e) ρ  =  codeR e ρ

Example: given the declarations

    struct t { int a[7]; struct t ∗b; };
    int i, j;
    struct t ∗pt;

and the expression ((pt → b) → a)[i + 1].

Because e → a ≡ (∗e).a holds:

    codeL (e → a) ρ  =  codeR e ρ
                        loadc (ρ a)
                        add
(Figure: the store layout for these declarations: the globals i, j and pt on the stack; pt references a heap object with components a and b, whose b-component references a second such object.)
Let ρ = {i ↦ 1, j ↦ 2, pt ↦ 3, a ↦ 0, b ↦ 7}. Then:

    codeL ((pt → b) → a)[i + 1] ρ  =  codeR ((pt → b) → a) ρ
                                      codeR (i + 1) ρ
                                      loadc 1
                                      mul
                                      add

                                   =  codeR ((pt → b) → a) ρ
                                      loada 1
                                      loadc 1
                                      add
                                      loadc 1
                                      mul
                                      add
For arrays, the R-value equals the L-value. Therefore:

    codeR ((pt → b) → a) ρ  =  codeR (pt → b) ρ
                               loadc 0
                               add

                            =  loada 3
                               loadc 7
                               add
                               load
                               loadc 0
                               add

In total, we obtain the instruction sequence:

    loada 3
    loadc 7
    add
    load
    loadc 0
    add
    loada 1
    loadc 1
    add
    loadc 1
    mul
    add
7  Conclusion

We tabulate the cases of the translation of expressions:

    codeL (e1[e2]) ρ   =  codeR e1 ρ
                          codeR e2 ρ
                          loadc |t|
                          mul
                          add           if e1 has type t∗ or t[]

    codeL (e.a) ρ      =  codeL e ρ
                          loadc (ρ a)
                          add

    codeL (∗e) ρ       =  codeR e ρ

    codeL x ρ          =  loadc (ρ x)

    codeR (&e) ρ       =  codeL e ρ

    codeR e ρ          =  codeL e ρ     if e is an array

    codeR (e1 □ e2) ρ  =  codeR e1 ρ
                          codeR e2 ρ
                          op            op instruction for operator ‘□’

    codeR q ρ          =  loadc q       q constant

    codeR (e1 = e2) ρ  =  codeR e2 ρ
                          codeL e1 ρ
                          store

    codeR e ρ          =  codeL e ρ
                          load          otherwise
Example: for int a[10], (∗b)[10]; and the statement ∗a = 5; with ρ = {a ↦ 7, b ↦ 17} we obtain:

    codeL (∗a) ρ  =  codeR a ρ
                  =  codeL a ρ
                  =  loadc 7

    code (∗a = 5;) ρ  =  loadc 5
                         loadc 7
                         store
                         pop

As an exercise, translate:

    s1 ≡ b = (&a) + 2;    and    s2 ≡ ∗(b + 3)[0] = 5;
    code (s1 s2) ρ  =  loadc 7
                       loadc 2
                       loadc 10    // size of int[10]
                       mul         // scaling
                       add
                       loadc 17
                       store
                       pop         // end of s1
                       loadc 5
                       loadc 17
                       load
                       loadc 3
                       loadc 10    // size of int[10]
                       mul         // scaling
                       add
                       store
                       pop         // end of s2
8  Freeing Occupied Storage

Problems:

• The freed storage area could still be referenced by other pointers (dangling references).
• After several deallocations, the storage could look like this (fragmentation):

    | free | occupied | free | occupied | free |

Potential solutions:

• Trust the programmer. Manage freed storage in a particular data structure (free list) ==⇒ malloc or free may become expensive.
• Do nothing, i.e.:

    code free(e); ρ  =  codeR e ρ
                        pop

  ==⇒ simple and (in general) efficient.
• Use an automatic, potentially “conservative” garbage collection, which occasionally collects certainly inaccessible heap space.
9  Functions

The definition of a function consists of

• a name by which it can be called,
• a specification of the formal parameters,
• possibly a result type,
• a statement part, the body.

For C holds:

    codeR f ρ  =  loadc _f  =  starting address of the code for f

==⇒ Function names must also be managed in the address environment!
Example:

    int fac (int x) {
        if (x ≤ 0) return 1;
        else return x ∗ fac (x − 1);
    }

    main () {
        int n;
        n = fac(2) + fac(1);
        printf (“%d”, n);
    }

At any time during the execution, several instances of one function may exist, i.e., may have started, but not finished execution. An instance is created by a call to the function.

The recursion tree in the example: main calls fac(2), fac(1), and printf; fac(2) calls fac(1), and each fac(1) calls fac(0).
We conclude:
The formal parameters and local variables of the different instances of the same
function must be kept separate.
Idea:
Allocate a special storage area for each instance of a function.
In sequential programming languages these storage areas can be managed on a
stack. They are therefore called Stack Frames.
9.1  Storage Organization for Functions

A stack frame, from top to bottom:

    SP →  local variables
          formal parameters
    FP →  PCold   ⎫
          FPold   ⎬  organizational cells
          EPold   ⎭
          return value

FP = Frame Pointer; points to the last organizational cell and is used to address the formal parameters and the local variables.
The caller must be able to continue execution in its frame after the return from a
function. Therefore, at a function call the following values have to be saved into
organizational cells:
• the FP
• the continuation address after the call and
• the actual EP.
Simplification: the return value fits into one storage cell.

Translation tasks for functions:

• Generate code for the body!
• Generate code for calls!
9.2  Computing the Address Environment

We have to distinguish two different kinds of variables:

1. globals, which are defined externally to the functions;
2. locals/automatics (including formal parameters), which are defined internally to the functions.

==⇒ The address environment ρ associates with each name a pair (tag, a) ∈ {G, L} × N0.

Note:

• There exist more refined notions of visibility of (the defining occurrences of) variables, namely nested blocks.
• The translation of different program parts in general uses different address environments!
Example: regard the program

    int i;
    struct list {
        int info;
        struct list ∗ next;
    } ∗ l;

    int ith (struct list ∗ x, int i) {
        if (i ≤ 1) return x →info;
        else return ith (x →next, i − 1);
    }

    void main () {
        int k;
        scanf ("%d", &i);
        scanlist (&l);
        printf ("\n\t%d\n", ith (l,i));
    }

(1) The address environment ρ0 at address 0, before the function definitions:

    i     ↦  (G, 1)
    l     ↦  (G, 2)
    ith   ↦  (G, _ith)
    main  ↦  (G, _main)
    ...

(2) The address environment ρ1 inside of ith:

    x     ↦  (L, 1)
    i     ↦  (L, 2)
    l     ↦  (G, 2)
    ith   ↦  (G, _ith)
    ...

(3) The address environment ρ2 inside of main:

    i     ↦  (G, 1)
    l     ↦  (G, 2)
    k     ↦  (L, 1)
    ith   ↦  (G, _ith)
    main  ↦  (G, _main)
    ...
9.3  Calling/Entering and Leaving Functions

Let f be the current function, the Caller, and let f call the function g, the Callee.

The code for a function call has to be distributed among the Caller and the Callee. The distribution depends on who has which information.
Actions upon calling/entering g:

    1. Saving FP, EP                          (mark, available in f)
    2. Computing the actual parameters        (available in f)
    3. Determining the start address of g     (available in f)
    4. Setting the new FP                     (call)
    5. Saving PC and jumping to the
       beginning of g                         (call)
    6. Setting the new EP                     (enter, available in g)
    7. Allocating the local variables         (alloc, available in g)

Actions upon leaving g:

    1. Restoring the registers FP, EP, SP     (return)
    2. Returning to the code of f,
       i.e. restoring the PC                  (return)
Altogether we generate for a call:

    codeR g(e1, ..., en) ρ  =  mark
                               codeR e1 ρ
                               ...
                               codeR en ρ
                               codeR g ρ
                               call n

where n = space for the actual parameters.

Note:

• Expressions occurring as actual parameters will be evaluated to their R-value ==⇒ call-by-value parameter passing.
• Function g can also be an expression, whose R-value is the start address of the function to be called ...
• Function names are regarded as constant pointers to functions, similarly to declared arrays. The R-value of such a pointer is the start address of the function.
• For a variable int (∗g)(); the two calls (∗g)() and g() are equivalent :-)
  Normalization: dereferencing of a function pointer is ignored.
• Structures are copied when they are passed as parameters.

In consequence:

    codeR f ρ     =  loadc (ρ f)   f a function name
    codeR (∗e) ρ  =  codeR e ρ     e a function pointer
    codeR e ρ     =  codeL e ρ     e a structure of size k
                     move k
    move k:    for (i = k-1; i ≥ 0; i--)
                   S[SP+i] = S[S[SP]+i];
               SP = SP + k - 1;
The instruction mark allocates space for the return value and for the organizational cells and saves FP and EP.

    mark:    S[SP+2] = EP;
             S[SP+3] = FP;
             SP = SP + 4;
The instruction call n saves the continuation address and assigns FP, SP, and PC their new values.

    call n:    FP = SP - n - 1;
               S[FP] = PC;
               PC = S[SP];
               SP--;
Correspondingly, we translate a function definition:

    code t f (specs){V_defs ss} ρ  =  _f:  enter q     // setting the EP
                                           alloc k     // allocating the local variables
                                           code ss ρf
                                           return      // leaving the function

where

    t     =  return type of f with |t| ≤ 1
    q     =  maxS + k
    maxS  =  maximal depth of the local stack
    k     =  space for the local variables
    ρf    =  address environment for f, which takes care of specs, V_defs and ρ
The instruction enter q sets EP to its new value. Program execution is terminated if not enough space is available.

    enter q:    EP = SP + q;
                if (EP ≥ NP)
                    Error (“Stack Overflow”);
The instruction alloc k reserves stack space for the local variables.

    alloc k:    SP = SP + k;
The instruction return pops the current stack frame, i.e., it restores the registers PC, EP, SP, and FP and leaves the return value on top of the stack.

    return:    PC = S[FP]; EP = S[FP-2];
               if (EP ≥ NP) Error (“Stack Overflow”);
               SP = FP-3; FP = S[SP+2];
9.4  Access to Variables and Formal Parameters, and Return of Values

Local variables and formal parameters are addressed relative to the current FP. We therefore modify codeL for the case of variable names.

For ρ x = (tag, j) we define

    codeL x ρ  =  loadc j     if tag = G
                  loadrc j    if tag = L

The instruction loadrc j computes the sum of FP and j.

    loadrc j:    SP++;
                 S[SP] = FP + j;
As an optimization one introduces the instructions loadr j and storer j, analogous to loada j and storea j:

    loadr j   =  loadrc j
                 load

    storer j  =  loadrc j
                 store

The code for return e; corresponds to an assignment to a variable with relative address −3:

    code return e; ρ  =  codeR e ρ
                         storer -3
                         return
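The whole calling convention (mark, call, enter, alloc, return, and the relative addressing via loadrc) can be exercised in a small runnable simulator. The function inc below, which returns its argument plus one, is a hypothetical example; the instruction semantics follow the definitions in this section (the second stack-overflow check in return is omitted for brevity).

```python
# Sketch of the CMa calling sequence for a call inc(7), where
# _inc: enter q; alloc 0; loadr 1; loadc 1; add; storer -3; return.
def run(C, MAX=32):
    S = [0] * MAX
    SP, FP, EP, PC, NP = -1, 0, 0, 0, MAX
    while True:
        op, *arg = C[PC]; PC += 1
        if op == "halt":     return S[SP]          # return value on top
        elif op == "loadc":  SP += 1; S[SP] = arg[0]
        elif op == "load":   S[SP] = S[S[SP]]
        elif op == "store":  S[S[SP]] = S[SP-1]; SP -= 1
        elif op == "loadrc": SP += 1; S[SP] = FP + arg[0]
        elif op == "add":    SP -= 1; S[SP] += S[SP+1]
        elif op == "mark":   S[SP+2] = EP; S[SP+3] = FP; SP += 4
        elif op == "call":
            n = arg[0]
            FP = SP - n - 1; S[FP] = PC; PC = S[SP]; SP -= 1
        elif op == "enter":  EP = SP + arg[0]; assert EP < NP, "Stack Overflow"
        elif op == "alloc":  SP += arg[0]
        elif op == "return":
            PC = S[FP]; EP = S[FP-2]; SP = FP - 3; FP = S[SP+2]

INC = 5                                   # code address of _inc below
C = [("mark",), ("loadc", 7), ("loadc", INC), ("call", 1), ("halt",),
     # _inc (loadr/storer expanded into loadrc; load and loadrc; store):
     ("enter", 4), ("alloc", 0), ("loadrc", 1), ("load",), ("loadc", 1),
     ("add",), ("loadrc", -3), ("store",), ("return",)]
result = run(C)
```

After return, SP points at the return-value cell FP−3, so halt reports the R-value of inc(7), namely 8; the parameter was read with loadr 1 (address FP+1) and the result written with storer −3, matching the frame layout of Section 9.1.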
Example: for the function

    int fac (int x) {
        if (x ≤ 0) return 1;
        else return x ∗ fac (x − 1);
    }

we generate:

    _fac:  enter q
           alloc 0
           loadr 1
           loadc 0
           leq
           jumpz A
           loadc 1
           storer -3
           return
           jump B
       A:  loadr 1
           mark
           loadr 1
           loadc 1
           sub
           loadc _fac
           call 1
           mul
           storer -3
           return
       B:  return

where ρfac : x ↦ (L, 1) and q = 1 + 4 + 2 = 7.
10  Translation of Whole Programs

The state before program execution starts:

    SP = −1    FP = EP = 0    PC = 0    NP = MAX

Let p ≡ V_defs F_def1 ... F_defn be a program, where F_defi defines a function fi, of which one is named main.

The code for the program p consists of:

• code for the function definitions F_defi;
• code for allocating the global variables;
• code for the call of main();
• the instruction halt.
We thus define:

    code p ∅  =       enter (k + 6)
                      alloc (k + 1)
                      mark
                      loadc _main
                      call 0
                      pop
                      halt
                _f1:  code F_def1 ρ
                      ...
                _fn:  code F_defn ρ

where

    ∅      =  empty address environment
    ρ      =  global address environment
    k      =  space for global variables
    _main  ∈  {_f1, ..., _fn}
The Translation of Functional Programming Languages

11  The Language PuF

We only regard a mini-language PuF (“Pure Functions”). We do not treat, as yet:

• side effects;
• data structures.
A program is an expression e of the form:

    e  ::=  b | x | (□1 e) | (e1 □2 e2)
         |  (if e0 then e1 else e2)
         |  (e′ e0 ... ek−1)
         |  (fun x0 ... xk−1 → e)
         |  (let x1 = e1 in e0)
         |  (let rec x1 = e1 and ... and xn = en in e0)

where □1 and □2 stand for unary and binary operators. An expression is therefore

• a basic value, a variable, the application of an operator, or
• a function application, a function abstraction, or
• a let-expression, i.e. an expression with locally defined variables, or
• a let-rec-expression, i.e. an expression with simultaneously defined local variables.

For simplicity, we only allow int as basic type.
Example: the following well-known function computes the factorial of a natural number:

    let rec fac = fun x → if x ≤ 1 then 1
                          else x · fac (x − 1)
    in fac 7

As usual, we only use the minimal amount of parentheses.

There are two semantics:

CBV: arguments are evaluated before they are passed to the function (as in SML);
CBN: arguments are passed unevaluated; they are only evaluated when their value is needed (as in Haskell).
12  Architecture of the MaMa

We know already the following components:

    C   =  Code store – contains the MaMa program; each cell contains one instruction;
    PC  =  Program Counter – points to the instruction to be executed next;
    S   =  Runtime stack – each cell can hold a basic value or an address;
    SP  =  Stack Pointer – points to the topmost occupied cell; as in the CMa implicitly represented;
    FP  =  Frame Pointer – points to the current stack frame.
We also need a heap H. It can be thought of as an abstract data type, capable of holding data objects of the following forms:

    B  v                    Basic value
    C  cp  gp               Closure (code pointer, global pointer)
    F  cp  ap  gp           Function (code pointer, argument pointer, global pointer)
    V  n  v[0] ... v[n−1]   Vector
The instruction new (tag, args) creates a corresponding object (B, C, F, V) in H
and returns a reference to it.
We distinguish three different kinds of code for an expression e:
• codeV e — (generates code that) computes the Value of e, stores it in the
heap and returns a reference to it on top of the stack (the normal case);
• codeB e — computes the value of e, and returns it on the top of the stack
(only for Basic types);
• codeC e — does not evaluate e, but stores a Closure of e in the heap and
returns a reference to the closure on top of the stack.
We start with the code schemata for the first two kinds:
13  Simple Expressions

Expressions consisting only of constants, operator applications, and conditionals are translated like expressions in imperative languages:

    codeB b ρ sd           =  loadc b

    codeB (□1 e) ρ sd      =  codeB e ρ sd
                              op1

    codeB (e1 □2 e2) ρ sd  =  codeB e1 ρ sd
                              codeB e2 ρ (sd + 1)
                              op2

    codeB (if e0 then e1 else e2) ρ sd  =     codeB e0 ρ sd
                                              jumpz A
                                              codeB e1 ρ sd
                                              jump B
                                           A: codeB e2 ρ sd
                                           B: ...
Note:

• ρ denotes the current address environment, in which the expression is translated.
• The extra argument sd, the stack difference, simulates the movement of the SP when instruction execution modifies the stack. It is needed later to address variables.
• The instructions op1 and op2 implement the operators □1 and □2, in the same way as the operators neg and add implement negation resp. addition in the CMa.
• For all other expressions, we first compute the value in the heap and then dereference the returned pointer:

    codeB e ρ sd  =  codeV e ρ sd
                     getbasic
    getbasic:    if (H[S[SP]] != (B,_))
                     Error “not basic!”;
                 else
                     S[SP] = H[S[SP]].v;
For codeV and simple expressions, we define analogously:

    codeV b ρ sd           =  loadc b; mkbasic

    codeV (□1 e) ρ sd      =  codeB e ρ sd
                              op1; mkbasic

    codeV (e1 □2 e2) ρ sd  =  codeB e1 ρ sd
                              codeB e2 ρ (sd + 1)
                              op2; mkbasic

    codeV (if e0 then e1 else e2) ρ sd  =     codeB e0 ρ sd
                                              jumpz A
                                              codeV e1 ρ sd
                                              jump B
                                           A: codeV e2 ρ sd
                                           B: ...

    mkbasic:    S[SP] = new (B, S[SP]);
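The pair mkbasic/getbasic can be sketched with a Python list standing in for the heap H; references are simply list indices. The representation is an assumption for illustration, not the actual MaMa heap layout.

```python
# Sketch of B-objects in the MaMa heap: mkbasic wraps the basic value on
# top of the stack into a heap object, getbasic unwraps it again.
heap = []

def new_B(v):
    heap.append(("B", v))          # create a B-object in the heap ...
    return len(heap) - 1           # ... and return a reference to it

def mkbasic(S):
    S[-1] = new_B(S[-1])           # S[SP] = new (B, S[SP])

def getbasic(S):
    tag, v = heap[S[-1]]
    assert tag == "B", "not basic!"
    S[-1] = v                      # S[SP] = H[S[SP]].v

S = [17]
mkbasic(S)                         # top of stack is now a heap reference
getbasic(S)                        # ... and now the value 17 again
```

The round trip leaves the stack unchanged while one B-object remains allocated in the heap.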
14  Accessing Variables

We must distinguish between local and global variables.

Example: regard the function f:

    let c = 5
    in let f = fun a → let b = a ∗ a
                       in b + c
    in f c

The function f uses the global variable c and the local variables a (as formal parameter) and b (introduced by the inner let).

The binding of a global variable is determined when the function is constructed (static scoping!), and later only looked up.
Accessing Global Variables

• The bindings of the global variables of an expression or a function are kept in a vector in the heap (Global Vector).
• They are addressed consecutively starting with 0.
• When an F-object or a C-object is constructed, the Global Vector for the function or the expression is determined and a reference to it is stored in the gp-component of the object.
• During the evaluation of an expression, the (new) register GP (Global Pointer) points to the current Global Vector.
• In contrast, local variables should be administered on the stack ...

==⇒ General form of the address environment:

    ρ : Vars → {L, G} × Z
Accessing Local Variables

Local variables are administered on the stack, in stack frames.

Let e ≡ e′ e0 ... em−1 be the application of a function e′ to arguments e0, ..., em−1.

Warning: the arity of e′ need not be m :-)

• e′ may therefore receive fewer arguments than its arity requires (under-supply);
• e′ may also receive more arguments, if its result is again of functional type (over-supply).
Possible stack organisations:

First option: the function value at the bottom, the arguments above it with e0 on top:

    | F e′ | em−1 | ... | e0 | ← FP

+ Addressing of the arguments can be done relative to FP.
− The local variables of e′ cannot be addressed relative to FP.
− If e′ is an n-ary function with n < m, i.e., we have an over-supplied function application, the remaining m − n arguments will have to be shifted.
− If e′ evaluates to a function which has already been partially applied to the parameters a0, ..., ak−1, these have to be sneaked in underneath e0:

    | F | em−1 | ... | e0 | a1 | a0 | ← FP
Alternative:
e′
F
e0
e m−1
FP
+ The further arguments a0 , . . . , ak−1 and the local variables can be allocated
above the arguments.
114
a0
e0
a1
e m−1
FP
− Addressing of arguments and local variables relative to FP is no longer
possible. (Remember: m is unknown when the function definition is
translated.)
115
Way out:
• We address both, arguments and local variables, relative to the stack pointer
SP !!!
• However, the stack pointer changes during program execution...
116
• The difference between the current value of SP and its value sp0 at the
entry of the function body is called the stack distance, sd.
• Fortunately, this stack distance can be determined at compile time for each
program point, by simulating the movement of the SP.
• The formal parameters x0 , x1 , x2 , . . . successively receive the non-positive
relative addresses 0, −1, −2, . . ., i.e.,
ρ xi = ( L, −i ).
• The absolute address of the i-th formal parameter consequently is
sp0 − i = (SP − sd) − i
• The local let-variables y1 , y2 , y3 , . . . will be successively pushed onto the
stack:
117
• The yi have positive relative addresses 1, 2, 3, . . ., that is:
ρ yi = ( L, i ).
• The absolute address of yi is then
sp0 + i = (SP − sd) + i
118
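The two address formulas can be captured in a small helper (a sketch with our own function names; i numbers the parameters resp. let-variables as above):

```c
#include <assert.h>

/* Absolute addresses in the MaMa stack discipline (sketch).
   sp0 = SP - sd is the stack level at entry of the function body;
   formal parameter x_i lives at sp0 - i, let-variable y_i at sp0 + i. */
int addr_formal(int sp, int sd, int i) { return (sp - sd) - i; }
int addr_local (int sp, int sd, int i) { return (sp - sd) + i; }
```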
With CBN, we generate for the access to a variable:
codeV x ρ sd
= getvar x ρ sd
eval
The instruction eval checks whether the value has already been computed
or whether its evaluation has yet to be done (==⇒ will be treated later :-)
With CBV, we can just delete eval from the above code schema.
The (compile-time) macro getvar x ρ sd is defined by:
= let (t, i ) = ρ x in
match t with
L → pushloc (sd − i )
| G → pushglob i
end
119
The access to local variables:
pushloc n
n
S[SP+1] =S[SP - n]; SP++;
120
Correctness argument:
Let sp and sd be the values of the stack pointer resp. stack distance before the
execution of the instruction. The value of the local variable with address i is
loaded from S[ a ] with
a = sp − (sd − i ) = (sp − sd) + i = sp0 + i
... exactly as it should be
:-)
121
The access to global variables is much simpler:
pushglob i
GP
V
GP
i
SP = SP + 1;
S[SP] = GP→v[i];
122
V
Example:
Regard
e ≡ (b + c)
for
ρ = {b 7→ ( L, 1), c 7→ ( G, 0)} and
sd = 1.
With CBN, we obtain:
codeV e ρ 1  =  getvar b ρ 1     =  1  pushloc 0
                eval                2  eval
                getbasic            2  getbasic
                getvar c ρ 2        2  pushglob 0
                eval                3  eval
                getbasic            3  getbasic
                add                 3  add
                mkbasic             2  mkbasic
123
15
let-Expressions
As a warm-up let us first consider the treatment of local variables :-)
Let
e ≡ let y1 = e1 in . . . let yn = en in e0
be a nested let-expression.
The translation of e must deliver an instruction sequence that
• allocates local variables y1 , . . . , yn ;
• in the case of
CBV:
evaluates e1 , . . . , en and binds the yi to their values;
CBN:
constructs closures for the e1 , . . . , en and binds the yi to them;
• evaluates the expression e0 and returns its value.
Here, we consider the non-recursive case only, i.e. where y j only depends on
y1 , . . . , y j−1 . We obtain for CBN:
124
codeV e ρ sd
= codeC e1 ρ sd
codeC e2 ρ1 (sd + 1)
...
codeC en ρn−1 (sd + n − 1)
codeV e0 ρn (sd + n)
slide n
where
// deallocates local variables
ρ j = ρ ⊕ { yi 7→ ( L, sd + i ) | i = 1, . . . , j}.
In the case of CBV, we use codeV for the expressions e1 , . . . , en .
Warning!
All the ei must be associated with the same binding for the global variables!
125
Example:
Consider the expression
e ≡ let a = 19 in let b = a ∗ a in a + b
for ρ = ∅ and sd = 0. We obtain (for CBV):
0  loadc 19
1  mkbasic
1  pushloc 0
2  getbasic
2  pushloc 1
3  getbasic
3  mul
2  mkbasic
2  pushloc 1
3  getbasic
3  pushloc 1
4  getbasic
4  add
3  mkbasic
3  slide 2
126
The instruction slide k deallocates again the space for the locals:
slide k
k
S[SP-k] = S[SP];
SP = SP - k;
127
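To see pushloc and slide interact, here is a toy simulation of their stack effects (a sketch; S and SP are plain C stand-ins for the machine registers, push is a helper of ours, not an instruction):

```c
#include <assert.h>

/* Toy stack with the effects of pushloc n and slide k (sketch). */
static int S[64];
static int SP = -1;

void push(int v)    { S[++SP] = v; }           /* helper, not a MaMa instruction */
void pushloc(int n) { S[SP + 1] = S[SP - n]; SP++; }
void slide(int k)   { S[SP - k] = S[SP]; SP = SP - k; }
```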
16
Function Definitions
The definition of a function f requires code that allocates a functional value for f
in the heap. This happens in the following steps:
• Creation of a Global Vector with the binding of the free variables;
• Creation of an (initially empty) argument vector;
• Creation of an F-object, containing references to these vectors and the start
address of the code for the body;
Separately, code for the body has to be generated.
Thus:
128
codeV (fun x0 . . . xk−1 → e) ρ sd
=
getvar z0 ρ sd
getvar z1 ρ (sd + 1)
...
getvar z g−1 ρ (sd + g − 1)
mkvec g
mkfunval A
jump B
A:
targ k
codeV e ρ′ 0
return k
B:
where
and
...
{ z0 , . . . , z g−1 } = free(fun x0 . . . xk−1 → e)
ρ′ = { xi 7→ ( L, −i ) | i = 0, . . . , k − 1} ∪ { z j 7→ ( G, j) | j = 0, . . . , g − 1}
129
V g
g
mkvec g
h = new (V, g);
SP = SP - g + 1;
for (i=0; i<g; i++)
h→v[i] = S[SP + i];
S[SP] = h;
130
F A
mkfunval A
V
V 0
V
a = new (V,0);
S[SP] = new (F, A, a, S[SP]);
131
Example:
Regard
f ≡ fun b → a + b
ρ = { a 7→ ( L, 1)} and
for
sd = 1.
codeV f ρ 1 produces:
1  pushloc 0
2  mkvec 1
2  mkfunval A
2  jump B
A: 0  targ 1
0  pushglob 0
1  eval
1  getbasic
1  pushloc 1
2  eval
2  getbasic
2  add
1  mkbasic
1  return 1
B: ...
The secrets around targ k and return k will be revealed later :-)
132
17
Function Application
Function applications correspond to function calls in C.
The necessary actions for the evaluation of e′ e0 . . . em−1
are:
• Allocation of a stack frame;
• Transfer of the actual parameters , i.e. with:
CBV:
Evaluation of the actual parameters;
CBN:
Allocation of closures for the actual parameters;
• Evaluation of the expression e′ to an F-object;
• Application of the function.
Thus for CBN:
133
codeV (e′ e0 . . . em−1 ) ρ sd
=
mark A
// Allocation of the frame
codeC em−1 ρ (sd + 3)
codeC em−2 ρ (sd + 4)
...
codeC e0 ρ (sd + m + 2)
codeV e′ ρ (sd + m + 3)
// Evaluation of e′
apply
// corresponds to call
A:
...
To implement CBV, we use codeV instead of codeC for the arguments ei .
Example:
For ( f 42) , ρ = { f 7→ ( L, 2)} and sd = 2, we obtain with CBV:
2  mark A
5  loadc 42
6  mkbasic
6  pushloc 4
7  apply
A: ...
134
A Slightly Larger Example:
let a = 17 in let f = fun b → a + b in f 42
For CBV and
sd = 0
we obtain:
0  loadc 17
1  mkbasic
1  pushloc 0
2  mkvec 1
2  mkfunval A
2  jump B
A: 0  targ 1
0  pushglob 0
1  getbasic
1  pushloc 1
2  getbasic
2  add
1  mkbasic
1  return 1
B: 2  mark C
5  loadc 42
6  mkbasic
6  pushloc 4
7  apply
C: 3  slide 2
135
For the implementation of the new instruction, we must fix the organization of a
stack frame:
SP
local stack
Arguments
FP
PCold
FPold
GPold
0
-1
-2
136
3 org. cells
Different from the CMa, the instruction mark A already saves the return
address:
A
mark A
FP
FP
GP
GP
V
S[SP+1] = GP;
S[SP+2] = FP;
S[SP+3] = A;
FP = SP = SP + 3;
137
V
The instruction apply unpacks the F-object, a reference to which (hopefully)
resides on top of the stack, and continues execution at the address given there:
GP
PC
ap gp
GP
V
apply
F 42
PC 42
V n
h = S[SP];
if (H[h] != (F,_,_))
Error “no fun”;
else {
GP = h→gp; PC = h→cp;
for (i=0; i< h→ap→n; i++)
S[SP+i] = h→ap→v[i];
SP = SP + h→ap→n – 1;
}
138
V
Warning:
• The last element of the argument vector is the last to be put onto the stack.
This must be the first argument reference.
• This should be kept in mind, when we treat the packing of arguments of an
under-supplied function application into an F-object !!!
139
18
Over– and Undersupply of Arguments
The first instruction to be executed when entering a function body, i.e., after an
apply, is targ k .
This instruction checks whether there are enough arguments to evaluate the
body.
Only if this is the case, the execution of the code for the body is started.
Otherwise, i.e. in the case of under-supply, a new F-object is returned.
The test for the number of arguments uses: SP – FP
140
targ k is a complex instruction.
We decompose its execution in the case of under-supply into several steps:
targ k
= if (SP – FP < k) {
mkvec0;
// creating the argument vector
wrap;
// wrapping into an F-object
popenv;
// popping the stack frame
}
The combination of these steps into one instruction is a kind of optimization :-)
141
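The under-supply test itself is just an arithmetic comparison of register values; as a sketch (with our own function name):

```c
#include <assert.h>

/* targ k's under-supply test (sketch): SP - FP is the number of
   argument references currently in the frame. */
int undersupplied(int sp, int fp, int k) { return (sp - fp) < k; }
```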
The instruction mkvec0 takes all references from the stack above FP and
stores them into a vector:
V g
g
FP
mkvec0
FP
g = SP–FP; h = new (V, g);
SP = FP+1;
for (i=0; i<g; i++)
h→v[i] = S[SP + i];
S[SP] = h;
142
The instruction wrap wraps the argument vector together with the global
vector and PC-1 into an F-object:
ap gp
F 41
wrap
V
GP
V
V
GP
PC 42
PC 42
S[SP] = new (F, PC-1, S[SP], GP);
143
V
The instruction popenv finally releases the stack frame:
popenv
GP = S[FP-2];
S[FP-2] = S[SP];
PC = S[FP];
SP = FP - 2;
FP = S[FP-1];
144
Thus, we obtain for targ k in the case of under-supply: mkvec0 packs the
arguments above FP into a vector, wrap combines this vector with the Global
Vector and the address PC − 1 into an F-object, and popenv releases the stack
frame, leaving the F-object as result.
145 – 148
• The stack frame can be released after the execution of the body if exactly the
right number of arguments was available.
• If there is an oversupply of arguments, the body must evaluate to a function,
which consumes the rest of the arguments ...
• The check for this is done by return k:
return k
= if (SP − FP = k + 1)
// Done
popenv;
else {
// There are more arguments
slide k;
// another application
apply;
}
The execution of
return k results in:
149
Case: Done — popenv releases the stack frame and returns the computed value.
150
Case: Over-supply — slide k squeezes out the k consumed arguments; apply
then applies the resulting F-object to the remaining arguments.
151
19
let-rec-Expressions
Consider the expression
e ≡ let rec y1 = e1 and . . . and yn = en in e0 .
The translation of e must deliver an instruction sequence that
• allocates local variables y1 , . . . , yn ;
• in the case of
CBV:
evaluates e1 , . . . , en and binds the yi to their values;
CBN:
constructs closures for the e1 , . . . , en and binds the yi to them;
• evaluates the expression e0 and returns its value.
Warning:
In a letrec-expression, the definitions can use variables that will be allocated
only later! ==⇒ Dummy-values are put onto the stack before processing the
definition.
152
For CBN, we obtain:
codeV e ρ sd
= alloc n
// allocates local variables
codeC e1 ρ′ (sd + n)
rewrite n
...
codeC en ρ′ (sd + n)
rewrite 1
codeV e0 ρ′ (sd + n)
slide n
where
// deallocates local variables
ρ′ = ρ ⊕ { yi 7→ ( L, sd + i ) | i = 1, . . . , n}.
In the case of CBV, we also use codeV for the expressions e1 , . . . , en .
Warning:
Recursive definitions of basic values are undefined with CBV!!!
153
Example:
Consider the expression
e ≡ let rec f = fun x y → if y ≤ 1 then x else f ( x ∗ y) ( y − 1) in f 1
for ρ = ∅ and sd = 0. We obtain (for CBV):
0  alloc 1
1  pushloc 0
2  mkvec 1
2  mkfunval A
2  jump B
A: 0  targ 2
   ...
1  return 2
B: 2  rewrite 1
1  mark C
4  loadc 1
5  mkbasic
5  pushloc 4
6  apply
C: 2  slide 1
154
The instruction alloc n
n dummy nodes:
reserves n cells on the stack and initialises them with
C
C
C
C
alloc n
for (i=1; i<=n; i++)
S[SP+i] = new (C,-1,-1);
SP = SP + n;
155
The instruction rewrite n overwrites the contents of the heap cell pointed to
by the reference at S[SP–n]:
x
rewrite n
n
x
H[S[SP-n]] = H[S[SP]];
SP = SP - 1;
• The reference
S[SP – n]
remains unchanged!
• Only its contents are changed!
156
20
Closures and their Evaluation
• Closures are needed for the implementation of CBN and for functional
parameters.
• Before the value of a variable is accessed (with CBN), this value must be
available.
• Otherwise, a stack frame must be created to determine this value.
• This task is performed by the instruction eval.
157
eval can be decomposed into small actions:
eval
= if (H[ S[ SP]] ≡ (C, _, _)) {
mark0;
// allocation of the stack frame
pushloc 3;
// copying of the reference
apply0;
// corresponds to apply
}
• A closure can be understood as a parameterless function. Thus, there is no
need for an ap-component.
• Evaluation of the closure thus means evaluation of an application of this
function to 0 arguments.
• In contrast to mark A , mark0 dumps the current PC.
• The difference between apply and apply0 is that no argument vector
is put on the stack.
158
17
mark0
FP
GP
PC 17
FP
V
S[SP+1] = GP;
S[SP+2] = FP;
S[SP+3] = PC;
FP = SP = SP + 3;
159
GP
PC 17
V
cp gp
C 42
cp gp
V
C 42
apply0
GP
GP
PC
PC 42
h = S[SP]; SP--;
GP = h→gp; PC = h→cp;
We thus obtain for the instruction
eval:
160
mark0 saves GP, FP and the current PC, pushloc 3 copies the reference to the
C-object on top of the new frame, and apply0 then enters the closure's code
with its Global Vector.
161 – 162
apply0
The construction of a closure for an expression e consists of:
• Packing the bindings for the free variables into a vector;
• Creation of a C-object, which contains a reference to this vector and to the
code for the evaluation of e:
codeC e ρ sd
=
getvar z0 ρ sd
getvar z1 ρ (sd + 1)
...
getvar z g−1 ρ (sd + g − 1)
mkvec g
mkclos A
jump B
A:
codeV e ρ′ 0
update
B:
where
...
{ z0 , . . . , z g−1 } = free(e) and ρ′ = { zi 7→ ( G, i ) | i = 0, . . . , g − 1}.
163
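The set free(e) used by this schema can be computed by a straightforward traversal of the syntax tree. The following is only an illustrative sketch over a tiny hypothetical AST of our own (variables, applications, one-parameter functions; single-character names, sets encoded as strings):

```c
#include <assert.h>
#include <string.h>

/* Collecting the free variables of an expression (sketch). */
typedef struct exp {
    enum { VAR, APP, FUN } kind;
    char name;                  /* VAR: the variable; FUN: the parameter */
    struct exp *l, *r;          /* APP: e1 e2; FUN: body in l */
} exp;

static void add(char *set, char c) {
    if (!strchr(set, c)) {
        size_t n = strlen(set);
        set[n] = c; set[n + 1] = 0;
    }
}

/* accumulate into out the variables of e not listed in bound */
void freevars(const exp *e, const char *bound, char *out) {
    char b2[32];
    switch (e->kind) {
    case VAR:
        if (!strchr(bound, e->name)) add(out, e->name);
        break;
    case APP:
        freevars(e->l, bound, out);
        freevars(e->r, bound, out);
        break;
    case FUN:                    /* the parameter becomes bound in the body */
        strcpy(b2, bound);
        b2[strlen(bound)] = e->name;
        b2[strlen(bound) + 1] = 0;
        freevars(e->l, b2, out);
        break;
    }
}
```

For fun b → a b this yields {a}, since b is bound by the fun.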
Example:
Consider e ≡ a ∗ a with ρ = { a 7→ ( L, 0)} and sd = 1. We obtain:
1  pushloc 1
2  mkvec 1
2  mkclos A
2  jump B
A: 0  pushglob 0
1  eval
1  getbasic
1  pushglob 0
2  eval
2  getbasic
2  mul
1  mkbasic
1  update
B: ...
164
• The instruction mkclos A is analogous to the instruction mkfunval A.
• It generates a C-object, where the included code pointer is A.
C A
mkclos A
V
V
S[SP] = new (C, A, S[SP]);
165
In fact, the instruction
update is the combination of the two actions:
popenv
rewrite 1
It overwrites the closure with the computed value.
FP
FP
42
update
19
C
166
PC
42
GP
19
21
Optimizations I:
Global Variables
Observation:
• Functional programs construct many F- and C-objects.
• This requires the inclusion of (the bindings of) all global variables.
Recall, e.g., the construction of a closure for an expression e ...
167
codeC e ρ sd
=
getvar z0 ρ sd
getvar z1 ρ (sd + 1)
...
getvar z g−1 ρ (sd + g − 1)
mkvec g
mkclos A
jump B
A:
codeV e ρ′ 0
update
B:
where
...
{ z0 , . . . , z g−1 } = free(e) and ρ′ = { zi 7→ ( G, i ) | i = 0, . . . , g − 1}.
168
Idea:
• Reuse Global Vectors, i.e. share Global Vectors!
• Profitable in the translation of let-expressions or function applications: Build
one Global Vector for the union of the free-variable sets of all let-definitions
resp. all arguments.
• Allocate (references to ) global vectors with multiple uses in the stack frame
like local variables!
• Support the access to the current GP by an instruction
169
copyglob
:
GP
V
GP
copyglob
SP++;
S[SP] = GP;
170
V
• The optimization will cause Global Vectors to contain more components
than just references to the free variables that occur in one expression ...
Disadvantage: Superfluous components in Global Vectors prevent the
deallocation of already useless heap objects ==⇒ Space Leaks :-(
Potential Remedy: Deletion of references at the end of their life time.
171
22
Optimizations II:
Closures
In some cases, the construction of closures can be avoided, namely for
• Basic values,
• Variables,
• Functions.
172
Basic Values:
The construction of a closure for a basic value is at least as expensive as the
construction of the B-object itself!
Therefore:
codeC b ρ sd
= codeV b ρ sd = loadc b
mkbasic
This replaces:
mkvec 0
mkclos A
jump B
A: loadc b
mkbasic
update
B: ...
173
Variables:
Variables are either bound to values or to C-objects. Constructing another
closure is therefore superfluous. Therefore:
codeC x ρ sd
= getvar x ρ sd
This replaces:
getvar x ρ sd
mkvec 1
mkclos A
jump B
A: pushglob 0
eval
update
B: ...
Example:
e ≡ let rec a = b and b = 7 in a.
codeV e ∅ 0 produces:
0  alloc 2
2  pushloc 0
3  rewrite 2
2  loadc 7
3  mkbasic
3  rewrite 1
2  pushloc 1
3  eval
3  slide 2
174
The execution of this instruction sequence should deliver the basic value 7 ...
175
Stepping through the sequence:
alloc 2 allocates two dummy C-objects (for a and b);
pushloc 0 pushes the reference bound to b;
rewrite 2 copies the contents of b's dummy node into a's node;
loadc 7 and mkbasic build the B-object 7;
rewrite 1 overwrites b's node with it — a's node, however, still holds the dummy;
pushloc 1 pushes the reference bound to a;
eval follows the dummy code pointer −1:
Segmentation Fault !!
184
Apparently, this optimization was not quite correct :-(
The Problem:
Binding of variable y to variable x before x’s dummy node is replaced!!
==⇒
The Solution:
cyclic definitions: reject sequences of definitions like
let rec a = b and . . . and b = a in . . ..
acyclic definitions: order the definitions y = x such that the dummy node for
the right side of x is already overwritten.
185
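The acyclicity condition on variable-to-variable definitions can be checked mechanically. A sketch with our own encoding (def[i] = j means y_i is defined as the plain variable y_j, def[i] = -1 means the right-hand side is no plain variable):

```c
#include <assert.h>

/* Detecting cyclic variable-to-variable definitions in a let rec
   (sketch): a cycle means the optimized codeC x = getvar x would
   copy a dummy node that is never overwritten. */
int has_cycle(const int *def, int n) {
    for (int start = 0; start < n; start++) {
        int steps = 0, i = start;
        while (def[i] >= 0) {
            i = def[i];
            if (i == start) return 1;   /* chain returned to its origin */
            if (++steps > n) return 1;  /* safety bound */
        }
    }
    return 0;
}
```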
Functions:
Functions are values, which are not evaluated further. Instead of generating
code that constructs a closure for an F-object, we generate code that constructs
the F-object directly.
Therefore:
codeC (fun x0 . . . xk−1 → e) ρ sd
= codeV (fun x0 . . . xk−1 → e) ρ sd
186
23
The Translation of a Program Expression
Execution of a program e starts with
PC = 0
SP = FP = GP = −1
The expression e must not contain free variables.
The value of e should be determined and then a
executed.
code e
halt instruction should be
= codeV e ∅ 0
halt
187
Remarks:
• The code schemata as defined so far produce Spaghetti code.
• Reason: Code for function bodies and closures is placed directly behind the
instructions mkfunval resp. mkclos, with a jump over this code.
• Alternative: Place this code somewhere else, e.g. following the
halt-instruction:
Advantage: Elimination of the direct jumps following mkfunval and
mkclos.
Disadvantage: The code schemata are more complex as they would have to
accumulate the code pieces in a Code-Dump.
==⇒ Solution:
Disentangle the Spaghetti code in a subsequent optimization phase :-)
188
Example:
let a = 17 in let f = fun b → a + b in f 42
Disentanglement of the jumps produces:
0  loadc 17
1  mkbasic
1  pushloc 0
2  mkvec 1
2  mkfunval A
2  mark B
5  loadc 42
6  mkbasic
6  pushloc 4
7  eval
7  apply
B: 3  slide 2
1  halt
A: 0  targ 1
0  pushglob 0
1  eval
1  getbasic
1  pushloc 1
2  eval
2  getbasic
2  add
1  mkbasic
1  return 1
189
24
Structured Data
In the following, we extend our functional programming language by some
datatypes.
24.1
Tuples
Constructors:
Destructors:
(., . . . , .), k-ary with k ≥ 0;
# j for j ∈ N0
(Projections)
We extend the syntax of expressions correspondingly:
e
::=
. . . | ( e 0 , . . . , e k −1 ) | # j e
| let ( x0 , . . . , xk−1 ) = e1 in e0
190
• In order to construct a tuple, we collect a sequence of references on the stack.
Then we construct a vector of these references in the heap using mkvec
codeV (e0 , . . . , ek−1 ) ρ sd
= codeC e0 ρ sd
codeC e1 ρ (sd + 1)
...
codeC ek−1 ρ (sd + k − 1)
mkvec k
codeV (# j e) ρ sd
= codeV e ρ sd
get j
eval
In the case of CBV, we directly compute the values of the ei .
191
j
V g
V g
get j
if (S[SP] == (V,g,v))
S[SP] = v[j];
else Error “Vector expected!”;
192
Inversion:
Accessing all components of a tuple simultaneously:
e ≡ let ( y0 , . . . , yk−1 ) = e1 in e0
This is translated as follows:
codeV e ρ sd
= codeV e1 ρ sd
getvec k
codeV e0 ρ′ (sd + k )
slide k
where
ρ′ = ρ ⊕ { yi 7→ ( L, sd + i + 1) | i = 0, . . . , k − 1}.
The instruction getvec k pushes the components of a vector of length k onto
the stack:
193
V k
V k
getvec k
if (S[SP] == (V,k,v)) {
SP--;
for(i=0; i<k; i++) {
SP++; S[SP] = v[i];
}
} else Error “Vector expected!”;
194
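Seen from C, get j and getvec k are plain vector indexing; a toy model (our own struct, not the machine's heap representation):

```c
#include <assert.h>

/* Tuples as vectors (sketch): get selects one component,
   getvec copies all k components out, onto the "stack" area s. */
typedef struct { int n; int v[8]; } Vec;

int get(const Vec *t, int j) { return t->v[j]; }

void getvec(const Vec *t, int k, int *s) {
    for (int i = 0; i < k; i++)
        s[i] = t->v[i];
}
```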
24.2
Lists
Lists are constructed by the constructors:
[]      “Nil”, the empty list;
“::”    “Cons”, right-associative, takes an element and a list.
Access to list components is possible by match-expressions ...
Example:
The append function app:
app = fun l y → match l with
          []     → y
        | h :: t → h :: (app t y)
195
accordingly, we extend the syntax of expressions:
e
::=
. . . | [] | (e1 :: e2 )
| (match e0 with [] → e1 | h :: t → e2 )
Additionally, we need new heap objects:
L
empty list
Nil
s[0]
s[1]
L Cons
non−empty list
196
24.3
Building Lists
The new instructions
nil and
cons are introduced for building list nodes.
We translate for CBN:
codeV [] ρ sd
= nil
codeV (e1 :: e2 ) ρ sd
= codeC e1 ρ sd
codeC e2 ρ (sd + 1)
cons
Note:
• With CBN: Closures are constructed for the arguments of “::”;
• With CBV: Arguments of “::” are evaluated :-)
197
L Nil
nil
SP++; S[SP] = new (L,Nil);
198
L Cons
cons
S[SP-1] = new (L,Cons, S[SP-1], S[SP]);
SP--;
199
24.4
Pattern Matching
Consider the expression
e ≡ match e0 with [] → e1 | h :: t → e2 .
Evaluation of e requires:
• evaluation of e0 ;
• check, whether resulting value v is an L-object;
• if v is the empty list, evaluation of e1 ...
• otherwise storing the two references of v on the stack and evaluation of e2 .
This corresponds to binding h and t to the two components of v.
200
In consequence, we obtain (for CBN as for CBV):
codeV e ρ sd
=
codeV e0 ρ sd
tlist A
codeV e1 ρ sd
jump B
A:
codeV e2 ρ′ (sd + 2)
slide 2
B:
...
where ρ′ = ρ ⊕ { h 7→ ( L, sd + 1), t 7→ ( L, sd + 2)}.
The new instruction tlist A does the necessary checks and (in the case of
Cons) allocates two new local variables:
201
L Nil
tlist A
h = S[SP];
if (H[h] != (L,...))
Error “no list!”;
if (H[h] == (_,Nil)) SP--;
...
202
L Nil
L Cons
L Cons
tlist A
PC A
... else {
S[SP+1] = S[SP]→s[1];
S[SP] = S[SP]→s[0];
SP++; PC = A;
}
203
Example:
The (disentangled) body of the function app with app 7→ ( G, 0) :
0  targ 2
0  pushloc 0
1  eval
1  tlist A
0  pushloc 1
1  eval
1  jump B
A: 2  pushloc 1
3  pushglob 0
4  pushloc 2
5  pushloc 6
6  mkvec 3
4  mkclos C
4  cons
3  slide 2
B: 1  return 2
C: 0  mark D
3  pushglob 2
4  pushglob 1
5  pushglob 0
6  eval
6  apply
D: 1  update
Note:
Datatypes with more than two constructors need a generalization of the tlist
instruction, corresponding to a switch-instruction :-)
204
24.5
Closures of Tuples and Lists
The general schema for
codeC (e0 , . . . , ek−1 ) ρ sd
codeC
can be optimized for tuples and lists:
= codeV (e0 , . . . , ek−1 ) ρ sd = codeC e0 ρ sd
codeC e1 ρ (sd + 1)
...
codeC ek−1 ρ (sd + k − 1)
mkvec k
codeC [] ρ sd
= codeV [] ρ sd
= nil
codeC (e1 : e2 ) ρ sd
= codeV (e1 : e2 ) ρ sd
= codeC e1 ρ sd
codeC e2 ρ (sd + 1)
cons
205
25
Last Calls
A function application is called a last call in an expression e if this application
could deliver the value for e.
A last call usually is the outermost application of a defining expression.
A function definition is called tail recursive if all recursive calls are last calls.
Examples:
r t ( h :: y) is a last call in match x with [] → y | h :: t → r t ( h :: y)
f ( x − 1) is not a last call in if x ≤ 1 then 1 else x ∗ f ( x − 1)
Observation:
Last calls in a function body need no new stack frame!
==⇒
Automatic transformation of tail recursion into loops!!!
206
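What this transformation buys can be seen directly in C: the tail-recursive reverse-append r used as an example above becomes a loop that runs in a single frame. A sketch (our own name rev_app; the list is relinked destructively for brevity, unlike the functional original):

```c
#include <assert.h>

/* Tail recursion turned into a loop (sketch): each former
   recursive call r t (h :: y) is one loop iteration. */
typedef struct node { int head; struct node *tail; } list;

list *rev_app(list *x, list *y) {
    while (x != 0) {
        list *t = x->tail;
        x->tail = y;            /* h :: y */
        y = x;
        x = t;
    }
    return y;
}
```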
The code for a last call l ≡ (e′ e0 . . . em−1 ) inside a function f with k arguments
must
1. allocate the arguments ei and evaluate e′ to a function (note: all this inside
f ’s frame!);
2. deallocate the local variables and the k consumed arguments of f ;
3. execute an apply.
codeV l ρ sd
= codeC em−1 ρ sd
codeC em−2 ρ (sd + 1)
...
codeC e0 ρ (sd + m − 1)
codeV e′ ρ (sd + m )
// Evaluation of the function
move r (m + 1)
// Deallocation of r cells
apply
where r = sd + k is the number of stack cells to deallocate.
207
Example:
The body of the function
r = fun x y → match x with [] → y | h :: t → r t ( h :: y)
0  targ 2
0  pushloc 0
1  eval
1  tlist A
0  pushloc 1
1  eval
1  jump B
A: 2  pushloc 1
3  pushloc 4
4  cons
3  pushloc 1
4  pushglob 0
5  eval
5  move 4 3
apply
slide 2
B: 1  return 2
Since the old stack frame is kept, return 2 will only be reached by the direct
jump at the end of the []-alternative.
208
k
move r k
r
SP = SP - k - r;
for (i=1; i≤k; i++)
S[SP+i] = S[SP+i+r];
SP = SP + k;
209
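The copying loop of move r k can be exercised in isolation (a sketch with plain C arrays standing in for S and SP):

```c
#include <assert.h>

/* Simulation of move r k (sketch): the topmost k cells are copied
   r cells downwards, squeezing out the consumed frame contents. */
static int S[64];
static int SP;

void move(int r, int k) {
    SP = SP - k - r;
    for (int i = 1; i <= k; i++)
        S[SP + i] = S[SP + i + r];
    SP = SP + k;
}
```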
Threads
210
26
The Language ThreadedC
We extend C by a simple thread concept. In particular, we provide functions for:
• generating new threads: create();
• terminating a thread: exit();
• waiting for termination of a thread: join();
• mutual exclusion: lock(), unlock(); ...
In order to enable a parallel program execution, we extend the abstract machine
(what else? :-)
211
27
Storage Organization
All threads share the same common code store and heap:
C
0
1
0
1
PC
H
2
NP
212
... similar to the CMa, we have:
C
=
Code Store – contains the CMa program;
every cell contains one instruction;
PC
=
Program-Counter – points to the next executable instruction;
H
=
Heap –
every cell may contain a base value or an address;
the globals are stored at the bottom;
NP
=
New-Pointer – points to the first free cell.
For a simplification, we assume that the heap is stored in a separate segment.
The function malloc() then fails whenever NP exceeds the topmost border.
213
Every thread on the other hand needs its own stack:
S
SP
FP
H
SSet
214
In constrast to the CMa, we have:
SSet
=
Set of Stacks – contains the stacks of the threads;
every cell may contain a base value or an address;
S
=
common address space for heap and the stacks;
SP
=
Stack-Pointer – points to the current topmost occupied stack cell;
FP
=
Frame-Pointer – points to the current stack frame.
Warning:
• If all references pointed into the heap, we could use separate address spaces
for each stack. Besides SP and FP, we would have to record the number of
the current stack :-)
• In the case of C, though, we must assume that all storage regions live
within the same address space — only at different locations :-)
SP and FP then uniquely identify storage locations.
• For simplicity, we omit the extreme-pointer EP.
215
28
The Ready-Queue
Idea:
• Every thread has a unique number tid.
• A table TTab allows to determine for every tid the corresponding thread.
• At every point in time, there can be several executable threads, but only one
running thread (per processor :-)
• The tid of the currently running thread is kept in the register CT
(Current Thread).
• The function tid self () returns the tid of the current thread.
Accordingly:
codeR self () ρ
216
=
self
... where the instruction self pushes the content of the register CT onto
the (current) stack:
self
S[SP++] = CT;
217
• The remaining executable threads (more precisely, their tid’s) are
maintained in the queue RQ (Ready-Queue).
• For queues, we need the functions:
void enqueue (queue q, tid t),
tid dequeue (queue q)
which insert a tid into a queue and return the first one, respectively ...
218
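A minimal realization of these queue operations (a sketch: a fixed-size ring buffer of tids; dequeue returns −1 on an empty queue, matching the convention that a failed dequeue yields a value < 0):

```c
#include <assert.h>

/* A minimal ready-queue (sketch). */
#define QMAX 16
typedef struct { int buf[QMAX]; int head, len; } queue;

void enqueue(queue *q, int t) {
    q->buf[(q->head + q->len) % QMAX] = t;   /* append at the end */
    q->len++;
}

int dequeue(queue *q) {
    if (q->len == 0) return -1;              /* failure: value < 0 */
    int t = q->buf[q->head];
    q->head = (q->head + 1) % QMAX;
    q->len--;
    return t;
}
```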
CT / RQ / TTab snapshots: enqueue(RQ, 13) appends tid 13 at the end of the
ready-queue; CT = dequeue(RQ) removes the first tid from the queue and
makes it the current thread.
219 – 223
If a call to dequeue () fails, it returns a value < 0 :-)
The thread table must contain, for every thread, all information which is needed
for its execution. In particular, it contains the registers PC, SP and FP:
0: FP    1: PC    2: SP
Interrupting the current thread therefore requires to save these registers:
void save () {
TTab[CT][0] = FP;
TTab[CT][1] = PC;
TTab[CT][2] = SP;
}
224
Analogously, we restore these registers by calling the function:
void restore () {
FP = TTab[CT][0];
PC = TTab[CT][1];
SP = TTab[CT][2];
}
Thus, we can realize an instruction
yield which causes a thread-switch:
tid ct = dequeue ( RQ );
if (ct ≥ 0) {
save (); enqueue ( RQ, CT );
CT = ct;
restore ();
}
Only if the ready-queue is non-empty, the current thread is replaced :-)
225
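Putting save, restore and the switch together (a sketch with global C variables for the registers; the ready-queue is reduced to a single waiting slot next_tid, −1 meaning empty, to keep the example small):

```c
#include <assert.h>

/* Thread switch (sketch): TTab rows hold FP, PC, SP per tid. */
static int TTab[4][3];
static int FP, PC, SP, CT;
static int next_tid = -1;

void save(void)    { TTab[CT][0] = FP; TTab[CT][1] = PC; TTab[CT][2] = SP; }
void restore(void) { FP = TTab[CT][0]; PC = TTab[CT][1]; SP = TTab[CT][2]; }

void yield(void) {
    if (next_tid >= 0) {         /* only switch if someone is waiting */
        save();
        int ct = next_tid;
        next_tid = CT;           /* old thread becomes the waiting one */
        CT = ct;
        restore();
    }
}
```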
29
Switching between Threads
Problem:
We want to give each executable thread a fair chance to be completed.
==⇒
• Every thread must sooner or later be scheduled for running.
• Every thread must sooner or later be interrupted.
Possible Strategies:
• Thread switch only at explicit calls to a function yield() :-(
• Thread switch after every instruction ==⇒ too expensive :-(
• Thread switch after a fixed number of steps ==⇒ we must install a
counter and execute yield at dynamically chosen points :-(
226
We insert thread switches at selected program points ...
• at the beginning of function bodies;
• before every jump whose target does not exceed the current PC ...
==⇒ rare
:-))
The modified scheme for loops s ≡ while (e) s′ then yields:
code s ρ
= A: codeR e ρ
     jumpz B
     code s′ ρ
     yield
     jump A
  B: ...
227
Note:
• If-then-else-Statements do not necessarily contain thread switches.
• do-while-Loops require a thread switch at the end of the condition.
• Every loop should contain (at least) one thread switch
:-)
• Loop unrolling reduces the number of thread switches.
• At the translation of switch-statements, we created a jump table behind the
code for the alternatives. Nonetheless, we can avoid thread switches here.
• At freely programmed uses of jumpi as well as jumpz we should
also insert thread switches before the jump (or at the jump target).
• If we want to reduce the number of executed thread switches even further,
we could switch threads, e.g., only at every 100th call of yield ...
228
30
Generating New Threads
We assume that the expression s ≡ create (e0 , e1 ) first evaluates the
expressions ei to the values f , a and then creates a new thread which computes
f ( a ).
If thread creation fails, s returns the value −1.
Otherwise, s returns the new thread’s tid.
Tasks of the Generated Code:
• Evaluation of the ei ;
• Allocation of a new run-time stack together with a stack frame for the
evaluation of f ( a );
• Generation of a new tid;
• Allocation of a new entry in the TTab;
• Insertion of the new tid into the ready-queue.
229
The translation of s then is quite simple:
codeR s ρ
= codeR e0 ρ
codeR e1 ρ
initStack
initThread
where we assume that the argument value occupies 1 cell :-)
For the implementation of initStack we need a run-time function
newStack() which returns a pointer onto the first element of a new stack:
230
SP
SP
newStack()
If the creation of a new stack fails, the value 0 is returned.
231
SP
SP
initStack
SP++; S[SP] = newStack();
if (S[SP]) {
S[S[SP]+1] = -1;
S[S[SP]+2] = f;
S[S[SP]+3] = S[SP-1];
S[SP-1] = S[SP]; SP--;
}
else S[SP = SP - 2] = -1;
232
f
−1
Note:
• The continuation address f points to the (fixed) code for the termination
of threads.
• Inside the stack frame, we no longer allocate space for the EP ==⇒ the
return value has relative address −2.
• The bottom stack frame can be identified through FPold = -1 :-)
In order to create new thread ids, we introduce a new register TC (Thread
Count).
Initially, TC has the value 0 (corresponds to the tid of the initial thread).
Before thread creation, TC is incremented by 1.
233
initThread
234
if (S[SP] ≥ 0) {
tid = ++TC;
TTab[tid][0] = S[SP]-1;
TTab[tid][1] = S[SP-1];
TTab[tid][2] = S[SP];
S[--SP] = tid;
enqueue( RQ, tid );
}
235
31
Terminating Threads
Termination of a thread (usually :-) returns a value. There are two (regular) ways
to terminate a thread:
1. The initial function call has terminated. Then the return value is the return
value of the call.
2. The thread executes the statement exit (e); Then the return value equals
the value of e.
Warning:
• We want to return the return value in the bottom stack cell.
• exit may occur arbitrarily deeply nested inside a recursion. Then we
de-allocate all stack frames ...
• ... and jump to the terminal treatment of threads at address f .
236
Therefore, we translate:
code exit (e); ρ
= codeR e ρ
  exit
  term
  next
The instruction term is explained later :-)
The instruction exit successively pops all stack frames:
result = S[SP];
while (FP != –1) {
SP = FP–2;
FP = S[FP–1];
}
S[SP] = result;
237
238
The instruction next activates the next executable thread:
in contrast to yield, the current thread is not inserted into RQ .
239
If the queue RQ is empty, we additionally terminate the whole program:
if (0 > (ct = dequeue( RQ ))) halt;
else {
save ();
CT = ct;
restore ();
}
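The dispatch step of next can be sketched in C. The names vm_next, halted, and the array-based queue are illustrative assumptions; save/restore stand in for copying the registers to and from TTab:

```c
#include <assert.h>

int RQ[8] = {3, 5};                /* toy ready-queue, already holding two tids */
int rq_head = 0, rq_len = 2;
int CT = 7;                        /* current thread */
int halted = 0;

int dequeue(void) { return rq_head < rq_len ? RQ[rq_head++] : -1; }
void save(void)    { /* would copy SP, PC, FP into TTab[CT] */ }
void restore(void) { /* would load SP, PC, FP from TTab[CT] */ }

/* next: switch to the next executable thread; halt if RQ is empty */
void vm_next(void) {
    int ct = dequeue();
    if (ct < 0) { halted = 1; return; }    /* queue empty: halt */
    save();
    CT = ct;
    restore();
}
```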
240
32
Waiting for Termination
Occasionally, a thread may only continue with its execution if some other thread has terminated. For that, we have the expression join (e) where we assume that e evaluates to a thread id tid.
• If the thread with the given tid has already terminated, we return its return value.
• If it has not yet terminated, we interrupt the current thread execution. We insert the current thread into the queue of threads already waiting for the termination. We save the current registers and switch to the next executable thread.
• Threads waiting for termination are maintained in the table JTab.
• There, we also store the return values of threads :-)
241
Example:
[Diagram: the table JTab for five threads, together with CT and RQ.]
Thread 0 is running, thread 1 could run, threads 2 and 3 wait for the termination of 1, and thread 4 waits for the termination of 3.
242
Thus, we translate:
codeR join (e) ρ = codeR e ρ
                   join
                   finalize
... where the instruction join is defined by:
tid = S[SP];
if (TTab[tid][1] ≥ 0) {
    enqueue( JTab[tid][0], CT );
    next
}
243
... accordingly, the instruction finalize replaces the tid by the stored return value:
[Stack diagram: finalize — the tid 5 on top of the stack is replaced by the return value 42 from JTab[5][1].]
S[SP] = JTab[tid][1];
244
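The interplay of join and finalize can be sketched in C. This is a toy model under stated assumptions: TTab[tid][1] < 0 marks a terminated thread, JTab[tid][1] holds its return value, and the flag blocked stands in for enqueueing CT and executing next:

```c
#include <assert.h>

int TTab[4][3] = {{0, -1, 0}, {0, 7, 0}};  /* thread 0 terminated, thread 1 alive */
int JTab[4][2] = {{0, 42}, {0, 0}};        /* thread 0 returned 42 */
int S[8], SP = 0;
int blocked = 0;

/* join: block if the thread on top of the stack is still alive */
void vm_join(void) {
    int tid = S[SP];
    if (TTab[tid][1] >= 0)
        blocked = 1;            /* really: enqueue(JTab[tid][0], CT); next */
}

/* finalize: replace the tid by the stored return value */
void vm_finalize(void) {
    S[SP] = JTab[S[SP]][1];
}
```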
The instruction sequence:
term
next
is executed before a thread is terminated. Therefore, we store it at the location f.
The instruction next switches to the next executable thread. Before that, though,
• ... the last stack frame must be popped and the result be stored in the table JTab;
• ... the thread must be marked as terminated, e.g., by additionally setting the PC to −1;
• ... all threads must be notified which have waited for the termination.
For the instruction term this means:
245
PC = -1;
JTab[CT][1] = S[SP];
freeStack(SP);
while (0 ≤ tid = dequeue( JTab[CT][0] ))
    enqueue( RQ, tid );
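The effect of term can be sketched in C with toy queues. The split of JTab into separate arrays for waiters and return values, and the omission of freeStack, are simplifications of this sketch:

```c
#include <assert.h>

int PC, CT = 1, SP = 10;
int S[32];
int JTab_q[4][4];              /* JTab[t][0]: waiting threads (toy queue) */
int JTab_qlen[4];
int JTab_ret[4];               /* JTab[t][1]: return value */
int RQ[8], rq_len = 0;

/* term: mark the thread terminated, publish its result, wake all joiners */
void vm_term(void) {
    PC = -1;                                   /* mark as terminated */
    JTab_ret[CT] = S[SP];                      /* store the return value */
    /* freeStack(SP) omitted in this sketch */
    for (int i = 0; i < JTab_qlen[CT]; i++)    /* move all waiters to RQ */
        RQ[rq_len++] = JTab_q[CT][i];
    JTab_qlen[CT] = 0;
}
```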
The run-time function freeStack(int adr) removes the (one-element) stack at the location adr:
[Diagram: freeStack(adr) de-allocates the stack at address adr.]
246
33
Mutual Exclusion
A mutex is an (abstract) datatype (in the heap) which should allow the programmer to dedicate exclusive access to a shared resource (mutual exclusion).
The datatype supports the following operations:
Mutex ∗ newMutex ();        — creates a new mutex;
void lock (Mutex ∗me);      — tries to acquire the mutex;
void unlock (Mutex ∗me);    — releases the mutex;
Warning:
A thread is only allowed to release a mutex if it has owned it beforehand :-)
247
A mutex me consists of:
• the tid of the current owner (or −1 if there is no one);
• the queue BQ of blocked threads which want to acquire the mutex.
[Diagram: a mutex in the heap, with its owner field and its blocked-queue BQ.]
248
Then we translate:
codeR newMutex () ρ = newMutex
where:
[Diagram: newMutex allocates a mutex in the heap, with owner −1 and empty BQ, and pushes its address.]
249
Then we translate:
code lock (e); ρ = codeR e ρ
                   lock
where:
[Diagram: lock — the mutex at address 17 is free (owner −1); the current thread CT acquires it, and the address is popped.]
250
If the mutex is already owned by someone, the current thread is interrupted:
[Diagram: lock — the mutex at address 17 is owned by thread 5; CT is enqueued into its blocked-queue.]
if (S[S[SP]] < 0) S[S[SP--]] = CT;
else {
    enqueue( S[SP--]+1, CT );
    next;
}
251
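Both branches of lock can be sketched in C. A minimal sketch: the mutex lives at a toy stack address, the flat array BQ stands in for the per-mutex blocked-queue, and switched stands in for next:

```c
#include <assert.h>

int S[16];
int SP, CT = 3;
int BQ[4], bq_len = 0;        /* toy blocked-queue of the mutex */
int switched = 0;

/* lock: acquire a free mutex, otherwise block the current thread */
void vm_lock(void) {
    int m = S[SP];
    if (S[m] < 0) {               /* free: acquire it */
        S[m] = CT; SP--;
    } else {                      /* owned: block on BQ */
        BQ[bq_len++] = CT; SP--;  /* really: enqueue(m+1, CT) */
        switched = 1;             /* really: next */
    }
}
```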
Accordingly, we translate:
code unlock (e); ρ = codeR e ρ
                     unlock
where:
[Diagram: unlock — the mutex at address 5, owned by CT, is handed over to the first thread (17) in its blocked-queue.]
252
If the queue BQ is empty, we release the mutex:
[Diagram: unlock — the blocked-queue of the mutex at address 5 is empty; the owner field is reset to −1.]
if (S[S[SP]] ≠ CT) Error ("Illegal unlock!");
if (0 > tid = dequeue( S[SP]+1 )) S[S[SP--]] = -1;
else {
    S[S[SP--]] = tid;
    enqueue( RQ, tid );
}
253
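The hand-over logic of unlock can be sketched in C under the same toy assumptions as for lock (flat arrays for BQ and RQ, an error flag instead of Error):

```c
#include <assert.h>

int S[16];
int SP, CT = 3;
int BQ[4], bq_head = 0, bq_len = 0;   /* toy blocked-queue of the mutex */
int RQ[8], rq_len = 0;
int error = 0;

/* unlock: release the mutex, or hand it over to the first blocked thread */
void vm_unlock(void) {
    int m = S[SP];
    if (S[m] != CT) { error = 1; return; }        /* Error("Illegal unlock!") */
    if (bq_head >= bq_len) {                      /* BQ empty: release */
        S[m] = -1; SP--;
    } else {
        int tid = BQ[bq_head++];                  /* hand the mutex over */
        S[m] = tid; SP--;
        RQ[rq_len++] = tid;                       /* enqueue(RQ, tid) */
    }
}
```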
34
Waiting for Better Weather
It may happen that a thread owns a mutex but must wait until some extra condition is true.
Then we want the thread to remain inactive until it is told otherwise.
For that, we use condition variables. A condition variable consists of a queue WQ of waiting threads :-)
[Diagram: a condition variable with its (empty) waiting-queue WQ.]
254
For condition variables, we introduce the functions:
CondVar ∗ newCondVar ();              — creates a new condition variable;
void wait (CondVar ∗v, Mutex ∗me);    — enqueues the current thread;
void signal (CondVar ∗v);             — re-animates one waiting thread;
void broadcast (CondVar ∗v);          — re-animates all waiting threads.
255
Then we translate:
codeR newCondVar () ρ = newCondVar
where:
[Diagram: newCondVar allocates a condition variable with an empty WQ and pushes its address.]
256
After enqueuing the current thread, we release the mutex. After re-animation, though, we must acquire the mutex again.
Therefore, we translate:
code wait (e0 , e1 ); ρ = codeR e1 ρ
                          codeR e0 ρ
                          wait
                          dup
                          unlock
                          next
                          lock
where ...
257
[Diagram: wait — the mutex (owned by CT) at S[SP−1] and the condition variable at S[SP]; CT is enqueued into WQ, and the cond-var address is popped.]
if (S[S[SP-1]] ≠ CT) Error ("Illegal wait!");
enqueue( S[SP], CT ); SP--;
258
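The ownership check and enqueueing done by wait can be sketched in C (toy arrays; the flat array WQ stands in for the condition variable's waiting-queue):

```c
#include <assert.h>

int S[16];
int SP, CT = 3;
int WQ[4], wq_len = 0;    /* toy waiting-queue of the condition variable */
int error = 0;

/* stack layout: S[SP-1] = mutex address, S[SP] = cond-var address */
void vm_wait(void) {
    int m = S[SP - 1];
    if (S[m] != CT) { error = 1; return; }   /* must own the mutex */
    WQ[wq_len++] = CT;                       /* really: enqueue(S[SP], CT) */
    SP--;                                    /* pop the cond-var address */
}
```

The subsequent dup, unlock, next, lock from the translation scheme then operate on the mutex address left on top of the stack.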
Accordingly, we translate:
code signal (e); ρ = codeR e ρ
                     signal
[Diagram: signal — one thread (17) is moved from the waiting-queue WQ into RQ.]
if (0 ≤ tid = dequeue( S[SP] ))
    enqueue( RQ, tid );
SP--;
259
Analogously:
code broadcast (e); ρ = codeR e ρ
                        broadcast
where the instruction broadcast enqueues all threads from the queue WQ into the ready-queue RQ:
while (0 ≤ tid = dequeue( S[SP] ))
    enqueue( RQ, tid );
SP--;
Warning:
The re-animated threads are not blocked !!!
When they become running, though, they first have to acquire their mutex :-)
260
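The difference between signal and broadcast can be sketched in C side by side (the array-based queues and the vm_ names are illustrative assumptions):

```c
#include <assert.h>

int WQ[8] = {4, 6, 7};                 /* waiting threads of the cond-var */
int wq_head = 0, wq_len = 3;
int RQ[8], rq_len = 0;                 /* toy ready-queue */
int S[4], SP = 0;                      /* S[SP] = cond-var address (unused here) */

int dequeue(void) { return wq_head < wq_len ? WQ[wq_head++] : -1; }

/* signal moves one waiter into RQ, broadcast moves all of them */
void vm_signal(void) {
    int tid = dequeue();
    if (tid >= 0) RQ[rq_len++] = tid;
    SP--;
}
void vm_broadcast(void) {
    int tid;
    while ((tid = dequeue()) >= 0) RQ[rq_len++] = tid;
    SP--;
}
```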
35
Example: Semaphores
A semaphore is an abstract datatype which controls the access of a bounded number of (identical) resources.
Operations:
Sema ∗ newSema (int n);  — creates a new semaphore;
void Up (Sema ∗s);       — increases the number of free resources;
void Down (Sema ∗s);     — decreases the number of available resources.
261
Therefore, a semaphore consists of:
• a counter of type int;
• a mutex for synchronizing the semaphore operations;
• a condition variable.
typedef struct {
    Mutex ∗ me;
    CondVar ∗ cv;
    int count;
} Sema;
262
Sema ∗ newSema (int n) {
Sema ∗ s;
s = (Sema ∗) malloc (sizeof (Sema));
s→me = newMutex ();
s→cv = newCondVar ();
s→count = n;
return (s);
}
263
The translation of the body amounts to:
alloc 1
loadc 3
new
storer 2
pop
newMutex
loadr 2
store
pop
newCondVar
loadr 2
loadc 1
add
store
pop
loadr 1
loadr 2
loadc 2
add
store
pop
loadr 2
storer -2
return
264
The function Down() decrements the counter. If the counter becomes negative, wait is called:
void Down (Sema ∗ s) {
    Mutex ∗ me;
    me = s→me;
    lock (me);
    s→count--;
    if (s→count < 0) wait (s→cv, me);
    unlock (me);
}
265
The translation of the body amounts to:
alloc 1
loadr 1
load
storer 2
lock
loadr 1
loadc 2
add
load
loadc 1
sub
loadr 1
loadc 2
add
store
loadc 0
le
jumpz A
loadr 2
loadr 1
loadc 1
add
load
wait
dup
unlock
next
lock
A: loadr 2
unlock
return
266
The function Up() increments the counter again. If it is afterwards not yet positive, there still must exist waiting threads. One of these is sent a signal:
void Up (Sema ∗ s) {
    Mutex ∗ me;
    me = s→me;
    lock (me);
    s→count++;
    if (s→count ≤ 0) signal (s→cv);
    unlock (me);
}
267
The translation of the body amounts to:
alloc 1
loadr 1
load
storer 2
lock
loadr 1
loadc 2
add
load
loadc 1
add
loadr 1
loadc 2
add
store
loadc 0
leq
jumpz A
loadr 1
loadc 1
add
load
signal
A: loadr 2
unlock
return
268
36
Stack-Management
Problem:
• All threads live within the same storage.
• Every thread requires its own stack (at least conceptually).
1. Idea:
Allocate for each new thread a fixed amount M of storage space.
==⇒ Then we implement:
void *newStack() { return malloc(M); }
void freeStack(void *adr) { free(adr); }
269
Problem:
• Some threads consume much, some only little stack space.
• The necessary space is typically statically unknown :-(
2. Idea:
• Maintain all stacks in one joint Frame-Heap FH :-)
• Take care that the space inside the stack frame is sufficient at least for the current function call.
• A global stack-pointer GSP points to the overall topmost stack cell ...
270
[Diagram: the frame-heap — the stack frames of thread 1 and thread 2 lie below the global stack pointer GSP.]
Allocation and de-allocation of a stack frame makes use of the run-time
functions:
int newFrame(int size) {
int result = GSP;
GSP = GSP+size;
return result;
}
void freeFrame(int sp, int size);
271
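The bump-pointer allocation of newFrame is easy to exercise in C; a minimal sketch of exactly the function shown above:

```c
#include <assert.h>

int GSP = 0;    /* global stack pointer of the joint frame-heap */

/* newFrame: reserve `size` cells on top of the frame-heap */
int newFrame(int size) {
    int result = GSP;
    GSP = GSP + size;
    return result;
}
```

Two successive allocations return adjacent, non-overlapping frames.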
Warning:
The de-allocated block may reside inside the stack :-(
==⇒ We maintain a list of freed stack blocks :-)
[Diagram: a list of freed stack blocks, each given by its upper and lower boundary.]
This list supports a function
void insertBlock(int max, int min)
which allows to free single blocks.
• If the block is on top of the stack, we pop the stack immediately;
• ... together with the blocks below – given that these have already been marked as de-allocated.
• If the block is inside the stack, we merge it with neighbored free blocks:
272
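The top-of-stack case of insertBlock can be sketched in C. This toy version keeps freed blocks in an unordered array and omits the merging of neighboring inner blocks; the names and data layout are assumptions of this sketch:

```c
#include <assert.h>

int GSP = 40;                               /* top of the frame-heap */
int free_max[8], free_min[8], nfree = 0;    /* freed inner blocks */

/* insertBlock: pop a top block (and freed blocks below it),
   otherwise record the block as freed */
void insertBlock(int max, int min) {
    if (max == GSP) {
        GSP = min;                          /* pop the top block ... */
        int again = 1;
        while (again) {                     /* ... and freed blocks below it */
            again = 0;
            for (int i = 0; i < nfree; i++)
                if (free_max[i] == GSP) {
                    GSP = free_min[i];
                    free_max[i] = free_max[--nfree];
                    free_min[i] = free_min[nfree];
                    again = 1;
                    break;
                }
        }
    } else {                                /* inner block: just record it */
        free_max[nfree] = max;
        free_min[nfree] = min;
        nfree++;
    }
}
```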
[Diagrams: freeBlock(...) — a block on top of the stack is popped together with already-freed blocks below it; an inner block is merged with neighboring free blocks.]
275
Approach:
We allocate a fresh block for every function call ...
Problem:
When requesting the block before the call, we do not yet know the space consumption of the called function :-(
==⇒ We request the new block only after entering the function body!
276
[Diagram: before the call — SP points into the old block.]
Organisational cells as well as actual parameters must be allocated inside the old block ...
277
[Diagram: the actual parameters lie below SP in the old block.]
When entering the new function, we now allocate the new block ...
278
[Diagram: FP points to the organizational cells in the old block; the local variables reside in the newly allocated block below SP.]
In particular, the local variables reside in the new block ...
279
==⇒ We address ...
• the formal parameters relative to the frame-pointer;
• the local variables relative to the stack-pointer :-)
==⇒ We must re-organize the complete code generation ... :-(
Alternative:
Passing of parameters in registers ... :-)
280
[Diagram: the argument registers hold the parameter values; SP still points into the old block.]
The values of the actual parameters are determined before allocation of the new stack frame.
281
[Diagram: the complete frame — organizational cells, FP, and the actual parameters copied from the argument registers — lies inside the new block.]
The complete frame is allocated inside the new block – plus the space for the current parameters.
282
[Diagram: the actual parameters, copied from the argument registers, below SP and FP in the new block.]
Inside the new block, though, we must store the old SP (possibly +1) in order to correctly return the result ... :-)
283
3. Idea: Hybrid Solution
• For the first k threads, we allocate a separate stack area.
• For all further threads, we successively re-use one of the existing ones !!!
==⇒
• For few threads: extremely simple and efficient;
• For many threads: amortized storage usage :-))
284