
Languages and Compilers
(SProg og Oversættere)
Lecture 15 (2)
Bent Thomsen
Department of Computer Science
Aalborg University
With acknowledgement to Norm Hutchinson whose slides this lecture is based on.
1
Curricula
(Studieordning)
The purpose of the course is for the student to gain
knowledge of important principles in programming
languages and for the student to gain an understanding
of techniques for describing and compiling
programming languages.
2
What was this course about?
• Programming Language Design
– Concepts and Paradigms
– Ideas and philosophy
– Syntax and Semantics
• Compiler Construction
– Tools and Techniques
– Implementations
– The nuts and bolts
3
The principal paradigms
• Imperative Programming (C)
• Object-Oriented Programming (C++)
• Logic/Declarative Programming (Prolog)
• Functional/Applicative Programming (Lisp)
• New paradigms?
– Agent Oriented Programming
– Business Process Oriented (Web computing)
– Grid Oriented
– Aspect Oriented Programming
• Or multi-paradigm languages
4
Criteria in a good language design
• Readability
– understand and comprehend a computation easily and accurately
• Write-ability
– express a computation clearly, correctly, concisely, and quickly
• Reliability
– assures a program will not behave in unexpected or disastrous ways
• Orthogonality
– A relatively small set of primitive constructs can be combined in a relatively
small number of ways
– Every possible combination is legal
– Lack of orthogonality leads to exceptions to rules
5
Criteria (Continued)
• Uniformity
– similar features should look similar and behave similarly
• Maintainability
– errors can be found and corrected and new features added easily
• Generality
– avoid special cases in the availability or use of constructs and by combining
closely related constructs into a single more general one
• Extensibility
– provide some general mechanism for the user to add new constructs to a
language
• Standardability
– allow programs to be transported from one computer to another without
significant change in language structure
• Implementability
– ensure a translator or interpreter can be written
6
Tennent’s Language Design principles
7
Important!
• Syntax is the visible part of a programming language
– Programming Language designers can waste a lot of time discussing
unimportant details of syntax
• The language paradigm is the next most visible part
– The choice of paradigm, and therefore language, depends on how
humans best think about the problem
– There are no right models of computations – just different models of
computations, some more suited for certain classes of problems than
others
• The most invisible part is the language semantics
– Clear semantics usually leads to simple and efficient
implementations
– Static semantics: Scope rules and Types
– Dynamic semantics: Run-time behaviour
8
Levels of Programming Languages
High-level program
class Triangle {
    ...
    float surface() { return b*h/2; }
}
Low-level program
LOAD r1,b
LOAD r2,h
MUL r1,r2
DIV r1,#2
RET
Executable Machine code 0001001001000101
0010010011101100
10101101001...
9
Terminology
Q: Which programming languages play a role in this picture?
A translator takes as input a source program, which is expressed in the
source language, and produces as output an object program, which is
expressed in the target language. The translator itself is expressed in
the implementation language.
A: All of them!
10
Tombstone Diagrams
What are they?
– diagrams consisting out of a set of “puzzle pieces” we can use
to reason about language processors and programs
– different kinds of pieces
– combination rules (not all diagrams are “well formed”)
The kinds of pieces:
– Program P implemented in L
– Machine M implemented in hardware
– Translator from S to T implemented in L
– Interpreter for M implemented in L
11
Syntax Specification
Syntax is specified using “Context Free Grammars”:
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules
A CFG defines a set of strings
– This is called the language of the CFG.
12
Backus-Naur Form
Usually CFG are written in BNF notation.
A production rule in BNF notation is written as:
N ::= a
where N is a non-terminal
and a is a sequence of terminals and non-terminals.
N ::= a | b | ... is an abbreviation for several rules with N
as left-hand side.
13
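The abbreviated form is handy for rules with several alternatives. As a purely illustrative example (not necessarily Mini Triangle's exact rule):

```
Expression ::= primary-Expression
             | Expression Operator primary-Expression
```

This single rule abbreviates two production rules, one per alternative, both with Expression as their left-hand side.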
Concrete Syntax of Commands
single-Command
::= V-name := Expression
| Identifier ( Expression )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
Command ::= single-Command
| Command ; single-Command
14
Concrete and Abstract Syntax
The previous grammar specified the concrete syntax of
Mini Triangle.
The concrete syntax is important for the programmer who
needs to know exactly how to write syntactically well-formed programs.
The abstract syntax omits irrelevant syntactic details and
only specifies the essential structure of programs.
Example: different concrete syntaxes for an assignment
v := e
(set! v e)
e -> v
v = e
15
Abstract Syntax of Commands
Command ::= V-name := Expression                      (AssignCmd)
          | Identifier ( Expression )                 (CallCmd)
          | if Expression then Command else Command   (IfCmd)
          | while Expression do Command               (WhileCmd)
          | let Declaration in Command                (LetCmd)
          | Command ; Command                         (SequentialCmd)
16
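These abstract-syntax productions map naturally onto one node class per production. The following is a minimal Java sketch; class and field names are illustrative, not the actual Triangle compiler's:

```java
// One abstract class per non-terminal, one concrete class per production.
abstract class Command { }

class AssignCmd extends Command {
    final String vname;          // left-hand side variable name
    final Expression expr;       // right-hand side expression
    AssignCmd(String vname, Expression expr) { this.vname = vname; this.expr = expr; }
}

class SequentialCmd extends Command {
    final Command c1, c2;        // Command ; Command
    SequentialCmd(Command c1, Command c2) { this.c1 = c1; this.c2 = c2; }
}

abstract class Expression { }

class IntegerExp extends Expression {
    final int value;             // an integer literal
    IntegerExp(int value) { this.value = value; }
}
```

An AST for, say, d := 10 can then be built by hand as new AssignCmd("d", new IntegerExp(10)), which is a convenient way to test tree traversals before the parser exists.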
AST Representation: Possible Tree Shapes
Example: remember the Mini Triangle AST (excerpt below)
Command ::=
...
| if Expression then Command
else Command
...
IfCmd
├─ E
├─ C1
└─ C2
17
Abstract Syntax Trees
Abstract Syntax Tree for:
d:=d+10*n
AssignmentCmd
├─ SimpleVName
│  └─ Ident d
└─ BinaryExpression
   ├─ VNameExp
   │  └─ SimpleVName
   │     └─ Ident d
   ├─ Op +
   └─ BinaryExpression
      ├─ IntegerExp
      │  └─ Int-Lit 10
      ├─ Op *
      └─ VNameExp
         └─ SimpleVName
            └─ Ident n
18
Design Guidelines for AST
• The concrete syntax has to be an unambiguous grammar
– Thus productions may be added to resolve ambiguities
– Use recursion (left or right) to describe list or sequences (e.g.
parameters or commands in blocks)
– Contains productions to implement precedence and
associativity of operators
• AST should discard such productions, as well as (most)
symbols (punctuation, keywords), but retain enough
information that the program can be reconstructed
19
Contextual Constraints
Syntax rules alone are not enough to specify the format of
well-formed programs.
Example 1 (scope rules):
    let const m ~ 2
    in  m + x            ← x is undefined!
Example 2 (type rules):
    let const m ~ 2 ;
        var n : Boolean
    in begin
         n := m < 4;
         n := n + 1       ← type error!
       end
20
Type Rules
Type rules regulate the expected types of arguments and
types of returned values for the operations of a language.
Examples
Type rule of < :
E1 < E2 is type correct and of type Boolean
if E1 and E2 are type correct and of type Integer
Type rule of while:
while E do C is type correct
if E is of type Boolean and C is type correct
Terminology:
Static typing vs. dynamic typing
21
Semantics
Specification of semantics is concerned with specifying the
“meaning” of well-formed programs.
Terminology:
Expressions are evaluated and yield values (and may or may not
perform side effects)
Commands are executed and perform side effects.
Declarations are elaborated to produce bindings
Side effects:
• change the values of variables
• perform input/output
22
Phases of a Compiler
A compiler’s phases are steps in transforming source code
into object code.
The different phases correspond roughly to the different
parts of the language specification:
• Syntax analysis <-> Syntax
• Contextual analysis <-> Contextual constraints
• Code generation <-> Semantics
23
The “Phases” of a Compiler
Source Program
  → Syntax Analysis       (error reports)
  → Abstract Syntax Tree
  → Contextual Analysis   (error reports)
  → Decorated Abstract Syntax Tree
  → Code Generation
  → Object Code
24
Compiler Passes
• A pass is a complete traversal of the source program, or
a complete traversal of some internal representation of
the source program.
• A pass can correspond to a “phase” but it does not have
to!
• Sometimes a single “pass” corresponds to several phases
that are interleaved in time.
• What and how many passes a compiler does over the
source program is an important design decision.
25
Single Pass Compiler
A single pass compiler makes a single pass over the source text,
parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
The Compiler Driver calls the Syntactic Analyzer, which in turn calls
the Contextual Analyzer, which in turn calls the Code Generator.
26
Multi Pass Compiler
A multi pass compiler makes several passes over the program. The
output of a preceding phase is stored in a data structure and used by
subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
The Compiler Driver calls, one after the other:
  • the Syntactic Analyzer   (input: Source Text,    output: AST)
  • the Contextual Analyzer  (input: AST,            output: Decorated AST)
  • the Code Generator       (input: Decorated AST,  output: Object Code)
27
Syntax Analysis
Dataflow chart
Source Program (stream of characters)
  → Scanner   (error reports)
  → stream of “tokens”
  → Parser    (error reports)
  → Abstract Syntax Tree
28
Regular Expressions
• REs are a notation for expressing a set of strings of
terminal symbols.
Different kinds of RE:
  ε      The empty string
  t      Generates only the string t
  XY     Generates any string xy such that x is generated by X
         and y is generated by Y
  X|Y    Generates any string generated either by X or by Y
  X*     The concatenation of zero or more strings generated by X
  (X)    For grouping
29
FA and the implementation of Scanners
• Regular expressions, NFAs with ε-transitions (NFA-ε), NFAs
and DFAs are all equivalent formalisms in terms of what
languages can be defined with them.
• Regular expressions are a convenient notation for
describing the “tokens” of programming languages.
• Regular expressions can be converted into FAs (the
algorithm for conversion into an NFA-ε is
straightforward)
• DFA’s can be easily implemented as computer
programs.
30
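To make the last point concrete, here is a hand-coded scanner whose inner loops correspond directly to DFA accepting states. This is a sketch under simplifying assumptions (single-character operators, ad hoc string encoding of tokens), not the scanner used in the course:

```java
import java.util.ArrayList;
import java.util.List;

// A tiny hand-coded scanner: identifiers, integer literals and operators.
// Each while-loop below corresponds to staying in one DFA state.
class Scanner {
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c)) {          // DFA loop for identifiers
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add("IDENT:" + input.substring(start, i));
            } else if (Character.isDigit(c)) {    // DFA loop for integer literals
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INTLIT:" + input.substring(start, i));
            } else {                              // single-character operator token
                tokens.add("OP:" + c);
                i++;
            }
        }
        return tokens;
    }
}
```

The longest-match rule falls out naturally: each loop keeps consuming characters as long as the DFA can stay in an accepting state.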
Parsing
Parsing == Recognition + determining phrase structure
(for example by generating AST)
– Different types of parsing strategies
• bottom up
• top down
31
Top-Down vs Bottom-Up parsing
LL analysis (top-down): builds the parse tree from the top by derivation; uses look-ahead.
LR analysis (bottom-up): builds the parse tree from the leaves by reduction; uses look-ahead.
32
Development of Recursive Descent Parser
(1) Express grammar in EBNF
(2) Grammar Transformations:
Left factorization and Left recursion elimination
(3) Create a parser class with
– private variable currentToken
– methods to call the scanner: accept and acceptIt
(4) Implement private parsing methods:
– add private parseN method for each non terminal N
– public parse method that
  • gets the first token from the scanner
  • calls parseS (S is the start symbol of the grammar)
33
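Steps (3) and (4) can be sketched in Java for the toy grammar Expression ::= IntLit ( '+' IntLit )*. The names accept, acceptIt and parseExpression follow the convention above; everything else (character-level tokens, evaluating while parsing) is a simplification for illustration:

```java
// Recursive descent sketch for the toy grammar:
//   Expression ::= IntLit ( '+' IntLit )*
// Tokens are single characters, so the scanner is just charAt.
class Parser {
    private final String input;
    private int pos = 0;
    private char currentToken;

    Parser(String input) { this.input = input; next(); }

    private void next() { currentToken = pos < input.length() ? input.charAt(pos++) : '$'; }

    private void accept(char expected) {        // check then consume
        if (currentToken != expected) throw new RuntimeException("expected " + expected);
        next();
    }

    private void acceptIt() { next(); }         // consume unconditionally

    // parseExpression: the parseN method for non-terminal Expression
    int parseExpression() {
        int value = parseIntLit();
        while (currentToken == '+') { acceptIt(); value += parseIntLit(); }
        return value;
    }

    private int parseIntLit() {
        if (!Character.isDigit(currentToken)) throw new RuntimeException("expected digit");
        int v = currentToken - '0';
        acceptIt();
        return v;
    }

    int parse() {                               // public entry point (parseS)
        int v = parseExpression();
        accept('$');                            // whole input must be consumed
        return v;
    }
}
```

For example, new Parser("1+2+3").parse() walks the input left to right with one token of look-ahead, exactly as an LL(1) parser must.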
LL(1) Grammars
• The presented algorithm to convert EBNF into a parser
does not work for all possible grammars.
• It only works for so called LL(1) grammars.
• Basically, an LL(1) grammar is a grammar which can
be parsed with a top-down parser with a lookahead (in
the input stream of tokens) of one token.
• What grammars are LL(1)? How can we recognize that a
grammar is (or is not) LL(1)?
  – We can deduce the necessary conditions from the parser
    generation algorithm.
  – We can use a formal definition.
34
Converting EBNF into RD parsers
The conversion of an EBNF specification into a Java
implementation for a recursive descent parser is so
“mechanical” that it can easily be automated!
=> JavaCC “Java Compiler Compiler”
35
JavaCC and JJTree
36
LR parsing
– The algorithm makes use of a stack.
– The first item on the stack is the initial state of a DFA.
– A state of the automaton is a set of LR(0)/LR(1) items.
– The initial state is constructed from productions of the form
  S ::= • a [, $]   (where S is the start symbol of the CFG)
– The stack contains (in alternating order):
• A DFA state
• A terminal symbol or part (subtree) of the parse tree being
constructed
– The items on the stack are related by transitions of the DFA
– There are two basic actions in the algorithm:
• shift: get next input token
• reduce: build a new node (remove children from stack)
37
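The shift/reduce loop can be illustrated on the tiny grammar E ::= E + n | n. This sketch keeps only grammar symbols on the stack and hard-codes when to reduce; a real LR parser would also push DFA states and consult a parsing table, which is omitted here:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrates the two basic LR actions for the grammar  E ::= E + n | n
// The stack holds grammar symbols only; DFA states are omitted for brevity.
class ShiftReduce {
    static boolean parse(String tokens) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char t : tokens.toCharArray()) {
            if (t == ' ') continue;
            stack.push(t);                               // shift: get next input token
            if (stack.peek() == 'n') {                   // every handle ends in 'n'
                stack.pop();
                if (!stack.isEmpty() && stack.peek() == '+') {
                    stack.pop();                         // reduce  E + n -> E
                    if (stack.isEmpty() || stack.pop() != 'E') return false;
                }
                stack.push('E');                         // reduce: build a new E node
            }
        }
        return stack.size() == 1 && stack.peek() == 'E'; // accept
    }
}
```

Tracing "n+n": shift n, reduce n→E, shift +, shift n, reduce E+n→E, accept.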
Bottom Up Parsers: Overview of Algorithms
• LR(0) : The simplest algorithm, theoretically important
but rather weak (not practical)
• SLR : An improved version of LR(0) more practical but
still rather weak.
• LR(1) : LR(0) algorithm with extra lookahead token.
– very powerful algorithm. Not often used because of large
memory requirements (very big parsing tables)
• LALR : “Watered down” version of LR(1)
– still very powerful, but has much smaller parsing tables
– most commonly used algorithm today
38
JavaCUP: A LALR generator for Java
JFlex takes a definition of tokens as regular expressions and generates
a Java scanner class that recognizes tokens. JavaCUP takes a grammar as
a BNF-like specification and generates a Java parser class that uses
the scanner to get tokens and parses the stream of tokens. Together the
two generated classes form the syntactic analyzer.
39
Steps to build a compiler with SableCC
1. Create a SableCC specification file
2. Call SableCC
3. Create one or more working classes, possibly inherited from
   classes generated by SableCC
4. Create a Main class activating lexer, parser and working classes
5. Compile with javac
40
Contextual Analysis Phase
• Purposes:
– Finish syntax analysis by deriving context-sensitive
information
– Associate semantic routines with individual productions of the
context free grammar or subtrees of the AST
– Start to interpret meaning of program based on its syntactic
structure
– Prepare for the final stage of compilation: Code generation
41
Contextual Analysis -> Decorated AST
Annotations:
  link to declaration : result of identification
  :type               : result of type checking
[Decorated AST for the program “let var n: Integer; var c: Char in
begin c := ‘&’; n := n+1 end”: each expression node carries its type
annotation (:char or :int) and each applied occurrence of an identifier
is linked to its declaration.]
42
Nested Block Structure
A language exhibits nested block structure if
blocks may be nested one within another (typically
with no upper bound on the level of nesting that is
allowed).
There can be any number of scope levels (depending
on the level of nesting of blocks):
Typical scope rules:
• no identifier may be declared more than once
within the same block (at the same level).
• for any applied occurrence there must be a
corresponding declaration, either within the
same block or in a block in which it is nested.
43
Type Checking
For most statically typed programming languages, type
checking is a bottom up algorithm over the AST:
• Types of expression AST leaves are known
immediately:
– literals => obvious
– variables => from the ID table
– named constants => from the ID table
• Types of internal nodes are inferred from the type of the
children and the type rule for that kind of expression
44
Contextual Analysis
Identification and type checking are combined into a depth-first traversal
of the abstract syntax tree.
[AST of the same let-command example (“let var n: Integer; var c: Char in
begin c := ‘&’; n := n+1 end”), visited depth-first.]
45
Implementing Tree Traversal
• “Traditional” OO approach
• Visitor approach
• “Functional” approach
• (Active patterns in Scala or F#)
46
“Traditional” OO approach
• Add to each AST class methods for type checking
  (or code generation, pretty printing, etc.).
• In each AST node class, the methods traverse their children.

public abstract class AST {
    public abstract Object check(Object arg);
    public abstract Object encode(Object arg);
    public abstract Object prettyPrint(Object arg);
}
...
Program program;
program.check(null);
47
“Traditional” OO approach
public abstract class Expression extends AST {
public Type type;
...
}
public class BinaryExpr extends Expression {
public Expression E1, E2;
public Operator O;
public Object check(Object arg) {
Type t1 = (Type) E1.check(null);
Type t2 = (Type) E2.check(null);
Op op = (Op) O.check(null);
Type result = op.compatible(t1,t2);
if (result == null)
report type error
return result;
}
...
}
• Advantage: the OO idea is easy to understand and implement
• Disadvantage: checking (and encoding) methods are spread over all AST classes: not
  very modular
48
Visitor Solution
• Nodes accept visitors and call the
  appropriate method of the visitor
• Visitors implement the operations
  and have one method for each type
  of node they visit
[Class diagram:]
Node: Accept(NodeVisitor v)
  VariableRefNode: Accept(NodeVisitor v) { v->VisitVariableRef(this) }
  AssignmentNode:  Accept(NodeVisitor v) { v->VisitAssignment(this) }
NodeVisitor: VisitAssignment(AssignmentNode), VisitVariableRef(VariableRefNode)
  TypeCheckingVisitor:   VisitAssignment(AssignmentNode), VisitVariableRef(VariableRefNode)
  CodeGeneratingVisitor: VisitAssignment(AssignmentNode), VisitVariableRef(VariableRefNode)
49
Implementing type checking from type rules
(conditional)

    Γ ⊢ E : T_E    T_E = bool    Γ ⊢ S1 : T1    Γ ⊢ S2 : T2    T1 = T2
    ------------------------------------------------------------------
                      Γ ⊢ if E then S1 else S2 : T1
public Object visitIfExpression(IfExpression com, Object arg) {
    Type eType = (Type) com.E.visit(this, null);
    if (!eType.equals(Type.boolT))
        report error: expression in if not boolean
    Type c1Type = (Type) com.C1.visit(this, null);
    Type c2Type = (Type) com.C2.visit(this, null);
    if (!c1Type.equals(c2Type))
        report error: type mismatch in expression branches
    return c1Type;
}
50
Visitor pattern according to Brown&Watt
interface Visitor {
    visitA(A a);
    visitB(B b);
    visitC(C c);
}

class A {
    A x;
    accept(Visitor v) { v.visitA(this); }
}
class B extends A {
    accept(Visitor v) { v.visitB(this); }
}
class C extends A {
    accept(Visitor v) { v.visitC(this); }
}

class op1 implements Visitor {
    visitA(A a) {…}
    visitB(B b) {…}
    visitC(C c) {…}
}
class op2 implements Visitor {
    visitA(A a) {…}
    visitB(B b) {…}
    visitC(C c) {…}
}
class op3 implements Visitor {
    visitA(A a) {…}
    visitB(B b) {…}
    visitC(C c) {…}
}
51
A more general Visitor pattern using overloading in Java
interface Visitor {
    visit(A a);
    visit(B b);
    visit(C c);
}

class A {
    A x;
    accept(Visitor v) { v.visit(this); }
}
class B extends A {
    accept(Visitor v) { v.visit(this); }
}
class C extends A {
    accept(Visitor v) { v.visit(this); }
}

class op1 implements Visitor {
    visit(A a) {…}
    visit(B b) {…}
    visit(C c) {…}
}
class op2 implements Visitor {
    visit(A a) {…}
    visit(B b) {…}
    visit(C c) {…}
}
class op3 implements Visitor {
    visit(A a) {…}
    visit(B b) {…}
    visit(C c) {…}
}
52
Double dispatch example

Visitor v = new op1();   // can be op1/2/3
A x = new B();           // x can be A/B/C
x.accept(v);             // 1st dispatch

class B {
    accept(Visitor v) {
        // always calls visit(B b)
        v.visit(this);   // 2nd dispatch
    }
}

class op1 implements Visitor {
    visit(A a) { }
    visit(B b) { … }
}

The visitor pattern conceptually implements a two-dimensional dispatch
table: the first dispatch (x.accept(v)) selects the row by the dynamic
class of the node (A, B or C), and the second dispatch (v.visit(this))
selects the column by the visitor (op1, op2 or op3). The entry for row B
and column op1 is op1.visit(B b).
54
Implementing Tree Traversal: instanceof
Another possibility is to use a “functional” approach and
implement a case-analysis on the class of an object.
Type check(Expr e) {
    if (e instanceof IntLitExpr)
        return representation of type int
    else if (e instanceof BoolLitExpr)
        return representation of type bool
    else if (e instanceof EqExpr) {
        Type t = check(((EqExpr) e).left);
        Type u = check(((EqExpr) e).right);
        if (t == representation of type int &&
            u == representation of type int)
            return representation of type bool
        ...
55
But then we might as well use a functional language such
as SML/F#
datatype Command =
      AssignCmd of v-name * Exp
    | CallCmd of Ident * Exp
    | IfCmd of Exp * Command * Command
    | …

fun checker (AssignCmd(v,e)) = lookup(v) = checker(e)
  | checker (CallCmd(i,e))   = lookup(i) = checker(e)
  | checker (IfCmd(e,c1,c2)) =
      if checker(e) = bool
      then checker(c1) = checker(c2)
      else …
  | checker …
In F# we can combine the OO and Functional approach, see the paper:
“Mapping and Visiting in Functional and Object Oriented Programming”
Kurt Nørmark, Bent Thomsen, and Lone Leth Thomsen
JOT: Journal of Object Technology
http://www.jot.fm/issues/issue_2008_09/article2/index.html
56
Implementing Tree Traversal in C
exp    → exp + term | exp - term | term
term   → term == factor | factor
factor → ( exp ) | number | true | false

typedef enum {Plus, Minus, Eq} opkind;
typedef enum {Int, Bool, Error} type;
typedef enum {Opkind, Constkind} expkind;
typedef struct streenode
{ expkind kind;
  opkind op;
  struct streenode *lchild, *rchild;
  type val;
} Streenode;
typedef Streenode *Syntaxtree;
57
Implementing Tree Traversal in C
void checker(Syntaxtree t)
{ type temp;
  if (t->kind == Opkind)
  { checker(t->lchild);
    checker(t->rchild);
    switch (t->op)
    { case Plus:
        if ((t->lchild->val == t->rchild->val) &&
            (t->lchild->val == Int))
             t->val = Int;
        else t->val = Error;
        break;
      case Minus:
        …
        break;
      case Eq:
        if ((t->lchild->val == t->rchild->val) &&
            (t->lchild->val == Int))
             t->val = Bool;
        else t->val = Error;
        break;
    }
  }
}
58
Runtime organization
• Data Representation: how to represent values of the source
language on the target machine.
•Primitives, arrays, structures, unions, pointers
• Expression Evaluation: How to organize computing the values of
expressions (taking care of intermediate results)
•Register vs. stack machine
• Storage Allocation: How to organize storage for variables
(considering different lifetimes of global, local and heap variables)
•Activation records, static links
• Routines: How to implement procedures, functions (and how to
pass their parameters and return values)
•Value vs. reference, closures, recursion
• Object Orientation: Runtime organization for OO languages
•Method tables
59
RECAP: TAM Frame Layout Summary
Between LB (local base) and ST (stack top), the current frame contains, in order:
  • arguments for the current procedure (they were put here by the caller)
  • link data: dynamic link, static link, return address
  • local variables and intermediate results (local data; grows and shrinks
    during execution)
60
Garbage Collection: Conclusions
• Relieves the burden of explicit memory allocation and
deallocation.
• Software module coupling related to memory
management issues is eliminated.
• An extremely dangerous class of bugs is eliminated.
• The compiler generates code for allocating objects
• The compiler must also generate code to support GC
– The GC must be able to recognize root pointers from the stack
– The GC must know about data-layout and objects descriptors
61
Code Generation
Source Program
let var n: integer;
var c: char
in begin
c := ‘&’;
n := n+1
end
Source and target program must be
“semantically equivalent”
Target program
PUSH 2
LOADL 38
STORE 1[SB]
LOAD 0
LOADL 1
CALL add
STORE 0[SB]
POP 2
HALT
Semantic specification of the source language is structured in
terms of phrases in the SL: expressions, commands, etc.
=> Code generation follows the same “inductive” structure.
62
Specifying Code Generation with Code Templates
The code generation functions for Mini Triangle
Phrase class   Function      Effect of the generated code
Program        run P         Run program P then halt, starting and finishing
                             with an empty stack.
Command        execute C     Execute command C; may update variables but does
                             not shrink or grow the stack.
Expression     evaluate E    Evaluate E; the net result is pushing the value
                             of E on the stack.
V-name         fetch V       Push the value of the constant or variable V on
                             the stack.
V-name         assign V      Pop a value from the stack and store it in
                             variable V.
Declaration    elaborate D   Elaborate the declaration, making space on the
                             stack for its constants and variables.
63
Code Generation with Code Templates
While command
execute [while E do C] =
         JUMP h
    g:   execute [C]
    h:   evaluate [E]
         JUMPIF(1) g
64
Developing a Code Generator “Visitor”
execute [C1 ; C2] =
execute[C1]
execute[C2]
public Object visitSequentialCommand(
SequentialCommand com,Object arg) {
com.C1.visit(this,arg);
com.C2.visit(this,arg);
return null;
}
LetCommand, IfCommand, WhileCommand => later.
- LetCommand is more complex: memory allocation and addresses
- IfCommand and WhileCommand: complications with jumps
65
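The jump complication of the WhileCommand can be handled by emitting a placeholder jump and backpatching it once the condition's address is known, following the while code template above. A self-contained sketch; the string-based instruction list and the Runnable stand-ins for subtree visits are simplifications, not TAM's actual code store:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of code generation for:  execute [while E do C] =
//     JUMP h ; g: execute[C] ; h: evaluate[E] ; JUMPIF(1) g
class CodeGen {
    final List<String> code = new ArrayList<>();

    void emit(String instr) { code.add(instr); }

    // evaluateE and executeC stand in for visiting the condition and body.
    void generateWhile(Runnable evaluateE, Runnable executeC) {
        int jumpAddr = code.size();
        emit("JUMP ?");                    // target h not yet known: placeholder
        int g = code.size();
        executeC.run();                    // g: execute [C]
        int h = code.size();
        code.set(jumpAddr, "JUMP " + h);   // backpatch the forward jump
        evaluateE.run();                   // h: evaluate [E]
        emit("JUMPIF(1) " + g);            // loop back while E yields true (1)
    }
}
```

The same emit-then-patch idea handles the forward jumps of the IfCommand.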
Code improvement (optimization)
The code generated by our compiler is not efficient:
• It computes values at runtime that could be known at
compile time
• It computes values more times than necessary
We can do better!
• Constant folding
• Common sub-expression elimination
• Code motion
• Dead code elimination
66
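Constant folding, for example, is itself a bottom-up pass over the expression AST: whenever both operands of a node are known at compile time, the node is replaced by a literal. A minimal sketch with illustrative node classes:

```java
// Folds  Add(Lit(1), Lit(2))  into  Lit(3)  at compile time.
abstract class Expr { }

class Lit extends Expr {
    final int value;
    Lit(int value) { this.value = value; }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

class Folder {
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Expr l = fold(((Add) e).left);     // fold children first (bottom-up)
            Expr r = fold(((Add) e).right);
            if (l instanceof Lit && r instanceof Lit)
                return new Lit(((Lit) l).value + ((Lit) r).value);  // compute now
            return new Add(l, r);              // operands not known: keep the node
        }
        return e;                              // literals fold to themselves
    }
}
```

Safety is easy to argue here: folding only replaces a computation whose result is fully determined at compile time.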
Optimization implementation
• Is the optimization correct or safe?
• Is the optimization an improvement?
• What sort of analyses do we need to perform to get the
required information?
–Local
–Global
67
Programming Language Life cycle
• The requirements for the new language are identified
• The language syntax and semantics is designed
– BNF or EBNF, experiments with front-end tools
– Informal or formal semantics
• An informal or formal specification is developed
• Initial implementation
– Prototype via interpreter or interpretive compiler
• Language tested by designers, implementers and a few
friends
• Feedback on the design and possible reconsiderations
• Improved implementation
68
Programming Language Life cycle
Design → Specification → Prototype → Compiler
(with manuals and textbooks produced along the way)
69
Some advice
• Language design
– Which paradigm(s) and which criteria (readability,
orthogonality, etc.)
– Write lots of example programs in your new language
• Can be used as test cases later
• Language specification:
– Syntax (at least two grammars: Concrete and Abstract)
– Static Semantics
• Scope and type rules
– Dynamic Semantics
70
Some Advice
• Language implementation
– Lexer and Parser (based on concrete syntax)
• Start with a subset of your language
– AST design (based on abstract syntax)
– Tree traversal
• Build trees by hand!
• Write a pretty printer
• Write an interpreter
• Scope and type check
• Code Generation
– Generate textual machine code first
71
Programming Language Life cycle
• Lots of research papers
• Conference sessions dedicated to the new language
• Text books and manuals
• Used in large applications
• Huge international user community
• Dedicated conference
• International standardisation efforts
• Industry de facto standard
• Programs written in the language become legacy code
• Language enters the “hall of fame” and its features are taught in CS
  courses on Programming Language Design and Implementation
72
The Most Important Open Problem in Computing
Increasing Programmer Productivity
– Write programs correctly
– Write programs quickly
– Write programs easily
• Why?
– Decreases support cost
– Decreases development cost
– Decreases time to market
– Increases satisfaction
73
Why Programming Languages?
3 ways of increasing programmer productivity:
1. Process (software engineering)
– Controlling programmers
2. Tools (verification, static analysis, program generation)
– Important, but generally of narrow applicability
3. Language design --- the center of the universe!
– Core abstractions, mechanisms, services, guarantees
– Affect how programmers approach a task (C vs. SML)
– Multi-paradigm integration
74
How to recognize a problem that can be solved with
programming language techniques when you see one?
Problem - a Scrabble game to be distributed as an applet.
• Create a dictionary of 50,000 words.
• Two options
– Program 1:
  • create an external file words.txt and read it into an array when
    the program starts
    while ((word = f.readLine()) != null)
        { words.addElement(word); }
– Program 2:
  • create a 50,000 element table in the program and initialize it to the words
    String[] words = {“hill”, “fetch”, “pail”, “water”, …};
• Advantages/disadvantages of each approach?
  – performance
  – flexibility
  – correctness
  – ….
• Example from J. Craig Cleaveland. Program Generators with XML and Java,
chapter 1
75
A program generator approach
import java.io.*;
import java.util.*;

class Dictionary1Generator {
    static Vector words = new Vector();

    static void loadWords() {
        // read the words in file words.txt
        // into the Vector words
    }

    static public void main(String[] args) {
        loadWords();
        // Generate Dictionary1 program
        System.out.println("class Dictionary1 {\n");
        System.out.println("  String[] words = {");
        for (int j = 0; j < words.size(); ++j) {
            System.out.println("\"" + words.elementAt(j) + "\",");
        }
        System.out.println("};\n}");
    }
}
76
Typical program generator
• Dictionary example
• The data
– simply a list of words
• Analyzing/transforming data
– duplicate word removal
– sorting
• Generate program
– simply use print statements to
write program text
• General picture
• The data
– some more complex
representation of data
• formal specs,
• grammar,
• spreadsheet,
• XML,
• etc.
• Analyzing/transforming data
– parse, check for inconsistencies,
transform to other data
structures
• Generate program
– generate syntax tree, use
templates,…
77
The next wave of Program Generators:
Model-Driven Development
Requirements → Analysis & Design → Implementation → Testing
78
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
January 2011
79
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
January 2011
80
Conclusions
• Nothing has changed much
• Main languages have their domain
– Java – for web applications
– C – for system programming
– (Visual) Basic – for desktop Windows apps
– PHP – for server-side scripting
– C++ – when Java is (perceived as) too slow
• We shouldn’t bother with new languages!
• Wait a minute!
• Something is changing
– Software is getting more and more complex
– Hardware has changed
81
Which languages are discussed?
Source: http://langpop.com
82
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
83
Three Trends
• Declarative programming languages in vogue again
– Especially functional
• Dynamic Programming languages are gaining
momentum
• Concurrent Programming languages are back on the
agenda
84
New Programming Language! Why Should I Care?
• The problem is not designing a new language
– It’s easy! Thousands of languages have been developed
• The problem is how to get wide adoption of the new language
– It’s hard! Challenges include
• Competition
• Usefulness
• Interoperability
• Fear
“It’s a good idea, but it’s a new idea; therefore, I fear it and must reject it.”
--- Homer Simpson
• The financial rewards are low, but …
85
Famous Danish Computer Scientists
• Peter Naur
  – BNF and Algol
• Per Brinch Hansen
  – Monitors and Concurrent Pascal
• Dines Bjørner
  – VDM and Ada
• Bjarne Stroustrup
  – C++
• Mads Tofte
  – SML
• Rasmus Lerdorf
  – PHP
• Anders Hejlsberg
  – Turbo Pascal and C#
• Jakob Nielsen
86
Fancy joining this crowd?
• Join the Programming Language Technology Research Group
when you get to DAT5/DAT6 or SW9/SW10
• Research Programme underway (P2025)
– How would you like to programme in 20 years?
• Experimenting with advanced programming
– (Concurrent) Functional and OO integration
– Programmatic Program Construction
• Developing a new programming language
• ”The P-gang”:
• Kurt Nørmark
• Lone Leth
• Bent Thomsen
• Simon Kongshøj
• (Petur Olsen and Thomas Bøgholm)
• (Thomas Vestdam)
89
Sub Research Projects
• Distributed STM for HPC
• DBMS on GPU
• Predictable Java
• (End user programming of Location Based Services)
90
2003/2004/2005/2006/2007/2008 Projects
• DAT5/INF7/SW9
  – Java vs. .Net Mobile (ver. 1 and 2)
  – Business Process Management
  – Quality control in Open Source Development
  – Impedance mismatch (performance, C#, Java)
  – XML and programming language representation
  – Languages and games
  – Aspect oriented Programming
  – Testing and PrgL. Design
• DAT6/INF8/SW10
  – Mobile Business Process Infrastructure based on Ambients
  – Aspect.Net and JTL
  – Search for WS based on Semantic Web
  – Performance analysis of J2ME systems
  – Communication in Open Source Projects
  – New concurrency constructs in Java
  – Type inference for Ruby
  – Dependent types for super computing
  – Analysis of Real-Time Java Programs
  – Testing tool for .Net
• DAT8/D8
  – Java vs. C on DSP
  – Multiple dispatch in C#
91
2009/2010/2011 projects
• MSc projects
  – End-User Programming
  – Uniform client- and server-side programming
  – Concurrent Functional Programming with Erlang and Clojure
  – Mobile Game Development using Meta-programming and memory
    management techniques
  – A PEG parser generator in C#
  – Programming GPGPUs using Scala
  – Real-time programming in Java
  – Modular verification of WCET of Predictable Java Programs
• PhD Projects
– Virtual Machines for Dynamic Languages
• Simon Kongshøj
– Verification of Real-Time Java Programs using UPPAAL
• Thomas Bøgholm (and Petur Olsen)
92
Finally
Keep in mind, the compiler is the program from which all other
programs arise. If your compiler is under par, all programs created
by the compiler will also be under par. No matter the purpose or use
-- your own enlightenment about compilers or commercial
applications -- you want to be patient and do a good job with this
program; in other words, don't try to throw this together on a
weekend.
Asking a computer programmer to tell you how to write a compiler
is like saying to Picasso, "Teach me to paint like you."
*Sigh* Well, Picasso tried.
93
What I promised you at the start of the course
Ideas, principles and techniques to help you
– Design your own programming language or design your own
extensions to an existing language
– Tools and techniques to implement a compiler or an interpreter
– Lots of knowledge about programming
I hope you feel you got what I promised
94
Top 10 reasons COMPILERS must be female
10. Picky, picky, picky.
9. They hear what you say, but not what you mean.
8. Beauty is only shell deep.
7. When you ask what's wrong, they say "nothing".
6. Can produce incorrect results with alarming speed.
5. Always turning simple statements into big productions.
4. Small talk is important.
3. You do the same thing for years, and suddenly it's wrong.
2. They make you take the garbage out.
1. Miss a period and they go wild.
95