
5th Int. Conf. on Mathematics and Informatics, September 2-4, 2015, Târgu Mureş, Romania
Truffle – A Self-optimizing Language Implementation
Framework
Hanspeter Mössenböck
Matthias Grimmer
Johannes Kepler University Linz, Austria
{hanspeter.moessenboeck,matthias.grimmer}@jku.at
We present Truffle [10], a framework for building efficient programming language implementations based on tree rewriting and profile-driven specialization of the executed program. Source
code is first transformed into an abstract syntax tree (AST), which is then interpreted. During
interpretation, run-time information about the execution is collected and the AST is rewritten so
that it specializes to the observed profile (e.g., to the observed types, values, and invocations).
When the AST has reached a stable state it is dynamically compiled to efficient machine code. If
specializations turn out to fail at run time the machine code is deoptimized and falls back to the
interpreter.
Truffle is particularly useful for implementing dynamically typed languages such as JavaScript
or Ruby where type information must be collected at run time, but it also creates new optimization
potential for statically typed languages. Furthermore, it supports seamless interoperability between
different programming languages and even allows a memory-safe implementation of otherwise unsafe low-level languages such as C.
We explain the concepts of Truffle and sketch the implementation of an AST interpreter with
profile-driven specializations and dynamic compilation. We also briefly explain Truffle's potential
for interoperability and memory safety.
1 Motivation
Writing a compiler—especially one that produces highly optimized machine code—is considered to
be a non-trivial task. Therefore, many language implementations are based on interpreters, which
are easier to write and often better suited to dynamically typed languages such as JavaScript,
Python or Ruby. A common approach is to transform the source program into some intermediate
representation such as an abstract syntax tree (AST), which is then interpreted.
Interpreters, however, are slow. They are usually written in some high-level language such as
Java and hardly perform any optimizations. For dynamically typed languages, primitive types such
as int or float are often boxed, i.e., operands of these types are wrapped into objects that are
inefficient to use in computations.
In many cases, run-time profiling reveals that a computation such as a + b always deals
with operands of a specific type (e.g., int) and could therefore be handled much more efficiently if
the operation were treated as an integer addition. The idea of Truffle is to replace the
subtree for the addition in this case with a specialized subtree handling an int addition without
boxing, which speeds up interpretation (and later also compiled code). In other cases, run-time
feedback may show that certain operands have always had specific constant values.
Replacing these operands with constants can yield additional speedups. In other words, the Truffle
AST is specialized during execution to the observed profile.
When an AST has reached a stable state and has been executed frequently enough, it is
dynamically compiled to machine code, applying state-of-the-art optimizations. Thus, the most
frequently executed parts of the program run efficiently in optimized machine code, while the less
frequently executed parts continue to run in the interpreter. If one of the assumptions on
which a specialization was based turns out not to hold at run time (e.g., if an addition specialized
for integer operands suddenly has to deal with floating-point operands) the corresponding part of
the machine code is deoptimized, i.e., it is reverted to an unspecialized AST that is executed in the
interpreter again. The ability to deoptimize allows Truffle to apply specializations optimistically
and aggressively, because it can always fall back to the unspecialized case if any assumptions turn
out to be wrong.
The Truffle project was initiated at the Johannes Kepler University Linz and is now an
official research project at Oracle Labs. It is freely available under the OpenJDK license [7]. Several
other universities and research sites are contributing to it.
2 Truffle Concepts
Truffle is a self-optimizing interpreter framework with dynamic compilation that is based on
rewriteable abstract syntax trees. It is implemented in Java and runs on a modified version of
the Java HotSpot™ VM, using its services such as garbage collection and deoptimization.
The nodes of a Truffle AST are represented by classes that are derived from a common base
class Node providing rewriteability. Every node class has an execute() method that is responsible
for evaluating the AST rooted at this node.
A node for a general operation (e.g., a + operation accepting operands that are integers, floating-point numbers, complex numbers, or even strings) can be dynamically replaced with a specialized
node that works only on integer operands, say, and is therefore faster.
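This node-rewriting mechanism can be illustrated with a plain-Java sketch. Note that this is not the actual Truffle API; the names Site, GenericAdd, and IntAdd are illustrative assumptions:

```java
// Plain-Java sketch of Truffle-style node rewriting; illustrative only,
// not the real Truffle API.
abstract class BinaryNode {
    Site site;                          // the tree position holding this node
    abstract Object execute(Object a, Object b);
    BinaryNode replace(BinaryNode n) {  // rewrite the AST in place
        n.site = site;
        site.node = n;
        return n;
    }
}

final class Site {                      // minimal stand-in for a parent node
    BinaryNode node;
    Site(BinaryNode n) { node = n; n.site = this; }
}

// General addition: handles any operand types, but boxes its operands.
final class GenericAdd extends BinaryNode {
    Object execute(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            // Profile says "ints": rewrite to the specialized node.
            return replace(new IntAdd()).execute(a, b);
        }
        return String.valueOf(a) + b;   // generic fallback (string concat)
    }
}

// Specialized addition: a guard re-checks the assumption on every execution.
final class IntAdd extends BinaryNode {
    Object execute(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;           // unboxed fast path
        }
        // Guard failed: de-specialize back to the general node.
        return replace(new GenericAdd()).execute(a, b);
    }
}
```

After the first execution with integer operands the tree holds an IntAdd node; if a later execution sees other operand types, the guard fails and the node reverts to GenericAdd, the interpreter-level analogue of deoptimization.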
A language implementer can declare any number of specializations using a domain-specific
language [5] that specifies the operand types and other conditions under which these specializations
can be applied. The specializations are compiled to Java source code that becomes part of the Truffle
interpreter, which uses them to rewrite parts of the AST whenever specializations are applicable.
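For illustration, a set of specialization declarations in the Truffle DSL looks roughly as follows. This is a schematic fragment: compiling it requires the Truffle API and its annotation processor, and the exact annotations have evolved across Truffle versions.

```java
// Schematic Truffle DSL usage; not compilable without the Truffle API.
@NodeChildren({@NodeChild("left"), @NodeChild("right")})
abstract class AddNode extends Node {

    @Specialization                 // applied when both operands are ints
    int addInts(int a, int b) {
        return a + b;
    }

    @Specialization                 // applied when both operands are doubles
    double addDoubles(double a, double b) {
        return a + b;
    }

    @Specialization                 // generic fallback for all other cases
    Object addGeneric(Object a, Object b) {
        return String.valueOf(a) + b;
    }
}
```

The annotation processor generates the dispatch and rewriting logic, so the language implementer only writes the per-type operations.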
When the AST of a function or method has reached a stable state and has been executed frequently enough, it is dynamically compiled to machine code using the Graal compiler [7],
an optimizing compiler written in Java that can be invoked from Truffle. During compilation, the execute() methods of the affected AST nodes are inlined into their callers, thus creating
a single piece of code to which optimizations such as constant folding, common subexpression
elimination, or escape analysis can be applied seamlessly. This is a form of partial evaluation that
compiles an AST for the specific situation under which it was used so far.
All specializations are guarded by run-time checks, which make sure that the assumptions
under which the specializations were applied still hold. If one of these guards fails at run time, the
specialization is undone and the affected code is reverted to an unspecialized AST that continues
to be executed in the interpreter. This is possible because the HotSpot VM supports deoptimization
and the side-by-side execution of compiled and interpreted code.
3 Using Truffle for Interoperability
Truffle's approach of rewriting an AST during execution can be used to support seamless interoperability between different languages that are implemented under Truffle. Although all AST nodes are
derived from a common base class, different languages have specific node classes with execute()
methods that perform language-specific operations. By combining AST nodes of different languages
we can achieve transparent interoperability.
Truffle allows an object x that was implemented in a source language S to be used in code
that was implemented in a host language H. If the object is accessed there (e.g., x.f or x[i]),
this access is done in the syntax of the host language H. However, in the AST of language H the
foreign access is represented in a language-independent way, namely by nodes that send messages
to the accessed object, e.g., to read or write a property or to invoke a method. Upon first execution
of such a message in the interpreter the message gets resolved, i.e., the receiver of the message
returns an AST snippet representing the S -specific operations for performing this access in the
source language S. This snippet then replaces the message node in the AST of the host program.
Further executions of this access will not send a message but rather execute the S -specific nodes
for the access operations.
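The resolution of a language-independent message to language-specific access nodes can be sketched in plain Java. This is illustrative only; the real Truffle interop API differs, and ForeignObject, ReadMessageNode, and MapObject are assumed names. Here the "source language" is modeled trivially as objects backed by property maps:

```java
import java.util.HashMap;
import java.util.Map;

// Objects of the source language S resolve messages to S-specific nodes.
interface ForeignObject {
    AccessNode resolveRead(String name);
}

abstract class AccessNode {
    AccessSite site;                    // the host-AST position holding this node
    abstract Object execute(ForeignObject receiver);
    AccessNode replace(AccessNode n) {  // rewrite the host AST in place
        n.site = site;
        site.node = n;
        return n;
    }
}

final class AccessSite {                // minimal stand-in for the host AST
    AccessNode node;
    AccessSite(AccessNode n) { node = n; n.site = this; }
}

// Language-independent "read property" message in the host AST.
final class ReadMessageNode extends AccessNode {
    final String name;
    ReadMessageNode(String name) { this.name = name; }
    Object execute(ForeignObject receiver) {
        // First execution: the receiver resolves the message to an
        // S-specific snippet, which replaces this message node.
        return replace(receiver.resolveRead(name)).execute(receiver);
    }
}

// Example source language S: objects are plain property maps.
final class MapObject implements ForeignObject {
    final Map<String, Object> props = new HashMap<>();
    public AccessNode resolveRead(String name) {
        return new MapReadNode(name);   // the S-specific access node
    }
}

final class MapReadNode extends AccessNode {
    final String name;
    MapReadNode(String name) { this.name = name; }
    Object execute(ForeignObject receiver) {
        return ((MapObject) receiver).props.get(name);
    }
}
```

After the first access, the host AST contains the S-specific MapReadNode, so subsequent accesses no longer send a message.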
When the AST containing the foreign access gets compiled, inlining of the execute() methods
will create a single piece of code that represents both the operations of the host language and of
the source language. Compiler optimizations can thus work across language borders. In particular,
it is possible to inline methods that were written in a different language than the host language.
Truffle's interoperability mechanism works without any glue code between languages and does
not require a common object model to which all languages are mapped. Furthermore, it is not
restricted to a specific pair of languages but rather works for all languages that are implemented
under Truffle. The feasibility of this approach has been shown for Ruby and C [2] as well as for
JavaScript and C [4].
4 Using Truffle for Memory Safety
Low-level languages like C support operations such as pointer arithmetic that can compromise memory safety, because they allow pointer values to reference memory outside of objects. Furthermore,
manual deallocation of objects may lead to dangling pointers or memory leaks.
In order to overcome these problems, we have built a memory-safe implementation of C on top
of Truffle [3]. The idea is to allocate all objects on the Java heap, where they are automatically
garbage-collected. A C pointer is represented as a reference to a Java object plus an offset, which
is initially 0. Member accesses via this pointer as well as pointer arithmetic simply adjust the
offset, and the runtime system ensures that it never exceeds the bounds of the referenced object.
Metadata stored with the Java object allows us to map a pointer offset to a specific member of the
object and to do type checking, even after casting the pointer to some other type.
When an object is accessed via a C pointer, the access is represented by an AST that includes
checks whether the access is safe. At run time, this AST is specialized according to the observed
profile and is finally compiled to machine code, whereupon many of the checks can be optimized
away. In order to detect whether the pointer is later set to an object of a different type, a guard
is inserted into the code. If the guard fails the machine code is deoptimized and falls back to the
unspecialized AST.
To prevent dangling pointers we mark the Java object that is referenced by a C pointer as
deallocated as soon as a free operation has been performed on the C pointer. When accessing the
object we check whether the access refers to a deallocated object and report an error in that case.
The Java object is automatically reclaimed by the HotSpot garbage collector when it is no longer
referenced by any C pointer.
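The pointer representation and the deallocation check described above can be sketched as follows. This is a simplified illustration in which objects are raw byte arrays and the class names are assumed; the actual implementation [3] keeps typed members and metadata:

```java
// Simplified sketch of a memory-safe C pointer on the Java heap.
final class ManagedObject {
    final byte[] data;          // the object's storage, garbage-collected
    boolean freed = false;      // set by free(); the GC reclaims it later
    ManagedObject(int size) { data = new byte[size]; }
}

final class ManagedPointer {
    final ManagedObject target; // Java reference instead of a raw address
    final int offset;           // initially 0; pointer arithmetic adjusts it

    ManagedPointer(ManagedObject target, int offset) {
        this.target = target;
        this.offset = offset;
    }

    ManagedPointer add(int delta) {          // models p + delta in C
        return new ManagedPointer(target, offset + delta);
    }

    void free() {                            // models free(p) in C
        target.freed = true;                 // only marks; no manual reclaim
    }

    byte read() {                            // models *p, with safety checks
        if (target.freed)
            throw new IllegalStateException("access to deallocated object");
        if (offset < 0 || offset >= target.data.length)
            throw new IndexOutOfBoundsException("out-of-bounds C access");
        return target.data[offset];
    }
}
```

In the compiled code, the bounds and deallocation checks become the guards described above, many of which can be optimized away after specialization.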
The extra operations for ensuring memory safety cause some overhead, but part of it can be
optimized away during JIT compilation. Our safe C implementation on top of Truffle is only 15%
slower than code that was produced by GCC with the highest optimization level.
5 Evaluation
Truffle has been used to implement a number of languages such as JavaScript [9], Python [11],
Ruby [8], Smalltalk, and R [1]. In all these cases, the performance of programs processed with
Truffle is clearly superior to that of pure interpreters and can even compete with compiled industry-standard implementations for these languages.
In our presentation we will show performance numbers for JavaScript and R implemented
under Truffle. Currently, our JavaScript implementation under Truffle is on average five times
faster than Nashorn, Oracle's reference implementation of JavaScript, and about 20% slower on
average than Google's highly optimized implementation of JavaScript under V8, when evaluated
with the Google Octane benchmark suite [6]. This demonstrates that Truffle's self-optimizing
AST interpreter combined with a highly optimizing just-in-time compiler (Graal) can achieve
competitive performance while allowing developers to build new language implementations with
rather modest effort.
Acknowledgements
Truffle was designed and implemented by members of our institute as well as by researchers at
Oracle Labs, in particular Christian Humer, Lukas Stadler, Andreas Wöß, Christian Wimmer, and
Thomas Würthinger. We would like to thank them for their support and for their feedback on this
presentation. We are also grateful for the continuous funding of the project by Oracle Labs.
References
[1] FastR, 2013. URL https://github.com/allr/fastr/
[2] M. Grimmer, C. Seaton, T. Würthinger, H. Mössenböck: Dynamically Composing Languages
in a Modular Way: Supporting C Extensions for Dynamic Languages. Intl. Conf. on Modularity
(Modularity'15), March 16-19, 2015, Fort Collins, Colorado, USA, pp. 1-13.
[3] M. Grimmer, R. Schatz, C. Seaton, T. Würthinger, H. Mössenböck: Memory-safe Execution
of C on a Java VM. Submitted to the Workshop on Programming Languages and Analysis for
Security (PLAS'15), July 6, 2015, Prague, Czech Republic.
[4] M. Grimmer, T. Würthinger, A. Wöß, H. Mössenböck: An Efficient Approach for Accessing C
Data Structures from JavaScript. Intl. Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS'14), July 28, 2014,
Uppsala, Sweden, pp. 1-4.
[5] C. Humer, C. Wimmer, C. Wirth, A. Wöß, T. Würthinger: A Domain-Specific Language for
Building Self-Optimizing AST Interpreters. Intl. Conf. on Generative Programming: Concepts
and Experiences (GPCE'14), Sept. 15-16, 2014, Västerås, Sweden, pp. 123-132.
[6] Octane JavaScript benchmarks. URL https://developers.google.com/octane/
[7] OpenJDK. Graal project, 2015. URL http://openjdk.java.net/projects/graal/.
[8] C. Seaton, M. Van De Vanter, M. Haupt: Debugging at Full Speed. Workshop on Dynamic
Languages and Applications (Dyla'14), June 8, 2014, Edinburgh, UK, pp. 1-13.
[9] A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, H. Mössenböck: An Object Storage Model
for the Truffle Language Implementation Framework. Intl. Conf. on Principles and Practice of
Programming in Java (PPPJ'14), September 23-26, 2014, Cracow, Poland, pp. 133-144.
[10] T. Würthinger, C. Wimmer, A. Wöß, L. Stadler, G. Duboscq, C. Humer, G. Richards, D.
Simon, and M. Wolczko: One VM to Rule Them All. In Proceedings of the Onward! conference,
ACM Press, 2013. doi: 10.1145/2509578.2509581.
[11] ZipPy, 2013. URL https://bitbucket.org/ssllab/zippy/