OS-capable het-ISA platforms

http://ssrg.ece.vt.edu
Popcorn: Bridging the Programmability Gap in Heterogeneous-ISA Platforms
Antonio Barbalace, Marina Sadini, Saif Ansary, Christopher Jelesnianski, Akshay Ravichandran, Cagil Kendir, Alastair Murray and Binoy Ravindran
Systems Software Research Group, Department of Electrical and Computer Engineering, Virginia Tech, Virginia, USA
{antoniob, marina, bmsaif86, bielsk1, akshay87, ckendir, alastair, binoy}@vt.edu
OS-capable Het-ISA Platforms
How to execute POSIX/shared memory applications on multiple processor islands platforms?
• Application rewriting
• Use another programming paradigm
• Forget a single OS
• Explicit communication in the application (e.g. MPI, OpenCL, offloading)
[Figure: traditional approach, with a separate application and system image per ISA (krn0, krn1)]
Without rewriting, while exploiting architectural diversity?
Evaluation
[Figure: one application on a single system image spanning ISA A (krn0: cpu0) and ISA B (krn1: cpu1, cpu2, cpu3)]
[Figure: compiler pipeline: Source Code → Analyzed Source → Per-ISA Code → Het-ISA Executable]
[Figure labels: ISA B, application; PCIe interconnected]
Popcorn
Replicated-kernel OS
Compiler Framework
Traditional Approach
• Extends traditional SMP OS concepts
– to support heterogeneous-ISA platforms
– to improve programmability
• Single OS, multiple kernels
– Each kernel instance may run on a different ISA
– Kernels communicate by message-passing
– A global OS state is maintained amongst kernels
• Hiding hardware diversity from apps
• Popcorn compared to
– Xeon native
– Xeon Phi native
– OpenCL
– Intel Offloading
• Up to 52% faster than the best native execution
• Up to 6.3x faster than offload execution
– Consistent services have a cost
– If no cache-coherent (cc) shared memory is present, there is a cost for DSM
– If these costs are not considered, the benefit can be offset
• No programmer intervention
– Apps run transparently across and amongst kernels
• Kernels communicate and coordinate via a messaging framework
• Inter-kernel thread migration
• Namespaces provide a Single System Image
– SNU NPB version 1.0.3
– To exploit code diversity
– Intra-application, inter-application
• While considering the underlying OS overheads
• Based on the Linux kernel
• Compute/Memory bound
• Exploit the best arch to run each code block
• File system, IPC, PID, CPU
• Page Replication (Distributed Shared Memory)
• File Descriptor
• Futex (Distributed version of SMP Futex)
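The page replication service listed above can be illustrated with a toy model. The sketch below is a generic write-invalidate replication scheme between two kernels that only exchange messages; it is an assumption for illustration, not Popcorn's actual protocol, which the poster does not describe. The class `Kernel`, its methods, and the message kinds `"fetch"` and `"invalidate"` are all hypothetical names.

```python
class Kernel:
    """Toy kernel instance holding replicas of shared pages (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}          # page number -> local replica of the contents
        self.peers = []          # other kernel instances reachable by messages
        self.messages_sent = 0   # counts the messaging cost of consistency

    def send(self, peer, kind, page):
        # Stand-in for the inter-kernel messaging layer: a direct call.
        self.messages_sent += 1
        return peer.handle(kind, page)

    def handle(self, kind, page):
        if kind == "fetch":
            # Return our replica, or None if we do not hold the page.
            return self.pages.get(page)
        if kind == "invalidate":
            # Drop our replica so that a later read must re-fetch it.
            self.pages.pop(page, None)
            return None
        raise ValueError(f"unknown message kind: {kind}")

    def read(self, page):
        # A local replica is served without any message.
        if page not in self.pages:
            for peer in self.peers:
                data = self.send(peer, "fetch", page)
                if data is not None:
                    self.pages[page] = data   # replicate locally
                    break
            else:
                raise KeyError(f"page {page} is not held by any kernel")
        return self.pages[page]

    def write(self, page, data):
        # Simplification: every write invalidates all remote replicas,
        # even when no peer currently holds one.
        for peer in self.peers:
            self.send(peer, "invalidate", page)
        self.pages[page] = data


krn0, krn1 = Kernel("krn0"), Kernel("krn1")
krn0.peers, krn1.peers = [krn1], [krn0]

krn0.write(7, b"v1")     # krn0 holds the only valid copy of page 7
first = krn1.read(7)     # krn1 fetches and replicates the page
second = krn1.read(7)    # served from the local replica
krn1.write(7, b"v2")     # invalidates the replica on krn0
third = krn0.read(7)     # krn0 must re-fetch the new contents
```

The message counter makes the consistency cost visible: repeated reads of a replicated page are free, while each write and each first read after an invalidation costs at least one message.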
Considering the distributed OS costs, it is possible to run SMP applications on OS-capable het-ISA platforms, gaining performance advantages and programmability.
www.popcornlinux.org
• Based on LLVM, GCC, and Ros
• Every code block is always compiled with the maximum optimization for each ISA
• Per-ISA version of stateless library functions (libm)
• Consistent Services
Conclusions
– Extended interface
• A cost model decides
• For each code block, the optimal mapping
• Considering the overhead of the OS's consistent services
• A one-time cost per application
• Platform-dependent weights
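As a rough illustration of how such a cost model might work, the sketch below picks an ISA for each code block by comparing estimated execution cost plus the overhead of moving to another kernel. It is a greedy, block-by-block heuristic and therefore not guaranteed to find the optimal mapping; the poster does not specify the real model. The names `CodeBlock` and `map_blocks`, the weights `MIGRATION_COST` and `PAGE_COST`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical platform-dependent weights (arbitrary time units).
MIGRATION_COST = 2.0   # cost of migrating a thread to another kernel
PAGE_COST = 0.01       # consistency cost per shared page after a migration


@dataclass(frozen=True)
class CodeBlock:
    name: str
    exec_cost: Dict[str, float]   # estimated execution cost on each ISA
    shared_pages: int             # pages that must be made consistent on a move


def map_blocks(blocks: List[CodeBlock], start_isa: str) -> Dict[str, str]:
    """Greedily choose, block by block, the ISA with the lowest total cost."""
    mapping = {}
    current = start_isa
    for block in blocks:
        best_isa, best_cost = None, float("inf")
        for isa, cost in block.exec_cost.items():
            total = cost
            if isa != current:
                # Leaving the current kernel pays for migration and consistency.
                total += MIGRATION_COST + PAGE_COST * block.shared_pages
            if total < best_cost:
                best_isa, best_cost = isa, total
        mapping[block.name] = best_isa
        current = best_isa
    return mapping


blocks = [
    CodeBlock("init", {"xeon": 1.0, "xeon_phi": 3.0}, shared_pages=10),
    CodeBlock("compute", {"xeon": 10.0, "xeon_phi": 4.0}, shared_pages=100),
    CodeBlock("reduce", {"xeon": 1.0, "xeon_phi": 0.9}, shared_pages=100),
]
mapping = map_blocks(blocks, start_isa="xeon")

# A block that is slightly faster on the other ISA, but not by enough to
# repay the migration and consistency overhead, stays where it is.
marginal = map_blocks(
    [CodeBlock("marginal", {"xeon": 5.0, "xeon_phi": 4.0}, shared_pages=100)],
    start_isa="xeon",
)
```

With these made-up numbers, `compute` moves to `xeon_phi` because the saving exceeds the overhead, while `marginal` stays on `xeon`: ignoring the OS overhead would have turned a small gain into a loss.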
Xeon-Xeon Phi Prototype