Popcorn: Bridging the Programmability Gap in Heterogeneous-ISA Platforms

Antonio Barbalace, Marina Sadini, Saif Ansary, Christopher Jelesnianski, Akshay Ravichandran, Cagil Kendir, Alastair Murray and Binoy Ravindran
Systems Software Research Group, Department of Electrical and Computer Engineering, Virginia Tech, Virginia, USA
{antoniob, marina, bmsaif86, bielsk1, akshay87, ckendir, alastair, binoy}@vt.edu
http://ssrg.ece.vt.edu

OS-capable Het-ISA Platforms
How to execute POSIX/shared-memory applications on platforms with multiple processor islands?

Traditional Approach
• Application rewriting
• Use another programming paradigm
• Forget a single OS
• Explicit communication in the application (e.g. MPI, OpenCL, offloading)
[Figure: traditional approach. ISA A (cpu0) and ISA B (cpu1, cpu2, cpu3) each run their own application on a separate system image (krn0, krn1).]

Without rewriting, while exploiting arch diversity?
• No programmer intervention
  – Apps run transparently across and amongst kernels
• Exploit the best arch to run each code block
  – To exploit code diversity
  – Intra-application, inter-application
• While considering the underlying OS overheads
  – Consistent services have a cost
  – If no cache-coherent (cc) shared memory is present, there is a cost for DSM
  – If these costs are not considered, they can offset the benefit
[Figure: Popcorn. One application runs on a single system image spanning krn0 (ISA A, cpu0) and krn1 (ISA B, cpu1, cpu2, cpu3).]

Popcorn Replicated-kernel OS
• Extends traditional SMP OS concepts
  – to support heterogeneous-ISA platforms
  – to improve programmability
• Single OS, multiple kernels
  – Each kernel instance may run on a different ISA
  – Kernels communicate by message-passing
  – A global OS state is maintained amongst kernels
• Hiding hardware diversity from apps
• Based on the Linux kernel

Consistent Services
• Kernels communicate and coordinate via a messaging framework
• Inter-kernel thread migration
• Namespaces provide a Single System Image
  – File system, IPC, PID, CPU
• Page Replication (Distributed Shared Memory)
• File Descriptor
• Futex (distributed version of the SMP futex)

Xeon-Xeon Phi Prototype
• PCIe interconnected

Evaluation
• Popcorn compared to
  – Xeon native
  – Xeon Phi native
  – OpenCL
  – Intel Offloading
• SNU NPB version 1.0.3
• Compute/Memory bound
• Up to 52% faster than the best native execution
• Up to 6.3x faster than offload execution

Conclusions
Considering the distributed OS costs, it is possible to run SMP applications on OS-capable het-ISA platforms, gaining performance advantages and programmability.
www.popcornlinux.org

Compiler Framework
[Figure: compiler pipeline. Source Code, Analyzed Source, Per-ISA Code, Het-ISA Executable.]
• Based on LLVM, GCC, and Ros
• Every code block is always compiled with the maximum optimization for each ISA
• Per-ISA version of stateless library functions (libm)
  – Extended interface
• A cost model decides
  – For each code block, the optimal mapping
  – Considering the overhead of the OS's consistent services
  – A one-time cost per application
  – Platform-dependent weights
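The cost model described above can be illustrated with a minimal sketch. It is not the Popcorn implementation: the names `CodeBlock`, `PlatformWeights`, and `choose_mapping`, the additive cost formula, and every number are assumptions made for illustration. Each code block has a hypothetical per-ISA execution estimate, and moving between kernels is charged a thread-migration cost plus a per-page replication (DSM) cost, both taken from platform-dependent weights. A small dynamic program then picks the cheapest ISA for each block in sequence.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeBlock:
    name: str
    exec_cost: dict[str, float]  # hypothetical execution estimate per ISA (seconds)
    shared_pages: int            # pages assumed to need replication after a migration


@dataclass(frozen=True)
class PlatformWeights:
    migration_cost: float  # assumed cost of one inter-kernel thread migration
    page_cost: float       # assumed cost of replicating one page through DSM


def choose_mapping(blocks, isas, weights, start_isa):
    """Return (total_cost, plan) with the cheapest ISA for each block in order."""
    # best[isa] holds the cheapest (cost, plan) among plans whose last block ran on isa.
    best = {start_isa: (0.0, [])}
    for block in blocks:
        nxt = {}
        for isa in isas:
            candidates = []
            for prev, (cost, plan) in best.items():
                # Staying on the same kernel is free; moving pays the OS overheads.
                move = 0.0 if prev == isa else (
                    weights.migration_cost + block.shared_pages * weights.page_cost
                )
                candidates.append((cost + move + block.exec_cost[isa], plan + [isa]))
            nxt[isa] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


# Illustrative inputs only; none of these numbers are measurements.
isas = ("xeon", "xeon_phi")
blocks = [
    CodeBlock("init", {"xeon": 1.0, "xeon_phi": 4.0}, shared_pages=10),
    CodeBlock("kernel", {"xeon": 20.0, "xeon_phi": 8.0}, shared_pages=1000),
    CodeBlock("reduce", {"xeon": 1.0, "xeon_phi": 3.0}, shared_pages=10),
]
weights = PlatformWeights(migration_cost=0.5, page_cost=0.002)

total, plan = choose_mapping(blocks, isas, weights, start_isa="xeon")
for block, isa in zip(blocks, plan):
    print(f"{block.name}: {isa}")
print(f"estimated total cost: {total:.2f}")
```

With these made-up weights the sketch migrates only the parallel block to the second ISA. Raising `page_cost` changes the plan, which mirrors the point that ignoring the cost of consistent services can offset the benefit of migrating.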