System-level IPC on Multi-core Platforms – 2013-09-23 SICS Multicore Day

System-level IPC on Multi-core Platforms
SICS Multicore Day – 2013-09-23
Ola Dahl
CTO Office
Enea
Enea Confidential – Under NDA
Copyright © 2013 Enea AB
Before we start
• Enea
~400 employees
468 MSEK revenue
Products and Services: Services, Middleware, OSE, Linux
[Timeline figure: FOUNDED 1968 to Now]
• Myself
[Timeline figure labels: LTH, STLiU, Ericsson]
System-level IPC
Message-passing between processes – intra-node
and inter-node
Monitoring and event handling – fault-tolerance
OSE operating system – kernel services, file system services, IP
communication, program management, run-time loader, LINX
Number of communicating entities ~ tens of thousands (pid
space extension from 16 to 20 bits) – number of nodes ~ 100s
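As a rough illustration of the pid space figures above, the following sketch computes how many identifiers fit in 16 and in 20 bits. The function name is made up for this example and is not part of OSE.

```python
def pid_space(bits: int) -> int:
    """Number of distinct identifiers representable with the given bit width."""
    return 1 << bits


if __name__ == "__main__":
    for bits in (16, 20):
        print(f"{bits} bits -> {pid_space(bits)} identifiers")
    # Ratio between the extended and the original identifier space.
    print(f"growth factor: {pid_space(20) // pid_space(16)}x")
```

A 16-bit space holds 65536 identifiers and a 20-bit space 1048576, so the extension gives 16 times as many.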
System-level IPC
Element Messaging Framework – Name server, message dispatch, communication patterns, HA functionality, Linux
#nodes ~ 100(s)
#threads/node ~ 1000s
[Figure: entities labelled A, B, C and D; captions: Elastic Multi-Node, Fixed Multi-Node, SoC Platform, Cloud]
IPC
[Figure: two Operating System blocks]
Communicating entities - Linux process, Linux thread, RTOS task, Bare-metal
executive, User-space thread, Other executing entity (e.g. in an event-driven
execution model)
IPC and Multicore
[Figure: two Operating System blocks; core labels C0 to C4 and D0 to D2; each block sits on Bus, Interconnect, Cache, Controllers, I/O]
Multicore, Multiple processing entities, Parallelism on different levels – inside
one SoC block, inside SoC, between SoC
Communication on different levels – interconnect, caches, memory, hardware
buffers and hardware IPC support
IPC and Multicore
[Figure: Realtime Operating System and Non-Realtime Operating System; core labels C0 to C4 and D0 to D2; each block sits on Bus, Interconnect, Cache, Controllers, I/O]
Real-time – core isolation – dedicated cores for real-time response
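One building block for dedicating cores is CPU affinity. The sketch below is a minimal illustration, assuming Linux and Python's os.sched_getaffinity / os.sched_setaffinity; the function name is hypothetical. Pinning only restricts where this process may run. Keeping other work off the chosen core takes additional system configuration that is not shown here.

```python
import os


def pin_to_one_core():
    """Pin the calling process to a single allowed core, then restore.

    Returns the chosen core and the affinity mask observed while pinned.
    """
    original = os.sched_getaffinity(0)   # cores this process may run on
    target = min(original)               # arbitrary choice for the example
    os.sched_setaffinity(0, {target})    # restrict the process to one core
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)    # put the original mask back
    return target, pinned


if __name__ == "__main__":
    core, mask = pin_to_one_core()
    print(f"pinned to core {core}, mask while pinned: {sorted(mask)}")
```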
Heterogeneous Hardware
TCI6638K2K - Multicore DSP+ARM KeyStone II System-on-Chip http://www.ti.com/product/tci6638k2k
Processing – 8 C66x DSP Cores (up to 1.2 GHz), 4 ARM Cores (up to 1.4 GHz), Wireless comm (3GPP)
coprocessors
Interconnect and control - Multicore Navigator, TeraNet, Multicore Shared Memory Controller, HyperLink
Heterogeneous Software
Core isolation for real-time response
Real-time domain and non-real-time domain
Run-time categories in real-time domain
• Native threads
• User-space threads
• RTOS migration
• Other execution frameworks, e.g. Open Event Machine
• ENEA LWRT
[Figure labels: Realtime, Non-Realtime, Operating System; cores C0, C1, D0, D1, D2; Bus, Interconnect, Cache, Controllers, I/O]
System-level IPC and Multicore
Communicating entities – e.g. processes, threads, user-space
threads, bare-metal executives
Levels of parallelism
• Multicore processor in a SoC
• Multiple blocks in a SoC
• Multiple SoC in a node
• Multiple nodes
Communication on different levels (e.g. intra-node and inter-node)
• On each level – Establish contact, Perform communication,
Monitor and act on events, Close
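The four steps listed above can be sketched on a single level, here between two processes on one node. This is a minimal illustration, assuming Linux, an AF_UNIX stream socket and a forked peer. The socket path and function name are hypothetical, and real system-level IPC would add naming, supervision and error handling.

```python
import os
import select
import socket
import tempfile


def run_session(payload: bytes = b"ping") -> dict:
    """Walk through establish, communicate, monitor and close."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.sock")  # hypothetical endpoint name

    # 1. Establish contact: publish an address the peer can connect to.
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)

    pid = os.fork()
    if pid == 0:
        # Child process acts as the peer: connect, reply once, go away.
        try:
            listener.close()
            peer = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            peer.connect(path)
            peer.sendall(peer.recv(4096).upper())
            peer.close()
        finally:
            os._exit(0)

    conn, _ = listener.accept()

    # 2. Perform communication.
    conn.sendall(payload)

    # 3. Monitor and act on events: wait (with a timeout) for the reply,
    #    then for the end-of-stream that shows the peer has closed.
    ready, _, _ = select.select([conn], [], [], 5.0)
    reply = conn.recv(4096) if ready else b""
    ready, _, _ = select.select([conn], [], [], 5.0)
    peer_closed = bool(ready) and conn.recv(4096) == b""

    # 4. Close: release descriptors and remove the endpoint.
    conn.close()
    listener.close()
    os.waitpid(pid, 0)
    os.unlink(path)
    os.rmdir(workdir)
    return {"reply": reply, "peer_closed": peer_closed}


if __name__ == "__main__":
    print(run_session())
```

The same four phases recur on every level of the hierarchy; only the addressing and the transport change.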
Where are we heading?
Linux
Hardware
Virtualisation
Linux
EE Times report - http://seminar2.techonline.com/~additionalresources/embedded_mar1913/embedded_mar1913.pdf
Linux usage
2013 – 50%
2012 – 46%
Linux
Status of embedded Linux – March 2013
http://elinux.org/images/c/cf/Status-of-Embedded-Linux-2013-03-JJ44.pdf
• Average time between Linux releases (3.3 to 3.8) – 70 days
• Linux 3.4 – RPMsg for IPC between Linux and e.g. RTOS
• Linux 3.7 – ARM multi-platform support, ARM 64-bit support
• Linux 3.7 – perf trace (alternative to strace)
Status of Linux – September 2013
• Latest stable kernel – 3.11.1
• Example changes in 3.11 (released September 2, 2013):
– ARM huge page support, KVM and XEN support for ARM64
– SYSV IPC message queue scalability improvements
• Example changes in 3.10 (released June 30, 2013):
– Timerless multitasking
Linux and real-time
Real-time framework e.g. Xenomai - http://www.xenomai.org/
PREEMPT_RT - https://rt.wiki.kernel.org/index.php/Main_Page
Core isolation and tickless operation – striving for “Bare-Metal Multicore Performance in a General-Purpose Operating System” -
http://www2.rdrop.com/~paulmck/scalability/paper/BareMetalMW.2013.02.25a.pdf
Timerless multitasking in 3.10 retains 1 Hz tick also on isolated cores
Linux 3.12-rc1 (2013-09-16) - even more tickless kernel (1 Hz maintenance tick
removed) – still work to be done, e.g. with memory management
Hardware
ITRS - http://public.itrs.net - fifteen-year assessment of the semiconductor
industry’s future technology requirements
ITRS 2012 UPDATE - http://public.itrs.net/Links/2012ITRS/Home2012.htm
• System Drivers - SOC Networking Driver, SOC Consumer Driver,
Microprocessor (MPU) driver, Mixed-Signal Driver, Embedded Memory
Driver
• SOC networking driver - moving towards “multicore architectures with
heterogeneous on-demand accelerator engines”, with “integration of onboard switch fabric and L3 caches”
Hardware
SOC networking driver – MC/AE Architecture – from
http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
Hardware
SOC networking driver – System performance and # of cores – from
http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
Assumptions - constant cost (die area), per-year increase of number of cores (1.4 x), core frequency
(1.05 x), accelerator engine frequency (1.05 x) - logic, memory, cache hierarchy, switching-fabric and
system interconnect will scale consistently with the number of cores
System performance – the “product of number of cores, core frequency, and accelerator engine
frequency”
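The stated assumptions can be turned into a small calculation. The sketch below simply multiplies the three per-year factors quoted above and compounds them over a number of years. The function name and its parameters are introduced for this example only, and it models nothing beyond the quoted product.

```python
def relative_performance(years: int,
                         cores: float = 1.4,
                         core_freq: float = 1.05,
                         accel_freq: float = 1.05) -> float:
    """System performance relative to year 0 under the quoted assumptions."""
    per_year = cores * core_freq * accel_freq  # product of the three factors
    return per_year ** years


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"after {n} year(s): {relative_performance(n):.2f}x")
```

Under these assumptions the per-year factor is 1.4 x 1.05 x 1.05 = 1.5435, i.e. roughly 54% growth per year.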
Virtualization
NFV – Network Function Virtualization
ETSI - http://portal.etsi.org/NFV/NFV_White_Paper.pdf
“leveraging standard IT virtualisation technology to consolidate many network
equipment types onto industry standard high volume servers, switches and
storage, which could be located in Datacentres, Network Nodes and in the end
user premises”
Virtualization using e.g. KVM or XEN
System-level IPC aspects
Establishing and performing efficient communication
Constraints from
• Real-time
• Hardware
with an increasing interest in virtualization
IPC and Linux
Is there any remaining work to do?
IPC in Linux (and UNIX)
[Timeline figure, 1964 to now. Labels: pipe, UNIX, SysV, flock (4.2BSD), mmap (SVR4), POSIX rt, POSIX shmem (Linux 2.4), POSIX named semaphore (Linux 2.6), POSIX mq (Linux 2.6.6), eventfd (Linux 2.6.22), CMA (Linux 3.2); other marks: Enea founded, Emacs, Linux 1.0]
Overview, book, man pages, etc. by Michael Kerrisk - http://man7.org/
IPC on Linux
[Timeline figure, 2000 to now. Labels: nanomsg, OpenMPI, TIPC, kdbus, AF_BUS, Binder, DBUS, RPMsg, 0MQ, LINX for Linux, Enea Element]
Work in progress
sysv ipc shared mem optimizations, June 18, 2013
http://lwn.net/Articles/555469/
“With these patches applied, a custom shm microbenchmark stressing
shmctl doing IPC_STAT with 4 threads a million times, reduces the
execution time by 50%”
ALS: Linux interprocess communication and kdbus, May 30, 2013
http://lwn.net/Articles/551969/
“The work on kdbus is progressing well and Kroah-Hartman expressed
optimism that it would be merged before the end of the year. Beyond
just providing a faster D-Bus (which could be accomplished without
moving it into the kernel, he said), it is his hope that kdbus can
eventually replace Android's binder IPC mechanism.”
Work in progress
Speeding up D-Bus, February 29, 2012
http://lwn.net/Articles/484203/
“D-Bus currently relies on a daemon process to authenticate processes
and deliver messages that it receives over Unix sockets. Part of the
performance problem is caused by the user-space daemon, which
means that messages need two trips through the kernel on their way to
the destination”
Fast interprocess communication revisited, November 9, 2011
https://lwn.net/Articles/466304/
“Rather we start with the observation that this many attempts to solve
essentially the same problem suggests that something is lacking in
Linux. There is, in other words, a real need for fast IPC that Linux
doesn't address”
Work in progress
Fast interprocess messaging, September 15, 2010
http://lwn.net/Articles/405346/
“Rather than copy messages through a shared segment, they would
rather deliver messages directly into another process's address space.
To this end, Christopher Yeoh has posted a patch implementing what
he calls cross memory attach.”
Which IPC to use?
Functionality
Performance
Cost
Technology constraints
Choosing an IPC - Functionality
Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX
End-point addressing | SysV key | Shmem object name | File system node | AF_UNIX – file system node, AF_INET – IP address and port | Transport and address (Transport = TCP, ipc, inproc) | Endpoint name specifying path to peer
End-point repr. | Variable | File desc | File desc x 2 | Socket descriptor | 0MQ socket | LINX endpoint, spid
Channels | A memory area | A memory area | The FIFO (unidirectional) | The socket (bidirectional) | 0MQ socket internal (bidirectional) – e.g. TCP or UNIX domain socket | Buffer associated with LINX endpoint
Initialisation | shmget, shmat | shm_open, mmap | mkfifo, open | socket, bind, listen, accept, connect | Create 0MQ context and 0MQ socket | linx_open, linx_hunt
Closing | shmdt | munmap, shm_unlink | close, unlink | close | Close 0MQ socket | linx_close
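To make one column of the comparison concrete, the sketch below walks through the FIFO entries: mkfifo and open for initialisation, a file system node as the address, and close plus unlink for closing. It is a minimal example, assuming Linux and a forked writer process; the node name and function name are hypothetical.

```python
import os
import tempfile


def fifo_roundtrip(message: bytes = b"hello over a FIFO") -> bytes:
    """Send `message` from a child process to the parent through a FIFO."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.fifo")  # hypothetical node name
    os.mkfifo(path)  # initialisation: create the file system node

    pid = os.fork()
    if pid == 0:
        # Child: the writing end of the unidirectional channel.
        try:
            fd = os.open(path, os.O_WRONLY)  # blocks until a reader opens
            os.write(fd, message)
            os.close(fd)
        finally:
            os._exit(0)

    fd = os.open(path, os.O_RDONLY)  # blocks until the writer opens
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # empty read: the writer has closed its end
            break
        chunks.append(chunk)

    # Closing: release the descriptor, then remove the node.
    os.close(fd)
    os.waitpid(pid, 0)
    os.unlink(path)
    os.rmdir(workdir)
    return b"".join(chunks)


if __name__ == "__main__":
    print(fifo_roundtrip())
```

Each side holds its own file descriptor for the same node, and the channel carries data in one direction only.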
Choosing an IPC - Functionality
Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX
Sending | Write to memory, no synchronization | Write to memory, no synchronization | write | write | Send message or number of bytes to 0MQ socket | Send LINX signal
Receiving | Read from memory, no synchronization | Read from memory, no synchronization | read | read | Receive message or number of bytes from 0MQ socket | Receive LINX signal
Blocking | No (unless implemented separately) | No (unless implemented separately) | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Receive is blocking (non-blocking possible), Send is not
Monitoring | No (unless implemented separately) | No (unless implemented separately) | select, poll | select, poll | Monitoring callback can be registered with 0MQ context | LINX attach
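The "Blocking" and "Monitoring" rows for descriptor-based channels can be illustrated with an ordinary pipe. The sketch below is a minimal example, assuming Linux and Python's os and select modules; the function name is made up for illustration. It shows a non-blocking read that finds nothing, followed by poll reporting readiness once data has been written.

```python
import os
import select


def nonblocking_then_poll(payload: bytes = b"x"):
    """Return (would_block, idle_events, readable, data) for a pipe."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # switch the read end to non-blocking

    # Non-blocking read on an empty pipe fails immediately instead of waiting.
    try:
        os.read(read_fd, 1)
        would_block = False
    except BlockingIOError:
        would_block = True

    # Monitoring: register the descriptor and ask which events are pending.
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    idle_events = poller.poll(0)  # nothing written yet

    os.write(write_fd, payload)
    events = poller.poll(1000)  # wait up to one second for readability
    readable = bool(events) and bool(events[0][1] & select.POLLIN)
    data = os.read(read_fd, len(payload))

    os.close(read_fd)
    os.close(write_fd)
    return would_block, idle_events, readable, data


if __name__ == "__main__":
    print(nonblocking_then_poll(b"hello"))
```

Shared memory offers neither behaviour on its own, which is why the table notes that blocking and monitoring must be implemented separately there.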
Choosing an IPC – Technology constraints
Technology | 0MQ | kdbus | LINX
Sockets | Yes | No | Yes, own type
Daemons | No | No | Discovery daemon (optional)
Kernel modules | No | Yes | Yes
Pthread synchronization | Yes | No | Yes
Kernel synchronization | No | Yes | Yes
Programming languages | C and more | C | C
Development status | Latest stable release is 3.2.3, from May 2013 | Estimated to be ready in 2013 | Initial release 2006, current version is 2.6.5, released June 2013
License | LGPLv3 | LGPL | BSD and GPLv2
Choosing an IPC - performance
• ipc-bench: A UNIX inter-process communication benchmark
• University of Cambridge http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/
Measures Latency, Throughput, IPI latency
• Public results dataset
“Since we have found IPC performance to be a complex, multi-variate
problem, and because we believe that having an open corpus of
performance data will be useful to guide the development of
hypervisors, kernels and programming frameworks, we provide a
database of aggregated ipc-bench datasets.”
Enea and ipc-bench – porting to 32-bit, porting to ARM, porting to
PowerPC, adding tests for CMA, LINX, ZeroMQ
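To give a feel for what a latency test measures, here is a minimal ping-pong sketch over two pipes. It is not ipc-bench and does not reproduce its methodology. It assumes Linux and small messages, the function names are hypothetical, and Python interpreter overhead will dominate the numbers, so the result is only useful for relative comparisons on one machine.

```python
import os
import time


def read_exact(fd: int, size: int) -> bytes:
    """Read exactly `size` bytes from a descriptor, or raise on EOF."""
    buf = b""
    while len(buf) < size:
        chunk = os.read(fd, size - len(buf))
        if not chunk:
            raise EOFError("peer closed the pipe")
        buf += chunk
    return buf


def pipe_pingpong(rounds: int = 1000, size: int = 64) -> float:
    """Return the mean round-trip time in seconds for `size`-byte messages."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: echo every message straight back, then exit.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            for _ in range(rounds):
                os.write(to_parent_w, read_exact(to_child_r, size))
        finally:
            os._exit(0)

    os.close(to_child_r)
    os.close(to_parent_w)
    message = b"\0" * size
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_child_w, message)    # ping
        read_exact(to_parent_r, size)    # pong
    elapsed = time.perf_counter() - start

    os.close(to_child_w)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return elapsed / rounds


if __name__ == "__main__":
    print(f"mean round trip: {pipe_pingpong() * 1e6:.1f} microseconds")
```

Combining this with core pinning for the two processes would be one way to approach the pairwise, per-core measurements that ipc-bench reports.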
Measuring IPC performance
Why is this interesting?
From The case for reconfigurable I/O channels, S. Smith et al,
RESoLVE12, 2012 - http://anil.recoil.org/papers/2012-resolve-fable.pdf
“We show dramatic differences in performance between
communication mechanisms depending on locality and machine
architecture, and observe that the interactions of communication
primitives are often complex and sometimes counter-intuitive”
“Furthermore, we show that virtualisation can cause unexpected effects
due to OS ignorance of the underlying, hypervisor-level hardware
setup”
Measuring IPC performance
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Pairwise IPC latency between cores
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB
Linux 3.8.5-030805-generic, x86_64
Measuring IPC performance
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Pairwise IPC throughput between cores. (x-axis is packet size, y-axis is Gbps)
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB
Linux 3.8.5-030805-generic, x86_64
Measuring IPC performance
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
[Chart. Series: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; axis ticks 64, 4096, 65536; value axis 0 to 180000 (units not recoverable)]
Measuring IPC performance
ARM Pandaboard @ 1 GHz, Cores 0 and 1
[Chart. Series: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; axis ticks 64, 4096, 65536; value axis 0 to 3000 (units not recoverable)]
Measuring IPC performance
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
[Chart: 0MQ vs UNIX sockets. Categories: zmq_inproc_thr, zmq_ipc_thr, zmq_tcp_thr, unix_thr; series: 64, 4096, 65536; value axis 0 to 30000 (units not recoverable)]
Profiling and Performance
Brendan Gregg - Linux Performance Analysis and Tools - SCaLE 11x 2013
http://dtrace.org/blogs/brendan/2013/06/08/linux-performance-analysis-and-tools/
[Figure: Linux software stack. Layers: Apps and libs; System call interface; VFS, File systems, Block device interface; Sockets, TCP/UDP, IP, Ethernet; Scheduler, VM; Device drivers]
Tools:
- perf - https://perf.wiki.kernel.org/index.php/Main_Page
- DTrace - https://github.com/dtrace4linux
- SystemTap - http://sourceware.org/systemtap/
Profiling and Performance
Collecting data with perf – IPC test with pipes
Profiling and Performance
Analyzing data recorded with perf
Profiling and Performance
Examining where time is spent
Profiling and Performance
A lot more to choose from*: strace, netstat, top, pidstat, mpstat, dstat,
vmstat, slabtop, free, tcpdump, ip, nicstat, iostat, iotop, blktrace, ps,
pmap, traceroute, ntop, ss, lsof, oprofile, gprof, kcachegrind, valgrind,
google profiler, nfsiostat, cifsiostat, latencytop, powertop, LTTng,
ktap, ...
* http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
Summary
IPC in Linux - Stable but not finished
IPC on Linux – diversified
Performance and profiling – ipc-bench (with adaptations and
extensions), a large selection of profiling tools
Conclusions
• A variety of IPC mechanisms exist
• There is no clear one-size-fits-all solution
• Performance aspects and functionality aspects (location
transparency, robustness) – different trade-offs for different
use-cases
• IPC and Linux – many stable mechanisms but still work-in-progress (e.g. kdbus)
• Performance and profiling required
– ipc-bench (with adaptations and extensions)
– perf for performance profiling (one of several, however with a powerful
feature set)
Challenges
• Systems requirements and design - parallelism, partitioning,
heterogeneity, functional requirements, performance requirements –
choosing an IPC mechanism
• Programming - frameworks and execution environments – legacy
and re-use – choosing a programming paradigm
• Verification - measurements and profiling - are we designing (and
implementing) the system as we planned? – choosing the right tools
Enea as an IPC partner - Long-term experience, Competence for building future
IPC systems – development, integration, configuration, performance assessment
SICS Multicore day
System-level IPC on multicore platforms
Multicore System-on-Chip solutions, offering parallelization and
partitioning, are increasingly used in real-time systems. As the
number of cores increases, often in combination with increased
heterogeneity in the form of hardware-accelerated functionality,
we see increased demands on effective communication, inside a
multicore node but also at the inter-node system level.
The presentation will outline some of the challenges, as seen
from Enea, to be expected when building future communication
mechanisms, with requirements on performance and scalability,
as well as transparency for applications. We will give examples
from ongoing work in the Linux area, from Enea and from other
open source contributors.