
Starfish
Efficient Concurrency Support
for Computer Vision Applications
Robert LiKamWa
Lin Zhong
Rice University
1
In the year 2020...
2
Continuous mobile vision
"Where did I place my keys?"
"Oh, I haven't talked to Fred recently..."
"Remind me to tell Lin to let me graduate!"
3
Continuous mobile vision
Energy consumption overwhelms wearable battery!
Insufficient concurrency support for vision!
4
[Diagram: Application, Vision Headers, Vision Library, Capture Service]
5
[Diagram: Application issues Scale and Face calls through the Vision Headers and Capture Service]
6
[Diagram: multiple apps, each running the Vision Library at 200+ ms per call]
Camera is overworked
Computation is overworked
We need efficient concurrency support!
7
[Diagram: multiple apps, each linked against the Vision Library]
Observations:
• Vision apps utilize a common vision library
8
[Diagram: apps invoke common primitives: Scale + Face, Scale + Face, Scale + SURF]
Observations:
• Vision apps utilize a common vision library
• Capture and computation are redundant across apps
9
Proposal: Split-process design for centralized control
[Diagram: Scale, SURF, Face primitives under centralized control]
Key Idea:
Share computations,
Reduce redundancy
11
Starfish
[Diagram: each app's Starfish Library presents the Vision Library interface; the Starfish Core Service hosts the Vision Library]
Starfish Library: uses vision headers to intercept library calls
Starfish Core Service: executes, tracks, and shares library call computations
12
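As an illustration of the interception idea, here is a minimal, self-contained sketch with hypothetical names, not the authors' code: the app includes a Starfish header whose stub has the vision library's signature, so unmodified application code compiles against the stub, which forwards the call to the core service. The core is modeled as an in-process singleton where the real system would cross a process boundary over IPC.

// Sketch of header-based interception (hypothetical names).
// The app includes this header instead of the vision library's own header,
// so its call to face_detect() resolves to the Starfish stub below.
#include <cstdio>
#include <string>
#include <vector>

struct Image { int width, height; };
struct Rect  { int x, y, w, h; };

// Stand-in for the Starfish Core Service. In the real system this would run
// in a separate process and be reached over IPC; a singleton keeps the sketch
// self-contained.
class CoreService {
public:
    static CoreService& instance() { static CoreService s; return s; }
    std::vector<Rect> execute(const std::string& fn, const Image& img) {
        std::printf("core: executing %s on %dx%d image\n",
                    fn.c_str(), img.width, img.height);
        return { {10, 10, 40, 40} };   // dummy result from the "real" library
    }
};

// Drop-in stub with the vision library's signature.
inline std::vector<Rect> face_detect(const Image& img) {
    return CoreService::instance().execute("face_detect", img);
}

int main() {
    Image frame{640, 480};
    std::vector<Rect> faces = face_detect(frame);   // app code is unchanged
    std::printf("app: got %zu face(s)\n", faces.size());
    return 0;
}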
Starfish
[Diagram: Starfish Libraries in each app, Starfish Core Service hosting the Vision Library]
Split-process library solution for efficient, transparent multi-app service
13
[Diagram: faceDetect(img) calls from multiple apps pass through their Starfish Libraries to the Starfish Core's function cache and argument search, backed by Vision Library execution]
First Call: execute the call (20-200 ms), store results in the function cache
Subsequent Call(s): skip execution (0 ms), retrieve results from the function cache
14
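A minimal sketch of the function-cache behavior described above, assuming results can be keyed by (function name, frame id); the names and keying scheme are illustrative, not the exact Starfish design. The first caller executes and stores; later callers with the same key skip execution and read the stored result.

// Sketch of a shared function cache keyed by (function name, frame id).
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Rect { int x, y, w, h; };
using Result   = std::vector<Rect>;
using CacheKey = std::pair<std::string, long>;   // (function, frame id)

class FunctionCache {
public:
    Result call(const std::string& fn, long frameId,
                const std::function<Result()>& execute) {
        auto it = cache_.find(CacheKey(fn, frameId));
        if (it != cache_.end()) {
            std::printf("%s(frame %ld): cache hit, skip execution\n",
                        fn.c_str(), frameId);
            return it->second;                    // retrieve stored results
        }
        Result r = execute();                     // first call: run the library
        cache_[CacheKey(fn, frameId)] = r;        // store results for later callers
        std::printf("%s(frame %ld): executed and cached\n", fn.c_str(), frameId);
        return r;
    }
private:
    std::map<CacheKey, Result> cache_;
};

int main() {
    FunctionCache cache;
    auto slowDetect = [] { return Result{ {5, 5, 30, 30} }; };  // stands in for 20-200 ms of work
    cache.call("faceDetect", 42, slowDetect);   // app A: executes
    cache.call("faceDetect", 42, slowDetect);   // app B, same frame: cached
    return 0;
}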
Starfish
[Diagram: Starfish Libraries route calls to the Starfish Core's cache search and function cache, backed by the Vision Library]
Share computations,
Reduce redundancy
15
Starfish
[Diagram: Starfish Libraries, Starfish Core cache search and function cache, Vision Library]
Timing optimizations:
+ Reuse "Fresh Frames" to promote cache hits
+ Delay call return to encourage device sleep
16
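The fresh-frame reuse could be realized roughly as in the sketch below; the 100 ms window, names, and policy are assumptions for illustration only. A requester arriving within the freshness window receives the existing frame id instead of triggering a new capture, so its library calls land on keys already in the shared cache.

// Sketch of "fresh frame" reuse (assumed policy and window).
#include <chrono>
#include <cstdio>

class FrameSource {
public:
    explicit FrameSource(int freshnessMs) : freshnessMs_(freshnessMs) {}

    long getFrame() {
        using namespace std::chrono;
        auto now   = steady_clock::now();
        bool fresh = lastId_ >= 0 &&
            duration_cast<milliseconds>(now - lastCapture_).count() < freshnessMs_;
        if (!fresh) {
            ++lastId_;                         // stale: capture a new frame
            lastCapture_ = now;
            std::printf("captured new frame %ld\n", lastId_);
        } else {
            std::printf("reusing fresh frame %ld\n", lastId_);  // promotes cache hits
        }
        return lastId_;
    }

private:
    int  freshnessMs_;
    long lastId_ = -1;
    std::chrono::steady_clock::time_point lastCapture_{};
};

int main() {
    FrameSource source(100);   // 100 ms freshness window (arbitrary choice)
    source.getFrame();         // app A: triggers a capture
    source.getFrame();         // app B, shortly after: gets the same frame
    return 0;
}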
Mitigating Split-Process Overhead
[Diagram: Starfish Libraries, Starfish Core cache search and function cache, Vision Library]
+ Promote sharing by reusing "Fresh Frames"
+ Maintain code privacy by delaying call return
+ Manage concurrent requests through fine-grained locks (see the sketch below)
17
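One way to realize the fine-grained locking point above, sketched under assumed data structures and not the actual Starfish implementation: each cache entry carries its own mutex, so concurrent requests for the same (function, frame) pair serialize on that one entry while unrelated requests proceed in parallel; the table-wide lock is held only long enough to find or create the entry.

// Sketch: per-entry locks instead of one global lock.
#include <cstdio>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

struct Entry {
    std::mutex m;          // fine-grained lock: guards only this entry
    bool ready  = false;
    int  result = 0;
};

class SharedCache {
public:
    int call(const std::string& fn, long frame, const std::function<int()>& exec) {
        Entry* e;
        {
            std::lock_guard<std::mutex> g(tableLock_);   // brief: protects the map only
            e = &entries_[std::make_pair(fn, frame)];
        }
        std::lock_guard<std::mutex> g(e->m);             // slow work serialized per entry
        if (!e->ready) { e->result = exec(); e->ready = true; }
        return e->result;
    }
private:
    std::mutex tableLock_;
    std::map<std::pair<std::string, long>, Entry> entries_;
};

int main() {
    SharedCache cache;
    auto work = [] { return 1; };                        // stands in for a slow library call
    std::thread a([&] { cache.call("faceDetect", 7, work); });
    std::thread b([&] { cache.call("resize",     7, work); });  // different entry: no waiting
    a.join();
    b.join();
    std::printf("done\n");
    return 0;
}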
[Diagram: Starfish Libraries, Starfish Core cache search and function cache, Vision Library]
Reduce expense of argument passing
Avoid library object modification
18
Split-Process in the Literature
• Object Redefinition
• Lightweight RPC
  • SUN RPC
• Cloud Transfer
  • COMET
  • MAUI
  • CloneCloud
Zero-copy
Require library redesign
Object tracking
Optimized for code offload
19
Split-Process Argument Passing
[Diagram: faceDetect() arguments pass from the Starfish Library through shared memory to the Starfish Core for Vision Library execution; deep-copying the arguments takes 3.5 ms]
Deep copy overhead is expensive
Goal: Minimize deep copy
20
1) Protected shallow copy
[Diagram: faceDetect() arguments in shared memory with copy-on-write (CoW) protection on both the Starfish Library and Starfish Core sides]
Issue copy-on-write (mprotect) on received objects
21
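A POSIX sketch of the protected shallow copy: the received buffer is made read-only with mprotect(), and a SIGSEGV handler detects the first write, restores write access, and flags the object as modified. The layout and handler here are illustrative assumptions; they show the mechanism named on the slide, not Starfish's exact code.

// Sketch: copy-on-write detection via mprotect() + SIGSEGV (Linux/POSIX).
#include <csignal>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t g_dirty = 0;   // set when the protected object is written
static char*  g_buf = nullptr;
static size_t g_len = 0;

static void onWrite(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (addr >= g_buf && addr < g_buf + g_len) {
        mprotect(g_buf, g_len, PROT_READ | PROT_WRITE);  // re-enable writes
        g_dirty = 1;                                     // remember the modification
    } else {
        _exit(1);                                        // unrelated fault: give up
    }
}

int main() {
    g_len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    g_buf = static_cast<char*>(mmap(nullptr, g_len, PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    std::memset(g_buf, 0, g_len);            // pretend this holds a received argument

    struct sigaction sa{};
    sa.sa_sigaction = onWrite;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);

    mprotect(g_buf, g_len, PROT_READ);       // protected shallow copy: read-only
    g_buf[0] = 1;                            // a write faults; the handler unprotects
    std::printf("object modified: %d\n", static_cast<int>(g_dirty));

    munmap(g_buf, g_len);
    return 0;
}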
2) Direct output marshalling
[Diagram: the Vision Library's output allocation (malloc) is directed into shared memory, alongside the CoW-protected arguments]
Write new data directly into shared memory
22
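An illustrative take on direct output marshalling, assuming a POSIX shared-memory segment and a simple per-call bump allocator (both assumptions, not the authors' design): the library's output buffer is carved directly out of the shared segment, so the result already sits in memory the caller's process can map and no copy-out step is needed.

// Sketch: allocate library outputs directly inside shared memory.
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Tiny bump allocator over one shared-memory segment (illustrative only).
struct ShmArena {
    char*  base = nullptr;
    size_t size = 0;
    size_t used = 0;
    void* alloc(size_t n) {
        if (used + n > size) return nullptr;
        void* p = base + used;
        used += n;
        return p;
    }
};

int main() {
    const char* name = "/starfish_demo";           // hypothetical segment name
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 1 << 20);                        // 1 MiB segment

    ShmArena arena;
    arena.size = 1 << 20;
    arena.base = static_cast<char*>(mmap(nullptr, arena.size,
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // "Library execution" writes its output straight into shared memory.
    int* result = static_cast<int*>(arena.alloc(4 * sizeof(int)));
    const int faces[4] = {10, 10, 40, 40};         // pretend detection output
    std::memcpy(result, faces, sizeof(faces));

    std::printf("output lives at shm offset %ld, no copy-out needed\n",
                static_cast<long>(reinterpret_cast<char*>(result) - arena.base));

    munmap(arena.base, arena.size);
    close(fd);
    shm_unlink(name);
    return 0;
}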
3) Reuse arguments from prior calls
[Diagram: an argument table maps (img, shm_ptr) pairs so objects already in shared memory are reused across calls]
Track, reuse previous inputs/outputs
23
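A toy version of the argument table (hypothetical bookkeeping, consistent with the (img, shm_ptr) mapping shown on the slide): it remembers where a previously sent object already lives in shared memory, so a repeated call with the same image hands over an offset instead of re-copying the data.

// Sketch: remember where previously passed objects already live in shared memory.
#include <cstdio>
#include <unordered_map>

struct ArgumentTable {
    // Maps the caller's object identity to its existing shared-memory offset.
    std::unordered_map<const void*, long> offsets;

    long place(const void* obj, long nextFreeOffset) {
        auto it = offsets.find(obj);
        if (it != offsets.end()) {
            std::printf("reuse: argument already at shm offset %ld\n", it->second);
            return it->second;                 // zero-copy: nothing to transfer
        }
        std::printf("send: copying argument to shm offset %ld\n", nextFreeOffset);
        offsets[obj] = nextFreeOffset;         // record for future calls
        return nextFreeOffset;
    }
};

int main() {
    int image = 0;                             // stands in for a frame buffer
    ArgumentTable table;
    table.place(&image, 0);                    // first call: deep copy into shm
    table.place(&image, 4096);                 // later call, same frame: reused
    return 0;
}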
(Usually) Zero-Copy Argument Passing
[Diagram: shared-memory argument passing with CoW protection and an argument table]
+ Reduce deep copies through argument reuse
+ Reduce allocation through buffer reuse
24
Starfish Optimizations
• Share Computations
– Maintain expectations of developers & users
– Increase sharing efficiency by relaxing timing
• Decrease redundancy
– Reduce argument copy
– Reuse memory buffers
25
Experimental Platform
OpenCV + Android + Google Glass; Monsoon Power Monitor
Benchmarks:
1) Per-call micro-benchmarks
2) Multi-app benchmarks
OMAP4430 dual-core Cortex-A9 pinned to 600 MHz
26
Memory optimizations cut Starfish overhead in half
Library call execution: resize()
[Chart: first-call execution time broken into Prepare Inputs, Send Inputs, Search Cache, Execute Function, Allocate Outputs, Prepare Outputs, Receive Outputs; x-axis 0-25 ms]
First call: Native 21 ms; Unoptimized Starfish 42 ms; Starfish 24 ms
27
Memory optimizations cut Starfish overhead in half
Library call execution: resize()
[Chart: first- and second-call execution time broken into Prepare Inputs, Send Inputs, Search Cache, Execute Function, Allocate Outputs, Prepare Outputs, Receive Outputs; x-axis 0-25 ms]
First call: Native 21 ms; Unoptimized Starfish 42 ms; Starfish 24 ms
Second call: Native 21 ms; Unoptimized Starfish 10 ms; Starfish 6 ms
Starfish works well when
native function execution >> 5 ms
28
Places where Starfish fails
• 5-6 ms performance overhead per call:
– Bad for fast computations
– Bad with large, deep arguments
• Non-cacheable functions
– Random, temporal, external dependencies
– Functions with specific parameters
But great for many high-level functions:
Face detect, Corner detect, Image resize
29
Starfish vs. Multi-App Workload
[Diagram: benchmark apps (If Lin then That, Social Logger, Facebook, Google+, Twitter, MySpace, Whatsapp, ...) issuing Scale, Face Detect, and Face Recog calls at 0.3 FPS]
30
When running multiple apps, Starfish achieves higher performance
[Chart: frame rate (FPS) vs. number of app instances (1-10), Native vs. Starfish; higher is better]
31
When running multiple apps,
Starfish draws single-app power
[Chart: power consumption (mW) vs. number of app instances (1-10), Native vs. Starfish; lower is better]
32
When running multiple apps,
Starfish draws single-app power
[Chart: power consumption (mW) vs. number of app instances (1-10), Native vs. Starfish]
Starfish achieves efficient concurrency support!
33
Starfish
[Diagram: Starfish Libraries, Starfish Core cache search and function cache, Vision Library]
Share computations,
Reduce redundancy
34
Continuing mobile vision
Mitigating Analog Signal Chain Bandwidth
Preserving User/Subject Privacy
Designing Efficient Systems
35
Starfish: Share computations, Reduce redundancy
Transparent, efficient library call caching
• Timing optimizations for frame freshness and performance preservation
• Memory optimizations for minimal deep copy and small cache footprint
Google Glass Experiments:
• Low overhead: memory optimizations cut overhead in half
• Reduced power draw: multi-app workloads draw single-app power
36