Starfish Efficient Concurrency Support for Computer Vision Applications Robert LiKamWa Lin Zhong Rice University 1 In the year 2020... 2 Continuous mobile vision Where did I place my keys? Oh, I haven't talked to Fred recently... ??? ?? Remind me to tell Lin to let me graduate! ??? ?? 3 Continuous mobile vision Energy consumption overwhelms wearable battery! Insufficient concurrency support for vision! 4 Application Capture Service Vision Headers Vision Library 5 Application Capture Service Vision Headers Scale Face 6 Vision Library 200+ ms Vision Library 200+ ms Vision Library 200+ ms Camera is overworked Computation is overworked We need efficient concurrency support! 7 Vision Library Vision Library Vision Library Observations: • Vision apps utilize common vision library 8 Scale Face common primitives Scale Face Scale SURF Observations: • Vision apps utilize common vision library • Capture/Computation is redundant across apps 9 Scale Face common primitives Scale Face Scale SURF Observations: • Vision apps utilize common vision library • Capture/Computation is redundant across apps 10 Proposal: Split-‐process design for centralized control Scale SURF Face Key Idea: Share computations, Reduce redundancy 11 Starfish Vision Library Starfish Core Service Starfish Library Vision Library Starfish Library Starfish Library Uses vision headers to intercept library calls Starfish Core Service Executes, tracks, and shares library call computations 12 Starfish Vision Library Starfish Core Service Starfish Library Vision Library Starfish Library Starfish Split-‐process library solution for efficient, transparent multi-‐app service 13 faceDetect(img) Starfish Vision Library Starfish Core Function Cache Arg Search Starfish Library faceDetect(img) Starfish Library First Call Execute call (20-‐200 ms) Store results Vision Library Exec. Function Cache Storage/Retrieval Subsequent Call(s) Skip execution (0 ms) Retrieve results 14 Starfish Vision Library Starfish Core Cache Search Starfish Library Vision Library Function Cache Starfish Library Starfish Share computations, Reduce redundancy 15 Starfish Vision Library Starfish Core Cache Search Starfish Library Starfish Library Vision Library Function Cache Timing optimizations: + Reuse "Fresh Frames” to promote cache hits + Delay call return to encourage device sleep 16 Starfish Library Starfish Core Cache Search Starfish Library Vision Library Mitigating Function Split-‐ P rocess Overhead Starfish Cache Library + Promote sharing by reusing "Fresh Frames” + Maintain code privacy by delaying call return + Manage concurrent requests through fine-‐grained locks 17 Starfish Library Starfish Core Cache Search Starfish Library Starfish Library Vision Library Function Cache Reduce expense of argument passing Avoid library object modification 18 Split-‐Process in the Literature • Object Redefinition • Lightweight RPC • SUN RPC • Cloud Transfer • COMET • MAUI • CloneCloud Zero-‐copy Require library redesign Object tracking Optimized for code offload 19 Split-‐Process Argument Passing Starfish Library faceDetect( Shared Memory Starfish Core ) 3.5 ms Vision Library Execution Goal: Minimize deep copy deep copy overhead is expensive 20 1) Protected shallow copy Starfish Library Shared Memory Starfish Core CoW faceDetect( ) Vision Library Execution CoW Issue copy-‐on-‐write (mprotect) on received objects 21 2) Direct output marshalling Starfish Library Shared Memory Starfish Core CoW faceDetect( ) Vision Library Execution CoW malloc() Write new data directly into shared memory 22 3) Reuse arguments from prior calls Starfish Library Shared Memory Starfish Core CoW faceDetect( ) Vision Library Execution Argument (img, shm_ptr) Table CoW malloc() Track, reuse previous inputs/outputs 23 (Usually) Zero-‐Copy Argument Passing Starfish Library Shared Memory Starfish Core CoW faceDetect( ) Vision Library Execution Argument (img, shm_ptr) Table CoW malloc() + Reduce deep copies through argument reuse + Reduce allocation through buffer reuse 24 Starfish Optimizations • Share Computations • Maintain expectations of developers & users • Increase sharing efficiency by relaxing timing • Decrease redundancy • Reduce argument copy • Reuse memory buffers 25 Experimental Platform OpenCV + Android + Google Glass Monsoon Power Monitor Benchmarks: 1) Per-‐call micro-‐benchmarks 2) Multi-‐app benchmarks OMAP4430 dual-‐core Cortex-‐A9 pinned to 600 MHz 26 Memory optimizations cut Starfish overhead in half Library'call'execution:'resize() First Call Native& Prepare&Inputs& Starfish&Unopt.& Send&Inputs& Starfish& Search&Cache& Native : 21 ms Unopt. Starfish : 42 ms Starfish : 24ms Exec.&Function& Allocate&Outputs& Prepare&Outputs& Receive&Outputs& 0" 5" 10" 15" 20" 25" Execution&time&(ms)& 27 Memory optimizations cut Starfish overhead in half Library'call'execution:'resize() First Call Native& Prepare&Inputs& Starfish&Unopt.& Send&Inputs& Starfish& Search&Cache& Native : 21 ms Unopt. Starfish : 42 ms Starfish : 24ms Second Call Exec.&Function& Native : 21 ms Unopt. Starfish : 10 ms Starfish : 6 ms Allocate&Outputs& Prepare&Outputs& Receive&Outputs& 0" 5" 10" 15" 20" 25" Execution&time&(ms)& Starfish works well when native function execution >> 5 ms 28 Places where Starfish fails • 5-‐6 ms performance overhead per-‐call: – Bad for fast computations – Bad with large, deep arguments • Non-‐cacheable functions – Random, temporal, external dependencies – Functions with specific parameters But great for many high-‐level functions: Face detect, Corner detect, Image resize 29 Starfish vs. Multi-‐App Workload If Lin then That Social Logger Facebook Google+ Twitter MySpace Whatsapp ... Scale Face Detect Face Recog 0.3 FPS 30 When running multiple apps, Starfish achieves higher performance Na/ve" Higher is better Starfish" 5" 4.5" 4" 3.5" 3" Frame&Rate& 2.5" (FPS)& 2" 1.5" 1" 0.5" 31 0" 1" 2" 3" 4" 5" 6" 7" Number&of&App&Instances& 8" 9" 10" When running multiple apps, Starfish draws single-‐app power Na.ve" Starfish" 2500" 2000" 1500" Power& Consump-on& (mW)& 1000" 500" 0" 1" 2" 3" 4" 5" 6" 7" Number&of&App&Instances& 8" 9" 10" Lower is better 32 When running multiple apps, Starfish draws single-‐app power Na.ve" Starfish" 2500" 2000" 1500" Power& Consump-on& (mW)& 1000" 500" 0" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10" Number&of&App&Instances& Starfish achieves efficient concurrency support! 33 Starfish Vision Library Starfish Core Cache Search Starfish Library Vision Library Function Cache Starfish Library Starfish Share computations, Reduce redundancy 34 Continuing mobile vision Mitigating Analog Signal Chain Bandwidth g n i v r Prese bject u S / r e Us y c a v i Pr Designing Efficient Systems 35 Starfish: Share computations, Reduce redundancy Transparent, efficient library call caching • Timing optimizations for frame freshness and performance preservation • Memory optimizations for minimal deep copy and small cache footprint Google Glass Experiments: • Low overhead Memory optimizations slash overhead in half • Reduced power draw Multi-‐app workloads draw Single app power • 36
© Copyright 2024