On-chip system level visibility
How to optimise ARM platforms & shorten time to market?

Serge Poublan, ARM CoreSight product manager
Javier Orensanz, ARM tools product manager

DRIVERS & USAGE CASES

Higher level of integration
[Figure: Optimized ARM Smartphone Block Diagram]

Software cost explodes

Many users need on-chip visibility

Visibility for all with CoreSight™

Wide industry support

CoreSight™: an overview
[Diagram: example ARM SoC. Cortex A9, Cortex R4 and DSP on AMBA AXI, with APB bridge, shared peripherals and port.]

CoreSight & real-time trace
[Diagram: example ARM SoC. CPU trace (Cortex A9, Cortex R4), DSP ETM, bus trace and system trace routed to the port.]
Processor trace:
- Source code and associated instructions, coverage
- Cycles per instruction
- Interlock information

CoreSight & multi-core debug
Cost effective debug
[Diagram: example ARM SoC. Cross trigger matrix linking the cross trigger interfaces of the DSP, Cortex R4 and Cortex A9; DAP with SWD; debug bus (APB); debug control bus; RealView ICE and RealView Trace on the port.]

CoreSight & trace management
Cost effective debug
[Diagram: as on the previous slide, plus CPU trace, DSP ETM, bus trace and system trace merged by a funnel on the trace bus (ATB). Labels: debug control bus, trace bus for system trace, trace collection strategies.]

DEBUG INTERFACE

Debug… It is all about cost!
A mature, simple and low-cost 2-pin debug interface
- 2 pins (clock & data), a simple protocol
- Optimised to access memory-mapped debug devices
- Fully synchronous for high performance (100 MHz) & synthesis
- Over 20 tools vendors support SWD
- Multi-drop support with SWD v2
- Inter-operable with IEEE 1149.7

SYSTEM TRACE

System level visibility up to the final product
CoreSight System Trace Macrocell
- For system & application level debug & optimisation
- Deployable up to end product at very low cost
A single resource for
- High level application software view (Apps, kernel, firmware)
- Tuning of system performance
- Tracing of internal SoC signals (e.g. IRQ, DMA, …)
[Diagram: System Trace Macrocell feeding the funnel.]

High level view for software developers
- High performance “hardware printf” with minimum intrusion
- Focus on the “key part” of your s/w code
- Monitor the whole system, not only the CPU
- Comply with MIPI® STPv2
- Linux drivers & libraries
- Decrease tooling cost

TRACE MANAGEMENT

Cost effective trace collection
CoreSight merges trace sources to reduce packaging cost
[Diagram: CPU trace (x2), system trace and bus trace merged by a funnel on the trace bus (ATB).]
Export trace to
- Debug pins (2 pins)
- Dedicated trace port (parallel or serial high-speed trace)
- Existing functional links
Capture trace in
- System memory with OS management
- Dedicated trace buffer (SRAM)

Trace export through 2-pin SWD
- Export trace with 2 pins Serial Wire Debug
[Diagram: PTM and STM feed the funnel on the trace bus (ATB); the TMC acts as buffer; the DAP reaches it over the debug bus (APB) and exports through SWD.]

Reduce trace port size with FIFO mode
- Re-use SRAM for FIFO mode
- Average out the bandwidth to allow fitting of a narrower trace port
[Diagram: as on the previous slide, with the TMC as buffer or FIFO in front of the TPIU and trace port; plot of bits / cycle.]

Route trace to existing SoC resources
- Route trace to IO controller or system memory
[Diagram: as on the previous slide, with the TMC as buffer, FIFO or router; the router reaches the IO controller and system memory over AMBA AXI.]

SUMMARY

CoreSight™ visibility affordable for ALL
Silicon vendors
- Reduce cost of implementing trace
- Visibility of signals and software with STM
- Output performance and power profiling data
OEMs
- High level application software view (Apps, OS, firmware) with STM
- Equip more s/w developers with on-chip visibility (STM) at lower cost (TMC)
- Optimize software stack on real product
Tool vendors
- Deliver on-chip visibility to more users
- Complement processor trace

Use of CoreSight™ enabled hardware for software optimisation
Case Study with the ARM Profiler: Optimizing Android Media Framework

Optimization Iterative Process
[Diagram: development tools (compiler, assembler, linker, libraries) turn source code into machine code for the target; profile, analyse, then optimise automatically or manually.]

Trace Capture and Analysis
[Diagram: processor and ETM in the SoC; compressed trace leaves through the trace port to the trace port analyzer, which compresses it further and sends it over USB to the host PC.]
- Trace port analyzer compresses and streams trace info
- Host PC decompresses and analyzes the trace stream
- Profiler displays results of analysis
  - Profiling and memory usage information
  - Code coverage
  - At thread, function and instruction level
  - Non-intrusive and long-running

Case Study: GStreamer FFmpeg execution under Android
- Google Android OS: www.android.com
- GStreamer multimedia framework: gstreamer.freedesktop.org
- FFmpeg audio and video library plug-in: www.ffmpeg.org
- Target: Mistral EVM OMAP35xx, based on Cortex-A8 processor

Hardware Set Up

ARM Profiler Set Up

Live Update: Running Android
- Progress of trace collection and analysis
- Current processes
- Processor load over time
- Current threads
- Processor exceptions over time

ARM Profiler: Top Level Report
- Top 5 threads and top 5 functions
- Time line: instructions, exceptions, memory accesses
- Details for selected time: time selector, top 5 processes, top 5 threads

Detailed Views
- Look at top functions by time, memory access and delay
- yuv420p_to_rgb565 is the function to optimize
- Analyze in code view, change and profile again!
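
For readers who have not met the hot function before, here is a minimal sketch of what a planar YUV 4:2:0 to RGB565 conversion computes. It is not FFmpeg's code: the function names, the separate Y/U/V plane arguments and the full-range BT.601 integer coefficients are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate result to the 0..255 range of an 8-bit channel. */
static int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Convert one pixel to packed RGB565 (5 bits red, 6 green, 5 blue).
 * Assumption: full-range BT.601, coefficients scaled by 256. */
uint16_t yuv_to_rgb565(uint8_t y, uint8_t u, uint8_t v)
{
    int d = u - 128;                                 /* blue-difference chroma */
    int e = v - 128;                                 /* red-difference chroma  */
    int r = clamp_u8(y + (359 * e) / 256);           /* 1.402 * 256 ~ 359      */
    int g = clamp_u8(y - (88 * d + 183 * e) / 256);  /* 0.344 and 0.714 * 256  */
    int b = clamp_u8(y + (454 * d) / 256);           /* 1.772 * 256 ~ 454      */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Straightforward reference loop. In 4:2:0 each U/V sample is shared by a
 * 2x2 block of luma samples; odd sizes round the chroma plane width up. */
void yuv420p_to_rgb565_ref(uint16_t *dst, const uint8_t *y_plane,
                           const uint8_t *u_plane, const uint8_t *v_plane,
                           int width, int height)
{
    assert(width > 0 && height > 0);
    int chroma_width = (width + 1) / 2;
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int c = (row / 2) * chroma_width + col / 2;
            dst[row * width + col] =
                yuv_to_rgb565(y_plane[row * width + col], u_plane[c], v_plane[c]);
        }
    }
}
```

Every output pixel costs several multiplies, clamps and shifts, and the loop touches three input planes and one output plane, which is consistent with such a function ranking high by both time and memory access in a profile.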
Example: optimise yuv420_to_rgb565

Original:

    yuv420_to_rgb565 (image *dst, image *src, int width, int height)
        for (h = height; h >= 2; h -= 2)
            for (w = width; w >= 2; w -= 2)
                Process 4 pixels (2x2 square)
            if odd width, handle last column
        if odd height, handle last row

Optimised: select a specialised variant once, outside the loops:

    yuv420_to_rgb565 (image *dst, image *src, int width, int height)
        if odd height, odd width    call yuv420_to_rgb565_1
        if odd height, even width   call yuv420_to_rgb565_2
        if even height, odd width   call yuv420_to_rgb565_3
        if even height, even width  call yuv420_to_rgb565_4

    yuv420_to_rgb565_4 (image * __restrict dst, image * __restrict src,
                        int width, int height)       //independent pointers
        for (h = height; h > 0; h -= 2)              //comparison with 0
            for (w = width; w > 0; w -= 2)           //comparison with 0
                Process 4 pixels (2x2 square)        //no checking if odd width

Results
- Hot function: 5% higher performance
- Whole application: 3% higher performance

Conclusion
- CoreSight accelerates SoC development & reduces time to market
- CoreSight is available now on all major open platforms, e.g. TI OMAP3, Freescale iMX51, STE Nomadik
- And on many ASSPs & ASICs
- New CoreSight IP makes on-chip visibility affordable for more developers
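
The restructuring from the yuv420_to_rgb565 example slides can be sketched in C as follows. This is an illustration under stated assumptions, not the code that was profiled: the names pixel, convert_any and convert_even, the separate Y/U/V plane arguments and the BT.601 coefficients are hypothetical, and only the even-width, even-height variant (variant _4 on the slides) is specialised, with a general fallback standing in for the other three.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pixel helper (not shown on the slides): full-range
 * BT.601 YUV to packed RGB565 with coefficients scaled by 256. */
static uint16_t pixel(int y, int u, int v)
{
    int d = u - 128, e = v - 128;
    int r = y + (359 * e) / 256;
    int g = y - (88 * d + 183 * e) / 256;
    int b = y + (454 * d) / 256;
    r = r < 0 ? 0 : (r > 255 ? 255 : r);
    g = g < 0 ? 0 : (g > 255 ? 255 : g);
    b = b < 0 ? 0 : (b > 255 ? 255 : b);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* General path: any width and height, chroma index computed per pixel. */
static void convert_any(uint16_t *dst, const uint8_t *yp, const uint8_t *up,
                        const uint8_t *vp, int width, int height)
{
    int cw = (width + 1) / 2;              /* chroma plane width, rounded up */
    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col++) {
            int c = (row / 2) * cw + col / 2;
            dst[row * width + col] = pixel(yp[row * width + col], up[c], vp[c]);
        }
}

/* Specialised path: even width and even height only, so the loops need no
 * edge handling. restrict promises the compiler that the buffers do not
 * overlap; the counters run down to zero. */
static void convert_even(uint16_t *restrict dst, const uint8_t *restrict yp,
                         const uint8_t *restrict up, const uint8_t *restrict vp,
                         int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (int h = height; h > 0; h -= 2) {          /* comparison with 0 */
        uint16_t *top = dst, *bottom = dst + width;
        const uint8_t *y0 = yp, *y1 = yp + width;
        for (int w = width; w > 0; w -= 2) {       /* comparison with 0 */
            int u = *up++, v = *vp++;              /* one chroma pair per 2x2 block */
            *top++ = pixel(*y0++, u, v);
            *top++ = pixel(*y0++, u, v);
            *bottom++ = pixel(*y1++, u, v);
            *bottom++ = pixel(*y1++, u, v);
        }
        dst += 2 * width;                          /* skip the two rows just written */
        yp += 2 * width;
    }
}

/* Dispatcher: test the parity once, outside the loops. */
void yuv420_to_rgb565(uint16_t *dst, const uint8_t *yp, const uint8_t *up,
                      const uint8_t *vp, int width, int height)
{
    if (width % 2 == 0 && height % 2 == 0)
        convert_even(dst, yp, up, vp, width, height);
    else
        convert_any(dst, yp, up, vp, width, height);
}
```

Both paths should produce identical pixels for even-sized images, so the general path doubles as a reference when checking the specialised one. How much the specialised loop gains depends on the compiler and the target; the 5% figure on the Results slide applies to the original case study only.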