Intel® C++ Compiler for both Linux* and Android* Targets Intel® System Studio 2015 Intel® System Studio 2015 Integrated and Comprehensive Development Suite River VxWorks* Tizen* 2Wind Android* Wind River Linux* Component Linux* Boost Acceler Strengt Power ate hen Efficienc Categor Time System y and y To Reliabili Perform Market ty ance Target Operating Systems √ √ √ √ Intel® JTAG Debugger 20151 Debugge Intel Enhanced GDB* Debugger √ √ √ √ √ rs & 7.5 Trace System Visible Event Nexus √ √ √ √ √ Technology 1.0 Intel® VTune™ Amplifier 2015 √ √ √ √ √ Intel C++√ Compiler for Systems √ √ 2015 √all√Targets √ √ Analyzer Intel® Energy Profiler supports s √ √ Intel® System Analyzer √ Intel® Inspector 2015 for √ √ √ √ √ Systems √ √ Intel® C++ Compiler 15.0 √ √ √ √ √ Compiler ® Optional Intel Integrated Performance Deep provided with √ system-level insights √ into power, √ √ Tools √ √ WR & reliability and performance which help √ VxWorks* platform Primitives 8.1 accelerate time to market of Intel Architecture-based embedded and mobile systems from Wind River* Libraries √ √ Intel® Math Kernel Library 11.1 √ √ √ √ √ 1 2 2 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. What you will learn from this slide deck • Intel® C++ Compiler 15.0 technical training for System & Application code running Linux*, Android* & Tizen™ • In-depth explanation of compiler specifics for each development environment mentioned above • Please see subsequent slide decks for in-depth technical training on other components 3 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler • Porting from GCC compiler to ICC compiler • Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 4 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Why Use Intel® C++ Compiler? Compatibility • Multiple OS and Cross Compilation Support • HOST OS: Windows*, Linux* • TARGET OS: Linux*, WindRiver Linux*, Yocto Project, Tizen*, Android*, customized Linux* and more • Integration with major IDE for different targets • • • • Eclipse* Integration on Linux* WindRiver Linux Workbench Integration Android NDK Integration Windows* Visual Studio* integration for Android • No source-level changes for quick & simple gains • Just recompile for a quick ROI! • Source and binary compatibility • Compatible with GNU compiler • Can mix and match files as needed Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 5 Why Use Intel® C++ Compiler? Parallelism • Numerous tools to enable parallelism • Vector Parallelism Implicit Vectorization Vector statements (Intel® Cilk™ Plus) Lower level SIMD (pragmas, intrinsic functions) • Task Parallelism Language extensions (Intel® Cilk™ Plus) GAP – let the compiler help you restructure code for more optimization opportunities • Multi-threaded Performance Libraries – MKL, IPP 10/25/2014 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 6 Why Use Intel® C++ Compiler? Performance Support the newest Intel Architecture features Latest Instructions for newest IA processors (SSE, AVX, AVX2) Code generation tuned for latest micro architecture Advanced optimization features Highly Optimized libraries Compiler libraries – (libirc, libimf, libsvml, etc) MKL – Math functions (BLAS, FFT, LAPACK, etc) IPP – (compression, video encoding, image processing, etc) Good optimization report helps performance tuning 10/25/2014 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 7 Intel® C++ Compiler Updated Boosts Performance with GNU compatible Compiler High Performance and GCC* Compatibility • Generate faster code using Intel® SSE, Intel® AVX, AVX2 etc instructions • GCC compatibility Standards and cross-build support Enhanced cross-build sysroot integration OpenEmbedded* 3rd party toolchain layer recipes Cross platform. High performance. GNU compatibility. 8 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. What’s new in Compiler Version 15.0 • Support for Intel® Quark Processor X1xxx • Improved support and optimization for 2nd generation Intel® Atom® Processor family • Improved sysroot support allowing easier integration into GNU* cross-build • Environments and support for 3rd party toolchain layer recipes • Features from C++11 (-std=c++0x) • Android - Support for Android NDK r9 • Refer to compiler release notes for more details 9 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler • Porting from GCC compiler to ICC compiler • Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 10 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Compatibility to Standards • The Intel C++ Compiler provides the following language conformances: • ANSI/ISO standard for C language compilation (ISO/IEC9899:1990) ANSI/ISO standard (ISO/IEC 14882:1998) for the C++ language Most of the C99 features are supported by Intel® C++ Compiler More information at https://software.intel.com/en-us/articles/c99-support-inintelr-c-compiler • Intel(R) C++ Compiler has supported lots of the C++11 features (previously called C++0x) More information at https://software.intel.com/en-us/articles/c0x-features-supported-by-intelc-compiler 11 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel C++ Compiler is Compatible with GNU/GCC Compiler • Source, binary and command line compatibility − Mixing and matching binary files created by gcc/g++, including third-party libraries − Improved support for command-line options offered in the GNU compilers − Support of most GNU C and C++ language extensions − C and C++ Source Compatible file1.c − GCC file1.o GCC/ICC file2.c ICC executable file2.o • Compatibility white paper http://software.intel.com/en-us/articles/intel-c-compilerfor-linux-compatibility-with-the-gnu-compilers/ 12 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Common Optimization Switches Linux* Disable optimization -O0 Optimize for speed (no code size increase) -O1 Optimize for speed (default) -O2 High-level loop optimization -O3 Create symbols for debugging -g Multi-file inter-procedural optimization -ipo Profile guided optimization (multi-step build) -prof-gen -prof-use Optimize for speed across the entire program -fast (same as: -ipo –O3 -no-prec**warning: -fast def’n changes over time div -static -xHost) **don’t use this option unless the target is same as host 13 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. General optimization options • -O1 Optimize code size Auto Vectorization is turned off • -O2 Inlining Auto Vectorization • -O3 Loop optimization Data pre-fetching • -ansi-alias / -restrict / -no-prec-div Intel Confidential Optimization Notice 14 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Auto-Vectorization SIMD – Single Instruction Multiple Data • Scalar mode • SIMD processing – one instruction produces one result – with SSE or AVX instructions – one instruction can produce multiple results for (i=0;i<=MAX;i++) c[i]=a[i]+b[i]; a[i] a + + b[i] b a[i]+b[i] a+b a[i+7] a[i+6] a[i+5] a[i+4] a[i+3] a[i+2] a[i+1] + b[i+7] b[i+6] b[i+5] b[i+4] b[i+3] b[i+2] b[i+1] c[i+7] c[i+6] c[i+5] c[i+4] c[i+3] c[i+2] c[i+1] 10/25/2014 Optimization Notice a[i] Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. b[i] c[i] 15 Vectorization is Achieved through SIMD Instructions & Hardware 128 0 X4 X3 X2 Y4 Y3 Y2 Y1 X4opY4 X3opY3 X2opY2 X1opY1 255 Intel® SSE X1 Vector size: 128bit Data types: 8,16,32,64 bit integers 32 and 64bit floats VL: 2,4,8,16 Sample: Xi, Yi bit 32 int / float 128 127 0 X8 X7 X6 X5 X4 X3 X2 X1 Y8 Y7 Y6 Y5 Y4 Y3 Y2 Y1 X8opY8 X7opY7 X6opY6 X5opY5 X4opY4 X3opY3 X2opY2 X1opY1 Intel® AVX Vector size: 256bit Data types: 32 and 64 bit floats VL: 4, 8, 16 Sample: Xi, Yi 32 bit int or float First introduced in 2011 16 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. SIMD Instruction Enhancements SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AES-NI AVX 70 instr 144 instr Doubleprecision Vectors 8/16/32 64/128-bit vector integer 13 instr Complex Data 32 instr Decode 47 instr Video Graphics building blocks Advanced vector instr 8 instr String/XML processing POP-Count CRC 7 instr ~100 new instr. ~300 legacy sse instr updated 256-bit vector 3 and 4operand instructions SinglePrecision Vectors Streaming operations Intel® Atom Saltwell Intel® Atom Silvermont Encryption and Decryption Key Generation Sandy bridge Ivy bridge Intel Confidential Optimization Notice 17 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Many Ways to Vectorize Compiler: Auto-vectorization (no change of code) Ease of use Compiler: Auto-vectorization hints (#pragma vector, …) Compiler: Intel® Cilk™ Plus Array Notation Extensions SIMD intrinsic class (e.g.: F32vec, F64vec, …) Vector intrinsic (e.g.: _mm_fmadd_pd(…), _mm_add_ps(…), …) Assembler code (e.g.: [v]addps, [v]addss, …) 10/25/2014 Optimization Notice Programmer control 18 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Processor Specific Optimizations – Auto Vectorization Optimization Notice Feature SIMD Extension Compiler Options Intel® Streaming SIMD Extensions 2 (Intel® SSE2) as available in initial Pentium® 4 or compatible non-Intel processors sse2 -xSSE2 Intel® Streaming SIMD Extensions 3 (Intel® SSE3) as available in Pentium® 4 or compatible non-Intel processors sse3 -xSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3) as available in Intel® Core™2 Duo processors ssse3 -xSSSE3 Intel® SSE4.1 as first introduced in Intel® 45nm Hi-K next generation Intel Core™ micro-architecture sse4.1 -xSSE4.1 Intel® SSE4.2 Accelerated String and Text Processing instructions supported first by Intel® Core™ i7 processors sse4.2 -xSSE4.2 Like SSSE3 but optimizes for the Intel® Atom™ processor and Intel® Centrino® Atom™ Processor Technology atom_ssse3 -xATOM_SSSE3 Like SSE4.2 but optimized for Intel® Atom™ processors that support Intel® SSE4.2 and MOVBE instructions. atom_sse4.2 -xATOM_SSE4.2 Intel® Advanced Vector Extensions (Intel® AVX) as available in 2nd generation Intel® Core™ processor family avx -xAVX Intel® Advanced Vector Extension (Intel® AVX) including instructions offered by the 3rd generation Intel® Core processor core-avx-i -xCORE-AVX-I Intel® Advanced Vector Extension 2 (Intel® AVX2) as provided by a 4th generation Intel® Core processor core-avx2 -xCORE-AVX2 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 19 Processor Specific Optimizations • Intel® Core Processor • 2nd generation (Sandy Bridge) : -xAVX • 3rd generation (Ivy Bridge): -xCORE-AVX-I • 4th generation (Haswell) : -xCORE-AVX2 • Intel® Advanced Vector Extensions 2 Support • Intel® Atom Processor • Bonnell/Saltwell: –xATOM_SSSE3 • Silvermont (Baytrail) : -xATOM_SSE4.2 • Intel® Quark Processor • -mia32 –falign-stack=assume-4-byte • Using the Intel Compiler with Intel Quark SoC - Getting Started 20 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Atom Processor Specific optimization • Optimization switch –xATOM_SSSE3 for Saltwell Intel Supplemental Streaming SIMD Extensions 3 (SSSE3) In Order scheduler Use of LEA for stack operations Instruction reordering Support for movbe instruction (-minstruction=movbe) Optimization switch-xATOM_SSE4.2 for Silvermont Intel® Streaming SIMD Extensions 4.2 (SSE4.2) Out of order scheduler 21 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Interprocedural Optimizations Extends optimizations across file boundaries Without IPO With IPO Compile & Optimize file1.c Compile & Optimize file2.c Compile & Optimize file3.c file1.c file3.c Compile & Optimize file4.c file4.c file2.c Compile & Optimize -ip Only between modules of one source file -ipo Modules of multiple files/whole application Intel Confidential Optimization Notice 22 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Interprocedural Optimizations (IPO) Usage: Two-Step Process Compiling icc -c -ipo main.c icc –c –ipo func1.c icc –c –ipo func2.c Pass 1 mock object Pass 2 Linking icc -ipo main.o func1.o func2.o executable Intel Confidential Optimization Notice 23 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. IPO optimization Tips • O2 and O3 activate “almost” file-local IPO (-ip) • IPO extends compilation time and memory usage • IPO works for libraries too • In-lining of functions is most important feature of IPO but there is much more • IPO can work together with other optimizations such as PGO 24 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Profile-Guided Optimizations (PGO) • Static analysis leaves many questions open for the optimizer like: How often is x > y What is the size of count Which code is touched how often if (x > y) do_this(); else for(i=0; i<count; ++I do_work(); do that(); • • Use execution-time feedback to guide (final) optimization Enhancements with PGO: More accurate branch prediction Basic block movement to improve instruction cache behavior Better decision of functions to inline (help IPO) Can optimize function ordering Switch-statement optimization Better vectorization decisions 25 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. PGO Usage: Three Step Process Step 1 Compile + link to add instrumentation icc -prof_gen prog.c Instrumented executable: prog.exe Step 2 Execute instrumented program prog.exe (on a typical dataset) Step 3 Compile + link using feedback icc -prof_use prog.c Dynamic profile: 12345678.dyn Merged .dyn files: pgopti.dpi Optimized executable: prog.exe 26 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Interact with Compiler Optimizations Did my loop vectorize? Why didn’t my loop vectorize? How to vectorize my loop? Can the compiler suggest the code changes for better optimization and performance? Let Intel C++ Compiler help you! 27 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Compiler Reports – Vectorization Report Compiler switch: -vec-report<n> Set diagnostic level dumped to stdout n=0: No diagnostic information n=1: (Default) Loops successfully vectorized n=2: Loops not vectorized – and the reason why not n=3: Adds dependency Information n=4: Reports only non-vectorized loops n=5: Reports only non-vectorized loops and adds dependency info n=6: Reports on vectorized and non-vectorized loops and any proven or assumed data dependences. n=7: reports vectorization summary information and currently requires the use of a Python script to interpret. More information can be found at http://software.intel.com/en-us/articles/vecanalysis-python-script-forannotating-intelr-compiler-vectorization-report 28 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Vectorization Fine-Control • Disable vectorization -no-vec #pragma novector • Enforce for a single loop if semantically correct #pragma vector always • Ignore vector dependencies #pragma ivdep • “Enforce” vectorization #pragma simd • Generate multiple, feature-specific auto-dispatch code paths for Intel® processors -ax<code> 29 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Guided Automatic Parallelization (GAP) • Use compiler infrastructure to help developer • Vectorization, parallelization and data transformations • Extend diagnostic message for failed vectorization and parallelization by specific hints to fix problem • Exploit multi-year experience brought into the compiler development • Performance tuning knowledge based on dealing with numerous applications, benchmarks and compute kernels • Does not influence code generation • Use option -guide/-guide-vec/-guide-data-trans etc • Add option “–diag-disable 30761” with -guide to disable Auto Parallelization for Embedded and Android 30 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Vectorization Example void f(int n, float *x, float *y, float *z, float *d1, float *d2) { for (int i = 0; i < n; i++) z[i] = x[i] + y[i] – (d1[i]*d2[i]); } GAP Message: g.c(6): remark #30536: (LOOP) Add -no-alias-args option for better type-based disambiguation analysis by the compiler, if appropriate (the option will apply for the entire compilation). This will improve optimizations such as vectorization for the loop at line 6. [VERIFY] Make sure that the semantics of this option is obeyed for the entire compilation. [ALTERNATIVE] Another way to get the same effect is to add the "restrict" keyword to each pointer-typed formal parameter of the routine "f". This allows optimizations such as vectorization to be applied to the loop at line 6. [VERIFY] Make sure that semantics of the "restrict" pointer qualifier is satisfied: in the routine, all data accessed through the pointer must not be accessed through any other Intel Confidential Optimization Notice 31 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Data Transformation Example struct S3 { int a; int b; // hot double c[100]; struct S2 *s2_ptr; int d; int e; struct S1 *s1_ptr; char *c_p; int f; // hot }; for (ii = 0; ii < N; ii++){ sp->b = ii; sp->f = ii + 1; sp++; } peel.c(22): remark #30756: (DTRANS) Splitting the structure 'S3' into two parts will improve data locality and is highly recommended. Frequently accessed fields are 'b, f'; performance may improve by putting these fields into one structure and the remaining fields into another structure. Alternatively, performance may also improve by reordering the fields of the structure. Suggested field order:'b, f, s2_ptr, s1_ptr, a, c, d, e, c_p'. [VERIFY] The suggestion is based on the field references in current compilation … Intel Confidential Optimization Notice 32 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel® Cilk™ Plus - Overview Task parallelism Simple Keywords Reducers (Hyper-objects) Set of keywords, for expression of task parallelism: Reliable access to nonlocal variables without races cilk_spawn cilk_sync cilk_for cilk::reducer_opadd<int> sum(3); Data parallelism Array Notation Provide data parallelism for sections of arrays or whole arrays mask[:] = a[:] < b[:] ? -1 : 1; Elemental Functions Define actions that can be applied to whole or parts of arrays or scalars Execution parameters Runtime system APIs, Environment variables, pragmas 33 Optimization Notice Intel Confidential Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel® Cilk™ Plus - Overview • Intel® Cilk™ Plus (Language Extension to C/C++) Easier Task & Data Parallelism 3 simple Keywords: cilk_for, cilk_spawn, cilk_sync • Intel® Cilk™ Plus Array Notation Save time with powerful vectorization Multicore Programming with Intel® Cilk™ Plus Minimize Software Re-Work for New Hardware 34 34 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. General Steps of Optimization 1. Build with optimization disabled 2. Use general optimizations 3. Use processor specific options 4. Add Inter Procedural Optimizations 5. Use Profile Guided Optimization 6. Tune Automatic Vectorization 7. Multithread your application Intel Confidential Optimization Notice 35 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler • Porting from GCC compiler to ICC compiler • Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 36 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Differences to Intel® C/C++ Compiler XE 15.0 for Linux* - Mainly focus on Embedded Linux and Android targets - Cross-compilation support - Windows/Linux host -> Embedded Linux, Windriver Linux, Tizen, Android* target etc - IDE Integration - Eclipse integration - WindRiver IDE integration - Android NDK integration - Some features are removed - OpenMP 37 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Cross Platform Development Embedded development normally uses cross platform development Development host development target Intel C++ Compiler supports Cross-Build for embedded target OS SCP/SFTP/SSH … 38 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Key Files Supplied with Compiler Intel C++ Compiler Key Files compilervars.(c)sh: Source scripts to setup the complete compiler, debugger environment icc, icpc: C/C++ compiler driver xild: Intel linker driver xiar: Intel archive driver Predefined environment configuration files Intel include files, libraries Integration files with target IDE Documentations 39 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel C++ Compiler works with GNU binary utilities • Intel C++ Compiler does not provide real bin utilities like “ld”, “ar”, but provide the driver to invoke the “ld” or “ar” automatically • icc/icpc: first calls real Intel compiler(mcpcom) for compilation, then invokes the GNU “ld” if necessary • xild: first calls real Intel compiler for optimization, then invokes GNU “ld” • xiar: first calls real Intel compiler for optimization, then invokes GNU “ar” 40 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Porting from GCC to ICC compiler (I) • Change the compilation command using predefined platform configuration with –platform option • <prefix, e.g. i586-poky-linux>-gcc icc -platform=<val> • <prefix, e.g. i586-poky-linux>-g++ icpc -platform=<val> • <prefix, e.g. i586-poky-linux>-ar xiar -qplatform=<val> • <prefix, e.g. i586-poky-linux>-ld xild -qplatform=<val> • This normally can be changed with environment variable like CC, CXX, AR, LD, or in the Makefile 41 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Porting from GCC to ICC compiler (II) • Starting from Intel C++ Compiler 15.0 (Intel System Studio 2015), use –gnu-prefix and --sysroot option • icc –gnu-prefix=i586-poky-linux- --sysroot=<val> • icpc –gnu-prefix=i586-poky-linux- --sysroot=<val> • xiar –qgnu-prefix=i586-poky-linux- --sysroot=<val> • xild –qgnu-prefix=i586-poky-linux- • More details on –gnu-prefix and –sysroot, refer to compiler documents, and online articles • Improved Sysroot Support with Intel Compiler for Cross Compilation 42 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Example: Yocto Project 1.3 • GCC Build environment: • export CC="i586-poky-linux-gcc -m32 -march=i586 -- sysroot=/opt/poky/1.3/sysroots/i586-poky-linux“ export CXX="i586-poky-linux-g++ -m32 -march=i586 --sysroot=/opt/poky/1.3/sysroots/i586-poky-linux“ export LD="i586-poky-linux-ld --sysroot=/opt/poky/1.3/sysroots/i586-poky-linux“ export AR=i586-poky-linux-ar • ICC Build Environment (I) • export YL_SYSROOT=/opt/poky/1.3/sysroots/i586-poky-linux export YL_TOOLCHAIN=/opt/poky/1.3/sysroots/i686-pokysdk-linux/usr/bin export CC=“icc –platform=yl13 -m32 -march=i586 “ export CXX=“icpc -platform=yl13 -m32 -march=i586 “ export LD=“xild –qplatform=yl13“ export AR=“xiar –qplatform=yl13” • ICC Build Environment (II) • export CC=“icc –gnu-prefix= i586-poky-linux- -m32 -march=i586 –sysroot=/opt/poky/1.3/sysroots/i586-pokylinux” export CXX=“icpc –gnu-prefix= i586-poky-linux- -m32 -march=i586 --sysroot=/opt/poky/1.3/sysroots/i586poky-linux“ export LD=“xild –qgnu-prefix= i586-poky-linux- --sysroot=/opt/poky/1.3/sysroots/i586-poky-linux “ export AR=“xiar –qgnu-prefix= i586-poky-linux- ” 43 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Porting Tips • Intel C++ Compiler does thorough source code quality checks and hence more “remarks” and “warnings” output • Remove option -Werror if it is enabled for icc compiler • Investigate the warning or disable it by icc –diagdisable<> option • Enable icc specific optimizations to gain performance • Tuning application based on icc optimization report 44 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler • Porting from GCC compiler to ICC compiler • Cross Compilation • Intel® C++ Compiler for Android specifics • Summary 45 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel C++ Compiler options for Cross Build Build with predefined platform configurations -platform=<val> to specify the target platform Predefined platform <val>: yl12, yl13, wrl43, wrl50, celpr28 Build with sysroot --sysroot=<val> to specify the sysroot of target --gnu-prefix =<val> to specify the cross gcc prefix 46 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Cross build for Yocto Project* target • Build with predefined platform configurations Export toolchain and sysroot environment variable Use option –platform=<val> for cross build E.g: $export YL_SYSROOT=/opt/poky/1.3/sysroots/i586-pokylinux $export YL_TOOLCHAIN=/opt/poky/1.3/sysroots/i686pokysdk-linux/usr/bin $icc -platform=yl13 my_source_file.c -o myapp Build with sysroot $icc –gnu-prefix=i586-poky-linux- --sysroot= /opt/poky/1.3/sysroots/i586-pokylinux my_source_file.c –o myapp Building Yocto Project* Applications using Intel Compiler Using Intel Compiler as the Secondary Toolchain for Yocto Project 47 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Cross build for Wind River* Linux Target • Wind River Linux 4.3, 5.0 target, build with predefined platform configurations • export WRL_TOOCHAIN and WRL_SYSROOT • Use option –platform=wrl43/wrl50 for cross build • Wind River Linux 6.0 target, build with sysroot • 32-bit target (for example): export PATH=<WRL6_path>/x86_64-linux/usr/bin/i586-wrslinux:$PATH export SYSROOT=<WRL6_path>/qemux86 export GNU_PREFIX=i586-wrs-linux icc c.c -gnu-prefix=$GNU_PREFIX --sysroot=$SYSROOT • Getting Started with Intel System Studio and WindRiver Linux - Build and Run Guide Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 48 Wind River* Application Cross-Build from Windows* Host 1. Set environment variables: • WRL_TOOLCHAIN • WRL_SYSROOT Example: Wind River* Linux 5.0.x 64-bit target set WRL_TOOLCHAIN=<some_path>\wrl50\wrlinux-5\layers\wr-toolchain\4.6-60-win32 set WRL_SYSROOT=<some_path>\wrl50\intel64\export\sysroot\intel-xeon-core_glibc_std\ bitbake_build\tmp\sysroots\intel-xeon-core 2. Build application with -platform option icc.exe -platform=wrl50 my_source_file.c 49 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Cross Build for Tizen* Target • Build with sysroot export PATH=/opt/tzn20/tizen-sdk/tools/i386-linux-gnueabigcc-4.5_SAVE/bin:$PATH export TZN_SYSROOT=/opt/tzn0/tizensdk/platforms/tizen2.0/rootstraps/tizen-emulator-2.0.cpp export TZN_EXEC_PREFIX=i386-linux-gnueabi icc -gnu-prefix=$TZN_EXEC_PREFIX -sysroot=$TZN_SYSROOT my_source_file.c • Using Intel C++ Compiler for Tizen 50 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel® C++ Compiler Eclipse* Integration Intel System Studio provides eclipse plugins for Eclipse* integration Intel® C++ Compiler can be invoked from Eclipse* CDT Automatic integration with Android NDK during Installation Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 52 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 53 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Integration into the build environment Three different options to compile Android* apps 1. Standalone tool chain Useful for own / 3rd party build systems Manual compile / link / package of application 2. Using ndk-build script Controlled by Android.mk and Application.mk files Automatically compile/link applications and store it the right folders for using it from the Android* SDK 3. As part of the AOSP Automatically integrates into platform build 54 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 1: Standalone tool chain • Establish the compiler environment • source <icc-install-dir>/bin/compilervars.sh • Intel C++ compiler can be used directly in this environment • Manual compile / link / package of application • Recommand for own / 3rd party build systems 55 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 2: Using ndk-build script • Compiler automatically integrated with Android NDK during installation Can be controlled by Android.mk and Application.mk files APP_ABI := x86 NDK_TOOLCHAIN=x86-icc • Build from command line or from Eclipse with Android SDK • Execute the ndk-build script in the project folder ndk-build V=1 -B NDK_TOOLCHAIN=x86-icc APP_ABI=x86 • Recommend for NDK based native application build • Support for Android NDK r9 56 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: As part of the AOSP • Two modes available: • ICC as default compiler • Compile only specified modules with ICC • Ability to force compilation for particular modules with ICC or GCC independent of the default compiler • Recommend to start with compiling specified modules with ICC 57 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: As part of the AOSP (a) • Preparations 1. Request a patch set for your particular version of the source tree from your Intel representative 2. Apply the patch to your source tree 3. Check if ICC is already included in your source tree ls prebuilts/PRIVATE/icc/linux-x86/x86/x86-android-linux-15.0 4. If the directory is empty or missing, make a symlink from /opt/intel/CCAndroid15.0.1.021/ (the folder name may be changed for different compiler versions) 58 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: As part of the AOSP (b) • Compile specified modules with ICC • Edit the ICC configuration file nano build/core/icc_config.mk • Specify modules you want to compiler with ICC ICC_MODULES := libv8 libskia libskiagpu • Build Android as usual source build/envsetup.sh lunch make 59 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: As part of the AOSP (c) • ICC as default compiler • Edit the ICC configuration file nano build/core/icc_config.mk • Specify modules you want to compiler with GCC GCC_MODULES := … • Change the default compiler to ICC DEFAULT_COMPILER:=icc • Build Android as usual source build/envsetup.sh lunch make 60 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: As part of the AOSP (d) • Checking the result: • Check if your module was built with ICC. You should see several lines of output for this command: readelf -s out/target/product/redhookbay/system/lib/libskia.s o |grep intel • Check if the Intel libraries are copied on the device adb shell root@android:/ # ls /system/lib/libsvml.so 61 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Intel Libraries • ICC comes with four optimized libraries Library Description libintlc.so Intel support libraries libimf.so Intel math library libsvml.so Short vector math library libirng.so Random number generator • The final binary requires access to these libraries • Options • Include them into OS image • Link statically • Copy them to the application directory 62 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 1: Include into OS Image Recommended • Libraries in the /system/lib folder are loaded automatically • Remount the filesystem read/write adb shell mount -o remount,rw /system • Push the libraries on the target cd /opt/intel/CCAndroid15.0.1.021/lib adb push libintlc.so /system/lib adb push libimf.so /system/lib adb push libsvml.so /system/lib adb push libirng.so /system/lib • Applications will automatically load the needed libraries 63 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 2: Link statically Recommended • Best choice for single binary • The compiler default behavior • If libraries shouldn’t link in statically use this option -shared-intel 64 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Option 3: Copy the libraries to the application directory • Part of the Android* SDK/NDK functionality • Add to the Android.mk file in the jni folder include $(CLEAR_VARS) LOCAL_MODULE := libintlc LOCAL_SRC_FILES := libintlc.so include $(PREBUILT_SHARED_LIBRARY) libimf libsvml libirng • Local libraries are not loaded automatically, need to load them manually from JAVA System.loadLibrary("intlc"); System.loadLibrary("imf"); System.loadLibrary("svml"); System.loadLibrary(“irng"); System.loadLibrary("hello-jni"); 65 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Using PGO on Android* • Generated data files need a storage location • Default stored in the application directory -> usually write protected on Android* • Specify different storage location in Android.mk file: LOCAL_CFLAGS := -prof-gen -prof-dir /sdcard • Application needs write permissions on sdcard. Add to AndroidManifest.xml: <uses-permission android:name="android.permission.WRITE_EXTERNAL_STOR AGE" /> 66 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Using PGO on Android* (2) • Data files are only generated if application exits • Application usually not exit on Android* • Option 1: Call exit from Java: System.exit(0); • Option 2: Explicitly dump PGO data from native code #include <pgouser.h> _PGOPTI_Prof_Dump_All(); Recommended • Option 3: Using environment to make regular dumps export INTEL_PROF_DUMP_INTERVAL 5000 export INTEL_PROF_DUMP_CUMULATIVE 1 67 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Using Intel® Cilk™ Plus • Change your STL to GNU shared and add exception support to your Application.mk file: APP_STL := gnustl_shared APP_GNUSTL_FORCE_CPP_FEATURES := exceptions rtti • Include the Cilk runtime library in app by adding to the Android.mk file: include $(CLEAR_VARS) LOCAL_MODULE := cilkrts.so LOCAL_SRC_FILES := ../path/to/CCAndroid/lib/cilkrts.so include $(PREBUILT_SHARED_LIBRARY) • Load the libraries from your Java code System.loadLibrary("gnustl_shared"); System.loadLibrary("cilkrts"); 68 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Using Intel® Cilk™ Plus (2) • Add a Cilk to your linker options(Android.mk): LOCAL_LDLIBS += -lcilkrts • In your C/C++ file include the Cilk header #include <cilk/cilk.h> • And start using Cilk in your C/C++ code int fib(int n) { if (n < 2) return n; int x = cilk_spawn fib(n-1); int y = fib(n-2); cilk_sync; return x + y; } 69 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 70 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Comparison ARM v5 ARM v7a x86 32-bit 32-bit 32-bit little-endian little-endian little-endian Soft FP Hardware FP Hardware FP 64-bit vars aligned 64-bit vars aligned 64-bit vars packed None NEON SSE Normally not a problem… This will require porting… Intel Confidential Optimization Notice 71 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Memory alignment struct TestStruct { int mVar1; long long mVar2; int mVar3; }; ARM x86 Force memory alignment -malign-double Intel Confidential Optimization Notice 72 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Porting SIMD instructions Porting NEON instructions (ARM*) to SSE instructions (Intel) – Fixed point arithmetic only on ARM* – NEON native C libs can’t be reused in Intel® Atom™ based platforms wrap NEON functions and intrinsics to SSE3 – http://intel.ly/10JjuY4 - NEONvsSSE.h 73 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Conclusion • Based on the high performance Intel® C++ Compiler for Linux*, widely used by HPC customers for archiving better performance on IA • Comes with a well established support infrastructure • Variety of optimization options available • Integration into various parts of the Android* environment • Can be integrated in a standalone tool chain, the NDK and the AOSP 74 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Agenda • Why Use Intel® C++ Compiler • Intel® C++ Compiler Features Highlights • Using the Intel® C++ Compiler Cross Compilation • Intel® C++ Compiler for Android specifics • Android Integration • ARM* Neon vs. Intel SSE • Summary 75 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Summary Intel® C++ Compilers of Intel® System Studio provide powerful features, especially High level optimizations Auto-vectorization to vectorize serial code Cilk Plus programming for multithreading Cross Compilation support Integrates into multiple IDEs for different targets Compatible with GCC compiler More information on Intel’s software offerings and services at http://software.intel.com 76 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Summary/Call to Action Intel® System Studio 2015 provides deep system-level insights into power, reliability and performance to help accelerate time to market of Intel Architecture-based embedded and mobile systems For more information, to evaluate, or purchase: http://intel.ly/system-studio Useful links Premier Support: https://premier.intel.com Forum: http://software.intel.com/en-us/forum/intel-systemstudio/ Email: [email protected] To order the Intel® ITP-XDP3 device, please contact the Hibbert Group* at [email protected] 77 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 78 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Test Test Test 80 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Summary Test • Test Find out more at http://intel.ly/system-studio 81 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 82 Optimization Notice Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Backup Intel Confidential - Internal Use Only Optimization Notice 84 Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 10/25/2014
© Copyright 2024