Intel® Advisor Tutorial: Find Where to Add Parallelism

Intel® Parallel Studio XE 2015 Professional Edition for Windows*
Intel® Parallel Studio XE 2015 Cluster Edition for Windows*

This tutorial shows how to find where to add parallelism to a serial Fortran sample application using the Intel Advisor. It demonstrates an end-to-end workflow you can ultimately apply to your own applications.

Contents

Legal Information
Overview
Chapter 1: Navigation Quick Start
Chapter 2: Find Where to Add Parallelism
    Visual Studio* IDE: Choose Project and Build Target in Release Mode
    Standalone Intel Advisor GUI: Build Target in Release Mode and Create New Project
    Discover Parallel Opportunities
    Mark Best Parallel Opportunities With Annotations
    Predict Maximum Parallel Performance Speedup
    Predict Parallel Data Sharing Problems
    Fix Data Sharing Problems
    Add Parallelism
Chapter 3: Summary
Chapter 4: Key Terms

Legal Information

By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein.
You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel CoFluent, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Copyright © 2009-2014, Intel Corporation. All rights reserved.

Overview

Discover how to find where to add parallelism to a serial application using the Intel® Advisor and the nqueens_Fortran Fortran sample application.
About This Tutorial

This tutorial demonstrates an end-to-end workflow you can ultimately apply to your own applications:
1. Survey the target executable to locate the loops and functions where your application spends the most time.
2. In the target sources, add Intel Advisor annotations to mark possible parallel tasks and their enclosing parallel sites.
3. Check Suitability to predict the maximum parallel performance speedup of the target based on these annotations.
4. Check Correctness to predict parallel data sharing problems in the target based on these annotations.
5. If the predicted maximum speedup benefit is worth the effort to fix the predicted parallel data sharing problems, fix the problems.
6. Recheck Suitability to see how your fixes impact the predicted maximum speedup.
7. If the predicted maximum speedup benefit is still worth the effort to add parallelism to the target, replace the annotations with parallel framework code that enables parallel execution.

Estimated Duration

20 minutes.

Learning Objectives

After you complete this tutorial, you should be able to:
• List the steps to find where to add parallelism using the Intel Advisor.
• Define key Intel Advisor terms.
• Identify compiler/linker options that produce the most accurate and complete Intel Advisor analysis results.
• Run all Intel Advisor analysis tools.
• View, interpret, and manipulate data collected by Intel Advisor analysis tools.

More Resources

The concepts and procedures in this tutorial apply regardless of programming language; however, a similar tutorial using a sample application in another programming language may be available at http://software.intel.com/en-us/intel-software-technical-documentation. This site also offers tutorials for other Intel® products and a printable version (PDF) of many tutorials.

In addition, you can find more resources in Intel Advisor Help.
Next Step: Find Where to Add Parallelism

Chapter 1: Navigation Quick Start

Intel Advisor provides tools that help you decide where to add parallelism to your application. Use Intel Advisor for applications:
• Created with the C/C++, Fortran, or C# programming languages
• Containing serial code that could possibly be replaced with parallel (multithreaded) code

Intel Advisor Access

To access the Intel Advisor in the Microsoft Visual Studio* IDE: From the Windows* Start menu, choose All Programs > Intel studio version > Visual Studio integrations > Use VSversion.

To access the Standalone Intel Advisor GUI, do one of the following:
• From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Intel Advisor version.
• From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Command prompt > mode to set your environment, then type advixe-gui.

Intel Advisor/Visual Studio* IDE Integration

The Visual Studio* menu and Intel Advisor toolbar offer different ways to perform many of the same functions. On the Visual Studio* menu, use:
• View > Intel Advisor version to open the Advisor Workflow and Summary window
• Project > Intel Advisor version Project Properties to manage projects
• Tools > Intel Advisor version to open the Advisor Workflow and Summary window, and launch the Intel Advisor analysis tools
• Tools > Options > Intel Advisor version to set various product options
• Help > Intel Advisor version to open the Intel Advisor Help and Getting Started documentation

Use the Intel Advisor toolbar to open the Advisor Workflow and Summary windows; launch the Intel Advisor analysis tools; configure projects; and access resources such as Help, Getting Started documentation, and online training and documentation. (Other Visual Studio* product versions may show fewer icons. If so, click the icon to display a drop-down with more toolbar icons.)
Use the Visual Studio* Solution Explorer to manage Intel Advisor projects and results.

Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools. There is one active result for each project. You can also create read-only result snapshots (click the icon).

Use the Visual Studio* Output window to view Visual Studio* and Intel Advisor analysis output.

Standalone Intel Advisor GUI

The Intel Advisor menu, toolbar, and Project Navigator offer different ways to perform many of the same functions. On the menu, use:
• File to manage projects and results, and launch the Intel Advisor analysis tools
• View to toggle on a full-screen view of result tab data (press F11 or ESC to toggle off full-screen view); and open the Welcome tab, Advisor Workflow, Summary window, and Project Navigator
• Help to access resources such as Help, Getting Started documentation, and online training and documentation

Use the toolbar to manage projects and results; open the Advisor Workflow, Summary window, and Project Navigator; and launch the Intel Advisor analysis tools.

Use the Project Navigator to see a hierarchical view of your projects and results based on the directory where the opened project resides; open the Advisor Workflow; launch Intel Advisor analysis tools; manage results and projects; and access resources such as Help, Getting Started documentation, and online training and documentation.

Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools. There is one active result for each project. You can also create read-only result snapshots (click the icon).

Advisor Workflow

The Advisor Workflow:
• Provides a roadmap for finding where to add parallelism.
• Lets you launch Intel Advisor analysis tools.
• Provides links to relevant topics in Intel Advisor Help.
• Shows the name of your current project.

Intel Advisor Result Tab

Use the result tab name to identify a result. It consists of the project name and an identifier in the format ennn.

Use the navigation toolbar to navigate among various result tab windows. >> and << scroll controls appear if the full navigation toolbar cannot be displayed.

Use the command toolbar to control collection and analysis. The control shows/hides this toolbar.

Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools.

Chapter 2: Find Where to Add Parallelism

Follow these steps to find where to add parallelism to a serial application using the Intel Advisor and the nqueens_Fortran Fortran sample application.

Step 1: Prepare for tutorial. Do one of the following:
• If you prefer to work in the Microsoft Visual Studio* IDE:
    • Get software tools and unpack the sample.
    • Open a Visual Studio* solution.
    • Choose a sample startup project.
    • Verify the target is set to build in release mode.
    • Verify optimal compiler/linker settings for the Survey tool.
    • Build the target in release mode and test.
    • Set Intel Advisor project properties.
• If you prefer to work in the Standalone Intel Advisor GUI:
    • Get software tools and unpack the sample.
    • Verify optimal compiler/linker settings for the Survey tool.
    • Build the sample target in release mode.
    • Test the target.
    • Open the Intel Advisor.
    • Create a new project.

Step 2: Discover parallel opportunities.
• Collect Survey data.
• View Survey Report.
• View Survey source.
• View Summary window.

Step 3: Mark best parallel opportunities with annotations.
• Add parallel site and task annotations.
• Rebuild target in release mode.

Step 4: Predict maximum parallel performance speedup.
• Collect Suitability data.
• View Suitability Report.
• View Summary window.

Step 5: Predict parallel data sharing problems.
• Build target in debug mode.
• Change Intel Advisor project properties.
• Collect Correctness data.
• View Correctness Report.
• View Summary window.

Step 6: Fix data sharing problems.
• Make cost/benefit decision.
• Fix memory reuse and data communication sharing problems.
• Rebuild target in debug mode.
• Rerun Correctness tool.
• View Summary window.

Step 7: Add parallelism.
Under normal circumstances:
• Rebuild target in release mode.
• Collect Suitability data again.
• Make cost/benefit decision.
• Replace Intel Advisor annotations with parallel framework code.
• Build parallel version of target in release mode.
For the purposes of this tutorial: Explore how we replaced Intel Advisor annotations with parallel framework code for you.

Key Terms: annotations, parallel framework, target

Visual Studio* IDE: Choose Project and Build Target in Release Mode

Follow these initial steps if you prefer to use the Intel Advisor plug-in to the Visual Studio* IDE to complete this tutorial:
• Get software tools and unpack the sample.
• Open a Visual Studio* solution.
• Choose a startup project.
• Verify the target is set to build in release mode.
• Verify optimal compiler/linker settings for the Survey tool.
• Build the target in release mode and test.
• Verify Intel Advisor Project Properties.

Each step is described more fully below.

Get Software Tools and Unpack the Sample

You need the following tools to try tutorial steps yourself using the nqueens_Fortran sample application:
• Intel Advisor, including sample application
• .zip file extraction utility
• Supported compiler (see Release Notes for more information)

Acquire and Install Intel Advisor

If you do not already have the Intel Advisor, you can download an evaluation copy from https://software.intel.com/en-us/intel-software-evaluation-center/.

NOTE If you have not rebooted your system since you installed the Intel Advisor, please do so now.
Set Up Intel Advisor Sample Application
1. Copy the nqueens_Fortran.zip file from the <install-dir>\samples\<locale>\Fortran\ directory to a writable directory or share on your system. The default installation path is C:\Program Files (x86)\Intel\Advisor XE 201n\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
2. Extract the sample from the .zip file.

Open Visual Studio* Solution
1. Launch the Microsoft Visual Studio* IDE.
2. Choose File > Open > Project/Solution.
3. In the Open Project dialog box, open the nqueens.sln file to display the nqueens solution in the Solution Explorer.

Choose Startup Project

If the 1_nqueens_serial project is not the startup project (its name is not shown in bold in the Solution Explorer):
1. Right-click the 1_nqueens_serial project in the Solution Explorer.
2. Choose Set as StartUp Project.

Verify Target is Set to Build in Release Mode

If the Solution Configurations drop-down on the Visual Studio* Standard toolbar is set to Debug, change it to Release.

Verify Optimal Compiler/Linker Options for Survey Tool

Applications compiled/linked using the following options produce the most accurate and complete analysis results. Verify the 1_nqueens_serial project uses the optimal settings for the Survey tool. Release build settings apply to the Survey and Suitability tools; debug build settings apply to the Correctness tool.

Search additional include directories and libraries related to Intel Advisor annotation definitions (same settings for release and debug builds):
• If you build your application using the Visual Studio* IDE, use these project properties:
    • Fortran > General > Additional Include Directories > "$(ADVISOR_XE_2015_DIR)\include\ia32\" or "$(ADVISOR_XE_2015_DIR)\include\intel64\"
    • Linker > General > Additional Library Directories > "$(ADVISOR_XE_2015_DIR)\lib32" or "$(ADVISOR_XE_2015_DIR)\lib64"
    • Linker > Input > Additional Dependencies > libadvisor.lib
• If you build your application using a Windows* command line, add these options to your build command:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor

Compiler: Request full debug information (same settings for release and debug builds):
• Visual Studio* IDE: Fortran > General > Debug Information Format > Full (/debug=full)
• Command line: /debug=full

Compiler: Request moderate optimization (release build, for the Survey and Suitability tools) or disable optimization (debug build, for the Correctness tool):
• Release build, Visual Studio* IDE: Fortran > Optimization > Optimization > Maximize Speed or higher; Fortran > Optimization > Inline Function Expansion > Only INLINE Directive (/Ob1)
• Release build, command line: /O2 or higher; /Ob1
• Debug build, Visual Studio* IDE: Fortran > Optimization > Disable (/Od)
• Debug build, command line: /Od

Linker: Search for unresolved references in multithreaded, dynamically linked libraries:
• Visual Studio* IDE: Fortran > Libraries > Runtime Library > Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs)
• Command line: /MD or /MDd

Linker: Request full debug information (same settings for release and debug builds):
• Visual Studio* IDE: Linker > Debugging > Generate Debug Info > Yes (/DEBUG)
• Command line: /DEBUG

Build Target in Release Mode and Test
1. Choose Build > Build 1_nqueens_serial to build the target.
2. Choose Debug > Start Without Debugging.
3. If the Visual Studio* IDE responds that any projects are out of date, click Yes to build them.
4. Check for a display similar to the following.
5. Notice the application output window displays a board size of 14 and the total time it took to run the target.
6. Press Enter (or any key) to dismiss the application output window.

Verify Intel Advisor Project Properties
1. Choose Project > Intel Advisor version Project Properties... to display a dialog box similar to the following.
2. In the Analysis Target tab, ensure the Target type drop-down is set to Survey/Suitability Analysis and the Inherit settings from Visual Studio* project checkbox is selected.
3. Click the OK button to close the dialog box.
Key Terms: target

Next Step: Discover Parallel Opportunities

Standalone Intel Advisor GUI: Build Target in Release Mode and Create New Project

Follow these initial steps if you prefer to use the Standalone Intel Advisor GUI to complete this tutorial:
• Get software tools and unpack the sample.
• Verify optimal compiler/linker settings for the Survey tool.
• Build the target in release mode.
• Test the target.
• Open the Standalone Intel Advisor GUI.
• Create a new Intel Advisor project.

Each step is described more fully below.

Get Software Tools and Unpack the Sample

You need the following tools to try tutorial steps yourself using the nqueens_Fortran sample application:
• .zip file extraction utility
• Intel Advisor, including sample application
• Supported compiler (see Release Notes for more information)

Acquire and Install Intel Advisor

If you do not already have the Intel Advisor, you can download an evaluation copy from https://software.intel.com/en-us/intel-software-evaluation-center/.

NOTE If you have not rebooted your system since you installed the Intel Advisor, please do so now.

Set Up Intel Advisor Sample Application
1. Copy the nqueens_Fortran.zip file from the <install-dir>\samples\<locale>\Fortran\ directory to a writable directory or share on your system. The default installation path (<install-dir>) is C:\Program Files (x86)\Intel\Advisor XE 201n\. (On certain systems, instead of Program Files (x86), the directory name is Program Files.)
2. Extract the sample from the .zip file.

Verify Optimal Compiler/Linker Settings for Survey Tool

Applications compiled/linked using the following options produce the most accurate and complete analysis results. Verify the sample code uses the optimal release build settings for the Survey and Suitability tools.
Release build settings apply to the Survey and Suitability tools; debug build settings apply to the Correctness tool.

Search additional include directories and libraries related to Intel Advisor annotation definitions (same settings for release and debug builds):
• If you build your application using the Visual Studio* IDE, use these project properties:
    • Fortran > General > Additional Include Directories > "$(ADVISOR_XE_2015_DIR)\include\ia32\" or "$(ADVISOR_XE_2015_DIR)\include\intel64\"
    • Linker > General > Additional Library Directories > "$(ADVISOR_XE_2015_DIR)\lib32" or "$(ADVISOR_XE_2015_DIR)\lib64"
    • Linker > Input > Additional Dependencies > libadvisor.lib
• If you build your application using a Windows* command line, add these options to your build command:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor

Compiler: Request full debug information (same settings for release and debug builds):
• Visual Studio* IDE: Fortran > General > Debug Information Format > Full (/debug=full)
• Command line: /debug=full

Compiler: Request moderate optimization (release build, for the Survey and Suitability tools) or disable optimization (debug build, for the Correctness tool):
• Release build, Visual Studio* IDE: Fortran > Optimization > Optimization > Maximize Speed or higher; Fortran > Optimization > Inline Function Expansion > Only INLINE Directive (/Ob1)
• Release build, command line: /O2 or higher; /Ob1
• Debug build, Visual Studio* IDE: Fortran > Optimization > Disable (/Od)
• Debug build, command line: /Od

Linker: Search for unresolved references in multithreaded, dynamically linked libraries:
• Visual Studio* IDE: Fortran > Libraries > Runtime Library > Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs)
• Command line: /MD or /MDd

Linker: Request full debug information (same settings for release and debug builds):
• Visual Studio* IDE: Linker > Debugging > Generate Debug Info > Yes (/DEBUG)
• Command line: /DEBUG

Build Target in Release Mode
1. From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Command prompt > mode to set your environment.
2. In the command prompt window, change directory to the nqueens_Fortran\ directory (where you extracted the sample files).
3. Type devenv nqueens.sln /build release /project 1_nqueens_serial to build the target in release mode.

Test Target
1. In the command prompt window, change directory to the 1_nqueens_serial\Release\ directory.
2. Type 1_nqueens_serial.exe to execute the target.
3. Check for output similar to the following.
4. Notice the output displays a board size of 14 and the total time it took to run the target.

TIP Keep the command prompt window open.
Open Standalone Intel Advisor GUI

From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Intel Advisor version.

Create New Intel Advisor Project
1. In the Intel Advisor GUI, choose File > New > Project... (or click New Project... in the Welcome page) to display a dialog box similar to the following.
2. Type nqueens_Fortran in the Project name field and C:\Temp\Samples\ (default location) in the Location field. Then click the Create Project button to create a config.advixeproj file and display a dialog box similar to the following.
3. In the Analysis Target tab, ensure the Target type drop-down is set to Survey/Suitability Analysis.
4. Click the Browse... button next to the Application field and choose the nqueens_Fortran\1_nqueens_serial\Release\1_nqueens_serial.exe file. Notice the Intel Advisor autofills the project Working directory field for you.
5. Click the Binary/Symbol Search tab. In this tab:
    • Click the button in the Add new search location line.
    • Navigate to and select the nqueens_Fortran\1_nqueens_serial\Release\ directory.
    • Verify the Search recursively box is deselected.
6. Click the Source Search tab. In this tab:
    • Click the button in the Add new search location line.
    • Navigate to and select the nqueens_Fortran\1_nqueens_serial\ directory.
    • Verify the Search recursively box is deselected.
7. Click the OK button to add the changes.

Notice that the project name appears in the Intel Advisor GUI title bar and the Advisor Workflow.

Key Terms: target

Next Step: Discover Parallel Opportunities

Discover Parallel Opportunities

To discover parallel opportunities in serial code:
• Collect Survey data.
• View the Survey Report.
• View Survey Source.
• View the Summary window.

Each step is described more fully below.

Collect Survey Data

To start a Survey analysis:
1. If the Advisor Workflow is not currently displayed, click the icon on the Intel Advisor toolbar to open it.
   NOTE If you are using the Visual Studio* IDE and the icon is not visible on the toolbar, click the icon to display a drop-down with more toolbar icons.
2. Under 1. Survey Target in the Advisor Workflow, click the icon to collect Survey data while the target executes.

During the Survey analysis, the Intel Advisor displays a window similar to the following.

NOTE You can ignore any warnings about missing debugging symbols during this tutorial.

The navigation toolbar lets you navigate among various result tab windows. If you see >> and << scroll controls, click them to see more window options.

During collection, the main area of the Survey Report shows collection progress and milestones, and runtime performance characteristics.

The Collection Log pane shows informational, warning, and error messages about the collection. Notice you can redirect non-GUI application output to the Application Output pane.

The command toolbar lets you control collection and analysis. For example, you can:
• Skip uninteresting code regions by temporarily stopping data collection while the target continues to run and then resuming data collection (Pause and Resume buttons).
• Stop target execution, analysis, and data collection, finalize the partially collected data, and display the result (Stop button).
• Cancel target execution, analysis, and data collection, discard the collected data, and skip finalization (Cancel button).

You can also use the Command Line button to view and copy the equivalent advixe-cl command and options to run this Survey analysis.

NOTE This tutorial explains only how to use Intel Advisor tools from the Visual Studio* IDE or the Intel Advisor GUI.
View Survey Report

After the Survey tool finalizes the data, the Intel Advisor displays the Survey Report and an infotip in the top left corner. Close the infotip to display a window similar to the following.

The Function Call Sites and Loops column shows an extended top-down call tree of our target, starting with the main entry point. Each function or loop appears on a separate row. A loop is indicated by an icon under the function/procedure that executes it. You can scroll up and down as well as show or hide functions and loops.

The Total Time % column starts in the top line with a value of 100%.

The Total Time column shows the time spent in a function or loop and all functions called from it. A row with a large Total Time % and multiple children with smaller total times is a possible candidate for parallelism.

The Self Time column shows how much time was spent in the function or loop itself, excluding time spent in the functions it calls. Loops or functions with significant self time values are possible candidates for distributing work.

The Source Location column shows the source file and line number associated with the location in the call tree.

The Hot Loops column and icons identify time-consuming loops from a Total Time perspective. These hotspots are often good candidates for parallelism.

Our target spends the most time in the NQUEENS_ip_SETQUEEN function, which is called from the NQUEENS_ip_SOLVE function, so it is not surprising to find the top time-consuming hot loop there. Notice the NQUEENS_ip_SETQUEEN function calls itself recursively.

The Survey Report toolbar helps you quickly locate hot loops and view the Survey Source window for the currently selected row.

The control shows/hides the command toolbar. Try clicking it now.

The Annotations Example (annotations assistant) area, which is closed in this screenshot, shows various annotation samples and build settings you can copy directly into your editor.
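The relationship between the Total Time and Self Time columns can be restated as a small Fortran sketch. This is not Intel Advisor code and not how the tool computes its columns; it only illustrates the definitions above. It assumes a hypothetical call tree stored as a parent array (each caller listed before its callees) with made-up self times, and the module and function names (call_tree_times, total_times, total_percent) are illustrative only.

```fortran
! Hypothetical illustration of Survey Report time columns (not Advisor code).
! A call tree is stored as a parent array: parent(i) is the caller of node i
! (0 for the root), and every caller is listed before its callees, so
! parent(i) < i for all non-root nodes.
module call_tree_times
  implicit none
  private
  public :: total_times, total_percent

contains

  ! Total (inclusive) time = self (exclusive) time of a node plus the
  ! total time of every node it calls, directly or indirectly.
  function total_times(parent, self_time) result(total)
    integer, intent(in) :: parent(:)
    real,    intent(in) :: self_time(:)
    real                :: total(size(self_time))
    integer :: i

    total = self_time
    ! Visit nodes from last to first: all callees of node i have larger
    ! indices, so total(i) is complete before it is added to its caller.
    do i = size(parent), 1, -1
      if (parent(i) > 0) total(parent(i)) = total(parent(i)) + total(i)
    end do
  end function total_times

  ! Total Time % relative to the root (node 1), which therefore shows 100%.
  function total_percent(total) result(pct)
    real, intent(in) :: total(:)
    real             :: pct(size(total))

    pct = 100.0 * total / total(1)
  end function total_percent

end module call_tree_times
```

For example, with hypothetical self times of 0.1, 0.2, 9.5, and 0.2 seconds for main, solve, setqueen, and a print routine (parent array [0, 1, 2, 1]), total_times gives approximately 10.0, 9.7, 9.5, and 0.2. The solve row then shows about 97% Total Time even though its own Self Time is only 0.2 seconds: almost all of its time is spent in the code it calls.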
We will discuss annotation examples later in this tutorial.

View Survey Source

To dig deeper into the first hot loop, double-click the associated row to display a Survey Source window similar to the following.

The File pane initially shows the source code and time values for the first hot loop.

The Call Stacks and Loops pane shows the call stack for the first hot loop. Clicking a row in this pane displays the associated code in the File pane. Try clicking a row in this pane now.

The Annotations Example (annotations assistant) area, which is open in this screenshot, shows an annotation code snippet for a simple loop structure. We will discuss annotation examples later in this tutorial.

Click the X beside Survey Source on the navigation toolbar to close the Survey Source window.

View Summary Window

Click Summary on the navigation toolbar to open a Summary window similar to the following. Think of this window as a dashboard to which the Intel Advisor adds data each time you run Intel Advisor tools.

This area summarizes possible loops where you might add parallelism. It also provides easy access to the Survey Report window and your sources. Try clicking a Loop link now. Then return to the Summary window and try clicking a Source Location link.

You currently have collection data from only one of the three Intel Advisor analysis tools. That will change very soon.

Key Terms
annotation, hotspot, target

Next Step
Mark Best Parallel Opportunities With Annotations

Mark Best Parallel Opportunities With Annotations

Intel Advisor annotations are either subroutine calls or macros, depending on the programming language. Annotations can be processed by your current compiler but do not change the computations of your application. Use them to mark places in serial parts of your application that are good candidates for later replacement with parallel framework code that enables parallel execution.
The main types of Intel Advisor annotations mark the location of:
• A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in parallel. An effective parallel site typically contains a hotspot that consumes much of the application's execution time. To distribute these frequently executed instructions to different tasks that can run at the same time, the best parallel site is not usually located at the hotspot, but higher in the call tree.
• One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that can be executed in one or more parallel threads to distribute work.
• Locking synchronization, where mutual exclusion of data access must occur in the parallel application.

Intel Advisor provides example annotated source code for you (accessible in the Survey Report and Survey Source windows) that you can copy directly into your editor:

Annotation Code Snippet | Purpose
Iteration Loop, Single Task | Create a simple loop structure, where the task code includes the entire loop body. This common task structure is useful when only a single task is needed within a parallel site.
Loop, One or More Tasks | Create loops where the task code does not include all of the loop body, or complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations. This structure is also useful when multiple tasks are needed within a parallel site.
Function, One or More Tasks | Create code that calls multiple tasks within a parallel site.
Pause/Resume Collection | Temporarily pause data collection and later resume it, so you can skip uninteresting parts of target execution to minimize collected data and speed up analysis of large applications. Add these annotations outside a parallel site.
Build Settings | Set build (compiler and linker) settings specific to the language in use.
TIP
• Annotations are fully explained in Intel Advisor Help.
• When adding annotations to your own application, remember to include the annotation definitions, such as use advisor_annotate for Fortran programs.
• In your own application, choosing where to add task annotations may require some experimentation. If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop.

Add Parallel Site and Task Annotations

Because we are trying to keep this tutorial short, we already added parallel site and task annotations to the sample code for you. All you need to do is uncomment them.

1. Click Survey Report on the navigation toolbar to re-open the Survey Report.
2. Right-click the data row with the first hot loop and choose Edit Source to open the nqueens_serial.f90 source file in an editor.

    program NQueens
    !
    ! Solve the nqueens problem - How many ways can you put 'n' queens on an
    ! n-by-n chess board without them being able to attack each other?
    ! Read http://en.wikipedia.org/wiki/Nqueens for background
    !
    ! Original C++ code by Ralf Ratering & Mario Deilmann
    ! Fortran version by Steve Lionel & others
    !
    ! To set command line argument in Visual Studio:
    ! 1) Right click on the project name and select 'Properties'.
    ! 2) Under 'Debugging', enter the argument (board size) in the
    !    'Command Arguments' field.

    !ADVISOR SUITABILITY EDIT: To use the Advisor Annotations:
    !ADVISOR SUITABILITY EDIT: Uncomment the "use advisor_annotate" line below
    !use advisor_annotate
    implicit none

    integer :: nrOfSolutions = 0   ! Counts the number of solutions.
    integer :: size = 0            ! The board size; read from the command line.

    ! The number of correct solutions for each board size (1-15).
    integer, parameter, dimension(15) :: correct_solution = &
        (/ 1, 0, 0, 2, 10, 4, 40, 92, 352, 724, &
           2680, 14200, 73712, 365596, 2279184 /)

    character(200) :: cmd_name                   ! Command/Program Name
    character(400) :: cmd_line                   ! The full command line
    integer :: cmd_len                           ! The command-line length
    character(2) :: cmd_arg                      ! The command arguments
    integer :: stat                              ! Library call status value
    integer :: time_start, time_end, count_rate  ! Timing variables
    integer :: nthreads = 1                      ! Number of threads to use.

    100 format(A,A,A)
    101 format(A,I0,A,I0,A)

    ! Get the board size from the command line argument.
    if (command_argument_count() < 1) then
      call get_command_argument(0, cmd_name, cmd_len, status=stat)
      print 100, "Usage: ", cmd_name(1:cmd_len), " boardSize"
      size = 14
      print *, "Using default size of 14"
    else
      call get_command_argument(1, cmd_arg, status=stat)
      read(cmd_arg, *, iostat=stat) size
      ! Limit the board size. If it is too small, the program may finish before
      ! suitability or other analyses can produce an accurate result. If the
      ! board is too large, the program will take a long time.
      if ((stat /= 0) .or. (size < 4) .or. (size > 15)) then
        print *, "Error: boardSize must be between 4 and 15; resetting to 14"
        size = 14
      end if
    endif

    ! Time how long it takes to find all solution boards.
    print 101, "Starting nqueens solver for size ", size, " with ", nthreads, &
      " thread(s)."
    call system_clock(time_start)
    call solve()
    call system_clock(time_end, count_rate)

    ! Evaluate and report the result.
    print 101, "Number of solutions: ", nrOfSolutions
    if (nrOfSolutions == correct_solution(size)) then
      print *, "Correct Result!"
    else
      print *, "Incorrect Result!"
    end if

    print 101, "Calculations took ", (time_end-time_start) / (count_rate/1000), &
      "ms."
    ! End of Main Program

    contains

    ! Recursive routine to find all solutions on the board (the array 'queens')
    ! when we place the next queen at location (row, col).
    ! This increments the global nrOfSolutions with each solution found.
    !
    ! Although the recursive call in this function may appear several times in
    ! the survey results, the solve() function is a better-performing
    ! parallelization candidate, due to its coarser granularity.
    !
    !ADVISOR CORRECTNESS EDIT: In order to avoid data races and correctness
    !ADVISOR CORRECTNESS EDIT: issues on the 'queens' array, we have to make
    !ADVISOR CORRECTNESS EDIT: a private copy of it.
    !ADVISOR CORRECTNESS EDIT: So rename 'queens' to 'queens_in' in the next two
    !ADVISOR CORRECTNESS EDIT: lines
    recursive subroutine setQueen(queens, row, col)
    integer, intent(inout) :: queens(:)
    integer, intent(in) :: row, col
    integer :: i
    integer, volatile :: j

    !ADVISOR CORRECTNESS EDIT: Uncomment the declaration of queens, and the
    !ADVISOR CORRECTNESS EDIT: assignment statement, which will create
    !ADVISOR CORRECTNESS EDIT: a private copy of queens_in.
    !integer :: queens(ubound(queens_in, dim=1))
    !queens = queens_in

    do i = 1, row-1
      ! Check for vertical attacks.
      if (queens(i) == col) return
      ! Check for diagonal attacks.
      if (abs(queens(i)-col) == (row-i)) return
    end do

    ! Position is safe; set the queen.
    queens(row) = col

    if (row == size) then
      !ADVISOR CORRECTNESS EDIT: Uncomment the following 2 lock annotations
      !ADVISOR CORRECTNESS EDIT: to avoid a data race on nrOfSolutions.
      !call annotate_lock_acquire(0)
      nrOfSolutions = nrOfSolutions + 1
      !call annotate_lock_release(0)
    else
      ! Try to fill next row.
      do j = 1, size
        call setQueen(queens, row+1, j)
      end do
    end if
    end subroutine SetQueen

    ! Find all solutions for the nQueens problem on a size x size chessboard.
    ! On return, nrOfSolutions = number of solutions.
    !
    !ADVISOR COMMENT: When surveying, this is the top CPU-consuming function
    !ADVISOR COMMENT: below the main function. This subroutine's do loop is
    !ADVISOR COMMENT: an excellent candidate for parallelization.
    subroutine solve()
    integer :: i
    integer, allocatable :: queens(:)   ! Array representing the chess board.

    allocate(queens(size))
    queens = 0

    !ADVISOR SUITABILITY EDIT: Uncomment the three annotation calls below to
    !ADVISOR SUITABILITY EDIT: model parallelizing the body of this do loop.
    !call annotate_site_begin("solve")
    do i = 1, size
      !call annotate_iteration_task("setQueen")
      ! Try all positions in first row.
      call SetQueen(queens, 1, i)
    end do
    !call annotate_site_end()

    deallocate(queens)
    end subroutine solve
    end program

3. Search for ADVISOR SUITABILITY EDIT and follow the directions in the sample code. Make four total edits: uncomment the !use advisor_annotate line near the top and the three annotation lines.
TIP Now is also a good time to simply explore our fully commented sample code.
4. Save your edits.

Rebuild Target in Release Mode

Do one of the following:
• In the Visual Studio* IDE: Choose Build > Build 1_nqueens_serial.
• In the command prompt window: Change directory to the nqueens_Fortran\ directory, then type devenv nqueens.sln /build release /project 1_nqueens_serial.

Key Terms
annotations, parallel site, synchronization, target, task

Next Step
Predict Maximum Parallel Performance Speedup

Predict Maximum Parallel Performance Speedup

To predict the maximum parallel performance speedup of your target based on the added Intel Advisor annotations:
• Collect Suitability data.
• View the Suitability Report.
• View the Summary window.
Each step is described more fully below.

Collect Suitability Data

Under 3. Check Suitability in the Advisor Workflow, click the Collect button to collect Suitability data while the target executes.

During the Suitability analysis, the Intel Advisor displays a window similar to the following.

NOTE You can ignore any warnings about missing debugging symbols during this tutorial.

View Suitability Report

After the Suitability tool finalizes the data, the Intel Advisor displays a window similar to the following.
The Maximum Program Gain For All Sites value shows the predicted maximum speedup of our target based on Intel Advisor annotations and currently selected modeling parameters. Over a 6x speedup is good!

This grid shows various metrics for each parallel site based on currently selected modeling parameters, including the site's Impact to Program Gain. Our target has a single parallel site, the solve parallel site, as identified in the Site Label column.

Use these modeling parameter drop-downs to experiment with different hardware configurations and parallel frameworks:
• Target System. Set it to:
  • CPU to model predicted maximum speedup when executing all parallel sites on host CPUs.
  • Intel Xeon Phi to model predicted maximum speedup when executing all parallel sites on Intel® Xeon Phi™ coprocessors.
  • Offload to Intel Xeon Phi to model predicted maximum speedup when executing:
    • Serial parts of our target on host CPUs.
    • Parallel sites, on a site-by-site basis, on host CPUs or Intel Xeon Phi coprocessors.
• Threading Model. Set it to Intel TBB, Intel Cilk Plus, OpenMP, Microsoft TPL, or Other to model predicted maximum speedup using that parallel framework.

The CPU Count and Coprocessor Threads drop-downs depend on the Target System setting and the Offload to Intel Xeon Phi checkbox:

Target System | Offload to Intel Xeon Phi Checkbox | CPU Count Drop-Down | Coprocessor Threads Drop-Down
CPU | Hidden | A modeling number of CPUs that will work in parallel for all parallel sites in your target | Hidden
Intel Xeon Phi | Hidden | Hidden | A modeling number of coprocessor threads that will work in parallel for all parallel sites in your target
Offload to Intel Xeon Phi | Selected | A modeling number of CPUs that will work in parallel for this parallel site and all other sites not selected for offload | A modeling number of coprocessor threads that will work on the Intel Xeon Phi coprocessor for this parallel site and all other parallel sites selected for offload
Offload to Intel Xeon Phi | Deselected | A modeling number of CPUs that will work in parallel for this parallel site and all other sites not selected for offload | A modeling number of coprocessor threads that will work on the Intel Xeon Phi coprocessor for this parallel site and all other parallel sites selected for offload

The Scalability of Maximum Site Gain diagram graphically shows the predicted maximum speedup for the solve parallel site in different scaling scenarios based on currently selected modeling parameters.

A Bulls-Eye in This Area | Means This
Red | Parallelization is not beneficial, and may even cause performance degradation. Consider removing or modifying annotations, or significantly refactoring the corresponding hotspot if you want to parallelize it at any cost.
Yellow | The predicted maximum speedup may not be enough to justify the effort needed to refactor and maintain your application. Consider investigating.
Green | Parallel performance, and power efficiency, may improve significantly.

Use the Loop Iterations (Tasks) Modeling sliders and the Apply button to experiment with different iteration counts and instance durations.

Use the Runtime Modeling checkboxes to experiment with predicted maximum speedup if you plan to use parallel framework code constructs to address parallel overhead, lock contention, or task chunking; or if you plan to tune parallel code after you implement parallelism.

This area shows issues that generally prevent better parallel performance. A green bar is good: it means this issue is not negatively impacting predicted maximum speedup. A yellow or red bar means the issue is limiting predicted maximum speedup.

The Site Details area shows information about the solve parallel site and the setQueen task within that parallel site.
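The Maximum Program Gain For All Sites value depends not only on how fast each parallel site becomes, but also on how much of the target's run time lies inside parallel sites. Amdahl's law, a standard model swapped in here purely for illustration, captures that relationship. It is not necessarily the formula the Suitability tool uses, which can also factor in the runtime effects listed under Runtime Modeling. The module and function names below are hypothetical.

```fortran
! Hypothetical helper for reasoning about program gain; not part of the
! Intel Advisor sample code or API.
module amdahl_model
  implicit none
  private
  public :: program_gain
contains
  ! Amdahl's law: the fraction p of serial run time spent inside parallel
  ! sites runs site_gain times faster; the remaining (1 - p) stays serial.
  pure real function program_gain(p, site_gain)
    real, intent(in) :: p          ! fraction of run time inside parallel sites (0..1)
    real, intent(in) :: site_gain  ! speedup of the parallel sites themselves (>= 1)
    program_gain = 1.0 / ((1.0 - p) + p / site_gain)
  end function program_gain
end module amdahl_model
```

For example, if 95% of the run time were inside the solve site and that site ran 8 times faster, program_gain(0.95, 8.0) returns about 5.9: the 5% that stays serial already limits the whole program well below 8x.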
Notice how your screen changes if you choose a Target System of Intel Xeon Phi or Offload to Intel Xeon Phi. The Scalability of Maximum Site Gain diagram graphically shows the predicted performance of the manycore parallel coprocessor and its host CPUs. For many applications, the number of task instances does not scale enough to fully utilize the many cores of the parallel coprocessor. An application that is ready for an Intel Xeon Phi coprocessing system has a bulls-eye in the green part of the diagram. A bulls-eye in the gray part of the diagram indicates an application that is not ready for an Intel Xeon Phi coprocessing system; in such cases, try modeling another type of Target System.

Use the Intel Xeon Phi Advanced Modeling checkbox, fields, and the Apply button to model the expected speedup if you plan to modify your parallel code to improve vector parallel execution.

TIP These modeling parameters are fully explained in Intel Advisor Help. Try experimenting now to see the impact of various modeling parameters on predicted maximum speedup throughout the Suitability Report.

View Summary Window

Click Summary on the navigation toolbar to re-open the Summary window. Notice the Intel Advisor added more data to this dashboard.

This area summarizes the maximum parallel performance speedup. It also provides easy access to the Suitability Report window and your sources. Try clicking the Maximum Site Gain link now. Then return to the Summary window and try clicking the Parallel Site link.

The question marks for detected Correctness Problems mean you have not yet collected any Correctness data.

In addition to the newly acquired information from the Suitability Report, the dashboard still shows data from the Survey Report. You now have collection data from two of the three Intel Advisor analysis tools.
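To see why the number of task instances can cap scalability, consider an idealized site whose tasks all take the same time (an assumption; real setQueen tasks vary in length). With ntasks tasks on ncores cores, the site needs at least ceiling(ntasks/ncores) rounds of task execution, so its speedup cannot exceed ntasks divided by that number of rounds. The sketch below computes this upper bound; the names are hypothetical, and this is not how Intel Advisor computes Maximum Site Gain.

```fortran
! Hypothetical helper; an idealized upper bound, not an Intel Advisor API.
module task_bound
  implicit none
  private
  public :: max_site_gain
contains
  ! Upper bound on the speedup of a site that runs ntasks equal-length
  ! tasks on ncores cores (both assumed >= 1). Serially the site takes
  ! ntasks time units; in parallel it needs ceiling(ntasks/ncores) rounds.
  pure real function max_site_gain(ntasks, ncores)
    integer, intent(in) :: ntasks, ncores
    integer :: rounds
    rounds = (ntasks + ncores - 1) / ncores   ! integer ceiling division
    max_site_gain = real(ntasks) / real(rounds)
  end function max_site_gain
end module task_bound
```

In the sample, the solve site creates one task per column of the first row, so the default board size of 14 yields 14 tasks. max_site_gain(14, 8) is 7.0, and max_site_gain(14, 60) is 14.0: once there are more cores than tasks, the extra cores have nothing to run.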
Key Terms
annotations, parallel site, target, task

Next Step
Predict Parallel Data Sharing Problems

Predict Parallel Data Sharing Problems

To predict parallel data sharing problems in your target based on the added Intel Advisor annotations:
• Build the target in debug mode and test.
• Change Intel Advisor project properties.
• Collect Correctness data.
• View the Correctness Report.
• View the Summary window.
Each step is described more fully below.

Build Target in Debug Mode and Test

If you prefer to work in the Visual Studio* IDE:
1. If the Solution Configurations drop-down on the Visual Studio* Standard toolbar is set to Release, change it to Debug.
2. Open the Property Pages dialog box for the 1_nqueens_serial project.
3. Choose Configuration Properties > Debugging and set the Command Arguments field to 8. This reduces the size of the chess board to minimize execution time.
4. Save your changes and close the Property Pages dialog box.
5. Choose Build > Build 1_nqueens_serial.
6. Choose Debug > Start Without Debugging.
7. If the Visual Studio* IDE responds that any projects are out of date, click Yes to build them.
8. Notice the application output window displays a board size of 8.

If you prefer to work in the Standalone Intel Advisor GUI:
1. Build the target in debug mode:
  • In the command window, change directory to the nqueens_Fortran\ directory (where the zipped sample files were extracted to).
  • Type devenv nqueens.sln /build debug /project 1_nqueens_serial to build the target in debug mode.
2. Test the target:
  • In the command window, change directory to the 1_nqueens_serial\Debug\ directory.
  • Type 1_nqueens_serial.exe 8 to execute the application using a reduced chessboard size to minimize execution time.
  • Notice the application output window displays a board size of 8.

Change Intel Advisor Project Properties

If you prefer to work in the Visual Studio* IDE:
1. Choose Project > Intel Advisor version Project Properties, where version is your Intel Advisor version.
2. In the Analysis Target tab:
  • Change the Target type drop-down to Correctness Analysis.
  • Verify the Inherit settings from Visual Studio* project checkbox is selected and the Application parameters field is set to 8.
3. Click the OK button to save the changes.

If you prefer to work in the Standalone Intel Advisor GUI:
1. Choose File > Project Properties....
2. In the Analysis Target tab:
  • Change the Target type drop-down to Correctness Analysis.
  • Click the Browse... button next to the Application field and choose the nqueens_Fortran\1_nqueens_serial\Debug\1_nqueens_serial.exe file.
  • Set the Application parameters field to 8. This reduces the size of the chess board to minimize execution time.
3. Click the Binary/Symbol Search tab and change the current search location directory to the nqueens_Fortran\Debug\ directory.
4. Click the OK button to save the changes.

Collect Correctness Data

Under 4. Check Correctness in the Advisor Workflow, click the Collect button to collect Correctness data while the target executes.

During the Correctness analysis, the Intel Advisor displays a window similar to the following.

View Correctness Report

After the Correctness tool finalizes the data, the Intel Advisor displays a window similar to the following.

The Problems and Messages pane lists the detected messages and potential data sharing problems: Data communication and Memory reuse problems. An icon indicates the severity of each problem or message: error, warning, or informational remark. Clicking a problem or message row displays more information about it in the Code Locations pane. Try clicking the Data communication row now.

The Filter pane lets you temporarily limit the problems and messages displayed in the Problems and Messages pane to those that meet specific criteria.

Problems in parallel programming usually involve multiple, interrelated code regions.
The Code Locations pane shows a code snippet from each involved code region.

If you need to dig deeper into a data sharing problem, you can double-click the associated row in the Problems and Messages pane to display a Correctness Source window where you can:
• See more source code than just a short code snippet.
• Navigate through the call stack.

View Summary Window

Click Summary on the navigation toolbar to re-open the Summary window. Notice the Intel Advisor added even more data to this dashboard to help you weigh the predicted maximum speedup benefit against the cost of fixing sharing problems for your sites and tasks.

Notice the question marks have been replaced with a Correctness Problems summary: three problems with a severity of error and no problems with a severity of warning.

You now have collection data from all three Intel Advisor analysis tools. Congratulations!

Key Terms
data race, synchronization

Next Step
Fix Data Sharing Problems

Fix Data Sharing Problems

Is the predicted maximum speedup benefit worth the effort to fix the data sharing problems? In this case, yes. But you may have a different answer when you use the Intel Advisor to find where to add parallelism in your own applications.

To fix the data sharing problems:
• Fix memory reuse and data communication sharing problems.
• Rebuild the target in debug mode.
• Rerun the Correctness tool.
• View the Summary window.
Each step is described more fully below.

Fix Memory Reuse and Data Communication Sharing Problems

1. Click Correctness Report in the navigation toolbar to re-open the Correctness Report.
2. In the Problems and Messages pane, right-click the Data communication data row and choose Edit Source to open the nqueens_serial.f90 source file in an editor.
3. Search for ADVISOR CORRECTNESS EDIT and follow the directions in the sample code to fix the problems.
Make six total edits: rename queens to queens_in in two lines, uncomment the two lines that make a private copy of queens_in, and uncomment the two lock annotation lines.
4. Save your edits.

Rebuild Target in Debug Mode

If you prefer to work in the Visual Studio* IDE:
1. If the Solution Configurations drop-down on the Visual Studio* Standard toolbar is set to Release, change it to Debug.
2. Choose Build > Build 1_nqueens_serial.

If you prefer to work in the Standalone Intel Advisor GUI:
1. In the command window, change directory to the nqueens_Fortran\ directory (where the zipped sample files were extracted to).
2. Type devenv nqueens.sln /build debug /project 1_nqueens_serial to rebuild the target in debug mode.

Rerun Correctness Tool

Under 4. Check Correctness in the Advisor Workflow, click the Collect button to collect Correctness data again.

Notice the Correctness Report now reports one problem.

View Summary Window

Click Summary in the navigation toolbar to re-open the Summary window. Notice this dashboard now shows only one Correctness problem.

Key Terms
data race, synchronization, target

Next Step
Add Parallelism

Add Parallelism

At this point, you would normally:
1. Rebuild the target in release mode.
2. Re-run the Suitability tool to see how your Correctness fixes impact the predicted maximum speedup.
3. Decide if the predicted maximum speedup benefit is worth the effort to add parallelism to your target.

If you decide to add parallelism to your target, you would:
1. Replace the Intel Advisor annotations with parallel framework code.
2. Build a parallel version of your target in release mode.

TIP The steps for replacing annotations with parallel framework code are fully explained in Intel Advisor Help.

For the Purposes of This Tutorial

Because we are trying to keep this tutorial short, we replaced Intel Advisor annotations with OpenMP* parallel framework code for you. Consider exploring these replacements by opening the nqueens_omp.f90 file.
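The contents of nqueens_omp.f90 are not reproduced in this tutorial. As a minimal sketch of what such a conversion can look like, assume the following: the module and procedure names are hypothetical; each call copies the board before modifying it, mirroring the private-copy fix from the ADVISOR CORRECTNESS EDIT comments; and instead of a lock around the shared solution counter, the sketch uses an OpenMP reduction clause, a different synchronization technique that gives each thread its own private counter. Compile with OpenMP enabled (for example, /Qopenmp with the Intel Fortran compiler on Windows, or -fopenmp with gfortran); without it, the !$omp lines are treated as comments and the code runs serially with the same result.

```fortran
! Sketch only: not the contents of nqueens_omp.f90.
module nqueens_omp_sketch
  implicit none
  private
  public :: count_solutions
contains
  ! Try to place a queen at (row, col). The caller's board is never modified:
  ! each call works on its own private copy, so concurrent tasks do not share it.
  recursive subroutine set_queen(queens_in, row, col, n, found)
    integer, intent(in)    :: queens_in(:)       ! caller's board (read only)
    integer, intent(in)    :: row, col, n
    integer, intent(inout) :: found              ! solutions counted by this thread
    integer :: queens(size(queens_in))           ! private copy of the board
    integer :: i, j

    queens = queens_in
    do i = 1, row - 1
      if (queens(i) == col) return                 ! vertical attack
      if (abs(queens(i) - col) == row - i) return  ! diagonal attack
    end do
    queens(row) = col                              ! position is safe
    if (row == n) then
      found = found + 1
    else
      do j = 1, n
        call set_queen(queens, row + 1, j, n, found)
      end do
    end if
  end subroutine set_queen

  ! One loop iteration (task) per column of the first row, as in the
  ! annotated solve loop. reduction(+:total) gives each thread a private
  ! total and sums the copies when the loop ends.
  integer function count_solutions(n)
    integer, intent(in) :: n     ! board size
    integer :: board(n)          ! empty board, only read by the tasks
    integer :: i, total

    board = 0
    total = 0
    !$omp parallel do reduction(+:total) schedule(dynamic)
    do i = 1, n
      call set_queen(board, 1, i, n, total)
    end do
    !$omp end parallel do
    count_solutions = total
  end function count_solutions
end module nqueens_omp_sketch
```

The schedule(dynamic) clause is a judgment call: tasks that start in different first-row columns explore different numbers of partial boards, so handing out iterations one at a time may balance the load better than a static schedule. For a board size of 8, count_solutions(8) returns 92, matching correct_solution(8) in the sample.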
In addition, consider building the OpenMP* target in release mode, perhaps to compare actual parallel execution time to actual serial execution time.

Key Terms
parallel framework

Next Steps
After you convert Intel Advisor annotations to parallel framework code, use the Intel® Inspector to test the resulting parallel application for correctness and the Intel® VTune™ Amplifier to verify its actual parallel performance.

Chapter 3: Summary

This tutorial demonstrated an end-to-end workflow you can ultimately apply to your own applications.

Step 1: Prepare for tutorial.
Tutorial recap:
If you worked in the Visual Studio* IDE: You chose an Intel Advisor sample application project; verified it is set to produce the most accurate and complete analysis results; built it in release mode; tested the resulting target to ensure it runs on your system outside the Intel Advisor; and verified Intel Advisor project properties.
If you worked in the Standalone Intel Advisor GUI: You chose an Intel Advisor sample application, built it in release mode, tested the resulting target to ensure it runs on your system outside the Intel Advisor, and created and configured a new Intel Advisor project to hold analysis results for the target.
Key tutorial take-aways:
• A target is an executable file the Intel Advisor can analyze.
• Applications compiled and linked in release mode using the following options produce the most accurate and complete Survey and Suitability analysis results:
  • Compiler/Additional include directories and library:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor
  • Compiler/Full debug information: /debug=full
  • Compiler/Moderate optimization: /O2 or higher and /Ob1
  • Linker/Full debug information: /DEBUG

Step 2: Discover parallel opportunities.
Tutorial recap: You ran a Survey analysis on the target to highlight hotspots that you subsequently explored.
Key tutorial take-aways:
• Hotspots are code regions that consume a significant amount of runtime. Loops are often the most time-consuming parts of an application.
• Use the Advisor Workflow to:
  • Provide a roadmap for finding where to add parallelism.
  • Launch Intel Advisor analysis tools.
  • Provide links to relevant topics in Intel Advisor Help.
• Use the Survey Report to locate the loops and functions where the target spends the most time.
• Think of the Summary window as a dashboard to which the Intel Advisor adds more data each time you run Intel Advisor tools.

Step 3: Mark best parallel opportunities with annotations.
Tutorial recap: You marked the hotspots with parallel site and task annotations, and rebuilt the target in release mode.
Key tutorial take-aways:
• Annotations are subroutine calls or macros that identify certain information for Intel Advisor analysis tools, such as the location of proposed parallel sites.
• A parallel site is a region of code that contains one or more time-consuming tasks that may execute in parallel threads to distribute work.
• Include annotation definitions in your source file(s) like so: use advisor_annotate.
• Annotations are fully explained in Intel Advisor Help.

Step 4: Predict maximum parallel performance speedup.
Tutorial recap: You ran a Suitability analysis to predict the maximum parallel performance speedup based on the added annotations, and posed modeling (what-if) questions.
Key tutorial take-aways:
• Use the Suitability Report to show the predicted maximum speedup for each parallel site and for the target as a whole.
• Perform mathematical modeling to see how changing various parameters influences the Maximum Program Gain For All Sites and other values.

Step 5: Predict parallel data sharing problems.
Tutorial recap: You built the target in debug mode, changed Intel Advisor project properties, and ran a Correctness analysis that discovered parallel data sharing problems based on the added annotations.
Key tutorial take-aways:
• A data race occurs when multiple tasks read and write data at a shared memory location without coordinating those read and write operations. This can produce parallel execution errors that are difficult to detect and reproduce.
• Applications compiled and linked in debug mode using the following options produce the most accurate and complete Correctness analysis results:
  • Compiler/Additional include directories and library:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor
  • Compiler/Full debug information: /debug=full
  • Compiler/No optimization: /Od
  • Compiler/Multithreaded, dynamically linked libraries: /MD or /MDd
  • Linker/Full debug information: /DEBUG
• Use the Correctness Report to predict parallel data sharing problems in the annotated target.
• Reduce the input data set to minimize Correctness tool execution time.

Step 6: Fix data sharing problems.
Tutorial recap: You fixed the easy parallel data sharing problems, rebuilt the target in debug mode, and ran another Correctness analysis to ensure you corrected most of the parallel data sharing problems.
Key tutorial take-aways:
• Fix parallel data sharing problems only if the predicted maximum speedup benefit outweighs the cost of the fix.
• Unlike problems reported in serial applications, which often have a single cause, problems in parallel applications usually involve multiple, interrelated code regions.

Step 7: Add parallelism.
Tutorial recap: You explored how we converted Intel Advisor annotations into the OpenMP* parallel framework for you.
Key tutorial take-aways:
• A parallel framework is a combination of libraries, language features, or other software techniques that enable code to execute in parallel.
• Add parallelism only if the predicted maximum speedup benefit outweighs the cost of adding parallel framework code.
• The steps for replacing annotations with parallel framework code are fully explained in Intel Advisor Help.
• After you convert Intel Advisor annotations to parallel framework code, use the Intel® Inspector to test the resulting parallel application for correctness and the Intel® VTune™ Amplifier to verify its actual performance.

Chapter 4: Key Terms

The following terms are used throughout this tutorial.

annotation: Intel Advisor annotations are call statements that identify certain information to Intel Advisor tools, such as the location of proposed parallel sites. To insert annotations into your source code, you can copy code snippets from the annotation assistant pane into your code editor. On Windows systems, you can instead use the Intel Advisor annotation wizard.

data race: A bug that can occur after adding parallelism to parts of your application. A data race occurs when multiple tasks read and write data at a shared memory location without coordinating those read and write operations. This can produce parallel execution errors that are difficult to detect and reproduce. Using the Correctness tool helps you predict and fix likely data races before you add parallelism.

hotspot: A code region that consumes much of your application's run time, such as a loop, and is often a good candidate for parallelism. Hotspots can be identified by a profiler, such as the Intel Advisor Survey tool or the Intel® VTune™ Amplifier.

parallel framework: A combination of libraries, language features, or other software techniques that enable code for your application to execute in parallel. Examples for C/C++ include Intel® Threading Building Blocks and Intel® Cilk™ Plus, which are both included with the Intel compiler.
The OpenMP* parallel framework for C/C++ and Fortran code is available with multiple compilers.

parallel site: A region of code that contains one or more tasks that may execute in parallel. An effective parallel site typically contains a hotspot that consumes much of your application's time. To distribute these frequently executed instructions to different tasks that can run at the same time, your parallel site is not usually located at the hotspot, but higher in the call tree. For example, a parallel site might be located in a function whose code eventually executes the hotspot. All tasks that were started within a site must complete before execution is allowed to proceed past the end of the site.

synchronization: Coordinating the execution of multiple threads. In some cases, you can provide synchronization within a task by using a private memory location instead of a shared memory location. In other cases, you can add a lock or mutex to restrict access to shared data and… prevent a data race.

target: An executable file. Intel Advisor tools run with your target executable to collect data and perform analysis about its execution characteristics.

task: A portion of time-consuming code and its data that can be executed in one or more parallel threads to distribute work. One or more tasks execute within a parallel site.
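The data race and synchronization entries can be illustrated with a shared counter, similar to nrOfSolutions in the sample. If every iteration of an !$omp parallel do loop executed total = total + 1 with no coordination, two threads could read the same old value and one increment would be lost: a data race. The hypothetical module below sketches the two remedies the synchronization entry describes, using OpenMP: mutual exclusion (a named critical section, which plays a role similar to a lock) and private memory (a reduction, which gives each thread its own copy). Compile with OpenMP enabled to run it in parallel; otherwise the directives are ignored and it runs serially.

```fortran
! Hypothetical demonstration module; names are illustrative only.
module counter_sync_demo
  implicit none
  private
  public :: count_with_critical, count_with_reduction
contains
  ! Remedy 1, mutual exclusion: only one thread at a time may execute the
  ! update of the shared counter, so no increment is lost.
  integer function count_with_critical(n)
    integer, intent(in) :: n
    integer :: i, total
    total = 0
    !$omp parallel do shared(total)
    do i = 1, n
      !$omp critical (counter_update)
      total = total + 1
      !$omp end critical (counter_update)
    end do
    !$omp end parallel do
    count_with_critical = total
  end function count_with_critical

  ! Remedy 2, private memory: each thread increments its own private copy
  ! of total, and OpenMP adds the copies together when the loop ends.
  integer function count_with_reduction(n)
    integer, intent(in) :: n
    integer :: i, total
    total = 0
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + 1
    end do
    !$omp end parallel do
    count_with_reduction = total
  end function count_with_reduction
end module counter_sync_demo
```

Both functions return n for any thread count. The reduction version usually scales better because threads never wait on each other inside the loop, while the critical section serializes every update.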
© Copyright 2024