Intel Advisor Tutorial: Find Where to Add Parallelism

Intel® Parallel Studio XE 2015 Professional Edition for Windows*
Intel® Parallel Studio XE 2015 Cluster Edition for Windows*
This tutorial shows how to find where to add parallelism to a serial Fortran sample
application using the Intel Advisor. It demonstrates an end-to-end workflow you can
ultimately apply to your own applications.
Contents
Legal Information
Overview
Chapter 1: Navigation Quick Start
Chapter 2: Find Where to Add Parallelism
  Visual Studio* IDE: Choose Project and Build Target in Release Mode
  Standalone Intel Advisor GUI: Build Target in Release Mode and Create New Project
  Discover Parallel Opportunities
  Mark Best Parallel Opportunities With Annotations
  Predict Maximum Parallel Performance Speedup
  Predict Parallel Data Sharing Problems
  Fix Data Sharing Problems
  Add Parallelism
Chapter 3: Summary
Chapter 4: Key Terms
Legal Information
By using this document, in addition to any agreements you have with Intel, you accept the terms set forth
below.
You may not use or facilitate the use of this document in connection with any infringement or other legal
analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free
license to any patent claim thereafter drafted which includes subject matter disclosed herein.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR
SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or
indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH
MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES,
SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH,
HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES
ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR
DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR
ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL
PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers
must not rely on the absence or characteristics of any features or instructions marked "reserved" or
"undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts
or incompatibilities arising from future changes to them. The information here is subject to change without
notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may
cause the product to deviate from published specifications. Current characterized errata are available on
request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before
placing your product order. Copies of documents which have an order number and are referenced in this
document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to:
http://www.intel.com/design/literature.htm
BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD,
Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel CoFluent, Intel Core,
Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel
SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel
StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru
soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, Puma, skoool, the skoool logo,
SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro
Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in
the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation
in the United States and/or other countries.
Copyright © 2009-2014, Intel Corporation. All rights reserved
Overview
Discover how to find where to add parallelism to a serial application using the Intel® Advisor and the
nqueens_Fortran Fortran sample application.
About This Tutorial
This tutorial demonstrates an end-to-end workflow you can ultimately apply to your own applications:
1. Survey the target executable to locate the loops and functions where your application spends the most time.
2. In the target sources, add Intel Advisor annotations to mark possible parallel tasks and their enclosing parallel sites.
3. Check Suitability to predict the maximum parallel performance speedup of the target based on these annotations.
4. Check Correctness to predict parallel data sharing problems in the target based on these annotations.
5. If the predicted maximum speedup benefit is worth the effort to fix the predicted parallel data sharing problems, fix the problems.
6. Recheck Suitability to see how your fixes impact the predicted maximum speedup.
7. If the predicted maximum speedup benefit is still worth the effort to add parallelism to the target, replace the annotations with parallel framework code that enables parallel execution.
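Several of the workflow steps hinge on a predicted maximum speedup. For intuition about why that number can fall well short of the core count, the following sketch applies Amdahl's law, a textbook upper bound. It is not the model the Suitability tool uses; the function names and the `min_speedup` threshold are hypothetical and exist only for this illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the serial
    run time can be spread evenly across `workers` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def worth_parallelizing(parallel_fraction: float, workers: int,
                        min_speedup: float) -> bool:
    """Hypothetical cost/benefit check: continue only if the bound
    reaches the speedup you consider worth the effort."""
    return amdahl_speedup(parallel_fraction, workers) >= min_speedup


if __name__ == "__main__":
    # Illustrative fractions only; real values come from your own measurements.
    for fraction in (0.50, 0.90, 0.99):
        bound = amdahl_speedup(fraction, 8)
        print(f"{fraction:.0%} parallel on 8 workers: at most {bound:.2f}x")
```

The bound ignores task overhead, lock contention, and load imbalance, so it is best read as a ceiling when making the cost/benefit decision.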
Estimated Duration
20 minutes.
Learning Objectives
After you complete this tutorial, you should be able to:
• List the steps to find where to add parallelism using the Intel Advisor.
• Define key Intel Advisor terms.
• Identify compiler/linker options that produce the most accurate and complete Intel Advisor analysis results.
• Run all Intel Advisor analysis tools.
• View, interpret, and manipulate data collected by Intel Advisor analysis tools.
More Resources
The concepts and procedures in this tutorial apply regardless of programming language; however, a similar tutorial using a sample application in another programming language may be available at http://software.intel.com/en-us/intelsoftware-technical-documentation. This site also offers tutorials for other Intel® products and a printable version (PDF) of many tutorials.
In addition, you can find more resources in Intel Advisor Help.
Next Step
Find Where to Add Parallelism
Chapter 1: Navigation Quick Start
Intel Advisor provides tools that help you decide where to add parallelism to your application. Use Intel Advisor for applications:
• Created with the C/C++, Fortran, or C# programming languages
• Containing serial code that could possibly be replaced with parallel (multithreaded) code
Intel Advisor Access
To access the Intel Advisor in the Microsoft Visual Studio* IDE: From the Windows* Start menu, choose All Programs > Intel studio version > Visual Studio integrations > Use VSversion.
To access the Standalone Intel Advisor GUI, do one of the following:
• From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Intel Advisor version.
• From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Command prompt > mode to set your environment, then type advixe-gui.
Intel Advisor/Visual Studio* IDE Integration
The Visual Studio* menu and Intel Advisor toolbar offer different ways to perform many of the same functions.
On the Visual Studio* menu, use:
• View > Intel Advisor version to open the Advisor Workflow and Summary window
• Project > Intel Advisor version Project Properties to manage projects
• Tools > Intel Advisor version to open the Advisor Workflow and Summary window, and launch the Intel Advisor analysis tools
• Tools > Options > Intel Advisor version to set various product options
• Help > Intel Advisor version to open the Intel Advisor Help and Getting Started documentation
Use the Intel Advisor toolbar to open the Advisor Workflow and Summary windows; launch the Intel Advisor analysis tools; configure projects; and access resources such as Help, Getting Started documentation, and online training and documentation. (Other Visual Studio* product versions may show fewer icons. If so, click the icon to display a drop-down with more toolbar icons.)
Use the Visual Studio* Solution Explorer to manage Intel Advisor projects and results.
Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools. There is one active result for each project. You can also create read-only result snapshots (click the icon).
Use the Visual Studio* Output window to view Visual Studio* and Intel Advisor analysis output.
Standalone Intel Advisor GUI
The Intel Advisor menu, toolbar, and Project Navigator offer different ways to perform many of
the same functions.
On the menu, use:
• File to manage projects and results, and launch the Intel Advisor analysis tools
• View to toggle on a full-screen view of result tab data (press F11 or ESC to toggle off full-screen view); and open the Welcome tab, Advisor Workflow, Summary window, and Project Navigator
• Help to access resources such as Help, Getting Started documentation, and online training and documentation
Use the toolbar to manage projects and results; open the Advisor Workflow, Summary window, and Project Navigator; and launch the Intel Advisor analysis tools.
Use the Project Navigator to see a hierarchical view of your projects and results based on the directory where the opened project resides; open the Advisor Workflow; launch Intel Advisor analysis tools; manage results and projects; and access resources such as Help, Getting Started documentation, and online training and documentation.
Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools. There is one active result for each project. You can also create read-only result snapshots (click the icon).
Advisor Workflow
The Advisor Workflow:
• Provides a roadmap for finding where to add parallelism.
• Lets you launch Intel Advisor analysis tools.
• Provides links to relevant topics in Intel Advisor Help.
• Shows the name of your current project.
Intel Advisor Result Tab
Use the result tab name to identify a result. It consists of the project name and an identifier in the
format ennn.
Use the navigation toolbar to navigate among various result tab windows. >> and << scroll controls
appear if the full navigation toolbar cannot be displayed.
Use the command toolbar to control collection and analysis. The control shows/hides this toolbar.
Use the Intel Advisor result tab to view, interpret, and manipulate the data collected by Intel Advisor analysis tools.
Chapter 2: Find Where to Add Parallelism
Follow these steps to find where to add parallelism to a serial application using the Intel Advisor and the
nqueens_Fortran Fortran sample application.
Step 1: Prepare for tutorial.
Do one of the following:
• If you prefer to work in the Microsoft Visual Studio* IDE:
    • Get software tools and unpack the sample.
    • Open a Visual Studio* solution.
    • Choose a sample startup project.
    • Verify the target is set to build in release mode.
    • Verify optimal compiler/linker settings for the Survey tool.
    • Build the target in release mode and test.
    • Set Intel Advisor project properties.
• If you prefer to work in the Standalone Intel Advisor GUI:
    • Get software tools and unpack the sample.
    • Verify optimal compiler/linker settings for the Survey tool.
    • Build the sample target in release mode.
    • Test the target.
    • Open the Intel Advisor.
    • Create a new project.
Step 2: Discover parallel opportunities.
• Collect Survey data.
• View Survey Report.
• View Survey source.
• View Summary window.
Step 3: Mark best parallel opportunities with annotations.
• Add parallel site and task annotations.
Step 4: Predict maximum parallel performance speedup.
• Rebuild target in release mode.
• Collect Suitability data.
• View Suitability Report.
• View Summary window.
Step 5: Predict parallel data sharing problems.
• Build target in debug mode.
• Change Intel Advisor project properties.
• Collect Correctness data.
• View Correctness Report.
• View Summary window.
Step 6: Fix data sharing problems.
• Make cost/benefit decision.
• Fix memory reuse and data communication sharing problems.
• Rebuild target in debug mode.
• Rerun Correctness tool.
• View Summary window.
Step 7: Add parallelism.
Under normal circumstances:
• Rebuild target in release mode.
• Collect Suitability data again.
• Make cost/benefit decision.
• Replace Intel Advisor annotations with parallel framework code.
• Build parallel version of target in release mode.
For the purposes of this tutorial: Explore how we replaced Intel Advisor annotations with parallel framework code for you.
Key Terms
annotations, parallel framework, target
Visual Studio* IDE: Choose Project and Build Target in Release Mode
Follow these initial steps if you prefer to use the Intel Advisor plug-in to the Visual Studio* IDE to complete this tutorial.
• Get software tools and unpack the sample.
• Open a Visual Studio* solution.
• Choose a startup project.
• Verify the target is set to build in release mode.
• Verify optimal compiler/linker settings for the Survey tool.
• Build the target in release mode and test.
• Verify Intel Advisor Project Properties.
Each step is described more fully below.
Get Software Tools and Unpack the Sample
You need the following tools to try tutorial steps yourself using the nqueens_Fortran sample application:
• Intel Advisor, including sample application
• .zip file extraction utility
• Supported compiler (see Release Notes for more information)
Acquire and Install Intel Advisor
If you do not already have the Intel Advisor, you can download an evaluation copy from https://software.intel.com/en-us/intel-software-evaluation-center/.
NOTE
If you have not rebooted your system since you installed the Intel Advisor, please do so now.
Set Up Intel Advisor Sample Application
1. Copy the nqueens_Fortran.zip file from the <install-dir>\samples\<locale>\Fortran\ directory to a writable directory or share on your system. The default installation path is C:\Program Files (x86)\Intel\Advisor XE 201n\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
2. Extract the sample from the .zip file.
Open Visual Studio* Solution
1. Launch the Microsoft Visual Studio* IDE.
2. Choose File > Open > Project/Solution.
3. In the Open Project dialog box, open the nqueens.sln file to display the nqueens solution in the Solution Explorer.
Choose Startup Project
If the 1_nqueens_serial project is not the startup project (project typeface in the Solution Explorer is not bold):
1. Right-click the 1_nqueens_serial project in the Solution Explorer.
2. Choose Set as StartUp Project.
Verify Target is Set to Build in Release Mode
If the Solution Configurations drop-down on the Visual Studio* Standard toolbar is set to Debug, change
it to Release.
Verify Optimal Compiler/Linker Options for Survey Tool
Applications compiled and linked using the following options produce the most accurate and complete analysis
results. Verify the 1_nqueens_serial project uses the optimal settings for the Survey tool.
Settings are grouped by task. Release build settings apply to the Survey and Suitability tools; debug build settings apply to the Correctness tool.

To do this: Search additional include directories and library related to Intel Advisor annotation definitions.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
If you build your application using the Visual Studio* IDE, use these project properties:
• Fortran > General > Additional Include Directories > "$(ADVISOR_XE_2015_DIR)\include\ia32\" or "$(ADVISOR_XE_2015_DIR)\include\intel64\"
• Linker > General > Additional Library Directories > "$(ADVISOR_XE_2015_DIR)\lib32" or "$(ADVISOR_XE_2015_DIR)\lib64"
• Linker > Input > Additional Dependencies > libadvisor.lib
If you build your application using a Windows* command line, add these options to your build command:
• /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
• /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
• /ladvisor

To do this: Compiler: Request full debug information.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
• Visual Studio* IDE: Fortran > General > Debug Information Format > Full (/debug=full)
• Command line: /debug=full

To do this: Compiler: Request moderate optimization (for Survey and Suitability tool) or disable optimization (for Correctness tool).
Release build settings for Survey and Suitability tools:
• Visual Studio* IDE: Fortran > Optimization > Optimization > Maximize Speed or higher, and Fortran > Optimization > Inline Function Expansion > Only INLINE Directive (/Ob1)
• Command line: /O2 or higher, and /Ob1
Debug build settings for Correctness tool:
• Visual Studio* IDE: Fortran > Optimization > Disable (/Od)
• Command line: /Od

To do this: Linker: Search for unresolved references in multithreaded, dynamically linked libraries.
Debug build settings for Correctness tool:
• Visual Studio* IDE: Fortran > Libraries > Runtime Library > Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs)
• Command line: /MD or /MDd

To do this: Linker: Request full debug information.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
• Visual Studio* IDE: Linker > Debugging > Generate Debug Info > Yes (/DEBUG)
• Command line: /DEBUG
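As a compact way to keep the release and debug settings straight, the sketch below records the command-line options from the settings above in a lookup keyed by analysis tool. The option strings are copied from this tutorial; the dictionary layout and the `options_for_tool` function are hypothetical, and the annotation include and library paths are left out for brevity.

```python
# Options transcribed from the settings above. Grouping under "compiler" and
# "linker" follows the tutorial's own row labels; the layout is illustrative.
BUILD_OPTIONS = {
    "release": {  # Survey and Suitability tools
        "compiler": ["/debug=full", "/O2", "/Ob1"],
        "linker": ["/DEBUG"],
    },
    "debug": {  # Correctness tool
        "compiler": ["/debug=full", "/Od"],
        "linker": ["/MD", "/DEBUG"],  # the tutorial also allows /MDd instead of /MD
    },
}

# Which build mode each Intel Advisor analysis tool expects.
TOOL_BUILD_MODE = {
    "survey": "release",
    "suitability": "release",
    "correctness": "debug",
}


def options_for_tool(tool: str) -> dict:
    """Return the build mode and option lists for an analysis tool name."""
    mode = TOOL_BUILD_MODE.get(tool.strip().lower())
    if mode is None:
        raise ValueError(f"unknown Intel Advisor tool: {tool!r}")
    options = BUILD_OPTIONS[mode]
    return {
        "mode": mode,
        "compiler": list(options["compiler"]),  # copies, so callers can modify freely
        "linker": list(options["linker"]),
    }


if __name__ == "__main__":
    for tool in ("Survey", "Suitability", "Correctness"):
        print(tool, options_for_tool(tool))
```

Keeping the mapping in one place makes it harder to run the Correctness tool against an optimized release build by mistake.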
Build Target in Release Mode and Test
1. Choose Build > Build 1_nqueens_serial to build the target.
2. Choose Debug > Start Without Debugging.
3. If the Visual Studio* IDE responds that any projects are out of date, click Yes to build them.
4. Check for a display similar to the following.
5. Notice the application output window displays a board size of 14 and the total time it took to run the target.
6. Press Enter (or any key) to dismiss the application output window.
Verify Intel Advisor Project Properties
1. Choose Project > Intel Advisor version Project Properties... to display a dialog box similar to the following.
2. In the Analysis Target tab, ensure the Target type drop-down is set to Survey/Suitability Analysis and the Inherit settings from Visual Studio* project checkbox is selected.
3. Click the OK button to close the dialog box.
Key Terms
target
Next Step
Discover Parallel Opportunities
Standalone Intel Advisor GUI: Build Target in Release Mode and Create New Project
Follow these initial steps if you prefer to use the Standalone Intel Advisor GUI to complete this tutorial:
• Get software tools and unpack the sample.
• Verify optimal compiler/linker settings for the Survey tool.
• Build the target in release mode.
• Test the target.
• Open the Standalone Intel Advisor GUI.
• Create a new Intel Advisor project.
Each step is described more fully below.
Get Software Tools and Unpack the Sample
You need the following tools to try tutorial steps yourself using the nqueens_Fortran sample application:
• .zip file extraction utility
• Intel Advisor, including sample application
• Supported compiler (see Release Notes for more information)
Acquire and Install Intel Advisor
If you do not already have the Intel Advisor, you can download an evaluation copy from https://software.intel.com/en-us/intel-software-evaluation-center/.
NOTE
If you have not rebooted your system since you installed the Intel Advisor, please do so now.
Set Up Intel Advisor Sample Application
1. Copy the nqueens_Fortran.zip file from the <install-dir>\samples\<locale>\Fortran\ directory to a writable directory or share on your system. The default installation path (<install-dir>) is C:\Program Files (x86)\Intel\Advisor XE 201n\. (On certain systems, instead of Program Files (x86), the directory name is Program Files.)
2. Extract the sample from the .zip file.
Verify Optimal Compiler/Linker Settings for Survey Tool
Applications compiled and linked using the following options produce the most accurate and complete analysis
results. Verify the sample code uses the optimal release build settings for the Survey and Suitability tools.
Settings are grouped by task. Release build settings apply to the Survey and Suitability tools; debug build settings apply to the Correctness tool.

To do this: Search additional include directories and library related to Intel Advisor annotation definitions.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
If you build your application using the Visual Studio* IDE, use these project properties:
• Fortran > General > Additional Include Directories > "$(ADVISOR_XE_2015_DIR)\include\ia32\" or "$(ADVISOR_XE_2015_DIR)\include\intel64\"
• Linker > General > Additional Library Directories > "$(ADVISOR_XE_2015_DIR)\lib32" or "$(ADVISOR_XE_2015_DIR)\lib64"
• Linker > Input > Additional Dependencies > libadvisor.lib
If you build your application using a Windows* command line, add these options to your build command:
• /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
• /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
• /ladvisor

To do this: Compiler: Request full debug information.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
• Visual Studio* IDE: Fortran > General > Debug Information Format > Full (/debug=full)
• Command line: /debug=full

To do this: Compiler: Request moderate optimization (for Survey and Suitability tool) or disable optimization (for Correctness tool).
Release build settings for Survey and Suitability tools:
• Visual Studio* IDE: Fortran > Optimization > Optimization > Maximize Speed or higher, and Fortran > Optimization > Inline Function Expansion > Only INLINE Directive (/Ob1)
• Command line: /O2 or higher, and /Ob1
Debug build settings for Correctness tool:
• Visual Studio* IDE: Fortran > Optimization > Disable (/Od)
• Command line: /Od

To do this: Linker: Search for unresolved references in multithreaded, dynamically linked libraries.
Debug build settings for Correctness tool:
• Visual Studio* IDE: Fortran > Libraries > Runtime Library > Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs)
• Command line: /MD or /MDd

To do this: Linker: Request full debug information.
Release build settings for Survey and Suitability tools, and debug build settings for Correctness tool (same for both):
• Visual Studio* IDE: Linker > Debugging > Generate Debug Info > Yes (/DEBUG)
• Command line: /DEBUG
Build Target in Release Mode
1. From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Command prompt > mode to set your environment.
2. In the command prompt window, change directory to the nqueens_Fortran\ directory (where the zipped sample files were extracted to).
3. Type devenv nqueens.sln /build release /project 1_nqueens_serial to build the target in release mode.
Test Target
1. In the command prompt window, change directory to the 1_nqueens_serial\Release\ directory.
2. Type 1_nqueens_serial.exe to execute the target.
3. Check for output similar to the following.
4. Notice the application output window displays a board size of 14 and the total time it took to run the target.
TIP
Keep the command prompt window open.
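For readers unfamiliar with the problem the target solves, the following is a minimal serial N-Queens counter in Python. It is an illustrative sketch, not a translation of the Fortran sample: the names `solve` and `set_queen` only echo the routine names that appear later in the Survey Report, and the structure (an outer loop over first-row columns that calls a recursive placement routine) is an assumption about the general shape of such solvers.

```python
import time
from typing import List


def is_safe(queens: List[int], col: int) -> bool:
    """Check whether a queen in the next row at `col` is attacked.
    `queens[r]` holds the column of the queen already placed in row r."""
    row = len(queens)
    return all(col != c and abs(col - c) != row - r
               for r, c in enumerate(queens))


def set_queen(queens: List[int], size: int) -> int:
    """Recursively place queens in the remaining rows; return solution count."""
    if len(queens) == size:
        return 1
    found = 0
    for col in range(size):
        if is_safe(queens, col):
            queens.append(col)
            found += set_queen(queens, size)
            queens.pop()  # backtrack before trying the next column
    return found


def solve(size: int) -> int:
    """Outer loop: try each first-row column in turn (serial version)."""
    total = 0
    for first_col in range(size):
        total += set_queen([first_col], size)
    return total


if __name__ == "__main__":
    board_size = 8  # kept small; interpreted Python is slow on larger boards
    start = time.perf_counter()
    solutions = solve(board_size)
    elapsed = time.perf_counter() - start
    print(f"Board size {board_size}: {solutions} solutions in {elapsed:.3f} s")
```

Each iteration of the outer loop in `solve` explores an independent subtree, which is the kind of structure the rest of this tutorial looks for when choosing where to add parallelism.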
Open Standalone Intel Advisor GUI
From the Windows* Start menu, choose All Programs > Intel studio version > Analyzers > Intel
Advisor version.
Create New Intel Advisor Project
1. In the Intel Advisor GUI, choose File > New > Project... (or click New Project... in the Welcome page) to display a dialog box similar to the following.
2. Type nqueens_Fortran in the Project name field and C:\Temp\Samples\ (default location) in the Location field. Then click the Create Project button to create a config.advixeproj file and display a dialog box similar to the following.
3. In the Analysis Target tab, ensure the Target type drop-down is set to Survey/Suitability Analysis.
4. Click the Browse... button next to the Application field and choose the nqueens_Fortran\1_nqueens_serial\Release\1_nqueens_serial.exe file. Notice the Intel Advisor autofills the project Working directory field for you.
5. Click the Binary/Symbol Search tab. In this tab:
    • Click the button in the Add new search location line.
    • Navigate to and select the nqueens_Fortran\1_nqueens_serial\Release\ directory.
    • Verify the Search recursively box is deselected.
6. Click the Source Search tab. In this tab:
    • Click the button in the Add new search location line.
    • Navigate to and select the nqueens_Fortran\1_nqueens_serial\ directory.
    • Verify the Search recursively box is deselected.
7. Click the OK button to add the changes. Notice that the project name appears in the Intel Advisor GUI title bar and the Advisor Workflow.
Key Terms
target
Next Step
Discover Parallel Opportunities
Discover Parallel Opportunities
To discover parallel opportunities in serial code:
• Collect Survey data.
• View the Survey Report.
• View Survey Source.
• View the Summary window.
Each step is described more fully below.
Collect Survey Data
To start a Survey analysis:
1. If the Advisor Workflow is not currently displayed, click the icon on the Intel Advisor toolbar to open it.
NOTE
If you are using the Visual Studio* IDE and the icon is not visible on the toolbar, click the icon to display a drop-down with more toolbar icons.
2. Under 1. Survey Target in the Advisor Workflow, click the icon to collect Survey data while the target executes.
During the Survey analysis, the Intel Advisor displays a window similar to the following.
NOTE
You can ignore any warnings about missing debugging symbols during this tutorial.
The navigation toolbar lets you navigate among various result tab windows. If you see >> and <<
scroll controls, click them to see more window options.
During collection, the main area of the Survey Report shows collection progress and milestones,
and runtime performance characteristics. The Collection Log pane shows collection informational,
warning, and error messages. Notice you can redirect non-GUI application output to the Application
Output pane.
The command toolbar lets you control collection and analysis. For example: You can skip
uninteresting code regions by temporarily stopping data collection while the target continues to run
and then resuming data collection (Pause and Resume buttons); stop target execution, analysis,
and data collection, finalize the partially collected data, and display the result (Stop button); cancel
target execution, analysis, and data collection, discard the collected data, and skip finalization
(Cancel button). You can also use the Command Line button to view and copy the equivalent
advixe-cl command and options to run this Survey analysis.
NOTE
This tutorial explains only how to use Intel Advisor tools from the Visual Studio* IDE or the Intel
Advisor GUI.
View Survey Report
After the Survey tool finalizes the data, the Intel Advisor displays the Survey Report and an infotip in the
top left corner. Close the infotip to display a window similar to the following.
The Function Call Sites and Loops column shows an extended top-down call tree of our target, starting with the main entry point. Each function or loop appears on a separate row. A loop is indicated by an icon under the function/procedure that executes it. You can scroll up and down as well as show or hide functions and loops.
The Total Time % column starts in the top line with a value of 100%. The Total Time column
shows the time spent in a function or loop and all functions called from it. A row with a large Total
Time % and multiple children with smaller total times is a possible candidate for parallelism.
The Self Time column shows how much time was spent in the function or loop. Loops or functions
with significant self time values are possible candidates for distributing work.
The Source Location column shows the source file and line number associated with the location in
the call tree.
The Hot Loops column and icons identify time-consuming loops from a Total Time perspective. These hotspots are often good candidates for parallelism.
Our target spends the most time in the NQUEENS_ip_SETQUEEN function, which is called from the
NQUEENS_ip_SOLVE function, so it is not surprising to find the top time-consuming hot loop there.
Notice the NQUEENS_ip_SETQUEEN function calls itself recursively.
The Survey Report toolbar helps you quickly locate hot loops and view the Survey Source window
for the currently selected row.
This control shows/hides the command toolbar. Try clicking it now.
The Annotations Example (annotations assistant) area, which is closed in this screenshot, shows
various annotation samples and build settings you can copy directly into your editor. We will discuss
annotation examples later in this tutorial.
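To make the difference between the Self Time and Total Time columns concrete, the following Fortran sketch computes both for a tiny, hypothetical call tree. The node names, parent links, and timings are invented for illustration and are not taken from an actual Survey Report; the recursion in setQueen is also ignored to keep the arithmetic simple.

```fortran
program total_vs_self
  implicit none
  ! Hypothetical call tree (not real Survey data):
  !   1 = main, 2 = solve (child of main), 3 = setQueen (child of solve),
  !   4 = report (child of main). Each child has a higher index than its parent.
  integer, parameter :: n = 4
  character(len=8), parameter :: name(n) = &
       (/ 'main    ', 'solve   ', 'setQueen', 'report  ' /)
  integer, parameter :: parent(n) = (/ 0, 1, 2, 1 /)
  real, parameter :: self_time(n) = (/ 0.1, 0.2, 9.5, 0.2 /)  ! seconds, invented
  real :: total_time(n)
  integer :: i

  ! Total time = self time + total time of everything called from the node.
  total_time = self_time
  do i = n, 2, -1
    total_time(parent(i)) = total_time(parent(i)) + total_time(i)
  end do

  print '(A)', 'node      self(s)  total(s)  total %'
  do i = 1, n
    print '(A8,2X,F7.2,2X,F8.2,2X,F7.1)', name(i), self_time(i), total_time(i), &
          100.0 * total_time(i) / total_time(1)
  end do
end program total_vs_self
```

In this made-up tree the top row reaches 100% total time while nearly all the self time sits in the innermost routine, which is the pattern the columns above are meant to reveal.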
View Survey Source
To dig deeper into the first hot loop, double-click the associated row to display a Survey Source window
similar to the following.
The File pane initially shows the source code and time values for the first hot loop.
The Call Stack with Loops pane shows the call stack for the first hot loop. Clicking a row in the
Call Stack with Loops pane displays the associated code in the File pane. Try clicking a row in this
pane now.
The Annotations Example (annotations assistant) area, which is open in this screenshot, shows an
annotation code snippet for a simple loop structure. We will discuss annotation examples later in this
tutorial.
Click the X beside Survey Source on the navigation toolbar to close the Survey Source window.
View Summary Window
Click Summary on the navigation toolbar to open a Summary window similar to the following. Think of this
window as a dashboard to which the Intel Advisor adds data each time you run Intel Advisor tools.
This area summarizes possible loops where you might add parallelism. It also provides easy access
to the Survey Report window and your sources. Try clicking a Loop link now. Then return to the
Summary window and try clicking a Source Location link.
You currently have collection data from only one of the three Intel Advisor analysis tools. That will
change very soon.
Key Terms
annotation, hotspot, target
Next Step
Mark Best Parallel Opportunities With Annotations
Mark Best Parallel Opportunities With Annotations
Intel Advisor annotations are either subroutine calls or macros, depending on the programming language.
Annotations can be processed by your current compiler but do not change the computations of your
application.
Use them to mark places in serial parts of your application that are good candidates for later replacement
with parallel framework code that enables parallel execution.
The main types of Intel Advisor annotations mark the location of:
• A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in
parallel. An effective parallel site typically contains a hotspot that consumes application execution time. To
distribute these frequently executed instructions to different tasks that can run at the same time, the best
parallel site is not usually located at the hotspot, but higher in the call tree.
• One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that
can be executed in one or more parallel threads to distribute work.
• Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
Intel Advisor provides example annotated source code for you (accessible in the Survey Report and Survey
Source windows) that you can copy directly into your editor:
Annotation Code Snippet: Purpose
• Iteration Loop, Single Task: Create a simple loop structure, where the task code includes the entire
  loop body. This common task structure is useful when only a single task is needed within a parallel site.
• Loop, One or More Tasks: Create loops where the task code does not include all of the loop body, or
  complex loops or code that requires specific task begin-end boundaries, including multiple task end
  annotations. This structure is also useful when multiple tasks are needed within a parallel site.
• Function, One or More Tasks: Create code that calls multiple tasks within a parallel site.
• Pause/Resume Collection: Temporarily pause data collection and later resume it, so you can skip
  uninteresting parts of target execution to minimize collected data and speed up analysis of large
  applications. Add these annotations outside a parallel site.
• Build Settings: Set build (compiler and linker) settings specific to the language in use.
TIP
• Annotations are fully explained in Intel Advisor Help.
• When adding annotations to your own application, remember to include the annotations definitions,
  such as use advisor_annotate for Fortran programs.
• In your own application, choosing where to add task annotations may require some
  experimentation. If your parallel site has nested loops and the computation time used by the
  innermost loop is small, consider adding task annotations around the next outermost loop.
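The last tip can be sketched as follows. This is a minimal, hypothetical example (the program name, array, and labels "fill" and "column" are invented); it uses only the annotation calls that appear in the tutorial sample, and they are left commented out, as in the sample, so the code also builds without the Intel Advisor annotation module.

```fortran
program nested_site_sketch
  !use advisor_annotate          ! uncomment when building with Intel Advisor annotations
  implicit none
  integer, parameter :: n = 200  ! hypothetical problem size
  real :: a(n, n)
  integer :: i, j

  ! The parallel site encloses the whole loop nest.
  !call annotate_site_begin("fill")
  do j = 1, n
    ! The task is the body of the OUTER loop: each task then covers a full
    ! inner loop, so the work per task is large relative to task overhead.
    !call annotate_iteration_task("column")
    do i = 1, n
      ! Inner-loop body is tiny; marking it as a task would be too fine-grained.
      a(i, j) = real(i) * real(j)
    end do
  end do
  !call annotate_site_end()

  print *, 'checksum = ', sum(a)
end program nested_site_sketch
```

Whether the outer or inner loop is the better task boundary in your own code is something the Suitability tool can help you judge; the placement above is only a starting point.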
Add Parallel Site and Task Annotations
Because we are trying to keep this tutorial short, we already added parallel site and task annotations to the
sample code for you. All you need to do is uncomment them.
1. Click Survey Report on the navigation toolbar to re-open the Survey Report.
2. Right-click the data row with the first hot loop and choose Edit Source to open the
   nqueens_serial.f90 source file in an editor.
program NQueens
! Solve the nqueens problem - How many ways can you put 'n' queens on an
! n-by-n chess board without them being able to attack each other?
! Read http://en.wikipedia.org/wiki/Nqueens for background
!
! Original C++ code by Ralf Ratering & Mario Deilmann
! Fortran version by Steve Lionel & others
!
! To set command line argument in Visual Studio:
! 1) Right click on the project name and select 'Properties'.
! 2) Under 'Debugging', enter the argument (board size) in the
!    'Command Arguments' field.
!
!ADVISOR SUITABILITY EDIT: To use the Advisor Annotations:
!ADVISOR SUITABILITY EDIT: Uncomment the "use advisor_annotate" line below
!use advisor_annotate
implicit none

integer :: nrOfSolutions = 0   ! Counts the number of solutions.
integer :: size = 0            ! The board size; read from the command line.

! The number of correct solutions for each board size (1-15).
integer, parameter, dimension(15) :: correct_solution = (/ &
    1,     0,      0,       &
    2,     10,     4,       &
    40,    92,     352,     &
    724,   2680,   14200,   &
    73712, 365596, 2279184 /)

character(200) :: cmd_name     ! Command/Program Name
character(400) :: cmd_line     ! The full command line
integer        :: cmd_len      ! The command-line length
character(2)   :: cmd_arg      ! The command arguments
integer :: stat                ! Library call status value
integer :: time_start, time_end, count_rate ! Timing variables
integer :: nthreads = 1        ! Number of threads to use.

100 format(A,A,A)
101 format(A,I0,A,I0,A)

! Get the board size from the command line argument.
if (command_argument_count() < 1) then
    call get_command_argument(0, cmd_name, cmd_len, status=stat)
    print 100, "Usage: ", cmd_name(1:cmd_len), " boardSize"
    size = 14
    print *, "Using default size of 14"
else
    call get_command_argument(1, cmd_arg, status=stat)
    read(cmd_arg, *, iostat=stat) size
    ! Limit the board size. If it is too small, the program may finish before
    ! suitability or other analyses can produce an accurate result. If the
    ! board is too large, the program will take a long time.
    if ((stat /= 0) .or. (size < 4) .or. (size > 15)) then
        print *, "Error: boardSize must be between 4 and 15; resetting to 14"
        size = 14
    end if
endif

! Time how long it takes to find all solution boards.
print 101, "Starting nqueens solver for size ", size, " with ", nthreads, &
           " thread(s)."
call system_clock(time_start)
call solve()
call system_clock(time_end, count_rate)

! Evaluate and report the result.
print 101, "Number of solutions: ", nrOfSolutions
if (nrOfSolutions == correct_solution(size)) then
    print *, "Correct Result!"
else
    print *, "Incorrect Result!"
end if
print 101, "Calculations took ", (time_end-time_start) / (count_rate/1000), &
           "ms."
! End of Main Program

contains

! Recursive routine to find all solutions on the board (the array 'queens')
! when we place the next queen at location (row, col).
! This increments the global nrOfSolutions with each solution found.
!
! Although the recursive call in this function may appear several times in
! the survey results, the solve() function is a better-performing
! parallelization candidate, due to its coarser granularity.
!
!ADVISOR CORRECTNESS EDIT: In order to avoid data races and correctness
!ADVISOR CORRECTNESS EDIT:   issues on the 'queens' array, we have to make
!ADVISOR CORRECTNESS EDIT:   a private copy of it.
!ADVISOR CORRECTNESS EDIT: So rename 'queens' to 'queens_in' in the next two
!ADVISOR CORRECTNESS EDIT:   lines
recursive subroutine setQueen(queens, row, col)
    integer, intent(inout) :: queens(:)
    integer, intent(in)    :: row, col
    integer :: i
    integer, volatile :: j
    !ADVISOR CORRECTNESS EDIT: Uncomment the declaration of queens, and the
    !ADVISOR CORRECTNESS EDIT: assignment statement, which will create
    !ADVISOR CORRECTNESS EDIT: a private copy of queens_in.
    !integer :: queens(ubound(queens_in, dim=1))
    !queens = queens_in

    do i = 1, row-1
        ! Check for vertical attacks.
        if (queens(i) == col) return
        ! Check for diagonal attacks.
        if (abs(queens(i)-col) == (row-i)) return
    end do

    ! Position is safe; set the queen.
    queens(row) = col

    if (row == size) then
        !ADVISOR CORRECTNESS EDIT: Uncomment the following 2 lock annotations
        !ADVISOR CORRECTNESS EDIT: to avoid a datarace on nrOfSolutions.
        !call annotate_lock_acquire(0)
        nrOfSolutions = nrOfSolutions + 1
        !call annotate_lock_release(0)
    else
        ! Try to fill next row.
        do j = 1, size
            call setQueen(queens, row+1, j)
        end do
    end if
end subroutine SetQueen

! Find all solutions for the nQueens problem on a size x size chessboard.
! On return, nrOfSolutions = number of solutions.
!
!ADVISOR COMMENT: When surveying, this is the top CPU-consuming function
!ADVISOR COMMENT: below the main function. This subroutine's do loop is
!ADVISOR COMMENT: an excellent candidate for parallelization.
subroutine solve()
    integer :: i
    integer, allocatable :: queens(:) ! Array representing the chess board.

    allocate(queens(size))
    queens = 0

    !ADVISOR SUITABILITY EDIT: Uncomment the three annotation calls below to
    !ADVISOR SUITABILITY EDIT: model parallelizing the body of this do loop.
    !call annotate_site_begin("solve")
    do i = 1, size
        !call annotate_iteration_task("setQueen")
        ! Try all positions in first row.
        call SetQueen(queens, 1, i)
    end do
    !call annotate_site_end()

    deallocate(queens)
end subroutine solve

end program
3. Search for ADVISOR SUITABILITY EDIT and follow the directions in the sample code. Make four total
   edits: Uncomment the !use advisor_annotate line near the top and three annotation lines.
   TIP
   Now is also a good time to simply explore our fully commented sample code.
4. Save your edits.
Rebuild Target in Release Mode
Do one of the following:
• In the Visual Studio* IDE: Choose Build > Build 1_nqueens_serial.
• In the command prompt window: Change directory to the nqueens_Fortran\ directory, then type
  devenv nqueens.sln /build release /project 1_nqueens_serial.
Key Terms
annotations, parallel site, synchronization, target, task
Next Step
Predict Maximum Parallel Performance Speedup
Predict Maximum Parallel Performance Speedup
To predict the maximum parallel performance speedup of your target based on the added Intel Advisor
annotations:
• Collect Suitability data.
• View the Suitability Report.
• View the Summary window.
Each step is described more fully below.
Collect Suitability Data
Under 3. Check Suitability in the Advisor Workflow, click the button to collect Suitability data while
the target executes.
During the Suitability analysis, the Intel Advisor displays a window similar to the following.
NOTE
You can ignore any warnings about missing debugging symbols during this tutorial.
View Suitability Report
After the Suitability tool finalizes the data, the Intel Advisor displays a window similar to the following.
The Maximum Program Gain For All Sites value shows the predicted maximum speedup of our
target based on Intel Advisor annotations and currently selected modeling parameters. Over a 6x
speedup is good!
This grid shows various metrics for each parallel site based on currently selected modeling
parameters, including the site's Impact to Program Gain. Our target has a single parallel site - the
solve parallel site, as identified in the Site Label column.
Use these modeling parameter drop-downs to experiment with different hardware configurations and
parallel frameworks.
Drop-Down: Set to This
• Target System:
  • CPU to model predicted maximum speedup when executing all parallel sites on host CPUs
  • Intel Xeon Phi to model predicted maximum speedup when executing all parallel sites on
    Intel® Xeon Phi™ coprocessors
  • Offload to Intel Xeon Phi to model predicted maximum speedup when executing:
    • Serial parts of our target on host CPUs
    • Parallel sites, on a site-by-site basis, on host CPUs or Intel Xeon Phi coprocessors
• Threading Model: Intel TBB, Intel Cilk Plus, OpenMP, Microsoft TPL, or Other to model predicted
  maximum speedup using the parallel framework
The controls that appear depend on the Target System drop-down setting:
• If the Target System drop-down = CPU:
  • Offload to Intel Xeon Phi checkbox: Hidden
  • CPU Count drop-down: A modeling number of CPUs that will work in parallel for all parallel sites
    in your target
  • Coprocessor Threads drop-down: Hidden
• If the Target System drop-down = Intel Xeon Phi:
  • Offload to Intel Xeon Phi checkbox: Hidden
  • CPU Count drop-down: Hidden
  • Coprocessor Threads drop-down: A modeling number of coprocessor threads that will work in
    parallel for all parallel sites in your target
• If the Target System drop-down = Offload to Intel Xeon Phi:
  • Offload to Intel Xeon Phi checkbox: Selected or Deselected for the current parallel site
  • CPU Count drop-down (either checkbox state): A modeling number of CPUs that will work in parallel
    for this parallel site and all other sites not selected for offload
  • Coprocessor Threads drop-down (either checkbox state): A modeling number of coprocessor threads
    that will work on the Intel Xeon Phi coprocessor for this parallel site and all other parallel
    sites selected for offload
The Scalability of Maximum Site Gain diagram graphically shows the predicted maximum speedup
for the solve parallel site in different scaling scenarios based on currently selected modeling
parameters.
A Bulls-Eye in This Area: Means This
• Red: Parallelization is not beneficial - and may even cause performance degradation. Consider
  removing or modifying annotations, or significantly refactoring the corresponding hotspot if you
  want to parallelize it at any cost.
• Yellow: The predicted maximum speedup may not be enough to justify the effort needed to refactor
  and maintain your application. Consider investigating.
• Green: Parallel performance - and power efficiency - may improve significantly.
Use the Loop Iterations (Tasks) Modeling sliders and the Apply button to experiment with
different iteration counts and instance durations.
Use the Runtime Modeling checkboxes to experiment with predicted maximum speedup if you plan
to use parallel framework code constructs to address parallel overhead, lock contention, or task
chunking; or if you plan to tune parallel code after you implement parallelism.
This area shows issues that generally prevent better parallel performance. A green bar is good; it
means this issue is not negatively impacting predicted maximum speedup. A yellow or red bar is not
good.
The Site Details area shows information about the solve parallel site and the setQueen task within
that parallel site.
Notice how your screen changes if you choose a Target System of Intel Xeon Phi or Offload to Intel
Xeon Phi.
The Scalability of Maximum Site Gain diagram graphically shows the predicted performance of
the manycore parallel coprocessor and its host CPUs. For many applications, the number of task
instances does not scale enough to fully utilize the many cores of the parallel coprocessor. An
application that is ready for an Intel Xeon Phi coprocessing system has a bulls-eye in the green part
of the diagram. A bulls-eye in the gray part of the diagram indicates an application that is not ready
for an Intel Xeon Phi coprocessing system; in such cases, try modeling another type of Target
System.
Use the Intel Xeon Phi Advanced Modeling checkbox, fields, and the Apply button to model the
expected speedup if you plan to modify your parallel code to improve vector parallel execution.
TIP
These modeling parameters are fully explained in Intel Advisor Help.
Try experimenting now to see the impact of various modeling parameters on predicted maximum speedup
throughout the Suitability Report.
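As a rough cross-check while you experiment with the CPU Count drop-down, you can compare the reported gains against Amdahl's law. The sketch below is not the model the Suitability tool uses (the tool also considers items such as parallel overhead and lock contention); it only gives a simple upper bound, and the parallel fraction p is a hypothetical value you would replace with your own estimate.

```fortran
program amdahl_sketch
  implicit none
  ! Hypothetical fraction of serial run time spent inside the parallel site.
  real, parameter :: p = 0.9
  integer :: k, ncpu
  real :: bound

  print '(A,F4.2)', 'Assumed parallel fraction p = ', p
  do k = 0, 6
    ncpu = 2**k                                      ! 1, 2, 4, ..., 64 CPUs
    ! Amdahl's law: serial part stays fixed, parallel part shrinks by ncpu.
    bound = 1.0 / ((1.0 - p) + p / real(ncpu))
    print '(A,I3,A,F6.2,A)', 'CPUs: ', ncpu, '   speedup bound: ', bound, 'x'
  end do
end program amdahl_sketch
```

The bound flattens as the CPU count grows, which is one reason a site may look attractive at a low CPU count and much less so at a high one.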
View Summary Window
Click Summary on the navigation toolbar to re-open the Summary window. Notice the Intel Advisor added
more data to this dashboard.
This area summarizes the maximum parallel performance speedup. It also provides easy access to
the Suitability Report window and your sources. Try clicking the Maximum Site Gain link now.
Then return to the Summary window and try clicking the Parallel Site link.
The question marks for detected Correctness Problems mean you have not yet collected any
Correctness data.
In addition to the newly acquired information from the Suitability Report, the dashboard still shows
data from the Survey Report.
You now have collection data from two of the three Intel Advisor analysis tools.
Key Terms
annotations, parallel site, target, task
Next Step
Predict Parallel Data Sharing Problems
Predict Parallel Data Sharing Problems
To predict parallel data sharing problems in your target based on the added Intel Advisor annotations:
• Build the target in debug mode and test.
• Change Intel Advisor project properties.
• Collect Correctness data.
• View the Correctness Report.
• View the Summary window.
Each step is described more fully below.
Build Target in Debug Mode and Test
If you prefer to work in the Visual Studio* IDE
1. If the Solutions Configuration drop-down on the Visual Studio* Standard toolbar is set to Release,
   change to Debug.
2. Open the Property Pages dialog box for the 1_nqueens_serial project.
3. Choose Configuration Properties > Debugging and set the Command Arguments field to 8. This
   reduces the size of the chess board to minimize execution time.
4. Save your changes and close the Property Pages dialog box.
5. Choose Build > Build 1_nqueens_serial.
6. Choose Debug > Start Without Debugging.
7. If the Visual Studio* IDE responds that any projects are out of date, click Yes to build them.
8. Notice the application output window displays a board size of 8.
If you prefer to work in the Standalone Intel Advisor GUI
1. Build the target in debug mode:
   • In the command window, change directory to the nqueens_Fortran\ directory (where the zipped
     sample files were extracted to).
   • Type devenv nqueens.sln /build debug /project 1_nqueens_serial to build the target in
     debug mode.
2. Test the target:
   • In the command window, change directory to the 1_nqueens_serial\Debug\ directory.
   • Type 1_nqueens_serial.exe 8 to execute the application using a reduced chessboard size to
     minimize execution time.
   • Notice the application output window displays a board size of 8.
Change Intel Advisor Project Properties
If you prefer to work in the Visual Studio* IDE
1. Choose Projects > Intel Advisor version Project Properties.
2. In the Analysis Target tab:
   • Change the Target type drop-down to Correctness Analysis.
   • Verify the Inherit settings from Visual Studio* project checkbox is selected and the
     Application parameters field is set to 8.
3. Click the OK button to save the changes.
If you prefer to work in the Standalone Intel Advisor GUI
1. Choose File > Project Properties....
2. In the Analysis Target tab:
   • Change the Target type drop-down to Correctness Analysis.
   • Click the Browse... button next to the Application field and choose the
     nqueens_Fortran\1_nqueens_serial\Debug\1_nqueens_serial.exe file.
   • Set the Application parameters field to 8. This reduces the size of the chess board to minimize
     execution time.
3. Click the Binary/Symbol Search tab and change the current search location directory to the
   nqueens_Fortran\Debug\ directory.
4. Click the OK button to save the changes.
Collect Correctness Data
Under 4. Check Correctness in the Advisor Workflow, click the button to collect Correctness data
while the target executes.
During the Correctness analysis, the Intel Advisor displays a window similar to the following.
View Correctness Report
After the Correctness tool finalizes the data, the Intel Advisor displays a window similar to the following.
The Problems and Messages pane lists the detected messages and potential data sharing
problems: Data communication and Memory reuse problems. The severity of each problem or
message is indicated by an icon for error, warning, or informational remark. Clicking a problem or
message row displays more information about it in the Code Locations pane. Try clicking the
Data communication row now.
The Filter pane lets you temporarily limit the problems and messages displayed in the Problems
and Messages pane to those that meet specific criteria.
Problems in parallel programming usually involve multiple, interrelated code regions. The Code
Locations pane shows a code snippet from each involved code region.
If you need to dig deeper into a data sharing problem, you can double-click the associated row in the
Problems and Messages pane to display a Correctness Source window where you can:
• See more source code than just a short code snippet.
• Navigate through the call stack.
View Summary Window
Click Summary on the navigation toolbar to re-open the Summary window. Notice the Intel Advisor added
even more data to this dashboard to help you weigh the predicted maximum speedup benefit against the
cost of fixing sharing problems for your sites and tasks.
Notice the question marks have been replaced with a Correctness Problems summary: Three
problems with a severity of error and no problems with a severity of warning.
You now have collection data from all three Intel Advisor analysis tools. Congratulations!
Key Terms
data race, synchronization
Next Step
Fix Data Sharing Problems
Fix Data Sharing Problems
Is the predicted maximum speedup benefit worth the effort to fix the data sharing problems? In this case,
yes. But you may have a different answer when you use the Intel Advisor to find where to add parallelism in
your own applications.
• Fix memory reuse and data communication sharing problems.
• Rebuild the target in debug mode.
• Rerun the Correctness tool.
• View the Summary window.
Each step is described more fully below.
Fix Memory Reuse and Data Communication Sharing Problems
1. Click Correctness Report in the navigation toolbar to re-open the Correctness Report.
2. In the Problems and Messages pane, right-click the Data communication data row and choose Edit
   Source to open the nqueens_serial.f90 source file in an editor.
3. Search for ADVISOR CORRECTNESS EDIT and follow the directions in the sample code to fix the
   problems. Make six total edits: Make a private copy of queens_in and uncomment two lock annotation
   lines.
4. Save your edits.
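The lock annotations in the sample model mutual exclusion around the update of nrOfSolutions. For comparison, here is a minimal sketch of how a shared counter could be protected in real threaded code, assuming an OpenMP-capable compiler. It is not part of the tutorial sample, and the program and variable names are hypothetical.

```fortran
program shared_counter_sketch
  implicit none
  integer, parameter :: n = 100000   ! hypothetical number of updates
  integer :: i, counter

  counter = 0
  ! Every iteration updates the same shared variable. Without protection,
  ! two threads could read the old value at the same time and lose an update.
  !$omp parallel do
  do i = 1, n
    ! The atomic directive makes the read-modify-write indivisible; it plays
    ! the role that the lock acquire/release pair models in the annotated code.
    !$omp atomic
    counter = counter + 1
  end do
  !$omp end parallel do

  print '(A,I0)', 'counter = ', counter
end program shared_counter_sketch
```

With the atomic directive in place the program prints counter = 100000 whether or not OpenMP is enabled; removing the directive in an OpenMP build may produce a smaller, varying count, which is the kind of data race the Correctness tool predicts.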
Rebuild Target in Debug Mode
If you prefer to work in the Visual Studio* IDE
1. If the Solutions Configuration drop-down on the Visual Studio* Standard toolbar is set to Release,
   change to Debug.
2. Choose Build > Build 1_nqueens_serial.
If you prefer to work in the Standalone Intel Advisor GUI
1. In the command window, change directory to the nqueens_Fortran\ directory (where the zipped
   sample files were extracted to).
2. Type devenv nqueens.sln /build debug /project 1_nqueens_serial to rebuild the target in
   debug mode.
Rerun Correctness Tool
Under 4. Check Correctness in the Advisor Workflow, click the button to collect Correctness data
again.
Notice the Correctness Report now reports one problem.
View Summary Window
Click Summary in the navigation toolbar to re-open the Summary window. Notice this dashboard now
shows only one Correctness problem.
Key Terms
data race, synchronization, target
Next Step
Add Parallelism
Add Parallelism
At this point, you would normally:
1. Rebuild the target in release mode.
2. Re-run the Suitability Report to see how your Correctness fixes impact the predicted maximum
   speedup.
3. Decide if the predicted maximum speedup benefit is worth the effort to add parallelism to your target.
If you decide to add parallelism to your target, you would:
1. Replace the Intel Advisor annotations with parallel framework code.
2. Build a parallel version of your target in release mode.
For the Purposes of This Tutorial
Because we are trying to keep this tutorial short, we replaced Intel Advisor annotations with OpenMP*
parallel framework code for you. Consider exploring these replacements by opening the nqueens_omp.f90
file.
TIP
The steps for replacing annotations with parallel framework code are fully explained in Intel Advisor
Help.
In addition, consider building the OpenMP* target in release mode - to, perhaps, compare actual parallel
execution time to actual serial execution time.
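If you want to experiment before opening nqueens_omp.f90, the following self-contained sketch shows one way the annotated solve loop could map onto OpenMP. It is an assumption-based illustration, not the contents of nqueens_omp.f90: the names nqueens_omp_sketch, place, and nsol are hypothetical, a reduction clause stands in for the lock annotations, and the private copy of the board mirrors the queens_in fix.

```fortran
program nqueens_omp_sketch
  implicit none
  integer, parameter :: n = 8        ! board size (fixed here for brevity)
  integer :: queens(n)
  integer :: nsol, i

  nsol = 0
  ! The parallel site becomes a parallel do; each iteration is one task.
  ! private(queens): every thread works on its own board.
  ! reduction(+:nsol): per-thread counts are summed at the end, so no lock
  ! is needed around the increment.
  !$omp parallel do private(queens) reduction(+:nsol) schedule(dynamic)
  do i = 1, n
    queens = 0
    call place(queens, 1, i, nsol)
  end do
  !$omp end parallel do

  print '(A,I0,A,I0)', 'Solutions for board size ', n, ': ', nsol

contains

  ! Try to put a queen at (row, col); recurse into the next row if it is safe.
  recursive subroutine place(q_in, row, col, found)
    integer, intent(in)    :: q_in(:)
    integer, intent(in)    :: row, col
    integer, intent(inout) :: found
    integer :: q(size(q_in))         ! private copy of the board for this call
    integer :: k

    q = q_in
    do k = 1, row - 1
      if (q(k) == col) return                   ! vertical attack
      if (abs(q(k) - col) == row - k) return    ! diagonal attack
    end do
    q(row) = col

    if (row == size(q)) then
      found = found + 1
    else
      do k = 1, size(q)
        call place(q, row + 1, k, found)
      end do
    end if
  end subroutine place
end program nqueens_omp_sketch
```

To run it in parallel, build with OpenMP enabled (for example /Qopenmp with the Intel compiler on Windows or -fopenmp with gfortran); without that option the directives are treated as comments and the program runs serially. For a board size of 8 the count should be 92, matching the correct_solution table in the sample.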
Key Terms
parallel framework
Next Steps
After you convert Intel Advisor annotations to parallel framework code, test the resulting parallel application
for correctness and verify its actual parallel performance using the Intel® Inspector and Intel® VTune™
Amplifier respectively.
Summary
This tutorial demonstrated an end-to-end workflow you can ultimately apply to your own applications.
Each step below lists the Tutorial Recap followed by the Key Tutorial Take-aways.

Step 1: Prepare for tutorial.
Tutorial Recap:
If you worked in the Visual Studio* IDE: You chose an Intel Advisor sample application project; verified
it is set to produce the most accurate and complete analysis results; built it in release mode, tested the
resulting target to ensure it runs on your system outside the Intel Advisor, and verified Intel Advisor
project properties.
If you worked in the Standalone Intel Advisor GUI: You chose an Intel Advisor sample application,
built it in release mode, tested the resulting target to ensure it runs on your system outside the Intel
Advisor, and created and configured a new Intel Advisor project to hold analysis results for the target.
Key Tutorial Take-aways:
• A target is an executable file the Intel Advisor can analyze.
• Applications compiled and linked in release mode using the following options produce the most
  accurate and complete Survey and Suitability analysis results:
  • Compiler/Additional include directories and library:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor
  • Compiler/Full debug information: /debug=full
  • Compiler/Moderate optimization: /O2 or higher and /Ob1
  • Linker/Full debug information: /DEBUG

Step 2: Discover parallel opportunities.
Tutorial Recap:
You ran a Survey analysis on the target to highlight hotspots that you subsequently explored.
Key Tutorial Take-aways:
• Hotspots are code regions that consume a significant amount of runtime.
• Loops are often the most time-consuming parts of an application.
• Use the Advisor Workflow to:
  • Provide a roadmap for finding where to add parallelism.
  • Launch Intel Advisor analysis tools.
  • Provide links to relevant topics in Intel Advisor Help.
• Use the Survey Report to locate the loops and functions where the target spends the most time.
• Think of the Summary window as a dashboard to which the Intel Advisor adds more data each time
  you run Intel Advisor tools.

Step 3: Mark best parallel opportunities with annotations.
Tutorial Recap:
You marked the hotspots with parallel site and task annotations, and rebuilt the target in release mode.
Key Tutorial Take-aways:
• Annotations are subroutine calls or macros that identify certain information for Intel Advisor
  analysis tools, such as the location of proposed parallel sites.
• A parallel site is a region of code that contains one or more time-consuming tasks that may execute
  in parallel threads to distribute work.
• Include annotation definitions in your source file(s) like so: use advisor_annotate.
• Annotations are fully explained in Intel Advisor Help.

Step 4: Predict maximum parallel performance speedup.
Tutorial Recap:
You ran a Suitability analysis to predict the maximum parallel performance speedup based on the
added annotations, and posed modeling (what-if) questions.
Key Tutorial Take-aways:
• Use the Suitability Report to show the predicted maximum speedup for each parallel site and for the
  target as a whole.
• Perform mathematical modeling to see how changing various parameters influences the Maximum
  Program Gain For All Sites and other values.

Step 5: Predict parallel data sharing problems.
Tutorial Recap:
You built the target in debug mode, changed Intel Advisor project properties, and ran a Correctness
analysis that discovered parallel data sharing problems based on the added annotations.
Key Tutorial Take-aways:
• A data race occurs when multiple tasks read and write data at a shared memory location without
  coordinating those read and write operations. This can produce parallel execution errors that are
  difficult to detect and reproduce.
• Applications compiled and linked in debug mode using the following options produce the most
  accurate and complete Correctness analysis results:
  • Compiler/Additional include directories and library:
    • /I"%ADVISOR_XE_2015_DIR%"\include\ia32 or /I"%ADVISOR_XE_2015_DIR%"\include\intel64
    • /L"%ADVISOR_XE_2015_DIR%"\lib32 or /L"%ADVISOR_XE_2015_DIR%"\lib64
    • /ladvisor
  • Compiler/Full debug information: /debug=full
  • Compiler/No optimization: /Od
  • Compiler/Multithreaded, dynamically linked libraries: /MD or /MDd
  • Linker/Full debug information: /DEBUG
• Use the Correctness Report to predict parallel data sharing problems in the annotated target.
• Reduce the input data set to minimize Correctness tool execution time.

Step 6: Fix data sharing problems.
Tutorial Recap:
You fixed the easy parallel data sharing problems, rebuilt the target in debug mode, and ran another
Correctness analysis to ensure you corrected most of the parallel data sharing problems.
Key Tutorial Take-aways:
• Fix parallel data sharing problems only if the predicted maximum speedup benefit outweighs the
  cost of the fix.
• Unlike problems reported in serial applications, which often have a single cause, problems in
  parallel applications usually involve multiple, interrelated code regions.

Step 7: Add parallelism.
Tutorial Recap:
You explored how we converted Intel Advisor annotations into the OpenMP* parallel framework for you.
Key Tutorial Take-aways:
• A parallel framework is a combination of libraries, language features, or other software techniques
  that enable code to execute in parallel.
• Add parallelism only if the predicted maximum speedup benefit outweighs the cost of adding parallel
  framework code.
• The steps for replacing annotations with parallel framework code are fully explained in Intel
  Advisor Help.
• After you convert Intel Advisor annotations to parallel framework code, test the resulting parallel
  application for correctness and verify its actual performance using the Intel® Inspector and
  Intel® VTune™ Amplifier respectively.
Key Terms
The following terms are used throughout this tutorial.
annotation: Intel Advisor annotations are call statements that identify certain information to Intel Advisor
tools, such as the location of proposed parallel sites. To insert annotations into your source code, you can
copy code snippets from the annotation assistant pane into your code editor. On Windows systems, you can
instead use the Intel Advisor annotation wizard.
data race: A bug that can occur after adding parallelism to parts of your application. A data race occurs
when multiple tasks read and write data at a shared memory location without coordinating those read and
write operations. This can produce parallel execution errors that are difficult to detect and reproduce. Using
the Correctness tool helps you predict and fix likely data races before you add parallelism.
hotspot: A code region that consumes much of your application's run time, such as a loop, and is often a
good candidate for parallelism. Hotspots can be identified by a profiler, such as the Intel Advisor Survey tool
or the Intel® VTune™ Amplifier.
parallel framework: A combination of libraries, language features, or other software techniques that enable
code for your application to execute in parallel. Examples for C/C++ include Intel® Threading Building Blocks
and Intel® Cilk™ Plus, which are both included with the Intel compiler. The OpenMP* parallel framework for
C/C++ and Fortran code is available with multiple compilers.
parallel site: A region of code that contains one or more tasks that may execute in parallel. An effective
parallel site typically contains a hotspot that consumes much of your application's time. To distribute these
frequently executed instructions to different tasks that can run at the same time, your parallel site is not
usually located at the hotspot, but higher in the call tree. For example, a parallel site might be located in a
function whose code eventually executes the hotspot. All tasks that were started within a site must complete
before execution is allowed to proceed past the end of a site.
synchronization: Coordinating the execution of multiple threads. In some cases, you can provide
synchronization within a task by using a private memory location instead of a shared memory location. In
other cases, you can add a lock or mutex to restrict access to shared data and… prevent a data race.
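The two approaches named in this entry can be sketched in C using OpenMP* directives, which are
also available for Fortran. This is a minimal illustration under stated assumptions, not code from
the tutorial sample: the function names sum_with_private_copies and sum_with_lock are hypothetical.
The first function uses a reduction clause so that each thread accumulates into a private copy; the
second guards the shared variable with a critical section, which acts as a lock.

```c
/* Sum an array two ways without a data race.
   Both function names are hypothetical, chosen for this sketch. */

/* Approach 1: private memory. The reduction clause gives each thread
   its own copy of total and combines the copies when the loop ends. */
long sum_with_private_copies(const int *data, int n)
{
    long total = 0;
    int i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

/* Approach 2: a lock. The critical section lets only one thread at a
   time update the shared total. */
long sum_with_lock(const int *data, int n)
{
    long total = 0;
    int i;
#pragma omp parallel for
    for (i = 0; i < n; ++i) {
#pragma omp critical
        total += data[i];
    }
    return total;
}
```

Both functions return the same result for the same input. The private-copy version usually scales
better because threads do not wait on each other inside the loop.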
target: An executable file. Intel Advisor tools run your target executable to collect data and analyze
its execution characteristics.
task: A portion of time-consuming code and its data that can be executed in one or more parallel threads to
distribute work. One or more tasks execute within a parallel site.
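The relationship between a parallel site and its tasks can be sketched in C with an OpenMP*
directive. This is an illustration only, with hypothetical names (count_divisors, run_site). The
parallel loop plays the role of the parallel site, each loop iteration is a task, and the hotspot
(count_divisors) sits below the site in the call tree. The implicit barrier at the end of the loop
reflects the rule that all tasks must complete before execution proceeds past the end of the site.

```c
/* Hypothetical time-consuming work (the hotspot): count the divisors of n. */
static int count_divisors(int n)
{
    int d, count = 0;
    for (d = 1; d <= n; ++d) {
        if (n % d == 0) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical parallel site: fills results[i] for each inputs[i],
   then returns the sum of all results. */
int run_site(const int *inputs, int *results, int n)
{
    int i, total = 0;

    /* Site begins. Each iteration is an independent task that writes
       only its own element of results, so no two tasks write the same
       memory location. */
#pragma omp parallel for
    for (i = 0; i < n; ++i) {
        results[i] = count_divisors(inputs[i]);
    }
    /* Site ends. The implicit barrier at the end of the parallel loop
       guarantees that every task has finished before this point. */

    for (i = 0; i < n; ++i) {
        total += results[i];
    }
    return total;
}
```

Because the final loop runs only after the site ends, it can safely read every element of results
without further synchronization.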