HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS Peter Messmer, NVIDIA

HOW TO VISUALIZE YOUR
GPU-ACCELERATED SIMULATION RESULTS
Peter Messmer, NVIDIA
RANGE OF ANALYSIS AND VIZ TASKS
 Analysis: Focus quantitative
 Visualization: Focus qualitative
 Monitoring, Steering
TRADITIONAL HPC WORKFLOW
Workstation
Setup
Analysis,
Visualization
Supercomputer
Viz Cluster
Dump,
Checkpointing
Visualization,
Analysis
File System
TRADITIONAL WORKFLOW: CHALLENGES
Lack of interactivity
prevents “intuition”
Workstation
Setup
Supercomputer
High-end viz
neglected due
Analysis,
Visualization to workflow
complexity
Viz Cluster
I/O becomes main
simulation bottleneck
Dump,
Checkpointing
File System
Viz resources need
to scale with simulation
Visualization,
Analysis
OUTLINE





Visualization applications
CUDA/OpenGL interop
Remote viz
Parallel viz
In-Situ viz
High-level overview. Some parts platform dependent. Check
with your sysadmin.
VISUALIZATION APPLICATIONS
NON-REPRESENTATIVE VIZ TOOLS SURVEY
OF 25 HPC SITES
Surveyed sites:
LLNL LLNLORNLNERSC -OCF SCF
LANL CCS
DOD-ORC
AFRLDSCR
AFRL
ARL
NASAERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS
TACC CHPC RZG
HLRN Julich CSCS
CSC
Hector Curie
NON-REPRESENTATIVE VIZ TOOLS SURVEY
OF 25 HPC SITES
Surveyed sites:
LLNL LLNLORNLNERSC -OCF SCF
LANL CCS
DOD-ORC
AFRLDSCR
AFRL
ARL
NASAERDC NAVY MHPCC ORS CCAC NAS NASA-NCCS
TACC CHPC RZG
HLRN Julich CSCS
CSC
Hector Curie
VISIT
 Scalar, vector and tensor field data features
— Plots: contour, curve, mesh, pseudo-color, volume,..
— Operators: slice, iso-surface, threshold, binning,..
 Quantitative and qualitative analysis/vis
— Derived fields, dimension reduction, line-outs
— Pick & query
 Scalable architecture
 Open source
http://wci.llnl.gov/codes/visit/
VISIT
 Cross-platform
— Linux/Unix, OSX, Windows
 Wide range of data formats
— .vtk, .netcdf, .hdf5,..
 Extensible
— Plugin architecture
 Embeddable
 Python scriptable
VISIT’S SCALABLE ARCHITECTURE




Client-server architecture
Server MPI parallel
Distributed filtering
(multi-)GPU accelerated,
parallel rendering*
* requires X server on each node
PARAVIEW
 Scalar, vector and tensor field data features
— Plots: contour, curve, mesh, pseudocolor, volume,..
— Operators: slice, iso-surface, threshold, binning,..
 Quantitative and qualitative analysis/vis
— Derived fields, dimension reduction, line-outs
— Pick & query
 Scalable architecture
 Developed by Kitware, open source
http://www.paraview.org
PARAVIEW’S SCALABLE ARCHITECTURE
 Client-server-server architecture
 Server MPI parallel
 Distributed filtering
 GPU accelerated, parallel
rendering*
* requires X server on each node
SOME OTHER TOOLS
 Wide range of visualization tools
 Often emerged from specialized application domain
— Tecplot, EnSight: structural analysis, CFD
— IndeX: seismic data processing & visualization
— IDL: image processing
 Early adopters of visual programming
— AVS/Express, OpenDX
FURTHER READING
 Paraview Tutorial:
http://www.paraview.org/Wiki/The_ParaView_Tutorial
 VisIt Manuals/Tutorials:
http://wci.llnl.gov/codes/visit/manuals.html
VTK - THE VISUALIZATION TOOLKIT
VISUALIZATION TOOLKIT




Focus on visualization, not (only) rendering
Provides more complex operations on data (“filtering”)
Introduces visualization pipeline
At the core of many high-level viz tools
http://www.vtk.org
VTK PIPELINE
Source
Raw data, shapes
Filter
Transform raw data
Mapper
Map data to geometry
Renderer
Render the geometry
OPENGL
From the CUDA perspective
OpenGL
OPENGL: API FOR GPU ACCELERATED
RENDERING
•
•
•
•
Primitives: points, lines, polygons
Properties: colors, lighting, textures, ..
View: camera position and perspective
Shaders: Rendering to screen/framebuffer
• C-style functions, enums
See e.g. “What Every CUDA Programmer Should Know About OpenGL”
(http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)
A SIMPLE OPENGL EXAMPLE
glColor3f(1.0f,0,0);
State-based API
(sticky attributes)
glBegin(GL_QUADS);
glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner
glVertex3f(-1.0f, 1.0f, 0.0f);
// The top left corner
glVertex3f(1.0f, 1.0f, 0.0f);
// The top right corner
glVertex3f(1.0f, -1.0f, 0.0f);
// The bottom right corner
Drawing
glEnd();
glFlush();
Render to screen
glColor3f(1.0f,0,0);
glBegin(GL_QUADS);
glVertex3f(-1.0f, -1.0f, 0.0f);
glVertex3f(-1.0f, 1.0f, 0.0f);
glVertex3f(1.0f, 1.0f, 0.0f);
glVertex3f(1.0f, -1.0f, 0.0f);
glEnd();
?
float* vert={-1.0f, -1.0f, ..};
float* d_vert;
cudaMalloc(&d_vert, n);
cudaMemcpy(d_vert, vert, n,
cudaMemcpyHostToDevice);
renderQuad<<<N/128, N>>>(d_vert);
glFlush();
flushToScreen<<<..>>>();
CUDA-OPENGL INTEROP: MAPPING MEMORY
• OpenGL: Opaque data buffer object
• Virtex Buffer Object (VBO)
• User has very limited control
• CUDA: C-style memory management
• User has full control
• CUDA-OpenGL Interop:
Map/Unmap OpenGL buffers into CUDA memory space
RENDERING FROM A VBO
Populate
Render
Display
display()
Initialize VBO
init()
Create VBO
RENDERING FROM A VBO WITH CUDA
INTEROP
Initialize VBO
Register VOB with CUDA
init()
Create VBO
Populate
Unmap VBO from CUDA
Render
Display
display()
Map VOB to CUDA
MAIN OPENGL-CUDA INTEROP ROUTINES
 Register VBO for CUDA
cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);
 Mapping of VBO to CUDA
cudaGraphicsMapResources(1, cuda_vbo, 0);
cudaGraphicsResourceGetMappedPointer(&d_ptr, size,
*cuda_vbo);
 Unmapping VBO from CUDA
cudaGraphicsUnmapResources(1, cuda_vbo, 0);
CAN ALL GPUS SUPPORT OPENGL?
 GeForce : standard feature set including OpenGL 4.3/4.4
 Quadro : + certain highly accelerated features (e.g. CAD)
 Tesla: If GPU is in “All on”
operation mode*
Graphics capabilities disabled
nvidia-smi –-query-gpu=gom.current
nvidia-smi –q
*requires “m” or “X” class devices
Graphics capabilities enabled
OPENGL SUMMARY
+ Relatively simple for basic viz
+ Existing/fixed/simple rendering pipeline
- Low level
- Triangles, points, rather than isosurfaces, height-fields
- (currently) depends on Xserver for context creation
- What about remote, parallel etc?
MORE OPENGL INFORMATION AT GTC
S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A
S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler
GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E
S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C
S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web,
3/25, 13:30, LL21C
S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C
S4385 - Order Independent Transparency in OpenGL, 3/25, 210C
REMOTE VISUALIZATION
OPENGL CONTEXT
 State of an OpenGL instance
— Incl. viewable surface
— Interface to windowing system
 Context creation: platform specific
— Not part of OpenGL
— Handled by Xserver in Linux/Unix-like systems
 GLX: Interaction X<-> OpenGL
LOCAL RENDERING
Application
X11
Events
GLX
OpenGL
2D/3D X Server
Driver
GPU, monitor attached
Commands
Xlib
libGL
X-FORWARDING: THE SIMPLEST FORM OF
“REMOTE” RENDERING
Application
X11 Commands
libGL
Xlib
X11 Events
OpenGL/GLX
On remote system:
2D/3D X Server
export DISPLAY=59.151.136.110:0.0
Driver
GPU, monitor attached
Network
X-FORWARDING
+ Simple!
+ ssh –X
+ No need to run X on remote machine
- Lots of data crosses the network
- All rendering performed on the local Xserver
- Not useful for visualization of remote CUDA Interop apps
(X-server doesn’t “see” remote GPU)
SERVER-SIDE RENDERING + REMOTE VIZ
APPLICATION
Application
Xlib
libGL
X11
Events
GLX
X11
Cmds
OpenGL
2D/3D X Server
Client
Driver
Driver
GPU
GPU, monitor attached
Network
SERVER-SIDE RENDERING + SCRAPING
Application
Xlib
libGL
X11
Events
GLX
OpenGL
X11
Cmds
2D/3D X Server
Driver
Scraper
Images
GPU
Client
Driver
GPU, monitor attached
Network
SERVER-SIDE RENDERING + SCRAPING
+ Full GPU acceleration
+ No X server on client
Question: when to scrape?
— Xserver not informed about direct rendering
— Intercept glxSwapBuffers()
- Not multi-user
=> Occasionally used for remote desktop tools
GLX FORKING: OUT-OF-PROCESS
Application
Xlib
libGL
OpenGL/
GLX
OpenGL
X11
Events
X11
Cmds
Proxy X Server
Images
Images
GLX
3D X Server
Client App
Driver
Driver
GPU
GPU, monitor attached
Network
GLX FORKING: OUT-OF-PROCESS
+ Full GPU acceleration
+ No X server on client
+ Multi-User
- All traffic through Proxy X server
=> Occasionally used for remote desktop tools
GLX FORKING WITH INTERPOSER LIBRARY
Application
X11 Commands
VirtualGL
libGL
X11 Events
VGL transport
(compressed)
GLX
OpenGL
Xlib
VirtualGL
client
Images
Images
3D X Server
2D X Server
Driver
Driver
GPU
GPU, monitor attached
Network
GLX FORKING WITH INTERPOSER LIBRARY
Application
VirtualGL
libGL
X11
X11
Events Cmds
Proxy X
Server
GLX
OpenGL
Xlib
Images
Images
3D X Server
Client App
Driver
Driver
GPU
GPU, monitor attached
Network
VIRTUAL GL + TURBOVNC
+ Compressed image transport
+ Transparent to the application
+ Fully GPU accelerated OpenGL
+ Client with or without Xserver
- Requires Xserver to access GPU
http://www.virtualgl.org
HOW TO SET UP VIRTUAL GL + TURBOVNC
 Requires Xserver running on server
— Root privileges for Xserver
 Requires installation of VirtualGL, TurboVNC on server
 Start VirtualGL-accelerated VNC server
vncserver :3
 TurboVNC viewer on client
— Linux, Windows, Javascript
CONNECTING CLIENT TO REMOTE SERVER
CONNECTING TO REMOTE VNC SERVER VIA
A GATEWAY
 Establish tunnel on client
ssh –L 3333:node01.cluster.net:5903 login.cluster.net
 Connect client to localhost:3333
port 3333
port 5903
Gateway
node01.cluster.net
login.cluster.net
client
LAUNCHING REMOTE CUDA/OPENGL
APPLICATIONS VIA VGLRUN
vglrun :3 glxgears
export DISPLAY=:3
vglrun simpleGL
REMOTE VIZ IN PARAVIEW/VISIT
 Approach 1: VirtualGL and VNC
export DISPLAY=:3
vglrun
paraview
 Approach 2: Local Client
— On remote server:
pvserver
On local workstation:
paraview
Paraview Client & Server
under VirtualGL
REMOTE VISUALIZATION SUMMARY
 Multiple approaches to remote rendering
— X forwarding, remote viz app, scraping, interposer process/library
 Currently requires X server to generate context
— EGL will fix this, but requires application changes
PARALLEL VISUALIZATION
Parallel Viz
PARALLEL VISUALIZATION
 Parallelism at multiple levels
— Filtering
— Rendering
- Both supported by VisIt & Paraview
- Heavy lifting already done!
- Challenge: Setup in parallel environment
- Both tools provide support for most common cases
- Both VisIt & Paraview MPI parallel -> need custom build
BASIC STEPS TO PARALLEL VISUALIZATION
 Launch visualization server processes
— Most likely through queuing system
 Connect client to head node
 Tunnel from workstation
 Setting up virtualgl on remote node
DOMAIN DECOMPOSITION OF DATA
- Visualization algorithms can work on
decomposed data
- May require ghost cells
- May lead to load imbalance
No ghost cells required
Ghost cells required
PARALLEL COMPOSITING
- Distributed geometry
- Render with depth
information
- Composition using IceT
http://icet.sandia.gov
PARALLEL VISUALIZATION
 Particularly important for large datasets
 Visualization time often determined by filtering
 Rendering important if
— Highly complex visualization
— Complex visualization effects
— Low-power CPU
 Transparent support in Paraview, VisIt
IN-SITU VISUALIZATION & STEERING
BENEFIT OF IN-SITU VISUALIZATION
 Pipeline simulation cycle
— Visualization/analysis integral part of simulation
 Immediate feedback
 Reduce pressure on file system
 In some (future) cases: only way to analyze/visualize data
LIBSIM IN VISIT: VISIT SERVER AS A LIBRARY
CONNECTING TO A RUNNING APPLICATION
BUILD VISUALIZATION PIPELINE FOR
RUNNING APPLICATION
BASIC INTERACTION/STEERING WITH
RUNNING APPLICATION
INTERACTIVE VIZ APPLICATION WITH
LIBSIM
 User implements callbacks for VisIt
— meta-data, mesh, variables and domains
 GetData: VisIt requests data for visualization
— Work directly on simulation data
— Transfer data from GPU
 Command server: Interaction VisIt front-end <-> simulation
— “Steering”
http://www.visitusers.org/index.php?title=VisIttutorial-in-situ
THE FUTURE
The Future
VISUALIZATION DATA FLOW
Filtering
CPU
Simulation
GPU
Rendering
VISUALIZATION WITH GPU ACCELERATED
FILTER
Filtering
CPU
Simulation
GPU
Filtering
Rendering
VISUALIZATION WITH GPU ACCELERATED
FILTER AND HARDWARE RENDERING
Filtering
CPU
Simulation
GPU
Filtering
Rendering
FULLY GPU ACCELERATED VISUALIZATION
CPU
Simulation
GPU
Filtering
Rendering
SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT
SCALE (2011-2016)
 Management, Analysis, Visualization
— In-situ analysis, indexing/compression
— I/O-, Viz frameworks
— Supporting application teams
 SDAV tools deployment: Paraview, VisIt, IceT, ..
http://www.sdav-scidac.org/
SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT
SCALE (2011-2016)
 PISTON (LANL): Data Parallel Visualization Operators
— Isosurface, cut, threshold
— Built on top of Thrust
— Support of most Paraview operators
— Incorporation into VTK
 DAX (Sandia), DIY (Argonne), EVAL (ORNL)
S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based
Algorithm Development in EAVL
4620 - DAX: A Massively Threaded Visualization and Analysis Toolkit for Extreme
Scale
VIZ TALKS AT GTC2014

S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight
Center

Abel Brown ( Principal Systems Engineer, A.I. Solutions

S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD

S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World

S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies

S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems

S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling

S4778 - Interactive Processing and Visualization of Geospatial Imagery

S4140 - Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers

S4400 - Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX

S4811 - Extreme Machine Learning with GPUs
SUMMARY
 Different methods of visualization at different levels
— OpenGL, VTK, VisIt & Paraview
 Remote visualization concepts
— X forwarding, Viz app, scraping, GLX forking
 Parallel rendering and compositing
— Handled transparently in key tools
 In-situ visualization concepts
— Expose simulation variables to visualization tool, interactive viz
 Future directions
— GPU accelerated filtering and rendering
Thanks to Gilles Fourestey (CSCS), Nina Suvanphim (Cray), Jean Favre (CSCS), Adam DeConinck (NVIDIA),
Robert Crovella (NVIDIA), Dale Southard (NVIDIA), Hank Childs (U Oregon), Jeremy Meredith (ORNL), Ian
Williams (NVIDIA), Steve Parker (NVIDIA), Kitware, Paraview, VisIt and VirtualGL developers for their support!
ENJOY THE CONFERENCE AND SEE YOU AT
GTC 2015!
ABSTRACT (FOR REFERENCE ONLY)
Learn how to take advantage of GPUs to visualize results of your GPU-accelerated
simulation! This session will cover a broad range of visualization and analysis
techniques allowing you to investigate your data on the fly. Starting with some basic
CUDA/OpenGL interoperability, we will introduce more sophisticated data models
allowing you to take advantage of widely used tools like ParaView and VisIt to
visualize your GPU resident data. Questions like parallel compositing, remote
visualization and application steering will be addressed in order to allow you to take
full advantage of the GPUs installed in your supercomputing system.