Present and Future Leadership Computers at OLCF

Jack Wells, Director of Science
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
#OpenPOWERSummit
Oak Ridge Leadership Computing Facility (OLCF)
Mission: Deploy and operate the computational resources required to tackle global challenges.
• Providing world-leading computational and data resources and specialized services for the most computationally intensive problems
• Providing a stable hardware/software path of increasing scale to maximize productive application development
• Providing the resources to investigate otherwise inaccessible systems at every scale: from galaxy formation to supernovae to earth systems to automobiles to nanomaterials
• With our partners, delivering transformative discoveries in materials, biology, climate, energy technologies, and basic science
Our Science requires that we continue to advance OLCF’s computational
capability over the next decade on the roadmap to Exascale.
Since clock-rate scaling ended in 2003, HPC performance has been achieved through increased parallelism; Jaguar scaled to 300,000 cores. Titan and beyond deliver hierarchical parallelism with very powerful nodes: MPI across nodes, plus thread-level parallelism through OpenACC or OpenMP, plus vectors (a minimal sketch of this model follows below).

OLCF roadmap:
• Jaguar (2010): 2.3 PF, multi-core CPU, 7 MW
• Titan (2012): 27 PF, hybrid GPU/CPU, 9 MW
• Summit (2017, CORAL system): 5-10x Titan, hybrid GPU/CPU, 10 MW
• OLCF5 (2022): 5-10x Summit, ~20 MW
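The hierarchical model above (MPI across nodes, threads within a node, vectors within a thread) can be made concrete with a minimal sketch. The array, loop, and compile line are illustrative only and are not taken from any OLCF application.

```c
/* Minimal sketch of hierarchical parallelism: MPI across nodes,
 * OpenMP threads within a node, and SIMD vectorization within a thread.
 * Illustrative only. Compile, e.g.: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    static double x[N];

    /* Thread-level parallelism plus vectorization inside each MPI rank. */
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++)
        x[i] = 2.0 * i + rank;

    /* Threaded local reduction, then a distributed reduction across ranks. */
    double local_sum = 0.0, global_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N; i++)
        local_sum += x[i];

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%e\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```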
Science Accomplishment Highlights from Titan
Cosmology
Salman Habib, Argonne National Laboratory
Habib and collaborators used the HACC code on Titan's CPU–GPU system to conduct today's largest cosmological structure simulation at resolutions needed for modern-day galactic surveys.
K. Heitmann et al., 2014. arXiv:1411.3396

Combustion
Jacqueline Chen, Sandia National Laboratories
Chen and collaborators performed, for the first time, direct numerical simulation of a jet flame burning dimethyl ether (DME) at new turbulence scales over space and time.
A. Bhagatwala et al., 2014. Proc. Combust. Inst. 35

Superconducting Materials
Paul Kent, ORNL
Kent and collaborators performed the first ab initio simulation of a cuprate. They were also the first team to validate quantum Monte Carlo simulations for high-temperature superconductors.
K. Foyevtsova et al., 2014. Phys. Rev. X 4

Molecular Science
Michael Klein, Temple University
Researchers at Procter & Gamble (P&G) and Temple University delivered a comprehensive picture, in full atomistic detail, of the molecular properties that drive skin barrier disruption.
C. M. MacDermaid et al., 2014. J. Chem. Phys. 141; M. Paloncyova et al., 2014. Langmuir 30

Fusion
C. S. Chang, PPPL
Chang and collaborators used the XGC1 code on Titan to obtain a fundamental understanding of divertor heat-load width physics and its dependence on the plasma current in present-day tokamak devices.
C. S. Chang et al., 2014. Proceedings of the 25th Fusion Energy Conference, IAEA, October 13–18, 2014
Industrial Partnerships:
Accelerating Competitiveness through Computational Sciences
[Bar chart: Number of OLCF Industry Projects Operational per Calendar Year, 2005–2014; y-axis: Number of Projects.]
Accelerating Future DOE Leadership Systems
(“CORAL”)
“Summit” System
“Sierra” System
5X – 10X Higher Application Performance
IBM POWER CPUs, NVIDIA Tesla GPUs, Mellanox EDR 100 Gb/s InfiniBand
Paving The Road to Exascale Performance
2017 OLCF Leadership System
Hybrid CPU/GPU architecture
Vendor: IBM (Prime) / NVIDIA™ / Mellanox Technologies®
At least 5X Titan’s Application Performance
Approximately 3,400 nodes, each with:
• Multiple IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta architecture
• CPUs and GPUs completely connected with high-speed NVLink
• Large coherent memory: over 512 GB (HBM + DDR4), all directly addressable from the CPUs and GPUs
• An additional 800 GB of NVRAM, which can be configured as either a burst buffer or as extended memory
• Over 40 TF peak performance
Dual-rail Mellanox® EDR InfiniBand full, non-blocking fat-tree interconnect
IBM Elastic Storage (GPFS™): 1 TB/s I/O and 120 PB disk capacity
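As a rough sanity check on these figures, the sketch below multiplies the per-node numbers out to system totals. It assumes exactly 3,400 nodes and treats the "over" figures as lower bounds, so the results are approximate and the final machine configuration may differ.

```c
/* Back-of-the-envelope system totals from the per-node figures above.
 * Node count, per-node peak, memory, and NVRAM are taken from this slide;
 * "over" figures are treated as lower bounds. */
#include <stdio.h>

int main(void)
{
    const double nodes         = 3400.0;  /* "approximately 3,400 nodes"  */
    const double tflops_node   = 40.0;    /* "over 40 TF peak performance" */
    const double mem_gb_node   = 512.0;   /* "over 512 GB (HBM + DDR4)"   */
    const double nvram_gb_node = 800.0;   /* "additional 800 GB of NVRAM" */

    printf("Aggregate peak:   ~%.0f PF\n", nodes * tflops_node / 1000.0);
    printf("Aggregate memory: ~%.1f PB\n", nodes * mem_gb_node / 1.0e6);
    printf("Aggregate NVRAM:  ~%.1f PB\n", nodes * nvram_gb_node / 1.0e6);
    return 0;
}
```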
How does Summit compare to Titan?

Feature | Summit | Titan
Application Performance | 5-10x Titan | Baseline
Number of Nodes | ~3,400 | 18,688
Node performance | >40 TF | 1.4 TF
Memory per Node | >512 GB (HBM + DDR4) | 38 GB (GDDR5 + DDR3)
NVRAM per Node | 800 GB | 0
Node Interconnect | NVLink (5-12x PCIe 3) | PCIe 2
System Interconnect (node injection bandwidth) | Dual-rail EDR-IB (23 GB/s) | Gemini (6.4 GB/s)
Interconnect Topology | Non-blocking fat tree | 3D torus
Processors | IBM POWER9 + NVIDIA Volta™ | AMD Opteron™ + NVIDIA Kepler™
File System | 120 PB, 1 TB/s, GPFS™ | 32 PB, 1 TB/s, Lustre®
Peak power consumption | 10 MW | 9 MW
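Because Summit concentrates performance in far fewer, much more powerful nodes, applications typically place several MPI ranks on each node and bind each rank to one of the node's GPUs. The sketch below is one hedged way to do that, using MPI-3's shared-memory communicator split and the OpenACC runtime; it queries the GPU count at run time rather than assuming a particular number per node, since this slide does not state one.

```c
/* Sketch: assign each on-node MPI rank to one of the node's GPUs.
 * Uses MPI_Comm_split_type to discover the local (per-node) rank and the
 * OpenACC runtime to select a device. Illustrative only. */
#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Communicator containing only the ranks on this node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);

    /* Round-robin the node's ranks over its GPUs. */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus > 0)
        acc_set_device_num(local_rank % ngpus, acc_device_nvidia);

    printf("world rank %d -> local rank %d -> GPU %d of %d\n",
           world_rank, local_rank, ngpus ? local_rank % ngpus : -1, ngpus);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```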
CAAR projects on Titan
WL-LSMS: Illuminating the role of material disorder, statistics, and fluctuations in nanoscale materials and systems. Performance ratio XK7/XE6: 3.8

LAMMPS: A molecular description of membrane fusion, one of the most common ways for molecules to enter or exit living cells. Performance ratio XK7/XE6: 7.4

S3D: Understanding turbulent combustion through direct numerical simulation with complex chemistry. Performance ratio XK7/XE6: 2.2

Denovo: Discrete ordinates radiation transport calculations that can be used in a variety of nuclear energy and technology applications. Performance ratio XK7/XE6: 3.8

Titan: Cray XK7 (NVIDIA Kepler GPU plus 16-core AMD Opteron CPU)
Cray XE6: 2x 16-core AMD Opteron CPUs
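For reference, the quoted "performance ratio XK7/XE6" figures compare time-to-solution on Titan's GPU-accelerated XK7 nodes against CPU-only XE6 nodes of the same generation. The sketch below shows how such a ratio is formed from wall-clock times; the timings are placeholders, not measured values.

```c
/* How a "performance ratio XK7/XE6" is formed: the ratio of wall time on a
 * CPU-only Cray XE6 node to wall time on a GPU-accelerated Cray XK7 node
 * for the same problem. The times below are placeholders. */
#include <stdio.h>

int main(void)
{
    const double t_xe6 = 380.0;  /* hypothetical seconds on Cray XE6 */
    const double t_xk7 = 100.0;  /* hypothetical seconds on Cray XK7 */

    printf("performance ratio XK7/XE6 = %.1f\n", t_xe6 / t_xk7);
    return 0;
}
```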
The Complexities of Combustion
Jackie Chen, Sandia National Laboratories
• Use Titan's computing power to develop predictive models of turbulence in combustion.
• Employ the direct numerical simulation code known as S3D to simulate a jet flame burning dimethyl ether (DME) and compare with experiment.
• Impact: Americans use two-thirds of their petroleum for transportation and one-third for heating buildings and generating electricity. If low-temperature compression-ignition concepts are widely adopted in next-generation automobiles, fuel efficiency could increase by as much as 25 to 50 percent.
Figure: The logarithm of the scalar dissipation rate (that is, the local mixing rate), where white denotes high mixing rates and red lower mixing rates; blue indicates regions in which the flame is extinguished. Visualization credit: Hongfeng Yu.
CAAR in preparation for Summit
• Application Developer Team involvement
– Knowledge of the application
– Work on an application in active development (a "moving target")
– Optimizations included in the application release
• Demonstration of science capabilities at scale
– Shake-down of the new system hardware and software
– A large-scale science project is a strong incentive to participate
• Vendor technical support is crucial
– The programming environment is often not mature
– Vendors are the best source of information on new hardware features
• Access to multiple resources, including early hardware
• Portability is a critical concern (see the single-source sketch after this list)
– Application developers target a wide range of architectures
– Maintaining multiple code versions is difficult
– Porting to different architectures is time-consuming
– Many Principal Investigators have allocations on multiple resources
– Applications far outlive any computer system
• Experience benefits other developers and users
– Coverage of scientific domains
– Coverage of algorithmic methods and programming models
• Persistent culture of application readiness
– Experience of science liaisons and catalysts for user programs
– Synergy with libraries and tools projects
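One common way teams limit the burden of maintaining multiple code versions is to keep a single loop body and select the offload directive at compile time. The sketch below illustrates the idea; the USE_OPENACC macro is an illustrative convention, not a CAAR requirement.

```c
/* Sketch of keeping one source for multiple architectures: the same loop
 * carries either an OpenACC or an OpenMP directive, chosen at compile time.
 * USE_OPENACC is an illustrative macro name, not a CAAR convention. */
#include <stdio.h>

#define N 1000000

static double a[N], b[N];

void scale_and_add(double alpha)
{
#if defined(USE_OPENACC)
    #pragma acc parallel loop copyin(a) copy(b)
#else
    #pragma omp parallel for
#endif
    for (int i = 0; i < N; i++)
        b[i] += alpha * a[i];
}

int main(void)
{
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 0.0; }
    scale_and_add(2.0);
    printf("b[42] = %f\n", b[42]);
    return 0;
}
```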
CAAR Timeline
1. November 2014: Call for CAAR applications
2. February 20, 2015: CAAR proposal deadline
3. March 2015: Selection of CAAR application teams
4. April 2015: CAAR application training workshop
5. April 2015: CAAR application teams start
6. June 2016: CAAR project review
7. September 2017: Call for Early Science projects
8. November 2017: Selection of Early Science projects
9. December 2017: Early Science projects start
10. June 2019: Early Science projects end
Want to join our team?
ORNL is hiring. Contact us at
https://jobs.ornl.gov
[email protected]
This research used resources of the Oak Ridge Leadership
Computing Facility, which is a DOE Office of Science User Facility
supported under Contract DE-AC05-00OR22725.