Computational needs of fusion scientists
David Coster
Max-Planck-Institut für Plasmaphysik
Outline
• Why fusion
• Computational needs
• Laptops to Exaflops
• A few examples
• Some personal observations
Fusion
• Energy source for the sun and other stars
• Provides a potential source of base-load energy production
• The field has been working on this for more than 50 years
• It has turned out to be a very difficult problem
Fusion
• Two main lines of research
  • Inertial confinement
    • Implosion of small pellets
    • NIF at LLNL
  • Magnetic confinement
    • Two main lines of research at the moment
      – Stellarator – W7-X: currently under construction in Greifswald in Germany
      – Tokamak – ITER: to be constructed in Cadarache in France
ITER
• Involves 7 partners representing more than 50% of the world's population
• Costs > 10 G$
• Under construction in Cadarache, France
• Key element on the path to fusion energy production
ITER
Plasma Major Radius      6.2 m
Plasma Minor Radius      2.0 m
Plasma Volume            840 m³
Plasma Current           15.0 MA
Toroidal Field on Axis   5.3 T
Fusion Power             500 MW
Burn Flat Top            >400 s
Power Amplification      >10
[Photos dated 2010-07-15 and 2015-04-16: several views of the construction site]
Computational needs vary tremendously
• At the low end, a laptop with a spreadsheet
• Experimental data acquisition
  • Current experiments produce ~1 GB/s for ~10 s
  • Next-generation experiments will have pulse lengths of ~1000 s
  • Workflows are in place to process the acquired data
• Modelling needs
  • Codes range from 0D to 6D
  • Some can be run on that laptop
  • Others require medium-scale resources
  • Others push the bounds of current technology
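As a back-of-the-envelope illustration of what those acquisition figures imply for the data volume per pulse (a sketch using the ~1 GB/s rate quoted above; the numbers are indicative only):

```python
RATE_GB_PER_S = 1.0                  # ~1 GB/s quoted for current experiments

for pulse_s in (10, 1000):           # current vs. next-generation pulse length
    volume_gb = RATE_GB_PER_S * pulse_s
    print(f"{pulse_s:5.0f} s pulse -> ~{volume_gb:.0f} GB per pulse")
# -> ~10 GB per pulse today, ~1000 GB (1 TB) per pulse for ~1000 s machines
```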
Fusion Experiment Use Case
Experimental data:
• stored in the machine's experimental data system
• "raw" data is not versioned and is immutable
• derived data depends on raw data, other data (e.g. calibration data), and programs
• derived data is versioned
Fusion Modelling Use Case
Simulation data:
• might use experimental data as input
• might use other "standard" data
• might use other simulation data
• might be used for other simulations
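A minimal sketch of how these raw/derived/simulation dependencies might be captured as data structures (illustrative only; the class and field names are hypothetical and not from any existing fusion data system):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(frozen=True)
class RawData:
    """Raw experimental data: immutable and not versioned."""
    shot: int
    signal: str

@dataclass
class DerivedData:
    """Derived or simulation data: versioned, with explicit provenance."""
    name: str
    version: int
    program: str                                   # code (and version) that produced it
    inputs: List[Union["RawData", "DerivedData"]] = field(default_factory=list)

# hypothetical example: a simulation using raw data plus a calibration as inputs
raw = RawData(shot=12345, signal="density_profile")
calibration = DerivedData("calibration", version=2, program="calib_tool")
run = DerivedData("transport_run", version=1, program="ets_workflow",
                  inputs=[raw, calibration])
```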
Need to do a better job of capturing Provenance Data
• H2020 proposal: PROVENCE (PROVenance ENabled Collaborative Environment)
• Involves a number of partners including PSNC
• Waited to hear back from the Commission; the proposal failed
Simulations
[Figures: 1D and 2D simulation examples]
The real problem is 3D in space, 2/3D in velocity
Models describing the plasma vary in complexity
[Figure: physics phenomena arranged by characteristic time scale (10^-12 to 10^+3 seconds) and length scale (10^-9 to 10^+3 meters), with model dimensionalities from 1D to 4-6D: core transport, edge transport, erosion, NTMs, slowing down, AEs, ICRH, ECRH, ion turbulence, electron turbulence, sheath and atomic physics]
Paradigm shift in modelling: monolithic → multiphysics
ETS Workflow
[Workflow diagram: a time loop (dt management, T = T + dt) around an iteration loop coupling EQUIL, ECRH, ICRH, NBI, NEUTRALS, NEO, TURB, IMPURITIES, ELM(t), NTM(t), Sawteeth(t), the free-boundary equilibrium with the shape/position controller, TRANSPORT_COMBINER, SOURCE_COMBINER and CORE2EQ, iterated until the ETS step is converged; Pellets (pr), Sawteeth (pr) and ELM (pr) events can trigger an equilibrium (EQUIL) update]
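Schematically, the coupling has the structure sketched below (a plain-Python illustration of the loop logic, not the actual Kepler workflow; the module interfaces are invented for illustration):

```python
def ets_iteration(state, dt, m):
    """One converged ETS time step: iterate the coupled modules at fixed time."""
    while True:
        eq = m.equilibrium(state)                       # EQUIL / free-boundary equilibrium
        sources = m.source_combiner(m.ecrh(eq), m.icrh(eq), m.nbi(eq), m.neutrals(eq))
        transport = m.transport_combiner(m.neo(eq), m.turb(eq))
        new_state = m.core_transport_solve(state, eq, sources, transport, dt)
        if m.converged(new_state, state):               # "Converged? No -> iterate again"
            return new_state
        state = new_state

def ets_run(state, t_end, dt, m):
    """Outer time loop with dt management; event models (pellets, sawteeth, ELMs)
    would be triggered between steps and may force an equilibrium update."""
    t = 0.0
    while t < t_end:
        state = ets_iteration(state, dt, m)
        t += dt                                         # T = T + dt
    return state
```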
European Transport Simulator
• Implemented in the Kepler Scientific Workflow Engine
• Built on ontologies created by the European Fusion Development Agreement (EFDA) Task Force on Integrated Modelling
  • Now the EUROfusion Work Package on Code Development for Integrated Modelling (WPCD)
• Capable of using:
  • Local (node) resources
  • Local batch resources
  • Connections to remote HPC facilities via UNICORE
  • GRID computing resources
Also exploring other methodologies
• MAPPER project
• MUSCLE framework
Multi-scale necessity
• For example, in the field of fusion, the "holy grail" of understanding the behaviour of current and future tokamaks is to determine the effect of micro-turbulence on the global behaviour of the plasma.
• ASDEX Upgrade (a tokamak with a major radius of 1.65 m), covering the transport time-scale, would require about 1.25×10^8 core-hours.
• ITER (with a major radius of 6.2 m) would require a small multiple of 3×10^10 core-hours.
  • Using 80,000 cores and assuming perfect scaling, this translates to 43 years.
  • On a machine with 1000 times this number of cores it would require 16 days.
• The multiscale approach planned for this proposal [COMPAT] will reduce this considerably.
• These numbers might however be on the optimistic side, since they are based on the assumption that ion-scale dynamics is dominant. If, as some people fear, electron-scale dynamics is also important, then the direct scaling would require something like 3×10^13 core-hours for ASDEX Upgrade and 6×10^15 core-hours for ITER, making a multi-scale approach absolutely crucial!
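The wall-clock figures above follow from simple arithmetic (a sketch taking the 3×10^10 core-hour figure at face value and assuming perfect scaling):

```python
HOURS_PER_DAY, HOURS_PER_YEAR = 24, 24 * 365

def wall_clock_hours(core_hours, cores):
    """Ideal wall-clock time assuming perfect scaling."""
    return core_hours / cores

iter_core_hours = 3e10                                                    # ion-scale estimate for ITER
print(wall_clock_hours(iter_core_hours, 80_000) / HOURS_PER_YEAR)         # ~43 years on 80k cores
print(wall_clock_hours(iter_core_hours, 80_000 * 1000) / HOURS_PER_DAY)   # ~16 days on 80M cores
```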
Rough complexity estimates

                                   Core        Pedestal    Separatrix
n [m^-3]                           1.00E+20    1.00E+20    4.00E+19
T [eV]                             2.00E+04    5.00E+03    2.00E+02
B [T]                              5.00E+00    5.00E+00    5.00E+00
R [m]                              6.20E+00    6.20E+00    6.20E+00
Aspect ratio                       3.00E+00    3.00E+00    3.00E+00
Kappa                              1.50E+01    1.50E+01    1.50E+01
Area                               2.01E+02    2.01E+02    2.01E+02
Volume                             7.84E+03    7.84E+03    7.84E+03
time [s]                           1.00E+03    1.00E+03    1.00E+03
electron plasma frequency [Hz]     8.98E+10    8.98E+10    5.68E+10
debye length [m]                   1.05E-04    5.25E-05    1.66E-05
space units                        6.76E+15    5.41E+16    1.71E+18
time units                         8.98E+13    8.98E+13    5.68E+13
ion gyrofrequency [Hz]             7.60E+07    7.60E+07    7.60E+07
ion gyroradius [m]                 2.88E-03    1.44E-03    2.88E-04
space units                        2.42E+07    9.67E+07    2.42E+09
electron gyrofrequency [Hz]        1.40E+11    1.40E+11    1.40E+11
electron gyroradius [m]            6.73E-05    3.37E-05    6.73E-06
space units                        4.44E+10    1.78E+11    4.44E+12
particles                          7.84E+23    7.84E+23    3.14E+23
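The derived scales in the table follow from textbook formulae; a minimal sketch that reproduces the Core column (assuming hydrogen ions, SI constants and temperatures in eV; an illustration, not the original spreadsheet):

```python
import math

E = 1.602e-19      # elementary charge [C]
ME = 9.109e-31     # electron mass [kg]
MI = 1.673e-27     # proton mass [kg] (hydrogen assumed)
EPS0 = 8.854e-12   # vacuum permittivity [F/m]

def estimates(n, T_eV, B):
    """Rough plasma scales from density n [m^-3], temperature T [eV], field B [T]."""
    f_pe = math.sqrt(n * E**2 / (EPS0 * ME)) / (2 * math.pi)   # electron plasma frequency [Hz]
    l_debye = math.sqrt(EPS0 * T_eV / (n * E))                 # Debye length [m]
    f_ci = E * B / (2 * math.pi * MI)                          # ion gyrofrequency [Hz]
    f_ce = E * B / (2 * math.pi * ME)                          # electron gyrofrequency [Hz]
    rho_i = math.sqrt(E * T_eV * MI) / (E * B)                 # ion gyroradius [m]
    rho_e = math.sqrt(E * T_eV * ME) / (E * B)                 # electron gyroradius [m]
    return f_pe, l_debye, f_ci, rho_i, f_ce, rho_e

# ITER-like core values from the table: n = 1e20 m^-3, T = 2e4 eV, B = 5 T
print(["%.2e" % x for x in estimates(1.0e20, 2.0e4, 5.0)])
# -> ['8.98e+10', '1.05e-04', '7.62e+07', '2.89e-03', '1.40e+11', '6.74e-05'],
#    matching the Core column to within rounding
```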
What resources are EU fusion scientists using?
• Local resources (these vary depending on where you are; here IPP is taken as an example)
  • IPP
    – TOK-S cluster: 84 nodes with 20 (real) cores each, GbE
    – TOK-P cluster: 42 nodes with 16 (real) cores each, IB
    – MPG Hydra HPC: IPP has 15-20% of ~83,000 cores with 280 TB of main memory and a peak performance of about 1.7 PetaFlop/s; the accelerator part of the HPC cluster has a peak performance of about 1 PetaFlop/s
• JET
  • 125 nodes with a total of 605 processor cores (738 Gigaflop/s)
• ITM/WPCD Gateway
  • 20 nodes with 16 (real) cores each, IB
• HELIOS HPC in Japan; the EU has ~50% of
  • 1.555 PFlop/s [4,500 nodes with 16 (real) cores each]
  • 0.412 PFlop/s [180 MIC nodes]
At the high end …
• “In November, the US government announced it will build Summit, a
$325m supercomputer capable of performing 300 quadrillion
calculations per second if you redline it.”
[http://www.theregister.co.uk/2015/04/15/summit_projects/]
• “When installed at the Oak Ridge National Laboratory in 2017 and
powered up by 2018, it will be the fastest computer in the world
compared to its publicly known rivals as they stand today.”
[http://www.theregister.co.uk/2015/04/15/summit_projects/]
• In preparation for the next-generation supercomputer Summit, the Oak Ridge Leadership Computing Facility (OLCF) selected 13 partnership projects for its Center for Accelerated Application Readiness (CAAR) program. [https://www.olcf.ornl.gov/caar/]
• Code: GTC
Science Domain: Plasma Physics
Title: Particle Turbulence Simulations for Sustainable Fusion Reactions in
ITER
PI: Zhihong Lin, University of California–Irvine
• Code: XGC
Science Domain: Plasma Physics
Title: Multiphysics Magnetic Fusion Reactor Simulator, from Hot Core to
Cold Wall
PI: C.S. Chang, Princeton Plasma Physics Laboratory, Princeton University
Other needs
• Help with optimizing codes
  • In one, admittedly extreme, example: a factor 60 speed-up in a scientist's code (going from 1 core to 20 cores)
• EUROfusion-funded High Level Support Team (HLST)
  • Annual call for proposals
• One issue is that some of the big codes have already been looked at by DEISA, EUFORIA, PRACE and the HLST, so significant further improvements in these codes are hard to find
Some examples … SOLPS
• SOLPS
  • Code in wide use to simulate the plasma in the edge of a tokamak
  • Combination of B2 (fluid plasma) + EIRENE (Monte-Carlo neutrals)
  • Simulations for ITER take about 3 months each
  • Would like to speed up the code by a factor of ~100
• Parallelization
  • EIRENE: 50-95% of the time, MPI, "nearly perfect"
  • B2: 5-100% of the time, OpenMP, factor 6 with 20 cores
• Also looking at other approaches, including
  – Time parallelization (parareal)
  – Reduced physics
In one slide … SOLPS Speed-Up

                     Speed-Up                          Gain                           Eirene
                     N=1     N=16    N=32    N=64      N=1    N=16   N=32   N=64      fraction
1e-7 -> 1e-5         100.00
1/4 grid cells       4.00
Bundling             2.14
Eirene MPI (95%)     1.00    9.14    12.55   15.42     1.00   0.57   0.39   0.24      0.95
Eirene MPI (80%)     1.00    4.00    4.44    4.71      1.00   0.25   0.14   0.07      0.80
Fluid neutrals       2.00    20.00                     0.00   0.00   0.00
B2 OpenMP            1.00    6.00    6.00    6.00      1.00   0.38   0.19   0.09
B2-Eirene (95%)      1.00    14.77   26.30   43.15     1.00   0.92   0.82   0.67      0.95
B2-Eirene (80%)      1.00    12.00   17.14   21.82     1.00   0.75   0.54   0.34      0.80
B2-Eirene (50%)      1.00    8.73    10.11   10.97     1.00   0.55   0.32   0.17      0.50
Parareal / Better feedback: 4.00, 8.43, 8.43, 20.00, 10.00, 3.00, 0.07, 3.00
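The combined B2-Eirene rows above are consistent with a simple Amdahl-type estimate. A minimal sketch (not the SOLPS source code) that reproduces them from the EIRENE time fraction, the MPI task count and the B2 OpenMP factor of ~6:

```python
def b2_eirene_speedup(n_mpi, f_eirene, b2_openmp=6.0):
    """Amdahl-type estimate: the EIRENE share scales with the MPI task count,
    the remaining B2 share gets a fixed OpenMP factor (~6 on 20 cores)."""
    return 1.0 / ((1.0 - f_eirene) / b2_openmp + f_eirene / n_mpi)

for f in (0.95, 0.80, 0.50):
    print(f, [round(b2_eirene_speedup(n, f), 2) for n in (16, 32, 64)])
# 0.95 [14.77, 26.3, 43.15]   0.8 [12.0, 17.14, 21.82]   0.5 [8.73, 10.11, 10.97]
```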
Part of a parameter scan (species, power, DT-puff, impurity puff)
• Full model: each point would take approximately 1 year
• Reduced model: each point takes less than a week
Some examples … JOREK
• With the current numerics, we get roughly the following estimate for a large simulation:
  • 400 compute nodes on Helios for 150 hours
    => ~20 TB RAM
    => ~6000 cores
    => ~1M core-hours (~60k node-hours)
• If I assume that we would need to increase the resolution in each direction by a factor of two to three to get to the necessary resolution for ITER at realistic parameters, I get the following rough estimate (making rather optimistic assumptions on our scaling):
  • 20 TB * 100 = 1 PB RAM, and the number of nodes/cores needed to provide this amount of memory
  • 1M * 1000 = 1G core-hours
• With better preconditioning, the memory consumption should drop a lot and the scalability should increase, but this still has to be tested and then implemented in the production code (of order 3 years, I fear)
HELIOS Successor: Expert Group Recommendations (subset)
• The EG recommends that the purchase decision for an HPC platform be taken before the end of June 2015, for a start of operation in the production phase by January 1st, 2017.
• The EG recommends a computing capacity with a peak performance of at least 8 PetaFlop/s dedicated to fusion research in Europe.
• The EG recommends the acquisition of the HPC system in two steps:
  • the first step is the purchase of a 4 PFlop/s system to be installed by the end of 2016,
  • to be followed by an extension up to 8 PFlop/s in 2018.
• The computing capacity is to be provided either
  • by an HPC system to be hosted in an existing Computer Centre (CC) in Europe,
  • or, in the case where the Broader Approach (BA) agreement is extended beyond 2016, in the existing CC in Rokkasho with the investment and operation costs shared with Japan.
• The EG recommends initially considering the viability of the option of an HPC system hosted in a CC in Europe by issuing a Call for Expression of Interest in January 2015, with a deadline of the end of March 2015. This would allow 3 months (i.e. until the end of June 2015) to examine other options.
• The EG recommends that the system be predominantly equipped with conventional processors, but include some processing elements with new technology related to NVIDIA GPUs and Intel Xeon Phi systems.
Some observations from the HPC Questionnaire
David Coster, 2014-11-04
• 48 responses (some still coming in!)
• Estimated MCPU-hours
  • Current (all HPC): 1210
  • Current (HELIOS): 320
  • Current (Needs): 1505 (biased by 1 point; if dropped then 758.5)
  • Predicted (Needs): 5155
  • Ratio (to HELIOS): 16
  • Ratio (to current): 4.3
• Number of cores used
  • Production (current, typical): 47% < 1024; 36% 1024-4095 (average: 2375)
  • Production (current, maximum): 28% < 1024; 30% 1024-4095; 23% 4096-16383; 17% 16384-65535 (average: 11035) [factor 2.5 – 4.65 above current typical]
  • Anticipated: 15% no improvement; 11% > 1048576 (average: 254627) [factor 4.65 – 23.1 above current max.]
  • 33% will need support for more cores
• MIC/NVIDIA
  • 9% / 5% currently ready
  • 23% / 26% have plans before 2017
  • 5% / 9% have plans after 2017
  • 65% will need support for MIC; 63% will need support for NVIDIA
• HLST
• HELIOS accounts for more than 60% of cycles for 64% of users
• 72% of users estimate their needs going up by a factor of 2 – 10
• More than half of the codes can already do OpenMP + MPI
• 35% of codes need significantly more memory than currently available
The good, the bad, the ugly …
• The good
• One account, not 1 account per project
• Support for distributed computing, co-allocation, experimentation
• Support for data handling
• Shipping back results
• Long term storage (10 years)
• Open access ???
• Fast responses to user queries
• Transparent allocation of resources to projects
• The bad
• “export control”
• Inflexible operations
• The ugly
• 1 day outages every week
• Multiple week long outages each year
• “Unexpected behaviour”
• Running the same job twice produces substantial differences in run-time (or worse, results)
• Extrapolated MPI start up takes longer than the time allocation
• Extrapolated MPI memory usage larger than the available memory
• Appearance of conflicts of interest in resource allocation
End …
Thank you for your attention!
Are there questions?
Current US Allocations
• INCITE allocations
• Fusion (2014)
• 129 M processor hours XK7
• 150 M processor hours BG/Q
• CRESTA (EU Project)
• 42 M processor hours XK7