Computational needs of fusion scientists
David Coster, Max-Planck-Institut für Plasmaphysik

Outline
• Why fusion
• Computational needs
• Laptops to Exaflops
• A few examples
• Some personal observations

Fusion
• Energy source for the sun and other stars
• Provides a potential source of base-load energy production
• Research has been under way for more than 50 years
• Has turned out to be a very difficult problem

Fusion
• Two main lines of research
  – Inertial confinement
    » Implosion of small pellets
    » NIF at LLNL
  – Magnetic confinement
    » Two main lines of research at the moment:
      Stellarator – W7-X, currently under construction in Greifswald, Germany
      Tokamak – ITER, to be constructed in Cadarache, France

ITER
• Involves 7 partners representing more than 50% of the world's population
• Costs > 10 G$
• Under construction in Cadarache, France
• Key element on the path to fusion energy production

ITER
  Parameter                 Value   Units
  Plasma major radius       6.2     m
  Plasma minor radius       2.0     m
  Plasma volume             840     m³
  Plasma current            15.0    MA
  Toroidal field on axis    5.3     T
  Fusion power              500     MW
  Burn flat top             >400    s
  Power amplification       >10

[Image-only slides dated 2010-07-15 and 2015-04-16 (×3)]

Computational needs vary tremendously
• At the low end, a laptop with a spreadsheet
• Experimental data acquisition
  – Current experiments produce ~ 1 GB/s for ~ 10 s
  – Next generation experiments will have pulse lengths of ~ 1000 s (the implied data volumes are sketched after this list)
  – Workflows are in place to process the acquired data
• Modelling needs
  – Codes range from 0D to 6D
  – Some can be run on that laptop
  – Others require medium-scale resources
  – Others push the bounds of current technology
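To put those acquisition rates in perspective, the back-of-the-envelope sketch below turns the ~1 GB/s rate and the two pulse lengths quoted above into per-pulse and per-campaign data volumes. The number of pulses per year is an assumption added purely for illustration and does not come from the slides.

```python
# Back-of-the-envelope data volumes implied by the acquisition rates above.
# The 1 GB/s rate and the 10 s / 1000 s pulse lengths come from the slide;
# the assumed number of pulses per operating day/year is illustrative only.

RATE_GB_PER_S = 1.0          # ~1 GB/s during a pulse (from the slide)

def pulse_volume_gb(pulse_length_s, rate_gb_per_s=RATE_GB_PER_S):
    """Raw data volume acquired during a single pulse, in GB."""
    return rate_gb_per_s * pulse_length_s

current = pulse_volume_gb(10.0)      # current experiments: ~10 s pulses
next_gen = pulse_volume_gb(1000.0)   # next generation: ~1000 s pulses

print(f"Current experiment:  ~{current:.0f} GB per pulse")
print(f"Next generation:     ~{next_gen / 1000:.0f} TB per pulse")

# Hypothetical campaign: e.g. 20 pulses/day, 100 operating days/year (assumed).
pulses_per_year = 20 * 100
print(f"Assumed campaign:    ~{next_gen * pulses_per_year / 1e6:.0f} PB per year")
```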
Fusion Experiment Use Case
Experimental data
• stored in the machine's experimental data system
• "raw" data is not versioned and is immutable
• derived data depends on raw data, other data (e.g. calibration data), and programs
• derived data is versioned

Fusion Modelling Use Case
Simulation data
• might use experimental data as input
• might use other "standard" data
• might use other simulation data
• might be used for other simulations

Need to do a better job of capturing Provenance Data
• H2020 proposal:
  – PROVENCE: PROVenance ENabled Collaborative Environment
  – Involves a number of partners, including PSNC
  – Waited to hear back from the Commission; the proposal failed

Simulations
[Diagrams of 1D and 2D simulation geometries]
• The real problem is 3D in space and 2D/3D in velocity

Models describing the plasma vary in complexity
[Chart: physics models placed by time scale (10⁻¹² to 10³ s) and length scale (10⁻⁹ to 10³ m) and grouped by dimensionality (1D, 2–3D, 3D, 5D, 4–6D): core transport, edge transport, erosion, NTMs, slowing down, AEs, ICRH, ECRH, ion turbulence, electron turbulence, sheath, atomic physics]

Paradigm shift in modelling: monolithic → multiphysics
ETS Workflow
[Workflow diagram: a time loop and an iteration loop coupling EQUIL, ECRH, ICRH, NBI, NEUTRALS, NEO, TURB, ELM(t), NTM(t), Sawteeth(t), IMPURITIES and the free-boundary equilibrium (shape, position, controller) through TRANSPORT_COMBINER, SOURCE_COMBINER and CORE2EQ, with a convergence check, dt management (T = T + dt), and Pellets(pr), Sawteeth(pr), ELM(pr) events feeding back to EQUIL]

European Transport Simulator
• Implemented in the Kepler scientific workflow engine
• Built on ontologies created by the European Fusion Development Agreement (EFDA) Task Force on Integrated Modelling
  – Now the EUROfusion Work Package on Code Development for Integrated Modelling (WPCD)
• Capable of using:
  – Local (node) resources
  – Local batch resources
  – Connections to remote HPC facilities via UNICORE
  – GRID computing resources

Also exploring other methodologies
• MAPPER project
• MUSCLE framework

Multi-scale necessity
• For example, in the field of fusion, the "holy grail" of understanding the behaviour of current and future tokamaks is to determine the effect of micro-turbulence on the global behaviour of the plasma.
• ASDEX Upgrade (a tokamak with a major radius of 1.65 m), covering the transport time-scale, would require about 1.25×10⁸ core-hours.
• ITER (with a major radius of 6.2 m) would require a small multiple of 3×10¹⁰ core-hours.
• Using 80,000 cores and assuming perfect scaling, this translates to 43 years (see the sketch after this list).
• On a machine with 1000 times this number of cores it would require 16 days.
• The multiscale approach planned for this proposal [COMPAT] will reduce this considerably.
• These numbers might, however, be on the optimistic side, since they are based on the assumption that ion-scale dynamics is dominant. If, as some fear, electron-scale dynamics is also important, then the direct scaling would require something like 3×10¹³ core-hours for ASDEX Upgrade and 6×10¹⁵ core-hours for ITER, making a multi-scale approach absolutely crucial!
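The wall-clock figures quoted above follow from dividing core-hours by core count; a minimal sketch of that arithmetic, assuming perfect scaling as the slide does. The core-hour figures are the ones quoted above; the core counts are the 80,000 and 80 million used there.

```python
# Wall-clock time implied by the core-hour estimates above, assuming perfect scaling.
# Core-hour figures are the ones quoted on the slide.

HOURS_PER_YEAR = 24 * 365

def wall_clock_years(core_hours, cores):
    """Elapsed time in years if `core_hours` are spread over `cores` cores."""
    return core_hours / cores / HOURS_PER_YEAR

iter_ion_scale = 3e10          # core-hours, ITER, ion-scale turbulence
print(f"ITER on 80,000 cores:      {wall_clock_years(iter_ion_scale, 8e4):6.1f} years")
print(f"ITER on 80,000,000 cores:  {wall_clock_years(iter_ion_scale, 8e7) * 365:6.1f} days")

# If electron-scale dynamics matters too (the pessimistic case from the slide):
iter_electron_scale = 6e15     # core-hours
print(f"Electron-scale ITER on 80,000,000 cores: "
      f"{wall_clock_years(iter_electron_scale, 8e7):.0f} years")
```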
Rough complexity estimates

                                   Core        Pedestal    Separatrix
  n [m⁻³]                          1.00E+20    1.00E+20    4.00E+19
  T [eV]                           2.00E+04    5.00E+03    2.00E+02
  B [T]                            5.00E+00    5.00E+00    5.00E+00
  R [m]                            6.20E+00    6.20E+00    6.20E+00
  Aspect ratio                     3.00E+00    3.00E+00    3.00E+00
  Kappa                            1.50E+01    1.50E+01    1.50E+01
  Area [m²]                        2.01E+02    2.01E+02    2.01E+02
  Volume [m³]                      7.84E+03    7.84E+03    7.84E+03
  Time [s]                         1.00E+03    1.00E+03    1.00E+03

  Electron plasma frequency [Hz]   8.98E+10    8.98E+10    5.68E+10
  Debye length [m]                 1.05E-04    5.25E-05    1.66E-05
  Space units                      6.76E+15    5.41E+16    1.71E+18
  Time units                       8.98E+13    8.98E+13    5.68E+13

  Ion gyrofrequency [Hz]           7.60E+07    7.60E+07    7.60E+07
  Ion gyroradius [m]               2.88E-03    1.44E-03    2.88E-04
  Space units                      2.42E+07    9.67E+07    2.42E+09

  Electron gyrofrequency [Hz]      1.40E+11    1.40E+11    1.40E+11
  Electron gyroradius [m]          6.73E-05    3.37E-05    6.73E-06
  Space units                      4.44E+10    1.78E+11    4.44E+12

  Particles                        7.84E+23    7.84E+23    3.14E+23
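The derived rows in this table follow from standard plasma formulas. The sketch below reproduces the Core column under a few assumptions of mine that are not stated on the slide: hydrogen ion mass, "space units" meaning volume/λ_D³ for the Debye scale and area/ρ² for the gyro scales, and "time units" meaning time × frequency.

```python
# Sketch of the arithmetic that appears to underlie the "rough complexity
# estimates" table above (Core column).  Assumptions: hydrogen ion mass,
# temperatures in eV, "space units" = volume/lambda_D^3 for the Debye scale
# and area/rho^2 for the gyro scales, "time units" = time * frequency.
import math

e, eps0 = 1.602e-19, 8.854e-12           # elementary charge, vacuum permittivity
m_e, m_i = 9.109e-31, 1.673e-27          # electron and (hydrogen) ion mass [kg]

# Core parameters from the table
n, T_eV, B = 1e20, 2e4, 5.0              # density [m^-3], temperature [eV], field [T]
area, volume, time = 201.0, 7840.0, 1e3  # cross-section [m^2], volume [m^3], time [s]

T_J = T_eV * e                           # temperature in Joules
f_pe = math.sqrt(n * e**2 / (eps0 * m_e)) / (2 * math.pi)  # electron plasma frequency [Hz]
lambda_D = math.sqrt(eps0 * T_J / (n * e**2))              # Debye length [m]

f_ci = e * B / (2 * math.pi * m_i)                         # ion gyrofrequency [Hz]
f_ce = e * B / (2 * math.pi * m_e)                         # electron gyrofrequency [Hz]
rho_i = math.sqrt(T_J / m_i) / (2 * math.pi * f_ci)        # ion gyroradius [m]
rho_e = math.sqrt(T_J / m_e) / (2 * math.pi * f_ce)        # electron gyroradius [m]

print(f"f_pe = {f_pe:.2e} Hz, lambda_D = {lambda_D:.2e} m")
print(f"Debye 'space units' = {volume / lambda_D**3:.2e}, 'time units' = {time * f_pe:.2e}")
print(f"ion:      f_ci = {f_ci:.2e} Hz, rho_i = {rho_i:.2e} m, space units = {area / rho_i**2:.2e}")
print(f"electron: f_ce = {f_ce:.2e} Hz, rho_e = {rho_e:.2e} m, space units = {area / rho_e**2:.2e}")
print(f"particles = {n * volume:.2e}")
```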
What resources are EU fusion scientists using?
• Local resources
  – IPP (these vary depending on where you are; IPP serves as an example here)
    » TOK-S cluster: 84 nodes with 20 (real) cores each, GbE
    » TOK-P cluster: 42 nodes with 16 (real) cores each, IB
    » MPG Hydra HPC: IPP has 15–20% of ~83,000 cores with 280 TB of main memory and a peak performance of about 1.7 PetaFlop/s; the accelerator part of the cluster has a peak performance of about 1 PetaFlop/s
• JET
  – 125 nodes with a total of 605 processor cores (738 GigaFlop/s)
• ITM/WPCD Gateway
  – 20 nodes with 16 (real) cores each, IB
• HELIOS HPC in Japan
  – EU has ~50% of:
    » 1.555 PFlop/s [4500 nodes with 16 (real) cores each]
    » 0.412 PFlop/s [180 MIC nodes]

At the high end …
• "In November, the US government announced it will build Summit, a $325m supercomputer capable of performing 300 quadrillion calculations per second if you redline it." [http://www.theregister.co.uk/2015/04/15/summit_projects/]
• "When installed at the Oak Ridge National Laboratory in 2017 and powered up by 2018, it will be the fastest computer in the world compared to its publicly known rivals as they stand today." [http://www.theregister.co.uk/2015/04/15/summit_projects/]
• In preparation for the next-generation supercomputer Summit, the Oak Ridge Leadership Computing Facility (OLCF) selected 13 partnership projects into its Center for Accelerated Application Readiness (CAAR) program. [https://www.olcf.ornl.gov/caar/]
  – Code: GTC. Science Domain: Plasma Physics. Title: Particle Turbulence Simulations for Sustainable Fusion Reactions in ITER. PI: Zhihong Lin, University of California–Irvine
  – Code: XGC. Science Domain: Plasma Physics. Title: Multiphysics Magnetic Fusion Reactor Simulator, from Hot Core to Cold Wall. PI: C.S. Chang, Princeton Plasma Physics Laboratory, Princeton University

Other needs
• Help with optimizing codes
  – In one, admittedly extreme, example: a factor 60 speed-up in a scientist's code (going from 1 core to 20 cores)
• EUROfusion-funded High Level Support Team (HLST)
  – Annual call for proposals
• One issue is that some of the big codes have already been looked at by
  – DEISA
  – EUFORIA
  – PRACE
  – HLST
  Significant further improvements in these codes are hard to find.

Some examples … SOLPS
• SOLPS
  – Code in wide use to simulate the plasma at the edge of a tokamak
  – Combination of B2 (fluid plasma) + EIRENE (Monte Carlo neutrals)
  – Simulations for ITER take about 3 months each
  – Would like to speed up the code by a factor of ~100
• Parallelization
  – EIRENE: 50–95% of the time; MPI; "nearly perfect" scaling
  – B2: 5–100% of the time; OpenMP; factor 6 with 20 cores
• Also looking at other approaches, including
  – Time parallelization (parareal)
  – Reduced physics

In one slide … SOLPS Speed-Up

                       Speed-up (N = 1 / 16 / 32 / 64)    Gain = speed-up/N (N = 1 / 16 / 32 / 64)   EIRENE fraction
  EIRENE MPI (95%)     1.00 /  9.14 / 12.55 / 15.42       1.00 / 0.57 / 0.39 / 0.24                  0.95
  EIRENE MPI (80%)     1.00 /  4.00 /  4.44 /  4.71       1.00 / 0.25 / 0.14 / 0.07                  0.80
  B2 OpenMP            1.00 /  6.00 /  6.00 /  6.00       1.00 / 0.38 / 0.19 / 0.09
  B2-EIRENE (95%)      1.00 / 14.77 / 26.30 / 43.15       1.00 / 0.92 / 0.82 / 0.67                  0.95
  B2-EIRENE (80%)      1.00 / 12.00 / 17.14 / 21.82       1.00 / 0.75 / 0.54 / 0.34                  0.80
  B2-EIRENE (50%)      1.00 /  8.73 / 10.11 / 10.97       1.00 / 0.55 / 0.32 / 0.17                  0.50

  Additional speed-up factors:
  – 1/4 grid cells: 4.00
  – Bundling: 2.14
  – Fluid neutrals: 2.00 / 20.00 / 0.00 / 0.00 / 0.00
  – Parareal: 4.00 / 8.43 / 8.43 / 20.00 / 10.00
  – Better feedback (1e-7 → 1e-5): 3.00 / 0.07 / 3.00
  – Target speed-up: 100 (at N = 150)
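The combined speed-up rows above are consistent with a simple Amdahl-style estimate: if a fraction f of the run time is spent in EIRENE and is distributed over N MPI tasks, while the B2 remainder either stays serial or gains the OpenMP factor of ~6, the expected speed-up is 1/((1-f) + f/N) or 1/((1-f)/6 + f/N). The sketch below reproduces several table entries; it is my reading of the numbers, not code taken from SOLPS itself.

```python
# Amdahl-style estimates consistent with the SOLPS speed-up table above.
# f is the fraction of run time spent in EIRENE (parallelised with MPI over
# N tasks); the remaining B2 fraction is either left serial or sped up by the
# OpenMP factor of ~6 quoted on the previous slide.  This is my reading of the
# table, not code taken from SOLPS itself.

def eirene_only(f, n):
    """Speed-up if only the EIRENE fraction f is parallelised over n tasks."""
    return 1.0 / ((1.0 - f) + f / n)

def b2_eirene(f, n, b2_factor=6.0):
    """Speed-up if EIRENE scales over n tasks and B2 gains a fixed factor."""
    return 1.0 / ((1.0 - f) / b2_factor + f / n)

for n in (16, 32, 64):
    print(f"N={n:3d}:  EIRENE-only (f=0.95): {eirene_only(0.95, n):5.2f}   "
          f"B2+EIRENE (f=0.95): {b2_eirene(0.95, n):5.2f}   "
          f"B2+EIRENE (f=0.80): {b2_eirene(0.80, n):5.2f}")
# N=16 gives ~9.1, ~14.8 and ~12.0, matching the 9.14, 14.77 and 12.00 entries.
```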
Part of a parameter scan (species, power, DT puff, impurity puff)
• Full model: each point would take approximately 1 year
• Reduced model: each point takes less than a week

Some examples … JOREK
• With the current numerics, we get roughly the following estimate for a large simulation:
  – 400 compute nodes on Helios for 150 hours => ~20 TB RAM => ~6000 cores => ~1M core-hours (~60k node-hours)
• "If I assume that we would need to increase the resolution in each direction by a factor of two to three to get to the necessary resolution for ITER at realistic parameters, I get the following rough estimate (making rather optimistic assumptions on our scaling):"
  – 20 TB * 100 = 1 PB RAM, plus the number of nodes/cores needed to provide this amount of memory
  – 1M * 1000 = 1G core-hours
• With better preconditioning, the memory consumption should drop a lot and the scalability should increase, but this still has to be tested and then implemented in the production code (order 3 years, I fear)

HELIOS Successor: Expert Group Recommendations (subset)
• A purchase decision for an HPC platform to be taken before the end of June 2015, for a start of operation in the production phase by January 1st, 2017
• Computing capacity with a peak power of at least 8 PetaFlop/s dedicated to fusion research in Europe
• Acquisition of the HPC system in two steps:
  – the first step is the purchase of a 4 PFlop/s system to be installed by the end of 2016,
  – to be followed by an extension up to 8 PFlop/s in 2018
• Computing capacity to be provided either
  – by an HPC system hosted in an existing Computer Centre (CC) in Europe,
  – or, in the case where the Broader Approach (BA) agreement is extended beyond 2016, in the existing CC in Rokkasho, with the investment and operation costs shared with Japan
• The EG recommends initially considering the viability of the option of an HPC system hosted in a CC in Europe by issuing a Call for Expression of Interest in January 2015, with a deadline at the end of March 2015; this would allow three months (i.e. until the end of June 2015) to examine other options
• The EG recommends that the system be predominantly equipped with conventional processors, but include some processing elements based on newer technology (NVIDIA GPUs and Intel Xeon Phi)

Some observations from the HPC Questionnaire (David Coster, 2014-11-04)
• 48 responses (some still coming in!)
• Estimated MCPU-hours:
  – Current (all HPC): 1210
  – Current (HELIOS): 320
  – Current (needs): 1505 (biased by one point; if dropped: 758.5)
  – Predicted (needs): 5155
  – Ratio (to HELIOS): 16
  – Ratio (to current): 4.3
• HELIOS accounts for more than 60% of the cycles for 64% of the users
• 72% of users estimate their needs going up by a factor of 2 – 10
• More than half of the codes can already do OpenMP + MPI
• Number of cores used:
  – Production (current, typical): 47% < 1024; 36% 1024–4095 (average: 2375)
  – Production (current, maximum): 28% < 1024; 30% 1024–4095; 23% 4096–16383; 17% 16384–65535 (average: 11035) [factor 2.5 – 4.65 above current typical]
  – Anticipated: 15% no improvement; 11% > 1048576 (average: 254627) [factor 4.65 – 23.1 above current maximum]
• MIC/NVIDIA:
  – 9% / 5% are currently ready
  – 23% / 26% have plans before 2017
  – 5% / 9% have plans after 2017
• HLST:
  – 33% will need support for more cores
  – 65% will need support for MIC
  – 63% will need support for NVIDIA
• 35% of codes need significantly more memory than is currently available

The good, the bad, the ugly …
• The good
  – One account, not one account per project
  – Support for distributed computing, co-allocation, experimentation
  – Support for data handling
    » Shipping back results
    » Long-term storage (10 years)
    » Open access ???
  – Fast responses to user queries
  – Transparent allocation of resources to projects
• The bad
  – "Export control"
  – Inflexible operations
• The ugly
  – 1-day outages every week
  – Multiple week-long outages each year
  – "Unexpected behaviour"
    » Running the same job twice produces substantial differences in run-time (or worse, in results)
    » Extrapolated MPI start-up takes longer than the time allocation
    » Extrapolated MPI memory usage is larger than the available memory
  – Appearance of conflicts of interest in resource allocation

End …
Thank you for your attention!
Are there questions?

Current US Allocations
• INCITE allocations
  – Fusion (2014)
    » 129 M processor-hours on XK7
    » 150 M processor-hours on BG/Q
  – CRESTA (EU project)
    » 42 M processor-hours on XK7