EGU2015-8273 (ESSI): The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

Ben Evans(1), Lesley Wyborn(1), Tim Pugh(2), Chris Allen(1), Joseph Antony(1), Kashif Gohar(1), David Porter(1), Jon Smillie(1), Claire Trenham(1), Jingbo Wang(1), Irina Bastrakova(3), Alex Ip(3), Gavin Bell(4)
(1) ANU, (2) Bureau of Meteorology, (3) Geoscience Australia, (4) The 6th Column Project
(The second part of this talk is in the next ESSI session.)
17 April 2015 – nci.org.au – @NCInews – @BenJKEvans

• High Performance Data (HPD): data that is carefully prepared, standardised and structured so that it can be used in data-intensive science on HPC (Evans, ISESS 2015, Springer).
  – HPC: turning compute-bound problems into IO-bound problems.
  – HPD: turning IO-bound problems into ontology + semantic problems.
• What are the HPC and HPD drivers?
• How do you build environments on this infrastructure that make it easy for users to do science?

Top 500 supercomputer list since 1990
(Chart: Top500 performance development since 1990, with the current and next NCI systems marked; source: http://www.top500.org/statistics/perfdevel/)
• Fast and flexible access to structured data is required.
• There needs to be a balance between processing power and the ability to access data (data scaling).
• The focus is on-demand, direct access to large data sources,
• enabling high-performance analytics and analysis tools directly on that content.

Elephant Flows Place Great Demands on Networks
• A physical pipe that leaks water at a rate of 0.0046% by volume still delivers 99.9954% of the water.
• A network "pipe" that drops packets at a rate of 0.0046% still delivers 100% of the data, but slowly – at <<5% of the optimal speed (assumptions: 10 Gbps TCP flow, 80 ms RTT; the RTT is essentially fixed, determined by the speed of light). A rough estimate of this effect is sketched below.
• With proper engineering, we can minimise packet loss.
See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski. "The Science DMZ: A Network Design Pattern for Data-Intensive Science." In Proceedings of the IEEE/ACM Annual Supercomputing Conference (SC13), Denver, CO, 2013.
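To see why such a tiny loss rate throttles a long-distance TCP flow so badly, a back-of-the-envelope estimate can be made with the well-known Mathis et al. TCP throughput model. This worked example is not on the original slide; the MSS value and constant C are assumptions.

    # Rough TCP throughput estimate under packet loss (Mathis et al. model):
    #   throughput <= (MSS / RTT) * (C / sqrt(p))
    # Assumptions (not from the slide): MSS = 1460 bytes, C ~ 1.22.
    from math import sqrt

    MSS_BYTES = 1460          # typical Ethernet TCP maximum segment size (assumed)
    RTT_S = 0.080             # 80 ms round-trip time (from the slide)
    LOSS_P = 0.000046         # 0.0046% packet-loss rate (from the slide)
    LINK_BPS = 10e9           # 10 Gbps link (from the slide)
    C = 1.22                  # constant in the Mathis et al. model

    throughput_bps = (MSS_BYTES * 8 / RTT_S) * (C / sqrt(LOSS_P))
    print("Estimated throughput: %.1f Mbps" % (throughput_bps / 1e6))
    print("Fraction of the 10 Gbps link: %.2f%%" % (100 * throughput_bps / LINK_BPS))
    # Prints roughly ~26 Mbps, i.e. well under 5% of the 10 Gbps link,
    # which is consistent with the "<<5% of optimal speed" figure on the slide.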
Computational and Cloud Platforms
Raijin:
• 57,472 cores (Intel Xeon Sandy Bridge, 2.6 GHz) in 3,592 compute nodes;
• approx. 160 TBytes of main memory;
• Infiniband FDR interconnect;
• approx. 7 PBytes of usable fast filesystem (for short-term scratch space);
• 1.5 MW power; 100 tonnes of water for cooling.
Partner Cloud
• Same generation of technology as Raijin (Intel Xeon Sandy Bridge, 2.6 GHz), but only 1,500 cores;
• Infiniband FDR interconnect;
• Collaborative platform for services; and
• The platform for hosting non-batch services.
NCI Nectar Cloud
• Same generation as the partner cloud;
• Non-managed environment;
• Weak integration.

NCI Cloud
(Architecture diagram: cloud nodes with local SSD connected over an FDR IB fabric to Lustre and NFS storage; per-tenant public IP assignments on CIDR boundaries, typically /29; OpenStack private IPs on a flat network, quota managed.)

NCI's integrated high-performance environment
(Diagram: the internet and a second data centre connect through NCI data movers (10 GigE) to the Raijin login/data-mover nodes, the cloud, and the Raijin HPC compute nodes on a 56 Gb FDR IB fabric. Storage comprises the persistent global parallel filesystems /g/data1 (7.4 PB), /g/data2 (6.75 PB) and /g/data3 (9 PB); the Raijin high-speed filesystem /short (7.6 PB); /home, /system, /images, /apps; and the Massdata tape archive (1.0 PB cache, 20 PB tape).)

10+ PB of Data for Interdisciplinary Science
• CMIP5: 3 PB
• Atmosphere: 2.4 PB
• Earth observation: 2 PB
• Water/Ocean: 1.5 PB
• Weather: 340 TB
• Geophysics: 300 TB
• Astronomy (optical): 200 TB
• Bathymetry, DEM: 100 TB
• Marine videos: 10 TB
Data partners: BOM, GA, CSIRO, ANU, other national and international sources.

National Environment Research Data Collections (NERDC)
1. Climate/ESS model assets and data products
2. Earth and marine observations and data products
3. Geoscience collections
4. Terrestrial ecosystems collections
5. Water management and hydrology collections
Data collections and approximate capacity:
• CMIP5, CORDEX – ~3 PBytes
• ACCESS products – 2.4 PBytes
• LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS – 1.5 PBytes
• Digital elevation, bathymetry, onshore geophysics – 700 TBytes
• Seasonal climate – 700 TBytes
• Bureau of Meteorology observations – 350 TBytes
• Bureau of Meteorology ocean-marine – 350 TBytes
• Terrestrial ecosystem – 290 TBytes
• Reanalysis products – 100 TBytes

Internationally sourced data
• Satellite data (USGS, NASA, JAXA, ESA, …)
• Reanalysis (ECMWF, NCEP, NCAR, …)
• Climate data (CMIP5, AMIP, GeoMIP, CORDEX, …)
• Ocean modelling (Earth Simulator, NOAA, GFDL, …)
These holdings will only increase as we depend on more data, and some will be replicated. How can we better keep them in sync, versioned, and back-referenced for the supplier? (A minimal manifest-style consistency check is sketched below.)
• Organise "long-tail" data that calibrates and integrates with the big data. How should we manage this data, version it, and easily attribute the supplier (researcher? collaboration? university? agency?)
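As one illustration of the sync and versioning question above – not NCI's actual mechanism – a replicated collection can be compared against a manifest published by its supplier. The directory path and manifest format below are hypothetical.

    # Minimal sketch: build and compare checksum manifests for a replicated
    # data collection, so a mirror can be verified against its source.
    # The paths and manifest format are illustrative only.
    import hashlib, json, os

    def build_manifest(root):
        """Map each file's path (relative to root) to its SHA-256 digest."""
        manifest = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                manifest[os.path.relpath(path, root)] = h.hexdigest()
        return manifest

    def compare(local, upstream):
        """Report files that are missing locally or differ from upstream."""
        missing = sorted(set(upstream) - set(local))
        changed = sorted(p for p in upstream if p in local and local[p] != upstream[p])
        return missing, changed

    if __name__ == "__main__":
        local = build_manifest("/g/data1/example_collection")   # hypothetical path
        with open("upstream_manifest.json") as f:               # published by the supplier
            upstream = json.load(f)
        missing, changed = compare(local, upstream)
        print("missing:", missing)
        print("changed:", changed)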
Some Data Challenges
• Data formats
  – Standardise data formats – it takes time to convert legacy and proprietary ones.
  – Appropriately normalise the data models and conventions.
  – Adopt HPC-enabled libraries that abstract the storage.
• Expose all attributes for search
  – Not just collection-level search, not just datasets: all data attributes.
  – What are the handles we need to access the data?
• Provide more programmatic interfaces and link up data and compute resources
  – More server-side processing.
  – Add the semantic meaning to the data.
  – Create useful datasets (in the programming context) from data collections.
  – Is it scientifically appropriate for a data service to aggregate/interpolate?
• What unique/persistent identifiers do we need?
  – DOI is only part of the story.
  – Versioning is important.
  – Born-linked data, and maintaining the graph infrastructure.

Regularising High Performance Data using HDF5
(Layer diagram:)
• Compilers and tools: Fortran, C, C++; Python, R, MatLab, IDL; Ferret, CDO, NCL, NCO, GDL, GDAL, GrADS, GRASS, QGIS; Globe Claritas; Open Nav Surface.
• Metadata layer: netCDF-CF conventions; ISO 19115, RIF-CS, DCAT, etc.
• Library layer 1: NetCDF-4 library, HDF-EOS5, libgdal; domain formats such as [FITS], airborne geophysics, [SEG-Y], BAG, line data, …
• Library layer 2: HDF5 (MPI-enabled) on Lustre; HDF5 (serial) on other storage options.

Regularising High Performance Data using HDF5 – including Data Services
(The same layered stack, with services on top that expose the data model + semantics: OPeNDAP and OGC WMS, WCS, WFS, WPS and SOS, plus a fast "whole-of-library" catalogue.)

Finding data and services
(Diagram: a GeoNetwork catalogue, backed by a Lucene database, indexes the /g/data1 and /g/data2 filesystems and serves supercomputer access, virtual laboratories, and DAP/OGC data services; Elasticsearch is being trialled. A sketch of harvesting file-level attributes into such a catalogue follows.)
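To make "all data attributes" searchable, file-level metadata can be harvested directly from the netCDF-4/HDF5 files and pushed into a catalogue index. The sketch below uses the netCDF4 Python library; the file path and the document layout are hypothetical, and the actual NCI harvesting pipeline may differ.

    # Minimal sketch: harvest global and per-variable attributes from a
    # netCDF-4/HDF5 file into a flat JSON document suitable for indexing
    # in a catalogue backend (e.g. Lucene/GeoNetwork or Elasticsearch).
    import json
    from netCDF4 import Dataset

    def harvest_attributes(path):
        doc = {"path": path, "global": {}, "variables": {}}
        with Dataset(path, "r") as nc:
            # Global (file-level) attributes, e.g. CF/ACDD title, institution, license.
            for name in nc.ncattrs():
                doc["global"][name] = str(getattr(nc, name))
            # Per-variable attributes and dimensions, e.g. standard_name, units.
            for varname, var in nc.variables.items():
                doc["variables"][varname] = {
                    "dimensions": list(var.dimensions),
                    "shape": [int(n) for n in var.shape],
                    "attributes": {a: str(getattr(var, a)) for a in var.ncattrs()},
                }
        return doc

    if __name__ == "__main__":
        # Hypothetical file path, for illustration only.
        doc = harvest_attributes("/g/data1/example/tasmax_day_ACCESS_example.nc")
        print(json.dumps(doc, indent=2))  # this document can then be posted to the search index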
Prototype to Production – anti-"Mine"craft
Virtual Laboratories:
• Separate the researchers from the software builders.
• The cloud is an enabler, but:
  – don't make researchers become full system administrators;
  – save developers from being operational.
Project lifecycle – and preparing for success
(Chart: perspiration vs. productivity across the project lifecycle, from Project 1 start/end through Projects 2–4 start/end.)

Prototype to Production – anti-"Mine"craft
(Chart: developer and VL-manager "headspace hours" across the development phase of a project, comparing poorly executed, reasonably executed and well executed projects.)

Prototype to Production – anti-"Mine"craft
(Chart: the same comparison when the scope changes and the product is adopted broadly.)

Virtual Laboratory driven software patterns
• Building blocks: basic OS functions; NCI stack 1; common modules; NCI environment stack; bespoke services; workflow X; GridFTP; P2P; analytics stack; visualisation stack; special configuration choices; "super" software stack.
• Stacks are composed and reused: 2 x Stack 1, modified Stack 1, modified Stack 2 – take stacks from upstream and use them as bundles.

Transition from developer, to prototype, to DevOps
Step 1: Development
• Get a template for development.
• Work out what is special and separate out what is common.
• Reuse other software stacks where possible.
Step 2: Prototype
• Deploy in an isolated tenant of a cloud.
• Determine dependencies.
• Write test cases to demonstrate correct functioning (a minimal smoke-test sketch is shown below).
Step 3: Sustainability
• Pull the repo into the operational tenant.
• Prepare the bundle for integration with the rest of the framework.
• Hand back the cleaned bundle.
• Establish the DevOps process.
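For the "test cases to demonstrate correct functioning" step, a prototype deployment can ship a small smoke test that checks its key tools and service endpoints before handover. The sketch below is illustrative only; the tool names and the service URL are assumptions, not NCI's actual test suite.

    # Minimal smoke-test sketch for a deployed virtual-laboratory stack:
    # check that expected command-line tools are on PATH and that a
    # data-service endpoint answers. Tool names and the URL are assumptions.
    import shutil
    import urllib.request

    EXPECTED_TOOLS = ["ncdump", "gdalinfo", "python3"]           # illustrative tool list
    SERVICE_URL = "http://localhost:8080/thredds/catalog.html"   # hypothetical endpoint

    def check_tools(tools):
        """Return the subset of tools that cannot be found on PATH."""
        return [t for t in tools if shutil.which(t) is None]

    def check_service(url, timeout=10):
        """Return True if the endpoint responds with HTTP 200."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False

    if __name__ == "__main__":
        missing = check_tools(EXPECTED_TOOLS)
        print("missing tools:", missing or "none")
        print("service reachable:", check_service(SERVICE_URL))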
DevOps approach to building and operating environments
(Diagram: a virtual-laboratory operational bundle is composed from NCI core bundles and community repositories (e.g. Community1, Community2). It is Git-controlled, uses a pull model, and is covered by continuous-integration testing.)

Advantages
• Separates roles and responsibilities – from gatekeeper to DevOps management:
  – specialist on the package;
  – VL managers;
  – system administrators.
• Moves from "architecture" to "platform":
  – flexible with technology change;
  – makes handover/maintenance easier.
• Test/Dev/Ops, patches and rollback all become business as usual (BAU).
• Sharable bundles.
• Releases of software stacks can be tagged.
• A precondition for trusted software stacks.
• Provenance – stands up to scientific and government policy scrutiny.

A snapshot of layered bundles to build complex VLs
(Image: an example of layered bundles composed into a complete virtual laboratory.)

Easy analysis environments
• Increasing use of iPython Notebooks.
• VDI – an easy in-situ analysis environment using virtual desktops on the data.

VDI – continued
(Screenshots of the VDI environment.)

NCI Petascale Data-Intensive Science Platform
• Data services (THREDDS).
• Server-side analysis and visualisation.
• VDI: cloud-scale user desktops on the data.
• 10 PB+ of research data.
• Web-time analytics software.
A minimal notebook-style example of analysing published data in place is sketched below.
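As an illustration of this kind of in-situ, notebook-style analysis, a published dataset can be opened remotely through a THREDDS OPeNDAP service and summarised without downloading the whole file. The dataset URL and variable name below are hypothetical, not an actual NCI endpoint.

    # Minimal notebook-style sketch: open a dataset through an OPeNDAP
    # endpoint published by a THREDDS server and compute a simple summary
    # without copying the file (requires a DAP-enabled netCDF library).
    import numpy as np
    from netCDF4 import Dataset

    OPENDAP_URL = "http://dapds.example.org/thredds/dodsC/example/tasmax_sample.nc"

    with Dataset(OPENDAP_URL) as nc:
        tasmax = nc.variables["tasmax"]          # e.g. daily maximum temperature
        # Subset first, so only the needed slice crosses the network.
        first_step = tasmax[0, :, :]             # one time step, full spatial grid
        print("variable shape:", tasmax.shape)
        print("units:", getattr(tasmax, "units", "unknown"))
        print("spatial mean of first time step:", float(np.mean(first_step)))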
Summary: Progress toward Major Milestones
• Interdisciplinary science – publish, catalogue and access data and software to enhance interdisciplinary, big-data-intensive (HPD) science, with interoperable data services and protocols.
• Integrity of science – managed services that capture a workflow's process as a comparable, traceable output; ease of access to data and software for better workflow development and repeatable science, conducted with less effort or with accelerated outputs.
• Integrity of data – data repository services that ensure data integrity, provenance records, universal identifiers, and repeatable data discovery and access from workflows or interactive users.