National Aeronautics and Space Administration
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California

Taming the Big Ocean Data

Thomas Huang
Project Technologist
NASA Physical Oceanography Distributed Active Archive Center
Jet Propulsion Laboratory, California Institute of Technology
4800 Oak Grove Drive, Pasadena, CA 91109-8099, United States of America

THUANG/JPL, Cloud Computing @GSAW 2015

NASA’s PO.DAAC
• The NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory is an element of the Earth Observing System Data and Information System (EOSDIS). EOSDIS provides science data to a wide community of users for NASA’s Science Mission Directorate.
• Archives and distributes data relevant to the physical state of the ocean
• The mission of the PO.DAAC is to PRESERVE NASA’s ocean and climate data and make these universally ACCESSIBLE and MEANINGFUL.
http://podaac.jpl.nasa.gov

PO.DAAC’s Mission and Cloud Computing
• Data Preservation
  • Archive on the Cloud
  • Storage redundancy
  • Hardware reliability
  • Elastic storage
  • Zone replication (with additional costs)
• Data Accessibility
  • Data services availability
  • Platform for
    • Spatial searches
    • Spatial subsetting
    • Quality screening, etc.
• Data Analysis
  • Climatology
  • Data re-gridding
  • Relevancy
  • Anomaly detection
[Figure: Data Management & Archive System. Applications (Aquarius, GHRSST, ASCAT, and Jason-1 handlers; ingest), Business Logics (managers; inventory, security, sig event, search, and job tracking services), ZooKeeper, file services, ingest pool, archive pool.]

Cloud Options
• Commercial Cloud: public Cloud (e.g. Amazon)
  • ~$30/TB/month, additional cost for I/O (PUT, COPY, POST…)
  • Transfer out: ~$90/TB/month
  • What do we get?
    • Potential cost reduction
    • No hardware maintenance
    • No system administration required
    • Reliable storage
    • Pay-by-the-drink
• On-premise Cloud: private Cloud
  • Closer to the physical archive
  • Fixed storage cost
  • No transfer / I/O cost
  • Need trained System Administrator
  • Provide elastic computing infrastructure and programming model
• Bursting Cloud: hybrid Cloud
  • Bursting computing jobs to external (public/private) Clouds
  • Ability to leverage additional computing resources
  • Gotchas
    • Depending on the computing problem, it might require data replication, hence storage cost if the external Cloud is a commercial Cloud
    • Cost for On-premise Cloud and possible external costs

Some PO.DAAC-related Cloud Computing Activities
Technology Infusion: Cloud Computing Study
• Amazon
• NASA Nebula
• Apache Hadoop and HBase
• Built Climatology Service
NASA ITLabs Cloudbursting
• Job bursting between three NASA centers
Pilot JPL CIO OpenStack (Nebula Inc.)
Nexus: Science Data Analysis Platform
• Spark, Hadoop, NoSQL, Solr, etc.
• Turnkey deployment
• Target projects: Sea Level Change Portal, ACCESS, AIST
Working with Big Data and Cloud Computing Communities
• Chair, ESIP Federation Cloud Computing Cluster
• Chair, NASA ESDSWG Data-Intensive Architecture
• Active Contributors, ESDSWG Cloud Computing
• Active Contributors, NIST Big Data Working Group
• JPL Selection Committee for Cloud RFP
[Figure: Nexus Data Analysis Platform. EDGE (OpenSearch; metadata in ISO, GCMD, etc.), Analysis, Data Aggregation Service, Geospatial Metadata Repository, Data Management, Data Access and Distribution, Workflow, Data Analysis.]

Funded Cloud Computing Efforts
• 2013 Sea Level Rise
  • PI: C. Boening/JPL, A NASA Web Portal for Sea Level Change
• 2013 ROSES ACCESS
  • PI: E. Armstrong/JPL, Enhanced Quality Screening for Earth Science Data
  • PI: C. Lynnes/GSFC, Federated Giovanni
• 2014 ROSES AIST
  • PI: T. Huang/JPL, OceanXtremes: Oceanographic Data-Intensive Anomaly Detection and Analytics Portal
  • PI: S. Smith/FSU, A Service to Match Satellite and In-situ Marine Observations to Support Platform Intercomparisons, Cross-calibration, Validation, and Quality Control
  • PI: C. Yang/GMU, Mining and Utilizing Earth Science Dataset Metadata, Usage Metrics, and User Feedback to Improve Dataset Relevancy
[Figure: satellite and in-situ match-up architecture. Labels: COAPS, NCAR, SAMOS, ICOADS, SPURS, IVAD, MySQL, in-situ cache, THREDDS, OPeNDAP, W10N, Promogranate, PO.DAAC satellite data; EDGE (OpenSearch; metadata in ISO, GCMD, etc.), Data Aggregation Service, Geospatial Metadata Repository; Match-up Service, match-up processors, match-up products, web portal.]
[Figure: PO.DAAC Labs (http://podaac.jpl.nasa.gov/podaac_labs) and Virtualized Quality Screening Service. Labels: Oceanographic Common Search Interface, cache, NASA PO.DAAC archive, w10n, OPeNDAP.]

[Slide inset: excerpt from the MUDROD proposal]
… Knowledge Base including SWEET, ocean ontology, and triple store from PO.DAAC data holdings and the user community, b) re-interface semantic engine as the MUDROD Engine by considering vocabulary, ontology, triple store linkage and weights, metadata, user profile and ocean ontology analyses, and c) integrate the MUDROD GUI for PO.DAAC data discovery, and search data holdings at ECHO and CLH by leveraging previous developments.

1.2.3 MUDROD GUI
User Centered Design will be adopted to integrate the MUDROD Graphical User Interface (GUI) by a) involving user communities for ontology and triple store capture, b) utilizing the workflow of scientists, c) engaging subject matter experts to provide insights and feedback during the integration process, and proactively testing the MUDROD GUI for overall usability at each stage of development. The MUDROD GUI will provide the interface for user interactions with: a) search constraints input, b) ranked results, c) data exploration based on data recommendations, and d) navigation through semantics to find relevant datasets. The optimization will be added with three components to assist users with better discovery and access: a) an ontology graph will be used to show the semantic association hierarchy of user keywords for navigation; b) similarity ranking will be added to provide better matched results for end users; c) recommendation will be added once a user has selected a specific dataset. Scientists would be able to use the three functionalities to quickly narrow down to available datasets and be directed to the PO.DAAC and other Earth science data downloading and subsetting services within NASA data systems.

1.2.4 MUDROD Engine
The MUDROD engine will include four components: a semantic search dispatcher, a semantic similarity calculator, a result presentation component, and a profile analyzer. Scientists will input keywords or other search terms. The MUDROD engine will take the search input and coordinate the search against the data sources at PO.DAAC, ECHO, and CLH using the MUDROD knowledge base. The results will be provided to scientists for interaction in three forms: ranked results, recommendations, and navigation through ontology.
Figure 4. MUDROD Engine Architecture

1.2.4.1 Semantic Search Dispatcher
Based on the semantic capability developed for PO.DAAC clearinghouse and EIE, we will integrate a semantic search dispatcher to transform keyword-based search into semantic search …
[Figure labels: Apache Solr, ECHO, NSIDC, other data center archives, w10n, OPeNDAP.]

Visualization as a Service (VaaS)
• We are also working on visualization widgets that can be embedded into any HTML page.
• Visualizing L3 GRACE data from a NetCDF file located somewhere within PO.DAAC.

NASA Sea Level Change Portal
• The Sea Level Change Portal will serve as a central hub for enabling collaboration within the NASA Sea Level Science Team
• The ultimate goal is to provide scientists and the general public with a “one-stop” source for current sea level change information and data, including interactive tools for accessing and viewing regional data, a virtual dashboard of sea level indicators, and ongoing updates through a suite of editorial products that include content articles, graphics, video, and animations.

Sea Level Change Portal: DATA ACCESS AND ANALYSIS
• Goal: Enable easy access to multi-disciplinary data sets and facilitate quick online analyses.
• Features include
  • Global data selection for geographical maps
    • Spatial
    • Temporal
  • Data analysis
    • Regional averages (time series)
    • Basic statistical analysis (RMS, correlation, PDF, spectral analysis, …)
    • Model/data comparison
  • Data subscription
    • Define a search and receive a “data alert” once new data matching this search arrives.
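
The data analysis features listed above (regional averages as a time series, RMS, correlation) can be illustrated with a small, self-contained sketch. This is not the portal's implementation: the function names, the `(lat, lon, value)` cell layout, and the cosine-latitude area weighting are all assumptions chosen for illustration.

```python
import math

def regional_average(cells, lat_range, lon_range):
    """Cosine-latitude weighted mean of one time step inside a lat/lon box.

    cells: iterable of (lat, lon, value) tuples (hypothetical layout);
    a value of None marks missing data and is skipped.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in cells:
        if value is None:
            continue
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            weight = math.cos(math.radians(lat))  # grid cells shrink toward the poles
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None

def rms(series):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in series) / len(series))

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Three made-up time steps; the cell at 60N lies outside the selected box.
steps = [
    [(0.0, 10.0, 1.0), (0.0, 11.0, 3.0), (60.0, 10.0, 99.0)],
    [(0.0, 10.0, 2.0), (0.0, 11.0, 4.0), (60.0, 10.0, 99.0)],
    [(0.0, 10.0, None), (0.0, 11.0, 6.0), (60.0, 10.0, 99.0)],
]
box_series = [regional_average(s, (-5.0, 5.0), (5.0, 15.0)) for s in steps]
print(box_series)  # [2.0, 3.0, 6.0]
```

The regional average reduces each time step to one number, so RMS and correlation then operate on short series rather than full grids; that reduction is what makes quick online analysis of this kind feasible.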

NASA Sea Level Change Portal: ARCHITECTURE
[Figure: portal architecture spanning JPL and NASA GSFC. Labels: SLCP CMS, Nexus Science Data Analysis Client, Data Analysis Platform, EDGE (OpenSearch; metadata in ISO, GCMD, etc.), Analysis, Data Aggregation Service, Geospatial Metadata Repository, Data Search and Analysis, Content Search, Ruby on Rails, Content Index, Metadata (Dataset and Granule), Data Analysis (time series, subsetting, comparison, etc.), PostgreSQL, NASA Common Metadata Repository (CMR), NASA and non-NASA data centers (metadata and data).]

Sea Level Change Portal: NEXUS: DATA ANALYSIS PLATFORM
• Data analysis platform on the Cloud
• Data management and transformation
• Multi-disciplinary data coordination
• On-the-fly analysis services
  • Time series
  • Correlation
  • Re-gridding
  • Data subsetting
• Data visualization service
• RESTful access to geospatial array data
[Figure: Nexus Data Analysis Platform. EDGE (OpenSearch; metadata in ISO, GCMD, etc.), Analysis, Data Aggregation Service, Geospatial Metadata Repository, Data Management, Data Access and Distribution, Workflow, Data Analysis.]

Federated, Multi-Cloud Architecture for Big Data
• Recognizing not all data centers are equal
• Providing a common, portal software solution stack improves interoperability by providing a distributed data analysis solution for Big Earth Science Data.
[Figure: federated, multi-Cloud deployment across several data centers. Each data center pairs a Nexus Data Analysis Platform analytic node (EDGE, OpenSearch, metadata in ISO, GCMD, etc., Analysis, Data Aggregation Service, Geospatial Metadata Repository) with a HORIZON Data Management and Workflow Framework data management node (applications, handlers, ingest, business logics, managers, inventory, security, sig event, search, product subscriber, job tracking services, ZooKeeper, file and product services, ingest pools, worker pools); one central analytic node is also shown.]

Summary
• Start with the architecture. Can’t build with just Cloud.
• Don’t jump into Cloud because it is popular
• Use Cloud because it makes sense
  • Cost?
  • Reliability?
  • Platform to improve data access and analysis?
• Might need to rethink existing software solutions when moving to Cloud
  • Truly leverage the elasticity of the Cloud?
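
The "Cost?" question can be made concrete with the approximate prices quoted on the Cloud Options slide (~$30/TB/month for storage, ~$90/TB for transfer out). The sketch below is only a back-of-the-envelope estimate under stated assumptions: per-request I/O charges are ignored, the transfer figure is read as a price per TB moved out in a month, and the function names and the on-premise monthly figure are hypothetical.

```python
# Approximate 2015 figures quoted on the Cloud Options slide.
STORAGE_USD_PER_TB_MONTH = 30.0
TRANSFER_OUT_USD_PER_TB = 90.0  # assumed to mean per TB transferred out

def monthly_commercial_cost(stored_tb, transferred_out_tb, replicas=1):
    """Estimated monthly bill: storage (times zone replicas) plus transfer out.

    Per-request I/O charges (PUT, COPY, POST, ...) are deliberately left out.
    """
    storage = stored_tb * replicas * STORAGE_USD_PER_TB_MONTH
    egress = transferred_out_tb * TRANSFER_OUT_USD_PER_TB
    return storage + egress

def breakeven_egress_tb(stored_tb, on_premise_monthly_usd, replicas=1):
    """Monthly transfer-out volume (TB) at which the commercial estimate
    equals a fixed on-premise monthly cost (a hypothetical input)."""
    storage = stored_tb * replicas * STORAGE_USD_PER_TB_MONTH
    return max(0.0, (on_premise_monthly_usd - storage) / TRANSFER_OUT_USD_PER_TB)

# 100 TB archived, 10 TB distributed per month, single zone.
print(monthly_commercial_cost(100, 10))  # 3900.0
```

Because transfer out costs roughly three times as much per TB as a month of storage in these figures, an archive whose main job is distribution is more sensitive to download volume than to archive size.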

Summary
• Use automated deployment: Puppet, Chef, Salt, etc.
• Bringing the computing close to the data makes sense: On-premise Cloud (currently)
  • Need local experts
• Governance
  • For Commercial Cloud
    • Simplified (fixed) costing model for Amazon resources
    • Reduced storage and transfer out pricing, and zone replication
  • For On-premise Cloud
    • Export controlled: public data, ITAR software
    • Suggest a standardized Cloud stack
• Federated, multi-Cloud environment

THANKS
Questions and more information: [email protected]