Presentation - Information Services

Role of advanced e-infrastructure in
Scientific Collaboration and Big Discovery
Subrata Chattopadhyay
C-DAC, Bangalore
18-Mar-15
Outline
• C-DAC and background
• Recent Big Experiments and discovery
• Role of e-Infrastructure – HPC / Grid / Cloud
• G.A.R.U.D.A - platform for collaboration
• Use Cases – Applications
• Conclusion
C-DAC
• Centre for Development of Advanced Computing (C-DAC) is the premier R&D organization of the Department of Electronics and Information Technology (DeitY), Ministry of Communications & Information Technology (MCIT), for carrying out R&D in IT, Electronics and associated areas
• Established in 1988; spread over 11 cities, with about 3000 employees
Eleven Centres
Thematic Areas
• High Performance Computing, Grid and Cloud Computing
• Multilingual Computing and Heritage Computing
• Professional Electronics including VLSI and Embedded Systems
• Software Technologies, including FOSS
• Cyber Security and Cyber Forensics
• Health Informatics
• Education and Training
Significant Achievements: HPC, Grid & Cloud Computing

C-DAC's PARAM YUVA-II:
• India’s fastest and the first supercomputer to cross 500 TeraFlops peak performance
• Ranked No. 44 in the Green500 list of the world's supercomputers announced in November 2013. It is the No. 1 system in India and No. 9 in Asia Pacific as per this list

Bio Blaze: an exclusive supercomputer for bioinformatics research, for better diagnosis of diseases and discovery of new drugs

Meghdoot Cloud Stack: a free and open source cloud stack

PARAM Shavak – Supercomputer in a Box, launched on 25th December 2014 by Shri Ravi Shankar Prasad, Hon'ble Minister of Communications and IT
National HPC Facilities
NPSF @ Pune
Biogene
CTSF @ Bangalore
Biocrome System
Bioinformatics Resources & Applications Facility (BRAF), Pune
Scientific Discovery
Observational to Computationally-intensive research
Computer simulations reconcile the inductive and
deductive approaches of the Scientific Method
Time scale(s)
[Figure: timeline marking LHC approval, LHC simulation, LHC start and LHC end (?), alongside CLIC simulations, CLIC approval (?), CLIC start and CLIC end (?)]
Square Kilometre Array - SKA
• Next generation radio telescope
• Large multinational project
• 100 x more sensitive
• 1,000,000 x faster
• 5 square km of dish over 3000 km
• Cost: Euro 1.5 billion; construction starts in 2018; partially ready in 2020, fully in 2025
• 10 member countries; India is an Associate member
• Currently the world's most ambitious IT project
• First real exascale-ready application
• Largest global big-data challenge
SKA is a cosmic time machine
Cosmic Questions:
• Universe not eternal - what beginning and what end?
• Shape – Sphere / Saddle / Flat
• Multiverse ?
Universe :
• 0.5% planets & stars
• 4% gas
• 24% Dark matter
• 71.5% Dark energy
Science data processor pipeline
[Figure: SKA science data processor pipeline, from incoming data from collectors through a switch to imaging and non-imaging processing, HPC science processing and a bulk store, with a "software complexity" annotation. Block labels: corner turning, coarse delays, fine F-step/correlation, correlator, beamformer, beamforming/de-dispersion, beam steering, visibility steering, buffer store, observation buffer, gridding visibilities, UV processor, imaging, image processor, image storage, time-series searching, search analysis, object/timing storage. Figures annotated for SKA 1 and SKA 2: 10 Tb/s, 1000 Tb/s, 10/1 TB/s, 50 PB, 10 Pflop, 100 Pflop, 200 Pflop, 1 Eflop, 10 Eflop, 1 EB/y, 10 EB/y]
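For a rough sense of scale, the yearly archive volumes quoted on this slide (1 EB/y and 10 EB/y) can be converted into an average sustained rate. The sketch below assumes decimal units (1 EB = 10^18 bytes) and a 365-day year; the function name is illustrative and is not part of any SKA software.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
BYTES_PER_EB = 10**18               # assumption: decimal exabyte

def sustained_gb_per_s(exabytes_per_year: float) -> float:
    """Average rate in GB/s needed to store the given yearly volume."""
    return exabytes_per_year * BYTES_PER_EB / SECONDS_PER_YEAR / 10**9

for volume in (1, 10):
    print(f"{volume} EB/y is about {sustained_gb_per_s(volume):.1f} GB/s sustained")
```

Under these assumptions, 1 EB/y corresponds to roughly 32 GB/s written around the clock.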
Thirty Meter Telescope (TMT) Project
• Time line
– 2004: project start, design development
– 2009: preconstruction phase
– 2011: start construction
– 2018: complete, first light, start AO science
• Partnership
– UC
– Caltech
– Canada
– Japan
– India
– China
• Cost
– 970M$
About TMT
• The project was conceived in the year 2004
• USA, Canada, Japan, India and China are the participating countries
• Construction
– 30 meter diameter primary mirror
– The mirror consists of 492 smaller (1.4 m) hexagonal mirrors
– The shape of each segment, as well as its position relative to neighbouring segments, is actively controlled
– A 3 m secondary mirror produces an unobstructed field of view of 20 arc minutes in diameter
• Makes use of Adaptive Optics
• Scientific instrumentation for gathering information apart from images
TMT Mauna Kea
BRAIN Initiative
• Announced by the US President in 2013
• Mapping and understanding the most complex organ – 100 billion neurons
• The Brain Research through Advancing Innovative Neurotechnologies Initiative (BRAIN Initiative) is a broad, collaborative research initiative to unlock the mysteries of the human brain
The BRAIN Initiative: Surviving the Data Deluge
Mapping brain activity will produce nearly as much data as the Large
Hadron Collider, yet managing the sheer volume of information will be
the simplest challenge for brain data managers.
The BRAIN Initiative spans biology, physical sciences, engineering, computer science, and the social and behavioral sciences.
Research: the development of molecular-scale probes that can sense and record the activity of neural networks;
advances in “Big Data” to analyze the huge amounts of information, mainly to understand how thoughts, emotions, actions, and memories are represented in the brain.
Funding: $300 million per year for the next 10 years, from 3 federal agencies (NIH, DARPA and NSF) and 4 private research institutes
Role of e-Infrastructure
Grid Computing
[Figure: applications (Climate Modeling, Disaster Management, Bioinformatics, CFD, Cryptanalysis) running over Grid Middleware on the resources GG-BLR, GG-CHE, GG-HYD, TF BLR, TF PUNE, IITD, PRL and YUVA]
Grid Computing

• Sharing of resources among the community
• Seen as a collective pool
• Heterogeneous
• Geographically distributed
• Different administrative domains
• Wide variety of tools and interfaces to choose from
Components of Grid Middleware
Popular Middleware
• Globus – Globus Alliance
• GridBus – University of Melbourne
• UNICORE - Uniform Interface to Computing Resource
• gLite – CERN / EGEE / EGI
• Legion – (Avaki - Corporate Distributor)
• Alchemi – (.NET Grid Computing Framework)
• Condor
• SGE
• 70+ partners
• 6000 CPUs – 550TF
• 1700+ certificates
• 220TB storage
• Links: EGI, CHAIN-REDS, caBIG, NKN
GARUDA – India’s national grid computing initiative bringing together academic, scientific and research communities for developing their data and compute intensive applications.
National Knowledge Network
(NKN) Emerging Advanced Network
Themes
• Virtual Classrooms
• Remote Medical Diagnosis
• Collaborative Research
• Grid Computing
Multi-10G core backbone; typically 1G at the edge
1000+ institutes connected, of 1500 approved
Eventually connect 255,000 villages
Network tiers: Core, Distribution, Edge
International: Mumbai-CERN Link: D. Foster Initiative (from 2008)
Now 2G guaranteed; bursting to 10G
2013: Transition to an International 10G Infrastructure
– Indian Grid Certification Authority (IGCA), located at C-DAC, Knowledge Park, Bangalore, India.
– IGCA is an accredited member of APGridPMA.
– Issues X.509 certificates to support a secure grid environment (for GARUDA, institutes in India that do research in grid, and foreign institutes that collaborate with GARUDA).
– http://ca.garudaindia.in
• Certificates issued: 1749
• Valid host certificates: 41
• Registration authorities: 45
[Figure: GARUDA architecture. User-facing components: CLI, workflows, Grid PSE, access portal, cloud interface, federated information server, hand-held devices, programming development environment, job scheduler. Middleware: WSRF + GT4 + other services + cloud software (Nimbus / VMware), with virtualization support. Underneath: grid security and high-performance grid networking over NKN. Computing resources and virtual organizations: C-DAC resource centers, research organizations, non-research organizations, educational institutions, computing centers]
Security
Middleware
Resource Management
User Environments
Programming Environments
Data Grid
Resource Enabler & Monitoring
GARUDA – enabled Applications
Visualization
Garuda Access Portal
GSRM
Paryavekshanam
Garuda Information Registry
Garuda GridFTP
Globus Online
AGSG
PSP
Scilab
Galaxy Workflow – OSDD
Garuda Megha
VRGeo
Garuda User Forum

C-DAC resources:
• 4TF HPC clusters each at Bangalore, Chennai & Hyderabad
• PARAM Yuva II at Pune and PARAM Padma at Bangalore

Fourteen of the partner institutions are also contributing resources, including satellite terminals.

Total computing power is more than 6000 CPUs, equivalent to 550TF

Storage space: 220 TB
Job Flow
[Figure: job flow. Labels: USER, job template, RSL, Gridway, GLOBUS, LRM script, GRIDFS, and the resources GG-BLR, GG-CHE, GG-HYD, TF BLR, TF PUNE, IITD and IMSC. Output and error files are available to the user]
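The dispatch step in this job flow can be illustrated with a small sketch. It is not Gridway's scheduling algorithm: it assumes a simple "most free CPUs that fit" policy, the `Resource`, `Job`, `select_resource` and `submit` names are hypothetical, and the free-CPU counts are invented for illustration. Only the resource names come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cpus: int  # hypothetical figures, for illustration only

@dataclass
class Job:
    executable: str
    cpus: int
    stdout_file: str = "job.out"  # where the user later finds output
    stderr_file: str = "job.err"  # where the user later finds errors

def select_resource(job, resources):
    """Return the resource with the most free CPUs that fits the job, or None."""
    candidates = [r for r in resources if r.free_cpus >= job.cpus]
    return max(candidates, key=lambda r: r.free_cpus, default=None)

def submit(job, resources):
    """Reserve CPUs on the chosen resource and report where the job went."""
    target = select_resource(job, resources)
    if target is None:
        return f"{job.executable}: queued (no resource has {job.cpus} free CPUs)"
    target.free_cpus -= job.cpus
    return (f"{job.executable}: dispatched to {target.name}; "
            f"output in {job.stdout_file}, errors in {job.stderr_file}")

pool = [Resource("GG-BLR", 16), Resource("GG-CHE", 64), Resource("TF PUNE", 32)]
print(submit(Job("simulate.sh", cpus=48), pool))
print(submit(Job("simulate.sh", cpus=48), pool))
```

With these invented numbers the first job fits only on GG-CHE, and the second waits because no resource has 48 free CPUs left. A real metascheduler would also weigh queue length, data locality and virtual-organization policy.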
DMSAR Processing in Grid
DMSAR – Disaster Management using Synthetic Aperture Radar
Components of a Grid-enabled SAR System
• Data acquisition & raw data transfer: the captured data is transferred to the disaster ground unit
• Raw data transmission to the GARUDA grid head node at Bangalore
• Raw data splitting & initialization: a splitter programme divides the TB of input raw data (@ t1) into four parts: P-1 Bangalore (Linux and AIX clusters), P-2 Delhi (Linux cluster), P-3 Chennai (Linux cluster), P-4 Pune (Linux cluster)
• Backward transfer of results from the Pune, Delhi and Chennai clusters to the Bangalore head node and the visualization server @ Bangalore, using GARUDA high speed network resources and G-SAT resources
• Disaster remote visualization: grid based real time remote visualization setup, with a Windows based tool interfaced with the grid setup
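The split, process and gather pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: threads stand in for the remote clusters, `process` is a placeholder checksum rather than SAR processing, and the function names are hypothetical. The four site names are the ones on the slide.

```python
from concurrent.futures import ThreadPoolExecutor

SITES = ["Bangalore", "Delhi", "Chennai", "Pune"]  # sites named on the slide

def split(raw: bytes, parts: int) -> list:
    """Split raw data into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(raw), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(raw[start:end])
        start = end
    return chunks

def process(site: str, chunk: bytes) -> tuple:
    """Placeholder for per-site processing: here, just a byte checksum."""
    return site, sum(chunk) % 65536

def run(raw: bytes) -> list:
    """Split the data, process each part concurrently, gather results in site order."""
    chunks = split(raw, len(SITES))
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        # map preserves input order, so results line up with SITES
        return list(pool.map(process, SITES, chunks))

for site, checksum in run(bytes(range(10))):
    print(site, checksum)
```

Keeping the chunks contiguous and the results ordered makes the backward transfer simple: the head node can reassemble the output by position.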
Bioinformatics : Open Source Drug Discovery
Project Team : OSDD community
• Galaxy Workflow for genomics and proteomics applications
• Distributed job execution through Gridway
• HPC clusters to run drug discovery problems
• Users connected through both NKN and Internet
• OSDD users given access to Garuda through OSDD VO
• Grid-enabled bioinformatics tools useful in the drug discovery pipeline
[Figure: the OSDD user community reaches the OSDD HeadNode over Internet / NKN. The head node hosts the OSDD customized Galaxy, DB and Ext DB, the Garuda middleware stack, login service and Gridway metascheduler. Over Internet / NKN it connects to the GARUDA Grid clusters, each with the Garuda middleware stack: JNU cluster, Yuva cluster, GGHYD cluster and other OSDD clusters. Labels also include LRM - Torque and OSDD Tools – weka, cdk,…]
CAE: Aeroacoustics Optimization
Project Team : Zeus Numerix
Aim:
Optimize the noise generated by a 3-D wing with flaps in landing
configuration by variation of flap location and orientation.
• Uses the Kepler workflow framework integrated with native Globus job submission routines
• Optimization module uses the OPT4J framework
• Optimization module includes the AFFG (Adaptive Fuzzy Fitness Granule) routine, which can reduce the number of fitness function evaluations by up to 50%
• Successfully completed 40 simultaneous simulations (parallel + serial)
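The saving from fitness granulation comes from reusing the fitness of an already evaluated design when a new candidate is close enough to it. The sketch below is a heavily simplified illustration of that idea, not the AFFG routine used in the project: it assumes one design variable, a fixed granule radius and a crisp distance test, whereas the adaptive fuzzy version adjusts granule widths and uses fuzzy similarity. The `noise` objective and all numbers are hypothetical.

```python
def make_granulated_fitness(fitness, radius):
    """Wrap `fitness` so points within `radius` of an evaluated point reuse its value."""
    granules = []  # list of (center, fitness value)
    stats = {"exact": 0, "reused": 0}

    def evaluate(x):
        for center, value in granules:
            if abs(x - center) <= radius:
                stats["reused"] += 1
                return value  # approximate: borrow the neighbour's fitness
        value = fitness(x)    # expensive call (a CFD run in the real workflow)
        granules.append((x, value))
        stats["exact"] += 1
        return value

    return evaluate, stats

def noise(angle):
    """Hypothetical stand-in objective for an expensive simulation."""
    return (angle - 30.0) ** 2

evaluate, stats = make_granulated_fitness(noise, radius=0.5)
candidates = [10.0, 10.2, 25.0, 25.4, 30.0, 29.8, 40.0, 10.1]
scores = [evaluate(c) for c in candidates]
print(stats)
```

In this toy run half of the candidates reuse an earlier value. The fraction saved in practice depends on how the optimizer's population clusters and on the granule width chosen.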
Scilab
• Open source, cross-platform numerical computational package and a high-level, numerically oriented programming language
• In collaboration with IITB
• scilab.in accesses Megha for executing Scilab code and rendering graphics
• Many textbook examples are solved and available as part of the Textbook Companion project
VRGeo
Open-source collaborative mapping platform for crowdsourcing geospatial information
The Global Grid… and the “non-Global” middleware
[Figure: grid infrastructures labelled CNGrid, Genesis II, NKN & Garuda, GISELA, SAGrid & SANREN and EUAsiaGrid. Courtesy: Roberto Barbera, INFN]
Collaboration in CHAIN-REDS
2,800 people
outreached in total
• Serving applications of national importance
– Alliance with the Open Source Drug Discovery (OSDD) project of CSIR
– Disaster management applications
– Weather forecasting models & earthquake engineering
– Applications from the fields of Bioinformatics, CAE & Material sciences
• First in India
– Setting up of the Indian Grid Certification Authority (IGCA) in 2009, to issue digital certificates for grid researchers in India
– Digital certificates trusted by other international certification authorities
– Issued more than 1400 IGCA certificates
• Global integration
– Integrated with the European Grid Infrastructure through the EU-India Grid
– Achieved middleware interoperability between the European gLite middleware & Garuda middleware components
Conclusion
• C-DAC is leading key developments in HPC, Grid and Cloud
• Advanced e-infrastructure plays a critical role in big scientific discovery
• Garuda, a unique platform in India, provides an opportunity for R&D collaboration in order to solve national problems
• Garuda also aims to accelerate international collaboration for research in next generation technology
Applying Advanced Computing for Human Advancement
Thank you
www.cdac.in