Grid Infrastructure Monitoring System Based on Nagios
E. Imamagic, D. Dobrenic
SRCE
HPDC 2007, Workshop on Grid Monitoring
HPDC 2007 / Grid Infrastructure Monitoring System Based on Nagios
Overview
• Motivation
• Nagios framework
• Nagios-based grid monitoring
  – Architecture
  – Grid extensions
  – Statistics
  – Demo
• Contributions to WLCG Grid Service Monitoring WG
• Future work
• Conclusions
Motivation
• Provide site admin-centric monitoring
  – simplify grid resource operations
• Enable better resource availability
  – issue notifications as soon as a problem appears
• Support complex dependencies between sensors
  – enables problem isolation
  – only relevant notifications are issued
• Visualization & management interface
  – grid resource status
• Report generation
  – availability, problem history
Nagios Framework
• Open source monitoring framework
  – widely used & actively developed
• Host and service problem detection and recovery
• Provides a wide set of basic sensors
  – easy to develop custom sensors
• Centralized and distributed deployment
• High configurability
  – service dependencies, fine-grained notification options
• Web interface
  – status view, administration
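Nagios sensors (plugins) are ordinary executables: Nagios takes the process exit code as the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of standard output as the status text, optionally followed by `|` and performance data. That contract is what makes custom sensors easy to write in any language. The sketch below is a minimal Python sensor under that contract; the `RESPONSE` label, the command-line layout and the thresholds are hypothetical and not taken from the authors' sensors.

```python
import sys

# Nagios plugin return codes, as defined by the plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_TEXT = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value (higher is worse) to a Nagios state."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(service, state, message, perfdata=""):
    """Build the single status line that Nagios reads from stdout."""
    line = f"{service} {STATUS_TEXT[state]} - {message}"
    if perfdata:
        line += f" | {perfdata}"  # performance data follows a pipe
    return line


def main(argv):
    # Hypothetical CLI: sensor.py <measured_seconds> <warn> <crit>
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:  # missing or non-numeric arguments
        print(format_output("RESPONSE", UNKNOWN, "bad arguments"))
        return UNKNOWN
    state = evaluate(value, warn, crit)
    print(format_output("RESPONSE", state, f"response time {value:.2f}s",
                        f"time={value}s;{warn};{crit}"))
    return state  # becomes the exit code Nagios interprets


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the exit code carries the state, the same contract can be met by a shell script, a Perl script or a compiled binary.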
Nagios-based Grid Monitoring
• Monitoring CRO-GRID Infrastructure (2004-2006)
  – Globus Toolkit Pre-WS & WS, UNICORE, other services
  – active recovery of services
  – still in production within CRO NGI
• Monitoring EGEE resources in Central Europe (CE)
  – core services since mid-2006
  – all CE sites for 1st line support since September 2006
  – centralized deployment: single server @ SRCE
  – http://nagios.ce-egee.org
Architecture
[Figure: architecture diagram. Components: Nagios with its web interface, SAM gatherer, VOMS proxy certificate with credential refresh, sensor descriptions, gathering of node information. Monitored grid services: CE, SE, Site BDII, BDII, LFC, WMS, MON.]
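The SAM gatherer brings results that SAM has already collected into Nagios instead of re-running those tests. The slides do not say how the import is done; one plausible route is the Nagios passive-check external command `PROCESS_SERVICE_CHECK_RESULT`, written to the Nagios command file (typically something like `/usr/local/nagios/var/rw/nagios.cmd`, depending on the installation). The sketch assumes that route; the status strings in `STATUS_MAP`, the host names and the service names are hypothetical.

```python
import time

# Hypothetical mapping from SAM-style status strings to Nagios return codes
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). The real SAM vocabulary may differ.
STATUS_MAP = {"ok": 0, "warn": 1, "error": 2, "crit": 2}


def passive_result(host, service, status, output, timestamp=None):
    """Format one Nagios external command submitting a passive service result."""
    code = STATUS_MAP.get(status.lower(), 3)  # unrecognised status -> UNKNOWN
    ts = int(time.time() if timestamp is None else timestamp)
    # External commands are line-oriented, so keep the output on one line.
    text = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}"


def submit(lines, command_file):
    """Append commands to the Nagios command file (a named pipe in a live setup)."""
    with open(command_file, "a") as cmd:
        for line in lines:
            cmd.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical SAM results: (host, service, status, output).
    sam_results = [
        ("ce01.example.org", "sam-job-submit", "ok", "job finished"),
        ("se01.example.org", "sam-srm-put", "error", "SRM put failed"),
    ]
    # Printed rather than submitted, since the command-file path is site-specific.
    print("\n".join(passive_result(*r) for r in sam_results))
```

Imported results then appear in Nagios like any other service state, so they can take part in the same notifications and dependencies as actively checked services.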
Grid Extensions
• Grid sensors
  – Security facilities & services
    • CA distribution, Certificate lifetime, MyProxy, VOMS, VOMS Admin
  – Monitoring & information services
    • R-GMA, BDII, MDS, GridICE
  – Job management services
    • Globus Gatekeeper, RB, WMS, WMProxy, Job matching
  – File management services
    • GridFTP, SRM, DPNS, LFC
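A certificate lifetime sensor essentially compares a certificate's `notAfter` date with warning and critical thresholds. The sketch below assumes the `notAfter` string has already been obtained, for example from `ssl.SSLSocket.getpeercert()` or from `openssl x509 -enddate -noout` with the `notAfter=` prefix removed; the 15- and 7-day thresholds are hypothetical defaults, not values from the slides.

```python
import ssl
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
SECONDS_PER_DAY = 86400


def check_cert_lifetime(not_after, now=None, warn_days=15, crit_days=7):
    """Return (state, message) for a certificate with the given notAfter field.

    not_after uses the OpenSSL text form, e.g. "Jun  1 12:00:00 2008 GMT".
    """
    try:
        expires = ssl.cert_time_to_seconds(not_after)  # seconds since epoch (UTC)
    except ValueError:
        return UNKNOWN, f"cannot parse notAfter {not_after!r}"
    now = time.time() if now is None else now
    days_left = (expires - now) / SECONDS_PER_DAY
    if days_left < 0:
        return CRITICAL, f"certificate expired {-days_left:.1f} days ago"
    if days_left <= crit_days:
        return CRITICAL, f"certificate expires in {days_left:.1f} days"
    if days_left <= warn_days:
        return WARNING, f"certificate expires in {days_left:.1f} days"
    return OK, f"certificate valid for {days_left:.1f} more days"


if __name__ == "__main__":
    # Hypothetical notAfter value; a real sensor would read it from the certificate.
    state, message = check_cert_lifetime("Jun  1 12:00:00 2008 GMT")
    print(message)
    raise SystemExit(state)
```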
Grid Extensions
• Sensor hierarchy
• Automatic recovery
  – both local and remote services
  – security handled with sudo
• Certificate-based authentication for the web interface
• NCG, SAM gatherer, credential management
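The sensor hierarchy is what allows only relevant notifications to be sent: when a service that other sensors depend on (for example the host itself, or a gatekeeper) is already failing, the failures of the dependent sensors are not reported separately. In Nagios this is configured with `servicedependency` definitions; the Python below is only a simplified model of the idea with hypothetical service names, not Nagios's own algorithm.

```python
from typing import Dict, List, Optional

OK = 0  # Nagios state code for OK; any other value counts as a failure here


def failing_ancestor(service: str, states: Dict[str, int],
                     depends_on: Dict[str, List[str]]) -> Optional[str]:
    """Return a failing service that `service` depends on, directly or transitively."""
    stack = list(depends_on.get(service, []))
    seen = set()
    while stack:
        parent = stack.pop()
        if parent in seen:  # tolerate shared or cyclic dependencies
            continue
        seen.add(parent)
        if states.get(parent, OK) != OK:
            return parent
        stack.extend(depends_on.get(parent, []))
    return None


def should_notify(service: str, states: Dict[str, int],
                  depends_on: Dict[str, List[str]]) -> bool:
    """Notify only for failures that are not explained by a failing dependency."""
    if states.get(service, OK) == OK:
        return False
    return failing_ancestor(service, states, depends_on) is None


# Hypothetical hierarchy: job submission needs the gatekeeper, which needs the host.
DEPENDS_ON = {
    "job-submit": ["gatekeeper"],
    "gatekeeper": ["host-alive"],
    "gridftp": ["host-alive"],
}

if __name__ == "__main__":
    # The host is down, so every check on it fails; only the root cause alerts.
    states = {"host-alive": 2, "gatekeeper": 2, "job-submit": 2, "gridftp": 2}
    for svc in states:
        print(svc, "notify" if should_notify(svc, states, DEPENDS_ON) else "suppressed")
```

With the host down, only `host-alive` is reported and the three dependent failures are suppressed, which is the problem isolation the hierarchy is meant to provide.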
Statistics
• EGEE implementation statistics
  – 69 hosts
  – 570 services actively monitored
  – 1029 service results imported from SAM
• Nagios server statistics (last month)
Demo
EGEE implementation web interface
Contributions to WLCG Grid Service Monitoring WG
• All sensors rewritten to be compliant with the Probe specification
• Developed an interface to Nagios data compliant with the Data exchange format
• Nagios-based prototype
  – several grid extensions used (NCG, credential management, SAM gatherer)
Future Work
• Using our extensions at the site level
• Distributed monitoring deployment
  – hierarchy of Nagios servers
• Migration of credential management to robot certificates
• Further sensor development
• Service check execution optimization
  – active vs. passive checks
Conclusions
• Nagios
  – highly configurable monitoring framework with notifications, service dependencies, …
  – simple, programming language-agnostic sensor API
• Grid extensions
  – integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
  – sensors for key grid services
• Nagios @ grid
  – enables better site availability
  – admins get only relevant notifications
Thank You!!
Questions?