Health Monitoring System for OpenStack
OpenStack collaborates with Biarca to Develop a Health
Monitoring Tool for Swift Clusters
Overview
OpenStack is open source software that provides Infrastructure as a Service (IaaS) for building and
managing cloud computing platforms for public and private clouds. It is backed by some of the biggest
companies in software development and web hosting, as well as thousands of individual community
members. OpenStack is managed by the OpenStack Foundation, a non-profit organization that oversees
both development and community-building around the project. OpenStack is a loosely coupled
distributed system, with many “moving parts.” OpenStack Swift is the part that handles object storage.
Key Customer Requirements
OpenStack defines Swift as a highly available, distributed, eventually consistent object store. It can be
used by organizations to store lots of data efficiently, safely and cheaply. The major components of a
Swift cluster are proxy server nodes and storage server nodes. These nodes run many processes and
services to keep the cluster up and running and to provide overall availability. It is important to be able to
monitor what is going on inside the cluster, depict the health of the storage, and predict failures. Tracking
server-level metrics such as CPU utilization, load, memory consumption, and disk usage and utilization is
necessary but not sufficient. General server management tools exist, as do a
few ways to gather metrics for a Swift cluster, such as Swift Recon, Swift StatsD, Swift Dispersion, and
Swift Informant, but none of these is sufficient on its own to locate anomalies in such a complex
ecosystem. There are multiple approaches to Swift monitoring, and predicting failures requires combining
several of them.
Biarca, a leading professional services company, designed a customized solution that would precisely
predict failures in object storage clusters.
Unique Customized Solutions from Biarca
The Biarca team did a thorough study of Swift cluster metrics and behaviors. They developed an open
source application, Tulsi, to monitor the health of a Swift cluster. This application checks the status of drives
and services on each node within the Swift cluster, and provides a graphical layout of the cluster
along with the status of each node. The application also applies an anomaly detection (machine learning)
algorithm on top of the StatsD metrics to determine the health of the cluster.
The chief functions of this application are:
● Monitor the status of drives in the cluster
● Monitor the status of Swift services on each node in the cluster
● Monitor the logs of StatsD metrics in the system
● Apply anomaly detection algorithm on StatsD metrics
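To make the last two functions more concrete, the following is a minimal sketch of how anomaly detection over StatsD metrics might look. It is not Tulsi's actual algorithm, which this document does not describe. It assumes metrics arrive in the standard StatsD line format (`name:value|type`) and swaps in a simple rolling z-score test in place of whatever machine learning method the tool uses. The function names, the metric name, the window size, and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def parse_statsd_line(line):
    """Parse a StatsD line such as 'name:12|ms' into (name, value, type)."""
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|")[:2]  # ignore an optional sample rate
    return name, float(value), metric_type


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate sharply from recent history.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` values preceding it.
    Both parameters are illustrative assumptions, not tuned settings.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical timing samples (milliseconds) for a made-up metric name.
lines = [
    "proxy-server.GET.timing:%d|ms" % v
    for v in (12, 11, 13, 12, 11, 12, 95, 12)
]
timings = [parse_statsd_line(line)[1] for line in lines]
suspect = find_anomalies(timings)
```

Here the seventh sample (index 6) stands out against the preceding five and would be flagged. A production detector would need to handle seasonality and gradual drift, which this sketch does not attempt.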
This is a screenshot of Tulsi when the system is healthy.
This is a screenshot of Tulsi when a disk is down.
This is a screenshot of Tulsi when a node is down.
Business Benefits
Normally, in redundant systems, the failure of a component requires corrective action. In
the case of Swift clusters, however, failures are expected and the system self-heals up to a point. The tricky
part is figuring out when to ignore a failure and when to act on it. This is where the Health Monitoring
tool is invaluable. It identifies failures occurring within the cluster and discerns when action
needs to be taken, thereby helping to ensure the high availability of a Swift cluster at optimal cost. The
tool reports actionable failures through a user-friendly GUI, so that the
administrator can see failures as they occur and plan when to take corrective action.
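The ignore-or-act decision described above can be illustrated with a small sketch. This is not the rule set used by the Health Monitoring tool, which this document does not specify. It assumes a hypothetical policy: a cluster with three replicas can tolerate a short-lived failure on a single node, but failures that span several nodes or persist past a grace period call for action. The class, function, and threshold values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriveFailure:
    node: str          # hostname of the storage node (hypothetical field)
    drive: str         # device name on that node
    minutes_down: int  # how long the drive has been unavailable


def classify(failures, replica_count=3, grace_minutes=60):
    """Return 'ok', 'watch', or 'act' for a list of drive failures.

    Assumed policy, for illustration only:
      - no failures: 'ok'
      - failures on replica_count - 1 or more distinct nodes: 'act',
        since some data may then depend on a single remaining replica
      - any failure older than grace_minutes: 'act'
      - otherwise: 'watch', leaving the cluster to self-heal
    """
    if not failures:
        return "ok"
    failed_nodes = {f.node for f in failures}
    if len(failed_nodes) >= replica_count - 1:
        return "act"
    if any(f.minutes_down > grace_minutes for f in failures):
        return "act"
    return "watch"


recent = [DriveFailure("node1", "sdb", 5)]
stale = [DriveFailure("node1", "sdb", 120)]
spread = [DriveFailure("node1", "sdb", 5), DriveFailure("node2", "sdc", 5)]
```

Under these assumptions, `classify(recent)` yields `"watch"`, while `classify(stale)` and `classify(spread)` both yield `"act"`. Real thresholds would depend on the cluster's replica count, ring layout, and operational cost of intervention.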
Customer Testimonial
"As part of EVault/Seagate, we implemented a public cloud storage offering using OpenStack
Swift. We found that while the core storage functionality of Swift is rock-solid, the manageability and
monitoring aspect has numerous gaps. The Tulsi project from Biarca addresses this exact problem
making Swift a lot more usable. I am very excited to see the code being open-sourced and am hopeful
that a community will form around it!"
Amar Kapadia, OpenStack Swift Blogger
Customer Solution at a Glance
First, deploy a Swift cluster, then install the health monitoring solution modules in the cluster. The
Swift cluster is then monitored, and any failures in services and disks are reported
as they occur. The project is open source and can be downloaded from the following GitHub location:
https://github.com/vedgithub/tulsi.