Health Monitoring System for OpenStack

OpenStack Collaborates with Biarca to Develop a Health Monitoring Tool for Swift Clusters

Overview

OpenStack is open source software that provides Infrastructure as a Service (IaaS) for building and managing public and private cloud computing platforms. It is backed by some of the biggest companies in software development and web hosting, as well as by thousands of individual community members. OpenStack is managed by the OpenStack Foundation, a non-profit organization that oversees both development of and community-building around the project.

OpenStack is a loosely coupled distributed system with many “moving parts.” OpenStack Swift is the component that handles object storage.

Key Customer Requirements

OpenStack defines Swift as a highly available, distributed, eventually consistent object store. Organizations can use it to store large amounts of data efficiently, safely and cheaply. The major components of a Swift cluster are proxy server nodes and storage server nodes. These nodes run many processes and services that keep the cluster running and available.

It is important to be able to monitor what is going on inside the cluster, assess the health of the storage, and predict failures. Tracking server-level metrics such as CPU utilization, load, memory consumption, and disk usage and utilization is necessary but not sufficient. General server management tools exist, and there are a few ways to gather metrics for a Swift cluster, such as Swift Recon, Swift StatsD, Swift Dispersion, and Swift Informant, but none of these is sufficient on its own to locate anomalies in such a complex ecosystem. Swift can be monitored in several ways, and predicting failures requires combining several of these approaches.

Biarca, a leading professional services company, designed a customized solution to predict failures in object storage clusters.
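Swift StatsD refers to Swift's ability to emit its internal metrics using the StatsD protocol, where each metric travels as a plain-text line of the form name:value|type, optionally followed by |@sample_rate. As a minimal sketch of the first step any StatsD-based monitor has to perform, the Python below parses such lines into records. The StatsdMetric class and parse_statsd_line function are hypothetical names for illustration, not part of Swift or Tulsi, and the metric names in the usage example are only illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsdMetric:
    """One parsed StatsD sample (hypothetical record type)."""
    name: str
    value: float
    kind: str  # StatsD type field, e.g. "c" (counter), "ms" (timer), "g" (gauge)
    sample_rate: float = 1.0


def parse_statsd_line(line: str) -> Optional[StatsdMetric]:
    """Parse a line such as 'name:12.5|ms|@0.5'; return None if it is malformed."""
    line = line.strip()
    if ":" not in line:
        return None
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        return None
    try:
        value = float(fields[0])
    except ValueError:
        return None
    sample_rate = 1.0
    # The optional third field carries the sample rate, prefixed with '@'.
    if len(fields) >= 3 and fields[2].startswith("@"):
        try:
            sample_rate = float(fields[2][1:])
        except ValueError:
            return None
    return StatsdMetric(name=name, value=value, kind=fields[1], sample_rate=sample_rate)


if __name__ == "__main__":
    # Illustrative metric lines; the last one is deliberately malformed.
    for raw in ["proxy-server.object.GET.200.timing:12.5|ms",
                "object-server.async_pendings:1|c|@0.5",
                "garbage"]:
        print(parse_statsd_line(raw))
```

Malformed lines return None rather than raising, so a collector can skip bad packets without stopping. The sample rate is kept so that counter values can later be scaled by 1/sample_rate.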
Unique Customized Solutions from Biarca

The Biarca team did a thorough study of Swift cluster metrics and behaviors and developed Tulsi, an open source application that monitors the health of a Swift cluster. Tulsi checks the status of the drives and services on each node in the cluster and provides a graphical layout of the cluster along with the status of each node. It also applies an anomaly detection (machine learning) algorithm to the StatsD metrics to determine the health of the cluster.

The chief functions of this application are:
● Monitor the status of drives in the cluster
● Monitor the status of Swift services on each node in the cluster
● Monitor the logs of StatsD metrics in the system
● Apply an anomaly detection algorithm to the StatsD metrics

Screenshot of Tulsi when the system is healthy.
Screenshot of Tulsi when a disk is down.
Screenshot of Tulsi when a node is down.

Business Benefits

Normally, in redundant systems, the failure of a component requires corrective action. In Swift clusters, however, failures are expected, and the system heals itself to a point. The tricky part is therefore deciding when to ignore a failure and when to act on it. This is where the health monitoring tool is invaluable: it identifies failures occurring within the cluster and discerns when action needs to be taken, thereby helping ensure the high availability of a Swift cluster at optimal cost.

The tool reports actionable failures through a user-friendly GUI, so administrators can see failures as they occur and plan when to take corrective action.

Customer Testimonial

"As part of EVault/Seagate, we implemented a public cloud storage offering using OpenStack Swift. We found that while the core storage functionality of Swift is rock-solid, the manageability and monitoring aspect has numerous gaps.
The Tulsi project from Biarca addresses this exact problem making Swift a lot more usable. I am very excited to see the code being open-sourced and am hopeful that a community will form around it!"
Amar Kapadia, OpenStack Swift Blogger

Customer Solution at a Glance

First, deploy a Swift cluster. Then install the health monitoring modules on the cluster. The cluster is now monitored, and any failures in services or disks are reported as they occur.

The project is open source and can be downloaded from GitHub: https://github.com/vedgithub/tulsi.
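This case study does not say which anomaly detection algorithm Tulsi applies to the StatsD metrics. To illustrate the general idea only, here is a minimal sketch using a swapped-in technique, a rolling z-score detector: a sample is flagged when it lies more than threshold population standard deviations from the mean of the preceding window samples. The function name rolling_zscore_anomalies, the window size, and the threshold are hypothetical choices, not Tulsi's.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def rolling_zscore_anomalies(values: Iterable[float], window: int = 20,
                             threshold: float = 3.0) -> List[int]:
    """Return indices of samples more than `threshold` standard deviations
    from the mean of the previous `window` samples (hypothetical parameters)."""
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[int] = []
    for i, v in enumerate(values):
        # Only score a sample once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # With a perfectly flat history (sigma == 0), any change is flagged.
            if (sigma == 0 and v != mu) or (sigma > 0 and abs(v - mu) / sigma > threshold):
                anomalies.append(i)
        history.append(v)
    return anomalies


if __name__ == "__main__":
    # Hypothetical proxy GET latencies in ms, with one injected spike at index 25.
    latencies = [10.0 + (i % 3) for i in range(30)]
    latencies[25] = 80.0
    print(rolling_zscore_anomalies(latencies, window=20, threshold=3.0))
```

Every sample, including flagged ones, is added to the history, so a spike inflates the baseline for the next window samples. A production detector might exclude anomalies from the baseline or use a more robust statistic such as the median.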
© Copyright 2024