VSM Introduction & Major Features
Wang, Yaguang
Ferber, Dan
March 2015
Agenda
• Overview
• Architecture & Key Components
• Major Features
Overview
VSM Overview
• VSM (Virtual Storage Manager) is an open source Ceph management tool developed by Intel and announced at the November 2014 OpenStack summit in Paris. It is designed to make the day-to-day management of Ceph easier for storage administrators.
• Home page: https://01.org/virtual-storage-manager
• Code repository: https://github.com/01org/virtual-storage-manager
• Issue tracking: https://01.org/jira/browse/VSM
• Mailing list: http://vsm-discuss.33411.n7.nabble.com/
VSM Overview
• VSM is designed for bundled ceph storage appliance, it creates ceph cluster for management and
monitoring. So far it runs on Centos 6.5 with Ceph firefly and openstack havana icehouse.
Allows Easier Ceph Creation and
Management
Creates Ceph cluster
Manages Ceph
Monitors Ceph
Web-based UI
Administrator-friendly interface
for management and
monitoring
VSM API Interfaces
For automation
5
Typical VSM-Managed Cluster
VSM Controller – dedicated server or server instance
Server Nodes
• Are members of the VSM-managed Ceph cluster
• A VSM agent runs on every server in the VSM-managed cluster
Network configuration
• Ceph public subnet (10GbE or InfiniBand) – carries data traffic between clients and Ceph cluster servers
• Administration subnet (GbE) – carries administrative communications between the VSM controller and agents, and administrative communications between Ceph daemons
• Ceph cluster subnet (10GbE or InfiniBand) – carries data traffic between Ceph storage nodes for replication and rebalancing
OpenStack-administered network (optional)
• Optionally connected to VSM via an SSH connection
• Allows VSM to “tell” OpenStack about Ceph storage pools
[Diagram: client nodes reach the server nodes over RADOS on the Ceph public subnet; each server node runs a VSM agent, OSDs with SSD journals, and optionally a monitor; the VSM controller reaches the agents over the administration subnet, and the optional OpenStack admin network connects the controller to OpenStack.]
Architecture & Key Components
VSM Architecture
[Diagram: the Web UI and python-vsmclient call the API service; the API hands work to the scheduler and conductor, which dispatch over RabbitMQ to a storage agent running on each storage node.]
Key Components: Overview
• python-vsmclient – a client for the VSM API. It consists of a Python API (the vsmclient module) and a command-line script (vsm); each implements 100% of the VSM API.
• vsm – the major module for Ceph management.
• vsm-dashboard – the web-based management interface for VSM.
• vsm-deploy – the Ceph deployment toolkit provided by VSM.
Key Components: Relationships
[Diagram: vsm-dashboard and the vsmclient CLI call the VSM component through its REST API; VSM reaches Ceph over SSH; as 3rd-party components, OpenStack’s Keystone, Nova, and Cinder integrate with VSM via REST and SSH, with vsm-deploy handling deployment.]
Key Components: Interaction Flow
Operator path (VSM controller node):
1. The operator’s web browser connects to the VSM UI portal (Apache/Django dashboard) over HTTPS with username/password.
2. The dashboard authenticates via REST against Keystone (the authentication center, listening on ports 5000 & 35357).
3. The dashboard makes a library call into vsmclient.
4. vsmclient issues an HTTPS REST request to the VSM API (which exposes the API on port 8778).
5. The VSM API authenticates the request via HTTPS REST against Keystone.
6.–8. The VSM API sends RPC messages through RabbitMQ (listening on port 5672) to the VSM scheduler (which schedules and dispatches tasks to agents and the conductor) and to the VSM conductor (which manages DB access to MySQL, listening on port 3306, via SQLAlchemy and the MariaDB client).
9. The VSM web client proxy holds a socket connection toward the agents.
10. RPC messages reach the VSM agents on the Ceph storage/monitor nodes, which issue Ceph command calls against the local Ceph daemons.
11. VSM authenticates and registers with the OpenStack Nova controller via REST over an SSH channel.
External client path:
a. HTTPS authenticate with username/password, get a token.
b. HTTPS request with the token.
[The original diagram’s legend distinguishes VSM software, 3rd-party software, customized 3rd-party software, and authentication messages.]
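The first two steps of this flow (password authentication against Keystone, then token-bearing REST calls to the VSM API) can be sketched as below. The host name `vsm-controller`, the credentials, and the helper names are illustrative assumptions; only the ports (5000/35357 for Keystone, 8778 for the VSM API) come from the diagram.

```python
import json

# Illustrative endpoints; host name and credentials are assumptions.
KEYSTONE_TOKENS_URL = "https://vsm-controller:5000/v2.0/tokens"  # Keystone listens on 5000 & 35357
VSM_API_URL = "https://vsm-controller:8778"                      # VSM API listens on 8778

def build_auth_request(username, password, tenant):
    """Keystone v2.0 password-authentication body (operator step 1 / client step a)."""
    return {
        "auth": {
            "tenantName": tenant,
            "passwordCredentials": {"username": username, "password": password},
        }
    }

def auth_headers(token):
    """Headers for subsequent token-bearing REST requests (client step b)."""
    return {"X-Auth-Token": token, "Content-Type": "application/json"}

if __name__ == "__main__":
    print(json.dumps(build_auth_request("admin", "secret", "admin"), indent=2))
```

In the Havana/Icehouse era that VSM targets, Keystone returns the token in the `access.token.id` field of the response, and clients pass it back in the `X-Auth-Token` header.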
Major Features
• Cluster Management
  - Managing monitors and servers
  - Replacing failed disks/servers
  - Managing storage pools (replicated, erasure-coded, or cache-tiered)
  - REST API & CLI
• Cluster Monitoring
  - Cluster health and OSD/PG/monitor status on the dashboard
  - Capacity utilization for storage groups and pools
• Configuration Management
  - Creates and controls Ceph cluster configuration through pre-defined manifests
  - Operator-selectable configuration options are defined and validated by the OEM or system integrator
• Capacity Management
  - Grouping storage devices by function or performance
  - Monitoring capacity for storage groups and pools
• Cooperation
  - Pools for OpenStack
• Openness
  - Apache v2 license
  - Written in Python
  - Bundle (CentOS 6.5, Ceph Firefly 0.80.x, OpenStack Havana/Icehouse)
What it does…
• Web-based UI – administrator-friendly interface for management and monitoring
• Configuration management – storage groups aggregate similar drives; zones aggregate drives within failure domains
• Capacity management – pools segregate allocation by application; capacity utilization by storage group and pool
• Cluster management – manage capacity growth; manage server & disk maintenance
• Cluster monitoring – capacity and performance; OSD, PG, and monitor state
• VSM APIs – software interface supports automation
[Diagram: from the Internet, the operator passes the data center firewall and reaches the VSM controller (controller node server) over HTTP; the controller talks to the VSM agent on each Ceph node (OSDs/monitor/MDS) via socket, and to the OpenStack Nova controller server(s) via SSH; client servers and cluster servers sit inside the data center.]
Management framework = consistent configuration; operator-friendly interface for management & monitoring; software interface supports automation.
What it is…
VSM Controller Software
• Runs on a dedicated server (or server instance)
• Connects to the Ceph cluster through the VSM agents
• Connects to the OpenStack Nova controller (optional) via SSH
• Never touches clients or client data
VSM Agent Software
• Runs on every server in the Ceph cluster
• Relays server configuration & status information to the VSM controller
Managing Servers and Disks
Servers can host more than one type of drive; drives with similar performance characteristics are identified by Storage Class. Drives with the same Storage Class are grouped together in Storage Groups, and each Storage Group is paired with a specific Storage Class.

Example storage classes and their storage groups:
• 7200_RPM_HDD → “Capacity”
• 10K_RPM_HDD → “Performance”
• 15K_RPM_HDD → “High Performance”

VSM monitors Storage Group capacity utilization and warns at a threshold.
Storage Classes and Storage Groups are defined in the cluster manifest file; drives are identified by Storage Class in the server manifest file.
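A minimal sketch of the grouping rule described above: drives tagged with a Storage Class land in the paired Storage Group, and utilization is checked against a near-full threshold. The drive inventory and helper names are hypothetical; the 65% default matches the storage_group_near_full_threshold value in the cluster manifest.

```python
from collections import defaultdict

# Hypothetical inventory: (device, storage class) pairs as a server manifest
# would tag them; the class-to-group pairing comes from the cluster manifest.
DRIVES = [
    ("/dev/sda", "7200_RPM_HDD"),
    ("/dev/sdb", "7200_RPM_HDD"),
    ("/dev/sdc", "10K_RPM_HDD"),
    ("/dev/sdd", "15K_RPM_HDD"),
]
STORAGE_GROUPS = {
    "7200_RPM_HDD": "Capacity",
    "10K_RPM_HDD": "Performance",
    "15K_RPM_HDD": "High Performance",
}

def group_drives(drives, class_to_group):
    """Place each drive into the storage group paired with its storage class."""
    groups = defaultdict(list)
    for device, storage_class in drives:
        groups[class_to_group[storage_class]].append(device)
    return dict(groups)

def storage_group_near_full(used_gb, total_gb, threshold_pct=65):
    """Warn when a storage group's utilization crosses the near-full threshold."""
    return 100.0 * used_gb / total_gb >= threshold_pct
```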
Managing Failure Domains
Servers can be grouped into failure domains. In VSM, failure domains are identified by zones. Zones are placed under each Storage Group, and drives in each zone are placed in their respective storage group.

In the example, six servers are placed in three different zones (Zone 1, Zone 2, Zone 3). VSM creates three zones under each storage group (Capacity = 7200_RPM_HDD, Performance = 10K_RPM_HDD) and places the drives in their respective storage groups and zones.

Zones are defined in the cluster manifest file; zone membership is defined in the server manifest file.
(The diagram also shows an alternative layout: one zone with server-level replication.)
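The zone placement described above can be sketched as follows, assuming a hypothetical six-server inventory; VSM derives the same structure from the cluster and server manifests.

```python
from collections import defaultdict

# Hypothetical six-server example: each server declares a zone (server
# manifest) and counts of drives per storage class.
SERVERS = [
    {"zone": "zone1", "drives": {"7200_RPM_HDD": 2, "10K_RPM_HDD": 2}},
    {"zone": "zone1", "drives": {"7200_RPM_HDD": 2}},
    {"zone": "zone2", "drives": {"7200_RPM_HDD": 2, "10K_RPM_HDD": 2}},
    {"zone": "zone2", "drives": {"10K_RPM_HDD": 2}},
    {"zone": "zone3", "drives": {"7200_RPM_HDD": 2}},
    {"zone": "zone3", "drives": {"10K_RPM_HDD": 2}},
]
CLASS_TO_GROUP = {"7200_RPM_HDD": "Capacity", "10K_RPM_HDD": "Performance"}

def build_layout(servers, class_to_group):
    """Return {storage group: {zone: drive count}}: zones are created under
    each storage group and each drive lands in its group's matching zone."""
    layout = defaultdict(lambda: defaultdict(int))
    for server in servers:
        for storage_class, count in server["drives"].items():
            layout[class_to_group[storage_class]][server["zone"]] += count
    return {group: dict(zones) for group, zones in layout.items()}
```

With this inventory, each of the two storage groups ends up with all three zones under it, mirroring the slide's example.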
VSM Controller: Cluster Manifest
The VSM controller runs on a dedicated server or server instance.
 The VSM controller uses the cluster manifest file to define the cluster meta-information (storage classes, storage groups, zones, subnets) and cluster settings (capacity warning thresholds, erasure-coding and cache-tier settings).
Meta-information
 The cluster manifest defines the storage classes, storage groups, and zones to be used in the Ceph cluster; it also defines the subnets for management, public, and cluster traffic.
Cluster settings
 The cluster manifest provides a few settings to help tune cluster behavior, such as capacity thresholds and heartbeat intervals.
 Default settings for erasure-coded pools and cache tiering are also covered.
VSM Controller: Cluster Manifest File
The cluster manifest file resides on the VSM controller server. It tells VSM how to organize storage devices, how the network is configured, and other management details.

Storage classes defined:
[storage_class]
7200_rpm_sata
10krpm_sas
ssd
ssd_cached_7200rpm_sata
ssd_cached_10krpm_sas

Storage groups:
[storage_group]
#format: [storage group name] ["user friendly storage group name"] [storage class]
high_performance "High_Performance_SSD" ssd
capacity "Economy_Disk" 7200_rpm_sata
performance "High_Performance_Disk" 10krpm_sas
value_performance "High_Performance_Disk_with_ssd_cached_Acceleration" ssd_cached_10krpm_sas
value_capacity "Capacity_Disk_with_ssd_cached_Acceleration" ssd_cached_7200rpm_sata

Cluster name:
[cluster]
cluster_a

Data disk file system:
[file_system]
xfs

Network configuration:
[management_addr]
192.168.123.0/24
[ceph_public_addr]
192.168.124.0/24
[ceph_cluster_addr]
192.168.125.0/24

Zones:
[zone]
zone0
zone1

Settings:
[settings]
storage_group_near_full_threshold 65
storage_group_full_threshold 85
ceph_near_full_threshold 75
ceph_full_threshold 90
pg_count_factor 100
heartbeat_interval 6
osd_heartbeat_interval 7
osd_heartbeat_grace 21

EC pool profiles:
[ec_profiles]
#format: [profile-name] [path-to-plugin] [plugin-name] [pg_num value] [json format key/value]
#the key/value strings should not have spaces
default_profile /usr/lib64/ceph/erasure-code jerasure 3 {"k":2,"m":1,"technique":"reed_sol_van"}
#profile_name2 /usr/lib64/ceph/erasure-code jerasure 6 {"k":4,"m":2,"technique":"reed_sol_van"}

Cache tier default parameters:
[cache_tier_defaults]
ct_hit_set_count 1
ct_hit_set_period_s 3600
ct_target_max_mem_mb 1000000
ct_target_dirty_ratio 0.4
ct_target_full_ratio 0.8
ct_target_max_objects 1000000
ct_target_min_flush_age_m 10
ct_target_min_evict_age_m 20
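A minimal sketch of how the [section]-based manifest layout could be read into a dictionary; this illustrates the file format only and is not VSM's actual parser.

```python
def parse_manifest(text):
    """Read an INI-like manifest into {section: [entry lines]} (sketch).

    Blank lines and '#' comments are skipped; everything else is kept as a
    raw entry line under the most recent [section] header.
    """
    sections, current = {}, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

# Abbreviated excerpt of the cluster manifest shown above.
SAMPLE = """\
[cluster]
cluster_a
[file_system]
xfs
[settings]
storage_group_near_full_threshold 65
ceph_full_threshold 90
"""
```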
VSM Agent: Server Manifest
A VSM agent runs on every server managed by VSM.
 The VSM agent uses the server manifest file to identify and authenticate with the VSM controller and to describe the server configuration.
Discovery and authentication
 To be added to a cluster, the server manifest file must contain the IP address of the VSM controller and a valid authentication key.
 The authentication key is generated by the VSM controller in advance and must be applied to the server manifest before the agent daemon is launched.
Server configuration
 VSM relies on the server manifest to identify and classify data devices and associated journals. VSM does not have knowledge of how storage devices have been partitioned.
 Devices and partitions are specified “by path” to ensure that paths remain constant in the event of a device removal or failure.
VSM Agent: Server Manifest File
The server manifest file resides on each server that VSM manages. It defines how storage is configured on the server, identifies the roles (Ceph daemons) that should run on the server, and authenticates the server to the VSM controller.

Address of the VSM controller:
[vsm_controller_ip]
#10.239.82.168

Include “storage” if the server will host OSD daemons; include “monitor” if it will host monitor daemons:
[role]
storage
monitor

Authentication key provided by the authentication key tool on the VSM controller node:
[auth_key]
token-tenant

Storage class 7200_rpm_sata – paths to four 7200 RPM drives and their associated journal drives/partitions:
[7200_rpm_sata]
#format [sata_device] [journal_device]
%osd-by-path-1% %journal-by-path-1%
%osd-by-path-2% %journal-by-path-2%
%osd-by-path-3% %journal-by-path-3%
%osd-by-path-4% %journal-by-path-4%

Storage class 10krpm_sas – paths to four 10K RPM drives and their associated journal drives/partitions:
[10krpm_sas]
#format [sas_device] [journal_device]
%osd-by-path-5% %journal-by-path-5%
%osd-by-path-6% %journal-by-path-6%
%osd-by-path-7% %journal-by-path-7%
%osd-by-path-7% %journal-by-path-7%

No drives associated with these storage classes:
[ssd]
#format [ssd_device] [journal_device]

[ssd_cached_7200rpm_sata]
#format [intel_cache_device] [journal_device]

[ssd_cached_10krpm_sas]
#format [intel_cache_device] [journal_device]
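The device/journal pairing in the storage-class sections can be sketched as below. The %...% placeholders stand in for real /dev/disk/by-path/... paths (specified by path so they survive device removal or failure), and the function is an illustration of the format, not VSM's parser.

```python
def parse_device_pairs(section_lines):
    """Pair each data device with its journal device, as in the
    [7200_rpm_sata] section of the server manifest (sketch)."""
    pairs = []
    for line in section_lines:
        if line.startswith("#"):
            continue  # skip the '#format ...' comment line
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs

# Placeholder entries matching the example section above.
EXAMPLE_SECTION = [
    "#format [sata_device] [journal_device]",
    "%osd-by-path-1% %journal-by-path-1%",
    "%osd-by-path-2% %journal-by-path-2%",
]
```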
VSM Operation Map
• Getting Started: Log In; Create Cluster
• Navigation: Dashboard Overview
• Managing Capacity: Storage Group Status; Manage Pools; Creating Storage Pools; RBD Status
• Monitoring Cluster Health: Dashboard Overview; OSD Status; Monitor Status; PG Status; MDS Status
• Managing Servers: Manage Servers; Add & Remove Servers; Add & Remove Monitors; Stop & Start Servers
• Managing Storage Devices: Manage Devices; Restart OSDs; Remove OSDs; Restore OSDs
• Working with OpenStack: OpenStack Access; Managing Pools
• Managing VSM: Manage VSM Users; Manage VSM Configuration
Live Demo
Thank You!