Taking snapshot of a thousand dancing dolphins www.zmanda.com Twitter: @zmanda

Backup of Distributed MySQL Applications
Taking snapshot of a thousand dancing dolphins
Chander Kant
CEO
Paddy Sreenivasan
VP Engineering
www.zmanda.com
Twitter: @zmanda
Cloud Backup
Twitter @zmanda
Open Source Backup
1
Zmanda
• Worldwide Leader in Open Source Backup
• 500,000+ Protected Systems
• Open Source, Open APIs, Open Formats
• Smashes traditional backup business model
• MySQL Backup Specialist
• Zmanda Recovery Manager for MySQL
• Zmanda Cloud Backup
Protected by Zmanda
Subscribers of Enterprise Editions
Web and Media
Government
Research & Education
Telecom & Service Providers
Manufacturing & Services
Top 5 MySQL Backup Requirements
• Back up a live database with minimal impact on the application and users
• Versatile
  • Scale out = multitude of servers
  • Scale up = large databases with no increase in lock times
  • Backup of local or remote MySQL servers
• Intelligent Recovery
  • Precise restore to a particular point in time or database event
  • Fast restore in case of failure
• Global Enterprise Management
  • Manage all databases from a single entity
  • Backup automation, from scheduling and monitoring to reporting
• Easy to Use and Secure
Zmanda Recovery Manager for MySQL
[Diagram: enterprise-wide MySQL backup, showing ZRM remote to MySQL, ZRM local to MySQL, and ZRM to MySQL Cluster]
Zmanda Recovery Manager (ZRM) for MySQL
As easy as What, Where, When and How.
Backups of MySQL Running on Amazon EC2
[Diagram labels: Zmanda Management Console; EC2; EBS; Backup Catalog; Incremental Backups; S3; Full Backups]
Blazing Fast Snapshot based Full Backups
Scenario:
• 100+GB of database growing into Terabytes
• 24x7 application (i.e. no backup window)
• Active OLTP workload
• Need the ability to restore to a specific database event
Solution:
• Storage Snapshot + MySQL Logs + Automated point-and-click restore
• Solaris 10 x86
• ZFS Snapshot
• MySQL Enterprise 5.0
• ZRM
• Raw copy speed of 500 GB/hr
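The solution above pairs a storage snapshot with the MySQL logs. The following is a minimal sketch of the usual ordering for a snapshot-based backup, assuming the MySQL data directory lives on a ZFS dataset. It is not ZRM's actual implementation: the `snapshot_steps` helper and the dataset name `tank/mysql` are hypothetical, and the function only builds the list of steps without running anything.

```python
from typing import List, Tuple


def snapshot_steps(dataset: str, label: str) -> List[Tuple[str, str]]:
    """Return the ordered (kind, command) steps for a snapshot-based backup.

    The SQL steps must all run in one session, because the read lock is
    released when the session that took it disconnects.
    """
    snapshot_name = f"{dataset}@{label}"
    return [
        ("sql", "FLUSH TABLES WITH READ LOCK"),      # block writes
        ("sql", "SHOW MASTER STATUS"),               # record binlog file and position
        ("shell", f"zfs snapshot {snapshot_name}"),  # take the storage snapshot
        ("sql", "UNLOCK TABLES"),                    # resume writes
    ]


if __name__ == "__main__":
    # Hypothetical dataset and label, for illustration only.
    for kind, command in snapshot_steps("tank/mysql", "backup-001"):
        print(f"{kind:5} {command}")
```

The lock is held only while the snapshot is created; the raw copy of the snapshot contents can then proceed while the application keeps running.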
Point-in-time Recovery
1. ZRM creates unified snapshots of the data and the MySQL binary log.
2. For point-in-time recovery between T2 and T3, ZRM reads the data from snapshot T2 and replays transactions from binlog T3 up to the recovery point objective (RPO).
3. Note that ZRM can treat an in-place snapshot as a backup (which is ideal for EBS snapshots).
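The selection in step 2 can be sketched as follows. This is an illustration only, assuming snapshot times are available as sorted integers; `plan_recovery` is a hypothetical helper, not a ZRM API.

```python
from bisect import bisect_right
from typing import List, Tuple


def plan_recovery(snapshot_times: List[int], target: int) -> Tuple[int, Tuple[int, int]]:
    """Pick the snapshot to restore and the binlog window to replay.

    snapshot_times must be sorted in ascending order. Returns the time of
    the latest snapshot taken at or before target, plus the (start, stop)
    window of transactions to replay on top of that snapshot.
    """
    index = bisect_right(snapshot_times, target) - 1
    if index < 0:
        raise ValueError("target is earlier than the oldest snapshot")
    base = snapshot_times[index]
    return base, (base, target)


if __name__ == "__main__":
    # Snapshots T1, T2, T3 taken at times 100, 200, 300; recover to time 250.
    base, window = plan_recovery([100, 200, 300], 250)
    print(f"restore snapshot taken at {base}, replay binlog over {window}")
```

With snapshots at 100, 200 and 300 and a target of 250, the sketch restores the snapshot taken at 200 and replays the window from 200 to 250.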
Take a Snapshot of a Thousand Dancing Dolphins
Backup & DR Needs for a large-scale MySQL Implementation
• Application managers desire a point-in-time restore
which is coordinated across multiple servers
• IT managers want the configuration to be as identical as possible
across all nodes, so that the process of replacing nodes
becomes simple
• Depending on the application, retention policy could be
several years
• Overall application should be able to recover from
multiple node failures, human errors or sabotage, and
geographic problems (disaster, connectivity etc.)
Coordinated Backups vs. Coordinated Restore
Coordinated Backups
• Back up all nodes consistently up to a specific event
• E.g. back up all rows up to a specific Global
Sequence Number (GSN), or create a checkpoint
event specifically for backup purposes
• Cleanest backup images, but with periodic hiccups
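As a rough illustration of the GSN idea above: if each node reports the highest GSN its backup contains, the consistent point for the whole set is the smallest of those values. The sketch assumes every node's backup holds all rows up to its reported GSN; the function names and node names are hypothetical.

```python
from typing import Dict


def consistent_gsn(last_gsn_by_node: Dict[str, int]) -> int:
    """Return the highest GSN that every node's backup already contains."""
    if not last_gsn_by_node:
        raise ValueError("no nodes given")
    return min(last_gsn_by_node.values())


def excess_by_node(last_gsn_by_node: Dict[str, int]) -> Dict[str, int]:
    """Return how far each node's backup runs past the consistent GSN."""
    cut = consistent_gsn(last_gsn_by_node)
    return {node: gsn - cut for node, gsn in last_gsn_by_node.items()}


if __name__ == "__main__":
    # Hypothetical shards and GSN values.
    reported = {"shard-a": 1040, "shard-b": 1025, "shard-c": 1033}
    print(consistent_gsn(reported))
    print(excess_by_node(reported))
```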
Coordinated Backups vs. Coordinated Restore
Coordinated Restore
• Each node is backed up completely independently
of the others
• No checkpoint event
• However, more processing is required at the time of
recovery
• ZRM can be scripted to identify the common restore point in the
backed-up binary logs for every shard
• The Visual Log Analyzer feature of ZRM helps DBAs
search for these points efficiently
• Clock synchronization helps
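A minimal sketch of the per-shard search described above, assuming the binary log events of each shard have already been parsed into (timestamp, position) pairs sorted by timestamp. The `stop_positions` helper and the `skew` margin are assumptions for illustration, not ZRM features; the margin moves the cutoff earlier to allow for imperfect clock synchronization between nodes.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, int]  # (event timestamp in seconds, binlog position)


def stop_positions(events_by_shard: Dict[str, List[Event]],
                   target: float,
                   skew: float = 0.0) -> Dict[str, Optional[int]]:
    """For each shard, find the last binlog position at or before target - skew.

    Each shard's events must be sorted by timestamp. A shard with no event
    at or before the cutoff maps to None.
    """
    cutoff = target - skew
    positions: Dict[str, Optional[int]] = {}
    for shard, events in events_by_shard.items():
        timestamps = [timestamp for timestamp, _ in events]
        index = bisect_right(timestamps, cutoff) - 1
        positions[shard] = events[index][1] if index >= 0 else None
    return positions


if __name__ == "__main__":
    # Hypothetical parsed events for two shards.
    events = {
        "shard-a": [(10.0, 100), (20.0, 200), (30.0, 300)],
        "shard-b": [(12.0, 50), (25.0, 90)],
    }
    print(stop_positions(events, target=26.0, skew=2.0))
```

Each resulting position could then be used as the stopping point when that shard's binary log is replayed, for example with the `--stop-position` option of `mysqlbinlog`.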
Recover Anytime, Anywhere.
Case Study: ZRM configuration with MySQL Shards
[Diagram labels: 100 database nodes; ZRM servers; LVM Snapshots; Converted full and incremental backups; NFS; Consolidated Meta ZRM server; Shared Storage (with Deduplication); Remote Data Center]
Case Study: Restoration Scenarios
• Recovery from application errors
• Apply transactions for the node (or across nodes)
• Recovery from failed disk or node
• Apply the full backup and the incremental backups up to the latest
checkpoint
• ZRM provides portable backup images
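For the failed disk or node case above, the set of images to apply can be sketched as the latest full backup followed by every later incremental. This is an illustration under the assumption that backups are described as (time, kind) pairs; `restore_chain` is a hypothetical helper, not a ZRM command.

```python
from typing import List, Tuple

Backup = Tuple[int, str]  # (time the backup was taken, "full" or "incremental")


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Return the latest full backup followed by all later backups, in time order."""
    ordered = sorted(backups)
    full_indexes = [i for i, (_, kind) in enumerate(ordered) if kind == "full"]
    if not full_indexes:
        raise ValueError("no full backup available")
    # Everything after the latest full backup is, by construction, incremental.
    return ordered[full_indexes[-1]:]


if __name__ == "__main__":
    history = [(1, "full"), (2, "incremental"), (3, "full"),
               (4, "incremental"), (5, "incremental")]
    print(restore_chain(history))
```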
Backup images
• The local full backup image is an LVM snapshot on the local node
• The LVM snapshot is converted into regular backups on a
weekly basis in the background
• The incremental backup data is available over NFS to the ZRM
meta backup server
• The backup images and the catalog from shared storage are
replicated to a remote data center
Backup policies
• The full and incremental backups are compressed, unless
deduplication-based storage is deployed
• The shared storage for backups can use deduplication
Restoration steps (Operator error)
• Identify offending record change
• Use Visual Log Analyzer of ZRM on hosts for the record
• Reasonable time synchronization is helpful here
• Identify prior event for the key
• Use Search in Zmanda Management Console
• Coordinated Restore Script
• An application-level script takes input from ZRM and commits
new records for all affected nodes.
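The first two steps above (find the offending change, then the prior event for the same key) can be sketched over already-parsed change events. The `ChangeEvent` type and `find_offending_and_prior` function are hypothetical stand-ins for illustration; they do not reflect how the Visual Log Analyzer or the Zmanda Management Console search works internally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    position: int  # binlog position of the event
    key: str       # key of the changed record
    value: str     # value written by this event


def find_offending_and_prior(
    events: List[ChangeEvent], key: str, bad_value: str
) -> Tuple[ChangeEvent, Optional[ChangeEvent]]:
    """Locate the event that wrote bad_value for key, and the event before it for that key."""
    prior: Optional[ChangeEvent] = None
    for event in sorted(events, key=lambda e: e.position):
        if event.key != key:
            continue
        if event.value == bad_value:
            return event, prior
        prior = event
    raise LookupError(f"no event wrote {bad_value!r} for key {key!r}")


if __name__ == "__main__":
    log = [
        ChangeEvent(10, "user-1", "credits=5"),
        ChangeEvent(20, "user-2", "credits=9"),
        ChangeEvent(30, "user-1", "credits=7"),
        ChangeEvent(40, "user-1", "credits=-1"),
    ]
    offending, prior = find_offending_and_prior(log, "user-1", "credits=-1")
    print(offending.position, prior.position if prior else None)
```

The prior event gives the application-level restore script the last known good value for the record.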
Restoration steps (Failed node)
• Restore the failed node to the last available backup
• Use the Meta ZRM server for restoration
• If a checkpoint is present, use the Visual Log Analyzer of
ZRM to identify the last restored checkpoint
• Call the application-level node synchronization procedure
Zmanda: Backup To Cloud
Zmanda Cloud Backup (For MySQL on Windows)
• Apps: Exchange, SQL Server, Oracle, SharePoint and
MySQL
• Compliant with EU Data Protection Directive 95/46
• Network Drive support
• Logical full backups only
• Can back up remote MySQL databases
Zmanda Recovery Manager in Action
• More than one million new Athletes created every month.
• Each with the ability to customize their avatars, accumulate game credits
and buy virtual prizes.
• Combination of users, identities, games-in-play, credits and prizes
generates a lot of data at a very fast pace — all of which is core to the
company's success.
• Multiple Storage Engines: InnoDB, MyISAM and Archive
• In addition to regular full backups, the company must complete an
incremental backup of MySQL every 15 minutes.
Zmanda Recovery Manager in Action
“ZRM helps us formalize and automate the backup process for all our
production data, and consolidates all backups from different systems
into one consistent platform.... Furthermore, the ZRM platform greatly
simplified our production systems' recovery scenarios by reducing the
number of steps required in the data recovery process.”
Franck Leveneur,
Senior Data Architect,
Six Degrees Games, Inc