Case Study – Disaster Recovery

Network Diagram

Diagram: the Data Centre and the Disaster Recovery Centre each host a DNS server, a Zimbra mail server, a web server, a Cisco Security Agent, and a media server.
Need – Some applications were hosted on SAN-based storage. The customer needed a multi-site DR solution on top of their Fujitsu storage and Brocade SAN. The primary and DR sites were connected by a leased line, and the recovery/failover SLA was below 5 minutes.
Our Solution

Diagram: each site has Brocade 300 SAN switches and a dual-controller Fujitsu ETERNUS DX90 storage array, with the two sites 500 m apart within the same campus.
• A simple script-based DR solution catering to a scale-out application stack.
• An incremental SAN storage replication solution that tolerates network outages.
• A node-agnostic recovery mechanism.
DC Data Centre

Diagram: a LAN switch connects the DNS server, Zimbra mail server, web server, Cisco Security Agent, DC management server, and media server; Brocade 300 SAN switches attach the servers to a dual-controller Fujitsu ETERNUS DX90 storage array, with FC connectivity to the DR site 500 m away within the same campus.
DR Data Centre

Diagram: a LAN switch connects the DNS server, Zimbra mail server, web server, Cisco Security Agent, DR management server, and media server; Brocade 300 SAN switches attach the servers to a dual-controller Fujitsu ETERNUS DX90 storage array, with FC connectivity back to the data centre 500 m away within the same campus.
DC / DR Data Centre Setup

Diagram: the DC Data Centre and the Disaster Recovery Centre host identical application stacks (DNS, Zimbra mail, web, Cisco Security Agent, and media servers) plus a Data Management Server at each site (DMS1 at the DC, DMS2 at the DR centre); each site has Brocade 300 SAN switches and a dual-controller Fujitsu ETERNUS DX90 storage array, the two sites being 500 m apart within the same campus.
Architecture Details
The DR architecture was designed around two Data Management Servers (DMS), one placed at the primary site and one at the secondary site. Each DMS virtualizes the protection storage provisioned from the Fujitsu array. We provisioned a distributed object file system to enable instant storage of virtual, point-in-time copies of data from the collection of applications. Each application was tied to an SLA that determines the lifecycle of its application data. Each SLA specifies the following (a minimal sketch of such a record follows the list):
• The frequency at which application data snapshots are taken
• The storage pool in the DMS in which they are kept, for example the Tier I pool on SAS disks for the most recent snapshots, or the de-duplicated Tier II pool on capacity-optimized SATA drives
• A retention policy directing how long they are stored
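A minimal sketch of how such an SLA record could be expressed in the script layer; the field names, tier labels, and example values are illustrative assumptions, not the actual DMS interface.

```python
# Illustrative SLA record governing the lifecycle of one application's data.
# Field names, tier labels and example values are assumptions, not the
# actual DMS interface.
from dataclasses import dataclass

@dataclass
class ApplicationSLA:
    app_name: str                 # application whose data this SLA governs
    snapshot_interval_min: int    # how often a point-in-time snapshot is taken
    storage_pool: str             # "tier1-sas" (recent) or "tier2-sata-dedup"
    retention_days: int           # how long snapshots stay in the pool

# Example SLAs for two of the hosted applications
SLAS = [
    ApplicationSLA("zimbra-mail", snapshot_interval_min=15,
                   storage_pool="tier1-sas", retention_days=7),
    ApplicationSLA("web-server", snapshot_interval_min=60,
                   storage_pool="tier2-sata-dedup", retention_days=30),
]
```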
The consolidated application data is stored on the DMS node. The data is de-duplicated to reduce the network bandwidth required to transfer it between sites. The de-duplicated, consolidated data is replicated between the two DMS nodes in sync or async mode using a changed-block-tracking mechanism. The data always flows from the primary to the secondary site.
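The sketch below illustrates the changed-block-tracking idea: each replication cycle hashes the virtualized volume in fixed-size blocks and ships only the blocks whose digests differ from the previous cycle. The block size and helper names are assumptions for illustration.

```python
# Illustrative changed-block tracking for one replication cycle.
# The volume is read in fixed-size blocks; only blocks whose digest differs
# from the previous cycle are queued for transfer to the secondary DMS.
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks (assumed granularity)

def changed_blocks(volume_path, last_digests):
    """Return (changed, digests): blocks that differ from the last cycle,
    plus the fresh per-block digests to remember for the next cycle."""
    changed, digests = [], {}
    with open(volume_path, "rb") as vol:
        offset = 0
        while True:
            block = vol.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            digests[offset] = digest
            if last_digests.get(offset) != digest:
                changed.append((offset, block))  # only these cross the wire
            offset += len(block)
    return changed, digests
```

In async mode the primary DMS would batch and ship these changed blocks on a schedule; in sync mode it would push them as part of each backup job.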
For every backup job SLA, a corresponding recovery job is provisioned with the associated metadata of the backup job. A consistent snapshot of the application data and its catalogue is kept at both the primary and secondary sites.
Recovery Cycle and State Transition

Diagram: the active server at the Data Centre is backed up to a virtualized volume on DMS1 (primary site array); DMS1 replicates sync/async to DMS2 at the Disaster Recovery Centre (secondary site array), from which the passive server is restored.
Each application server at the PR/DR sites runs in active/passive mode. The active server's data is periodically stored on the PR DMS. Data capture is based on the file sets changed since the last backup (sketched below). The PR DMS transfers the application backup data and its catalogue to the DR DMS in sync/async mode. In case of a local disk failure, the data can be recovered from the local DMS. In case of a site failure, a passive server is provisioned and the backup data is recovered from the DR DMS. Data recovery happens over the network only.
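A minimal sketch of the changed-file-set capture, assuming modification times are compared against a timestamp recorded at the previous backup; the marker-file path and function name are hypothetical.

```python
# Illustrative changed-file-set capture for the active server.
# Files modified after the previous backup's timestamp are collected;
# the marker-file path below is hypothetical.
import os
import time

STATE_FILE = "/var/lib/dms/last_backup_ts"  # hypothetical marker file

def files_changed_since_last_backup(app_root):
    last_ts = 0.0
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            last_ts = float(f.read().strip())
    changed = []
    for dirpath, _dirs, files in os.walk(app_root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_ts:
                changed.append(path)  # goes into this cycle's backup set
    with open(STATE_FILE, "w") as f:
        f.write(str(time.time()))     # remember when this capture ran
    return changed
```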
PR/DR site determination is triggered through user-defined scripts. Data flows from the PR DMS to the DR DMS only. To make site failover transparent to users, the PR and DR sites each keep a separate name server, with the same DNS name mapped to a different IP address at each site. In the user console the primary DNS is pointed at the PR DNS server and the secondary DNS at the DR DNS server, so users are always directed to the active server node of the application.
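A minimal sketch of such a user-defined failover script, assuming a simple reachability probe of the primary DMS decides whether to recover from the DR DMS; the host name, port, and recovery command are hypothetical.

```python
# Illustrative failover check: if the primary DMS is unreachable, trigger
# recovery of the applications from the DR DMS. Host name, port and the
# recovery command are hypothetical.
import socket
import subprocess

PR_DMS = ("dms1.dc.example.com", 22)  # hypothetical primary DMS endpoint
RECOVER_CMD = ["/opt/dms/bin/recover", "--from", "dms2", "--all-apps"]  # hypothetical

def site_is_up(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if not site_is_up(*PR_DMS):
        # PR site unreachable: provision the passive servers at the DR site
        # and restore application data from the DR DMS over the network.
        subprocess.run(RECOVER_CMD, check=True)
```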
TCO / Benefit Analysis
The DR setup runs over 4 Mbps of network bandwidth and tolerates network breakdowns; the dedup capability optimizes the bandwidth requirement. The solution is largely script-based, requires no over-provisioning, and is cluster-aware. The DMS server boots the application in less than 4 seconds, and the application footprint is less than 10 MB. It integrated seamlessly with the customer's SAN storage. The cost of the entire DR solution was $20K, which was 10% of the nearest competitor's quote.