Business Resilience End-to-End Traditional BCP/ DR Approach GMAC-RFC

Business Resilience
End-to-End
GMAC-RFC
Case Study
Chuck Wachter, CDRP
BRM Program Manager
952-857-6384
[email protected]
Traditional BCP/DR Approach
[Figure: traditional recovery timeline. Labels: Systems; Applications; Data; Vital Records; Lost Data; Recovery Point Objective; Notifications; Restore Communications; Restore Technology Capability; Relocate Office Equipment / Supplies; Restore Business Functions; Work Flow; Data Synchronization; Move to Alternate Site; Resume Business; Return Home; Recovery Time Objective]
Arriving at BRM
•
Significant growth created a complex and interdependent business and
IT systems environment.
•
Analysis concluded that unacceptable loss would occur from a
significant outage lasting more than 24 hours.
•
Proposed external vendor solutions could not mitigate the problem and
meet recovery requirements.
•
Required an integrated, sustaining, all-encompassing approach.
•
Determined an internal recovery solution would meet requirements,
create a lower CODB, and provide additional benefits:
– Addresses impacts ranging from day-to-day to catastrophic, rather than
catastrophic only
– Minimizes vendor contractual limitations
– Process enhancements: change management, testing, service delivery,
Incident Response, etc.
Industry Best Practices
•
Over 60% of G2000 organizations are implementing a dual data center
strategy to support continuous availability.
•
Business and IT availability classification requirements are aligned with
associated cost, application and system architecture requirements.
•
Companies that manage resilience internally achieve a greater degree of
maturity and business process alignment with their supporting IT systems.
•
The financial services industry is under increased scrutiny by regulatory agencies
due to the critical nature and impact of its services to the economy. Its
response includes:
– Investment in failover capabilities to significantly reduce the time for recovery and
to respond to widespread/regional disruptions
– A realignment from the traditional approach of recovering technology and facilities
toward a full business resumption model
– Increased frequency of exercising business and IT resilience capabilities
As real-time business requirements become more pervasive, business
continuity must be integrated throughout the corporate culture and
business processes, with clear accountability and measurement
defined to align with acceptable corporate risk.
Business Resilience Defined
Resilience is the ability and capacity to withstand and adapt to new risk
environments. A resilient organization effectively aligns its strategy, operations,
business systems, governance structure, and decision-support capabilities so
that it can uncover and adjust to continually changing risks, endure disruptions to
its primary earnings drivers, and create advantages over less adaptive
competitors.
Program Mission
The Business Resiliency program manages the organization's capabilities to
continue to provide services at any time, regardless of the event and impact.
Prioritization of investments in people, processes, technology and facilities is
based on business risk and criticality. Comprehensive testing continuously
validates the recovery capabilities, and an integrated governance model assures
transparent coordination and reporting.
Best Practice: Resiliency Model
[Figure: resiliency model. Labels: Business Processes; IT Services; Internal Sponsorship & Governance; BCDR Program; BCDR Framework; Infrastructure Capabilities; Suppliers; Operational Management]
Leadership needs to ensure that business and IT BCDR
strategies are continuously aligned to create value
Business/IT Alignment Model
Business-Driven Strategy
• Business defines requirements & strategy
• Focus is on making “today’s and future business better”
• Accomplished in conjunction with Governance, business process improvements and capabilities
• BCDR requirements are understood and integrated in business process, valuable and cost effective
IT-Enabled Strategy
• IT aligns and offers BCDR strategic capabilities to enable new growth, products/services, channels
• Creates BCDR “portfolio visibility” for business leverage
• Provides responsive, flexible technology environment
• IT BCDR Services are cost effective, responsive and measured
Approach
Business resiliency is
incorporated throughout the corporate
culture and business
processes. Every level
of the organization has
been evaluated, with
new models and
methodologies
developed and
integrated into daily
processes, ensuring
resiliency and
compliance with
regulatory requirements
at every layer.
STRATEGY
• Governance
• Continuity Strategy
• Availability Strategy
• Recovery Strategy
• Communications
• Risk Management
APPLICATIONS and DATA
• Application Architecture /
Design
• Application Availability /
Recovery
• Application Integration
• Data Backup/Recovery
• Data Security
ARCHITECTURE &
TECHNOLOGY
• Platforms & Networks
• Systems Software
• Storage
• Middleware
• Standards
ORGANIZATION
• Roles
• Responsibilities
• Skills
• Cross Organizational
Cooperation
PROCESS
• BUSINESS PROCESS
– BCP Mgmt
– Risk Mgmt
– Product Delivery & Mgmt
– Information Mgmt
• IT PROCESS
– Application Development
– Application Operations Mgmt
– Operations Service Delivery
– Operations Service Mgmt IT
• CROSS-FUNCTIONAL PROCESS
– Business Process Integration
– Partner Controls & Integration
– Overall Life-cycle Integration
FACILITIES
• Data Center Infrastructure
• Workspace Infrastructure
• Physical Security
• Environmental (Power, HVAC)
Program Vision
•
The vision for Business Resilience Management is:
– Establish BRM processes and services that are well-integrated
with business and IT planning, development and operational
processes such that enterprise-wide BRM implementation,
testing and compliance are ensured, and support business
objectives.
•
Our Strategy for accomplishing this is:
– Adopt an internal strategy to deploy BRM solutions and develop
industry partnerships to support business expansion and growth.
– Deploy a dual data center infrastructure, storage and data
architecture to support business continuous availability needs
and flexible, scalable and agile BRM solutions.
– Eliminate BRM gaps through investment in the development of
internal capabilities.
BRM Program Framework
[Figure: BRM program framework. Labels: BRM Building Blocks; System Remediation; BRM Governance & Coordination; Details behind the Building Blocks]
BRM Strategic Investment
[Figure: BRM spend chart. Axis labels: $MM; Resources; years 2003, 2004, 2005, 20xx. Series: Industry Recommended BRM Spend Based on IT Budget; BRM Strategic Investments; Projected BRM spend]
Vendor Solution
• Does not address BRM requirements
BRM Program
• Driven by Business Requirements
• Aligning Business and IT
• Value based BRM Investments
Resiliency Tier Framework (RTF)
Provides a common dialogue for Business & IT recoverability
Recovery Time Objective (RTO) tiers, with RTO Minimum Resiliency:
• TIER 0, Vital Availability: 8 Hrs; Dual DC Automated Failover
• TIER 1, Critical Availability: 24 Hrs; Dual DC Manual Failover
• TIER 2, Important Availability: 48 Hrs; Dual DC Standby Cold Restore
• TIER 3, Deferred Availability: 72 Hrs; Dual DC Drop Ship Cold Restore
Recovery Point Objective (RPO) classes, with RPO Minimum Resiliency:
• (A) Vital Data: 4 Hrs; Dual DC Remote Data Replication
• (B) Critical Data: 8 Hrs; Dual DC Remote Data Replication
• (C) Important Data: 24 Hrs; Virtual Vault Storage Backup/Restore
• (D) Deferred Data: 48 Hrs; Offsite Tape Backup/Restore
Combined tiers (RTO tier plus RPO class):
• TIER 0A, TIER 1A, TIER 2A, TIER 3A
• TIER 0B, TIER 1B, TIER 2B, TIER 3B
• TIER 0C, TIER 1C, TIER 2C, TIER 3C
• TIER 0D, TIER 1D, TIER 2D, TIER 3D
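The tier grid above can be read as a lookup from a business RTO/RPO requirement to a combined tier. The following is a minimal sketch under stated assumptions, not the program's actual tooling: the function names are hypothetical, and the rule "pick the least stringent tier whose objective still fits within the requirement, falling back to the most stringent tier" is an assumption the slides do not state.

```python
from dataclasses import dataclass

# (tier id, label, objective in hours, minimum resiliency), most stringent first.
RTO_TIERS = [
    (0, "Vital Availability", 8, "Dual DC Automated Failover"),
    (1, "Critical Availability", 24, "Dual DC Manual Failover"),
    (2, "Important Availability", 48, "Dual DC Standby Cold Restore"),
    (3, "Deferred Availability", 72, "Dual DC Drop Ship Cold Restore"),
]
RPO_CLASSES = [
    ("A", "Vital Data", 4, "Dual DC Remote Data Replication"),
    ("B", "Critical Data", 8, "Dual DC Remote Data Replication"),
    ("C", "Important Data", 24, "Virtual Vault Storage Backup/Restore"),
    ("D", "Deferred Data", 48, "Offsite Tape Backup/Restore"),
]


@dataclass(frozen=True)
class TierAssignment:
    tier: str          # combined tier, e.g. "TIER 1C"
    rto_solution: str  # RTO minimum resiliency
    rpo_solution: str  # RPO minimum resiliency


def _pick(rows, required_hours):
    """Return the least stringent row whose objective fits the requirement.

    Assumption: a requirement tighter than every row maps to the most
    stringent row (rows[0]).
    """
    chosen = rows[0]
    for row in rows:
        if row[2] <= required_hours:
            chosen = row
    return chosen


def classify(rto_hours: float, rpo_hours: float) -> TierAssignment:
    """Map a required RTO and RPO (in hours) to a combined RTF tier."""
    rto = _pick(RTO_TIERS, rto_hours)
    rpo = _pick(RPO_CLASSES, rpo_hours)
    return TierAssignment(f"TIER {rto[0]}{rpo[0]}", rto[3], rpo[3])


if __name__ == "__main__":
    # A process that must be back within 30 hours and can lose 24 hours of data.
    print(classify(30, 24))
```

Keeping the two axes as separate tables mirrors the framework's split between availability (RTO) and data (RPO), so either axis can be changed without touching the other.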
RTF Certification Standard: Tier Alignment
(1) Business RTO/RPO Requirements
[Figure: BRM Tier Application Recovery Categorization Framework. RTO tiers TIER 0 to TIER 3 against RPO classes (A) to (D), giving combined tiers TIER 0A through TIER 3D, with the RTO and RPO Minimum Resiliency solutions]
(2) Business Process, Application and System Resiliency Requirements
ENTERPRISE SYSTEMS CLASSIFICATIONS: TIER 0-A
FACILITIES ENVIRONMENT
• Application & Data Deployment Across DDCs: Full Application & Data DDC Deploy
• Office Recovery Site: Fixed Secondary Office Site, or Commercial Hotsite
• XSP Data Center Site Resiliency: Resilient SLAs
NETWORK ENVIRONMENT
• Site/Campus & Edge Data Network Resiliency: HA Redundancy
• XSP WAN/VAN/MAN/ISP/Voice Network Resiliency: HA Redundancy
• Voice & TCOMM Network: HA Redundancy
PLATFORM ENVIRONMENT
• App & DB Server Platform Resiliency: HA Redundancy
• Workstation Recovery: Prebuilt Spares
STORAGE ENVIRONMENT
• Online SAN Replication/Restore: Local & Remote DDC
• Offline Tape Backup/Restore: In Failover Mode
DATA MGMT ENVIRONMENT
• Database Resiliency: Local & Remote DDC
• File Storage Resiliency: Local & Remote DDC
APPLICATION ENVIRONMENT
• Application Architecture Resiliency: Local & Remote DDC
APPLICATION INTEGRATION ENVIRONMENT
• Middleware Architecture Resiliency: Local & Remote DDC
SECURITY MGMT ENVIRONMENT
• Security Controls & Process: Highest
(3) Remediation and Resiliency Solutions
Tier 0-A Applications:
• High Availability/Redundant Platforms
• Data Replication/Recovery RPO
• DDC automated recovery
• Full Application Recovery Plan & Test
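One way to use a certification standard like this is to compare a system's current capabilities against the Tier 0-A requirements and list the gaps. The sketch below is illustrative only: it covers a handful of the classification items, the function names are hypothetical, and the "share of requirements met" figure is an assumed measure, since the slides do not say how RTF compliance is calculated.

```python
from typing import Dict, List

# A subset of the Tier 0-A requirements named on the slide (item -> required level).
TIER_0A_STANDARD: Dict[str, str] = {
    "App & DB Server Platform Resiliency": "HA Redundancy",
    "Workstation Recovery": "Prebuilt Spares",
    "Online SAN Replication/Restore": "Local & Remote DDC",
    "Database Resiliency": "Local & Remote DDC",
    "Security Controls & Process": "Highest",
}


def certification_gaps(actual: Dict[str, str]) -> List[str]:
    """Return the classification items where the system misses the standard."""
    return [
        item
        for item, required in TIER_0A_STANDARD.items()
        if actual.get(item) != required
    ]


def compliance_pct(actual: Dict[str, str]) -> float:
    """Assumed measure: percentage of standard items the system meets."""
    met = len(TIER_0A_STANDARD) - len(certification_gaps(actual))
    return 100.0 * met / len(TIER_0A_STANDARD)


if __name__ == "__main__":
    # Hypothetical application profile with one missing capability.
    app_profile = {
        "App & DB Server Platform Resiliency": "HA Redundancy",
        "Online SAN Replication/Restore": "Local & Remote DDC",
        "Database Resiliency": "Local & Remote DDC",
        "Security Controls & Process": "Highest",
    }
    print(certification_gaps(app_profile))
    print(compliance_pct(app_profile))
```

The gap list is the input a remediation plan would need; the percentage is only a convenience for reporting.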
Business Exposure and
Impact Analytics
GMAC-RFC consistently
assesses and determines
required recovery capabilities
for processes and new
initiatives. The assessment
is based on an analytical
model consisting of
quantitative and qualitative
measures. The assessment
and analysis process is
structured in four phases,
designed to conduct a
comprehensive analysis for
people, process, technology,
facilities and
interdependencies.
Business Application Resilience (BAR) Planning Methodology
Business Resiliency Planning
[Figure: numbered application suites plotted on the RTF grid (RTO tiers TIER 0 to TIER 3 against RPO classes (A) to (D)). Legend: Target State; Planned; Best Effort]
Systems Availability & Recovery Gaps
[Figure: BRM Gap Scorecards for Application 1 through Application 4. Each chart shows RTF Compliance (0% to 100%) for High Availability Capability and for Data and Application Recovery Capability]
Critical Service Portfolio
The Vital and Critical portfolio management process has been
established to prioritize resiliency requirements and enhancements
for business processes based on criticality. It is also designed to
eliminate a functional silo view by combining process components
into one integrated profile known as a ‘Recovery Domain’.
Recovery Domains enable a structured process to continuously
enhance resiliency capabilities and provide a strong foundation for
the highest availability of our business processes.
Recovery Domain Definition Standard
[Figure: Recovery Domains. Data-flow diagram of business systems (for example Investor Portal, VISION, MULSOR, MULREO, DMS, DDS, INTEX, EAGLE, HIP, Compass, Loan Accounting) with their interfaces and data feeds, grouped into numbered recovery domains. Legend: Interface / Data feed; Data dependency; Recovery Domain Boundary; Excluded from Recovery Boundary]
Why Use Recovery Domains?
The volume and complexity of
business systems require that
recovery parameters be
understood to ensure
recoverability.
What is a Recovery Domain?
A method for aligning business
functions and supporting
applications and infrastructure
into logical groups that enable
resumption of target business or
systems functions.
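The grouping idea can be illustrated by walking a dependency map: starting from a target business function, collect every system it depends on, directly or indirectly, and stop at systems marked as excluded from the recovery boundary. This is a minimal sketch with hypothetical system names and a hypothetical function name; the slides do not describe how GMAC-RFC actually derived its domains.

```python
from typing import Dict, List, Set


def recovery_domain(
    target: str,
    depends_on: Dict[str, List[str]],
    excluded: Set[str],
) -> Set[str]:
    """Collect the target and everything it transitively depends on.

    Systems in `excluded` sit outside the recovery boundary and are
    neither included nor followed.
    """
    domain: Set[str] = set()
    stack = [target]
    while stack:
        system = stack.pop()
        if system in domain or system in excluded:
            continue
        domain.add(system)
        stack.extend(depends_on.get(system, []))
    return domain


# Hypothetical dependency map: each system lists the systems that feed it.
DEPENDS_ON = {
    "distribution": ["loan_accounting", "servicer_feed"],
    "loan_accounting": ["general_ledger"],
    "servicer_feed": ["external_servicer"],
    "reporting": ["distribution"],
}
EXCLUDED = {"external_servicer"}

if __name__ == "__main__":
    print(sorted(recovery_domain("distribution", DEPENDS_ON, EXCLUDED)))
```

Everything in the returned set has to be recoverable together for the target function to resume, which is why the domain, rather than the individual application, becomes the planning unit.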
Process Integration and Improvement
•
Integrate BRM Resiliency oversight, standards and best practices
into RCG People, Process & Technology areas:
– New Application Development (SDLC)
– Business Impact Assessment and Planning
– Existing Business and IT System remediation
– Annual Operating Planning
– IT Frameworks
– Delivery Assurance Processes and procedures
– IT Operations Service Management
– Education and cross-training
– Improve resiliency maturity and metrics scorecards
– Etc.
Process Integration and Improvement:
BRM Alignment with DA Framework
[Figure: Delivery Assurance (DA) framework with BRM touchpoints]
Phases: Plan, Define, Construct, Test, Deploy
Required RCG-IT Deliverables:
• Project Charter; Project Plan / SOW; Estimates; Issue & Risk Tracking
• Business Requirements; System Requirements; Requirements Traceability; Architecture Design Document; Define Test Strategy/Approach
• System Design Document; Source Code/Unit Test; System Test Plan; System Test Cases; UAT Test Plan
• System Test Summary Report; System Test Defect Log; UAT Summary Report; System Test Readiness Review
• Release Notes; Installation Guide; Operations Guide; System Deployment Plan; Operations & Support Plan
Project Controls:
• Work Planning & Tracking
• Status & Cost Reporting
• Change Control
• Resource Planning
Delivery Assurance Tollgates:
• Rqmts Definition Checklist
• Project SOW, Review Checklist
• Design Review, Data Arch Review, SysArch Review, SysArch Spec, Devlp Review Checklists
• Deployment Review, Production Certification Review, EPT Docs
BRM deliverables added to the framework:
• Business RTO/RPO Rqmts
• Recovery test approach
• Failover Test Plan
• Application Recovery Plan (ARP)
• Infrastructure Recovery Plan (IRP)
BRM content in existing deliverables and tollgates:
• Includes resiliency requirements
• Includes resiliency scope & costs
• Includes Business availability Recovery requirements
• Includes RTF Tier requirements, OLAs/SLAs
• Includes Local & DDC Configuration, Local HA/Failover
• Includes Deployment EPT & ARP, IRP test plans
Technology
Services Group
IT Service
Management
Policy
Process Standards & Guidelines
(Overview & Glossary)
BRM Alignment with IT Service Mgmt
BRM Resiliency Stds
Incident Management
Problem Management
Change Management
Release Management
Configuration Management
Service Level Management
Availability Management
Capacity Management
IT Financial Management
IT Service Continuity Management
BRM Resiliency (People, Process, Tools)
Service Management Framework
Dual Data Center (DDC)
Adopted a geographically
dispersed Dual Data Center
(DDC) resiliency strategy.
Vital and Critical
applications are required
to have full fail-over
capability within the DDC
architecture.
WHY:
• Geographically redundant DC reduces risks
• Internal self-sufficiency and capabilities enable
business resiliency
• Standard availability and recovery solutions reduce
complexity and costs
• Standard support and SLAs meet business recovery
objectives
• Shared infrastructure enables long-term economies
of scale and reuse
[Figure: two Production/Recovery data centers; one is labeled Dallas DDC2]
Data Resilience
Deploy Tiered Storage Architecture Standard
for improved RPO Resiliency and Recovery
•
Provide tiered Storage options to support business RPOs
•
Provide Local DC and Remote DC Data Replication and Recovery
•
Provide Resilient Storage Architecture
•
Integrate backup and recovery architecture
•
Employ Information Mgmt practices to enable data recovery
BRM Governance & Coordination
The governance model is
designed to provide
centralized oversight and to
enable business ownership.
Consistent program tools
allow for prioritized
assessment, analysis,
evaluation and decision-making processes
across the enterprise,
depending on criticality.
Defined roles and
responsibilities assure
consistent business
resiliency planning and
execution.
BRM Governance: An established
set of methods by which Business
Areas address their business
resilience needs.
[Figure: BRM governance structure]
Business Risk
Risk Committee
Business Resilience Management Committee (BRMC)
BRM Operations Team
- BRM Program Manager
- BRM Architect
- BRM Specialist
- BCP Site Coordinator
BRM Program
- Builds BRM Framework
- Transitions ownership to the BRM Operations Team
- Enables Business & IT to achieve risk goals
Business Units
- Stakeholders
- Projects
Recovery Plans
The plan structure is designed
so that all plans are integrated
in an efficient manner. Data
content flows are documented
between plans ensuring that
all required data is captured
and non-essential data is
minimized. Each plan is
assigned an owner.
BRM Program Objectives
Implement BRM practices into the way of doing business
Phases: Build, Transition, Sustaining Model
- BRM Project generates Artifacts
- RFC Owner & Stakeholders are identified
- RFC staff provides input & approves project work
- Ownership of Artifacts is transferred to the RFC Owner
- Operational program supported by RFC associates and integrated into business processes
- BRM Team consults and helps nurture the Artifacts
[Figure labels: BRM Program; Projects; BRM Artifacts; Transition (Consultants); Sustaining Model (Employees)]
[Figure: BRM artifacts shown as thumbnails: the Resiliency Tier Framework (RTF) grid; Recovery Domains; BAR Methodology; SDLC Checklists; and a DDC site map with production and redundant sites labeled MSP (Minneapolis), Burbank, San Diego, and Dallas]
BRM Planning Overview
•
Operational Readiness
– Establish Governance
– Train Staff
– Assess and Adopt BRM Framework
– Establish BRM Resiliency Baseline
•
Annual Operating Plan
– Assess
– Prioritize
– Plan
•
Execution
– Maintain Recovery Plans
– Exercise Recovery Plans
– Execute Resiliency Risk
Reduction Projects
•
Oversight and Compliance
Review
•
Develop a continuity framework that addresses all levels of the
organization: facilities, technology, applications, data, processes,
governance, and strategy.
•
Integrate all elements of the framework.
•
Establish a governance committee, placing responsibility within
the business.
•
BRM Operations maintains the framework, tools, methodologies,
artifacts.
•
Incorporate BRM processes and capabilities into day-to-day
processes.
•
Invest internally to improve processes rather than investing externally
and leaving processes as-is.
Business strategy alignment and cost optimization through
implementation of the BRM strategies provide the company with a
range of options to improve business value.
BRM Program
Value Proposition
•
Reduce business risks
•
Enable business growth
•
Invest in gap reduction
•
Take complexity out
•
Create choice and flexible BRM
options