Business Resilience End-to-End
GMAC-RFC Case Study
Chuck Wachter, CDRP
BRM Program Manager
952-857-6384
[email protected]

Traditional BCP/DR Approach
[Diagram labels: Systems, Lost Data, Vital Records, Applications, Data, Restore Technology Capability, Notifications, Restore Communications, Recovery Point Objective, Move to Alternate Site, Resume Business, Return Home, Data Synchronization, Restore Business Functions, Relocate Office Equipment / Supplies, Work Flow, Recovery Time Objective]

Arriving at BRM
• Significant growth created a complex and interdependent business and IT systems environment.
• Analysis concluded that unacceptable loss would occur from a significant outage lasting more than 24 hours.
• Proposed external vendor solutions could not mitigate the problem and meet recovery requirements.
• Required an integrated, sustaining, all-encompassing approach.
• Determined an internal recovery solution would meet requirements, create a lower CODB, and provide additional benefits:
– Addresses the full range of impacts, from day-to-day to catastrophic, rather than catastrophic events only
– Minimizes vendor contractual limitations
– Process enhancements: change management, testing, service delivery, incident response, etc.

Industry Best Practices
• Over 60% of G2000 organizations are implementing a dual data center strategy to support continuous availability.
• Business and IT availability classification requirements are aligned with associated cost, application and system architecture requirements.
• Companies that manage resilience internally achieve a greater degree of maturity and business process alignment with their supporting IT systems.
• The financial services industry is under increased scrutiny by regulatory agencies due to the critical nature and impact of its services to the economy.
Their response includes:
– Investment in failover capabilities to significantly reduce the time for recovery and to respond to widespread/regional disruptions
– A realignment from the traditional approach of recovering technology and facilities toward a full business resumption model
– Increased frequency of exercising business and IT resilience capabilities

As real-time business requirements become more pervasive, business continuity must be integrated throughout the corporate culture and business processes, with clear accountability and measurement defined to align with acceptable corporate risk.

Business Resilience Defined
Resilience is the ability and capacity to withstand and adapt to new risk environments. A resilient organization effectively aligns its strategy, operations, business systems, governance structure, and decision-support capabilities so that it can uncover and adjust to continually changing risks, endure disruptions to its primary earnings drivers, and create advantages over less adaptive competitors.

Program Mission
The Business Resiliency program manages the organization's capabilities to continue to provide services at any time, regardless of the event and impact. Prioritization of investments in people, processes, technology and facilities is based on business risk and criticality. Comprehensive testing continuously validates the recovery capabilities, and an integrated governance model assures transparent coordination and reporting.
Best Practice: Resiliency Model
[Diagram labels: Business Processes, IT Services, Internal Sponsorship & Governance, BCDR Program, BCDR Framework, Infrastructure Capabilities, Suppliers, Operational Management]

Leadership needs to ensure that business and IT BCDR strategies are continuously aligned to create value
Business/IT Alignment Model

Business-Driven Strategy
• Business defines requirements & strategy
• Focus is on making “today’s and future business better”
• Accomplished in conjunction with governance, business process improvements and capabilities
• BCDR requirements are understood and integrated in business processes, valuable and cost effective

IT-Enabled Strategy
• IT aligns and offers BCDR strategic capabilities to enable new growth, products/services, channels
• Creates BCDR “portfolio visibility” for business leverage
• Provides a responsive, flexible technology environment
• IT BCDR services are cost effective, responsive and measured

Approach
Business resiliency is incorporated throughout the corporate culture and business processes. Every level of the organization has been evaluated, with new models and methodologies developed and integrated into daily processes, ensuring resiliency and compliance with regulatory requirements at every layer.
STRATEGY
• Governance
• Continuity Strategy
• Availability Strategy
• Recovery Strategy
• Communications
• Risk Management

APPLICATIONS and DATA
• Application Architecture / Design
• Application Availability / Recovery
• Application Integration
• Data Backup/Recovery
• Data Security

ARCHITECTURE & TECHNOLOGY
• Platforms & Networks
• Systems Software
• Storage
• Middleware
• Standards

ORGANIZATION
• Roles
• Responsibilities
• Skills
• Cross Organizational Cooperation

PROCESS
• BUSINESS PROCESS
– BCP Mgmt
– Risk Mgmt
– Product Delivery & Mgmt
– Information Mgmt
• IT PROCESS
– Application Development
– Application Operations Mgmt
– Operations Service Delivery
– Operations Service Mgmt IT
• CROSS-FUNCTIONAL PROCESS
– Business Process Integration
– Partner Controls & Integration
– Overall Life-cycle Integration

FACILITIES
• Data Center Infrastructure
• Workspace Infrastructure
• Physical Security
• Environmental (Power, HVAC)

Program Vision
• The vision for Business Resilience Management is:
– Establish BRM processes and services that are well-integrated with business and IT planning, development and operational processes such that enterprise-wide BRM implementation, testing and compliance are ensured, and support business objectives.
• Our Strategy for accomplishing this is:
– Adopt an internal strategy to deploy BRM solutions and develop industry partnerships to support business expansion and growth.
– Deploy a dual data center infrastructure, storage and data architecture to support business continuous availability needs and flexible, scalable and agile BRM solutions.
– Eliminate BRM gaps through investment into development of internal capabilities.
BRM Building Blocks
[Diagram labels: System Remediation, BRM Governance & Coordination]

BRM Building Blocks: Details behind the Building Blocks
[Diagram labels: System Remediation, BRM Governance & Coordination, BRM Program Framework]

BRM Strategic Investment
[Chart: spend in $MM (Resources) by year: 2003, 2004, 2005, 20xx. Series: Industry Recommended BRM Spend Based on IT Budget; BRM Strategic Investments; Projected BRM spend.]
Vendor Solution
• Does not address BRM requirements
BRM Program
• Driven by Business Requirements
• Aligning Business and IT
• Value based BRM Investments

Resiliency Tier Framework (RTF)
Provides a common dialogue for Business & IT recoverability

Recovery Time Objective (RTO) tiers:
TIER 0   Vital Availability       8 Hrs    RTO Minimum Resiliency: Dual DC Automated Failover
TIER 1   Critical Availability    24 Hrs   RTO Minimum Resiliency: Dual DC Manual Failover
TIER 2   Important Availability   48 Hrs   RTO Minimum Resiliency: Dual DC Standby Cold Restore
TIER 3   Deferred Availability    72 Hrs   RTO Minimum Resiliency: Dual DC Drop Ship Cold Restore

Recovery Point Objective (RPO) data classes:
Vital Data (A)       4 Hrs    RPO Minimum Resiliency: Dual DC Remote Data Replication
Critical Data (B)    8 Hrs    RPO Minimum Resiliency: Dual DC Remote Data Replication
Important Data (C)   24 Hrs   RPO Minimum Resiliency: Virtual Vault Storage Backup/Restore
Deferred Data (D)    48 Hrs   RPO Minimum Resiliency: Offsite Tape Backup/Restore

RTF matrix (RPO data class by RTO tier):
                     TIER 0    TIER 1    TIER 2    TIER 3
Vital Data (A)       TIER 0A   TIER 1A   TIER 2A   TIER 3A
Critical Data (B)    TIER 0B   TIER 1B   TIER 2B   TIER 3B
Important Data (C)   TIER 0C   TIER 1C   TIER 2C   TIER 3C
Deferred Data (D)    TIER 0D   TIER 1D   TIER 2D   TIER 3D

RTF Certification Standard: Tier Alignment

(1) Business RTO/RPO Requirements
BRM Tier Application Recovery Categorization Framework: the RTF matrix shown on the Resiliency Tier Framework slide, with the same RTO tiers, RPO data classes and minimum resiliency solutions.

(2) Business Process, Application and System Resiliency Requirements
ENTERPRISE SYSTEMS CLASSIFICATIONS, with the entry shown for TIER 0-A:
FACILITIES ENVIRONMENT
– Full Application & Data Deployment Across DDCs: Application & Data DDC Deploy
– Office Recovery Site: Fixed Secondary Office Site, or Commercial Hotsite
– XSP Data Center Site Resiliency: Resilient SLAs
NETWORK ENVIRONMENT
– Site/Campus & Edge Data Network Resiliency: HA Redundancy
– XSP WAN/VAN/MAN/ISP/Voice Network Resiliency: HA Redundancy
– Voice & TCOMM Network: HA Redundancy
PLATFORM ENVIRONMENT
– App & DB Server Platform Resiliency: HA Redundancy
– Workstation Recovery: Prebuilt Spares
STORAGE ENVIRONMENT
– Online SAN Replication/Restore: Local & Remote DDC
– Offline Tape Backup/Restore: In Failover Mode
DATA MGMT ENVIRONMENT
– Database Resiliency: Local & Remote DDC
– File Storage Resiliency: Local & Remote DDC
APPLICATION ENVIRONMENT
– Application Architecture Resiliency: Local & Remote DDC
APPLICATION INTEGRATION ENVIRONMENT
– Middleware Architecture Resiliency: Local & Remote DDC
SECURITY MGMT ENVIRONMENT
– Security Controls & Process: Highest

(3) Remediation and Resiliency Solutions
Tier 0-A Applications: High Availability/Redundant Platforms, Data Replication/Recovery RPO, DDC automated recovery, Full Application Recovery Plan & Test

Business Exposure and Impact Analytics
GMAC-RFC consistently assesses and determines required recovery capabilities for processes and new initiatives. The assessment is based on an analytical model consisting of quantitative and qualitative measures. The assessment and analysis process is structured in four phases, designed to conduct a comprehensive analysis for people, process, technology, facilities and interdependencies.
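
As an illustration only, the Resiliency Tier Framework matrix can be sketched as a small lookup that turns a business's tolerated downtime (RTO) and tolerated data loss (RPO) into an RTF label and the minimum resiliency solutions listed on the slide. The hour limits and solution names come from the RTF slide. The function and class names are hypothetical, and the rounding rule (pick the least stringent band that still meets the requirement, and fall back to the strictest band when nothing meets it) is an assumption, not something the deck states.

```python
from dataclasses import dataclass

# Bands from the RTF slide: (limit in hours, label), strictest first.
RTO_TIERS = [(8, "0"), (24, "1"), (48, "2"), (72, "3")]
RPO_CLASSES = [(4, "A"), (8, "B"), (24, "C"), (48, "D")]

# Minimum resiliency solutions as listed on the RTF slide.
RTO_MIN_RESILIENCY = {
    "0": "Dual DC Automated Failover",
    "1": "Dual DC Manual Failover",
    "2": "Dual DC Standby Cold Restore",
    "3": "Dual DC Drop Ship Cold Restore",
}
RPO_MIN_RESILIENCY = {
    "A": "Dual DC Remote Data Replication",
    "B": "Dual DC Remote Data Replication",
    "C": "Virtual Vault Storage Backup/Restore",
    "D": "Offsite Tape Backup/Restore",
}


@dataclass(frozen=True)
class RtfTier:
    label: str          # e.g. "TIER 1B"
    rto_solution: str   # minimum resiliency for the RTO tier
    rpo_solution: str   # minimum resiliency for the RPO data class


def _pick_band(bands, tolerated_hours):
    """Assumed rule: least stringent band whose limit is within tolerance."""
    if tolerated_hours <= 0:
        raise ValueError("tolerated hours must be positive")
    chosen = bands[0][1]  # assumption: fall back to the strictest band
    for limit, label in bands:
        if limit <= tolerated_hours:
            chosen = label
    return chosen


def classify(rto_hours, rpo_hours):
    """Map tolerated downtime and data loss (hours) to an RTF tier."""
    tier = _pick_band(RTO_TIERS, rto_hours)
    data_class = _pick_band(RPO_CLASSES, rpo_hours)
    return RtfTier(
        label=f"TIER {tier}{data_class}",
        rto_solution=RTO_MIN_RESILIENCY[tier],
        rpo_solution=RPO_MIN_RESILIENCY[data_class],
    )


if __name__ == "__main__":
    print(classify(rto_hours=30, rpo_hours=10))
```

Under these assumptions, a process that tolerates 30 hours of downtime and 10 hours of data loss lands in TIER 1B, because 24 hours and 8 hours are the largest limits that still fit inside those tolerances.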
Business Application Resilience (BAR) Planning Methodology
Business Resiliency Planning
[Figure: ten numbered entries, each labelled "Application Suite 1", plotted on the RTF matrix (RTO tiers 0 to 3 against RPO data classes A to D). Legend: Target State, Planned, Best Effort.]

Systems Availability & Recovery Gaps
[Figure: four bar charts titled Application 1, Application 2, Application 3 and Application 4 BRM Gap Scorecard. Vertical axis: RTF Compliance, 0% to 100%. Categories: High Availability Capability; Data and Application Recovery Capability.]

Critical Service Portfolio
The Vital and Critical portfolio management process has been established to prioritize resiliency requirements and enhancements for business processes based on criticality.
It is also designed to eliminate a functional silo view by combining process components into one integrated profile known as a ‘Recovery Domain’. Recovery Domains enable a structured process to continuously enhance resiliency capabilities and provide a strong foundation for the highest availability of our business processes.

Recovery Domain Definition Standard
[Diagram: data-flow map of loan servicing, accounting and investor reporting systems, partitioned into five numbered Recovery Domains. Systems named include Investor Portal, VISION, MULSOR, MULREO, DRT, Peoplesoft, DMS, DMS Workflow, DDS, INTEX, EAGLE, HSS, HIP, CSS, IMS, BOS, ACQ, Compass, COLOAN, Monet, Midnet, Penalty Tracker and Loan Accounting, with feeds from servicers, service bureaus, the trustee and the bank. Legend: Interface / Data feed; Data dependency; Recovery Domains; Recovery Domain Boundary; Excluded from Recovery Boundary.]

Why Use Recovery Domains?
The volume and complexity of business systems require that recovery parameters are understood to ensure recoverability.

What is a Recovery Domain?
A method for aligning business functions and supporting applications and infrastructure into logical groups that enable resumption of target business or systems functions.

Process Integration and Improvement
• Integrate BRM Resiliency oversight, standards and best practices into RCG People, Process & Technology areas:
– New Application Development (SDLC)
– Business Impact Assessment and Planning
– Existing Business and IT System remediation
– Annual Operating Planning
– IT Frameworks
– Delivery Assurance Processes and procedures
– IT Operations Service Management
– Education and cross-training
– Improve resiliency maturity and metrics scorecards
– Etc.
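
Returning to the Recovery Domain definition (logical groups of business functions, applications and infrastructure that resume together), one way to picture the grouping is as connected components of a data-feed graph. This is a sketch under stated assumptions: the deck does not say how GMAC-RFC actually derived its domains, the connected-components technique is swapped in for illustration, and every system name and the `recovery_domains` function are hypothetical.

```python
from collections import defaultdict


def recovery_domains(feeds, excluded=frozenset()):
    """Group systems into candidate recovery domains.

    feeds    : iterable of (source, target) data feeds or dependencies
    excluded : systems outside the recovery boundary; their feeds are ignored
    Returns a list of sorted system-name lists, one per candidate domain.
    """
    # Build an undirected graph: a feed ties both ends into the same domain.
    graph = defaultdict(set)
    for source, target in feeds:
        if source in excluded or target in excluded:
            continue
        graph[source].add(target)
        graph[target].add(source)

    seen = set()
    domains = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        members = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.add(node)
            stack.extend(graph[node] - seen)
        domains.append(sorted(members))
    return domains


if __name__ == "__main__":
    # Hypothetical systems and feeds, not taken from the diagram.
    feeds = [
        ("external_servicer", "app_a"),
        ("app_a", "app_b"),
        ("app_b", "app_c"),
        ("app_d", "app_e"),
    ]
    for domain in recovery_domains(feeds, excluded={"external_servicer"}):
        print(domain)
```

With the hypothetical feeds above, `app_a`, `app_b` and `app_c` form one candidate domain and `app_d` and `app_e` another, while `external_servicer` stays outside the recovery boundary. A real definition would also weigh business function and RTF tier, which this sketch ignores.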
Process Integration and Improvement: BRM Alignment with DA Framework
Required RCG-IT Deliverables, by phase: Plan, Define, Construct, Test, Deploy
• Deliverables: Project Charter; Project Plan / SOW; Estimates; Issue & Risk Tracking; Business Requirements; System Requirements; Requirements Traceability; Architecture Design Document; Define Test Strategy/Approach; System Design Document; Source Code/Unit Test; System Test Plan; System Test Cases; UAT Test Plan; System Test Summary Report; System Test Defect Log; UAT Summary Report; System Deployment Plan; Operations & Support Plan; Release Notes; Installation Guide; Operations Guide
• Project Controls: Work Planning & Tracking; Status & Cost Reporting; Change Control; Resource Planning
• Reviews: Readiness Review; System Test Review
• Delivery Assurance Tollgates
• BRM additions to the deliverables:
– Project SOW, Review Checklist (includes resiliency requirements; includes resiliency scope & costs)
– Business RTO/RPO Rqmts, Rqmts Definition Checklist (includes Business availability and Recovery requirements)
– Design Review, Data Arch review, SysArch Review, SysArch Spec, Devlp Review Checklists (includes RTF Tier requirements, OLAs/SLAs)
– Recovery test approach; Failover Test Plan, Application Recovery Plan (ARP), Infrastructure Recovery Plan (IRP) (includes EPT & ARP, IRP test plans)
– Deployment Review, Production Certification Review, EPT Docs (includes Local & DDC Configuration; includes Local HA/Failover Deployment)

BRM Alignment with IT Service Mgmt
Technology Services Group IT Service Management: Policy, Process, Standards & Guidelines (Overview & Glossary)
Service Management Framework:
• Incident Management
• Problem Management
• Change Management
• Release Management
• Configuration Management
• Service Level Management
• Availability Management
• Capacity Management
• IT Financial Management
• IT Service Continuity Management
BRM Resiliency Stds; BRM Resiliency (People, Process, Tools)

Dual Data Center (DDC)
Adopted a geographically dispersed Dual Data Center (DDC) resiliency strategy. Vital and Critical applications are required to have full fail-over capability within the DDC architecture.
WHY:
• Geographically redundant DC reduces risks
• Internal self-sufficiency and capabilities enable business resiliency
• Standard availability and recovery solutions reduce complexity and costs
• Standard support and SLAs meet business recovery objectives
• Shared infrastructure enables long-term economies of scale and reuse
[Diagram labels: Production/Recovery, Production/Recovery, Dallas DDC2]

Data Resilience
Deploy Tiered Storage Architecture Standard for improved RPO Resiliency and Recovery
• Provide tiered Storage options to support business RPOs
• Provide Local DC and Remote DC Data Replication and Recovery
• Provide Resilient Storage Architecture
• Integrate backup and recovery architecture
• Employ Information Mgmt practices to enable data recovery

BRM Governance & Coordination
The governance model is designed to provide centralized oversight and to enable business ownership. Consistent program tools allow for prioritized assessment, analysis, evaluation and decision-making processes depending on criticality across the enterprise. Defined roles and responsibilities assure consistent business resiliency planning and execution.

BRM Governance: An established set of methods by which Business Areas address their business resilience needs.
[Organization chart labels: Business Risk; Risk Committee; Business Resilience Management Committee (BRMC); BRM Operations Team; BRM Program; Business Units]
BRM Operations Team
– BRM Program Manager
– BRM Architect
– BRM Specialist
– BCP Site Coordinator
Business Units
– Stakeholders
– Projects
BRM Program
– Builds BRM Framework
– Transitions ownership to the BRM Operations Team
– Enables Business & IT to achieve risk goals

Recovery Plans
The plan structure is designed so that all plans are integrated in an efficient manner.
Data content flows are documented between plans, ensuring that all required data is captured and non-essential data is minimized. Each plan is assigned an owner.

BRM Program Objectives
Implement BRM practices into the way of doing business
Phases: Build, Transition, Sustaining Model
– BRM Project generates Artifacts
– RFC Owner & Stakeholders are identified
– RFC staff provides input & approves project work
– Ownership of Artifacts is transferred to the RFC Owner
– Operational program supported by RFC associates and integrated into business processes
– BRM Team consults and helps nurture the Artifacts

BRM Program Sustaining Model
[Diagram labels: Employees, Consultants, Projects, Transition, BRM Artifacts. Artifacts shown: RTF (the Resiliency Tier Framework matrix), Recovery Domains, BAR Methodology, SDLC Checklists, and a site map marking Production and Redundant locations across San Diego, Burbank, Dallas and Minneapolis (MSP DDC).]

BRM Planning Overview
• Operational Readiness
– Establish Governance
– Train Staff
– Assess and Adopt BRM Framework
– Establish BRM Resiliency Baseline
• Annual Operating Plan
– Assess
– Prioritize
– Plan
• Execution
– Maintain Recovery Plans
– Exercise Recovery Plans
– Execute Resiliency Risk Reduction Projects
• Oversight and Compliance

Review
• Develop a continuity framework that addresses all levels of the organization: facilities, technology, applications, data, processes, governance, strategy.
• Integrate all elements of the framework.
• Establish a governance committee, placing responsibility within the business.
• BRM Operations maintains the framework, tools, methodologies, artifacts.
• Incorporate BRM processes and capabilities into day-to-day processes.
• Invest internally to improve processes, rather than externally while leaving processes as-is.

Business strategy alignment and cost optimization through implementation of the BRM strategies provide the company with a range of options to improve business value.

BRM Program Value Proposition
• Reduce business risks
• Enable business growth
• Invest in gap reduction
• Take complexity out
• Create choice and flexible BRM options