Agenda
• What is Availability
• Road to Active / Active
• Conflicts
• Migrations
• Today’s Technologies
• Case Studies
• Questions

HP & GoldenGate Software Partnership Highlights
• GoldenGate’s First Product on HP NSK Delivered 1996
• Success across all geographic regions and verticals including:
  – banking; financial services; healthcare; retail & government.
• The majority of HP NonStop customers use GoldenGate solutions today.
• HP customers drove GoldenGate to support open systems.
• HP customers brought us to Active/Active.
• Currently engaged in other areas of HP: HP-UX, HP Neoview and Blades.

What is Availability?

Three States of Availability
#1: Active (Operational Application)
• Banking Transaction Processing
• Retail POS / Order Processing
• Healthcare Physician Order Entry, Clinical Information Systems
• Customer facing Applications
• Telecommunications & Billing
• Performance, Latency, Scalability
#2: Planned Outage
• Migrations
• Upgrades
• Maintenance
#3: Unplanned Outage
• System Failure
• Data Failure

Road to Active / Active

Goals of an Active/Active Implementation
• Better use of existing hardware
  – Put your backup system to use
• Continually test backup system
  – It is working right now
• Reduce response time
  – Handle peaks, with each system processing a portion of the load
  – Maintain your system with planned switchovers
• Allows phased Migrations/Upgrades (no downtime)!
  – Once you have the ability to process on two systems, you can perform phased migrations

How GoldenGate TDM Works: Modular “Building Blocks”
Capture: Committed changes are captured (and can be filtered) as they occur by reading the transaction logs.
Trail files: Universal data format enables heterogeneity.
Route: No distance constraints via TCP/IP. Compression & encryption.
Delivery: Applies transactional data with guaranteed integrity.
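As a rough illustration of the Capture, Trail and Delivery building blocks, the sketch below models them in a few lines of Python. It is not GoldenGate code and does not reflect its API: the Change record, the capture and deliver functions, and the in-memory lists standing in for a transaction log and a trail file are all hypothetical names chosen for this example. Capture keeps only committed changes for the selected tables; Delivery applies them to the target in log order.

```python
from dataclasses import dataclass


# Hypothetical record type for one logged change (illustration only,
# not a GoldenGate structure).
@dataclass(frozen=True)
class Change:
    txn_id: int
    table: str
    key: int
    value: str
    committed: bool


def capture(log, tables):
    """Keep only committed changes for the selected tables, in log order."""
    return [c for c in log if c.committed and c.table in tables]


def deliver(trail, target):
    """Apply each trail record to the target, keyed by (table, key)."""
    for c in trail:
        target[(c.table, c.key)] = c.value
    return target


# A toy "transaction log" with one uncommitted and one filtered-out change.
log = [
    Change(1, "accounts", 10, "balance=100", True),
    Change(2, "accounts", 11, "balance=50", False),  # never committed
    Change(3, "audit", 1, "login", True),            # table not selected
    Change(4, "accounts", 10, "balance=80", True),
]

trail = capture(log, {"accounts"})   # stands in for the trail file
target = deliver(trail, {})          # stands in for the target database
```

Routing, compression and encryption are left out; in this sketch the trail is simply handed from one function to the other.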
[Diagram: Capture reads the Source Database and writes a Source Trail; data crosses the LAN / WAN / Internet to a Target Trail; Deliver applies it to the Target Database. In the bi-directional configuration the same flow also runs from target back to source.]

Uni-Directional Plus Live Reporting
When you need:
• Current up-to-the-minute reporting information
• Reduce impact of reporting demands on your production system
• Verification of your failover data readiness
Under Normal Operating Conditions
PRIMARY SYSTEM AVAILABLE for
• BOTH READ and WRITE
SECONDARY SYSTEM AVAILABLE for
• ONLY READ operations

Live Standby (Active – Passive)
When you need:
• Live reporting+
• Fastest possible recovery & switchover
• Reverse direction replication ready
• Next best thing to Active-Active
• Backup that can be used for reporting
Under Normal Operating Conditions
PRIMARY SYSTEM AVAILABLE for
• BOTH READ and WRITE
SECONDARY SYSTEM AVAILABLE for
• ONLY READ operations

Active / Active – Data Routed to Avoid Data Collision
When you need:
• Continuous availability
• Transaction load distribution
• Performance scalability
Under Normal Operating Conditions
Both SYSTEMS AVAILABLE for
• BOTH READ and WRITE

Active / Active – With Data Collisions
When you need:
• Continuous availability
• Transaction load distribution
• Performance scalability
• Conflict detection & resolution
Under Normal Operating Conditions
Both SYSTEMS AVAILABLE for
• BOTH READ and WRITE

Conflicts: Avoidance, Detection, and Resolution

Active/Active - Considerations
Loop Detection
• Detecting if operation was performed by replication component or the application
• Sometimes referenced as ping-pong detection
Conflict Avoidance
• Building an environment where conflicts are avoided under normal processing conditions
Conflict Detection
• Detecting if the same row was updated on both the source and target before the changes were applied by data replication
Conflict Resolution
• Determining business rules on how to handle collisions

Conflict Avoidance
• Application partitioning
  – User-based
  – Account number based
  – Geographic
  – …
• Database Key partitioning
  – Even vs. Odd
  – Increments by server count (1,4,7,10…) (2,5,8,11…) (3,6,9,12…)

Conflict Scenarios
• Database Design
  – Key Sequencing
• Application Logic
  – Account Balance
  – Inventory
  – Customer address
• Network Outage
  – What do you do?

Conflict Resolution Approaches
• Exception handling / management
  – Human intervention
  – Automated approaches
• Simple automated approaches
  – Timestamp
  – Trusted source / site priority
  – Merge approach
• Complex automated approaches
  – Quantitative resolution
  – Complex rules-based resolution

Migrations

Migration Challenges
• Maintaining SLA during planned outage
  – Revenue Impact
  – Customer Expectations
  – Interdependencies, Integration
• Synchronization issues
  – Incremental data movement
  – Source database impact
• Data issues
  – Instantiating Terabytes/Petabytes
  – Staging areas
  – Change Management
  – Special Handling
• Failback strategy
  – System/Application verification
  – Continued data growth

• Application Availability
  – High Availability: Zero database downtime and minimal application downtime during the project
  – Low Impact: Non-intrusive on the source database and OLTP activity
• Data Issues
  – Real Time: Real-time incremental synchronization of data transactions during the migration
• Risk Mitigation
  – Verification: Verification of data between the databases before the cutover
  – Failback: Failback solution in the event of unexpected issues on the new environment

If it ain’t broken…
Why do they migrate critical systems?
• Their hardware or operating system is at “end-of-life”
  – Tru64, OpenVMS, old hardware …
• Their application version is no longer supported
  – Siebel 6.x, GE Carecast, etc
  – Take advantage of new features
• Data center consolidation / virtualization
  – Operating old servers becomes increasingly expensive
  – TCO reduction, MIPS reduction
• Change in vendor / strategy
  – Mainframe to HP-UX

Three Flavors of Migrations

Unidirectional Migration
• Eliminate downtime during the data migration
  – Data on target is at near-zero lag from source data
• Big-bang cutover with no fail-back
[Diagram: Big-Bang Cutover. Source Database, Capture, Source Trail, Target Trail, Deliver, Target Database, Verify.]

Unidirectional Migration with Failback Option
• Eliminate downtime during the data migration
• Big-bang cutover with failback
  – capture transactions on new system and if something goes wrong, bring old system up-to-speed (failback requires downtime)
[Diagram: Big-Bang Cutover with Fail-back Contingency. Source Database, Capture, Source Trail, Target Trail, Deliver, Target Database, Verify; in the failback direction, Capture, Failback Trail, Failback Trail, Delivery.]

Bidirectional Migration
• Eliminate downtime during the data migration
• Gradual cutover with two active systems
• Switch users back and forth on a schedule
• Not Trivial
  – Need Application knowledge (Packaged Solutions for BASE24, GE Carecast, Siebel)
[Diagram: Phased Cutovers. Source Database, Capture, Source Trail, Target Trail, Deliver, Target Database, Verify; in the reverse direction, Capture, Source Trail, Target Trail, Delivery.]

Migration Validation
How Confident Are You: Does Node A = Node B?
Visibility to act on discrepancies sooner

Why Veridata?
Data Discrepancies are a Reality…
User errors
• Input errors
• Unintended use
• Malicious intent
Application errors
• Faulty logic
• Failed upgrades
• Latent bugs
Infrastructure errors
• System failure
• Disk corruption
• Network outage
Configuration errors
• Applications
• Replication
• Network

“Although redundancy in a data architecture will be added value in some cases and required in others, redundancy introduces the risk of discrepancies when all related copies of data are not kept in sync and current.”
-- Ted Friedman, Gartner, January 2004

GoldenGate Veridata: How it Works
• The user chooses tables or files on the source and target databases
• The comparison is initiated from the Veridata web-based UI or command line
• As the databases continue to change, GoldenGate Veridata reports:
  – Persistent discrepancies
  – In-flight data discrepancies (user configurable)

Today’s Technologies

Hardware Redundancies
• Hardware / Operating System Redundancies
  – Tandem
  – Stratus
  – Clustering
• Database Server Redundancies
  – Oracle RAC
  – DB2 Sysplex/Datasharing
• Storage Redundancies
  – Storage Mirroring
  – Host-based Mirroring
  – RAID
• Backup Technology
  – Backups
  – Snapshots

Hardware Redundancies
• Pros
  – Non-intrusive
  – Easy to implement
  – Complementary strategy
• Cons
  – No heterogeneous support
  – Exact environments
  – Inflexible
  – Recovery is not instantaneous
  – Distance constraints

Replication Technology
• Physical Replication
  – EMC
  – Fujitsu
  – Hitachi
  – Veritas
• Logical Replication
  – DRNet
  – GoldenGate
  – RDF
  – Shadowbase

Physical Replication
• Pros
  – Non-intrusive
  – Easy to implement
  – Complementary strategy
• Cons
  – No heterogeneous support
  – Exact environments
  – Inflexible
  – Recovery is all or nothing
  – Distance constraints

Logical Replication
• Pros
  – Selective
  – Filtering
  – Mapping
  – Transformation
  – Active/Active
  – Targeted repair
  – No distance constraints
  – Flexible topologies (one-to-many)
• Cons
  – Not a black box implementation

Logical Replication – Further Breakdown
• Tightly Coupled/Peer to Peer
  – Pros
    • Fewer processes
  – Cons
    • Trouble with outages
    • Hard to scale for high volumes
    • Inflexible topologies
    • Harder to implement heterogeneous capabilities
• Decoupled Architecture
  – Pros
    • Handle outages by design
    • Create non-equal source and target pairs for better scalability
    • Easy to add new platforms
    • Easy to add new databases
  – Cons
    • More processes

Change Data Capture - Techniques
• Shadow Tables
  – Pros
    • Custom tailored capture
    • Real-Time capture
  – Cons
    • Application intrusive
    • Increased I/O in commit path
    • Inflexible to Application changes
    • Second toughest to code
• Timestamp Based
  – Pros
    • No modifications to the Application
    • No increased I/O in commit path
    • Easiest to code
  – Cons
    • Batch capture
    • Impact on Source system
    • Scripts and timestamp management
• Trigger Based
  – Pros
    • Custom tailored capture
    • No modifications to application
    • Real-Time capture
    • Second easiest to code
  – Cons
    • Increased I/O in commit path
    • Inflexible to Application changes
• Log Based
  – Pros
    • No modifications to the Application
    • No increased I/O in commit path
    • Custom tailored capture
    • Real-Time capture
  – Cons
    • Toughest to code

GoldenGate TDM: Heterogeneity
Supports Applications Running On…
Databases
Capture:
• Oracle
• DB2 UDB
• Microsoft SQL Server
• Sybase ASE
• Teradata
• Enscribe
• SQL/MP
• SQL/MX
• Ingres
Delivery:
• All listed above
• MySQL and any ODBC compatible databases
O/S and Platforms
• HP NonStop (S series, Itanium, Blades, Neoview)
• HP-UX
• HP TRU64
• Windows 2000, 2003, XP
• Linux
• Sun Solaris
• IBM AIX
• IBM z/OS
• OpenVMS

Customer Case Studies

Case Study: Bank of America
Zero Downtime for 18,000 ATMs
Business Challenges:
• 100% availability for systems supporting 18,000 ATMs
• Disaster Tolerance: Reduce switchover time
• Consolidate data from 4 geographically dispersed Data Centers into a single system
• Support active-active for HA and fraud detection
• Synchronize thousands of transactions per second, millions per day
GoldenGate Solution:
• High availability, dual-active solution with advanced conflict resolution capabilities
• Live Standby into data centers
• Enables zero downtime migrations, system upgrades
Results:
• Reduced application recovery time by 90%
• Eliminate outages for application, database and OS upgrades
[Diagram: 18,000 ATMs Continuously Available. Labels: Fraud Detection Application; Dual-Active; ACI BASE24 on HP Nonstop, SF; ACI BASE24 on HP Nonstop, VA; Hot Backup Site: Kansas City Data Center; ACI Base 24, LA; ACI Base 24, TX; ATMs.]
“GoldenGate offered us benefits that would also enable us to meet our long term goals.”
- Michele Schwappach, SVP Senior Technology Manager, Bank of America

Case Study: US Bank
Active/Active for Continuous Uptime
Business Challenges:
• 100% availability for systems supporting 2,500 branches & 5,000 ATMs in US.
• Zero Downtime during critical application upgrades/migrations.
• Scalability as systems grow.
• Load balancing and improved response times and performance.
• Ability to handle data conflicts.
GoldenGate Solution:
• High availability, dual-active solution with advanced conflict resolution capabilities
• Enables zero downtime migrations, system upgrades
• Started with Active/Passive and moved to Active/Active environment.
• US Bank created its own user-exits to handle data collisions.
Results:
• Continuous uptime
• US Bank’s customers are happy. More casino customers now!
[Diagram: 5,000 ATMs & 2,500 Branches Continuously Available. Labels: Dual-Active ACI Base24 on HP Nonstop, St Paul, MN; ACI Base24 on HP Nonstop, Portland, OR; MS SQL Server Data Warehouse.]
“Active-active implementations can seem like a daunting task but this should not discourage you from pursuing such a solution because the benefits are tremendous”
Rich Rosales, Development Manager, US Bancorp

Case Study: MGM Mirage
No Gamble for High Availability & Real-Time Data Warehousing
Business Challenges:
• Improve availability for casino marker & money management systems
• Integrate data in real-time from cage/money mgmt systems, property mgmt & players club to enterprise data warehouse (EDW)
• Improve customer service and business intelligence for marketing & customer service.
GoldenGate Solution:
• GoldenGate Live Standby for real-time copies of production systems with no downtime
• GoldenGate real-time data feeds into EDW increases the value of MGM’s consolidated customer view
• Migrate Players Club system from SQL Server 2000 to SQL Server 2005 & upgrade hardware (future).
[Diagram: Continuously Available Applications & Single View of the Customer. Labels: Cage & Marker Mgmt. & Property Mgmt for MGM; Cage & Marker Mgmt.; Backups; HP Nonstop, Bellagio; HP Nonstop, Treasure Island; Stratus, MGM; Bellagio Opera Property Management System (Oracle); Enterprise Data Warehouse (SQL Server 2000); Players Club Program (SQL Server 2005, SQL Server 2000).]
Results:
• No Downtime for mission critical systems
• Real-time consolidated view of customer in EDW

Thank You
Questions?