Agenda
• What is Availability
• Road to Active / Active
• Conflicts
• Migrations
• Today’s Technologies
• Case Studies
• Questions
HP & GoldenGate Software Partnership Highlights
• GoldenGate’s first product on HP NSK was delivered in 1996.
• Success across all geographic regions and verticals, including:
– banking; financial services; healthcare; retail & government.
• The majority of HP NonStop customers use GoldenGate solutions today.
• HP customers drove GoldenGate to support open systems.
• HP customers brought us to Active/Active.
• Currently engaged in other areas of HP: HP-UX, HP Neoview and Blades.
What is Availability?
Three States of Availability
#1: Active (Operational Application)
– Banking Transaction Processing
– Retail POS / Order Processing
– Healthcare Physician Order Entry
– Clinical Information Systems
– Customer facing Applications
– Telecommunications & Billing
– Performance, Latency, Scalability
#2: Planned Outage
– Migrations
– Upgrades
– Maintenance
#3: Unplanned Outage
– System Failure
– Data Failure
Road to Active / Active
Goals of an Active/Active Implementation
• Better use of existing hardware
– Put your backup system to use
• Continually test backup system
– It is working right now
• Reduce response time
– Handle peaks – each processing a portion of load
– Maintain your system with planned switchovers
• Allows phased Migrations/Upgrades (no downtime)!
– Once you have the ability to process on two systems, you can perform phased migrations
How GoldenGate TDM Works: Modular “Building Blocks”
Capture: Committed changes are captured (and can be filtered) as they occur by reading the transaction logs.
Trail files: Universal data format enables heterogeneity.
Route: No distance constraints via TCP/IP. Compression & encryption.
Delivery: Applies transactional data with guaranteed integrity.
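[Diagram: Source Database → Capture → Source Trail → LAN / WAN / Internet → Target Trail → Deliver → Target Database. For bi-directional replication, a second Capture → Source Trail → Target Trail → Deliver path runs from the Target Database back to the Source Database.]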
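The building blocks above can be pictured with a toy pipeline. This is only an illustrative sketch, not GoldenGate code: the `Change` record, the JSON-lines "trail" and the function names are all assumptions made for the example, and a Python list stands in for the transaction log.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Change:
    txn_id: int
    table: str
    op: str          # "insert", "update" or "delete"
    key: int
    row: dict
    committed: bool

def capture(log, tables):
    """Capture: keep only committed changes for the tables of interest."""
    return [c for c in log if c.committed and c.table in tables]

def write_trail(changes):
    """Trail: serialize to a platform-neutral format (JSON lines here)."""
    return [json.dumps(asdict(c)) for c in changes]

def deliver(trail, target):
    """Delivery: apply trail records to the target in their original order."""
    for line in trail:
        rec = json.loads(line)
        rows = target.setdefault(rec["table"], {})
        if rec["op"] == "delete":
            rows.pop(rec["key"], None)
        else:
            rows[rec["key"]] = rec["row"]
    return target

log = [
    Change(1, "accounts", "insert", 10, {"balance": 100}, True),
    Change(2, "accounts", "update", 10, {"balance": 80}, True),
    Change(3, "accounts", "insert", 11, {"balance": 5}, False),  # never committed
    Change(4, "audit", "insert", 1, {"msg": "login"}, True),     # filtered out
]
target = deliver(write_trail(capture(log, {"accounts"})), {})
print(target)  # {'accounts': {10: {'balance': 80}}}
```

Because the trail is just serialized text, the capture and delivery sides share only the record format, which is what lets the source and target differ.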
Uni-Directional Plus Live Reporting
When you need:
• Current up-to-the-minute reporting information
• Reduced impact of reporting demands on your production system
• Verification of your failover data readiness
Under Normal Operating Conditions
PRIMARY SYSTEM AVAILABLE for
• BOTH READ and WRITE
SECONDARY SYSTEM AVAILABLE for
• ONLY READ operations
Live Standby (Active – Passive)
When you need:
• Live reporting+
• Fastest possible recovery & switchover
• Reverse direction replication ready
• Next best thing to Active-Active
• Backup that can be used for reporting
Under Normal Operating Conditions
PRIMARY SYSTEM AVAILABLE for
• BOTH READ and WRITE
SECONDARY SYSTEM AVAILABLE for
• ONLY READ operations
Active / Active – Data Routed to Avoid Data Collision
When you need:
• Continuous availability
• Transaction load distribution
• Performance scalability
Under Normal Operating Conditions
Both SYSTEMS AVAILABLE for
• BOTH READ and WRITE
Active / Active – With Data Collisions
When you need:
• Continuous availability
• Transaction load distribution
• Performance scalability
• Conflict detection & resolution
Under Normal Operating Conditions
Both SYSTEMS AVAILABLE for
• BOTH READ and WRITE
Conflicts:
Avoidance, Detection, and Resolution
Active/Active - Considerations
Loop Detection
• Detecting if an operation was performed by the replication component or the application
• Sometimes referred to as ping-pong detection
Conflict Avoidance
• Building an environment where conflicts are avoided under normal processing conditions
Conflict Detection
• Detecting if the same row was updated on both the source and target before the changes were applied by data replication
Conflict Resolution
• Determining business rules on how to handle collisions
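Loop detection and conflict detection can each be reduced to a one-line check. The sketch below is a hedged illustration: the `applied_by` tag, the `ggs_apply` session name and the before-image comparison are assumptions chosen for the example, not a description of any product's internals.

```python
REPLICATION_USER = "ggs_apply"  # hypothetical name for the replication component's session

def should_capture(change):
    """Loop detection: skip changes that the replication component itself applied."""
    return change["applied_by"] != REPLICATION_USER

def detect_conflict(target_row, change):
    """Conflict detection: the target row must still match the source's before-image."""
    return target_row != change["before"]

# A change made by the application on site A is captured and shipped to site B.
app_change = {"applied_by": "teller_app", "key": 7,
              "before": {"balance": 100}, "after": {"balance": 90}}
# When site B applies it, the resulting log entry is tagged with the replication user.
echoed_change = dict(app_change, applied_by=REPLICATION_USER)

print(should_capture(app_change))     # True: ship to the other site
print(should_capture(echoed_change))  # False: would ping-pong back to site A

site_b_row = {"balance": 70}  # site B updated the same row before the change arrived
print(detect_conflict(site_b_row, app_change))        # True: collision
print(detect_conflict({"balance": 100}, app_change))  # False: safe to apply
```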
Conflict Avoidance
• Application partitioning
– User-based
– Account number based
– Geographic
– …
• Database Key partitioning
– Even vs. Odd
– Increments by server count (1,4,7,10…) (2,5,8,11…) (3,6,9,12…)
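The "increments by server count" scheme can be sketched as follows. The function names are hypothetical; in practice the same effect would usually come from configuring each database's sequence with a different start value and a shared increment.

```python
from itertools import count, islice

def key_sequence(server_index, server_count):
    """Return an iterator over server_index, server_index + server_count, ... (1-based)."""
    if not 1 <= server_index <= server_count:
        raise ValueError("server_index must be between 1 and server_count")
    return count(start=server_index, step=server_count)

def owner_of(key, server_count):
    """Return the 1-based index of the server that generated a key."""
    return (key - 1) % server_count + 1

for server in (1, 2, 3):
    print(server, list(islice(key_sequence(server, 3), 4)))
# 1 [1, 4, 7, 10]
# 2 [2, 5, 8, 11]
# 3 [3, 6, 9, 12]
```

Since no two servers can ever produce the same key, inserts on different sites cannot collide on the primary key, and `owner_of` recovers which site created a given row.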
Conflict Scenarios
• Database Design
– Key Sequencing
• Application Logic
– Account Balance
– Inventory
– Customer address
• Network Outage
– What do you do?
Conflict Resolution Approaches
• Exception handling / management
– Human intervention
– Automated approaches
• Simple automated approaches
– Timestamp
– Trusted source / site priority
– Merge approach
• Complex automated approaches
– Quantitative resolution
– Complex rules-based resolution
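Three of the automated approaches listed above can be expressed as small functions. This is a minimal sketch under assumed record shapes (`site`, `ts` and the column values are invented for the example); real business rules are usually more involved.

```python
def resolve_by_timestamp(local, incoming):
    """Timestamp: the most recent change wins (ties keep the local version)."""
    return incoming if incoming["ts"] > local["ts"] else local

def resolve_by_site_priority(local, incoming, priority):
    """Trusted source: the site listed earlier in `priority` wins."""
    return min(local, incoming, key=lambda v: priority.index(v["site"]))

def resolve_by_delta(current_value, before, after):
    """Quantitative: apply the incoming change as a delta instead of overwriting."""
    return current_value + (after - before)

local = {"site": "A", "ts": 1005, "address": "1 Main St"}
incoming = {"site": "B", "ts": 1002, "address": "9 Oak Ave"}

print(resolve_by_timestamp(local, incoming)["address"])                  # 1 Main St
print(resolve_by_site_priority(local, incoming, ["B", "A"])["address"])  # 9 Oak Ave

# Both sites started from a balance of 100. Site A withdrew 30 (now 70);
# site B withdrew 20 (100 -> 80) and its change arrives at site A.
print(resolve_by_delta(70, before=100, after=80))  # 50
```

Timestamp and site priority suit values such as a customer address, where one version should simply win. The delta approach suits account balances and inventory counts, where both changes must be preserved.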
Migrations
Migration Challenges
• Maintaining SLA during planned outage
– Revenue Impact
– Customer Expectations
– Interdependencies, Integration
• Synchronization issues
– Incremental data movement
– Source database impact
• Data issues
– Instantiating Terabytes/Petabytes
– Staging areas
– Change Management
– Special Handling
• Failback strategy
– System/Application verification
– Continued data growth
• Application Availability
– High Availability: Zero database downtime and minimal application downtime during the project
– Low Impact: Non-intrusive on the source database and OLTP activity
• Data Issues
– Real Time: Real-time incremental synchronization of data transactions during the migration
• Risk Mitigation
– Verification: Verification of data between the databases before the cutover
– Failback: Failback solution in the event of unexpected issues on the new environment
If it ain’t broken… Why do they migrate critical systems?
• Their hardware or operating system is at “end-of-life”
– Tru64, OpenVMS, old hardware …
• Their application version is no longer supported
– Siebel 6.x, GE Carecast, etc.
– Take advantage of new features
• Data center consolidation / virtualization
– Operating old servers becomes increasingly expensive
– TCO reduction, MIPS reduction
• Change in vendor / strategy
– Mainframe to HP-UX
Three Flavors of Migrations
Unidirectional Migration
• Eliminate downtime during the data migration
– Data on target is at near-zero lag from source data
• Big-bang cutover with no fail-back
[Diagram: Big-Bang Cutover. Source Database → Capture → Source Trail → Target Trail → Deliver → Target Database, with Verify between the two databases.]
Unidirectional Migration with Failback Option
• Eliminate downtime during the data migration
• Big-bang cutover with failback
– Capture transactions on the new system and, if something goes wrong, bring the old system up to date (failback requires downtime)
[Diagram: Big-Bang Cutover with Fail-back Contingency. Forward path: Source Database → Capture → Source Trail → Target Trail → Deliver → Target Database. Failback path: Target Database → Capture → Failback Trail → Failback Trail → Delivery → Source Database. Verify between the two databases.]
Bidirectional Migration
• Eliminate downtime during the data migration
• Gradual cutover with two active systems
• Switch users back and forth on a schedule
• Not trivial: needs application knowledge (packaged solutions for BASE24, GE Carecast, Siebel)
[Diagram: Phased Cutovers. Source Database → Capture → Source Trail → Target Trail → Deliver → Target Database, and Target Database → Capture → Source Trail → Target Trail → Delivery → Source Database, with Verify between the two databases.]
Migration Validation
How Confident Are You: Does Node A = Node B?
Visibility to act on discrepancies sooner
Why Veridata? Data Discrepancies are a Reality…
• User errors
– Input errors
– Unintended use
– Malicious intent
• Application errors
– Faulty logic
– Failed upgrades
– Latent bugs
• Infrastructure errors
– System failure
– Disk corruption
– Network outage
• Configuration errors
– Applications
– Replication
– Network
“Although redundancy in a data architecture will be added value in some cases and required in others, redundancy introduces the risk of discrepancies when all related copies of data are not kept in sync and current.”
- Ted Friedman, Gartner, January 2004
GoldenGate Veridata: How it Works
• The user chooses tables or files on the source and target databases
• The comparison is initiated from the Veridata web-based UI or command line
• As the databases continue to change, GoldenGate Veridata reports:
– Persistent discrepancies
– In-flight data discrepancies (user configurable)
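One way to picture the difference between persistent and in-flight discrepancies is a two-pass comparison of row hashes. The sketch below is an assumption made for illustration and is not a description of how Veridata is implemented; the table layout and function names are invented.

```python
import hashlib

def row_hashes(table):
    """Map each primary key to a digest of its row contents."""
    return {key: hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
            for key, row in table.items()}

def out_of_sync(source, target):
    """Return keys whose rows are missing or different on either side."""
    src, tgt = row_hashes(source), row_hashes(target)
    return {k for k in src.keys() | tgt.keys() if src.get(k) != tgt.get(k)}

def classify(first_pass, second_pass):
    """Keys still differing on the second pass are persistent; the rest were in flight."""
    return {"persistent": first_pass & second_pass,
            "in_flight": first_pass - second_pass}

source = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
target = {1: {"name": "Ann"}, 2: {"name": "Rob"}}  # row 3 not replicated yet
first = out_of_sync(source, target)

target[3] = {"name": "Cy"}                         # replication catches up
second = out_of_sync(source, target)
print(classify(first, second))  # {'persistent': {2}, 'in_flight': {3}}
```

Comparing digests rather than full rows keeps the amount of data exchanged between the two sides small, and the second pass avoids flagging rows that were merely waiting on replication lag.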
Today’s Technologies
Hardware Redundancies
• Hardware / Operating System Redundancies
– Tandem
– Stratus
– Clustering
• Database Server Redundancies
– Oracle RAC
– DB2 Sysplex/Datasharing
• Storage Redundancies
– Storage Mirroring
– Host-based Mirroring
– RAID
• Backup Technology
– Backups
– Snapshots
Hardware Redundancies
• Pros
– Non-intrusive
– Easy to implement
– Complementary strategy
• Cons
– No heterogeneous support
– Exact environments
– Inflexible
– Recovery is not instantaneous
– Distance constraints
Replication Technology
• Physical Replication
– EMC
– Fujitsu
– Hitachi
– Veritas
• Logical Replication
– DRNet
– GoldenGate
– RDF
– Shadowbase
Physical Replication
• Pros
– Non-intrusive
– Easy to implement
– Complementary strategy
• Cons
– No heterogeneous support
– Exact environments
– Inflexible
– Recovery is all or nothing
– Distance constraints
Logical Replication
• Pros
– Selective
– Filtering
– Mapping
– Transformation
– Active/Active
– Targeted repair
– No distance constraints
– Flexible topologies (one-to-many)
• Cons
– Not a black box implementation
Logical Replication – Further Breakdown
• Tightly Coupled/Peer to Peer
  – Pros
    • Fewer processes
  – Cons
    • Trouble with outages
    • Hard to scale for high volumes
    • Inflexible topologies
    • Harder to implement heterogeneous capabilities
• Decoupled Architecture
  – Pros
    • Handle outages by design
    • Create non-equal source and target pairs for better scalability
    • Easy to add new platforms
    • Easy to add new databases
  – Cons
    • More processes
Change Data Capture - Techniques
• Shadow Tables
  – Pros
    • Custom tailored capture
    • Real-Time capture
  – Cons
    • Application intrusive
    • Increased I/O in commit path
    • Inflexible to Application changes
    • Second toughest to code
• Timestamp Based
  – Pros
    • No modifications to the Application
    • No increased I/O in commit path
    • Easiest to code
  – Cons
    • Batch capture
    • Impact on Source system
    • Scripts and timestamp management
• Trigger Based
  – Pros
    • Custom tailored capture
    • No modifications to application
    • Real-Time capture
    • Second easiest to code
  – Cons
    • Increased I/O in commit path
    • Inflexible to Application changes
• Log Based
  – Pros
    • No modifications to the Application
    • No increased I/O in commit path
    • Custom tailored capture
    • Real-Time capture
  – Cons
    • Toughest to code
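As a concrete picture of one of these techniques, here is a small trigger-based capture using Python's built-in `sqlite3` module. The table and trigger names are invented for the example, and SQLite stands in for whichever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE accounts_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, balance INTEGER
);
CREATE TRIGGER accounts_ins AFTER INSERT ON accounts BEGIN
    INSERT INTO accounts_changes (op, id, balance) VALUES ('I', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts BEGIN
    INSERT INTO accounts_changes (op, id, balance) VALUES ('U', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_del AFTER DELETE ON accounts BEGIN
    INSERT INTO accounts_changes (op, id, balance) VALUES ('D', OLD.id, OLD.balance);
END;
""")

# The application issues ordinary SQL; it does not know about the change table.
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 80 WHERE id = 1")
conn.execute("DELETE FROM accounts WHERE id = 1")
conn.commit()

changes = conn.execute(
    "SELECT op, id, balance FROM accounts_changes ORDER BY seq").fetchall()
print(changes)  # [('I', 1, 100), ('U', 1, 80), ('D', 1, 80)]
```

The example also shows where the listed cons come from: every application write now performs a second write inside the same transaction, and the triggers must be revised whenever the `accounts` schema changes.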
GoldenGate TDM: Heterogeneity Supports Applications Running On…
Databases
Capture:
• Oracle
• DB2 UDB
• Microsoft SQL Server
• Sybase ASE
• Teradata
• Enscribe
• SQL/MP
• SQL/MX
• Ingres
Delivery:
• All listed above
• MySQL and any ODBC compatible databases
O/S and Platforms
HP NonStop (S series, Itanium, Blades, Neoview)
HP-UX
HP TRU64
Windows 2000, 2003, XP
Linux
Sun Solaris
IBM AIX
IBM z/OS
OpenVMS
Customer Case Studies
Case Study: Bank of America
Zero Downtime for 18,000 ATMs
Business Challenges:
• 100% availability for systems supporting 18,000 ATMs
• Disaster Tolerance: Reduce switchover time
• Consolidate data from 4 geographically dispersed Data Centers into a single system
• Support active-active for HA and fraud detection
• Synchronize thousands of transactions per second, millions per day
GoldenGate Solution:
• High availability, dual-active solution with advanced conflict resolution capabilities
• Live Standby into data centers
• Enables zero downtime migrations, system upgrades
Results:
• Reduced application recovery time by 90%
• Eliminated outages for application, database and OS upgrades
[Diagram: 18,000 ATMs Continuously Available. Labels: Fraud Detection Application; Dual-Active ACI BASE24 on HP NonStop (SF, VA); ACI BASE24 (LA, TX); ATMs; Hot Backup Site: Kansas City Data Center.]
“GoldenGate offered us benefits that would also enable us to meet our long term goals.”
- Michele Schwappach, SVP Senior Technology Manager, Bank of America
Case Study: US Bank
Active/Active for Continuous Uptime
Business Challenges:
• 100% availability for systems supporting 2,500 branches & 5,000 ATMs in the US.
• Zero downtime during critical application upgrades/migrations.
• Scalability as systems grow.
• Load balancing and improved response times and performance.
• Ability to handle data conflicts.
GoldenGate Solution:
• High availability, dual-active solution with advanced conflict resolution capabilities
• Enables zero downtime migrations, system upgrades
• Started with Active/Passive and moved to Active/Active environment.
• US Bank created its own user-exits to handle data collisions.
Results: Continuous uptime
• US Bank’s customers are happy. More casino customers now!
[Diagram: 5,000 ATMs & 2,500 Branches Continuously Available. Labels: Dual-Active ACI Base24 on HP NonStop (St Paul, MN and Portland, OR); MS SQL Server Data Warehouse.]
“Active-active implementations can seem like a daunting task but this should not discourage you from pursuing such a solution because the benefits are tremendous.”
- Rich Rosales, Development Manager, US Bancorp
Case Study: MGM Mirage
No Gamble for High Availability & Real-Time Data Warehousing
Business Challenges:
• Improve availability for casino marker & money management systems
• Integrate data in real-time from cage/money mgmt systems, property mgmt & players club to enterprise data warehouse (EDW)
• Improve customer service and business intelligence for marketing & customer service.
GoldenGate Solution:
• GoldenGate Live Standby for real-time copies of production systems with no downtime
• GoldenGate real-time data feeds into EDW increase the value of MGM’s consolidated customer view
• Migrate Players Club system from SQL Server 2000 to 2005 & upgrade hardware (future).
[Diagram: Continuously Available Applications & Single View of the Customer. Labels: Cage & Marker Mgmt. & Property Mgmt for MGM; Cage & Marker Mgmt.; Backups; HP NonStop (Bellagio); HP NonStop (Treasure Island); Stratus (MGM); Bellagio Opera Property Management System (Oracle); Enterprise Data Warehouse (SQL Server 2000); Players Club Program (SQL Server 2000, SQL Server 2005).]
Results:
• No downtime for mission critical systems
• Real-time consolidated view of customer in EDW
Thank You
Questions?