How to Understand Galera Alexey Yurchenko Codership Oy

Agenda
● The difference between traditional (e.g. MySQL) replication and Galera.
● General Galera principles.
Galera Difference
Traditional Replication Approach
Server-centric:
“One server streams data to another”
[Diagram: Server 1 (“master”) sends a replication stream to Server 2 (“slave”)]
Traditional Replication Approach
Cool topologies!
[Diagram: a large replication topology with servers numbered 1 to 24]
Traditional Replication Approach
But there are questions:
● If node C crashes, do we still have a cluster?
● If node B crashes and clients fail over to C, how does B rejoin?
● Which node has data X?
● How do we back up the cluster?
[Diagram: a replication topology of nodes A, B, and C]
Galera (wsrep API) Approach
Data-centric:
There is some Data (a dataset)
- it is synchronized between one or more servers: Server 1, Server 2, Server 3, ..., Server N
Galera (wsrep API) Approach
Data-centric:
Data does not belong to a Node. A Node belongs to Data.
Galera (wsrep API) Approach
Data-centric:
There is some Data (a dataset).
The dataset needs an ID:
00295a79-9c48-11e2-bdf0-9a916cbb9294
Galera (wsrep API) Approach
Data-centric:
Dataset ID == Cluster ID
[Diagram: the Data is held by four Servers, which together form the Cluster]
00295a79-9c48-11e2-bdf0-9a916cbb9294
Galera (wsrep API) Approach
Data-centric:
Cluster A (9b0e4ad8-a42b-11e2-bdf0-9851bb738edb)
Cluster B (e1eedd79-a5d6-11e2-a801-aae16e1f5226)
[Diagram: nodes A, B, and C and the two clusters]
Global Transaction ID (GTID)
Dataset
+ sequence of atomic changes (... 64198 64199 64200 64201 64202 64203 64204 64205 64206 64207 ...)
= 00295a79-9c48-11e2-bdf0-9a916cbb9294:64201
Global Transaction ID (GTID)
00295a79-9c48-11e2-bdf0-9a916cbb9294:64201
[Diagram labels: Global Transaction ID, Dataset State ID]
Global Transaction ID (GTID)
00295a79-9c48-11e2-bdf0-9a916cbb9294:0 - initial data
00295a79-9c48-11e2-bdf0-9a916cbb9294:1 - first change/transaction
00000000-0000-0000-0000-000000000000:-1 - undefined GTID
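The `UUID:seqno` text form shown above is easy to handle programmatically. The following is a minimal sketch for illustration only, not part of Galera or the wsrep API; the names `Gtid`, `parse_gtid`, `is_ahead` and `UNDEFINED` are hypothetical.

```python
import uuid
from typing import NamedTuple


class Gtid(NamedTuple):
    dataset_id: uuid.UUID  # identifies the dataset (and therefore the cluster)
    seqno: int             # position in the sequence of atomic changes


# The "undefined GTID" from the slide: all-zero UUID and seqno -1.
UNDEFINED = Gtid(uuid.UUID(int=0), -1)


def parse_gtid(text: str) -> Gtid:
    """Parse the 'UUID:seqno' text form into a Gtid."""
    # Split on the last colon; the UUID part contains only hyphens.
    uuid_part, _, seqno_part = text.strip().rpartition(":")
    return Gtid(uuid.UUID(uuid_part), int(seqno_part))


def is_ahead(a: Gtid, b: Gtid) -> bool:
    """True if state a contains changes that state b lacks.

    Sequence numbers are only comparable within one dataset.
    """
    if a.dataset_id != b.dataset_id:
        raise ValueError("GTIDs of different datasets are not comparable")
    return a.seqno > b.seqno
```

For example, `parse_gtid("00295a79-9c48-11e2-bdf0-9a916cbb9294:1")` is ahead of the same dataset's `:0` state, while comparing it with a GTID of another dataset raises an error.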
Global Transaction ID (GTID)
Galera GTID:
00295a79-9c48-11e2-bdf0-9a916cbb9294:64201
(Cluster ID : data change in the cluster)
MySQL 5.6 GTID:
8182213e-7c1e-11e2-a6e2-080027635ef5:12345
(Server ID : transaction processed by the server)
Global Transaction ID (GTID)
What we see in MySQL 5.6:
8182213e-7c1e-11e2-a6e2-080027635ef5:12345
8182213e-7c1e-11e2-a6e2-080027635ef5:12346
8182213e-7c1e-11e2-a6e2-080027635ef5:12347
← new master promoted →
f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:1
f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:2
f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:3
Global Transaction ID (GTID)
What we see in Galera:
00295a79-9c48-11e2-bdf0-9a916cbb9294:64201
00295a79-9c48-11e2-bdf0-9a916cbb9294:64202
00295a79-9c48-11e2-bdf0-9a916cbb9294:64203
← new master promoted →
00295a79-9c48-11e2-bdf0-9a916cbb9294:64204
00295a79-9c48-11e2-bdf0-9a916cbb9294:64205
00295a79-9c48-11e2-bdf0-9a916cbb9294:64206
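The contrast between the MySQL 5.6 and Galera listings can be checked mechanically: in the Galera listing every GTID carries the same UUID and the sequence numbers continue without a gap across the promotion, while in the MySQL 5.6 listing the UUID changes and numbering restarts. A small sketch, with the hypothetical helper names `split_gtid` and `is_single_sequence`:

```python
def split_gtid(text):
    """Split 'UUID:seqno' into (uuid_string, seqno)."""
    uuid_part, _, seqno = text.rpartition(":")
    return uuid_part, int(seqno)


def is_single_sequence(gtids):
    """True if all GTIDs share one UUID and their seqnos increase by 1."""
    parsed = [split_gtid(g) for g in gtids]
    same_id = len({u for u, _ in parsed}) == 1
    consecutive = all(b[1] == a[1] + 1 for a, b in zip(parsed, parsed[1:]))
    return same_id and consecutive


galera = [
    "00295a79-9c48-11e2-bdf0-9a916cbb9294:64203",
    # new master promoted
    "00295a79-9c48-11e2-bdf0-9a916cbb9294:64204",
]
mysql56 = [
    "8182213e-7c1e-11e2-a6e2-080027635ef5:12347",
    # new master promoted
    "f4e3bf7a-a91f-11e2-4e02-3f8dbcffaed8:1",
]
```

Here `is_single_sequence(galera)` holds and `is_single_sequence(mysql56)` does not, which is the point of the two slides: the Galera GTID follows the data, not the server that processed the transaction.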
Global Transaction ID (GTID)
1) Galera nodes are ANONYMOUS => all equal.
2) Galera cluster == one big distributed “master”.
3) Asynchronous replication to/from a Galera cluster is now a piece of cake.
Global Transaction ID (GTID)
[Diagram: two clusters, UUID1 and UUID2, connected by ASYNCHRONOUS replication. Position labels: UUID1:123, UUID2:539, UUID1:123, UUID2:543]
Global Transaction ID (GTID)
SYNCHRONOUS / ASYNCHRONOUS
<==>
SINGLE DATABASE / INDEPENDENT DATABASES
<==>
CONSISTENCY / INCONSISTENCY
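Master / Slave ?
[Diagram: one cluster seen from two clients. The node that client1 uses is master1 (“master for client 1”) and the other nodes are slave1 and slave2; the node that client2 uses is master2 (“master for client 2”) and the other nodes are slave1 and slave2.]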
Master / Slave ?
1. Not a node role/function.
2. Is a relation between a node and a client.
Galera Cluster as a Meeting
00295a79-9c48-11e2-bdf0-9a916cbb9294
Galera Cluster as a Meeting
???
00295a79-9c48-11e2-0800-9a916cbb9294
Galera Cluster as a Meeting
New meeting!
e1eedd79-a5d6-11e2-0800-a8e16e1f5226
wsrep_cluster_address
[Diagram sequence: nodes 10.0.0.1 to 10.0.0.6. A node that does not yet know where the cluster is (“?”) is given gcomm://10.0.0.2, performs a handshake with 10.0.0.2, and becomes connected to the rest of the cluster.]
[Diagram: node 10.0.0.5 looks for a cluster (“?”) and finds “Nothing here”.]
wsrep_cluster_address
wsrep_cluster_address = gcomm://node1,node2
=> try to connect to members: node1, node2
wsrep_cluster_address = gcomm://
=> no members in the cluster, you are the first one. Start a new cluster.
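The two cases above amount to a simple rule on the address string. The sketch below only illustrates that rule and is not how Galera itself parses the option; `parse_cluster_address` and `startup_action` are hypothetical names, and only the plain `gcomm://host,host` form from the slide is handled.

```python
from typing import List

SCHEME = "gcomm://"


def parse_cluster_address(address: str) -> List[str]:
    """Return the member addresses listed after gcomm:// (possibly none)."""
    address = address.strip()
    if not address.startswith(SCHEME):
        raise ValueError(f"expected an address starting with {SCHEME!r}")
    hosts = address[len(SCHEME):]
    # An empty host list means there is nobody to connect to.
    return [h.strip() for h in hosts.split(",") if h.strip()]


def startup_action(address: str) -> str:
    """Describe what a node does at startup for a given address."""
    members = parse_cluster_address(address)
    if not members:
        return "start a new cluster"
    return "connect to members: " + ", ".join(members)
```

With this sketch, `startup_action("gcomm://node1,node2")` describes a join attempt, and `startup_action("gcomm://")` describes starting a new cluster.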
Node Synchronization (State Transfer)
[Diagram: nodes 10.0.0.1 to 10.0.0.6, shown with their states: JOINED, SYNCED, SYNCED, UNDEFINED, DESYNC, SYNCED]
Node Synchronization (State Transfer)
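[Sequence diagram between New node, Cluster, and Old node]
1. New node (UNDEFINED) to Cluster: “Here's what I've got, give me what I'm missing.” Old node is SYNCED.
2. New node becomes JOINER. Cluster to New node: “Found a donor for you, please hold.”
3. Cluster to Old node: “You be the donor.” Old node becomes DONOR.
4. Old node to New node (through private channel): “Here's your stuff.” Both become JOINED.
5. Both catch up, then both become SYNCED.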
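The state changes in this exchange can be written down as a small transition table. This is a sketch of the sequence on the slide only: the state names come from the slide, while the event names (`request_state`, `chosen_as_donor`, `transfer_done`, `caught_up`) and the functions are hypothetical and do not correspond to Galera identifiers.

```python
from enum import Enum


class NodeState(Enum):
    UNDEFINED = "UNDEFINED"
    JOINER = "JOINER"
    DONOR = "DONOR"
    JOINED = "JOINED"
    SYNCED = "SYNCED"


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (NodeState.UNDEFINED, "request_state"): NodeState.JOINER,
    (NodeState.SYNCED, "chosen_as_donor"): NodeState.DONOR,
    (NodeState.JOINER, "transfer_done"): NodeState.JOINED,
    (NodeState.DONOR, "transfer_done"): NodeState.JOINED,
    (NodeState.JOINED, "caught_up"): NodeState.SYNCED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


def run(state, events):
    """Apply a list of events in order and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

The new node follows `run(NodeState.UNDEFINED, ["request_state", "transfer_done", "caught_up"])` and the old node follows `run(NodeState.SYNCED, ["chosen_as_donor", "transfer_done", "caught_up"])`; both end in `NodeState.SYNCED`.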
Primary Component
[Diagram sequence: the cluster partitions into components]
● Intact cluster: PRIMARY.
● Cluster splits: one part stays PRIMARY and keeps on working; the other part becomes NON-PRIMARY and tries to reconnect.
● Parts reconnected: PRIMARY.
● SPLIT-BRAIN! Both parts are NON-PRIMARY.
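A simplified way to reason about which part stays PRIMARY is a strict-majority rule: a component remains primary only if it holds more than half of the nodes of the previous membership. The sketch below assumes that rule and ignores refinements such as node weights; `is_primary` is a hypothetical name, not a Galera function.

```python
def is_primary(component_size: int, previous_cluster_size: int) -> bool:
    """Strict majority: more than half of the previous membership."""
    if not 0 <= component_size <= previous_cluster_size:
        raise ValueError("component cannot be larger than the cluster")
    # Integer arithmetic avoids rounding issues with size / 2.
    return 2 * component_size > previous_cluster_size
```

Under this assumption a 3-node cluster split 2/1 leaves the 2-node part primary (`is_primary(2, 3)`), while an even split such as 1/1 or 2/2 leaves no part with a majority, which is the split-brain case where every part is NON-PRIMARY.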
2-node Cluster and Split Brain
[Diagram: clients connect through a VIP to a 2-node cluster]
wsrep_provider_options="pc.ignore_sb=true"
2-node Cluster and Split Brain
Galera replication can be used in every way that traditional asynchronous master-slave replication can.
It implements a SUPERSET of traditional replication functionality.
How Synchronous is Galera?
1. Synchronous penalty.
2. Slave lag.
Galera Synchronous Penalty?
The only thing Galera does synchronously is copy the data buffer to all cluster members when the client issues COMMIT.
=> ~1 RTT added latency
Galera Synchronous Penalty?
~1 RTT added latency
=> Connection throughput = 1/RTT trx/sec
=> Total throughput = 1/RTT trx/sec × #connections
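To put numbers on these formulas, here is a small calculation sketch. The function names are hypothetical, and the formulas are the slide's estimate that considers only the added replication latency of about one RTT per commit; transaction processing time is ignored.

```python
def connection_throughput(rtt_seconds: float) -> float:
    """Commits per second on one connection: 1/RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return 1.0 / rtt_seconds


def total_throughput(rtt_seconds: float, connections: int) -> float:
    """Commits per second over all connections: 1/RTT x #connections."""
    return connection_throughput(rtt_seconds) * connections
```

With an assumed RTT of 100 ms (`rtt_seconds=0.1`), one connection commits about 10 trx/sec, and 32 such connections together commit about 320 trx/sec. The per-connection rate is fixed by the RTT; total throughput grows with the number of connections.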
Galera Synchronous Penalty in WAN (EC2)
Clients:
● sysbench client at us-east
● sysbench client at eu-west
Setups compared:
● Standalone server: both clients connect to a standalone server in a us-east availability zone.
● 2-node Galera cluster: the us-east client connects to the us-east node, the eu-west client connects to the eu-west node.
[Chart: 95% latencies]
[Chart: Throughput]
Galera Synchronous Penalty?
Still:
CALLAGHAN'S LAW:
A given row can't be modified more often than 1/RTT times a second.
(discovered by Mark Callaghan)
Slave lag in Galera?
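[Sequence diagram between Client, Master node, and Slave node]
1. Client: START TRANSACTION. Master node: process.
2. Client: COMMIT. Master node: replicate changeset to Slave node.
3. Master node: commit. Slave node: apply.
4. Master node returns OK to Client.
5. A SELECT on the Slave node at this point returns stale data.
6. Slave node: commit.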
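The window in which a read on the other node sees stale data can be reproduced with a toy model. This is an illustration of the sequence on the slide only, not of Galera's implementation: the `Node` class, its apply queue, and the `commit` function are hypothetical simplifications.

```python
from collections import deque


class Node:
    """Toy node: committed data plus a queue of received changesets."""

    def __init__(self):
        self.data = {}
        self.apply_queue = deque()  # replicated but not yet applied

    def receive(self, changeset):
        self.apply_queue.append(changeset)

    def apply_pending(self):
        """Apply and commit every changeset received so far."""
        while self.apply_queue:
            self.data.update(self.apply_queue.popleft())


def commit(master, slaves, changeset):
    """Replicate the changeset to all slaves, commit locally, return OK."""
    for slave in slaves:
        slave.receive(changeset)   # the synchronous part: copy to all nodes
    master.data.update(changeset)  # local commit
    return "OK"                    # slaves may not have applied it yet


master, slave = Node(), Node()
status = commit(master, [slave], {"x": 1})
stale_read = slave.data.get("x")   # read before the slave has applied
slave.apply_pending()
fresh_read = slave.data.get("x")   # read after the slave has committed
```

In this model the client already has its OK while `stale_read` is still `None`; only after `apply_pending()` does the slave return the new value.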
Questions?
Thank you for listening!
Happy Clustering :-)