Dive Into VM Live Migration
OpenStack Liberty Summit 2015
Vancouver
Michał Dulko
Michał Jastrzębski
Paweł Koniszewski
Why Bother?
Use Cases
o Imminent host failure
o Maintenance mode
o Optimal resource placement
Imminent Host Failure
o Cooling issues
o Storage problems
o Networking problems
o Your datacenter was struck by a flood
Maintenance Mode
o Firmware upgrades
o Hardware upgrades
o Kernel upgrades
Optimal Resource Placement
o Reduce costs
  o Move VMs closer to their storage to lessen network latency
  o Stack more VMs on hosts to save power
o Increase resiliency
  o Noisy neighbour separation
  o Spread VMs across more hosts
General Flow
Assumptions
o Live
o Consistent
o Transparent
o Minimal service disruption
Migrations in OpenStack
o Non-live migration (cold migration)
  nova migrate <server>
o True live migration (shared storage or volume-based)
  nova live-migration <server> [<host>]
o Block live migration
  nova live-migration --block-migrate <server> [<host>]
Compatibility
Migration type                    Local storage   Volumes   Shared storage
Block LM                          ✓               ✗         ✗
True LM                           ✗               ✓         ✓
Block LM with read-only devices   ✗               ✗         ✗
True LM with read-only devices    ✗               ✗         ✓
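The compatibility table can be turned into a small lookup. This is only a sketch: the dictionary transcribes the table above, while the names `COMPATIBILITY` and `supported_migration_types` and the storage labels "local", "volumes" and "shared" are made up for illustration and are not part of Nova.

```python
# Compatibility matrix transcribed from the table above.
# Key: (migration type, read-only devices attached?) -> storage backends that work.
COMPATIBILITY = {
    ("block", False): {"local"},
    ("true", False): {"volumes", "shared"},
    ("block", True): set(),
    ("true", True): {"shared"},
}


def supported_migration_types(storage, read_only_devices=False):
    """Return the live-migration types usable for the given storage backend."""
    return sorted(
        kind
        for (kind, ro), backends in COMPATIBILITY.items()
        if ro == read_only_devices and storage in backends
    )


print(supported_migration_types("local"))
print(supported_migration_types("shared", read_only_devices=True))
```

A VM with read-only devices on local storage yields an empty list, which matches the row of crosses in the table.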
Live Migration Process
o Pre-migration
o Reservation
o Iterative pre-copy
o Stop and copy
o Commitment
Pre-migration
Active VM on physical host A, host B selected by scheduler or preselected.
[Diagram: compute node A: VM A active; compute node B: empty.]
Reservation
Confirm availability of resources on host B; reserve a new VM.
[Diagram: compute node A: VM A active; compute node B: VM A reserved.]
Iterative pre-copy
Memory is transferred from A to B; pages dirtied during each pass are then copied again, iteratively.
[Diagram: copying from compute node A, where VM A is active, to compute node B, where VM A is paused.]
Stop and copy
Suspend VM and copy remaining pages and CPU state.
[Diagram: copy from compute node A to compute node B; VM A is paused on both.]
Commitment
Host B becomes primary host for VM A.
[Diagram: compute node A: VM A paused; compute node B: VM A active.]
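The iterative pre-copy and stop-and-copy stages can be illustrated with a toy model. This is a sketch under loud assumptions: a constant dirty rate, a constant bandwidth, and a hypervisor that switches to stop-and-copy as soon as the remaining data fits into the downtime budget. The function name and parameters are hypothetical and do not correspond to any QEMU or Nova API.

```python
def simulate_precopy(ram_mib, dirty_rate_mib_s, bandwidth_mib_s,
                     max_downtime_s, max_rounds=30):
    """Return (precopy_rounds, remaining_mib, converged) for a toy model."""
    remaining = float(ram_mib)  # the first round has to send all guest memory
    for rounds_done in range(max_rounds + 1):
        # Stop-and-copy is possible once the leftover fits the downtime budget.
        if remaining / bandwidth_mib_s <= max_downtime_s:
            return rounds_done, remaining, True
        if rounds_done == max_rounds:
            break
        round_time = remaining / bandwidth_mib_s
        # Pages dirtied while this round was in flight must be sent again.
        remaining = min(float(ram_mib), dirty_rate_mib_s * round_time)
    return max_rounds, remaining, False


# A mostly idle guest converges quickly; a guest that dirties memory faster
# than the link can carry it never does.
print(simulate_precopy(4096, 100, 1000, 0.5))
print(simulate_precopy(4096, 1200, 1000, 0.5))
```

In this model the second call never reaches the stop-and-copy condition, because each round leaves at least as much dirty memory as it transferred.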
Performance & Reliability
Pitfalls
o OpenStack does not allow triggering any operations on a VM during LM
o VMs with memory-intensive workloads are hard to migrate
o LM generates heavy load on the network
o Migrations between compute nodes with different CPUs
o Memory oversubscription
Interacting With Live Migration
o OpenStack disallows any operation on a VM while its LM is ongoing
o Use virsh instead to interact with the migration
Diagnosis
o Information about ongoing LM
  virsh domjobinfo <domain>
  Time elapsed:      1918595 ms
  Data processed:    410.137 GiB
  Data remaining:    4.600 GiB
  Data total:        16.008 GiB
  Constant pages:    144658
  Normal pages:      107307605
  Normal data:       409.346 GiB
  Expected downtime: 1023 ms
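The numbers above are easier to interpret once parsed. The following sketch assumes the `key: value unit` layout that `virsh domjobinfo` prints; the helper names `parse_domjobinfo` and `passes_over_memory` are hypothetical. The sample text reuses the values from the slide.

```python
import re

# One "key: number unit" line; the unit is optional.
_LINE = re.compile(r"^\s*(?P<key>[^:]+):\s*(?P<value>[\d.]+)\s*(?P<unit>\S*)\s*$")
UNIT_TO_BYTES = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

SAMPLE = """\
Time elapsed:      1918595 ms
Data processed:    410.137 GiB
Data remaining:    4.600 GiB
Data total:        16.008 GiB
Expected downtime: 1023 ms
"""


def parse_domjobinfo(text):
    """Map each numeric field to a float; sizes are converted to bytes."""
    stats = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip lines that carry no numeric value
        value = float(match.group("value"))
        value *= UNIT_TO_BYTES.get(match.group("unit"), 1)
        stats[match.group("key").strip()] = value
    return stats


def passes_over_memory(stats):
    """How many times the guest's memory has effectively been sent so far."""
    return stats["Data processed"] / stats["Data total"]


stats = parse_domjobinfo(SAMPLE)
print(round(passes_over_memory(stats), 1))
```

For the sample, the processed data is more than 25 times the total guest memory after roughly 32 minutes, which is the signature of a migration that is not converging.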
Forcing Migration Finish
o Cancel ongoing LM
  virsh domjobabort <domain>
o Pause VM during LM
  virsh suspend <domain>
Tuning Maximum Downtime
o QEMU
  virsh qemu-monitor-command --hmp <domain> migrate_set_downtime <time (sec)>
o libvirt
  virsh migrate-setmaxdowntime <domain> <time (ms)>
Auto Converge
o nova.conf setting
  live_migration_flag += VIR_MIGRATE_AUTO_CONVERGE
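Tunneled Migration
o nova.conf setting to enable tunneling
  live_migration_flag += VIR_MIGRATE_TUNNELLED
  [Diagram: source host and destination host, each with libvirt on top of the hypervisor; migration data is tunneled between the libvirt daemons.]
o nova.conf setting to disable tunneling
  live_migration_flag -= VIR_MIGRATE_TUNNELLED
  [Diagram: source host and destination host, each with libvirt on top of the hypervisor; migration data flows directly between the hypervisors.]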
Tuning Bandwidth
o libvirt
  virsh migrate-setspeed <domain> <speed (MiB/s)>
o nova.conf settings
  live_migration_bandwidth = <speed (MiB/s)>
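A rough way to reason about a bandwidth cap is the idealized pre-copy model in which the guest dirties memory at a constant rate. Each round then resends the pages dirtied during the previous one, and the round times form a geometric series whose sum is ram / (bandwidth - dirty_rate). The sketch below encodes only that arithmetic; the function name is hypothetical and real workloads rarely have a constant dirty rate.

```python
import math


def precopy_seconds_upper_bound(ram_mib, dirty_rate_mib_s, bandwidth_mib_s):
    """Sum of all pre-copy round times for a constant dirty rate.

    Round k takes (ram / bw) * (dirty / bw) ** k seconds, so the series sums
    to ram / (bw - dirty). If the guest dirties memory at least as fast as
    the link can carry it, the series diverges.
    """
    if dirty_rate_mib_s >= bandwidth_mib_s:
        return math.inf
    return ram_mib / (bandwidth_mib_s - dirty_rate_mib_s)


# 4 GiB guest dirtying 100 MiB/s, with the migration capped at 1124 MiB/s.
print(precopy_seconds_upper_bound(4096, 100, 1124))
# The same guest with the cap lowered to 100 MiB/s never finishes pre-copy.
print(precopy_seconds_upper_bound(4096, 100, 100))
```

The bound shows the trade-off behind the setting: a lower cap protects the network but lengthens the migration, and a cap at or below the dirty rate prevents it from finishing at all.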
XBZRLE Compression
o nova.conf settings
  live_migration_flag += VIR_MIGRATE_COMPRESSED
[Diagram: on the source host, the updated page is compared with the cached copy of the page sent earlier and delta-compressed; the destination host applies the delta to the page it received earlier to obtain the updated page.]
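The delta idea can be sketched in a few lines: XOR the updated page with the cached copy, so that unchanged bytes become zeros, and then store only the lengths of the zero runs plus the bytes that changed. This is a simplified illustration of XOR-based zero run-length encoding, not QEMU's actual wire format; the function names are hypothetical.

```python
def xor_delta(old, new):
    """Encode `new` relative to `old` as (zero_run, changed_bytes) pairs."""
    if len(old) != len(new):
        raise ValueError("pages must have the same size")
    diff = bytes(a ^ b for a, b in zip(old, new))
    runs, i, n = [], 0, len(diff)
    while i < n:
        start = i
        while i < n and diff[i] == 0:   # bytes that did not change
            i += 1
        zero_run = i - start
        start = i
        while i < n and diff[i] != 0:   # bytes that did change
            i += 1
        runs.append((zero_run, diff[start:i]))
    return runs


def apply_delta(old, runs):
    """Rebuild the updated page on the destination from its old copy."""
    page = bytearray(old)
    pos = 0
    for zero_run, changed in runs:
        pos += zero_run                      # unchanged bytes stay as they are
        for offset, byte in enumerate(changed):
            page[pos + offset] ^= byte       # XOR restores the new value
        pos += len(changed)
    return bytes(page)


old_page = bytes(4096)
new_page = bytearray(old_page)
new_page[100:102] = b"\x07\x09"              # the guest touched two bytes
delta = xor_delta(old_page, bytes(new_page))
assert apply_delta(old_page, delta) == bytes(new_page)
print(delta[0])
```

For a page where only two bytes changed, the delta consists of two short entries instead of 4096 bytes, which is why the technique pays off for pages that are rewritten often but only slightly.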
LM On Dedicated Network
o nova.conf
  o live_migration_uri = qemu+tcp://%s/system
[Diagram: VM A is copied from compute node A (active) to compute node B (paused) over the management network.]
LM On Dedicated Network
o nova.conf
  o live_migration_uri = qemu+tcp://%s-lm/system
  o Set up your DNS to resolve hostnames with -lm suffix to IPs in your dedicated network.
[Diagram: compute nodes A and B are connected by both the management network and a separate LM network; VM A is copied from compute node A (active) to compute node B (paused) over the LM network.]
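The `%s` in `live_migration_uri` is a placeholder for the destination host name, which is why appending `-lm` redirects the traffic. A minimal sketch of that substitution, assuming plain printf-style formatting; the helper name and the host name `compute-b` are invented for the example:

```python
def migration_uri(template, dest_host):
    """Fill the destination host name into a live_migration_uri template."""
    return template % dest_host


# Default template: traffic follows whatever "compute-b" resolves to.
print(migration_uri("qemu+tcp://%s/system", "compute-b"))
# Suffixed template: DNS must resolve "compute-b-lm" to the dedicated network.
print(migration_uri("qemu+tcp://%s-lm/system", "compute-b"))
```

The second URI only works if every compute node has a resolvable `-lm` name, so the DNS entries need to be in place before the setting is changed.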
Different CPUs Between Compute Nodes
o CPU instruction set of source node needs to be a subset of CPU instruction set of destination node
[Diagram: one compute node offers MMX and AVX, the other MMX, SSE2 and AVX; live migration toward the node with the larger instruction set passes, the reverse direction fails.]
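The subset rule maps directly onto a set comparison. The sketch below uses the instruction sets named on the slide; the function names are hypothetical, and the real check is performed by libvirt on full CPU definitions rather than on bare flag names.

```python
def can_live_migrate(source_flags, dest_flags):
    """True if every CPU feature of the source exists on the destination."""
    return set(source_flags) <= set(dest_flags)


def missing_on_destination(source_flags, dest_flags):
    """Features the guest may be using that the destination cannot provide."""
    return sorted(set(source_flags) - set(dest_flags))


small = {"mmx", "avx"}
large = {"mmx", "sse2", "avx"}

print(can_live_migrate(small, large))
print(can_live_migrate(large, small), missing_on_destination(large, small))
```

Listing the missing features is usually more helpful than a bare yes or no, because it shows which CPU model would be a safe common baseline.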
Different CPUs Between Compute Nodes
o This can be worked around by explicitly setting the VM CPU model in nova.conf:
  o cpu_mode = custom
  o virt_type = kvm or virt_type = qemu
  o And then you can set cpu_model
  o List of supported named CPUs is in libvirt/cpu_map.xml
Memory Oversubscription
o LM to a specific host does not use memory oversubscription
o ram_allocation_ratio
[Diagram: compute node A has 2 GB RAM and reports RAM = available - reserved. nova-conductor sees 2 GB; nova-scheduler, with ram_allocation_ratio = 2.0, sees 4 GB.]
Memory Oversubscription
o Work around it with
  o reserved_host_memory_mb=-2048
[Diagram: compute node A has 2 GB RAM and reports RAM = available - reserved, which is now 4 GB. nova-conductor sees 4 GB; nova-scheduler, with ram_allocation_ratio = 1.0, also sees 4 GB.]
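The arithmetic behind both slides can be written out explicitly. This sketch follows the slides' simplified picture: the node reports available minus reserved memory, nova-scheduler multiplies that by `ram_allocation_ratio`, and the check done for a live migration to a specific host uses the reported value as is. The function names are hypothetical, and a reservation of 0 MB is assumed for the first case.

```python
def reported_ram_mb(available_mb, reserved_host_memory_mb):
    """RAM the compute node reports: available minus reserved."""
    return available_mb - reserved_host_memory_mb


def scheduler_capacity_mb(reported_mb, ram_allocation_ratio):
    """Capacity as seen by nova-scheduler, which applies oversubscription."""
    return reported_mb * ram_allocation_ratio


def targeted_lm_capacity_mb(reported_mb):
    """Capacity used for LM to a specific host: no oversubscription."""
    return reported_mb


# Oversubscription via ram_allocation_ratio: the two views disagree.
reported = reported_ram_mb(2048, 0)
print(scheduler_capacity_mb(reported, 2.0), targeted_lm_capacity_mb(reported))

# Negative reservation with ratio 1.0: both views agree on 4096 MB.
reported = reported_ram_mb(2048, -2048)
print(scheduler_capacity_mb(reported, 1.0), targeted_lm_capacity_mb(reported))
```

The negative reservation moves the oversubscription into the value the node reports, so every component that reads it sees the same capacity.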
Secure Live Migration
Why Does Security Matter?
o Everything can be sniffed!
o Migrated machines can contain sensitive data
o Legal issues with unencrypted data transfer
Encryption
o Hypervisor native encryption
  o QEMU doesn’t support it
o libvirt tunneled transport
  o live_migration_uri = qemu+ssh://%s/system
  o live_migration_flag += VIR_MIGRATE_TUNNELLED
  o Uses only one core
o IPSec tunnel between hosts
Memory Access Is Critical
[Chart: transfer rate [GBps], axis from 0 to 3, for QEMU+SSH and QEMU+TCP on Intel(R) Xeon(R) CPU E5-2690 v2 and Intel(R) Xeon(R) CPU E5-2660 v3.]
Future Of Live Migration
Multithreaded Compression
o Compress every page sent during LM
o zlib used for compression
o Configurable:
  o Number of threads
  o Compression ratio
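The idea can be tried out with Python's own zlib binding, which releases the interpreter lock while compressing and therefore benefits from a thread pool. This is a sketch of the concept only, not QEMU's implementation: the 4 KiB page size, the function names, and the mapping of the configurable ratio onto zlib's compression level are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for the example


def compress_pages(memory, threads=4, level=1):
    """Compress each page independently using a pool of worker threads."""
    pages = [memory[i:i + PAGE_SIZE] for i in range(0, len(memory), PAGE_SIZE)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda page: zlib.compress(page, level), pages))


def decompress_pages(compressed):
    """Destination side: inflate every page and stitch the memory together."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed)


memory = bytes(2 * PAGE_SIZE) + b"abc" * 1000   # two zero pages plus some data
chunks = compress_pages(memory, threads=2, level=1)
assert decompress_pages(chunks) == memory
print(len(memory), sum(len(chunk) for chunk in chunks))
```

A low level keeps CPU cost down at the price of a weaker ratio, which is the same trade-off the two configurable knobs on the slide expose.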
Post-copy Live Migration
o Move workload immediately to destination host
[Diagram: memory is still being copied from compute node A, where VM A is paused, to compute node B, where VM A is already active.]
Post-copy Live Migration
o Cheap solution to finish live migration in a finite time
o VM needs to be rebooted in case of failure
o Heavy performance impact
Active LM Monitoring In OpenStack
o Track memory transfer progress
o Detect possible problems and take actions
Actions On Ongoing Live Migration
o Pause VM
o Abort LM
o See progress
o Change configuration on the fly:
  o Maximum tolerable VM downtime
  o Transfer bandwidth
Your voice matters!
o Mailing lists:
  o [email protected]
  o [email protected]
o Win The Enterprise group:
  o [email protected] (IRC: pkoniszewski)
  o [email protected] (IRC: inc0)
  o [email protected] (IRC: dulek)
Q&A (& disclaimers)
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system
manufacturer or retailer or learn more at intel.com.
Software and workloads used in performance tests may have been optimized for performance
only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any
change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
© 2015 Intel Corporation.