XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the
total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory write intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and more generally for any application with a sparse memory update
pattern.

Instead of sending the whole changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to calculate the update, the previous contents of the memory pages
need to be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache, the better the chances that a page has already been
stored in it.
A small cache results in a high cache miss rate.
The cache size can be changed before and during migration.

Format
======

The compression format performs an XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

There can be more than one valid encoding; the sender may send a longer encoding
in order to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
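
The length field can be illustrated with a minimal ULEB128 sketch. The function
names uleb128_encode_sketch and uleb128_decode_sketch are hypothetical and are
not taken from the QEMU sources; the sketch assumes lengths fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write n as ULEB128 into dst; returns the number of bytes written (1..5). */
size_t uleb128_encode_sketch(uint8_t *dst, uint32_t n)
{
    size_t len = 0;

    assert(dst != NULL);
    do {
        uint8_t byte = n & 0x7f;    /* low 7 bits of the value */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;           /* high bit set: more bytes follow */
        }
        dst[len++] = byte;
    } while (n != 0);
    return len;
}

/*
 * Read one ULEB128 value from src, looking at no more than avail bytes.
 * Returns the number of bytes consumed, or 0 if the value is truncated
 * or longer than 5 bytes.
 */
size_t uleb128_decode_sketch(const uint8_t *src, size_t avail, uint32_t *n)
{
    uint32_t value = 0;
    size_t len = 0;
    unsigned shift = 0;

    assert(src != NULL && n != NULL);
    while (len < avail && shift < 32) {
        uint8_t byte = src[len++];
        value |= (uint32_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {       /* high bit clear: last byte */
            *n = value;
            return len;
        }
        shift += 7;
    }
    return 0;
}
```

With this sketch a run length of 1001 is stored as the two bytes e9 07, and a
run length of 15 as the single byte 0f.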

On the sender side, XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.
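
The receiving side can be sketched as follows, assuming the grammar given in
the Format section. The function name xbzrle_apply_sketch, its signature and
its error convention are hypothetical; this is an illustration of the idea,
not the QEMU decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one ULEB128 integer; returns bytes consumed, or 0 on error. */
static size_t read_uleb128(const uint8_t *src, size_t avail, size_t *out)
{
    size_t value = 0, len = 0;
    unsigned shift = 0;

    while (len < avail && shift < 28) {
        uint8_t byte = src[len++];
        value |= (size_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {
            *out = value;
            return len;
        }
        shift += 7;
    }
    return 0;
}

/*
 * Apply an XBZRLE stream to page. On entry page holds the old content,
 * on success it holds the new content. Returns 0 on success and -1 if
 * the stream is malformed or does not fit the page.
 */
int xbzrle_apply_sketch(const uint8_t *enc, size_t enc_len,
                        uint8_t *page, size_t page_len)
{
    size_t i = 0;   /* read position in enc */
    size_t d = 0;   /* write position in page */

    assert(enc != NULL && page != NULL);
    while (i < enc_len) {
        size_t run, used;

        /* zrun: these bytes are unchanged, so they are only skipped */
        used = read_uleb128(enc + i, enc_len - i, &run);
        if (used == 0 || run > page_len - d) {
            return -1;
        }
        i += used;
        d += run;

        /* nzrun: a length followed by that many bytes of new data */
        used = read_uleb128(enc + i, enc_len - i, &run);
        if (used == 0 || run == 0) {
            return -1;
        }
        i += used;
        if (run > page_len - d || run > enc_len - i) {
            return -1;
        }
        memcpy(page + d, enc + i, run);
        i += run;
        d += run;
    }
    return 0;
}
```

Bytes after the last non-zero run are left untouched, which is why the stream
does not need to describe a trailing run of unchanged bytes.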

This work was originally based on research results published in
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was improved further by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
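
The example above can be reproduced with a minimal greedy encoder sketch. The
function name xbzrle_encode_sketch and its return convention (encoded length,
0 for identical pages, -1 when the output does not fit) are assumptions made
for illustration; the real encoder may choose a different but equally valid
encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes needed to store n as ULEB128. */
static size_t uleb128_size(size_t n)
{
    size_t len = 1;
    while (n >>= 7) {
        len++;
    }
    return len;
}

/* Write n as ULEB128 and return the number of bytes written. */
static size_t put_uleb128(uint8_t *dst, size_t n)
{
    size_t len = 0;
    do {
        uint8_t byte = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;       /* more bytes follow */
        }
        dst[len++] = byte;
    } while (n != 0);
    return len;
}

/* Encode the delta between old_page and new_page into dst. */
long xbzrle_encode_sketch(const uint8_t *old_page, const uint8_t *new_page,
                          size_t page_len, uint8_t *dst, size_t dst_len)
{
    size_t i = 0, out = 0;

    assert(old_page != NULL && new_page != NULL && dst != NULL);
    while (i < page_len) {
        /* zrun: bytes whose XOR is zero, i.e. unchanged bytes */
        size_t start = i;
        while (i < page_len && old_page[i] == new_page[i]) {
            i++;
        }
        if (i == page_len) {
            break;              /* a trailing zero run is not sent */
        }
        size_t zrun = i - start;

        /* nzrun: bytes that differ; the new data is sent verbatim */
        start = i;
        while (i < page_len && old_page[i] != new_page[i]) {
            i++;
        }
        size_t nzrun = i - start;

        if (uleb128_size(zrun) + uleb128_size(nzrun) + nzrun > dst_len - out) {
            return -1;          /* overflow: delta does not fit in dst */
        }
        out += put_uleb128(dst + out, zrun);
        out += put_uleb128(dst + out, nzrun);
        memcpy(dst + out, new_page + start, nzrun);
        out += nzrun;
    }
    return (long)out;
}
```

For the old and new buffers shown above, this sketch emits zrun 1001 (e9 07),
nzrun 15 with its data, zrun 3, nzrun 1 (67), zrun 1 and nzrun 1 (69), for a
total of 24 bytes.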

Cache update strategy
=====================
Keeping the hot pages in the cache is effective for decreasing cache
misses. XBZRLE uses a counter as the age of each page. The counter
increases after each RAM dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.
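
The eviction rule can be sketched with a small direct-mapped cache. Everything
named here is hypothetical: the slot count, the threshold of 2 syncs, the
struct layout and the function names are assumptions for illustration, and
"older than a threshold" is interpreted as "at least SKETCH_AGE_THRESHOLD
syncs old".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_SLOTS         4u    /* assumed; must be a power of 2 */
#define SKETCH_AGE_THRESHOLD 2u    /* assumed; in dirty bitmap syncs */

struct sketch_slot {
    bool     used;
    uint64_t addr;                 /* guest address of the cached page */
    uint64_t age;                  /* sync counter at the last insert */
    uint8_t  data[SKETCH_PAGE_SIZE];
};

struct sketch_cache {
    struct sketch_slot slots[SKETCH_SLOTS];
};

/* Map a page address to its slot; conflicting addresses share a slot. */
static size_t slot_index(uint64_t addr)
{
    assert((SKETCH_SLOTS & (SKETCH_SLOTS - 1)) == 0);
    return (size_t)((addr / SKETCH_PAGE_SIZE) & (SKETCH_SLOTS - 1));
}

/* Return the cached copy of the page at addr, or NULL on a cache miss. */
uint8_t *sketch_cache_lookup(struct sketch_cache *c, uint64_t addr)
{
    struct sketch_slot *s = &c->slots[slot_index(addr)];
    return (s->used && s->addr == addr) ? s->data : NULL;
}

/*
 * Store page under addr. On a conflict, the resident page is evicted only
 * if it is old enough; otherwise it is kept and false is returned.
 */
bool sketch_cache_insert(struct sketch_cache *c, uint64_t addr,
                         const uint8_t *page, uint64_t sync_count)
{
    struct sketch_slot *s = &c->slots[slot_index(addr)];

    if (s->used && s->addr != addr &&
        sync_count - s->age < SKETCH_AGE_THRESHOLD) {
        return false;              /* resident page is still hot */
    }
    s->used = true;
    s->addr = addr;
    s->age = sync_count;
    memcpy(s->data, page, SKETCH_PAGE_SIZE);
    return true;
}
```

In this sketch a page inserted at sync 0 survives a conflicting insert at
sync 1 and is replaced by one at sync 2.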

Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes and
should be a power of 2. The cache default value is 64MBytes.
    {qemu} migrate_set_cache_size 256m

Commit 73af8dd8d7 "migration: Make xbzrle_cache_size a migration parameter"
(v2.11.0) deprecated migrate-set-cache-size; the new parameter is therefore
recommended:
    {qemu} migrate_set_parameter xbzrle-cache-size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K pages
    xbzrle cache miss rate: L
    xbzrle encoding rate: M
    xbzrle overflow: N

xbzrle cache miss: the number of cache misses to date. A high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows, that is, pages for which the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

Testing: Testing indicated that live migration with XBZRLE completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes (16 MiB), zero-initialised */
        char *buf = calloc(4096, 4096);
        if (!buf) {
            return 1;
        }
        while (1) {
            int i;
            /* touch one byte every 1024 bytes, i.e. four writes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
            fflush(stdout);
        }
    }