XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory write intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size, the better the chances are that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
The cache size can be changed before and during migration.
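
The cache behaviour described above can be pictured with a minimal
direct-mapped sketch. This is not QEMU's implementation: every name in it
(PageCache, cache_init, cache_lookup, cache_insert, XB_PAGE_SIZE) is
hypothetical, and it assumes 4096-byte pages and a slot count that is a power
of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XB_PAGE_SIZE 4096u          /* assumed guest page size */

/* Hypothetical cache entry: one stored copy of a guest page. */
typedef struct {
    uint64_t addr;                  /* guest address of the cached page */
    int      valid;                 /* nonzero once the slot holds a page */
    uint8_t  data[XB_PAGE_SIZE];    /* previous content of that page */
} CacheEntry;

typedef struct {
    CacheEntry *slots;
    size_t      num_slots;          /* must be a power of two */
} PageCache;

/* Allocate a cache with num_slots entries; returns 0 on success, -1 on error. */
int cache_init(PageCache *c, size_t num_slots)
{
    if (num_slots == 0 || (num_slots & (num_slots - 1)) != 0) {
        return -1;                  /* not a power of two */
    }
    c->slots = calloc(num_slots, sizeof(CacheEntry));
    c->num_slots = num_slots;
    return c->slots ? 0 : -1;
}

/* Map a page address to a slot; the mask works because num_slots is 2^n. */
static size_t cache_slot(const PageCache *c, uint64_t addr)
{
    return (size_t)((addr / XB_PAGE_SIZE) & (c->num_slots - 1));
}

/* Return the cached copy of the page at addr, or NULL on a cache miss. */
const uint8_t *cache_lookup(const PageCache *c, uint64_t addr)
{
    const CacheEntry *e = &c->slots[cache_slot(c, addr)];
    return (e->valid && e->addr == addr) ? e->data : NULL;
}

/* Store the page, evicting whatever page previously shared the slot. */
void cache_insert(PageCache *c, uint64_t addr, const uint8_t *page)
{
    CacheEntry *e = &c->slots[cache_slot(c, addr)];
    e->addr = addr;
    e->valid = 1;
    memcpy(e->data, page, XB_PAGE_SIZE);
}

/* Release the slots allocated by cache_init. */
void cache_free(PageCache *c)
{
    free(c->slots);
    c->slots = NULL;
    c->num_slots = 0;
}
```

In this sketch two pages that map to the same slot evict each other, which is
one way a small cache turns into a high miss rate.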

Format
======

The compression format performs a XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
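
As an illustration of the run-length encoding, here is a small ULEB128
encoder and decoder for 32-bit values. The function names uleb128_encode and
uleb128_decode are chosen for this sketch and are not taken from QEMU.

```c
#include <stddef.h>
#include <stdint.h>

/* Write value as ULEB128 into out and return the number of bytes written.
 * out must have room for up to 5 bytes. */
size_t uleb128_encode(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;    /* low 7 bits of the value */
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;               /* continuation bit: more bytes follow */
        }
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Read a ULEB128 value from in (at most avail bytes).
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than a 32-bit value allows. */
size_t uleb128_decode(const uint8_t *in, size_t avail, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;

    while (n < avail && shift < 32) {
        uint8_t byte = in[n++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {       /* last byte of this integer */
            *value = result;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

A length of 1001 encodes as the two bytes e9 07, and any length below 128
fits in a single byte.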

There can be more than one valid encoding; the sender may send a longer encoding
for the benefit of reducing computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
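
The grammar above can be followed almost literally by a straightforward
encoder. The sketch below is a byte-at-a-time illustration, not QEMU's
encoder; the names put_len and xbzrle_encode_sketch are hypothetical. It
assumes that unchanged bytes at the end of the page are simply not encoded and
that a result which does not fit in the destination buffer is reported as -1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append len as ULEB128 at dst[pos]; returns the new position,
 * or -1 if dst (dlen bytes) is too small. */
static int put_len(uint8_t *dst, int dlen, int pos, uint32_t len)
{
    do {
        uint8_t b = len & 0x7f;
        len >>= 7;
        if (len != 0) {
            b |= 0x80;                  /* more bytes follow */
        }
        if (pos >= dlen) {
            return -1;
        }
        dst[pos++] = b;
    } while (len != 0);
    return pos;
}

/* Encode new_buf against old_buf (both slen bytes) as zrun/nzrun pairs.
 * Returns the encoded length, 0 if the buffers are identical,
 * or -1 if the encoding does not fit in dlen bytes. */
int xbzrle_encode_sketch(const uint8_t *old_buf, const uint8_t *new_buf,
                         int slen, uint8_t *dst, int dlen)
{
    int i = 0, d = 0;

    while (i < slen) {
        int start = i;

        /* zrun: bytes whose XOR is zero, i.e. unchanged bytes */
        while (i < slen && (old_buf[i] ^ new_buf[i]) == 0) {
            i++;
        }
        if (i == slen) {
            break;                      /* trailing unchanged bytes: nothing to send */
        }
        d = put_len(dst, dlen, d, (uint32_t)(i - start));
        if (d < 0) {
            return -1;
        }

        /* nzrun: length followed by the new data */
        start = i;
        while (i < slen && (old_buf[i] ^ new_buf[i]) != 0) {
            i++;
        }
        d = put_len(dst, dlen, d, (uint32_t)(i - start));
        if (d < 0 || d + (i - start) > dlen) {
            return -1;
        }
        memcpy(dst + d, new_buf + start, (size_t)(i - start));
        d += i - start;
    }
    return d;
}
```

A page whose first byte changed starts with a zero run of length 0, so every
encoded page still begins with a zrun as the grammar requires.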

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64 MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.
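
The receiving side can be sketched as follows: the page still holds its
previous content, zero runs are skipped, and non-zero runs overwrite the bytes
that changed. The names get_len and xbzrle_decode_sketch are hypothetical, and
the error handling (rejecting an empty or truncated non-zero run) is an
assumption of this sketch rather than a statement about QEMU's decoder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a ULEB128 length at src[pos]; returns the new position,
 * or -1 if the input is truncated or the integer is too long. */
static int get_len(const uint8_t *src, int slen, int pos, uint32_t *len)
{
    uint32_t result = 0;
    unsigned shift = 0;

    while (pos < slen && shift < 32) {
        uint8_t b = src[pos++];
        result |= (uint32_t)(b & 0x7f) << shift;
        if ((b & 0x80) == 0) {
            *len = result;
            return pos;
        }
        shift += 7;
    }
    return -1;
}

/* Apply the encoded delta src (slen bytes) in place to page, which holds the
 * previous content (page_len bytes). Returns the number of page bytes covered
 * by the runs, or -1 if the input is malformed. */
int xbzrle_decode_sketch(const uint8_t *src, int slen,
                         uint8_t *page, int page_len)
{
    int i = 0, d = 0;
    uint32_t count;

    while (i < slen) {
        /* zrun: these bytes are unchanged, so just skip over them */
        i = get_len(src, slen, i, &count);
        if (i < 0 || count > (uint32_t)(page_len - d)) {
            return -1;
        }
        d += (int)count;

        /* nzrun: a non-empty run of new bytes must follow */
        i = get_len(src, slen, i, &count);
        if (i < 0 || count == 0) {
            return -1;
        }
        if (count > (uint32_t)(page_len - d) || count > (uint32_t)(slen - i)) {
            return -1;
        }
        memcpy(page + d, src + i, count);
        d += (int)count;
        i += (int)count;
    }
    return d;
}
```

Bytes after the last non-zero run are left untouched, which matches an encoder
that does not send trailing unchanged bytes.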

This work was originally based on research results published in
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the XBRLE delta encoder was improved further, and XBZRLE is used
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes and
should be a power of 2. The cache default value is 64 MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache miss: the number of cache misses to date. A high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows during encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).
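
To see why many short changes overflow, the size of the zrun/nzrun encoding
can be computed without producing it. The helper names below (uleb128_size,
xbzrle_encoded_size) are hypothetical, and the sketch assumes that an overflow
means the encoded delta would be larger than the 4096-byte page it describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes ULEB128 needs to represent len. */
static int uleb128_size(uint32_t len)
{
    int n = 1;
    while (len >= 0x80) {
        len >>= 7;
        n++;
    }
    return n;
}

/* Size in bytes of the zrun/nzrun encoding of new_buf against old_buf
 * (both len bytes). Trailing unchanged bytes cost nothing. */
int xbzrle_encoded_size(const uint8_t *old_buf, const uint8_t *new_buf, int len)
{
    int i = 0, size = 0;

    while (i < len) {
        int start = i;

        /* zrun: count unchanged bytes */
        while (i < len && old_buf[i] == new_buf[i]) {
            i++;
        }
        if (i == len) {
            break;
        }
        size += uleb128_size((uint32_t)(i - start));

        /* nzrun: length field plus the changed bytes themselves */
        start = i;
        while (i < len && old_buf[i] != new_buf[i]) {
            i++;
        }
        size += uleb128_size((uint32_t)(i - start)) + (i - start);
    }
    return size;
}
```

For a 4096-byte page in which every second byte changed, each changed byte
costs three bytes (zero-run length, non-zero-run length, data), giving
2048 * 3 = 6144 bytes. A page in which every byte changed costs 4099 bytes.
Both are larger than the page itself.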

Testing: Testing indicated that live migration with XBZRLE completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes each (16 MB), zero-initialized */
        char *buf = calloc(4096, 4096);

        if (!buf) {
            return 1;
        }
        while (1) {
            int i;
            /* touch one byte every 1024 bytes, i.e. 4 bytes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }