]>
Commit | Line | Data |
---|---|---|
34c26412 OW |
1 | XBZRLE (Xor Based Zero Run Length Encoding) |
2 | =========================================== | |
3 | ||
4 | Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction | |
5 | of VM downtime and the total live-migration time of Virtual machines. | |
6 | It is particularly useful for virtual machines running memory write intensive | |
7 | workloads that are typical of large enterprise applications such as SAP ERP | |
8 | Systems, and generally speaking for any application that uses a sparse memory | |
9 | update pattern. | |
10 | ||
11 | Instead of sending the changed guest memory page this solution will send a | |
12 | compressed version of the updates, thus reducing the amount of data sent during | |
13 | live migration. | |
14 | In order to be able to calculate the update, the previous memory pages need to | |
15 | be stored on the source. Those pages are stored in a dedicated cache | |
16 | (hash table) and are accessed by their address. | |
17 | The larger the cache size the better the chances are that the page has already | |
18 | been stored in the cache. | |
19 | A small cache size will result in high cache miss rate. | |
20 | Cache size can be changed before and during migration. | |
21 | ||
22 | Format | |
23 | ======= | |
24 | ||
25 | The compression format performs a XOR between the previous and current content | |
26 | of the page, where zero represents an unchanged value. | |
27 | The page data delta is represented by zero and non zero runs. | |
28 | A zero run is represented by its length (in bytes). | |
29 | A non zero run is represented by its length (in bytes) and the new data. | |
30 | The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128) | |
31 | ||
32 | There can be more than one valid encoding, the sender may send a longer encoding | |
33 | for the benefit of reducing computation cost. | |
34 | ||
35 | page = zrun nzrun | |
36 | | zrun nzrun page | |
37 | ||
38 | zrun = length | |
39 | ||
40 | nzrun = length byte... | |
41 | ||
42 | length = uleb128 encoded integer | |
43 | ||
44 | On the sender side XBZRLE is used as a compact delta encoding of page updates, | |
45 | retrieving the old page content from the cache (default size of 512 MB). The | |
46 | receiving side uses the existing page's content and XBZRLE to decode the new | |
47 | page's content. | |
48 | ||
49 | This work was originally based on research results published | |
50 | VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live | |
51 | Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth. | |
52 | Additionally the delta encoder XBRLE was improved further using the XBZRLE | |
53 | instead. | |
54 | ||
55 | XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it | |
56 | ideal for in-line, real-time encoding such as is needed for live-migration. | |
57 | ||
58 | Example | |
59 | old buffer: | |
60 | 1001 zeros | |
61 | 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d | |
62 | 3074 zeros | |
63 | ||
64 | new buffer: | |
65 | 1001 zeros | |
66 | 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69 | |
67 | 3074 zeros | |
68 | ||
69 | encoded buffer: | |
70 | ||
71 | encoded length 24 | |
72 | e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69 | |
73 | ||
27af7d6e C |
74 | Cache update strategy |
75 | ===================== | |
76 | Keeping the hot pages in the cache is effective for decreased cache | |
77 | misses. XBZRLE uses a counter as the age of each page. The counter will | |
78 | increase after each ram dirty bitmap sync. When a cache conflict is | |
79 | detected, XBZRLE will only evict pages in the cache that are older than | |
80 | a threshold. | |
81 | ||
34c26412 OW |
82 | Usage |
83 | ====================== | |
84 | 1. Verify the destination QEMU version is able to decode the new format. | |
85 | {qemu} info migrate_capabilities | |
86 | {qemu} xbzrle: off , ... | |
87 | ||
88 | 2. Activate xbzrle on both source and destination: | |
89 | {qemu} migrate_set_capability xbzrle on | |
90 | ||
91 | 3. Set the XBZRLE cache size - the cache size is in MBytes and should be a | |
92 | power of 2. The cache default value is 64MBytes. (on source only) | |
93 | {qemu} migrate_set_cache_size 256m | |
94 | ||
95 | 4. Start outgoing migration | |
96 | {qemu} migrate -d tcp:destination.host:4444 | |
97 | {qemu} info migrate | |
98 | capabilities: xbzrle: on | |
99 | Migration status: active | |
100 | transferred ram: A kbytes | |
101 | remaining ram: B kbytes | |
102 | total ram: C kbytes | |
103 | total time: D milliseconds | |
104 | duplicate: E pages | |
105 | normal: F pages | |
106 | normal bytes: G kbytes | |
107 | cache size: H bytes | |
108 | xbzrle transferred: I kbytes | |
109 | xbzrle pages: J pages | |
110 | xbzrle cache miss: K | |
111 | xbzrle overflow : L | |
112 | ||
113 | xbzrle cache-miss: the number of cache misses to date - high cache-miss rate | |
114 | indicates that the cache size is set too low. | |
115 | xbzrle overflow: the number of overflows in the decoding which where the delta | |
116 | could not be compressed. This can happen if the changes in the pages are too | |
117 | large or there are many short changes; for example, changing every second byte | |
118 | (half a page). | |
119 | ||
120 | Testing: Testing indicated that live migration with XBZRLE was completed in 110 | |
121 | seconds, whereas without it would not be able to complete. | |
122 | ||
123 | A simple synthetic memory r/w load generator: | |
124 | .. include <stdlib.h> | |
125 | .. include <stdio.h> | |
126 | .. int main() | |
127 | .. { | |
128 | .. char *buf = (char *) calloc(4096, 4096); | |
129 | .. while (1) { | |
130 | .. int i; | |
131 | .. for (i = 0; i < 4096 * 4; i++) { | |
132 | .. buf[i * 4096 / 4]++; | |
133 | .. } | |
134 | .. printf("."); | |
135 | .. } | |
136 | .. } |