Commit | Line | Data |
---|---|---|
03feae73 KW |
1 | == General == |
2 | ||
3 | A qcow2 image file is organized in units of constant size, which are called | |
4 | (host) clusters. A cluster is the unit in which all allocations are done, | |
5 | both for actual guest data and for image metadata. | |
6 | ||
7 | Likewise, the virtual disk as seen by the guest is divided into (guest) | |
8 | clusters of the same size. | |
9 | ||
10 | All numbers in qcow2 are stored in Big Endian byte order. | |
11 | ||
12 | ||
13 | == Header == | |
14 | ||
15 | The first cluster of a qcow2 image contains the file header: | |
16 | ||
17 | Byte 0 - 3: magic | |
18 | QCOW magic string ("QFI\xfb") | |
19 | ||
20 | 4 - 7: version | |
4fabffc1 | 21 | Version number (valid values are 2 and 3) |
03feae73 KW |
22 | |
23 | 8 - 15: backing_file_offset | |
24 | Offset into the image file at which the backing file name | |
25 | is stored (NB: The string is not null terminated). 0 if the | |
26 | image doesn't have a backing file. | |
27 | ||
28 | 16 - 19: backing_file_size | |
29 | Length of the backing file name in bytes. Must not be | |
30 | longer than 1023 bytes. Undefined if the image doesn't have | |
31 | a backing file. | |
32 | ||
33 | 20 - 23: cluster_bits | |
34 | Number of bits that are used for addressing an offset | |
35 | within a cluster (1 << cluster_bits is the cluster size). | |
36 | Must not be less than 9 (i.e. 512 byte clusters). | |
37 | ||
38 | Note: qemu as of today has an implementation limit of 2 MB | |
39 | as the maximum cluster size and won't be able to open images | |
40 | with larger cluster sizes. | |
41 | ||
42 | 24 - 31: size | |
43 | Virtual disk size in bytes | |
44 | ||
45 | 32 - 35: crypt_method | |
46 | 0 for no encryption | |
47 | 1 for AES encryption | |
7674b575 | 48 | 2 for LUKS encryption |
03feae73 KW |
49 | |
50 | 36 - 39: l1_size | |
51 | Number of entries in the active L1 table | |
52 | ||
53 | 40 - 47: l1_table_offset | |
54 | Offset into the image file at which the active L1 table | |
55 | starts. Must be aligned to a cluster boundary. | |
56 | ||
57 | 48 - 55: refcount_table_offset | |
58 | Offset into the image file at which the refcount table | |
59 | starts. Must be aligned to a cluster boundary. | |
60 | ||
61 | 56 - 59: refcount_table_clusters | |
62 | Number of clusters that the refcount table occupies | |
63 | ||
64 | 60 - 63: nb_snapshots | |
65 | Number of snapshots contained in the image | |
66 | ||
67 | 64 - 71: snapshots_offset | |
68 | Offset into the image file at which the snapshot table | |
69 | starts. Must be aligned to a cluster boundary. | |
70 | ||
4fabffc1 KW |
71 | If the version is 3 or higher, the header has the following additional fields. |
72 | For version 2, the values are assumed to be zero, unless specified otherwise | |
73 | in the description of a field. | |
74 | ||
75 | 72 - 79: incompatible_features | |
76 | Bitmask of incompatible features. An implementation must | |
77 | fail to open an image if an unknown bit is set. | |
78 | ||
0f6d767a SH |
79 | Bit 0: Dirty bit. If this bit is set then refcounts |
80 | may be inconsistent; make sure to scan L1/L2 | |
81 | tables to repair refcounts before accessing the | |
82 | image. | |
83 | ||
69c98726 HR |
84 | Bit 1: Corrupt bit. If this bit is set then any data |
85 | structure may be corrupt and the image must not | |
86 | be written to (except to regain | |
87 | consistency). | |
88 | ||
89 | Bits 2-63: Reserved (set to 0) | |
4fabffc1 KW |
90 | |
91 | 80 - 87: compatible_features | |
92 | Bitmask of compatible features. An implementation can | |
93 | safely ignore any unknown bits that are set. | |
94 | ||
dae8796d SH |
95 | Bit 0: Lazy refcounts bit. If this bit is set then |
96 | lazy refcount updates can be used. This means | |
97 | marking the image file dirty and postponing | |
98 | refcount metadata updates. | |
99 | ||
100 | Bits 1-63: Reserved (set to 0) | |
4fabffc1 KW |
101 | |
102 | 88 - 95: autoclear_features | |
103 | Bitmask of auto-clear features. An implementation may only | |
104 | write to an image with unknown auto-clear features if it | |
105 | clears the respective bits from this field first. | |
106 | ||
bca5a8f4 VSO |
107 | Bit 0: Bitmaps extension bit |
108 | This bit indicates consistency for the bitmaps | |
109 | extension data. | |
110 | ||
111 | It is an error if this bit is set without the | |
112 | bitmaps extension present. | |
113 | ||
114 | If the bitmaps extension is present but this | |
115 | bit is unset, the bitmaps extension data must be | |
116 | considered inconsistent. | |
117 | ||
118 | Bits 1-63: Reserved (set to 0) | |
4fabffc1 KW |
119 | |
120 | 96 - 99: refcount_order | |
121 | Describes the width of a reference count block entry (width | |
6815bce5 MK |
122 | in bits: refcount_bits = 1 << refcount_order). For version 2 |
123 | images, the order is always assumed to be 4 | |
124 | (i.e. refcount_bits = 16). | |
7f75a07d | 125 | This value may not exceed 6 (i.e. refcount_bits = 64). |
4fabffc1 KW |
126 | |
127 | 100 - 103: header_length | |
128 | Length of the header structure in bytes. For version 2 | |
129 | images, the length is always assumed to be 72 bytes. | |
130 | ||
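The header layout above can be read with a fixed big-endian format string. The following is a minimal sketch in Python, not a reference implementation: the function name parse_qcow2_header and the derived keys cluster_size and refcount_bits are illustrative assumptions, and only the checks stated in the field descriptions are performed.

```python
import struct

# Bytes 0-71 (all versions) and bytes 72-103 (version 3), big endian.
V2_FORMAT = ">4sIQIIQIIQQIIQ"
V3_FORMAT = ">QQQII"
V2_NAMES = ("magic", "version", "backing_file_offset", "backing_file_size",
            "cluster_bits", "size", "crypt_method", "l1_size",
            "l1_table_offset", "refcount_table_offset",
            "refcount_table_clusters", "nb_snapshots", "snapshots_offset")
V3_NAMES = ("incompatible_features", "compatible_features",
            "autoclear_features", "refcount_order", "header_length")


def parse_qcow2_header(buf):
    """Decode the fixed header fields from the start of an image (hypothetical helper)."""
    fields = dict(zip(V2_NAMES, struct.unpack_from(V2_FORMAT, buf, 0)))
    if fields["magic"] != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    if fields["version"] not in (2, 3):
        raise ValueError("unsupported version")
    if fields["cluster_bits"] < 9:
        raise ValueError("cluster_bits must not be less than 9")
    if fields["version"] >= 3:
        fields.update(zip(V3_NAMES, struct.unpack_from(V3_FORMAT, buf, 72)))
    else:
        # Version 2: additional fields are assumed, not stored.
        fields.update(incompatible_features=0, compatible_features=0,
                      autoclear_features=0, refcount_order=4,
                      header_length=72)
    # Derived values, added for convenience (not header fields).
    fields["cluster_size"] = 1 << fields["cluster_bits"]
    fields["refcount_bits"] = 1 << fields["refcount_order"]
    return fields
```

A caller would typically read the first 104 bytes of the image file and pass them in; feature bit checks are left to the caller.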
03feae73 KW |
131 | Directly after the image header, optional sections called header extensions can |
132 | be stored. Each extension has a structure like the following: | |
133 | ||
134 | Byte 0 - 3: Header extension type: | |
135 | 0x00000000 - End of the header extension area | |
136 | 0xE2792ACA - Backing file format name | |
4fabffc1 | 137 | 0x6803f857 - Feature name table |
bca5a8f4 | 138 | 0x23852875 - Bitmaps extension |
7674b575 | 139 | 0x0537be77 - Full disk encryption header pointer |
03feae73 KW |
140 | other - Unknown header extension, can be safely |
141 | ignored | |
142 | ||
143 | 4 - 7: Length of the header extension data | |
144 | ||
145 | 8 - n: Header extension data | |
146 | ||
147 | n - m: Padding to round up the header extension size to the next | |
148 | multiple of 8. | |
149 | ||
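Walking the header extension area follows directly from the structure above: read type and length, take the data, then skip the padding. The sketch below is in Python; the function name iter_header_extensions is an assumption, and start is expected to be the end of the header (header_length).

```python
import struct


def iter_header_extensions(buf, start):
    """Yield (type, data) for each header extension found from offset start."""
    offset = start
    while offset + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, offset)
        if ext_type == 0x00000000:
            return  # end of the header extension area
        data_start = offset + 8
        if data_start + length > len(buf):
            raise ValueError("header extension exceeds the buffer")
        yield ext_type, bytes(buf[data_start:data_start + length])
        # Data is padded so the next extension starts at a multiple of 8.
        offset = data_start + ((length + 7) & ~7)
```

Unknown extension types are simply yielded; a consumer can ignore them, as the description above allows.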
4fabffc1 KW |
150 | Unless stated otherwise, each header extension type shall appear at most once |
151 | in the same image. | |
152 | ||
8e436ec1 MK |
153 | If the image has a backing file then the backing file name should be stored in |
154 | the remaining space between the end of the header extension area and the end of | |
155 | the first cluster. It is not allowed to store other data here, so that an | |
156 | implementation can safely modify the header and add extensions without harming | |
157 | data of compatible features that it doesn't support. Compatible features that | |
158 | need space for additional data can use a header extension. | |
4fabffc1 KW |
159 | |
160 | ||
161 | == Feature name table == | |
162 | ||
163 | The feature name table is an optional header extension that contains the name | |
164 | for features used by the image. It can be used by applications that don't know | |
165 | the respective feature (e.g. because the feature was introduced only later) to | |
166 | display a useful error message. | |
167 | ||
168 | The number of entries in the feature name table is determined by the length of | |
169 | the header extension data. Each entry looks like this: | |
170 | ||
171 | Byte 0: Type of feature (select feature bitmap) | |
172 | 0: Incompatible feature | |
173 | 1: Compatible feature | |
174 | 2: Autoclear feature | |
175 | ||
176 | 1: Bit number within the selected feature bitmap (valid | |
177 | values: 0-63) | |
178 | ||
179 | 2 - 47: Feature name (padded with zeros, but not necessarily null | |
180 | terminated if it has full length) | |
03feae73 KW |
181 | |
182 | ||
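Since each feature name table entry is 48 bytes long, the table can be decoded by stepping through the extension data in 48-byte units. A small sketch in Python follows; the function name and the returned tuple shape are illustrative assumptions.

```python
def parse_feature_name_table(data):
    """Return a list of (feature type, bit number, name) tuples."""
    entry_size = 48
    kinds = {0: "incompatible", 1: "compatible", 2: "autoclear"}
    entries = []
    for pos in range(0, len(data) - entry_size + 1, entry_size):
        kind = data[pos]       # byte 0: which feature bitmap
        bit = data[pos + 1]    # byte 1: bit number within that bitmap
        raw = data[pos + 2:pos + entry_size]
        # The name is zero padded, but a full-length name has no terminator.
        name = raw.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((kinds.get(kind, "unknown"), bit, name))
    return entries
```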
bca5a8f4 VSO |
183 | == Bitmaps extension == |
184 | ||
185 | The bitmaps extension is an optional header extension. It provides the ability | |
186 | to store bitmaps related to a virtual disk. For now, there is only one bitmap | |
187 | type: the dirty tracking bitmap, which tracks virtual disk changes from some | |
188 | point in time. | |
189 | ||
190 | The data of the extension should be considered consistent only if the | |
191 | corresponding auto-clear feature bit is set, see autoclear_features above. | |
192 | ||
193 | The fields of the bitmaps extension are: | |
194 | ||
195 | Byte 0 - 3: nb_bitmaps | |
196 | The number of bitmaps contained in the image. Must be | |
197 | greater than or equal to 1. | |
198 | ||
199 | Note: Qemu currently only supports up to 65535 bitmaps per | |
200 | image. | |
201 | ||
202 | 4 - 7: Reserved, must be zero. | |
203 | ||
204 | 8 - 15: bitmap_directory_size | |
205 | Size of the bitmap directory in bytes. It is the cumulative | |
b348c262 | 206 | size of all (nb_bitmaps) bitmap directory entries. |
bca5a8f4 VSO |
207 | |
208 | 16 - 23: bitmap_directory_offset | |
209 | Offset into the image file at which the bitmap directory | |
210 | starts. Must be aligned to a cluster boundary. | |
211 | ||
7674b575 DB |
212 | == Full disk encryption header pointer == |
213 | ||
214 | The full disk encryption header must be present if, and only if, the | |
215 | 'crypt_method' header requires metadata. Currently this is only true | |
216 | of the 'LUKS' crypt method. The header extension must be absent for | |
217 | other methods. | |
218 | ||
219 | This header provides the offset at which the crypt method can store | |
220 | its additional data, as well as the length of such data. | |
221 | ||
222 | Byte 0 - 7: Offset into the image file at which the encryption | |
223 | header starts in bytes. Must be aligned to a cluster | |
224 | boundary. | |
225 | Byte 8 - 15: Length of the written encryption header in bytes. | |
226 | Note actual space allocated in the qcow2 file may | |
227 | be larger than this value, since it will be rounded up | |
228 | to the next multiple of the cluster size. Any | |
229 | unused bytes in the allocated space will be initialized | |
230 | to 0. | |
231 | ||
232 | For the LUKS crypt method, the encryption header works as follows. | |
233 | ||
234 | The first 592 bytes of the header clusters will contain the LUKS | |
235 | partition header. This is then followed by the key material data areas. | |
236 | The size of the key material data areas is determined by the number of | |
237 | stripes in the key slot and key size. Refer to the LUKS format | |
238 | specification ('docs/on-disk-format.pdf' in the cryptsetup source | |
239 | package) for details of the LUKS partition header format. | |
240 | ||
241 | In the LUKS partition header, the "payload-offset" field will be | |
242 | calculated as normal for the LUKS spec, i.e. the size of the LUKS | |
243 | header, plus key material regions, plus padding, relative to the | |
244 | start of the LUKS header. This offset value is not required to be | |
245 | qcow2 cluster aligned. Its value is currently never used in the | |
246 | context of qcow2, since the qcow2 file format itself defines where | |
247 | the real payload offset is, but nonetheless a valid payload offset | |
248 | should always be present. | |
249 | ||
250 | In the LUKS key slots header, the "key-material-offset" is relative | |
251 | to the start of the LUKS header clusters in the qcow2 container, | |
252 | not the start of the qcow2 file. | |
253 | ||
254 | Logically, the layout looks like: | |
255 | ||
256 | +-----------------------------+ | |
257 | | QCow2 header | | |
258 | | QCow2 header extension X | | |
259 | | QCow2 header extension FDE | | |
260 | | QCow2 header extension ... | | |
261 | | QCow2 header extension Z | | |
262 | +-----------------------------+ | |
263 | | ....other QCow2 tables.... | | |
264 | . . | |
265 | . . | |
266 | +-----------------------------+ | |
267 | | +-------------------------+ | | |
268 | | | LUKS partition header | | | |
269 | | +-------------------------+ | | |
270 | | | LUKS key material 1 | | | |
271 | | +-------------------------+ | | |
272 | | | LUKS key material 2 | | | |
273 | | +-------------------------+ | | |
274 | | | LUKS key material ... | | | |
275 | | +-------------------------+ | | |
276 | | | LUKS key material 8 | | | |
277 | | +-------------------------+ | | |
278 | +-----------------------------+ | |
279 | | QCow2 cluster payload | | |
280 | . . | |
281 | . . | |
282 | . . | |
283 | | | | |
284 | +-----------------------------+ | |
285 | ||
286 | == Data encryption == | |
287 | ||
288 | When an encryption method is requested in the header, the image payload | |
289 | data must be encrypted/decrypted on every write/read. The image headers | |
290 | and metadata are never encrypted. | |
291 | ||
292 | The algorithms used for encryption vary depending on the method: | |
293 | ||
294 | - AES: | |
295 | ||
296 | The AES cipher, in CBC mode, with 256 bit keys. | |
297 | ||
298 | Initialization vectors are generated using the plain64 method, with | |
299 | the virtual disk sector as the input tweak. | |
300 | ||
301 | This format is no longer supported in QEMU system emulators, due | |
302 | to a number of design flaws affecting its security. It is only | |
303 | supported in the command line tools for the sake of backward compatibility | |
304 | and data liberation. | |
305 | ||
306 | - LUKS: | |
307 | ||
308 | The algorithms are specified in the LUKS header. | |
309 | ||
310 | Initialization vectors are generated using the method specified | |
311 | in the LUKS header, with the physical disk sector as the | |
312 | input tweak. | |
bca5a8f4 | 313 | |
03feae73 KW |
314 | == Host cluster management == |
315 | ||
316 | qcow2 manages the allocation of host clusters by maintaining a reference count | |
317 | for each host cluster. A refcount of 0 means that the cluster is free, 1 means | |
318 | that it is used, and >= 2 means that it is used and any write access must | |
319 | perform a COW (copy on write) operation. | |
320 | ||
321 | The refcounts are managed in a two-level table. The first level is called | |
322 | refcount table and has a variable size (which is stored in the header). The | |
323 | refcount table can cover multiple clusters, however it needs to be contiguous | |
324 | in the image file. | |
325 | ||
326 | It contains pointers to the second level structures which are called refcount | |
327 | blocks and are exactly one cluster in size. | |
328 | ||
329 | Given an offset into the image file, the refcount of its cluster can be obtained | |
330 | as follows: | |
331 | ||
4b318d6c | 332 | refcount_block_entries = (cluster_size * 8 / refcount_bits) |
03feae73 | 333 | |
3789985f ZYW |
334 | refcount_block_index = (offset / cluster_size) % refcount_block_entries |
335 | refcount_table_index = (offset / cluster_size) / refcount_block_entries | |
03feae73 KW |
336 | |
337 | refcount_block = load_cluster(refcount_table[refcount_table_index]); | |
338 | return refcount_block[refcount_block_index]; | |
339 | ||
340 | Refcount table entry: | |
341 | ||
342 | Bit 0 - 8: Reserved (set to 0) | |
343 | ||
344 | 9 - 63: Bits 9-63 of the offset into the image file at which the | |
345 | refcount block starts. Must be aligned to a cluster | |
346 | boundary. | |
347 | ||
348 | If this is 0, the corresponding refcount block has not yet | |
349 | been allocated. All refcounts managed by this refcount block | |
350 | are 0. | |
351 | ||
4fabffc1 | 352 | Refcount block entry (x = refcount_bits - 1): |
03feae73 | 353 | |
4fabffc1 KW |
354 | Bit 0 - x: Reference count of the cluster. If refcount_bits implies a |
355 | sub-byte width, note that bit 0 means the least significant | |
356 | bit in this context. | |
03feae73 KW |
357 | |
358 | ||
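The refcount lookup above, including the table entry mask and sub-byte entry widths, might be sketched in Python as follows. This is an illustration under assumptions: the whole image is held in a bytes-like object, the function name get_refcount is hypothetical, and the table index is not checked against refcount_table_clusters.

```python
import struct


def get_refcount(image, refcount_table_offset, cluster_bits, refcount_order,
                 offset):
    """Return the refcount of the host cluster containing offset."""
    cluster_size = 1 << cluster_bits
    refcount_bits = 1 << refcount_order
    refcount_block_entries = cluster_size * 8 // refcount_bits

    cluster_index = offset // cluster_size
    refcount_block_index = cluster_index % refcount_block_entries
    refcount_table_index = cluster_index // refcount_block_entries

    (table_entry,) = struct.unpack_from(
        ">Q", image, refcount_table_offset + 8 * refcount_table_index)
    block_offset = table_entry & 0xFFFFFFFFFFFFFE00  # bits 9-63
    if block_offset == 0:
        return 0  # refcount block not allocated: all refcounts are 0

    bit_pos = refcount_block_index * refcount_bits
    if refcount_bits >= 8:
        start = block_offset + bit_pos // 8
        return int.from_bytes(image[start:start + refcount_bits // 8], "big")
    # Sub-byte widths: bit 0 is the least significant bit of the byte.
    byte = image[block_offset + bit_pos // 8]
    return (byte >> (bit_pos % 8)) & ((1 << refcount_bits) - 1)
```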
359 | == Cluster mapping == | |
360 | ||
361 | Just as for refcounts, qcow2 uses a two-level structure for the mapping of | |
362 | guest clusters to host clusters. They are called L1 and L2 tables. | |
363 | ||
364 | The L1 table has a variable size (stored in the header) and may use multiple | |
365 | clusters, however it must be contiguous in the image file. L2 tables are | |
366 | exactly one cluster in size. | |
367 | ||
368 | Given an offset into the virtual disk, the offset into the image file can be | |
369 | obtained as follows: | |
370 | ||
371 | l2_entries = (cluster_size / sizeof(uint64_t)) | |
372 | ||
373 | l2_index = (offset / cluster_size) % l2_entries | |
374 | l1_index = (offset / cluster_size) / l2_entries | |
375 | ||
376 | l2_table = load_cluster(l1_table[l1_index]); | |
377 | cluster_offset = l2_table[l2_index]; | |
378 | ||
379 | return cluster_offset + (offset % cluster_size) | |
380 | ||
381 | L1 table entry: | |
382 | ||
383 | Bit 0 - 8: Reserved (set to 0) | |
384 | ||
385 | 9 - 55: Bits 9-55 of the offset into the image file at which the L2 | |
386 | table starts. Must be aligned to a cluster boundary. If the | |
387 | offset is 0, the L2 table and all clusters described by this | |
388 | L2 table are unallocated. | |
389 | ||
390 | 56 - 62: Reserved (set to 0) | |
391 | ||
392 | 63: 0 for an L2 table that is unused or requires COW, 1 if its | |
393 | refcount is exactly one. This information is only accurate | |
394 | in the active L1 table. | |
395 | ||
4fabffc1 | 396 | L2 table entry: |
03feae73 | 397 | |
4fabffc1 KW |
398 | Bit 0 - 61: Cluster descriptor |
399 | ||
400 | 62: 0 for standard clusters | |
401 | 1 for compressed clusters | |
402 | ||
403 | 63: 0 for a cluster that is unused or requires COW, 1 if its | |
404 | refcount is exactly one. This information is only accurate | |
b6af0975 | 405 | in L2 tables that are reachable from the active L1 |
4fabffc1 KW |
406 | table. |
407 | ||
408 | Standard Cluster Descriptor: | |
409 | ||
410 | Bit 0: If set to 1, the cluster reads as all zeros. The host | |
411 | cluster offset can be used to describe a preallocation, | |
412 | but it won't be used for reading data from this cluster, | |
413 | nor is data read from the backing file if the cluster is | |
414 | unallocated. | |
415 | ||
416 | With version 2, this is always 0. | |
417 | ||
418 | 1 - 8: Reserved (set to 0) | |
03feae73 KW |
419 | |
420 | 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a | |
421 | cluster boundary. If the offset is 0, the cluster is | |
422 | unallocated. | |
423 | ||
424 | 56 - 61: Reserved (set to 0) | |
425 | ||
03feae73 | 426 | |
bf3f363a | 427 | Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)): |
03feae73 KW |
428 | |
429 | Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a | |
430 | cluster boundary! | |
431 | ||
432 | x - 61: Compressed size of the cluster data in sectors of 512 bytes | |
433 | ||
03feae73 | 434 | If a cluster is unallocated, read requests shall read the data from the backing |
4fabffc1 KW |
435 | file (except if bit 0 in the Standard Cluster Descriptor is set). If there is |
436 | no backing file or the backing file is smaller than the image, they shall read | |
437 | zeros for all parts that are not covered by the backing file. | |
03feae73 KW |
438 | |
439 | ||
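Putting the L1 entry, L2 entry and standard cluster descriptor together, a guest offset can be resolved roughly as in the Python sketch below. It is only an illustration: the image is assumed to be a bytes-like object, the function name and the returned (kind, host offset) pairs are hypothetical, compressed clusters are not handled, and falling back to a backing file for unallocated clusters is left to the caller.

```python
import struct

OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1 or L2 entry


def map_guest_offset(image, l1_table_offset, l1_size, cluster_bits, offset):
    """Resolve a guest offset to ("data" | "zero" | "unallocated", host offset)."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # sizeof(uint64_t) == 8

    l2_index = (offset // cluster_size) % l2_entries
    l1_index = (offset // cluster_size) // l2_entries
    if l1_index >= l1_size:
        raise ValueError("offset is beyond the L1 table")

    (l1_entry,) = struct.unpack_from(">Q", image,
                                     l1_table_offset + 8 * l1_index)
    l2_table_offset = l1_entry & OFFSET_MASK
    if l2_table_offset == 0:
        return ("unallocated", None)

    (l2_entry,) = struct.unpack_from(">Q", image,
                                     l2_table_offset + 8 * l2_index)
    if l2_entry & (1 << 62):
        raise NotImplementedError("compressed clusters are not handled here")
    if l2_entry & 1:
        return ("zero", None)  # reads as all zeros (version 3 only)
    cluster_offset = l2_entry & OFFSET_MASK
    if cluster_offset == 0:
        return ("unallocated", None)
    return ("data", cluster_offset + offset % cluster_size)
```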
440 | == Snapshots == | |
441 | ||
442 | qcow2 supports internal snapshots. Their basic principle of operation is to | |
443 | switch the active L1 table, so that a different set of host clusters are | |
444 | exposed to the guest. | |
445 | ||
446 | When creating a snapshot, the L1 table should be copied and the refcount of all | |
3789985f | 447 | L2 tables and clusters reachable from this L1 table must be increased, so that |
03feae73 KW |
448 | a write causes a COW and isn't visible in other snapshots. |
449 | ||
450 | When loading a snapshot, bit 63 of all entries in the new active L1 table and | |
451 | all L2 tables referenced by it must be reconstructed from the refcount table | |
452 | as it doesn't need to be accurate in inactive L1 tables. | |
453 | ||
454 | A directory of all snapshots is stored in the snapshot table, a contiguous area | |
455 | in the image file, whose starting offset and length are given by the header | |
456 | fields snapshots_offset and nb_snapshots. The entries of the snapshot table | |
457 | have variable length, depending on the length of ID, name and extra data. | |
458 | ||
459 | Snapshot table entry: | |
460 | ||
461 | Byte 0 - 7: Offset into the image file at which the L1 table for the | |
462 | snapshot starts. Must be aligned to a cluster boundary. | |
463 | ||
464 | 8 - 11: Number of entries in the L1 table of the snapshot | |
465 | ||
466 | 12 - 13: Length of the unique ID string describing the snapshot | |
467 | ||
468 | 14 - 15: Length of the name of the snapshot | |
469 | ||
470 | 16 - 19: Time at which the snapshot was taken in seconds since the | |
471 | Epoch | |
472 | ||
473 | 20 - 23: Subsecond part of the time at which the snapshot was taken | |
474 | in nanoseconds | |
475 | ||
476 | 24 - 31: Time that the guest was running until the snapshot was | |
477 | taken in nanoseconds | |
478 | ||
479 | 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. | |
480 | If there is VM state, it starts at the first cluster | |
481 | described by the first L1 table entry that doesn't describe a | |
482 | regular guest cluster (i.e. VM state is stored like guest | |
483 | disk content, except that it is stored at offsets that are | |
484 | larger than the virtual disk presented to the guest) | |
485 | ||
486 | 36 - 39: Size of extra data in the table entry (used for future | |
487 | extensions of the format) | |
488 | ||
c2c9a466 KW |
489 | variable: Extra data for future extensions. Unknown fields must be |
490 | ignored. Currently defined are (offset relative to snapshot | |
491 | table entry): | |
492 | ||
493 | Byte 40 - 47: Size of the VM state in bytes. 0 if no VM | |
494 | state is saved. If this field is present, | |
495 | the 32-bit value in bytes 32-35 is ignored. | |
03feae73 | 496 | |
4fabffc1 KW |
497 | Byte 48 - 55: Virtual disk size of the snapshot in bytes |
498 | ||
499 | Version 3 images must include extra data at least up to | |
500 | byte 55. | |
501 | ||
03feae73 KW |
502 | variable: Unique ID string for the snapshot (not null terminated) |
503 | ||
504 | variable: Name of the snapshot (not null terminated) | |
f2520804 HR |
505 | |
506 | variable: Padding to round up the snapshot table entry size to the | |
507 | next multiple of 8. | |
bca5a8f4 VSO |
508 | |
509 | ||
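Because snapshot table entries have variable length, a reader has to decode the fixed 40-byte part first and then step over extra data, ID, name and padding. The Python sketch below illustrates this; the function name, the dictionary keys and the choice to decode strings as UTF-8 are assumptions, and the padding is computed relative to the start of the entry.

```python
import struct

SNAPSHOT_FIXED = ">QIHHIIQII"  # bytes 0-39 of a snapshot table entry


def parse_snapshot_entry(buf, pos):
    """Decode one snapshot table entry at pos; return (fields, next_pos)."""
    (l1_offset, l1_size, id_len, name_len, sec, nsec, vm_clock_ns,
     vm_state_size, extra_size) = struct.unpack_from(SNAPSHOT_FIXED, buf, pos)
    cur = pos + 40
    extra = bytes(buf[cur:cur + extra_size])
    cur += extra_size

    disk_size = None
    if extra_size >= 8:
        # Bytes 40-47: 64-bit VM state size overrides the 32-bit field.
        (vm_state_size,) = struct.unpack_from(">Q", extra, 0)
    if extra_size >= 16:
        # Bytes 48-55: virtual disk size of the snapshot.
        (disk_size,) = struct.unpack_from(">Q", extra, 8)

    snap_id = bytes(buf[cur:cur + id_len]).decode("utf-8", errors="replace")
    cur += id_len
    name = bytes(buf[cur:cur + name_len]).decode("utf-8", errors="replace")
    cur += name_len

    # Entry size is rounded up to the next multiple of 8.
    next_pos = pos + ((cur - pos + 7) & ~7)
    fields = {"l1_table_offset": l1_offset, "l1_size": l1_size,
              "id": snap_id, "name": name, "time_sec": sec,
              "time_nsec": nsec, "vm_clock_ns": vm_clock_ns,
              "vm_state_size": vm_state_size, "disk_size": disk_size}
    return fields, next_pos
```

Calling the function nb_snapshots times, each time starting at the returned next_pos, walks the whole snapshot table.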
510 | == Bitmaps == | |
511 | ||
512 | As mentioned above, the bitmaps extension provides the ability to store bitmaps | |
513 | related to a virtual disk. This section describes how these bitmaps are stored. | |
514 | ||
515 | All stored bitmaps are related to the virtual disk stored in the same image, so | |
516 | each bitmap size is equal to the virtual disk size. | |
517 | ||
518 | Each bit of the bitmap is responsible for a strictly defined range of the virtual | |
519 | disk. For bit number bit_nr the corresponding range (in bytes) will be: | |
520 | ||
521 | [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1] | |
522 | ||
523 | Granularity is a property of the concrete bitmap, see below. | |
524 | ||
525 | ||
526 | === Bitmap directory === | |
527 | ||
528 | Each bitmap saved in the image is described in a bitmap directory entry. The | |
529 | bitmap directory is a contiguous area in the image file, whose starting offset | |
530 | and length are given by the header extension fields bitmap_directory_offset and | |
531 | bitmap_directory_size. The entries of the bitmap directory have variable | |
b348c262 | 532 | length, depending on the lengths of the bitmap name and extra data. |
bca5a8f4 VSO |
533 | |
534 | Structure of a bitmap directory entry: | |
535 | ||
536 | Byte 0 - 7: bitmap_table_offset | |
537 | Offset into the image file at which the bitmap table | |
538 | (described below) for the bitmap starts. Must be aligned to | |
539 | a cluster boundary. | |
540 | ||
541 | 8 - 11: bitmap_table_size | |
542 | Number of entries in the bitmap table of the bitmap. | |
543 | ||
544 | 12 - 15: flags | |
545 | Bit | |
546 | 0: in_use | |
547 | The bitmap was not saved correctly and may be | |
548 | inconsistent. | |
549 | ||
550 | 1: auto | |
551 | The bitmap must reflect all changes of the virtual | |
552 | disk by any application that would write to this qcow2 | |
553 | file (including writes, snapshot switching, etc.). The | |
554 | type of this bitmap must be 'dirty tracking bitmap'. | |
555 | ||
556 | 2: extra_data_compatible | |
557 | This flag is meaningful when the extra data is | |
558 | unknown to the software (currently any extra data is | |
559 | unknown to Qemu). | |
560 | If it is set, the bitmap may be used as expected, extra | |
561 | data must be left as is. | |
562 | If it is not set, the bitmap must not be used, but | |
563 | both it and its extra data must be left as is. | |
564 | ||
565 | Bits 3 - 31 are reserved and must be 0. | |
566 | ||
567 | 16: type | |
568 | This field describes the sort of the bitmap. | |
569 | Values: | |
570 | 1: Dirty tracking bitmap | |
571 | ||
572 | Values 0, 2 - 255 are reserved. | |
573 | ||
574 | 17: granularity_bits | |
575 | Granularity bits. Valid values: 0 - 63. | |
576 | ||
b5d1f154 | 577 | Note: Qemu currently supports only values 9 - 31. |
bca5a8f4 VSO |
578 | |
579 | Granularity is calculated as | |
580 | granularity = 1 << granularity_bits | |
581 | ||
582 | A bitmap's granularity is how many bytes of the image | |
583 | account for one bit of the bitmap. | |
584 | ||
585 | 18 - 19: name_size | |
586 | Size of the bitmap name. Must be non-zero. | |
587 | ||
588 | Note: Qemu currently doesn't support values greater than | |
589 | 1023. | |
590 | ||
591 | 20 - 23: extra_data_size | |
592 | Size of type-specific extra data. | |
593 | ||
594 | For now, as no extra data is defined, extra_data_size is | |
595 | reserved and should be zero. If it is non-zero the | |
596 | behavior is defined by extra_data_compatible flag. | |
597 | ||
598 | variable: extra_data | |
599 | Extra data for the bitmap, occupying extra_data_size bytes. | |
600 | Extra data must never contain references to clusters or in | |
601 | some other way allocate additional clusters. | |
602 | ||
603 | variable: name | |
604 | The name of the bitmap (not null terminated), occupying | |
605 | name_size bytes. Must be unique among all bitmap names | |
606 | within the bitmaps extension. | |
607 | ||
608 | variable: Padding to round up the bitmap directory entry size to the | |
609 | next multiple of 8. All bytes of the padding must be zero. | |
610 | ||
611 | ||
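A bitmap directory entry can be decoded in the same way as other variable-length records in the format: a fixed 24-byte part, then extra data, the name, and padding. The following Python sketch is illustrative only; the function name, the dictionary keys and the UTF-8 decoding of the name are assumptions, and no validation of reserved bits is done.

```python
import struct

BITMAP_DIR_FIXED = ">QIIBBHI"  # bytes 0-23 of a bitmap directory entry


def parse_bitmap_directory_entry(buf, pos):
    """Decode one bitmap directory entry at pos; return (fields, next_pos)."""
    (table_offset, table_size, flags, bitmap_type, granularity_bits,
     name_size, extra_data_size) = struct.unpack_from(BITMAP_DIR_FIXED,
                                                      buf, pos)
    cur = pos + 24
    extra_data = bytes(buf[cur:cur + extra_data_size])
    cur += extra_data_size
    name = bytes(buf[cur:cur + name_size]).decode("utf-8", errors="replace")
    cur += name_size

    # Entry size is rounded up to the next multiple of 8.
    next_pos = pos + ((cur - pos + 7) & ~7)
    fields = {
        "bitmap_table_offset": table_offset,
        "bitmap_table_size": table_size,
        "in_use": bool(flags & 1),
        "auto": bool(flags & 2),
        "extra_data_compatible": bool(flags & 4),
        "type": bitmap_type,
        "granularity": 1 << granularity_bits,
        "extra_data": extra_data,
        "name": name,
    }
    return fields, next_pos
```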
612 | === Bitmap table === | |
613 | ||
614 | Each bitmap is stored using a one-level structure (as opposed to two-level | |
615 | structures like for refcounts and guest clusters mapping) for the mapping of | |
616 | bitmap data to host clusters. This structure is called the bitmap table. | |
617 | ||
618 | Each bitmap table has a variable size (stored in the bitmap directory entry) | |
619 | and may use multiple clusters, however, it must be contiguous in the image | |
620 | file. | |
621 | ||
622 | Structure of a bitmap table entry: | |
623 | ||
624 | Bit 0: Reserved and must be zero if bits 9 - 55 are non-zero. | |
625 | If bits 9 - 55 are zero: | |
626 | 0: Cluster should be read as all zeros. | |
627 | 1: Cluster should be read as all ones. | |
628 | ||
629 | 1 - 8: Reserved and must be zero. | |
630 | ||
631 | 9 - 55: Bits 9 - 55 of the host cluster offset. Must be aligned to | |
632 | a cluster boundary. If the offset is 0, the cluster is | |
633 | unallocated; in that case, bit 0 determines how this | |
634 | cluster should be treated during reads. | |
635 | ||
636 | 56 - 63: Reserved and must be zero. | |
637 | ||
638 | ||
639 | === Bitmap data === | |
640 | ||
641 | As noted above, bitmap data is stored in separate clusters, described by the | |
642 | bitmap table. Given an offset (in bytes) into the bitmap data, the offset into | |
643 | the image file can be obtained as follows: | |
644 | ||
645 | image_offset(bitmap_data_offset) = | |
646 | bitmap_table[bitmap_data_offset / cluster_size] + | |
647 | (bitmap_data_offset % cluster_size) | |
648 | ||
649 | This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see | |
650 | above). | |
651 | ||
652 | Given an offset byte_nr into the virtual disk and the bitmap's granularity, the | |
653 | bit offset into the image file to the corresponding bit of the bitmap can be | |
654 | calculated like this: | |
655 | ||
656 | bit_offset(byte_nr) = | |
657 | image_offset(byte_nr / granularity / 8) * 8 + | |
658 | (byte_nr / granularity) % 8 | |
659 | ||
660 | If the size of the bitmap data is not a multiple of the cluster size then the | |
661 | last cluster of the bitmap data contains some unused tail bits. These bits must | |
662 | be zero. | |
663 | ||
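The two formulas above can be split into simple integer steps: find the bit number for a guest byte, then the bitmap table entry, the byte within that cluster, and the bit within that byte. The Python sketch below shows this arithmetic; both function names are hypothetical, and the host offset stored in the bitmap table entry still has to be added by the caller.

```python
def bitmap_bit_range(bit_nr, granularity_bits):
    """Return the inclusive byte range of the virtual disk covered by bit_nr."""
    granularity = 1 << granularity_bits
    return bit_nr * granularity, (bit_nr + 1) * granularity - 1


def bitmap_bit_location(byte_nr, granularity_bits, cluster_bits):
    """Return (bitmap table index, byte offset in cluster, bit within byte)."""
    granularity = 1 << granularity_bits
    cluster_size = 1 << cluster_bits
    bit_nr = byte_nr // granularity
    bitmap_data_offset = bit_nr // 8  # byte offset into the bitmap data
    return (bitmap_data_offset // cluster_size,
            bitmap_data_offset % cluster_size,
            bit_nr % 8)
```

For example, with a granularity of 64 KiB and 64 KiB clusters, one cluster of bitmap data covers 65536 * 8 * 65536 bytes (32 GiB) of the virtual disk.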
664 | ||
665 | === Dirty tracking bitmaps === | |
666 | ||
667 | Bitmaps with 'type' field equal to one are dirty tracking bitmaps. | |
668 | ||
669 | When the virtual disk is in use, a dirty tracking bitmap may be 'enabled' or | |
670 | 'disabled'. While the bitmap is 'enabled', all writes to the virtual disk | |
671 | should be reflected in the bitmap. A set bit in the bitmap means that the | |
672 | corresponding range of the virtual disk (see above) was written to while the | |
673 | bitmap was 'enabled'. An unset bit means that this range was not written to. | |
674 | ||
675 | The software doesn't have to sync the bitmap in the image file with its | |
676 | representation in RAM after each write. Flag 'in_use' should be set while the | |
677 | bitmap is not synced. | |
678 | ||
679 | In the image file the 'enabled' state is reflected by the 'auto' flag. If this | |
680 | flag is set, the software must consider the bitmap as 'enabled' and start | |
681 | tracking virtual disk changes to this bitmap from the first write to the | |
682 | virtual disk. If this flag is not set then the bitmap is disabled. |