A qcow2 image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.

== Header ==

The first cluster of a qcow2 image contains the file header:

    Byte  0 -  3:   magic
                    QCOW magic string ("QFI\xfb")

          4 -  7:   version
                    Version number (valid values are 2 and 3)

          8 - 15:   backing_file_offset
                    Offset into the image file at which the backing file name
                    is stored (NB: The string is not null terminated). 0 if the
                    image doesn't have a backing file.

         16 - 19:   backing_file_size
                    Length of the backing file name in bytes. Must not be
                    longer than 1023 bytes. Undefined if the image doesn't have
                    a backing file.

         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster size).
                    Must not be less than 9 (i.e. 512 byte clusters).

                    Note: qemu as of today has an implementation limit of 2 MB
                    as the maximum cluster size and won't be able to open images
                    with larger cluster sizes.

         24 - 31:   size
                    Virtual disk size in bytes.

                    Note: qemu has an implementation limit of 32 MB as
                    the maximum L1 table size. With a 2 MB cluster
                    size, it is unable to populate a virtual cluster
                    beyond 2 EB (61 bits); with a 512 byte cluster
                    size, it is unable to populate a virtual size
                    larger than 128 GB (37 bits). Meanwhile, L1/L2
                    table layouts limit an image to no more than 64 PB
                    (56 bits) of populated clusters, and an image may
                    hit other limits first (such as a file system's
                    maximum size).

         32 - 35:   crypt_method
                    0 for no encryption
                    1 for AES encryption
                    2 for LUKS encryption

         36 - 39:   l1_size
                    Number of entries in the active L1 table

         40 - 47:   l1_table_offset
                    Offset into the image file at which the active L1 table
                    starts. Must be aligned to a cluster boundary.

         48 - 55:   refcount_table_offset
                    Offset into the image file at which the refcount table
                    starts. Must be aligned to a cluster boundary.

         56 - 59:   refcount_table_clusters
                    Number of clusters that the refcount table occupies

         60 - 63:   nb_snapshots
                    Number of snapshots contained in the image

         64 - 71:   snapshots_offset
                    Offset into the image file at which the snapshot table
                    starts. Must be aligned to a cluster boundary.

If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.

         72 - 79:   incompatible_features
                    Bitmask of incompatible features. An implementation must
                    fail to open an image if an unknown bit is set.

                    Bit 0:      Dirty bit. If this bit is set then refcounts
                                may be inconsistent; make sure to scan the L1/L2
                                tables to repair refcounts before accessing the
                                image.

                    Bit 1:      Corrupt bit. If this bit is set then any data
                                structure may be corrupt and the image must not
                                be written to (unless for regaining
                                consistency).

                    Bits 2-63:  Reserved (set to 0)

         80 - 87:   compatible_features
                    Bitmask of compatible features. An implementation can
                    safely ignore any unknown bits that are set.

                    Bit 0:      Lazy refcounts bit. If this bit is set then
                                lazy refcount updates can be used. This means
                                marking the image file dirty and postponing
                                refcount metadata updates.

                    Bits 1-63:  Reserved (set to 0)

         88 - 95:   autoclear_features
                    Bitmask of auto-clear features. An implementation may only
                    write to an image with unknown auto-clear features if it
                    clears the respective bits from this field first.

                    Bit 0:      Bitmaps extension bit
                                This bit indicates consistency for the bitmaps
                                extension data.

                                It is an error if this bit is set without the
                                bitmaps extension present.

                                If the bitmaps extension is present but this
                                bit is unset, the bitmaps extension data must be
                                considered inconsistent.

                    Bits 1-63:  Reserved (set to 0)

         96 - 99:   refcount_order
                    Describes the width of a reference count block entry (width
                    in bits: refcount_bits = 1 << refcount_order). For version 2
                    images, the order is always assumed to be 4
                    (i.e. refcount_bits = 16).
                    This value may not exceed 6 (i.e. refcount_bits = 64).

        100 - 103:  header_length
                    Length of the header structure in bytes. For version 2
                    images, the length is always assumed to be 72 bytes.
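
As a minimal sketch (not QEMU's implementation), a reader could decode a subset of the header fields above as follows. The struct, the function `qcow2_parse_header` and the `QCOW2_INCOMPAT_KNOWN` mask are hypothetical. The mask assumes a reader that understands only the dirty and corrupt bits defined in this revision. Fields not needed for the checks are skipped to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All qcow2 numbers are big-endian. */
static uint32_t hdr_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t hdr_be64(const uint8_t *p)
{
    return ((uint64_t)hdr_be32(p) << 32) | hdr_be32(p + 4);
}

/* Hypothetical subset of the header fields. */
struct qcow2_header {
    uint32_t version;
    uint32_t cluster_bits;
    uint64_t size;
    uint64_t l1_table_offset;
    uint64_t incompatible_features;
    uint32_t refcount_order;
    uint32_t header_length;
};

/* Incompatible bits this hypothetical reader understands: dirty, corrupt. */
#define QCOW2_INCOMPAT_KNOWN UINT64_C(0x3)

/* Returns 0 on success, -1 if the image must not be opened. */
static int qcow2_parse_header(const uint8_t *buf, size_t len,
                              struct qcow2_header *h)
{
    if (len < 72 || memcmp(buf, "QFI\xfb", 4) != 0)
        return -1;
    h->version = hdr_be32(buf + 4);
    if (h->version != 2 && h->version != 3)
        return -1;
    h->cluster_bits = hdr_be32(buf + 20);
    if (h->cluster_bits < 9)
        return -1;
    h->size = hdr_be64(buf + 24);
    h->l1_table_offset = hdr_be64(buf + 40);

    /* Values assumed for version 2 images. */
    h->incompatible_features = 0;
    h->refcount_order = 4;
    h->header_length = 72;

    if (h->version >= 3) {
        if (len < 104)
            return -1;
        h->incompatible_features = hdr_be64(buf + 72);
        h->refcount_order = hdr_be32(buf + 96);
        h->header_length = hdr_be32(buf + 100);
        if (h->refcount_order > 6)
            return -1;
    }

    /* Any unknown incompatible bit means the image must not be opened. */
    if (h->incompatible_features & ~QCOW2_INCOMPAT_KNOWN)
        return -1;
    return 0;
}
```

Unknown compatible bits are deliberately not checked, since they may be ignored. Unknown autoclear bits would only matter once the reader wants to write, at which point they must be cleared first.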

Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

    Byte  0 -  3:   Header extension type:
                        0x00000000 - End of the header extension area
                        0xE2792ACA - Backing file format name
                        0x6803f857 - Feature name table
                        0x23852875 - Bitmaps extension
                        0x0537be77 - Full disk encryption header pointer
                        other      - Unknown header extension, can be safely
                                     ignored

          4 -  7:   Length of the header extension data

          8 -  n:   Header extension data

          n -  m:   Padding to round up the header extension size to the next
                    multiple of 8.

Unless stated otherwise, each header extension type shall appear at most once
in the same image.
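
A reader typically walks this area from the end of the header (header_length, or 72 for version 2) towards the end of the first cluster. The helper below is a hypothetical sketch, not an existing API. It assumes the caller has loaded the first cluster into `buf`. It treats reaching the end of the area without meeting the end marker as malformed, which is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t ext_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Search the header extension area buf[start..end) for the given type.
 * Returns 0 and fills *data_off / *data_len when found, 1 when the end
 * marker (type 0) comes first, and -1 if the area is malformed.
 */
static int qcow2_find_extension(const uint8_t *buf, size_t start, size_t end,
                                uint32_t want, size_t *data_off,
                                uint32_t *data_len)
{
    size_t off = start;

    while (off + 8 <= end) {
        uint32_t type = ext_be32(buf + off);
        uint32_t len = ext_be32(buf + off + 4);

        if (type == 0)                 /* end of header extension area */
            return 1;
        if (len > end - off - 8)       /* data would overrun the area */
            return -1;
        if (type == want) {
            *data_off = off + 8;
            *data_len = len;
            return 0;
        }
        /* Skip type, length, data and padding to the next multiple of 8. */
        off += 8 + (((size_t)len + 7) & ~(size_t)7);
    }
    return -1;
}
```

For example, the backing file format name would be looked up with `want = 0xE2792ACA`, and the returned data is not null terminated.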

If the image has a backing file then the backing file name should be stored in
the remaining space between the end of the header extension area and the end of
the first cluster. It is not allowed to store other data here, so that an
implementation can safely modify the header and add extensions without harming
data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.

== Feature name table ==

The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.

The number of entries in the feature name table is determined by the length of
the header extension data. Each entry looks like this:

    Byte       0:   Type of feature (select feature bitmap)
                        0: Incompatible feature
                        1: Compatible feature
                        2: Autoclear feature

               1:   Bit number within the selected feature bitmap (valid
                    values: 0-63)

          2 - 47:   Feature name (padded with zeros, but not necessarily null
                    terminated if it has full length)
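
Since every entry is 48 bytes, a reader that meets an unknown feature bit can scan the table for a printable name. The sketch below uses a hypothetical helper. It assumes `data` and `len` describe the extension data returned by a header extension walk. It copies into a 47-byte buffer so that a full-length, unterminated name still ends up null terminated.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QCOW2_FEATURE_NAME_ENTRY 48   /* bytes per table entry */

/*
 * Look up the name of bit `bit` in feature bitmap `type` (0 incompatible,
 * 1 compatible, 2 autoclear). Copies it into out, which always ends up
 * NUL-terminated, and returns 0; returns -1 if there is no such entry.
 */
static int qcow2_lookup_feature_name(const uint8_t *data, size_t len,
                                     uint8_t type, uint8_t bit, char out[47])
{
    for (size_t i = 0; i + QCOW2_FEATURE_NAME_ENTRY <= len;
         i += QCOW2_FEATURE_NAME_ENTRY) {
        if (data[i] == type && data[i + 1] == bit) {
            /* The 46-byte name need not be terminated at full length. */
            memcpy(out, data + i + 2, 46);
            out[46] = '\0';
            return 0;
        }
    }
    return -1;
}
```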

== Bitmaps extension ==

The bitmaps extension is an optional header extension. It provides the ability
to store bitmaps related to a virtual disk. For now, there is only one bitmap
type: the dirty tracking bitmap, which tracks virtual disk changes from some
point in time.

The data of the extension should be considered consistent only if the
corresponding auto-clear feature bit is set, see autoclear_features above.

The fields of the bitmaps extension are:

    Byte  0 -  3:   nb_bitmaps
                    The number of bitmaps contained in the image. Must be
                    greater than or equal to 1.

                    Note: Qemu currently only supports up to 65535 bitmaps per
                    image.

          4 -  7:   Reserved, must be zero.

          8 - 15:   bitmap_directory_size
                    Size of the bitmap directory in bytes. It is the cumulative
                    size of all (nb_bitmaps) bitmap directory entries.

         16 - 23:   bitmap_directory_offset
                    Offset into the image file at which the bitmap directory
                    starts. Must be aligned to a cluster boundary.

== Full disk encryption header pointer ==

The full disk encryption header must be present if, and only if, the
'crypt_method' header requires metadata. Currently this is only true
of the 'LUKS' crypt method. The header extension must be absent for
other methods.

This header provides the offset at which the crypt method can store
its additional data, as well as the length of such data.

    Byte  0 -  7:   Offset into the image file at which the encryption
                    header starts in bytes. Must be aligned to a cluster
                    boundary.

          8 - 15:   Length of the written encryption header in bytes.
                    Note that the actual space allocated in the qcow2 file may
                    be larger than this value, since it will be rounded
                    up to the next multiple of the cluster size. Any
                    unused bytes in the allocated space will be initialized
                    to 0.

For the LUKS crypt method, the encryption header works as follows.

The first 592 bytes of the header clusters will contain the LUKS
partition header. This is then followed by the key material data areas.
The size of the key material data areas is determined by the number of
stripes in the key slot and key size. Refer to the LUKS format
specification ('docs/on-disk-format.pdf' in the cryptsetup source
package) for details of the LUKS partition header format.

In the LUKS partition header, the "payload-offset" field will be
calculated as normal for the LUKS spec, i.e. the size of the LUKS
header, plus key material regions, plus padding, relative to the
start of the LUKS header. This offset value is not required to be
qcow2 cluster aligned. Its value is currently never used in the
context of qcow2, since the qcow2 file format itself defines where
the real payload offset is, but nonetheless a valid payload offset
should always be present.

In the LUKS key slots header, the "key-material-offset" is relative
to the start of the LUKS header clusters in the qcow2 container,
not the start of the qcow2 file.

Logically the layout looks like

    +-----------------------------+
    | QCow2 header                |
    | QCow2 header extension X    |
    | QCow2 header extension FDE  |
    | QCow2 header extension ...  |
    | QCow2 header extension Z    |
    +-----------------------------+
    | ....other QCow2 tables....  |
    .                             .
    .                             .
    +-----------------------------+
    | +-------------------------+ |
    | | LUKS partition header   | |
    | +-------------------------+ |
    | | LUKS key material 1     | |
    | +-------------------------+ |
    | | LUKS key material 2     | |
    | +-------------------------+ |
    | | LUKS key material ...   | |
    | +-------------------------+ |
    | | LUKS key material 8     | |
    | +-------------------------+ |
    +-----------------------------+
    | QCow2 cluster payload       |
    .                             .
    .                             .
    .                             .
    |                             |
    +-----------------------------+

== Data encryption ==

When an encryption method is requested in the header, the image payload
data must be encrypted/decrypted on every write/read. The image headers
and metadata are never encrypted.

The algorithms used for encryption vary depending on the method:

 - AES:

   The AES cipher, in CBC mode, with 256 bit keys.

   Initialization vectors generated using plain64 method, with
   the virtual disk sector as the input tweak.

   This format is no longer supported in QEMU system emulators, due
   to a number of design flaws affecting its security. It is only
   supported in the command line tools for the sake of backward
   compatibility.

 - LUKS:

   The algorithms are specified in the LUKS header.

   Initialization vectors generated using the method specified
   in the LUKS header, with the physical disk sector as the
   input tweak.

== Host cluster management ==

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

The refcounts are managed in a two-level table. The first level is called
refcount table and has a variable size (which is stored in the header). The
refcount table can cover multiple clusters, however it needs to be contiguous
in the image file.

It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.

Although a large enough refcount table can reserve clusters past 64 PB
(56 bits) (assuming the underlying protocol can even be sized that
large), note that some qcow2 metadata such as L1/L2 tables must point
to clusters prior to that point.

Note: qemu has an implementation limit of 8 MB as the maximum refcount
table size. With a 2 MB cluster size and a default refcount_order of
4, it is unable to reference host resources beyond 2 EB (61 bits); in
the worst case, with a 512 byte cluster size and refcount_order of 6,
it is unable to access beyond 32 GB (35 bits).

Given an offset into the image file, the refcount of its cluster can be
obtained as follows:

    refcount_block_entries = (cluster_size * 8 / refcount_bits)

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];

Refcount table entry:

    Bit   0 -  8:   Reserved (set to 0)

          9 - 63:   Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.

Refcount block entry (x = refcount_bits - 1):

    Bit   0 -  x:   Reference count of the cluster. If refcount_bits implies a
                    sub-byte width, note that bit 0 means the least significant
                    bit in this context.
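
The lookup above can be made concrete for every permitted refcount_order. Because both the cluster size and refcount_block_entries are powers of two, the division and modulo become shifts and masks. The function names are hypothetical, and the caller is assumed to load the refcount block itself. Entries of 8 bits or more are read big-endian like all other qcow2 numbers. For sub-byte widths the sketch assumes that entry 0 occupies the least significant bits of byte 0, which matches QEMU's behavior as far as I know.

```c
#include <stdint.h>

/* Refcount table entry -> refcount block offset (bits 0-8 are reserved). */
static uint64_t qcow2_reftable_block_offset(uint64_t entry)
{
    return entry & ~UINT64_C(0x1ff);
}

/* Split a host offset into refcount table and refcount block indices. */
static void qcow2_refcount_index(uint64_t offset, unsigned cluster_bits,
                                 unsigned refcount_order,
                                 uint64_t *table_index, uint64_t *block_index)
{
    /* log2(refcount_block_entries) = log2(cluster_size * 8 / refcount_bits) */
    unsigned entry_bits = cluster_bits + 3 - refcount_order;
    uint64_t cluster = offset >> cluster_bits;

    *table_index = cluster >> entry_bits;
    *block_index = cluster & ((UINT64_C(1) << entry_bits) - 1);
}

/* Read entry `index` of a loaded refcount block. */
static uint64_t qcow2_refcount_get(const uint8_t *block, uint64_t index,
                                   unsigned refcount_order)
{
    unsigned bits = 1u << refcount_order;

    if (bits < 8) {
        /* Assumed packing: entry 0 in the least significant bits of byte 0. */
        uint64_t pos = index * bits;
        return (block[pos / 8] >> (pos % 8)) & ((1u << bits) - 1);
    }

    /* 8, 16, 32 or 64 bits wide: big-endian. */
    unsigned bytes = bits / 8;
    uint64_t value = 0;
    for (unsigned i = 0; i < bytes; i++)
        value = (value << 8) | block[index * bytes + i];
    return value;
}
```

With 64 KB clusters and the default refcount_order of 4, each refcount block holds 32768 entries, so `qcow2_refcount_index` shifts the cluster number right by 15.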

== Cluster mapping ==

Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. They are called L1 and L2 tables.

The L1 table has a variable size (stored in the header) and may use multiple
clusters, however it must be contiguous in the image file. L2 tables are
exactly one cluster in size.

The L1 and L2 tables have implications on the maximum virtual file
size; for a given L1 table size, a larger cluster size is required for
the guest to have access to more space. Furthermore, a virtual
cluster must currently map to a host offset below 64 PB (56 bits)
(although this limit could be relaxed by putting reserved bits into
use). Additionally, as cluster size increases, the maximum host
offset for a compressed cluster is reduced (a 2 MB cluster size requires
compressed clusters to reside below 512 TB (49 bits), and this limit
cannot be relaxed without an incompatible layout change).

Given an offset into the virtual disk, the offset into the image file can be
obtained as follows:

    l2_entries = (cluster_size / sizeof(uint64_t))

    l2_index = (offset / cluster_size) % l2_entries
    l1_index = (offset / cluster_size) / l2_entries

    l2_table = load_cluster(l1_table[l1_index]);
    cluster_offset = l2_table[l2_index];

    return cluster_offset + (offset % cluster_size)
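
Because cluster_size and l2_entries are both powers of two, the index arithmetic above reduces to shifts and masks. The following is a minimal sketch with hypothetical names. It stops short of loading tables. The L1 entry helper only masks bits 9-55, following the L1 table entry layout described in this section, and it does not check the reserved bits.

```c
#include <stdint.h>

/* Bits 9-55 of an L1 entry hold the L2 table offset. */
#define QCOW2_L1E_OFFSET_MASK UINT64_C(0x00fffffffffffe00)

struct qcow2_guest_pos {
    uint64_t l1_index;
    uint64_t l2_index;
    uint64_t in_cluster;   /* offset % cluster_size */
};

static struct qcow2_guest_pos qcow2_split_guest_offset(uint64_t offset,
                                                       unsigned cluster_bits)
{
    /* l2_entries = cluster_size / sizeof(uint64_t) = 2^(cluster_bits - 3) */
    unsigned l2_bits = cluster_bits - 3;
    uint64_t cluster = offset >> cluster_bits;
    struct qcow2_guest_pos pos;

    pos.l1_index = cluster >> l2_bits;
    pos.l2_index = cluster & ((UINT64_C(1) << l2_bits) - 1);
    pos.in_cluster = offset & ((UINT64_C(1) << cluster_bits) - 1);
    return pos;
}

/* L2 table offset from an L1 entry; 0 means the L2 table is unallocated. */
static uint64_t qcow2_l1_entry_l2_offset(uint64_t l1_entry)
{
    return l1_entry & QCOW2_L1E_OFFSET_MASK;
}
```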

L1 table entry:

    Bit   0 -  8:   Reserved (set to 0)

          9 - 55:   Bits 9-55 of the offset into the image file at which the L2
                    table starts. Must be aligned to a cluster boundary. If the
                    offset is 0, the L2 table and all clusters described by this
                    L2 table are unallocated.

         56 - 62:   Reserved (set to 0)

              63:   0 for an L2 table that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in the active L1 table.

L2 table entry:

    Bit   0 - 61:   Cluster descriptor

              62:   0 for standard clusters
                    1 for compressed clusters

              63:   0 for clusters that are unused, compressed or require COW.
                    1 for standard clusters whose refcount is exactly one.
                    This information is only accurate in L2 tables
                    that are reachable from the active L1 table.

Standard Cluster Descriptor:

    Bit        0:   If set to 1, the cluster reads as all zeros. The host
                    cluster offset can be used to describe a preallocation,
                    but it won't be used for reading data from this cluster,
                    nor is data read from the backing file if the cluster is
                    unallocated.

                    With version 2, this is always 0.

          1 -  8:   Reserved (set to 0)

          9 - 55:   Bits 9-55 of host cluster offset. Must be aligned to a
                    cluster boundary. If the offset is 0, the cluster is
                    unallocated.

         56 - 61:   Reserved (set to 0)


Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

    Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
                    cluster or sector boundary! If cluster_bits is
                    small enough that this field includes bits beyond
                    55, those upper bits must be set to 0.

          x - 61:   Number of additional 512-byte sectors used for the
                    compressed data, beyond the sector containing the offset
                    in the previous field. Some of these sectors may reside
                    in the next contiguous host cluster.

                    Note that the compressed data does not necessarily occupy
                    all of the bytes in the final sector; rather, decompression
                    stops when it has produced a cluster of data.

                    Another compressed cluster may map to the tail of the final
                    sector used by this compressed cluster.
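
Putting the L2 entry bits and both descriptor layouts together, a reader might classify an L2 entry as shown below. This is a sketch with hypothetical names. It does not validate reserved bits or alignment. For compressed clusters it reports the total number of 512-byte sectors touched, which is the stored count plus the sector containing the offset.

```c
#include <stdint.h>

#define QCOW2_L2E_COPIED      (UINT64_C(1) << 63)
#define QCOW2_L2E_COMPRESSED  (UINT64_C(1) << 62)
#define QCOW2_L2E_ZERO        UINT64_C(1)
#define QCOW2_L2E_OFFSET_MASK UINT64_C(0x00fffffffffffe00)   /* bits 9-55 */

enum qcow2_cluster_kind {
    QCOW2_UNALLOCATED,
    QCOW2_ZERO,
    QCOW2_NORMAL,
    QCOW2_COMPRESSED
};

struct qcow2_l2_info {
    enum qcow2_cluster_kind kind;
    uint64_t host_offset;   /* for QCOW2_ZERO: optional preallocation */
    uint64_t nb_sectors;    /* compressed only: 512-byte sectors touched */
};

static struct qcow2_l2_info qcow2_decode_l2(uint64_t entry,
                                            unsigned cluster_bits)
{
    struct qcow2_l2_info info = { QCOW2_UNALLOCATED, 0, 0 };

    if (entry & QCOW2_L2E_COMPRESSED) {
        unsigned x = 62 - (cluster_bits - 8);
        uint64_t descriptor = entry & ~(QCOW2_L2E_COPIED | QCOW2_L2E_COMPRESSED);

        info.kind = QCOW2_COMPRESSED;
        info.host_offset = descriptor & ((UINT64_C(1) << x) - 1);
        /* The stored count excludes the sector holding the offset. */
        info.nb_sectors = (descriptor >> x) + 1;
        return info;
    }

    info.host_offset = entry & QCOW2_L2E_OFFSET_MASK;
    if (entry & QCOW2_L2E_ZERO)
        info.kind = QCOW2_ZERO;        /* reads as zeros, never from backing */
    else if (info.host_offset != 0)
        info.kind = QCOW2_NORMAL;
    return info;
}
```

A `QCOW2_UNALLOCATED` result is the case in which reads fall through to the backing file.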

If a cluster is unallocated, read requests shall read the data from the backing
file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
no backing file or the backing file is smaller than the image, they shall read
zeros for all parts that are not covered by the backing file.

== Snapshots ==

qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters is
exposed to the guest.

When creating a snapshot, the L1 table should be copied and the refcount of all
L2 tables and clusters reachable from this L1 table must be increased, so that
a write causes a COW and isn't visible in other snapshots.

When loading a snapshot, bit 63 of all entries in the new active L1 table and
all L2 tables referenced by it must be reconstructed from the refcount table
as it doesn't need to be accurate in inactive L1 tables.

A directory of all snapshots is stored in the snapshot table, a contiguous area
in the image file, whose starting offset and length are given by the header
fields snapshots_offset and nb_snapshots. The entries of the snapshot table
have variable length, depending on the length of ID, name and extra data.

Snapshot table entry:

    Byte  0 -  7:   Offset into the image file at which the L1 table for the
                    snapshot starts. Must be aligned to a cluster boundary.

          8 - 11:   Number of entries in the L1 table of the snapshot

         12 - 13:   Length of the unique ID string describing the snapshot

         14 - 15:   Length of the name of the snapshot

         16 - 19:   Time at which the snapshot was taken in seconds since the
                    Epoch

         20 - 23:   Subsecond part of the time at which the snapshot was taken
                    in nanoseconds

         24 - 31:   Time that the guest was running until the snapshot was
                    taken in nanoseconds

         32 - 35:   Size of the VM state in bytes. 0 if no VM state is saved.
                    If there is VM state, it starts at the first cluster
                    described by the first L1 table entry that doesn't describe
                    a regular guest cluster (i.e. VM state is stored like guest
                    disk content, except that it is stored at offsets that are
                    larger than the virtual disk presented to the guest)

         36 - 39:   Size of extra data in the table entry (used for future
                    extensions of the format)

        variable:   Extra data for future extensions. Unknown fields must be
                    ignored. Currently defined are (offset relative to snapshot
                    table entry):

                    Byte 40 - 47:   Size of the VM state in bytes. 0 if no VM
                                    state is saved. If this field is present,
                                    the 32-bit value in bytes 32-35 is ignored.

                    Byte 48 - 55:   Virtual disk size of the snapshot in bytes

                    Version 3 images must include extra data at least up to
                    byte 55.

        variable:   Unique ID string for the snapshot (not null terminated)

        variable:   Name of the snapshot (not null terminated)

        variable:   Padding to round up the snapshot table entry size to the
                    next multiple of 8.
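
Because entries have variable length, a reader has to compute the size of each entry to find the next one. The sketch below uses hypothetical helpers and assumes `p` points at the start of one entry in a fully loaded snapshot table. It also shows how the 64-bit VM state size in the extra data, when present, takes precedence over the 32-bit field.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t sn_be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t sn_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Bytes occupied by the entry at p, including trailing padding. */
static size_t qcow2_snapshot_entry_size(const uint8_t *p)
{
    size_t id_len = sn_be16(p + 12);
    size_t name_len = sn_be16(p + 14);
    size_t extra_len = sn_be32(p + 36);
    size_t raw = 40 + extra_len + id_len + name_len;

    return (raw + 7) & ~(size_t)7;    /* round up to a multiple of 8 */
}

/* VM state size, preferring the 64-bit field in the extra data. */
static uint64_t qcow2_snapshot_vm_state_size(const uint8_t *p)
{
    if (sn_be32(p + 36) >= 8)
        return ((uint64_t)sn_be32(p + 40) << 32) | sn_be32(p + 44);
    return sn_be32(p + 32);
}
```

In this layout the ID string starts at `p + 40 + extra_len` and the name follows it directly. The next entry starts at `p + qcow2_snapshot_entry_size(p)`.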

== Bitmaps ==

As mentioned above, the bitmaps extension provides the ability to store bitmaps
related to a virtual disk. This section describes how these bitmaps are stored.

All stored bitmaps are related to the virtual disk stored in the same image, so
each bitmap size is equal to the virtual disk size.

Each bit of the bitmap is responsible for a strictly defined range of the
virtual disk. For bit number bit_nr the corresponding range (in bytes) will be:

    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]

Granularity is a property of each individual bitmap, see below.

=== Bitmap directory ===

Each bitmap saved in the image is described in a bitmap directory entry. The
bitmap directory is a contiguous area in the image file, whose starting offset
and length are given by the header extension fields bitmap_directory_offset and
bitmap_directory_size. The entries of the bitmap directory have variable
length, depending on the lengths of the bitmap name and extra data.

Structure of a bitmap directory entry:

    Byte  0 -  7:   bitmap_table_offset
                    Offset into the image file at which the bitmap table
                    (described below) for the bitmap starts. Must be aligned to
                    a cluster boundary.

          8 - 11:   bitmap_table_size
                    Number of entries in the bitmap table of the bitmap.

         12 - 15:   flags
                    Bit
                      0: in_use
                         The bitmap was not saved correctly and may be
                         inconsistent.

                      1: auto
                         The bitmap must reflect all changes of the virtual
                         disk by any application that would write to this qcow2
                         file (including writes, snapshot switching, etc.). The
                         type of this bitmap must be 'dirty tracking bitmap'.

                      2: extra_data_compatible
                         This flag is meaningful when the extra data is
                         unknown to the software (currently any extra data is
                         unknown to Qemu).
                         If it is set, the bitmap may be used as expected, but
                         the extra data must be left as is.
                         If it is not set, the bitmap must not be used, but
                         both it and its extra data must be left as is.

                    Bits 3 - 31 are reserved and must be 0.

              16:   type
                    This field describes the sort of the bitmap.
                    Values:
                      1: Dirty tracking bitmap

                    Values 0, 2 - 255 are reserved.

              17:   granularity_bits
                    Granularity bits. Valid values: 0 - 63.

                    Note: Qemu currently supports only values 9 - 31.

                    Granularity is calculated as
                        granularity = 1 << granularity_bits

                    A bitmap's granularity is how many bytes of the image
                    account for one bit of the bitmap.

         18 - 19:   name_size
                    Size of the bitmap name. Must be non-zero.

                    Note: Qemu currently doesn't support values greater than
                    1023.

         20 - 23:   extra_data_size
                    Size of type-specific extra data.

                    For now, as no extra data is defined, extra_data_size is
                    reserved and should be zero. If it is non-zero the
                    behavior is defined by the extra_data_compatible flag.

        variable:   extra_data
                    Extra data for the bitmap, occupying extra_data_size bytes.
                    Extra data must never contain references to clusters or in
                    some other way allocate additional clusters.

        variable:   name
                    The name of the bitmap (not null terminated), occupying
                    name_size bytes. Must be unique among all bitmap names
                    within the bitmaps extension.

        variable:   Padding to round up the bitmap directory entry size to the
                    next multiple of 8. All bytes of the padding must be zero.
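
A reader iterating over the directory needs each entry's padded size, and it must decide whether a bitmap may be used. The sketch below uses hypothetical names, and `p` is assumed to point at one directory entry. It treats reserved flag bits as making a bitmap unusable, which is an assumption of this sketch and not something the format states. Like Qemu, it also treats any extra data as unknown.

```c
#include <stddef.h>
#include <stdint.h>

#define QCOW2_BMF_IN_USE       (1u << 0)
#define QCOW2_BMF_AUTO         (1u << 1)
#define QCOW2_BMF_EXTRA_COMPAT (1u << 2)

static uint32_t bd_be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t bd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Total size of the entry at p: 24 fixed bytes, extra data, name, padding. */
static size_t qcow2_bitmap_dir_entry_size(const uint8_t *p)
{
    size_t raw = 24 + (size_t)bd_be32(p + 20) + bd_be16(p + 18);

    return (raw + 7) & ~(size_t)7;
}

/* granularity = 1 << granularity_bits (valid values: 0 - 63) */
static uint64_t qcow2_bitmap_granularity(const uint8_t *p)
{
    return UINT64_C(1) << p[17];
}

/* Nonzero if the bitmap's contents may be used. */
static int qcow2_bitmap_usable(const uint8_t *p)
{
    uint32_t flags = bd_be32(p + 12);
    uint32_t extra_len = bd_be32(p + 20);

    if (flags & ~(QCOW2_BMF_IN_USE | QCOW2_BMF_AUTO | QCOW2_BMF_EXTRA_COMPAT))
        return 0;   /* reserved bits set: treated as unusable (assumption) */
    if (flags & QCOW2_BMF_IN_USE)
        return 0;   /* not saved correctly, may be inconsistent */
    if (extra_len != 0 && !(flags & QCOW2_BMF_EXTRA_COMPAT))
        return 0;   /* unknown extra data not marked compatible */
    return p[16] == 1;   /* only type 1 (dirty tracking) is defined */
}
```

An unusable bitmap still has to be preserved, together with its extra data, when the image is rewritten.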

=== Bitmap table ===

Each bitmap is stored using a one-level structure (as opposed to two-level
structures like for refcounts and guest clusters mapping) for the mapping of
bitmap data to host clusters. This structure is called the bitmap table.

Each bitmap table has a variable size (stored in the bitmap directory entry)
and may use multiple clusters, however, it must be contiguous in the image
file.

Structure of a bitmap table entry:

    Bit        0:   Reserved and must be zero if bits 9 - 55 are non-zero.
                    If bits 9 - 55 are zero:
                      0: Cluster should be read as all zeros.
                      1: Cluster should be read as all ones.

          1 -  8:   Reserved and must be zero.

          9 - 55:   Bits 9 - 55 of the host cluster offset. Must be aligned to
                    a cluster boundary. If the offset is 0, the cluster is
                    unallocated; in that case, bit 0 determines how this
                    cluster should be treated during reads.

         56 - 63:   Reserved and must be zero.
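
The entry rules above can be checked mechanically. The hypothetical sketch below classifies a single bitmap table entry and rejects set reserved bits, including bit 0 when an offset is present. It does not verify cluster alignment, since that requires knowing the cluster size.

```c
#include <stdint.h>

enum qcow2_bm_cluster {
    QCOW2_BM_ZEROS,     /* unallocated, reads as all zeros */
    QCOW2_BM_ONES,      /* unallocated, reads as all ones */
    QCOW2_BM_DATA,      /* allocated at *host */
    QCOW2_BM_INVALID    /* a reserved bit is set */
};

static enum qcow2_bm_cluster qcow2_bitmap_entry_kind(uint64_t entry,
                                                     uint64_t *host)
{
    const uint64_t off_mask = UINT64_C(0x00fffffffffffe00);  /* bits 9-55 */
    const uint64_t reserved = ~off_mask & ~UINT64_C(1);       /* 1-8, 56-63 */

    *host = entry & off_mask;
    if (entry & reserved)
        return QCOW2_BM_INVALID;
    if (*host != 0)
        /* Bit 0 must be zero when the cluster is allocated. */
        return (entry & 1) ? QCOW2_BM_INVALID : QCOW2_BM_DATA;
    return (entry & 1) ? QCOW2_BM_ONES : QCOW2_BM_ZEROS;
}
```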

=== Bitmaps' data ===

As noted above, bitmap data is stored in separate clusters, described by the
bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
the image file can be obtained as follows:

    image_offset(bitmap_data_offset) =
        bitmap_table[bitmap_data_offset / cluster_size] +
            (bitmap_data_offset % cluster_size)

This offset is not defined if bits 9 - 55 of the bitmap table entry are zero
(see above).

Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
bit offset into the image file to the corresponding bit of the bitmap can be
calculated like this:

    bit_offset(byte_nr) =
        image_offset(byte_nr / granularity / 8) * 8 +
            (byte_nr / granularity) % 8
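
Combining image_offset and bit_offset, the sketch below maps a guest byte offset to an absolute bit position in the image file. It assumes the bitmap table has already been loaded into memory as host-endian `uint64_t` values, and that granularity and cluster_size are known. The function name and the `UINT64_MAX` convention for unallocated clusters are hypothetical.

```c
#include <stdint.h>

#define QCOW2_BME_OFFSET_MASK UINT64_C(0x00fffffffffffe00)  /* bits 9-55 */

/*
 * Absolute bit position in the image file of the bitmap bit covering
 * guest byte byte_nr, or UINT64_MAX if the bitmap cluster holding that
 * bit is unallocated (bit 0 of its table entry then gives the value).
 */
static uint64_t qcow2_bitmap_bit_offset(const uint64_t *bitmap_table,
                                        uint64_t cluster_size,
                                        uint64_t granularity,
                                        uint64_t byte_nr)
{
    uint64_t bit_nr = byte_nr / granularity;
    uint64_t data_off = bit_nr / 8;           /* byte within bitmap data */
    uint64_t host = bitmap_table[data_off / cluster_size] &
                    QCOW2_BME_OFFSET_MASK;

    if (host == 0)
        return UINT64_MAX;
    /* image_offset(data_off) * 8 + bit within that byte */
    return (host + data_off % cluster_size) * 8 + bit_nr % 8;
}
```

For example, with 64 KB clusters and 64 KB granularity, guest byte 655360 falls into bit 10. That bit lives in byte 1 of the first bitmap cluster, at bit 2 within that byte.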

If the size of the bitmap data is not a multiple of the cluster size then the
last cluster of the bitmap data contains some unused tail bits. These bits must
be zero.

=== Dirty tracking bitmaps ===

Bitmaps with 'type' field equal to one are dirty tracking bitmaps.

When the virtual disk is in use, a dirty tracking bitmap may be 'enabled' or
'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
should be reflected in the bitmap. A set bit in the bitmap means that the
corresponding range of the virtual disk (see above) was written to while the
bitmap was 'enabled'. An unset bit means that this range was not written to.

The software doesn't have to sync the bitmap in the image file with its
representation in RAM after each write. Flag 'in_use' should be set while the
bitmap is not synced.

In the image file the 'enabled' state is reflected by the 'auto' flag. If this
flag is set, the software must consider the bitmap as 'enabled' and start
tracking virtual disk changes to this bitmap from the first write to the
virtual disk. If this flag is not set then the bitmap is disabled.