Commit | Line | Data |
---|---|---|
03feae73 KW |
1 | == General == |
2 | ||
3 | A qcow2 image file is organized in units of constant size, which are called | |
4 | (host) clusters. A cluster is the unit in which all allocations are done, | |
5 | both for actual guest data and for image metadata. | |
6 | ||
7 | Likewise, the virtual disk as seen by the guest is divided into (guest) | |
8 | clusters of the same size. | |
9 | ||
10 | All numbers in qcow2 are stored in Big Endian byte order. | |
11 | ||
12 | ||
13 | == Header == | |
14 | ||
15 | The first cluster of a qcow2 image contains the file header: | |
16 | ||
17 | Byte 0 - 3: magic | |
18 | QCOW magic string ("QFI\xfb") | |
19 | ||
20 | 4 - 7: version | |
4fabffc1 | 21 | Version number (valid values are 2 and 3) |
03feae73 KW |
22 | |
23 | 8 - 15: backing_file_offset | |
24 | Offset into the image file at which the backing file name | |
25 | is stored (NB: The string is not null terminated). 0 if the | |
26 | image doesn't have a backing file. | |
27 | ||
28 | 16 - 19: backing_file_size | |
29 | Length of the backing file name in bytes. Must not be | |
30 | longer than 1023 bytes. Undefined if the image doesn't have | |
31 | a backing file. | |
32 | ||
33 | 20 - 23: cluster_bits | |
34 | Number of bits that are used for addressing an offset | |
35 | within a cluster (1 << cluster_bits is the cluster size). | |
36 | Must not be less than 9 (i.e. 512 byte clusters). | |
37 | ||
38 | Note: qemu as of today has an implementation limit of 2 MB | |
39 | as the maximum cluster size and won't be able to open images | |
40 | with larger cluster sizes. | |
41 | ||
42 | 24 - 31: size | |
43 | Virtual disk size in bytes | |
44 | ||
45 | 32 - 35: crypt_method | |
46 | 0 for no encryption | |
47 | 1 for AES encryption | |
48 | ||
49 | 36 - 39: l1_size | |
50 | Number of entries in the active L1 table | |
51 | ||
52 | 40 - 47: l1_table_offset | |
53 | Offset into the image file at which the active L1 table | |
54 | starts. Must be aligned to a cluster boundary. | |
55 | ||
56 | 48 - 55: refcount_table_offset | |
57 | Offset into the image file at which the refcount table | |
58 | starts. Must be aligned to a cluster boundary. | |
59 | ||
60 | 56 - 59: refcount_table_clusters | |
61 | Number of clusters that the refcount table occupies | |
62 | ||
63 | 60 - 63: nb_snapshots | |
64 | Number of snapshots contained in the image | |
65 | ||
66 | 64 - 71: snapshots_offset | |
67 | Offset into the image file at which the snapshot table | |
68 | starts. Must be aligned to a cluster boundary. | |
69 | ||
4fabffc1 KW |
70 | If the version is 3 or higher, the header has the following additional fields. |
71 | For version 2, the values are assumed to be zero, unless specified otherwise | |
72 | in the description of a field. | |
73 | ||
74 | 72 - 79: incompatible_features | |
75 | Bitmask of incompatible features. An implementation must | |
76 | fail to open an image if an unknown bit is set. | |
77 | ||
0f6d767a SH |
78 | Bit 0: Dirty bit. If this bit is set then refcounts |
79 | may be inconsistent; make sure to scan L1/L2 | |
80 | tables to repair refcounts before accessing the | |
81 | image. | |
82 | ||
69c98726 HR |
83 | Bit 1: Corrupt bit. If this bit is set then any data |
84 | structure may be corrupt and the image must not | |
85 | be written to (except to regain | |
86 | consistency). | |
87 | ||
88 | Bits 2-63: Reserved (set to 0) | |
4fabffc1 KW |
89 | |
90 | 80 - 87: compatible_features | |
91 | Bitmask of compatible features. An implementation can | |
92 | safely ignore any unknown bits that are set. | |
93 | ||
dae8796d SH |
94 | Bit 0: Lazy refcounts bit. If this bit is set then |
95 | lazy refcount updates can be used. This means | |
96 | marking the image file dirty and postponing | |
97 | refcount metadata updates. | |
98 | ||
99 | Bits 1-63: Reserved (set to 0) | |
4fabffc1 KW |
100 | |
101 | 88 - 95: autoclear_features | |
102 | Bitmask of auto-clear features. An implementation may only | |
103 | write to an image with unknown auto-clear features if it | |
104 | clears the respective bits from this field first. | |
105 | ||
106 | Bits 0-63: Reserved (set to 0) | |
107 | ||
108 | 96 - 99: refcount_order | |
109 | Describes the width of a reference count block entry (width | |
6815bce5 MK |
110 | in bits: refcount_bits = 1 << refcount_order). For version 2 |
111 | images, the order is always assumed to be 4 | |
112 | (i.e. refcount_bits = 16). | |
7f75a07d | 113 | This value may not exceed 6 (i.e. refcount_bits = 64). |
4fabffc1 KW |
114 | |
115 | 100 - 103: header_length | |
116 | Length of the header structure in bytes. For version 2 | |
117 | images, the length is always assumed to be 72 bytes. | |
118 | ||
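The header layout above can be read with a fixed big-endian struct. The following is a minimal sketch, not the reference implementation: the names `parse_header`, `V2_STRUCT`, `V3_STRUCT` and `KNOWN_INCOMPATIBLE` are hypothetical, and the version 2 defaults are the ones stated in the field descriptions.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

# Bytes 0-71, present in every version (big endian, no padding).
V2_STRUCT = struct.Struct(">4sIQIIQIIQQIIQ")
V2_FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
)

# Bytes 72-103, only present if version >= 3.
V3_STRUCT = struct.Struct(">QQQII")
V3_FIELDS = (
    "incompatible_features", "compatible_features", "autoclear_features",
    "refcount_order", "header_length",
)

# Assumption of this sketch: only the dirty bit (0) and corrupt bit (1)
# are understood by the reader.
KNOWN_INCOMPATIBLE = 0b11


def parse_header(buf):
    """Parse a qcow2 header from a bytes-like object into a dict."""
    header = dict(zip(V2_FIELDS, V2_STRUCT.unpack_from(buf, 0)))
    if header["magic"] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if header["version"] not in (2, 3):
        raise ValueError("unsupported version %d" % header["version"])

    if header["version"] >= 3:
        header.update(zip(V3_FIELDS, V3_STRUCT.unpack_from(buf, 72)))
    else:
        # Version 2: feature fields are zero, refcounts are 16 bits wide
        # and the header is 72 bytes long.
        header.update(incompatible_features=0, compatible_features=0,
                      autoclear_features=0, refcount_order=4,
                      header_length=72)

    if header["cluster_bits"] < 9:
        raise ValueError("cluster_bits must not be less than 9")
    if header["refcount_order"] > 6:
        raise ValueError("refcount_order may not exceed 6")
    if header["incompatible_features"] & ~KNOWN_INCOMPATIBLE:
        raise ValueError("unknown incompatible feature bit set")
    return header
```

A caller would typically read the first cluster (or at least the first 104 bytes) of the image file and pass it to `parse_header`.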
03feae73 KW |
119 | Directly after the image header, optional sections called header extensions can |
120 | be stored. Each extension has a structure like the following: | |
121 | ||
122 | Byte 0 - 3: Header extension type: | |
123 | 0x00000000 - End of the header extension area | |
124 | 0xE2792ACA - Backing file format name | |
4fabffc1 | 125 | 0x6803f857 - Feature name table |
03feae73 KW |
126 | other - Unknown header extension, can be safely |
127 | ignored | |
128 | ||
129 | 4 - 7: Length of the header extension data | |
130 | ||
131 | 8 - n: Header extension data | |
132 | ||
133 | n - m: Padding to round up the header extension size to the next | |
134 | multiple of 8. | |
135 | ||
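Walking the header extension area follows directly from the structure above. This is a rough sketch under the assumption that the caller has already read the first cluster into `buf` and knows `header_length`; the function name `iter_header_extensions` and the constant names are hypothetical.

```python
import struct

EXT_END = 0x00000000
EXT_BACKING_FORMAT = 0xE2792ACA
EXT_FEATURE_NAME_TABLE = 0x6803F857


def iter_header_extensions(buf, header_length):
    """Yield (type, data) for each header extension that follows the header."""
    pos = header_length
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        if ext_type == EXT_END:
            return
        data_start = pos + 8
        if data_start + length > len(buf):
            raise ValueError("header extension exceeds the buffer")
        yield ext_type, bytes(buf[data_start:data_start + length])
        # The 8-byte type/length prefix is already aligned, so rounding the
        # data length up to a multiple of 8 skips the padding.
        pos = data_start + ((length + 7) & ~7)
```

Unknown extension types are simply yielded; a caller can ignore them, as the type table above permits.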
4fabffc1 KW |
136 | Unless stated otherwise, each header extension type shall appear at most once |
137 | in the same image. | |
138 | ||
8e436ec1 MK |
139 | If the image has a backing file then the backing file name should be stored in |
140 | the remaining space between the end of the header extension area and the end of | |
141 | the first cluster. It is not allowed to store other data here, so that an | |
142 | implementation can safely modify the header and add extensions without harming | |
143 | data of compatible features that it doesn't support. Compatible features that | |
144 | need space for additional data can use a header extension. | |
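Reading the backing file name uses only the two header fields backing_file_offset and backing_file_size. A small sketch follows; the function name `read_backing_file_name` is hypothetical, and decoding the name as UTF-8 is an assumption of this example, since the text above does not specify an encoding.

```python
def read_backing_file_name(image, backing_file_offset, backing_file_size):
    """Return the backing file name, or None if there is no backing file."""
    if backing_file_offset == 0:
        # backing_file_size is undefined in this case and must be ignored.
        return None
    if backing_file_size > 1023:
        raise ValueError("backing file name longer than 1023 bytes")
    raw = bytes(image[backing_file_offset:backing_file_offset + backing_file_size])
    if len(raw) != backing_file_size:
        raise ValueError("backing file name is truncated")
    # The string is not null terminated; its length comes from the header.
    # Assumption: UTF-8, with undecodable bytes preserved as surrogates.
    return raw.decode("utf-8", errors="surrogateescape")
```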
4fabffc1 KW |
145 | |
146 | ||
147 | == Feature name table == | |
148 | ||
149 | The feature name table is an optional header extension that contains the name | |
150 | for features used by the image. It can be used by applications that don't know | |
151 | the respective feature (e.g. because the feature was introduced only later) to | |
152 | display a useful error message. | |
153 | ||
154 | The number of entries in the feature name table is determined by the length of | |
155 | the header extension data. Each entry looks like this: | |
156 | ||
157 | Byte 0: Type of feature (select feature bitmap) | |
158 | 0: Incompatible feature | |
159 | 1: Compatible feature | |
160 | 2: Autoclear feature | |
161 | ||
162 | 1: Bit number within the selected feature bitmap (valid | |
163 | values: 0-63) | |
164 | ||
165 | 2 - 47: Feature name (padded with zeros, but not necessarily null | |
166 | terminated if it has full length) | |
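Since every entry is 48 bytes long, the table can be decoded by slicing the extension data. A minimal sketch, assuming the data of the feature name table extension has already been extracted; `parse_feature_name_table` is a hypothetical name and ASCII decoding is an assumption of this example.

```python
def parse_feature_name_table(data):
    """Return a list of (feature_type, bit_number, name) tuples."""
    entry_size = 48
    entries = []
    # Ignore a trailing partial entry, if any.
    for pos in range(0, len(data) - len(data) % entry_size, entry_size):
        feature_type = data[pos]      # 0, 1 or 2: which bitmap
        bit_number = data[pos + 1]    # 0-63
        raw_name = bytes(data[pos + 2:pos + entry_size])
        # The name is zero padded, but a full-length name has no terminator.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((feature_type, bit_number, name))
    return entries
```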
03feae73 KW |
167 | |
168 | ||
169 | == Host cluster management == | |
170 | ||
171 | qcow2 manages the allocation of host clusters by maintaining a reference count | |
172 | for each host cluster. A refcount of 0 means that the cluster is free, 1 means | |
173 | that it is used, and >= 2 means that it is used and any write access must | |
174 | perform a COW (copy on write) operation. | |
175 | ||
176 | The refcounts are managed in a two-level table. The first level is called the | |
177 | refcount table and has a variable size (which is stored in the header). The | |
178 | refcount table can cover multiple clusters, however it needs to be contiguous | |
179 | in the image file. | |
180 | ||
181 | It contains pointers to the second level structures which are called refcount | |
182 | blocks and are exactly one cluster in size. | |
183 | ||
184 | Given an offset into the image file, the refcount of its cluster can be obtained | |
185 | as follows: | |
186 | ||
4b318d6c | 187 | refcount_block_entries = (cluster_size * 8 / refcount_bits) |
03feae73 | 188 | |
3789985f ZYW |
189 | refcount_block_index = (offset / cluster_size) % refcount_block_entries |
190 | refcount_table_index = (offset / cluster_size) / refcount_block_entries | |
03feae73 KW |
191 | |
192 | refcount_block = load_cluster(refcount_table[refcount_table_index]); | |
193 | return refcount_block[refcount_block_index]; | |
194 | ||
195 | Refcount table entry: | |
196 | ||
197 | Bit 0 - 8: Reserved (set to 0) | |
198 | ||
199 | 9 - 63: Bits 9-63 of the offset into the image file at which the | |
200 | refcount block starts. Must be aligned to a cluster | |
201 | boundary. | |
202 | ||
203 | If this is 0, the corresponding refcount block has not yet | |
204 | been allocated. All refcounts managed by this refcount block | |
205 | are 0. | |
206 | ||
4fabffc1 | 207 | Refcount block entry (x = refcount_bits - 1): |
03feae73 | 208 | |
4fabffc1 KW |
209 | Bit 0 - x: Reference count of the cluster. If refcount_bits implies a |
210 | sub-byte width, note that bit 0 means the least significant | |
211 | bit in this context. | |
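The refcount lookup described in this section can be sketched against an in-memory copy of the image. This is only an illustration: `get_refcount` is a hypothetical name, the whole image is assumed to fit in a bytes-like object, and treating an index beyond the refcount table as refcount 0 is an assumption of this example.

```python
def get_refcount(image, offset, cluster_bits, refcount_order,
                 refcount_table_offset, refcount_table_clusters):
    """Return the refcount of the host cluster containing `offset`."""
    cluster_size = 1 << cluster_bits
    refcount_bits = 1 << refcount_order
    refcount_block_entries = cluster_size * 8 // refcount_bits

    cluster_index = offset >> cluster_bits
    block_index = cluster_index % refcount_block_entries
    table_index = cluster_index // refcount_block_entries

    table_entries = refcount_table_clusters * cluster_size // 8
    if table_index >= table_entries:
        return 0  # assumption: clusters not covered by the table are free

    entry_pos = refcount_table_offset + 8 * table_index
    entry = int.from_bytes(image[entry_pos:entry_pos + 8], "big")
    block_offset = entry & ~0x1FF  # bits 9-63; bits 0-8 are reserved
    if block_offset == 0:
        return 0  # refcount block not allocated yet

    bit_pos = block_index * refcount_bits
    if refcount_bits >= 8:
        start = block_offset + bit_pos // 8
        return int.from_bytes(image[start:start + refcount_bits // 8], "big")
    # Sub-byte widths: entry 0 sits in the least significant bits of a byte.
    byte = image[block_offset + bit_pos // 8]
    return (byte >> (bit_pos % 8)) & ((1 << refcount_bits) - 1)
```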
03feae73 KW |
212 | |
213 | ||
214 | == Cluster mapping == | |
215 | ||
216 | Just as for refcounts, qcow2 uses a two-level structure for the mapping of | |
217 | guest clusters to host clusters. These are called the L1 and L2 tables. | |
218 | ||
219 | The L1 table has a variable size (stored in the header) and may use multiple | |
220 | clusters, however it must be contiguous in the image file. L2 tables are | |
221 | exactly one cluster in size. | |
222 | ||
223 | Given an offset into the virtual disk, the offset into the image file can be | |
224 | obtained as follows: | |
225 | ||
226 | l2_entries = (cluster_size / sizeof(uint64_t)) | |
227 | ||
228 | l2_index = (offset / cluster_size) % l2_entries | |
229 | l1_index = (offset / cluster_size) / l2_entries | |
230 | ||
231 | l2_table = load_cluster(l1_table[l1_index]); | |
232 | cluster_offset = l2_table[l2_index]; | |
233 | ||
234 | return cluster_offset + (offset % cluster_size) | |
235 | ||
236 | L1 table entry: | |
237 | ||
238 | Bit 0 - 8: Reserved (set to 0) | |
239 | ||
240 | 9 - 55: Bits 9-55 of the offset into the image file at which the L2 | |
241 | table starts. Must be aligned to a cluster boundary. If the | |
242 | offset is 0, the L2 table and all clusters described by this | |
243 | L2 table are unallocated. | |
244 | ||
245 | 56 - 62: Reserved (set to 0) | |
246 | ||
247 | 63: 0 for an L2 table that is unused or requires COW, 1 if its | |
248 | refcount is exactly one. This information is only accurate | |
249 | in the active L1 table. | |
250 | ||
4fabffc1 | 251 | L2 table entry: |
03feae73 | 252 | |
4fabffc1 KW |
253 | Bit 0 - 61: Cluster descriptor |
254 | ||
255 | 62: 0 for standard clusters | |
256 | 1 for compressed clusters | |
257 | ||
258 | 63: 0 for a cluster that is unused or requires COW, 1 if its | |
259 | refcount is exactly one. This information is only accurate | |
b6af0975 | 260 | in L2 tables that are reachable from the active L1 |
4fabffc1 KW |
261 | table. |
262 | ||
263 | Standard Cluster Descriptor: | |
264 | ||
265 | Bit 0: If set to 1, the cluster reads as all zeros. The host | |
266 | cluster offset can be used to describe a preallocation, | |
267 | but it won't be used for reading data from this cluster, | |
268 | nor is data read from the backing file if the cluster is | |
269 | unallocated. | |
270 | ||
271 | With version 2, this is always 0. | |
272 | ||
273 | 1 - 8: Reserved (set to 0) | |
03feae73 KW |
274 | |
275 | 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a | |
276 | cluster boundary. If the offset is 0, the cluster is | |
277 | unallocated. | |
278 | ||
279 | 56 - 61: Reserved (set to 0) | |
280 | ||
03feae73 | 281 | |
bf3f363a | 282 | Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)): |
03feae73 KW |
283 | |
284 | Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a | |
285 | cluster boundary! | |
286 | ||
287 | x - 61: Number of additional 512-byte sectors used for the compressed data, beyond the sector containing the host cluster offset | |
288 | ||
03feae73 | 289 | If a cluster is unallocated, read requests shall read the data from the backing |
4fabffc1 KW |
290 | file (except if bit 0 in the Standard Cluster Descriptor is set). If there is |
291 | no backing file or the backing file is smaller than the image, they shall read | |
292 | zeros for all parts that are not covered by the backing file. | |
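Putting the L1/L2 lookup and the standard cluster descriptor together gives the following sketch. It is an illustration rather than a complete reader: `map_guest_offset` and `read_u64` are hypothetical names, the image is assumed to be held in memory, compressed clusters are deliberately not decoded, and the backing file fallback is left to the caller.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1 or L2 entry
FLAG_COMPRESSED = 1 << 62         # L2 entry bit 62
FLAG_ZERO = 1 << 0                # standard cluster descriptor bit 0


def read_u64(image, pos):
    """Read one big-endian 64-bit table entry."""
    return int.from_bytes(image[pos:pos + 8], "big")


def map_guest_offset(image, offset, cluster_bits, l1_table_offset, l1_size):
    """Return ("data", host_offset), ("zero", None) or ("unallocated", None)."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8

    l2_index = (offset >> cluster_bits) % l2_entries
    l1_index = (offset >> cluster_bits) // l2_entries
    if l1_index >= l1_size:
        raise ValueError("offset is not covered by the L1 table")

    l2_table_offset = read_u64(image, l1_table_offset + 8 * l1_index) & OFFSET_MASK
    if l2_table_offset == 0:
        return ("unallocated", None)  # whole L2 table is unallocated

    l2_entry = read_u64(image, l2_table_offset + 8 * l2_index)
    if l2_entry & FLAG_COMPRESSED:
        raise NotImplementedError("compressed clusters are outside this sketch")
    if l2_entry & FLAG_ZERO:
        return ("zero", None)  # reads as zeros, backing file is not consulted

    host_cluster = l2_entry & OFFSET_MASK
    if host_cluster == 0:
        return ("unallocated", None)  # caller reads backing file or zeros
    return ("data", host_cluster + (offset & (cluster_size - 1)))
```

Masking with `OFFSET_MASK` drops the reserved bits and bit 63, which the plain `cluster_offset = l2_table[l2_index]` pseudocode leaves implicit.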
03feae73 KW |
293 | |
294 | ||
295 | == Snapshots == | |
296 | ||
297 | qcow2 supports internal snapshots. Their basic principle of operation is to | |
298 | switch the active L1 table, so that a different set of host clusters is | |
299 | exposed to the guest. | |
300 | ||
301 | When creating a snapshot, the L1 table should be copied and the refcount of all | |
3789985f | 302 | L2 tables and clusters reachable from this L1 table must be increased, so that |
03feae73 KW |
303 | a write causes a COW and isn't visible in other snapshots. |
304 | ||
305 | When loading a snapshot, bit 63 of all entries in the new active L1 table and | |
306 | all L2 tables referenced by it must be reconstructed from the refcount table | |
307 | as it doesn't need to be accurate in inactive L1 tables. | |
308 | ||
309 | A directory of all snapshots is stored in the snapshot table, a contiguous area | |
310 | in the image file, whose starting offset and length are given by the header | |
311 | fields snapshots_offset and nb_snapshots. The entries of the snapshot table | |
312 | have variable length, depending on the length of ID, name and extra data. | |
313 | ||
314 | Snapshot table entry: | |
315 | ||
316 | Byte 0 - 7: Offset into the image file at which the L1 table for the | |
317 | snapshot starts. Must be aligned to a cluster boundary. | |
318 | ||
319 | 8 - 11: Number of entries in the L1 table of the snapshot | |
320 | ||
321 | 12 - 13: Length of the unique ID string describing the snapshot | |
322 | ||
323 | 14 - 15: Length of the name of the snapshot | |
324 | ||
325 | 16 - 19: Time at which the snapshot was taken in seconds since the | |
326 | Epoch | |
327 | ||
328 | 20 - 23: Subsecond part of the time at which the snapshot was taken | |
329 | in nanoseconds | |
330 | ||
331 | 24 - 31: Time that the guest was running until the snapshot was | |
332 | taken in nanoseconds | |
333 | ||
334 | 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. | |
335 | If there is VM state, it starts at the first cluster | |
336 | described by the first L1 table entry that doesn't describe a | |
337 | regular guest cluster (i.e. VM state is stored like guest | |
338 | disk content, except that it is stored at offsets that are | |
339 | larger than the virtual disk presented to the guest) | |
340 | ||
341 | 36 - 39: Size of extra data in the table entry (used for future | |
342 | extensions of the format) | |
343 | ||
c2c9a466 KW |
344 | variable: Extra data for future extensions. Unknown fields must be |
345 | ignored. Currently defined are (offset relative to snapshot | |
346 | table entry): | |
347 | ||
348 | Byte 40 - 47: Size of the VM state in bytes. 0 if no VM | |
349 | state is saved. If this field is present, | |
350 | the 32-bit value in bytes 32-35 is ignored. | |
03feae73 | 351 | |
4fabffc1 KW |
352 | Byte 48 - 55: Virtual disk size of the snapshot in bytes |
353 | ||
354 | Version 3 images must include extra data at least up to | |
355 | byte 55. | |
356 | ||
03feae73 KW |
357 | variable: Unique ID string for the snapshot (not null terminated) |
358 | ||
359 | variable: Name of the snapshot (not null terminated) | |
f2520804 HR |
360 | |
361 | variable: Padding to round up the snapshot table entry size to the | |
362 | next multiple of 8. |
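Because snapshot table entries have variable length, they have to be walked one after another. The following sketch shows one way to do that; `parse_snapshot_table` and the dict keys are hypothetical names, the image is assumed to be in memory, and decoding ID and name as UTF-8 is an assumption of this example.

```python
import struct

# Bytes 0-39 of a snapshot table entry (big endian).
SNAPSHOT_FIXED = struct.Struct(">QIHHIIQII")


def parse_snapshot_table(image, snapshots_offset, nb_snapshots):
    """Return one dict per snapshot table entry."""
    snapshots = []
    pos = snapshots_offset
    for _ in range(nb_snapshots):
        start = pos
        (l1_offset, l1_size, id_len, name_len, sec, nsec, vm_clock_ns,
         vm_state_size, extra_size) = SNAPSHOT_FIXED.unpack_from(image, pos)
        pos += SNAPSHOT_FIXED.size

        extra = bytes(image[pos:pos + extra_size])
        pos += extra_size
        disk_size = None
        if extra_size >= 8:
            # Bytes 40-47 override the 32-bit VM state size.
            vm_state_size = int.from_bytes(extra[0:8], "big")
        if extra_size >= 16:
            # Bytes 48-55: virtual disk size of the snapshot.
            disk_size = int.from_bytes(extra[8:16], "big")

        snap_id = bytes(image[pos:pos + id_len]).decode("utf-8", "replace")
        pos += id_len
        name = bytes(image[pos:pos + name_len]).decode("utf-8", "replace")
        pos += name_len

        # Skip the padding that rounds the entry size up to a multiple of 8.
        pos = start + ((pos - start + 7) & ~7)

        snapshots.append({
            "l1_table_offset": l1_offset, "l1_size": l1_size,
            "id": snap_id, "name": name,
            "date_sec": sec, "date_nsec": nsec, "vm_clock_ns": vm_clock_ns,
            "vm_state_size": vm_state_size, "disk_size": disk_size,
        })
    return snapshots
```

Unknown extra data beyond byte 55 is skipped, in line with the rule that unknown fields must be ignored.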