..
    Copyright (C) 2017 Red Hat Inc.

    This work is licensed under the terms of the GNU GPL, version 2 or
    later.  See the COPYING file in the top-level directory.

============================
Live Block Device Operations
============================

QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of
live block device jobs -- stream, commit, mirror, and backup.  These can
be used to manipulate disk image chains to accomplish certain tasks,
namely: live copy data from backing files into overlays; shorten long
disk image chains by merging data from overlays into backing files; live
synchronize data from a disk image chain (including current active disk)
to another target image; and point-in-time (and incremental) backups of
a block device.  Below is a description of these block (QMP)
primitives, along with a non-exhaustive list of examples that illustrate
their use.

.. note::
    The file ``qapi/block-core.json`` in the QEMU source tree has the
    canonical QEMU API (QAPI) schema documentation for the QMP
    primitives discussed here.

.. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is
                     integrated.

.. contents::

Disk image backing chain notation
---------------------------------

A simple disk image chain.  (This can be created live using QMP
``blockdev-snapshot-sync``, or offline via ``qemu-img``)::

                   (Live QEMU)
                        |
                        .
                        V

            [A] <----- [B]

    (backing file)    (overlay)

The arrow can be read as: Image [A] is the backing file of disk image
[B].  Live QEMU is currently writing to image [B]; consequently, [B] is
also referred to as the "active layer".

There are two kinds of terminology that are common when referring to
files in a disk image backing chain:

(1) Directional: 'base' and 'top'.  Given the simple disk image chain
    above, image [A] can be referred to as 'base', and image [B] as
    'top'.  (This terminology can be seen in the QAPI schema file,
    block-core.json.)

(2) Relational: 'backing file' and 'overlay'.  Again, taking the same
    simple disk image chain from the above, disk image [A] is referred
    to as the backing file, and image [B] as overlay.

Throughout this document, we will use the relational terminology.

.. important::
    The overlay files can generally be any format that supports a
    backing file, although QCOW2 is the preferred format and the one
    used in this document.


Brief overview of live block QMP primitives
-------------------------------------------

The following are the four different kinds of live block operations that
QEMU block layer supports.

(1) ``block-stream``: Live copy of data from backing files into overlay
    files.

    .. note:: Once the 'stream' operation has finished, there are three
              things to note:

                (a) QEMU rewrites the backing chain to remove
                    reference to the now-streamed and redundant backing
                    file;

                (b) the streamed file *itself* won't be removed by QEMU,
                    and must be explicitly discarded by the user;

                (c) the streamed file remains valid -- i.e. further
                    overlays can be created based on it.  Refer to the
                    ``block-stream`` section further below for more
                    details.

(2) ``block-commit``: Live merge of data from overlay files into backing
    files (with the optional goal of removing the overlay file from the
    chain).  Since QEMU 2.0, this includes "active ``block-commit``"
    (i.e. merge the current active layer into the base image).

    .. note:: Once the 'commit' operation has finished, there are three
              things to note here as well:

                (a) QEMU rewrites the backing chain to remove reference
                    to now-redundant overlay images that have been
                    committed into a backing file;

                (b) the committed file *itself* won't be removed by QEMU
                    -- it ought to be manually removed;

                (c) however, unlike in the case of ``block-stream``, the
                    intermediate images will be rendered invalid -- i.e.
                    no further overlays can be created based on
                    them.  Refer to the ``block-commit`` section further
                    below for more details.

(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
    disk to another image.

(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
    of a block device to a destination.


.. _`Interacting with a QEMU instance`:

Interacting with a QEMU instance
--------------------------------

To illustrate the command-line invocations, we will use the following
invocation of QEMU, with a QMP server running over a UNIX socket::

    $ ./x86_64-softmmu/qemu-system-x86_64 -display none -no-user-config \
        -M q35 -nodefaults -m 512 \
        -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \
        -device virtio-blk,drive=node-A,id=virtio0 \
        -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait

The ``-blockdev`` command-line option, used above, is available from
QEMU 2.9 onwards.  In the above invocation, notice the ``node-name``
parameter that is used to refer to the disk image a.qcow2 ('node-A') --
this is a cleaner way to refer to a disk image (as opposed to referring
to it by spelling out file paths).  So, we will continue to designate a
``node-name`` to each further disk image created (either via
``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
image chain, and continue to refer to the disks using their
``node-name`` (where possible, because ``block-commit`` does not yet, as
of QEMU 2.9, accept a ``node-name`` parameter) when performing various
block operations.

To interact with the QEMU instance launched above, we will use the
``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the
QEMU source directory), which takes key-value pairs for QMP commands.
Invoke it as below (which will also print out the complete raw JSON
syntax for reference -- examples in the following sections)::

    $ ./qmp-shell -v -p /tmp/qmp-sock
    (QEMU)

.. note::
    In the event we have to repeat a certain QMP command, we will: for
    the first occurrence of it, show the ``qmp-shell`` invocation, *and*
    the corresponding raw JSON QMP syntax; but for subsequent
    invocations, present just the ``qmp-shell`` syntax, and omit the
    equivalent JSON output.

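To make the relationship between the ``qmp-shell`` key-value syntax and
the raw JSON concrete, here is a minimal Python sketch.  It is *not* the
real ``qmp-shell`` parser; the helper names ``parse_value`` and
``shell_to_qmp`` are hypothetical, and the sketch assumes that values
contain no spaces and that anything which is not valid JSON is a plain
string.

```python
import json


def parse_value(text):
    """Interpret a value as JSON if it parses, otherwise as a plain string."""
    try:
        return json.loads(text)
    except ValueError:
        return text


def shell_to_qmp(line):
    """Convert 'command key=value ...' into a QMP command dictionary."""
    name, *pairs = line.split()
    arguments = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = pair.partition("=")
        arguments[key] = parse_value(value)
    return {"execute": name, "arguments": arguments}


command = shell_to_qmp("block-stream device=node-D job-id=job0")
print(json.dumps(command, indent=4))
```

The printed dictionary has the same ``execute`` / ``arguments`` shape as
the raw JSON that ``qmp-shell -v`` echoes in the examples of this
document.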

Example disk image chain
------------------------

We will use the disk image chain below (occasionally spelling it out
where appropriate) when discussing various primitives::

    [A] <-- [B] <-- [C] <-- [D]

Where [A] is the original base image; [B] and [C] are intermediate
overlay images; image [D] is the active layer -- i.e. live QEMU is
writing to it.  (The rule of thumb is: live QEMU will always be pointing
to the rightmost image in a disk image chain.)

The above image chain can be created by invoking
``blockdev-snapshot-sync`` commands as follows (which shows the
creation of overlay image [B]) using the ``qmp-shell`` (our invocation
also prints the raw JSON invocation of it)::

    (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
    {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "node-name": "node-A",
            "snapshot-file": "b.qcow2",
            "format": "qcow2",
            "snapshot-node-name": "node-B"
        }
    }

Here, "node-A" is the name QEMU internally uses to refer to the base
image [A] -- it is the backing file, based on which the overlay image,
[B], is created.

To create the rest of the overlay images, [C], and [D] (omitting the raw
JSON output for brevity)::

    (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
    (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2

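The three ``blockdev-snapshot-sync`` invocations above follow one
pattern, which the following Python sketch spells out.  The helper
``snapshot_command`` is hypothetical; it only builds the command
dictionaries (it does not talk to QEMU) and assumes the naming scheme
used in this document, where image [X] lives in ``x.qcow2`` and is
called ``node-X``.

```python
def snapshot_command(backing, overlay):
    """Build one blockdev-snapshot-sync command for backing -> overlay."""
    return {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "node-name": f"node-{backing}",
            "snapshot-file": f"{overlay.lower()}.qcow2",
            "snapshot-node-name": f"node-{overlay}",
            "format": "qcow2",
        },
    }


chain = ["A", "B", "C", "D"]
# Pair each image with the overlay created on top of it: (A, B), (B, C), (C, D).
commands = [snapshot_command(b, o) for b, o in zip(chain, chain[1:])]
for command in commands:
    args = command["arguments"]
    print(args["node-name"], "<--", args["snapshot-node-name"])
```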

A note on points-in-time vs file names
--------------------------------------

In our disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

We have *three* points in time and an active layer:

- Point 1: Guest state when [B] was created is contained in file [A]
- Point 2: Guest state when [C] was created is contained in [A] + [B]
- Point 3: Guest state when [D] was created is contained in
  [A] + [B] + [C]
- Active layer: Current guest state is contained in [A] + [B] + [C] +
  [D]

Therefore, be careful with naming choices:

- Naming a file after the time it is created is misleading -- the
  guest data for that point in time is *not* contained in that file
  (as explained earlier)
- Rather, think of files as a *delta* from the backing file

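The points-in-time list above can be derived mechanically from the
chain, as this small Python sketch shows.  The function name
``points_in_time`` is hypothetical and the chain is modelled as a plain
list of image names, base first.

```python
def points_in_time(chain):
    """Map each overlay to the files holding the guest state at its creation."""
    # The state at the moment an overlay is created lives in everything below it.
    return {overlay: chain[:index] for index, overlay in enumerate(chain) if index > 0}


chain = ["A", "B", "C", "D"]
points = points_in_time(chain)
for overlay, files in points.items():
    print(f"state when [{overlay}] was created:", " + ".join(files))
```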

Live block streaming --- ``block-stream``
-----------------------------------------

The ``block-stream`` command lets you live-copy data from backing
files into overlay images.

Given our original example disk image chain from earlier::

    [A] <-- [B] <-- [C] <-- [D]

The disk image chain can be shortened in one of the following different
ways (not an exhaustive list).

.. _`Case-1`:

(1) Merge everything into the active layer: I.e. copy all contents from
    the base image, [A], and overlay images, [B] and [C], into [D],
    *while* the guest is running.  The resulting chain will be a
    standalone image, [D] -- with contents from [A], [B] and [C] merged
    into it (where live QEMU writes go to)::

        [D]

.. _`Case-2`:

(2) Taking the same example disk image chain mentioned earlier, merge
    only images [B] and [C] into [D], the active layer.  As a result,
    the contents of images [B] and [C] will be copied into [D], and the
    backing file pointer of image [D] will be adjusted to point to image
    [A].  The resulting chain will be::

        [A] <-- [D]

.. _`Case-3`:

(3) Intermediate streaming (available since QEMU 2.8): Starting afresh
    with the original example disk image chain, with a total of four
    images, it is possible to copy contents from image [B] into image
    [C].  Once the copy is finished, image [B] can now be (optionally)
    discarded; and the backing file pointer of image [C] will be
    adjusted to point to [A].  I.e. after performing "intermediate
    streaming" of [B] into [C], the resulting image chain will be (where
    live QEMU is writing to [D])::

        [A] <-- [C] <-- [D]

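The effect of the three streaming cases on the chain can be modelled
with a few lines of Python.  This is only a toy model of the resulting
chain *shape* (a list of image names, base first), under the assumption
that streaming drops every image between the chosen base and the
device; the function name ``stream`` is hypothetical and no data is
copied.

```python
def stream(chain, device, base=None):
    """Return the chain after streaming into `device`, keeping `base` below it."""
    top = chain.index(device)
    if base is None:
        # No base given: everything below the device is merged into it.
        return chain[top:]
    bottom = chain.index(base)
    if bottom >= top:
        raise ValueError("base must come before device in the chain")
    return chain[: bottom + 1] + chain[top:]


chain = ["A", "B", "C", "D"]
print(stream(chain, "D"))            # Case 1
print(stream(chain, "D", base="A"))  # Case 2
print(stream(chain, "C", base="A"))  # Case 3
```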

QMP invocation for ``block-stream``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For `Case-1`_, to merge the contents of all the backing files into the
active layer, where 'node-D' is the current active image (by default,
``block-stream`` will flatten the entire chain), the ``qmp-shell``
invocation (and its corresponding JSON output) is::

    (QEMU) block-stream device=node-D job-id=job0
    {
        "execute": "block-stream",
        "arguments": {
            "device": "node-D",
            "job-id": "job0"
        }
    }

For `Case-2`_, merge contents of the images [B] and [C] into [D], where
image [D] ends up referring to image [A] as its backing file::

    (QEMU) block-stream device=node-D base-node=node-A job-id=job0

And for `Case-3`_, of "intermediate streaming", merge contents of
image [B] into [C], where [C] ends up referring to [A] as its backing
image::

    (QEMU) block-stream device=node-C base-node=node-A job-id=job0

Progress of a ``block-stream`` operation can be monitored via the QMP
command::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }


Once the ``block-stream`` operation has completed, QEMU will emit an
event, ``BLOCK_JOB_COMPLETED``.  The intermediate overlays remain valid,
and can now be (optionally) discarded, or retained to create further
overlays based on them.  Finally, the ``block-stream`` jobs can be
restarted at any time.

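As a rough illustration of monitoring, the following Python sketch turns
one entry of a ``query-block-jobs`` reply into a percentage.  It assumes
each entry carries the ``len`` and ``offset`` fields that appear in the
``query-block-jobs`` replies shown later in this document; the helper
name ``progress_percent`` and the sample values are hypothetical.

```python
def progress_percent(job):
    """Percentage done for one query-block-jobs entry."""
    if job["len"] == 0:
        # Nothing to copy: treat the job as fully done.
        return 100.0
    return 100.0 * job["offset"] / job["len"]


# Hypothetical entry for a 'stream' job that is a quarter of the way through.
sample = {"type": "stream", "device": "job0", "len": 1376256, "offset": 344064}
print(f"{sample['device']}: {progress_percent(sample):.1f}%")
```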

Live block commit --- ``block-commit``
--------------------------------------

The ``block-commit`` command lets you merge live data from overlay
images into backing file(s).  Since QEMU 2.0, this includes "live active
commit" (i.e. it is possible to merge the "active layer", the right-most
image in a disk image chain where live QEMU will be writing to, into the
base image).  This is analogous to ``block-stream``, but in the opposite
direction.

Again, starting afresh with our example disk image chain, where live
QEMU is writing to the right-most image in the chain, [D]::

    [A] <-- [B] <-- [C] <-- [D]

The disk image chain can be shortened in one of the following ways:

.. _`block-commit_Case-1`:

(1) Commit content from only image [B] into image [A].  The resulting
    chain is the following, where image [C] is adjusted to point at [A]
    as its new backing file::

        [A] <-- [C] <-- [D]

(2) Commit content from images [B] and [C] into image [A].  The
    resulting chain, where image [D] is adjusted to point to image [A]
    as its new backing file::

        [A] <-- [D]

.. _`block-commit_Case-3`:

(3) Commit content from images [B], [C], and the active layer [D] into
    image [A].  The resulting chain (in this case, a consolidated single
    image)::

        [A]

(4) Commit content from only image [C] into image [B].  The
    resulting chain::

        [A] <-- [B] <-- [D]

(5) Commit content from image [C] and the active layer [D] into image
    [B].  The resulting chain::

        [A] <-- [B]

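The five commit cases above all follow one rule: every image above the
chosen base, up to and including the chosen top, disappears from the
chain.  The Python sketch below models only that chain shape (a list of
image names, base first); the function name ``commit`` is hypothetical
and no data is merged.

```python
def commit(chain, top, base):
    """Return the chain after committing everything up to `top` into `base`."""
    base_index = chain.index(base)
    top_index = chain.index(top)
    if base_index >= top_index:
        raise ValueError("base must come before top in the chain")
    # Keep the base and what is below it, plus whatever sits above the top.
    return chain[: base_index + 1] + chain[top_index + 1 :]


chain = ["A", "B", "C", "D"]
for case, (top, base) in enumerate(
    [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "B")], start=1
):
    print(f"Case {case}:", " <-- ".join(commit(chain, top, base)))
```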

QMP invocation for ``block-commit``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
image [B] into image [A], the invocation is as follows::

    (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
    {
        "execute": "block-commit",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "top": "b.qcow2",
            "base": "a.qcow2"
        }
    }

Once the above ``block-commit`` operation has completed, a
``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
required.  As the end result, the backing file of image [C] is adjusted
to point to image [A], and the original 4-image chain will end up being
transformed to::

    [A] <-- [C] <-- [D]

.. note::
    The intermediate image [B] is invalid (as in: no further
    overlays based on it can be created).

    Reasoning: An intermediate image after a 'stream' operation still
    represents that old point-in-time, and may be valid in that context.
    However, an intermediate image after a 'commit' operation no longer
    represents any point-in-time, and is invalid in any context.


However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active
``block-commit``") is a *two-phase* operation: In the first phase, the
content from the active overlay, along with the intermediate overlays,
is copied into the backing file (also called the base image).  In the
second phase, the said backing file is made the current active image,
which is done by issuing the command ``block-job-complete``.
Optionally, the ``block-commit`` operation can be cancelled by issuing
the command ``block-job-cancel``, but be careful when doing this.

Once the first phase of the ``block-commit`` operation has completed,
the event ``BLOCK_JOB_READY`` will be emitted, signalling that the
synchronization has finished.  Now the job can be gracefully completed
by issuing the command ``block-job-complete`` -- until such a command is
issued, the 'commit' operation remains active.

The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to
convert a disk image chain such as this::

    [A] <-- [B] <-- [C] <-- [D]

Into::

    [A]

Here, content from all the subsequent overlays, [B] and [C], including
the active layer, [D], is committed back to [A], which then becomes the
image that live QEMU writes to.

Start the "active ``block-commit``" operation::

    (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0
    {
        "execute": "block-commit",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "top": "d.qcow2",
            "base": "a.qcow2"
        }
    }


Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will
be emitted.

Then, optionally query for the status of the active block operations.
We can see the 'commit' job is now ready to be completed, as indicated
by the line *"ready": true*::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }
    {
        "return": [
            {
                "busy": false,
                "type": "commit",
                "len": 1376256,
                "paused": false,
                "ready": true,
                "io-status": "ok",
                "offset": 1376256,
                "device": "job0",
                "speed": 0
            }
        ]
    }

Gracefully complete the 'commit' block device job::

    (QEMU) block-job-complete device=job0
    {
        "execute": "block-job-complete",
        "arguments": {
            "device": "job0"
        }
    }
    {
        "return": {}
    }

Finally, once the above job is completed, an event
``BLOCK_JOB_COMPLETED`` will be emitted.

.. note::
    The invocation for rest of the cases (2, 4, and 5), discussed in the
    previous section, is omitted for brevity.

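The "wait for ready, then complete" step of the active commit flow can
be sketched in Python as follows.  The helper ``completion_command`` is
hypothetical; it only inspects a ``query-block-jobs`` reply (the one
shown above is reused as sample data) and builds the follow-up command,
without sending anything to QEMU.

```python
def completion_command(reply, job_id):
    """Return a block-job-complete command if the job is ready, else None."""
    for job in reply["return"]:
        if job["device"] == job_id and job["ready"]:
            return {"execute": "block-job-complete", "arguments": {"device": job_id}}
    # Job not found, or still synchronizing: nothing to issue yet.
    return None


# The query-block-jobs reply from the flow above.
reply = {
    "return": [
        {
            "busy": False,
            "type": "commit",
            "len": 1376256,
            "paused": False,
            "ready": True,
            "io-status": "ok",
            "offset": 1376256,
            "device": "job0",
            "speed": 0,
        }
    ]
}
print(completion_command(reply, "job0"))
```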

Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
----------------------------------------------------------------------

Synchronize a running disk image chain (all or part of it) to a target
image.

Again, given our familiar disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``)
allows you to copy data from the entire chain into a single target image
(which can be located on a different host), [E].

.. note::

    When you cancel an in-progress 'mirror' job *before* the source and
    target are synchronized, ``block-job-cancel`` will emit the event
    ``BLOCK_JOB_CANCELLED``.  However, note that if you cancel a
    'mirror' job *after* it has indicated (via the event
    ``BLOCK_JOB_READY``) that the source and target have reached
    synchronization, then the event emitted by ``block-job-cancel``
    changes to ``BLOCK_JOB_COMPLETED``.

    Besides the 'mirror' job, the "active ``block-commit``" is the only
    other block device job that emits the event ``BLOCK_JOB_READY``.
    The rest of the block device jobs ('stream', "non-active
    ``block-commit``", and 'backup') end automatically.

So there are two possible actions to take, after a 'mirror' job has
emitted the event ``BLOCK_JOB_READY``, indicating that the source and
target have reached synchronization:

(1) Issuing the command ``block-job-cancel`` (which, at this stage,
    ends with the event ``BLOCK_JOB_COMPLETED``) will create a
    point-in-time (which is at the time of *triggering* the cancel
    command) copy of the entire disk image chain (or only the top-most
    image, depending on the ``sync`` mode), contained in the target
    image [E].  One use case for this is live VM migration with
    non-shared storage.

(2) Issuing the command ``block-job-complete`` (which also ends with
    the event ``BLOCK_JOB_COMPLETED``) will adjust the guest device
    (i.e. live QEMU) to point to the target image, [E], causing all the
    new writes from this point on to happen there.

About synchronization modes: The synchronization mode determines
*which* part of the disk image chain will be copied to the target.
Currently, there are four different kinds:

(1) ``full`` -- Synchronize the content of entire disk image chain to
    the target

(2) ``top`` -- Synchronize only the contents of the top-most disk image
    in the chain to the target

(3) ``none`` -- Synchronize only the new writes from this point on.

    .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
              the behavior of ``none`` synchronization mode is different.
              Normally, a ``backup`` job consists of two parts: Anything
              that is overwritten by the guest is first copied out to
              the backup, and in the background the whole image is
              copied from start to end.  With ``sync=none``, it's only
              the first part.

(4) ``incremental`` -- Synchronize content that is described by the
    dirty bitmap

.. note::
    Refer to the :doc:`bitmaps` document in the QEMU source
    tree to learn about the detailed workings of the ``incremental``
    synchronization mode.

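The event behaviour of a 'mirror' job described in the note at the start
of this section can be summarised as a small decision function.  This
Python sketch is only a model of that description; the function name
``final_event`` is hypothetical, and treating ``block-job-complete`` on
a job that is not yet ready as an error is an assumption of the sketch,
since the text above does not cover that case.

```python
def final_event(command, ready):
    """Return the event that ends a 'mirror' job for the given command."""
    if command == "block-job-cancel":
        # Cancelling after BLOCK_JOB_READY counts as a successful completion.
        return "BLOCK_JOB_COMPLETED" if ready else "BLOCK_JOB_CANCELLED"
    if command == "block-job-complete":
        if not ready:
            # Assumption: completing is only meaningful once the job is ready.
            raise ValueError("job has not emitted BLOCK_JOB_READY yet")
        return "BLOCK_JOB_COMPLETED"
    raise ValueError(f"unsupported command: {command}")


for command, ready in [
    ("block-job-cancel", False),
    ("block-job-cancel", True),
    ("block-job-complete", True),
]:
    print(command, "ready" if ready else "not ready", "->", final_event(command, ready))
```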

QMP invocation for ``drive-mirror``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To copy the contents of the entire disk image chain, from [A] all the
way to [D], to a new target (``drive-mirror`` will create the destination
file, if it doesn't already exist), call it [E]::

    (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0
    {
        "execute": "drive-mirror",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "target": "e.qcow2",
            "sync": "full"
        }
    }

The ``"sync": "full"``, from the above, means: copy the *entire* chain
to the destination.

Following the above, querying for active block jobs will show that a
'mirror' job is "ready" to be completed (and QEMU will also emit an
event, ``BLOCK_JOB_READY``)::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }
    {
        "return": [
            {
                "busy": false,
                "type": "mirror",
                "len": 21757952,
                "paused": false,
                "ready": true,
                "io-status": "ok",
                "offset": 21757952,
                "device": "job0",
                "speed": 0
            }
        ]
    }

And, as noted in the previous section, there are two possible actions
at this point:

(a) Create a point-in-time snapshot by ending the synchronization.  The
    point-in-time is at the time of *ending* the sync.  (The result of
    the following being: the target image, [E], will be populated with
    content from the entire chain, [A] to [D])::

        (QEMU) block-job-cancel device=job0
        {
            "execute": "block-job-cancel",
            "arguments": {
                "device": "job0"
            }
        }

(b) Or, complete the operation and pivot the live QEMU to the target
    copy::

        (QEMU) block-job-complete device=job0

In either of the above cases, if you once again run the
``query-block-jobs`` command, there should not be any active block
operation.

Comparing 'commit' and 'mirror': In both cases, the overlay images
can be discarded.  However, with 'commit', the *existing* base image
will be modified (by updating it with contents from overlays); while in
the case of 'mirror', a *new* target image is populated with the data
from the disk image chain.


QMP invocation for live storage migration with ``drive-mirror`` + NBD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Live storage migration (without shared storage setup) is one of the most
common use-cases that takes advantage of the ``drive-mirror`` primitive
and QEMU's built-in Network Block Device (NBD) server.  Here's a quick
walk-through of this setup.

Given the disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

Instead of copying content from the entire chain, synchronize *only* the
contents of the *top*-most disk image (i.e. the active layer), [D], to a
target, say, [TargetDisk].

.. important::
    The destination host must already have the contents of the backing
    chain, involving images [A], [B], and [C], visible via other means
    -- whether by ``cp``, ``rsync``, or by some storage array-specific
    command.

Sometimes, this is also referred to as "shallow copy" -- because only
the "active layer", and not the rest of the image chain, is copied to
the destination.

.. note::
    In this example, for the sake of simplicity, we'll be using the same
    ``localhost`` as both source and destination.

As noted earlier, on the destination host the contents of the backing
chain -- from images [A] to [C] -- are already expected to exist in some
form (e.g. in a file called, ``Contents-of-A-B-C.qcow2``).  Now, on the
destination host, let's create a target overlay image (with the image
``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
of image [D] (from the source QEMU) will be mirrored::

    $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
        -F qcow2 ./target-disk.qcow2

Then start the destination QEMU instance with the following invocation
(the source QEMU is already running -- see the section:
`Interacting with a QEMU instance`_).  As noted earlier, for
simplicity's sake, the destination QEMU is started on the same host, but
it could be located elsewhere::

    $ ./x86_64-softmmu/qemu-system-x86_64 -display none -no-user-config \
        -M q35 -nodefaults -m 512 \
        -blockdev node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \
        -device virtio-blk,drive=node-TargetDisk,id=virtio0 \
        -S -monitor stdio -qmp unix:./qmp-sock2,server,nowait \
        -incoming tcp:localhost:6666

Given the disk image chain on source QEMU::

    [A] <-- [B] <-- [C] <-- [D]

On the destination host, the contents of the chain
``[A] <-- [B] <-- [C]`` are expected to be *already* present, so we copy
*only* the content of image [D].

712 | (1) [On *destination* QEMU] As part of the first step, start the | |
713 | built-in NBD server on a given host (local host, represented by | |
714 | ``::``)and port:: | |
715 | ||
716 | (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}} | |
717 | { | |
718 | "execute": "nbd-server-start", | |
719 | "arguments": { | |
720 | "addr": { | |
721 | "data": { | |
722 | "host": "::", | |
723 | "port": "49153" | |
724 | }, | |
725 | "type": "inet" | |
726 | } | |
727 | } | |
728 | } | |
729 | ||
730 | (2) [On *destination* QEMU] And export the destination disk image using | |
731 | QEMU's built-in NBD server:: | |
732 | ||
733 | (QEMU) nbd-server-add device=node-TargetDisk writable=true | |
734 | { | |
735 | "execute": "nbd-server-add", | |
736 | "arguments": { | |
737 | "device": "node-TargetDisk" | |
738 | } | |
739 | } | |
740 | ||
741 | (3) [On *source* QEMU] Then, invoke ``drive-mirror`` (NB: since we're | |
742 | running ``drive-mirror`` with ``mode=existing`` (meaning: | |
743 | synchronize to a pre-created file, therefore 'existing', file on the | |
744 | target host), with the synchronization mode as 'top' (``"sync: | |
745 | "top"``):: | |
746 | ||
747 | (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0 | |
748 | { | |
749 | "execute": "drive-mirror", | |
750 | "arguments": { | |
751 | "device": "node-D", | |
752 | "mode": "existing", | |
753 | "job-id": "job0", | |
754 | "target": "nbd:localhost:49153:exportname=node-TargetDisk", | |
755 | "sync": "top" | |
756 | } | |
757 | } | |
758 | ||
759 | (4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data, and the | |
760 | event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel`` to | |
761 | gracefully end the synchronization:: | |
762 | ||
763 | (QEMU) block-job-cancel device=job0 | |
764 | { | |
765 | "execute": "block-job-cancel", | |
766 | "arguments": { | |
767 | "device": "job0" | |
768 | } | |
769 | } | |
770 | ||
771 | (5) [On *destination* QEMU] Then, stop the NBD server:: | |
772 | ||
773 | (QEMU) nbd-server-stop | |
774 | { | |
775 | "execute": "nbd-server-stop", | |
776 | "arguments": {} | |
777 | } | |
778 | ||
779 | (6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the | |
780 | QMP command ``cont``:: | |
781 | ||
782 | (QEMU) cont | |
783 | { | |
784 | "execute": "cont", | |
785 | "arguments": {} | |
786 | } | |
787 | ||
788 | .. note:: | |
789 | Higher-level libraries (e.g. libvirt) automate the entire above | |
790 | process (although note that libvirt does not allow same-host | |
791 | migrations to localhost for other reasons). | |
792 | ||
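The six steps above can be summarized as an ordered list of QMP commands, each tagged with the QEMU instance that receives it. The following is only a sketch: the function name ``nbd_migration_commands`` and its parameters are hypothetical, and the argument values are simply the ones used in this walkthrough. It builds the JSON payloads and does not talk to a QMP socket.

```python
import json


def nbd_migration_commands(port="49153", export="node-TargetDisk",
                           source_node="node-D", job_id="job0"):
    """Return (side, command) pairs in the order used in the walkthrough.

    Hypothetical helper: it only builds the payloads shown above.
    """
    target = "nbd:localhost:{}:exportname={}".format(port, export)
    return [
        # (1) Start the built-in NBD server on the destination.
        ("destination", {"execute": "nbd-server-start",
                         "arguments": {"addr": {"type": "inet",
                                                "data": {"host": "::",
                                                         "port": port}}}}),
        # (2) Export the pre-created destination disk, writable.
        ("destination", {"execute": "nbd-server-add",
                         "arguments": {"device": export, "writable": True}}),
        # (3) Mirror only the topmost image into the existing target.
        ("source", {"execute": "drive-mirror",
                    "arguments": {"device": source_node, "mode": "existing",
                                  "job-id": job_id, "target": target,
                                  "sync": "top"}}),
        # (4) To be sent only after BLOCK_JOB_READY has been observed.
        ("source", {"execute": "block-job-cancel",
                    "arguments": {"device": job_id}}),
        # (5) Stop the NBD server.
        ("destination", {"execute": "nbd-server-stop", "arguments": {}}),
        # (6) Resume the guest vCPUs.
        ("destination", {"execute": "cont", "arguments": {}}),
    ]


if __name__ == "__main__":
    for side, command in nbd_migration_commands():
        print(side, json.dumps(command))
```

Keeping the side next to each command makes it harder to send a step to the wrong QEMU instance when scripting the sequence by hand.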
793 | ||
794 | Notes on ``blockdev-mirror`` | |
795 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
796 | ||
797 | The ``blockdev-mirror`` command is equivalent in core functionality to | |
798 | ``drive-mirror``, except that it operates at node-level in a BDS graph. | |
799 | ||
800 | Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly | |
801 | created (using ``qemu-img``) and attached to live QEMU via | |
802 | ``blockdev-add``, which assigns a name to the newly created target node. | |
803 | ||
804 | E.g. the sequence of actions to create a point-in-time backup of an | |
805 | entire disk image chain, to a target, using ``blockdev-mirror`` would be: | |
806 | ||
807 | (0) Create the QCOW2 overlays, to arrive at a backing chain of desired | |
808 | depth | |
809 | ||
810 | (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | |
811 | ||
812 | (2) Attach the above created file (``e.qcow2``) to QEMU, at run time, | |
813 | using ``blockdev-add`` | |
814 | ||
815 | (3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the | |
816 | entire chain to the target), and notice the event | |
817 | ``BLOCK_JOB_READY`` | |
818 | ||
819 | (4) Optionally, query for active block jobs; there should be a 'mirror' | |
820 | job ready to be completed | |
821 | ||
822 | (5) Gracefully complete the 'mirror' block device job, and notice | |
823 | the event ``BLOCK_JOB_COMPLETED`` | |
824 | ||
825 | (6) Shut down the guest by issuing the QMP ``quit`` command so that | |
826 | caches are flushed | |
827 | ||
828 | (7) Then, finally, compare the contents of the disk image chain, and | |
829 | the target copy with ``qemu-img compare``. You should notice: | |
830 | "Images are identical" | |
831 | ||
832 | ||
833 | QMP invocation for ``blockdev-mirror`` | |
834 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
835 | ||
836 | Given the disk image chain:: | |
837 | ||
838 | [A] <-- [B] <-- [C] <-- [D] | |
839 | ||
840 | To copy the contents of the entire disk image chain, from [A] all the | |
841 | way to [D], to a new target (call it [E]), the flow is as follows. | |
842 | ||
843 | Create the overlay images, [B], [C], and [D]:: | |
844 | ||
845 | (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | |
846 | (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 | |
847 | (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 | |
848 | ||
849 | Create the target image, [E]:: | |
850 | ||
851 | $ qemu-img create -f qcow2 e.qcow2 39M | |
852 | ||
853 | Add the above created target image to QEMU, via ``blockdev-add``:: | |
854 | ||
855 | (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | |
856 | { | |
857 | "execute": "blockdev-add", | |
858 | "arguments": { | |
859 | "node-name": "node-E", | |
860 | "driver": "qcow2", | |
861 | "file": { | |
862 | "driver": "file", | |
863 | "filename": "e.qcow2" | |
864 | } | |
865 | } | |
866 | } | |
867 | ||
868 | Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``:: | |
869 | ||
870 | (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0 | |
871 | { | |
872 | "execute": "blockdev-mirror", | |
873 | "arguments": { | |
874 | "device": "node-D", | |
875 | "job-id": "job0", | |
876 | "target": "node-E", | |
877 | "sync": "full" | |
878 | } | |
879 | } | |
880 | ||
881 | Query for active block jobs; there should be a 'mirror' job ready:: | |
882 | ||
883 | (QEMU) query-block-jobs | |
884 | { | |
885 | "execute": "query-block-jobs", | |
886 | "arguments": {} | |
887 | } | |
888 | { | |
889 | "return": [ | |
890 | { | |
891 | "busy": false, | |
892 | "type": "mirror", | |
893 | "len": 21561344, | |
894 | "paused": false, | |
895 | "ready": true, | |
896 | "io-status": "ok", | |
897 | "offset": 21561344, | |
898 | "device": "job0", | |
899 | "speed": 0 | |
900 | } | |
901 | ] | |
902 | } | |
903 | ||
904 | Gracefully complete the block device job operation, and notice the | |
905 | event ``BLOCK_JOB_COMPLETED``:: | |
906 | ||
907 | (QEMU) block-job-complete device=job0 | |
908 | { | |
909 | "execute": "block-job-complete", | |
910 | "arguments": { | |
911 | "device": "job0" | |
912 | } | |
913 | } | |
914 | { | |
915 | "return": {} | |
916 | } | |
917 | ||
918 | Shut down the guest by issuing the ``quit`` QMP command:: | |
919 | ||
920 | (QEMU) quit | |
921 | { | |
922 | "execute": "quit", | |
923 | "arguments": {} | |
924 | } | |
925 | ||
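The "wait for ``BLOCK_JOB_READY``, then complete the job" step above can be sketched as a small filter over already-parsed QMP events. This is an illustration, not part of QEMU: the function name ``complete_when_ready`` is hypothetical, and the sketch assumes each event is a dict with an ``event`` name and a ``data`` member whose ``device`` field carries the job ID.

```python
def complete_when_ready(events, job_id):
    """Return a block-job-complete command once the job reports ready.

    `events` is any iterable of QMP events already parsed into dicts.
    Returns None if no BLOCK_JOB_READY event for `job_id` was seen.
    """
    for event in events:
        if event.get("event") != "BLOCK_JOB_READY":
            continue
        if event.get("data", {}).get("device") != job_id:
            continue
        # The mirror job has synchronized; it is now safe to complete it.
        return {"execute": "block-job-complete",
                "arguments": {"device": job_id}}
    return None


if __name__ == "__main__":
    # Illustrative event; the len/offset values mirror the
    # query-block-jobs output shown earlier.
    sample = [{"event": "BLOCK_JOB_READY",
               "data": {"device": "job0", "type": "mirror",
                        "len": 21561344, "offset": 21561344, "speed": 0}}]
    print(complete_when_ready(sample, "job0"))
```

Matching on the job ID matters when more than one block job is running, since events for all jobs arrive on the same QMP connection.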
926 | ||
927 | Live disk backup --- ``drive-backup`` and ``blockdev-backup`` | |
928 | ------------------------------------------------------------- | |
929 | ||
930 | The ``drive-backup`` command (and its newer equivalent, ``blockdev-backup``) | |
931 | allows you to create a point-in-time snapshot. | |
932 | ||
933 | In this case, the point-in-time is when you *start* the ``drive-backup`` | |
934 | (or its newer equivalent ``blockdev-backup``) command. | |
935 | ||
936 | ||
937 | QMP invocation for ``drive-backup`` | |
938 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
939 | ||
940 | Yet again, starting afresh with our example disk image chain:: | |
941 | ||
942 | [A] <-- [B] <-- [C] <-- [D] | |
943 | ||
944 | To create a target image [E], with content populated from image [A] to | |
945 | [D], from the above chain, the following is the syntax. (If the target | |
946 | image does not exist, ``drive-backup`` will create it):: | |
947 | ||
948 | (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0 | |
949 | { | |
950 | "execute": "drive-backup", | |
951 | "arguments": { | |
952 | "device": "node-D", | |
953 | "job-id": "job0", | |
954 | "sync": "full", | |
955 | "target": "e.qcow2" | |
956 | } | |
957 | } | |
958 | ||
959 | Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event | |
960 | will be issued, indicating the live block device job operation has | |
961 | completed, and no further action is required. | |
962 | ||
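When scripting this, it is worth checking how the job ended rather than only that it ended. The sketch below assumes the ``BLOCK_JOB_COMPLETED`` event carries the job ID in ``data.device`` and includes an ``error`` string in ``data`` only when the job failed; the function name ``backup_outcome`` is hypothetical.

```python
def backup_outcome(event, job_id):
    """Classify one parsed QMP event for the given backup job.

    Returns None if the event is not the completion event of `job_id`,
    otherwise a dict with `ok` (bool) and `error` (str or None).
    """
    if event.get("event") != "BLOCK_JOB_COMPLETED":
        return None
    data = event.get("data", {})
    if data.get("device") != job_id:
        return None
    # Assumption: "error" is present only when the job failed.
    error = data.get("error")
    return {"ok": error is None, "error": error}


if __name__ == "__main__":
    # Illustrative event, not captured from a real QEMU run.
    event = {"event": "BLOCK_JOB_COMPLETED",
             "data": {"device": "job0", "type": "backup"}}
    print(backup_outcome(event, "job0"))
```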
963 | ||
964 | Notes on ``blockdev-backup`` | |
965 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
966 | ||
967 | The ``blockdev-backup`` command is equivalent in functionality to | |
968 | ``drive-backup``, except that it operates at node-level in a Block Driver | |
969 | State (BDS) graph. | |
970 | ||
971 | E.g. the sequence of actions to create a point-in-time backup | |
972 | of an entire disk image chain, to a target, using ``blockdev-backup`` | |
973 | would be: | |
974 | ||
975 | (0) Create the QCOW2 overlays, to arrive at a backing chain of desired | |
976 | depth | |
977 | ||
978 | (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | |
979 | ||
980 | (2) Attach the above created file (``e.qcow2``) to QEMU, at run time, | |
981 | using ``blockdev-add`` | |
982 | ||
983 | (3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the | |
984 | entire chain to the target), and notice the event | |
985 | ``BLOCK_JOB_COMPLETED`` | |
986 | ||
987 | (4) Shut down the guest by issuing the QMP ``quit`` command, so that | |
988 | caches are flushed | |
989 | ||
990 | (5) Then, finally, compare the contents of the disk image chain, and | |
991 | the target copy with ``qemu-img compare``. You should notice: | |
992 | "Images are identical" | |
993 | ||
994 | The following section shows an example QMP invocation for | |
995 | ``blockdev-backup``. | |
996 | ||
997 | QMP invocation for ``blockdev-backup`` | |
998 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
999 | ||
1000 | Given a disk image chain of depth 1 where image [B] is the active | |
1001 | overlay (live QEMU is writing to it):: | |
1002 | ||
1003 | [A] <-- [B] | |
1004 | ||
1005 | The following is the procedure to copy the content from the entire chain | |
1006 | to a target image (say, [E]), which has the full content from [A] and | |
1007 | [B]. | |
1008 | ||
1009 | Create the overlay [B]:: | |
1010 | ||
1011 | (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | |
1012 | { | |
1013 | "execute": "blockdev-snapshot-sync", | |
1014 | "arguments": { | |
1015 | "node-name": "node-A", | |
1016 | "snapshot-file": "b.qcow2", | |
1017 | "format": "qcow2", | |
1018 | "snapshot-node-name": "node-B" | |
1019 | } | |
1020 | } | |
1021 | ||
1022 | ||
1023 | Create a target image that will contain the copy:: | |
1024 | ||
1025 | $ qemu-img create -f qcow2 e.qcow2 39M | |
1026 | ||
1027 | Then add it to QEMU via ``blockdev-add``:: | |
1028 | ||
1029 | (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | |
1030 | { | |
1031 | "execute": "blockdev-add", | |
1032 | "arguments": { | |
1033 | "node-name": "node-E", | |
1034 | "driver": "qcow2", | |
1035 | "file": { | |
1036 | "driver": "file", | |
1037 | "filename": "e.qcow2" | |
1038 | } | |
1039 | } | |
1040 | } | |
1041 | ||
1042 | Then invoke ``blockdev-backup`` to copy the contents from the entire | |
1043 | image chain, consisting of images [A] and [B] to the target image | |
1044 | 'e.qcow2':: | |
1045 | ||
1046 | (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0 | |
1047 | { | |
1048 | "execute": "blockdev-backup", | |
1049 | "arguments": { | |
1050 | "device": "node-B", | |
1051 | "job-id": "job0", | |
1052 | "target": "node-E", | |
1053 | "sync": "full" | |
1054 | } | |
1055 | } | |
1056 | ||
1057 | Once the above 'backup' operation has completed, the event | |
1058 | ``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful | |
1059 | completion. | |
1060 | ||
1061 | Next, query for any active block device jobs (there should be none):: | |
1062 | ||
1063 | (QEMU) query-block-jobs | |
1064 | { | |
1065 | "execute": "query-block-jobs", | |
1066 | "arguments": {} | |
1067 | } | |
1068 | ||
1069 | Shut down the guest:: | |
1070 | ||
1071 | (QEMU) quit | |
1072 | { | |
1073 | "execute": "quit", | |
1074 | "arguments": {} | |
1075 | } | |
1076 | {"return": {}} | |
1078 | ||
1079 | .. note:: | |
1080 | The above step is really important; if forgotten, an error, "Failed | |
1081 | to get shared "write" lock on e.qcow2", will be thrown when you do | |
1082 | ``qemu-img compare`` to verify the integrity of the disk image | |
1083 | with the backup content. | |
1084 | ||
1085 | ||
1086 | The end result will be the image 'e.qcow2' containing a | |
1087 | point-in-time backup of the disk image chain -- i.e. contents from | |
1088 | images [A] and [B] at the time the ``blockdev-backup`` command was | |
1089 | initiated. | |
1090 | ||
1091 | One way to confirm that the backup disk image has content identical to | |
1092 | that of the disk image chain is to compare the backup with the | |
1093 | chain; you should see "Images are identical". (NB: this assumes | |
1094 | QEMU was launched with the ``-S`` option, which does not start the CPUs | |
1095 | at guest boot up):: | |
1096 | ||
1097 | $ qemu-img compare b.qcow2 e.qcow2 | |
1098 | Warning: Image size mismatch! | |
1099 | Images are identical. | |
1100 | ||
1101 | NOTE: The "Warning: Image size mismatch!" is expected, as we created the | |
1102 | target image (e.qcow2) with 39M size. |
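
The comparison can also be scripted. The following is a minimal sketch, assuming ``qemu-img`` is on ``PATH`` and that QEMU has already quit, so the image locks are released. The helper names are hypothetical. The exit-status mapping (0 for identical, 1 for different, anything higher for an error) follows the ``qemu-img compare`` documentation and should be checked against your installed version.

```python
import subprocess


def compare_argv(image_a, image_b):
    """Build the command line used in the example above."""
    return ["qemu-img", "compare", image_a, image_b]


def interpret_exit_code(code):
    """Map a `qemu-img compare` exit status to a short label."""
    if code == 0:
        return "identical"
    if code == 1:
        return "different"
    # Higher values indicate an error (e.g. an image could not be opened).
    return "error"


def compare_images(image_a, image_b):
    """Run the comparison and return (label, combined output text)."""
    proc = subprocess.run(compare_argv(image_a, image_b),
                          capture_output=True, text=True)
    return interpret_exit_code(proc.returncode), proc.stdout + proc.stderr


if __name__ == "__main__":
    label, output = compare_images("b.qcow2", "e.qcow2")
    print(label)
    print(output)
```

Relying on the exit status rather than on the printed text means the "Image size mismatch" warning does not need special handling.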