Tong Ho [Tue, 12 May 2020 14:36:49 +0000 (16:36 +0200)]
target/microblaze: Add MFS Rd,EDR translation
This is to fix cpu-abort with 'qemu: fatal: unknown mfs reg d'
(in the default case) when microblaze guest issues 'MFS Rd,EDR'
instruction.
Since embeddedsw release 2019.2, XPlm_ExceptionHandler() issues
the instruction on exception, so the microblaze model aborts when
the PLM firmware guest encounters an exception.
hw/dma/xilinx_axidma: mm2s: Stream descriptor by descriptor
Stream descriptor by descriptor from memory instead of
buffering entire packets before pushing. This enables
non-packet streaming clients to work and also lifts the
limitation that our internal DMA buffer needs to be able
to hold entire packets.
Some stream clients stream an endless stream of data while
other clients stream data in packets. Stream interfaces
usually have a way to signal the end of a packet or the
last beat of a transfer.
This adds an end-of-packet flag to the push interface.
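A minimal sketch of how a push interface with an end-of-packet flag lets the DMA side stream one descriptor at a time while a packet-oriented sink still sees packet boundaries. The names (stream_sink, sink_push, push_descriptors) are hypothetical and do not reproduce QEMU's stream interface; the sink simply reassembles bytes until it sees eop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet-oriented sink: accumulates bytes until eop is seen. */
typedef struct {
    unsigned char buf[256];   /* assembly buffer for the current packet */
    size_t len;               /* bytes accumulated so far */
    size_t packets;           /* number of completed packets */
    size_t last_packet_len;   /* length of the most recently completed packet */
} stream_sink;

/* Push one chunk (e.g. one DMA descriptor); returns the bytes accepted. */
static size_t sink_push(stream_sink *s, const unsigned char *data, size_t len,
                        bool eop)
{
    size_t room = sizeof(s->buf) - s->len;
    size_t n = len < room ? len : room;

    assert(data != NULL || len == 0);
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    if (eop && n == len) {
        /* Last beat of the packet: complete it and reset for the next one. */
        s->last_packet_len = s->len;
        s->packets++;
        s->len = 0;
    }
    return n;
}

/* DMA side: stream each descriptor as it is read, flagging eop on the last. */
static void push_descriptors(stream_sink *s, const unsigned char *const *descs,
                             const size_t *lens, size_t ndescs)
{
    for (size_t i = 0; i < ndescs; i++) {
        sink_push(s, descs[i], lens[i], i + 1 == ndescs);
    }
}
```

A sink for an endless stream could ignore eop entirely, which is why the DMA engine no longer needs to buffer a whole packet before pushing.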
Peter Maydell [Thu, 14 May 2020 09:58:30 +0000 (10:58 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/9p-next-2020-05-14' into staging
Changes:
- Christian Schoenebeck is now co-maintainer for 9pfs
- relax checks for O_NOATIME
- minor documentation updates
# gpg: Signature made Thu 14 May 2020 08:14:37 BST
# gpg: using RSA key B4828BAF943140CEF2A3491071D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <[email protected]>" [full]
# gpg: aka "Gregory Kurz <[email protected]>" [full]
# gpg: aka "[jpeg image of size 3330]" [full]
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/9p-next-2020-05-14:
xen-9pfs: Fix log messages of reply errors
9pfs: local: ignore O_NOATIME if we don't have permissions
qemu-options.hx: 9p: clarify -virtfs vs. -fsdev
MAINTAINERS: Upgrade myself as 9pfs co-maintainer
If delivery of some 9pfs response fails for some reason, log the
error message by mentioning the 9P protocol reply type, not the
client's request type. The latter could misleadingly suggest that the
error had already occurred while handling the request input.
Omar Sandoval [Thu, 14 May 2020 06:06:43 +0000 (08:06 +0200)]
9pfs: local: ignore O_NOATIME if we don't have permissions
QEMU's local 9pfs server passes through O_NOATIME from the client. If
the QEMU process doesn't have permission to use O_NOATIME (namely, it
neither owns the file nor has the CAP_FOWNER capability), the open will
fail. This causes issues when the client believes it has permission to
use O_NOATIME (e.g., a process running as root in the virtual machine).
Additionally, overlayfs on Linux opens
files on the lower layer using O_NOATIME, so in this case a 9pfs mount
can't be used as a lower layer for overlayfs (cf.
https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c
and https://github.com/NixOS/nixpkgs/issues/54509).
Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
network filesystems. open(2) notes that O_NOATIME "may not be effective
on all filesystems. One example is NFS, where the server maintains the
access time." This means that we can honor it when possible but fall
back to ignoring it.
The docs are ambiguous about the difference (or actually their
equality) between options '-virtfs' vs. '-fsdev'. So clarify that
'-virtfs' is actually just a convenience shortcut for its
generalized form '-fsdev' in conjunction with '-device virtio-9p-pci'.
While at it, also describe in a bit more detail what 9pfs is
actually used for.
Denis Plotnikov [Thu, 7 May 2020 08:25:20 +0000 (11:25 +0300)]
qcow2: add zstd cluster compression
zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which is currently the
only available compression method.
The performance test results:
The test compresses and decompresses a qemu qcow2 image with a freshly
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G
The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.
Denis Plotnikov [Thu, 7 May 2020 08:25:18 +0000 (11:25 +0300)]
qcow2: introduce compression type feature
The patch adds some preparation parts for the incompatible compression type
feature to qcow2, allowing the use of different compression methods for
compressing and decompressing image clusters.
It is implied that the compression type is set at image creation and
can only be changed later by image conversion; thus the compression type
defines the single compression algorithm used for the image, and hence
for all of its clusters.
The goal of the feature is to add support for other compression methods
to qcow2, for example ZSTD, which compresses more effectively than ZLIB.
The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.
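A sketch of why ZLIB images stay backward compatible, based on the qcow2 specification's layout (compression type byte at header offset 104, incompatible feature bit 3 for the compression type). The enum and function names are illustrative, not QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* Compression type values as stored in the qcow2 header (byte 104). */
enum qcow2_compression_type {
    QCOW2_COMPRESSION_ZLIB = 0,
    QCOW2_COMPRESSION_ZSTD = 1,
};

/* Incompatible feature bit 3 marks a non-default compression type. */
#define INCOMPAT_COMPRESSION_BIT (UINT64_C(1) << 3)

/*
 * Only non-default types set the incompatible bit, so ZLIB images remain
 * openable by QEMU versions that predate the compression_type field.
 */
static uint64_t incompat_features_for(enum qcow2_compression_type type,
                                      uint64_t incompat)
{
    assert(type == QCOW2_COMPRESSION_ZLIB || type == QCOW2_COMPRESSION_ZSTD);
    if (type != QCOW2_COMPRESSION_ZLIB) {
        incompat |= INCOMPAT_COMPRESSION_BIT;
    }
    return incompat;
}
```

An older QEMU refuses images with unknown incompatible bits set, which is exactly the desired behaviour for a ZSTD image it cannot decompress.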
Adding the compression type breaks a number of tests because the
compression type is now reported on image creation and the qcow2 header
has changed in size and offsets.
The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size += 8: 1 byte compression type
                    7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
  242, 255, 274, 280
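The offset changes listed above can be checked with a little arithmetic. A sketch, assuming the qcow2 specification's layout where each feature name table entry is 48 bytes (1-byte type, 1-byte bit number, 46-byte name); the struct names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* New header tail: 1-byte compression type padded to an 8-byte boundary. */
struct header_tail {
    uint8_t compression_type;
    uint8_t padding[7];
};

/* One feature name table entry: type, bit number and a 46-byte name. */
struct feature_name_entry {
    uint8_t type;
    uint8_t bit;
    char name[46];
};

static_assert(sizeof(struct feature_name_entry) == 48, "1 + 1 + 46 bytes");

enum {
    HEADER_SIZE_DELTA = sizeof(struct header_tail),           /* 8 */
    FEATURE_TABLE_DELTA = sizeof(struct feature_name_entry),  /* 48 */
    /* The backing file name follows the header and its extensions. */
    BACKING_OFFSET_DELTA = HEADER_SIZE_DELTA + FEATURE_TABLE_DELTA,  /* 56 */
};
```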
Peter Maydell [Tue, 12 May 2020 16:00:10 +0000 (17:00 +0100)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2020-05-08-1' into staging
Merge tpm 2020/05/08 v3
# gpg: Signature made Tue 12 May 2020 16:50:34 BST
# gpg: using RSA key B818B9CADF9089C2D5CEC66B75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2020-05-08-1:
hw/tpm: fix usage of bool in tpm-tis.c
Jafar Abdi [Sat, 23 Mar 2019 14:26:37 +0000 (17:26 +0300)]
hw/tpm: fix usage of bool in tpm-tis.c
Clean up wrong usage of FALSE and TRUE in places that use "bool" from stdbool.h.
FALSE and TRUE (in capital letters) are the constants defined by glib for
use with glib's "gboolean" type. But some parts of the code also use
TRUE and FALSE for variables that are declared as "bool" (the type from <stdbool.h>).
Peter Maydell [Mon, 11 May 2020 13:34:27 +0000 (14:34 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200511' into staging
target-arm queue:
aspeed: Add boot stub for smp booting
target/arm: Drop access_el3_aa32ns_aa64any()
aspeed: Support AST2600A1 silicon revision
aspeed: sdmc: Implement AST2600 locking behaviour
nrf51: Tracing cleanups
target/arm: Improve handling of SVE loads and stores
target/arm: Don't show TCG-only CPUs in KVM-only QEMU builds
hw/arm/musicpal: Map the UART devices unconditionally
target/arm: Fix tcg_gen_gvec_dup_imm vs DUP (indexed)
target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA
* remotes/pmaydell/tags/pull-target-arm-20200511: (34 commits)
target/arm: Fix tcg_gen_gvec_dup_imm vs DUP (indexed)
target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA
hw/arm/musicpal: Map the UART devices unconditionally
target/arm: Restrict TCG cpus to TCG accel
target/arm/cpu: Restrict v8M IDAU interface to Aarch32 CPUs
target/arm/cpu: Use ARRAY_SIZE() to iterate over ARMCPUInfo[]
target/arm: Make set_feature() available for other files
target/arm/kvm: Inline set_feature() calls
target/arm: Remove sve_memopidx
target/arm: Reuse sve_probe_page for gather loads
target/arm: Reuse sve_probe_page for scatter stores
target/arm: Reuse sve_probe_page for gather first-fault loads
target/arm: Use SVEContLdSt for contiguous stores
target/arm: Update contiguous first-fault and no-fault loads
target/arm: Use SVEContLdSt for multi-register contiguous loads
target/arm: Handle watchpoints in sve_ld1_r
target/arm: Use SVEContLdSt in sve_ld1_r
target/arm: Adjust interface of sve_ld1_host_fn
target/arm: Add sve infrastructure for page lookup
target/arm: Drop manual handling of set/clear_helper_retaddr
...
hw/arm/musicpal: Map the UART devices unconditionally
I can't find proper documentation or a datasheet, but it is likely
that an MMIO-mapped serial device in the 0x80000000..0x8000ffff
range belongs to the SoC address space, and thus is always mapped on
the memory bus.
Map the devices on the bus regardless of whether a chardev is attached to them.
Follow the model set up for contiguous loads. This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.
target/arm: Update contiguous first-fault and no-fault loads
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines. With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.
Handle all of the watchpoints for active elements all at once,
before we've modified the vector register. This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.
This is the first use of the new helper functions, so we can remove the
unused markup. We no longer need a scratch buffer for user-only mode, as
we probe the complete page set before reading; system mode still
requires a scratch buffer for MMIO.
The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.
Replace each call with the simplest possible loop over active
elements.
target/arm: Add sve infrastructure for page lookup
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.
Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, look up the pages, and run watchpoints
for those pages.
Temporarily mark the functions unused to avoid Werror.
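A simplified sketch of the bounds computation described above: find the first and last active elements and the point where they cross into a second page, so that at most two page lookups are needed. The struct and function are hypothetical (not QEMU's SVEContLdSt) and ignore watchpoints and MTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a predicated contiguous access. */
typedef struct {
    int first;   /* index of first active element, -1 if none */
    int last;    /* index of last active element */
    int split;   /* first element not entirely on the first page, -1 if none */
} active_bounds;

/*
 * Elements are esize bytes starting at addr. Assumes page_size is a power
 * of two and the active range spans at most two pages.
 */
static active_bounds find_active(const bool *pred, int nelem, uint64_t addr,
                                 int esize, uint64_t page_size)
{
    active_bounds b = { -1, -1, -1 };
    uint64_t page_end;

    assert((page_size & (page_size - 1)) == 0);
    for (int i = 0; i < nelem; i++) {
        if (pred[i]) {
            if (b.first < 0) {
                b.first = i;
            }
            b.last = i;
        }
    }
    if (b.first < 0) {
        return b;   /* no active elements: no page lookups at all */
    }
    /* One past the end of the page holding the first active element. */
    page_end = ((addr + (uint64_t)b.first * esize) | (page_size - 1)) + 1;
    for (int i = b.first; i <= b.last; i++) {
        if (addr + (uint64_t)i * esize + esize > page_end) {
            b.split = i;   /* this element ends beyond the first page */
            break;
        }
    }
    return b;
}
```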
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.
Since fb901c905dc3, cpu_mem_index is now a simple extract
from env->hflags and not a large computation, which means
that it is now more work to pass this value around than it
is to recompute it.
This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
We currently have target-endian versions of these operations,
but no easy way to force a specific endianness. This can be
helpful if the target has endian-specific operations, or a mode
that swaps endianness.
This new interface will allow targets to probe for a page
and then handle watchpoints themselves. This will be most
useful for vector predicated memory operations, where one
page lookup can be used for many operations, and one test
can avoid many watchpoint checks.
accel/tcg: Adjust probe_access call to page_check_range
We have validated that addr+size does not cross a page boundary.
Therefore we need to validate exactly one page. We can achieve
that by passing any value 1 <= x <= size to page_check_range.
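The argument is simple arithmetic: if [addr, addr + size) does not cross a page boundary, then every prefix [addr, addr + x) with 1 <= x <= size lies in that same single page. A sketch assuming 4 KiB pages; the helpers are illustrative and are not page_check_range() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_BITS 12   /* assumed 4 KiB pages for the example */
#define EX_PAGE_SIZE (UINT64_C(1) << EX_PAGE_BITS)
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Number of pages touched by [addr, addr + len), for len >= 1. */
static uint64_t pages_touched(uint64_t addr, uint64_t len)
{
    assert(len >= 1);
    return ((addr + len - 1) >> EX_PAGE_BITS) - (addr >> EX_PAGE_BITS) + 1;
}

/* True if [addr, addr + size) stays within a single page. */
static bool within_one_page(uint64_t addr, uint64_t size)
{
    assert(size >= 1);
    return (addr & EX_PAGE_MASK) == ((addr + size - 1) & EX_PAGE_MASK);
}
```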
hw/timer/nrf51_timer: Display timer ID in trace events
The NRF51 series SoC has 3 timer peripherals, each having
4 counters. To help differentiate which peripheral is accessed,
display the timer ID in the trace events.
Joel Stanley [Thu, 9 Apr 2020 06:31:37 +0000 (16:01 +0930)]
aspeed: Add boot stub for smp booting
This is a boot stub that is similar to the code u-boot runs, allowing
the kernel to boot the secondary CPU.
u-boot works as follows:
1. Initialises the SMP mailbox area in the SCU at 0x1e6e2180 with default values
2. Copies a stub named 'mailbox_insn' from flash to the SCU, just above the
mailbox area
3. Sets AST_SMP_MBOX_FIELD_READY to a magic value to indicate the
secondary can begin execution from the stub
4. The stub waits until the AST_SMP_MBOX_FIELD_GOSIGN register is set to
a magic value
5. Jumps to the address in AST_SMP_MBOX_FIELD_ENTRY, starting Linux
Linux indicates it is ready by writing the address of its entrypoint
function to AST_SMP_MBOX_FIELD_ENTRY and the 'go' magic number to
AST_SMP_MBOX_FIELD_GOSIGN. The secondary CPU sees this at step 4 and
breaks out of its loop.
To be compatible, a fixed qemu stub is loaded into the mailbox area. As
qemu can ensure the stub is loaded before execution starts, we do not
need to emulate the AST_SMP_MBOX_FIELD_READY behaviour of u-boot. The
secondary CPU's program counter points to the beginning of the stub,
allowing qemu to start secondaries at step four.
Reboot behaviour is preserved by resetting AST_SMP_MBOX_FIELD_GOSIGN
when the secondaries are reset.
This is only configured when the system is booted with -kernel and qemu
does not execute u-boot first.
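A single-threaded model of the mailbox handshake described above, from step 4 onwards, including the reset behaviour. The magic value, struct layout and function names are hypothetical; only the AST_SMP_MBOX_FIELD_* roles come from the description. Real code on two CPUs would also need memory barriers, which this model omits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 'go' magic value; not the constant u-boot actually uses. */
#define SKETCH_GOSIGN_MAGIC 0xabbaab00u

/* Model of the SMP mailbox fields in the SCU (layout is illustrative). */
typedef struct {
    uint32_t ready;    /* AST_SMP_MBOX_FIELD_READY: not needed by the qemu stub */
    uint32_t entry;    /* AST_SMP_MBOX_FIELD_ENTRY: Linux entry point */
    uint32_t gosign;   /* AST_SMP_MBOX_FIELD_GOSIGN: 'go' magic from Linux */
} smp_mailbox;

/* Linux side: publish the entry point first, then the go signal. */
static void primary_release(smp_mailbox *mb, uint32_t entry)
{
    assert(entry != 0);   /* 0 is used below to mean "keep waiting" */
    mb->entry = entry;
    mb->gosign = SKETCH_GOSIGN_MAGIC;
}

/* Secondary side, one iteration of the step 4 wait loop. */
static uint32_t secondary_poll(const smp_mailbox *mb)
{
    return mb->gosign == SKETCH_GOSIGN_MAGIC ? mb->entry : 0;
}

/* On reset, clearing GOSIGN sends the secondaries back into the wait loop. */
static void secondary_reset(smp_mailbox *mb)
{
    mb->gosign = 0;
}
```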
Peter Maydell [Fri, 8 May 2020 13:29:18 +0000 (14:29 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qcow2: Fix preallocation on block devices
- backup: Make sure that source and target size match
- vmdk: Fix zero cluster handling
- Follow-up cleanups and fixes for the truncate changes
- iotests: Skip more tests if required drivers are missing
* remotes/kevin/tags/for-upstream: (30 commits)
block: Drop unused .bdrv_has_zero_init_truncate
vhdx: Rework truncation logic
parallels: Rework truncation logic
ssh: Support BDRV_REQ_ZERO_WRITE for truncate
sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
rbd: Support BDRV_REQ_ZERO_WRITE for truncate
nfs: Support BDRV_REQ_ZERO_WRITE for truncate
file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
gluster: Drop useless has_zero_init callback
qcow2: Fix preallocation on block devices
iotests/055: Use cache.no-flush for vmdk target
iotests: Backup with different source/target size
backup: Make sure that source and target size match
backup: Improve error for bdrv_getlength() failure
iotests/283: Use consistent size for source and target
iotests: vmdk: Enable zeroed_grained=on by default
vmdk: Flush only once in vmdk_L2update()
vmdk: Don't update L2 table for zero write on zero cluster
vmdk: Fix partial overwrite of zero cluster
vmdk: Fix zero cluster allocation
...
Eric Blake [Tue, 28 Apr 2020 20:29:05 +0000 (15:29 -0500)]
block: Drop unused .bdrv_has_zero_init_truncate
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:04 +0000 (15:29 -0500)]
vhdx: Rework truncation logic
The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written. But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:03 +0000 (15:29 -0500)]
parallels: Rework truncation logic
The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero. Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:02 +0000 (15:29 -0500)]
ssh: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.
Eric Blake [Tue, 28 Apr 2020 20:29:01 +0000 (15:29 -0500)]
sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.
Eric Blake [Tue, 28 Apr 2020 20:29:00 +0000 (15:29 -0500)]
rbd: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.
Eric Blake [Tue, 28 Apr 2020 20:28:59 +0000 (15:28 -0500)]
nfs: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.
Eric Blake [Tue, 28 Apr 2020 20:28:58 +0000 (15:28 -0500)]
file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).
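The common pattern in these driver patches can be sketched as follows, assuming a driver that knows whether newly grown space reads back as zero. The flag value, struct and function are hypothetical stand-ins, not the real block-layer truncate callback or the real BDRV_REQ_ZERO_WRITE constant.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag value standing in for BDRV_REQ_ZERO_WRITE. */
#define SKETCH_REQ_ZERO_WRITE 0x1

/* Hypothetical driver state. */
typedef struct {
    bool zero_fills_on_grow;   /* backend always zero-fills new space */
    long long size;            /* current image size in bytes */
} sketch_driver;

/*
 * Truncate handler: a zero-write request is satisfied for free when the
 * backend already zero-fills grown space; otherwise it is refused.
 */
static int sketch_truncate(sketch_driver *d, long long offset, int flags)
{
    assert(offset >= 0);
    if ((flags & SKETCH_REQ_ZERO_WRITE) && offset > d->size &&
        !d->zero_fills_on_grow) {
        return -ENOTSUP;
    }
    /* No extra work for the zero-write flag: new space already reads as zero. */
    d->size = offset;
    return 0;
}
```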
Max Reitz [Tue, 5 May 2020 14:18:01 +0000 (16:18 +0200)]
qcow2: Fix preallocation on block devices
Calling bdrv_getlength() to get the pre-truncate file size will not
really work on block devices, because they always have the same length,
and trying to write beyond it will fail with a rather cryptic error
message.
Instead, we should use qcow2_get_last_cluster(), and use bdrv_getlength()
only as a fallback.
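A sketch of the intended computation, assuming the last used host cluster is reported as an index (negative on error). The helper is hypothetical; its two inputs stand in for the results of qcow2_get_last_cluster() and bdrv_getlength(), and it shows why a block device's fixed length is no longer taken as the end of the image data.

```c
#include <assert.h>
#include <stdint.h>

/*
 * last_cluster: index of the last used host cluster, negative on error.
 * file_length:  length reported for the underlying file or device.
 */
static int64_t pre_truncate_size(int64_t last_cluster, int64_t cluster_size,
                                 int64_t file_length)
{
    assert(cluster_size > 0);
    if (last_cluster >= 0) {
        /* Everything after the last used cluster is free for preallocation. */
        return (last_cluster + 1) * cluster_size;
    }
    /* Fall back to the file length only if the metadata scan failed. */
    return file_length;
}
```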
Before this patch:
$ truncate -s 1G test.img
$ sudo losetup -f --show test.img
/dev/loop0
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount
structures: No space left on device
With this patch:
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize
underlying file: Preallocation mode 'full' unsupported for this
non-regular file
So as you can see, it still fails, but now the problem is missing
support on the block device level, so we at least get a better error
message.
Note that we cannot preallocate block devices on truncate by design,
because we do not know what area to preallocate. Their length is always
the same, the truncate operation does not change it.
Kevin Wolf [Tue, 5 May 2020 06:46:18 +0000 (08:46 +0200)]
iotests/055: Use cache.no-flush for vmdk target
055 uses the backup block job to create a compressed backup of an
$IMGFMT image with both qcow2 and vmdk targets. However, cluster
allocation in vmdk is very slow because it flushes the image file after
each L2 update.
There is no reason why we need this level of safety in this test, so
let's disable flushes for vmdk. For the blockdev-backup tests this is
achieved by simply adding cache.no-flush=on to the drive_add() for
the target. For drive-backup, the caching flags are copied from the
source node, so we'll also add the flag to the source node, even though
it is not vmdk.
This can make the test run significantly faster (though it doesn't make
a difference on tmpfs). In my usual setup it goes from ~45s to ~15s.
Kevin Wolf [Thu, 30 Apr 2020 14:27:55 +0000 (16:27 +0200)]
iotests: Backup with different source/target size
This tests that the backup job catches situations where the target node
has a different size than the source node. It must also forbid resize
operations when the job is already running.
Kevin Wolf [Thu, 30 Apr 2020 14:27:54 +0000 (16:27 +0200)]
backup: Make sure that source and target size match
Since the introduction of a backup filter node in commit 00e30f05d, the
backup block job crashes when the target image is smaller than the
source image because it will try to write past the end of the target
node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
would have caught this and errored out gracefully.)
We can fix this and even do better than the old behaviour: Check that
source and target have the same image size at the start of the block job
and unshare BLK_PERM_RESIZE. (This permission was already unshared
before the same commit 00e30f05d, but the BlockBackend that was used to
make the restriction was removed without a replacement.) This will
immediately error out when starting the job instead of only when writing
to a block that doesn't exist in the target.
A target longer than the source would technically work because we would
never write to blocks that don't exist, but semantically this is invalid,
too, because a backup is supposed to create a copy, not just an image
that starts with a copy.
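The check itself is simple; a sketch with a hypothetical helper that rejects both a shorter and a longer target, matching the reasoning above. The error wording is illustrative, and the BLK_PERM_RESIZE part, which keeps the sizes from changing after the job has started, is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Refuse to start unless source and target have exactly the same length.
 * Returns 0 on success, -1 with a message in errbuf otherwise.
 */
static int backup_check_sizes(int64_t source_len, int64_t target_len,
                              char *errbuf, size_t errlen)
{
    assert(source_len >= 0 && target_len >= 0);
    if (source_len != target_len) {
        snprintf(errbuf, errlen,
                 "Source and target image have different sizes "
                 "(%lld vs %lld bytes)",
                 (long long)source_len, (long long)target_len);
        return -1;
    }
    return 0;
}
```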
Kevin Wolf [Thu, 30 Apr 2020 14:27:53 +0000 (16:27 +0200)]
backup: Improve error for bdrv_getlength() failure
bdrv_get_device_name() will be an empty string with modern management
tools that don't use -drive. Use bdrv_get_device_or_node_name() instead
so that the node name is used if the BlockBackend is anonymous.
While at it, start with upper case to make the message consistent with
the rest of the function.
Kevin Wolf [Thu, 30 Apr 2020 14:27:52 +0000 (16:27 +0200)]
iotests/283: Use consistent size for source and target
The test case forgot to specify the null-co size for the target node.
When adding a check to backup that both sizes match, this would fail
because of the size mismatch and not the behaviour that the test really
wanted to test.
Kevin Wolf [Thu, 30 Apr 2020 13:30:07 +0000 (15:30 +0200)]
iotests: vmdk: Enable zeroed_grained=on by default
In order to avoid bitrot in the zero cluster code in VMDK, enable
zeroed_grain=on by default for the tests.
059 now unsets the default options because zeroed_grain=on works only
with some subformats and the test case tests many different subformats,
including those for which it doesn't work.
Kevin Wolf [Thu, 30 Apr 2020 13:30:06 +0000 (15:30 +0200)]
vmdk: Flush only once in vmdk_L2update()
If we have a backup L2 table, we currently flush once after writing to
the active L2 table and again after writing to the backup table. A
single flush is enough and makes things a little less slow.
Kevin Wolf [Thu, 30 Apr 2020 13:30:03 +0000 (15:30 +0200)]
vmdk: Fix zero cluster allocation
m_data must contain valid data even for zero clusters when no cluster
was allocated in the image file. Without this, zero writes segfault with
images that have zeroed_grain=on.
For zero writes, we don't want to allocate a cluster in the image file
even in compressed files.
Kevin Wolf [Thu, 30 Apr 2020 13:30:02 +0000 (15:30 +0200)]
vmdk: Rename VmdkMetaData.valid to new_allocation
m_data is used for zero clusters even though valid == 0. It really only
means that a new cluster was allocated in the image file. Rename it to
reflect this.
The code, however, does not correctly detect situations where the old and
the new end of the image are within the same cluster. The problem can
be reproduced with these steps:
Peter Maydell [Thu, 7 May 2020 17:43:20 +0000 (18:43 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200507a' into staging
Migration pull 2020-05-07
Mostly tidy-ups, but two new features:
cpu-throttle-tailslow for making a gentler throttle
xbzrle encoding rate measurement for getting a feel for xbzrle
performance.
# gpg: Signature made Thu 07 May 2020 18:00:27 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20200507a:
migration/multifd: Do error_free after migrate_set_error to avoid memleaks
migration/multifd: fix memleaks in multifd_new_send_channel_async
migration/xbzrle: add encoding rate
migration/rdma: fix a memleak on error path in rdma_start_incoming_migration
migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
migration/throttle: Add cpu-throttle-tailslow migration parameter
migration/colo: Add missing error-propagation code
docs/devel/migration: start a debugging section
migration: move the units of migrate parameters from milliseconds to ms
monitor/hmp-cmds: add hmp_handle_error() for hmp_migrate_set_speed()
migration/migration: improve error reporting for migrate parameters
migration: fix bad indentation in error_report()
Pan Nengyuan [Wed, 6 May 2020 09:54:16 +0000 (05:54 -0400)]
migration/multifd: Do error_free after migrate_set_error to avoid memleaks
When an error happens in multifd_send_thread, it uses error_copy() to set the
migration error in multifd_send_terminate_threads(). We should call error_free() after it.
Similarly, fix two other places in multifd_recv_thread/multifd_save_cleanup.
The leak stack:
Direct leak of 48 byte(s) in 1 object(s) allocated from:
#0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
#1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
#2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
#3 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
#4 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
#5 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
#6 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
#7 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
#8 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
#9 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
#10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
#11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)
Indirect leak of 52 byte(s) in 1 object(s) allocated from:
#0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28)
#1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9)
#2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
#3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
#4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
#5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
#6 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
#7 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
#8 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
#9 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
#10 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
#11 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
#12 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
#13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
#14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)
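The ownership rule behind this fix, sketched with a hypothetical error type rather than QEMU's Error API: the setter stores its own copy (like the error_copy() call described above), so the caller still owns the original and must free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an error object (not QEMU's Error API). */
typedef struct {
    char *msg;
} sketch_error;

static sketch_error *sketch_error_new(const char *msg)
{
    size_t n = strlen(msg) + 1;
    sketch_error *e = malloc(sizeof(*e));

    assert(e != NULL);
    e->msg = malloc(n);
    assert(e->msg != NULL);
    memcpy(e->msg, msg, n);
    return e;
}

static void sketch_error_free(sketch_error *e)
{
    if (e) {
        free(e->msg);
        free(e);
    }
}

/* Global "first error wins" slot, filled with a private copy. */
static sketch_error *migration_error;

static void sketch_set_error(const sketch_error *err)
{
    if (!migration_error) {
        migration_error = sketch_error_new(err->msg);   /* stores a copy */
    }
}

/* Thread error path: because the setter copies, err must still be freed. */
static void thread_fail(const char *msg)
{
    sketch_error *err = sketch_error_new(msg);

    sketch_set_error(err);
    sketch_error_free(err);   /* without this, err leaks as in the report above */
}
```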
Pan Nengyuan [Wed, 6 May 2020 09:54:15 +0000 (05:54 -0400)]
migration/multifd: fix memleaks in multifd_new_send_channel_async
When an error happens in multifd_new_send_channel_async, 'sioc' will not be used
to create the multifd_send_thread. Let's free it to avoid a memleak. Also call
error_free() after migrate_set_error() to avoid another leak in the same place.
The leak stack:
Direct leak of 2880 byte(s) in 8 object(s) allocated from:
#0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
#1 0x7f20b44df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
#2 0x564133bce18b in object_new_with_type /mnt/sdb/backup/qemu/qom/object.c:683
#3 0x564133eea950 in qio_channel_socket_new /mnt/sdb/backup/qemu/io/channel-socket.c:56
#4 0x5641339cfe4f in socket_send_channel_create /mnt/sdb/backup/qemu/migration/socket.c:37
#5 0x564133a10328 in multifd_save_setup /mnt/sdb/backup/qemu/migration/multifd.c:772
#6 0x5641339cebed in migrate_fd_connect /mnt/sdb/backup/qemu/migration/migration.c:3530
#7 0x5641339d15e4 in migration_channel_connect /mnt/sdb/backup/qemu/migration/channel.c:92
#8 0x5641339cf5b7 in socket_outgoing_migration /mnt/sdb/backup/qemu/migration/socket.c:108
Direct leak of 384 byte(s) in 8 object(s) allocated from:
#0 0x7f20b5118cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
#1 0x7f20b44df22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
#2 0x56413406fc17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
#3 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
#4 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
#5 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
#6 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
#7 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145
#8 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168
Indirect leak of 360 byte(s) in 8 object(s) allocated from:
#0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
#1 0x7f20af901817 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d817)
#2 0x7f20b451fa6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
#3 0x7f20b44f8cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
#4 0x7f20b44f8d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
#5 0x56413406fc86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
#6 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
#7 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
#8 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
#9 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
#10 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145
#11 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168
Wei Wang [Thu, 30 Apr 2020 00:59:35 +0000 (08:59 +0800)]
migration/xbzrle: add encoding rate
Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.
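A sketch of how such a rate could be computed and acted upon, assuming it is defined as the unencoded size of the pages divided by the encoded size over a measurement period. The helper names and the threshold policy are hypothetical, showing how a management tool might use the reported value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unencoded bytes (pages * page_size) divided by bytes sent after xbzrle
 * encoding. Higher is better; a rate close to 1 means encoding buys little.
 */
static double xbzrle_encoding_rate(uint64_t pages, uint64_t page_size,
                                   uint64_t encoded_bytes)
{
    assert(page_size > 0);
    if (encoded_bytes == 0) {
        return 0.0;   /* nothing encoded in this period */
    }
    return (double)(pages * page_size) / (double)encoded_bytes;
}

/* Hypothetical policy: keep xbzrle only if the rate reaches a chosen minimum. */
static int keep_xbzrle(double rate, double min_rate)
{
    return rate >= min_rate;
}
```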
Pan Nengyuan [Mon, 20 Apr 2020 10:27:27 +0000 (06:27 -0400)]
migration/rdma: fix a memleak on error path in rdma_start_incoming_migration
'rdma->host' is allocated in qemu_rdma_data_init(), but it is not freed on the
error path in rdma_start_incoming_migration(); this patch fixes that.
The leak stack:
Direct leak of 2 byte(s) in 1 object(s) allocated from:
#0 0x7fb7add18ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
#1 0x7fb7ad0df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
#2 0x7fb7ad0f8b32 in g_strdup (/lib64/libglib-2.0.so.0+0x6cb32)
#3 0x55a0464a0f6f in qemu_rdma_data_init /mnt/sdb/qemu/migration/rdma.c:2647
#4 0x55a0464b0e76 in rdma_start_incoming_migration /mnt/sdb/qemu/migration/rdma.c:4020
#5 0x55a0463f898a in qemu_start_incoming_migration /mnt/sdb/qemu/migration/migration.c:365
#6 0x55a0458c75d3 in qemu_init /mnt/sdb/qemu/softmmu/vl.c:4438
#7 0x55a046a3d811 in main /mnt/sdb/qemu/softmmu/main.c:48
#8 0x7fb7a8417872 in __libc_start_main (/lib64/libc.so.6+0x23872)
#9 0x55a04536b26d in _start (/mnt/sdb/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x286926d)
At the tail stage of throttling, the guest is very sensitive to the
CPU percentage, while @cpu-throttle-increment is usually excessive
at that stage.
If this parameter is true, we will compute the ideal CPU percentage
for the guest, which would make the dirty rate exactly match the
dirty rate threshold. Then we will choose the smaller throttle increment
of the one specified by @cpu-throttle-increment and the one
derived from the ideal CPU percentage.
Therefore, it is compatible with traditional throttling, while
the throttle increment won't be excessive at the tail stage. This may
make migration take longer, so it is disabled by default.
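A sketch of the tailslow computation as described above, assuming the guest's dirty rate scales linearly with its CPU share. The function is illustrative; its parameters mirror @cpu-throttle-increment and @max-cpu-throttle by name only.

```c
#include <assert.h>
#include <stdint.h>

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Next throttle percentage. throttle_now is the share of CPU currently taken
 * from the guest, increment is @cpu-throttle-increment, max is
 * @max-cpu-throttle. With tailslow, the step is capped at the drop in guest
 * CPU share that should bring the dirty rate down to the threshold.
 */
static int next_throttle(int throttle_now, int increment, int max,
                         uint64_t dirty_threshold, uint64_t dirty_period,
                         int tailslow)
{
    int inc = increment;

    assert(dirty_period > 0);
    if (tailslow) {
        int cpu_now = 100 - throttle_now;
        int cpu_ideal = (int)(cpu_now * ((double)dirty_threshold / dirty_period));

        inc = min_int(cpu_now - cpu_ideal, increment);
    }
    return min_int(throttle_now + inc, max);
}
```

With a 20% throttle, an increment of 30 and a dirty rate 4/3 above the threshold, traditional throttling jumps to 50%, while tailslow stops at 40%, where the guest's 60% CPU share is expected to match the threshold.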