Peter Maydell [Fri, 7 Apr 2017 14:23:48 +0000 (15:23 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer fixes for 2.9.0-rc4
# gpg: Signature made Fri 07 Apr 2017 13:44:17 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
mirror: Fix aio context of mirror_top_bs
block: Assert attached child node has right aio context
block: Fix unpaired aio_disable_external in external snapshot
block: Don't check permissions for copy on read
qemu-img: img_create does not support image-opts, fix docs
iotests: Add mirror tests for orphaned source
block/mirror: Fix use-after-free
commit: Set commit_top_bs->total_sectors
commit: Set commit_top_bs->aio_context
block: Ignore guest dev permissions during incoming migration
block: Fix unpaired aio_disable_external in external snapshot
bdrv_replace_child_noperm tries to hand over the quiesce_counter state
from old bs to the new one, but if they are not on the same aio context
this causes unbalance.
Fix this by setting the correct aio context before calling
bdrv_append().
Kevin Wolf [Fri, 7 Apr 2017 10:29:05 +0000 (12:29 +0200)]
block: Don't check permissions for copy on read
The assertion is currently failing. We can't require callers to have
write permissions when all they are doing is a read, so comment it out.
Add a FIXME comment in the code so that the check is re-enabled when
copy on read is refactored into its own filter driver.
Jeff Cody [Thu, 6 Apr 2017 17:45:42 +0000 (13:45 -0400)]
qemu-img: img_create does not support image-opts, fix docs
The documentation and help for qemu-img claims that 'qemu-img create'
will take the '--image-opts' argument. This is not true, so this
patch removes those claims.
Max Reitz [Mon, 3 Apr 2017 17:51:49 +0000 (19:51 +0200)]
block/mirror: Fix use-after-free
If @bs does not have any parents, the only reference to @mirror_top_bs
will be held by the BlockJob object after the bdrv_unref() following
block_job_create(). However, if block_job_create() fails, this reference
will not exist and @mirror_top_bs will have been deleted when we
goto fail.
The issue comes back at all later entries to the fail label: We delete
the BlockJob object before rolling back our changes to the node graph.
This means that we will delete @mirror_top_bs in the process.
All in all, whenever @bs does not have any parents and we go down the
fail path we will dereference @mirror_top_bs after it has been deleted.
Fix this by invoking bdrv_unref() only when block_job_create() was
successful and by bdrv_ref()'ing @mirror_top_bs in the fail path before
deleting the BlockJob object. Finally, bdrv_unref() it at the end of the
fail path after we actually no longer need it.
Kevin Wolf [Thu, 6 Apr 2017 17:07:14 +0000 (19:07 +0200)]
commit: Set commit_top_bs->total_sectors
Like in the mirror filter driver, we also need to set the image size for
the commit filter driver. This is less likely to be a problem in
practice than for the mirror because we're not at the active layer here,
but attaching new parents to a node in the middle of the chain is
possible, so the size needs to be correct anyway.
Kevin Wolf [Tue, 4 Apr 2017 15:29:03 +0000 (17:29 +0200)]
block: Ignore guest dev permissions during incoming migration
Usually guest devices don't like other writers to the same image, so
they use blk_set_perm() to prevent this from happening. In the migration
phase before the VM is actually running, though, they don't have a
problem with writes to the image. On the other hand, storage migration
needs to be able to write to the image in this phase, so the restrictive
blk_set_perm() call of qdev devices breaks it.
This patch flags all BlockBackends with a qdev device as
blk->disable_perm during incoming migration, which means that the
requested permissions are stored in the BlockBackend, but not actually
applied to its root node yet.
Once migration has finished and the VM should be resumed, the
permissions are applied. If they cannot be applied (e.g. because the NBD
server used for block migration hasn't been shut down), resuming the VM
fails.
Alex Williamson [Thu, 6 Apr 2017 22:03:26 +0000 (16:03 -0600)]
vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk
The NVIDIA BAR5 quirk is targeting an ioport BAR. Some older devices
have a BAR5 which is not ioport and can induce a segfault here. Test
the BAR type to skip these devices.
Paolo Bonzini [Wed, 5 Apr 2017 08:11:36 +0000 (10:11 +0200)]
tco: do not generate an NMI
This behavior is not indicated in the datasheet and can confuse the OS.
The TCO can trap NMIs from SERR# or IOCHK# and convert them to SMIs; but
any other TCO event is either delivered as an SMI or completely disabled.
Peter Maydell [Tue, 4 Apr 2017 17:00:23 +0000 (18:00 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
Some 9pfs bugs fixes: potential hang at reset, migration blocker leak.
# gpg: Signature made Tue 04 Apr 2017 17:07:55 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: clear migration blocker at session reset
9pfs: fix multiple flush for same request
Greg Kurz [Tue, 4 Apr 2017 16:06:01 +0000 (18:06 +0200)]
9pfs: clear migration blocker at session reset
The migration blocker survives a device reset: if the guest mounts a 9p
share and then gets rebooted with system_reset, it will be unmigratable
until it remounts and umounts the 9p share again.
This happens because the migration blocker is supposed to be cleared when
we put the last reference on the root fid, but virtfs_reset() wrongly calls
free_fid() instead of put_fid().
This patch fixes virtfs_reset() so that it honor the way fids are supposed
to be manipulated: first get a reference and later put it back when you're
done.
Greg Kurz [Tue, 4 Apr 2017 16:06:01 +0000 (18:06 +0200)]
9pfs: fix multiple flush for same request
If a client tries to flush the same outstanding request several times, only
the first flush completes. Subsequent ones keep waiting for the request
completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU
to hang when draining active PDUs the next time the device is reset.
Let have each flush request wake up the next one if any. The last waiter
frees the cancelled PDU.
pci: Only unmap bus_master_enabled_region if was added previously
Normally pci_init_bus_master() would be called either via
bus->machine_done.notify or directly from do_pci_register_device().
However if a device's realize() failed, pci_init_bus_master() is not
called, and do_pci_unregister_device() fails on
memory_region_del_subregion() as it was not mapped.
This adds a check that subregion was mapped before unmapping it.
The qio_dns_resolver_lookup_sync() method is required to be a no-op
for socket kinds that don't require name resolution. Thus the KIND_FD
handling should not return an error.
Wang guang [Mon, 3 Apr 2017 11:05:21 +0000 (12:05 +0100)]
io: fix incoming client socket initialization
The channel socket was initialized manually, but forgot to set
QIO_CHANNEL_FEATURE_SHUTDOWN. Thus, the colo_process_incoming_thread
would hang at recvmsg. This patch just call qio_channel_socket_new to
get channel, Which set QIO_CHANNEL_FEATURE_SHUTDOWN already.
Peter Maydell [Fri, 31 Mar 2017 12:36:41 +0000 (13:36 +0100)]
tests/libqtest.c: Delete possible stale unix sockets
Occasionally if a test crashes or is interrupted by the user
at the wrong moment it could leave behind a stale UNIX
socket in /tmp/. This will then cause a subsequent test
run to fail spuriously with
tests/libqtest.c:70:init_socket: assertion failed (ret != -1): (-1 != -1)
if it happens to reuse the same PID.
Defend against this by deleting any stray stale socket before
trying to open the new ones for this test.
main-loop: Acquire main_context lock around os_host_main_loop_wait.
When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle. With virt-rescue >= 1.37 you may be able to observe the
hang by doing:
$ virt-rescue -e ^] --scratch
><rescue> while true; do ls -l /usr/bin; done
The hang in virt-rescue can be resolved by pressing a key on the
serial console.
Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console. Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:
$ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)
During investigation of this bug, Paolo Bonzini wrote:
> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext. However QEMU is not doing that, instead it is taking its
> own mutex. So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.
This patch exactly implements Paolo's suggestion in that paragraph.
This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing. I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432 Reported-by: Richard W.M. Jones <[email protected]> Tested-by: Richard W.M. Jones <[email protected]> Signed-off-by: Richard W.M. Jones <[email protected]>
Message-Id: <20170331205133[email protected]>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended]. Signed-off-by: Paolo Bonzini <[email protected]>
Peter Maydell [Mon, 3 Apr 2017 15:43:39 +0000 (16:43 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2017-04-03' into staging
Block patches for 2.9-rc3
# gpg: Signature made Mon 03 Apr 2017 16:29:49 BST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2017-04-03:
block/parallels: Avoid overflows
iotests: Improve image-clear tests on non-aligned image
qcow2: Discard unaligned tail when wiping image
iotests: fix 097 when run with qcow
qemu-io-cmds: Assert that global and nofile commands don't use ct->perms
sheepdog: Fix blockdev-add
nbd: Tidy up blockdev-add interface
sockets: New helper socket_address_crumple()
qapi-schema: SocketAddressFlat variants 'vsock' and 'fd'
gluster: Prepare for SocketAddressFlat extension
block: Document -drive problematic code and bugs
io vnc sockets: Clean up SocketAddressKind switches
char: Fix socket with "type": "vsock" address
nbd sockets vnc: Mark problematic address family tests TODO
block: add missed aio_context_acquire into release_drive
Max Reitz [Fri, 31 Mar 2017 17:05:12 +0000 (19:05 +0200)]
block/parallels: Avoid overflows
Change the types of variables in allocate_clusters() to int64_t so we do
not have to worry about potential overflows.
Add an assertion that our accesses to s->bat[] do not result in a buffer
overflow and that the implicit conversion performed when invoking
bat_entry_off() does not result in an integer overflow.
Eric Blake [Fri, 31 Mar 2017 18:53:56 +0000 (13:53 -0500)]
iotests: Improve image-clear tests on non-aligned image
Tweak 097 and 176 to operate on an image that is not cluster-aligned,
to give further coverage of clearing out an entire image, including
the recent fix to eliminate the difference between fast path (97) and
slow (176) for qcow2. Also tested on qcow (97 only, since qcow lacks
snapshots).
Eric Blake [Fri, 31 Mar 2017 18:53:55 +0000 (13:53 -0500)]
qcow2: Discard unaligned tail when wiping image
There is a subtle difference between the fast (qcow2v3 with no
extra data) and slow path (qcow2v2 format [aka 0.10], or when a
snapshot is present) of qcow2_make_empty(). The slow path fails
to discard the final (partial) cluster of an unaligned image.
The problem stems from the fact that qcow2_discard_clusters() was
silently ignoring sub-cluster head and tail on unaligned requests.
A quick audit of all callers shows that qcow2_snapshot_create() has
always passed a cluster-aligned request since the call was added
in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned
request since commit ecdbead taught the block layer about preferred
discard alignment; and qcow2_make_empty() was fixed to pass an
aligned start (but not necessarily end) in commit a3e1505.
Asserting that the start is always aligned also points out that we
now have a dead check: rounding the end offset down can never result
in a value less than the aligned start offset (the check was rendered
dead with commit ecdbead). Meanwhile, we do not want to round the
end cluster down in the one case of the end offset matching the
(unaligned) file size - that final partial cluster should still be
discarded.
With those fixes in place, the fast and slow paths are back in sync
at discarding an entire image; the next patch will update
qemu-iotests to ensure we don't regress.
Note that bdrv_co_pdiscard ignores ALL partial cluster requests,
including the partial cluster at the end of an image; it can be
argued that the partial cluster at the end should be special-cased
so that a guest issuing discard requests at proper alignments
everywhere else can likewise empty the entire image. But that
optimization is left for another day.
qcow2: Don't strand clusters near 2G intervals during commit
extended the 097 test case so that it did two passes, once
with an internal snapshot, once without.
qcow (v1) does not support internal snapshots, so this change
broke test 097 when run against qcow.
This splits 097 in two, creating a new 176 that tests the
internal snapshot codepath, effectively putting 097 back
to its content before the above commit.
Peter Maydell [Fri, 31 Mar 2017 13:38:49 +0000 (14:38 +0100)]
qemu-io-cmds: Assert that global and nofile commands don't use ct->perms
It would be a bug for a command with the CMD_NOFILE_OK or
CMD_FLAG_GLOBAL flags set to also set the ct->perms field,
because the former says "OK for a file not to be open"
but the latter is a check on a file.
Add an assertion in qemuio_add_command() so we can catch that
sort of buggy command definition immediately rather than it
being a bug that only manifests when a particular set of
command line options is used.
(Coverity gets confused about this (CID 1371723) and reports
that we might dereference a NULL blk pointer in this case,
because it can't tell that that code path never happens with
the cmdinfo_t that we have. This commit won't help unconfuse
it, but it does fix the underlying issue.)
Commit 831acdc "sheepdog: Implement bdrv_parse_filename()" and commit d282f34 "sheepdog: Support blockdev-add" have different ideas on how
the QemuOpts parameters for the server address are named. Fix that.
While there, rename BlockdevOptionsSheepdog member addr to server, for
consistency with BlockdevOptionsSsh, BlockdevOptionsGluster,
BlockdevOptionsNbd.
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces, and convert all internal interfaces to
SocketAddressFlat.
BlockdevOptionsNbd is an external interface using SocketAddress. We
already use SocketAddressFlat elsewhere in blockdev-add. Replace it
by SocketAddressFlat while we can (it's new in 2.9) for simplicity and
consistency. For example,
Since the internal interfaces still take SocketAddress, this requires
conversion function socket_address_crumple(). It'll go away when I
update the interfaces.
Unfortunately, SocketAddress is also visible in -drive since 2.8:
Nobody should be using it, as it's fairly new and has never been
documented, so adding still more compatibility gunk to keep it working
isn't worth the trouble. You now have to use
Because of this interface change, iotest 147 has to be adapted.
Unfortunately, we cannot just flatten all of the addresses because
nbd-server-start still takes a plain SocketAddress. Therefore, we need
both and this is most easily achieved by writing the SocketAddress into
the code and flattening it where necessary.
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces. New ones should use SocketAddressFlat.
I further intend to convert all internal interfaces to
SocketAddressFlat. This helper should go away then.
qapi-schema: SocketAddressFlat variants 'vsock' and 'fd'
Note that the new variants are impossible in qemu_gluster_glfs_init(),
because the gconf->server can only come from qemu_gluster_parse_uri()
or qemu_gluster_parse_json(), and neither can create anything but
'inet' or 'unix'.
qemu_gluster_glfs_init() and qemu_gluster_parse_json() rely on the
fact that SocketAddressFlatType has only two members
SOCKET_ADDRESS_FLAT_TYPE_INET and SOCKET_ADDRESS_FLAT_TYPE_UNIX.
Correct, but won't stay correct. Make them more robust.
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, A few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
io vnc sockets: Clean up SocketAddressKind switches
We have quite a few switches over SocketAddressKind. Some have case
labels for all enumeration values, others rely on a default label.
Some abort when the value isn't a valid SocketAddressKind, others
report an error then.
Unify as follows. Always provide case labels for all enumeration
values, to clarify intent. Abort when the value isn't a valid
SocketAddressKind, because the program state is messed up then.
Crashes because SocketAddress_to_str() is blissfully unaware of
SOCKET_ADDRESS_KIND_VSOCK. Fix that. Pick the output format to match
socket_parse(), just like the existing formats.
nbd sockets vnc: Mark problematic address family tests TODO
Certain features make sense only with certain address families. For
instance, passing file descriptors requires AF_UNIX. Testing
SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious,
but problematic: it can't recognize AF_UNIX when type ==
SOCKET_ADDRESS_KIND_FD.
Mark such tests of saddr->type TODO. We may want to check the address
family with getsockname() there.
Denis V. Lunev [Tue, 28 Mar 2017 16:12:46 +0000 (19:12 +0300)]
block: add missed aio_context_acquire into release_drive
Recently we expirience hang with iothreads enabled with the following
call trace:
Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
0 ppoll () from /lib64/libc.so.6
2 qemu_poll_ns () at qemu-timer.c:313
3 aio_poll () at aio-posix.c:457
4 bdrv_flush () at block/io.c:2641
5 bdrv_close () at block.c:2143
6 bdrv_delete () at block.c:2352
7 bdrv_unref () at block.c:3429
8 blk_remove_bs () at block/block-backend.c:427
9 blk_delete () at block/block-backend.c:178
10 blk_unref () at block/block-backend.c:226
11 object_property_del_all () at qom/object.c:399
12 object_finalize () at qom/object.c:461
13 object_unref () at qom/object.c:898
14 object_property_del_child () at qom/object.c:422
15 qmp_marshal_device_del () at qmp-marshal.c:1145
16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
Technically bdrv_flush() stucks in
while (rwco.ret == NOT_DONE) {
aio_poll(aio_context, true);
}
but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation
reveals that we do not have performed aio_context_acquire() on this call
stack.
libusbx doesn't exist any more, the fork got merged back to libusb. So
stop using LIBUSBX_API_VERSION and use LIBUSB_API_VERSION instead. For
backward compatibility alias LIBUSB_API_VERSION to LIBUSBX_API_VERSION
in case we figure LIBUSB_API_VERSION isn't defined.
Peter Maydell [Fri, 31 Mar 2017 14:31:11 +0000 (15:31 +0100)]
disas/cris.c: Avoid unintentional sign extension
Commit 001ebaca7b11 fixed some unintended sign extension issues
spotted by Coverity (CID 1005402, 1005403), but didn't catch
all of them. Fix the rest, so we behave consistently whether
'long' is 32 bit or 64 bit.
Peter Maydell [Tue, 28 Mar 2017 10:58:38 +0000 (11:58 +0100)]
configure: Mark SPARC as supported
Thanks to John Paul Adrian Glaubitz <[email protected]>
and the Debian Project, we now have access to a SPARC Linux
system we can use for build testing. Move SPARC back into
the "supported" list.
Peter Maydell [Thu, 30 Mar 2017 10:52:31 +0000 (11:52 +0100)]
tcg/sparc: Zero extend address argument to ld/st helpers
The C store helper functions take the address argument as a
target_ulong type; if this is 32 bit but the host is 64 bit
then the SPARC calling convention requires that the caller
must zero extend the value. We weren't doing this, which
meant we could pass values to the caller with high bits set
and QEMU would crash if it was compiled with optimizations.
In particular, the i386 BIOS would not start.
Peter Maydell [Thu, 30 Mar 2017 10:52:30 +0000 (11:52 +0100)]
tcg/sparc: Zero extend data argument to store helpers
The C store helper functions take the data argument as a uint8_t,
uint16_t, etc depending on the store size. The SPARC calling
convention requires that data types smaller than the register
size must be extended by the caller. We weren't doing this,
which meant that if QEMU was compiled with optimizations enabled
we could end up storing incorrect values to guest memory.
(In particular the i386 guest BIOS would crash on startup.)
Add code to the trampolines that call the store helpers to
do the zero extension as required.
Paolo Bonzini [Mon, 3 Apr 2017 11:41:28 +0000 (13:41 +0200)]
exec: revert MemoryRegionCache
MemoryRegionCache did not know about virtio support for IOMMUs (because the
two features were developed at the same time). Revert MemoryRegionCache
to "normal" address_space_* operations for 2.9, as it is simpler than
undoing the virtio patches.
* remotes/kraxel/tags/pull-fixes-20170403-1:
vnc: allow to connect with add_client when -vnc none
Fix input-linux reading from device
xhci: flush dequeue pointer to endpoint context
Peter Maydell [Mon, 3 Apr 2017 10:15:33 +0000 (11:15 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170403' into staging
ppc patch queue 2017-04-03
A single bugfix in this pull request, for an ugly assert() failure, if
the user ignores the information in query-hotpluggable-cpus and tries
to hot add CPUs to pseries with bad parameters.
vnc: allow to connect with add_client when -vnc none
Do not skip VNC initialization, in particular of auth method when vnc is
configured without sockets, since we should still allow connections
through QMP add_client.
Javier Celaya [Mon, 27 Mar 2017 18:26:24 +0000 (20:26 +0200)]
Fix input-linux reading from device
The evdev devices in input-linux.c are read in blocks of one whole
event. If there are not enough bytes available, they are discarded,
instead of being kept for the next read operation. This results in
lost events, of even non-working devices.
This patch keeps track of the number of bytes to be read to fill up
a whole event, and then handle it.
Changes from v1 to v2:
- Fix: Calculate offset on each iteration
Changes from v2 to v3:
- Fix coding style
- Store offset instead of bytes to be read
Gerd Hoffmann [Fri, 31 Mar 2017 10:25:21 +0000 (12:25 +0200)]
xhci: flush dequeue pointer to endpoint context
When done processing a endpoint ring we must update the dequeue pointer
in the endpoint context in guest memory. This is needed to make sure
the guest has a correct view of things and also to make live migration
work properly, because xhci post_load restores alot of the state from
xhci data structures in guest memory.
Add xhci_set_ep_state() call to do that.
The recursive calls stopped by commit ddb603ab6c981c1d67cb42266fc700c33e5b2d8f had the (unintentional) side
effect to hiding this bug. xhci_set_ep_state() was called before
processing, to set the state to running, which updated the dequeue
pointer too.
David Gibson [Sun, 2 Apr 2017 06:14:30 +0000 (16:14 +1000)]
pseries: Enforce homogeneous threads-per-core
For reasons that may be useful in future, CPU core objects, as used on the
pseries machine type have their own nr-threads property, potentially
allowing cores with different numbers of threads in the same system.
If the user/management uses the values specified in query-hotpluggable-cpus
as they're expected to do, this will never matter in pratice. But that's
not actually enforced - it's possible to manually specify a core with
a different number of threads from that in -smp. That will confuse the
platform - most immediately, this can be used to create a CPU thread with
index above max_cpus which leads to an assertion failure in
spapr_cpu_core_realize().
For now, enforce that all cores must have the same, standard, number of
threads.
Tejaswini Poluri [Tue, 28 Mar 2017 07:19:43 +0000 (12:49 +0530)]
target-i386: fix "info lapic" segfault on isapc
Start QEMU with
"qemu-system-x86_64 -nographic -M isapc -serial none-monitor stdio"
and enter "info lapic" at the monitor prompt ⇒
Segmentation fault
Stefan Hajnoczi [Mon, 27 Mar 2017 16:50:05 +0000 (17:50 +0100)]
iscsi: drop unused IscsiAIOCB.qiov field
The IscsiAIOCB.qiov field has been unused since commit 063c3378a9e3c25cc0afac3c72e4823d0621e352 ("block/iscsi: introduce
bdrv_co_{readv, writev, flush_to_disk}") back in 2013.
Max Reitz [Fri, 31 Mar 2017 12:04:31 +0000 (14:04 +0200)]
block/curl: Check protocol prefix
If the user has explicitly specified a block driver and thus a protocol,
we have to make sure the URL's protocol prefix matches. Otherwise the
latter will silently override the former which might catch some users by
surprise.
Max Reitz [Fri, 31 Mar 2017 12:04:30 +0000 (14:04 +0200)]
qapi/curl: Extend and fix blockdev-add schema
The curl block driver accepts more options than just "filename"; also,
the URL is actually expected to be passed through the "url" option
instead of "filename".
Eric Blake [Fri, 31 Mar 2017 15:27:30 +0000 (10:27 -0500)]
rbd: Fix regression in legacy key/values containing escaped :
Commit c7cacb3 accidentally broke legacy key-value parsing through
pseudo-filename parsing of -drive file=rbd://..., for any key that
contains an escaped ':'. Such a key is surprisingly common, thanks
to mon_host specifying a 'host:port' string. The break happens
because passing things from QDict through QemuOpts back to another
QDict requires that we pack our parsed key/value pairs into a string,
and then reparse that string, but the intermediate string that we
created ("key1=value1:key2=value2") lost the \: escaping that was
present in the original, so that we could no longer see which : were
used as separators vs. those used as part of the original input.
Fix it by collecting the key/value pairs through a QList, and
sending that list on a round trip through a JSON QString (as in
'["key1","value1","key2","value2"]') on its way through QemuOpts,
rather than hand-rolling our own string. Since the string is only
handled internally, this was faster than creating a full-blown
struct of '[{"key1":"value1"},{"key2":"value2"}]', and safer at
guaranteeing order compared to '{"key1":"value1","key2":"value2"}'.
It would be nicer if we didn't have to round-trip through QemuOpts
in the first place, but that's a much bigger task for later.
Even without an RBD setup, this serves a test of whether we get
the incorrect parser error of:
qemu-system-x86_64: -drive file=rbd:...cache=writeback: conf option 6789 has no value
or the correct behavior of hanging while trying to connect to
the requested mon_host of 192.168.1.2:6789.
The original patch intend to prevent linux i915 driver from using
stolen meory. But this patch breaks windows IGD driver loading on
Gen9+, as IGD HW will use stolen memory on Gen9+, once windows IGD
driver see zero size stolen memory, it will unload.
Meanwhile stolen memory will be disabled in 915 when i915 run as
a guest.
Peter Maydell [Fri, 31 Mar 2017 11:43:27 +0000 (12:43 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20170331' into staging
HMP pull (one bugfix)
# gpg: Signature made Fri 31 Mar 2017 11:57:17 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
Eric Auger [Tue, 28 Mar 2017 17:20:40 +0000 (19:20 +0200)]
hw/intc/arm_gicv3_kvm: Check KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS in reset
KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS needs to be checked before
attempting to read ICC_CTLR_EL1; otherwise kernel versions not
exposing this kvm device group will be incompatible with qemu 2.9.
Iwona Kotlarska [Thu, 30 Mar 2017 05:09:24 +0000 (07:09 +0200)]
hmp: fix "dump-quest-memory" segfault
Running QEMU with "qemu-system-x86_64 -M none -nographic -m 256" and executing
"dump-guest-memory /dev/null 0 8192" results in segfault.
Fix by checking if we have CPU.
Peter Maydell [Fri, 31 Mar 2017 10:09:51 +0000 (11:09 +0100)]
Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2017-03-30-tag' into staging
qemu-ga patch queue for 2.9
* fix make check failure of guest-get-fsinfo when nested virtual block
device partitions are mounted in the test environment
* fix static compilation for mingw builds
Peter Maydell [Fri, 31 Mar 2017 09:09:42 +0000 (10:09 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Fri 31 Mar 2017 01:50:55 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
e1000: disable debug by default
virtio-net: avoid call tap_enable when there's only one queue
Jason Wang [Wed, 29 Mar 2017 02:41:23 +0000 (10:41 +0800)]
virtio-net: avoid call tap_enable when there's only one queue
We call tap_enable() even if for multiqueue is not enabled. This is
wrong since it should be used for multiqueue codes to enable a
disabled queue. Fixing this by only calling this when multiqueue is
used.
Michael Roth [Thu, 23 Mar 2017 20:24:32 +0000 (15:24 -0500)]
qga: don't fail if mount doesn't have slave devices
In some cases the slave devices of a virtual block device are tracked
by the parent in the corresponding sysfs node. For instance, if we
have a loop-back mount of the form:
/dev/loop3p1 on /home/mdroth/mnt type ext4 (rw,relatime,data=ordered)
The current code however assumes the mounted virtual block device,
loop3p1 in this case, contains the slaves directory, and reports an
error otherwise. This breaks 'make check' in certain environments.
Fix this by simply skipping attempts to generate disk topology
information in these cases. Since this information is documented
in QAPI as optionally-reported, this should be ok from an API
perspective.
In the future, this can possibly be improved upon by collecting
topology information from the parent in these cases.
There's no reason to pack structures where we don't care about size or
padding, this applies to AcpiStdTable in tests/acpi-utils.h.
OTOH bios-tables-test happens to be passing the address of a field in
this struct to a function that expects a pointer to normally aligned
data which results in a SIGBUS on architectures like SPARC that have
strict alignment requirements.
Fixes: 9e8458c02 ("acpi unit-test: compare DSDT and SSDT tables against expected values") Reported-by: Peter Maydell <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Tested-by: Peter Maydell <[email protected]>
Jason Wang [Wed, 29 Mar 2017 04:10:04 +0000 (12:10 +0800)]
vhost: generalize iommu memory region
We assumes the iommu_ops were attached to the root region of address
space. This may not be true for all kinds of IOMMU implementation and
especially after commit 3716d5902d74 ("pci: introduce a bus master
container"). So fix this by not assuming as->root has iommu_ops,
instead depending on the regions reported by memory listener through:
- register a memory listener to dma_as
- during region_add, if it's a region of IOMMU, register a specific
IOMMU notifier, and store all notifiers in a list.
- during region_del, compare and delete the IOMMU notifier from the list
This is also a must for making vhost device IOTLB works for all types
of IOMMUs. Note, since we register one notifier during each
.region_add, the IOTLB may be flushed more than one times, this is
suboptimal and could be optimized in the future.
Peter Maydell [Thu, 30 Mar 2017 14:28:19 +0000 (15:28 +0100)]
Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates
# gpg: Signature made Tue 28 Mar 2017 23:51:51 BST
# gpg: using RSA key 0xB0A51BF58C9179C5
# gpg: Good signature from "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82 304B D017 8C76 7D06 9EE6
# Subkey fingerprint: AEBF 7448 FAB9 453A 4552 390E B0A5 1BF5 8C91 79C5
* remotes/thibault/tags/samuel-thibault:
slirp: Send RDNSS in RA only if host has an IPv6 DNS server
slirp: Make RA build more flexible
slirp: fix compilation errors with DEBUG set
Peter Maydell [Thu, 30 Mar 2017 12:55:40 +0000 (13:55 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pci: fixes
More fixes for 2.9.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Wed 29 Mar 2017 00:35:49 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio: fix vring_align() on 64-bit windows
pci: Add missing drop of bus master AS reference
event_notifier: prevent accidental use after close
Peter Maydell [Tue, 28 Mar 2017 13:01:52 +0000 (14:01 +0100)]
configure: Don't claim 'unsupported host OS' when better message available
The change in commit 898be3e0415c6d which made completely
unrecognized OSes cause an error_exit "Unsupported host OS"
has some unfortunate unintended effects:
* if you run 'configure --help' on an unsupported host OS
(eg if intending to use it as a build machine for a
cross compile to a supported host) then the message
is printed instead of --help
* if the C compiler doesn't work or is missing (eg if
you passed an incorrect --cross-prefix by mistake)
the message is printed instead of the more useful
'compiler does not exist or does not work' message
Fix this by postponing the error_exit in this situation
until later, when we have already identified the more
useful cases for this.
The long term fix for this would be to move handling
of --help much further up in the configure script,
and make its output not dependent on checks that configure
runs. However for 2.9 this would be too invasive.
Laurent Vivier [Tue, 28 Mar 2017 12:09:34 +0000 (14:09 +0200)]
spapr: fix memory hot-unplugging
If, once the kernel has booted, we try to remove a memory
hotplugged while the kernel was not started, QEMU crashes on
an assert:
qemu-system-ppc64: hw/virtio/vhost.c:651:
vhost_commit: Assertion `r >= 0' failed.
...
#4 in vhost_commit
#5 in memory_region_transaction_commit
#6 in pc_dimm_memory_unplug
#7 in spapr_memory_unplug
#8 spapr_machine_device_unplug
#9 in hotplug_handler_unplug
#10 in spapr_lmb_release
#11 in detach
#12 in set_allocation_state
#13 in rtas_set_indicator
...
If we take a closer look to the guest kernel log, we can see when
we try to unplug the memory:
pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
What happens:
1- The kernel has ignored the memory hotplug event because
it was not started when it was generated.
2- When we hot-unplug the memory,
QEMU starts to remove the memory,
generates an hot-unplug event,
and signals the kernel of the incoming new event
3- as the kernel is started, on the QEMU signal, it reads
the event list, decodes the hotplug event and tries to
finish the hotplugging.
4- QEMU receive the the hotplug notification while it
is trying to hot-unplug the memory. This moves the memory
DRC to an invalid state
This patch prevents this by not allowing to set the allocation
state to USABLE while the DRC is awaiting release.
Running postcopy-test with ASAN produces the following error:
QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
#0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
#1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
#2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
#3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
#4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
#5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)
0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
#0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
#1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
#2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
#3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
#4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
#5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
#6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
#7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)
Thread T6 created by T0 here:
#0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
#1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
#2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
#3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
#4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
#5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
#6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
#7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass
index seems to be wrongly incremented, unless I miss something that
would be worth a comment.
Andrew Baumann [Fri, 24 Mar 2017 23:19:43 +0000 (16:19 -0700)]
virtio: fix vring_align() on 64-bit windows
long is 32-bits on 64-bit windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro which does not suffer the same problem
The recent introduction of a bus master container added
memory_region_add_subregion() into the PCI device registering path but
missed memory_region_del_subregion() in the unregistering path leaving
a reference to the root memory region of the new container.
Halil Pasic [Thu, 2 Mar 2017 18:13:08 +0000 (19:13 +0100)]
event_notifier: prevent accidental use after close
Let's set the handles to the underlying facilities to their extremal
value so no accidental misuse can happen, and to make it obvious that the
notifier is dysfunctional. E.g. if we just close an fd but do not touch
the int holding the fd eventually a read/write could succeed again when
the fd gets reused, and corrupt the file addressed by the fd.
Samuel Thibault [Sun, 26 Mar 2017 18:46:34 +0000 (20:46 +0200)]
slirp: Send RDNSS in RA only if host has an IPv6 DNS server
Previously we would always send an RDNSS option in the RA, making the guest
try to resolve DNS through IPv6, even if the host does not actually have
and IPv6 DNS server available.
This makes the RDNSS option enabled only when an IPv6 DNS server is
available.
Eduardo Habkost [Mon, 27 Mar 2017 14:48:15 +0000 (11:48 -0300)]
i386: Don't override -cpu options on -cpu host/max
The existing code for "host" and "max" CPU models overrides every
single feature in the CPU object at realize time, even the ones
that were explicitly enabled or disabled by the user using
"feat=on" or "feat=off", while features set using +feat/-feat are
kept.
This means "-cpu host,+invtsc" works as expected, while
"-cpu host,invtsc=on" doesn't.
This was a known bug, already documented in a comment inside
x86_cpu_expand_features(). What makes this bug worse now is that
libvirt 3.0.0 and newer now use "feat=on|off" instead of
+feat/-feat when it detects a QEMU version that supports it (see
libvirt commit d47db7b16dd5422c7e487c8c8ee5b181a2f9cd66).
Change the feature property getter/setter to set a
env->user_features field, to keep track of features that were
explicitly changed using QOM properties. Then make the
max_features code not override user features when handling "-cpu
host" and "-cpu max".
This will also allow us to remove the plus_features/minus_features
hack in the future, but I plan to do that after 2.9.0 is
released.
Eduardo Habkost [Mon, 27 Mar 2017 14:48:14 +0000 (11:48 -0300)]
i386: Replace uint32_t* with FeatureWord on feature getter/setter
Instead of passing a pointer to the feature property getter and
setter functions, pass a FeatureWord enum so they can perform
other actions related to the feature flag.
This will be used to add a new "user_features" field to keep
track of features that were explicitly set by the user.
We first snprintf() to a fixed buffer, then g_strdup() the result
*boggle*.
Worse, the size of the fixed buffer INET6_ADDRSTRLEN + 5 + 4 is bogus:
the 4 correctly accounts for '[', ']', ':' and '\0', but
INET6_ADDRSTRLEN is not a suitable limit for inet->host, and 5 is not
one for inet->port! They are for host and port in *numeric* form
(exploiting that INET6_ADDRSTRLEN > INET_ADDRSTRLEN), but inet->host
can also be a hostname, and inet->port can be a service name, to be
resolved with getaddrinfo().
Fortunately, the only user so far is the "socket" network backend's
net_socket_connected(), which uses it to initialize a NetSocketState's
info_str[]. info_str[] has considerable more space: 256 instead of
55. So the bug's impact appears to be limited to truncated "info
networks" with the "socket" network backend.
* remotes/cody/tags/block-pull-request:
rbd: Fix bugs around -drive parameter "server"
rbd: Revert -blockdev parameter password-secret
rbd: Revert -blockdev and -drive parameter auth-supported
rbd: Clean up qemu_rbd_create()'s detour through QemuOpts
rbd: Clean up runtime_opts, fix -drive to reject filename
rbd: Don't accept -drive driver=rbd, keyvalue-pairs=...
rbd: Clean up after the previous commit
rbd: Don't limit length of parameter values
rbd: Fix to cleanly reject -drive without pool or image
rbd: Reject -blockdev server.*.{numeric, to, ipv4, ipv6}
qemu_rbd_open() takes option parameters as a flattened QDict, with
keys of the form server.%d.host, server.%d.port, where %d counts up
from zero.
qemu_rbd_array_opts() extracts these values as follows. First, it
calls qdict_array_entries() to find the list's length. For each list
element, it formats the list's key prefix (e.g. "server.0."), then
creates a new QDict holding the options with that key prefix, then
converts that to a QemuOpts, so it can finally get the member values
from there.
If there's one surefire way to make code using QDict more awkward,
it's creating more of them and mixing in QemuOpts for good measure.
The extraction of keys starting with server.%d into another QDict
makes us ignore parameters like server.0.neither-host-nor-port
silently.
The conversion to QemuOpts abuses runtime_opts, as described a few
commits ago.
Rewrite to simply get the values straight from the options QDict.
Fixes -drive not to crash when server.*.* are present, but
server.*.host is absent.
Fixes -drive to reject invalid server.*.*.
Permits cleaning up runtime_opts. Do that, and fix -drive to reject
bogus parameters host and port instead of silently ignoring them.
This reverts a part of commit 8a47e8e. We're having second thoughts
on the QAPI schema (and thus the external interface), and haven't
reached consensus, yet. Issues include:
* BlockdevOptionsRbd member @password-secret isn't actually a
password, it's a key generated by Ceph.
* We're not sure where member @password-secret belongs (see the
previous commit).
* How @password-secret interacts with settings from a configuration
file specified with @conf is undocumented.
Let's avoid painting ourselves into a corner now, and revert the
feature for 2.9.
Note that users can still configure an authentication key with a
configuration file. They probably do that anyway if they use Ceph
outside QEMU as well.