Since bdrv_co_preadv does all neccessary checks including
reading after the end of the backing file, avoid duplication
of verification before bdrv_co_preadv call.
Kevin Wolf [Mon, 11 Dec 2017 14:33:17 +0000 (15:33 +0100)]
block: Don't acquire AioContext in hmp_qemu_io()
Commit 15afd94a047 added code to acquire and release the AioContext in
qemuio_command(). This means that the lock is taken twice now in the
call path from hmp_qemu_io(). This causes BDRV_POLL_WHILE() to hang for
any requests issued to nodes in a non-mainloop AioContext.
Dropping the first locking from hmp_qemu_io() fixes the problem.
Kevin Wolf [Wed, 6 Dec 2017 10:00:59 +0000 (11:00 +0100)]
block: Unify order in drain functions
Drain requests are propagated to child nodes, parent nodes and directly
to the AioContext. The order in which this happened was different
between all combinations of drain/drain_all and begin/end.
The correct order is to keep children only drained when their parents
are also drained. This means that at the start of a drained section, the
AioContext needs to be drained first, the parents second and only then
the children. The correct order for the end of a drained section is the
opposite.
This patch changes the three other functions to follow the example of
bdrv_drained_begin(), which is the only one that got it right.
Kevin Wolf [Wed, 6 Dec 2017 09:45:27 +0000 (10:45 +0100)]
block: Don't wait for requests in bdrv_drain*_end()
The device is drained, so there is no point in waiting for requests at
the end of the drained section. Remove the bdrv_drain_recurse() calls
there.
The bdrv_drain_recurse() calls were introduced in commit 481cad48e5e
in order to call the .bdrv_co_drain_end() driver callback. This is now
done by a separate bdrv_drain_invoke() call.
Kevin Wolf [Tue, 5 Dec 2017 13:05:02 +0000 (14:05 +0100)]
test-bdrv-drain: Test BlockDriver callbacks for drain
This adds a test case that the BlockDriver callbacks for drain are
called in bdrv_drained_all_begin/end(), and that both of them are called
exactly once.
Kevin Wolf [Tue, 5 Dec 2017 12:53:35 +0000 (13:53 +0100)]
block: Call .drain_begin only once in bdrv_drain_all_begin()
bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver
callback inside its polling loop. This means that how many times it got
called for each node depended on long it had to poll the event loop.
This is obviously not right and results in nodes that stay drained even
after bdrv_drain_all_end(), which calls .bdrv_co_drain_begin() once per
node.
Fix bdrv_drain_all_begin() to call the callback only once, too.
Kevin Wolf [Tue, 5 Dec 2017 11:52:09 +0000 (12:52 +0100)]
block: Make bdrv_drain_invoke() recursive
This change separates bdrv_drain_invoke(), which calls the BlockDriver
drain callbacks, from bdrv_drain_recurse(). Instead, the function
performs its own recursion now.
One reason for this is that bdrv_drain_recurse() can be called multiple
times by bdrv_drain_all_begin(), but the callbacks may only be called
once. The separation is necessary to fix this bug.
The other reason is that we intend to go to a model where we call all
driver callbacks first, and only then start polling. This is not fully
achieved yet with this patch, as bdrv_drain_invoke() contains a
BDRV_POLL_WHILE() loop for the block driver callbacks, which can still
call callbacks for any unrelated event. It's a step in this direction
anyway.
John Snow [Tue, 5 Dec 2017 01:08:20 +0000 (20:08 -0500)]
iotests: fix 197 for vpc
VPC has some difficulty creating geometries of particular size.
However, we can indeed force it to use a literal one, so let's
do that for the sake of test 197, which is testing some specific
offsets.
Kevin Wolf [Thu, 30 Nov 2017 16:38:43 +0000 (17:38 +0100)]
block: Formats don't need CONSISTENT_READ with NO_IO
Commit 1f4ad7d fixed 'qemu-img info' for raw images that are currently
in use as a mirror target. It is not enough for image formats, though,
as these still unconditionally request BLK_PERM_CONSISTENT_READ.
As this permission is geared towards whether the guest-visible data is
consistent, and has no impact on whether the metadata is sane, and
'qemu-img info' does not read guest-visible data (except for the raw
format), it makes sense to not require BLK_PERM_CONSISTENT_READ if there
is not going to be any guest I/O performed, regardless of image format.
* remotes/bonzini/tags/for-upstream: (41 commits)
chardev: convert the socket server to QIONetListener
blockdev: convert qemu-nbd server to QIONetListener
blockdev: convert internal NBD server to QIONetListener
test: add some chardev mux event tests
chardev: fix backend events regression with mux chardev
rcu: reduce more than 7MB heap memory by malloc_trim()
checkpatch: volatile with a comment or sig_atomic_t is okay
i8259: move TYPE_INTERRUPT_STATS_PROVIDER upper
kvm-i8259: support "info pic" and "info irq"
i8259: generalize statistics into common code
i8259: use DEBUG_IRQ_COUNT always
i8259: convert DPRINTFs into trace
Remove legacy -no-kvm-pit option
scsi: replace hex constants with #defines
scsi: provide general-purpose functions to manage sense data
hw/i386/vmport: replace fprintf() by trace events or LOG_UNIMP
hw/mips/boston: Remove workaround for writes to ROM aborting
exec: Don't reuse unassigned_mem_ops for io_mem_rom
block/iscsi: only report an iSCSI Failure if we don't handle it gracefully
block/iscsi: dont leave allocmap in an invalid state on UNMAP failure
...
chardev: convert the socket server to QIONetListener
Instead of creating a QIOChannelSocket directly for the chardev
server socket, use a QIONetListener. This provides the ability
to listen on multiple sockets at the same time, so enables
full support for IPv4/IPv6 dual stack.
blockdev: convert qemu-nbd server to QIONetListener
Instead of creating a QIOChannelSocket directly for the NBD
server socket, use a QIONetListener. This provides the ability
to listen on multiple sockets at the same time, so enables
full support for IPv4/IPv6 dual stack. This also means we can
honour multiple FDs received during socket activation.
blockdev: convert internal NBD server to QIONetListener
Instead of creating a QIOChannelSocket directly for the NBD
server socket, use a QIONetListener. This provides the ability
to listen on multiple sockets at the same time, so enables
full support for IPv4/IPv6 dual stack.
Check the expected behaviour of qemu_chr_be_event() on a mux chardev.
For some reason, sending the event on the base chardev broadcast to
all frontends, while sending it on the mux chardev itself should
trigger the event on the currently focused chardev frontend.
chardev: fix backend events regression with mux chardev
Kirill noticied that on recent versions on QEMU he was not able to
trigger SysRq to invoke debug capabilites of Linux Kernel. He tracked
it down to qemu_chr_be_event() ignoring CHR_EVENT_BREAK due s->be
being NULL. The bug was introduced in 2.8, commit a4afa548fc6d ("char:
move front end handlers in CharBackend"). Since the commit, the
qemu_chr_be_event() failed to deliver CHR_EVENT_BREAK due to
qemu_chr_fe_init() does not set s->be in case of mux.
Let's fix this by teaching mux to send an event to the frontend with
the focus.
Yang Zhong [Wed, 20 Dec 2017 13:16:46 +0000 (21:16 +0800)]
rcu: reduce more than 7MB heap memory by malloc_trim()
Since there are some issues in memory alloc/free machenism
in glibc for little chunk memory, if Qemu frequently
alloc/free little chunk memory, the glibc doesn't alloc
little chunk memory from free list of glibc and still
allocate from OS, which make the heap size bigger and bigger.
This patch introduce malloc_trim(), which will free heap
memory when there is no rcu call during rcu thread loop.
malloc_trim() can be enabled/disabled by --enable-malloc-trim/
--disable-malloc-trim in the Qemu configure command. The
default malloc_trim() is enabled for libc.
Below are test results from smaps file.
(1)without patch 55f0783e1000-55f07992a000 rw-p 00000000 00:00 0 [heap]
Size: 21796 kB
Rss: 14260 kB
Pss: 14260 kB
Peter Xu [Sun, 10 Dec 2017 06:38:18 +0000 (14:38 +0800)]
kvm-i8259: support "info pic" and "info irq"
Let's leverage the i8259 common code for kvm-i8259 too.
I think it's still possible that stats can lost when i8259 is in kernel
and meanwhile when irqfd is used, e.g., by vfio or vhost devices.
However that should be rare IMHO since they should be using MSIs mostly
if they really want performance (that's why people use vhost and device
assignment), and no old INTx should be used. As long as the INTx users
are emulated in QEMU the stats will be correct.
For "info pic", it should be always accurate since we fetch kvm regs
before dump.
More importantly, it's just too simple to do this now - it's only 10+
LOC to gain this feature.
Paolo Bonzini [Mon, 27 Nov 2017 12:27:41 +0000 (13:27 +0100)]
scsi: provide general-purpose functions to manage sense data
Extract the common parts of scsi_sense_buf_to_errno, scsi_convert_sense
and scsi_target_send_command's REQUEST SENSE handling into two new
functions scsi_parse_sense_buf and scsi_build_sense_buf.
Fix a bug in scsi_target_send_command along the way; the length was
written in buf[10] rather than buf[7].
Reported-by: Dr. David Alan Gilbert <[email protected]> Reviewed-by: Dr. David Alan Gilbert <[email protected]> Fixes: b07fbce634 ("scsi-bus: correct responses for INQUIRY and REQUEST SENSE") Signed-off-by: Paolo Bonzini <[email protected]>
Peter Maydell [Wed, 13 Dec 2017 17:52:29 +0000 (17:52 +0000)]
hw/mips/boston: Remove workaround for writes to ROM aborting
Now that the memory system correctly handles writes to ROM for
guest CPUs that may generate exceptions for decode errors, we
can remove the workaround from the boston board.
Peter Maydell [Wed, 13 Dec 2017 17:52:28 +0000 (17:52 +0000)]
exec: Don't reuse unassigned_mem_ops for io_mem_rom
We set up the io_mem_rom special memory region using the
unassigned_mem_ops structure; this is then used when a guest tries to
write to ROM. This is incorrect, because the behaviour of unassigned
memory may be different from that of ROM for writes. In particular,
on some architectures writing to unassigned memory generates a guest
exception, whereas writing to ROM is generally ignored. Use a
special readonly_mem_ops for this purpose instead, so writes to
ROM are ignored for all guest CPUs.
Peter Lieven [Fri, 8 Dec 2017 11:51:08 +0000 (12:51 +0100)]
block/iscsi: only report an iSCSI Failure if we don't handle it gracefully
we currently report an "iSCSI Failure" in iscsi_co_generic_cb if the task
hasn't completed with SCSI_STATUS_GOOD. However, we expect a failure in
some cases and handle it gracefully. This is the case for misaligned UNMAPs
and WRITESAME10/16 calls without UNMAP. In this case a failure in the
logs can be quite misleading.
While we are at it improve the logging to reveal which operation failed
at what LBA.
Peter Xu [Thu, 23 Nov 2017 09:23:32 +0000 (17:23 +0800)]
cpu: refactor cpu_address_space_init()
Normally we create an address space for that CPU and pass that address
space into the function. Let's just do it inside to unify address space
creations. It'll simplify my next patch to rename those address spaces.
Thomas Huth [Thu, 30 Nov 2017 08:53:06 +0000 (09:53 +0100)]
hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim
The moxiesim machine already defines a memory region for a firmware,
but does not provide the possibility to load an image via "-bios" yet.
This will be needed for the boot-serial tester, so let's add support
for "-bios" here now.
Thomas Huth [Thu, 30 Nov 2017 08:53:03 +0000 (09:53 +0100)]
tests/boot-serial-test: Add code to allow to specify our own kernel or bios
QEMU only ships with some few firmware images, i.e. we can currently run
the boot-serial test only on a very limited set of machines. But writing
some characters to the default UART of a machine can often be done with
some few lines of assembly, so we add the possibility to the boot-serial
tester to use its own mini-kernels or mini-firmwares. We write such images
then into a file that we can load with the "-kernel" or "-bios" parameter
when we launch QEMU.
Thomas Huth [Thu, 30 Nov 2017 08:53:02 +0000 (09:53 +0100)]
tests/boot-serial-test: Make sure that we check the timeout regularly
If the guest continuesly writes characters to the UART, we never leave
the inner while loop and thus never check whether we've reached the
timeout value. So if we fail to find the expected string in the UART
output, the test just hangs and never finishs. Use a counter to regularly
break out of the while loop to check the timeout.
Peter Maydell [Wed, 13 Dec 2017 11:19:19 +0000 (11:19 +0000)]
target/i386: Fix handling of VEX prefixes
In commit e3af7c788b73a6495eb9d94992ef11f6ad6f3c56 we
replaced direct calls to to cpu_ld*_code() with calls
to the x86_ld*_code() wrappers which incorporate an
advance of s->pc. Unfortunately we didn't notice that
in one place the old code was deliberately not incrementing
s->pc:
if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) {
/* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b,
This meant we were mishandling this set of instructions.
Remove the manual advance of s->pc for the "is VEX" case
(which is now done by x86_ldub_code()) and instead rewind
PC in the case where we decide that this isn't really VEX.
sockets: remove obsolete code that updated listen address
When listening on unix/tcp sockets there was optional code that would update
the original SocketAddress struct with the info about the actual address that
was listened on. Since the conversion of everything to QIOChannelSocket, no
remaining caller made use of this feature. It has been replaced with the ability
to query the listen address after the fact using the function
qio_channel_socket_get_local_address. This is a better model when the input
address can result in listening on multiple distinct sockets.
Samuel Thibault [Mon, 11 Dec 2017 00:19:50 +0000 (01:19 +0100)]
baum: Truncate braille device size to 84x1
Baum device bigger than 84 do not actually exist, but the user's own
Braille device might be wider than 84 columns. Some guest drivers
would be upset by such sizes, so clamp the device size.
Stefan Weil [Mon, 13 Nov 2017 06:48:45 +0000 (07:48 +0100)]
target/i386: Fix compiler warnings
These gcc warnings are fixed:
target/i386/translate.c:4461:12: warning:
variable 'prefixes' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
target/i386/translate.c:4466:9: warning:
variable 'rex_w' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
target/i386/translate.c:4466:16: warning:
variable 'rex_r' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
Tested with x86_64-w64-mingw32-gcc from Debian stretch.
cpu-exec: fix missed CPU kick during interrupt injection
The conditional memory barrier not only looks strange but actually is
wrong.
On s390x, I can reproduce interrupts via cpu_interrupt() not leading to
a proper kick out of emulation every now and then. cpu_interrupt() is
especially used for inter CPU communication via SIGP (esp. external
calls and emergency interrupts).
With this patch, I was not able to reproduce. (esp. no stalls or hangs
in the guest).
My setup is s390x MTTCG with 16 VCPUs on 8 CPU host, running make -j16.
cpus: make pause_all_cpus() play with SMP on single threaded TCG
pause_all_cpus() is sometimes called from a VCPU thread (e.g. s390x
during special reset). It cannot deal with multiple VCPUs per Thread
(single threaded TCG) yet.
Booting an s390x guest with -smp 2 and single threaded TCG from disk
currently fails. The DIAG 308 will issue a pause_all_cpus() and wait
forever for the CPUs to actually stop. But it is waiting for itself.
So let's stop all VCPUs belonging to the current thread. Factor out
stopping of a VCPU.
Evgeny Yakovlev [Wed, 22 Nov 2017 18:14:16 +0000 (21:14 +0300)]
hyperv: set partition-wide MSRs only on first vcpu
Hyper-V has a notion of partition-wide MSRs. Those MSRs are read and
written as usual on each VCPU, however the hypervisor maintains a single
global value for all VCPUs. Thus writing such an MSR from any single
VCPU affects the global value that is read by all other VCPUs.
This leads to an issue during VCPU hotplug: the zero-initialzied values
of those MSRs get synced into KVM and override the global values as has
already been set by the guest.
This change makes the partition-wide MSRs only be synchronized on the
first vcpu.
Yang Zhong [Wed, 22 Nov 2017 07:27:56 +0000 (15:27 +0800)]
x86/cpu: Enable new SSE/AVX/AVX512 cpu features
Intel IceLake cpu has added new cpu features,AVX512_VBMI2/GFNI/
VAES/VPCLMULQDQ/AVX512_VNNI/AVX512_BITALG. Those new cpu features
need expose to guest VM.
Fam Zheng [Tue, 5 Dec 2017 07:19:28 +0000 (15:19 +0800)]
scsi-block: Add share-rw option
Scsi-block doesn't use the DEFINE_BLOCK_PROPERTIES() macro so it didn't
gain the share-rw back when it was added to all other storage devices.
This option is meaningful here, and need to be used when attaching a
shared storage to guest.
Paolo Bonzini [Fri, 24 Nov 2017 16:44:22 +0000 (17:44 +0100)]
contrib: add systemd unit files
This lets distros standardize on how QEMU should install systemd
services for qemu-ga and qemu-pr-helper.
The qemu-ga unit file comes from Fedora, but I checked that
Debian is using the same path for the virtio-serisal port.
I would like to include this in 2.11, so that the qemu-pr-helper
socket can be standardized across distros. Note however that
the files are not installed. We can add a configure option
in 2.12 perhaps, but it's too late now; documenting the files
in the release notes should do.
linzhecheng [Tue, 28 Nov 2017 04:46:56 +0000 (12:46 +0800)]
qemu-thread: fix races on threads that exit very quickly
If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault with low probability.
The backtrace is:
#0 0x00007f46c60291d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f46c602a8c8 in __GI_abort () at abort.c:90
#2 0x00000000008543c9 in PAT_abort ()
#3 0x000000000085140d in patchIllInsHandler ()
#4 <signal handler called>
#5 pthread_detach (th=139933037614848) at pthread_detach.c:50
#6 0x0000000000829759 in qemu_thread_create (thread=thread@entry=0x7ffdaa8205e0, name=name@entry=0x94d94a "io-task-worker", start_routine=start_routine@entry=0x7eb9a0 <qio_task_thread_worker>,
arg=arg@entry=0x3f5cf70, mode=mode@entry=1) at util/qemu_thread_posix.c:512
#7 0x00000000007ebc96 in qio_task_run_in_thread (task=0x31db2c0, worker=worker@entry=0x7e7e40 <qio_channel_socket_connect_worker>, opaque=0xcd23380, destroy=0x7f1180 <qapi_free_SocketAddress>)
at io/task.c:141
#8 0x00000000007e7f33 in qio_channel_socket_connect_async (ioc=ioc@entry=0x626c0b0, addr=<optimized out>, callback=callback@entry=0x55e080 <qemu_chr_socket_connected>, opaque=opaque@entry=0x42862c0,
destroy=destroy@entry=0x0) at io/channel_socket.c:194
#9 0x000000000055bdd1 in socket_reconnect_timeout (opaque=0x42862c0) at qemu_char.c:4744
#10 0x00007f46c72483b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
#11 0x00007f46c724799a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x000000000076c646 in glib_pollfds_poll () at main_loop.c:228
#13 0x000000000076c6eb in os_host_main_loop_wait (timeout=348000000) at main_loop.c:273
#14 0x000000000076c815 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:521
#15 0x000000000056a511 in main_loop () at vl.c:2076
#16 0x0000000000420705 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4940
The cause of this problem is a glibc bug; for more information, see
https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
The solution for this bug is to use pthread_attr_setdetachstate.
There is a similar issue with pthread_setname_np, which is moved
from creating thread to created thread.
Peter Maydell [Wed, 20 Dec 2017 20:38:36 +0000 (20:38 +0000)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-12-20' into staging
QAPI patches for 2017-12-20
# gpg: Signature made Wed 20 Dec 2017 18:53:28 GMT
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-12-20:
qmp: remove qmp_cpu
qapi-docs: fix a comment typo
qapi2texi: De-duplicate code to add blank line before symbol
qapi: Rename QAPIDoc.parser, .section to ._parser, ._section
qapi2texi: Simplify representation of section text
qapi: Simplify representation of QAPIDoc section text
qapi: Unify representation of doc section without name
qapi2texi: Clean up texi_sections()
tests/qapi-schema/doc-bad-section: New, factored out of doc-good
qapi: Make cur_doc local to QAPISchemaParser.__init__()
qapi: Eliminate QAPISchemaParser.__init__()'s local fname
qapi: Stop rejecting #optional
qapi-schema: Fix query-vm-generation-id's doc comment markup
'qmp_cpu' was implemented in commit 755f196898 ("qapi: Convert the cpu
command") as a functional no-op, a QMP call that does nothing and
return success. The idea, apparently, was to provide a counterpart
for the HMP 'hmp_cpu' command, introduced in the same commit.
After 6 years of its creation, qmp_cpu remains a functional no-op
that does nothing, having no value for any caller/user. A proposal
was sent to implement qmp_cpu like hmp_cpu works, but it was denied
[1]. The reason is that QMP must be as stateless as possible and a
function that changes its state (the current CPU monitor in the case
of qmp_cpu) goes against it. Any QMP command that needs a specific
monitor CPU setup must provide it in its arguments, instead of relying
in the current QMP monitor state.
After discussions that happened in [2] it was decided that a command
that does nothing since its birth, no one uses for anything and will
not be implemented, should be deprecated and erased. Given that we will
*not* provide any replacement for qmp_cpu and we believe that there
is no user relying on it, there is no point in adding a deprecation
delay for it.
So, this patch nukes qmp_cpu from QEMU code, removing both its blank
implementation in qmp.c and its doc in qapi-schema.json.
qapi: Make cur_doc local to QAPISchemaParser.__init__()
QAPISchemaParser.cur_doc is used only by .__init__() and its helper
.reject_expr_doc(). Make it local to __init__() and pass it to
.reject_expr_doc() explicitly.
Commit 1d8bda1 got rid of #optional tags, and added a check to keep
them from getting added back, to make sure patches then in flight
don't add them back. It's been six months, time to drop that check.
It's hard to get all images to have all these packages, the usual
"FEATURES" and "require" mechanism doesn't scale with so many features.
With that change, the test basically only works in ubuntu.
Until a better way comes up, leave the feature enabling to ./configure
detection.
Peter Maydell [Wed, 20 Dec 2017 11:30:55 +0000 (11:30 +0000)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
v2:
* Fixed incorrect virtio_blk_data_plane_create() local_err refactoring in
"hw/block: Use errp directly rather than local_err" that broke virtio-blk
over virtio-mmio [Peter]
# gpg: Signature made Tue 19 Dec 2017 15:08:14 GMT
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request: (23 commits)
qemu-iotests: add 203 savevm with IOThreads test
iothread: fix iothread_stop() race condition
iotests: add VM.add_object()
blockdev: add x-blockdev-set-iothread force boolean
docs: mark nested AioContext locking as a legacy API
block: avoid recursive AioContext acquire in bdrv_inactivate_all()
virtio-blk: reject configs with logical block size > physical block size
virtio-blk: make queue size configurable
qemu-iotests: add 202 external snapshots IOThread test
blockdev: add x-blockdev-set-iothread testing command
iothread: add iothread_by_id() API
block: drop unused BlockDirtyBitmapState->aio_context field
block: don't keep AioContext acquired after internal_snapshot_prepare()
block: don't keep AioContext acquired after blockdev_backup_prepare()
block: don't keep AioContext acquired after drive_backup_prepare()
block: don't keep AioContext acquired after external_snapshot_prepare()
blockdev: hold AioContext for bdrv_unref() in external_snapshot_clean()
qdev: drop unused #include "sysemu/iothread.h"
dev-storage: Fix the unusual function name
hw/block: Use errp directly rather than local_err
...
Signed-off-by: Peter Maydell <[email protected]>
# Conflicts:
# hw/core/qdev-properties-system.c
option: Drop unused get_param_value(), get_next_param_value()
Their last user went away in commit f51074cdc6, "pci-hotplug-old: Has
been dead for five major releases, bury", v2.3.0. Remove them, as new
code should use QemuOpts or maybe keyval_parse() instead.
qemu-options: Move -iscsi under "Block device options"
-iscsi ended up under the "Device URL Syntax" heading by a sequence of
errors, as explained in the previous commit. Move it under the "Block
device options" heading. Nothing left under "Device URL Syntax";
drop the heading.
qemu-options qemu-doc: Move "Device URL Syntax" to qemu-doc
Commit 0f5314a (v1.0) added section "Device URL Syntax" to
qemu-options.hx. It's enclosed in STEXI..ETEXI, thus affects only
qemu-options.texi, not --help. It appears as a subsection under
section "Invocation". Similarly, qemu.1 has it as a subsection under
"OPTIONS".
Commit f9dadc9 (v1.1.0) dropped new option -iscsi into the middle of
this section. No effect on qemu-options.texi. It appears in --help
run together with the "Bluetooth(R) options:" header.
Commit c70a01e (v1.5.0) gives it is own heading in --help by moving
commit 0f5314a's DEFHEADING(Device URL Syntax:) outside STEXI..ETEXI.
Trouble is the heading makes no sense for -iscsi.
Move all of the "Device URL Syntax" Texinfo to qemu-doc.texi. Mark it
for inclusion in qemu.1 with '@c man begin NOTES'. This turns it into
a separate section outside the list of options both in qemu-doc and in
qemu.1.
There's substantial overlap with the existing qemu-doc section "Disk
Images". Mark with a TODO comment.
The table of option parameters lacks @table and @end table. The
parameters become items in the enclosing table of options. Screwed up
when l2tpv3 was added in commit 3fb69aa. Fix the obvious way.
qemu-options: Remove stray colons from output of --help
Commit 43f187a broke --help: it put colons into blank lines. It
removed the colon from DEFHEADING(TITLE:) and added it back in the
macro expansion of DEFHEADING(TITLE), so hxtool can emit "@subsection
TITLE" more easily. Trouble is it's added back even for the blank
lines made with DEFHEADING().
Put the colons back where they were before commit 43f187a, and strip
them in hxtool instead.
Peter Maydell [Tue, 19 Dec 2017 19:11:11 +0000 (19:11 +0000)]
Merge remote-tracking branch 'remotes/aurel/tags/pull-target-sh4-20171218' into staging
Queued target/sh4 patches
# gpg: Signature made Mon 18 Dec 2017 22:36:42 GMT
# gpg: using RSA key 0x1388C0F899E8336B
# gpg: Good signature from "Aurelien Jarno <[email protected]>"
# gpg: aka "Aurelien Jarno <[email protected]>"
# gpg: aka "Aurelien Jarno <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7746 2642 A9EF 94FD 0F77 196D BA9C 7806 1DDD 8C9B
# Subkey fingerprint: 52BC 8695 BE34 F90A D7D4 0CB8 1388 C0F8 99E8 336B
* remotes/aurel/tags/pull-target-sh4-20171218:
target/sh4: Convert to DisasContextBase
target/sh4: Do not singlestep after exceptions
target/sh4: Convert to DisasJumpType
target/sh4: Use cmpxchg for movco when parallel_cpus
target/sh4: fix TCG leak during gusa sequence
target/sh4: add missing tcg_temp_free() in _decode_opc()
* remotes/cody/tags/block-pull-request:
block/curl: fix minor memory leaks
block/curl: check error return of curl_global_init()
block/sheepdog: code beautification
block/sheepdog: remove spurious NULL check
blockjob: kick jobs on set-speed
backup: use copy_bitmap in incremental backup
backup: simplify non-dirty bits progress processing
backup: init copy_bitmap from sync_bitmap for incremental
backup: move from done_bitmap to copy_bitmap
hbitmap: add next_zero function
Peter Maydell [Tue, 19 Dec 2017 12:48:56 +0000 (12:48 +0000)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-12-19-1' into staging
Merge tpm 2017/12/19 v1
# gpg: Signature made Tue 19 Dec 2017 11:51:13 GMT
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-12-19-1:
tpm: move qdev_prop_tpm to hw/tpm/
Cornelia Huck [Mon, 18 Dec 2017 18:36:31 +0000 (19:36 +0100)]
tpm: move qdev_prop_tpm to hw/tpm/
Building with --disable-tpm yields
../hw/core/qdev-properties-system.o: In function `set_tpm':
/home/cohuck/git/qemu/hw/core/qdev-properties-system.c:274: undefined reference to `qemu_find_tpm_be'
/home/cohuck/git/qemu/hw/core/qdev-properties-system.c:278: undefined reference to `tpm_backend_init'
../hw/core/qdev-properties-system.o: In function `release_tpm':
/home/cohuck/git/qemu/hw/core/qdev-properties-system.c:291: undefined reference to `tpm_backend_reset'
Move the implementation of DEFINE_PROP_TPMBE to hw/tpm/ so that it is
only built when tpm is actually configured, and build tpm_util in every
case.
Stefan Hajnoczi [Thu, 7 Dec 2017 20:13:19 +0000 (20:13 +0000)]
iothread: fix iothread_stop() race condition
There is a small chance that iothread_stop() hangs as follows:
Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
#0 0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
#1 0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3 0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
#4 0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
#5 0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
#6 0x00007f64012cce6f in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
#0 0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
#1 0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
#2 0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
#3 0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
#4 0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
#5 0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
#6 0x0000559599680a6e in iothread_stop_all () at iothread.c:341
#7 0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913
The relevant code from iothread_run() is:
while (!atomic_read(&iothread->stopping)) {
aio_poll(iothread->ctx, true);
1. IOThread:
while (!atomic_read(&iothread->stopping)) -> stopping=false
2. Main loop:
iothread->stopping = true;
aio_notify(iothread->ctx);
3. IOThread:
aio_poll(iothread->ctx, true); -> hang
The bug is explained by the AioContext->notify_me doc comments:
"If this field is 0, everything (file descriptors, bottom halves,
timers) will be re-evaluated before the next blocking poll(), thus the
event_notifier_set call can be skipped."
The problem is that "everything" does not include checking
iothread->stopping. This means iothread_run() will block in aio_poll()
if aio_notify() was called just before aio_poll().
This patch fixes the hang by replacing aio_notify() with
aio_bh_schedule_oneshot(). This makes aio_poll() or g_main_loop_run()
to return.
Implementing this properly required a new bool running flag. The new
flag prevents races that are tricky if we try to use iothread->stopping.
Now iothread->stopping is purely for iothread_stop() and
iothread->running is purely for the iothread_run() thread.
Stefan Hajnoczi [Thu, 7 Dec 2017 20:13:17 +0000 (20:13 +0000)]
blockdev: add x-blockdev-set-iothread force boolean
When a node is already associated with a BlockBackend the
x-blockdev-set-iothread command refuses to set the IOThread. This is to
prevent accidentally changing the IOThread when the nodes are in use.
When the nodes are created with -drive they automatically get a
BlockBackend. In that case we know nothing is using them yet and it's
safe to set the IOThread. Add a force boolean to override the check.
Paolo Bonzini [Thu, 7 Dec 2017 20:13:15 +0000 (20:13 +0000)]
block: avoid recursive AioContext acquire in bdrv_inactivate_all()
BDRV_POLL_WHILE() does not support recursive AioContext locking. It
only releases the AioContext lock once regardless of how many times the
caller has acquired it. This results in a hang since the IOThread does
not make progress while the AioContext is still locked.
virtio-blk logical block size should never be larger than physical block
size because it doesn't make sense to have such configurations. QEMU doesn't
have a way to effectively express this condition; the best it can do is
report the physical block exponent as 0 - indicating the logical block size
equals the physical block size.
Mark Kanda [Mon, 11 Dec 2017 15:16:24 +0000 (09:16 -0600)]
virtio-blk: make queue size configurable
Depending on the configuration, it can be beneficial to adjust the virtio-blk
queue size to something other than the current default of 128. Add a new
property to make the queue size configurable.