Modify qemu_opts_print_help():
- to print expected argument type
- skip description if not available
- sort lines
- prefix with the list name (like qdev, to avoid confusion)
- drop 16-chars alignment, use a '-' as seperator for option name and
description
For ex, "-spice help" output is changed from:
port No description available
tls-port No description available
addr No description available
[...]
gl No description available
rendernode No description available
size Virtual disk size
compat Compatibility level (0.10 or 1.1)
backing_file File name of a base image
[...]
lazy_refcounts Postpone refcount updates
refcount_bits Width of a reference count entry in bits
to:
backing_file=str - File name of a base image
backing_fmt=str - Image format of the base image
cluster_size=size - qcow2 cluster size
[...]
refcount_bits=num - Width of a reference count entry in bits
size=size - Virtual disk size
qemu-option: add help fallback to print the list of options
QDev options accept 'help' (or '?', but that's problematic with shell
globbing) in the list of parameters, which is handy to list the
available options.
Unfortunately, this isn't built in QemuOpts. qemu_opts_parse_noisily()
seems to be the common path for command line options, so place a
fallback to print help, listing the available options.
This is quite handy, for example with qemu "-spice help".
qdev_device_help() is used from command line "-device help", or from
HMP "device_add". If used from command line, print help to stdout
(it is only printed on explicit demand).
Peter Maydell [Wed, 3 Oct 2018 13:07:49 +0000 (14:07 +0100)]
Merge remote-tracking branch 'remotes/elmarco/tags/chardev-pull-request' into staging
chardev patches
# gpg: Signature made Wed 03 Oct 2018 11:57:34 BST
# gpg: using RSA key DAE8E10975969CE5
# gpg: Good signature from "Marc-André Lureau <[email protected]>"
# gpg: aka "Marc-André Lureau <[email protected]>"
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5
* remotes/elmarco/tags/chardev-pull-request:
chardev: use a child source for qio input source
chardev: mark the calls that allow an implicit mux monitor
char.h: fix gtk-doc comment style
chardev: unref if underlying chardev has no parent
chardev: remove qemu_chr_fe_read_all() counter
chardev: avoid crash if no associated address
chardev: mark the calls that allow an implicit mux monitor
This is mostly for readability of the code. Let's make it clear which
callers can create an implicit monitor when the chardev is muxed.
This will also enforce a safer behaviour, as we don't really support
creating monitor anywhere/anytime at the moment. Add an assert() to
make sure the programmer explicitely wanted that behaviour.
There are documented cases, such as: -serial/-parallel/-virtioconsole
and to less extent -debugcon.
Less obvious and questionable ones are -gdb, SLIRP -guestfwd and Xen
console. Add a FIXME note for those, but keep the support for now.
Other qemu_chr_new() callers either have a fixed parameter/filename
string or do not need it, such as -qtest:
* qtest.c: qtest_init()
Afaik, only used by tests/libqtest.c, without mux. I don't think we
support it outside of qemu testing: drop support for implicit mux
monitor (qemu_chr_new() call: no implicit mux now).
* hw/
All with literal @filename argument that doesn't enable mux monitor.
* tests/
All with @filename argument that doesn't enable mux monitor.
On a related note, the list of monitor creation places:
- the chardev creators listed above: all from command line (except
perhaps Xen console?)
- -gdb & hmp gdbserver will create a "GDB monitor command" chardev
that is wired to an HMP monitor.
- -mon command line option
From this short study, I would like to think that a monitor may only
be created in the main thread today, though I remain skeptical :)
chardev: unref if underlying chardev has no parent
It's possible to write code creating a chardev backend that is not
registered. When it is not user-created, it makes sense to keep it
hidden. Let the associated frontend destroy it also in this case.
There is no obvious reason to have a loop counter. This limits from
reading several megabytes large buffers in one go, since socket
read/write usually have a limit.
A socket chardev may not have associated address (when adding client
fd manually for example). But on disconnect, updating socket filename
expects an address and may lead to this crash:
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
388 switch (addr->type) {
(gdb) bt
#0 0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
#1 0x0000555555d8c8aa in update_disconnected_filename (s=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:419
#2 0x0000555555d8c959 in tcp_chr_disconnect (chr=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:438
#3 0x0000555555d8cba1 in tcp_chr_hup (channel=0x555556b75690, cond=G_IO_HUP, opaque=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482
#4 0x0000555555da596e in qio_channel_fd_source_dispatch (source=0x555556bb68b0, callback=0x555555d8cb58 <tcp_chr_hup>, user_data=0x555556b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84
Replace filename with a generic "disconnected:socket" in this case.
* remotes/bonzini/tags/for-upstream: (80 commits)
hw/scsi/mptendian: Avoid taking address of fields in packed structs
cpus: fix TCG kick timer leak
docs/devel/memory.txt: Document _with_attrs accessors
hw/nvram/fw_cfg: Use memberwise copy of MemoryRegionOps struct
memory: Remove old_mmio accessors
memory: Fix access_with_adjusted_size(small size) on big-endian memory regions
memory: Refactor common shifting code from accessors
memory: Use MAKE_64BIT_MASK()
virtio: do not take address of packed members
replay: replay BH for IDE trim operation
hostmem-file: make available memory-backend-file on POSIX-based hosts
target/i386: fix translation for icount mode
hvf: drop unused variable
qom/object: add some interface asserts
accel/tcg: Remove dead code
lsi53c895a: convert to trace-events
scsi-block: Deprecate rotation_rate
kvmclock: run KVM_KVMCLOCK_CTRL ioctl in vcpu thread
MAINTAINERS: add myself as elf2dmp maintainer
contrib: add elf2dmp tool
...
Peter Maydell [Thu, 27 Sep 2018 13:48:52 +0000 (14:48 +0100)]
hw/scsi/mptendian: Avoid taking address of fields in packed structs
Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.
This patch was produced with the following simple spatch script:
@@
expression E;
@@
-le16_to_cpus(&E);
+E = le16_to_cpu(E);
@@
expression E;
@@
-le32_to_cpus(&E);
+E = le32_to_cpu(E);
@@
expression E;
@@
-le64_to_cpus(&E);
+E = le64_to_cpu(E);
@@
expression E;
@@
-cpu_to_le16s(&E);
+E = cpu_to_le16(E);
@@
expression E;
@@
-cpu_to_le32s(&E);
+E = cpu_to_le32(E);
@@
expression E;
@@
-cpu_to_le64s(&E);
+E = cpu_to_le64(E);
followed by some minor tidying of overlong lines and bad indent.
Peter Maydell [Fri, 24 Aug 2018 17:04:21 +0000 (18:04 +0100)]
hw/nvram/fw_cfg: Use memberwise copy of MemoryRegionOps struct
We've now removed the 'old_mmio' member from MemoryRegionOps,
so we can perform the copy as a simple struct copy rather
than having to do it via a memberwise copy.
memory: Fix access_with_adjusted_size(small size) on big-endian memory regions
Memory regions configured as DEVICE_BIG_ENDIAN (or DEVICE_NATIVE_ENDIAN on
big-endian guest) behave incorrectly when the memory access 'size' is smaller
than the implementation 'access_size'.
In the following code segment from access_with_adjusted_size():
if (memory_region_big_endian(mr)) {
for (i = 0; i < size; i += access_size) {
r |= access_fn(mr, addr + i, value, access_size,
(size - access_size - i) * 8, access_mask, attrs);
}
(size - access_size - i) * 8 is the number of bits that will arithmetic
shift the current value.
Currently we can only 'left' shift a read() access, and 'right' shift a write().
When the access 'size' is smaller than the implementation, we get a negative
number of bits to shift.
For the read() case, a negative 'left' shift is a 'right' shift :)
However since the 'shift' type is unsigned, there is currently no way to
right shift.
Fix this by changing the access_fn() prototype to handle signed shift values,
and modify the memory_region_shift_read|write_access() helpers to correctly
arithmetic shift the opposite direction when the 'shift' value is negative.
Paolo Bonzini [Wed, 26 Sep 2018 21:17:42 +0000 (23:17 +0200)]
virtio: do not take address of packed members
The address of a packed member is not packed, which may cause accesses
to unaligned pointers. Avoid this by reading the packed value before
passing it to another function.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:19:50 +0000 (11:19 +0300)]
replay: replay BH for IDE trim operation
This patch makes IDE trim BH deterministic, because it affects
the device state. Therefore its invocation should be replayed
instead of running at the random moment.
hostmem-file: make available memory-backend-file on POSIX-based hosts
Before this change, memory-backend-file object is valid for Linux hosts
only because hostmem-file.c is compiled only on Linux hosts.
However, other POSIX-based hosts (such as macOS) can support
memory-backend-file object in the same way as on Linux hosts.
This patch makes hostmem-file.c and related functions to be compiled on
all POSIX-based hosts to make available memory-backend-file on them.
Pavel Dovgalyuk [Thu, 20 Sep 2018 07:17:03 +0000 (10:17 +0300)]
target/i386: fix translation for icount mode
This patch fixes the checking of boundary crossing instructions.
In icount mode only first instruction of the block may cross
the page boundary to keep the translation deterministic.
These conditions already existed, but compared the wrong variable.
Thomas Huth [Mon, 17 Sep 2018 17:08:54 +0000 (19:08 +0200)]
accel/tcg: Remove dead code
The global cpu_single_env variable has been removed more than 5 years
ago, so apparently nobody used this dead debug code in that timeframe
anymore. Thus let's remove it completely now.
This option is added together with scsi-disk but is never honoured,
becuase we don't emulate the VPD page for scsi-block. We could intercept
and inject the user specified value like for max xfer len, but it's
probably not helpful since the intent of 070f80095ad was for random
entropy aspects, not for performance. If emulated rotation rate is
desired, scsi-hd is more suitable.
kvmclock: run KVM_KVMCLOCK_CTRL ioctl in vcpu thread
According to KVM API Documentation, we should only
run vcpu ioctls from the same thread that was used
to create the vcpu. This patch makes KVM_KVMCLOCK_CTRL
ioctl consistent with the Documentation.
Viktor Prutyanov [Wed, 29 Aug 2018 12:41:25 +0000 (15:41 +0300)]
contrib: add elf2dmp tool
elf2dmp is a converter from ELF dump (produced by 'dump-guest-memory') to
Windows MEMORY.DMP format (also know as 'Complete Memory Dump') which can be
opened in WinDbg.
This tool can help if VMCoreInfo device/driver is absent in Windows VM and
'dump-guest-memory -w' is not available but dump can be created in ELF format.
The tool works as follows:
1. Determine the system paging root looking at GS_BASE or KERNEL_GS_BASE
to locate the PRCB structure and finds the kernel CR3 nearby if QEMU CPU
state CR3 is not suitable.
2. Find an address within the kernel image by dereferencing the first
IDT entry and scans virtual memory upwards until the start of the
kernel.
3. Download a PDB matching the kernel from the Microsoft symbol store,
and figure out the layout of certain relevant structures necessary for
the dump.
4. Populate the corresponding structures in the memory image and create
the appropriate dump header.
Paolo Bonzini [Tue, 21 Aug 2018 13:31:24 +0000 (15:31 +0200)]
target/i386: unify masking of interrupts
Interrupt handling depends on various flags in env->hflags or env->hflags2,
and the exact detail were not exactly replicated between x86_cpu_has_work
and x86_cpu_exec_interrupt. Create a new function that extracts the
highest-priority non-masked interrupt, and use it in both functions.
Paolo Bonzini [Wed, 22 Aug 2018 10:38:20 +0000 (12:38 +0200)]
char-pty: remove unnecessary #ifdef
For some reason __APPLE__ was not checked in pty code. However, the #ifdef
is redundant: this file is already compiled only if CONFIG_POSIX, same as
util/qemu-openpty.c which it uses.
Peter reported a test failure on FreeBSD with the new reconnect test:
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
gtester -k --verbose -m=quick tests/test-char
TEST: tests/test-char... (pid=16190)
/char/null: OK
/char/invalid: OK
/char/ringbuf: OK
/char/mux: OK
/char/stdio: OK
/char/pipe: OK
/char/file: OK
/char/file-fifo: OK
/char/udp: OK
/char/serial: OK
/char/hotswap: OK
/char/socket/basic: OK
/char/socket/reconnect: FAIL
GTester: last random seed: R02S521380d9c12f1dac3ad1763bf5665c27
(pid=16367)
/char/socket/fdpass: OK
FAIL: tests/test-char
**
ERROR:tests/test-char.c:353:char_socket_test_common: assertion failed:
(object_property_get_bool(OBJECT(chr_client), "connected",
&error_abort))
It turns out that the socket test code checks both server and client
connection states, but doesn't wait for both.
This commit broke "reconnect socket" chardev that are created after
"machine_done": they no longer try to connect. It broke also
vhost-user-test that uses chardev while there is no "machine_done"
event.
The goal of this patch was to move the "connect" source to the
frontend context. chr->gcontext is set with
qemu_chr_fe_set_handlers(). But there is no guarantee that it will be
called, so we can't delay connection until then: the chardev should
still attempt to connect during open(). qemu_chr_fe_set_handlers() is
eventually called later and will update the context.
Unless there is a good reason to not use initially the default
context, I think we should revert to the previous state to fix the
regressions.
Igor Mammedov [Tue, 4 Sep 2018 12:39:37 +0000 (14:39 +0200)]
memory: cleanup side effects of memory_region_init_foo() on failure
if MemoryRegion intialization fails it's left in semi-initialized state,
where it's size is not 0 and attached as child to owner object.
And this leds to crash in following use-case:
(monitor) object_add memory-backend-file,id=mem1,size=99999G,mem-path=/tmp/foo,discard-data=yes
memory.c:2083: memory_region_get_ram_ptr: Assertion `mr->ram_block' failed
Aborted (core dumped)
it happens due to assumption that memory region is intialized when
memory_region_size() != 0
and therefore it's ok to access it in
file_backend_unparent()
if (memory_region_size() != 0)
memory_region_get_ram_ptr()
which happens when object_add fails and unparents failed backend making
file_backend_unparent() access invalid memory region.
Fix it by making sure that memory_region_init_foo() APIs cleanup externally
visible side effects on failure (like set size to 0 and unparenting object)
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:20:13 +0000 (11:20 +0300)]
ui: fix virtual timers
UI uses timers based on virtual clock for managing key queue.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed. But these timers should
stop when the guest is not executing.
This patch changes using virtual clock to the new virtual_ext clock,
which runs as virtual clock, but its timers are not saved to the log.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:20:07 +0000 (11:20 +0300)]
slirp: fix ipv6 timers
ICMP implementation for IPv6 uses timers based on virtual clock.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed.
This patch changes using virtual clock to the new virtual_ext clock.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:20:02 +0000 (11:20 +0300)]
timer: introduce new virtual clock
Slirp and VNC modules use virtual clock for processing some events that
are related to the guest execution speed.
But virtual clock-related events are consideres to be deterministic and
are recorded/replayed by icount mechanism. But slirp and VNC lie outside
the recorded guest core (which includes CPU and peripherals).
Therefore slirp and VNC are external for the guest, but should work at
guest speed.
This patch introduces new virtual clock which can be used for external
subsystems for running timers that are synchronized with the guest.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:19:39 +0000 (11:19 +0300)]
replay: allow loading any snapshots before recording
This patch enables using -loadvm in recording mode to allow starting
the execution recording from any of the available snapshots.
It also fixes loading of the record/replay state, therefore snapshots
created in replay mode may also be used for starting the new recording.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:19:10 +0000 (11:19 +0300)]
translator: fix breakpoint processing
QEMU cannot pass through the breakpoints when 'si' command is used
in remote gdb. This patch disables inserting the breakpoints
when we are already single stepping though the gdb remote protocol.
This patch also fixes icount calculation for the blocks that include
breakpoints - instruction with breakpoint is not executed and shouldn't
be used in icount calculation.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:18:59 +0000 (11:18 +0300)]
replay: flush events when exiting
This patch adds events processing when emulation finishes instead
of just cleaning the queue. Now the bdrv coroutines will be in consistent
state when emulator closes. It allows correct polling of the block layer
at exit.
Pavel Dovgalyuk [Wed, 12 Sep 2018 08:19:45 +0000 (11:19 +0300)]
replay: wake up vCPU when replaying
In record/replay icount mode vCPU thread and iothread synchronize
the execution using the checkpoints.
vCPU thread processes the virtual timers and iothread processes all others.
When iothread wants to wake up sleeping vCPU thread, it sends dummy queued
work. Therefore it could be the following sequence of the events in
record mode:
- IO: sending dummy work
- IO: processing timers
- CPU: wakeup
- CPU: clearing dummy work
- CPU: processing virtual timers
But due to the races in replay mode the sequence may change:
- IO: sending dummy work
- CPU: wakeup
- CPU: clearing dummy work
- CPU: sleeping again because nothing to do
- IO: Processing timers
- CPU: zzzz
In this case vCPU will not wake up, because dummy work is not to be set up
again.
This patch tries to wake up the vCPU when it sleeps and the icount warp
checkpoint isn't met. It means that vCPU has something to do, because
there are no other reasons of non-matching warp checkpoint.
v5: improve checking that vCPU is still sleeping
Message-Id: <20180912081945.3228.19776.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <[email protected]>
Li Zhijian [Thu, 13 Sep 2018 10:07:13 +0000 (18:07 +0800)]
change get_image_size return type to int64_t
Previously, if the size of initrd >=2G, qemu exits with error:
root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic
qemu: error reading initrd large.cgz: No such file or directory
root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz
2.5G large.cgz
this patch changes the caller side that use this function to calculate
size of initrd file as well.
v2: update error message and int64_t printing format
Register an exit notifier to remove the PID file. By the time atexit()
is called, qemu_write_pidfile() guarantees QEMU owns the PID file,
thus we could safely remove it when exiting.
Paolo Bonzini [Tue, 11 Sep 2018 13:16:58 +0000 (15:16 +0200)]
serial: fix DLL writes
Commit 0147883450fe84bb8de2d4a58381881f4262ce9b tries to handle
word-sized writes to DLL/DLH, but due to a typo,
this patch is causing tracebacks in all Linux kernels running the PXA
serial driver, due to an unexpected DLL register value. Here is the
surrounding code from drivers/tty/serial/pxa.c:
serial_out(up, UART_DLL, quot & 0xff); /* LS of divisor */
/*
* work around Errata #75 according to Intel(R) PXA27x
* Processor Family Specification Update (Nov 2005)
*/
dll = serial_in(up, UART_DLL);
WARN_ON(dll != (quot & 0xff)); // <-- warning
On Linux, lockf() is just an interface on top of fcntl(2) locking.
Many other systems implement lockf() in this way, but note that
POSIX.1 leaves the relationship between lockf() and fcntl(2) locks
unspecified. A portable application should probably avoid mixing
calls to these interfaces.
IOW, if its just a shim around fcntl() on many systems, it is clearer
if we just use fcntl() directly, as we then know how fcntl() locks will
behave if they're on a network filesystem like NFS.
QEMU will leave the pidfile existing on disk when it exits which
initially made me think it avoids the deletion race. The app
managing QEMU, however, may well delete the pidfile after it has
seen QEMU exit, and even if the app locks the pidfile before
deleting it, there is still a race.
eg consider the following sequence
QEMU 1 libvirtd QEMU 2
1. lock(pidfile)
2. exit()
3. open(pidfile)
4. lock(pidfile)
5. open(pidfile)
6. unlink(pidfile)
7. close(pidfile)
8. lock(pidfile)
IOW, at step 8 the new QEMU has successfully acquired the lock, but
the pidfile no longer exists on disk because it was deleted after
the original QEMU exited.
While we could just say no external app should ever delete the
pidfile, I don't think that is satisfactory as people don't read
docs, and admins don't like stale pidfiles being left around on
disk.
To make this robust, I think we might want to copy libvirt's
approach to pidfile acquisition which runs in a loop and checks that
the file on disk /after/ acquiring the lock matches the file that
was locked. Then we could in fact safely let QEMU delete its own
pidfiles on clean exit..
hw/char/sh_serial: Add timeout handling to unbreak serial input
As of commit 18e8cf159177100e ("serial: sh-sci: increase RX FIFO trigger
defaults for (H)SCIF") in Linux v4.11-rc1, the serial console on the
QEMU SH4 target is broken: it delays serial input until enough data has
been received.
Since aforementioned commit, the Linux SCIF driver programs the Receive
FIFO Data Count Trigger bits in the FIFO Control Register, to postpone
generating a receive interrupt until:
1. At least the receive trigger count of bytes of data are available
in the receive FIFO, OR
2. No further data has been received for at least 15 etu after the
last received data.
While QEMU implements the former, it does not implement the latter.
Hence the receive interrupt is not generated until the former condition
is met.
Fix this by adding basic timeout handling. As the QEMU SCIF emulation
ignores any serial speed programming, the timeout value used conforms to
a default speed of 9600 bps, which is fine for any interactive console.
configure: preserve various environment variables in config.status
The config.status script is auto-generated by configure upon
completion. The intention is that config.status can be later invoked by
the developer directly, or by make indirectly, to re-detect the same
environment that configure originally used.
The current config.status script, however, only contains a record of the
command line arguments to configure. Various environment variables have
an effect on what configure will find. In particular PKG_CONFIG_LIBDIR &
PKG_CONFIG_PATH vars will affect what libraries pkg-config finds. The
PATH var will affect what toolchain binaries and XXXX-config scripts are
found. The LD_LIBRARY_PATH var will affect what libraries are
found. Most commands have env variables that will override the name/path
of the default version configure finds.
All these key env variables should be recorded in the config.status script.
Autoconf would also preserve CFLAGS, LDFLAGS, LIBS, CPPFLAGS, but QEMU
deals with those differently, expecting extra flags to be set using
configure args, rather than env variables. At the end of the script we
also don't have the original values of those env vars, as we modify them
during configure.
Jan Kiszka [Mon, 27 Aug 2018 08:47:51 +0000 (10:47 +0200)]
kvm: x86: Fix kvm_arch_fixup_msi_route for remap-less case
The AMD IOMMU does not (yet) support interrupt remapping. But
kvm_arch_fixup_msi_route assumes that all implementations do and crashes
when the AMD IOMMU is used in KVM mode.
hostmem-memfd: add checks before adding hostmem-memfd & properties
Run some memfd-related checks before registering hostmem-memfd &
various properties. This will help libvirt to figure out what the host
is supposed to be capable of.
qemu_memfd_check() is changed to a less optimized version, since it is
used with various flags, it no longer caches the result.
Viktor Prutyanov [Wed, 29 Aug 2018 18:30:56 +0000 (21:30 +0300)]
dump: fix Windows dump memory run mapping
We should map and use guest memory run by parts if it can't be mapped as
a whole.
After this patch, continuos guest physical memory blocks which are not
continuos in host virtual address space will be processed correctly.
Paolo Bonzini [Tue, 11 Sep 2018 11:15:32 +0000 (13:15 +0200)]
cpus: take seqlock across qemu_icount updates
Even though writes of qemu_icount can safely race with reads in
qemu_icount_raw, qemu_icount is also read by icount_adjust, which
runs in the I/O thread. Therefore, writes do needs protection of
the vm_clock_lock; for simplicity the patch protects it with both
seqlock+spinlock, which we already do for hosts that lack 64-bit atomics.
The bug actually predated the introduction of vm_clock_lock;
cpu_update_icount would have needed the BQL before the spinlock was
introduced.
test-rcu-list: access n_reclaims and n_nodes_removed with atomic64
To avoid undefined behaviour.
Note that these "atomics" are atomic in the "access once" sense.
The variables are updated by a single thread at a time, so no
"full" atomics are necessary.
With the seqlock, we either have to use atomics to remain
within defined behaviour (and note that 64-bit atomics aren't
always guaranteed to compile, irrespective of __nocheck), or
drop the atomics and be in undefined behaviour territory.
Fix it by dropping the seqlock and using atomic64 accessors.
This will limit scalability when !CONFIG_ATOMIC64, but those
machines (1) don't have many users and (2) are unlikely to
have many cores.
Pavel Dovgalyuk [Fri, 11 May 2018 08:16:01 +0000 (11:16 +0300)]
ps2: prevent changing irq state on save and load
Commit 2858ab09e6f708e381fc1a1cc87e747a690c4884 changed
PS/2 keyboard/mouse buffers to the standard size. However, its state
may change when migrating from the old buffer size and therefore irq needs
updating. But this change made wrong, because it throws the whole queue
if there are too much data instead of cropping it.
That commit also updates irq (because the queue state may change).
But updating the irq may change the VM state (and determinism of
the execution). E.g., when replaying the execution, one may save
the VM state and the state of the interrupt controller will be updated
at the moment of saving, instead of using the recorded update events.
This patch makes the queue update deterministic: it removes the update_irq
call and crops the queue to prevent losing the characters and changing
the required irq status.
virtio: Return true from virtio_queue_empty if broken
Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.
This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel with ATS enabled:
The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:
> ...
> #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
> #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
> #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
> #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
> #17 0x00005586607885da in run_poll_handlers_once
> #18 0x000055866078880e in try_poll_mode
> #19 0x00005586607888eb in aio_poll
> #20 0x0000558660784561 in aio_wait_bh_oneshot
> #21 0x00005586602b9582 in virtio_scsi_dataplane_stop
> #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
> #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
> #24 0x00005586605ab808 in virtio_pci_common_write
> #25 0x0000558660242396 in memory_region_write_accessor
> #26 0x00005586602425ab in access_with_adjusted_size
> #27 0x0000558660245281 in memory_region_dispatch_write
> #28 0x00005586601e008e in flatview_write_continue
> #29 0x00005586601e01d8 in flatview_write
> #30 0x00005586601e04de in address_space_write
> #31 0x00005586601e052f in address_space_rw
> #32 0x00005586602607f2 in kvm_cpu_exec
> #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
> #34 0x000055866078bde7 in qemu_thread_start
> #35 0x00007f5784906594 in start_thread
> #36 0x00007f5784639e6f in clone
With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.
To be consistent, update virtio_queue_empty_rcu as well.
Peter Maydell [Tue, 2 Oct 2018 08:54:44 +0000 (09:54 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/libfdt-20181002' into staging
Update dtc submodule to v1.4.7
We have some upcoming things planned for ppc that will require some
newer libfdt features. In preparation, update the dtc/libfdt
submodule to upstreasm version v1.4.7.
* remotes/xtensa/tags/20181001-xtensa:
target/xtensa: extract gen_check_interrupts call
target/xtensa: make rsr/wsr helpers return void
target/xtensa: extract unconditional TB termination via slot 0
target/xtensa: always end TB on CCOUNT access/CCOMPARE write
target/xtensa: change SR number checks to assertions
target/xtensa: extract unconditional TB termination
target/xtensa: extract test for division by zero
target/xtensa: extract test for cpdisabled exception
target/xtensa: extract test for alloca exception
target/xtensa: extract test for window underflow exception
target/xtensa: extract test for window overflow exception
target/xtensa: extract test for debug exception
target/xtensa: extract test for syscall instruction
target/xtensa: extract test for privileged instruction
target/xtensa: extract test for an illegal instruction