Git Repo - qemu.git/log

vl: handle -object help

List the user creatable objects.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

tests/qom-proplist: check class properties iterator

This test failed before "fix iterating properties over a class".

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

tests/qom-proplist: check properties are not listed multiple times

And factor out a common function used by the follow class properties
iterator test.

Fix uninitialized "seentype" variable.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

tests/qom-proplist: check duplicate "bv" property registration failed

"bv" is already a class property.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

qom/object: register 'type' property as class property

Let's save a few byte in each object instance.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

qom/object: fix iterating properties over a class

object_class_property_iter_init() starts from the given class, so the
next class should continue with the parent class.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

qemu-option: improve qemu_opts_print_help() output

Modify qemu_opts_print_help():
- to print expected argument type
- skip description if not available
- sort lines
- prefix with the list name (like qdev, to avoid confusion)
- drop 16-chars alignment, use a '-' as seperator for option name and
  description

For ex, "-spice help" output is changed from:

port             No description available
tls-port         No description available
addr             No description available
[...]
gl               No description available
rendernode       No description available

to:

spice.addr=str
spice.agent-mouse=bool (on/off)
spice.disable-agent-file-xfer=bool (on/off)
[...]
spice.x509-key-password=str
spice.zlib-glz-wan-compression=str

"qemu-img create -f qcow2 -o help", changed from:

size             Virtual disk size
compat           Compatibility level (0.10 or 1.1)
backing_file     File name of a base image
[...]
lazy_refcounts   Postpone refcount updates
refcount_bits    Width of a reference count entry in bits

to:

backing_file=str - File name of a base image
backing_fmt=str - Image format of the base image
cluster_size=size - qcow2 cluster size
[...]
refcount_bits=num - Width of a reference count entry in bits
size=size - Virtual disk size

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

qemu-option: add help fallback to print the list of options

QDev options accept 'help' (or '?', but that's problematic with shell
globbing) in the list of parameters, which is handy to list the
available options.

Unfortunately, this isn't built in QemuOpts. qemu_opts_parse_noisily()
seems to be the common path for command line options, so place a
fallback to print help, listing the available options.

This is quite handy, for example with qemu "-spice help".

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

cutils: add qemu_pstrcmp0()

A char** variant of g_strcmp0().

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>

qdev-monitor: print help to stdout

qdev_device_help() is used from command line "-device help", or from
HMP "device_add". If used from command line, print help to stdout
(it is only printed on explicit demand).

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

Merge remote-tracking branch 'remotes/elmarco/tags/chardev-pull-request' into staging

chardev patches

# gpg: Signature made Wed 03 Oct 2018 11:57:34 BST
# gpg:                using RSA key DAE8E10975969CE5
# gpg: Good signature from "Marc-André Lureau <[email protected]>"
# gpg:                 aka "Marc-André Lureau <[email protected]>"
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* remotes/elmarco/tags/chardev-pull-request:
  chardev: use a child source for qio input source
  chardev: mark the calls that allow an implicit mux monitor
  char.h: fix gtk-doc comment style
  chardev: unref if underlying chardev has no parent
  chardev: remove qemu_chr_fe_read_all() counter
  chardev: avoid crash if no associated address

Signed-off-by: Peter Maydell <[email protected]>

chardev: use a child source for qio input source

GLib child source were added with version 2.28. We can use them now
that we bumped our requirement to 2.40.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>

chardev: mark the calls that allow an implicit mux monitor

This is mostly for readability of the code. Let's make it clear which
callers can create an implicit monitor when the chardev is muxed.

This will also enforce a safer behaviour, as we don't really support
creating monitor anywhere/anytime at the moment. Add an assert() to
make sure the programmer explicitely wanted that behaviour.

There are documented cases, such as: -serial/-parallel/-virtioconsole
and to less extent -debugcon.

Less obvious and questionable ones are -gdb, SLIRP -guestfwd and Xen
console. Add a FIXME note for those, but keep the support for now.

Other qemu_chr_new() callers either have a fixed parameter/filename
string or do not need it, such as -qtest:

* qtest.c: qtest_init()
  Afaik, only used by tests/libqtest.c, without mux. I don't think we
  support it outside of qemu testing: drop support for implicit mux
  monitor (qemu_chr_new() call: no implicit mux now).

* hw/
  All with literal @filename argument that doesn't enable mux monitor.

* tests/
  All with @filename argument that doesn't enable mux monitor.

On a related note, the list of monitor creation places:

- the chardev creators listed above: all from command line (except
  perhaps Xen console?)

- -gdb & hmp gdbserver will create a "GDB monitor command" chardev
  that is wired to an HMP monitor.

- -mon command line option

From this short study, I would like to think that a monitor may only
be created in the main thread today, though I remain skeptical :)

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

char.h: fix gtk-doc comment style

Fix up conformance to GTK-Doc function comment style, as documented in
https://developer.gnome.org/gtk-doc-manual/stable/documenting_symbols.html.en

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

chardev: unref if underlying chardev has no parent

It's possible to write code creating a chardev backend that is not
registered. When it is not user-created, it makes sense to keep it
hidden. Let the associated frontend destroy it also in this case.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>

chardev: remove qemu_chr_fe_read_all() counter

There is no obvious reason to have a loop counter. This limits from
reading several megabytes large buffers in one go, since socket
read/write usually have a limit.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

chardev: avoid crash if no associated address

A socket chardev may not have associated address (when adding client
fd manually for example). But on disconnect, updating socket filename
expects an address and may lead to this crash:

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  388     switch (addr->type) {
  (gdb) bt
  #0  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  #1  0x0000555555d8c8aa in update_disconnected_filename (s=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:419
  #2  0x0000555555d8c959 in tcp_chr_disconnect (chr=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:438
  #3  0x0000555555d8cba1 in tcp_chr_hup (channel=0x555556b75690, cond=G_IO_HUP, opaque=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482
  #4  0x0000555555da596e in qio_channel_fd_source_dispatch (source=0x555556bb68b0, callback=0x555555d8cb58 <tcp_chr_hup>, user_data=0x555556b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84

Replace filename with a generic "disconnected:socket" in this case.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* configure fix for environment variables (Daniel)
* fix memory leaks (Alex)
* x86_64 MTTCG fixes (Emilio)
* introduce atomic64 (Emilio)
* Fix for virtio hang (Fam, myself)
* SH serial port fix (Geert)
* Deprecate rotation_rate for scsi-block (Fam)
* Extend memory-backend-file availability to all POSIX hosts (Hikaru)
* Memory API cleanups and fixes (Igor, Li Qiang, Peter, Philippe)
* MSI/IOMMU fix (Jan)
* Socket reconnection fixes (Marc-André)
* icount fixes (Emilio, myself)
* QSP fixes for Coverity (myself)
* Some record/replay improovements (Pavel)
* Packed struct fixes (Peter)
* Windows dump fixes and elf2dmp (Viktor)
* kbmclock fix (Yongji)

# gpg: Signature made Tue 02 Oct 2018 18:13:12 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (80 commits)
  hw/scsi/mptendian: Avoid taking address of fields in packed structs
  cpus: fix TCG kick timer leak
  docs/devel/memory.txt: Document _with_attrs accessors
  hw/nvram/fw_cfg: Use memberwise copy of MemoryRegionOps struct
  memory: Remove old_mmio accessors
  memory: Fix access_with_adjusted_size(small size) on big-endian memory regions
  memory: Refactor common shifting code from accessors
  memory: Use MAKE_64BIT_MASK()
  virtio: do not take address of packed members
  replay: replay BH for IDE trim operation
  hostmem-file: make available memory-backend-file on POSIX-based hosts
  target/i386: fix translation for icount mode
  hvf: drop unused variable
  qom/object: add some interface asserts
  accel/tcg: Remove dead code
  lsi53c895a: convert to trace-events
  scsi-block: Deprecate rotation_rate
  kvmclock: run KVM_KVMCLOCK_CTRL ioctl in vcpu thread
  MAINTAINERS: add myself as elf2dmp maintainer
  contrib: add elf2dmp tool
  ...

Signed-off-by: Peter Maydell <[email protected]>

hw/scsi/mptendian: Avoid taking address of fields in packed structs

Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this. Avoid the bug by not using the
"modify in place" byte swapping functions.

This patch was produced with the following simple spatch script:
@@
expression E;
@@
-le16_to_cpus(&E);
+E = le16_to_cpu(E);
@@
expression E;
@@
-le32_to_cpus(&E);
+E = le32_to_cpu(E);
@@
expression E;
@@
-le64_to_cpus(&E);
+E = le64_to_cpu(E);
@@
expression E;
@@
-cpu_to_le16s(&E);
+E = cpu_to_le16(E);
@@
expression E;
@@
-cpu_to_le32s(&E);
+E = cpu_to_le32(E);
@@
expression E;
@@
-cpu_to_le64s(&E);
+E = cpu_to_le64(E);

followed by some minor tidying of overlong lines and bad indent.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <20180927134852 [email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cpus: fix TCG kick timer leak

This is an alternative fix to Marc-André's original patch.

Reported-by: Marc-André Lureau <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <20180927171724 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

docs/devel/memory.txt: Document _with_attrs accessors

When we added the _with_attrs accessors we forgot to mention
them in the documentation.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <20180824170422 [email protected]>
Based-on: <20180802174042 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw/nvram/fw_cfg: Use memberwise copy of MemoryRegionOps struct

We've now removed the 'old_mmio' member from MemoryRegionOps,
so we can perform the copy as a simple struct copy rather
than having to do it via a memberwise copy.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <20180824170422 [email protected]>
Based-on: <20180802174042 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Remove old_mmio accessors

Now that all the users of old_mmio MemoryRegion accessors
have been converted, we can remove the core code support.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <20180824170422 [email protected]>
Based-on: <20180802174042 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Fix access_with_adjusted_size(small size) on big-endian memory regions

Memory regions configured as DEVICE_BIG_ENDIAN (or DEVICE_NATIVE_ENDIAN on
big-endian guest) behave incorrectly when the memory access 'size' is smaller
than the implementation 'access_size'.

In the following code segment from access_with_adjusted_size():

    if (memory_region_big_endian(mr)) {
        for (i = 0; i < size; i += access_size) {
            r |= access_fn(mr, addr + i, value, access_size,
                        (size - access_size - i) * 8, access_mask, attrs);
        }

(size - access_size - i) * 8 is the number of bits that will arithmetic
shift the current value.

Currently we can only 'left' shift a read() access, and 'right' shift a write().

When the access 'size' is smaller than the implementation, we get a negative
number of bits to shift.

For the read() case, a negative 'left' shift is a 'right' shift :)
However since the 'shift' type is unsigned, there is currently no way to
right shift.

Fix this by changing the access_fn() prototype to handle signed shift values,
and modify the memory_region_shift_read|write_access() helpers to correctly
arithmetic shift the opposite direction when the 'shift' value is negative.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20180927002416 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Refactor common shifting code from accessors

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20180927002416 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Use MAKE_64BIT_MASK()

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20180927002416 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

virtio: do not take address of packed members

The address of a packed member is not packed, which may cause accesses
to unaligned pointers. Avoid this by reading the packed value before
passing it to another function.

Cc: Jason Wang <[email protected]>
Cc: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

replay: replay BH for IDE trim operation

This patch makes IDE trim BH deterministic, because it affects
the device state. Therefore its invocation should be replayed
instead of running at the random moment.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-Id: <20180912081950.3228.68987.stgit@pasha-VirtualBox>
Acked-by: John Snow <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hostmem-file: make available memory-backend-file on POSIX-based hosts

Before this change, memory-backend-file object is valid for Linux hosts
only because hostmem-file.c is compiled only on Linux hosts.
However, other POSIX-based hosts (such as macOS) can support
memory-backend-file object in the same way as on Linux hosts.
This patch makes hostmem-file.c and related functions to be compiled on
all POSIX-based hosts to make available memory-backend-file on them.

Signed-off-by: Hikaru Nishida <[email protected]>
Message-Id: <20180924123205 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: fix translation for icount mode

This patch fixes the checking of boundary crossing instructions.
In icount mode only first instruction of the block may cross
the page boundary to keep the translation deterministic.
These conditions already existed, but compared the wrong variable.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180920071702.22477.43980.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

hvf: drop unused variable

Signed-off-by: Paolo Bonzini <[email protected]>

qom/object: add some interface asserts

An interface can't have any instance size or callback, or itself
implement other interfaces (this is unsupported).

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180912125303 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

accel/tcg: Remove dead code

The global cpu_single_env variable has been removed more than 5 years
ago, so apparently nobody used this dead debug code in that timeframe
anymore. Thus let's remove it completely now.

Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1537204134 [email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

lsi53c895a: convert to trace-events

Signed-off-by: Mark Cave-Ayland <[email protected]>
Message-Id: <20180917053229 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

scsi-block: Deprecate rotation_rate

This option is added together with scsi-disk but is never honoured,
becuase we don't emulate the VPD page for scsi-block. We could intercept
and inject the user specified value like for max xfer len, but it's
probably not helpful since the intent of 070f80095ad was for random
entropy aspects, not for performance. If emulated rotation rate is
desired, scsi-hd is more suitable.

Signed-off-by: Fam Zheng <[email protected]>
Message-Id: <20180917083138 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvmclock: run KVM_KVMCLOCK_CTRL ioctl in vcpu thread

According to KVM API Documentation, we should only
run vcpu ioctls from the same thread that was used
to create the vcpu. This patch makes KVM_KVMCLOCK_CTRL
ioctl consistent with the Documentation.

No functional change.

Signed-off-by: Yongji Xie <[email protected]>
Signed-off-by: Chai Wen <[email protected]>
Message-Id: <1531315364 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Yongji Xie <[email protected]>

MAINTAINERS: add myself as elf2dmp maintainer

Add myself as contrib/elf2dmp maintainer and elf2dmp as maintained.

Signed-off-by: Viktor Prutyanov <[email protected]>
Message-Id: <20180918095422 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

contrib: add elf2dmp tool

elf2dmp is a converter from ELF dump (produced by 'dump-guest-memory') to
Windows MEMORY.DMP format (also know as 'Complete Memory Dump') which can be
opened in WinDbg.

This tool can help if VMCoreInfo device/driver is absent in Windows VM and
'dump-guest-memory -w' is not available but dump can be created in ELF format.

The tool works as follows:
1. Determine the system paging root looking at GS_BASE or KERNEL_GS_BASE
to locate the PRCB structure and finds the kernel CR3 nearby if QEMU CPU
state CR3 is not suitable.
2. Find an address within the kernel image by dereferencing the first
IDT entry and scans virtual memory upwards until the start of the
kernel.
3. Download a PDB matching the kernel from the Microsoft symbol store,
and figure out the layout of certain relevant structures necessary for
the dump.
4. Populate the corresponding structures in the memory image and create
the appropriate dump header.

Signed-off-by: Viktor Prutyanov <[email protected]>
Message-Id: <1535546488 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

dump: move Windows dump structures definitions

This patch moves definitions of Windows dump structures to
include/qemu/win_dump_defs.h to keep create_win_dump() prototype separate.

Signed-off-by: Viktor Prutyanov <[email protected]>
Message-Id: <1535546488 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw: edu: replace device name with macro

Just as other devices do.

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <1536901871 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

i386: Compile CPUX86State xsave_buf only when support KVM or HVF

While at it, also rename var to indicate it is not used only in KVM.

Reviewed-by: Nikita Leshchenko <[email protected]>
Reviewed-by: Patrick Colp <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Message-Id: <20180914003827 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: rename HF_SVMI_MASK to HF_GUEST_MASK

This flag will be used for KVM's nested VMX migration; the HF_GUEST_MASK name
is already used in KVM, adopt it in QEMU as well.

Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: unify masking of interrupts

Interrupt handling depends on various flags in env->hflags or env->hflags2,
and the exact detail were not exactly replicated between x86_cpu_has_work
and x86_cpu_exec_interrupt. Create a new function that extracts the
highest-priority non-masked interrupt, and use it in both functions.

Signed-off-by: Paolo Bonzini <[email protected]>

char-pty: remove unnecessary #ifdef

For some reason __APPLE__ was not checked in pty code. However, the #ifdef
is redundant: this file is already compiled only if CONFIG_POSIX, same as
util/qemu-openpty.c which it uses.

Reported-by: Roman Bolshakov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

test-char: add socket reconnect test

This test exhibits a regression fixed by the previous reverts.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180817135224 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

test-char: fix random socket test failure

Peter reported a test failure on FreeBSD with the new reconnect test:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
gtester -k --verbose -m=quick tests/test-char
TEST: tests/test-char... (pid=16190)
  /char/null:                                                          OK
  /char/invalid:                                                       OK
  /char/ringbuf:                                                       OK
  /char/mux:                                                           OK
  /char/stdio:                                                         OK
  /char/pipe:                                                          OK
  /char/file:                                                          OK
  /char/file-fifo:                                                     OK
  /char/udp:                                                           OK
  /char/serial:                                                        OK
  /char/hotswap:                                                       OK
  /char/socket/basic:                                                  OK
  /char/socket/reconnect:                                              FAIL
GTester: last random seed: R02S521380d9c12f1dac3ad1763bf5665c27
(pid=16367)
  /char/socket/fdpass:                                                 OK
FAIL: tests/test-char
**
ERROR:tests/test-char.c:353:char_socket_test_common: assertion failed:
(object_property_get_bool(OBJECT(chr_client), "connected",
&error_abort))

It turns out that the socket test code checks both server and client
connection states, but doesn't wait for both.

Wait for the client side as well.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-Id: <20180823143125 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char-socket: update all ioc handlers when changing context

So far, tcp_chr_update_read_handler() only updated the read
handler. Let's also update the hup handler.

Factorize the code while at it. (note that s->ioc != NULL when
s->connected)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180817135224 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "chardev: tcp: postpone async connection setup"

This reverts commit 25679e5d58e258e9950685ffbd0cae4cd40d9cc2.

This commit broke "reconnect socket" chardev that are created after
"machine_done": they no longer try to connect. It broke also
vhost-user-test that uses chardev while there is no "machine_done"
event.

The goal of this patch was to move the "connect" source to the
frontend context. chr->gcontext is set with
qemu_chr_fe_set_handlers(). But there is no guarantee that it will be
called, so we can't delay connection until then: the chardev should
still attempt to connect during open(). qemu_chr_fe_set_handlers() is
eventually called later and will update the context.

Unless there is a good reason to not use initially the default
context, I think we should revert to the previous state to fix the
regressions.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180817135224 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "chardev: tcp: postpone TLS work until machine done"

This reverts commit 99f2f54174a595e3ada6e4332fcd2b37ebb0d55d.

See next commit reverting 25679e5d58e258e9950685ffbd0cae4cd40d9cc2 as
well for rationale.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180817135224 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: cleanup side effects of memory_region_init_foo() on failure

if MemoryRegion intialization fails it's left in semi-initialized state,
where it's size is not 0 and attached as child to owner object.
And this leds to crash in following use-case:
    (monitor) object_add memory-backend-file,id=mem1,size=99999G,mem-path=/tmp/foo,discard-data=yes
    memory.c:2083: memory_region_get_ram_ptr: Assertion `mr->ram_block' failed
    Aborted (core dumped)
it happens due to assumption that memory region is intialized when
   memory_region_size() != 0
and therefore it's ok to access it in
   file_backend_unparent()
      if (memory_region_size() != 0)
          memory_region_get_ram_ptr()

which happens when object_add fails and unparents failed backend making
file_backend_unparent() access invalid memory region.

Fix it by making sure that memory_region_init_foo() APIs cleanup externally
visible side effects on failure (like set size to 0 and unparenting object)

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1536064777 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw: hyperv_testdev: add read callback

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <20180912160118 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw: pc-testdev: add read memory region callback

Also change the write callback name.

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <20180912160118 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw: debugexit: add read callback

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <20180912160118 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

fw_cfg_mem: add read memory region callback

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <20180912160118 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

ui: fix virtual timers

UI uses timers based on virtual clock for managing key queue.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed. But these timers should
stop when the guest is not executing.
This patch changes using virtual clock to the new virtual_ext clock,
which runs as virtual clock, but its timers are not saved to the log.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912082013.3228.33664.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

slirp: fix ipv6 timers

ICMP implementation for IPv6 uses timers based on virtual clock.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed.
This patch changes using virtual clock to the new virtual_ext clock.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912082007.3228.91491.stgit@pasha-VirtualBox>
Reviewed-by: Samuel Thibault <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

timer: introduce new virtual clock

Slirp and VNC modules use virtual clock for processing some events that
are related to the guest execution speed.
But virtual clock-related events are consideres to be deterministic and
are recorded/replayed by icount mechanism. But slirp and VNC lie outside
the recorded guest core (which includes CPU and peripherals).
Therefore slirp and VNC are external for the guest, but should work at
guest speed.
This patch introduces new virtual clock which can be used for external
subsystems for running timers that are synchronized with the guest.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912082002.3228.82417.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

replay: allow loading any snapshots before recording

This patch enables using -loadvm in recording mode to allow starting
the execution recording from any of the available snapshots.
It also fixes loading of the record/replay state, therefore snapshots
created in replay mode may also be used for starting the new recording.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912081939.3228.56131.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

translator: fix breakpoint processing

QEMU cannot pass through the breakpoints when 'si' command is used
in remote gdb. This patch disables inserting the breakpoints
when we are already single stepping though the gdb remote protocol.
This patch also fixes icount calculation for the blocks that include
breakpoints - instruction with breakpoint is not executed and shouldn't
be used in icount calculation.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912081910.3228.8523.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

replay: flush events when exiting

This patch adds events processing when emulation finishes instead
of just cleaning the queue. Now the bdrv coroutines will be in consistent
state when emulator closes. It allows correct polling of the block layer
at exit.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180912081859.3228.79735.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

replay: wake up vCPU when replaying

In record/replay icount mode vCPU thread and iothread synchronize
the execution using the checkpoints.
vCPU thread processes the virtual timers and iothread processes all others.
When iothread wants to wake up sleeping vCPU thread, it sends dummy queued
work. Therefore it could be the following sequence of the events in
record mode:
- IO: sending dummy work
- IO: processing timers
- CPU: wakeup
- CPU: clearing dummy work
- CPU: processing virtual timers

But due to the races in replay mode the sequence may change:
- IO: sending dummy work
- CPU: wakeup
- CPU: clearing dummy work
- CPU: sleeping again because nothing to do
- IO: Processing timers
- CPU: zzzz

In this case vCPU will not wake up, because dummy work is not to be set up
again.

This patch tries to wake up the vCPU when it sleeps and the icount warp
checkpoint isn't met. It means that vCPU has something to do, because
there are no other reasons of non-matching warp checkpoint.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
--

v5: improve checking that vCPU is still sleeping
Message-Id: <20180912081945.3228.19776.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

configure: enable mttcg for i386 and x86_64

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move x86_64_hregs to DisasContext

And convert it to a bool to use an existing hole
in the struct.

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_tmp1_i64 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_tmp3_i32 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_tmp2_i32 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_ptr1 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_ptr0 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_tmp4 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_tmp0 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_T1 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_T0 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_A0 to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: move cpu_cc_srcT to DisasContext

Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

change get_image_size return type to int64_t

Previously, if the size of initrd >=2G, qemu exits with error:
root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic
qemu: error reading initrd large.cgz: No such file or directory
root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz
2.5G large.cgz

this patch changes the caller side that use this function to calculate
size of initrd file as well.

v2: update error message and int64_t printing format

Signed-off-by: Li Zhijian <[email protected]>
Message-Id: <1536833233 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Delete PID file on exit

Register an exit notifier to remove the PID file. By the time atexit()
is called, qemu_write_pidfile() guarantees QEMU owns the PID file,
thus we could safely remove it when exiting.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180907121319 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

serial: fix DLL writes

Commit 0147883450fe84bb8de2d4a58381881f4262ce9b tries to handle
word-sized writes to DLL/DLH, but due to a typo,
this patch is causing tracebacks in all Linux kernels running the PXA
serial driver, due to an unexpected DLL register value. Here is the
surrounding code from drivers/tty/serial/pxa.c:

serial_out(up, UART_DLL, quot & 0xff); /* LS of divisor */

/*
* work around Errata #75 according to Intel(R) PXA27x
* Processor Family Specification Update (Nov 2005)
*/
dll = serial_in(up, UART_DLL);
WARN_ON(dll != (quot & 0xff)); // <-- warning

Reported-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Fixes: 0147883450fe84bb8de2d4a58381881f4262ce9b
Signed-off-by: Paolo Bonzini <[email protected]>

util: use fcntl() for qemu_write_pidfile() locking

Daniel BerrangÃ© suggested to use fcntl() locks rather than lockf().

'man lockf':

   On Linux, lockf() is just an interface on top of fcntl(2) locking.
   Many other systems implement lockf() in this way, but note that
   POSIX.1 leaves the relationship between lockf() and fcntl(2) locks
   unspecified.  A portable application should probably avoid mixing
   calls to these interfaces.

IOW, if its just a shim around fcntl() on many systems, it is clearer
if we just use fcntl() directly, as we then know how fcntl() locks will
behave if they're on a network filesystem like NFS.

Suggested-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180831145314 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

util: add qemu_write_pidfile()

There are variants of qemu_create_pidfile() in qemu-pr-helper and
qemu-ga. Let's have a common implementation in libqemuutil.

The code is initially based from pr-helper write_pidfile(), with
various improvements and suggestions from Daniel BerrangÃ©:

  QEMU will leave the pidfile existing on disk when it exits which
  initially made me think it avoids the deletion race. The app
  managing QEMU, however, may well delete the pidfile after it has
  seen QEMU exit, and even if the app locks the pidfile before
  deleting it, there is still a race.

  eg consider the following sequence

        QEMU 1        libvirtd        QEMU 2

  1.    lock(pidfile)

  2.    exit()

  3.                 open(pidfile)

  4.                 lock(pidfile)

  5.                                  open(pidfile)

  6.                 unlink(pidfile)

  7.                 close(pidfile)

  8.                                  lock(pidfile)

  IOW, at step 8 the new QEMU has successfully acquired the lock, but
  the pidfile no longer exists on disk because it was deleted after
  the original QEMU exited.

  While we could just say no external app should ever delete the
  pidfile, I don't think that is satisfactory as people don't read
  docs, and admins don't like stale pidfiles being left around on
  disk.

  To make this robust, I think we might want to copy libvirt's
  approach to pidfile acquisition which runs in a loop and checks that
  the file on disk /after/ acquiring the lock matches the file that
  was locked. Then we could in fact safely let QEMU delete its own
  pidfiles on clean exit..

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180831145314 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw/char/sh_serial: Add timeout handling to unbreak serial input

As of commit 18e8cf159177100e ("serial: sh-sci: increase RX FIFO trigger
defaults for (H)SCIF") in Linux v4.11-rc1, the serial console on the
QEMU SH4 target is broken: it delays serial input until enough data has
been received.

Since aforementioned commit, the Linux SCIF driver programs the Receive
FIFO Data Count Trigger bits in the FIFO Control Register, to postpone
generating a receive interrupt until:
  1. At least the receive trigger count of bytes of data are available
     in the receive FIFO, OR
  2. No further data has been received for at least 15 etu after the
     last received data.

While QEMU implements the former, it does not implement the latter.
Hence the receive interrupt is not generated until the former condition
is met.

Fix this by adding basic timeout handling.  As the QEMU SCIF emulation
ignores any serial speed programming, the timeout value used conforms to
a default speed of 9600 bps, which is fine for any interactive console.

Reported-by: Rob Landley <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Ulrich Hecht <[email protected]>
Tested-by: Rob Landley <[email protected]>
Tested-by: Rich Felker <[email protected]>
Message-Id: <20180905131125 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

configure: preserve various environment variables in config.status

The config.status script is auto-generated by configure upon
completion. The intention is that config.status can be later invoked by
the developer directly, or by make indirectly, to re-detect the same
environment that configure originally used.

The current config.status script, however, only contains a record of the
command line arguments to configure. Various environment variables have
an effect on what configure will find. In particular PKG_CONFIG_LIBDIR &
PKG_CONFIG_PATH vars will affect what libraries pkg-config finds. The
PATH var will affect what toolchain binaries and XXXX-config scripts are
found. The LD_LIBRARY_PATH var will affect what libraries are
found. Most commands have env variables that will override the name/path
of the default version configure finds.

All these key env variables should be recorded in the config.status script.

Autoconf would also preserve CFLAGS, LDFLAGS, LIBS, CPPFLAGS, but QEMU
deals with those differently, expecting extra flags to be set using
configure args, rather than env variables. At the end of the script we
also don't have the original values of those env vars, as we modify them
during configure.

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Weil <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <20180904123603 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

kvm: x86: Fix kvm_arch_fixup_msi_route for remap-less case

The AMD IOMMU does not (yet) support interrupt remapping. But
kvm_arch_fixup_msi_route assumes that all implementations do and crashes
when the AMD IOMMU is used in KVM mode.

Fixes: 8b5ed7dffa1f ("intel_iommu: add support for split irqchip")
Reported-by: Christopher Goldsworthy <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Message-Id: <48ae78d8-58ec-8813-8680-6f407ea46041@siemens.com>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hostmem-memfd: add checks before adding hostmem-memfd & properties

Run some memfd-related checks before registering hostmem-memfd &
various properties. This will help libvirt to figure out what the host
is supposed to be capable of.

qemu_memfd_check() is changed to a less optimized version, since it is
used with various flags, it no longer caches the result.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180906161415 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

dump: fix Windows dump memory run mapping

We should map and use guest memory run by parts if it can't be mapped as
a whole.
After this patch, continuos guest physical memory blocks which are not
continuos in host virtual address space will be processed correctly.

Signed-off-by: Viktor Prutyanov <[email protected]>
Message-Id: <1535567456 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cpus: access .qemu_icount_bias with atomic64

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cpus: access .qemu_icount with atomic64

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cpus: take seqlock across qemu_icount updates

Even though writes of qemu_icount can safely race with reads in
qemu_icount_raw, qemu_icount is also read by icount_adjust, which
runs in the I/O thread. Therefore, writes do needs protection of
the vm_clock_lock; for simplicity the patch protects it with both
seqlock+spinlock, which we already do for hosts that lack 64-bit atomics.

The bug actually predated the introduction of vm_clock_lock;
cpu_update_icount would have needed the BQL before the spinlock was
introduced.

Reported-by: Emilio G. Cota <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

test-rcu-list: access n_reclaims and n_nodes_removed with atomic64

To avoid undefined behaviour.

Note that these "atomics" are atomic in the "access once" sense.
The variables are updated by a single thread at a time, so no
"full" atomics are necessary.

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qsp: use atomic64 accessors

With the seqlock, we either have to use atomics to remain
within defined behaviour (and note that 64-bit atomics aren't
always guaranteed to compile, irrespective of __nocheck), or
drop the atomics and be in undefined behaviour territory.

Fix it by dropping the seqlock and using atomic64 accessors.
This will limit scalability when !CONFIG_ATOMIC64, but those
machines (1) don't have many users and (2) are unlikely to
have many cores.

- With CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 13.00 Mops/s

- Forcing !CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 10.89 Mops/s

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

tests: add atomic64-bench

- With CONFIG_ATOMIC64:
$ tests/atomic64-bench  -n 1
Throughput:         310.40 Mops/s

- Without:
$ tests/atomic64-bench  -n 1
Throughput:         149.08 Mops/s

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

util: add atomic64

This introduces read/set accessors for int64_t and uint64_t.

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cacheinfo: add i/d cache_linesize_log

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180910232752 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cpus: initialize timers_state.vm_clock_lock

We forgot to initialize the spinlock introduced in 94377115b2
("cpus: protect TimerState writes with a spinlock", 2018-08-23).
Fix it.

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180903171831 [email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

atomic: fix comment s/x64_64/x86_64/

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20180903171831 [email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

ps2: prevent changing irq state on save and load

Commit 2858ab09e6f708e381fc1a1cc87e747a690c4884 changed
PS/2 keyboard/mouse buffers to the standard size. However, its state
may change when migrating from the old buffer size and therefore irq needs
updating. But this change made wrong, because it throws the whole queue
if there are too much data instead of cropping it.

That commit also updates irq (because the queue state may change).
But updating the irq may change the VM state (and determinism of
the execution). E.g., when replaying the execution, one may save
the VM state and the state of the interrupt controller will be updated
at the moment of saving, instead of using the recorded update events.

This patch makes the queue update deterministic: it removes the update_irq
call and crops the queue to prevent losing the characters and changing
the required irq status.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20180511081601.14610.39946.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

es1370: fix ADC_FRAMEADR and ADC_FRAMECNT

They are not consecutive with DAC1_FRAME* and DAC2_FRAME*.

Fixes: 154c1d1f960c5147a3f8ef00907504112f271cd8
Signed-off-by: Paolo Bonzini <[email protected]>

qsp: hide indirect function calls from Coverity

Coverity does not see anymore that qemu_mutex_lock is taking a lock.
Hide all the QSP magic so that static analysis works again.

Signed-off-by: Paolo Bonzini <[email protected]>

virtio: Return true from virtio_queue_empty if broken

Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.

This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel with ATS enabled:

  $ qemu-system-x86_64 \
    ... \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
    -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0

The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:

    > ...
    > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
    > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
    > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
    > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
    > #17 0x00005586607885da in run_poll_handlers_once
    > #18 0x000055866078880e in try_poll_mode
    > #19 0x00005586607888eb in aio_poll
    > #20 0x0000558660784561 in aio_wait_bh_oneshot
    > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop
    > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
    > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
    > #24 0x00005586605ab808 in virtio_pci_common_write
    > #25 0x0000558660242396 in memory_region_write_accessor
    > #26 0x00005586602425ab in access_with_adjusted_size
    > #27 0x0000558660245281 in memory_region_dispatch_write
    > #28 0x00005586601e008e in flatview_write_continue
    > #29 0x00005586601e01d8 in flatview_write
    > #30 0x00005586601e04de in address_space_write
    > #31 0x00005586601e052f in address_space_rw
    > #32 0x00005586602607f2 in kvm_cpu_exec
    > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
    > #34 0x000055866078bde7 in qemu_thread_start
    > #35 0x00007f5784906594 in start_thread
    > #36 0x00007f5784639e6f in clone

With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.

To be consistent, update virtio_queue_empty_rcu as well.

Signed-off-by: Fam Zheng <[email protected]>
Message-Id: <20180910145616 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/libfdt-20181002' into staging

Update dtc submodule to v1.4.7

We have some upcoming things planned for ppc that will require some
newer libfdt features.  In preparation, update the dtc/libfdt
submodule to upstreasm version v1.4.7.

# gpg: Signature made Tue 02 Oct 2018 05:23:43 BST
# gpg:                using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/libfdt-20181002:
  Update dtc/libfdt submodule to v1.4.7

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/xtensa/tags/20181001-xtensa' into staging

target/xtensa: preparation for FLIX support

Separate generation of per-instruction code (such as raising exceptions
and terminating TB) from per-opcode code.

# gpg: Signature made Mon 01 Oct 2018 19:14:34 BST
# gpg:                using RSA key 51F9CC91F83FA044
# gpg: Good signature from "Max Filippov <[email protected]>"
# gpg:                 aka "Max Filippov <[email protected]>"
# gpg:                 aka "Max Filippov <[email protected]>"
# Primary key fingerprint: 2B67 854B 98E5 327D CDEB  17D8 51F9 CC91 F83F A044

* remotes/xtensa/tags/20181001-xtensa:
  target/xtensa: extract gen_check_interrupts call
  target/xtensa: make rsr/wsr helpers return void
  target/xtensa: extract unconditional TB termination via slot 0
  target/xtensa: always end TB on CCOUNT access/CCOMPARE write
  target/xtensa: change SR number checks to assertions
  target/xtensa: extract unconditional TB termination
  target/xtensa: extract test for division by zero
  target/xtensa: extract test for cpdisabled exception
  target/xtensa: extract test for alloca exception
  target/xtensa: extract test for window underflow exception
  target/xtensa: extract test for window overflow exception
  target/xtensa: extract test for debug exception
  target/xtensa: extract test for syscall instruction
  target/xtensa: extract test for privileged instruction
  target/xtensa: extract test for an illegal instruction

Signed-off-by: Peter Maydell <[email protected]>