Git Repo - qemu.git/log

Paolo Bonzini [Tue, 28 Jul 2015 16:34:07 +0000 (18:34 +0200)]

virtio-blk-dataplane: delete bottom half before the AioContext is freed

Other uses of aio_bh_new are safe as long as all scheduled bottom
halves are run before an iothread is destroyed, which bdrv_drain will
ensure:

- archipelago_finish_aiocb: BH deletes itself

- inject_error: BH deletes itself

- blkverify_aio_bh: BH deletes itself

- abort_aio_request: BH deletes itself

- curl_aio_readv: BH deletes itself

- gluster_finish_aiocb: BH deletes itself

- bdrv_aio_rw_vector: BH deletes itself

- bdrv_co_maybe_schedule_bh: BH deletes itself

- iscsi_schedule_bh, iscsi_co_generic_cb: BH deletes itself

- laio_attach_aio_context: deleted in laio_detach_aio_context,
called through bdrv_detach_aio_context before deleting the iothread

- nfs_co_generic_cb: BH deletes itself

- null_aio_common: BH deletes itself

- qed_aio_complete: BH deletes itself

- rbd_finish_aiocb: BH deletes itself

- dma_blk_cb: BH deletes itself

- virtio_blk_dma_restart_cb: BH deletes itself

- qemu_bh_new: main loop AioContext is never destroyed

- test-aio.c: bh_delete_cb deletes itself, otherwise deleted in
the same function that calls aio_bh_new

Reported-by: Cornelia Huck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-id: 1438101249 [email protected]
Message-Id: <1438086628 [email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
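
The "BH deletes itself" pattern that most entries in the list above rely on can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyContext, ToyBH and every toy_* function are hypothetical stand-ins, not QEMU's AioContext or bottom-half API, and the real implementation differs in detail. It shows that after all scheduled one-shot callbacks have run, only explicitly managed bottom halves remain attached to the context.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BH 8

typedef struct ToyBH ToyBH;

struct ToyBH {
    void (*cb)(ToyBH *bh);  /* callback run when the BH fires */
    bool in_use;            /* slot is allocated */
    bool scheduled;         /* BH is waiting to run */
};

typedef struct {
    ToyBH slots[TOY_MAX_BH];
} ToyContext;

/* Allocate a BH in the context; returns NULL if no slot is free. */
ToyBH *toy_bh_new(ToyContext *ctx, void (*cb)(ToyBH *bh))
{
    for (size_t i = 0; i < TOY_MAX_BH; i++) {
        if (!ctx->slots[i].in_use) {
            ctx->slots[i].cb = cb;
            ctx->slots[i].in_use = true;
            ctx->slots[i].scheduled = false;
            return &ctx->slots[i];
        }
    }
    return NULL;
}

void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = true;
}

void toy_bh_delete(ToyBH *bh)
{
    bh->in_use = false;
    bh->scheduled = false;
}

/* Run every scheduled BH once, as a drain before teardown would. */
void toy_drain(ToyContext *ctx)
{
    for (size_t i = 0; i < TOY_MAX_BH; i++) {
        ToyBH *bh = &ctx->slots[i];
        if (bh->in_use && bh->scheduled) {
            bh->scheduled = false;
            bh->cb(bh);
        }
    }
}

/* Number of BHs still attached to the context. */
int toy_live_bhs(const ToyContext *ctx)
{
    int n = 0;
    for (size_t i = 0; i < TOY_MAX_BH; i++) {
        n += ctx->slots[i].in_use ? 1 : 0;
    }
    return n;
}

/* One-shot callback: does its work, then deletes its own BH. */
void toy_one_shot_cb(ToyBH *bh)
{
    toy_bh_delete(bh);
}

/* Long-lived callback: the owner must delete the BH explicitly. */
void toy_persistent_cb(ToyBH *bh)
{
    (void)bh;
}
```

In this model a BH using toy_persistent_cb is the case the commit title addresses: it survives a drain, so its owner has to delete it before the context goes away.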

Peter Maydell [Tue, 28 Jul 2015 18:02:04 +0000 (19:02 +0100)]

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

Pull request

These two .can_receive() are now reviewed.  The net subsystem queue for 2.4 is now empty.

# gpg: Signature made Tue Jul 28 13:26:03 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
  xen: Drop net_rx_ok
  hw/net: handle flow control in mcf_fec driver receiver

Signed-off-by: Peter Maydell <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 16:09:56 +0000 (17:09 +0100)]

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio fixes for 2.4

Mostly virtio 1 spec compliance fixes.
We are unlikely to make it perfectly compliant in
the first release, but it seems worth it to try.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon Jul 27 21:55:48 2015 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg:                 aka "Michael S. Tsirkin <[email protected]>"

* remotes/mst/tags/for_upstream:
  virtio: minor cleanup
  acpi: fix pvpanic device is not shown in ui
  virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device
  virtio-blk: fail get_features when both scsi and 1.0 were set
  virtio: get_features() can fail
  virtio-pci: fix memory MR cleanup for modern
  virtio: set any_layout in virtio core
  virtio-9p: fix any_layout
  virtio-serial: fix ANY_LAYOUT
  virtio: hide legacy features from modern guests

Signed-off-by: Peter Maydell <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 14:25:24 +0000 (15:25 +0100)]

Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150728' into staging

MIPS patches 2015-07-28

Changes:
* net/dp8393x fixes
* Vectored Interrupts bug fix
* fix for a bug in machine.c which was provoking a warning on FreeBSD

# gpg: Signature made Tue Jul 28 10:47:19 2015 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8DD3 2F98 5495 9D66 35D4  4FC0 5211 8E3C 0B29 DA6B

* remotes/lalrae/tags/mips-20150728:
  net/dp8393x: do not use memory_region_init_rom_device with NULL
  net/dp8393x: remove check of runt packets
  net/dp8393x: disable user creation
  target-mips: fix offset calculation for Interrupts
  target-mips: fix passing incompatible pointer type in machine.c

Signed-off-by: Peter Maydell <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 13:19:16 +0000 (14:19 +0100)]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* crypto fixes
* megasas SIGSEGV fix
* memory refcount change to fix virtio hot-unplug

# gpg: Signature made Tue Jul 28 08:29:07 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  memory: do not add a reference to the owner of aliased regions
  megasas: Add write function to handle write access to PCI BAR 3
  crypto: extend unit tests to cover decryption too
  crypto: fix built-in AES decrypt function

Signed-off-by: Peter Maydell <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 12:22:57 +0000 (13:22 +0100)]

Merge remote-tracking branch 'remotes/cody/tags/jtc-for-upstream-pull-request' into staging

# gpg: Signature made Tue Jul 28 05:22:29 2015 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/jtc-for-upstream-pull-request:
  block/ssh: Avoid segfault if inet_connect doesn't set errno.
  sheepdog: serialize requests to overwrapping area

Signed-off-by: Peter Maydell <[email protected]>

Fam Zheng [Tue, 28 Jul 2015 09:52:56 +0000 (17:52 +0800)]

xen: Drop net_rx_ok

Let net_rx_packet() (which checks the same conditions) drop the packet
if the device is not ready. Drop net_xen_info.can_receive and update the
return value for the buffer full case.

We rely on the qemu_flush_queued_packets() in net_event() to wake up
the peer when the buffer becomes available again.

Signed-off-by: Fam Zheng <[email protected]>
Message-id: 1438077176 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 10:28:44 +0000 (11:28 +0100)]

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-07-27' into staging

trivial patches for 2015-07-27

# gpg: Signature made Mon Jul 27 20:50:14 2015 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <[email protected]>"
# gpg:                 aka "Michael Tokarev <[email protected]>"
# gpg:                 aka "Michael Tokarev <[email protected]>"

* remotes/mjt/tags/pull-trivial-patches-2015-07-27:
  gdbstub: Set current CPU on interruptions
  qapi: add missing @
  Fix Cortex-A9 global timer
  gitignore: Ignore shader generated files
  vmstate: remove unused declaration
  make: Clean build messages
  qemu-common.h: Document cutils.c string functions
  device_tree: Fix a typo
  hw/acpi/ich9: clean up stale comment about KVM not supporting SMM
  hw/acpi/ich9: clear smi_en on reset

Signed-off-by: Peter Maydell <[email protected]>

Greg Ungerer [Tue, 28 Jul 2015 01:02:54 +0000 (11:02 +1000)]

hw/net: handle flow control in mcf_fec driver receiver

The network mcf_fec driver emulated receive side method is not dealing
with network queue flow control properly.

Modify the receive side to check if we have enough space in the
descriptors to store the current packet. If not we process none of it
and return 0. When the guest frees up some buffers through its descriptors
we signal the qemu net layer to send more packets.

[Fixed coding style: 4-space indent and curly braces on if statement.
--Stefan]

Signed-off-by: Greg Ungerer <[email protected]>
Message-id: 1438045374 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
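
The receive-side check described above can be sketched as follows. This is a minimal illustration, not the mcf_fec code: ToyRxRing, toy_rx and toy_guest_refill are hypothetical names, and the real driver walks guest-memory descriptors rather than keeping a counter.

```c
#include <stddef.h>

typedef struct {
    size_t free_descs;  /* empty receive descriptors the guest has posted */
    size_t desc_bytes;  /* buffer bytes per descriptor (must be non-zero) */
} ToyRxRing;

/*
 * Returns the number of bytes consumed, or 0 if the packet was not
 * accepted and should be offered again later.
 */
size_t toy_rx(ToyRxRing *ring, size_t pkt_len)
{
    /* Descriptors needed to hold the whole packet, rounded up. */
    size_t needed = (pkt_len + ring->desc_bytes - 1) / ring->desc_bytes;

    if (needed > ring->free_descs) {
        return 0;  /* not enough room: process none of the packet */
    }
    ring->free_descs -= needed;
    return pkt_len;
}

/*
 * The guest hands back descriptors. A non-zero return tells the caller
 * to ask the net layer to resend anything it queued in the meantime.
 */
int toy_guest_refill(ToyRxRing *ring, size_t count)
{
    ring->free_descs += count;
    return count > 0;
}
```

Returning 0 without touching the ring is the important part: a partially stored packet would leave the descriptors in a state the guest cannot interpret.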

Hervé Poussineau [Sun, 26 Jul 2015 20:32:55 +0000 (22:32 +0200)]

net/dp8393x: do not use memory_region_init_rom_device with NULL

Replace memory_region_init_rom_device() with memory_region_init_ram() and
memory_region_set_readonly().
This fixes a guest-triggerable QEMU crash when guest tries to write to PROM.

Signed-off-by: Hervé Poussineau <[email protected]>
[[email protected]: shorten subject length]
Signed-off-by: Leon Alrae <[email protected]>

Hervé Poussineau [Fri, 24 Jul 2015 18:42:23 +0000 (20:42 +0200)]

net/dp8393x: remove check of runt packets

Ethernet requires that messages are at least 64 bytes on the wire. This
limitation does not exist on emulation (no wire message), so remove the
check. Netcard is now able to receive small network packets.

Signed-off-by: Hervé Poussineau <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Leon Alrae <[email protected]>

Hervé Poussineau [Fri, 24 Jul 2015 18:42:21 +0000 (20:42 +0200)]

net/dp8393x: disable user creation

Netcard needs an address space to write data to, which can't be specified
on command line.
This fixes a crash when user starts QEMU with "-device dp8393x"

Signed-off-by: Hervé Poussineau <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Leon Alrae <[email protected]>

Peter Maydell [Tue, 28 Jul 2015 08:11:48 +0000 (09:11 +0100)]

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.4.0-rc3

# gpg: Signature made Mon Jul 27 16:19:17 2015 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
  block: qemu-iotests - add check for multiplication overflow in vpc
  block: vpc - prevent overflow if max_table_entries >= 0x40000000

Signed-off-by: Peter Maydell <[email protected]>

Yongbok Kim [Fri, 10 Jul 2015 11:10:02 +0000 (12:10 +0100)]

target-mips: fix offset calculation for Interrupts

Correct computation of vector offsets for EXCP_EXT_INTERRUPT.
For instance, if Cause.IV is 0 the vector offset should be 0x180.

Simplify the logic for finding the vector number for Vectored Interrupts.

Signed-off-by: Yongbok Kim <[email protected]>
Reviewed-by: Leon Alrae <[email protected]>
[[email protected]: cosmetic changes]
Signed-off-by: Leon Alrae <[email protected]>
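
As a rough sketch of the offset rule: only the first branch (Cause.IV clear gives 0x180) comes from the commit message. The remaining branches reflect one reading of the MIPS32 privileged architecture (0x200 base, spacing of IntCtl.VS << 5 bytes) and should be treated as an assumption, as should the function name and its flattened parameters; this is not the target-mips code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * cause_iv:   Cause.IV bit
 * status_bev: Status.BEV bit
 * intctl_vs:  IntCtl.VS field (vector spacing encoding)
 * vector:     interrupt vector number
 */
uint32_t toy_irq_vector_offset(bool cause_iv, bool status_bev,
                               uint32_t intctl_vs, uint32_t vector)
{
    if (!cause_iv) {
        return 0x180;  /* general exception vector, as stated above */
    }
    if (status_bev || intctl_vs == 0) {
        return 0x200;  /* assumed: special interrupt vector, no spacing */
    }
    /* assumed: vectors are spaced (VS << 5) bytes apart from 0x200 */
    return 0x200 + vector * (intctl_vs << 5);
}
```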

Leon Alrae [Wed, 22 Jul 2015 13:59:23 +0000 (14:59 +0100)]

target-mips: fix passing incompatible pointer type in machine.c

Reported-by: Peter Maydell <[email protected]>
Signed-off-by: Leon Alrae <[email protected]>

Richard W.M. Jones [Wed, 22 Jul 2015 13:27:47 +0000 (14:27 +0100)]

block/ssh: Avoid segfault if inet_connect doesn't set errno.

On some (but not all) systems:

  $ qemu-img create -f qcow2 overlay -b ssh://xen/
  Segmentation fault

It turns out this happens when inet_connect returns -1 in the
following code, but errno == 0.

  s->sock = inet_connect(s->hostport, errp);
  if (s->sock < 0) {
      ret = -errno;
      goto err;
  }

In the test case above, no host called "xen" exists, so getaddrinfo fails.

On Fedora 22, getaddrinfo happens to set errno = ENOENT (although it
is *not* documented to do that), so it doesn't segfault.

On RHEL 7, errno is not set by the failing getaddrinfo, so ret =
-errno = 0, so the caller doesn't know there was an error and
continues with a half-initialized BDRVSSHState struct, and everything
goes south from there, eventually resulting in a segfault.

Fix this by setting ret to -EIO (same as block/nbd.c and
block/sheepdog.c).  The real error is saved in the Error** errp
struct, so it is printed correctly:

  $ ./qemu-img create -f qcow2 overlay -b ssh://xen/
  qemu-img: overlay: address resolution failed for xen:22: No address associated with hostname

Signed-off-by: Richard W.M. Jones <[email protected]>
Reported-by: Jun Li
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1147343
Signed-off-by: Jeff Cody <[email protected]>
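
The failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical toy_connect() in place of inet_connect() and resets errno by hand to model a platform where the failing call leaves it untouched; it is an illustration of the pitfall, not the block/ssh.c code.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a connect helper that reports failure with
 * -1 but makes no promise about errno.
 */
int toy_connect(int should_fail)
{
    return should_fail ? -1 : 3;
}

/* Buggy pattern: trusts errno after a call that may not set it. */
int toy_open_buggy(int should_fail)
{
    errno = 0;  /* model a platform where nothing sets errno */
    int sock = toy_connect(should_fail);
    if (sock < 0) {
        return -errno;  /* evaluates to 0, so the caller sees success */
    }
    return 0;
}

/* Fixed pattern: always return a definite negative error code. */
int toy_open_fixed(int should_fail)
{
    errno = 0;
    int sock = toy_connect(should_fail);
    if (sock < 0) {
        return -EIO;
    }
    return 0;
}
```

With should_fail set, toy_open_buggy() returns 0 and the caller carries on as if the connection existed, while toy_open_fixed() returns -EIO.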

Hitoshi Mitake [Fri, 17 Jul 2015 16:44:24 +0000 (01:44 +0900)]

sheepdog: serialize requests to overwrapping area

The current sheepdog driver only serializes create requests, and does
so per oid. This mechanism isn't enough for handling requests to
overlapping areas that span multiple oids, so it can result in bugs
like the one below:
https://bugs.launchpad.net/sheepdog-project/+bug/1456421

This patch adds a new serialization mechanism for the problem. The
differences from the old one are:
1. serialize entire aiocbs if their target areas overlap
2. serialize all requests (read, write, and discard), not only creates

This patch also removes the old mechanism because the new one
replaces it.

Cc: Kevin Wolf <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Cc: Teruaki Ishizaki <[email protected]>
Cc: Vasiliy Tolstov <[email protected]>
Signed-off-by: Hitoshi Mitake <[email protected]>
Tested-by: Vasiliy Tolstov <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
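
The core of such a scheme is a range-overlap test against the requests already in flight. The following is a sketch of that idea only: ToyReq, toy_reqs_overlap and toy_must_wait are hypothetical names, and the actual driver tracks its in-flight aiocbs differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;  /* first byte touched by the request */
    uint64_t len;     /* number of bytes touched */
} ToyReq;

/* Two half-open ranges overlap if each starts before the other ends. */
bool toy_reqs_overlap(const ToyReq *a, const ToyReq *b)
{
    return a->offset < b->offset + b->len &&
           b->offset < a->offset + a->len;
}

/* A new request must wait if it overlaps any in-flight request. */
bool toy_must_wait(const ToyReq *req, const ToyReq *inflight, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_reqs_overlap(req, &inflight[i])) {
            return true;
        }
    }
    return false;
}
```

Because the test works on byte ranges rather than object ids, a request that straddles two objects is still caught.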

Paolo Bonzini [Mon, 27 Jul 2015 14:29:56 +0000 (16:29 +0200)]

memory: do not add a reference to the owner of aliased regions

Very often the owner of the aliased region is the same as the owner of the alias
region itself.  When this happens, the reference count can never go back to 0 and
the owner is leaked.  This is for example breaking hot-unplug of virtio-pci
devices (the device cannot be plugged back again with the same id).

Another common use for alias is to transform the system I/O address space
into an MMIO region; in this case the aliased region never dies, so there
is no problem.  Otherwise the owner is always the same for aliasing
and aliased region.

I checked all calls to memory_region_init_alias introduced after commit
dfde4e6 (memory: add ref/unref calls, 2013-05-06) and they do not need the
reference in order to keep the owner of the aliased region alive.

Reported-by: Michael S. Tsirkin <[email protected]>
Tested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
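
The leak can be modelled with a plain counter. This sketch assumes a simplified lifecycle in which the alias would drop its reference only during the owner's finalization; ToyOwner and the toy_* functions are hypothetical and much simpler than QEMU's object and memory-region code.

```c
#include <stdbool.h>

typedef struct {
    int refcount;    /* the owner is finalized when this reaches 0 */
    bool finalized;
} ToyOwner;

/*
 * Create an owner holding one external reference (for example, from
 * the bus it is plugged into). If alias_takes_ref is true, the owner's
 * alias also references the aliased region's owner, which here is the
 * same object.
 */
void toy_owner_init(ToyOwner *o, bool alias_takes_ref)
{
    o->refcount = 1;
    o->finalized = false;
    if (alias_takes_ref) {
        o->refcount++;  /* self-reference through the alias */
    }
}

/*
 * Hot-unplug drops the external reference. The alias's reference would
 * only be released during finalization, which needs the count to reach
 * 0 first.
 */
void toy_owner_unplug(ToyOwner *o)
{
    if (--o->refcount == 0) {
        o->finalized = true;
    }
}
```

With the self-reference the count settles at 1 after unplug and the owner is never finalized; without it the count reaches 0.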

Salva Peiró [Mon, 27 Jul 2015 08:51:52 +0000 (10:51 +0200)]

megasas: Add write function to handle write access to PCI BAR 3

This patch fixes a QEMU SEGFAULT when a write operation is performed on
the memory region of the PCI BAR 3 (base address space).
When a writeb(0xe0000000) is performed, the .write function is invoked to
handle the write access. However, since .write is not initialised, the
call goes through a NULL pointer and causes QEMU to SEGFAULT.

Signed-off-by: Salva Peiró <[email protected]>
Acked-by: Hannes Reinecke <[email protected]>
Message-Id: <1437987112 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
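
A reduced model of the problem, assuming a dispatcher that calls the write callback unconditionally: ToyRegionOps and the toy_* functions below are hypothetical and only mirror the general shape of a callback table. The fix is the same in spirit, which is to supply a write handler even if it ignores the access.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    void (*write)(void *opaque, uint64_t addr, uint64_t val, unsigned size);
} ToyRegionOps;

/* Counts writes that the handler accepted and ignored. */
unsigned toy_ignored_writes;

uint64_t toy_bar_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)opaque; (void)addr; (void)size;
    return 0;
}

/* Write handler that accepts the access and discards the value. */
void toy_bar_write(void *opaque, uint64_t addr, uint64_t val, unsigned size)
{
    (void)opaque; (void)addr; (void)val; (void)size;
    toy_ignored_writes++;
}

/*
 * Both callbacks are set. Omitting .write here would leave it NULL,
 * and the dispatcher below would then call through a NULL pointer.
 */
const ToyRegionOps toy_bar_ops = {
    .read = toy_bar_read,
    .write = toy_bar_write,
};

/* Dispatcher that trusts the table and does not check for NULL. */
void toy_dispatch_write(const ToyRegionOps *ops, void *opaque,
                        uint64_t addr, uint64_t val, unsigned size)
{
    ops->write(opaque, addr, val, size);
}
```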

Michael S. Tsirkin [Mon, 27 Jul 2015 15:39:37 +0000 (18:39 +0300)]

virtio: minor cleanup

There's no need for blk to set ANY_LAYOUT, it's
done by virtio core as necessary.

Signed-off-by: Michael S. Tsirkin <[email protected]>

Gal Hammer [Sun, 26 Jul 2015 08:00:51 +0000 (11:00 +0300)]

acpi: fix pvpanic device is not shown in ui

Commit 2332333c added a _STA method that hides the device. The fact
that the device is not shown in the GUI makes it harder to install the
device on Windows.

https://bugzilla.redhat.com/show_bug.cgi?id=1238141

Signed-off-by: Gal Hammer <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>

Jan Kiszka [Fri, 24 Jul 2015 16:52:31 +0000 (18:52 +0200)]

gdbstub: Set current CPU on interruptions

gdb expects that the thread ID for c and g-class operations is set to
the CPU we provide when reporting VM stop conditions. If the stub is
still tuned to a different CPU, the wrong information is delivered to
the gdb frontend.

Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Marc-André Lureau [Fri, 3 Jul 2015 09:51:01 +0000 (11:51 +0200)]

qapi: add missing @

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Johannes Schlatow [Mon, 29 Jun 2015 15:45:41 +0000 (17:45 +0200)]

Fix Cortex-A9 global timer

The auto increment bit of the timer control register was wrongly
defined.

See Cortex-A9 MPcore Technical Reference Manual, Section 4.4.2.

Signed-off-by: Johannes Schlatow <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Michal Privoznik [Tue, 23 Jun 2015 12:30:20 +0000 (14:30 +0200)]

gitignore: Ignore shader generated files

As of d98bc0b65 there are two files that are automatically generated:
ui/shader/texture-blit-frag.h and ui/shader/texture-blit-vert.h. Neither
of them should be tracked by git, so put them into the ignore file.

Signed-off-by: Michal Privoznik <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Marc-André Lureau [Tue, 23 Jun 2015 16:41:27 +0000 (18:41 +0200)]

vmstate: remove unused declaration

Since 38e0735e, register_device_unmigratable() has been removed

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Stefan Weil [Sat, 18 Jul 2015 14:54:32 +0000 (16:54 +0200)]

make: Clean build messages

We want to have uniform build messages, so fix some messages
which did not follow the standard pattern.

Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Peter Maydell [Sun, 19 Jul 2015 20:34:22 +0000 (21:34 +0100)]

qemu-common.h: Document cutils.c string functions

Add documentation comments for various utility string functions
which we have implemented in util/cutils.c:
pstrcpy()
strpadcpy()
pstrcat()
strstart()
stristart()
qemu_strnlen()
qemu_strsep()

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Kamalesh Babulal [Fri, 24 Jul 2015 08:18:13 +0000 (13:48 +0530)]

device_tree: Fix a typo

Fix spelling of 'allocting' -> 'allocating'.

Signed-off-by: Kamalesh Babulal <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Laszlo Ersek [Fri, 24 Jul 2015 18:16:01 +0000 (20:16 +0200)]

hw/acpi/ich9: clean up stale comment about KVM not supporting SMM

Commit fba72476c6 ("ich9: add smm_enabled field and arguments") detached
SMM availability from kvm_enabled(). However, the comment in pm_reset()
was not updated; let's do it now.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: [email protected]
Signed-off-by: Laszlo Ersek <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Laszlo Ersek [Fri, 24 Jul 2015 18:16:00 +0000 (20:16 +0200)]

hw/acpi/ich9: clear smi_en on reset

Otherwise on reboot firmware might think (due to APMC_EN remaining set
from the previous boot) that SMI support is absent.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: [email protected]
Signed-off-by: Laszlo Ersek <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Peter Maydell [Mon, 27 Jul 2015 18:37:09 +0000 (19:37 +0100)]

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150727' into staging

Fix buglets for 2.4

# gpg: Signature made Mon Jul 27 15:26:48 2015 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"

* remotes/rth/tags/pull-tcg-20150727:
  tcg: mark temps as mem_coherent = 0 for mov with a constant
  tcg: correctly mark dead inputs for mov with a constant

Signed-off-by: Peter Maydell <[email protected]>

Paolo Bonzini [Fri, 24 Jul 2015 11:42:55 +0000 (13:42 +0200)]

main-loop: fix qemu_notify_event for aio_notify optimization

aio_notify can be optimized away, and in fact almost always will.  However,
qemu_notify_event is used in places where this is incorrect---most notably,
when handling SIGTERM.  When aio_notify is optimized away, it is possible that
QEMU enters a blocking ppoll immediately afterwards and stays there, without
reaching main_loop_should_exit().

Fix this by using a bottom half.  The bottom half can be optimized too, but
scheduling it is enough for the ppoll not to block.  The hang is thus avoided.

Reported-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1437738175 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Jeff Cody [Fri, 24 Jul 2015 14:26:52 +0000 (10:26 -0400)]

block: qemu-iotests - add check for multiplication overflow in vpc

This checks that VPC is able to successfully fail (without segfault)
on an image file with a max_table_entries that exceeds 0x40000000.

This table entry is within the valid range for VPC (although too large
for this sample image).

Cc: [email protected]
Signed-off-by: Jeff Cody <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Jeff Cody [Fri, 24 Jul 2015 14:26:51 +0000 (10:26 -0400)]

block: vpc - prevent overflow if max_table_entries >= 0x40000000

When we allocate the pagetable based on max_table_entries, we multiply
the max table entry value by 4 to accommodate a table of 32-bit integers.
However, max_table_entries is a uint32_t, and the VPC driver accepts
ranges for that entry over 0x40000000. So during this allocation:

s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);

The size arg overflows, allocating significantly less memory than
expected.

Since the size argument of qemu_try_blockalign() is size_t, cast the
multiplication correctly to prevent overflow.

The value of "max_table_entries * 4" is used elsewhere in the code as
well, so store the correct value for use in all those cases.

We also check the Max Tables Entries value, to make sure that it is <
SIZE_MAX / 4, so we know the pagetable size will fit in size_t.

Cc: [email protected]
Reported-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
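
The wraparound is a consequence of C's integer conversion rules and can be shown without any QEMU code. In the sketch below the function names are hypothetical, and the bound check is one plausible reading of the "< SIZE_MAX / 4" condition described above. It assumes the usual case where unsigned int is 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy: the multiplication is carried out in 32-bit unsigned
 * arithmetic and wraps before the result is widened to size_t.
 */
size_t toy_table_bytes_buggy(uint32_t max_table_entries)
{
    return max_table_entries * 4;
}

/*
 * Fixed: reject values whose byte size cannot fit in size_t, then
 * widen before multiplying. Returns 0 on success, -1 on rejection.
 */
int toy_table_bytes_checked(uint32_t max_table_entries, size_t *out)
{
    if (max_table_entries >= SIZE_MAX / 4) {
        return -1;
    }
    *out = (size_t)max_table_entries * 4;
    return 0;
}
```

For max_table_entries = 0x40000000 the buggy version yields 0, so an allocation sized from it would be far too small. On a 64-bit host the checked version yields 0x100000000; on a 32-bit host it rejects the value.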

Peter Maydell [Fri, 24 Jul 2015 17:28:08 +0000 (18:28 +0100)]

configure: Work around broken static pkg-config info for Ubuntu gnutls

Unfortunately Ubuntu's pkg-config information for gnutls is broken
for the static linking case, and outputs --libs options which the
compiler does not recognize. Work around this problem by testing
that the --cflags/--libs output will at least allow compilation
before enabling gnutls support.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Daniel P. Berrange <[email protected]>
Message-id: 1437758888 [email protected]

Jason Wang [Mon, 27 Jul 2015 09:49:21 +0000 (17:49 +0800)]

virtio-blk: only clear VIRTIO_F_ANY_LAYOUT for legacy device

Chapter 6.3 of the spec says

"
Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:

VIRTIO_F_ANY_LAYOUT (27)
"

So this patch only clears VIRTIO_F_ANY_LAYOUT for legacy devices.

Cc: Stefan Hajnoczi <[email protected]>
Cc: Kevin Wolf <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

Jason Wang [Mon, 27 Jul 2015 09:49:20 +0000 (17:49 +0800)]

virtio-blk: fail get_features when both scsi and 1.0 were set

SCSI passthrough is no longer supported in virtio 1.0, so this patch
makes get_features() fail when both 1.0 and scsi are set. It also only
advertises VIRTIO_BLK_F_SCSI for legacy virtio-blk devices.

Signed-off-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

Jason Wang [Mon, 27 Jul 2015 09:49:19 +0000 (17:49 +0800)]

virtio: get_features() can fail

Signed-off-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

Michael S. Tsirkin [Mon, 27 Jul 2015 08:06:17 +0000 (11:06 +0300)]

virtio-pci: fix memory MR cleanup for modern

Each memory_region_add_subregion must be paired with
memory_region_del_subregion.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

Aurelien Jarno [Mon, 27 Jul 2015 10:55:58 +0000 (12:55 +0200)]

tcg: mark temps as mem_coherent = 0 for mov with a constant

When a constant has to be loaded in a mov op, we fail to set
mem_coherent = 0. This patch fixes that.

Cc: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <1437994568 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Aurelien Jarno [Mon, 27 Jul 2015 10:55:57 +0000 (12:55 +0200)]

tcg: correctly mark dead inputs for mov with a constant

When tcg_reg_alloc_mov propagates a constant, we fail to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configured with --enable-debug-tcg:

qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.

Cc: Richard Henderson <[email protected]>
Reported-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <1437994568 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Peter Maydell [Mon, 27 Jul 2015 13:53:42 +0000 (14:53 +0100)]

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

Pull request

Here are NIC fixes from Fam Zheng that prevent rx hangs (caused by NIC models
where .can_receive() stops rx but qemu_flush_queued_packets() isn't called).

# gpg: Signature made Mon Jul 27 14:51:48 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
  axienet: Flush queued packets when rx is done
  dp8393x: Flush packets when link comes up
  stellaris_enet: Flush queued packets when read done
  mipsnet: Flush queued packets when receiving is enabled
  milkymist-minimac2: Flush queued packets when link comes up
  mcf_fec: Drop mcf_fec_can_receive
  etsec: Flush queue when rx buffer is consumed
  etsec: Move etsec_can_receive into etsec_receive
  usbnet: Drop usbnet_can_receive
  eepro100: Drop nic_can_receive
  pcnet: Drop pcnet_can_receive
  xgmac: Drop packets with eth_can_rx is false.
  hw/net: fix mcf_fec driver receiver
  hw/net: add simple phy support to mcf_fec driver
  hw/net: add ANLPAR bit definitions to generic mii
  hw/net: create common collection of MII definitions

Signed-off-by: Peter Maydell <[email protected]>

Fam Zheng [Wed, 15 Jul 2015 10:19:13 +0000 (18:19 +0800)]

axienet: Flush queued packets when rx is done

eth_can_rx checks s->rxsize and returns false if it is non-zero. Because
of the .can_receive semantics change, this will make the incoming queue
disabled by peer, until it is explicitly flushed. So we should flush it
when s->rxsize is becoming zero.

Squash the eth_can_rx semantics into eth_rx and drop the .can_receive()
callback. Also add a flush when the rx buffer becomes available again
after a packet gets queued.

The other conditions, "!axienet_rx_resetting(s) &&
axienet_rx_enabled(s)" are OK because enet_write already calls
qemu_flush_queued_packets when the register bits are changed.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Fam Zheng [Wed, 15 Jul 2015 10:19:12 +0000 (18:19 +0800)]

dp8393x: Flush packets when link comes up

The .can_receive callback changed semantics: once it returns 0, the
backend will not try sending again until the queue is explicitly
flushed. Change the device to meet that.

dp8393x_can_receive checks the SONIC_CR_RXEN bit in the SONIC_CR register
and the SONIC_ISR_RBE bit in the SONIC_ISR register, so try flushing the
queue when either bit is being updated.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Fam Zheng [Wed, 15 Jul 2015 10:19:11 +0000 (18:19 +0800)]

stellaris_enet: Flush queued packets when read done

If s->np reaches 31, the queue will be disabled by the peer when it sees
stellaris_enet_can_receive() return false, until we explicitly flush
it, which notifies the peer. Do this when the guest is done reading all
existing data.

Move the semantics to stellaris_enet_receive, by returning 0 when the
buffer is full, so that new packets will be queued. In
stellaris_enet_read, flush and restart the queue when guest has done
reading.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
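
The "return 0, then flush" contract that this series applies to each NIC model can be sketched with a toy device. Everything here is hypothetical: ToyNic, its fields and the toy_nic_* functions only model the idea that a receive callback returning 0 makes the peer hold the packet until the device flushes. The 31-slot limit is borrowed from the description above.

```c
#define TOY_FIFO_SLOTS 31

typedef struct {
    int np;           /* packets currently held in the device FIFO */
    int peer_queued;  /* packets the peer is holding back for us */
} ToyNic;

/*
 * Receive callback: returns the length if the packet was stored, or 0
 * if the FIFO is full, in which case the peer keeps the packet queued.
 */
int toy_nic_receive(ToyNic *s, int len)
{
    if (s->np >= TOY_FIFO_SLOTS) {
        s->peer_queued++;
        return 0;
    }
    s->np++;
    return len;
}

/*
 * The guest consumes one packet. The device then flushes: queued
 * packets are delivered for as long as there is room.
 */
void toy_nic_guest_read(ToyNic *s)
{
    if (s->np > 0) {
        s->np--;
    }
    while (s->peer_queued > 0 && s->np < TOY_FIFO_SLOTS) {
        s->peer_queued--;
        s->np++;
    }
}
```

If toy_nic_guest_read() omitted the flush loop, a packet queued while the FIFO was full would stay with the peer indefinitely, which is the rx hang these commits guard against.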

Fam Zheng [Wed, 15 Jul 2015 10:19:10 +0000 (18:19 +0800)]

mipsnet: Flush queued packets when receiving is enabled

Drop .can_receive and move the semantics to mipsnet_receive, by
returning 0.

After 0 is returned, we must flush the queue explicitly to restart it:
Call qemu_flush_queued_packets when s->busy or s->rx_count is being
updated.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Fam Zheng [Wed, 15 Jul 2015 10:19:09 +0000 (18:19 +0800)]

milkymist-minimac2: Flush queued packets when link comes up

Drop .can_receive and move the semantics into minimac2_rx, by returning
0.

That is, once minimac2_rx returns 0, incoming packets will be queued
until the queue is explicitly flushed. We do this when s->regs[R_STATE0]
or s->regs[R_STATE1] is changed in minimac2_write.

Also drop the unused trace point.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:08 +0000 (18:19 +0800)]

mcf_fec: Drop mcf_fec_can_receive

The semantics of .can_receive requires us to flush the queue explicitly
when s->rx_enabled becomes true after it returns 0, but the packet being
queued is not meaningful since the guest hasn't activated the card.
Let's just drop the packet in this case.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:07 +0000 (18:19 +0800)]

etsec: Flush queue when rx buffer is consumed

The BH will be scheduled when etsec->rx_buffer_len becomes 0, which
is the condition of queuing.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:06 +0000 (18:19 +0800)]

etsec: Move etsec_can_receive into etsec_receive

When etsec_receive returns 0, the peer queues the packet as if
.can_receive returns false. Drop etsec_can_receive and let etsec_receive
carry the semantics.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:05 +0000 (18:19 +0800)]

usbnet: Drop usbnet_can_receive

usbnet_receive already drops the packet if rndis_state is not
RNDIS_DATA_INITIALIZED, and queues the packet if the in buffer is not
available. The only difference is s->dev.config, but that is similar to
rndis_state.

Drop usbnet_can_receive and move these checks to usbnet_receive, so that
we don't need to explicitly flush the queue when s->dev.config changes
value.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:04 +0000 (18:19 +0800)]

eepro100: Drop nic_can_receive

nic_receive already checks the conditions and drops packets if they are
false. Due to the new semantics since 6e99c63 ("net/socket: Drop
net_socket_can_send"), having .can_receive return 0 requires us to
explicitly flush the queued packets when the conditions become
true, but queuing the packets when the guest driver is not ready doesn't
make much sense.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:03 +0000 (18:19 +0800)]

pcnet: Drop pcnet_can_receive

pcnet_receive already checks the conditions and drops packets if they are
false. Due to the new semantics since 6e99c63 ("net/socket: Drop
net_socket_can_send"), having .can_receive return 0 requires us to
explicitly flush the queued packets when the conditions become
true, but queuing the packets when the guest driver is not ready doesn't
make much sense.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Fam Zheng [Wed, 15 Jul 2015 10:19:02 +0000 (18:19 +0800)]

xgmac: Drop packets with eth_can_rx is false.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Message-id: 1436955553 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Greg Ungerer [Fri, 26 Jun 2015 05:27:16 +0000 (15:27 +1000)]

hw/net: fix mcf_fec driver receiver

The network mcf_fec driver emulated receive side method is returning a
result of 0 causing the network layer to disable receive for this emulated
device. This results in the guest only ever receiving one packet.

Fix the receive side processing to return the number of bytes that we
passed back through to the guest.

Signed-off-by: Greg Ungerer <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1435296436 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Greg Ungerer [Fri, 26 Jun 2015 05:27:15 +0000 (15:27 +1000)]

hw/net: add simple phy support to mcf_fec driver

The Linux fec driver needs at least basic phy support to probe and work.
The current qemu mcf_fec emulation has no support for the reading or
writing of the MDIO lines to access an attached phy.

This code adds a very simple set of register results for a fixed phy
setup - very similar to that used on an m5208evb board. This is enough
to probe and identify an emulated attached phy.

Signed-off-by: Greg Ungerer <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1435296436 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Greg Ungerer [Fri, 26 Jun 2015 05:27:14 +0000 (15:27 +1000)]

hw/net: add ANLPAR bit definitions to generic mii

Add a base set of bit definitions for the standard MII phy "Auto-Negotiation
Link Partner Ability Register" (ANLPAR).

The original definitions moved into mii.h from the allwinner_emac driver
did not define these.

Signed-off-by: Greg Ungerer <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1435296436 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Greg Ungerer [Fri, 26 Jun 2015 05:27:13 +0000 (15:27 +1000)]

hw/net: create common collection of MII definitions

Create a common set of definitions of address and register values for
ethernet MII phys. A few of the current ethernet drivers have at least
a partial set of these definitions. Others just use hard coded raw
constant numbers.

This initial set is copied directly from the allwinner_emac code.

Signed-off-by: Greg Ungerer <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1435296436 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Peter Maydell [Mon, 27 Jul 2015 12:10:00 +0000 (13:10 +0100)]

Merge remote-tracking branch 'remotes/jnsnow/tags/cve-2015-5154-pull-request' into staging

# gpg: Signature made Mon Jul 27 13:01:10 2015 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/cve-2015-5154-pull-request:
  ide: Clear DRQ after handling all expected accesses
  ide/atapi: Fix START STOP UNIT command completion
  ide: Check array bounds before writing to io_buffer (CVE-2015-5154)

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Daniel P. Berrange [Tue, 21 Jul 2015 08:55:02 +0000 (09:55 +0100)]

crypto: extend unit tests to cover decryption too

The current unit test only verifies the encryption API,
resulting in us missing a recently introduced bug in the
decryption API from commit d3462e3. It was fortunately
later discovered & fixed by commit bd09594, thanks to the
QEMU I/O tests for qcow2 encryption, but we should really
detect this directly in the crypto unit tests. Also remove
an accidental debug message and simplify some asserts.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1437468902 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Daniel P. Berrange [Fri, 24 Jul 2015 12:23:54 +0000 (13:23 +0100)]

crypto: fix built-in AES decrypt function

The qcrypto_cipher_decrypt_aes method was using the wrong
key material, and passing the wrong mode. This caused it
to incorrectly decrypt ciphertext.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1437740634 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Michael S. Tsirkin [Wed, 22 Jul 2015 09:32:25 +0000 (12:32 +0300)]

virtio: set any_layout in virtio core

Exceptions:
- virtio-blk
- compat machine types

Signed-off-by: Michael S. Tsirkin <[email protected]>

commit | commitdiff | tree

Michael S. Tsirkin [Thu, 23 Jul 2015 17:57:53 +0000 (20:57 +0300)]

virtio-9p: fix any_layout

virtio pci allows any device to have a modern interface;
this in turn requires ANY_LAYOUT support.
Fix up ANY_LAYOUT for virtio-9p.

Reported-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Jason Wang <[email protected]>

commit | commitdiff | tree

Michael S. Tsirkin [Thu, 23 Jul 2015 14:52:02 +0000 (17:52 +0300)]

virtio-serial: fix ANY_LAYOUT

Don't assume a specific layout for control messages.
Required by virtio 1.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Amit Shah <[email protected]>
Reviewed-by: Jason Wang <[email protected]>

commit | commitdiff | tree

Michael S. Tsirkin [Wed, 22 Jul 2015 10:09:25 +0000 (13:09 +0300)]

virtio: hide legacy features from modern guests

NOTIFY_ON_EMPTY, ANY_LAYOUT and BAD are only valid on the legacy
interface.

Hide them from modern guests.

Signed-off-by: Michael S. Tsirkin <[email protected]>

commit | commitdiff | tree

Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]

ide: Clear DRQ after handling all expected accesses

This is additional hardening against an end_transfer_func that fails to
clear the DRQ status bit. The bit must be unset as soon as the PIO
transfer has completed, so it's better to do this in a central place
instead of duplicating the code in all commands (and forgetting it in
some).

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>

commit | commitdiff | tree

Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]

ide/atapi: Fix START STOP UNIT command completion

The command must be completed on all code paths. START STOP UNIT with
pwrcnd set should succeed without doing anything.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>

commit | commitdiff | tree

Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]

ide: Check array bounds before writing to io_buffer (CVE-2015-5154)

If the end_transfer_func of a command is called because enough data has
been read or written for the current PIO transfer, and it fails to
correctly call the command completion functions, the DRQ bit in the
status register and s->end_transfer_func may remain set. This allows the
guest to access further bytes in s->io_buffer beyond s->data_end, and
eventually to overflow the io_buffer.

One case where this currently happens is emulation of the ATAPI command
START STOP UNIT.

This patch fixes the problem by adding explicit array bounds checks
before accessing the buffer instead of relying on end_transfer_func to
function correctly.
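
A minimal sketch of the kind of check described, assuming a much simplified PIO state. ToyPio and toy_data_readw are hypothetical; only the field names io_buffer, data_end and the DRQ flag come from the message, and this is not the QEMU IDE code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified PIO state (not the QEMU IDE structures). */
typedef struct ToyPio {
    uint8_t io_buffer[16];
    size_t data_ptr;  /* offset of the next byte to access */
    size_t data_end;  /* end of the current PIO transfer */
    int drq;          /* non-zero while the guest may transfer data */
} ToyPio;

/* Read one 16-bit word from the data port. The bounds check is done
 * here, before touching the buffer, so a stale DRQ flag cannot lead
 * to an access beyond data_end. */
static uint16_t toy_data_readw(ToyPio *s)
{
    uint16_t val = 0;

    assert(s->data_end <= sizeof(s->io_buffer));
    if (!s->drq || s->data_ptr + sizeof(val) > s->data_end) {
        return 0;
    }
    memcpy(&val, s->io_buffer + s->data_ptr, sizeof(val));
    s->data_ptr += sizeof(val);
    return val;
}
```

In the sketch DRQ is deliberately never cleared, to mimic an end_transfer_func that fails to complete the command; reads past data_end still return 0 and leave data_ptr unchanged.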

Cc: [email protected]
Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>

commit | commitdiff | tree

Peter Maydell [Fri, 24 Jul 2015 12:07:10 +0000 (13:07 +0100)]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* qemu-char fixes
* SCSI fixes (including CVE-2015-5158)
* RCU fixes
* Framebuffer logic to set DIRTY_MEMORY_VGA
* Fix compiler warning for --disable-vnc
* qemu-doc fixes
* x86 TCG pasto fix

# gpg: Signature made Fri Jul 24 12:57:52 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  target-i386/FPU: a misprint in helper_fistll_ST0
  qemu-doc: fix typos
  framebuffer: set DIRTY_MEMORY_VGA on RAM that is used for the framebuffer
  memory: count number of active VGA logging clients
  vl: Fix compiler warning for builds without VNC
  scsi: Handle no media case for scsi_get_configuration
  rcu: actually register threads that have RCU read-side critical sections
  scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)
  vnc: fix memory leak
  qemu-char: Fix missed data on unix socket
  qemu-char: handle EINTR for TCP character devices
  exec.c: Use atomic_rcu_read() to access dispatch in memory_region_section_get_iotlb()

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Dmitry Poletaev [Wed, 8 Jul 2015 09:48:40 +0000 (12:48 +0300)]

target-i386/FPU: a misprint in helper_fistll_ST0

There is a cut-and-paste mistake in the patch
https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01657.html .
It causes errors in guest operation. Here is the bugfix.

Signed-off-by: Dmitry Poletaev <[email protected]>
Reported-by: Kirill Batuzov <[email protected]>
Message-Id: <2692911436348920@web2m.yandex.ru>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Gonglei [Fri, 3 Jul 2015 09:50:57 +0000 (17:50 +0800)]

qemu-doc: fix typos

Signed-off-by: Gonglei <[email protected]>
Message-Id: <1435917057 [email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Mon, 13 Jul 2015 10:00:29 +0000 (12:00 +0200)]

framebuffer: set DIRTY_MEMORY_VGA on RAM that is used for the framebuffer

The MemoryRegionSection contains enough information to access the
RAM region underlying the framebuffer, and can be cached inside the
display device.

By doing this, the new framebuffer_update_memory_section function can
enable dirty memory logging on the relevant RAM region. The function
must be called whenever the stride or base of the framebuffer changes;
a simple way to cover these cases is to call it on every full frame
invalidation, which is a rare case.

framebuffer_update_display now works entirely on a MemoryRegionSection,
without going through cpu_physical_memory_map/unmap.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Tue, 14 Jul 2015 11:56:53 +0000 (13:56 +0200)]

memory: count number of active VGA logging clients

For a board that has multiple framebuffer devices, both of them
might want to use DIRTY_MEMORY_VGA on the same memory region.
The lack of reference counting in memory_region_set_log makes
this very awkward to implement.
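
A rough sketch of the reference-counting idea, assuming a per-region counter. ToyRegion, toy_set_vga_logging and toy_vga_logging_active are hypothetical names and do not reflect the actual memory API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region with a count of active VGA logging clients. */
typedef struct ToyRegion {
    unsigned vga_logging_count;
} ToyRegion;

/* Each client enables and later disables logging; the region only
 * stops logging when the last client has disabled it. */
static void toy_set_vga_logging(ToyRegion *mr, bool enable)
{
    if (enable) {
        mr->vga_logging_count++;
    } else {
        assert(mr->vga_logging_count > 0);
        mr->vga_logging_count--;
    }
}

static bool toy_vga_logging_active(const ToyRegion *mr)
{
    return mr->vga_logging_count > 0;
}
```

With a plain boolean instead of a counter, the first device to disable logging would also turn it off for the other device sharing the region.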

Suggested-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Stefan Weil [Wed, 22 Jul 2015 17:53:30 +0000 (19:53 +0200)]

vl: Fix compiler warning for builds without VNC

This regression was caused by commit 70b94331.

  CC    vl.o
vl.c: In function ‘select_display’:
vl.c:2064:12: error: unused variable ‘err’ [-Werror=unused-variable]
     Error *err = NULL;
            ^

Reported-by: Claudio Fontana <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Message-Id: <1437587610 [email protected]>
Reviewed-by: Wen Congyang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Matthew Rosato [Wed, 15 Jul 2015 18:52:32 +0000 (14:52 -0400)]

scsi: Handle no media case for scsi_get_configuration

Currently, scsi_get_configuration always returns a current
profile (DVD or CD), even when there is actually no media present.
By comparison, ide/atapi uses a default profile of 0 (MMC_PROFILE_NONE)
for this case and checks for tray_open, so let's do the same for scsi.

This fixes a problem I'm seeing with Fedora 22 guests where systemd
cdrom_id fails to unmount after a QEMU-initiated eject against a
scsi cdrom device because it believes the media is still present
(but unreadable).

Signed-off-by: Matthew Rosato <[email protected]>
Message-Id: <1436986352 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Thu, 9 Jul 2015 06:55:38 +0000 (08:55 +0200)]

rcu: actually register threads that have RCU read-side critical sections

Otherwise, grace periods are detected too early!

Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Jul 2015 06:59:39 +0000 (08:59 +0200)]

scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)

This is a guest-triggerable buffer overflow present in QEMU 2.2.0
and newer. scsi_cdb_length returns -1 as an error value, but the
caller does not check it.

Luckily, the massive overflow means that QEMU will just SIGSEGV,
making the impact much smaller.
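
The failure mode can be illustrated with a small sketch, assuming the usual SCSI group-code convention for CDB lengths. toy_cdb_length and toy_parse_cdb are hypothetical stand-ins, not the QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CDB_MAX 16

/* Length of a CDB derived from the group code in the top three bits
 * of the opcode, or -1 for groups without a fixed length. */
static int toy_cdb_length(const uint8_t *buf)
{
    switch (buf[0] >> 5) {
    case 0:
        return 6;
    case 1:
    case 2:
        return 10;
    case 4:
        return 16;
    case 5:
        return 12;
    default:
        return -1;
    }
}

/* Copy the CDB into dst (TOY_CDB_MAX bytes). Without the len < 0
 * check, -1 would be converted to a huge size_t in memcpy. */
static int toy_parse_cdb(uint8_t *dst, const uint8_t *buf)
{
    int len;

    assert(dst != NULL && buf != NULL);
    len = toy_cdb_length(buf);
    if (len < 0) {
        return -1;
    }
    memcpy(dst, buf, (size_t)len);
    return len;
}
```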

Reported-by: Zhu Donghai (朱东海) <[email protected]>
Fixes: 1894df02811f6b79ea3ffbf1084599d96f316173
Reviewed-by: Fam Zheng <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Gonglei [Wed, 22 Jul 2015 09:08:53 +0000 (17:08 +0800)]

vnc: fix memory leak

If a VNC password is configured, the memory that the cipher variable
points to is leaked on every VNC connection.

Cc: Daniel P. Berrange <[email protected]>
Reviewed-by: Daniel P. Berrange <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Message-Id: <1437556133 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Peter Maydell [Fri, 24 Jul 2015 10:11:30 +0000 (11:11 +0100)]

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20150723' into staging

Last minute fixes for 2.4.

# gpg: Signature made Fri Jul 24 04:42:31 2015 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"

* remotes/rth/tags/pull-tcg-20150723:
  tcg/optimize: fix tcg_opt_gen_movi
  tcg/aarch64: use 32-bit offset for 32-bit softmmu emulation
  tcg/aarch64: use 32-bit offset for 32-bit user-mode emulation
  tcg/aarch64: add ext argument to tcg_out_insn_3310
  tcg/i386: Extend addresses for 32-bit guests

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Peter Maydell [Fri, 24 Jul 2015 08:17:44 +0000 (09:17 +0100)]

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20150723.0' into staging

VFIO fixes for v2.4.0-rc3
- Fix Realtek NIC quirk (Alex Williamson)
- Restore bootindex functionality (Alex Williamson)

# gpg: Signature made Thu Jul 23 19:51:23 2015 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"

* remotes/awilliam/tags/vfio-fixes-20150723.0:
  vfio/pci: Fix bootindex
  vfio/pci: Fix RTL8168 NIC quirks

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Aurelien Jarno [Fri, 10 Jul 2015 16:03:30 +0000 (18:03 +0200)]

tcg/optimize: fix tcg_opt_gen_movi

Due to a copy&paste error, the new op value is tested against mov_i32
instead of movi_i32. The test is therefore always false. Fix that.

Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <1436544211 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

commit | commitdiff | tree

Richard Henderson [Thu, 23 Jul 2015 22:04:52 +0000 (18:04 -0400)]

tcg/aarch64: use 32-bit offset for 32-bit softmmu emulation

Similar to the same fix for user-mode, except this instance
occurs on the softmmu path. Again, the tlb addend must be
the base register, while the guest address is the index.

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Wed, 15 Jul 2015 15:27:01 +0000 (17:27 +0200)]

tcg/aarch64: use 32-bit offset for 32-bit user-mode emulation

Thanks to the previous patch, it is now easy for tcg_out_qemu_ld and
tcg_out_qemu_st to use a 32-bit zero extended offset. However, the
guest base register x28 must be the base and addr_reg must be the
index.
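
A small model of why the operand order matters, under the assumption that in a register-offset access only the index operand is extended and the base is used as-is. ToyType and toy_effective_addr are hypothetical and only mimic the address arithmetic, not the TCG backend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two operand widths. */
typedef enum { TOY_TYPE_I32, TOY_TYPE_I64 } ToyType;

/* Model of "base + extend(index)". With TOY_TYPE_I32 the index is
 * zero-extended from 32 bits (uxtw); with TOY_TYPE_I64 it is used
 * unchanged (uxtx). The base is never extended. */
static uint64_t toy_effective_addr(uint64_t base, uint64_t index, ToyType ext)
{
    assert(ext == TOY_TYPE_I32 || ext == TOY_TYPE_I64);
    if (ext == TOY_TYPE_I32) {
        index = (uint32_t)index;
    }
    return base + index;
}
```

If the 32-bit guest address were passed as the base instead, stale upper bits in its register would go into the sum unmasked.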

Reported-by: Leon Alrae <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1436974021 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Wed, 15 Jul 2015 15:27:00 +0000 (17:27 +0200)]

tcg/aarch64: add ext argument to tcg_out_insn_3310

The new argument lets you pick uxtw or uxtx mode for the offset
register. For now, all callers pass TCG_TYPE_I64 so that uxtx
is generated. The bits for uxtx are removed from I3312_TO_I3310.

Reported-by: Leon Alrae <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1436974021 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

commit | commitdiff | tree

Richard Henderson [Thu, 16 Jul 2015 21:25:49 +0000 (22:25 +0100)]

tcg/i386: Extend addresses for 32-bit guests

Removing the ??? comment explaining why it (mostly) worked.

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1437081950 [email protected]>

commit | commitdiff | tree

Peter Maydell [Thu, 23 Jul 2015 11:54:53 +0000 (12:54 +0100)]

Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging

NUMA queue, 2015-07-22

# gpg: Signature made Wed Jul 22 19:11:04 2015 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/numa-pull-request:
  hostmem: Fix qemu_opt_get_bool() crash in host_memory_backend_init()

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Nils Carlson [Sun, 19 Jul 2015 20:39:56 +0000 (20:39 +0000)]

qemu-char: Fix missed data on unix socket

Commit 812c1057 introduced HUP detection on unix and tcp sockets prior
to a read in tcp_chr_read. This unfortunately broke CloudStack 4.2
which relied on the old behaviour where data on a socket was readable
even if a HUP was present.

A working solution is to properly check the return values from recv,
handling a closed socket once there is no more data to read.

Also enable polling for G_IO_NVAL to ensure the callback is called
for all possible events as these should now be possible to handle
with the improved error detection.
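
A minimal sketch of reading by return value rather than by HUP, assuming a POSIX stream socket. toy_drain is a hypothetical helper; retrying on EINTR is a choice made for this sketch and not part of this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until the buffer is full or the peer has closed. Returns the
 * number of bytes read, or -1 on error. *closed is set to 1 only when
 * recv() returns 0, that is, after all pending data was consumed. */
static ssize_t toy_drain(int fd, char *buf, size_t cap, int *closed)
{
    size_t total = 0;

    assert(buf != NULL && closed != NULL);
    *closed = 0;
    while (total < cap) {
        ssize_t n = recv(fd, buf + total, cap - total, 0);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            *closed = 1;   /* orderly shutdown, nothing left to read */
            break;
        } else if (errno != EINTR) {
            return -1;
        }
    }
    return (ssize_t)total;
}
```

Data written before the peer closed its end is still returned first; the close is only reported once recv() yields 0.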

Signed-off-by: Nils Carlson <[email protected]>
Message-Id: <1437338396 [email protected]>
[Do not handle EINTR; use socket_error(). - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Jul 2015 07:25:54 +0000 (09:25 +0200)]

qemu-char: handle EINTR for TCP character devices

Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Peter Maydell [Mon, 20 Jul 2015 11:27:16 +0000 (12:27 +0100)]

exec.c: Use atomic_rcu_read() to access dispatch in memory_region_section_get_iotlb()

When accessing the dispatch pointer in an AddressSpace within an RCU
critical section we should always use atomic_rcu_read(). Fix an
access within memory_region_section_get_iotlb() which was incorrectly
doing a direct pointer access.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <1437391637 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

commit | commitdiff | tree

Alex Williamson [Wed, 22 Jul 2015 20:56:01 +0000 (14:56 -0600)]

vfio/pci: Fix bootindex

bootindex was incorrectly changed to a device Property during the
platform code split, resulting in it no longer working. Remove it.

Signed-off-by: Alex Williamson <[email protected]>
Cc: [email protected] # v2.3+

commit | commitdiff | tree

Alex Williamson [Wed, 22 Jul 2015 20:56:01 +0000 (14:56 -0600)]

vfio/pci: Fix RTL8168 NIC quirks

The RTL8168 quirk correctly describes using bit 31 as a signal to
mark a latch/completion, but the code mistakenly uses bit 28.  This
causes the Realtek driver to spin on this register for quite a while,
20k cycles on Windows 7 v7.092 driver.  Then it gets frustrated and
tries to set the bit itself and spins for another 20k cycles.  For
some this still results in a working driver, for others not.  About
the only thing the code really does in its current form is protect
the guest from sneaking in writes to the real hardware MSI-X table.
The fix is obviously to use bit 31 as we document that we should.

The other problem doesn't seem to affect current drivers as nobody
seems to use these window registers for writes to the MSI-X table, but
we need to use the stored data when a write is triggered, not the
value of the current write, which only provides the offset.
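
Both fixes can be sketched with a toy register window. The layout here is hypothetical (ToyWindow, toy_write_data, toy_write_addr, a four-word table); only the use of bit 31 as the trigger and the use of the previously stored data come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TRIGGER_BIT (UINT32_C(1) << 31) /* latch/completion signal */
#define TOY_TABLE_WORDS 4u

/* Hypothetical window: a data register plus the table it writes to. */
typedef struct ToyWindow {
    uint32_t data;
    uint32_t table[TOY_TABLE_WORDS];
} ToyWindow;

/* A write to the data register is only stored for later use. */
static void toy_write_data(ToyWindow *w, uint32_t val)
{
    assert(w != NULL);
    w->data = val;
}

/* A write to the address register triggers a table write only when
 * bit 31 is set. The value written is the stored data; the current
 * write only supplies the byte offset. */
static void toy_write_addr(ToyWindow *w, uint32_t val)
{
    uint32_t index;

    assert(w != NULL);
    if (!(val & TOY_TRIGGER_BIT)) {
        return;
    }
    index = (val & ~TOY_TRIGGER_BIT) / 4u;
    if (index < TOY_TABLE_WORDS) {
        w->table[index] = w->data;
    }
}
```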

Note that only the Windows drivers from Realtek seem to use these
registers, the Microsoft drivers provided with Windows 8.1 do not
access them, nor do Linux in-kernel drivers.

Link: https://bugs.launchpad.net/qemu/+bug/1384892
Signed-off-by: Alex Williamson <[email protected]>
Cc: [email protected] # v2.1+

commit | commitdiff | tree

Eduardo Habkost [Thu, 16 Jul 2015 20:29:12 +0000 (17:29 -0300)]

hostmem: Fix qemu_opt_get_bool() crash in host_memory_backend_init()

This fixes the following crash, introduced by commit
49d2e648e8087d154d8bf8b91f27c8e05e79d5a6:

  $ gdb --args qemu-system-x86_64 -machine pc,mem-merge=off -object memory-backend-ram,id=ram-node0,size=1024
  [...]
  Program received signal SIGABRT, Aborted.
  (gdb) bt
  #0  0x00007ffff253b8c7 in raise () at /lib64/libc.so.6
  #1  0x00007ffff253d52a in abort () at /lib64/libc.so.6
  #2  0x00007ffff253446d in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007ffff2534522 in  () at /lib64/libc.so.6
  #4  0x00005555558bb80a in qemu_opt_get_bool_helper (opts=0x55555621b650, name=name@entry=0x5555558ec922 "mem-merge", defval=defval@entry=true, del=del@entry=false) at qemu/util/qemu-option.c:388
  #5  0x00005555558bbb5a in qemu_opt_get_bool (opts=<optimized out>, name=name@entry=0x5555558ec922 "mem-merge", defval=defval@entry=true) at qemu/util/qemu-option.c:398
  #6  0x0000555555720a24 in host_memory_backend_init (obj=0x5555562ac970) at qemu/backends/hostmem.c:226

Instead of using qemu_opt_get_bool(), which has not worked with
qemu_machine_opts for a long time, we can use the corresponding
MachineState fields.

Reviewed-by: Marcel Apfelbaum <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

commit | commitdiff | tree

Peter Maydell [Wed, 22 Jul 2015 17:17:19 +0000 (18:17 +0100)]

Update version for v2.4.0-rc2 release

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Peter Maydell [Wed, 22 Jul 2015 15:22:49 +0000 (16:22 +0100)]

Merge remote-tracking branch 'remotes/elmarco/tags/for-upstream' into staging

qxl: build fix for 2.4

# gpg: Signature made Wed Jul 22 15:55:00 2015 BST using DSA key ID F43F0992
# gpg: Good signature from "Marc-André Lureau <[email protected]>"
# gpg:                 aka "Marc-Andre Lureau <[email protected]>"
# gpg:                 aka "Marc-Andre Lureau <[email protected]>"
# gpg:                 aka "Marc-André Lureau <[email protected]>"
# gpg:                 aka "Marc-André Lureau (elmarco) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 7346 2483 9404 4E20 ABFF  7D48 D864 9487 F43F 0992

* remotes/elmarco/tags/for-upstream:
  qxl: Fix new function name for spice-server library

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Frediano Ziglio [Mon, 20 Jul 2015 08:43:23 +0000 (09:43 +0100)]

qxl: Fix new function name for spice-server library

The new spice-server function to limit the number of monitors (0.12.6)
was renamed during development from spice_qxl_set_monitors_config_limit to
spice_qxl_max_monitors (accepted upstream).
By mistake I posted the patch with the former name.
This patch fixes the function name.

Signed-off-by: Frediano Ziglio <[email protected]>
Acked-by: Christophe Fergeau <[email protected]>
Acked-by: Martin Kletzander <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>

commit | commitdiff | tree

Peter Maydell [Wed, 22 Jul 2015 11:52:34 +0000 (12:52 +0100)]

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Wed Jul 22 12:43:35 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/block-pull-request:
  AioContext: optimize clearing the EventNotifier
  AioContext: fix broken placement of event_notifier_test_and_clear
  AioContext: fix broken ctx->dispatching optimization
  aio-win32: reorganize polling loop
  tests: remove irrelevant assertions from test-aio
  qemu-timer: initialize "timers_done_ev" to set
  mirror: Speed up bitmap initial scanning

Signed-off-by: Peter Maydell <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Jul 2015 14:07:53 +0000 (16:07 +0200)]

AioContext: optimize clearing the EventNotifier

It is pretty rare for aio_notify to actually set the EventNotifier. It
can happen with worker threads such as thread-pool.c's, but otherwise it
should never be set thanks to the ctx->notify_me optimization. The
previous patch, unfortunately, added an unconditional call to
event_notifier_test_and_clear; now add a userspace fast path that
avoids the call.

Note that it is not possible to do the same with event_notifier_set;
it would break, as proved (again) by the included formal model.

This patch survived over 3000 reboots on aarch64 KVM.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Message-id: 1437487673 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Jul 2015 14:07:52 +0000 (16:07 +0200)]

AioContext: fix broken placement of event_notifier_test_and_clear

event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread       vcpu thread                     worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                                                     bh->scheduled = 1
                                                     event_notifier_set
                    lock_iothread
                    notify_me = 3
                    ppoll
                    notify_me = 1
                    aio_dispatch
                     aio_bh_poll
                      thread_pool_completion_bh
                                                     bh->scheduled = 1
                                                     event_notifier_set
                     node->io_read(node->opaque)
                      event_notifier_test_and_clear
   ppoll
   *** hang ***

"Tracing" with qemu_clock_get_ns shows pretty much the same behavior as
in the previous bug, so there are no new tricks here---just stare more
at the code until it is apparent.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  The next one optimizes it,
with help (this time for real rather than a posteriori :)) from
another, similar formal model.

Reported-by: Richard W. M. Jones <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Message-id: 1437487673 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>


Paolo Bonzini [Tue, 21 Jul 2015 14:07:51 +0000 (16:07 +0200)]

AioContext: fix broken ctx->dispatching optimization

This patch rewrites the ctx->dispatching optimization, which was the cause
of some mysterious hangs that could be reproduced on aarch64 KVM only.
The hangs were indirectly caused by aio_poll() and in particular by
flash memory updates' call to blk_write(), which invokes aio_poll().
Fun stuff: they had an extremely short race window, so much so that
adding all kinds of tracing to either the kernel or QEMU made them
go away (a single printf made them half as reproducible).

On the plus side, the failure mode (a hang until the next keypress)
made it very easy to examine the state of the process with a debugger.
And there was a very nice reproducer from Laszlo, which failed pretty
often (more than half of the time) on any version of QEMU with a non-debug
kernel; it also failed fast, while still in the firmware.  So, it could
have been worse.

For some unknown reason they happened only with virtio-scsi, but
that's not important.  It's more interesting that they disappeared with
io=native, making thread-pool.c a likely suspect for where the bug arose.
thread-pool.c is also one of the few places which use bottom halves
across threads, by the way.

I hope that no other similar bugs exist, but just in case :) I am
going to describe how the successful debugging went...  Since the
likely culprit was the ctx->dispatching optimization, which mostly
affects bottom halves, the first observation was that there are two
qemu_bh_schedule() invocations in the thread pool: the one in the aio
worker and the one in thread_pool_completion_bh.  The latter always
causes the optimization to trigger, the former may or may not.  In
order to restrict the possibilities, I introduced new functions
qemu_bh_schedule_slow() and qemu_bh_schedule_fast():

     /* qemu_bh_schedule_slow: */
     ctx = bh->ctx;
     bh->idle = 0;
     if (atomic_xchg(&bh->scheduled, 1) == 0) {
         event_notifier_set(&ctx->notifier);
     }

     /* qemu_bh_schedule_fast: */
     ctx = bh->ctx;
     bh->idle = 0;
     assert(ctx->dispatching);
     atomic_xchg(&bh->scheduled, 1);

Notice how the atomic_xchg is still in qemu_bh_schedule_slow().  This
was already debated a few months ago, so I assumed it to be correct.
In retrospect this was a very good idea, as you'll see later.

Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't
trigger the assertion (as expected).  Changing the worker's invocation
to qemu_bh_schedule_slow() didn't hide the bug (another assumption
which luckily held).  This already heavily limited the amount of
interaction between the threads, hinting that the problematic events
must have triggered around thread_pool_completion_bh().

As mentioned earlier, invoking a debugger to examine the state of a
hung process was pretty easy; the iothread was always waiting on a
poll(..., -1) system call.  Infinite timeouts are much rarer on x86,
and this could be the reason why the bug was never observed there.
With the buggy sequence more or less resolved to an interaction between
thread_pool_completion_bh() and poll(..., -1), my "tracing" strategy was
to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping
that the ordering of aio_ctx_prepare(), aio_ctx_dispatch(), poll() and
qemu_bh_schedule_fast() would provide some hint.  The output was:

    (gdb) p last_prepare
    $3 = 103885451
    (gdb) p last_dispatch
    $4 = 103876492
    (gdb) p last_poll
    $5 = 115909333
    (gdb) p last_schedule
    $6 = 115925212

Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch().
This makes little sense unless there is an aio_poll() call involved,
and indeed with a slightly different instrumentation you can see that
there is one:

    (gdb) p last_prepare
    $3 = 107569679
    (gdb) p last_dispatch
    $4 = 107561600
    (gdb) p last_aio_poll
    $5 = 110671400
    (gdb) p last_schedule
    $6 = 110698917

So the scenario becomes clearer:

   iothread                   VCPU thread
--------------------------------------------------------------------------
   aio_ctx_prepare
   aio_ctx_check
   qemu_poll_ns(timeout=-1)
                              aio_poll
                                aio_dispatch
                                  thread_pool_completion_bh
                                    qemu_bh_schedule()

At this point bh->scheduled = 1 and the iothread has not been woken up.
The solution must be close, but this alone should not be a problem,
because the bottom half is only rescheduled to account for rare situations
(see commit 3c80ca1, thread-pool: avoid deadlock in nested aio_poll()
calls, 2014-07-15).

Introducing a third thread---a thread pool worker thread, which
also does qemu_bh_schedule()---does bring out the problematic case.
The third thread must be awakened *after* the callback is complete and
thread_pool_completion_bh has redone the whole loop, explaining the
short race window.  And then this is what happens:

                                                      thread pool worker
--------------------------------------------------------------------------
                                                      <I/O completes>
                                                      qemu_bh_schedule()

Tada, bh->scheduled is already 1, so qemu_bh_schedule() does nothing
and the iothread is never woken up.  This is where the bh->scheduled
optimization comes into play---it is correct, but removing it would
have masked the bug.
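
The sequence above can be replayed in a single thread to see why no wakeup
is ever sent. The following is only a sketch modelled on the
qemu_bh_schedule_slow()/qemu_bh_schedule_fast() snippets earlier in this
message; the SketchBH and SketchAioCtx types and the notification counter
are assumptions standing in for the real structures and for
event_notifier_set().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    bool dispatching;      /* the old ctx->dispatching flag */
    int notifications;     /* counts would-be event_notifier_set() calls */
} SketchAioCtx;

typedef struct {
    SketchAioCtx *ctx;
    atomic_int scheduled;
} SketchBH;

/* Slow path: notify only on the 0 -> 1 transition of bh->scheduled. */
static void sketch_bh_schedule_slow(SketchBH *bh)
{
    if (atomic_exchange(&bh->scheduled, 1) == 0) {
        bh->ctx->notifications++;
    }
}

/* Fast path: the old optimization, which never notifies. */
static void sketch_bh_schedule_fast(SketchBH *bh)
{
    assert(bh->ctx->dispatching);
    atomic_exchange(&bh->scheduled, 1);
}

/* Replays the race; returns how many wakeups the sleeping iothread got. */
static int sketch_replay_hang(void)
{
    SketchAioCtx ctx = { .dispatching = false, .notifications = 0 };
    SketchBH bh = { .ctx = &ctx };
    atomic_init(&bh.scheduled, 0);

    /* VCPU thread: aio_poll dispatches thread_pool_completion_bh,
     * which reschedules itself through the optimized path. */
    ctx.dispatching = true;
    sketch_bh_schedule_fast(&bh);
    ctx.dispatching = false;

    /* Worker thread: I/O completes, but bh->scheduled is already 1. */
    sketch_bh_schedule_slow(&bh);

    return ctx.notifications;
}
```

In this replay sketch_replay_hang() returns 0: the bottom half is pending,
yet nothing has woken the thread blocked in poll(..., -1).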

So, what is the bug?

Well, the question asked by the ctx->dispatching optimization ("is any
active aio_poll dispatching?") was wrong.  The right question to ask
instead is "is any active aio_poll *not* dispatching", i.e. in the prepare
or poll phases?  In that case, the aio_poll is sleeping or might go to
sleep anytime soon, and the EventNotifier must be invoked to wake
it up.

In any other case (including if there is *no* active aio_poll at all!)
we can just wait for the next prepare phase to pick up the event (e.g. a
bottom half); the prepare phase will avoid the blocking and service the
bottom half.

Expressing the invariant with a logic formula, the broken one looked like:

   !(exists(thread): in_dispatching(thread)) => !optimize

or equivalently:

   !(exists(thread):
          in_aio_poll(thread) && in_dispatching(thread)) => !optimize

In the correct one, the negation is in a slightly different place:

   (exists(thread):
         in_aio_poll(thread) && !in_dispatching(thread)) => !optimize

or equivalently:

   (exists(thread): in_prepare_or_poll(thread)) => !optimize

Even if the difference boils down to moving an exclamation mark :)
the implementation is quite different.  However, I think the new
one is simpler to understand.
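
To make the difference between the two formulas concrete, here is a small
sketch that evaluates both over a snapshot of thread phases. The Phase enum
and the function names are made up for this illustration, and each function
assumes an implementation that skips the notification whenever its invariant
allows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PHASE_IDLE means the thread is not inside aio_poll or the GSource. */
typedef enum {
    PHASE_IDLE,
    PHASE_PREPARE,
    PHASE_POLL,
    PHASE_DISPATCH
} Phase;

/* Broken rule: skipping is allowed as soon as some thread dispatches. */
static bool broken_may_skip_notify(const Phase *threads, size_t n)
{
    assert(threads != NULL);
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == PHASE_DISPATCH) {
            return true;
        }
    }
    return false;
}

/* Correct rule: skipping is allowed only if no thread is in the
 * prepare or poll phases, i.e. nobody is asleep or about to sleep. */
static bool correct_may_skip_notify(const Phase *threads, size_t n)
{
    assert(threads != NULL);
    for (size_t i = 0; i < n; i++) {
        if (threads[i] == PHASE_PREPARE || threads[i] == PHASE_POLL) {
            return false;
        }
    }
    return true;
}
```

With the hang scenario, an iothread in PHASE_POLL and a VCPU thread in
PHASE_DISPATCH, the broken rule allows the notification to be skipped while
the correct one does not. With every thread in PHASE_IDLE the situation is
reversed: the broken rule notifies needlessly and the correct rule skips.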

In the old implementation, the "exists" was implemented with a boolean
value.  This didn't really support the case of multiple concurrent
event loops well, but I thought that this was okay: aio_poll holds the
AioContext lock so there cannot be concurrent aio_poll invocations, and
I was just considering nested event loops.  However, aio_poll _could_
indeed be concurrent with the GSource.  This is why I came up with the
wrong invariant.

In the new implementation, "exists" is computed simply by counting how many
threads are in the prepare or poll phases.  There are some interesting
points to consider, but the gist of the idea remains:

1) AioContext can be used through GSource as well; as mentioned in the
patch, bit 0 of the counter is reserved for the GSource.

2) the counter need not be updated for a non-blocking aio_poll, because
it won't sleep forever anyway.  This is just a matter of checking
the "blocking" variable.  This requires some changes to the win32
implementation, but is otherwise not too complicated.

3) as mentioned above, the new implementation will not call aio_notify
when there is *no* active aio_poll at all.  The tests have to be
adjusted for this change.  The calls to aio_notify in async.c are fine;
they only want to kick aio_poll out of a blocking wait, but need not
do anything if aio_poll is not running.

4) nested aio_poll: these just work with the new implementation; when
a nested event loop is invoked, the outer event loop is never in the
prepare or poll phases.  The outer event loop thus has already decremented
the counter.
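
The counting scheme from the points above could look roughly like the
following sketch. It is written for illustration only and is not the patch
itself: CounterCtx, the function names and the wakeups counter (which stands
in for event_notifier_set()) are assumptions. Bit 0 of the counter belongs to
the GSource, and each blocking aio_poll adds 2 while it is in the prepare or
poll phases, which matches the notify_me values 1 and 3 seen in the timeline
of the follow-up commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int notify_me;  /* bit 0: GSource; +2 per blocking aio_poll */
    atomic_int wakeups;    /* stands in for event_notifier_set() calls */
} CounterCtx;

static void counter_ctx_init(CounterCtx *ctx)
{
    atomic_init(&ctx->notify_me, 0);
    atomic_init(&ctx->wakeups, 0);
}

/* GSource prepare: about to block in the main loop's poll. */
static void counter_gsource_prepare(CounterCtx *ctx)
{
    atomic_fetch_or(&ctx->notify_me, 1);
}

/* GSource check: poll has returned, dispatching may follow. */
static void counter_gsource_check(CounterCtx *ctx)
{
    atomic_fetch_and(&ctx->notify_me, ~1);
}

/* A non-blocking aio_poll cannot sleep forever, so it is not counted. */
static void counter_poll_begin(CounterCtx *ctx, bool blocking)
{
    if (blocking) {
        atomic_fetch_add(&ctx->notify_me, 2);
    }
}

/* Called before dispatching, so a nested aio_poll sees the outer
 * loop as already out of the prepare and poll phases. */
static void counter_poll_end(CounterCtx *ctx, bool blocking)
{
    if (blocking) {
        int old = atomic_fetch_sub(&ctx->notify_me, 2);
        assert(old >= 2);
    }
}

/* Wake somebody up only if a thread is in prepare or poll. */
static bool counter_notify(CounterCtx *ctx)
{
    if (atomic_load(&ctx->notify_me) != 0) {
        atomic_fetch_add(&ctx->wakeups, 1);
        return true;
    }
    return false;
}
```

The sketch relies on sequentially consistent atomics for simplicity; the
ordering between updating the counter, checking for pending work and
blocking is exactly the delicate part, so real code has to be more careful.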

Reported-by: Richard W. M. Jones <[email protected]>
Reported-by: Laszlo Ersek <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Message-id: 1437487673 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
