Jason Wang [Wed, 11 Jan 2017 04:32:12 +0000 (12:32 +0800)]
vhost_net: device IOTLB support
This patch implements Device IOTLB support for vhost kernel. This is
done through:
1) switching to DMA helpers when mapping/unmapping vrings from vhost code
2) introducing a set of VhostOps for:
- setting up device IOTLB request callback
- processing device IOTLB request
- processing device IOTLB invalidation
3) kernel support for Device IOTLB API:
- allow vhost-net to query the IOMMU IOTLB entry through eventfd
- enable the ability for qemu to update a specified mapping of vhost
through ioctl.
- enable the ability to invalidate a specified range of iova for the
device IOTLB of vhost through ioctl. In x86/intel_iommu case this is
triggered through iommu memory region notifier from device IOTLB
invalidation descriptor processing routine.
With all the above, kernel vhost_net can co-operate with userspace
IOMMU. For vhost-user, the support could be easily done on top by
implementing the VhostOps.
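As a rough illustration of the flow described above (IOTLB miss request, mapping update, range invalidation), here is a minimal Python sketch. It is a toy model only: the class DeviceIotlb, its methods and the miss callback are hypothetical names chosen for this example and are not the vhost or QEMU API.

```python
class DeviceIotlb:
    """Toy device IOTLB caching iova -> userspace address mappings."""

    def __init__(self, miss_callback):
        # miss_callback stands in for the IOTLB request sent to userspace
        # (hypothetical; the real request path goes through eventfd).
        self.entries = []  # list of (iova, size, uaddr)
        self.miss_callback = miss_callback

    def update(self, iova, size, uaddr):
        """Install a mapping, as userspace would do in reply to a miss."""
        self.entries.append((iova, size, uaddr))

    def invalidate(self, iova, size):
        """Drop every cached entry overlapping [iova, iova + size)."""
        end = iova + size
        self.entries = [
            (e_iova, e_size, e_uaddr)
            for (e_iova, e_size, e_uaddr) in self.entries
            if e_iova + e_size <= iova or e_iova >= end
        ]

    def _lookup(self, iova):
        for e_iova, e_size, e_uaddr in self.entries:
            if e_iova <= iova < e_iova + e_size:
                return e_uaddr + (iova - e_iova)
        return None

    def translate(self, iova):
        """Translate iova, asking the callback to fill the cache on a miss."""
        addr = self._lookup(iova)
        if addr is None:
            self.miss_callback(self, iova)
            addr = self._lookup(iova)
        return addr


def demo_iotlb():
    misses = []

    def on_miss(tlb, iova):
        # Pretend IOMMU: identity-style mapping at a fixed base, 4 KiB pages.
        misses.append(iova)
        page = iova & ~0xFFF
        tlb.update(page, 0x1000, 0x7F0000000000 + page)

    tlb = DeviceIotlb(on_miss)
    first = tlb.translate(0x2010)    # miss, then filled by on_miss
    second = tlb.translate(0x2020)   # hit in the cached page
    tlb.invalidate(0x2000, 0x1000)   # drop the page
    third = tlb.translate(0x2030)    # misses again after invalidation
    return first, second, third, misses
```

Calling demo_iotlb() shows one miss per cold page and a fresh miss after the range is invalidated, which is the behaviour the three VhostOps above are meant to provide.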
Stefan Hajnoczi [Thu, 12 Jan 2017 11:46:11 +0000 (11:46 +0000)]
virtio: disable notifications again after poll succeeded
While AioContext is in polling mode virtqueue notifications are not
necessary. Some device virtqueue handlers enable notifications. Make
sure they stay disabled to avoid unnecessary vmexits.
Neither virtio-net nor virtio-crypto balances
virtio_queue_set_notification() enable and disable calls. This makes
the notification_disabled counter unreliable and Doug Goldstein
reported the following assertion failure:
#3 0x00007ffff44d1c62 in __GI___assert_fail (
assertion=assertion@entry=0x555555ae8e8a "vq->notification_disabled > 0",
file=file@entry=0x555555ae89c0 "/home/doug/work/qemu/hw/virtio/virtio.c",
line=line@entry=215,
function=function@entry=0x555555ae9630 <__PRETTY_FUNCTION__.43707>
"virtio_queue_set_notification") at assert.c:101
#4 0x00005555557f25d6 in virtio_queue_set_notification (vq=0x55555666aa90,
enable=enable@entry=1) at /home/doug/work/qemu/hw/virtio/virtio.c:215
#5 0x00005555557dc311 in virtio_net_has_buffers (q=<optimized out>,
q=<optimized out>, bufsize=102)
at /home/doug/work/qemu/hw/net/virtio-net.c:1008
#6 virtio_net_receive (nc=<optimized out>, buf=0x555557386b88 "", size=102)
at /home/doug/work/qemu/hw/net/virtio-net.c:1148
#7 0x00005555559cad33 in nc_sendv_compat (flags=<optimized out>, iovcnt=1,
iov=0x7fffead746d0, nc=0x55555788b340) at net/net.c:705
#8 qemu_deliver_packet_iov (sender=<optimized out>, flags=<optimized out>,
iov=0x7fffead746d0, iovcnt=1, opaque=0x55555788b340) at net/net.c:732
#9 0x00005555559cd929 in qemu_net_queue_deliver (size=<optimized out>,
data=<optimized out>, flags=<optimized out>, sender=<optimized out>,
queue=0x55555788b550) at net/queue.c:164
#10 qemu_net_queue_flush (queue=0x55555788b550) at net/queue.c:261
This patch is safe to revert since it's just an optimization for
virtqueue polling. The next patch will improve the situation again
without resorting to nesting.
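To see why unbalanced calls trip the assertion in the backtrace above, here is a toy Python model of a nested notification counter. The class ToyVirtQueue is hypothetical and only mirrors the condition quoted in the trace (notification_disabled > 0 when enabling); it is not the QEMU implementation.

```python
class ToyVirtQueue:
    """Toy model of a virtqueue with a nested 'disabled' counter."""

    def __init__(self):
        self.notification_disabled = 0

    def set_notification(self, enable):
        if enable:
            # Mirrors the assertion from the trace: an enable must be
            # preceded by a matching disable.
            if not self.notification_disabled > 0:
                raise AssertionError("vq->notification_disabled > 0")
            self.notification_disabled -= 1
        else:
            self.notification_disabled += 1

    @property
    def notifications_enabled(self):
        return self.notification_disabled == 0


def demo_unbalanced():
    vq = ToyVirtQueue()
    # Balanced pair: fine.
    vq.set_notification(False)
    vq.set_notification(True)
    # A handler that enables without a prior disable breaks the counter.
    try:
        vq.set_notification(True)
    except AssertionError:
        return "assertion failed"
    return "ok"
```

With a plain boolean instead of a counter, the second enable would be harmless, which is why reverting to the non-nested scheme avoids the failure.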
Paolo Bonzini [Wed, 11 Jan 2017 08:38:15 +0000 (09:38 +0100)]
virtio-net: enable ioeventfd even if vhost=off
virtio-net-pci does not enable ioeventfd for historical reasons (and
nobody ever checked whether it should be revisited). Note that other
backends do enable ioeventfd for virtio-net.
However, enabling ioeventfd has a major effect on performance. On Windows,
throughput is _multiplied_ by 2 or 3 on TCP_STREAM (on small packets it is
"only" a 30% improvement) and a little less so on TCP_MAERTS, albeit still
very much statistically significant. Latency also has a single-digit improvement.
This is not visible when using vhost, which forces ioeventfd=on, but it
is substantial without vhost. In addition, also on Windows and with the
RHEL 7.3 kernel, APICv seems to slow down virtio-net performance a bit,
but the penalty with this patch goes from -25% to -7%.
Peter Maydell [Tue, 17 Jan 2017 11:20:27 +0000 (11:20 +0000)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Mon 16 Jan 2017 13:38:52 GMT
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
async: optimize aio_bh_poll
aio: document locking
aio-win32: remove walking_handlers, protecting AioHandler list with list_lock
aio-posix: remove walking_handlers, protecting AioHandler list with list_lock
aio: tweak walking in dispatch phase
aio-posix: split aio_dispatch_handlers out of aio_dispatch
qemu-thread: optimize QemuLockCnt with futexes on Linux
aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh
qemu-thread: introduce QemuLockCnt
aio: rename bh_lock to list_lock
block: get rid of bdrv_io_unplugged_begin/end
Peter Maydell [Mon, 16 Jan 2017 18:23:02 +0000 (18:23 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-tcg-common-tlb-reset-20170113-r1' into staging
This is the same as the posted v3 except for a re-base and a few extra signoffs
# gpg: Signature made Fri 13 Jan 2017 14:26:46 GMT
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-tcg-common-tlb-reset-20170113-r1:
cputlb: drop flush_global flag from tlb_flush
cpu_common_reset: wrap TCG specific code in tcg_enabled()
qom/cpu: move tlb_flush to cpu_common_reset
Paolo Bonzini [Thu, 12 Jan 2017 18:07:53 +0000 (19:07 +0100)]
aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh
This will make it possible to walk the list of bottom halves without
holding the AioContext lock---and in turn to call bottom half
handlers without holding the lock.
Paolo Bonzini [Thu, 12 Jan 2017 18:07:52 +0000 (19:07 +0100)]
qemu-thread: introduce QemuLockCnt
A QemuLockCnt comprises a counter and a mutex, with primitives
to increment and decrement the counter, and to take and release the
mutex. It can be used to do lock-free visits to a data structure
whenever mutexes would be too heavy-weight and the critical section
is too long for RCU.
This could be implemented simply by protecting the counter with the
mutex, but QemuLockCnt is harder to misuse and more efficient.
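For illustration, the "simple" variant mentioned above (counter protected by the mutex) can be sketched in Python as follows. This is a sketch under assumptions, not the QEMU code: SimpleLockCnt, HandlerList and their method names are hypothetical. Visitors bump the counter instead of holding the mutex, removals only mark items, and the last visitor to leave reclaims them.

```python
import threading


class SimpleLockCnt:
    """Counter plus mutex; the counter itself is guarded by the mutex."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._count = 0

    def inc(self):
        with self._mutex:
            self._count += 1

    def dec_and_lock(self):
        """Decrement; return True with the mutex held if the count hit 0."""
        self._mutex.acquire()
        self._count -= 1
        if self._count == 0:
            return True
        self._mutex.release()
        return False

    def lock(self):
        self._mutex.acquire()

    def unlock(self):
        self._mutex.release()


class HandlerList:
    """List that can be walked without holding the mutex."""

    def __init__(self):
        self.lockcnt = SimpleLockCnt()
        self.items = []

    def add(self, name):
        self.lockcnt.lock()
        try:
            # Copy-on-write so concurrent walkers keep a stable snapshot.
            self.items = self.items + [{"name": name, "deleted": False}]
        finally:
            self.lockcnt.unlock()

    def remove(self, name):
        # Only mark the item; it is reclaimed once no visitor is active.
        self.lockcnt.lock()
        try:
            for item in self.items:
                if item["name"] == name:
                    item["deleted"] = True
        finally:
            self.lockcnt.unlock()

    def visit(self, fn):
        self.lockcnt.inc()
        try:
            for item in list(self.items):
                if not item["deleted"]:
                    fn(item["name"])
        finally:
            # Last visitor out frees the items marked as deleted.
            if self.lockcnt.dec_and_lock():
                try:
                    self.items = [i for i in self.items if not i["deleted"]]
                finally:
                    self.lockcnt.unlock()


def demo_lockcnt():
    hl = HandlerList()
    for name in ("a", "b", "c"):
        hl.add(name)
    seen, sizes = [], []

    def fn(name):
        seen.append(name)
        if name == "a":
            hl.remove("b")              # removal from inside a walk
            sizes.append(len(hl.items))  # still present, only marked

    hl.visit(fn)
    return seen, sizes, [i["name"] for i in hl.items]
```

The callback can remove items while the list is being walked because the mutex is not held during the walk; the counter is what keeps the marked items alive until the walk ends.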
Paolo Bonzini [Tue, 29 Nov 2016 11:33:34 +0000 (12:33 +0100)]
block: get rid of bdrv_io_unplugged_begin/end
bdrv_io_plug and bdrv_io_unplug are only called (via their
BlockBackend equivalents) after starting asynchronous I/O.
bdrv_drain is not going to be called while they are running,
because---even if a coroutine runs for some reason---it will
only drain in the next iteration of the event loop through
bdrv_co_yield_to_drain.
Peter Maydell [Mon, 16 Jan 2017 12:41:35 +0000 (12:41 +0000)]
Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.9-pull-request' into staging
# gpg: Signature made Sat 14 Jan 2017 09:06:31 GMT
# gpg: using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg: aka "Laurent Vivier <[email protected]>"
# gpg: aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier/tags/m68k-for-2.9-pull-request:
target-m68k: increment/decrement with SP
target-m68k: CAS doesn't need aligned access
target-m68k: manage pre-dec et post-inc in CAS
target-m68k: fix gen_flush_flags()
target-m68k: fix bit operation with immediate value
m68k: Remove PCI and USB from config file
target-m68k: Implement bfffo
target-m68k: Implement bitfield ops for memory
target-m68k: Implement bitfield ops for registers
* remotes/rth/tags/pull-tcg-20170113:
tcg/aarch64: Fix tcg_out_movi
tcg/aarch64: Fix addsub2 for 0+C
target/arm: Fix ubfx et al for aarch64
tcg/s390: Fix merge error with facilities
As the name suggests, the qapi2texi script converts JSON QAPI
description into a texi file suitable for different target
formats (info/man/txt/pdf/html...).
It parses the following kind of blocks:
Free-form:
##
# = Section
# == Subsection
#
# Some text foo with *emphasis*
# 1. with a list
# 2. like that
#
# And some code:
# | $ echo foo
# | -> do this
# | <- get that
#
##
Symbol description:
##
# @symbol:
#
# Symbol body ditto ergo sum. Foo bar
# baz ding.
#
# @param1: the frob to frobnicate
# @param2: #optional how hard to frobnicate
#
# Returns: the frobnicated frob.
# If frob isn't frobnicatable, GenericError.
#
# Since: version
# Notes: notes, comments can have
# - itemized list
# - like this
#
# Example:
#
# -> { "execute": "quit" }
# <- { "return": {} }
#
##
This roughly follows the EBNF grammar below:
api_comment = "##\n" comment "##\n"
comment = freeform_comment | symbol_comment
freeform_comment = { "# " text "\n" | "#\n" }
symbol_comment = "# @" name ":\n" { member | tag_section | freeform_comment }
member = "# @" name ':' [ text ] "\n" freeform_comment
tag_section = "# " ( "Returns:" | "Since:" | "Note:" | "Notes:" | "Example:" | "Examples:" ) [ text ] "\n" freeform_comment
text = free text with markup
Note that the grammar is ambiguous: a line "# @foo:\n" can be parsed
both as freeform_comment and as symbol_comment. The actual parser
recognizes symbol_comment.
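The grammar above can be exercised with a small Python sketch. This is an illustrative parser written for this example, not the parser shipped with the script; the function parse_doc_block and the shape of its return value are assumptions. Like the actual parser, it treats a leading "# @foo:" line as the start of a symbol_comment.

```python
import re

TAGS = ("Returns:", "Since:", "Note:", "Notes:", "Example:", "Examples:")

EXAMPLE = """\
##
# @symbol:
#
# Symbol body ditto ergo sum. Foo bar
# baz ding.
#
# @param1: the frob to frobnicate
# @param2: #optional how hard to frobnicate
#
# Returns: the frobnicated frob.
# If frob isn't frobnicatable, GenericError.
#
# Since: version
#
##
"""


def parse_doc_block(text):
    """Parse one '##' delimited block into symbol, body, members, sections."""
    lines = text.splitlines()
    if len(lines) < 2 or lines[0] != "##" or lines[-1] != "##":
        raise ValueError("block must start and end with '##'")
    body = []
    for line in lines[1:-1]:
        if line == "#":
            body.append("")
        elif line.startswith("# "):
            body.append(line[2:])
        else:
            raise ValueError("bad comment line: %r" % line)

    doc = {"symbol": None, "body": [], "members": {}, "sections": []}
    current = doc["body"]
    for i, line in enumerate(body):
        m = re.match(r"@(\w[\w-]*):\s*(.*)$", line)
        if m and i == 0 and not m.group(2):
            # Ambiguity resolved in favour of symbol_comment.
            doc["symbol"] = m.group(1)
            continue
        if m and doc["symbol"] is not None:
            current = doc["members"].setdefault(m.group(1), [])
            if m.group(2):
                current.append(m.group(2))
            continue
        tag = next((t for t in TAGS if line.startswith(t)), None)
        if tag and doc["symbol"] is not None:
            current = []
            doc["sections"].append((tag[:-1], current))
            rest = line[len(tag):].strip()
            if rest:
                current.append(rest)
            continue
        # Anything else is free-form text attached to the current part.
        current.append(line)
    return doc
```

Running parse_doc_block(EXAMPLE) yields the symbol name, the two members and the Returns/Since sections; a block whose first line is not "# @name:" comes back with symbol set to None and all lines in body.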
See docs/qapi-code-gen.txt for more details.
Deficiencies and limitations:
- the generated QMP documentation includes internal types
- union type support is lacking
- type information is lacking in generated documentation
- doc comment error message positions are imprecise; they point
to the beginning of the comment.
- a few minor issues, all marked TODO/FIXME in the code