Eric Blake [Fri, 6 May 2016 16:26:42 +0000 (10:26 -0600)]
nbd: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.
Add a constant for our magic number 512, to make it obvious
that this size will NOT change even if BDRV_SECTOR_SIZE does,
even though the two happen to be the same for now. Split
assignments from conditionals to keep checkpatch.pl happy.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:41 +0000 (10:26 -0600)]
atapi: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.
Add new defines ATAPI_SECTOR_BITS and ATAPI_SECTOR_SIZE to
use anywhere we were previously scaling BDRV_SECTOR_* by 4,
for better legibility.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:40 +0000 (10:26 -0600)]
m25p80: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.
Likewise for blk_aio_readv() and blk_aio_writev().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:39 +0000 (10:26 -0600)]
sd: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
In fact, we no longer need to include 'buf' in the migration
stream (although we do have to ensure that the stream remains
compatible).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:38 +0000 (10:26 -0600)]
pflash: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:37 +0000 (10:26 -0600)]
onenand: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
This particular device picks its size during onenand_initfn(),
and can be at most 0x80000000 bytes; therefore, shifting an
'int sec' request to get back to a byte offset should never
overflow 32 bits. But adding assertions to document that point
should not hurt.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:36 +0000 (10:26 -0600)]
nand: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations. Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:35 +0000 (10:26 -0600)]
fdc: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:34 +0000 (10:26 -0600)]
xen_disk: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:33 +0000 (10:26 -0600)]
virtio: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
The trace is modified at the same time, and nb_sectors is now
unused. Fix a comment typo while in the vicinity.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:32 +0000 (10:26 -0600)]
scsi-disk: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
As part of the cleanup, scsi_init_iovec() no longer needs to return
a value, and reword a comment.
[ kwolf: Fix read accounting change ]
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:31 +0000 (10:26 -0600)]
ide: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:30 +0000 (10:26 -0600)]
block: Introduce byte-based aio read/write
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).
Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:29 +0000 (10:26 -0600)]
block: Switch blk_*write_zeroes() to byte interface
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead. Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:28 +0000 (10:26 -0600)]
block: Switch blk_read_unthrottled() to byte interface
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Fri, 6 May 2016 16:26:27 +0000 (10:26 -0600)]
block: Allow BDRV_REQ_FUA through blk_pwrite()
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.
This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Mon, 9 May 2016 10:03:04 +0000 (12:03 +0200)]
qemu-io: Fix memory leak in 'aio_write -z'
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Janne Karhunen [Tue, 3 May 2016 09:43:30 +0000 (02:43 -0700)]
Allow users to specify the vmdk virtual hardware version.
Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.
[ kwolf: Adjust qemu-iotests common.filter ]
Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Zhou Jie [Fri, 29 Apr 2016 05:12:01 +0000 (13:12 +0800)]
block: always compile-check debug prints
Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. And switch debug output to stderr.
Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Wei Jiangang [Tue, 26 Apr 2016 10:12:43 +0000 (18:12 +0800)]
block: Fix typo in comment
s/imlement/implement/
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Tue, 26 Apr 2016 15:28:25 +0000 (17:28 +0200)]
block: Remove BlockDriver.bdrv_read/write
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Tue, 26 Apr 2016 15:14:08 +0000 (17:14 +0200)]
vvfat: Implement .bdrv_co_preadv/pwritev interfaces
This doesn't really convert any of the actual vvfat logic to use
vectored I/O (and it's doubtful whether that would make sense), but
instead just adapts the wrappers to the modern interface.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Tue, 26 Apr 2016 14:24:46 +0000 (16:24 +0200)]
vpc: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Tue, 26 Apr 2016 14:24:46 +0000 (16:24 +0200)]
vpc: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Tue, 26 Apr 2016 11:39:11 +0000 (13:39 +0200)]
vmdk: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 15:34:41 +0000 (17:34 +0200)]
vmdk: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 15:14:48 +0000 (17:14 +0200)]
vmdk: Add vmdk_find_offset_in_cluster()
This is a byte granularity version of vmdk_find_index_in_cluster().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 14:22:39 +0000 (16:22 +0200)]
vdi: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 14:22:39 +0000 (16:22 +0200)]
vdi: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 13:43:09 +0000 (15:43 +0200)]
dmg: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 13:21:46 +0000 (15:21 +0200)]
cloop: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 12:53:22 +0000 (14:53 +0200)]
bochs: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 09:25:18 +0000 (11:25 +0200)]
block: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 12:57:23 +0000 (14:57 +0200)]
block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 12:13:12 +0000 (14:13 +0200)]
block: Support AIO drivers in bdrv_driver_preadv/pwritev()
Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 10:13:39 +0000 (12:13 +0200)]
block: Introduce bdrv_driver_pwritev()
This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.
This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Kevin Wolf [Mon, 25 Apr 2016 09:46:41 +0000 (11:46 +0200)]
block: Introduce bdrv_driver_preadv()
This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.
For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:35 +0000 (18:33 +0200)]
linux-aio: make it more type safe
Replace void* with an opaque LinuxAioState type.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:34 +0000 (18:33 +0200)]
block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.
Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children. The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:33 +0000 (18:33 +0200)]
block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue. These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:32 +0000 (18:33 +0200)]
block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Do not call bdrv_drain_recurse twice in bdrv_co_drain. A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway. But better get it right.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:31 +0000 (18:33 +0200)]
block: move restarting of throttled reqs to block/throttle-groups.c
We want to remove throttled_reqs from block/io.c. This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Thu, 7 Apr 2016 16:33:30 +0000 (18:33 +0200)]
block: make bdrv_start_throttled_reqs return void
The return value is unused and I am not sure why it would be useful.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Thu, 7 Apr 2016 16:33:29 +0000 (18:33 +0200)]
block: Don't disable I/O throttling on sync requests
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.
The removed code is in fact dead code since commit
a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <
1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Maydell [Thu, 12 May 2016 11:35:25 +0000 (12:35 +0100)]
Open 2.7 development tree
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Wed, 11 May 2016 15:44:26 +0000 (16:44 +0100)]
Update version for v2.6.0 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 9 May 2016 13:08:12 +0000 (14:08 +0100)]
Update version for v2.6.0-rc5 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 9 May 2016 12:42:25 +0000 (13:42 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-
20160509-1' into staging
vga security fixes (CVE-2016-3710, CVE-2016-3712)
# gpg: Signature made Mon 09 May 2016 13:39:30 BST using RSA key ID
D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
* remotes/kraxel/tags/pull-vga-
20160509-1:
vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
vga: update vga register setup on vbe changes
vga: factor out vga register setup
vga: add vbe_enabled() helper
vga: fix banked access bounds checking (CVE-2016-3710)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 2 May 2016 16:27:01 +0000 (17:27 +0100)]
Update version for v2.6.0-rc4 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Gerd Hoffmann [Fri, 15 Apr 2016 06:43:29 +0000 (08:43 +0200)]
Revert "acpi: mark PMTIMER as unlocked"
This reverts commit
7070e085d490c396f9237c8f10bf8b6e69cd0066.
Commit message claims locking is not needed, but that appears
to not be true, seabios ehci driver runs into timekeeping problems
with this, see
https://bugzilla.redhat.com/show_bug.cgi?id=
1322713
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id:
1460702609-25971-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Gerd Hoffmann [Tue, 26 Apr 2016 12:48:06 +0000 (14:48 +0200)]
vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
Call vbe_update_vgaregs() when the guest touches GFX, SEQ or CRT
registers, to make sure the vga registers will always have the
values needed by vbe mode. This makes sure the sanity checks
applied by vbe_fixup_regs() are effective.
Without this guests can muck with shift_control, can turn on planar
vga modes or text mode emulation while VBE is active, making qemu
take code paths meant for CGA compatibility, but with the very
large display widths and heigts settable using VBE registers.
Which is good for one or another buffer overflow. Not that
critical as they typically read overflows happening somewhere
in the display code. So guests can DoS by crashing qemu with a
segfault, but it is probably not possible to break out of the VM.
Fixes: CVE-2016-3712
Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Tue, 26 Apr 2016 13:39:22 +0000 (15:39 +0200)]
vga: update vga register setup on vbe changes
Call the new vbe_update_vgaregs() function on vbe configuration
changes, to make sure vga registers are up-to-date.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Tue, 26 Apr 2016 13:24:18 +0000 (15:24 +0200)]
vga: factor out vga register setup
When enabling vbe mode qemu will setup a bunch of vga registers to make
sure the vga emulation operates in correct mode for a linear
framebuffer. Move that code to a separate function so we can call it
from other places too.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Tue, 26 Apr 2016 12:11:34 +0000 (14:11 +0200)]
vga: add vbe_enabled() helper
Makes code a bit easier to read.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Tue, 26 Apr 2016 06:49:10 +0000 (08:49 +0200)]
vga: fix banked access bounds checking (CVE-2016-3710)
vga allows banked access to video memory using the window at 0xa00000
and it supports a different access modes with different address
calculations.
The VBE bochs extentions support banked access too, using the
VBE_DISPI_INDEX_BANK register. The code tries to take the different
address calculations into account and applies different limits to
VBE_DISPI_INDEX_BANK depending on the current access mode.
Which is probably effective in stopping misprogramming by accident.
But from a security point of view completely useless as an attacker
can easily change access modes after setting the bank register.
Drop the bogus check, add range checks to vga_mem_{readb,writeb}
instead.
Fixes: CVE-2016-3710
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Jan Vesely [Fri, 29 Apr 2016 17:15:23 +0000 (13:15 -0400)]
configure: Check if struct fsxattr is available from linux header
Fixes build failure with --enable-xfsctl and
new linux headers (>=4.5) and older xfsprogs(<4.5):
In file included from /usr/include/xfs/xfs.h:38:0,
from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
/usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
struct fsxattr {
^
In file included from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
struct fsxattr {
This is really a bug in the system headers, but we can work around it
by defining HAVE_FSXATTR in the QEMU headers if linux/fs.h provides
the struct, so that xfs_fs.h doesn't try to define it as well.
CC: qemu-trivial@nongnu.org
CC: Markus Armbruster <armbru@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Stefan Weil <sw@weilnetz.de>
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
[PMM: adjusted commit message, comments]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Sun, 1 May 2016 21:52:47 +0000 (22:52 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi: last minute fix for 2.6
Minor, obvious fix only affecting BE hosts.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 01 May 2016 13:43:28 BST using RSA key ID
D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream:
acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Igor Mammedov [Fri, 29 Apr 2016 12:44:40 +0000 (14:44 +0200)]
acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host
'make check' fails with:
ERROR:tests/bios-tables-test.c:493:load_expected_aml:
assertion failed: (g_file_test(aml_file, G_FILE_TEST_EXISTS))
since commit:
caf50c7166a6ed96c462ab5db4b495e1234e4cc6
tests: pc: acpi: drop not needed 'expected SSDT' blobs
Assert happens because qemu-system-x86_64 generates
SSDT table and test looks for a corresponding expected
table to compare with.
However there is no expected SSDT blob anymore, since
QEMU souldn't generate one. As it happens BIOS is not
able to read ACPI tables from QEMU and fallbacks to
embeded legacy ACPI codepath, which generates SSDT.
That happens due to wrongly sized endiannes conversion
which makes
uint8_t BiosLinkerLoaderEntry.alloc.zone
end up with 0 due to truncation of 32 bit integer
which on host is 1 or 2.
Fix it by dropping invalid cpu_to_le32() as uint8_t
doesn't require any conversion.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=
1330174
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Peter Maydell [Fri, 29 Apr 2016 11:12:33 +0000 (12:12 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
vvfat fixes for 2.6.0-rc4
# gpg: Signature made Fri 29 Apr 2016 10:52:13 BST using RSA key ID
C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream:
vvfat: Fix default volume label
vvfat: Fix volume name assertion
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 29 Apr 2016 10:26:10 +0000 (11:26 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-04-29' into staging
QAPI patches for 2016-04-29
# gpg: Signature made Fri 29 Apr 2016 10:13:08 BST using RSA key ID
EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
* remotes/armbru/tags/pull-qapi-2016-04-29:
qapi: Don't pass NULL to printf in string input visitor
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Kevin Wolf [Wed, 27 Apr 2016 12:18:16 +0000 (14:18 +0200)]
vvfat: Fix default volume label
Commit
d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.
This fixes the implementation to apply the advertised default.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kevin Wolf [Wed, 27 Apr 2016 12:11:38 +0000 (14:11 +0200)]
vvfat: Fix volume name assertion
Commit
d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.
This fixes the assertion to compare with the configured volume name
instead of a literal string.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Thu, 28 Apr 2016 21:45:28 +0000 (15:45 -0600)]
qapi: Don't pass NULL to printf in string input visitor
Make sure the error message for visit_type_uint64() gracefully
handles a NULL 'name' when called from the top level or a list
context, as not all the world behaves like glibc in allowing
NULL through a printf-family %s.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <
1461879932-9020-21-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Samuel Thibault [Thu, 28 Apr 2016 16:53:08 +0000 (18:53 +0200)]
slirp: fix guest network access with darwin host
On Darwin, connect, sendto and friends want the exact size of the sockaddr,
not more (and in particular, not sizeof(struct sockaddr_storaget))
This commit adds the sockaddr_size helper to be used when passing a sockaddr
size to such function, and makes use of it int sendto and connect calls.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 28 Apr 2016 10:48:12 +0000 (11:48 +0100)]
Merge remote-tracking branch 'remotes/lalrae/tags/mips-
20160428' into staging
MIPS patches 2016-04-28
Changes:
* fixed RDHWR exception host PC
# gpg: Signature made Thu 28 Apr 2016 10:11:18 BST using RSA key ID
0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"
* remotes/lalrae/tags/mips-
20160428:
target-mips: Fix RDHWR exception host PC
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 28 Apr 2016 10:05:37 +0000 (11:05 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-04-28' into staging
Fix dangling pointers and error message regressions
# gpg: Signature made Thu 28 Apr 2016 07:25:51 BST using RSA key ID
EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
* remotes/armbru/tags/pull-error-2016-04-28:
qom: -object error messages lost location, restore it
replay: Fix dangling location bug in replay_configure()
QemuOpts: Fix qemu_opts_foreach() dangling location regression
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 28 Apr 2016 09:25:26 +0000 (10:25 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-
20160426' into staging
ppc patch queue for 2016-04-26 (last minute qemu-2.6 fix)
This just has one, last-minute, fix for a serious regression of memory
hotplug.
Patch author's comment:
Really sorry for the way last-minute fix, but without this memory
hotplug is totally broken :( Hoping to get this in for Wednesday's
RC4, which I think will be the final before release.
# gpg: Signature made Tue 26 Apr 2016 03:52:20 BST using RSA key ID
20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.6-
20160426:
spapr_drc: fix aborts during DRC-count based hotplug
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
James Hogan [Wed, 27 Apr 2016 22:21:06 +0000 (23:21 +0100)]
target-mips: Fix RDHWR exception host PC
Commit
b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
changed the rdhwr helpers to use check_hwrena() to check the register
being accessed is enabled in CP0_HWREna when used from user mode. If
that check fails an EXCP_RI exception is raised at the host PC
calculated with GETPC().
However check_hwrena() may not be fully inlined as the
do_raise_exception() part of it is common regardless of the arguments.
This causes GETPC() to calculate the address in the call in the helper
instead of the generated code calling the helper. No TB will be found
and the EPC reported with the resulting guest RI exception points to the
beginning of the TB instead of the RDHWR instruction.
We can't reliably force check_hwrena() to be inlined, and converting it
to a macro would be ugly, so instead pass the host PC in as an argument,
with each rdhwr helper passing GETPC(). This should avoid any dependence
on compiler behaviour, and in practice seems to ensure the full inlining
of check_hwrena() on x86_64.
This issue causes failures when running a MIPS KVM (trap & emulate)
guest in a MIPS QEMU TCG guest, as the inner guest kernel will do a
RDHWR of counter, which is disabled in the outer guest's CP0_HWREna by
KVM so it can emulate the inner guest's counter. The emulation fails and
the RI exception is passed to the inner guest.
Fixes: b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Markus Armbruster [Wed, 27 Apr 2016 14:29:09 +0000 (16:29 +0200)]
qom: -object error messages lost location, restore it
qemu_opts_foreach() runs its callback with the error location set to
the option's location. Any errors the callback reports use the
option's location automatically.
Commit
90998d5 moved the actual error reporting from "inside"
qemu_opts_foreach() to after it. Here's a typical hunk:
if (qemu_opts_foreach(qemu_find_opts("object"),
- object_create,
- object_create_initial, NULL)) {
+ user_creatable_add_opts_foreach,
+ object_create_initial, &err)) {
+ error_report_err(err);
exit(1);
}
Before, object_create() reports from within qemu_opts_foreach(), using
the option's location. Afterwards, we do it after
qemu_opts_foreach(), using whatever location happens to be current
there. Commonly a "none" location.
This is because Error objects don't have location information.
Problematic.
Reproducer:
$ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
qemu-system-x86_64: Property '.foo' not found
Note no location. This commit restores it:
qemu-system-x86_64: -object secret,id=foo,foo=bar: Property '.foo' not found
Note that the qemu_opts_foreach() bug just fixed could mask the bug
here: if the location it leaves dangling hasn't been clobbered, yet,
it's the correct one.
Reported-by: Eric Blake <eblake@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <
1461767349-15329-4-git-send-email-armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Paragraph on Error added to commit message]
Markus Armbruster [Wed, 27 Apr 2016 14:29:08 +0000 (16:29 +0200)]
replay: Fix dangling location bug in replay_configure()
replay_configure() pushes and pops a Location with automatic storage
duration. Except it fails to pop when -icount parameter "rr" isn't
given. cur_loc then points to unused stack space, and will most
likely get clobbered in short order.
Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.
Broken in commit
890ad55.
I didn't take the time to find a reproducer.
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <
1461767349-15329-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Markus Armbruster [Wed, 27 Apr 2016 14:29:07 +0000 (16:29 +0200)]
QemuOpts: Fix qemu_opts_foreach() dangling location regression
qemu_opts_foreach() pushes and pops a Location with automatic storage
duration. Except it fails to pop when @func() returns non-zero.
cur_loc then points to unused stack space, and will most likely get
clobbered in short order.
Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.
Affects several qemu command line options as well as qemu-img,
qemu-io, qemu-nbd -object, and blkdebug's configuration file.
Broken in commit
a4c7367, v2.4.0.
Reproducer:
$ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
main() reports "Property '.foo' not found" like this:
if (qemu_opts_foreach(qemu_find_opts("object"),
user_creatable_add_opts_foreach,
object_create_delayed, &err)) {
error_report_err(err);
exit(1);
}
cur_loc then points to where qemu_opts_foreach()'s Location used to
be, i.e. unused stack space. With optimization, this Location doesn't
get clobbered for me, and also happens to be the correct location.
Without optimization, it does get clobbered in a way that makes
error_report_err() report no location.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <
1461767349-15329-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Michael Roth [Mon, 25 Apr 2016 22:24:25 +0000 (17:24 -0500)]
spapr_drc: fix aborts during DRC-count based hotplug
CPU/memory resources can be signalled en-masse via
spapr_hotplug_req_add_by_count(), and when doing so, actually change
the meaning of the 'drc' parameter passed to
spapr_hotplug_req_event() to be a count rather than an index.
f40eb92 added a hook in spapr_hotplug_req_event() to record when a
device had been 'signalled' to the guest, but that code assumes that
drc is always an index. In cases where it's a count, such as memory
hotplug, the DRC lookup will fail, leading to an assert.
Fix this by only explicitly setting the signalled state for cases where
we are doing PCI hotplug.
For other resources types, since we cannot selectively track whether a
resource has been signalled in cases where we signal attach as a count,
set the 'signalled' state to true immediately upon making the
resource available via drck->attach().
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: david@gibson.dropbear.id.au
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Gerd Hoffmann [Fri, 22 Apr 2016 10:44:53 +0000 (12:44 +0200)]
usb/uhci: move pid check
commit "
5f77e06 usb: add pid check at the first of uhci_handle_td()"
moved the pid verification to the start of the uhci_handle_td function,
to simplify the error handling (we don't have to free stuff which we
didn't allocate in the first place ...).
Problem is now the check fires too often, it raises error IRQs even for
TDs which we are not going to process because they are not set active.
So, lets move down the check a bit, so it is done only for active TDs,
but still before we are going to allocate stuff to process the requested
transfer.
Reported-by: Joe Clifford <joe@thunderbug.co.uk>
Tested-by: Joe Clifford <joe@thunderbug.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id:
1461321893-15811-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 25 Apr 2016 10:15:53 +0000 (11:15 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-
20160423' into staging
ppc patch queue for 2016-03-23
A single fix for a bug in parameter handling for the spapr PCI host
bridge.
# gpg: Signature made Sat 23 Apr 2016 07:55:29 BST using RSA key ID
20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.6-
20160423:
hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Thomas Huth [Thu, 21 Apr 2016 10:08:58 +0000 (12:08 +0200)]
hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:
$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault
The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Peter Maydell [Fri, 22 Apr 2016 15:17:12 +0000 (16:17 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Mirror block job fixes for 2.6.0-rc4
# gpg: Signature made Fri 22 Apr 2016 15:46:41 BST using RSA key ID
C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream:
mirror: Workaround for unexpected iohandler events during completion
aio-posix: Skip external nodes in aio_dispatch
virtio: Mark host notifiers as external
event-notifier: Add "is_external" parameter
iohandler: Introduce iohandler_get_aio_context
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fam Zheng [Fri, 22 Apr 2016 13:53:56 +0000 (21:53 +0800)]
mirror: Workaround for unexpected iohandler events during completion
Commit
5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch
aio_dispatch
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
qemu_iohandler_poll
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
<snip>
aio_dispatch
aio_bh_poll
(c) mirror_exit
At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.
Commit
f3926945c8 made iohandler to use aio API. The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:
- Bug case 1:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
main_loop_wait # 2
...
aio_dispatch
aio_bh_poll
(c) mirror_exit
- Bug case 2:
main_loop_wait # 1
os_host_main_loop_wait
g_main_context_dispatch
aio_ctx_dispatch (qemu_aio_context)
...
mirror_run
bdrv_drain
(a) block_job_defer_to_main_loop
main_loop_wait # 2
...
aio_ctx_dispatch (iohandler_ctx)
virtio_queue_host_notifier_read
...
virtio_submit_multiwrite
(b) blk_aio_multiwrite
aio_dispatch
aio_bh_poll
(c) mirror_exit
In both cases, (b) breaks the invariant wanted by (a) and (c).
Until then, the request loss has been silent. Later,
3f09bfbc7be added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.
2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.
As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.
Launchpad Bug:
1570134
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Fri, 22 Apr 2016 13:53:55 +0000 (21:53 +0800)]
aio-posix: Skip external nodes in aio_dispatch
aio_poll doesn't poll the external nodes so this should never be true,
but aio_ctx_dispatch may get notified by the events from GSource. To
make bdrv_drained_begin effective in main loop, we should check the
is_external flag here too.
Also do the check in aio_pending so aio_dispatch is not called
superfluously, when there is no events other than external ones.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Fri, 22 Apr 2016 13:53:54 +0000 (21:53 +0800)]
virtio: Mark host notifiers as external
The effect of this change is the block layer drained section can work,
for example when mirror job is being completed.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Fri, 22 Apr 2016 13:53:53 +0000 (21:53 +0800)]
event-notifier: Add "is_external" parameter
All callers pass "false" keeping the old semantics. The windows
implementation doesn't distinguish the flag yet. On posix, it is passed
down to the underlying aio context.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Fri, 22 Apr 2016 13:53:52 +0000 (21:53 +0800)]
iohandler: Introduce iohandler_get_aio_context
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Christoffer Dall [Fri, 22 Apr 2016 11:12:09 +0000 (13:12 +0200)]
util: align memory allocations to 2M on AArch64
For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.
One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.
Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simple condition.
This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:
9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)
We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.
Cc: qemu-stable@nongnu.org
Reported-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: wrapped long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Eric Blake [Thu, 21 Apr 2016 14:42:30 +0000 (08:42 -0600)]
nbd: Don't mishandle unaligned client requests
The NBD protocol does not (yet) force any alignment constraints
on clients. Even though qemu NBD clients always send requests
that are aligned to 512 bytes, we must be prepared for non-qemu
clients that don't care about alignment (even if it means they
are less efficient). Our use of blk_read() and blk_write() was
silently operating on the wrong file offsets when the client
made an unaligned request, corrupting the client's data (but
as the client already has control over the file we are serving,
I don't think it is a security hole, per se, just a data
corruption bug).
Note that in the case of NBD_CMD_READ, an unaligned length could
cause us to return up to 511 bytes of uninitialized trailing
garbage from blk_try_blockalign() - hopefully nothing sensitive
from the heap's prior usage is ever leaked in that manner.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-id:
1461249750-31928-1-git-send-email-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 21 Apr 2016 16:46:50 +0000 (17:46 +0100)]
Update version for v2.6.0-rc3 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Aurelien Jarno [Thu, 21 Apr 2016 08:48:50 +0000 (10:48 +0200)]
tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG
Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id:
1461228530-14852-2-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Aurelien Jarno [Thu, 21 Apr 2016 08:48:49 +0000 (10:48 +0200)]
tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.
This used to work the following way:
| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>
Since commit
757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.
tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id:
1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Sylvain Garrigues [Wed, 20 Apr 2016 21:35:28 +0000 (23:35 +0200)]
hw/arm/boot: always clear r0 when booting kernels
The 32-bit ARM Linux kernel booting ABI requires that r0 is 0
when calling the kernel image. A bug in commit
10b8ec73e610e01
meant that for boards which use the write_board_setup hook (which
means "highbank", "midway", "raspi2" and "xilinx-zynq-a9") we
were incorrectly skipping the "clear r0" instruction in the
mini-bootloader. Use the right offset in the "add lr, pc, #n"
instruction so that we return from the board-setup code to the
correct place.
Signed-off-by: Sylvain Garrigues <sylvain@sylvaingarrigues.com>
[PMM: Expanded commit message]
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Eduardo Habkost [Wed, 20 Apr 2016 14:55:30 +0000 (11:55 -0300)]
MAINTAINERS: Avoid using K: for NUMA section
When using K: in MAINTAINERS, false positives makes
get_maintainer.pl not use git history to find contributors. As
those patterns cause lots of false positives they are causing
more harm than good, so remove them.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id:
1461164130-3847-1-git-send-email-ehabkost@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Wed, 20 Apr 2016 15:43:53 +0000 (16:43 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Mirror block job fixes for 2.6.0-rc3
# gpg: Signature made Wed 20 Apr 2016 15:56:43 BST using RSA key ID
C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream:
iotests: Test case for drive-mirror with unaligned image size
iotests: Add iotests.image_size
mirror: Don't extend the last sub-chunk
block/mirror: Refresh stale bitmap iterator cache
block/mirror: Revive dead yielding code
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Wed, 20 Apr 2016 15:16:55 +0000 (16:16 +0100)]
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-2016-04-20' into staging
Xen 2016/04/20
# gpg: Signature made Wed 20 Apr 2016 12:08:56 BST using RSA key ID
70E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
* remotes/sstabellini/tags/xen-2016-04-20:
xenfb: use the correct condition to avoid excessive looping
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fam Zheng [Wed, 20 Apr 2016 02:48:36 +0000 (10:48 +0800)]
iotests: Test case for drive-mirror with unaligned image size
This is the regression test for the virtual size mismatch issue between
target and source images.
[ kwolf: Added test_unaligned_with_update ]
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Fam Zheng [Wed, 20 Apr 2016 02:48:35 +0000 (10:48 +0800)]
iotests: Add iotests.image_size
This retrieves the virtual size of the image out of qemu-img info.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Wed, 20 Apr 2016 02:48:34 +0000 (10:48 +0800)]
mirror: Don't extend the last sub-chunk
The last sub-chunk is rounded up to the copy granularity in the target
image, resulting in a larger size than the source.
Add a function to clip the copied sectors to the end.
This undoes the "wrong" changes to tests/qemu-iotests/109.out in
e5b43573e28. The remaining two offset changes are okay.
[ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ]
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Max Reitz [Tue, 19 Apr 2016 22:59:48 +0000 (00:59 +0200)]
block/mirror: Refresh stale bitmap iterator cache
If the drive's dirty bitmap is dirtied while the mirror operation is
running, the cache of the iterator used by the mirror code may become
stale and not contain all dirty bits.
This only becomes an issue if we are looking for contiguously dirty
chunks on the drive. In that case, we can easily detect the discrepancy
and just refresh the iterator if one occurs.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 19 Apr 2016 22:59:47 +0000 (00:59 +0200)]
block/mirror: Revive dead yielding code
mirror_iteration() is supposed to wait if the current chunk is subject
to a still in-flight mirroring operation. However, it mixed checking
this conflict situation with checking the dirty status of a chunk. A
simplification for the latter condition (the first chunk encountered is
always dirty) led to neglecting the former: We just skip the first chunk
and thus never test whether it conflicts with an in-flight operation.
To fix this, pull out the code which waits for in-flight operations on
the first chunk of the range to be mirrored to settle.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Maydell [Wed, 20 Apr 2016 14:05:19 +0000 (15:05 +0100)]
Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-04-19-tag' into staging
qemu-ga patch queue for 2.6
* fixes inadvertant change that unconditionally disables qemu-ga unit test
* fixes make check failures when building with --disable-guest-agent that
were present visible before the unit test was inadvertantly disabled.
# gpg: Signature made Tue 19 Apr 2016 23:30:09 BST using RSA key ID
F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg: aka "Michael Roth <mdroth@utexas.edu>"
# gpg: aka "Michael Roth <mdroth@linux.vnet.ibm.com>"
* remotes/mdroth/tags/qga-pull-2016-04-19-tag:
qemu-ga: do not run qga test when guest agent disabled
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Wed, 20 Apr 2016 13:42:09 +0000 (14:42 +0100)]
Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Tue 19 Apr 2016 17:28:01 BST using RSA key ID
C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg: aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg: aka "Jeffrey Cody <codyprime@gmail.com>"
* remotes/cody/tags/block-pull-request:
block/gluster: prevent data loss after i/o error
block/gluster: code movement of qemu_gluster_close()
block/gluster: return correct error value
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Yang Hongyang [Tue, 19 Apr 2016 07:39:13 +0000 (15:39 +0800)]
qemu-ga: do not run qga test when guest agent disabled
When configure with --disable-guest-agent, make check will fail with:
ERROR:tests/test-qga.c:74:fixture_setup: assertion failed (error == NULL):
Failed to execute child process "/home/xx/qemu/qemu-ga" (No such file or
directory) (g-exec-error-quark, 8)
make: *** [check-tests/test-qga] Error 1
This check was commented out by
bab47d9a75a. I think that was by
mistake, because the commit message of that commit didn't mention
this change.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Peter Maydell [Tue, 19 Apr 2016 09:43:43 +0000 (10:43 +0100)]
Update language files for QEMU 2.6.0
Update translation files (change created via 'make -C po update').
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id:
1461059023-14470-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Jeff Cody [Tue, 5 Apr 2016 14:40:09 +0000 (10:40 -0400)]
block/gluster: prevent data loss after i/o error
Upon receiving an I/O error after an fsync, by default gluster will
dump its cache. However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option. When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC. Using the write-behind-cache
xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
to retain the cached data after a failed fsync, so that ENOSPC and other
transient errors are recoverable.
Unfortunately, we have no way of knowing if the
'resync-failed-syncs-after-fsync' xlator option is supported, so for now
close the fd and set the BDS driver to NULL upon fsync error.
Signed-off-by: Jeff Cody <jcody@redhat.com>
This page took 0.093652 seconds and 4 git commands to generate.