scsi: esp: make cmdbuf big enough for maximum CDB size
While doing DMA read into ESP command buffer 's->cmdbuf', it could
write past the 's->cmdbuf' area, if it was transferring more than 16
bytes. Increase the command buffer size to 32, which is maximum when
's->do_cmd' is set, and add a check on 'len' to avoid OOB access.
Paolo Bonzini [Wed, 15 Jun 2016 12:29:33 +0000 (14:29 +0200)]
scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd
Avoid duplicated code between esp_do_dma and handle_ti. esp_do_dma
has the same code that handle_ti contains after the call to esp_do_dma;
but the code in handle_ti is never reached because it is in an "else if".
Remove the else and also the pointless return.
esp_do_dma also has a partially dead assignment of the to_device
variable. Sink it to the point where it's actually used.
Finally, assert that the other caller of esp_do_dma (esp_transfer_data)
only transfers data and not a command. This is true because get_cmd
cancels the old request synchronously before its caller handle_satn_stop
sets do_cmd to 1.
scsi: esp: check buffer length before reading scsi command
The 53C9X Fast SCSI Controller(FSC) comes with an internal 16-byte
FIFO buffer. It is used to handle command and data transfer.
Routine get_cmd() in non-DMA mode, uses 'ti_size' to read scsi
command into a buffer. Add check to validate command length against
buffer size to avoid any overrun.
Eric Blake [Wed, 11 May 2016 22:39:44 +0000 (16:39 -0600)]
nbd: Avoid magic number for NBD max name size
Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.
Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes. 4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024). So for now, just
stick to the required minimum, as that's easier to audit than
a full-scale support for larger names.
Eric Blake [Wed, 11 May 2016 22:39:40 +0000 (16:39 -0600)]
nbd: Clean up ioctl handling of qemu-nbd -c
The kernel ioctl() interface into NBD is limited to 'unsigned long';
we MUST pass in input with that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args). Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2T (our BLKSIZE of 512 times
a SIZE_BLOCKS constrained by 32 bit unsigned long).
Eric Blake [Wed, 11 May 2016 22:39:39 +0000 (16:39 -0600)]
nbd: Group all Linux-specific ioctl code in one place
NBD ioctl()s are used to manage an NBD client session where
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.
Eric Blake [Wed, 11 May 2016 22:39:38 +0000 (16:39 -0600)]
nbd: Reject unknown request flags
The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL. We were silently ignoring the
flags instead. The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.
Eric Blake [Wed, 11 May 2016 22:39:37 +0000 (16:39 -0600)]
nbd: Improve server handling of bogus commands
We have a few bugs in how we handle invalid client commands:
- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.
- A client can send an NBD_CMD_WRITE where from + len overflows,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.
Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.
Eric Blake [Wed, 11 May 2016 22:39:36 +0000 (16:39 -0600)]
nbd: Quit server after any write error
We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation. Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.
Eric Blake [Wed, 11 May 2016 22:39:35 +0000 (16:39 -0600)]
nbd: More debug typo fixes, use correct formats
Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.
Eric Blake [Wed, 11 May 2016 22:39:34 +0000 (16:39 -0600)]
nbd: Use BDRV_REQ_FUA for better FUA where supported
Rather than always flushing ourselves, let the block layer
forward the FUA on to the underlying device - where all
underlying layers also understand FUA, we are now more
efficient; and where any underlying layer doesn't understand
it, now the block layer takes care of the full flush fallback
on our behalf.
QEMU compiles a list of data directories from various sources. When
consuming a QEMU binary it's useful to be able to get this list of
data directories: a primary reason is so you can list what BIOSes or
keymaps ship with this version of QEMU. However without reproducing
the method that QEMU uses internally, it's not possible to get the
list of data directories.
This commit adds a simple '-L help' option that just lists out the
data directories as qemu calculates them:
$ ./x86_64-softmmu/qemu-system-x86_64 -L help
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu
$ ./x86_64-softmmu/qemu-system-x86_64 -L /tmp -L help
/tmp
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu
Chao Peng [Mon, 13 Jun 2016 02:21:27 +0000 (10:21 +0800)]
target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
KVM_GET_SUPPORTED_CPUID ioctl is called frequently when initializing
CPU. Depends on CPU features and CPU count, the number of calls can be
extremely high which slows down QEMU booting significantly. In our
testing, we saw 5922 calls with switches:
This ioctl takes more than 100ms, which is almost half of the total
QEMU startup time.
While for most cases the data returned from two different invocations
are not changed, that means, we can cache the data to avoid trapping
into kernel for the second time. To make sure the cache safe one
assumption is desirable: the ioctl is stateless. This is not true for
CPUID leaves in general (such as CPUID leaf 0xD, whose value depends
on guest XCR0 and IA32_XSS) but it is true of KVM_GET_SUPPORTED_CPUID,
which runs before there is a value for XCR0 and IA32_XSS.
Paolo Bonzini [Mon, 13 Jun 2016 09:42:40 +0000 (11:42 +0200)]
nbd: simplify the nbd_request and nbd_reply structs
These structs are never used to represent the bytes that go over the
network. The big-endian network data is built into a uint8_t array
in nbd_{receive,send}_{request,reply}. Remove the unused magic field,
reorder the struct to avoid holes, and remove the packed attribute.
Peter Maydell [Fri, 10 Jun 2016 16:15:42 +0000 (17:15 +0100)]
nbd: Don't use cpu_to_*w() functions
The cpu_to_*w() functions just compose a pointer dereference
with a byteswap. Instead use st*_p(), which handles potential
pointer misalignment and avoids the need to cast the pointer.
Peter Maydell [Fri, 10 Jun 2016 15:00:36 +0000 (16:00 +0100)]
nbd: Don't use *_to_cpup() functions
The *_to_cpup() functions are not very useful, as they simply do
a pointer dereference and then a *_to_cpu(). Instead use either:
* ld*_*_p(), if the data is at an address that might not be
correctly aligned for the load
* a local dereference and *_to_cpu(), if the pointer is
the correct type and known to be correctly aligned
The CONFIG_SIGEV_THREAD_ID switch is unused since the related code
has been removed by commit 6d327171551a12b937c5718073b9848d0274c74d
("aio / timers: Remove alarm timers"), so it can safely be removed
nowadays.
Sergey Fedorov [Thu, 9 Jun 2016 17:58:35 +0000 (20:58 +0300)]
Makefile: Fix tag file generation targets
"ctags" produces a file named "tags", not "ctags". It doesn't look
reasonable to use phony target name as a file name to remove. Just use
exact file names to remove in "ctags" and "TAGS" target receipts.
Paolo Bonzini [Mon, 6 Jun 2016 11:57:39 +0000 (13:57 +0200)]
os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h. Include it in
sysemu/os-posix.h and remove it from everywhere else.
Peter Maydell [Thu, 16 Jun 2016 14:22:56 +0000 (15:22 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Thu 16 Jun 2016 15:01:27 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (39 commits)
hbitmap: add 'pos < size' asserts
iotests: Add test for oVirt-like storage migration
iotests: Add test for post-mirror backing chains
block/null: Implement bdrv_refresh_filename()
block/mirror: Fix target backing BDS
block: Allow replacement of a BDS by its overlay
rbd:change error_setg() to error_setg_errno()
iotests: 095: Clean up QEMU before showing image info
block: Create the commit block job before reopening any image
block: Prevent sleeping jobs from resuming if they have been paused
block: use the block job list in qmp_query_block_jobs()
block: use the block job list in bdrv_drain_all()
block: Fix snapshot=on with aio=native
block: Remove bs->zero_beyond_eof
qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
block: Make bdrv_load/save_vmstate coroutine_fns
block: Allow .bdrv_load/save_vmstate() to return 0/-errno
block: Make .bdrv_load_vmstate() vectored
block: Introduce bdrv_preadv()
doc: Fix mailing list address in tests/qemu-iotests/README
...
Kevin Wolf [Thu, 16 Jun 2016 13:22:18 +0000 (15:22 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' into queue-block
Block patches
# gpg: Signature made Thu Jun 16 15:21:35 2016 CEST
# gpg: using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
# Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730 EE64 3BB1 4202 E838 ACAD
* mreitz/tags/pull-block-for-kevin-2016-06-16:
hbitmap: add 'pos < size' asserts
iotests: Add test for oVirt-like storage migration
iotests: Add test for post-mirror backing chains
block/null: Implement bdrv_refresh_filename()
block/mirror: Fix target backing BDS
block: Allow replacement of a BDS by its overlay
rbd:change error_setg() to error_setg_errno()
iotests: 095: Clean up QEMU before showing image info
block: Create the commit block job before reopening any image
block: Prevent sleeping jobs from resuming if they have been paused
block: use the block job list in qmp_query_block_jobs()
block: use the block job list in bdrv_drain_all()
Max Reitz [Fri, 10 Jun 2016 18:57:48 +0000 (20:57 +0200)]
block/null: Implement bdrv_refresh_filename()
The null block driver ignores any filename used for creating its BDSs,
which allows creating such BDSs even without any filename at all. In
that case, we currently construct a JSON filename when queried instead
of a plain "null-co://" or "null-aio://". This patch implements
bdrv_refresh_filename() to remedy this behavior.
Max Reitz [Fri, 10 Jun 2016 18:57:47 +0000 (20:57 +0200)]
block/mirror: Fix target backing BDS
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().
First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).
Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().
Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:
- If blockdev-mirror is used, we can assume the user made sure that the
target already has the correct backing chain. In particular, we should
not try to open a backing file if the target does not have any yet.
- If drive-mirror with mode=absolute-paths is used, we can and should
reuse the already existing chain of nodes that the source BDS is in.
In case of sync=full, no backing BDS is required; with sync=top, we
just link the source's backing BDS to the target, and with sync=none,
we use the source BDS as the target's backing BDS.
We should not try to open these backing files anew because this would
lead to two BDSs existing per physical file in the backing chain, and
we would like to avoid such concurrent access.
- If drive-mirror with mode=existing is used, we have to use the
information provided in the physical image file which means opening
the target's backing chain completely anew, just as it has been done
already.
If the target's backing chain shares images with the source, this may
lead to multiple BDSs per physical image file. But since we cannot
reliably ascertain this case, there is nothing we can do about it.
Max Reitz [Fri, 10 Jun 2016 18:57:46 +0000 (20:57 +0200)]
block: Allow replacement of a BDS by its overlay
change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.
For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:
target
base <- source <- BlockBackend
During job completion, we want to establish the source BDS as the
target's backing node:
target
|
v
base <- source <- BlockBackend
This makes the target a valid replacement for the source:
target <- BlockBackend
|
v
base <- source
Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:
Alberto Garcia [Fri, 27 May 2016 10:53:39 +0000 (12:53 +0200)]
block: Create the commit block job before reopening any image
If the base or overlay images need to be reopened in read-write mode
but the block_job_create() call fails then no one will put those
images back in read-only mode.
We can solve this problem easily by calling block_job_create() first.
Alberto Garcia [Fri, 27 May 2016 10:53:36 +0000 (12:53 +0200)]
block: use the block job list in bdrv_drain_all()
bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.
This patch uses block_job_next() to iterate over all block jobs.
Kevin Wolf [Thu, 16 Jun 2016 10:59:30 +0000 (12:59 +0200)]
block: Fix snapshot=on with aio=native
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.
Reproducer without this patch:
$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.
Kevin Wolf [Wed, 1 Jun 2016 15:07:24 +0000 (17:07 +0200)]
qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
We don't really want to go through the block layer in order to read from
or write to the vmstate in a qcow2 image. Doing so required a few ugly
hacks like saving and restoring the old image size (because writing to
vmstate offsets would increase the image size) or disabling the "reads
after EOF = zeroes" logic. When calling the right functions directly,
these hacks aren't necessary any more.
Note that .bdrv_vmstate_load/save() return 0 instead of the number of
bytes in case of success now.
Thomas Huth [Thu, 16 Jun 2016 07:53:53 +0000 (09:53 +0200)]
doc: Fix mailing list address in tests/qemu-iotests/README
The address of the mailing list is [email protected]
instead of [email protected]. And while we're
at it, also mention the qemu-block mailing list here.
Kevin Wolf [Fri, 28 Nov 2014 14:23:12 +0000 (15:23 +0100)]
linux-aio: Cancel BH if not needed
linux-aio uses a BH in order to make sure that the remaining completions
are processed even in nested event loops of completion callbacks in
order to avoid deadlocks.
There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.
Kevin Wolf [Fri, 3 Jun 2016 14:45:44 +0000 (16:45 +0200)]
block: Don't enforce 512 byte minimum alignment
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.
The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.
Kevin Wolf [Fri, 3 Jun 2016 15:36:27 +0000 (17:36 +0200)]
raw-posix: Implement .bdrv_co_preadv/pwritev
The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.
For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.
Kevin Wolf [Wed, 6 Aug 2014 15:18:07 +0000 (17:18 +0200)]
raw-posix: Switch to bdrv_co_* interfaces
In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c
As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.
Kevin Wolf [Fri, 3 Jun 2016 14:17:28 +0000 (16:17 +0200)]
block: Prepare bdrv_aligned_preadv() for byte-aligned requests
This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
requests. Note that this doesn't mean that such requests are actually
made. The caller still ensures that all requests are aligned to at least
512 bytes.
Kevin Wolf [Thu, 2 Jun 2016 09:41:52 +0000 (11:41 +0200)]
block: Byte-based bdrv_co_do_copy_on_readv()
In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().
The code still exists today, but by a (happy?) accident we entirely
broke the ability to use qcow[2] encryption in the system emulators
in the 2.4.0 release due to
qcow2/qcow: protect against uninitialized encryption key
This commit was designed to prevent future coding bugs which
might cause QEMU to read/write data on an encrypted block
device in plain text mode before a decryption key is set.
It turns out this preventative measure was a little too good,
because we already had a long standing bug where QEMU read
encrypted data in plain text mode during system emulator
startup, in order to guess disk geometry:
Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)):
#0 0x00007fffe90b1a28 in raise () at /lib64/libc.so.6
#1 0x00007fffe90b362a in abort () at /lib64/libc.so.6
#2 0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6
#3 0x00007fffe90aa2d2 in () at /lib64/libc.so.6
#4 0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229
#5 0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908
#6 0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999
#7 0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544
#8 0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134
#9 0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778
#10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0
#11 0x00007fffe917f59d in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)):
#0 0x00007fffe91797a9 in syscall () at /lib64/libc.so.6
#1 0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397
#2 0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117
#3 0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175
#4 0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116
#5 0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187
#6 0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85
#7 0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135
#8 0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291
#9 0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591
#10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614
#11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622
#12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634
nb_sectors@entry=1) at block/block-backend.c:504
#14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68
#15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133
#16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71
#17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174
#18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247
#19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058
#20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730)
at qom/object.c:1514
#21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24
#22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905
#23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380
#24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122
#25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440
#26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218
#27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256
#28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249
So the safety net is correctly preventing QEMU reading cipher
text as if it were plain text, during startup and aborting QEMU
to avoid bad usage of this data.
For added fun this bug only happens if the encrypted qcow2
file happens to have data written to the first cluster,
otherwise the cluster won't be allocated and so qcow2 would
not try the decryption routines at all, just return all 0's.
That no one even noticed, let alone reported, this bug that
has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number
of actual users of encrypted qcow2 is approximately zero.
So rather than fix the crash, and backport it to stable
releases, just go ahead with what we have warned users about
and disable any use of qcow2 encryption in the system
emulators. qemu-img/qemu-io/qemu-nbd are still able to access
qcow2 encrypted images for the sake of data conversion.
In the future, qcow2 will gain support for the alternative
luks format, but when this happens it'll be using the
'-object secret' infrastructure for getting keys, which
avoids this problematic scenario entirely.
Eric Blake [Mon, 13 Jun 2016 18:56:34 +0000 (12:56 -0600)]
block: Avoid bogus flags during mirroring
Commit e253f4b8 converted mirroring from sector-based bdrv_aio_*
to byte-based blk_aio_*, but failed to account for the subtle
difference in signatures (the former takes a semi-redundant length,
the latter takes a flags parameter). Since all of our flags are
currently smaller in size than BDRV_SECTOR_SIZE, it has no ill
effects until we either perform sub-sector mirroring, or we start
asserting that no unexpected flags are set. I found it while
testing new asserts when qemu-iotests 132 started warning about an
unknown flag 0x200000.
Colin Lord [Wed, 8 Jun 2016 17:56:28 +0000 (13:56 -0400)]
blockdev: clarify error on attempt to open locked tray
When opening a device with a locked tray, gives an error explaining the
device tray is locked and that the user should wait and try again. This
is less confusing than the previous error, which simply stated that the
tray was locked.
Kevin Wolf [Wed, 1 Jun 2016 14:55:05 +0000 (16:55 +0200)]
qcow2: Implement .bdrv_co_pwritev()
This changes qcow2 to implement the byte-based .bdrv_co_pwritev
interface rather than the sector-based old one.
As preallocation uses the same allocation function as normal writes, and
the interface of that function needs to be changed, it is converted in
the same patch.
Kevin Wolf [Tue, 31 May 2016 14:13:07 +0000 (16:13 +0200)]
qcow2: Implement .bdrv_co_preadv()
Reading from qcow2 images is now byte granularity.
Most of the affected code in qcow2 actually gets simpler with this
change. The only exception is encryption, which is fixed on 512 bytes
blocks; in order to keep this working, bs->request_alignment is set for
encrypted images.
Kevin Wolf [Tue, 31 May 2016 14:41:09 +0000 (16:41 +0200)]
qcow2: Work with bytes in qcow2_get_cluster_offset()
This patch changes the units that qcow2_get_cluster_offset() uses
internally, without touching the interface just yet. This will be done
in another patch.
Peter Maydell [Thu, 16 Jun 2016 09:53:33 +0000 (10:53 +0100)]
Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.7-4' into staging
Migration:
- Fixes for TLS series
- Postcopy: Add stats, fix, test case
# gpg: Signature made Thu 16 Jun 2016 05:40:09 BST
# gpg: using RSA key 0xEB0B4DFC657EF670
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg: aka "Amit Shah <[email protected]>"
# gpg: aka "Amit Shah <[email protected]>"
# Primary key fingerprint: 48CA 3722 5FE7 F4A8 B337 2735 1E9A 3B5F 8540 83B6
# Subkey fingerprint: CC63 D332 AB8F 4617 4529 6534 EB0B 4DFC 657E F670
* remotes/amit-migration/tags/migration-for-2.7-4:
migration: rename functions to starting migrations
migration: fix typos in qapi-schema from latest migration additions
Postcopy: Check for support when setting the capability
tests: fix libqtest socket timeouts
test: Postcopy
Postcopy: Add stats on page requests
Migration: Split out ram part of qmp_query_migrate
Postcopy: Avoid 0 length discards
Postcopy: Check for support when setting the capability
Knowing whether the destination host supports migration with
postcopy can be tricky.
The destination doesn't need the capability set, however
if we set it then use the opportunity to do the test and
tell the user/management layer early.
/* I'm a bootable disk */
.org 0x7dfe
.byte 0x55
.byte 0xAA
...........
and that can be assembled by the following magic:
as --32 -march=i486 fill.s -o fill.o
objcopy -O binary fill.o fill.boot
dd if=fill.boot of=bootsect bs=256 count=2 skip=124
xxd -i bootsect
The discard code in migration/ram.c would send request for
zero length discards in the case where no discards were needed.
It doesn't appear to have had any bad effect.
Peter Maydell [Wed, 15 Jun 2016 15:12:19 +0000 (16:12 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue, 2016-06-14 (v2)
# gpg: Signature made Tue 14 Jun 2016 22:29:04 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-pull-request:
target-i386: Consolidate calls of object_property_parse() in x86_cpu_parse_featurestr
target-i386: Use cpu_generic_init() in cpu_x86_init()
target-i386: Move xcc->kvm_required check to realize time
target-i386: Remove assert(kvm_enabled()) from host_x86_cpu_initfn()
target-i386: Move features logic that requires CPUState to realize time
target-i386: Remove xlevel & hv-spinlocks option fixups
target-i386: Implement CPUID[0xB] (Extended Topology Enumeration)
pc: Add 2.7 machine
target-i386: add Skylake-Client cpu model
Eduardo Habkost [Fri, 10 Jun 2016 11:42:08 +0000 (08:42 -0300)]
target-i386: Remove assert(kvm_enabled()) from host_x86_cpu_initfn()
The code will be changed to allow creation of the CPU object and
report kvm_required errors only at realizefn, so we need to make
the instance_init function more flexible.
Igor Mammedov [Mon, 6 Jun 2016 15:16:44 +0000 (17:16 +0200)]
target-i386: Move features logic that requires CPUState to realize time
Making x86_cpu_parse_featurestr() a pure convertor
of legacy feature string into global properties, needs
it to be called before a CPU instance is created so
parser shouldn't modify CPUState directly or access
it at all. Hence move current hack that directly pokes
into CPUState, to set/unset +-feats, from parser to
CPU's realize method.
The "fixup will be removed in future versions" warnings are
present since QEMU 1.7.0, at least, so users should have fixed
their scripts and configurations, already.
In the case of libvirt users, libvirt doesn't use the "xlevel"
option, and already rejects HyperV spinlock retry count < 0xFFF.
I looked at a dozen Intel CPU that have this CPUID and all of them
always had Core offset as 1 (a wasted bit when hyperthreading is
disabled) and Package offset at least 4 (wasted bits at <= 4 cores).
QEMU uses more compact IDs and it doesn't make much sense to change it
now. I keep the SMT and Core sub-leaves even if there is just one
thread/core; it makes the code simpler and there should be no harm.
Peter Maydell [Tue, 14 Jun 2016 15:04:25 +0000 (16:04 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160614-2' into staging
target-arm queue:
* add PMU support for virt machine under KVM
* fix reset and migration of TTBCR(S)
* add virt-2.7 machine type
* QOMify various ARM devices
* implement xilinx DisplayPort device
* don't permit ARMv8-only Neon insns to work on ARMv7
Peter Maydell [Tue, 14 Jun 2016 14:59:15 +0000 (15:59 +0100)]
target-arm: Don't permit ARMv8-only Neon insns on ARMv7
The Neon instructions VCVTA, VCVTM, VCVTN, VCVTP, VRINTA, VRINTM,
VRINTN, VRINTP, VRINTX, and VRINTZ were only introduced with ARMv8,
so they need a guard to make them UNDEF if the CPU only supports ARMv7.
(We got this right for all the other new-in-v8 insns, but forgot
it for these Neon 2-reg-misc ops.)
Most of the control flow logic between send and recv (error checking
etc) is the same. Factor this out into a common send_recv() API.
This is then usable by clients, where the control logic for send
and receive differs only by a boolean. E.g.