Peter Maydell [Fri, 15 Aug 2014 13:49:50 +0000 (14:49 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 15 Aug 2014 14:07:42 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
* remotes/kevin/tags/for-upstream: (59 commits)
block: Catch !bs->drv in bdrv_check()
iotests: Add test for image header overlap
qcow2: Catch !*host_offset for data allocation
qcow2: Return useful error code in refcount_init()
mirror: Handle failure for potentially large allocations
vpc: Handle failure for potentially large allocations
vmdk: Handle failure for potentially large allocations
vhdx: Handle failure for potentially large allocations
vdi: Handle failure for potentially large allocations
rbd: Handle failure for potentially large allocations
raw-win32: Handle failure for potentially large allocations
raw-posix: Handle failure for potentially large allocations
qed: Handle failure for potentially large allocations
qcow2: Handle failure for potentially large allocations
qcow1: Handle failure for potentially large allocations
parallels: Handle failure for potentially large allocations
nfs: Handle failure for potentially large allocations
iscsi: Handle failure for potentially large allocations
dmg: Handle failure for potentially large allocations
curl: Handle failure for potentially large allocations
...
Max Reitz [Thu, 7 Aug 2014 20:47:55 +0000 (22:47 +0200)]
block: Catch !bs->drv in bdrv_check()
qemu-img check calls bdrv_check() twice if the first run repaired some
inconsistencies. If the first run however again triggered corruption
prevention (on qcow2) due to very bad inconsistencies, bs->drv may be
NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set.
Max Reitz [Thu, 7 Aug 2014 20:47:53 +0000 (22:47 +0200)]
qcow2: Catch !*host_offset for data allocation
qcow2_alloc_cluster_offset() uses host_offset == 0 as "no preferred
offset" for the (data) cluster range to be allocated. However, this
offset is actually valid and may be allocated on images with a corrupted
refcount table or first refcount block.
In this case, the corruption prevention should normally catch that
write anyway (because it would overwrite the image header). But since 0
is a special value here, the function assumes that nothing has been
allocated at all which it asserts against.
Because this condition is not qemu's fault but rather that of a broken
image, it shouldn't throw an assertion but rather mark the image corrupt
and show an appropriate message, which this patch does by calling the
corruption check earlier than it would be called normally (before the
assertion).
Max Reitz [Wed, 28 May 2014 22:19:54 +0000 (00:19 +0200)]
qcow2: Return useful error code in refcount_init()
If bdrv_pread() returns an error, it is very unlikely that it was
ENOMEM. In this case, the return value should be passed along; as
bdrv_pread() will always either return the number of bytes read or a
negative value (the error code), the condition for checking whether
bdrv_pread() failed can be simplified (and clarified) as well.
Kevin Wolf [Wed, 21 May 2014 16:16:21 +0000 (18:16 +0200)]
mirror: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the mirror block job.
Kevin Wolf [Wed, 21 May 2014 16:08:38 +0000 (18:08 +0200)]
vpc: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vpc block driver.
Kevin Wolf [Tue, 20 May 2014 11:56:27 +0000 (13:56 +0200)]
vmdk: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vmdk block driver.
Kevin Wolf [Tue, 20 May 2014 11:55:50 +0000 (13:55 +0200)]
vhdx: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vhdx block driver.
Kevin Wolf [Tue, 20 May 2014 11:25:43 +0000 (13:25 +0200)]
vdi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vdi block driver.
Kevin Wolf [Wed, 21 May 2014 16:11:48 +0000 (18:11 +0200)]
rbd: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the rbd block driver.
Kevin Wolf [Wed, 21 May 2014 16:05:47 +0000 (18:05 +0200)]
raw-win32: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the raw-win32 block driver.
Kevin Wolf [Wed, 21 May 2014 16:02:42 +0000 (18:02 +0200)]
raw-posix: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the raw-posix block driver.
Kevin Wolf [Tue, 20 May 2014 11:39:57 +0000 (13:39 +0200)]
qed: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qed block driver.
Kevin Wolf [Tue, 20 May 2014 15:12:47 +0000 (17:12 +0200)]
qcow2: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qcow2 block driver.
Kevin Wolf [Tue, 20 May 2014 11:36:05 +0000 (13:36 +0200)]
qcow1: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qcow1 block driver.
Kevin Wolf [Tue, 20 May 2014 11:32:14 +0000 (13:32 +0200)]
parallels: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the parallels block driver.
Kevin Wolf [Tue, 20 May 2014 11:31:20 +0000 (13:31 +0200)]
nfs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the nfs block driver.
Kevin Wolf [Tue, 20 May 2014 11:30:49 +0000 (13:30 +0200)]
iscsi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the iscsi block driver.
Kevin Wolf [Tue, 20 May 2014 11:28:14 +0000 (13:28 +0200)]
dmg: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the dmg block driver.
Kevin Wolf [Tue, 20 May 2014 11:26:40 +0000 (13:26 +0200)]
curl: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the curl block driver.
Kevin Wolf [Tue, 20 May 2014 11:22:38 +0000 (13:22 +0200)]
cloop: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the cloop block driver.
Kevin Wolf [Tue, 20 May 2014 11:21:26 +0000 (13:21 +0200)]
bochs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the bochs block driver.
Kevin Wolf [Tue, 20 May 2014 11:16:51 +0000 (13:16 +0200)]
block: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses bounce buffer allocations in block.c. While at it,
convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign().
Jeff Cody [Wed, 23 Jul 2014 21:23:00 +0000 (17:23 -0400)]
block: vpc - use block layer ops in vpc_create, instead of posix calls
Use the block layer to create, and write to, the image file in the VPC
.bdrv_create() operation.
This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.
Jeff Cody [Wed, 23 Jul 2014 21:22:59 +0000 (17:22 -0400)]
block: use the standard 'ret' instead of 'result'
Most QEMU code uses 'ret' for function return values. The VDI driver
uses a mix of 'result' and 'ret'. This cleans that up, switching over
to the standard 'ret' usage.
Jeff Cody [Wed, 23 Jul 2014 21:22:58 +0000 (17:22 -0400)]
block: vdi - use block layer ops in vdi_create, instead of posix calls
Use the block layer to create, and write to, the image file in the
VDI .bdrv_create() operation.
This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.
Jeff Cody [Wed, 23 Jul 2014 21:22:57 +0000 (17:22 -0400)]
block: allow bdrv_unref() to be passed NULL pointers
If bdrv_unref() is passed a NULL BDS pointer, it is safe to
exit with no operation. This will allow cleanup code to blindly
call bdrv_unref() on a BDS that has been initialized to NULL.
Paolo Bonzini [Wed, 6 Aug 2014 09:33:41 +0000 (11:33 +0200)]
test-coroutine: add baseline test that times the cost of function calls
This can be used to compute the cost of coroutine operations. In the
end the cost of the function call is a few clock cycles, so it's pretty
cheap for now, but it may become more relevant as the coroutine code
is optimized.
For example, here are the results on my machine:
Function call 100000000 iterations: 0.173884 s
Yield 100000000 iterations: 8.445064 s
Lifecycle 1000000 iterations: 0.098445 s
Nesting 10000 iterations of 1000 depth each: 7.406431 s
One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
one coroutine allocation takes (roughly, since some of the allocations
in the nesting test do hit the pool) 739 nanoseconds:
Jeff Cody [Wed, 6 Aug 2014 19:54:58 +0000 (15:54 -0400)]
block: VHDX endian fixes
This patch contains several changes for endian conversion fixes for
VHDX, particularly for big-endian machines (multibyte values in VHDX are
all on disk in LE format).
Tests were done with existing qemu-iotests on an IBM POWER7 (8406-71Y).
This includes sample images created by Hyper-V, both with dirty logs and
without.
In addition, VHDX image files created (and written to) on a BE machine
were tested on a LE machine, and vice-versa.
Stefan Hajnoczi [Tue, 15 Jul 2014 14:44:26 +0000 (16:44 +0200)]
thread-pool: avoid deadlock in nested aio_poll() calls
The thread pool has a race condition if two elements complete before
thread_pool_completion_bh() runs:
If element A's callback waits for element B using aio_poll() it will
deadlock since pool->completion_bh is not marked scheduled when the
nested aio_poll() runs.
Fix this by marking the BH scheduled while thread_pool_completion_bh()
is executing. This way any nested aio_poll() loops will enter
thread_pool_completion_bh() and complete the remaining elements.
Stefan Hajnoczi [Tue, 15 Jul 2014 14:44:25 +0000 (16:44 +0200)]
thread-pool: avoid per-thread-pool EventNotifier
EventNotifier is implemented using an eventfd or pipe. It therefore
consumes file descriptors, which can be limited by rlimits and should
therefore be used sparingly.
Switch from EventNotifier to QEMUBH in thread-pool.c. Originally
EventNotifier was used because qemu_bh_schedule() was not thread-safe
yet.
Stefan Hajnoczi [Mon, 7 Jul 2014 13:15:53 +0000 (15:15 +0200)]
block: bump coroutine pool size for drives
When a BlockDriverState is associated with a storage controller
DeviceState we expect guest I/O. Use this opportunity to bump the
coroutine pool size by 64.
This patch ensures that the coroutine pool size scales with the number
of drives attached to the guest. It should increase coroutine pool
usage (which makes qemu_coroutine_create() fast) without hogging too
much memory when fewer drives are attached.
@mport: #'mport' is the port number on which mapperd is
listening. This is optional and if not specified,
QEMU will make Archipelago to use the default port.
@vport: #'vport' is the port number on which vlmcd is
listening. This is optional and if not specified,
QEMU will make Archipelago to use the default port.
@segment: #optional The name of the shared memory segment
Archipelago stack is using. This is optional
and if not specified, QEMU will make Archipelago
use the default value, 'archipelago'.
'mport' is the port number on which mapperd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.
'vport' is the port number on which vlmcd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.
'segment' is the name of the shared memory segment Archipelago stack is using.
This is optional and if not specified, QEMU will make Archipelago to use the
default value, 'archipelago'.
This drops the unnecessary bdrv_truncate() from, and also improves,
cluster allocation code path.
Before, when we need a new cluster, get_cluster_offset truncates the
image to bdrv_getlength() + cluster_size, and returns the offset of
added area, i.e. the image length before truncating.
This is not efficient, so it's now rewritten as:
- Save the extent file length when opening.
- When allocating cluster, use the saved length as cluster offset.
- Don't truncate image, because we'll anyway write data there: just
write any data at the EOF position, in descending priority:
* New user data (cluster allocation happens in a write request).
* Filling data in the beginning and/or ending of the new cluster, if
not covered by user data: either backing file content (COW), or
zero for standalone images.
One major benifit of this change is, on host mounted NFS images, even
over a fast network, ftruncate is slow (see the example below). This
change significantly speeds up cluster allocation. Comparing by
converting a cirros image (296M) to VMDK on an NFS mount point, over
1Gbe LAN:
$ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk
Before:
real 0m21.796s
user 0m0.130s
sys 0m0.483s
After:
real 0m2.017s
user 0m0.047s
sys 0m0.190s
We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and
get a little more documentation in function comments.
Tested that this passes qemu-iotests for all VMDK subformats.
qemu-iotests: Add data pattern in version3 VMDK sample image in 059
It's possible that we diverge from the specification with our
implementation. Having a reference image in the test cases may detect
such problems when we introduce a bug that can read what it creates, but
can't handle a real VMDK.
Stefan Hajnoczi [Wed, 9 Jul 2014 12:01:32 +0000 (14:01 +0200)]
qdev-monitor: include QOM properties in -device FOO, help output
Update -device FOO,help to include QOM properties in addition to qdev
properties. Devices are gradually adding more QOM properties that are
not reflected as qdev properties.
It is important to report all device properties since management tools
like libvirt use this information (and device-list-properties QMP) to
detect the presence of QEMU features.
This patch reuses the device-list-properties QMP machinery to avoid code
duplication.
Stefan Hajnoczi [Wed, 9 Jul 2014 12:01:31 +0000 (14:01 +0200)]
qmp: hide "hotplugged" device property from device-list-properties
The "hotplugged" device property was not reported before commit f4eb32b590bf58c1c67570775eb78beb09964fad ("qmp: show QOM properties in
device-list-properties"). Fix this difference.
Stefan Hajnoczi [Wed, 23 Jul 2014 11:55:32 +0000 (12:55 +0100)]
docs/multiple-iothreads.txt: add documentation on IOThread programming
This document explains how IOThreads and the main loop are related,
especially how to write code that can run in an IOThread. Currently
only virtio-blk-data-plane uses these techniques. The next obvious
target is virtio-scsi; there has also been work on virtio-net.
Maria Kustova [Mon, 21 Jul 2014 11:16:33 +0000 (15:16 +0400)]
docs: Make the recommendation for the backing file name position a requirement
The current version of the qcow2 specification recommends to save the backing
file name in the end of the first cluster. It follows that the backing file
name can be saved somewhere in the image, but the first cluster, which
contradicts the current QEMU implementation.
The patch makes the backing file name required to be placed after the header
extensions in the first image cluster.
qemu-img: Make img_convert() get image size just once per image
Chiefly so I don't have to do the error checking in quadruplicate in
the next commit. Moreover, replacing the frequently updated
bs_sectors by an array assigned just once makes the code easier to
understand.
* refresh_total_sectors() converts to sectors, rounding up, and stores
in total_sectors.
* bdrv_getlength() converts total_sectors back to bytes (now rounded
up to a multiple of the sector size).
* Callers wanting sectors rather bytes convert it right back.
Example: bdrv_get_geometry().
bdrv_nb_sectors() provides a way to omit the last two conversions.
It's exactly bdrv_getlength() with the conversion to bytes omitted.
It's functionally like bdrv_get_geometry() without its odd error
handling.
Reimplement bdrv_getlength() and bdrv_get_geometry() on top of
bdrv_nb_sectors().
The next patches will convert some users of bdrv_getlength() to
bdrv_nb_sectors().
Peter Maydell [Fri, 15 Aug 2014 12:41:55 +0000 (13:41 +0100)]
Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-09' into staging
trivial patches for 2014-08-09
# gpg: Signature made Fri 08 Aug 2014 21:36:44 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <[email protected]>"
# gpg: aka "Michael Tokarev <[email protected]>"
# gpg: aka "Michael Tokarev <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514 66A7 BEE5 9D74 A4C3 D7DB
* remotes/mjt/tags/trivial-patches-2014-08-09:
build-sys: Move qapi-{types, visit, event}.o into util-obj-y
po: Add Chinese translation
qemu-img: Check getchar() return value in read_password() for WIN32
hw/timer: Move extern declaration from .c to .h file
virtio: Move extern declaration to header file
Show length mismatch error is hex
target-i386/cpu.c: Fix two error output indentation
l2tpv3 (configure): it is linux-specific
hw/timer/imx_*: fix TIMER_MAX clash with system symbol
Michael Tokarev [Fri, 1 Aug 2014 19:20:24 +0000 (23:20 +0400)]
l2tpv3 (configure): it is linux-specific
Some non-linux systems, for example a system with
FreeBSD kernel and glibc, may declare struct mmsghdr
(in glibc) but may not have linux-specific header
file linux/ip.h. The actual implementation in qemu
includes this linux-specific header file unconditionally,
so compilation fails if it is not present. Include
this header in the configure test too.
Michael Tokarev [Fri, 1 Aug 2014 20:14:48 +0000 (00:14 +0400)]
hw/timer/imx_*: fix TIMER_MAX clash with system symbol
The symbol TIMER_MAX used in imx_epit.c and imx_gpt.c
clashes with system symbol with the same name. Because
all qemu source files includes qemu-common.h which, in
turn, includes limits.h, which is not unusual to define
it. Rename local symbol to have a reasonable prefix.
Tomoki Sekiyama [Mon, 30 Jun 2014 21:51:40 +0000 (17:51 -0400)]
qga: Disable unsupported commands by default
Currently management softwares cannot know whether a qemu-ga command is
supported or not on the running platform until they actually execute it.
This patch disables unsupported commands at launch time of qemu-ga, so that
management softwares can check whether they are supported from 'enabled'
property of the result from 'guest-info' command.
Tomoki Sekiyama [Mon, 30 Jun 2014 21:51:34 +0000 (17:51 -0400)]
qga: Add guest-get-fsinfo command
Add command to get mounted filesystems information in the guest.
The returned value contains a list of mountpoint paths and
corresponding disks info such as disk bus type, drive address,
and the disk controllers' PCI addresses, so that management layer
such as libvirt can resolve the disk backends.
For example, when `lsblk' result is:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 1G 0 disk
`-sdb1 8:17 0 1024M 0 part
`-vg0-lv0 253:1 0 1.4G 0 lvm /mnt/test
sdc 8:32 0 1G 0 disk
`-sdc1 8:33 0 512M 0 part
`-vg0-lv0 253:1 0 1.4G 0 lvm /mnt/test
vda 252:0 0 25G 0 disk
`-vda1 252:1 0 25G 0 part /
where sdb is a SCSI disk with PCI controller 0000:00:0a.0 and ID=1,
sdc is an IDE disk with PCI controller 0000:00:01.1, and
vda is a virtio-blk disk with PCI device 0000:00:06.0,
guest-get-fsinfo command will return the following result:
In Linux guest, the disk information is resolved from sysfs. So far,
it only supports virtio-blk, virtio-scsi, IDE, SATA, SCSI disks on x86
hosts, and "disk" parameter may be empty for unsupported disk types.
Signed-off-by: Tomoki Sekiyama <[email protected]>
*updated schema to report 2.2 as initial supported version
Tomoki Sekiyama [Mon, 30 Jun 2014 21:51:27 +0000 (17:51 -0400)]
qga: Add guest-fsfreeze-freeze-list command
If an array of mount point paths is specified as 'mountpoints' argument
of guest-fsfreeze-freeze-list, qemu-ga will only freeze the file systems
mounted on specified paths in Linux guests. Otherwise, it works as the
same way as guest-fsfreeze-freeze.
This would be useful when the host wants to create partial disk snapshots.
Signed-off-by: Tomoki Sekiyama <[email protected]> Reviewed-by: Eric Blake <[email protected]>
*updated schema to report 2.2 as initial supported version
Peter Maydell [Thu, 7 Aug 2014 13:54:47 +0000 (14:54 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
KVM changes include a MIPS patch and the testdev backend used by the
ARM kvm-unit-tests. icount include the first part of reverse execution
and Sebastian Tanase's patches to slow down -icount execution to the
desired speed of the target.
v1->v2: fix dump_drift_info to print nothing outside icount mode,
and to compile on 32-bit architectures
# gpg: Signature made Thu 07 Aug 2014 14:09:58 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/bonzini/tags/for-upstream:
target-mips: Ignore unassigned accesses with KVM
monitor: Add drift info to 'info jit'
cpu-exec: Print to console if the guest is late
cpu-exec: Add sleeping algorithm
icount: Add align option to icount
icount: Add QemuOpts for icount
icount: Fix virtual clock start value on ARM
timer: add cpu_icount_to_ns function.
migration: migrate icount fields.
icount: put icount variables into TimerState.
backends: Introduce chr-testdev
James Hogan [Mon, 28 Jul 2014 11:37:50 +0000 (12:37 +0100)]
target-mips: Ignore unassigned accesses with KVM
MIPS registers an unassigned access handler which raises a guest bus
error exception. However this causes QEMU to crash when KVM is enabled
as it isn't called from the main execution loop so longjmp() gets called
without a corresponding setjmp().
Until the KVM API can be updated to trigger a guest exception in
response to an MMIO exit, prevent the bus error exception being raised
from mips_cpu_unassigned_access() if KVM is enabled.
The check is at run time since the do_unassigned_access callback is
initialised before it is known whether KVM will be enabled.
The problem can be triggered with Malta emulation by making the guest
write to the reset region at physical address 0x1bf00000, since it is
marked read-only which is treated as unassigned for writes.
Sebastian Tanase [Fri, 25 Jul 2014 09:56:33 +0000 (11:56 +0200)]
monitor: Add drift info to 'info jit'
Show in 'info jit' the current delay between the host clock
and the guest clock. In addition, print the maximum advance
and delay of the guest compared to the host.
Sebastian Tanase [Fri, 25 Jul 2014 09:56:32 +0000 (11:56 +0200)]
cpu-exec: Print to console if the guest is late
If the align option is enabled, we print to the user whenever
the guest clock is behind the host clock in order for he/she
to have a hint about the actual performance. The maximum
print interval is 2s and we limit the number of messages to 100.
If desired, this can be changed in cpu-exec.c
Sebastian Tanase [Fri, 25 Jul 2014 09:56:31 +0000 (11:56 +0200)]
cpu-exec: Add sleeping algorithm
The goal is to sleep qemu whenever the guest clock
is in advance compared to the host clock (we use
the monotonic clocks). The amount of time to sleep
is calculated in the execution loop in cpu_exec.
At first, we tried to approximate at each for loop the real time elapsed
while searching for a TB (generating or retrieving from cache) and
executing it. We would then approximate the virtual time corresponding
to the number of virtual instructions executed. The difference between
these 2 values would allow us to know if the guest is in advance or delayed.
However, the function used for measuring the real time
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive.
We had an added overhead of 13% of the total run time.
Therefore, we modified the algorithm and only take into account the
difference between the 2 clocks at the begining of the cpu_exec function.
During the for loop we try to reduce the advance of the guest only by
computing the virtual time elapsed and sleeping if necessary. The overhead
is thus reduced to 3%. Even though this method still has a noticeable
overhead, it no longer is a bottleneck in trying to achieve a better
guest frequency for which the guest clock is faster than the host one.
As for the the alignement of the 2 clocks, with the first algorithm
the guest clock was oscillating between -1 and 1ms compared to the host clock.
Using the second algorithm we notice that the guest is 5ms behind the host, which
is still acceptable for our use case.
The tests where conducted using fio and stress. The host machine in an i5 CPU at
3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb
built with buildroot.
Currently, on our test machine, the lowest icount we can achieve that is suitable for
aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are
slower than the cpu tests (using stress).
Sebastian Tanase [Wed, 23 Jul 2014 09:47:50 +0000 (11:47 +0200)]
icount: Fix virtual clock start value on ARM
When using the icount option on ARM, the virtual
clock starts counting at realtime clock but it
should start at 0.
The reason why the virtual clock starts at realtime clock
is because the first time we call qemu_clock_warp (which
calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias
(which is part of the virtual time computation mechanism)
will increment by realtime - vm_clock_warp_start, with
vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c).
By changing the value of vm_clock_warp_start from 0 to -1,
the first time we call qemu_clock_warp which calls
icount_warp_rt, we will return immediatly because
icount_warp_rt first checks if vm_clock_warp_start is -1
and if it's the case it returns. Therefore, qemu_icount_bias
will first be incremented by the value of a virtual timer
deadline when the virtual cpu goes from active to inactive.
The virtual time will start at 0 and increment based
on the instruction counter when the vcpu is active or
the qemu_icount_bias value when inactive.
KONRAD Frederic [Thu, 31 Jul 2014 23:37:10 +0000 (01:37 +0200)]
migration: migrate icount fields.
This fixes a bug where qemu_icount and qemu_icount_bias are not migrated.
It adds a subsection "timer/icount" to vmstate_timers so icount is migrated only
when needed.
chr-testdev enables a virtio serial channel to be used for guest
initiated qemu exits. hw/misc/debugexit already enables guest
initiated qemu exits, but only for PC targets. chr-testdev supports
any virtio-capable target. kvm-unit-tests/arm is already making use
of this backend.
Currently there is a single command implemented, "q". It takes a
(prefix) argument for the exit code, thus an exit is implemented by
writing, e.g. "1q", to the virtio-serial port.
It can be used as:
$QEMU ... \
-device virtio-serial-device \
-device virtserialport,chardev=ctd -chardev testdev,id=ctd
Alex Williamson [Tue, 5 Aug 2014 19:05:57 +0000 (13:05 -0600)]
vfio: Don't cache MSIMessage
Commit 40509f7f added a test to avoid updating KVM MSI routes when the
MSIMessage is unchanged and f4d45d47 switched to relying on this
rather than doing our own comparison. Our cached msg is effectively
unused now. Remove it.
Alex Williamson [Tue, 5 Aug 2014 19:05:52 +0000 (13:05 -0600)]
vfio: Fix MSI-X vector expansion
When new MSI-X vectors are enabled we need to disable MSI-X and
re-enable it with the correct number of vectors. That means we need
to reprogram the eventfd triggers for each vector. Prior to f4d45d47
vector->use tracked whether a vector was masked or unmasked and we
could always pick the KVM path when available for unmasked vectors.
Now vfio doesn't track mask state itself and vector->use and virq
remains configured even for masked vectors. Therefore we need to ask
the MSI-X code whether a vector is masked in order to select the
correct signaling path. As noted in the comment, MSI relies on
hardware to handle masking.
Peter Maydell [Mon, 4 Aug 2014 14:01:38 +0000 (15:01 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140804' into staging
target-arm queue:
* Set PC correctly when loading AArch64 ELF files
* sdhci: Fix ADMA dma_memory_read access
* some more foundational work for EL2/EL3 support
* fix bugs which reveal themselves if the TARGET_PAGE_SIZE
is not set to 1K
# gpg: Signature made Mon 04 Aug 2014 14:51:34 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
* remotes/pmaydell/tags/pull-target-arm-20140804:
target-arm: A64: fix TLB flush instructions
target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
target-arm: Fix bit test in sp_el0_access
target-arm: Add FAR_EL2 and 3
target-arm: Add ESR_EL2 and 3
target-arm: Make far_el1 an array
target-arm: A64: Respect SPSEL when taking exceptions
target-arm: A64: Respect SPSEL in ERET SP restore
target-arm: A64: Break out aarch64_save/restore_sp
sd: sdhci: Fix ADMA dma_memory_read access
hw/arm/virt: formatting: memory map
hw/arm/boot: Set PC correctly when loading AArch64 ELF files
Alex Bennée [Mon, 4 Aug 2014 13:41:56 +0000 (14:41 +0100)]
target-arm: A64: fix TLB flush instructions
According to the ARM ARM we weren't correctly flushing the TLB entries
where bits 63:56 didn't match bit 55 of the virtual address. This
exposed a problem when we switched QEMU's internal TARGET_PAGE_BITS to
12 for aarch64.