Andreas Färber [Mon, 29 Jul 2013 03:44:47 +0000 (05:44 +0200)]
qtest: Prepare QOM machine tests
Instantiate all [*] machines per target, so that they get a bit of test
coverage at all. This has proven helpful during QOM refactorings.
[*] ppcemb target contains some non-working non-embedded machines, and
ppc405 CPUs are not available there either.
i386 and x86_64 do not cover pc*-x.y or xenfv.
Andreas Färber [Tue, 5 Nov 2013 16:46:04 +0000 (17:46 +0100)]
Merge tag 'for_anthony' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu
pci, pc, pvpanic bug fixes
This fixes strange pvpanic behaviour: you had to
pause to let VM continue (and potentially reboot on panic
if enabled).
This also fixes two bugs reported by Andreas.
One is a long-standing bug exposed by recent pci changes,
the other affects old piix machine types and was caused
by recent acpi changes.
Paolo Bonzini [Mon, 4 Nov 2013 13:30:47 +0000 (14:30 +0100)]
vl: allow "cont" from panicked state
After reporting the GUEST_PANICKED monitor event, QEMU stops the VM.
The reason for this is that events are edge-triggered, and can be lost if
management dies at the wrong time. Stopping a panicked VM lets management
know of a panic even if it has crashed; management can learn about the
panic when it restarts and queries running QEMU processes. The downside
is of course that the VM will be paused while management is not running,
but that is acceptable if it only happens with explicit "-device pvpanic".
Upon learning of a panic, management (if configured to do so) can pick a
variety of behaviors: leave the VM paused, reset it, destroy it. In
addition to all of these behaviors, it is possible to dump the VM core
from the host.
However, right now, the panicked state is irreversible, and can only be
exited by resetting the machine. This means that any policy decision
is entirely in the hands of the host. In particular there is no way to
use the "reboot on panic" option together with pvpanic.
This patch makes the panicked state reversible (and removes various
workarounds that were there because of the state being irreversible).
With this change, management has a wider set of possible policies: it
can just log the crash and leave policy to the guest, it can leave the
VM paused. In particular, the "log the crash and continue" is implemented
simply by sending a "cont" as soon as management learns about the panic.
Management could also implement the "irreversible paused state" itself.
And again, all such actions can be coupled with dumping the VM core.
Unfortunately we cannot change the behavior of 1.6.0. Thus, even if
it uses "-device pvpanic", management should check for "cont" failures.
If "cont" fails, management can then log that the VM remained paused
and urge the administrator to update QEMU.
Anthony Liguori [Thu, 31 Oct 2013 16:02:26 +0000 (17:02 +0100)]
Merge remote-tracking branch 'kwolf/tags/for-anthony' into staging
Block patches for 1.7.0-rc0 (v2)
# gpg: Signature made Thu 31 Oct 2013 04:44:39 PM CET using RSA key ID C88F2FD6
# gpg: Can't check signature: public key not found
* kwolf/tags/for-anthony: (30 commits)
vmdk: Implment bdrv_get_specific_info
qapi: Add optional field 'compressed' to ImageInfo
qemu-iotests: prefill some data to test image
sheepdog: check simultaneous create in resend_aioreq
sheepdog: cancel aio requests if possible
sheepdog: make add_aio_request and send_aioreq void functions
sheepdog: try to reconnect to sheepdog after network error
coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
sheepdog: reload inode outside of resend_aioreq
sheepdog: handle vdi objects in resend_aio_req
sheepdog: check return values of qemu_co_recv/send correctly
qemu-iotests: Test case for backing file deletion
qemu-iotests: drop duplicated "create_image"
qemu-iotests: Fix 051 reference output
block: Avoid unecessary drv->bdrv_getlength() calls
block: Disable BDRV_O_COPY_ON_READ for the backing file
ahci: fix win7 hang on boot
sheepdog: pass copy_policy in the request
sheepdog: explicitly set copies as type uint8_t
block: Don't copy backing file name on error
...
Anthony Liguori [Thu, 31 Oct 2013 16:01:12 +0000 (17:01 +0100)]
Merge remote-tracking branch 'agraf/ppc-for-upstream' into staging
* agraf/ppc-for-upstream: (29 commits)
spapr: Use DeviceClass::fw_name for device tree CPU node
target-ppc: Fill in OpenFirmware names for some PowerPCCPU families
target-ppc: dump-guest-memory support
dump-guest-memory: Check for the correct return value
target-ppc: Use #define for max slb entries
target-ppc: Check for error on address translation in memsave command
target-ppc: Update slb array with correct index values.
spapr-pci: enable irqfd for INTx
xics-kvm: enable irqfd for MSI
xics: Implement H_XIRR_X
xics: Implement H_IPOLL
xics-kvm: Support for in-kernel XICS interrupt controller
xics: add cpu_setup callback
xics: split to xics and xics-common
xics: add missing const specifiers to TypeInfo
xics: convert init() to realize()
xics: add pre_save/post_load dispatchers
xics: replace fprintf with error_report
spapr: move cpu_setup after kvmppc_set_papr
xics: move reset and cpu_setup
...
Anthony Liguori [Thu, 31 Oct 2013 16:00:25 +0000 (17:00 +0100)]
Merge remote-tracking branch 'kraxel/usb.91' into staging
* kraxel/usb.91:
usb-hcd-xhci: Update endpoint context dequeue pointer for streams too
usb-hcd-xhci: Report completion of active transfer with CC_STOPPED on ep stop
usb-hcd-xhci: Remove unused cancelled member from XHCITransfer
usb-hcd-xhci: Remove unused sstreamsm member from XHCIStreamContext
usb-host-libusb: Detach kernel drivers earlier
usb-host-libusb: Configuration 0 may be a valid configuration
usb-host-libusb: Fix reset handling
Anthony Liguori [Thu, 31 Oct 2013 15:58:32 +0000 (16:58 +0100)]
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci, pc, acpi fixes, enhancements
This includes some pretty big changes:
- pci master abort support by Marcel
- pci IRQ API rework by Marcel
- acpi generation support by myself
Everything has gone through several revisions, latest versions have been on
list for a while without any more comments, tested by several
people.
Please pull for 1.7.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Tue 15 Oct 2013 07:33:48 AM CEST using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
* mst/tags/for_anthony: (39 commits)
ssdt-proc: update generated file
ssdt: fix PBLK length
i386: ACPI table generation code from seabios
pc: use new api to add builtin tables
acpi: add interface to access user-installed tables
hpet: add API to find it
pvpanic: add API to access io port
ich9: APIs for pc guest info
piix: APIs for pc guest info
acpi/piix: add macros for acpi property names
i386: define pc guest info
loader: allow adding ROMs in done callbacks
i386: add bios linker/loader
loader: use file path size from fw_cfg.h
acpi: ssdt pcihp: updat generated file
acpi: pre-compiled ASL files
acpi: add rules to compile ASL source
i386: add ACPI table files from seabios
q35: expose mmcfg size as a property
q35: use macro for MCFG property name
...
Alex Bennée [Tue, 22 Oct 2013 14:16:06 +0000 (15:16 +0100)]
integrator: fix Linux boot failure by emulating dbg region
Commit 9b8c69243 (since reverted) broke the ability to boot the kernel
as the value returned by unassigned_mem_read returned non-zero and left
the kernel looping forever waiting for it to change (see
integrator_led_set in the kernel code).
Relying on a varying implementation detail is incorrect anyway so this
introduces a basic stub of a memory region for the debug/LED section
on the integrator board.
Alvise Rigo [Fri, 11 Oct 2013 17:38:45 +0000 (19:38 +0200)]
target-arm: fix sorting issue of KVM cpreg list
The compare_u64 function was not sorting the KVM cpreg_list in the
right way due to the wrong returned value. Since we are comparing
two 64bit values we can't simply return their difference if the
returned type is int.
Alvise Rigo [Fri, 11 Oct 2013 17:38:44 +0000 (19:38 +0200)]
target-arm: sort TCG cpreg list by KVM-style 64 bit ID number
Both KVM and TCG populate the cpreg_list with 64 bit register IDs,
but in the TCG side the cpreg_list is sorted using the 32 bit ID
version while in the kvm side the 64 bit ID version is used. This
patch makes the sorting of the cpreg_list consistent between KVM and
TCG.
Peter Maydell [Fri, 25 Oct 2013 14:44:38 +0000 (15:44 +0100)]
hw/arm/boot: Make user not specifying a kernel not an error
Typically ARM boards will have some kind of flash which might contain
a boot ROM; it's therefore a valid use case to provide only an
image for the boot ROM and not require QEMU's internal boot loader
at all. Remove the fatal error if -kernel isn't specified.
Fam Zheng [Wed, 30 Oct 2013 09:42:28 +0000 (17:42 +0800)]
qemu-iotests: prefill some data to test image
Case 030 occasionally fails because of block job compltes too fast to be
captured by script, and 'unexpected qmp event' of job completion causes
the test failure.
Simply fill in some data to the test image to make this false alarm less
likely to happen.
(For other benefits to prefill data to test image, see also commit ab68cdfaa).
MORITA Kazutaka [Thu, 24 Oct 2013 07:01:18 +0000 (16:01 +0900)]
sheepdog: check simultaneous create in resend_aioreq
After reconnection happens, all the inflight requests are moved to the
failed request list. As a result, sd_co_rw_vector() can send another
create request before resend_aioreq() resends a create request from
the failed list.
This patch adds a helper function check_simultaneous_create() and
checks simultaneous create requests more strictly in resend_aioreq().
MORITA Kazutaka [Thu, 24 Oct 2013 07:01:17 +0000 (16:01 +0900)]
sheepdog: cancel aio requests if possible
This patch tries to cancel aio requests in pending queue and failed
queue. When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.
MORITA Kazutaka [Thu, 24 Oct 2013 07:01:15 +0000 (16:01 +0900)]
sheepdog: try to reconnect to sheepdog after network error
This introduces a failed request queue and links all the inflight
requests to the list after network error happens. After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.
Max Reitz [Tue, 29 Oct 2013 18:18:54 +0000 (19:18 +0100)]
qemu-iotests: Test case for backing file deletion
Add a test case for trying to open an image file where it is impossible
to open its backing file (in this case, because it was deleted). When
doing this, qemu (or qemu-io in this case) should not crash but rather
print an appropriate error message.
The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.
This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.
It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.
This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.
Thibaut LAURENT [Fri, 25 Oct 2013 00:15:07 +0000 (02:15 +0200)]
block: Disable BDRV_O_COPY_ON_READ for the backing file
Since commit 0ebd24e0a203cf2852c310b59fbe050190dc6c8c,
bdrv_open_common will throw an error when trying to open a file
read-only with the BDRV_O_COPY_ON_READ flag set.
Although BDRV_O_RDWR is unset for the backing files,
BDRV_O_COPY_ON_READ is still passed on if copy-on-read was requested
for the drive. Let's unset this flag too before opening the backing
file, or bdrv_open_common will fail.
Alexander Graf [Mon, 28 Oct 2013 19:01:51 +0000 (21:01 +0200)]
ahci: fix win7 hang on boot
When AHCI executes an asynchronous IDE command, it checked DRDY without
checking either DRQ or BSY. This sometimes caused interrupt to be sent
before command is actually completed.
This resulted in a race condition: if guest then managed to access the
device before command has completed, it would hang waiting for an
interrupt.
This was observed with windows 7 guests.
To fix, check for DRQ or BSY in additiona to DRDY, if set,
the command is asynchronous so delay the interrupt until
asynchronous done callback is invoked.
Liu Yuan [Wed, 23 Oct 2013 08:51:52 +0000 (16:51 +0800)]
sheepdog: pass copy_policy in the request
Currently copy_policy isn't used. Recent sheepdog supports erasure coding, which
make use of copy_policy internally, but require client explicitly passing
copy_policy from base inode to newly creately inode for snapshot related
operations.
If connected sheep daemon doesn't utilize copy_policy, passing it to sheep
daemon is just one extra null effect operation. So no compatibility problem.
With this patch, sheepdog can provide erasure coded volume for QEMU VM.
Liu Yuan [Wed, 23 Oct 2013 08:51:51 +0000 (16:51 +0800)]
sheepdog: explicitly set copies as type uint8_t
'copies' is actually uint8_t since day one, but request headers and some helper
functions parameterize it as uint32_t for unknown reasons and effectively
reserve 24 bytes for possible future use. This patch explicitly set the correct
for copies and reserve the left bytes.
This is a preparation patch that allow passing copy_policy in request header.
Max Reitz [Sat, 26 Oct 2013 13:44:43 +0000 (15:44 +0200)]
block: Don't copy backing file name on error
bdrv_open_backing_file() tries to copy the backing file name using
pstrcpy directly after calling bdrv_open() to open the backing file
without checking whether that was actually successful. If it was not,
ps->backing_hd->file will probably be NULL and qemu will crash.
Fix this by moving pstrcpy after checking whether bdrv_open() succeeded.
Kevin Wolf [Thu, 27 Jun 2013 11:50:05 +0000 (13:50 +0200)]
tests: Multiboot mmap test case
This adds a test case for Multiboot memory map in the tests/multiboot
directory, where future i386 test kernels can be dropped. Because this
requires an x86 build host and an installed 32 bit libgcc, the test is
not part of a regular 'make check'.
The reference output for the test is verified against test runs of the
same multiboot kernel booted by some GRUB 0.97.
Kevin Wolf [Mon, 22 Jul 2013 12:30:23 +0000 (14:30 +0200)]
exec: Fix bounce buffer allocation in address_space_map()
This fixes a regression introduced by commit e3127ae0c, which kept the
allocation size of the bounce buffer limited to one page in order to
avoid unbounded allocations (as explained in the commit message of 6d16c2f88), but broke the reporting of the shortened bounce buffer to
the caller. The caller therefore assumes that the full requested size
was provided and causes memory corruption when writing beyond the end of
the actually allocated buffer.
Max Reitz [Thu, 24 Oct 2013 18:35:06 +0000 (20:35 +0200)]
qcow2: Flush image after creation
Opening the qcow2 image with BDRV_O_NO_FLUSH prevents any flushes during
the image creation. This means that the image has not yet been flushed
to disk when qemu-img create exits. This flush is delayed until the next
operation on the image involving opening it without BDRV_O_NO_FLUSH and
closing (or directly flushing) it. For large images and/or images with a
small cluster size and preallocated metadata, this flush may take a
significant amount of time and may occur unexpectedly.
Reopening the image without BDRV_O_NO_FLUSH right before the end of
qcow2_create2() results in hoisting the potentially costly flush into
the image creation, which is expected to take some time (whereas
successive image operations may be not).
Michael Tokarev [Mon, 21 Oct 2013 08:33:58 +0000 (12:33 +0400)]
configure: create fsdev/ directory
In some cases when building with parallelism (make -jN),
build fails because the directory where output files are
supposed to be does not exist. In particular, when make
decides to build virtfs-proxy-helper.1 before other files
in fsdev/, build will fail with the following error:
perl -Ww -- BUILDDIR/scripts/texi2pod.pl BUILDDIR/fsdev/virtfs-proxy-helper.texi fsdev/virtfs-proxy-helper.pod && pod2man --utf8 --section=1 --center=" " --release=" " fsdev/virtfs-proxy-helper.pod > fsdev/virtfs-proxy-helper.1
opening "fsdev/virtfs-proxy-helper.pod": No such file or directory
Andreas Färber [Tue, 15 Oct 2013 16:33:37 +0000 (18:33 +0200)]
spapr: Use DeviceClass::fw_name for device tree CPU node
Instead of relying on cpu_model, obtain the device tree node label
per CPU. Use DeviceClass::fw_name as source.
Whenever DeviceClass::fw_name is unknown, default to "PowerPC,UNKNOWN".
As a consequence, spapr_fixup_cpu_dt() can operate on each CPU's fw_name,
obsoleting sPAPREnvironment::cpu_model, and spapr_create_fdt_skel() can
drop its cpu_model argument.
Andreas Färber [Tue, 15 Oct 2013 16:33:36 +0000 (18:33 +0200)]
target-ppc: Fill in OpenFirmware names for some PowerPCCPU families
Set the expected values for POWER7, POWER7+, POWER8 and POWER5+.
Note that POWER5+ and POWER7+ are intentionally lacking the '+', so the
lack of a POWER7P family constitutes no problem.
This enables IRQFD for LSI (level triggered INTx interrupts) by adding
a spapr_route_intx_pin_to_irq() callback to the sPAPR PCI host bus. This
callback is called to know the global interrupt number to link resampling fd
with IRQFD's fd in KVM.
This implements H_XIRR_X hypercall in addition to H_XIRR as
it is mandatory for PAPR+ and there is no way for the guest to
detect whether it is supported or not so just add it.
As the Partition Adjunct Option is not supported at the moment,
the CPPR parameter of the hypercall is ignored.
This adds support for the H_IPOLL hypercall which the guest
uses to poll for a pending interrupt. This hypercall is
mandatory for PAPR+ and there is no way for the guest to
detect whether it is supported or not so just add it.
David Gibson [Thu, 26 Sep 2013 06:18:44 +0000 (16:18 +1000)]
xics-kvm: Support for in-kernel XICS interrupt controller
Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM. This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.
This should give considerable performance improvements. e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.
The upcoming XICS-KVM support will use bits of emulated XICS code.
So this introduces new level of hierarchy - "xics-common" class. Both
emulated XICS and XICS-KVM will inherit from it and override class
callbacks when required.
The new "xics-common" class implements:
1. replaces static "nr_irqs" and "nr_servers" properties with
the dynamic ones and adds callbacks to be executed when properties
are set.
2. xics_cpu_setup() callback renamed to xics_common_cpu_setup() as
it is a common part for both XICS'es
3. xics_reset() renamed to xics_common_reset() for the same reason.
The emulated XICS changes:
1. the part of xics_realize() which creates ICPs is moved to
the "nr_servers" property callback as realize() is too late to
create/initialize devices and instance_init() is too early to create
devices as the number of child devices comes via the "nr_servers"
property.
2. added ics_initfn() which does a little part of what xics_realize() did.
David Gibson [Thu, 26 Sep 2013 06:18:35 +0000 (16:18 +1000)]
target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
Recent PowerKVM allows the kernel to intercept some RTAS calls from the
guest directly. This is used to implement the more efficient in-kernel
XICS for example. qemu is still responsible for assigning the RTAS token
numbers however, and needs to tell the kernel which RTAS function name is
assigned to a given token value. This patch adds a convenience wrapper for
the KVM_PPC_RTAS_DEFINE_TOKEN ioctl() which is used for this purpose.