John Snow [Thu, 21 Aug 2014 17:44:38 +0000 (13:44 -0400)]
ahci: Add test_hba_enable to ahci-test.
This test engages the HBA functionality and initializes
values to sane defaults to allow for minimal HBA functionality.
Buffers are allocated and pointers are updated to allow minimal
I/O commands to complete as expected. Error registers and responses
are sanity checked for specification adherence.
John Snow [Thu, 21 Aug 2014 17:44:37 +0000 (13:44 -0400)]
ahci: Add test_hba_spec to ahci-test.
Add a test routine that checks the boot-up values of the HBA
configuration memory space against the AHCI 1.3 specification
and Intel ICH9 data sheet (for Q35 machines) for adherence and
sane values.
The HBA is not yet engaged or put into the idle state.
[Replaced g_assert_false(...) with g_assert(!...) for glib <2.38
compatibility, reported by Peter Maydell <[email protected]>.
--Stefan]
John Snow [Thu, 21 Aug 2014 17:44:36 +0000 (13:44 -0400)]
ahci: properly shadow the TFD register
In a real AHCI device, several S/ATA registers are mirrored or shadowed
within the AHCI register set. These registers are not updated
synchronously for each read access, but are instead updated after a
Device-to-Host Register FIS packet is received. The D2H FIS contains
the values from these registers on the device.
In QEMU, by reaching directly into the device to grab these bits before
they are "sent," we may introduce race conditions where unexpected
values are present "before they are sent" which could cause issues for
some guests, particularly if an attempt is made to read the PxTFD
register prior to enabling the port, where incorrect values will be read.
This patch also addresses the boot-time values for the PxTFD and PxSIG
registers to bring them in line with the AHCI 1.3 specification.
Lastly, several fields (PxTFD, PxSIG and PxSACT) are read-only,
and any attempts to write to them should be ignored.
John Snow [Thu, 21 Aug 2014 17:44:33 +0000 (13:44 -0400)]
ahci: MSI capability should be at 0x80, not 0x50.
In the Intel ICH9 data sheet, the MSI capability offset
in the PCI configuration space for ICH9 AHCI devices is
specified to be 0x80.
Further, the PCI capability pointer should always point
to 0x80 in ICH9 devices, despite the fact that AHCI 1.3
specifies that it should be pointing to PMCAP (Which in
this instance would be 0x70) to maintain adherence to
the Intel data sheet specifications and real observed behavior.
John Snow [Thu, 21 Aug 2014 17:44:32 +0000 (13:44 -0400)]
ahci: Adding basic functionality qtest.
Currently, there is no qtest to test the functionality of
the AHCI functionality present within the Q35 machine type.
This patch adds a skeleton for an AHCI test suite,
and adds a simple sanity-check test case where we
identify that the AHCI device is present, then
disengage the virtual machine.
Max Reitz [Wed, 20 Aug 2014 17:59:35 +0000 (19:59 +0200)]
qcow2: Add overlap-check.template option
Being able to set the overlap-check option to a string and then refine
it via the overlap-check.* options is a nice idea for the command line
but does not work so well for non-flattened dicts. In that case, one can
only specify either but not both, so add a field to overlap-check.*
which does the same as directly specifying overlap-check but can be used
in conjunction with the other fields in non-flattened dicts.
Max Reitz [Wed, 20 Aug 2014 17:59:33 +0000 (19:59 +0200)]
qcow2: Fix leak of QemuOpts in qcow2_open()
Currently, the QemuOpts object opts is leaked if anything fails from its
creation up to and including the image repair block. Fix this by freeing
that object in the fail path.
Max Reitz [Fri, 5 Sep 2014 14:07:18 +0000 (16:07 +0200)]
qcow2: Check L1/L2/reftable entries for alignment
Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something different than qemu, thus check all
offsets taken from these tables for correct cluster alignment.
Max Reitz [Fri, 5 Sep 2014 14:07:15 +0000 (16:07 +0200)]
qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).
This is an analogue to Linux null_blk. It can be used for testing or
benchmarking block device emulation and general block layer
functionalities such as coroutines and throttling, where disk IO is not
necessary or wanted.
Use null-aio:// for AIO version, and null-co:// for coroutine version.
[Resolved conflict with Fam's async bdrv_aio_cancel() series:
1. Drop .bdrv_aio_cancel() since it is now done by block.c
2. Rename qemu_aio_release() to qemu_aio_unref()
--Stefan]
Paolo Bonzini [Mon, 15 Sep 2014 12:52:58 +0000 (14:52 +0200)]
aio-win32: avoid out-of-bounds access to the events array
If ret is WAIT_TIMEOUT and there was an event returned by select(),
we can write to a location after the end of the array. But in
that case we can retry the WaitForMultipleObjects call with the
same set of events, so just move the event[ret - WAIT_OBJECT_0]
assignment inside the existin conditional.
qdev-monitor: fix segmentation fault on qdev_device_help()
Normally, qmp_device_list_properties() may return NULL when
a device haven't special properties excpet Object and DeviceState
properties, such as virtio-balloon-device.
We just need check local_err instead of prop_list.
Example:
Segmentation fault (core dumped)
The backtrace as below:
Program received signal SIGSEGV, Segmentation fault.
0x00005555559af1a8 in error_get_pretty (err=0x0) at util/error.c:152
152 return err->msg;
(gdb) bt
func=0x55555574a6ca <device_help_func>, opaque=0x0, abort_on_failure=0) at util/qemu-option.c:1072
Now that all the implementations are converted to asynchronous version
and we can emulate synchronous cancellation with it. Let's drop the
unused member.
ide: Convert trim_aiocb_info.cancel to .cancel_async
We know that either bh is scheduled or ide_issue_trim_cb will be called
again, so we just set i, j and ret to the right values. In both cases,
ide_trim_bh_cb will be called.
Also forward the cancellation to the iocb->aiocb which we get from
bdrv_aio_discard.
sheepdog: Convert sd_aiocb_info.cancel to .cancel_async
Also drop the now unused SheepdogAIOCB.finished field. Note that this
aio is internal to sheepdog driver and has NULL cb and opaque, and
should be unused at all.
quorum: Convert quorum_aiocb_info.cancel to .cancel_async
Before, we cancel all the child requests with bdrv_aio_cancel, then free
the acb..
Now we just kick off asynchronous cancellation of child requests and
return, we know quorum_aio_cb will be called later, so in the end
quorum_aio_finalize will take care of calling the caller's cb.
Liu Yuan [Thu, 11 Sep 2014 05:41:21 +0000 (13:41 +0800)]
quorum: fix quorum_aio_cancel()
For a fifo read pattern, we only have one running aio (possible other cases that
has less number than num_children in the future), so we need to check if
.acb is NULL against bdrv_aio_cancel() to avoid segfault.
thread-pool: Convert thread_pool_aiocb_info.cancel to cancel_async
The .cancel_async shares the same the first half with .cancel: try to
steal the request if not submitted yet. In this case set the elem to
THREAD_DONE status and ret to -ECANCELED, which means
thread_pool_completion_bh will call the cb with -ECANCELED.
If the request is already submitted, do nothing, as we know the normal
completion will happen in the future.
Testing code update:
Before, done_cb is only called if the request is already submitted by
thread pool. Now done_cb is always called, even before it is submitted,
because we emulate bdrv_aio_cancel with bdrv_aio_cancel_async. So also
update the test criteria accordingly.
This is the async version of bdrv_aio_cancel, which doesn't block the
caller. It guarantees that the cb is called either before returning or
some time later.
bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert
all .io_cancel implementations to .io_cancel_async, and the aio_poll is
the common logic. In the end, .io_cancel can be dropped.
Before, bdrv_aio_cancel will either complete the request (like normal)
and call CB with an actual return code, or skip calling the request (for
example when the IO req is not submitted by thread pool yet).
We will change bdrv_aio_cancel to do it differently: always call CB
before return, with either [1] a normal req completion ret code, or [2]
ret == -ECANCELED. So the callers' callback must accept both cases. The
existing logic works with case [1], but not [2].
The simplest transition of callback code is do nothing in case [2], just
as if the CB is not called by the bdrv_aio_cancel() call.
John Snow [Sat, 13 Sep 2014 03:51:12 +0000 (23:51 -0400)]
ide/atapi: Mark non-data commands as complete
When the command completion code in IDE and AHCI
was unified to put all command completion inside
of a callback, "cmd_done," we neglected to
ensure that all AHCI/ATAPI command paths would
eventually register as finished. for the PCI
interface to IDE this is not a problem because
cmd_done is a nop, but the AHCI implementation
needs to send a D2H_REG_FIS and interrupt back
to the guest to inform of completion.
This patch adds calls to ide_stop_transfer,
which calls ide_cmd_done, inside of
ide_atapi_cmd_ok and ide_atapi_cmd_error.
This fixes regressions observed by trying to boot QEMU
with a Fedora 20 live CD under Q35/AHCI, which uses
ATAPI command 0x00, which is a status check that may
cause a hang because we never complete, and ATAPI
command 0x56, which is unsupported by our current
implementation and results in an error that we never
report back to the guest.
Peter Maydell [Sun, 14 Sep 2014 19:29:59 +0000 (20:29 +0100)]
block/vhdx.c: Mark parent_vhdx_guid variable as unused
The parent_vhdx_guid variable is defined but never used, which provokes
complaints from newer versions of clang. Since the variable definition
is here acting as documentation of the image format, mark it with the
'unused' attribute to keep the compiler happy rather than simply
deleting it.
When trying to create a fixed vhd image qemu-img will return the
following error:
qemu-img: test.vhdx: Could not create image: Cannot allocate memory
This happens because of a incorrect check in vhdx.c. Specifficaly,
in vhdx_create_bat(), after allocating memory for the BAT entry,
there is a check to determine if the allocation was unsuccsessful.
The error comes from the fact that it checks if s->bat isn't NULL,
which is true in case of succsessful allocation, and exits with
error ENOMEM.
usb-storage: Fix how legacy init handles option ID clash
usb_msd_init() calls qemu_opts_create() with a made-up ID and false
fail_if_exists. If the ID already exists, it happily messes up those
options, then fails drive_new(), because the BlockDriverState with
that ID already exists, too.
Li Liu [Tue, 9 Sep 2014 11:19:48 +0000 (19:19 +0800)]
qemu-char: Permit only a single "stdio" character device
When more than one is used, the terminal settings aren't restored
correctly on exit. Fixable. However, such usage makes no sense,
because the users race for input, so outlaw it instead.
If you want to connect multiple things to stdio, use the mux
chardev.
Max Filippov [Thu, 18 Sep 2014 05:03:36 +0000 (22:03 -0700)]
exec.c: fix setting 1-byte-long watchpoints
With commit 05068c0dfb5b 'exec.c: Relax restrictions on watchpoint length
and alignment' it's no longer possible to set 1-byte-long watchpoint
because of incorrect address range check.
Fix that by changing condition that checks for address wraparound.
Paolo Bonzini [Tue, 26 Aug 2014 10:16:08 +0000 (12:16 +0200)]
serial: check if backed by a physical serial port at realize time
Right now, s->poll_msl may linger at "0" value for an arbitrarily long
time, until serial_update_msl is called for the first time. This is
unnecessary, and will lead to the s->poll_msl field being unnecessarily
migrated.
We can call serial_update_msl immediately at realize time (via
serial_reset) and be done with it. The memory-mapped UART was already
doing that, but not the ISA and PCI variants.
Regarding the delta bits, be consistent with what serial_reset does when
the serial port is not backed by a physical serial port, and always clear
them at reset time.
Peter Maydell [Thu, 18 Sep 2014 19:02:00 +0000 (20:02 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc, virtio, misc bugfixes
A bunch of bugfixes - some of these will make sense for 2.1.2
I put Cc: qemu-stable included where appropriate.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Thu 18 Sep 2014 19:52:18 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
pc: leave more space for BIOS allocations
virtio-pci: fix migration for pci bus master
vhost-user: fix VIRTIO_NET_F_MRG_RXBUF negotiation
virtio-pci: enable bus master for old guests
Revert "virtio: don't call device on !vm_running"
virtio-net: drop assert on vm stop
Revert "rng-egd: remove redundant free"
qdev: Move global validation to a single function
qdev: Rename qdev_prop_check_global() to qdev_prop_check_globals()
test-qdev-global-props: Test handling of hotpluggable and non-device types
test-qdev-global-props: Initialize not_used=true for all props
test-qdev-global-props: Run tests on subprocess
tests: disable global props test for old glib
test-qdev-global-props: Trivial comment fix
hw/machine: Free old values of string properties
Since QEMU 2.1, we are allocating more space for ACPI tables, so no
space is left after initrd for the BIOS to allocate memory.
Besides ACPI tables, there are a few other uses of high memory in
SeaBIOS: SMBIOS tables and USB drivers use it in particular. These uses
allocate a very small amount of memory. Malloc metadata also lives
there. So we need _some_ extra padding there to avoid initrd breakage,
but not much.
John Snow found a case where RHEL5 was broken by the recent change to
ACPI_TABLE_SIZE; in his case 4KB of extra padding are fine, but just to
be safe I am adding 32KB, which is roughly the same amount of padding
that was left by QEMU 2.0 and earlier.
Current support for bus master (clearing OK bit)
together with the need to support guests which do not
enable PCI bus mastering, leads to extra state in
VIRTIO_PCI_FLAG_BUS_MASTER_BUG bit, which isn't robust
in case of cross-version migration for the case when
guests use the device before setting DRIVER_OK.
Rip out VIRTIO_PCI_FLAG_BUS_MASTER_BUG and implement a simpler
work-around: treat clearing of PCI_COMMAND as a virtio reset. Old
guests never touch this bit so they will work.
As reset clears device status, DRIVER and MASTER bits are
now in sync, so we can fix up cross-version migration simply
by synchronising them, without need to detect a buggy guest
explicitly.
Drop tracking VIRTIO_PCI_FLAG_BUS_MASTER_BUG completely.
As reset makes the device quiescent, in the future we'll be able to drop
checking OK bit in a bunch of places.
commit cc943c36faa192cd4b32af8fe5edb31894017d35
pci: Use bus master address space for delivering MSI/MSI-X messages
breaks virtio-net for rhel6.[56] x86 guests because they don't
enable bus mastering for virtio PCI devices. For the same reason,
rhel6.[56] ppc64 guests cannot boot on a virtio-blk disk anymore.
Old guests forgot to enable bus mastering, enable it automatically on
DRIVER (guests use some devices before DRIVER_OK).
This reverts commit a1bc7b827e422e1ff065640d8ec5347c4aadfcd8.
virtio: don't call device on !vm_running
It turns out that virtio net assumes that vm_running
is updated before device status callback in many places,
so this change leads to asserts.
Previous commit fixes the root issue that motivated a1bc7b827e422e1ff065640d8ec5347c4aadfcd8 differently,
so there's no longer a need for this change.
In the future, we might be able to drop checking vm_running
completely, and check vm state directly.
Eduardo Habkost [Fri, 8 Aug 2014 19:03:31 +0000 (16:03 -0300)]
qdev: Move global validation to a single function
Currently GlobalProperty.not_used=false has multiple meanings:
* It may be a property for a hotpluggable device, which may or may not
have been used by a device;
* It may be a machine-type-provided property, which may or may not have
been used by a device.
* It may be a user-provided property that was actually not used by
any device.
Simplify the logic by having two separate fields: 'user_provided' and
'used'. This allows the entire global property validation logic to be
contained in a single function, and allows more specific error messages.
Eduardo Habkost [Fri, 8 Aug 2014 19:03:27 +0000 (16:03 -0300)]
test-qdev-global-props: Run tests on subprocess
There are multiple reasons for running the global property tests on a
subprocess:
* We need the global_props lists to be empty for each test case, so
global properties from the previous test won't affect the next one;
* We don't want the qdev_prop_check_global() warnings to pollute test
output;
* With a subprocess, we can ensure qdev_prop_check_global() is printing
the warning messages it should.
follow-up patch moves global property tests to subprocesses.
Unfortunately with old glib this causes:
tests/test-qdev-global-props.c: In function
‘test_static_prop’:
tests/test-qdev-global-props.c:80:5: error: implicit
declaration of function ‘g_test_trap_subprocess’
[-Werror=implicit-function-declaration]
tests/test-qdev-global-props.c:80:5: error: nested extern
declaration of ‘g_test_trap_subprocess’ [-Werror=nested-externs]
This function was only added in glib 2.38, and our
minimum version is 2.12.
Pavel Dovgalyuk [Wed, 17 Sep 2014 08:05:19 +0000 (12:05 +0400)]
target-i386: update fp status fix
This patch introduces cpu_set_fpuc() function, which changes fpuc field
of the CPU state and calls update_fp_status() function.
These calls update status of softfloat library and prevent bugs caused
by non-coherent rounding settings of the FPU and softfloat.
v2 changes:
* Added missed calls and intoduced setter function (as suggested by TeLeMan)
Peter Lieven [Fri, 5 Sep 2014 20:07:41 +0000 (22:07 +0200)]
ui/vnc: set TCP_NODELAY
we currently have the Nagle algorithm enabled for all outgoing VNC updates.
This may delay sensitive updates as mouse movements or typing in the console.
As we currently prepare all data in a buffer and then send as much as we can
disabling the Nagle algorithm should not cause big trouble. Well established
VNC servers like TightVNC set TCP_NODELAY as well.
A regular framebuffer update request generates exactly one framebuffer update
which should be pushed out as fast as possible.
Peter Maydell [Tue, 2 Sep 2014 10:24:14 +0000 (11:24 +0100)]
util/qemu-sockets.c: Support specifying IPv4 or IPv6 in socket_dgram()
Currently you can specify whether you want a UDP chardev backend
to be IPv4 or IPv6 using the ipv4 or ipv6 options if you use the
QemuOpts parsing code in inet_dgram_opts(). However the QMP struct
parsing code in socket_dgram() doesn't provide this flexibility
(which in turn prevents us from converting the UDP backend handling
to the new style QAPI framework).
Use the existing inet_addr_to_opts() function to convert the
remote->inet address to option strings; this handles ipv4 and
ipv6 flags as well as host and port. (It will also convert any
'to' specification, which is harmless as it is ignored in this
context.)
Peter Maydell [Tue, 2 Sep 2014 10:24:13 +0000 (11:24 +0100)]
qemu-char: Convert socket backend to QAPI
Convert the socket char backend to the new style QAPI framework;
this allows it to return an Error ** to callers who might not
want it to print directly about socket failures.
Philipp Hahn [Wed, 10 Sep 2014 11:47:15 +0000 (13:47 +0200)]
hw/dma/i8257: Silence phony error message
Convert into trace event. Otherwise the message
dma: unregistered DMA channel used nchan=0 dma_pos=0 dma_len=1
gets printed every time and fills up the log-file with 50 MiB / minute.
Alexander Graf [Fri, 5 Sep 2014 13:52:45 +0000 (10:52 -0300)]
kvmclock: Ensure time in migration never goes backward
When we migrate we ask the kernel about its current belief on what the guest
time would be. However, I've seen cases where the kvmclock guest structure
indicates a time more recent than the kvm returned time.
To make sure we never go backwards, calculate what the guest would have seen as time at the point of migration and use that value instead of the kernel returned one when it's more recent.
This bases the view of the kvmclock after migration on the
same foundation in host as well as guest.
pit: fix pit interrupt can't inject into vm after migration
kvm_pit is running in kmod. kvm_pit is going to inject
interrupt to vm before cpu_synchronize_all_post_init at
dest side. vcpu will lose the pit interrupt, but
ack_irq(in kmod) has been 0. ack_irq become 1 after
vcpu responds pit interrupt. pit interruptcan inject
to vm when ack_irq is 1.
By the way, kvm_pit_vm_state_change has save and load
state of pit, so pre_save and post_load is unnecessary.
Peter Maydell [Mon, 15 Sep 2014 16:35:21 +0000 (17:35 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 12 Sep 2014 16:09:43 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
* remotes/kevin/tags/for-upstream: (22 commits)
qcow2: Add falloc and full preallocation option
raw-posix: Add falloc and full preallocation option
qapi: introduce PreallocMode and new PreallocModes full and falloc.
block: don't convert file size to sector size
block: round up file size to nearest sector
iotests: Send the correct fd in socket_scm_helper
blockdev: Refuse to drive_del something added with blockdev-add
block: extend BLOCK_IO_ERROR with reason string
dataplane: fix virtio_blk_data_plane_create() op blocker error path
qemu-iotests: Run 025 for Archipelago block driver
block/archipelago: Implement bdrv_truncate()
block: Make the block accounting functions operate on BlockAcctStats
block: rename BlockAcctType members to start with BLOCK_ instead of BDRV_
block: Extract the block accounting code
block: Extract the BlockAcctStats structure
IDE: MMIO IDE device control should be little endian
thread-pool: Drop unnecessary includes
xen: Drop redundant bdrv_close() from pci_piix3_xen_ide_unplug()
xen_disk: Plug memory leak on error path
qemu-io: Clean up openfile() after commit 2e40134
...