This patch is only to refactor some lines of codes to get better and more robust codes.
As you have seen, in qed_read_table_cb() it's nice to
use qiov->size because that function doesn't obviously use a single
struct iovec.
In other two functions, if qiov use more than one struct iovec, the existing way will get wrong nb_sectors.
To make the code more robust, it will be nicer to refactor the existing way as below.
Stefan Hajnoczi [Wed, 30 Nov 2011 12:23:41 +0000 (12:23 +0000)]
qcow2: avoid reentrant bdrv_read() in copy_sectors()
A BlockDriverState should not issue requests on itself through the
public block layer interface. Nested, or reentrant, requests are
problematic because they do I/O throttling and request tracking twice.
Features like block layer copy-on-read use request tracking to avoid
race conditions between concurrent requests. The reentrant request will
have to "wait" for its parent request to complete. But the parent is
waiting for the reentrant request to make progress so we have reached
deadlock.
The solution is for block drivers to avoid the public block layer
interfaces for reentrant requests. Instead they should call their own
internal functions if they wish to perform reentrant requests.
This is also a good opportunity to make copy_sectors() a true
coroutine_fn. That means calling bdrv_co_writev() instead of
bdrv_write(). Behavior is unchanged but we're being explicit that this
executes in coroutine context.
Kevin Wolf [Mon, 19 Sep 2011 09:26:48 +0000 (11:26 +0200)]
qcow2: Unlock during COW
Unlocking during COW allows for more parallelism. One change it requires is
that buffers are dynamically allocated instead of just using a per-image
buffer.
While touching the code, drop the synchronous qcow2_read() function and replace
it by a bdrv_read() call.
Michael Roth [Tue, 29 Nov 2011 22:47:49 +0000 (16:47 -0600)]
Makefile: use full path for qapi-generated directory
Generally $(BUILD_DIR) == $(CURDIR), but that isn't necessarilly the
case, so use $(BUILD_DIR)/qapi-generated for generated files to
avoid potentionally sticking generating files in odd places outside
the build's include paths.
Michael Roth [Tue, 29 Nov 2011 22:47:48 +0000 (16:47 -0600)]
qapi: fix guardname generation
Fix a bug in handling dotted paths, and exclude directory prefixes
from generated guardnames to avoid odd/pseudo-random guardnames in
generated headers.
Max Filippov [Thu, 24 Nov 2011 12:11:31 +0000 (16:11 +0400)]
configure: avoid screening of --{en, dis}able-usb-redir options
--*dir) option pattern precede --{en,dis}able-usb-redir) patterns in the
option analysis switch, making the latter options have no effect.
There were some --*dir that are supported by Autoconf and not by QEMU configure.
The aim was to let QEMU packagers use the rpm (or similar) macro that overrides
directories for their distribution.
cutils: Make strtosz & friends leave follow set to callers
strtosz() & friends require the size to be at the end of the string,
or be followed by whitespace or ','. I find this surprising, because
the name suggests it works like strtol().
The check simplifies callers that accept exactly that follow set
slightly. No such callers exist.
The check is redundant for callers that accept a smaller follow set,
and thus need to check themselves anyway. Right now, this is the case
for all but one caller. All of them neglected to check, or checked
incorrectly, but the previous few commits fixed them up.
Finally, the check is problematic for callers that accept a larger
follow set. This is the case in monitor_parse_command().
Fortunately, the problems there are relatively harmless.
monitor_parse_command() uses strtosz() for argument type 'o'. When
the last argument is of type 'o', a trailing ',' is diagnosed
differently than other trailing junk:
(qemu) migrate_set_speed 1x
invalid size
(qemu) migrate_set_speed 1,
migrate_set_speed: extraneous characters at the end of line
A related inconsistency exists with non-last arguments. No such
command exists, but let's use memsave to explore the inconsistency.
The monitor permits, but does not require whitespace between
arguments. For instance, "memsave (1-1)1024foo" is parsed as command
memsave with three arguments 0, 1024 and "foo". Yes, this is daft,
but at least it's consistently daft.
If I change memsave's second argument from 'i' to 'o', then "memsave
(1-1)1foo" is rejected, because the size is followed by an 'f'. But
"memsave (1-1)1," is still accepted, and duly saves to file ",".
We don't have any users of strtosz that profit from the check. In the
users we have, it appears to encourage sloppy error checking, or gets
in the way. Drop the bothersome check.
strtosz_suffix() fails unless the size is followed by 0, whitespace or
','. Useless here, because we need to fail for any junk following the
size, even if it starts with whitespace or ','. Check manually.
Things like "qemu-img create xxx 1024," and "qemu-img convert -S '1024
junk'" are now caught.
cpu_x86_find_by_name() uses strtosz_suffix_unit(), but screws up the
error checking. It detects some failures, but not all. Undetected
failures result in a zero tsc_khz value (error value -1 divided by
1000), which means "no tsc_freq set".
To reproduce, try "-cpu qemu64,tsc_freq=9999999T".
strtosz_suffix_unit() fails, because the value overflows int64_t,
strtosz_suffix() fails unless the size is followed by 0, whitespace or
','. Useless here, because we need to fail for any junk following the
size, even if it starts with whitespace or ','. Check manually.
Things like "-m 1024," are now caught.
strtosz_suffix() fails unless the size is followed by 0, whitespace or
','. Useless here, because we need to fail for any junk following the
size, even if it starts with whitespace or ','. Check manually.
cutils: Drop broken support for zero strtosz default_suffix
Commit 9f9b17a4's strtosz() defaults a missing suffix to 'M', except
it rejects fractions then (switch case 0).
When commit d8427002 introduced strtosz_suffix(), that changed:
fractions are no longer rejected, because we go to switch case 'M' on
missing suffix now. Not mentioned in commit message, probably
unintentional. Not worth changing back now.
Because case 0 is still around, you can get the old behavior by
passing a zero default_suffix to strtosz_suffix() or
strtosz_suffix_unit(). Undocumented and not used. Drop.
Commit d8427002 also neglected to update the function comment. Fix it
up.
Hans de Goede [Sat, 19 Nov 2011 09:22:47 +0000 (10:22 +0100)]
usb-redir: Don't try to write to the chardev after a close event
Since we handle close async in a bh, do_write and thus write can get
called after receiving a close event. This patch adds a check to
the usb-redir write callback to not call qemu_chr_fe_write on a closed
backend.
These fixes mainly target the other side sending some (error status)
packets after a disconnect packet. In some cases these would get queued
up and then reported to the controller when a new device gets connected.
* Fully reset device state on disconnect
* Don't allow a connect message when already connected
* Ignore iso and interrupt status messages when disconnected
Define a state callback and make that generate chardev open/close events when
called by the spice-server.
Notes:
1) For all but the newest spice-server versions (which have a fix for this)
the code ignores these events for a spicevmc with a subtype of vdagent, this
subtype specific knowledge is undesirable, but unavoidable for now, see:
http://lists.freedesktop.org/archives/spice-devel/2011-July/004837.html
2) This code deliberately sends the events immediately rather then from a
bh. This is done this way because:
a) There is no need to do it from a bh; and
b) Doing it from a bh actually causes issues because the spice-server may send
data immediately after the open and when the open runs from a bh, then
qemu_chr_be_can_write will return 0 for the first write which the spice-server
does not expect, when this happens the spice-server will never retry the write
causing communication to stall.
Hans de Goede [Sat, 19 Nov 2011 09:22:43 +0000 (10:22 +0100)]
qemu-char: rename qemu_chr_event to qemu_chr_be_event and make it public
Rename qemu_chr_event to qemu_chr_be_event, since it is only to be
called by backends and make it public so that it can be used by chardev
code which lives outside of qemu-char.c
Paolo Bonzini [Thu, 24 Nov 2011 12:28:52 +0000 (13:28 +0100)]
virtio: add and use virtio_set_features
vdev->guest_features is not masking features that are not supported by
the guest. Fix this by introducing a common wrapper to be used by all
virtio bus implementations.
Paolo Bonzini [Mon, 21 Nov 2011 08:29:11 +0000 (09:29 +0100)]
9pfs: improve portability to older systems
Small requirements on "new" features have percolated to virtio-9p-local.c.
In particular, the utimensat wrapper actually only supports dirfd = AT_FDCWD
and flags = AT_SYMLINK_NOFOLLOW in the fallback code. Remove the arguments
so that virtio-9p-local.c will not use AT_* constants.
At the same time, fail local_ioc_getversion if the ioctl is not supported
by the host.
"Anthony, I think we should revert that commit and refactor cpuid for
1.1. The logic is spread over too many places which makes it hard to
reason about."
Gerd Hoffmann [Wed, 16 Nov 2011 11:37:17 +0000 (12:37 +0100)]
usb-host: add usb_host_do_reset function.
Add a special function to reset the host usb device. It tracks the time
needed by the USBDEVFS_RESET ioctl and prints a warning in case it needs
too long. Usually it should be finished in 200 - 300 miliseconds.
Warning threshold is one second.
Intention is to help troubleshooting by indicating that the usb device
stopped responding even to a reset request and is possibly broken.
Kevin Wolf [Tue, 22 Nov 2011 15:52:13 +0000 (16:52 +0100)]
vvfat: Add migration blocker
vvfat caches more or less everything when in writable mode. For migration
to work, it would have to be invalidated. Block migration for now when
in writable mode (default is readonly).
Gerd Hoffmann [Fri, 18 Nov 2011 09:49:25 +0000 (10:49 +0100)]
usb-ehci: add register names
The mmio register name list only had the names for four port status
registers. We emulate a EHCI adapter with six ports though, the last
two ones are listed as "unknown" in traces. Fix it.
Julian Pidancet [Wed, 23 Nov 2011 01:03:15 +0000 (01:03 +0000)]
rtl8139: Fix invalid IO access alignment
This patch makes iPXE work with the rtl8139 emulation. The rtl8139
driver in iPXE issues a 16bit access on the ChipCmd register
(offset 0x37) to check the status of the rx buffer. The offset of the
ioport access was getting fixed up to 0x36 in qemu, causing the value
read in iPXE to be invalid.
This fixes an issue with iPXE reporting timeouts during TFTP transfers.
Reposting this here because it is trivial enough and the original post
on qemu-devel didn't attract much attention.
Also, the inw() which was causing the issue has been replaced with an
inb() in upstream iPXE:
https://git.ipxe.org/ipxe.git/commit/91dd64ad25baa27954a7518e73df4fca8a2d0c93
Gerd Hoffmann [Tue, 22 Nov 2011 11:39:58 +0000 (12:39 +0100)]
usb: make usb_create_simple catch and pass up errors.
Use qdev_init() instead of qdev_init_nofail(), usb device initialization
can fail, most common case being port and device speed mismatch. Handle
failures correctly and pass up NULL pointers then.
Also fixup usb_create_simple() callers (only one was buggy) to properly
check for NULL pointers before referncing the usb_create_simple() return
value.
slirp: Clean up net_slirp_hostfwd_remove()'s use of get_str_sep()
get_str_sep() can fail, but net_slirp_hostfwd_remove() doesn't check.
Works, because it initializes buf[] to "", which get_str_sep() doesn't
touch when it fails. Coverity doesn't like it, and neither do I.
Paolo Bonzini [Fri, 18 Nov 2011 15:32:02 +0000 (16:32 +0100)]
scsi-generic: add as boot device
There is no reason why a scsi-generic device cannot boot if it has
the right type, and indeed it provides already a bootindex property.
So register those devices too.
Paolo Bonzini [Fri, 18 Nov 2011 16:03:45 +0000 (17:03 +0100)]
scsi: fix fw path
The pre-1.0 firmware path for SCSI devices already included the LUN
using the suffix argument to add_boot_device_path. Avoid that it is
included twice, and convert the colons to commas for consistency with
other kinds of devices
Paolo Bonzini [Fri, 18 Nov 2011 15:32:00 +0000 (16:32 +0100)]
usb-msd: do not register twice in the boot order
USB mass storage devices are registered twice in the boot order.
To avoid having to keep the two paths in sync, pass the bootindex
property down to the scsi-disk device and let it register itself.
Max Filippov [Mon, 21 Nov 2011 00:54:58 +0000 (04:54 +0400)]
configure: check for EFD_NONBLOCK | EFD_CLOEXEC flags
Add check for the EFD_NONBLOCK and EFD_CLOEXEC flags to the
CONFIG_EVENTFD test.
This fixes the following build failure on Fedora 9:
CC event_notifier.o
event_notifier.c: In function `event_notifier_init':
event_notifier.c:21: error: `EFD_NONBLOCK' undeclared (first use in this function)
event_notifier.c:21: error: (Each undeclared identifier is reported only once
event_notifier.c:21: error: for each function it appears in.)
event_notifier.c:21: error: `EFD_CLOEXEC' undeclared (first use in this function)
make: *** [event_notifier.o] Error 1
Avi Kivity [Tue, 15 Nov 2011 18:12:17 +0000 (20:12 +0200)]
configure: build position independent executables on x86-Linux hosts
Change the default on x86 Linux hosts to building PIE (position
independent executables); instead of restricting the option to
user-only targets, apply it to all targets.
In addition, set the relocation sections to read-only (relro) when
available; this reduces the attack surface by disallowing changes to
relocation tables at runtime.
While PIE reduces performance and relro increases load time, it
greatly improves security, with the potential to reduce a code
execution vulnerability to a self denial of service.
Non-x86 are not changed, as they require TCG changes; neither are
non-Linux, due to lack of test coverage.
Hongyong Zang [Mon, 21 Nov 2011 10:56:18 +0000 (18:56 +0800)]
ivshmem: fix PCI BAR2 registration during initialization
Ivshmem cannot work, and the command lspci cannot show ivshmem BAR2 in the guest.
As for pci_register_bar(), parameter MemoryRegion should be s->bar instead of s->ivshmem.
Anthony Liguori [Mon, 14 Nov 2011 21:09:46 +0000 (15:09 -0600)]
qcow2: implement bdrv_invalidate_cache (v2)
We don't reopen the actual file, but instead invoke the close and open routines.
We specifically ignore the backing file since it's contents are read-only and
therefore immutable.
Anthony Liguori [Mon, 14 Nov 2011 21:09:45 +0000 (15:09 -0600)]
block: allow migration to work with image files (v3)
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.
Today, image formats aggressively cache mutable data to improve performance. In
some cases, this happens before a guest even starts. When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.
This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.
NB, this still requires coherent shared storage. Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.
Anthony Liguori [Mon, 14 Nov 2011 21:09:43 +0000 (15:09 -0600)]
migrate: add migration blockers
This lets different subsystems register an Error that is thrown whenever
migration is attempted. This works nicely because it gracefully supports
things like hotplug.
Right now, if multiple errors are registered, only one of them is reported.
I expect that for 1.1, we'll extend query-migrate to return all of the reasons
why migration is disabled at any given point in time.
Gerd Hoffmann [Fri, 11 Nov 2011 16:14:15 +0000 (17:14 +0100)]
usb-linux: fix /proc/bus/usb/devices scan
Commit 0c402e5abb8c2755390eee864b43a98280fc2453 is incomplete
and misses one of the two function pointer calls in
usb_host_scan_dev(). Add the additional port handling logic
to the other call too.
Gerd Hoffmann [Mon, 21 Nov 2011 13:01:26 +0000 (14:01 +0100)]
usb-storage: don't try to send the status early.
Until recently all scsi commands sent to scsi-disk did either transfer
data or finished instantly. The correct implementation of
SYNCRONIZE_CACHE changed the picture though, and usb-storage needs
a fix to handle that case correctly.
Blue Swirl [Sun, 13 Nov 2011 11:11:52 +0000 (11:11 +0000)]
x86: fix pcmpestrm and pcmpistrm
Fix obvious typos (decrement and off-by-one error) in pcmpestrm and pcmpistrm
which resulted in infinite loop. Reported by Frank Mehnert,
spotted also by Coverity (bug 84752853).
loader: Fix read_targphys() to behave when read() fails
Happily passes (size_t)-1 to rom_add_blob_fixed(), which promptly dies
attempting to malloc that much. Spotted by Coverity.
Bonus fix for ROMs larger than INT_MAX bytes: return ssize_t instead
of int. Bug can't bite, because the only user load_aout() limits ROM
size to an int value.
In both cases, val is computed, but then not used in the
subsequent line, which then re-computes the quantity in
a different type (int32_t vs unsigned long).
Keep the computation type that's been working so far.
Blue Swirl [Sat, 19 Nov 2011 11:17:58 +0000 (11:17 +0000)]
Merge branch 's390-1.0' of git://repo.or.cz/qemu/agraf
* 's390-1.0' of git://repo.or.cz/qemu/agraf:
s390x: initialize virtio dev region
tcg: Use TCGReg for standard tcg-target entry points.
tcg: Standardize on TCGReg as the enum for hard registers
s390x: Add shutdown for TCG s390-virtio machine
s390: Fix cpu shutdown for KVM
s390: fix short kernel command lines
s390: fix reset hypercall to reset the status
s390x: implement SIGP restart and shutdown
s390x: implement rrbe instruction properly
s390x: update R and C bits in storage key
s390x: make ipte 31-bit aware
s390x: add ldeb instruction
Blue Swirl [Sat, 19 Nov 2011 11:17:11 +0000 (11:17 +0000)]
Merge branch 'ppc-1.0' of git://repo.or.cz/qemu/agraf
* 'ppc-1.0' of git://repo.or.cz/qemu/agraf:
pseries: Fix qdev.id handling in the VIO bus code
pseries: Allow kernel's early debug output to work
pseries: Default reg for vty should be SPAPR_VTY_BASE_ADDRESS
pseries: Check we have a chardev in spapr_vty_init()
pseries: Fix buggy spapr_vio_find_by_reg()
pseries: Correct RAM size check for SLOF
PPC: Fix for the gdb single step problem on an rfi instruction
tcg-ppc64: Fix compile errors for userspace only builds with gcc 4.6
pseries: Fix initialization of sPAPREnvironment structure
Michael Ellerman [Tue, 15 Nov 2011 18:53:13 +0000 (18:53 +0000)]
pseries: Fix qdev.id handling in the VIO bus code
When the user creates a device on the command line with -device, they
can specify the id, using id=foo. Currently the VIO bus code overwrites
this id with it's own value. We should only set qdev.id if it is not
already set by the user.
The device tree code uses qdev.id for the device tree node name, however
we can't rely on the user specifiying the id using proper device tree
syntax, ie. device@reg. So separate the device tree node name from the
qdev.id, but use the same syntax, so they will match by default.
David Gibson [Sun, 13 Nov 2011 17:19:01 +0000 (17:19 +0000)]
pseries: Allow kernel's early debug output to work
The PAPR specification defines a virtual TTY/console interface for guest
OSes to use via the H_PUT_TERM_CHAR and H_GET_TERM_CHAR hypercalls. There
can be multiple virtual ttys, so these take a "termno" parameter. This
encodes which vty to use as the 'reg' property on the device tree node
associated with that vty.
However, with the early debug options enabled, the Linux kernel will
attempt debugging output through the vty very early, before it has read
the device tree. In this case it always uses a termno of 0. This works
on the existing PowerVM hypervisor, so we assume there must be a hack /
feature in there which interprets termno==0 to mean the default primary
console.
To help with debugging kernels, including existing distribution kernels,
this patch implements a similar feature / hack in qemu. If termno==0
is supplied to H_{GET,PUT}_TERM_CHAR, they use the first available vty
device instead.
We need to be careful in the case that the user has manually created
an spapr-vty at address 0. So first we search for the specified reg and
only if that doesn't match do we fall back.
Michael Ellerman [Sun, 13 Nov 2011 17:19:00 +0000 (17:19 +0000)]
pseries: Default reg for vty should be SPAPR_VTY_BASE_ADDRESS
In commit b4a78527359a4540d84d4cdf629d01cbb262f698 ("Place pseries vty
devices at addresses more similar to existing machines"), we changed the
default reg for the vty to 0x30000000, however we didn't update the default
value for a user specified vty device. Fix that.
Michael Ellerman [Sun, 13 Nov 2011 17:18:59 +0000 (17:18 +0000)]
pseries: Check we have a chardev in spapr_vty_init()
If qemu is run like:
qemu-system-ppc64 -nodefaults -device spapr-vty
We end up in spapr_vty_init() with dev->chardev == NULL. Currently
that leads to a segfault because we unconditionally call
qemu_chr_add_handlers().
Although we could make that call conditional, I think a spapr-vty
without a chardev is basically useless so fail the init. This is
similar to what the serial code does for example.
David Gibson [Sun, 13 Nov 2011 17:18:58 +0000 (17:18 +0000)]
pseries: Fix buggy spapr_vio_find_by_reg()
The spapr_vio_find_by_reg() function in hw/spapr_vio.c is supposed to find
the device structure for a PAPR virtual IO device with the given reg value,
and return NULL if none exists.
It does the first ok, but if no device with that reg exists, it just
returns the last device traversed in the list. This patch fixes it.
David Gibson [Sun, 13 Nov 2011 17:18:57 +0000 (17:18 +0000)]
pseries: Correct RAM size check for SLOF
The SLOF firmware used on the pseries machine needs a reasonable amount of
(guest) RAM in order to run, so we have a check in the machine init
function to check that this is available. However, SLOF runs in real mode
(MMU off) which means it can only actually access the RMA (Real Mode Area),
not all of RAM. In many cases the RMA is the same as all RAM, but when
running with Book3S HV KVM on PowerPC 970, the RMA must be especially
allocated to be (host) physically contiguous. In this case, the RMA size
is determined by what the host admin allocated at boot time, and will
usually be less than the whole guest RAM size.
This patch corrects the test to see if SLOF has enough memory for this
case.
In addition, more recent versions of SLOF that were committed earlier don't
need quite as much memory as earlier versions. Therefore, this patch also
reduces the amount of RAM we require to run SLOF.
Paolo Bonzini [Mon, 14 Nov 2011 13:31:52 +0000 (14:31 +0100)]
scsi-block: always use SG_IO for MMC devices
CD burning messes up the state of the host page cache and host block
device. Just pass all operations down to the device, even though that
might have slightly worse performance. Everything else just is not
reliable in combination with burning.