Haozhong Zhang [Tue, 24 Nov 2015 03:33:57 +0000 (11:33 +0800)]
target-i386: Add support to migrate vcpu's TSC rate
This patch enables migrating vcpu's TSC rate. If KVM on the
destination machine supports TSC scaling, guest programs will
observe a consistent TSC rate across the migration.
If TSC scaling is not supported on the destination machine, the
migration will not be aborted and QEMU on the destination will
not set vcpu's TSC rate to the migrated value.
If vcpu's TSC rate specified by CPU option 'tsc-freq' on the
destination machine is inconsistent with the migrated TSC rate,
the migration will be aborted.
For backwards compatibility, the migration of vcpu's TSC rate is
disabled on pc-*-2.5 and older machine types.
Signed-off-by: Haozhong Zhang <[email protected]> Reviewed-by: Eduardo Habkost <[email protected]>
[ehabkost: Rewrote comment at kvm_arch_put_registers()]
[ehabkost: Moved compat code to pc-2.5] Signed-off-by: Eduardo Habkost <[email protected]>
Haozhong Zhang [Tue, 24 Nov 2015 03:33:56 +0000 (11:33 +0800)]
target-i386: Reorganize TSC rate setting code
Following changes are made to the TSC rate setting code in
kvm_arch_init_vcpu():
* The code is moved to a new function kvm_arch_set_tsc_khz().
* If kvm_arch_set_tsc_khz() fails, i.e. following two conditions are
both satisfied:
* KVM does not support the TSC scaling or it fails to set vcpu's
TSC rate by KVM_SET_TSC_KHZ,
* the TSC rate to be set is different than the value currently used
by KVM, then kvm_arch_init_vcpu() will fail. Prevously,
* the lack of TSC scaling never failed kvm_arch_init_vcpu(),
* the failure of KVM_SET_TSC_KHZ failed kvm_arch_init_vcpu()
unconditionally, even though the TSC rate to be set is identical
to the value currently used by KVM.
Haozhong Zhang [Tue, 24 Nov 2015 03:33:55 +0000 (11:33 +0800)]
target-i386: Fallback vcpu's TSC rate to value returned by KVM
If no user-specified TSC rate is present, we will try to set
env->tsc_khz to the value returned by KVM_GET_TSC_KHZ. This patch
does not change the current functionality of QEMU and just
prepares for later patches to enable migrating vcpu's TSC rate.
Peter Maydell [Thu, 21 Jan 2016 13:09:47 +0000 (13:09 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Wed 20 Jan 2016 15:37:57 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
* remotes/kevin/tags/for-upstream:
iotests: Test that throttle values ranges
blockdev: Error out on negative throttling option values
vmdk: Create streamOptimized as version 3
qcow2: Make image inaccessible after failed qcow2_invalidate_cache()
qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache()
qcow2: Implement .bdrv_inactivate
block: Inactivate BDS when migration completes
block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
block: Fix error path in bdrv_invalidate_cache()
block: Assert no write requests under BDRV_O_INCOMING
qcow2: Write full header on image creation
qcow2: Write feature table only for v3 images
block: Clean up includes
qemu-iotests: Reduce racy output in 028
qemu-img: Speed up comparing empty/zero images
block/raw-posix: avoid bogus fixup for cylinders on DASD disks
block: Fix .bdrv_open flags
Peter Maydell [Thu, 21 Jan 2016 12:42:17 +0000 (12:42 +0000)]
Merge remote-tracking branch 'remotes/berrange/tags/pull-io-next-2016-01-20-1' into staging
I/O channels fixes 2016/01/20 v1
# gpg: Signature made Wed 20 Jan 2016 11:31:47 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg: aka "Daniel P. Berrange <[email protected]>"
* remotes/berrange/tags/pull-io-next-2016-01-20-1:
io: use memset instead of { 0 } for initializing array
io: fix description of @errp parameter initialization
io: some fixes to handling of /dev/null when running commands
io: increment counter when killing off subcommand
io: fix sign of errno value passed to error report
Peter Maydell [Thu, 21 Jan 2016 12:09:41 +0000 (12:09 +0000)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-socket-20160120-1' into staging
Convert qemu-socket to use QAPI exclusively, update MAINTAINERS.
# gpg: Signature made Wed 20 Jan 2016 06:49:07 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg: aka "Gerd Hoffmann <[email protected]>"
# gpg: aka "Gerd Hoffmann (private) <[email protected]>"
* remotes/kraxel/tags/pull-socket-20160120-1:
vnc: distiguish between ipv4/ipv6 omitted vs set to off
sockets: remove use of QemuOpts from socket_dgram
sockets: remove use of QemuOpts from socket_connect
sockets: remove use of QemuOpts from socket_listen
sockets: remove use of QemuOpts from header file
add MAINTAINERS entry for qemu socket code
Fam Zheng [Wed, 20 Jan 2016 04:21:20 +0000 (12:21 +0800)]
blockdev: Error out on negative throttling option values
extract_common_blockdev_options() uses qemu_opt_get_number() to parse
the bps/iops numbers to uint64_t, then converts to double and stores in
ThrottleConfig. The actual parsing is done by strtoull() in
parse_option_number(). Negative numbers are wrapped to large positive
ones, and stored.
We used to reject negative numbers since 7d81c1413c9, but this regressed
when the option parsing code was changed later. Now fix this again.
This time, define an arbitrary large upper limit (1e15), and check the
values so both negative and impractically big numbers are caught and
reported.
Kevin Wolf [Tue, 22 Dec 2015 15:14:10 +0000 (16:14 +0100)]
qcow2: Make image inaccessible after failed qcow2_invalidate_cache()
If qcow2_invalidate_cache() fails, we are in a state where qcow2_close()
has already been completed, but the image hasn't been reopened yet.
Calling into any qcow2 function for an image in this state will cause
crashes.
The real solution would be to get rid of the close/open pair and instead
do an atomic reset of the involved data structures, but this isn't
trivial, so let's just make the image inaccessible for now.
Kevin Wolf [Tue, 22 Dec 2015 15:10:32 +0000 (16:10 +0100)]
qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache()
What qcow2_invalidate_cache() should do is close the image with
BDRV_O_INACTIVE set and reopen it with the flag cleared. In fact, it
used to do exactly the opposite: qcow2_close() relied on bs->open_flags,
which is already updated to have cleared BDRV_O_INACTIVE at this point,
whereas qcow2_open() was called with s->flags, which has the flag still
set. Fix this.
Kevin Wolf [Tue, 22 Dec 2015 15:04:57 +0000 (16:04 +0100)]
qcow2: Implement .bdrv_inactivate
The callback has to ensure that closing or flushing the image afterwards
wouldn't cause a write access to the image files. This means that just
the caches have to be written out, which is part of the existing
.bdrv_close implementation.
Kevin Wolf [Tue, 22 Dec 2015 13:07:08 +0000 (14:07 +0100)]
block: Inactivate BDS when migration completes
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.
This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.
A typical migration looks like this now with respect to disk images:
1. Destination qemu is started, the image is opened with
BDRV_O_INACTIVE. The image is fully opened on the source.
2. Migration is about to complete. The source flushes the image and
inactivates it. Now both sides have the image opened with
BDRV_O_INACTIVE and are expecting the other side to still modify it.
3. One side (the destination on success) continues and calls
bdrv_invalidate_all() in order to take ownership of the image again.
This removes BDRV_O_INACTIVE on the resuming side; the flag remains
set on the other side.
This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.
Kevin Wolf [Wed, 13 Jan 2016 14:56:06 +0000 (15:56 +0100)]
block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Kevin Wolf [Wed, 16 Dec 2015 13:00:36 +0000 (14:00 +0100)]
block: Assert no write requests under BDRV_O_INCOMING
As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.
Kevin Wolf [Wed, 2 Dec 2015 17:34:39 +0000 (18:34 +0100)]
qcow2: Write full header on image creation
When creating a qcow2 image, we didn't necessarily call
qcow2_update_header(), but could end up with the basic header that
qcow2_create2() created manually. One thing that this basic header
lacks is the feature table. Let's make sure that it's always present.
This requires a few updates to test cases as well.
Eric Blake [Fri, 11 Dec 2015 03:27:17 +0000 (20:27 -0700)]
qemu-iotests: Reduce racy output in 028
On my machine, './check -qcow2 028' was failing about 80% of the
time, due to a race in how many times the repeated attempts
to run 'info block-jobs' could occur before the job was done,
showing up as a failure of fewer '(qemu) ' prompts than in the
expected output. Silence the output during the repetitions, then
add a final clean command to keep the expected output useful;
once patched, I was finally able to run the test 20 times in a
row with no failures.
Fam Zheng [Wed, 13 Jan 2016 08:37:41 +0000 (16:37 +0800)]
qemu-img: Speed up comparing empty/zero images
Two empty raw files are always compared by actually reading data even if
there is no data, because BDRV_BLOCK_ZERO is considered "allocated" in
bdrv_is_allocated_above(). That is inefficient.
Use bdrv_get_block_status_above() for more information, and skip the
consecutive zero sectors.
This brings a huge speed up in comparing sparse/empty raw images:
$ qemu-img create a 1G
$ time ~/build/master/bin/qemu-img compare a a
Images are identical.
io: use memset instead of { 0 } for initializing array
Some versions of GCC on OS-X complain about CMSG_SPACE
not being constant size, which prevents use of { 0 }
io/channel-socket.c: In function 'qio_channel_socket_writev':
io/channel-socket.c:497:18: error: variable-sized object may not be initialized
char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)] = { 0 };
The compiler is at fault here, but it is nicer to avoid
tickling this compiler bug by using memset instead.
io: some fixes to handling of /dev/null when running commands
The /dev/null file handle was leaked in a couple of places.
There is also the possibility that both readfd and writefd
point to the same /dev/null file handle, so care must be
taken not to close the same file handle twice.
Alex Williamson [Tue, 19 Jan 2016 18:33:42 +0000 (11:33 -0700)]
vfio/pci: Lazy PBA emulation
The PCI spec recommends devices use additional alignment for MSI-X
data structures to allow software to map them to separate processor
pages. One advantage of doing this is that we can emulate those data
structures without a significant performance impact to the operation
of the device. Some devices fail to implement that suggestion and
assigned device performance suffers.
One such case of this is a Mellanox MT27500 series, ConnectX-3 VF,
where the MSI-X vector table and PBA are aligned on separate 4K
pages. If PBA emulation is enabled, performance suffers. It's not
clear how much value we get from PBA emulation, but the solution here
is to only lazily enable the emulated PBA when a masked MSI-X vector
fires. We then attempt to more aggresively disable the PBA memory
region any time a vector is unmasked. The expectation is then that
a typical VM will run entirely with PBA emulation disabled, and only
when used is that emulation re-enabled.
Alex Williamson [Tue, 19 Jan 2016 18:33:41 +0000 (11:33 -0700)]
vfio/pci-quirks: Only quirk to size of PCI config space
For quirks that support the full PCIe extended config space, limit the
quirk to only the size of config space available through vfio. This
allows host systems with broken MMCONFIG regions to still make use of
these quirks without generating bad address faults trying to access
beyond the end of config space exposed through vfio. This may expose
direct access to the mirror of extended config space, only trapping
the sub-range of standard config space, but allowing this makes the
quirk, and thus the device, functional. We expect that only device
specific accesses make use of the mirror, not general extended PCI
capability accesses, so any virtualization in this space is likely
unnecessary anyway, and the device is still IOMMU isolated, so it
should only be able to hurt itself through any bogus configurations
enabled by this space.
block/raw-posix: avoid bogus fixup for cylinders on DASD disks
large volume DASD that have > 64k cylinders do claim to have
0xFFFE cylinders as special value in the old 16 bit field. We
want to pass this "token" along to the guest, instead of
calculating the real number. Otherwise qemu might fail with
"cyls must be between 1 and 65535"
Kevin Wolf [Mon, 11 Jan 2016 18:07:50 +0000 (19:07 +0100)]
block: Fix .bdrv_open flags
bdrv_common_open() modified bs->open_flags after inferring the set of
options to pass to the driver's .bdrv_open callback. This means that the
cache options were correctly set in bs->open_flags (and therefore
correctly displayed in 'info block'), but the image would actually be
opened with the default cache mode instead.
This patch removes the flags parameter to bdrv_common_open() (except for
BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two
names for the same thing is confusing), and moves the assignment of
open_flags down to immediately before calling into the block drivers. In
all other places, bs->open_flags is now used consistently.
vnc: distiguish between ipv4/ipv6 omitted vs set to off
The VNC code for interpreting QemuOpts does not currently
distinguish between ipv4/ipv6 being omitted, and being
set to 'off', because historically the 'ipv4' and 'ipv6'
options were just flags which did not accept a value.
The upshot is that if someone runs
$QEMU -vnc localhost:1,ipv6=off
QEMU still uses PF_UNSPEC and thus may still bind to IPv6,
when it should use PF_INET.
This is another instance of the problem previously fixed
for chardevs in
The socket_dgram method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_dgram_opts helper method. By converting the latter to
use QAPI SocketAddress directly, the QemuOpts conversion
step can be eliminated.
This removes the very last use of QemuOpts from the
sockets code, so the socket_optslist[] array is also
removed.
sockets: remove use of QemuOpts from socket_connect
The socket_connect method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_connect_opts/unix_connect_opts helper methods. By
converting the latter to use QAPI SocketAddress directly,
the QemuOpts conversion step can be eliminated
sockets: remove use of QemuOpts from socket_listen
The socket_listen method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_listen_opts/unix_listen_opts helper methods. By
converting the latter to use QAPI SocketAddress directly,
the QemuOpts conversion step can be eliminated
There are no callers of the sockets methods which accept
QemuOpts any more. Make all the QemuOpts related functions
static to avoid new callers being added, in preparation
for removal of all QemuOpts usage, in favour of QAPI
SocketAddress.
When killing the subcommand, it is intended to first send
SIGTERM, then SIGKILL and only report an error if it still
doesn't die after SIGKILL. The 'step' counter was not
being incremented though, so the code never got past the
SIGTERM stage.
Andreas Färber [Mon, 18 Jan 2016 17:19:35 +0000 (18:19 +0100)]
MAINTAINERS: Fix sPAPR entry heading
get_maintainers.pl does not handle parenthesis in maintenance areas well
in connection with list emails (here: [email protected]).
Resolve a recurring CC issue breaking git-send-email by reverting part
of commit 085eb217dfb3ee12e7985c11f71f8a038394735a ("Add David Gibson
for sPAPR in MAINTAINERS file").
Paolo Bonzini [Mon, 19 Oct 2015 11:11:39 +0000 (13:11 +0200)]
qdev: Free QemuOpts when the QOM path goes away
Otherwise there is a race where the DEVICE_DELETED event has been sent but
attempts to reuse the ID will fail.
Note that similar races exist for other QemuOpts, which this patch
does not attempt to fix.
For example, if the device is a block device, then unplugging it also
deletes its backend. However, this backend's get deleted in
drive_info_del(), which is only called when properties are
destroyed. Just like device_finalize(), drive_info_del() is called
some time after DEVICE_DELETED is sent. A separate patch series has
been sent to plug this other bug. Character devices also have yet to
be fixed.
Currently the ObjectProperty iterator API works as follows:
ObjectPropertyIterator *iter;
iter = object_property_iter_init(obj);
while ((prop = object_property_iter_next(iter))) {
...
}
object_property_iter_free(iter);
This has the benefit that the ObjectPropertyIterator struct
can be opaque, but has the downside that callers need to
explicitly call a free function. It is also not in keeping
with iterator style used elsewhere in QEMU/GLib2.
This patch changes the API to use stack allocation instead:
ObjectPropertyIterator iter;
object_property_iter_init(&iter, obj);
while ((prop = object_property_iter_next(&iter))) {
...
}
qom: Allow properties to be registered against classes
When there are many instances of a given class, registering
properties against the instance is wasteful of resources. The
majority of objects have a statically defined list of possible
properties, so most of the properties are easily registerable
against the class. Only those properties which are conditionally
registered at runtime need be recorded against the klass.
Registering properties against classes also makes it possible
to provide static introspection of QOM - currently introspection
is only possible after creating an instance of a class, which
severely limits its usefulness.
This impl only supports simple scalar properties. It does not
attempt to allow child object / link object properties against
the class. There are ways to support those too, but it would
make this patch more complicated, so it is left as an exercise
for the future.
There is no equivalent to object_property_del() provided, since
classes must be immutable once they are defined.
Peter Maydell [Mon, 7 Dec 2015 16:23:43 +0000 (16:23 +0000)]
scripts: Add new clean-includes script to fix C include directives
Add a new scripts/clean-includes, which can be used to automatically
ensure that a C source file includes qemu/osdep.h first and doesn't
then include any headers which osdep.h provides already.
Similarly to the commit 764eb39d1b6 fixing VNC+SASL+QXL, when starting
QEMU with SPICE but no SASL, and at the same time VNC with SASL, then
spice_server_init() will get called without a previous call to
spice_server_set_sasl_appname(), which will cause cyrus-sasl to
try to use /etc/sasl2/spice.conf (spice-server uses "spice" as its
default appname) rather than the expected /etc/sasl2/qemu.conf.
This commit unconditionally calls spice_server_set_sasl_appname()
before calling spice_server_init() in order to use the correct appname
even if SPICE without SASL was requested on qemu command line.
This pointer should be cleared in vnc_display_close()
otherwise a use-after-free can happen when when using the
old style 'x509' and 'tls' options rather than a persistent
tls-creds -object, by issuing monitor commands to change
the vnc server like so:
Start with: -vnc unix:test.socket,x509,tls
Then use the following monitor command:
change vnc unix:test.socket
After this the pointer is still set but invalid and a crash
can be triggered for instance by issuing the same command a
second time which will try to object_unparent() the same
pointer again.
Paolo Bonzini [Thu, 17 Dec 2015 12:47:02 +0000 (13:47 +0100)]
gtk: implement set_echo
Even without line editing, this makes -qmp vc more pleasant with the
GTK+ backend. The only issue is that set_echo is invoked very early,
long before a vc is actually associated with a VirtualConsole. To work
around this, create a temporary VirtualConsole until then.
Peter Maydell [Mon, 18 Jan 2016 09:33:36 +0000 (09:33 +0000)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update
# gpg: Signature made Sat 16 Jan 2016 12:32:06 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
* remotes/mcayland/tags/qemu-sparc-signed:
target-sparc: Migrate CWP and PIL for SPARC64
target-sparc: Use VMState arrays for SPARC64 TLB/MMU state
target-sparc: Convert to VMStateDescription
target-sparc: Don't flush TLB in cpu_load function
target-sparc: Split cpu_put_psr into side-effect and no-side-effect parts
vmstate: define vmstate_info_uinttl
vmstate: Introduce VMSTATE_VARRAY_MULTPLY
vmstate: introduce CPU_DoubleU arrays
Peter Maydell [Mon, 11 Jan 2016 12:40:28 +0000 (12:40 +0000)]
target-sparc: Migrate CWP and PIL for SPARC64
In SPARC32 the env->cwp and env->psrpil state is part of the PSR
register, and gets migrated as part of that register.
In SPARC64 this state is in separate CWP and PIL registers, but we
were not doing anything to migrate those.
Add the missing fields to the migration vmstate (which is a
migration break, but without these fields migration is completely
broken anyway).
This change means that trying a save/load of a SPARC64 target at
the boot rom prompt now produces a system which at least responds
to keyboard input after the restore.
Peter Maydell [Mon, 11 Jan 2016 12:40:27 +0000 (12:40 +0000)]
target-sparc: Use VMState arrays for SPARC64 TLB/MMU state
Use VMState arrays for SPARC64 TLB/MMU state. This is
a migration-break for SPARC64 (but not for SPARC32),
which is acceptable because currently migration does not
work for any SPARC64 machines due to the lack of any migration
of interrupt controller state.
Juan Quintela [Mon, 11 Jan 2016 12:40:26 +0000 (12:40 +0000)]
target-sparc: Convert to VMStateDescription
Convert the SPARC CPU from cpu_load/save functions to VMStateDescription.
We preserve migration compatibility with the previous version
(required for SPARC32 but not necessarily for SPARC64).
Signed-off-by: Juan Quintela <[email protected]>
[PMM:
* Rebase and update to apply to master
* VMSTATE_STRUCT_POINTER now takes type, not pointer-to-type
* QEMUTimer* are migrated via VMSTATE_TIMER_PTR
* Put CPUTimer vmstate struct inside TARGET_SPARC64 ifdef
* Convert handling of PSR to use a vmstate_psr, like Alpha and ARM
] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
Peter Maydell [Mon, 11 Jan 2016 12:40:24 +0000 (12:40 +0000)]
target-sparc: Split cpu_put_psr into side-effect and no-side-effect parts
For inbound migration we really want to be able to set the PSR without
having any side effects, but cpu_put_psr() calls cpu_check_irqs() which
might try to deliver CPU interrupts. Split cpu_put_psr() into the
no-side-effect and side-effect parts.
This includes reordering the cpu_check_irqs() to the end of cpu_put_psr(),
because that function may actually end up calling cpu_interrupt(), which
does not seem like a good thing to happen in the middle of updating the PSR.
Juan Quintela [Mon, 11 Jan 2016 12:40:23 +0000 (12:40 +0000)]
vmstate: define vmstate_info_uinttl
We are going to define arrays of this type, so we need the integer type.
Signed-off-by: Juan Quintela <[email protected]>
[PMM: updated to apply on current QEMU; renamed to 'uinttl'
rather than 'uinttls' to match other vmstate naming] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
Juan Quintela [Mon, 11 Jan 2016 12:40:21 +0000 (12:40 +0000)]
vmstate: introduce CPU_DoubleU arrays
Add vmstate support for migrating arrays of CPU_DoubleU via
VMSTATE_CPUDOUBLE_ARRAY.
Signed-off-by: Juan Quintela <[email protected]>
[PMM: rebased, since files have all moved since 2012;
added VMSTATE_CPUDOUBLE_ARRAY_V for consistency with FLOAT64] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
# gpg: Signature made Fri 15 Jan 2016 17:58:28 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/bonzini/tags/for-upstream:
qemu-char: do not leak QemuMutex when freeing a character device
qemu-char: add logfile facility to all chardev backends
nbd-server: do not exit on failed memory allocation
nbd-server: do not check request length except for reads and writes
nbd-server: Coroutine based negotiation
nbd: Split nbd.c
nbd: Always call "close_fn" in nbd_client_new
SCSI device: fix to incomplete QOMify
iscsi: send readcapacity10 when readcapacity16 failed
qemu-char: delete send_all/recv_all helper methods
vmw_pvscsi: x-disable-pcie, x-old-pci-configuration back-compat props are 2.5 specific
scsi: initialise info object with appropriate size
i386: avoid null pointer dereference
target-i386: do not duplicate page protection checks
scsi: revert change to scsi_req_cancel_async and add assertions
qemu-char: add logfile facility to all chardev backends
Typically a UNIX guest OS will log boot messages to a serial
port in addition to any graphical console. An admin user
may also wish to use the serial port for an interactive
console. A virtualization management system may wish to
collect system boot messages by logging the serial port,
but also wish to allow admins interactive access.
Currently providing such a feature forces the mgmt app
to either provide 2 separate serial ports, one for
logging boot messages and one for interactive console
login, or to proxy all output via a separate service
that can multiplex the two needs onto one serial port.
While both are valid approaches, they each have their
own downsides. The former causes confusion and extra
setup work for VM admins creating disk images. The latter
places an extra burden to re-implement much of the QEMU
chardev backends logic in libvirt or even higher level
mgmt apps and adds extra hops in the data transfer path.
A simpler approach that is satisfactory for many use
cases is to allow the QEMU chardev backends to have a
"logfile" property associated with them.
This patch introduces a 'ChardevCommon' struct which
is setup as a base for all the ChardevBackend types.
Ideally this would be registered directly as a base
against ChardevBackend, rather than each type, but
the QAPI generator doesn't allow that since the
ChardevBackend is a non-discriminated union. The
ChardevCommon struct provides the optional 'logfile'
parameter, as well as 'logappend' which controls
whether QEMU truncates or appends (default truncate).
Paolo Bonzini [Thu, 7 Jan 2016 13:34:13 +0000 (14:34 +0100)]
nbd-server: do not exit on failed memory allocation
The amount of memory allocated in nbd_co_receive_request is driven by the
NBD client (possibly a virtual machine). Parallel I/O can cause the
server to allocate a large amount of memory; check for failures and
return ENOMEM in that case.
Paolo Bonzini [Thu, 7 Jan 2016 13:32:42 +0000 (14:32 +0100)]
nbd-server: do not check request length except for reads and writes
Only reads and writes need to allocate memory correspondent to the
request length. Other requests can be sent to the storage without
allocating any memory, and thus any request length is acceptable.
Fam Zheng [Thu, 14 Jan 2016 08:41:03 +0000 (16:41 +0800)]
nbd-server: Coroutine based negotiation
Create a coroutine in nbd_client_new, so that nbd_send_negotiate doesn't
need qemu_set_block().
Handlers need to be set temporarily for csock fd in case the coroutine
yields during I/O.
With this, if the other end disappears in the middle of the negotiation,
we don't block the whole event loop.
To make the code clearer, unify all function names that belong to
negotiate, so they are less likely to be misused. This is important
because we rely on negotiation staying in main loop, as commented in
nbd_negotiate_read/write().
Fam Zheng [Thu, 14 Jan 2016 08:41:01 +0000 (16:41 +0800)]
nbd: Always call "close_fn" in nbd_client_new
Rename the parameter "close" to "close_fn" to disambiguous with
close(2).
This unifies error handling paths of NBDClient allocation:
nbd_client_new will shutdown the socket and call the "close_fn" callback
if negotiation failed, so the caller don't need a different path than
the normal close.
The returned pointer is never used, make it void in preparation for the
next patch.
Zhu Lingshan [Tue, 29 Dec 2015 03:32:14 +0000 (11:32 +0800)]
iscsi: send readcapacity10 when readcapacity16 failed
When play with Dell MD3000 target, for sure it
is a TYPE_DISK, but readcapacity16 would fail.
Then we find that readcapacity10 succeeded. It
looks like the target just support readcapacity10
even through it is a TYPE_DISK or have some
TYPE_ROM characteristics.
This patch can give a chance to send
readcapacity16 when readcapacity10 failed.
This patch is not harmful to original pathes
The qemu-char.c contains two helper methods send_all
and recv_all. These are in fact declared in sockets.h
so ought to have been in util/qemu-sockets.c. For added
fun the impl of recv_all is completely missing on Win32.
Fortunately there is only a single caller of these
methods, the TPM passthrough code, which is only
ever compiled on Linux. With only a single caller
these helpers are not compelling enough to keep so
inline them in the TPM code, avoiding the need to
fix the missing recv_all on Win32.
P J P [Mon, 21 Dec 2015 09:43:13 +0000 (15:13 +0530)]
scsi: initialise info object with appropriate size
While processing controller 'CTRL_GET_INFO' command, the routine
'megasas_ctrl_get_info' overflows the '&info' object size. Use its
appropriate size to null initialise it.
P J P [Fri, 18 Dec 2015 06:05:07 +0000 (11:35 +0530)]
i386: avoid null pointer dereference
Hello,
A null pointer dereference issue was reported by Mr Ling Liu, CC'd here. It
occurs while doing I/O port write operations via hmp interface. In that,
'current_cpu' remains null as it is not called from cpu_exec loop, which
results in the said issue.
Below is a proposed (tested)patch to fix this issue; Does it look okay?
When I/O port write operation is called from hmp interface,
'current_cpu' remains null, as it is not called from cpu_exec()
loop. This leads to a null pointer dereference in vapic_write
routine. Add check to avoid it.
Paolo Bonzini [Tue, 17 Nov 2015 16:09:33 +0000 (17:09 +0100)]
target-i386: do not duplicate page protection checks
x86_cpu_handle_mmu_fault is currently checking twice for writability
and executability of pages; the first time to decide whether to
trigger a page fault, the second time to compute the "prot" argument
to tlb_set_page_with_attrs.
Reorganize code so that first "prot" is computed, then it is used
to check whether to raise a page fault, then finally PROT_WRITE is
removed if the D bit will have to be set.
Paolo Bonzini [Fri, 18 Dec 2015 08:54:53 +0000 (09:54 +0100)]
scsi: revert change to scsi_req_cancel_async and add assertions
Fam Zheng noticed that the change in commit 36896bf ("scsi: always call
notifier on async cancellation", 2015-12-16) could cause a leak of
the request; scsi_req_cancel_async now calls scsi_req_ref
multiple times for multiple cancellations, but there is only
one call to scsi_req_cancel_complete.
So revert the patch and instead assert that the problematic case (a call
to scsi_req_cancel_async after the aiocb has been completed) cannot
happen.
Peter Maydell [Fri, 15 Jan 2016 15:49:43 +0000 (15:49 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160115' into staging
target-arm queue:
* use the right MMU index when handling unaligned accesses
* xlnx-zynqmp: Add support for high DDR memory regions
* target-arm: support QMP dump-guest-memory
* ARM: virt: Don't generate RTC ACPI device when using UEFI
* remotes/pmaydell/tags/pull-target-arm-20160115:
ARM: virt: Don't generate RTC ACPI device when using UEFI
target-arm: dump-guest-memory: add vfp notes for arm
elf: add arm note types
target-arm: dump-guest-memory: add prfpreg notes for aarch64
target-arm: support QMP dump-guest-memory
dump: allow target to set the physical base
dump: allow target to set the page size
dump: qemunotes aren't commonly needed
qapi-schema: dump-guest-memory: Improve text
xlnx-zynqmp: Add support for high DDR memory regions
target-arm: Use the right MMU index in arm_regime_using_lpae_format
Shannon Zhao [Fri, 15 Jan 2016 14:11:31 +0000 (22:11 +0800)]
ARM: virt: Don't generate RTC ACPI device when using UEFI
When booting the VM with UEFI, UEFI takes ownership of the RTC hardware.
While UEFI can use libfdt to disable the RTC device node in the DTB that
it passes to the OS, it cannot modify AML. Therefore, we won't generate
the RTC ACPI device at all when using UEFI.
Andrew Jones [Mon, 11 Jan 2016 19:56:25 +0000 (20:56 +0100)]
target-arm: dump-guest-memory: add vfp notes for arm
gdb won't actually dump these with 'info all-registers' since
it first tries to confirm that it should by checking the VFP
hwcap in the .auxv note. Well, we don't generate an .auxv note.
Andrew Jones [Mon, 11 Jan 2016 19:56:21 +0000 (20:56 +0100)]
dump: allow target to set the physical base
crash assumes the physical base in the kdump subheader of
makedumpfile formatted dumps is correct. Zero is not correct
for all architectures, so allow it to be changed.
Andrew Jones [Mon, 11 Jan 2016 19:56:20 +0000 (20:56 +0100)]
dump: allow target to set the page size
This is necessary for targets that don't have TARGET_PAGE_SIZE ==
real-target-page-size. The target should set the page size to the
correct one, if known, or, if not known, to the maximum page size
it supports.
Andrew Jones [Mon, 11 Jan 2016 19:56:19 +0000 (20:56 +0100)]
dump: qemunotes aren't commonly needed
Only one of three architectures implementing qmp-dump-guest-memory write
qemu notes. And, another architecture (arm/aarch64) is coming, which
won't use them either. Make the common implementation truly common.
Alistair Francis [Tue, 12 Jan 2016 22:39:18 +0000 (14:39 -0800)]
xlnx-zynqmp: Add support for high DDR memory regions
The Xilinx ZynqMP SoC and EP108 board supports three memory regions:
- A 2GB region starting at 0
- A 32GB region starting at 32GB
- A 256GB region starting at 768GB
This patch adds support for the first two memory regions, which is
automatically created based on the size specified by the QEMU memory
command line argument.
On hardware the physical memory region is one continuous region, it is then
mapped into the three different regions by the DDRC. As we don't model the
DDRC this is done at startup by QEMU. The board creates the memory region and
then passes that memory region to the SoC. The SoC then maps the memory
regions.
Alvise Rigo [Fri, 15 Jan 2016 10:37:42 +0000 (11:37 +0100)]
target-arm: Use the right MMU index in arm_regime_using_lpae_format
arm_regime_using_lpae_format checks whether the LPAE extension is used
for stage 1 translation regimes. MMU indexes not exclusively of a stage 1
regime won't work with this method.
In case of ARMMMUIdx_S12NSE0 or ARMMMUIdx_S12NSE1, offset these values
by ARMMMUIdx_S1NSE0 to get the right index indicating a stage 1
translation regime.
Rename also the function to arm_s1_regime_using_lpae_format and update
the comments to reflect the change.
Commit 8acc216b956 attempted to silence some sign-compare
warnings in libvixl by adding -Wno-sign-compare to the CFLAGS
for the relevant objects. Unfortunately it was ineffective
because it was placed before $(QEMU_CFLAGS), so the -Wall in
the general flags overrode -Wno-sign-compare rather than
vice-versa. Reorder the flags so the warning suppression works.
Thanks to Franz-Josef Haider <[email protected]>
for pointing out what was wrong with the original patch.
Peter Maydell [Thu, 14 Jan 2016 13:07:38 +0000 (13:07 +0000)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-01-13' into staging
Error reporting patches for 2016-01-13
# gpg: Signature made Wed 13 Jan 2016 14:21:48 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
* remotes/armbru/tags/pull-error-2016-01-13: (41 commits)
checkpatch: Detect newlines in error_report and other error functions
error: Consistently name Error * objects err, and not errp
s390/sclp: Simplify control flow in sclp_realize()
hw/s390x: Rename local variables Error *l_err to just err
error: Clean up errors with embedded newlines (again)
vhdx: Fix "log that needs to be replayed" error message
pci-assign: Clean up "Failed to assign" error messages
vmdk: Clean up "Invalid extent lines" error message
vmdk: Clean up control flow in vmdk_parse_extents() a bit
error: Strip trailing '\n' from error string arguments (again)
qemu-io qemu-nbd: Use error_report() etc. instead of fprintf()
migration: Use error_reportf_err() instead of monitor_printf()
spapr: Use error_reportf_err()
error: Use error_prepend() where it makes obvious sense
error: Use error_reportf_err() where it makes obvious sense
error: Don't decorate original error message when adding to it
error: New error_prepend(), error_reportf_err()
test-throttle: Simplify qemu_init_main_loop() error handling
qemu-nbd: Clean up "Failed to load snapshot" error message
block: Clean up "Could not create temporary overlay" error message
...