Peter Maydell [Fri, 20 Jun 2014 17:01:24 +0000 (18:01 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc,pci,virtio,hotplug fixes, enhancements
numa work by Hu Tao and others
memory hotplug by Igor
vhost-user by Nikolay, Antonios and others
guest virtio announcements by Jason
qtest fixes by Sergey
qdev hotplug fixes by Paolo
misc other fixes mostly by myself
Signed-off-by: Michael S. Tsirkin <[email protected]>
* remotes/mst/tags/for_upstream: (109 commits)
numa: use RAM_ADDR_FMT with ram_addr_t
qapi/string-output-visitor: fix bugs
tests: simplify code
qapi: fix input visitor bugs
acpi: rephrase comment
qmp: add ACPI_DEVICE_OST event handling
qmp: add query-acpi-ospm-status command
acpi: implement ospm_status() method for PIIX4/ICH9_LPC devices
acpi: introduce TYPE_ACPI_DEVICE_IF interface
qmp: add query-memory-devices command
numa: handle mmaped memory allocation failure correctly
pc: acpi: do not hardcode preprocessor
qmp: clean out whitespace
qdev: recursively unrealize devices when unrealizing bus
qdev: reorganize error reporting in bus_set_realized
qapi: fix build on glib < 2.28
qapi: make string output visitor parse int list
qapi: make string input visitor parse int list
tests: fix memory leak in test of string input visitor
hmp: add info memdev
...
Conflicts:
include/hw/i386/pc.h
[PMM: fixed minor conflict in pc.h] Signed-off-by: Peter Maydell <[email protected]>
Peter Maydell [Fri, 20 Jun 2014 16:41:09 +0000 (17:41 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140619' into staging
target-arm:
* Support PSCI 0.2 when using KVM
* fix AIRCR reset value for v7M CPUs
* report correct size information for pflash_cfi01
* minor coverity fixes
* avoid warnings on Windows builds due to #define clash
* implement TTBCR PD0/PD1 bits
# gpg: Signature made Thu 19 Jun 2014 18:35:06 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
* remotes/pmaydell/tags/pull-target-arm-20140619:
armv7m_nvic: fix AIRCR implementation
Use PSCI v0.2 compatible string when KVM or TCG provides it
target-arm: Introduce per-CPU field for PSCI version
target-arm: Implement kvm_arch_reset_vcpu() for KVM ARM64
target-arm: Enable KVM_ARM_VCPU_PSCI_0_2 feature when possible
target-arm: Common kvm_arm_vcpu_init() for KVM ARM and KVM ARM64
kvm: Handle exit reason KVM_EXIT_SYSTEM_EVENT
hw/block/pflash_cfi01: Report correct size info for parallel configs
hw/arm/vexpress: Forbid specifying flash contents in two ways at once
target-arm/translate-a64.c: Fix dead ?: in handle_simd_shift_fpint_conv()
target-arm/translate-a64.c: Remove dead ?: in disas_simd_3same_int()
target-arm: Add ULL suffix to calculation of page size
hw/arm/spitz: Avoid clash with Windows header symbol MOD_SHIFT
target-arm: implement PD0/PD1 bits for TTBCR
Peter Maydell [Fri, 20 Jun 2014 15:57:28 +0000 (16:57 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-vnc-20140619-1' into staging
vnc: cleanups and fixes
# gpg: Signature made Thu 19 Jun 2014 12:02:09 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg: aka "Gerd Hoffmann <[email protected]>"
# gpg: aka "Gerd Hoffmann (private) <[email protected]>"
* remotes/kraxel/tags/pull-vnc-20140619-1:
vnc: fix screen updates
vnc: Drop superfluous conditionals around g_strdup()
vnc: Drop superfluous conditionals around g_free()
target-arm: Introduce per-CPU field for PSCI version
We require to know the PSCI version available to given CPU at
potentially many places. Currently, we need to know PSCI version
when generating DTB for virt machine.
This patch introduce per-CPU 32bit field representing the PSCI
version available to the CPU. The encoding of this 32bit field
is same as described in PSCI v0.2 spec.
target-arm: Implement kvm_arch_reset_vcpu() for KVM ARM64
To implement kvm_arch_reset_vcpu(), we simply re-init the VCPU
using kvm_arm_vcpu_init() so that all registers of VCPU are set
to their reset values by in-kernel KVM code.
target-arm: Enable KVM_ARM_VCPU_PSCI_0_2 feature when possible
Latest linux kernel supports in-kernel emulation of PSCI v0.2 but
to enable it we need to select KVM_ARM_VCPU_PSCI_0_2 feature using
KVM_ARM_VCPU_INIT ioctl.
Also, we can use KVM_ARM_VCPU_PSCI_0_2 feature for VCPU only when
linux kernel has KVM_CAP_ARM_PSCI_0_2 capability.
This patch updates kvm_arch_init_vcpu() to enable KVM_ARM_VCPU_PSCI_0_2
feature for VCPU when KVM ARM/ARM64 has KVM_CAP_ARM_PSCI_0_2 capability.
target-arm: Common kvm_arm_vcpu_init() for KVM ARM and KVM ARM64
Introduce a common kvm_arm_vcpu_init() for doing KVM_ARM_VCPU_INIT
ioctl in KVM ARM and KVM ARM64. This also helps us factor-out few
common code lines from kvm_arch_init_vcpu() for KVM ARM/ARM64.
Peter Maydell [Thu, 19 Jun 2014 17:06:25 +0000 (18:06 +0100)]
hw/block/pflash_cfi01: Report correct size info for parallel configs
If the flash device is configured with a device-width which is
not equal to the bank-width, indicating that it is actually several
narrow flash devices in parallel, the CFI table should report the
number of blocks and the size of a single device, not of the whole
combined setup. This stops Linux from complaining:
"NOR chip too large to fit in mapping. Attempting to cope..."
As usual, we retain the old broken but backwards compatible behaviour
when the device-width is not specified.
Peter Maydell [Thu, 19 Jun 2014 17:06:25 +0000 (18:06 +0100)]
hw/arm/vexpress: Forbid specifying flash contents in two ways at once
Detect attempts by the user to specify the contents of the first flash
device via both -bios and -drive if=pflash... simultaneously and
print a helpful error message.
Peter Maydell [Thu, 19 Jun 2014 17:06:25 +0000 (18:06 +0100)]
target-arm/translate-a64.c: Fix dead ?: in handle_simd_shift_fpint_conv()
In handle_simd_shift_fpint_conv(), the combination of is_double == true,
is_scalar == false and is_q == false is an unallocated encoding; the
'both parts false' case of the nested ?: expression for calculating
maxpass is therefore unreachable and can be removed.
Peter Maydell [Thu, 19 Jun 2014 17:06:24 +0000 (18:06 +0100)]
target-arm/translate-a64.c: Remove dead ?: in disas_simd_3same_int()
In disas_simd_3same_int(), none of the instructions permit is_q
to be false with size == 3 (this would be a vector operation with
a one-element vector, and the instruction set encodes those as
scalar operations). Replace the always-true ?: check with an
assert.
Peter Maydell [Thu, 19 Jun 2014 17:06:24 +0000 (18:06 +0100)]
target-arm: Add ULL suffix to calculation of page size
The maximum block size for AArch64 address translation is 2GB. This means
that we need a ULL suffix on our shift to avoid shifting into the sign
bit of a signed 32 bit integer.
Fabian Aggeler [Thu, 19 Jun 2014 17:06:24 +0000 (18:06 +0100)]
target-arm: implement PD0/PD1 bits for TTBCR
Corrected handling of writes to TTBCR for ARMv8 (previously UNK/SBZP
bits are not RES0) and ARMv7 (new bits PD0/PD1 for CPUs with Security
Extensions).
Bits PD0/PD1 are now respected in get_phys_addr_v6/v5() and
get_level1_table_address.
commit 4407ab055be995e64633322a78e64dfa376dc534
vl.c: extend -m option to support options for memory hotplug
prints ram_addr_t with u64 format, this is wrong for
some systems, in particular w32.
print ram_addr_t with RAM_ADDR_FMT to fix build on w32.
because we forgot to pass 'true' as the human parameter on one of the
two calls to format_string.
Also, this is a worsening of quality; previously we would produce
16 (0x10)
to make it obvious which number was hex.
Fix these issues.
Igor Mammedov [Mon, 16 Jun 2014 17:12:26 +0000 (19:12 +0200)]
acpi: introduce TYPE_ACPI_DEVICE_IF interface
... it will be used to abstract generic ACPI bits from
device that implements ACPI interface.
ACPIOSTInfo type is used for passing-through raw _OST
event/status codes reported by guest OS to a management
layer. It lets management tools interpret values
as specified by ACPI spec if it is interested in it.
QEMU doesn't encode these values as enum, since it
doesn't need to handle them and it allows interface
to scale well without any changes in QEMU while guest
OS and management evolves in time.
when memory_region_init_ram_from_file() fails
memory_region_size() will still return size that was
provided at region init time.
Instead use errp to properly detect error condition.
Paolo Bonzini [Wed, 11 Jun 2014 12:52:09 +0000 (14:52 +0200)]
qdev: recursively unrealize devices when unrealizing bus
When the patch was posted that became 5c21ce7 (qdev: Realize buses
on device realization, 2014-03-12), it included recursive realization
and unrealization of devices when the bus's "realized" property
was toggled.
However, due to the same old worries about recursive realization
and prerequisites not being realized yet, those hunks were dropped when
committing the patch. Unfortunately, this causes a use-after-free bug
(easily reproduced by a PCI hot-unplug action).
Before the patch, device_unparent behaved as follows:
for each child bus
unparent bus ----------------------------.
| for each child device |
| unparent device ---------------. |
| | unrealize device | |
| | call dc->unparent | |
| '------------------------------- |
'----------------------------------------'
unrealize device
After the patch, it behaves as follows instead:
unrealize device --------------------.
| for each child bus |
| unrealize bus (A) |
'------------------------------------'
for each child bus
unparent bus ----------------------.
| for each child device |
| unrealize device (B) |
| call dc->unparent |
'----------------------------------'
At the step marked (B) the device might use data from the bus that is
not available anymore due to step (A).
To fix this, we need to unrealize devices before step (A). To sidestep
concerns about recursive realization, only do recursive unrealization
and leave the "value && !bus->realized" case as it is.
The resulting flow is:
for each child bus
unrealize bus ---------------------.
| for each child device |
| unrealize device (B) |
| call bc->unrealize (A) |
'----------------------------------'
unrealize device
for each child bus
unparent bus ----------------------.
| for each child device |
| unparent device |
'----------------------------------'
where everything is "powered down" before it is unassembled.
The following commits:
qapi: make string output visitor parse int list
qapi: make string input visitor parse int list
break with glib < 2.28 since they use the
new g_list_free_full function.
Paolo Bonzini [Tue, 10 Jun 2014 11:15:23 +0000 (19:15 +0800)]
hostmem: allow preallocation of any memory region
And allow preallocation of file-based memory even without -mem-prealloc.
Some care is necessary because -mem-prealloc does not allow disabling
preallocation for hostmem-file.
Paolo Bonzini [Wed, 14 May 2014 09:43:20 +0000 (17:43 +0800)]
memory: add error propagation to file-based RAM allocation
Right now, -mem-path will fall back to RAM-based allocation in some
cases. This should never happen with "-object memory-file", prepare
the code by adding correct error propagation.
Paolo Bonzini [Wed, 14 May 2014 09:43:19 +0000 (17:43 +0800)]
memory: move mem_path handling to memory_region_allocate_system_memory
Like the previous patch did in exec.c, split memory_region_init_ram and
memory_region_init_ram_from_file, and push mem_path one step further up.
Other RAM regions than system memory will now be backed by regular RAM.
Also, boards that do not use memory_region_allocate_system_memory will
not support -mem-path anymore. This can be changed before the patches
are merged by migrating boards to use the function.
Wanlong Gao [Wed, 14 May 2014 09:43:28 +0000 (17:43 +0800)]
configure: add Linux libnuma detection
Add detection of libnuma (mostly contained in the numactl package)
to the configure script. Can be enabled or disabled on the command
line, default is use if available.
Luiz Capitulino [Wed, 14 May 2014 09:43:10 +0000 (17:43 +0800)]
man: improve -numa doc
The -numa option documentation in qemu's manpage lacks the command-line
options and some information regarding how it relates to options -m and
-smp. This commit fills in the missing text.
Wanlong Gao [Wed, 14 May 2014 09:43:06 +0000 (17:43 +0800)]
NUMA: check if the total numa memory size is equal to ram_size
If the total number of the assigned numa nodes memory is not
equal to the assigned ram size, it will write the wrong data
to ACPI table, then the guest will ignore the wrong ACPI table
and recognize all memory to one node. It's buggy, we should
check it to ensure that we write the right data to ACPI table.
This test needs a bit more work: issues have been
found on legacy systems, disable it for now to
avoid false positives for people.
Will re-enable after issues are addressed.
Nikolay Nikolaev [Tue, 10 Jun 2014 10:03:23 +0000 (13:03 +0300)]
Add qtest for vhost-user
This test creates a 'server' chardev to listen for vhost-user messages.
Once VHOST_USER_SET_MEM_TABLE is received it mmaps each received region,
and read 1k bytes from it. The read data is compared to data from readl.
The test requires hugetlbfs to be already mounted and writable. The mount
point defaults to '/hugetlbfs' and can be specified via the environment
variable QTEST_HUGETLBFS_PATH.
The rom pc-bios/pxe-virtio.rom is used to instantiate a virtio pcicontroller.
This test needs a bit more work: issues have been
found on legacy systems, disable it for now to
avoid false positives for people.
Will re-enable after issues are addressed.
Nikolay Nikolaev [Tue, 27 May 2014 12:07:10 +0000 (15:07 +0300)]
libqemustub: add stubs to be able to use qemu-char.c
chardev depends on lots of external symbols that are not necessarily
needed to be able to use, for example, 'socket chardev'. So add stubs
for these functions:
Nikolay Nikolaev [Tue, 10 Jun 2014 10:02:57 +0000 (13:02 +0300)]
Add vhost-user protocol documentation
This document describes the basic message format used by vhost-user
for communication over a unix domain socket. The protocol is based
on the existing ioctl interface used for the kernel version of vhost.
Nikolay Nikolaev [Tue, 10 Jun 2014 10:02:16 +0000 (13:02 +0300)]
Add the vhost-user netdev backend to the command line
The supplied chardev id will be inspected for supported options. Only
a socket backend, with a set path (i.e. a Unix socket) and optionally
the server parameter set, will be allowed. Other options (nowait, telnet)
will make the chardev unusable and the netdev will not be initialised.
Peter Maydell [Thu, 19 Jun 2014 15:18:04 +0000 (16:18 +0100)]
Merge remote-tracking branch 'remotes/bonzini/scsi-next' into staging
* remotes/bonzini/scsi-next:
virtio-scsi: define dummy handle_output for vhost-scsi vqs
block/iscsi: drop obsolete pointers from iscsi_co_writev
block/iscsi: fix init value for iTask->retries
block/iscsi: bump libiscsi requirement to 1.9.0
virtio-scsi: add support for the any_layout feature
virtio-scsi: introduce virtio_scsi_complete_cmd_req
virtio-scsi: prepare sense data handling for any_layout
virtio-scsi: add extra argument and return type to qemu_sgl_concat
virtio-scsi: add target swap for VirtIOSCSICtrlTMFReq fields
virtio-scsi: start preparing for any_layout
util: add return value to qemu_iovec_concat_iov
megasas: use PCI DMA API
scsi: Print command name in debug
scsi-disk: fix bug in scsi_block_new_request() introduced by commit 137745c
scsi-disk.c: Fix compilation with -DDEBUG_SCSI
block/iscsi: use 16 byte CDBs only when necessary
block/iscsi: fix potential segfault on early callback
block/iscsi: handle BUSY condition
Stefan Weil [Wed, 28 May 2014 15:42:24 +0000 (17:42 +0200)]
w32: Fix regression caused by new g_poll implementation
Commit 5a007547df76446ab891df93ebc55749716609bf tried to fix a
performance degradation caused by bad handling of small timeouts
in the original implementation of g_poll.
Since that commit, hard disk I/O no longer works.
Instead of rewriting the g_poll implementation, this patch simply copies
the original code (released under LGPL) from latest glib and only modifies
it where needed (see comments in the code). URL of the original code:
https://git.gnome.org/browse/glib/tree/glib/gpoll.c
Nikolay Nikolaev [Tue, 27 May 2014 12:06:29 +0000 (15:06 +0300)]
Add new vhost-user netdev backend
Add a new QEMU netdev backend that is intended to invoke vhost_net with the
vhost-user backend. It uses an Unix socket chardev to establish a
communication with the 'slave' (client and server mode supported).
At runtime the netdev will handle OPEN/CLOSE events from the chardev. Upon
disconnection it will set link_down accordingly and notify virtio-net; the
virtio-net interface will go down.
Nikolay Nikolaev [Tue, 27 May 2014 12:06:16 +0000 (15:06 +0300)]
vhost-net: vhost-user feature bits support
Handle the feature bits negotiation when using vhost-user. Allow
the underlying implementation to have a finer control over all the
bits except the VIRTIO_NET_F_MAC.
Nikolay Nikolaev [Tue, 27 May 2014 12:06:02 +0000 (15:06 +0300)]
Add vhost-user as a vhost backend.
The initialization takes a chardev backed by a unix domain socket.
It should implement qemu_fe_set_msgfds in order to be able to pass
file descriptors to the remote process.
Each ioctl request of vhost-kernel has a vhost-user message equivalent,
which is sent over the control socket.
The general approach is to copy the data from the supplied argument
pointer to a designated field in the message. If a file descriptor is
to be passed it will be placed in the fds array for inclusion in
the sendmsg control header.
VHOST_SET_MEM_TABLE ignores the supplied vhost_memory structure and scans
the global ram_list for ram blocks with a valid fd field set. This would
be set when the '-object memory-file' option with share=on property is used.
Nikolay Nikolaev [Tue, 27 May 2014 12:05:49 +0000 (15:05 +0300)]
Add vhost-backend and VhostBackendType
Use vhost_set_backend_type to initialise a proper vhost_ops structure.
In vhost_net_init and vhost_net_start_one call conditionally TAP related
initialisation depending on the vhost backend type.
Nikolay Nikolaev [Tue, 27 May 2014 12:05:35 +0000 (15:05 +0300)]
Add vhost_ops to vhost_dev struct and replace all relevant ioctls
Decouple vhost from the Linux kernel by introducing vhost_ops. The
intention is to provide different backends - a 'kernel' backend based on
the ioctl interface, and an 'user' backend based on a UNIX domain socket
and shared memory interface.
Nikolay Nikolaev [Tue, 27 May 2014 12:05:22 +0000 (15:05 +0300)]
vhost_net_init will use VhostNetOptions to get all its arguments
vhost_dev_init will replace devfd and devpath with a single opaque argument.
This is initialised with a file descriptor. When TAP is used (through
vhost_net), open /dev/vhost-net and pass the fd as an opaque parameter in
VhostNetOptions. The same applies to vhost-scsi - open /dev/vhost-scsi and
pass the fd.
Nikolay Nikolaev [Tue, 27 May 2014 12:04:55 +0000 (15:04 +0300)]
vhost_net should call the poll callback only when it is set
The poll callback needs to be called when bringing up or down
the vhost_net instance. As it is not mandatory for an NetClient
to implement it, invoke it only when it is set.
Nikolay Nikolaev [Tue, 27 May 2014 12:04:42 +0000 (15:04 +0300)]
vhost: add vhost_get_features and vhost_ack_features
Generalize the features get/ack to be used for both vhost-net and vhost-scsi.
In vhost-net add vhost_net_get_feature_bits to select the feature bit set
depending on the NetClient kind.
Nikolay Nikolaev [Tue, 27 May 2014 12:04:15 +0000 (15:04 +0300)]
Add chardev API qemu_chr_fe_get_msgfds
This extends the existing qemu_chr_fe_get_msgfd by allowing to read a set
of fds. The function for receiving the fds - unix_process_msgfd is extended
to allocate the needed array size.
Nikolay Nikolaev [Tue, 27 May 2014 12:04:02 +0000 (15:04 +0300)]
Add chardev API qemu_chr_fe_set_msgfds
This will set an array of file descriptors to the internal structures.
The next time a message is send the array will be send as ancillary
data. This feature works on the UNIX domain socket backend only.
Nikolay Nikolaev [Tue, 27 May 2014 12:03:48 +0000 (15:03 +0300)]
Add chardev API qemu_chr_fe_read_all
This function will attempt to read data from the chardev trying
to fill the buffer up to the given length.
Add tcp_chr_disconnect to reuse disconnection code where needed.
Jason Wang [Tue, 20 May 2014 06:01:44 +0000 (14:01 +0800)]
virtio-net: announce self by guest
It's hard to track all mac addresses and their configurations (e.g
vlan or ipv6) in qemu. Without this information, it's impossible to
build proper garp packet after migration. The only possible solution
to this is let guest (who knows all configurations) to do this.
So, this patch introduces a new readonly config status bit of virtio-net,
VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
presence of its link through config update interrupt.When guest has
done the announcement, it should ack the notification through
VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
Linux guest).
During load, a counter of announcing rounds is set so that after the vm is
running it can trigger rounds of config interrupts to notify the guest to build
and send the correct garps.
Jason Wang [Tue, 20 May 2014 06:01:43 +0000 (14:01 +0800)]
migration: introduce self_announce_delay()
This patch introduces self_announce_delay() to calculate the delay for
the next announce round. This could be used by other device e.g
virtio-net who wants to do announcing by itself.
Igor Mammedov [Mon, 2 Jun 2014 13:25:28 +0000 (15:25 +0200)]
pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
Needed for Windows to use hotplugged memory device, otherwise
it complains that server is not configured for memory hotplug.
Tests shows that aftewards it uses dynamically provided
proximity value from _PXM() method if available.