Peter Maydell [Thu, 22 Jun 2017 10:34:38 +0000 (11:34 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-06-09-v2' into staging
QAPI patches for 2017-06-09
# gpg: Signature made Tue 20 Jun 2017 13:31:39 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-06-09-v2: (41 commits)
tests/qdict: check more get_try_int() cases
console: use get_uint() for "head" property
i386/cpu: use get_uint() for "min-level"/"min-xlevel" properties
numa: use get_uint() for "size" property
pnv-core: use get_uint() for "core-pir" property
pvpanic: use get_uint() for "ioport" property
auxbus: use get_uint() for "addr" property
arm: use get_uint() for "mp-affinity" property
xen: use get_uint() for "max-ram-below-4g" property
pc: use get_uint() for "hpet-intcap" property
pc: use get_uint() for "apic-id" property
pc: use get_uint() for "iobase" property
acpi: use get_uint() for "pci-hole*" properties
acpi: use get_uint() for various acpi properties
acpi: use get_uint() for "acpi-pcihp-io*" properties
platform-bus: use get_uint() for "addr" property
bcm2835_fb: use {get, set}_uint() for "vcram-size" and "vcram-base"
aspeed: use {set, get}_uint() for "ram-size" property
pcihp: use get_uint() for "bsel" property
pc-dimm: make "size" property uint64
...
* remotes/rth/tags/pull-tcg-20170619:
target/arm: Exit after clearing aarch64 interrupt mask
target/s390x: Exit after changing PSW mask
target/alpha: Use tcg_gen_lookup_and_goto_ptr
tcg: Increase hit rate of lookup_tb_ptr
tcg/arm: Use ldr (literal) for goto_tb
tcg/arm: Try pc-relative addresses for movi
tcg/arm: Remove limit on code buffer size
tcg/arm: Use indirect branch for goto_tb
tcg/aarch64: Use ADR in tcg_out_movi
translate-all: consolidate tb init in tb_gen_code
tcg: allocate TB structs before the corresponding translated code
util: add cacheinfo
$ make subdir-arm-softmmu
make[1]: *** No rule to make target 'tci.o', needed by 'qemu-system-arm'. Stop.
Makefile:328: recipe for target 'subdir-arm-softmmu' failed
make: *** [subdir-arm-softmmu] Error 2
Peter Maydell [Tue, 20 Jun 2017 15:01:15 +0000 (16:01 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/docker-and-block-pull-request' into staging
# gpg: Signature made Fri 16 Jun 2017 01:18:46 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-and-block-pull-request: (23 commits)
block: make accounting thread-safe
block: split BlockAcctStats creation and setup
block: introduce block_account_one_io
block: protect modification of dirty bitmaps with a mutex
migration/block: reset dirty bitmap before reading
block: introduce dirty_bitmap_mutex
block: protect tracked_requests and flush_queue with reqs_lock
block: access write_gen with atomics
block: use Stat64 for wr_highest_offset
util: add stats64 module
throttle-groups: protect throttled requests with a CoMutex
throttle-groups: do not use qemu_co_enter_next
throttle-groups: only start one coroutine from drained_begin
block: access io_plugged with atomic ops
block: access wakeup with atomic ops
block: access serialising_in_flight with atomic ops
block: access io_limits_disabled with atomic ops
block: access quiesce_counter with atomic ops
block: access copy_on_read with atomic ops
docker: Add flex and bison to centos6 image
...
* remotes/bonzini/tags/for-upstream: (41 commits)
vhost-user-scsi: Introduce a vhost-user-scsi sample application
vhost-user-scsi: Introduce vhost-user-scsi host device
qemu-doc: include version number
docs: create interop/ subdirectory
include/exec/poison: Mark some CONFIG defines as poisoned, too
include/exec/poison: Add missing TARGET defines
nbd/server: refactor nbd_trip
nbd/server: rename rc to ret
nbd/server: get rid of fail: return rc
nbd/server: nbd_negotiate: fix error path
nbd/server: remove NBDClientNewData
nbd/server: refactor nbd_co_receive_request
nbd/server: get rid of EAGAIN dead code
nbd/server: refactor nbd_co_send_reply
nbd/server: get rid of ssize_t
nbd/server: get rid of nbd_negotiate_read and friends
nbd: make nbd_drop public
nbd: rename read_sync and friends
accel: move kvm related accelerator files into accel/
tcg: move tcg backend files into accel/tcg/
...
xen: use get_uint() for "max-ram-below-4g" property
TYPE_PC_MACHINE's property PC_MACHINE_MAX_RAM_BELOW_4G's getter and
setter pc_machine_get_max_ram_below_4g() and
pc_machine_set_max_ram_below_4g() use visit_type_size()
PIIX4: piix4_pm_add_propeties() defines these with
object_property_add_uint*_ptr().
Q35: ich9_lpc_add_properties() and ich9_pm_add_properties() define them
similarly, except for ACPI_PM_PROP_GPE0_BLK(). That one's getter
ich9_pm_get_gpe0_blk() uses visit_type_uint32().
The getter and setter of TYPE_APIC_COMMON property "id" are
apic_common_get_id() and apic_common_set_id().
apic_common_get_id() reads either APICCommonState member uint32_t
initial_apic_id or uint8_t id into an int64_t local variable. It then
passes this variable to visit_type_int().
apic_common_set_id() uses visit_type_int() to read the value into a
local variable, which it then assigns both to initial_apic_id and id.
While the state backing the property is two unsigned members, 8 and 32
bits wide, the actual visitor is 64 bits signed.
Change getter and setter to use visit_type_uint32(). Then everything's
uint32_t, except for @id.
Switch to use QNum/uint where appropriate to remove i64 limitation.
The input visitor will cast i64 input to u64 for compatibility
reasons (existing json QMP client already use negative i64 for large
u64, and expect an implicit cast in qemu).
Note: before the patch, uint64_t values above INT64_MAX are sent over
json QMP as negative values, e.g. UINT64_MAX is sent as -1. After the
patch, they are sent unmodified. Clearly a bug fix, but we have to
consider compatibility issues anyway. libvirt should cope fine,
because its parsing of unsigned integers accepts negative values
modulo 2^64. There's hope that other clients will, too.
Before the previous commit, parameter promote_int = true made
visit_start_alternate() with an input visitor avoid QTYPE_QINT
variants and create QTYPE_QFLOAT variants instead. This was used
where QTYPE_QINT variants were invalid.
The previous commit fused QTYPE_QINT with QTYPE_QFLOAT, rendering
promote_int useless and unused.
We would like to use a same QObject type to represent numbers, whether
they are int, uint, or floats. Getters will allow some compatibility
between the various types if the number fits other representations.
* remotes/vivier/tags/m68k-for-2.10-pull-request:
target-m68k: define ext_opsize
target-m68k: move FPU helpers to fpu_helper.c
softfloat: define 680x0 specific values
target/m68k: fix V flag for CC_OP_SUBx
We can call tb_htable_lookup even when the tb_jmp_cache is completely
empty. Therefore, un-nest most of the code dependent on tb != NULL
from the read from the cache.
This improves the hit rate of lookup_tb_ptr; for instance, when booting
and immediately shutting down debian-arm, the hit rate improves from
93.2% to 99.4%.
Emilio G. Cota [Tue, 6 Jun 2017 23:12:25 +0000 (19:12 -0400)]
tcg: allocate TB structs before the corresponding translated code
Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.
An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).
Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:
* remotes/kraxel/tags/pull-ui-20170614-1:
spice: don't enter opengl mode in case another UI provides opengl support
sdl: prefer sdl2 over sdl1
gtk: prefer gtk3 over gtk2
spice: Use proper enum type for kbd led state
Improve Cocoa modifier key handling
Fam Zheng [Fri, 16 Jun 2017 16:06:58 +0000 (00:06 +0800)]
migration: Fix race of image locking between src and dst
Previously, dst side will immediately try to lock the write byte upon
receiving QEMU_VM_EOF, but at src side, bdrv_inactivate_all() is only
done after sending it. If the src host is under load, dst may fail to
acquire the lock due to racing with the src unlocking it.
Fix this by hoisting the bdrv_inactivate_all() operation before
QEMU_VM_EOF.
N.B. A further improvement could possibly be done to cleanly handover
locks between src and dst, so that there is no window where a third QEMU
could steal the locks and prevent src and dst from running.
N.B. This commit includes a minor improvement to the error handling
by using qemu_file_set_error().
tests: Remove test cases for alternates of 'number' and 'int'
Alternates with both a 'number' and an 'int' branch will become
invalid when the next patch merges of QFloat and QInt into QNum.
More sophisticated alternate code could keep them valid, but since
we have no users outside tests, simply drop the tests.
Stefan Hajnoczi [Fri, 9 Jun 2017 15:16:15 +0000 (16:16 +0100)]
hw/i386: fix nvdimm check error path
Commit e987c37aee1752177906847630d32477da57e705 ("hw/i386: check if
nvdimm is enabled before plugging") introduced a check to reject nvdimm
hotplug if -machine pc,nvdimm=on was not given.
This check executes after pc_dimm_memory_plug() has already completed
and does not reverse the effect of this function in the case of failure.
Perform the check before calling pc_dimm_memory_plug(). This fixes the
following abort:
$ qemu -M accel=kvm -m 1G,slots=4,maxmem=8G \
-object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G
(qemu) device_add nvdimm,memdev=mem1
nvdimm is not enabled: missing 'nvdimm' in '-M'
(qemu) device_add nvdimm,memdev=mem1
Core dumped
The backtrace is:
#0 0x00007fffdb5b191f in raise () at /lib64/libc.so.6
#1 0x00007fffdb5b351a in abort () at /lib64/libc.so.6
#2 0x00007fffdb5a9da7 in __assert_fail_base () at /lib64/libc.so.6
#3 0x00007fffdb5a9e52 in () at /lib64/libc.so.6
#4 0x000055555577a5fa in qemu_ram_set_idstr (new_block=0x555556747a00, name=<optimized out>, dev=dev@entry=0x555556705590) at qemu/exec.c:1709
#5 0x0000555555a0fe86 in vmstate_register_ram (mr=mr@entry=0x55555673a0e0, dev=dev@entry=0x555556705590) at migration/savevm.c:2293
#6 0x0000555555965088 in pc_dimm_memory_plug (dev=dev@entry=0x555556705590, hpms=hpms@entry=0x5555566bb0e0, mr=mr@entry=0x555556705630, align=<optimized out>, errp=errp@entry=0x7fffffffc660)
at hw/mem/pc-dimm.c:110
#7 0x000055555581d89b in pc_dimm_plug (errp=0x7fffffffc6c0, dev=0x555556705590, hotplug_dev=<optimized out>) at qemu/hw/i386/pc.c:1713
#8 0x000055555581d89b in pc_machine_device_plug_cb (hotplug_dev=<optimized out>, dev=0x555556705590, errp=0x7fffffffc6c0) at qemu/hw/i386/pc.c:2004
#9 0x0000555555914da6 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffffffc7e8) at hw/core/qdev.c:926
Peter Xu [Fri, 9 Jun 2017 13:53:28 +0000 (21:53 +0800)]
intel_iommu: cleanup vtd_{do_}iommu_translate()
First, let vtd_do_iommu_translate() return a status, so that we
explicitly knows whether error occured. Meanwhile, we make sure that
IOMMUTLBEntry is filled in in that.
Then, cleanup vtd_iommu_translate a bit. So even with PT we'll get a log
now. Also, remove useless assignments.
Laszlo Ersek [Fri, 16 Jun 2017 11:22:01 +0000 (13:22 +0200)]
tests/q35-test: add TSEG size checks
These checks verify that the guest RAM turns from read-write to
"blackhole" when crossing the low boundary of the TSEG. Both the standard
1MB/2MB/8MB TSEG sizes and an extended (16MB) TSEG size are tested.
Laszlo Ersek [Fri, 16 Jun 2017 11:22:00 +0000 (13:22 +0200)]
tests/q35-test: push down qtest_start / qtest_end to test case(s)
A test program can start up QEMU several times, with different command
lines. For such cases, qtest_start() and qtest_end() are called from
within the individual test functions. Examples: "virtio-console-test.c",
"numa-test.c", and many others.
Laszlo Ersek [Thu, 8 Jun 2017 16:10:13 +0000 (18:10 +0200)]
q35/mch: implement extended TSEG sizes
The q35 machine type currently lets the guest firmware select a 1MB, 2MB
or 8MB TSEG (basically, SMRAM) size. In edk2/OVMF, we use 8MB, but even
that is not enough when a lot of VCPUs (more than approx. 224) are
configured -- SMRAM footprint scales largely proportionally with VCPU
count.
Introduce a new property for "mch" called "extended-tseg-mbytes", which
expresses (in megabytes) the user's choice of TSEG (SMRAM) size.
Invent a new, QEMU-specific register in the config space of the DRAM
Controller, at offset 0x50, in order to allow guest firmware to query the
TSEG (SMRAM) size.
According to Intel Document Number 316966-002, Table 5-1 "DRAM Controller
Register Address Map (D0:F0)":
Warning: Address locations that are not listed are considered Intel
Reserved registers locations. Reads to Reserved registers may
return non-zero values. Writes to reserved locations may
cause system failures.
All registers that are defined in the PCI 2.3 specification,
but are not necessary or implemented in this component are
simply not included in this document. The
reserved/unimplemented space in the PCI configuration header
space is not documented as such in this summary.
Offsets 0x50 and 0x51 are not listed in Table 5-1. They are also not part
of the standard PCI config space header. And they precede the capability
list as well, which starts at 0xe0 for this device.
When the guest writes value 0xffff to this register, the value that can be
read back is that of "mch.extended-tseg-mbytes" -- unless it remains
0xffff. The guest is required to write 0xffff first (as opposed to a
read-only register) because PCI config space is generally not cleared on
QEMU reset, and after S3 resume or reboot, new guest firmware running on
old QEMU could read a guest OS-injected value from this register.
After reading the available "extended" TSEG size, the guest firmware may
actually request that TSEG size by writing pattern 11b to the ESMRAMC
register's TSEG_SZ bit-field. (The Intel spec referenced above defines
only patterns 00b (1MB), 01b (2MB) and 10b (8MB); 11b is reserved.)
On the QEMU command line, the value can be set with
-global mch.extended-tseg-mbytes=N
The default value for 2.10+ q35 machine types is 16. The value is limited
to 0xfff (4095) at the moment, purely so that the product (4095 MB) can be
stored to the uint32_t variable "tseg_size" in mch_update_smram(). Users
are responsible for choosing sensible TSEG sizes.
On 2.9 and earlier q35 machine types, the default value is 0. This lets
the 11b bit pattern in ESMRAMC.TSEG_SZ, and the register at offset 0x50,
keep their original behavior.
When "extended-tseg-mbytes" is nonzero, the new register at offset 0x50 is
set to that value on reset, for completeness.
PCI config space is migrated automatically, so no VMSD changes are
necessary.
Paolo Bonzini [Mon, 5 Jun 2017 12:39:07 +0000 (14:39 +0200)]
block: split BlockAcctStats creation and setup
block_acct_destroy is called unconditionally in blk_delete, but there is
no BlockAcctStats function that is called unconditionally in blk_new.
Split block_acct_init in two, so that it will be possible to create a
QemuMutex in block_acct_init and destroy it in block_acct_cleanup.
Paolo Bonzini [Mon, 5 Jun 2017 12:38:58 +0000 (14:38 +0200)]
throttle-groups: protect throttled requests with a CoMutex
Another possibility is to use tg->lock, which we're holding anyway in
both schedule_next_request and throttle_group_co_io_limits_intercept.
This would require open-coding the CoQueue however, so I've chosen this
alternative.
Paolo Bonzini [Mon, 5 Jun 2017 12:38:57 +0000 (14:38 +0200)]
throttle-groups: do not use qemu_co_enter_next
Prepare for removing this function; always restart throttled requests
from coroutine context. This will matter when restarting throttled
requests will have to acquire a CoMutex.
Paolo Bonzini [Mon, 5 Jun 2017 12:38:56 +0000 (14:38 +0200)]
throttle-groups: only start one coroutine from drained_begin
Starting all waiting coroutines from bdrv_drain_all is unnecessary;
throttle_group_co_io_limits_intercept calls schedule_next_request as
soon as the coroutine restarts, which in turn will restart the next
request if possible.
If we only start the first request and let the coroutines dance from
there the code is simpler and there is more reuse between
throttle_group_config, throttle_group_restart_blk and timer_cb. The
next patch will benefit from this.
We also stop accessing from throttle_group_restart_blk the
blkp->throttled_reqs CoQueues even when there was no
attached throttling group. This worked but is not pretty.
The only thing that can interrupt the dance is the QEMU_CLOCK_VIRTUAL
timer when switching from one block device to the next, because the
timer is set to "now + 1" but QEMU_CLOCK_VIRTUAL might not be running.
Set that timer to point in the present ("now") rather than the future
and things work.
* remotes/rth/tags/pull-s390-20170613:
s390x/cpumodel: wire up cpu type + id for TCG
target/s390x: rework PGM interrupt psw.addr handling
target/s390x: correctly indicate PER nullification
vhost-user-scsi: Introduce a vhost-user-scsi sample application
This commit introduces a vhost-user-scsi backend sample application. It
must be linked with libiscsi and libvhost-user.
To use it, compile with:
$ make vhost-user-scsi
And run as follows:
$ ./vhost-user-scsi -u vus.sock -i iscsi://uri_to_target/
$ qemu-system-x86_64 --enable-kvm -m 512 \
-object memory-backend-file,id=mem,size=512m,share=on,mem-path=guestmem \
-numa node,memdev=mem \
-chardev socket,id=vhost-user-scsi,path=vus.sock \
-device vhost-user-scsi-pci,chardev=vhost-user-scsi \
The application is currently limited at one LUN only and it processes
requests synchronously (therefore only achieving QD1). The purpose of
the code is to show how a backend can be implemented and to test the
vhost-user-scsi Qemu implementation.
If a different instance of this vhost-user-scsi application is executed
at a remote host, a VM can be live migrated to such a host.
This commit introduces a vhost-user device for SCSI. This is based
on the existing vhost-scsi implementation, but done over vhost-user
instead. It also uses a chardev to connect to the backend. Unlike
vhost-scsi (today), VMs using vhost-user-scsi can be live migrated.
To use it, start Qemu with a command line equivalent to:
Paolo Bonzini [Tue, 6 Jun 2017 14:55:19 +0000 (16:55 +0200)]
docs: create interop/ subdirectory
This is for the future interoperability & management guide. It includes
the QAPI docs, including the automatically generated ones, other socket
protocols (vhost-user, VNC), and the qcow2 file format.