Alex Bennée [Thu, 27 Oct 2016 15:10:07 +0000 (16:10 +0100)]
target-arm/arm-powerctl: wake up sleeping CPUs
Testing with Alexander's bare metal syncronisation tests fails in MTTCG
leaving one CPU spinning forever waiting for the second CPU to wake up.
We simply need to kick the vCPU once we have processed the PSCI power on
call.
As the power control API is for system emulation only as is the
qemu_kick_cpu function we also ensure we only build arm-powerctl for
SoftMMU builds.
KONRAD Frederic [Thu, 27 Oct 2016 15:10:06 +0000 (16:10 +0100)]
tcg: protect translation related stuff with tb_lock.
This protects all translation related work with tb_lock() too ensure
thread safety. This effectively serialises all code generation. In
addition to the code generation we also take the lock for TB
invalidation. This has a knock on effect of meaning tb_lock() is held
for modification of the SoftMMU TLB by non-self threads which will be
used in later patches.
This adds calls to the assert_(memory|tb)_lock for all public APIs which
are documented as needing them held for linux-user mode. The asserts are
NOPs for system-mode although these will be converted when MTTCG is
enabled.
Alex Bennée [Thu, 27 Oct 2016 15:10:04 +0000 (16:10 +0100)]
linux-user/elfload: ensure mmap_lock() held while setting up
Future patches will enforce the holding of mmap_lock() when we are
manipulating internal memory structures. Technically it doesn't matter
in the case of elfload as we haven't started executing yet. However it
is easier to grab the lock when required than special case the
translate-all API.
Paolo Bonzini [Thu, 27 Oct 2016 15:10:03 +0000 (16:10 +0100)]
tcg: comment on which functions have to be called with tb_lock held
softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks. Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.
This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.
Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock. The next one will rectify this.
Alex Bennée [Thu, 27 Oct 2016 15:10:00 +0000 (16:10 +0100)]
translate-all: add DEBUG_LOCKING asserts
This adds asserts to check the locking on the various translation
engines structures. There are two sets of structures that are protected
by locks.
The first the l1map and PageDesc structures used to track which
translation blocks are associated with which physical addresses. In
user-mode this is covered by the mmap_lock.
The second case are TB context related structures which are protected by
tb_lock which is also user-mode only.
Currently the asserts do nothing in SoftMMU mode but this will change
for MTTCG.
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. However, portable parallel
code is written assuming only cmpxchg which means that in
practice this is a viable alternative.
Emilio G. Cota [Mon, 27 Jun 2016 19:02:13 +0000 (15:02 -0400)]
target-arm: emulate aarch64's LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.
The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.
Emilio G. Cota [Mon, 27 Jun 2016 19:02:08 +0000 (15:02 -0400)]
target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.
The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.
target-arm: Rearrange aa32 load and store functions
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
Emilio G. Cota [Mon, 27 Jun 2016 19:02:05 +0000 (15:02 -0400)]
tests: add atomic_add-bench
With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.
The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).
Emilio G. Cota [Mon, 27 Jun 2016 19:02:06 +0000 (15:02 -0400)]
target-i386: remove helper_lock()
It's been superseded by the atomic helpers.
The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.
The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset
Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.
For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.
Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation. Do so by dropping back to single-step.
Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction. Further, it's required by Windows 8 so no new cpus
will ever omit it.
If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.
Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.
Alex Bennée [Wed, 5 Oct 2016 18:13:04 +0000 (11:13 -0700)]
linux-user: enable parallel code generation on clone
The variable parallel_cpus controls the generation of thread aware
atomic code. We only need to set it once we clone our first thread.
At this point any existing translations need to be thrown away.
While the check against sizeof(void *) is appropriate for
normal usage within qemu, there are places in which we want
wider operaions and have checked for their existance.
Peter Maydell [Tue, 25 Oct 2016 16:03:11 +0000 (17:03 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-10-25' into staging
QAPI patches for 2016-10-25
# gpg: Signature made Tue 25 Oct 2016 16:56:27 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2016-10-25:
qdict: implement a qdict_crumple method for un-flattening a dict
qapi: don't pass two copies of TestInputVisitorData to tests
qapi: rename QmpOutputVisitor to QObjectOutputVisitor
qapi: rename QmpInputVisitor to QObjectInputVisitor
qapi: rename *qmp-*-visitor* to *qobject-*-visitor*
qapi: add trace events for visitor
trivial: Restore blank line in qapi-schema
qdict: implement a qdict_crumple method for un-flattening a dict
The qdict_flatten() method will take a dict whose elements are
further nested dicts/lists and flatten them by concatenating
keys.
The qdict_crumple() method aims to do the reverse, taking a flat
qdict, and turning it into a set of nested dicts/lists. It will
apply nesting based on the key name, with a '.' indicating a
new level in the hierarchy. If the keys in the nested structure
are all numeric, it will create a list, otherwise it will create
a dict.
If the keys are a mixture of numeric and non-numeric, or the
numeric keys are not in strictly ascending order, an error will
be reported.
The intent of this function is that it allows a set of QemuOpts
to be turned into a nested data structure that mirrors the nesting
used when the same object is defined over QMP.
qapi: don't pass two copies of TestInputVisitorData to tests
The input_visitor_test_add() method was accepting an instance
of 'TestInputVisitorData' and passing it as the 'user_data'
parameter to test functions. The main 'TestInputVisitorData'
instance that was actually used, was meanwhile being allocated
automatically by the test framework fixture setup.
The 'user_data' parameter is going to be needed for tests
added in later patches, so getting rid of the current mistaken
usage now allows this.
qapi: rename QmpOutputVisitor to QObjectOutputVisitor
The QmpOutputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one wants a QObject. Rename it
to better reflect its functionality as a generic QAPI
to QObject converter.
The commit before previous renamed the files, this one renames C
identifiers.
qapi: rename QmpInputVisitor to QObjectInputVisitor
The QmpInputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one has a QObject. Rename it
to better reflect its functionality as a generic QObject
to QAPI converter.
The previous commit renamed the files, this one renames C identifiers.
qapi: rename *qmp-*-visitor* to *qobject-*-visitor*
The QMP visitors have no direct dependency on QMP. It is
valid to use them anywhere that one has a QObject. Rename them
to better reflect their functionality as a generic QObject
to QAPI converter.
This is the first of three parts: rename the files. The next two
parts will rename C identifiers. The split is necessary to make git
rename detection work.
Eric Blake [Mon, 17 Oct 2016 21:29:54 +0000 (16:29 -0500)]
trivial: Restore blank line in qapi-schema
Commit de63ab6 accidentally undid part of commit a43edcf,
because the two patches were written in parallel, and the
blank line was not noticed as a casualty of merge conflicts.
Peter Maydell [Tue, 25 Oct 2016 09:25:27 +0000 (10:25 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
x86 and CPU queue, 2016-10-24
x2APIC support to APIC code, cpu_exec_init() refactor on all
architectures, and other x86 changes.
# gpg: Signature made Mon 24 Oct 2016 20:51:14 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-pull-request:
exec: call cpu_exec_exit() from a CPU unrealize common function
exec: move cpu_exec_init() calls to realize functions
exec: split cpu_exec_init()
pc: q35: Bump max_cpus to 288
pc: Require IRQ remapping and EIM if there could be x2APIC CPUs
pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Increase MAX_CPUMASK_BITS from 255 to 288
pc: Clarify FW_CFG_MAX_CPUS usage comment
pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode
pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
pc: apic_common: Restore APIC ID to initial ID on reset
pc: apic_common: Extend APIC ID property to 32bit
pc: Leave max apic_id_limit only in legacy cpu hotplug code
acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
pc: acpi: x2APIC support for SRAT table
pc: acpi: x2APIC support for MADT table and _MAT method
Laurent Vivier [Thu, 20 Oct 2016 11:26:04 +0000 (13:26 +0200)]
exec: call cpu_exec_exit() from a CPU unrealize common function
As cpu_exec_exit() mirrors the cpu_exec_realizefn(),
rename it as cpu_exec_unrealizefn().
Create and register a cpu_common_unrealizefn() function for
the CPU device class and call cpu_exec_unrealizefn() from
this function.
Remove cpu_exec_exit() from cpu_common_finalize()
(which mirrors init, not realize), and as x86_cpu_unrealizefn()
and ppc_cpu_unrealizefn() overwrite the device class unrealize function,
add a call to a parent_unrealize pointer.
Laurent Vivier [Thu, 20 Oct 2016 11:26:03 +0000 (13:26 +0200)]
exec: move cpu_exec_init() calls to realize functions
Modify all CPUs to call it from XXX_cpu_realizefn() function.
Remove all the cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27)
for arm:
Setting of cpu->mp_affinity is moved from arm_cpu_initfn()
to arm_cpu_realizefn() as setting of cpu_index is now done
in cpu_exec_realizefn(). To avoid to overwrite an user defined
value, we set it to an invalid value by default, and update
it in realize function only if the value is still invalid.
Laurent Vivier [Thu, 20 Oct 2016 11:26:02 +0000 (13:26 +0200)]
exec: split cpu_exec_init()
Put in cpu_exec_initfn() what initializes the CPU,
and leave in cpu_exec_init() what adds it to the environment.
As cpu_exec_initfn() is called by all XX_cpu_initfn(), call it
directly in cpu_common_initfn().
cpu_exec_init() is now a realize function, it will be renamed
to cpu_exec_realizefn() and moved to the XX_cpu_realizefn()
function in a following patch.
Igor Mammedov [Thu, 20 Oct 2016 14:58:42 +0000 (16:58 +0200)]
pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Currently firmware uses 1 byte at 0x5F offset in RTC CMOS
to get number of CPUs present at boot. However 1 byte is
not enough to handle more than 255 CPUs. So add a new
fw_cfg file that would allow QEMU to tell it.
For compat reasons add file only for machine types that
support more than 255 CPUs.
Igor Mammedov [Wed, 19 Oct 2016 12:05:37 +0000 (14:05 +0200)]
pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
SDM: x2APIC State Transitions:
State Changes From xAPIC Mode to x2APIC Mode
"
Any APIC ID value written to the memory-mapped
local APIC ID register is not preserved
"
Igor Mammedov [Wed, 19 Oct 2016 12:05:33 +0000 (14:05 +0200)]
acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
Switch to modern cpu hotplug at machine startup time if
a cpu present at boot has apic-id in range unsupported
by legacy cpu hotplug interface (i.e. > 254), to avoid
killing QEMU from legacy cpu hotplug code with error:
"acpi: invalid cpu id: #apic-id#"
Peter Maydell [Mon, 24 Oct 2016 18:37:33 +0000 (19:37 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20161024' into staging
target-arm queue:
* support variable (runtime-determined) page sizes, for a
nearly-20% speedup of TCG for ARMv7 and v8 CPUs with 4K pages
* ptimer: add tests, support more flexible behaviour around
what happens on the "zero" tick, use ptimer for a9gtimer
* virt: ACPI: Add IORT Structure definition
* i2c: Fix SMBus read transactions to avoid double events
* timer: stm32f2xx_timer: add check for prescaler value
* QOMify musicpal, pxa2xx_gpio, strongarm, pl110
* target-arm: Implement new HLT trap for semihosting
* i2c: Add asserts for second smbus i2c_start_transfer()
Peter Maydell [Mon, 24 Oct 2016 17:26:59 +0000 (18:26 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Mon 24 Oct 2016 17:02:47 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (23 commits)
block/replication: Clarify 'top-id' parameter usage
block: More operations for meta dirty bitmap
tests: Add test code for hbitmap serialization
block: BdrvDirtyBitmap serialization interface
hbitmap: serialization
block: Assert that bdrv_release_dirty_bitmap succeeded
block: Add two dirty bitmap getters
block: Support meta dirty bitmap
tests: Add test code for meta bitmap
HBitmap: Introduce "meta" bitmap to track bit changes
block: Hide HBitmap in block dirty bitmap interface
quorum: do not allocate multiple iovecs for FIFO strategy
quorum: change child_iter to children_read
iotests: Do not rely on unavailable domains in 162
iotests: Remove raciness from 162
qemu-nbd: Add --fork option
qemu-iotests: Test I/O in a single drive from a throttling group
throttle: Correct access to wrong BlockBackendPublic structures
qapi: fix memory leak in bdrv_image_info_specific_dump
block: improve error handling in raw_open
...
Kevin Wolf [Mon, 24 Oct 2016 16:02:26 +0000 (18:02 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into queue-block
Block patches for master
# gpg: Signature made Mon Oct 24 17:56:44 2016 CEST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2016-10-24:
block/replication: Clarify 'top-id' parameter usage
block: More operations for meta dirty bitmap
tests: Add test code for hbitmap serialization
block: BdrvDirtyBitmap serialization interface
hbitmap: serialization
block: Assert that bdrv_release_dirty_bitmap succeeded
block: Add two dirty bitmap getters
block: Support meta dirty bitmap
tests: Add test code for meta bitmap
HBitmap: Introduce "meta" bitmap to track bit changes
block: Hide HBitmap in block dirty bitmap interface
quorum: do not allocate multiple iovecs for FIFO strategy
quorum: change child_iter to children_read
Fam Zheng [Thu, 13 Oct 2016 21:58:30 +0000 (17:58 -0400)]
block: More operations for meta dirty bitmap
Callers can create an iterator of meta bitmap with
bdrv_dirty_meta_iter_new(), then use the bdrv_dirty_iter_* operations on
it. Meta iterators are also counted by bitmap->active_iterators.
Also add a couple of functions to retrieve granularity and count.
Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
saved to linear sequence of bits independently of endianness and bitmap
array element (unsigned long) size. Therefore Little Endian is chosen.
These functions are appropriate for dirty bitmap migration, restoring
the bitmap in several steps is available. To save performance, every
step writes only the last level of the bitmap. All other levels are
restored by hbitmap_deserialize_finish() as a last step of restoring.
So, HBitmap is inconsistent while restoring.
Fam Zheng [Thu, 13 Oct 2016 21:58:26 +0000 (17:58 -0400)]
block: Assert that bdrv_release_dirty_bitmap succeeded
We use a loop over bs->dirty_bitmaps to make sure the caller is
only releasing a bitmap owned by bs. Let's also assert that in this case
the caller is releasing a bitmap that does exist.
Fam Zheng [Thu, 13 Oct 2016 21:58:21 +0000 (17:58 -0400)]
block: Hide HBitmap in block dirty bitmap interface
HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.
A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block/dirty-bitmap.c.
Paolo Bonzini [Wed, 5 Oct 2016 16:35:27 +0000 (18:35 +0200)]
quorum: do not allocate multiple iovecs for FIFO strategy
In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.
The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them in separate functions.
Max Reitz [Wed, 28 Sep 2016 20:46:44 +0000 (22:46 +0200)]
iotests: Do not rely on unavailable domains in 162
There are some (mostly ISP-specific) name servers who will redirect
non-existing domains to special hosts. In this case, we will get a
different error message when trying to connect to such a host, which
breaks test 162.
162 needed this specific error message so it can confirm that qemu was
indeed trying to connect to the user-specified port. However, we can
also confirm this by setting up a local NBD server on exactly that port;
so we can fix the issue by doing just that.
Max Reitz [Wed, 28 Sep 2016 20:46:42 +0000 (22:46 +0200)]
qemu-nbd: Add --fork option
Using the --fork option, one can make qemu-nbd fork the worker process.
The original process will exit on error of the worker or once the worker
enters the main loop.
Alberto Garcia [Mon, 17 Oct 2016 15:46:03 +0000 (18:46 +0300)]
qemu-iotests: Test I/O in a single drive from a throttling group
iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.
This is useful to test that the round robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.
Alberto Garcia [Mon, 17 Oct 2016 15:46:02 +0000 (18:46 +0300)]
throttle: Correct access to wrong BlockBackendPublic structures
In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.
This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.
Halil Pasic [Tue, 11 Oct 2016 14:12:35 +0000 (16:12 +0200)]
block: improve error handling in raw_open
Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was set
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.
Kevin Wolf [Fri, 7 Oct 2016 15:05:04 +0000 (17:05 +0200)]
block: Remove "options" indirection from blockdev-add
Now that QAPI supports boxed types, we can have unions at the top level
of a command, so let's put our real options directly there for
blockdev-add instead of having a single "options" dict that contains the
real arguments.
blockdev-add is still experimental and we already made substantial
changes to the API recently, so we're free to make changes like this
one, too.
Xu Tian [Sun, 9 Oct 2016 09:17:27 +0000 (17:17 +0800)]
block: failed qemu-img command should return non-zero exit code
If the backing file cannot be opened when doing qemu-img rebase, the
variable 'ret' was not assigned a non-zero value, and the qemu-img
process terminated with exit code zero. Fix this.
Corey Minyard [Mon, 24 Oct 2016 15:42:33 +0000 (10:42 -0500)]
i2c: Add asserts for second smbus i2c_start_transfer()
Some SMBus operations restart the transfer to convert from
write to read mode without an intervening i2c_end_transfer().
The second call cannot fail, so the return code is unchecked,
but this causes Coverity to complain. So add some asserts
and documentation about this.
Peter Maydell [Mon, 24 Oct 2016 15:26:56 +0000 (16:26 +0100)]
target-arm: Implement new HLT trap for semihosting
Version 2.0 of the semihosting specification introduces new trap
instructions for AArch32: HLT 0xF000 for A32 and HLT 0x3C for T32.
Implement these (in the same way we implement the existing HLT
semihosting trap for A64).