Thomas Huth [Wed, 22 Jan 2020 13:40:20 +0000 (14:40 +0100)]
docs/devel: Fix qtest paths and info about check-block in testing.rst
The qtests have recently been moved to a separate subdirectory, so
the paths that are mentioned in the documentation have to be adjusted
accordingly. And some of the iotests are now always run as part of
"make check", so this information has to be adjusted here, too.
Peter Maydell [Mon, 3 Feb 2020 11:14:24 +0000 (11:14 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-docs-20200203' into staging
docs:
* Fix Makefile concurrency bug where we could run Sphinx twice
in parallel on the same manual (which makes it crash)
* Support handling hxtool doc fragments for rST manuals
* Convert qemu-img docs to rST
* Convert qemu-trace-stap docs to rST
* Convert virtfs-proxy-helper docs to rST
Peter Maydell [Fri, 24 Jan 2020 16:26:06 +0000 (16:26 +0000)]
virtfs-proxy-helper: Convert documentation to rST
The virtfs-proxy-helper documentation is currently in
fsdev/qemu-trace-stap.texi in Texinfo format, which we
present to the user as:
* a virtfs-proxy-helper manpage
* but not (unusually for QEMU) part of the HTML docs
Convert the documentation to rST format that lives in
the docs/ subdirectory, and present it to the user as:
* a virtfs-proxy-helper manpage
* part of the interop/ Sphinx manual
There are minor formatting changes to suit Sphinx, but no
content changes. In particular I've split the -u and -g
options into each having their own description text.
Peter Maydell [Fri, 24 Jan 2020 16:26:05 +0000 (16:26 +0000)]
scripts/qemu-trace-stap: Convert documentation to rST
The qemu-trace-stap documentation is currently in
scripts/qemu-trace-stap.texi in Texinfo format, which we
present to the user as:
* a qemu-trace-stap manpage
* but not (unusually for QEMU) part of the HTML docs
Convert the documentation to rST format that lives in
the docs/ subdirectory, and present it to the user as:
* a qemu-trace-stap manpage
* part of the interop/ Sphinx manual
There are minor formatting changes to suit Sphinx, but no
content changes.
Now the qemu-img documentation has been converted to rST, we can
remove the texinfo document fragments from qemu-img-cmds.hx, as
they are no longer used.
Peter Maydell [Fri, 24 Jan 2020 16:26:03 +0000 (16:26 +0000)]
qemu-img: Convert invocation documentation to rST
The qemu-img documentation is currently in qemu-nbd.texi in Texinfo
format, which we present to the user as:
* a qemu-img manpage
* a section of the main qemu-doc HTML documentation
Convert the documentation to rST format, and present it to the user as:
* a qemu-img manpage
* part of the interop/ Sphinx manual
The qemu-img rST document uses the new hxtool extension
to handle pulling rST fragments out of qemu-img-cmds.hx.
The documentation of the various options and commands is rather
muddled, with some options being described inside the relevant
command description and some in a more general section near the start
of the manual. All the command synopses are replicated in the .hx
file and then again in the manual. A lot of text is also duplicated
in the qemu-img.c code for the help text. I have not attempted to
deal with any of this, but have simply transposed the existing
structure into rST.
As usual, there are some minor formatting changes but no
textual changes, except that as with one or two other conversions
I have dropped the 'see also' section since it's not very
informative and looks odd in the HTML.
Peter Maydell [Fri, 24 Jan 2020 16:26:02 +0000 (16:26 +0000)]
qemu-img-cmds.hx: Add rST documentation fragments
Add the rST versions of the documentation fragments.
Once we've converted qemu-img.texi to rST we can delete
the texi fragments; for the moment we leave them in place.
(Commit created with the aid of emacs query-replace-regexp
from "@var{\([^}]*\)}" to "\,(upcase \1))".)
Peter Maydell [Fri, 24 Jan 2020 16:26:01 +0000 (16:26 +0000)]
docs/sphinx: Add new hxtool Sphinx extension
Some of our documentation includes sections which are created
by assembling fragments of texinfo from a .hx source file into
a .texi file, which is then included from qemu-doc.texi or
qemu-img.texi.
For Sphinx, rather than creating a file to include, the most natural
way to handle this is to have a small custom Sphinx extension which
reads the .hx file and process it. So instead of:
* makefile produces foo.texi from foo.hx
* qemu-doc.texi says '@include foo.texi'
we have:
* qemu-doc.rst says 'hxtool-doc:: foo.hx'
* the Sphinx extension for hxtool has code that runs to handle that
Sphinx directive which reads the .hx file and emits the appropriate
documentation contents
This is pretty much the same way the kerneldoc extension works right
now. It also has the advantage that it should work for third-party
services like readthedocs that expect to build the docs directly with
sphinx rather than by invoking our makefiles.
In this commit we implement the hxtool extension.
Note that syntax errors in the rST fragments will be correctly
reported to the user with the filename and line number within the
hx file.
Peter Maydell [Fri, 24 Jan 2020 16:26:00 +0000 (16:26 +0000)]
hxtool: Support SRST/ERST directives
We want to add support for including rST document fragments
in our .hx files, in the same way we currently have texinfo
fragments. These will be delimited by SRST and ERST directives,
in the same way the texinfo is delimited by STEXI/ETEXI.
The rST fragments will not be extracted by the hxtool
script, but by a different mechanism, so all we need to
do in hxtool is have it ignore all the text inside a
SRST/ERST section, with suitable error-checking for
mismatched rST-vs-texi fragment delimiters.
The resulting effective state machine has only three states:
* flag = 0, rstflag = 0 : reading section for C output
* flag = 1, rstflag = 0 : reading texi fragment
* flag = 0, rstflag = 1 : reading rST fragment
and flag = 1, rstflag = 1 is not possible. Using two
variables makes the parallel between the rST handling and
the texi handling clearer; in any case all this code will
be deleted once we've converted entirely to rST.
Peter Maydell [Fri, 24 Jan 2020 16:25:59 +0000 (16:25 +0000)]
Makefile: Ensure we don't run Sphinx in parallel for manpages
Sphinx will corrupt its doctree cache if we run two copies
of it in parallel. In commit 6bda415c10d966c8d3 we worked
around this by having separate doctrees for 'html' vs 'manpage'
runs. However now that we have more than one manpage produced
from a single manual we can run into this again when trying
to produce the two manpages.
Use the trick described in 'Atomic Rules in GNU Make'
https://www.cmcrossroads.com/article/atomic-rules-gnu-make
to ensure that we only run the Sphinx manpage builder once
for each manual, even if we're producing several manpages.
This fixes doctree corruption in parallel builds and also
avoids pointlessly running Sphinx more often than we need to.
(In GNU Make 4.3 there is builtin support for this, via
the "&:" syntax, but we can't wait for that to be available
in all the distros we support...)
The generic "one invocation for multiple output files"
machinery is provided as a macro named 'atomic' in rules.mak;
we then wrap this in a more specific macro for defining
the rule and dependencies for the manpages in a Sphinx
manual, to avoid excessive repetition.
Peter Maydell [Mon, 3 Feb 2020 09:52:42 +0000 (09:52 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200203' into staging
ppc patch queue 2020-02093
This pull request supersedes ppc-for-5.0-20200131. The only changes
are one extra patch to suppress some irritating warnings during tests
under TCG, and an extra Tested-by in one of the other patches.
Here's the next batch of patches for ppc and associated machine types.
Highlights includes:
* Remove the deprecated "prep" machine type and its OpenHackware
firmware
* Add TCG emulation of the msgsndp etc. supervisor privileged
doorbell instructions
* Allow "pnv" machine type to run Hostboot style firmwares
* Add a virtual TPM device for spapr machines
* Implement devices for POWER8 PHB3 and POWER9 PHB4 host bridges for
the pnv machine type
* Use faster Spectre mitigation by default for POWER9 DD2.3 machines
* Introduce Firmware Assisted NMI dump facility for spapr machines
* Fix a performance regression with load/store multiple instructions
in TCG
as well as some other assorted cleanups and fixes.
* remotes/dgibson/tags/ppc-for-5.0-20200203: (35 commits)
tests: Silence various warnings with pseries
target/ppc: Use probe_write for DCBZ
target/ppc: Remove redundant mask in DCBZ
target/ppc: Use probe_access for LMW, STMW
target/ppc: Use probe_access for LSW, STSW
ppc: spapr: Activate the FWNMI functionality
migration: Include migration support for machine check handling
ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
target/ppc: Build rtas error log upon an MCE
target/ppc: Handle NMI guest exit
ppc: spapr: Introduce FWNMI capability
Wrapper function to wait on condition for the main loop mutex
target/ppc/cpu.h: Put macro parameter in parentheses
spapr: Enable DD2.3 accelerated count cache flush in pseries-5.0 machine
ppc/pnv: change the PowerNV machine devices to be non user creatable
ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge
ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
docs/specs/tpm: reST-ify TPM documentation
hw/ppc/Kconfig: Enable TPM_SPAPR as part of PSERIES config
tpm_spapr: Support suspend and resume
...
Greg Kurz [Sat, 1 Feb 2020 22:46:16 +0000 (23:46 +0100)]
tests: Silence various warnings with pseries
Some default features of the pseries machine are only available with
KVM. Warnings are printed when the pseries machine is used with another
accelerator:
qemu-system-ppc64: warning: TCG doesn't support requested feature,
cap-ccf-assist=on
qemu-system-ppc64: warning: Firmware Assisted Non-Maskable
Interrupts(FWNMI) not supported in TCG
qemu-system-ppc64: warning: TCG doesn't support requested feature,
cap-ccf-assist=on
qemu-system-ppc64: warning: Firmware Assisted Non-Maskable
Interrupts(FWNMI) not supported in TCG
qemu-system-ppc64: warning: TCG doesn't support requested feature,
cap-ccf-assist=on
qemu-system-ppc64: warning: Firmware Assisted Non-Maskable
Interrupts(FWNMI) not supported in TCG
This is annoying for CI since it usually runs without KVM. We already
disable features that emit similar warnings thanks to properties of
the pseries machine, but this is open-coded in various
places. Consolidate the set of properties in a single place. Extend it
to silence the above warnings. And use it in the various tests that
start pseries machines.
Use a minimum number of mmu lookups for the contiguous bytes
that are accessed. If the lookup succeeds, we can finish the
operation with host addresses only.
Use a minimum number of mmu lookups for the contiguous bytes
that are accessed. If the lookup succeeds, we can finish the
operation with host addresses only.
Aravinda Prasad [Thu, 30 Jan 2020 18:44:22 +0000 (00:14 +0530)]
migration: Include migration support for machine check handling
This patch includes migration support for machine check
handling. Especially this patch blocks VM migration
requests until the machine check error handling is
complete as these errors are specific to the source
hardware and is irrelevant on the target hardware.
Aravinda Prasad [Thu, 30 Jan 2020 18:44:21 +0000 (00:14 +0530)]
ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.
The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.
This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed.
Aravinda Prasad [Thu, 30 Jan 2020 18:44:20 +0000 (00:14 +0530)]
target/ppc: Build rtas error log upon an MCE
Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.
This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.
Aravinda Prasad [Thu, 30 Jan 2020 18:44:19 +0000 (00:14 +0530)]
target/ppc: Handle NMI guest exit
Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.
[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
(e20bbd3d and related commits)
Aravinda Prasad [Thu, 30 Jan 2020 18:44:18 +0000 (00:14 +0530)]
ppc: spapr: Introduce FWNMI capability
Introduce fwnmi an spapr capability and add a helper function
which tries to enable it, which would be used by following patch
of the series. This patch by itself does not change the existing
behavior.
Aravinda Prasad [Thu, 30 Jan 2020 18:44:17 +0000 (00:14 +0530)]
Wrapper function to wait on condition for the main loop mutex
Introduce a wrapper function to wait on condition for
the main loop mutex. This function atomically releases
the main loop mutex and causes the calling thread to
block on the condition. This wrapper is required because
qemu_global_mutex is a static variable.
David Gibson [Wed, 29 Jan 2020 23:28:56 +0000 (10:28 +1100)]
spapr: Enable DD2.3 accelerated count cache flush in pseries-5.0 machine
For POWER9 DD2.2 cpus, the best current Spectre v2 indirect branch
mitigation is "count cache disabled", which is configured with:
-machine cap-ibs=fixed-ccd
However, this option isn't available on DD2.3 CPUs with KVM, because they
don't have the count cache disabled.
For POWER9 DD2.3 cpus, it is "count cache flush with assist", configured
with:
-machine cap-ibs=workaround,cap-ccf-assist=on
However this option isn't available on DD2.2 CPUs with KVM, because they
don't have the special CCF assist instruction this relies on.
On current machine types, we default to "count cache flush w/o assist",
that is:
-machine cap-ibs=workaround,cap-ccf-assist=off
This runs, with mitigation on both DD2.2 and DD2.3 host cpus, but has a
fairly significant performance impact.
It turns out we can do better. The special instruction that CCF assist
uses to trigger a count cache flush is a no-op on earlier CPUs, rather than
trapping or causing other badness. It doesn't, of itself, implement the
mitigation, but *if* we have count-cache-disabled, then the count cache
flush is unnecessary, and so using the count cache flush mitigation is
harmless.
Therefore for the new pseries-5.0 machine type, enable cap-ccf-assist by
default. Along with that, suppress throwing an error if cap-ccf-assist
is selected but KVM doesn't support it, as long as KVM *is* giving us
count-cache-disabled. To allow TCG to work out of the box, even though it
doesn't implement the ccf flush assist, downgrade the error in that case to
a warning. This matches several Spectre mitigations where we allow TCG
to operate for debugging, since we don't really make guarantees about TCG
security properties anyway.
While we're there, make the TCG warning for this case match that for other
mitigations.
Cédric Le Goater [Wed, 29 Jan 2020 11:37:20 +0000 (12:37 +0100)]
ppc/pnv: change the PowerNV machine devices to be non user creatable
The PowerNV machine emulates an OpenPOWER system and the PowerNV chip
devices are models of the internal logic of the POWER processor. They
can not be instantiated by the user on the QEMU command line.
The PHB3/PHB4 devices could be an exception in the future after some
rework on how the device tree is built. For the moment, exclude them
also.
Cédric Le Goater [Mon, 27 Jan 2020 14:45:06 +0000 (15:45 +0100)]
ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge
This is a model of the PCIe Host Bridge (PHB3) found on a POWER8
processor. It includes the PowerBus logic interface (PBCQ), IOMMU
support, a single PCIe Gen.3 Root Complex, and support for MSI and LSI
interrupt sources as found on a POWER8 system using the XICS interrupt
controller.
The POWER8 processor comes in different flavors: Venice, Murano,
Naple, each having a different number of PHBs. To make things simpler,
the models provides 3 PHB3 per chip. Some platforms, like the
Firestone, can also couple PHBs on the first chip to provide more
bandwidth but this is too specific to model in QEMU.
XICS requires some adjustment to support the PHB3 MSI. The changes are
provided here but they could be decoupled in prereq patches.
ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
These changes introduces models for the PCIe Host Bridge (PHB4) of the
POWER9 processor. It includes the PowerBus logic interface (PBCQ),
IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI
and LSI interrupt sources as found on a POWER9 system using the XIVE
interrupt controller.
POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and
each PEC can have several PHBs. By default,
Each PEC has a set "global" registers and some "per-stack" (per-PHB)
registers. Those are organized in two XSCOM ranges, the "Nest" range
and the "PCI" range, each range contains both some "PEC" registers and
some "per-stack" registers.
No default device layout is provided and PCI devices can be added on
any of the available PCIe Root Port (pcie.0 .. 2 of a Power9 chip)
with address 0x0 as the firwware (skiboot) only accepts a single
device per root port. To run a simple system with a network and a
storage adapters, use a command line options such as :
Stefan Berger [Tue, 21 Jan 2020 15:29:32 +0000 (10:29 -0500)]
tpm_spapr: Support TPM for ppc64 using CRQ based interface
Implement support for TPM on ppc64 by implementing the vTPM CRQ interface
as a frontend. It can use the tpm_emulator driver backend with the external
swtpm.
The Linux vTPM driver for ppc64 works with this emulation.
Stefan Berger [Tue, 21 Jan 2020 15:29:31 +0000 (10:29 -0500)]
spapr: Implement get_dt_compatible() callback
For devices that cannot be statically initialized, implement a
get_dt_compatible() callback that allows us to ask the device for
the 'compatible' value.
Cédric Le Goater [Mon, 27 Jan 2020 14:41:54 +0000 (15:41 +0100)]
ppc/pnv: Add support for "hostboot" mode
When the "hb-mode" option is activated on the powernv machine, the
firmware is mapped at 0x8000000 and the HRMOR of the HW threads are
set to the same address.
The PNOR mapping on the FW address space of the LPC bus is left enabled
to let the firmware load any other images required to boot the host.
Commit 158e17a65e1a ("ppc/pnv: Link "chip" property to PnvCore::chip
pointer") introduced some cleanups of the PnvCore realize handler.
Let's continue by reworking a bit the interface of the PnvCore
handlers for the CPU threads. These changes make the "core-pir"
property alias unused. Remove it.
Greg Kurz [Wed, 22 Jan 2020 13:11:12 +0000 (14:11 +0100)]
spapr: Don't allow multiple active vCPUs at CAS
According to the description of "ibm,client-architecture-support" that
can found in LoPAPR "B.6.2.3 Root Node Methods":
If multiple partition processors or threads are active at the time of
the ibm,client-architecture-support method call, or an error is detected
in the format of the ibm,architecture.vec structure, the err? boolean
shall be TRUE; else FALSE.
We certainly don't want to temper with the platform or with the PCR of
the other vCPUs if they happen to be active. Ensure we have only one
active vCPU and fail CAS otherwise. This is just for conformance and
robustness, it doesn't fix any known bugs.
Cédric Le Goater [Mon, 20 Jan 2020 10:49:35 +0000 (11:49 +0100)]
target/ppc: add support for Hypervisor Facility Unavailable Exception
The privileged message send and clear instructions (msgsndp & msgclrp)
are privileged, but will generate a hypervisor facility unavailable
exception if not enabled in the HFSCR and executed in privileged
non-hypervisor state.
Add checks when accessing the DPDES register and when using the
msgsndp and msgclrp isntructions.
The Processor Control facility for POWER8 processors and later
provides a mechanism for the hypervisor to send messages to other
threads in the system (msgsnd instruction) and cause hypervisor-level
exceptions. Privileged non-hypervisor programs can also send messages
(msgsndp instruction) but are restricted to the threads of the same
subprocessor and cause privileged-level exceptions.
The Directed Privileged Doorbell Exception State (DPDES) register
reflects the state of pending privileged doorbell exceptions and can
be used to modify that state. The register can be used to read and
modify the state of privileged doorbell exceptions for all threads of
a subprocessor and thus is a shared facility for that subprocessor.
The register can be read/written by the hypervisor and read by the
supervisor if enabled in the HFSCR, otherwise a hypervisor facility
unavailable exception is generated.
The privileged message send and clear instructions (msgsndp & msgclrp)
are used to generate and clear the presence of a directed privileged
doorbell exception, respectively. The msgsndp instruction can be used
to target any thread of the current subprocessor, msgclrp acts on the
thread issuing the instruction. These instructions are privileged, but
will generate a hypervisor facility unavailable exception if not
enabled in the HFSCR and executed in privileged non-hypervisor
state. The HV facility unavailable exception will be addressed in
other patch.
Add and implement this register and instructions by reading or
modifying the pending interrupt state of the cpu.
Note that TCG only supports one thread per core and so we only need to
worry about the cpu making the access.
Greg Kurz [Fri, 17 Jan 2020 09:15:52 +0000 (10:15 +0100)]
spapr: Fail CAS if option vector table cannot be parsed
Most of the option vector helpers have assertions to check their
arguments aren't null. The guest can provide an arbitrary address
for the CAS structure that would result in such null arguments.
Fail CAS with H_PARAMETER and print a warning instead of aborting
QEMU.
Peter Maydell [Fri, 31 Jan 2020 17:36:59 +0000 (17:36 +0000)]
Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
Pull request
# gpg: Signature made Thu 30 Jan 2020 21:38:06 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>" [full]
# gpg: aka "Stefan Hajnoczi <[email protected]>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/tracing-pull-request:
qemu_set_log_filename: filename argument may be NULL
hw/display/qxl.c: Use trace_event_get_state_backends()
memory.c: Use trace_event_get_state_backends()
docs/devel/tracing.txt: Recommend only trace_event_get_state_backends()
Makefile: Keep trace-events-subdirs ordered
Alex Bennée [Fri, 31 Jan 2020 15:34:39 +0000 (15:34 +0000)]
target/arm: fix TCG leak for fcvt half->double
When support for the AHP flag was added we inexplicably only freed the
new temps in one of the two legs. Move those tcg_temp_free to the same
level as the allocation to fix that leak.
Peter Maydell [Fri, 31 Jan 2020 10:37:11 +0000 (10:37 +0000)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
# gpg: Signature made Thu 30 Jan 2020 21:31:02 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>" [full]
# gpg: aka "Stefan Hajnoczi <[email protected]>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
tests/qemu-iotests: use AIOMODE with various tests
tests/qemu-iotests: enable testing with aio options
qemu-nbd: adds option for aio engines
qemu-img: adds option to use aio engine for benchmarking
qemu-io: adds option to use aio engine
block/io_uring: adds userspace completion polling
block: add trace events for io_uring
block/file-posix.c: extend to use io_uring
blockdev: adds bdrv_parse_aio to use io_uring
util/async: add aio interfaces for io_uring
stubs: add stubs for io_uring interface
block/io_uring: implements interfaces for io_uring
block/block: add BDRV flag for io_uring
qapi/block-core: add option for io_uring
configure: permit use of io_uring
block/io: take bs->reqs_lock in bdrv_mark_request_serialising
block/io: wait for serialising requests when a request becomes serialising
block: eliminate BDRV_REQ_NO_SERIALISING
Salvador Fandino [Thu, 23 Jan 2020 19:36:26 +0000 (20:36 +0100)]
qemu_set_log_filename: filename argument may be NULL
NULL is a valid log filename used to indicate we want to use stderr
but qemu_set_log_filename (which is called by bsd-user/main.c) was not
handling it correctly.
That also made redundant a couple of NULL checks in calling code which
have been removed.
Peter Maydell [Mon, 20 Jan 2020 15:11:42 +0000 (15:11 +0000)]
hw/display/qxl.c: Use trace_event_get_state_backends()
The preferred way to test whether a trace event is enabled is to
use trace_event_get_state_backends(), because this will give the
correct answer (allowing expensive computations to be skipped)
whether the trace event is compile-time or run-time disabled.
Convert the old-style direct use of TRACE_FOO_ENABLED.
Peter Maydell [Mon, 20 Jan 2020 15:11:41 +0000 (15:11 +0000)]
memory.c: Use trace_event_get_state_backends()
The preferred way to test whether a trace event is enabled is to
use trace_event_get_state_backends(), because this will give the
correct answer (allowing expensive computations to be skipped)
whether the trace event is compile-time or run-time disabled.
Convert the four old-style direct uses of TRACE_FOO_ENABLED in
memory.c.
Peter Maydell [Mon, 20 Jan 2020 15:11:40 +0000 (15:11 +0000)]
docs/devel/tracing.txt: Recommend only trace_event_get_state_backends()
Instead of recommending checking the TRACE_FOO_ENABLED macro to
skip expensive computations needed only for tracing, recommend
only using trace_event_get_state_backends(). This works for both
compile-time and run-time disabling of events, and has no extra
performance impact if the event is compile-time disabled.
Adding the same directory multiple times to trace-events-subdirs
might trigger build failures, in particular when using the LTTng
Userspace Tracer library as backend.
For example when using two times the hw/core/ directory:
$ ./configure --enable-trace-backends=ust && make
[...]
CC trace-ust-all.o
In file included from trace-ust-all.h:13,
from trace-ust-all.c:13:
trace-ust-all.h:35151:1: error: redefinition of ‘__tracepoint_cb_qemu___loader_write_rom’
35151 | TRACEPOINT_EVENT(
| ^~~~~~~~~~~~~~~~
trace-ust-all.h:31791:1: note: previous definition of ‘__tracepoint_cb_qemu___loader_write_rom’ was here
31791 | TRACEPOINT_EVENT(
| ^~~~~~~~~~~~~~~~
To ease review and reduce likelihood of merge failures (see [*]),
keep trace-events-subdirs ordered when possible, following eb7ccb3c0.
[*] https://www.mail-archive.com/[email protected]/msg671007.html
Duplicate trace-events-subdirs entries generates duplicated
symbols when using the LTTng Userspace Tracer library.
Aarushi Mehta [Mon, 20 Jan 2020 14:18:48 +0000 (14:18 +0000)]
stubs: add stubs for io_uring interface
Follow linux-aio.o and stub out the block/io_uring.o APIs that will be
missing when a binary is linked with obj-util-y but without
block-util-y (e.g. vhost-user-gpu).
For example, the stubs are necessary so that a binary using util/async.o
from obj-util-y for qemu_bh_new() links successfully. In this case
block/io_uring.o from block-util-y isn't needed and we can avoid
dragging in the block layer by linking the stubs instead. The stub
functions never get called.
Paolo Bonzini [Wed, 8 Jan 2020 14:55:56 +0000 (15:55 +0100)]
block/io: take bs->reqs_lock in bdrv_mark_request_serialising
bdrv_mark_request_serialising is writing the overlap_offset and
overlap_bytes fields of BdrvTrackedRequest. Take bs->reqs_lock
for the whole duration of it, and not just when waiting for
serialising requests, so that tracked_request_overlaps does not
look at a half-updated request.
The new code does not unlock/relock around retries. This is unnecessary
because a retry is always preceded by a CoQueue wait, which already
releases and reacquires bs->reqs_lock.
Paolo Bonzini [Wed, 8 Jan 2020 14:55:55 +0000 (15:55 +0100)]
block/io: wait for serialising requests when a request becomes serialising
Marking without waiting would not result in actual serialising behavior.
Thus, make a call bdrv_mark_request_serialising sufficient for
serialisation to happen.
Peter Maydell [Thu, 30 Jan 2020 16:19:04 +0000 (16:19 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200130' into staging
target-arm queue:
* hw/core/or-irq: Fix incorrect assert forbidding num-lines == MAX_OR_LINES
* target/arm/arm-semi: Don't let the guest close stdin/stdout/stderr
* aspeed: some minor bugfixes
* aspeed: add eMMC controller model for AST2600 SoC
* hw/arm/raspi: Remove obsolete use of -smp to set the soc 'enabled-cpus'
* New 3-phase reset API for device models
* hw/intc/arm_gicv3_kvm: Stop wrongly programming GICR_PENDBASER.PTZ bit
* Arm KVM: stop/restart the guest counter when the VM is stopped and started
* remotes/pmaydell/tags/pull-target-arm-20200130: (26 commits)
target/arm/cpu: Add the kvm-no-adjvtime CPU property
target/arm/kvm: Implement virtual time adjustment
tests/arm-cpu-features: Check feature default values
target/arm/kvm64: kvm64 cpus have timer registers
hw/arm/virt: Add missing 5.0 options call to 4.2 options
target/arm/kvm: trivial: Clean up header documentation
hw/intc/arm_gicv3_kvm: Stop wrongly programming GICR_PENDBASER.PTZ bit
hw/s390x/ipl: replace deprecated qdev_reset_all registration
vl: replace deprecated qbus_reset_all registration
docs/devel/reset.rst: add doc about Resettable interface
hw/core: deprecate old reset functions and introduce new ones
hw/core/qdev: update hotplug reset regarding resettable
hw/core/qdev: handle parent bus change regarding resettable
hw/core/resettable: add support for changing parent
hw/core: add Resettable support to BusClass and DeviceClass
hw/core: create Resettable QOM interface
hw/core/qdev: add trace events to help with resettable transition
add device_legacy_reset function to prepare for reset api change
hw/arm/raspi: Remove obsolete use of -smp to set the soc 'enabled-cpus'
misc/pca9552: Add qom set and get
...
Andrew Jones [Thu, 30 Jan 2020 16:02:06 +0000 (16:02 +0000)]
target/arm/cpu: Add the kvm-no-adjvtime CPU property
kvm-no-adjvtime is a KVM specific CPU property and a first of its
kind. To accommodate it we also add kvm_arm_add_vcpu_properties()
and a KVM specific CPU properties description to the CPU features
document.
Andrew Jones [Thu, 30 Jan 2020 16:02:06 +0000 (16:02 +0000)]
target/arm/kvm: Implement virtual time adjustment
When a VM is stopped (such as when it's paused) guest virtual time
should stop counting. Otherwise, when the VM is resumed it will
experience time jumps and its kernel may report soft lockups. Not
counting virtual time while the VM is stopped has the side effect
of making the guest's time appear to lag when compared with real
time, and even with time derived from the physical counter. For
this reason, this change, which is enabled by default, comes with
a KVM CPU feature allowing it to be disabled, restoring legacy
behavior.
This patch only provides the implementation of the virtual time
adjustment. A subsequent patch will provide the CPU property
allowing the change to be enabled and disabled.
Andrew Jones [Thu, 30 Jan 2020 16:02:06 +0000 (16:02 +0000)]
target/arm/kvm64: kvm64 cpus have timer registers
Add the missing GENERIC_TIMER feature to kvm64 cpus.
We don't currently use these registers when KVM is enabled, but it's
probably best we add the feature flag for consistency and potential
future use. There's also precedent, as we add the PMU feature flag to
KVM enabled guests, even though we don't use those registers either.
This change was originally posted as a hunk of a different, never
merged patch from Bijan Mottahedeh.
Zenghui Yu [Thu, 30 Jan 2020 16:02:05 +0000 (16:02 +0000)]
hw/intc/arm_gicv3_kvm: Stop wrongly programming GICR_PENDBASER.PTZ bit
If LPIs are disabled, KVM will just ignore the GICR_PENDBASER.PTZ bit when
restoring GICR_CTLR. Setting PTZ here makes littlt sense in "reduce GIC
initialization time".
And what's worse, PTZ is generally programmed by guest to indicate to the
Redistributor whether the LPI Pending table is zero when enabling LPIs.
If migration is triggered when the PTZ has just been cleared by guest (and
before enabling LPIs), we will see PTZ==1 on the destination side, which
is not as expected. Let's just drop this hackish userspace behavior.
Also take this chance to refine the comment a bit.
Replace deprecated qdev_reset_all by resettable_cold_reset_fn for
the ipl registration in the main reset handlers.
This does not impact the behavior for the following reasons:
+ at this point resettable just call the old reset methods of devices
and buses in the same order than qdev/qbus.
+ resettable handlers registered with qemu_register_reset are
serialized; there is no interleaving.
+ eventual explicit calls to legacy reset API (device_reset or
qdev/qbus_reset) inside this reset handler will not be masked out
by resettable mechanism; they do not go through resettable api.
Replace deprecated qbus_reset_all by resettable_cold_reset_fn for
the sysbus reset registration.
Apart for the raspi machines, this does not impact the behavior
because:
+ at this point resettable just calls the old reset methods of devices
and buses in the same order as qdev/qbus.
+ resettable handlers registered with qemu_register_reset are
serialized; there is no interleaving.
+ eventual explicit calls to legacy reset API (device_reset or
qdev/qbus_reset) inside this reset handler will not be masked out
by resettable mechanism; they do not go through resettable api.
For the raspi machines, during the sysbus reset the sd-card is not
reset twice anymore but only once. This is a consequence of switching
both sysbus reset and changing parent to resettable; it detects the
second reset is not needed. This has no impact on the state after
reset; the sd-card reset method only reset local state and query
information from the block backend.
The raspi reset change can be observed by using the following command
(reset will occurs, then do Ctrl-C to end qemu; no firmware is
given here).
qemu-system-aarch64 -M raspi3 \
-trace resettable_phase_hold_exec \
-trace qdev_update_parent_bus \
-trace resettable_change_parent \
-trace qdev_reset -trace qbus_reset
Before the patch, the qdev/qbus_reset traces show when reset method are
called. After the patch, the resettable_phase_hold_exec show when reset
method are called.
The traced reset order of the raspi3 is listed below. I've added empty
lines and the tree structure.
+->bcm2835-peripherals reset
|
| +->sd-card reset
| +->sd-bus reset
+->bcm2835_gpio reset
| -> dev_update_parent_bus (move the sd-card on the sdhci-bus)
| -> resettable_change_parent
|
+->bcm2835-dma reset
|
| +->bcm2835-sdhost-bus reset
+->bcm2835-sdhost reset
|
| +->sd-card (reset ONLY BEFORE BEFORE THE PATCH)
| +->sdhci-bus reset
+->generic-sdhci reset
|
+->bcm2835-rng reset
+->bcm2835-property reset
+->bcm2835-fb reset
+->bcm2835-mbox reset
+->bcm2835-aux reset
+->pl011 reset
+->bcm2835-ic reset
+->bcm2836-control reset
System reset
In both case, the sd-card is reset (being on bcm2835_gpio/sd-bus) then moved
to generic-sdhci/sdhci-bus by the bcm2835_gpio reset method.
Before the patch, it is then reset again being part of generic-sdhci/sdhci-bus.
After the patch, it considered again for reset but its reset method is not
called because it is already flagged as reset.
Damien Hedde [Thu, 30 Jan 2020 16:02:04 +0000 (16:02 +0000)]
hw/core: deprecate old reset functions and introduce new ones
Deprecate device_legacy_reset(), qdev_reset_all() and
qbus_reset_all() to be replaced by new functions
device_cold_reset() and bus_cold_reset() which uses resettable API.
Also introduce resettable_cold_reset_fn() which may be used as a
replacement for qdev_reset_all_fn and qbus_reset_all_fn().
Following patches will be needed to look at legacy reset call sites
and switch to resettable api. The legacy functions will be removed
when unused.
This commit make use of the resettable API to reset the device being
hotplugged when it is realized. Also it ensures it is put in a reset
state coherent with the parent it is plugged into.
Note that there is a difference in the reset. Instead of resetting
only the hotplugged device, we reset also its subtree (switch to
resettable API). This is not expected to be a problem because
sub-buses are just realized too. If a hotplugged device has any
sub-buses it is logical to reset them too at this point.
The recently added should_be_hidden and PCI's partially_hotplugged
mechanisms do not interfere with realize operation:
+ In the should_be_hidden use case, device creation is
delayed.
+ The partially_hotplugged mechanism prevents a device to be
unplugged and unrealized from qdev POV and unrealized.
Damien Hedde [Thu, 30 Jan 2020 16:02:04 +0000 (16:02 +0000)]
hw/core/qdev: handle parent bus change regarding resettable
In qdev_set_parent_bus(), when changing the parent bus of a
realized device, if the source and destination buses are not in the
same reset state, some adaptations are required. This patch adds
needed call to resettable_change_parent() to make sure a device reset
state stays coherent with its parent bus.
The addition is a no-op if:
1. the device being parented is not realized.
2. the device is realized, but both buses are not under reset.
Case 2 means that as long as qdev_set_parent_bus() is called
during the machine realization procedure (which is before the
machine reset so nothing is in reset), it is a no op.
There are 52 call sites of qdev_set_parent_bus(). All but one fall
into the no-op case:
+ 29 trivial calls related to virtio (in hw/{s390x,display,virtio}/
{vhost,virtio}-xxx.c) to set a vdev(or vgpu) composing device
parent bus just before realizing the same vdev(vgpu).
+ hw/core/qdev.c: when creating a device in qdev_try_create()
+ hw/core/sysbus.c: when initializing a device in the sysbus
+ hw/i386/amd_iommu.c: before realizing AMDVIState/pci
+ hw/isa/piix4.c: before realizing PIIX4State/rtc
+ hw/misc/auxbus.c: when creating an AUXBus
+ hw/misc/auxbus.c: when creating an AUXBus child
+ hw/misc/macio/macio.c: when initializing a MACIOState child
+ hw/misc/macio/macio.c: before realizing NewWorldMacIOState/pmu
+ hw/misc/macio/macio.c: before realizing NewWorldMacIOState/cuda
+ hw/net/virtio-net.c: Used for migration when using the failover
mechanism to migration a vfio-pci/net. It is
a no-op because at this point the device is
already on the bus.
+ hw/pci-host/designware.c: before realizing DesignwarePCIEHost/root
+ hw/pci-host/gpex.c: before realizing GPEXHost/root
+ hw/pci-host/prep.c: when initialiazing PREPPCIState/pci_dev
+ hw/pci-host/q35.c: before realizing Q35PCIHost/mch
+ hw/pci-host/versatile.c: when initializing PCIVPBState/pci_dev
+ hw/pci-host/xilinx-pcie.c: before realizing XilinxPCIEHost/root
+ hw/s390x/event-facility.c: when creating SCLPEventFacility/
TYPE_SCLP_QUIESCE
+ hw/s390x/event-facility.c: ditto with SCLPEventFacility/
TYPE_SCLP_CPU_HOTPLUG
+ hw/s390x/sclp.c: Not trivial because it is called on a SLCPDevice
just after realizing it. Ok because at this point the destination
bus (sysbus) is not in reset; the realize step is before the
machine reset.
+ hw/sd/core.c: Not OK. Used in sdbus_reparent_card(). See below.
+ hw/ssi/ssi.c: Used to put spi slave on spi bus and connect the cs
line in ssi_auto_connect_slave(). Ok because this function is only
used in realize step in hw/ssi/aspeed_smc.ci, hw/ssi/imx_spi.c,
hw/ssi/mss-spi.c, hw/ssi/xilinx_spi.c and hw/ssi/xilinx_spips.c.
+ hw/xen/xen-legacy-backend.c: when creating a XenLegacyDevice device
+ qdev-monitor.c: in device hotplug creation procedure before realize
Note that this commit alone will have no effect, right now there is no
use of resettable API to reset anything. So a bus will never be tagged
as in-reset by this same API.
The one place where side-effect will occurs is in hw/sd/core.c in
sdbus_reparent_card(). This function is only used in the raspi machines,
including during the sysbus reset procedure. This case will be
carrefully handled when doing the multiple phase reset transition.
Damien Hedde [Thu, 30 Jan 2020 16:02:04 +0000 (16:02 +0000)]
hw/core/resettable: add support for changing parent
Add a function resettable_change_parent() to do the required
plumbing when changing the parent a of Resettable object.
We need to make sure that the reset state of the object remains
coherent with the reset state of the new parent.
We make the 2 following hypothesis:
+ when an object is put in a parent under reset, the object goes in
reset.
+ when an object is removed from a parent under reset, the object
leaves reset.
The added function avoids any glitch if both old and new parent are
already in reset.
Damien Hedde [Thu, 30 Jan 2020 16:02:04 +0000 (16:02 +0000)]
hw/core: add Resettable support to BusClass and DeviceClass
This commit adds support of Resettable interface to buses and devices:
+ ResettableState structure is added in the Bus/Device state
+ Resettable methods are implemented.
+ device/bus_is_in_reset function defined
This commit allows to transition the objects to the new
multi-phase interface without changing the reset behavior at all.
Object single reset method can be split into the 3 different phases
but the 3 phases are still executed in a row for a given object.
From the qdev/qbus reset api point of view, nothing is changed.
qdev_reset_all() and qbus_reset_all() are not modified as well as
device_legacy_reset().
Transition of an object must be done from parent class to child class.
Care has been taken to allow the transition of a parent class
without requiring the child classes to be transitioned at the same
time. Note that SysBus and SysBusDevice class do not need any transition
because they do not override the legacy reset method.
Damien Hedde [Thu, 30 Jan 2020 16:02:03 +0000 (16:02 +0000)]
hw/core: create Resettable QOM interface
This commit defines an interface allowing multi-phase reset. This aims
to solve a problem of the actual single-phase reset (built in
DeviceClass and BusClass): reset behavior is dependent on the order
in which reset handlers are called. In particular doing external
side-effect (like setting an qemu_irq) is problematic because receiving
object may not be reset yet.
The Resettable interface divides the reset in 3 well defined phases.
To reset an object tree, all 1st phases are executed then all 2nd then
all 3rd. See the comments in include/hw/resettable.h for a more complete
description. The interface defines 3 phases to let the future
possibility of holding an object into reset for some time.
The qdev/qbus reset in DeviceClass and BusClass will be modified in
following commits to use this interface. A mechanism is provided
to allow executing a transitional reset handler in place of the 2nd
phase which is executed in children-then-parent order inside a tree.
This will allow to transition devices and buses smoothly while
keeping the exact current qdev/qbus reset behavior for now.
Documentation will be added in a following commit.
Damien Hedde [Thu, 30 Jan 2020 16:02:03 +0000 (16:02 +0000)]
add device_legacy_reset function to prepare for reset api change
Provide a temporary device_legacy_reset function doing what
device_reset does to prepare for the transition with Resettable
API.
All occurrence of device_reset in the code tree are also replaced
by device_legacy_reset.
The new resettable API has different prototype and semantics
(resetting child buses as well as the specified device). Subsequent
commits will make the changeover for each call site individually; once
that is complete device_legacy_reset() will be removed.
hw/arm/raspi: Remove obsolete use of -smp to set the soc 'enabled-cpus'
Since we enabled parallel TCG code generation for softmmu (see
commit 3468b59 "tcg: enable multiple TCG contexts in softmmu")
and its subsequent fix (commit 72649619 "add .min_cpus and
.default_cpus fields to machine_class"), the raspi machines are
restricted to always use their 4 cores:
See in hw/arm/raspi2 (with BCM283X_NCPUS set to 4):
Joel Stanley [Thu, 30 Jan 2020 16:02:02 +0000 (16:02 +0000)]
misc/pca9552: Add qom set and get
Following the pattern of the work recently done with the ASPEED GPIO
model, this adds support for inspecting and modifying the PCA9552 LEDs
from the monitor.
(qemu) qom-set /machine/unattached/device[17] led0 on
(qemu) qom-set /machine/unattached/device[17] led0 off
(qemu) qom-set /machine/unattached/device[17] led0 pwm0
(qemu) qom-set /machine/unattached/device[17] led0 pwm1
Cédric Le Goater [Thu, 30 Jan 2020 16:02:02 +0000 (16:02 +0000)]
hw/arm/aspeed: add a 'execute-in-place' property to boot directly from CE0
The overhead for the OpenBMC firmware images using the a custom U-Boot
is around 2 seconds, which is fine, but with a U-Boot from mainline,
it takes an extra 50 seconds or so to reach Linux. A quick survey on
the number of reads performed on the flash memory region gives the
following figures :
QEMU must be trashing the TCG TBs and reloading text very often. Some
addresses are read more than 250.000 times. Until we find a solution
to improve boot time, execution from MMIO is not activated by default.
Setting this option also breaks migration compatibility.
Cédric Le Goater [Thu, 30 Jan 2020 16:02:02 +0000 (16:02 +0000)]
ftgmac100: check RX and TX buffer alignment
These buffers should be aligned on 16 bytes.
Ignore invalid RX and TX buffer addresses and log an error. All
incoming and outgoing traffic will be dropped because no valid RX or
TX descriptors will be available.
Andrew Jeffery [Thu, 30 Jan 2020 16:02:02 +0000 (16:02 +0000)]
hw/arm: ast2600: Wire up the eMMC controller
Initialise another SDHCI model instance for the AST2600's eMMC
controller and use the SDHCI's num_slots value introduced previously to
determine whether we should create an SD card instance for the new slot.
Andrew Jeffery [Thu, 30 Jan 2020 16:02:02 +0000 (16:02 +0000)]
hw/sd: Configure number of slots exposed by the ASPEED SDHCI model
The AST2600 includes a second cut-down version of the SD/MMC controller
found in the AST2500, named the eMMC controller. It's cut down in the
sense that it only supports one slot rather than two, but it brings the
total number of slots supported by the AST2600 to three.
The existing code assumed that the SD controller always provided two
slots. Rework the SDHCI object to expose the number of slots as a
property to be set by the SoC configuration.