* remotes/pmaydell/tags/pull-target-arm-20220302: (26 commits)
ui/cocoa.m: Remove unnecessary NSAutoreleasePools
ui/cocoa.m: Fix updateUIInfo threading issues
target/arm: Report KVM's actual PSCI version to guest in dtb
target/arm: Implement FEAT_LPA2
target/arm: Advertise all page sizes for -cpu max
target/arm: Validate tlbi TG matches translation granule in use
target/arm: Fix TLBIRange.base for 16k and 64k pages
target/arm: Introduce tlbi_aa64_get_range
target/arm: Extend arm_fi_to_lfsc to level -1
target/arm: Implement FEAT_LPA
target/arm: Implement FEAT_LVA
target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
target/arm: Honor TCR_ELx.{I}PS
target/arm: Use MAKE_64BIT_MASK to compute indexmask
target/arm: Pass outputsize down to check_s2_mmu_setup
target/arm: Move arm_pamax out of line
target/arm: Fault on invalid TCR_ELx.TxSZ
target/arm: Set TCR_EL1.TSZ for user-only
hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
tests/qtest: add qtests for npcm7xx sdhci
...
Peter Maydell [Wed, 2 Mar 2022 20:55:48 +0000 (20:55 +0000)]
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-migration-20220302b' into staging
Migration/HMP/Virtio pull 2022-03-02
A bit of a mix this time:
* Minor fixes from myself, Hanna, and Jack
* VNC password rework by Stefan and Fabian
* Postcopy changes from Peter X that are
the start of a larger series to come
* Removing the prehistoic load_state_old
code from Peter M
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
# gpg: Signature made Wed 02 Mar 2022 18:25:12 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert-gitlab/tags/pull-migration-20220302b:
migration: Remove load_state_old and minimum_version_id_old
tests: Pass in MigrateStart** into test_migrate_start()
migration: Add migration_incoming_transport_cleanup()
migration: postcopy_pause_fault_thread() never fails
migration: Enlarge postcopy recovery to capture !-EIO too
migration: Move static var in ram_block_from_stream() into global
migration: Add postcopy_thread_create()
migration: Dump ramblock and offset too when non-same-page detected
migration: Introduce postcopy channels on dest node
migration: Tracepoint change in postcopy-run bottom half
migration: Finer grained tracepoints for POSTCOPY_LISTEN
migration: Dump sub-cmd name in loadvm_process_command tp
migration/rdma: set the REUSEADDR option for destination
qapi/monitor: allow VNC display id in set/expire_password
qapi/monitor: refactor set/expire_password with enums
monitor/hmp: add support for flag argument with value
virtiofsd: Let meson check for statx.stx_mnt_id
clock-vmstate: Add missing END_OF_LIST
Peter Maydell [Thu, 24 Feb 2022 10:13:30 +0000 (10:13 +0000)]
ui/cocoa.m: Remove unnecessary NSAutoreleasePools
In commit 6e657e64cdc478 in 2013 we added some autorelease pools to
deal with complaints from macOS when we made calls into Cocoa from
threads that didn't have automatically created autorelease pools.
Later on, macOS got stricter about forbidding cross-thread Cocoa
calls, and in commit 5588840ff77800e839d8 we restructured the code to
avoid them. This left the autorelease pool creation in several
functions without any purpose; delete it.
We still need the pool in cocoa_refresh() for the clipboard related
code which is called directly there.
Peter Maydell [Thu, 24 Feb 2022 10:13:29 +0000 (10:13 +0000)]
ui/cocoa.m: Fix updateUIInfo threading issues
The updateUIInfo method makes Cocoa API calls. It also calls back
into QEMU functions like dpy_set_ui_info(). To do this safely, we
need to follow two rules:
* Cocoa API calls are made on the Cocoa UI thread
* When calling back into QEMU we must hold the iothread lock
Fix the places where we got this wrong, by taking the iothread lock
while executing updateUIInfo, and moving the call in cocoa_switch()
inside the dispatch_async block.
Some of the Cocoa UI methods which call updateUIInfo are invoked as
part of the initial application startup, while we're still doing the
little cross-thread dance described in the comment just above
call_qemu_main(). This meant they were calling back into the QEMU UI
layer before we'd actually finished initializing our display and
registered the DisplayChangeListener, which isn't really valid. Once
updateUIInfo takes the iothread lock, we no longer get away with
this, because during this startup phase the iothread lock is held by
the QEMU main-loop thread which is waiting for us to finish our
display initialization. So we must suppress updateUIInfo until
applicationDidFinishLaunching allows the QEMU main-loop thread to
continue.
Peter Maydell [Thu, 24 Feb 2022 13:46:54 +0000 (13:46 +0000)]
target/arm: Report KVM's actual PSCI version to guest in dtb
When we're using KVM, the PSCI implementation is provided by the
kernel, but QEMU has to tell the guest about it via the device tree.
Currently we look at the KVM_CAP_ARM_PSCI_0_2 capability to determine
if the kernel is providing at least PSCI 0.2, but if the kernel
provides a newer version than that we will still only tell the guest
it has PSCI 0.2. (This is fairly harmless; it just means the guest
won't use newer parts of the PSCI API.)
The kernel exposes the specific PSCI version it is implementing via
the ONE_REG API; use this to report in the dtb that the PSCI
implementation is 1.0-compatible if appropriate. (The device tree
binding currently only distinguishes "pre-0.2", "0.2-compatible" and
"1.0-compatible".)
This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
4k or 16k pages.
This introduces the DS bit to TCR_ELx, which is RES0 unless the
page size is enabled and supports LPA2, resulting in the effective
value of DS for a given table walk. The DS bit changes the format
of the page table descriptor slightly, moving the PS field out to
TCR so that all pages have the same sharability and repurposing
those bits of the page table descriptor for the highest bits of
the output address.
Do not yet enable FEAT_LPA2; we need extra plumbing to avoid
tickling an old kernel bug.
We support 16k pages, but do not advertize that in ID_AA64MMFR0.
The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
to the same support as stage1 lookups. This setting is deprecated, so
indicate support for all stage2 page sizes directly.
target/arm: Validate tlbi TG matches translation granule in use
For FEAT_LPA2, we will need other ARMVAParameters, which themselves
depend on the translation granule in use. We might as well validate
that the given TG matches; the architecture "does not require that
the instruction invalidates any entries" if this is not true.
Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
returning a structure containing both results. Pass in the
ARMMMUIdx, rather than the digested two_ranges boolean.
This is in preparation for FEAT_LPA2, where the interpretation
of 'value' depends on the effective value of DS for the regime.
With FEAT_LPA2, rather than introducing translation level 4,
we introduce level -1, below the current level 0. Extend
arm_fi_to_lfsc to handle these faults.
Assert that this new translation level does not leak into
fault types for which it is not defined, which allows some
masking of fi->level to be removed.
This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
64k pages. The only thing left at this point is to handle the
extra bits in the TTBR and in the table descriptors.
Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
mask out the high bits when writing to those registers, so no changes
are required there.
This feature is relatively small, as it applies only to
64k pages and thus requires no additional changes to the
table descriptor walking algorithm, only a change to the
minimum TSZ (which is the inverse of the maximum virtual
address space size).
Note that this feature widens VBAR_ELx, but we already
treat the register as being 64 bits wide.
target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
The original A.a revision of the AArch64 ARM required that we
force-extend the addresses in these registers from 49 bits.
This language has been loosened via a combination of IMPLEMENTATION
DEFINED and CONSTRAINTED UNPREDICTABLE to allow consideration of
the entire aligned address.
This means that we do not have to consider whether or not FEAT_LVA
is enabled, and decide from which bit an address might need to be
extended.
This field controls the output (intermediate) physical address size
of the translation process. V8 requires to raise an AddressSize
fault if the page tables are programmed incorrectly, such that any
intermediate descriptor address, or the final translated address,
is out of range.
Add a PS field to ARMVAParameters, and properly compute outputsize
in get_phys_addr_lpae. Test the descaddr as extracted from TTBR
and from page table entries.
Restrict descaddrmask so that we won't raise the fault for v7.
target/arm: Pass outputsize down to check_s2_mmu_setup
Pass down the width of the output address from translation.
For now this is still just PAMax, but a subsequent patch will
compute the correct value from TCR_ELx.{I}PS.
Without FEAT_LVA, the behaviour of programming an invalid value
is IMPLEMENTATION DEFINED. With FEAT_LVA, programming an invalid
minimum value requires a Translation fault.
It is most self-consistent to choose to generate the fault always.
Wentao_Liang [Fri, 25 Feb 2022 04:01:42 +0000 (12:01 +0800)]
target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
handle_simd_shift_fpint_conv() was accidentally freeing the TCG
temporary tcg_fpstatus too early, before the last use of it. Move
the free down to where it belongs.
Akihiko Odaki [Sun, 13 Feb 2022 03:57:53 +0000 (12:57 +0900)]
target/arm: Support PSCI 1.1 and SMCCC 1.0
Support the latest PSCI on TCG and HVF. A 64-bit function called from
AArch32 now returns NOT_SUPPORTED, which is necessary to adhere to SMC
Calling Convention 1.0. It is still not compliant with SMCCC 1.3 since
they do not implement mandatory functions.
Peter Maydell [Mon, 21 Feb 2022 14:07:50 +0000 (14:07 +0000)]
hw/input/tsc210x: Don't abort on bad SPI word widths
The tsc210x doesn't support anything other than 16-bit reads on the
SPI bus, but the guest can program the SPI controller to attempt
them anyway. If this happens, don't abort QEMU, just log this as
a guest error.
This fixes our machine_arm_n8x0.py:N8x0Machine.test_n800
acceptance test, which hits this assertion.
The reason we hit the assertion is because the guest kernel thinks
there is a TSC2005 on this SPI bus address, not a TSC210x. (The n810
*does* have a TSC2005 at this address.) The TSC2005 supports the
24-bit accesses which the guest driver makes, and the TSC210x does
not (that is, our TSC210x emulation is not missing support for a word
width the hardware can handle). It's not clear whether the problem
here is that the guest kernel incorrectly thinks the n800 has the
same device at this SPI bus address as the n810, or that QEMU's n810
board model doesn't get the SPI devices right. At this late date
there no longer appears to be any reliable information on the web
about the hardware behaviour, but I am inclined to think this is a
guest kernel bug. In any case, we prefer not to abort QEMU for
guest-triggerable conditions, so logging the error is the right thing
to do.
Peter Maydell [Mon, 21 Feb 2022 09:41:44 +0000 (09:41 +0000)]
hw/arm/mps2-tz.c: Update AN547 documentation URL
The AN547 application note URL has changed: update our comment
accordingly. (Rev B is still downloadable from the old URL,
but there is a new Rev C of the document now.)
Jimmy Brisson [Thu, 10 Feb 2022 21:02:27 +0000 (15:02 -0600)]
mps3-an547: Add missing user ahb interfaces
With these interfaces missing, TFM would delegate peripherals 0, 1,
2, 3 and 8, and qemu would ignore the delegation of interface 8, as
it thought interface 4 was eth & USB.
This patch corrects this behavior and allows TFM to delegate the
eth & USB peripheral to NS mode.
(The old QEMU behaviour was based on revision B of the AN547
appnote; revision C corrects this error in the documentation,
and this commit brings QEMU in to line with how the FPGA
image really behaves.)
Peter Maydell [Tue, 15 Feb 2022 17:57:05 +0000 (17:57 +0000)]
migration: Remove load_state_old and minimum_version_id_old
There are no longer any VMStateDescription structs in the tree which
use the load_state_old support for custom handling of incoming
migration from very old QEMU. Remove the mechanism entirely.
This includes removing one stray useless setting of
minimum_version_id_old in a VMStateDescription with no load_state_old
function, which crept in after the global weeding-out of them in
commit 17e313406126.
Peter Xu [Tue, 1 Mar 2022 08:39:25 +0000 (16:39 +0800)]
tests: Pass in MigrateStart** into test_migrate_start()
test_migrate_start() will release the MigrateStart structure that passed
in, however that's not super clear to the caller because after the call
returned the pointer can still be referenced by the callers. It can easily
be a source of use-after-free.
Let's pass in a double pointer of that, then we can safely clear the
pointer for the caller after the struct is released.
When do it, we should also null-ify the cleanup hook and the data, then it's
even safe to call it multiple times.
Move the socket_address_list cleanup altogether, because that's a mirror of the
listener channels and only for the purpose of query-migrate. Hence when
someone wants to cleanup the listener transport, it should also want to cleanup
the socket list too, always.
Peter Xu [Tue, 1 Mar 2022 08:39:10 +0000 (16:39 +0800)]
migration: Enlarge postcopy recovery to capture !-EIO too
We used to have quite a few places making sure -EIO happened and that's the
only way to trigger postcopy recovery. That's based on the assumption that
we'll only return -EIO for channel issues.
It'll work in 99.99% cases but logically that won't cover some corner cases.
One example is e.g. ram_block_from_stream() could fail with an interrupted
network, then -EINVAL will be returned instead of -EIO.
I remembered Dave Gilbert pointed that out before, but somehow this is
overlooked. Neither did I encounter anything outside the -EIO error.
However we'd better touch that up before it triggers a rare VM data loss during
live migrating.
To cover as much those cases as possible, remove the -EIO restriction on
triggering the postcopy recovery, because even if it's not a channel failure,
we can't do anything better than halting QEMU anyway - the corpse of the
process may even be used by a good hand to dig out useful memory regions, or
the admin could simply kill the process later on.
Peter Xu [Tue, 1 Mar 2022 08:39:06 +0000 (16:39 +0800)]
migration: Add postcopy_thread_create()
Postcopy create threads. A common manner is we init a sem and use it to sync
with the thread. Namely, we have fault_thread_sem and listen_thread_sem and
they're only used for this.
Make it a shared infrastructure so it's easier to create yet another thread.
Peter Xu [Tue, 1 Mar 2022 08:39:05 +0000 (16:39 +0800)]
migration: Dump ramblock and offset too when non-same-page detected
In ram_load_postcopy() we'll try to detect non-same-page case and dump error.
This error is very helpful for debugging. Adding ramblock & offset into the
error log too.
Peter Xu [Tue, 1 Mar 2022 08:39:04 +0000 (16:39 +0800)]
migration: Introduce postcopy channels on dest node
Postcopy handles huge pages in a special way that currently we can only have
one "channel" to transfer the page.
It's because when we install pages using UFFDIO_COPY, we need to have the whole
huge page ready, it also means we need to have a temp huge page when trying to
receive the whole content of the page.
Currently all maintainance around this tmp page is global: firstly we'll
allocate a temp huge page, then we maintain its status mostly within
ram_load_postcopy().
To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caching, one for each channel.
Meanwhile we need to maintain the tmp huge page status per-channel too.
To give some example, some local variables maintained in ram_load_postcopy()
are listed; they are responsible for maintaining temp huge page status:
- all_zero: this keeps whether this huge page contains all zeros
- target_pages: this counts how many target pages have been copied
- host_page: this keeps the host ptr for the page to install
Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage. Then for each (future) postcopy channel, we
need one structure to keep the state around.
For vanilla postcopy, obviously there's only one channel. It contains both
precopy and postcopy pages.
This patch teaches the dest migration node to start realize the possible number
of postcopy channels by introducing the "postcopy_channels" variable. Its
value is calculated when setup postcopy on dest node (during POSTCOPY_LISTEN
phase).
Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd). In this patch there's a TODO marked
for that; so far the channels is always set to 1.
We need to send one "host huge page" on one channel only and we cannot split
them, because otherwise the data upon the same huge page can locate on more
than one channel so we need more complicated logic to manage. One temp host
huge page for each channel will be enough for us for now.
Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for the latter patches where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).
Jack Wang [Tue, 8 Feb 2022 08:56:40 +0000 (09:56 +0100)]
migration/rdma: set the REUSEADDR option for destination
We hit following error during testing RDMA transport:
in case of migration error, mgmt daemon pick one migration port,
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr
Then try another -incoming rdma:[::]:8103, sometime it worked,
sometimes need another try with other ports number.
Set the REUSEADDR option for destination, This allow address could
be reused to avoid rdma_bind_addr error out.
Stefan Reiter [Fri, 25 Feb 2022 08:49:49 +0000 (09:49 +0100)]
qapi/monitor: allow VNC display id in set/expire_password
It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...
It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.
For HMP, the display is specified using the "-d" value flag.
For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.
Stefan Reiter [Fri, 25 Feb 2022 08:49:47 +0000 (09:49 +0100)]
monitor/hmp: add support for flag argument with value
Adds support for the "-xs" parameter type, where "-x" denotes a flag
name and the "s" suffix indicates that this flag is supposed to take
an arbitrary string parameter.
These parameters are always optional, the entry in the qdict will be
omitted if the flag is not given.
Hanna Reitz [Wed, 23 Feb 2022 09:23:40 +0000 (10:23 +0100)]
virtiofsd: Let meson check for statx.stx_mnt_id
In virtiofsd, we assume that the presence of the STATX_MNT_ID macro
implies existence of the statx.stx_mnt_id field. Unfortunately, that is
not necessarily the case: glibc has introduced the macro in its commit 88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field
is still missing from its own headers.
Let meson.build actually chek for both STATX_MNT_ID and
statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present.
Then, use this config macro in virtiofsd.
Peter Maydell [Wed, 2 Mar 2022 12:38:46 +0000 (12:38 +0000)]
Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging
ppc-7.0 queue
* ppc/pnv fixes
* PMU EBB support
* target/ppc: PowerISA Vector/VSX instruction batch
* ppc/pnv: Extension of the powernv10 machine with XIVE2 ans PHB5 models
* spapr allocation cleanups
# gpg: Signature made Wed 02 Mar 2022 11:00:42 GMT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* remotes/legoater/tags/pull-ppc-20220302: (87 commits)
hw/ppc/spapr_vio.c: use g_autofree in spapr_dt_vdevice()
hw/ppc/spapr_rtas.c: use g_autofree in rtas_ibm_get_system_parameter()
spapr_pci_nvlink2.c: use g_autofree in spapr_phb_nvgpu_ram_populate_dt()
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
hw/ppc/spapr_drc.c: use g_autofree in spapr_drc_by_index()
hw/ppc/spapr_drc.c: use g_autofree in spapr_dr_connector_new()
hw/ppc/spapr_drc.c: use g_autofree in drc_unrealize()
hw/ppc/spapr_drc.c: use g_autofree in drc_realize()
hw/ppc/spapr_drc.c: use g_auto in spapr_dt_drc()
hw/ppc/spapr_caps.c: use g_autofree in spapr_caps_add_properties()
hw/ppc/spapr_caps.c: use g_autofree in spapr_cap_get_string()
hw/ppc/spapr_caps.c: use g_autofree in spapr_cap_set_string()
hw/ppc/spapr.c: fail early if no firmware found in machine_init()
hw/ppc/spapr.c: use g_autofree in spapr_dt_chosen()
pnv/xive2: Add support for 8bits thread id
pnv/xive2: Add support for automatic save&restore
xive2: Add a get_config() handler for the router configuration
pnv/xive2: Add support XIVE2 P9-compat mode (or Gen1)
ppc/pnv: add XIVE Gen2 TIMA support
pnv/xive2: Introduce new capability bits
...
Peter Maydell [Wed, 2 Mar 2022 10:46:16 +0000 (10:46 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-and-semihosting-280222-1' into staging
Testing and semihosting updates:
- restore TESTS/IMAGES filtering to docker tests
- add NOUSER to alpine image
- bump lcitool version
- move arm64/s390x cross build images to lcitool
- add aarch32 runner CI scripts
- expand testing to more vectors
- update s390x jobs to focal for gitlab/travis
- disable threadcount for all sh4
- fix semihosting SYS_HEAPINFO and test
# gpg: Signature made Mon 28 Feb 2022 18:46:41 GMT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-and-semihosting-280222-1:
tests/tcg: port SYS_HEAPINFO to a system test
semihosting/arm-compat: replace heuristic for softmmu SYS_HEAPINFO
tests/tcg: completely disable threadcount for sh4
gitlab: upgrade the job definition for s390x to 20.04
travis.yml: Update the s390x jobs to Ubuntu Focal
tests/tcg: add vectorised sha512 versions
tests/tcg: add sha512 test
tests/tcg: build sha1-vector with O3 and compare
tests/tcg/ppc64: clean-up handling of byte-reverse
gitlab: add a new aarch32 custom runner definition
scripts/ci: allow for a secondary runner
scripts/ci: add build env rules for aarch32 on aarch64
tests/docker: introduce debian-riscv64-test-cross
tests/docker: update debian-s390x-cross with lcitool
tests/docker: update debian-arm64-cross with lcitool
tests/lcitool: update to latest version
tests/docker: add NOUSER for alpine image
tests/docker: restore TESTS/IMAGES filtering
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the
'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a
single line. No problem with that, but splitting into 2 lines is clearer
to follow
- use g_autofree in 'int_buf' to avoid a g_free() call later on
- 'buf_len' is only being used to store the size of 'int_buf' malloc.
Remove the var and just use the value in g_malloc0() directly
- remove the 'ret' var and just return the result of fdt_setprop()
hw/ppc/spapr.c: fail early if no firmware found in machine_init()
The firmware check consists on a file search (qemu_find_file) and load
it via load_imag_targphys(). This validation is not dependent on any
other machine state but it currently being done at the end of
spapr_machine_init(). This means that we can do a lot of stuff and end
up failing at the end for something that we can verify right out of the
gate.
Move this validation to the start of spapr_machine_init() to fail
earlier. While we're at it, use g_autofree in the 'filename' pointer.
The XIVE interrupt controller on P10 can automatically save and
restore the state of the interrupt registers under the internal NVP
structure representing the VCPU. This saves a costly store/load in
guest entries and exits.
pnv/xive2: Add support XIVE2 P9-compat mode (or Gen1)
The thread interrupt management area (TIMA) is a set of pages mapped
in the Hypervisor and in the guest OS address space giving access to
the interrupt thread context registers for interrupt management, ACK,
EOI, CPPR, etc.
XIVE2 changes slightly the TIMA layout with extra bits for the new
features, larger CAM lines and the controller provides configuration
switches for backward compatibility. This is called the XIVE2
P9-compat mode, of Gen1 TIMA. It impacts the layout of the TIMA and
the availability of the internal features associated with it,
Automatic Save & Restore for instance. Using a P9 layout also means
setting the controller in such a mode at init time.
As the OPAL driver initializes the XIVE2 controller with a XIVE2/P10
TIMA directly, the XIVE2 model only has a simple support for the
compat mode in the OS TIMA.
Only the CAM line updates done by the hypervisor are specific to
POWER10. Instead of duplicating the TM ops table, we handle these
commands locally under the PowerNV XIVE2 model.
These bits control the availability of interrupt features : StoreEOI,
PHB PQ_disable, PHB Address-Based Trigger and the overall XIVE
exploitation mode. These bits can be set at early boot time of the
system to activate/deactivate a feature for testing purposes. The
default value should be '1'.
The 'XIVE exploitation mode' bit is a software bit that skiboot could
use to disable the XIVE OS interface and propose a P8 style XICS
interface instead. There are no plans for that for the moment.
ppc/pnv: Add support for PHB5 "Address-based trigger" mode
When the Address-Based Interrupt Trigger mode is activated, the PHB
maps the interrupt source number into the interrupt command address.
The PHB directly triggers the IC ESB page of the interrupt number and
not the notify page of the IC anymore.
The PQ_disable configuration bit disables the check done on the PQ
state bits when processing new MSI interrupts. When bit 9 is enabled,
the PHB forwards any MSI trigger to the XIVE interrupt controller
without checking the PQ state bits. The XIVE IC knows from the trigger
message that the PQ bits have not been checked and performs the check
locally.
This configuration bit only applies to MSIs and LSIs are still checked
on the PHB to handle the assertion level.
PQ_disable enablement is a requirement for StoreEOI.
The trigger message coming from a HW source contains a special bit
informing the XIVE interrupt controller that the PQ bits have been
checked at the source or not. Depending on the value, the IC can
perform the check and the state transition locally using its own PQ
state bits.
The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the PowerNV
machine. sPAPR accessors are provided but the pSeries machine should
not be concerned by such complex configuration for the moment.
ppc/xive2: Add support for notification injection on ESB pages
This is an internal offset used to inject triggers when the PQ state
bits are not controlled locally. Such as for LSIs when the PHB5 are
using the Address-Based Interrupt Trigger mode and on the END.
and use a pnv_chip_power10_quad_realize() helper to avoid code
duplication with P9. This still needs some refinements on the XSCOM
registers handling in PnvQuad.
ppc/pnv: Add a XIVE2 controller to the POWER10 chip
The XIVE2 interrupt controller of the POWER10 processor follows the
same logic than on POWER9 but the HW interface has been largely
reviewed. It has a new register interface, different BARs, extra
VSDs, new layout for the XIVE2 structures, and a set of new features
which are described below.
This is a model of the POWER10 XIVE2 interrupt controller for the
PowerNV machine. It focuses primarily on the needs of the skiboot
firmware but some initial hypervisor support is implemented for KVM
use (escalation).
Support for new features will be implemented in time and will require
new support from the OS.
* XIVE2 BARS
The interrupt controller BARs have a different layout outlined below.
Each sub-engine has now own its range and the indirect TIMA access was
replaced with a set of pages, one per CPU, under the IC BAR:
- IC BAR (Interrupt Controller)
. 4 pages, one per sub-engine
. 128 indirect TIMA pages
- TM BAR (Thread Interrupt Management Area)
. 4 pages
- ESB BAR (ESB pages for IPIs)
. up to 1TB
- END BAR (ESB pages for ENDs)
. up to 2TB
- NVC BAR (Notification Virtual Crowd)
. up to 128
- NVPG BAR (Notification Virtual Process and Group)
. up to 1TB
- Direct mapped Thread Context Area (reads & writes)
OPAL does not use the grouping and crowd capability.
* Virtual Structure Tables
XIVE2 adds new tables types and also changes the field layout of the END
and NVP Virtualization Structure Descriptors.
- EAS
- END new layout
- NVT was splitted in :
. NVP (Processor), 32B
. NVG (Group), 32B
. NVC (Crowd == P9 block group) 32B
- IC for remote configuration
- SYNC for cache injection
- ERQ for event input queue
The setup is slighly different on XIVE2 because the indexing has changed
for some of the tables, block ID or the chip topology ID can be used.
* XIVE2 features
SCOM and MMIO registers have a new layout and XIVE2 adds a new global
capability and configuration registers.
The lowlevel hardware offers a set of new features among which :
- a configurable number of priorities : 1 - 8
- StoreEOI with load-after-store ordering is activated by default
- Gen2 TIMA layout
- A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility
- increase to 24bit for VP number
Other features will have some impact on the Hypervisor and guest OS
when activated, but this is not required for initial support of the
controller.
The VP space is larger in XIVE2 (P10), 24 bits instead of 19bits on
XIVE (P9), and the CAM line can use a 7bits or 8bits thread id.
For now, we only use 7bits thread ids, same as P9, but because of the
change of the size of the VP space, the CAM matching routine is
different between P9 and P10. It is easier to duplicate the whole
routine than to add extra handlers in xive_presenter_tctx_match() used
for P9.
We might come with a better solution later on, after we have added
some more support for the XIVE2 controller.
The XIVE2 interrupt controller of the POWER10 processor as the same
logic as on POWER9 but its SW interface has been largely reworked. The
interrupt controller has a new register interface, different BARs,
extra VSDs. These will be described when we add the device model for
the baremetal machine.
The XIVE internal structures for the EAS, END, NVT have different
layouts which is a problem for the current core XIVE framework. To
avoid adding too much complexity in the XIVE models, a new XIVE2 core
framework is introduced. It duplicates the models which are closely
linked to the XIVE internal structures : Xive2Router and
Xive2ENDSource and reuses the XiveSource, XivePresenter, XiveTCTX
models, as they are more generic.