Alex Bennée [Thu, 5 Dec 2019 12:25:15 +0000 (12:25 +0000)]
linux-user: log page table changes under -d page
The CPU_LOG_PAGE flag is woefully underused and could stand to do
extra duty tracking page changes. If the user doesn't want to see the
details as things change they still have the tracepoints available.
We push the locking into log_page_dump and pass a reason for the
banner text.
Thomas Huth [Tue, 19 Nov 2019 09:21:47 +0000 (10:21 +0100)]
travis.yml: Remove the redundant clang-with-MAIN_SOFTMMU_TARGETS entry
We test clang with the MAIN_SOFTMMU_TARGETS twice, once without
sanitizers and once with sanitizers enabled. That's somewhat redundant
since if compilation and tests succeeded with sanitizers enabled, it
should also work fine without sanitizers. Thus remove the clang entry
without sanitizers to speed up the CI testing a little bit.
Robert Foley [Mon, 18 Nov 2019 21:15:28 +0000 (16:15 -0500)]
Added tests for close and change of logfile.
One test ensures that the logfile handle is still valid even if
the logfile is changed during logging.
The other test validates that the logfile handle remains valid under
the logfile lock even if the logfile is closed.
Robert Foley [Mon, 18 Nov 2019 21:15:27 +0000 (16:15 -0500)]
Add use of RCU for qemu_logfile.
This now allows changing the logfile while logging is active,
and also solves the issue of a seg fault while changing the logfile.
Any read access to the qemu_logfile handle will use
the rcu_read_lock()/unlock() around the use of the handle.
To fetch the handle we will use atomic_rcu_read().
We also in many cases do a check for validity of the
logfile handle before using it to deal with the case where the
file is closed and set to NULL.
The cases where we write to the qemu_logfile will use atomic_rcu_set().
Writers will also use call_rcu() with a newly added qemu_logfile_free
function for freeing/closing when readers have finished.
Robert Foley [Mon, 18 Nov 2019 21:15:26 +0000 (16:15 -0500)]
qemu_log_lock/unlock now preserves the qemu_logfile handle.
qemu_log_lock() now returns a handle and qemu_log_unlock() receives a
handle to unlock. This allows for changing the handle during logging
and ensures the lock() and unlock() are for the same file.
Also in target/tilegx/translate.c removed the qemu_log_lock()/unlock()
calls (and the log("\n")), since the translator can longjmp out of the
loop if it attempts to translate an instruction in an inaccessible page.
Robert Foley [Mon, 18 Nov 2019 21:15:25 +0000 (16:15 -0500)]
Add a mutex to guarantee single writer to qemu_logfile handle.
Also added qemu_logfile_init() for initializing the logfile mutex.
Note that inside qemu_set_log() we needed to add a pair of
qemu_mutex_unlock() calls in order to avoid a double lock in
qemu_log_close(). This unavoidable temporary ugliness will be
cleaned up in a later patch in this series.
Paolo Bonzini [Wed, 11 Dec 2019 14:33:49 +0000 (15:33 +0100)]
ci: build out-of-tree
Most developers are using out-of-tree builds and it was discussed in the past
to only allow those. To prepare for the transition, use out-of-tree builds
in all continuous integration jobs.
Thomas Huth [Wed, 4 Dec 2019 15:46:18 +0000 (16:46 +0100)]
travis.yml: Enable builds on arm64, ppc64le and s390x
Travis recently added the possibility to test on these architectures,
too, so let's enable them in our travis.yml file to extend our test
coverage.
Unfortunately, the libssh in this Ubuntu version (bionic) is in a pretty
unusable Frankenstein state and libspice-server-dev is not available here,
so we can not use the global list of packages to install, but have to
provide individual package lists instead.
Also, some of the iotests crash when using "dist: bionic" on arm64
and ppc64le, thus these two builders have to use "dist: xenial" until
the problem is understood / fixed.
Thomas Huth [Wed, 4 Dec 2019 15:46:16 +0000 (16:46 +0100)]
tests/test-util-filemonitor: Skip test on non-x86 Travis containers
test-util-filemonitor fails in restricted non-x86 Travis containers
since they apparently blacklisted some required system calls there.
Let's simply skip the test if we detect such an environment.
Thomas Huth [Wed, 4 Dec 2019 15:46:15 +0000 (16:46 +0100)]
tests/hd-geo-test: Skip test when images can not be created
In certain environments like restricted containers, we can not create
huge test images. To be able to use "make check" in such container
environments, too, let's skip the hd-geo-test instead of failing when
the test images could not be created.
Thomas Huth [Wed, 4 Dec 2019 15:46:14 +0000 (16:46 +0100)]
iotests: Skip test 079 if it is not possible to create large files
Test 079 fails in the arm64, s390x and ppc64le LXD containers on Travis
(which we will hopefully enable in our CI soon). These containers
apparently do not allow large files to be created. Test 079 tries to
create a 4G sparse file, which is apparently already too big for these
containers, so check first whether we can really create such files before
executing the test.
Thomas Huth [Wed, 4 Dec 2019 15:46:13 +0000 (16:46 +0100)]
iotests: Skip test 060 if it is not possible to create large files
Test 060 fails in the arm64, s390x and ppc64le LXD containers on Travis
(which we will hopefully enable in our CI soon). These containers
apparently do not allow large files to be created. The repair process
in test 060 creates a file of 64 GiB, so test first whether such large
files are possible and skip the test if that's not the case.
Thomas Huth [Wed, 4 Dec 2019 15:46:12 +0000 (16:46 +0100)]
iotests: Provide a function for checking the creation of huge files
Some tests create huge (but sparse) files, and to be able to run those
tests in certain limited environments (like CI containers), we have to
check for the possibility to create such files first. Thus let's introduce
a common function to check for large files, and replace the already
existing checks in the iotests 005 and 220 with this function.
Thomas Huth [Wed, 4 Dec 2019 08:31:33 +0000 (09:31 +0100)]
travis.yml: Run tcg tests with tci
So far we only have compile coverage for tci. But since commit 2f160e0f9797c7522bfd0d09218d0c9340a5137c ("tci: Add implementation
for INDEX_op_ld16u_i64") has been included now, we can also run the
"tcg" and "qtest" tests with tci, so let's enable them in Travis now.
Since we don't gain much additional test coverage by compiling all
targets, and TCI is broken e.g. with the Sparc targets, we also limit
the target list to a reasonable subset now (which should still get us
test coverage by tests/boot-serial-test for example).
By default VM build test use qemu-img from system's PATH to
create the image disk. Due the lack of qemu-img on the system
or the desire to simply use a version built with QEMU, it would
be nice to allow one to set its path. So this patch makes that
possible by reading the path to qemu-img from QEMU_IMG if set,
otherwise it fallback to default behavior.
Alex Bennée [Thu, 28 Nov 2019 15:35:24 +0000 (16:35 +0100)]
configure: allow disable of cross compilation containers
Our docker infrastructure isn't quite as multiarch as we would wish so
lets allow the user to disable it if they want. This will allow us to
use still run check-tcg on non-x86 CI setups.
* remotes/huth-gitlab/tags/pull-request-2019-12-17:
tests: use g_test_rand_int
tests/Makefile: Fix check-report.* targets shown in check-help
glib: use portable g_setenv()
hw/misc/ivshmem: Bury dead legacy INTx code
pseries: disable migration-test if /dev/kvm cannot be used
tests: fix modules-test 'duplicate test case' error
Remove libbluetooth / bluez from the CI tests
Remove the core bluetooth code
hw/usb: Remove the USB bluetooth dongle device
hw/arm/nseries: Replace the bluetooth chardev with a "null" chardev
Peter Maydell [Tue, 17 Dec 2019 14:34:31 +0000 (14:34 +0000)]
Merge remote-tracking branch 'remotes/cleber/tags/python-next-pull-request' into staging
Python queue 2019-12-17
# gpg: Signature made Tue 17 Dec 2019 05:12:43 GMT
# gpg: using RSA key 7ABB96EB8B46B94D5E0FE9BB657E8D33A5F209F3
# gpg: Good signature from "Cleber Rosa <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7ABB 96EB 8B46 B94D 5E0F E9BB 657E 8D33 A5F2 09F3
* remotes/cleber/tags/python-next-pull-request:
python/qemu: Remove unneeded imports in __init__
python/qemu: accel: Add tcg_available() method
python/qemu: accel: Strengthen kvm_available() checks
python/qemu: accel: Add list_accel() method
python/qemu: Move kvm_available() to its own module
Acceptance tests: use relative location for tests
Acceptance tests: use avocado tags for machine type
Acceptance tests: introduce utility method for tags unique vals
Acceptance test x86_cpu_model_versions: use default vm
tests/acceptance: Makes linux_initrd and empty_cpu_model use QEMUMachine
python/qemu: Add set_qmp_monitor() to QEMUMachine
analyze-migration.py: replace numpy with python 3.2
analyze-migration.py: fix find() type error
Revert "Acceptance test: cancel test if m68k kernel packages goes missing"
tests/boot_linux_console: Fetch assets from Debian snapshot archives
* remotes/dgibson/tags/ppc-for-5.0-20191217: (88 commits)
pseries: Update SLOF firmware image
ppc/pnv: Drop PnvChipClass::type
ppc/pnv: Introduce PnvChipClass::xscom_pcba() method
ppc/pnv: Drop pnv_chip_is_power9() and pnv_chip_is_power10() helpers
ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom()
ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom()
ppc/pnv: Introduce PnvChipClass::xscom_core_base() method
ppc/pnv: Introduce PnvChipClass::intc_print_info() method
ppc/pnv: Drop pnv_is_power9() and pnv_is_power10() helpers
ppc/pnv: Introduce PnvMachineClass::dt_power_mgt()
ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat
ppc/pnv: Drop PnvPsiClass::chip_type
ppc/pnv: Introduce PnvPsiClass::compat
ppc: Drop useless extern annotation for functions
ppc/pnv: Fix OCC common area region mapping
ppc/pnv: Introduce PBA registers
ppc/pnv: Make PnvXScomInterface an incomplete type
ppc/pnv: populate the DT with realized XSCOM devices
ppc/pnv: Loop on the whole hierarchy to populate the DT with the XSCOM nodes
target/ppc: Add SPR TBU40
...
* remotes/ehabkost/tags/x86-next-pull-request:
i386: Use g_autofree in a few places
i386: Add new CPU model Cooperlake
i386: Add macro for stibp
i386: Add MSR feature bit for MDS-NO
Paolo Bonzini [Thu, 12 Dec 2019 01:17:58 +0000 (02:17 +0100)]
tests: use g_test_rand_int
g_test_rand_int provides a reproducible random integer number, using a
different number seed every time but allowing reproduction using the
--seed command line option. It is thus better suited to tests than
g_random_int or random.
tests/Makefile: Fix check-report.* targets shown in check-help
The check-report.html and check-report.xml targets were replaced
with check-report.tap in commit 9df43317b82 but the check-help
text was not updated so it still lists check-report.html.
Devices "ivshmem-plain" and "ivshmem-doorbell" support only MSI-X.
Config space register Interrupt Pin is zero. Device "ivshmem"
additionally supported legacy INTx, but it was removed in commit 5a0e75f0a9 "hw/misc/ivshmem: Remove deprecated "ivshmem" legacy
device". The commit left ivshmem_update_irq() behind. Since the
Interrupt Pin register is zero, the function does nothing. Remove it.
Laurent Vivier [Wed, 20 Nov 2019 17:09:55 +0000 (18:09 +0100)]
pseries: disable migration-test if /dev/kvm cannot be used
On ppc64, migration-test only works with kvm_hv, and we already
have a check to verify the module is loaded.
kvm_hv module can be loaded in memory and /sys/module/kvm_hv exists,
but on some systems (like build systems) /dev/kvm can be missing
(by administrators choice).
And as kvm_hv exists test-migration is started but QEMU falls back to
TCG because it cannot be used:
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator
And as the test is done with TCG, it fails.
As for s390x, we must check for the existence and the access rights
of /dev/kvm.
Thomas Huth [Wed, 20 Nov 2019 09:10:13 +0000 (10:10 +0100)]
Remove the core bluetooth code
It's been deprecated since QEMU v3.1. We've explicitly asked in the
deprecation message that people should speak up on qemu-devel in case
they are still actively using the bluetooth part of QEMU, but nobody
ever replied that they are really still using it.
I've tried it on my own to use this bluetooth subsystem for one of my
guests, but I was also not able to get it running anymore: When I was
trying to pass-through a real bluetooth device, either the guest did
not see the device at all, or the guest crashed.
Even worse for the emulated device: When running
qemu-system-x86_64 -bt device:keyboard
QEMU crashes once you hit a key.
So it seems like the bluetooth stack is not only neglected, it is
completely bitrotten, as far as I can tell. The only attention that
this code got during the past years were some CVEs that have been
spotted there. So this code is a burden for the developers, without
any real benefit anymore. Time to remove it.
Note: hw/bt/Kconfig only gets cleared but not removed here yet.
Otherwise there is a problem with the *-softmmu/config-devices.mak.d
dependency files - they still contain a reference to this file which
gets evaluated first on some build hosts, before the file gets
properly recreated. To avoid breaking these builders, we still need
the file around for some time. It will get removed in a couple of
weeks instead.
Alexey Kardashevskiy (12):
allocator: Fix format strings for DEBUG
virtio: Make virtio_set_qaddr static
client: Load initramdisk location
sloffs: Fix -Wunused-result gcc warnings in read/write
pci-phb: Reimplement dma-map-in/out
virtio: Store queue descriptors in virtio_device
virtio-net: Init queues after features negotiation
virtio: Enable IOMMU
ibm,client-architecture-support: Fix stack handling
fdt: Fix updating the tree at H_CAS
version: update to 20191206
version: update to 20191217
Michael Roth (1):
dma: Define default dma methods for using by client/package instances
The XSCOM bus is implemented with a QOM interface, which is mostly
generic from a CPU type standpoint, except for the computation of
addresses on the Pervasive Connect Bus (PCB) network. This is handled
by the pnv_xscom_pcba() function with a switch statement based on
the chip_type class level attribute of the CPU chip.
This can be achieved using QOM. Also the address argument is masked with
PNV_XSCOM_SIZE - 1, which is for POWER8 only. Addresses may have different
sizes with other CPU types. Have each CPU chip type handle the appropriate
computation with a QOM xscom_pcba() method.
Greg Kurz [Fri, 13 Dec 2019 12:00:24 +0000 (13:00 +0100)]
ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the
"compatible" property. Just pass the compat string and its size as
arguments.
Greg Kurz [Fri, 13 Dec 2019 12:00:18 +0000 (13:00 +0100)]
ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the "reg"
property. Just pass the base address and address size as arguments.
The pnv_chip_core_realize() function configures the XSCOM MMIO subregion
for each core of a single chip. The base address of the subregion depends
on the CPU type. Its computation is currently open-code using the
pnv_chip_is_powerXX() helpers. This can be achieved with QOM. Introduce
a method for this in the base chip class and implement it in child classes.
The pnv_pic_print_info() callback checks the type of the chip in order
to forward to the request appropriate interrupt controller. This can
be achieved with QOM. Introduce a method for this in the base chip class
and implement it in child classes.
This also prepares ground for the upcoming interrupt controller of POWER10
chips.
We add an extra node to advertise power management on some machines,
namely powernv9 and powernv10. This is achieved by using the
pnv_is_power9() and pnv_is_power10() helpers.
This can be achieved with QOM. Add a method to the base class for
powernv machines and have it implemented by machine types that
support power management instead.
Greg Kurz [Fri, 13 Dec 2019 11:59:50 +0000 (12:59 +0100)]
ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat
The pnv_dt_create() function generates different contents for the
"compatible" property of the root node in the DT, depending on the
CPU type. This is open coded with multiple ifs using pnv_is_powerXX()
helpers.
It seems cleaner to achieve with QOM. Introduce a base class for the
powernv machine and a compat attribute that each child class can use
to provide the value for the "compatible" property.
Greg Kurz [Fri, 13 Dec 2019 11:59:39 +0000 (12:59 +0100)]
ppc/pnv: Introduce PnvPsiClass::compat
The Processor Service Interface (PSI) model has a chip_type class level
attribute, which is used to generate the content of the "compatible" DT
property according to the CPU type.
Since the PSI model already has specialized classes for each supported
CPU type, it seems cleaner to achieve this with QOM. Provide the content
of the "compatible" property with a new class level attribute.
We could define a "OCC common area" memory region at the machine level
and sub regions for each OCC. But it adds some extra complexity to the
models. Fix the current layout with a simpler model.
Cédric Le Goater [Wed, 11 Dec 2019 08:29:11 +0000 (09:29 +0100)]
ppc/pnv: Introduce PBA registers
The PBA bridge unit (Power Bus Access) connects the OCC (On Chip
Controller) to the Power bus and System Memory. The PBA is used to
gather sensor data, for power management, for sleep states, for
initial boot, among other things.
The PBA logic provides a set of four registers PowerBus Access Base
Address Registers (PBABAR0..3) which map the OCC address space to the
PowerBus space. These registers are setup by the initial FW and define
the PowerBus Range of system memory that can be accessed by PBA.
The current modeling of the PBABAR registers is done under the common
XSCOM handlers. We introduce a specific XSCOM regions for these
registers and fix :
- BAR sizes and BAR masks
- The mapping of the OCC common area. It is common to all chips and
should be mapped once. We will address per-OCC area in the next
change.
- OCC common area is in BAR 3 on P8
Greg Kurz [Wed, 11 Dec 2019 16:04:15 +0000 (17:04 +0100)]
ppc/pnv: Make PnvXScomInterface an incomplete type
PnvXScomInterface is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety,
which is the common practice with QOM interfaces.
While here also convert the bogus OBJECT_CHECK() to INTERFACE_CHECK().
Cédric Le Goater [Tue, 10 Dec 2019 13:58:45 +0000 (14:58 +0100)]
ppc/pnv: populate the DT with realized XSCOM devices
Some devices could be initialized in the instance_init handler but not
realized for configuration reasons. Nodes should not be added in the DT
for such devices.
The Access Segment Descriptor Register (ASDR) provides information about
the storage element when taking a hypervisor storage interrupt. When
performing nested radix address translation, this is normally the guest
real address. This register is present on POWER9 processors and later.
Implement the ADSR, note read and write access is limited to the
hypervisor.
target/ppc: Work [S]PURR implementation and add HV support
The Processor Utilisation of Resources Register (PURR) and Scaled
Processor Utilisation of Resources Register (SPURR) provide an estimate
of the resources used by the thread, present on POWER7 and later
processors.
Currently the [S]PURR registers simply count at the rate of the
timebase.
Preserve this behaviour but rework the implementation to store an offset
like the timebase rather than doing the calculation manually. Also allow
hypervisor write access to the register along with the currently
available read access.
The virtual timebase register (VTB) is a 64-bit register which
increments at the same rate as the timebase register, present on POWER8
and later processors.
The register is able to be read/written by the hypervisor and read by
the supervisor. All other accesses are illegal.
Currently the VTB is just an alias for the timebase (TB) register.
Implement the VTB so that is can be read/written independent of the TB.
Make use of the existing method for accessing timebase facilities where
by the compensation is stored and used to compute the value on reads/is
updated on writes.
This includes in QEMU a new CPU model for the POWER10 processor with
the same capabilities of a POWER9 process. The model will be extended
when support is completed.
Greg Kurz [Mon, 9 Dec 2019 13:28:00 +0000 (14:28 +0100)]
ppc: Make PPCVirtualHypervisor an incomplete type
PPCVirtualHypervisor is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety, which
is the common practice with QOM interfaces.
Greg Kurz [Wed, 4 Dec 2019 19:43:48 +0000 (20:43 +0100)]
ppc: Don't use CPUPPCState::irq_input_state with modern Book3s CPU models
The power7_set_irq() and power9_set_irq() functions set this but it is
never used actually. Modern Book3s compatible CPUs are only supported
by the pnv and spapr machines. They have an interrupt controller, XICS
for POWER7/8 and XIVE for POWER9, whose models don't require to track
IRQ input states at the CPU level.
Greg Kurz [Wed, 4 Dec 2019 19:43:37 +0000 (20:43 +0100)]
ppc: Deassert the external interrupt pin in KVM on reset
When a CPU is reset, QEMU makes sure no interrupt is pending by clearing
CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a
complete machine emulation, eg. a sPAPR machine, an external interrupt
request could still be pending in KVM though, eg. an IPI. It will be
eventually presented to the guest, which is supposed to acknowledge it at
the interrupt controller. If the interrupt controller is emulated in QEMU,
either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt
pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters
the guest, the interrupt request is still pending and the vCPU will try
again to acknowledge it. This causes an infinite loop and eventually hangs
the guest.
The code has been broken since the beginning. The issue wasn't hit before
because accel=kvm,kernel-irqchip=off is an awkward setup that never got
used until recently with the LC92x IBM systems (aka, Boston).
Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert
the IRQ pins of the CPU in QEMU and most importantly the external interrupt
pin for this vCPU in KVM.
David Gibson [Fri, 29 Nov 2019 05:23:21 +0000 (16:23 +1100)]
spapr: Simplify ovec diff
spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.
It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement. So replace spapr_ovec_diff() with
spapr_ovec_subset().
David Gibson [Fri, 29 Nov 2019 04:00:58 +0000 (15:00 +1100)]
spapr: Fold h_cas_compose_response() into h_client_architecture_support()
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call. Its only caller is h_client_architecture_support() which handles
most of the rest of that process.
I believe it was placed in a separate file originally to handle some
fiddly dependencies between functions, but mostly it's just confusing
to have the CAS process split into two pieces like this. Now that
compose response is simplified (by just generating the whole device
tree anew), it's cleaner to just fold it into
h_client_architecture_support().
David Gibson [Fri, 18 Oct 2019 10:25:49 +0000 (21:25 +1100)]
spapr: Improve handling of fdt buffer size
Previously, spapr_build_fdt() constructed the device tree in a fixed
buffer of size FDT_MAX_SIZE. This is a bit inflexible, but more
importantly it's awkward for the case where we use it during CAS. In
that case the guest firmware supplies a buffer and we have to
awkwardly check that what we generated fits into it afterwards, after
doing a lot of size checks during spapr_build_fdt().
Simplify this by having spapr_build_fdt() take a 'space' parameter.
For the CAS case, we pass in the buffer size provided by SLOF, for the
machine init case, we continue to pass FDT_MAX_SIZE.
David Gibson [Fri, 18 Oct 2019 04:19:31 +0000 (15:19 +1100)]
spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover
PAPR allows the interrupt controller used on a POWER9 machine (XICS or
XIVE) to be selected by the guest operating system, by using the
ibm,client-architecture-support (CAS) feature negotiation call.
Currently, if the guest selects an interrupt controller different from the
one selected at initial boot, this causes the system to be reset with the
new model and the boot starts again. This means we run through the SLOF
boot process twice, as well as any other bootloader (e.g. grub) in use
before the OS calls CAS. This can be confusing and/or inconvenient for
users.
Thanks to two fairly recent changes, we no longer need this reboot. 1) we
now completely regenerate the device tree when CAS is called (meaning we
don't need special case updates for all the device tree changes caused by
the interrupt controller mode change), 2) we now have explicit code paths
to activate and deactivate the different interrupt controllers, rather than
just implicitly calling those at machine reset time.
We can therefore eliminate the reboot for changing irq mode, simply by
putting a call to spapr_irq_update_active_intc() before we call
spapr_h_cas_compose_response() (which gives the updated device tree to
the guest firmware and OS).
ppc: well form kvmppc_hint_smt_possible error hint helper
Make kvmppc_hint_smt_possible hint append helper well formed:
rename errp to errp_in, as it is IN-parameter here (which is unusual
for errp), rename function to be kvmppc_error_append_*_hint.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:20 +0000 (07:58 +0100)]
ppc/pnv: Dump the XIVE NVT table
This is useful to dump the saved contexts of the vCPUs : configuration
of the base END index of the vCPU and the Interrupt Pending Buffer
register, which is updated when an interrupt can not be presented.
When dumping the NVT table, we skip empty indirect pages which are not
necessarily allocated.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:18 +0000 (07:58 +0100)]
ppc/pnv: Introduce a pnv_xive_block_id() helper
When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID field
overrides the hardwired chip ID in the Powerbus operations and for CAM
compares. This is typically used in the one block-per-chip configuration
to associate a unique block id number to each IC of the system.
Simplify the model with a pnv_xive_block_id() helper and remove
'tctx_chipid' which becomes useless.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:17 +0000 (07:58 +0100)]
ppc/xive: Synthesize interrupt from the saved IPB in the NVT
When a vCPU is dispatched on a HW thread, its context is pushed in the
thread registers and it is activated by setting the VO bit in the CAM
line word2. The HW grabs the associated NVT, pulls the IPB bits and
merges them with the IPB of the new context. If interrupts were missed
while the vCPU was not dispatched, these are synthesized in this
sequence.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:14 +0000 (07:58 +0100)]
ppc/xive: Move the TIMA operations to the controller model
On the P9 Processor, the thread interrupt context registers of a CPU
can be accessed "directly" when by load/store from the CPU or
"indirectly" by the IC through an indirect TIMA page. This requires to
configure first the PC_TCTXT_INDIRx registers.
Today, we rely on the get_tctx() handler to deduce from the CPU PIR
the chip from which the TIMA access is being done. By handling the
TIMA memory ops under the interrupt controller model of each machine,
we can uniformize the TIMA direct and indirect ops under PowerNV. We
can also check that the CPUs have been enabled in the XIVE controller.
This prepares ground for the future versions of XIVE.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:13 +0000 (07:58 +0100)]
ppc/pnv: Clarify how the TIMA is accessed on a multichip system
The TIMA region gives access to the thread interrupt context registers
of a CPU. It is mapped at the same address on all chips and can be
accessed by any CPU of the system. To identify the chip from which the
access is being done, the PowerBUS uses a 'chip' field in the
load/store messages. QEMU does not model these messages, instead, we
extract the chip id from the CPU PIR and do a lookup at the machine
level to fetch the targeted interrupt controller.
Introduce pnv_get_chip() and pnv_xive_tm_get_xive() helpers to clarify
this process in pnv_xive_get_tctx(). The latter will be removed in the
subsequent patches but the same principle will be kept.
Greg Kurz [Tue, 26 Nov 2019 16:46:33 +0000 (17:46 +0100)]
spapr/xive: Configure number of servers in KVM
The XIVE KVM devices now has an attribute to configure the number of
interrupt servers. This allows to greatly optimize the usage of the VP
space in the XIVE HW, and thus to start a lot more VMs.
Only set this attribute if available in order to support older POWER9
KVM.
The XIVE KVM device now reports the exhaustion of VPs upon the
connection of the first VCPU. Check that in order to have a chance
to provide a hint to the user.
Greg Kurz [Tue, 26 Nov 2019 16:46:28 +0000 (17:46 +0100)]
spapr/xics: Configure number of servers in KVM
The XICS-on-XIVE KVM devices now has an attribute to configure the number
of interrupt servers. This allows to greatly optimize the usage of the VP
space in the XIVE HW, and thus to start a lot more VMs.
Only set this attribute if available in order to support older POWER9 KVM
and pre-POWER9 XICS KVM devices.
Greg Kurz [Tue, 26 Nov 2019 16:46:23 +0000 (17:46 +0100)]
spapr: Pass the maximum number of vCPUs to the KVM interrupt controller
The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce
their consumption of some scarce HW resources, namely Virtual Presenter
identifiers, if they know the maximum number of vCPUs that may run in the
VM.
Prepare ground for this by passing the value down to xics_kvm_connect()
and kvmppc_xive_connect(). This is purely mechanical, no functional
change.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:12 +0000 (07:58 +0100)]
ppc/xive: Extend the TIMA operation with a XivePresenter parameter
The TIMA operations are performed on behalf of the XIVE IVPE sub-engine
(Presenter) on the thread interrupt context registers. The current
operations supported by the model are simple and do not require access
to the controller but more complex operations will need access to the
controller NVT table and to its configuration.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:11 +0000 (07:58 +0100)]
ppc/xive: Use the XiveFabric and XivePresenter interfaces
Now that the machines have handlers implementing the XiveFabric and
XivePresenter interfaces, remove xive_presenter_match() and make use
of the 'match_nvt' handler of the machine.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:10 +0000 (07:58 +0100)]
ppc/spapr: Implement the XiveFabric interface
The CAM line matching sequence in the pseries machine does not change
much apart from the use of the new QOM interfaces. There is an extra
indirection because of the sPAPR IRQ backend of the machine. Only the
XIVE backend implements the new 'match_nvt' handler.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:08 +0000 (07:58 +0100)]
ppc/xive: Introduce a XiveFabric interface
The XiveFabric QOM interface acts as the PowerBUS interface between
the interrupt controller and the system and should be implemented by
the QEMU machine. On HW, the XIVE sub-engine is responsible for the
communication with the other chip is the Common Queue (CQ) bridge
unit.
This interface offers a 'match_nvt' handler to perform the CAM line
matching when looking for a XIVE Presenter with a dispatched NVT.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:07 +0000 (07:58 +0100)]
ppc/pnv: Fix TIMA indirect access
When the TIMA of a CPU needs to be accessed from the indirect page,
the thread id of the target CPU is first stored in the PC_TCTXT_INDIR0
register. This thread id is relative to the chip and not to the system.
Introduce a helper routine to look for a CPU of a given PIR and fix
pnv_xive_get_indirect_tctx() to scan only the threads of the local
chip and not the whole machine.
Greg Kurz [Mon, 25 Nov 2019 06:58:03 +0000 (07:58 +0100)]
ppc/pnv: Instantiate cores separately
Allocating a big void * array to store multiple objects isn't a
recommended practice for various reasons:
- no compile time type checking
- potential dangling pointers if a reference on an individual is
taken and the array is freed later on
- duplicate boiler plate everywhere the array is browsed through
Allocate an array of pointers and populate it instead.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:02 +0000 (07:58 +0100)]
ppc/xive: Implement the XivePresenter interface
Each XIVE Router model, sPAPR and PowerNV, now implements the 'match_nvt'
handler of the XivePresenter QOM interface. This is simply moving code
and taking into account the new API.
To be noted that the xive_router_get_tctx() helper is not used anymore
when doing CAM matching and will be removed later on after other changes.
The XIVE presenter model is still too simple for the PowerNV machine
and the CAM matching algo is not correct on multichip system. Subsequent
patches will introduce more changes to scan all chips of the system.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:01 +0000 (07:58 +0100)]
ppc/xive: Introduce a XivePresenter interface
When the XIVE IVRE sub-engine (XiveRouter) looks for a Notification
Virtual Target (NVT) to notify, it broadcasts a message on the
PowerBUS to find an XIVE IVPE sub-engine (Presenter) with the NVT
dispatched on one of its HW threads, and then forwards the
notification if any response was received.
The current XIVE presenter model is sufficient for the pseries machine
because it has a single interrupt controller device, but the PowerNV
machine can have multiple chips each having its own interrupt
controller. In this case, the XIVE presenter model is too simple and
the CAM line matching should scan all chips of the system.
To start fixing this issue, we first extend the XIVE Router model with
a new XivePresenter QOM interface representing the XIVE IVPE
sub-engine. This interface exposes a 'match_nvt' handler which the
sPAPR and PowerNV XIVE Router models will need to implement to perform
the CAM line matching.
Cédric Le Goater [Thu, 21 Nov 2019 16:23:40 +0000 (17:23 +0100)]
ppc/pnv: Create BMC devices at machine init
The BMC of the OpenPOWER systems monitors the machine state using
sensors, controls the power and controls the access to the PNOR flash
device containing the firmware image required to boot the host.
QEMU models the power cycle process, access to the sensors and access
to the PNOR device. But, for these features to be available, the QEMU
PowerNV machine needs two extras devices on the command line, an IPMI
BT device for communication and a BMC backend device:
The BMC properties are then defined accordingly in the device tree and
OPAL self adapts. If a BMC device and an IPMI BT device are not
available, OPAL does not try to communicate with the BMC in any
manner. This is not how real systems behave.
To be closer to the default behavior, create an IPMI BMC simulator
device and an IPMI BT device at machine initialization time. We loose
the ability to define an external BMC device but there are benefits:
- a better match with real systems,
- a better test coverage of the OPAL code,
- system powerdown and reset commands that work,
- a QEMU device tree compliant with the specifications (*).
Cédric Le Goater [Mon, 28 Oct 2019 07:00:27 +0000 (08:00 +0100)]
ppc/pnv: Add HIOMAP commands
This activates HIOMAP support on the QEMU PowerNV machine. The PnvPnor
model is used to access the flash contents. The model simply maps the
contents at a fix offset and enables or disables the mapping.