9pfs: make Error **errp const where it is appropriate
Mostly, Error ** is for returning error from the function, so the
callee sets it. However error_append_security_model_hint and
error_append_socket_sockfd_hint get already filled errp
parameter. They don't change the pointer itself, only change the
internal state of referenced Error object. So we can make it Error
*const * errp, to stress the behavior. It will also help coccinelle
script (in future) to distinguish such cases from common errp usage.
ppc: make Error **errp const where it is appropriate
Mostly, Error ** is for returning error from the function, so the
callee sets it. However kvmppc_hint_smt_possible gets already filled
errp parameter. It doesn't change the pointer itself, only change the
internal state of referenced Error object. So we can make it Error
*const * errp, to stress the behavior. It will also help coccinelle
script (in future) to distinguish such cases from common errp usage.
While there, rename the function to
kvmppc_error_append_smt_possible_hint().
qdev-monitor: make Error **errp const where it is appropriate
Mostly, Error ** is for returning error from the function, so the
callee sets it. However qbus_list_bus and qbus_list_dev get already
filled errp parameter. They don't change the pointer itself, only
change the internal state of referenced Error object. So we can make
it Error *const * errp, to stress the behavior. It will also help
coccinelle script (in future) to distinguish such cases from common
errp usage.
While there, rename the functions to
qbus_error_append_bus_list_hint(), qbus_error_append_dev_list_hint().
error: make Error **errp const where it is appropriate
Mostly, Error ** is for returning error from the function, so the
callee sets it. However these three functions get already filled errp
parameter. They don't change the pointer itself, only change the
internal state of referenced Error object. So we can make it
Error *const * errp, to stress the behavior. It will also help
coccinelle script (in future) to distinguish such cases from common
errp usage.
error: Clean up unusual names of Error * variables
Local Error * variables are conventionally named @err or @local_err,
and Error ** parameters @errp. Naming local variables like parameters
is confusing. Clean that up.
Naming parameters like local variables is also confusing. Left for
another day.
memory-device: Fix memory pre-plug error API violations
memory_device_get_free_addr() dereferences @errp when
memory_device_check_addable() fails. That's wrong; see the big
comment in error.h. Introduced in commit 1b6d6af21b "pc-dimm: factor
out capacity and slot checks into MemoryDevice".
No caller actually passes null.
Fix anyway: splice in a local Error *err, and error_propagate().
build_guest_fsinfo_for_virtual_device() dereferences @errp when
build_guest_fsinfo_for_device() fails. That's wrong; see the big
comment in error.h. Introduced in commit 46d4c5723e "qga: Add
guest-get-fsinfo command".
No caller actually passes null.
Fix anyway: splice in a local Error *err, and error_propagate().
isa_ipmi_bt_realize(), ipmi_isa_realize(), pci_ipmi_bt_realize(), and
pci_ipmi_kcs_realize() dereference @errp when IPMIInterfaceClass
method init() fails. That's wrong; see the big comment in error.h.
Introduced in commit 0719029c47 "ipmi: Add an ISA KCS low-level
interface", then imitated in commit a9b74079cb "ipmi: Add a BT
low-level interface" and commit 12f983c6aa "ipmi: Add PCI IPMI
interfaces".
No caller actually passes null.
Fix anyway: splice in a local Error *err, and error_propagate().
fit_load_fdt() passes @errp to fit_image_addr(), then recovers from
ENOENT failures. Passing @errp is wrong, because it works only as
long as @errp is neither @error_fatal nor @error_abort. Error
recovery dereferences @errp. That's also wrong; see the big comment
in error.h. Error recovery can leave *errp pointing to a freed
Error object. Wrong, it must be null on success. Messed up in
commit 3eb99edb48 "loader-fit: Wean off error_printf()".
No caller actually passes such values, or uses *errp on success.
Fix anyway: splice in a local Error *err, and error_propagate().
legacy_acpi_cpu_plug_cb() dereferences @errp when
acpi_set_cpu_present_bit() fails. That's wrong; see the big comment
in error.h. Introduced in commit cc43364de7 "acpi/cpu-hotplug:
introduce helper function to keep bit setting in one place".
No caller actually passes null, and acpi_set_cpu_present_bit() can't
actually fail.
Fix anyway: drop acpi_set_cpu_present_bit()'s @errp parameter.
When os_mem_prealloc() fails, file_ram_alloc() calls qemu_ram_munmap()
and returns null. Except it doesn't when its @errp argument is null,
because it checks for failure with (errp && *errp). Introduced in
commit 056b68af77 "fix qemu exit on memory hotplug when allocation
fails at prealloc time".
No caller actually passes null.
Fix anyway: splice in a local Error *err, and error_propagate().
qcrypto_tls_creds_load_cert() passes uninitialized GError *gerr by
reference to g_file_get_contents(). When g_file_get_contents() fails,
it'll try to set a GError. Unless @gerr is null by dumb luck, this
logs a ERROR_OVERWRITTEN_WARNING warning message and leaves @gerr
unchanged. qcrypto_tls_creds_load_cert() then dereferences the
uninitialized @gerr.
-msg parameter "timestamp" defaults to "off" if you don't specify msg,
and to "on" if you do. Messed up right in commit 5e2ac51917 "add
timestamp to error_report()". Mostly harmless, because "timestamp" is
the only parameter, so "if you do" is "-msg ''", which nobody does.
Change the default to "off" no matter what.
While there, rename enable_timestamp_msg to error_with_timestamp, and
polish documentation.
* remotes/huth-gitlab/tags/pull-request-2019-12-17:
tests: use g_test_rand_int
tests/Makefile: Fix check-report.* targets shown in check-help
glib: use portable g_setenv()
hw/misc/ivshmem: Bury dead legacy INTx code
pseries: disable migration-test if /dev/kvm cannot be used
tests: fix modules-test 'duplicate test case' error
Remove libbluetooth / bluez from the CI tests
Remove the core bluetooth code
hw/usb: Remove the USB bluetooth dongle device
hw/arm/nseries: Replace the bluetooth chardev with a "null" chardev
Peter Maydell [Tue, 17 Dec 2019 14:34:31 +0000 (14:34 +0000)]
Merge remote-tracking branch 'remotes/cleber/tags/python-next-pull-request' into staging
Python queue 2019-12-17
# gpg: Signature made Tue 17 Dec 2019 05:12:43 GMT
# gpg: using RSA key 7ABB96EB8B46B94D5E0FE9BB657E8D33A5F209F3
# gpg: Good signature from "Cleber Rosa <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7ABB 96EB 8B46 B94D 5E0F E9BB 657E 8D33 A5F2 09F3
* remotes/cleber/tags/python-next-pull-request:
python/qemu: Remove unneeded imports in __init__
python/qemu: accel: Add tcg_available() method
python/qemu: accel: Strengthen kvm_available() checks
python/qemu: accel: Add list_accel() method
python/qemu: Move kvm_available() to its own module
Acceptance tests: use relative location for tests
Acceptance tests: use avocado tags for machine type
Acceptance tests: introduce utility method for tags unique vals
Acceptance test x86_cpu_model_versions: use default vm
tests/acceptance: Makes linux_initrd and empty_cpu_model use QEMUMachine
python/qemu: Add set_qmp_monitor() to QEMUMachine
analyze-migration.py: replace numpy with python 3.2
analyze-migration.py: fix find() type error
Revert "Acceptance test: cancel test if m68k kernel packages goes missing"
tests/boot_linux_console: Fetch assets from Debian snapshot archives
* remotes/dgibson/tags/ppc-for-5.0-20191217: (88 commits)
pseries: Update SLOF firmware image
ppc/pnv: Drop PnvChipClass::type
ppc/pnv: Introduce PnvChipClass::xscom_pcba() method
ppc/pnv: Drop pnv_chip_is_power9() and pnv_chip_is_power10() helpers
ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom()
ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom()
ppc/pnv: Introduce PnvChipClass::xscom_core_base() method
ppc/pnv: Introduce PnvChipClass::intc_print_info() method
ppc/pnv: Drop pnv_is_power9() and pnv_is_power10() helpers
ppc/pnv: Introduce PnvMachineClass::dt_power_mgt()
ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat
ppc/pnv: Drop PnvPsiClass::chip_type
ppc/pnv: Introduce PnvPsiClass::compat
ppc: Drop useless extern annotation for functions
ppc/pnv: Fix OCC common area region mapping
ppc/pnv: Introduce PBA registers
ppc/pnv: Make PnvXScomInterface an incomplete type
ppc/pnv: populate the DT with realized XSCOM devices
ppc/pnv: Loop on the whole hierarchy to populate the DT with the XSCOM nodes
target/ppc: Add SPR TBU40
...
* remotes/ehabkost/tags/x86-next-pull-request:
i386: Use g_autofree in a few places
i386: Add new CPU model Cooperlake
i386: Add macro for stibp
i386: Add MSR feature bit for MDS-NO
Paolo Bonzini [Thu, 12 Dec 2019 01:17:58 +0000 (02:17 +0100)]
tests: use g_test_rand_int
g_test_rand_int provides a reproducible random integer number, using a
different number seed every time but allowing reproduction using the
--seed command line option. It is thus better suited to tests than
g_random_int or random.
tests/Makefile: Fix check-report.* targets shown in check-help
The check-report.html and check-report.xml targets were replaced
with check-report.tap in commit 9df43317b82 but the check-help
text was not updated so it still lists check-report.html.
Devices "ivshmem-plain" and "ivshmem-doorbell" support only MSI-X.
Config space register Interrupt Pin is zero. Device "ivshmem"
additionally supported legacy INTx, but it was removed in commit 5a0e75f0a9 "hw/misc/ivshmem: Remove deprecated "ivshmem" legacy
device". The commit left ivshmem_update_irq() behind. Since the
Interrupt Pin register is zero, the function does nothing. Remove it.
Laurent Vivier [Wed, 20 Nov 2019 17:09:55 +0000 (18:09 +0100)]
pseries: disable migration-test if /dev/kvm cannot be used
On ppc64, migration-test only works with kvm_hv, and we already
have a check to verify the module is loaded.
kvm_hv module can be loaded in memory and /sys/module/kvm_hv exists,
but on some systems (like build systems) /dev/kvm can be missing
(by administrators choice).
And as kvm_hv exists test-migration is started but QEMU falls back to
TCG because it cannot be used:
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator
And as the test is done with TCG, it fails.
As for s390x, we must check for the existence and the access rights
of /dev/kvm.
Thomas Huth [Wed, 20 Nov 2019 09:10:13 +0000 (10:10 +0100)]
Remove the core bluetooth code
It's been deprecated since QEMU v3.1. We've explicitly asked in the
deprecation message that people should speak up on qemu-devel in case
they are still actively using the bluetooth part of QEMU, but nobody
ever replied that they are really still using it.
I've tried it on my own to use this bluetooth subsystem for one of my
guests, but I was also not able to get it running anymore: When I was
trying to pass-through a real bluetooth device, either the guest did
not see the device at all, or the guest crashed.
Even worse for the emulated device: When running
qemu-system-x86_64 -bt device:keyboard
QEMU crashes once you hit a key.
So it seems like the bluetooth stack is not only neglected, it is
completely bitrotten, as far as I can tell. The only attention that
this code got during the past years were some CVEs that have been
spotted there. So this code is a burden for the developers, without
any real benefit anymore. Time to remove it.
Note: hw/bt/Kconfig only gets cleared but not removed here yet.
Otherwise there is a problem with the *-softmmu/config-devices.mak.d
dependency files - they still contain a reference to this file which
gets evaluated first on some build hosts, before the file gets
properly recreated. To avoid breaking these builders, we still need
the file around for some time. It will get removed in a couple of
weeks instead.
Alexey Kardashevskiy (12):
allocator: Fix format strings for DEBUG
virtio: Make virtio_set_qaddr static
client: Load initramdisk location
sloffs: Fix -Wunused-result gcc warnings in read/write
pci-phb: Reimplement dma-map-in/out
virtio: Store queue descriptors in virtio_device
virtio-net: Init queues after features negotiation
virtio: Enable IOMMU
ibm,client-architecture-support: Fix stack handling
fdt: Fix updating the tree at H_CAS
version: update to 20191206
version: update to 20191217
Michael Roth (1):
dma: Define default dma methods for using by client/package instances
The XSCOM bus is implemented with a QOM interface, which is mostly
generic from a CPU type standpoint, except for the computation of
addresses on the Pervasive Connect Bus (PCB) network. This is handled
by the pnv_xscom_pcba() function with a switch statement based on
the chip_type class level attribute of the CPU chip.
This can be achieved using QOM. Also the address argument is masked with
PNV_XSCOM_SIZE - 1, which is for POWER8 only. Addresses may have different
sizes with other CPU types. Have each CPU chip type handle the appropriate
computation with a QOM xscom_pcba() method.
Greg Kurz [Fri, 13 Dec 2019 12:00:24 +0000 (13:00 +0100)]
ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the
"compatible" property. Just pass the compat string and its size as
arguments.
Greg Kurz [Fri, 13 Dec 2019 12:00:18 +0000 (13:00 +0100)]
ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the "reg"
property. Just pass the base address and address size as arguments.
The pnv_chip_core_realize() function configures the XSCOM MMIO subregion
for each core of a single chip. The base address of the subregion depends
on the CPU type. Its computation is currently open-code using the
pnv_chip_is_powerXX() helpers. This can be achieved with QOM. Introduce
a method for this in the base chip class and implement it in child classes.
The pnv_pic_print_info() callback checks the type of the chip in order
to forward to the request appropriate interrupt controller. This can
be achieved with QOM. Introduce a method for this in the base chip class
and implement it in child classes.
This also prepares ground for the upcoming interrupt controller of POWER10
chips.
We add an extra node to advertise power management on some machines,
namely powernv9 and powernv10. This is achieved by using the
pnv_is_power9() and pnv_is_power10() helpers.
This can be achieved with QOM. Add a method to the base class for
powernv machines and have it implemented by machine types that
support power management instead.
Greg Kurz [Fri, 13 Dec 2019 11:59:50 +0000 (12:59 +0100)]
ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat
The pnv_dt_create() function generates different contents for the
"compatible" property of the root node in the DT, depending on the
CPU type. This is open coded with multiple ifs using pnv_is_powerXX()
helpers.
It seems cleaner to achieve with QOM. Introduce a base class for the
powernv machine and a compat attribute that each child class can use
to provide the value for the "compatible" property.
Greg Kurz [Fri, 13 Dec 2019 11:59:39 +0000 (12:59 +0100)]
ppc/pnv: Introduce PnvPsiClass::compat
The Processor Service Interface (PSI) model has a chip_type class level
attribute, which is used to generate the content of the "compatible" DT
property according to the CPU type.
Since the PSI model already has specialized classes for each supported
CPU type, it seems cleaner to achieve this with QOM. Provide the content
of the "compatible" property with a new class level attribute.
We could define a "OCC common area" memory region at the machine level
and sub regions for each OCC. But it adds some extra complexity to the
models. Fix the current layout with a simpler model.
Cédric Le Goater [Wed, 11 Dec 2019 08:29:11 +0000 (09:29 +0100)]
ppc/pnv: Introduce PBA registers
The PBA bridge unit (Power Bus Access) connects the OCC (On Chip
Controller) to the Power bus and System Memory. The PBA is used to
gather sensor data, for power management, for sleep states, for
initial boot, among other things.
The PBA logic provides a set of four registers PowerBus Access Base
Address Registers (PBABAR0..3) which map the OCC address space to the
PowerBus space. These registers are setup by the initial FW and define
the PowerBus Range of system memory that can be accessed by PBA.
The current modeling of the PBABAR registers is done under the common
XSCOM handlers. We introduce a specific XSCOM regions for these
registers and fix :
- BAR sizes and BAR masks
- The mapping of the OCC common area. It is common to all chips and
should be mapped once. We will address per-OCC area in the next
change.
- OCC common area is in BAR 3 on P8
Greg Kurz [Wed, 11 Dec 2019 16:04:15 +0000 (17:04 +0100)]
ppc/pnv: Make PnvXScomInterface an incomplete type
PnvXScomInterface is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety,
which is the common practice with QOM interfaces.
While here also convert the bogus OBJECT_CHECK() to INTERFACE_CHECK().
Cédric Le Goater [Tue, 10 Dec 2019 13:58:45 +0000 (14:58 +0100)]
ppc/pnv: populate the DT with realized XSCOM devices
Some devices could be initialized in the instance_init handler but not
realized for configuration reasons. Nodes should not be added in the DT
for such devices.
The Access Segment Descriptor Register (ASDR) provides information about
the storage element when taking a hypervisor storage interrupt. When
performing nested radix address translation, this is normally the guest
real address. This register is present on POWER9 processors and later.
Implement the ADSR, note read and write access is limited to the
hypervisor.
target/ppc: Work [S]PURR implementation and add HV support
The Processor Utilisation of Resources Register (PURR) and Scaled
Processor Utilisation of Resources Register (SPURR) provide an estimate
of the resources used by the thread, present on POWER7 and later
processors.
Currently the [S]PURR registers simply count at the rate of the
timebase.
Preserve this behaviour but rework the implementation to store an offset
like the timebase rather than doing the calculation manually. Also allow
hypervisor write access to the register along with the currently
available read access.
The virtual timebase register (VTB) is a 64-bit register which
increments at the same rate as the timebase register, present on POWER8
and later processors.
The register is able to be read/written by the hypervisor and read by
the supervisor. All other accesses are illegal.
Currently the VTB is just an alias for the timebase (TB) register.
Implement the VTB so that is can be read/written independent of the TB.
Make use of the existing method for accessing timebase facilities where
by the compensation is stored and used to compute the value on reads/is
updated on writes.
This includes in QEMU a new CPU model for the POWER10 processor with
the same capabilities of a POWER9 process. The model will be extended
when support is completed.
Greg Kurz [Mon, 9 Dec 2019 13:28:00 +0000 (14:28 +0100)]
ppc: Make PPCVirtualHypervisor an incomplete type
PPCVirtualHypervisor is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety, which
is the common practice with QOM interfaces.
Greg Kurz [Wed, 4 Dec 2019 19:43:48 +0000 (20:43 +0100)]
ppc: Don't use CPUPPCState::irq_input_state with modern Book3s CPU models
The power7_set_irq() and power9_set_irq() functions set this but it is
never used actually. Modern Book3s compatible CPUs are only supported
by the pnv and spapr machines. They have an interrupt controller, XICS
for POWER7/8 and XIVE for POWER9, whose models don't require to track
IRQ input states at the CPU level.
Greg Kurz [Wed, 4 Dec 2019 19:43:37 +0000 (20:43 +0100)]
ppc: Deassert the external interrupt pin in KVM on reset
When a CPU is reset, QEMU makes sure no interrupt is pending by clearing
CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a
complete machine emulation, eg. a sPAPR machine, an external interrupt
request could still be pending in KVM though, eg. an IPI. It will be
eventually presented to the guest, which is supposed to acknowledge it at
the interrupt controller. If the interrupt controller is emulated in QEMU,
either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt
pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters
the guest, the interrupt request is still pending and the vCPU will try
again to acknowledge it. This causes an infinite loop and eventually hangs
the guest.
The code has been broken since the beginning. The issue wasn't hit before
because accel=kvm,kernel-irqchip=off is an awkward setup that never got
used until recently with the LC92x IBM systems (aka, Boston).
Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert
the IRQ pins of the CPU in QEMU and most importantly the external interrupt
pin for this vCPU in KVM.
David Gibson [Fri, 29 Nov 2019 05:23:21 +0000 (16:23 +1100)]
spapr: Simplify ovec diff
spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.
It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement. So replace spapr_ovec_diff() with
spapr_ovec_subset().
David Gibson [Fri, 29 Nov 2019 04:00:58 +0000 (15:00 +1100)]
spapr: Fold h_cas_compose_response() into h_client_architecture_support()
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call. Its only caller is h_client_architecture_support() which handles
most of the rest of that process.
I believe it was placed in a separate file originally to handle some
fiddly dependencies between functions, but mostly it's just confusing
to have the CAS process split into two pieces like this. Now that
compose response is simplified (by just generating the whole device
tree anew), it's cleaner to just fold it into
h_client_architecture_support().
David Gibson [Fri, 18 Oct 2019 10:25:49 +0000 (21:25 +1100)]
spapr: Improve handling of fdt buffer size
Previously, spapr_build_fdt() constructed the device tree in a fixed
buffer of size FDT_MAX_SIZE. This is a bit inflexible, but more
importantly it's awkward for the case where we use it during CAS. In
that case the guest firmware supplies a buffer and we have to
awkwardly check that what we generated fits into it afterwards, after
doing a lot of size checks during spapr_build_fdt().
Simplify this by having spapr_build_fdt() take a 'space' parameter.
For the CAS case, we pass in the buffer size provided by SLOF, for the
machine init case, we continue to pass FDT_MAX_SIZE.
David Gibson [Fri, 18 Oct 2019 04:19:31 +0000 (15:19 +1100)]
spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover
PAPR allows the interrupt controller used on a POWER9 machine (XICS or
XIVE) to be selected by the guest operating system, by using the
ibm,client-architecture-support (CAS) feature negotiation call.
Currently, if the guest selects an interrupt controller different from the
one selected at initial boot, this causes the system to be reset with the
new model and the boot starts again. This means we run through the SLOF
boot process twice, as well as any other bootloader (e.g. grub) in use
before the OS calls CAS. This can be confusing and/or inconvenient for
users.
Thanks to two fairly recent changes, we no longer need this reboot. 1) we
now completely regenerate the device tree when CAS is called (meaning we
don't need special case updates for all the device tree changes caused by
the interrupt controller mode change), 2) we now have explicit code paths
to activate and deactivate the different interrupt controllers, rather than
just implicitly calling those at machine reset time.
We can therefore eliminate the reboot for changing irq mode, simply by
putting a call to spapr_irq_update_active_intc() before we call
spapr_h_cas_compose_response() (which gives the updated device tree to
the guest firmware and OS).
ppc: well form kvmppc_hint_smt_possible error hint helper
Make kvmppc_hint_smt_possible hint append helper well formed:
rename errp to errp_in, as it is IN-parameter here (which is unusual
for errp), rename function to be kvmppc_error_append_*_hint.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:20 +0000 (07:58 +0100)]
ppc/pnv: Dump the XIVE NVT table
This is useful to dump the saved contexts of the vCPUs : configuration
of the base END index of the vCPU and the Interrupt Pending Buffer
register, which is updated when an interrupt can not be presented.
When dumping the NVT table, we skip empty indirect pages which are not
necessarily allocated.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:18 +0000 (07:58 +0100)]
ppc/pnv: Introduce a pnv_xive_block_id() helper
When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID field
overrides the hardwired chip ID in the Powerbus operations and for CAM
compares. This is typically used in the one block-per-chip configuration
to associate a unique block id number to each IC of the system.
Simplify the model with a pnv_xive_block_id() helper and remove
'tctx_chipid' which becomes useless.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:17 +0000 (07:58 +0100)]
ppc/xive: Synthesize interrupt from the saved IPB in the NVT
When a vCPU is dispatched on a HW thread, its context is pushed in the
thread registers and it is activated by setting the VO bit in the CAM
line word2. The HW grabs the associated NVT, pulls the IPB bits and
merges them with the IPB of the new context. If interrupts were missed
while the vCPU was not dispatched, these are synthesized in this
sequence.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:14 +0000 (07:58 +0100)]
ppc/xive: Move the TIMA operations to the controller model
On the P9 Processor, the thread interrupt context registers of a CPU
can be accessed "directly" when by load/store from the CPU or
"indirectly" by the IC through an indirect TIMA page. This requires to
configure first the PC_TCTXT_INDIRx registers.
Today, we rely on the get_tctx() handler to deduce from the CPU PIR
the chip from which the TIMA access is being done. By handling the
TIMA memory ops under the interrupt controller model of each machine,
we can uniformize the TIMA direct and indirect ops under PowerNV. We
can also check that the CPUs have been enabled in the XIVE controller.
This prepares ground for the future versions of XIVE.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:13 +0000 (07:58 +0100)]
ppc/pnv: Clarify how the TIMA is accessed on a multichip system
The TIMA region gives access to the thread interrupt context registers
of a CPU. It is mapped at the same address on all chips and can be
accessed by any CPU of the system. To identify the chip from which the
access is being done, the PowerBUS uses a 'chip' field in the
load/store messages. QEMU does not model these messages, instead, we
extract the chip id from the CPU PIR and do a lookup at the machine
level to fetch the targeted interrupt controller.
Introduce pnv_get_chip() and pnv_xive_tm_get_xive() helpers to clarify
this process in pnv_xive_get_tctx(). The latter will be removed in the
subsequent patches but the same principle will be kept.
Greg Kurz [Tue, 26 Nov 2019 16:46:33 +0000 (17:46 +0100)]
spapr/xive: Configure number of servers in KVM
The XIVE KVM devices now has an attribute to configure the number of
interrupt servers. This allows to greatly optimize the usage of the VP
space in the XIVE HW, and thus to start a lot more VMs.
Only set this attribute if available in order to support older POWER9
KVM.
The XIVE KVM device now reports the exhaustion of VPs upon the
connection of the first VCPU. Check that in order to have a chance
to provide a hint to the user.
Greg Kurz [Tue, 26 Nov 2019 16:46:28 +0000 (17:46 +0100)]
spapr/xics: Configure number of servers in KVM
The XICS-on-XIVE KVM devices now has an attribute to configure the number
of interrupt servers. This allows to greatly optimize the usage of the VP
space in the XIVE HW, and thus to start a lot more VMs.
Only set this attribute if available in order to support older POWER9 KVM
and pre-POWER9 XICS KVM devices.
Greg Kurz [Tue, 26 Nov 2019 16:46:23 +0000 (17:46 +0100)]
spapr: Pass the maximum number of vCPUs to the KVM interrupt controller
The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce
their consumption of some scarce HW resources, namely Virtual Presenter
identifiers, if they know the maximum number of vCPUs that may run in the
VM.
Prepare ground for this by passing the value down to xics_kvm_connect()
and kvmppc_xive_connect(). This is purely mechanical, no functional
change.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:12 +0000 (07:58 +0100)]
ppc/xive: Extend the TIMA operation with a XivePresenter parameter
The TIMA operations are performed on behalf of the XIVE IVPE sub-engine
(Presenter) on the thread interrupt context registers. The current
operations supported by the model are simple and do not require access
to the controller but more complex operations will need access to the
controller NVT table and to its configuration.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:11 +0000 (07:58 +0100)]
ppc/xive: Use the XiveFabric and XivePresenter interfaces
Now that the machines have handlers implementing the XiveFabric and
XivePresenter interfaces, remove xive_presenter_match() and make use
of the 'match_nvt' handler of the machine.
Cédric Le Goater [Mon, 25 Nov 2019 06:58:10 +0000 (07:58 +0100)]
ppc/spapr: Implement the XiveFabric interface
The CAM line matching sequence in the pseries machine does not change
much apart from the use of the new QOM interfaces. There is an extra
indirection because of the sPAPR IRQ backend of the machine. Only the
XIVE backend implements the new 'match_nvt' handler.