Sergio Lopez [Fri, 4 Mar 2022 10:08:51 +0000 (11:08 +0100)]
event_notifier: add event_notifier_get_wfd()
event_notifier_get_fd(const EventNotifier *e) always returns
EventNotifier's read file descriptor (rfd). This is not a problem when
the EventNotifier is backed by a an eventfd, as a single file
descriptor is used both for reading and triggering events (rfd ==
wfd).
But, when EventNotifier is backed by a pipe pair, we have two file
descriptors, one that can only be used for reads (rfd), and the other
only for writes (wfd).
There's, at least, one known situation in which we need to obtain wfd
instead of rfd, which is when setting up the file that's going to be
sent to the peer in vhost's SET_VRING_CALL.
Add a new event_notifier_get_wfd(const EventNotifier *e) that can be
used to obtain wfd where needed.
Igor Mammedov [Tue, 22 Feb 2022 10:25:04 +0000 (05:25 -0500)]
pci: drop COMPAT_PROP_PCP for 2.0 machine types
COMPAT_PROP_PCP is 'on' by default and it's used for turning
off PCP capability on PCIe slots for 2.0 machine types using
compat machinery.
Drop not needed compat glue as Q35 supports migration starting
from 2.4 machine types.
Igor Mammedov [Mon, 28 Feb 2022 13:16:34 +0000 (08:16 -0500)]
x86: cleanup unused compat_apic_id_mode
commit f862ddbb1a4 (hw/i386: Remove the deprecated pc-1.x machine types)
removed the last user of broken APIC ID compat knob,
but compat_apic_id_mode itself was forgotten.
Clean it up and simplify x86_cpu_apic_id_from_index()
vhost-vsock: detach the virqueue element in case of error
In vhost_vsock_common_send_transport_reset(), if an element popped from
the virtqueue is invalid, we should call virtqueue_detach_element() to
detach it from the virtqueue before freeing its memory.
Joelle van Dyne [Sun, 27 Feb 2022 21:06:55 +0000 (13:06 -0800)]
pc: add option to disable PS/2 mouse/keyboard
On some older software like Windows 7 installer, having both a PS/2
mouse and USB mouse results in only one device working property (which
might be a different device each boot). While the workaround to not use
a USB mouse with such software is valid, it creates an inconsistent
experience if the user wishes to always use a USB mouse.
This introduces a new machine property to inhibit the creation of the
i8042 PS/2 controller.
Igor Mammedov [Tue, 1 Mar 2022 15:11:59 +0000 (10:11 -0500)]
acpi: pcihp: pcie: set power on cap on parent slot
on creation a PCIDevice has power turned on at the end of pci_qdev_realize()
however later on if PCIe slot isn't populated with any children
it's power is turned off. It's fine if native hotplug is used
as plug callback will power slot on among other things.
However when ACPI hotplug is enabled it replaces native PCIe plug
callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and
as result slot stays powered off. It works fine as ACPI hotplug
on guest side takes care of enumerating/initializing hotplugged
device. But when later guest is migrated, call chain introduced by]
commit d5daff7d312 (pcie: implement slot power control for pcie root ports)
will disable earlier initialized BARs for the hotplugged device
in powered off slot due to commit 23786d13441 (pci: implement power state)
which disables BARs if power is off.
Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON
on slot (root port/downstream port) at the time a device
hotplugged into it. As result PCI_EXP_SLTCTL_PWR_ON is migrated
to target and above call chain keeps device plugged into it
powered on.
Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports") Fixes: 23786d13441 ("pci: implement power state") Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584 Suggested-by: "Michael S. Tsirkin" <[email protected]> Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <20220301151200.3507298[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
zhenwei pi [Mon, 21 Feb 2022 12:27:17 +0000 (20:27 +0800)]
hw/misc/pvpanic: Use standard headers instead
QEMU side has already imported pvpanic.h from linux, remove bit
definitions from include/hw/misc/pvpanic.h, and use
include/standard-headers/linux/pvpanic.h instead.
Also minor changes for PVPANIC_CRASHLOADED -> PVPANIC_CRASH_LOADED.
Eugenio Pérez [Thu, 17 Feb 2022 17:50:29 +0000 (18:50 +0100)]
virtio-net: Unlimit tx queue size if peer is vdpa
The code used to limit the maximum size of tx queue for others backends
than vhost_user since the introduction of configurable tx queue size in 9b02e1618cf2 ("virtio-net: enable configurable tx queue size").
As vhost_user, vhost_vdpa devices should deal with memory region
crosses already, so let's use the full tx size.
Jonathan Cameron [Tue, 18 Jan 2022 17:48:55 +0000 (17:48 +0000)]
hw/pci-bridge/pxb: Fix missing swizzle
pxb_map_irq_fn() handled the necessary removal of the swizzle
applied to the PXB interrupts by the bus to which it was attached
but neglected to apply the normal swizzle for PCI root ports
on the expander bridge.
Result of this was on ARM virt, the PME interrupts for a second
RP on a PXB instance were miss-routed to #45 rather than #46.
Tested with a selection of different configurations with 1 to 5
RP per PXB instance. Note on my x86 test setup the PME interrupts
are not triggered so I haven't been able to test this.
Thomas Huth [Mon, 17 Jan 2022 19:16:39 +0000 (20:16 +0100)]
hw/i386/pc_piix: Mark the machine types from version 1.4 to 1.7 as deprecated
The list of machine types grows larger and larger each release ... and
it is unlikely that many people still use the very old ones for live
migration. QEMU v1.7 has been released more than 8 years ago, so most
people should have updated their machines to a newer version in those
8 years at least once. Thus let's mark the very old 1.x machine types
as deprecated now.
The driver can create a bypass domain by passing the
VIRTIO_IOMMU_ATTACH_F_BYPASS flag on the ATTACH request. Bypass domains
perform slightly better than domains with identity mappings since they
skip translation.
Currently the virtio-iommu device must be programmed before it allows
DMA from any PCI device. This can make the VM entirely unusable when a
virtio-iommu driver isn't present, for example in a bootloader that
loads the OS from storage.
Similarly to the other vIOMMU implementations, default to DMA bypassing
the IOMMU during boot. Add a "boot-bypass" property, defaulting to true,
that lets users change this behavior.
Replace the VIRTIO_IOMMU_F_BYPASS feature, which didn't support bypass
before feature negotiation, with VIRTIO_IOMMU_F_BYPASS_CONFIG.
We add the bypass field to the migration stream without introducing
subsections, based on the assumption that this virtio-iommu device isn't
being used in production enough to require cross-version migration at
the moment (all previous version required workarounds since they didn't
support ACPI and boot-bypass).
Dov Murik [Tue, 22 Feb 2022 07:19:06 +0000 (07:19 +0000)]
hw/i386: Replace magic number with field length calculation
Replce the literal magic number 48 with length calculation (32 bytes at
the end of the firmware after the table footer + 16 bytes of the OVMF
table footer GUID).
Dov Murik [Tue, 22 Feb 2022 07:19:05 +0000 (07:19 +0000)]
hw/i386: Improve bounds checking in OVMF table parsing
When pc_system_parse_ovmf_flash() parses the optional GUIDed table in
the end of the OVMF flash memory area, the table length field is checked
for sizes that are too small, but doesn't error on sizes that are too
big (bigger than the flash content itself).
Add a check for maximal size of the OVMF table, and add an error report
in case the size is invalid. In such a case, an error like this will be
displayed during launch:
qemu-system-x86_64: OVMF table has invalid size 4047
Jason Wang [Mon, 14 Feb 2022 06:03:46 +0000 (14:03 +0800)]
intel_iommu: support snoop control
SC is required for some kernel features like vhost-vDPA. So this patch
implements basic SC feature. The idea is pretty simple, for software
emulated DMA it would be always coherent. In this case we can simple
advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
complicated, so this patch simply fail the IOMMU notifier
registration.
In the future, we may want to have a dedicated notifiers flag or
similar mechanism to demonstrate the coherency so VFIO could advertise
that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
need that since it's a software backend.
Laurent Vivier [Fri, 11 Feb 2022 16:13:09 +0000 (17:13 +0100)]
vhost-vdpa: make notifiers _init()/_uninit() symmetric
vhost_vdpa_host_notifiers_init() initializes queue notifiers
for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs",
whereas vhost_vdpa_host_notifiers_uninit() uninitializes the
same notifiers for queue "0" to queue "dev->nvqs".
This asymmetry seems buggy, fix that by using dev->vq_index
as the base for both.
Laurent Vivier [Fri, 11 Feb 2022 17:02:59 +0000 (18:02 +0100)]
hw/virtio: vdpa: Fix leak of host-notifier memory-region
If call virtio_queue_set_host_notifier_mr fails, should free
host-notifier memory-region.
This problem can trigger a coredump with some vDPA drivers (mlx5,
but not with the vdpasim), if we unplug the virtio-net card from
the guest after a stop/start.
The same fix has been done for vhost-user: 1f89d3b91e3e ("hw/virtio: Fix leak of host-notifier memory-region")
Halil Pasic [Mon, 7 Feb 2022 11:28:57 +0000 (12:28 +0100)]
virtio: fix the condition for iommu_platform not supported
The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported") claims to fail the device hotplug when iommu_platform
is requested, but not supported by the (vhost) device. On the first
glance the condition for detecting that situation looks perfect, but
because a certain peculiarity of virtio_platform it ain't.
In fact the aforementioned commit introduces a regression. It breaks
virtio-fs support for Secure Execution, and most likely also for AMD SEV
or any other confidential guest scenario that relies encrypted guest
memory. The same also applies to any other vhost device that does not
support _F_ACCESS_PLATFORM.
The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
"device can not access all of the guest RAM" and "iova != gpa, thus
device needs to translate iova".
Confidential guest technologies currently rely on the device/hypervisor
offering _F_ACCESS_PLATFORM, so that, after the feature has been
negotiated, the guest grants access to the portions of memory the
device needs to see. So in for confidential guests, generally,
_F_ACCESS_PLATFORM is about the restricted access to memory, but not
about the addresses used being something else than guest physical
addresses.
This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
turn on VIRTIO_F_IOMMU_PLATFORM") fences _F_ACCESS_PLATFORM from the
vhost device that does not need it, because on the vhost interface it
only means "I/O address translation is needed".
This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
by the device, and thus no device capability is needed. In this
situation claiming that the device does not support iommu_plattform=on
is counter-productive. So let us stop doing that!
Xueming Li [Mon, 7 Feb 2022 07:19:29 +0000 (15:19 +0800)]
vhost-user: fix VirtQ notifier cleanup
When vhost-user device cleanup, remove notifier MR and munmaps notifier
address in the event-handling thread, VM CPU thread writing the notifier
in concurrent fails with an error of accessing invalid address. It
happens because MR is still being referenced and accessed in another
thread while the underlying notifier mmap address is being freed and
becomes invalid.
This patch calls RCU and munmap notifiers in the callback after the
memory flatview update finish.
Xueming Li [Mon, 7 Feb 2022 07:19:28 +0000 (15:19 +0800)]
vhost-user: remove VirtQ notifier restore
Notifier set when vhost-user backend asks qemu to mmap an FD and
offset. When vhost-user backend restart or getting killed, VQ notifier
FD and mmap addresses become invalid. After backend restart, MR contains
the invalid address will be restored and fail on notifier access.
On the other hand, qemu should munmap the notifier, release underlying
hardware resources to enable backend restart and allocate hardware
notifier resources correctly.
Qemu shouldn't reference and use resources of disconnected backend.
This patch removes VQ notifier restore, uses the default vhost-user
notifier to avoid invalid address access.
After backend restart, the backend should ask qemu to install a hardware
notifier if needed.
Ani Sinha [Wed, 23 Feb 2022 14:33:22 +0000 (20:03 +0530)]
hw/smbios: add assertion to ensure handles of tables 19 and 32 do not collide
Since change dcf359832eec02 ("hw/smbios: fix table memory corruption with large memory vms")
we reserve additional space between handle numbers of tables 17 and 19 for
large VMs. This may cause table 19 to collide with table 32 in their handle
numbers for those large VMs. This change adds an assertion to ensure numbers
do not collide. If they do, qemu crashes with useful debug information for
taking additional steps.
Ani Sinha [Wed, 23 Feb 2022 14:33:21 +0000 (20:03 +0530)]
hw/smbios: fix overlapping table handle numbers with large memory vms
The current smbios table implementation splits the main memory in 16 GiB
(DIMM like) chunks. With the current smbios table assignment code, we can have
only 512 such chunks before the 16 bit handle numbers in the header for tables
17 and 19 conflict. A guest with more than 8 TiB of memory will hit this
limitation and would fail with the following assertion in isa-debugcon:
This change adds an additional offset between tables 17 and 19 handle numbers
when configuring VMs larger than 8 TiB of memory. The value of the offset is
calculated to be equal to the additional space required to be reserved
in order to accomodate more DIMM entries without the table handles colliding.
In normal cases where the VM memory is smaller or equal to 8 TiB, this offset
value is 0. Hence in this case, no additional handle numbers are reserved and
table handle values remain as before.
Since smbios memory is not transmitted over the wire during migration,
this change can break migration for large memory vms if the guest is in the
middle of generating the tables during migration. However, in those
situations, qemu generates invalid table handles anyway with or without this
fix. Hence, we do not preserve the old bug by introducing compat knobs/machine
types.
Ani Sinha [Wed, 23 Feb 2022 14:33:20 +0000 (20:03 +0530)]
hw/smbios: code cleanup - use macro definitions for table header handles
This is a minor cleanup. Using macro definitions makes the code more
readable. It is at once clear which tables use which handle numbers in their
header. It also makes it easy to calculate the gaps between the numbers and
update them if needed.
QOM reference counting is not designed with an infinite amount of
references in mind, trying to take a reference in a loop without
dropping a reference will overflow the integer.
It is generally a symptom of a reference leak (a missing deref, commonly
as part of error handling - such as one fixed here:
https://lore.kernel.org/r/20220228095058.27899-1-sgarzare%40redhat.com ).
All this can lead to either freeing the object too early (memory
corruption) or never freeing it (memory leak).
If we happen to dereference at just the right time (when it's wrapping
around to 0), we might eventually assert when dereferencing, but the
real problem is an extra object_ref so let's assert there to make such
issues cleaner and easier to debug.
Some micro-benchmarking shows using fetch and add this is essentially
free on x86.
Since multiple threads could be incrementing in parallel, we assert
around INT_MAX to make sure none of these approach the wrap around
point: this way we get a memory leak and not a memory corruption, the
former is generally easier to debug.
* remotes/pmaydell/tags/pull-target-arm-20220302: (26 commits)
ui/cocoa.m: Remove unnecessary NSAutoreleasePools
ui/cocoa.m: Fix updateUIInfo threading issues
target/arm: Report KVM's actual PSCI version to guest in dtb
target/arm: Implement FEAT_LPA2
target/arm: Advertise all page sizes for -cpu max
target/arm: Validate tlbi TG matches translation granule in use
target/arm: Fix TLBIRange.base for 16k and 64k pages
target/arm: Introduce tlbi_aa64_get_range
target/arm: Extend arm_fi_to_lfsc to level -1
target/arm: Implement FEAT_LPA
target/arm: Implement FEAT_LVA
target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
target/arm: Honor TCR_ELx.{I}PS
target/arm: Use MAKE_64BIT_MASK to compute indexmask
target/arm: Pass outputsize down to check_s2_mmu_setup
target/arm: Move arm_pamax out of line
target/arm: Fault on invalid TCR_ELx.TxSZ
target/arm: Set TCR_EL1.TSZ for user-only
hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
tests/qtest: add qtests for npcm7xx sdhci
...
Peter Maydell [Wed, 2 Mar 2022 20:55:48 +0000 (20:55 +0000)]
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-migration-20220302b' into staging
Migration/HMP/Virtio pull 2022-03-02
A bit of a mix this time:
* Minor fixes from myself, Hanna, and Jack
* VNC password rework by Stefan and Fabian
* Postcopy changes from Peter X that are
the start of a larger series to come
* Removing the prehistoic load_state_old
code from Peter M
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
# gpg: Signature made Wed 02 Mar 2022 18:25:12 GMT
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert-gitlab/tags/pull-migration-20220302b:
migration: Remove load_state_old and minimum_version_id_old
tests: Pass in MigrateStart** into test_migrate_start()
migration: Add migration_incoming_transport_cleanup()
migration: postcopy_pause_fault_thread() never fails
migration: Enlarge postcopy recovery to capture !-EIO too
migration: Move static var in ram_block_from_stream() into global
migration: Add postcopy_thread_create()
migration: Dump ramblock and offset too when non-same-page detected
migration: Introduce postcopy channels on dest node
migration: Tracepoint change in postcopy-run bottom half
migration: Finer grained tracepoints for POSTCOPY_LISTEN
migration: Dump sub-cmd name in loadvm_process_command tp
migration/rdma: set the REUSEADDR option for destination
qapi/monitor: allow VNC display id in set/expire_password
qapi/monitor: refactor set/expire_password with enums
monitor/hmp: add support for flag argument with value
virtiofsd: Let meson check for statx.stx_mnt_id
clock-vmstate: Add missing END_OF_LIST
Peter Maydell [Thu, 24 Feb 2022 10:13:30 +0000 (10:13 +0000)]
ui/cocoa.m: Remove unnecessary NSAutoreleasePools
In commit 6e657e64cdc478 in 2013 we added some autorelease pools to
deal with complaints from macOS when we made calls into Cocoa from
threads that didn't have automatically created autorelease pools.
Later on, macOS got stricter about forbidding cross-thread Cocoa
calls, and in commit 5588840ff77800e839d8 we restructured the code to
avoid them. This left the autorelease pool creation in several
functions without any purpose; delete it.
We still need the pool in cocoa_refresh() for the clipboard related
code which is called directly there.
Peter Maydell [Thu, 24 Feb 2022 10:13:29 +0000 (10:13 +0000)]
ui/cocoa.m: Fix updateUIInfo threading issues
The updateUIInfo method makes Cocoa API calls. It also calls back
into QEMU functions like dpy_set_ui_info(). To do this safely, we
need to follow two rules:
* Cocoa API calls are made on the Cocoa UI thread
* When calling back into QEMU we must hold the iothread lock
Fix the places where we got this wrong, by taking the iothread lock
while executing updateUIInfo, and moving the call in cocoa_switch()
inside the dispatch_async block.
Some of the Cocoa UI methods which call updateUIInfo are invoked as
part of the initial application startup, while we're still doing the
little cross-thread dance described in the comment just above
call_qemu_main(). This meant they were calling back into the QEMU UI
layer before we'd actually finished initializing our display and
registered the DisplayChangeListener, which isn't really valid. Once
updateUIInfo takes the iothread lock, we no longer get away with
this, because during this startup phase the iothread lock is held by
the QEMU main-loop thread which is waiting for us to finish our
display initialization. So we must suppress updateUIInfo until
applicationDidFinishLaunching allows the QEMU main-loop thread to
continue.
Peter Maydell [Thu, 24 Feb 2022 13:46:54 +0000 (13:46 +0000)]
target/arm: Report KVM's actual PSCI version to guest in dtb
When we're using KVM, the PSCI implementation is provided by the
kernel, but QEMU has to tell the guest about it via the device tree.
Currently we look at the KVM_CAP_ARM_PSCI_0_2 capability to determine
if the kernel is providing at least PSCI 0.2, but if the kernel
provides a newer version than that we will still only tell the guest
it has PSCI 0.2. (This is fairly harmless; it just means the guest
won't use newer parts of the PSCI API.)
The kernel exposes the specific PSCI version it is implementing via
the ONE_REG API; use this to report in the dtb that the PSCI
implementation is 1.0-compatible if appropriate. (The device tree
binding currently only distinguishes "pre-0.2", "0.2-compatible" and
"1.0-compatible".)
This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
4k or 16k pages.
This introduces the DS bit to TCR_ELx, which is RES0 unless the
page size is enabled and supports LPA2, resulting in the effective
value of DS for a given table walk. The DS bit changes the format
of the page table descriptor slightly, moving the PS field out to
TCR so that all pages have the same sharability and repurposing
those bits of the page table descriptor for the highest bits of
the output address.
Do not yet enable FEAT_LPA2; we need extra plumbing to avoid
tickling an old kernel bug.
We support 16k pages, but do not advertize that in ID_AA64MMFR0.
The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
to the same support as stage1 lookups. This setting is deprecated, so
indicate support for all stage2 page sizes directly.
target/arm: Validate tlbi TG matches translation granule in use
For FEAT_LPA2, we will need other ARMVAParameters, which themselves
depend on the translation granule in use. We might as well validate
that the given TG matches; the architecture "does not require that
the instruction invalidates any entries" if this is not true.
Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
returning a structure containing both results. Pass in the
ARMMMUIdx, rather than the digested two_ranges boolean.
This is in preparation for FEAT_LPA2, where the interpretation
of 'value' depends on the effective value of DS for the regime.
With FEAT_LPA2, rather than introducing translation level 4,
we introduce level -1, below the current level 0. Extend
arm_fi_to_lfsc to handle these faults.
Assert that this new translation level does not leak into
fault types for which it is not defined, which allows some
masking of fi->level to be removed.
This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
64k pages. The only thing left at this point is to handle the
extra bits in the TTBR and in the table descriptors.
Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
mask out the high bits when writing to those registers, so no changes
are required there.
This feature is relatively small, as it applies only to
64k pages and thus requires no additional changes to the
table descriptor walking algorithm, only a change to the
minimum TSZ (which is the inverse of the maximum virtual
address space size).
Note that this feature widens VBAR_ELx, but we already
treat the register as being 64 bits wide.
target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
The original A.a revision of the AArch64 ARM required that we
force-extend the addresses in these registers from 49 bits.
This language has been loosened via a combination of IMPLEMENTATION
DEFINED and CONSTRAINTED UNPREDICTABLE to allow consideration of
the entire aligned address.
This means that we do not have to consider whether or not FEAT_LVA
is enabled, and decide from which bit an address might need to be
extended.
This field controls the output (intermediate) physical address size
of the translation process. V8 requires to raise an AddressSize
fault if the page tables are programmed incorrectly, such that any
intermediate descriptor address, or the final translated address,
is out of range.
Add a PS field to ARMVAParameters, and properly compute outputsize
in get_phys_addr_lpae. Test the descaddr as extracted from TTBR
and from page table entries.
Restrict descaddrmask so that we won't raise the fault for v7.
target/arm: Pass outputsize down to check_s2_mmu_setup
Pass down the width of the output address from translation.
For now this is still just PAMax, but a subsequent patch will
compute the correct value from TCR_ELx.{I}PS.
Without FEAT_LVA, the behaviour of programming an invalid value
is IMPLEMENTATION DEFINED. With FEAT_LVA, programming an invalid
minimum value requires a Translation fault.
It is most self-consistent to choose to generate the fault always.
Wentao_Liang [Fri, 25 Feb 2022 04:01:42 +0000 (12:01 +0800)]
target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
handle_simd_shift_fpint_conv() was accidentally freeing the TCG
temporary tcg_fpstatus too early, before the last use of it. Move
the free down to where it belongs.
Akihiko Odaki [Sun, 13 Feb 2022 03:57:53 +0000 (12:57 +0900)]
target/arm: Support PSCI 1.1 and SMCCC 1.0
Support the latest PSCI on TCG and HVF. A 64-bit function called from
AArch32 now returns NOT_SUPPORTED, which is necessary to adhere to SMC
Calling Convention 1.0. It is still not compliant with SMCCC 1.3 since
they do not implement mandatory functions.
Peter Maydell [Mon, 21 Feb 2022 14:07:50 +0000 (14:07 +0000)]
hw/input/tsc210x: Don't abort on bad SPI word widths
The tsc210x doesn't support anything other than 16-bit reads on the
SPI bus, but the guest can program the SPI controller to attempt
them anyway. If this happens, don't abort QEMU, just log this as
a guest error.
This fixes our machine_arm_n8x0.py:N8x0Machine.test_n800
acceptance test, which hits this assertion.
The reason we hit the assertion is because the guest kernel thinks
there is a TSC2005 on this SPI bus address, not a TSC210x. (The n810
*does* have a TSC2005 at this address.) The TSC2005 supports the
24-bit accesses which the guest driver makes, and the TSC210x does
not (that is, our TSC210x emulation is not missing support for a word
width the hardware can handle). It's not clear whether the problem
here is that the guest kernel incorrectly thinks the n800 has the
same device at this SPI bus address as the n810, or that QEMU's n810
board model doesn't get the SPI devices right. At this late date
there no longer appears to be any reliable information on the web
about the hardware behaviour, but I am inclined to think this is a
guest kernel bug. In any case, we prefer not to abort QEMU for
guest-triggerable conditions, so logging the error is the right thing
to do.
Peter Maydell [Mon, 21 Feb 2022 09:41:44 +0000 (09:41 +0000)]
hw/arm/mps2-tz.c: Update AN547 documentation URL
The AN547 application note URL has changed: update our comment
accordingly. (Rev B is still downloadable from the old URL,
but there is a new Rev C of the document now.)
Jimmy Brisson [Thu, 10 Feb 2022 21:02:27 +0000 (15:02 -0600)]
mps3-an547: Add missing user ahb interfaces
With these interfaces missing, TFM would delegate peripherals 0, 1,
2, 3 and 8, and qemu would ignore the delegation of interface 8, as
it thought interface 4 was eth & USB.
This patch corrects this behavior and allows TFM to delegate the
eth & USB peripheral to NS mode.
(The old QEMU behaviour was based on revision B of the AN547
appnote; revision C corrects this error in the documentation,
and this commit brings QEMU in to line with how the FPGA
image really behaves.)
Peter Maydell [Tue, 15 Feb 2022 17:57:05 +0000 (17:57 +0000)]
migration: Remove load_state_old and minimum_version_id_old
There are no longer any VMStateDescription structs in the tree which
use the load_state_old support for custom handling of incoming
migration from very old QEMU. Remove the mechanism entirely.
This includes removing one stray useless setting of
minimum_version_id_old in a VMStateDescription with no load_state_old
function, which crept in after the global weeding-out of them in
commit 17e313406126.
Peter Xu [Tue, 1 Mar 2022 08:39:25 +0000 (16:39 +0800)]
tests: Pass in MigrateStart** into test_migrate_start()
test_migrate_start() will release the MigrateStart structure that passed
in, however that's not super clear to the caller because after the call
returned the pointer can still be referenced by the callers. It can easily
be a source of use-after-free.
Let's pass in a double pointer of that, then we can safely clear the
pointer for the caller after the struct is released.
When do it, we should also null-ify the cleanup hook and the data, then it's
even safe to call it multiple times.
Move the socket_address_list cleanup altogether, because that's a mirror of the
listener channels and only for the purpose of query-migrate. Hence when
someone wants to cleanup the listener transport, it should also want to cleanup
the socket list too, always.
Peter Xu [Tue, 1 Mar 2022 08:39:10 +0000 (16:39 +0800)]
migration: Enlarge postcopy recovery to capture !-EIO too
We used to have quite a few places making sure -EIO happened and that's the
only way to trigger postcopy recovery. That's based on the assumption that
we'll only return -EIO for channel issues.
It'll work in 99.99% cases but logically that won't cover some corner cases.
One example is e.g. ram_block_from_stream() could fail with an interrupted
network, then -EINVAL will be returned instead of -EIO.
I remembered Dave Gilbert pointed that out before, but somehow this is
overlooked. Neither did I encounter anything outside the -EIO error.
However we'd better touch that up before it triggers a rare VM data loss during
live migrating.
To cover as much those cases as possible, remove the -EIO restriction on
triggering the postcopy recovery, because even if it's not a channel failure,
we can't do anything better than halting QEMU anyway - the corpse of the
process may even be used by a good hand to dig out useful memory regions, or
the admin could simply kill the process later on.
Peter Xu [Tue, 1 Mar 2022 08:39:06 +0000 (16:39 +0800)]
migration: Add postcopy_thread_create()
Postcopy create threads. A common manner is we init a sem and use it to sync
with the thread. Namely, we have fault_thread_sem and listen_thread_sem and
they're only used for this.
Make it a shared infrastructure so it's easier to create yet another thread.
Peter Xu [Tue, 1 Mar 2022 08:39:05 +0000 (16:39 +0800)]
migration: Dump ramblock and offset too when non-same-page detected
In ram_load_postcopy() we'll try to detect non-same-page case and dump error.
This error is very helpful for debugging. Adding ramblock & offset into the
error log too.
Peter Xu [Tue, 1 Mar 2022 08:39:04 +0000 (16:39 +0800)]
migration: Introduce postcopy channels on dest node
Postcopy handles huge pages in a special way that currently we can only have
one "channel" to transfer the page.
It's because when we install pages using UFFDIO_COPY, we need to have the whole
huge page ready, it also means we need to have a temp huge page when trying to
receive the whole content of the page.
Currently all maintainance around this tmp page is global: firstly we'll
allocate a temp huge page, then we maintain its status mostly within
ram_load_postcopy().
To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caching, one for each channel.
Meanwhile we need to maintain the tmp huge page status per-channel too.
To give some example, some local variables maintained in ram_load_postcopy()
are listed; they are responsible for maintaining temp huge page status:
- all_zero: this keeps whether this huge page contains all zeros
- target_pages: this counts how many target pages have been copied
- host_page: this keeps the host ptr for the page to install
Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage. Then for each (future) postcopy channel, we
need one structure to keep the state around.
For vanilla postcopy, obviously there's only one channel. It contains both
precopy and postcopy pages.
This patch teaches the dest migration node to start realize the possible number
of postcopy channels by introducing the "postcopy_channels" variable. Its
value is calculated when setup postcopy on dest node (during POSTCOPY_LISTEN
phase).
Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd). In this patch there's a TODO marked
for that; so far the channels is always set to 1.
We need to send one "host huge page" on one channel only and we cannot split
them, because otherwise the data upon the same huge page can locate on more
than one channel so we need more complicated logic to manage. One temp host
huge page for each channel will be enough for us for now.
Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for the latter patches where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).
Jack Wang [Tue, 8 Feb 2022 08:56:40 +0000 (09:56 +0100)]
migration/rdma: set the REUSEADDR option for destination
We hit following error during testing RDMA transport:
in case of migration error, mgmt daemon pick one migration port,
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr
Then try another -incoming rdma:[::]:8103, sometime it worked,
sometimes need another try with other ports number.
Set the REUSEADDR option for destination, This allow address could
be reused to avoid rdma_bind_addr error out.
Stefan Reiter [Fri, 25 Feb 2022 08:49:49 +0000 (09:49 +0100)]
qapi/monitor: allow VNC display id in set/expire_password
It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...
It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.
For HMP, the display is specified using the "-d" value flag.
For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.
Stefan Reiter [Fri, 25 Feb 2022 08:49:47 +0000 (09:49 +0100)]
monitor/hmp: add support for flag argument with value
Adds support for the "-xs" parameter type, where "-x" denotes a flag
name and the "s" suffix indicates that this flag is supposed to take
an arbitrary string parameter.
These parameters are always optional, the entry in the qdict will be
omitted if the flag is not given.
Hanna Reitz [Wed, 23 Feb 2022 09:23:40 +0000 (10:23 +0100)]
virtiofsd: Let meson check for statx.stx_mnt_id
In virtiofsd, we assume that the presence of the STATX_MNT_ID macro
implies existence of the statx.stx_mnt_id field. Unfortunately, that is
not necessarily the case: glibc has introduced the macro in its commit 88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field
is still missing from its own headers.
Let meson.build actually chek for both STATX_MNT_ID and
statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present.
Then, use this config macro in virtiofsd.
Peter Maydell [Wed, 2 Mar 2022 12:38:46 +0000 (12:38 +0000)]
Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging
ppc-7.0 queue
* ppc/pnv fixes
* PMU EBB support
* target/ppc: PowerISA Vector/VSX instruction batch
* ppc/pnv: Extension of the powernv10 machine with XIVE2 ans PHB5 models
* spapr allocation cleanups
# gpg: Signature made Wed 02 Mar 2022 11:00:42 GMT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* remotes/legoater/tags/pull-ppc-20220302: (87 commits)
hw/ppc/spapr_vio.c: use g_autofree in spapr_dt_vdevice()
hw/ppc/spapr_rtas.c: use g_autofree in rtas_ibm_get_system_parameter()
spapr_pci_nvlink2.c: use g_autofree in spapr_phb_nvgpu_ram_populate_dt()
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
hw/ppc/spapr_drc.c: use g_autofree in spapr_drc_by_index()
hw/ppc/spapr_drc.c: use g_autofree in spapr_dr_connector_new()
hw/ppc/spapr_drc.c: use g_autofree in drc_unrealize()
hw/ppc/spapr_drc.c: use g_autofree in drc_realize()
hw/ppc/spapr_drc.c: use g_auto in spapr_dt_drc()
hw/ppc/spapr_caps.c: use g_autofree in spapr_caps_add_properties()
hw/ppc/spapr_caps.c: use g_autofree in spapr_cap_get_string()
hw/ppc/spapr_caps.c: use g_autofree in spapr_cap_set_string()
hw/ppc/spapr.c: fail early if no firmware found in machine_init()
hw/ppc/spapr.c: use g_autofree in spapr_dt_chosen()
pnv/xive2: Add support for 8bits thread id
pnv/xive2: Add support for automatic save&restore
xive2: Add a get_config() handler for the router configuration
pnv/xive2: Add support XIVE2 P9-compat mode (or Gen1)
ppc/pnv: add XIVE Gen2 TIMA support
pnv/xive2: Introduce new capability bits
...
Peter Maydell [Wed, 2 Mar 2022 10:46:16 +0000 (10:46 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-and-semihosting-280222-1' into staging
Testing and semihosting updates:
- restore TESTS/IMAGES filtering to docker tests
- add NOUSER to alpine image
- bump lcitool version
- move arm64/s390x cross build images to lcitool
- add aarch32 runner CI scripts
- expand testing to more vectors
- update s390x jobs to focal for gitlab/travis
- disable threadcount for all sh4
- fix semihosting SYS_HEAPINFO and test
# gpg: Signature made Mon 28 Feb 2022 18:46:41 GMT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-and-semihosting-280222-1:
tests/tcg: port SYS_HEAPINFO to a system test
semihosting/arm-compat: replace heuristic for softmmu SYS_HEAPINFO
tests/tcg: completely disable threadcount for sh4
gitlab: upgrade the job definition for s390x to 20.04
travis.yml: Update the s390x jobs to Ubuntu Focal
tests/tcg: add vectorised sha512 versions
tests/tcg: add sha512 test
tests/tcg: build sha1-vector with O3 and compare
tests/tcg/ppc64: clean-up handling of byte-reverse
gitlab: add a new aarch32 custom runner definition
scripts/ci: allow for a secondary runner
scripts/ci: add build env rules for aarch32 on aarch64
tests/docker: introduce debian-riscv64-test-cross
tests/docker: update debian-s390x-cross with lcitool
tests/docker: update debian-arm64-cross with lcitool
tests/lcitool: update to latest version
tests/docker: add NOUSER for alpine image
tests/docker: restore TESTS/IMAGES filtering
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the
'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a
single line. No problem with that, but splitting into 2 lines is clearer
to follow
- use g_autofree in 'int_buf' to avoid a g_free() call later on
- 'buf_len' is only being used to store the size of 'int_buf' malloc.
Remove the var and just use the value in g_malloc0() directly
- remove the 'ret' var and just return the result of fdt_setprop()