Merge tag 'bsd-user-arm-pull-request' of gitlab.com:bsdimp/qemu into staging
bsd-user: arm (32-bit) support
This series of patches brings in 32-bit arm support for bsd-user. It implements
all the bits needed to do image activation, signal handling, stack management
and threading. This allows us to get to the "Hello World" level. The arm and x86
code are now the same as in the bsd-user fork. For full context, the fork is at
https://github.com/qemu-bsd-user/qemu-bsd-user/tree/blitz (though the the recent
sig{bus,segv} needed updates are incomplete).
v5 changes:
o Moved to using the CPUArchState typedef and move
set_sigtramp_args, get_mcontext, set_mcontext, and
get_ucontext_sigreturn prototypes to
bsd-user/freebsd/target_os_ucontext.h
o Fix issues with arm's set_mcontext related to masking
and remove an unnecessary check.
We're down to only one hunk needing review:
bsd-user/arm/target_arch_signal.c: arm set_mcontext
Warnings that should be ignored:
o make checkpatch has a couple of complaints about the comments for the
signal trampoline, since it's a false positive IMHO.
WARNING: Block comments use a leading /* on a separate line
+ /* 8 */ sys_sigreturn,
WARNING: Block comments use a leading /* on a separate line
+ /* 9 */ sys_exit
# gpg: Signature made Fri 07 Jan 2022 11:36:37 PM PST
# gpg: using RSA key 2035F894B00AA3CF7CCDE1B76C1CD1287DB01100
# gpg: Good signature from "Warner Losh <[email protected]>" [unknown]
# gpg: aka "Warner Losh <[email protected]>" [unknown]
# gpg: aka "Warner Losh <[email protected]>" [unknown]
# gpg: aka "Warner Losh <[email protected]>" [unknown]
# gpg: aka "Warner Losh <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2035 F894 B00A A3CF 7CCD E1B7 6C1C D128 7DB0 1100
* tag 'bsd-user-arm-pull-request' of gitlab.com:bsdimp/qemu: (37 commits)
bsd-user: add arm target build
bsd-user/freebsd/target_os_ucontext.h: Require TARGET_*CONTEXT_SIZE
bsd-user/arm/signal.c: arm get_ucontext_sigreturn
bsd-user/arm/signal.c: arm set_mcontext
bsd-user/arm/signal.c: arm get_mcontext
bsd-user/arm/signal.c: arm set_sigtramp_args
bsd-user/arm/target_arch_signal.h: Define size of *context_t
bsd-user/arm/target_arch_signal.h: arm machine context and trapframe for signals
bsd-user/arm/target_arch_signal.h: arm specific signal registers and stack
bsd-user/arm/target_arch_elf.h: arm get_hwcap2 impl
bsd-user/arm/target_arch_elf.h: arm get hwcap
bsd-user/arm/target_arch_elf.h: arm defines for ELF
bsd-user/arm/target_arch_thread.h: Routines to create and switch to a thread
bsd-user/arm/target_arch_sigtramp.h: Signal Trampoline for arm
bsd-user/arm/target_arch_vmparam.h: Parameters for arm address space
bsd-user/arm/target_arch_reg.h: Implement core dump register copying
bsd-user/arm/target_arch_cpu.h: Implement system call dispatch
bsd-user/arm/target_arch_cpu.h: Implement data abort exceptions
bsd-user/arm/target_arch_cpu.h: Implement trivial EXCP exceptions
bsd-user/arm/target_arch_cpu.h: Dummy target_cpu_loop implementation
...
Merge tag 'pull-riscv-to-apply-20220108' of github.com:alistair23/qemu into staging
Second RISC-V PR for QEMU 7.0
- Fix illegal instruction when PMP is disabled
- SiFive PDMA 64-bit support
- SiFive PLIC cleanups
- Mark Hypervisor extension as non experimental
- Enable Hypervisor extension by default
- Support 32 cores on the virt machine
- Corrections for the Vector extension
- Experimental support for 128-bit CPUs
- stval and mtval support for illegal instructions
# gpg: Signature made Fri 07 Jan 2022 09:50:11 PM PST
# gpg: using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8 CE8F 21E1 0D29 DF97 7054
* tag 'pull-riscv-to-apply-20220108' of github.com:alistair23/qemu: (37 commits)
target/riscv: Implement the stval/mtval illegal instruction
target/riscv: Fixup setting GVA
target/riscv: Set the opcode in DisasContext
target/riscv: actual functions to realize crs 128-bit insns
target/riscv: modification of the trans_csrxx for 128-bit support
target/riscv: helper functions to wrap calls to 128-bit csr insns
target/riscv: adding high part of some csrs
target/riscv: support for 128-bit M extension
target/riscv: support for 128-bit arithmetic instructions
target/riscv: support for 128-bit shift instructions
target/riscv: support for 128-bit U-type instructions
target/riscv: support for 128-bit bitwise instructions
target/riscv: accessors to registers upper part and 128-bit load/store
target/riscv: moving some insns close to similar insns
target/riscv: setup everything for rv64 to support rv128 execution
target/riscv: array for the 64 upper bits of 128-bit registers
target/riscv: separation of bitwise logic and arithmetic helpers
target/riscv: additional macros to check instruction support
qemu/int128: addition of div/rem 128-bit operations
exec/memop: Adding signed quad and octo defines
...
Warner Losh [Thu, 4 Nov 2021 23:08:04 +0000 (17:08 -0600)]
bsd-user/arm/target_arch_signal.h: Define size of *context_t
Define the native sizes of mcontext_t and ucontext_t so that the tests
in target_os_ucontext.h ensure the size of arm's version of these
structures is correct.
Warner Losh [Thu, 23 Sep 2021 15:12:12 +0000 (09:12 -0600)]
bsd-user/arm/target_arch_vmparam.h: Parameters for arm address space
Various parameters describing the layout of the ARM address space. In
addition, define routines to get the stack pointer and to set the second
return value.
Warner Losh [Thu, 23 Sep 2021 15:08:21 +0000 (09:08 -0600)]
bsd-user/arm/target_arch_cpu.h: Implement system call dispatch
Implement the system call dispatch. This implements all three kinds of
system call: direct and the two indirect variants. It handles all the
special cases for thumb as well.
Implement EXCP_UDEF, EXCP_DEBUG, EXCP_INTERRUPT, EXCP_ATOMIC and
EXCP_YIELD. The first two generate a signal to the emulated
binary. EXCP_ATOMIC handles atomic operations. The remainder are fancy
nops.
Warner Losh [Thu, 23 Sep 2021 14:29:39 +0000 (08:29 -0600)]
bsd-user/arm/target_syscall.h: Add copyright and update name
The preferred name for the 32-bit arm is now armv7. Update the name to
reflect that. In addition, add Stacey's copyright to this file and
update the include guards to the new convention.
Warner Losh [Thu, 4 Nov 2021 22:34:48 +0000 (16:34 -0600)]
bsd-user: create a per-arch signal.c file
Create a place-holder signal.c file for each of the architectures that
are currently built. In the future, some code that's currently inlined
in target_arch_signal.h will live here.
Warner Losh [Fri, 29 Oct 2021 14:39:01 +0000 (08:39 -0600)]
bsd-user/freebsd: Create common target_os_ucontext.h file
FreeBSD has a MI ucontext structure that contains the MD mcontext
machine state and other things that are machine independent. Create an
include file for all the ucontext stuff. It needs to be included in the
arch specific files after target_mcontext is defined. This is largely
copied from sys/_ucontext.h with the comments about layout removed
because we don't support ancient FreeBSD binaries.
Warner Losh [Thu, 4 Nov 2021 22:31:27 +0000 (16:31 -0600)]
bsd-user/mips*: Remove mips support
FreeBSD is dropping support for mips starting with FreeBSD 14. mips
support has been removed from the bsd-user fork because updating it for
new signal requirements will take too much time. Remove it here since it
is a distraction.
Alistair Francis [Mon, 20 Dec 2021 06:49:16 +0000 (16:49 +1000)]
target/riscv: Implement the stval/mtval illegal instruction
The stval and mtval registers can optionally contain the faulting
instruction on an illegal instruction exception. This patch adds support
for setting the stval and mtval registers.
The RISC-V spec states that "The stval register can optionally also be
used to return the faulting instruction bits on an illegal instruction
exception...". In this case we are always writing the value on an
illegal instruction.
This doesn't match all CPUs (some CPUs won't write the data), but in
QEMU let's just populate the value on illegal instructions. This won't
break any guest software, but will provide more information to guests.
Alistair Francis [Mon, 20 Dec 2021 06:49:15 +0000 (16:49 +1000)]
target/riscv: Fixup setting GVA
In preparation for adding support for the illegal instruction address
let's fixup the Hypervisor extension setting GVA logic and improve the
variable names.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:08 +0000 (22:01 +0100)]
target/riscv: actual functions to realize crs 128-bit insns
The csrs are accessed through function pointers: we add 128-bit read
operations in the table for three csrs (writes fallback to the
64-bit version as the upper 64-bit information is handled elsewhere):
- misa, as mxl is needed for proper operation,
- mstatus and sstatus, to return sd
In addition, we also add read and write accesses to the machine and
supervisor scratch registers.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:07 +0000 (22:01 +0100)]
target/riscv: modification of the trans_csrxx for 128-bit support
As opposed to the gen_arith and gen_shift generation helpers, the csr insns
do not have a common prototype, so the choice to generate 32/64 or 128-bit
helper calls is done in the trans_csrxx functions.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:06 +0000 (22:01 +0100)]
target/riscv: helper functions to wrap calls to 128-bit csr insns
Given the side effects they have, the csr instructions are realized as
helpers. We extend this existing infrastructure for 128-bit sized csr.
We return 128-bit values using the same approach as for div/rem.
Theses helpers all call a unique function that is currently a fallback
on the 64-bit version.
The trans_csrxx functions supporting 128-bit are yet to be implemented.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:04 +0000 (22:01 +0100)]
target/riscv: support for 128-bit M extension
Mult are generated inline (using a cool trick pointed out by Richard), but
for div and rem, given the complexity of the implementation of these
instructions, we call helpers to produce their behavior. From an
implementation standpoint, the helpers return the low part of the results,
while the high part is temporarily stored in a dedicated field of cpu_env
that is used to update the architectural register in the generation wrapper.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:03 +0000 (22:01 +0100)]
target/riscv: support for 128-bit arithmetic instructions
Addition of 128-bit adds and subs in their various sizes,
"set if less than"s and branches.
Refactored the code to have a comparison function used for both stls and
branches.
Frédéric Pétrot [Thu, 6 Jan 2022 21:01:00 +0000 (22:01 +0100)]
target/riscv: support for 128-bit bitwise instructions
The 128-bit bitwise instructions do not need any function prototype change
as the functions can be applied independently on the lower and upper part of
the registers.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:59 +0000 (22:00 +0100)]
target/riscv: accessors to registers upper part and 128-bit load/store
Get function to retrieve the 64 top bits of a register, stored in the gprh
field of the cpu state. Set function that writes the 128-bit value at once.
The access to the gprh field can not be protected at compile time to make
sure it is accessed only in the 128-bit version of the processor because we
have no way to indicate that the misa_mxl_max field is const.
The 128-bit ISA adds ldu, lq and sq. We provide support for these
instructions. Note that (a) we compute only 64-bit addresses to actually
access memory, cowardly utilizing the existing address translation mechanism
of QEMU, and (b) we assume for now little-endian memory accesses.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:58 +0000 (22:00 +0100)]
target/riscv: moving some insns close to similar insns
lwu and ld are functionally close to the other loads, but were after the
stores in the source file.
Similarly, xor was away from or and and by two arithmetic functions, while
the immediate versions were nicely put together.
This patch moves the aforementioned loads after lhu, and xor above or,
where they more logically belong.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:57 +0000 (22:00 +0100)]
target/riscv: setup everything for rv64 to support rv128 execution
This patch adds the support of the '-cpu rv128' option to
qemu-system-riscv64 so that we can indicate that we want to run rv128
executables.
Still, there is no support for 128-bit insns at that stage so qemu fails
miserably (as expected) if launched with this option.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:55 +0000 (22:00 +0100)]
target/riscv: separation of bitwise logic and arithmetic helpers
Introduction of a gen_logic function for bitwise logic to implement
instructions in which no propagation of information occurs between bits and
use of this function on the bitwise instructions.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:54 +0000 (22:00 +0100)]
target/riscv: additional macros to check instruction support
Given that the 128-bit version of the riscv spec adds new instructions, and
that some instructions that were previously only available in 64-bit mode
are now available for both 64-bit and 128-bit, we added new macros to check
for the processor mode during translation.
Although RV128 is a superset of RV64, we keep for now the RV64 only tests
for extensions other than RVI and RVM.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:53 +0000 (22:00 +0100)]
qemu/int128: addition of div/rem 128-bit operations
Addition of div and rem on 128-bit integers, using the 128/64->128 divu and
64x64->128 mulu in host-utils.
These operations will be used within div/rem helpers in the 128-bit riscv
target.
Frédéric Pétrot [Thu, 6 Jan 2022 21:00:51 +0000 (22:00 +0100)]
exec/memop: Adding signedness to quad definitions
Renaming defines for quad in their various forms so that their signedness is
now explicit.
Done using git grep as suggested by Philippe, with a bit of hand edition to
keep assignments aligned.
Philipp Tomsich [Thu, 6 Jan 2022 13:40:20 +0000 (14:40 +0100)]
target/riscv: Fix position of 'experimental' comment
When commit 0643c12e4b dropped the 'x-' prefix for Zb[abcs] and set
them to be enabled by default, the comment about experimental
extensions was kept in place above them. This moves it down a few
lines to only cover experimental extensions.
Frank Chang [Wed, 5 Jan 2022 02:22:46 +0000 (10:22 +0800)]
target/riscv: rvv-1.0: Call the correct RVF/RVD check function for narrowing fp/int type-convert insns
vfncvt.f.xu.w, vfncvt.f.x.w convert double-width integer to single-width
floating-point. Therefore, should use require_rvf() to check whether
RVF/RVD is enabled.
vfncvt.f.f.w, vfncvt.rod.f.f.w convert double-width floating-point to
single-width integer. Therefore, should use require_scale_rvf() to check
whether RVF/RVD is enabled.
Frank Chang [Wed, 5 Jan 2022 02:22:45 +0000 (10:22 +0800)]
target/riscv: rvv-1.0: Call the correct RVF/RVD check function for widening fp/int type-convert insns
vfwcvt.xu.f.v, vfwcvt.x.f.v, vfwcvt.rtz.xu.f.v and vfwcvt.rtz.x.f.v
convert single-width floating-point to double-width integer.
Therefore, should use require_rvf() to check whether RVF/RVD is enabled.
vfwcvt.f.xu.v, vfwcvt.f.x.v convert single-width integer to double-width
floating-point, and vfwcvt.f.f.v convert double-width floating-point to
single-width floating-point. Therefore, should use require_scale_rvf() to
check whether RVF/RVD is enabled.
target/riscv: Enable the Hypervisor extension by default
Let's enable the Hypervisor extension by default. This doesn't affect
named CPUs (such as lowrisc-ibex or sifive-u54) but does enable the
Hypervisor extensions by default for the virt machine.
We can remove the original sifive_plic_irqs_pending() function and
instead just use the sifive_plic_claim() function (renamed to
sifive_plic_claimed()) to determine if any interrupts are pending.
This requires move the side effects outside of sifive_plic_claimed(),
but as they are only invoked once that isn't a problem.
We have also removed all of the old #ifdef debugging logs, so let's
cleanup the last remaining debug function while we are here.
Jim Shu [Tue, 4 Jan 2022 06:34:08 +0000 (14:34 +0800)]
hw/dma: sifive_pdma: permit 4/8-byte access size of PDMA registers
It's obvious that PDMA supports 64-bit access of 64-bit registers, and
in previous commit, we confirm that PDMA supports 32-bit access of
both 32/64-bit registers. Thus, we configure 32/64-bit memory access
of PDMA registers as valid in general.
Nikita Shubin [Tue, 14 Dec 2021 09:26:59 +0000 (12:26 +0300)]
target/riscv/pmp: fix no pmp illegal intrs
As per the privilege specification, any access from S/U mode should fail
if no pmp region is configured and pmp is present, othwerwise access
should succeed.
Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci,pc: features,fixes,cleanups
New virtio mem options.
A vhost-user cleanup.
Control over smbios entry point type.
Config interrupt support for vdpa.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 07 Jan 2022 04:30:41 PM PST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "[email protected]"
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [undefined]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (55 commits)
tests: acpi: Add updated TPM related tables
acpi: tpm: Add missing device identification objects
tests: acpi: prepare for updated TPM related tables
virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanup
hw/scsi/vhost-scsi: don't double close vhostfd on error
hw/scsi/vhost-scsi: don't leak vqs on error
docs: reSTify virtio-balloon-stats documentation and move to docs/interop
hw/i386/pc: Add missing property descriptions
acpihp: simplify acpi_pcihp_disable_root_bus
tests: acpi: SLIC: update expected blobs
tests: acpi: add SLIC table test
tests: acpi: whitelist expected blobs before changing them
acpi: fix QEMU crash when started with SLIC table
intel-iommu: correctly check passthrough during translation
virtio-mem: Set "unplugged-inaccessible=auto" for the 7.0 machine on x86
virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
linux-headers: sync VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
MAINTAINERS: Add a separate entry for acpi/VIOT tables
virtio: signal after wrapping packed used_idx
virtio-mem: Support "prealloc=on" option
...
Daniil Tatianin [Mon, 29 Nov 2021 12:52:04 +0000 (15:52 +0300)]
virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanup
In case of an error during initialization in vhost_dev_init, vhostfd is
closed in vhost_dev_cleanup. Remove close from err_virtio as it's both
redundant and causes a double close on vhostfd.
Daniil Tatianin [Mon, 29 Nov 2021 13:23:57 +0000 (16:23 +0300)]
hw/scsi/vhost-scsi: don't leak vqs on error
vhost_dev_init calls vhost_dev_cleanup in case of an error during
initialization, which zeroes out the entire vsc->dev as well as the
vsc->dev.vqs pointer. This prevents us from properly freeing it in free_vqs.
Keep a local copy of the pointer so we can free it later.
Thomas Huth [Wed, 5 Jan 2022 11:52:45 +0000 (12:52 +0100)]
docs: reSTify virtio-balloon-stats documentation and move to docs/interop
The virtio-balloon-stats documentation might be useful for people that
are implementing software that talks to QEMU via QMP, so this should
reside in the docs/interop/ directory. While we're at it, also convert
the file to restructured text and mention it in the MAINTAINERS file.
Thomas Huth [Mon, 6 Dec 2021 13:42:55 +0000 (14:42 +0100)]
hw/i386/pc: Add missing property descriptions
When running "qemu-system-x86_64 -M pc,help" I noticed that some
properties were still missing their description. Add them now so
that users get at least a slightly better idea what they are all
about.
Ani Sinha [Wed, 29 Dec 2021 07:57:53 +0000 (13:27 +0530)]
acpihp: simplify acpi_pcihp_disable_root_bus
Get rid of the static variable that keeps track of whether hotplug has been
disabled on the root pci bus. Simply use qbus_is_hotpluggable() api to
perform the same check. This eliminates additional if conditional and
simplifies the function.
Igor Mammedov [Mon, 27 Dec 2021 19:31:19 +0000 (14:31 -0500)]
tests: acpi: add SLIC table test
When user uses '-acpitable' to add SLIC table, some ACPI
tables (FADT) will change its 'Oem ID'/'Oem Table ID' fields to
match that of SLIC. Test makes sure thati QEMU handles
those fields correctly when SLIC table is added with
'-acpitable' option.
Igor Mammedov [Mon, 27 Dec 2021 19:31:17 +0000 (14:31 -0500)]
acpi: fix QEMU crash when started with SLIC table
if QEMU is started with used provided SLIC table blob,
-acpitable sig=SLIC,oem_id='CRASH ',oem_table_id="ME",oem_rev=00002210,asl_compiler_id="",asl_compiler_rev=00000000,data=/dev/null
it will assert with:
...
build_append_padded_str (array=0x555556afe320, str=0x555556afdb2e "CRASH ME", maxlen=0x6, pad=0x20) at hw/acpi/aml-build.c:61
acpi_table_begin (desc=0x7fffffffd1b0, array=0x555556afe320) at hw/acpi/aml-build.c:1727
build_fadt (tbl=0x555556afe320, linker=0x555557ca3830, f=0x7fffffffd318, oem_id=0x555556afdb2e "CRASH ME", oem_table_id=0x555556afdb34 "ME") at hw/acpi/aml-build.c:2064
...
which happens due to acpi_table_begin() expecting NULL terminated
oem_id and oem_table_id strings, which is normally the case, but
in case of user provided SLIC table, oem_id points to table's blob
directly and as result oem_id became longer than expected.
Fix issue by handling oem_id consistently and make acpi_get_slic_oem()
return NULL terminated strings.
PS:
After [1] refactoring, oem_id semantics became inconsistent, where
NULL terminated string was coming from machine and old way pointer
into byte array coming from -acpitable option. That used to work
since build_header() wasn't expecting NULL terminated string and
blindly copied the 1st 6 bytes only.
However commit [2] broke that by replacing build_header() with
acpi_table_begin(), which was expecting NULL terminated string
and was checking oem_id size.
Jason Wang [Wed, 5 Jan 2022 04:19:42 +0000 (12:19 +0800)]
intel-iommu: correctly check passthrough during translation
When scalable mode is enabled, the passthrough more is not determined
by the context entry but PASID entry, so switch to use the logic of
vtd_dev_pt_enabled() to determine the passthrough mode in
vtd_do_iommu_translate().
virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading
unplugged memory is not supported. We have to fail feature negotiation
in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.
First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to properly handle
memory backends (or architectures) without support for the shared zeropage
in the hypervisor cleanly. Without the shared zeropage, even reading an
unpopulated virtual memory location can populate real memory and
consequently consume memory in the hypervisor. We have a guaranteed shared
zeropage only on MAP_PRIVATE anonymous memory.
Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default
long-term as even populating the shared zeropage can be problematic: for
example, without THP support (possible) or without support for the shared
huge zeropage with THP (unlikely), the PTE page tables to hold the shared
zeropage entries can consume quite some memory that cannot be reclaimed
easily.
Third, there are other optimizations+features (e.g., protection of
unplugged memory, reducing the total memory slot size and bitmap sizes)
that will require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE.
We really only support x86 targets with virtio-mem for now (and
Linux similarly only support x86), but that might change soon, so prepare
for different targets already.
Add a new "unplugged-inaccessible" tristate property for x86 targets:
- "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy
guests working.
- "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests
from using the device.
- "auto" selects the default based on support for the shared zeropage.
Warn in case the property is set to "off" and we don't have support for the
shared zeropage.
For existing compat machines, the property will default to "off", to
not change the behavior but eventually warn about a problematic setup.
Short-term, we'll set the property default to "auto" for new QEMU machines.
Mid-term, we'll set the property default to "on" for new QEMU machines.
Long-term, we'll deprecate the parameter and disallow legacy
guests completely.
The property has to match on the migration source and destination. "auto"
will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long
as the qemu command line (esp. memdev) match -- so "auto" is good enough
for migration purposes and the parameter doesn't have to be migrated
explicitly.
Stefan Hajnoczi [Tue, 30 Nov 2021 13:45:10 +0000 (13:45 +0000)]
virtio: signal after wrapping packed used_idx
Packed Virtqueues wrap used_idx instead of letting it run freely like
Split Virtqueues do. If the used ring wraps more than once there is no
way to compare vq->signalled_used and vq->used_idx in
virtio_packed_should_notify() since they are modulo vq->vring.num.
This causes the device to stop sending used buffer notifications when
when virtio_packed_should_notify() is called less than once each time
around the used ring.
It is possible to trigger this with virtio-blk's dataplane
notify_guest_bh() irq coalescing optimization. The call to
virtio_notify_irqfd() (and virtio_packed_should_notify()) is deferred to
a BH. If the guest driver is polling it can complete and submit more
requests before the BH executes, causing the used ring to wrap more than
once. The result is that the virtio-blk device ceases to raise
interrupts and I/O hangs.