assertions for block_int global state API

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

include/block/block_int: split header into I/O and global state API

Similarly to the previous patch, split block_int.h
into block_int-io.h and block_int-global-state.h.

block_int-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block.c: assertions to the block layer permissions API

Now that we "covered" the three main cases where the
permission API was being used under BQL (fuse,
amend and invalidate_cache), we can safely assert for
the permission functions implemented in block.c

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

IO_CODE and IO_OR_GS_CODE for block-backend I/O API

Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/block-backend.c: assertions for block-backend

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

include/sysemu/block-backend: split header into I/O and global state (GS) API

Similarly to the previous patches, split block-backend.h
into block-backend-io.h and block-backend-global-state.h.

In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"

block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/export/fuse.c: allow writable exports to take RESIZE permission

Allow writable exports to get BLK_PERM_RESIZE permission
from creation, in fuse_export_create().
In this way, there is no need to give the permission in
fuse_do_truncate(), which might be run in an iothread.

Permissions should be set only in the main thread, so
in any case if an iothread tries to set RESIZE, it will
be blocked.

Also assert in fuse_do_truncate that if we give the
RESIZE permission we can then restore the original ones.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Hanna Reitz <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

IO_CODE and IO_OR_GS_CODE for block I/O API

Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

assertions for block global state API

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

include/block/block: split header into I/O and global state API

block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
Some others can only be called by either the main loop or the
iothread running the AioContext (and not other iothreads),
and using them in another thread would cause deadlocks, and therefore
it is not ideal to define them as I/O.

It is not easy to understand which function is part of which
group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it.

The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O functions are instead thread safe, and can run in
any AioContext.
"I/O or GS" functions run instead in the main loop or in
a single iothread, and use BDRV_POLL_WHILE().

By splitting the header into two files, block-io.h
and block-global-state.h, we get a clearer view of which
functions need which kind of protection. block-common.h
contains common structures shared by both headers.

block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

main loop: macros to mark GS and I/O functions

Right now, IO_CODE and IO_OR_GS_CODE are no-ops, as there isn't
really a way to check that a function is only called in I/O.
On the other hand, we can use qemu_in_main_thread() to check if
we are in the main loop.

Using macros makes it easy to extend them in the future without
changing all callers. They also make it visually clear which
category each function belongs to, without looking at the header.
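
As an illustration only (not the code from this patch), the idea can be
sketched as below. GLOBAL_STATE_CODE is an assumed name for the GS
marker, qemu_in_main_thread() is replaced by a hypothetical stand-in so
the snippet builds on its own, and gs_add_node()/io_node_count() are
made-up example functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for qemu_in_main_thread(); a caller can flip it. */
static bool fake_in_main_thread = true;

static bool qemu_in_main_thread(void)
{
    return fake_in_main_thread;
}

/* Assumed name: global state code must run in the main loop, so assert. */
#define GLOBAL_STATE_CODE()                 \
    do {                                    \
        assert(qemu_in_main_thread());      \
    } while (0)

/* Nothing can be checked for these yet, so they expand to no-ops. */
#define IO_CODE()        do { } while (0)
#define IO_OR_GS_CODE()  do { } while (0)

static int graph_nodes;

/* Example GS function: changes shared state, main loop only. */
int gs_add_node(void)
{
    GLOBAL_STATE_CODE();
    return ++graph_nodes;
}

/* Example I/O function: only the marker is present (no locking shown). */
int io_node_count(void)
{
    IO_CODE();
    return graph_nodes;
}
```

Because every function starts with one of the markers, a later change to
what IO_CODE() checks would only touch the macro definition.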

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

main-loop.h: introduce qemu_in_main_thread()

When invoked from the main loop, this function is the same
as qemu_mutex_iothread_locked, and returns true if the BQL is held.
When invoked from iothreads or tests, it returns true only
if the current AioContext is the Main Loop.

This essentially just extends qemu_mutex_iothread_locked to work
also in unit tests or other users like storage-daemon, that run
in the Main Loop but end up using the implementation in
stubs/iothread-lock.c.

In unit tests, qemu_mutex_iothread_locked always returns false
because they use the implementation in stubs/iothread-lock.c, so
all assertions added in the next patches would fail even though the
AioContext is still the main loop.

See the comment in the function header for more information.
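
A rough sketch of the two behaviours described above, under the
assumption that they boil down to one check each. All types, variables
and function names here are hypothetical stand-ins, not QEMU's API.

```c
#include <stdbool.h>

/* Hypothetical minimal stand-ins; the names only mimic QEMU. */
typedef struct AioContext { int unused; } AioContext;

static AioContext main_loop_ctx;               /* the main loop's context */
static _Thread_local AioContext *current_ctx;  /* context of this thread */
static _Thread_local bool bql_locked;          /* does this thread hold the BQL? */

/* System emulator flavour: same answer as "is the BQL held?". */
bool in_main_thread_emulator(void)
{
    return bql_locked;
}

/* Stub flavour (unit tests, storage daemon): compare AioContexts instead. */
bool in_main_thread_stub(void)
{
    return current_ctx == &main_loop_ctx;
}

/* Helpers so a caller can simulate the two environments. */
void sim_enter_main_loop(void)           { current_ctx = &main_loop_ctx; }
void sim_enter_iothread(AioContext *ctx) { current_ctx = ctx; }
void sim_set_bql(bool locked)            { bql_locked = locked; }
```

In the stub flavour a thread that never takes the BQL still counts as
the main thread as long as it runs in the main loop's AioContext, which
is the case the unit tests need.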

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220303151616 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/185: Add post-READY quit tests

185 tests quitting qemu while a block job is active. It does not
specifically test quitting qemu while a mirror or active commit job is
in its READY phase.

Add two test cases for this, where we respectively mirror or commit to
an external QSD instance, which provides a throttled block device. qemu
is supposed to cancel the job so that it can quit as soon as possible
instead of waiting for the job to complete (which it did before 6.2).

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20220303164814 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qsd: Add --daemonize

To implement this, we reuse the existing daemonizing functions from the
system emulator, which mainly do the following:
- Fork off a child process, and set up a pipe between parent and child
- The parent process waits until the child sends a status byte over the
  pipe (0 means that the child was set up successfully; anything else
  (including errors or EOF) means that the child was not set up
  successfully), and then exits with an appropriate exit status
- The child process enters a new session (forking off again), changes
  the umask, and will ignore terminal signals from then on
- Once set-up is complete, the child will chdir to /, redirect all
  standard I/O streams to /dev/null, and tell the parent that set-up has
  been completed successfully
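
The parent/child handshake from the list above can be sketched roughly
as follows. This is not the os-posix code: launch_and_wait() and the
setup callbacks are hypothetical, the session, umask, chdir and
/dev/null steps are left out, and the child exits right after reporting
so that the example terminates.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, let it run child_setup(), and return the exit status the
 * parent would use: 0 if the child sent a 0 status byte, 1 otherwise
 * (non-zero byte, read error or EOF).
 */
int launch_and_wait(int (*child_setup)(void))
{
    int fds[2];
    pid_t pid;
    unsigned char status = 1;
    ssize_t n;

    if (pipe(fds) < 0) {
        return 1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return 1;
    }

    if (pid == 0) {
        /* Child: run set-up, then report success with a 0 byte. */
        close(fds[0]);
        if (child_setup() != 0) {
            _exit(1);           /* parent sees EOF, i.e. failure */
        }
        status = 0;
        if (write(fds[1], &status, 1) != 1) {
            _exit(1);
        }
        _exit(0);               /* a real daemon would keep running here */
    }

    /* Parent: block until the status byte or EOF arrives. */
    close(fds[1]);
    do {
        n = read(fds[0], &status, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    /* Demo only: reap the short-lived child so no zombie is left behind. */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR) {
    }

    return (n == 1 && status == 0) ? 0 : 1;
}

/* Hypothetical set-up callbacks used to exercise both outcomes. */
int setup_ok(void)   { return 0; }
int setup_fail(void) { return -1; }
```

launch_and_wait(setup_ok) yields 0 and launch_and_wait(setup_fail)
yields 1, which is the exit status a waiting caller would observe.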

In contrast to qemu-nbd's --fork implementation, during the set up
phase, error messages are not piped through the parent process.
qemu-nbd mainly does this to detect errors, though (while os_daemonize()
has the child explicitly signal success after set up); because we do not
redirect stderr after forking, error messages continue to appear on
whatever the parent's stderr was (until set up is complete).

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20220303164814 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qsd: Add pre-init argument parsing pass

In contrast to qemu-nbd (where it is called --fork) and the system
emulator, QSD does not have a --daemonize switch yet.  Just like them,
QSD allows setting up block devices and exports on the command line.
When doing so, it is often necessary for whoever invoked the QSD to wait
until these exports are fully set up.  A --daemonize switch allows
precisely this, by virtue of the parent process exiting once everything
is set up.

Note that there are alternative ways of waiting for all exports to be
set up, for example:
- Passing the --pidfile option and waiting until the respective file
  exists (but I do not know if there is a way of implementing this
  without a busy wait loop)
- Set up some network server (e.g. on a Unix socket) and have the QSD
  connect to it after all arguments have been processed by appending
  corresponding --chardev and --monitor options to the command line,
  and then wait until the QSD connects

Having a --daemonize option would make this simpler, though, without
having to rely on additional tools (to set up a network server) or busy
waiting.

Implementing a --daemonize switch means having to fork the QSD process.
Ideally, we should do this as early as possible: All the parent process
has to do is to wait for the child process to signal completion of its
set-up phase, and therefore there is basically no initialization that
needs to be done before the fork.  On the other hand, forking after
initialization steps means having to consider how those steps (like
setting up the block layer or QMP) interact with a later fork, which is
often not trivial.

In order to fork this early, we must scan the command line for
--daemonize long before our current process_options() call.  Instead of
adding custom new code to do so, just reuse process_options() and give
it a @pre_init_pass argument to distinguish the two passes.  I believe
there are some switches besides --daemonize that deserve parsing in
the first pass:

- --help and --version are supposed to only print some text and then
  immediately exit (so any initialization we do would be for naught).
  This changes behavior, because now "--blockdev inv-drv --help" will
  print a help text instead of complaining about the --blockdev
  argument.
  Note that this is similar in behavior to other tools, though: "--help"
  is generally immediately acted upon when finding it in the argument
  list, potentially before other arguments (even ones before it) are
  acted on.  For example, "ls /does-not-exist --help" prints a help text
  and does not complain about ENOENT.

- --pidfile does not need initialization, and is already exempted from
  the sequential order that process_options() claims to strictly follow
  (the PID file is only created after all arguments are processed, not
  at the time the --pidfile argument appears), so it makes sense to
  include it in the same category as --daemonize.

- Invalid arguments should always be reported as soon as possible.  (The
  same caveat with --help applies: That means that "--blockdev inv-drv
  --inv-arg" will now complain about --inv-arg, not inv-drv.)

This patch does make some references to --daemonize without having
implemented it yet, but that will happen in the next patch.
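
A reduced sketch of the two-pass idea, assuming getopt_long() and a
small made-up option set. struct qsd_opts and this process_options()
signature are hypothetical and much simpler than the real function.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record for the demo. */
struct qsd_opts {
    bool daemonize;
    const char *pidfile;
    int blockdevs;
    bool invalid;
};

enum { OPT_DAEMONIZE = 256, OPT_PIDFILE, OPT_BLOCKDEV };

/*
 * Scan the whole command line. In the pre-init pass only options that
 * need no initialization are acted on; everything else waits for the
 * second pass.
 */
void process_options(int argc, char **argv, bool pre_init_pass,
                     struct qsd_opts *o)
{
    static const struct option long_opts[] = {
        { "daemonize", no_argument,       NULL, OPT_DAEMONIZE },
        { "pidfile",   required_argument, NULL, OPT_PIDFILE },
        { "blockdev",  required_argument, NULL, OPT_BLOCKDEV },
        { 0, 0, 0, 0 }
    };
    int c;

    optind = 1;     /* allow the function to be called more than once */
    opterr = 0;     /* the demo records errors instead of printing them */

    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        switch (c) {
        case OPT_DAEMONIZE:
            if (pre_init_pass) {
                o->daemonize = true;
            }
            break;
        case OPT_PIDFILE:
            if (pre_init_pass) {
                o->pidfile = optarg;
            }
            break;
        case OPT_BLOCKDEV:
            if (!pre_init_pass) {
                o->blockdevs++;     /* needs the block layer: second pass */
            }
            break;
        default:
            if (pre_init_pass) {
                o->invalid = true;  /* report invalid arguments early */
            }
            break;
        }
    }
}
```

Calling it once with pre_init_pass=true before any initialization and
once with pre_init_pass=false afterwards keeps a single option table
for both passes.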

Signed-off-by: Hanna Reitz <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20220303164814 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

os-posix: Add os_set_daemonize()

The daemonizing functions in os-posix (os_daemonize() and
os_setup_post()) only daemonize the process if the static `daemonize`
variable is set. Right now, it can only be set by os_parse_cmd_args().

In order to use os_daemonize() and os_setup_post() from the storage
daemon to have it be daemonized, we need some other way to set this
`daemonize` variable, because I would rather not tap into the system
emulator's arg-parsing code. Therefore, this patch adds an
os_set_daemonize() function, which will return an error on os-win32
(because daemonizing is not supported there).
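
Roughly, and only as a sketch of the shape described here: the setter
stores the flag on POSIX and refuses on Windows. Returning -ENOTSUP, and
doing so only when daemonizing is actually requested, is an assumption;
os_daemonize_requested() is a hypothetical helper for the example.

```c
#include <errno.h>
#include <stdbool.h>

static bool daemonize;  /* private to the OS layer, as in the description */

#ifdef _WIN32
/* No daemonizing on Windows: refuse the request (assumed error code). */
int os_set_daemonize(bool d)
{
    return d ? -ENOTSUP : 0;
}
#else
/* POSIX: remember the request for the later daemonizing calls. */
int os_set_daemonize(bool d)
{
    daemonize = d;
    return 0;
}
#endif

/* Hypothetical helper so the example can show the stored value. */
bool os_daemonize_requested(void)
{
    return daemonize;
}
```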

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20220303164814 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

cpus: use coroutine TLS macros for iothread_locked

qemu_mutex_iothread_locked() may be used from coroutines. Standard
__thread variables cannot be used by coroutines. Use the coroutine TLS
macros instead.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20220222140150 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

rcu: use coroutine TLS macros

RCU may be used from coroutines. Standard __thread variables cannot be
used by coroutines. Use the coroutine TLS macros instead.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20220222140150 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

util/async: replace __thread with QEMU TLS macros

QEMU TLS macros must be used to make TLS variables safe with coroutines.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20220222140150 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

tls: add macros for coroutine-safe TLS variables

Compiler optimizations can cache TLS values across coroutine yield
points, resulting in stale values from the previous thread when a
coroutine is re-entered by a new thread.

Serge Guelton developed an __attribute__((noinline)) wrapper and tested
it with clang and gcc. I formatted his idea according to QEMU's coding
style and wrote documentation.

The compiler can still optimize based on analyzing noinline code, so an
asm volatile barrier with an output constraint is required to prevent
unwanted optimizations.
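
A condensed sketch of the technique, assuming GCC or Clang. The macro
name DEFINE_STATIC_CO_TLS and the example variable are hypothetical; the
point is only that every access goes through a noinline function with
an asm barrier, so the compiler cannot keep a cached TLS address across
a yield.

```c
/* Hypothetical macro name; relies on GCC/Clang extensions. */
#define DEFINE_STATIC_CO_TLS(type, var)                                  \
    static __thread type co_tls_##var;                                   \
    static __attribute__((noinline, unused)) type get_##var(void)        \
    {                                                                    \
        __asm__ volatile("");                                            \
        return co_tls_##var;                                             \
    }                                                                    \
    static __attribute__((noinline, unused)) void set_##var(type v)      \
    {                                                                    \
        __asm__ volatile("");                                            \
        co_tls_##var = v;                                                \
    }                                                                    \
    static __attribute__((noinline, unused)) type *get_ptr_##var(void)   \
    {                                                                    \
        type *ptr = &co_tls_##var;                                       \
        __asm__ volatile("" : "+rm" (ptr));                              \
        return ptr;                                                      \
    }

/* Example variable: a per-thread nesting depth. */
DEFINE_STATIC_CO_TLS(int, depth)

/* Callers never touch the variable directly, only the accessors. */
int enter_section(void)
{
    set_depth(get_depth() + 1);
    return get_depth();
}
```

The accessors cost a function call per access, which is the price for
re-reading the thread-local address every time.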

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1952483
Suggested-by: Serge Guelton <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20220222140150 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: move BQL logic of bdrv_co_invalidate_cache in bdrv_activate

Split bdrv_co_invalidate_cache in two: move the Global State (under BQL)
code that takes care of permissions and running GS callbacks into
bdrv_activate, and leave only the I/O code (->bdrv_co_invalidate_cache)
running in the I/O coroutine.

The only side effect is that bdrv_co_invalidate_cache is not
recursive anymore, and neither is any direct call to
bdrv_invalidate_cache().

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220209105452.1694545 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: rename bdrv_invalidate_cache_all, blk_invalidate_cache and test_sync_op_invalidate_cache

Following the bdrv_activate renaming, change also the name
of the respective callers.

bdrv_invalidate_cache_all -> bdrv_activate_all
blk_invalidate_cache -> blk_activate
test_sync_op_invalidate_cache -> test_sync_op_activate

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Hanna Reitz <[email protected]>
Message-Id: <20220209105452.1694545 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: introduce bdrv_activate

This function is currently just a wrapper for bdrv_invalidate_cache(),
but in the future will contain the code of bdrv_co_invalidate_cache() that
has to always be protected by BQL, and leave the rest in the I/O
coroutine.

Replace all bdrv_invalidate_cache() invocations with bdrv_activate().

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Hanna Reitz <[email protected]>
Message-Id: <20220209105452.1694545 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

crypto: distinguish between main loop and I/O in block_crypto_amend_options_generic_luks

block_crypto_amend_options_generic_luks uses the block layer
permission API, therefore it should be called with the BQL held.

However, the same function is being called by two BlockDriver
callbacks: bdrv_amend_options (under BQL) and bdrv_co_amend (I/O).

The latter is I/O because it is invoked by block/amend.c's
blockdev_amend_run(), a .run callback of the amend JobDriver.

Therefore we want to change this function to still perform
the permission check, but making sure it is done under BQL regardless
of the caller context.

Remove the permission check in block_crypto_amend_options_generic_luks()
and:
- in block_crypto_amend_options_luks() (BQL case, called by
  .bdrv_amend_options()), reuse helper functions
  block_crypto_amend_{prepare/cleanup} that take care of checking
  permissions.

- for block_crypto_co_amend_luks() (I/O case, called by
  .bdrv_co_amend()), don't check for permissions but delegate
  .bdrv_amend_pre_run() and .bdrv_amend_clean() to do it,
  performing these checks before and after the job runs in its aiocontext.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220209105452.1694545 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

crypto: perform permission checks under BQL

Move the permission API calls into driver-specific callbacks
that always run under BQL. In this case, bdrv_crypto_luks
needs to perform permission checks before and after
qcrypto_block_amend_options(). The problem is that the caller,
block_crypto_amend_options_generic_luks(), can also run in I/O
from .bdrv_co_amend(). This does not comply with the Global State-I/O API
split, as the permissions API must always run under BQL.

Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean()
callbacks. These two callbacks are guaranteed to be invoked under
BQL, respectively before and after .bdrv_co_amend().
They take care of performing the permission checks
in the same way as they are currently done before and after
qcrypto_block_amend_options().
These callbacks are in preparation for the next patch, where we
delete the original permission check. Right now they just add a
redundant check.

Then, call .bdrv_amend_pre_run() before job_start in
qmp_x_blockdev_amend(), so that it will be run before the job coroutine
is created and stay in the main loop.
As a cleanup, use JobDriver's .clean() callback to call
.bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL.
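
The intended call order can be sketched like this. AmendOps,
run_amend_job() and the demo callbacks are hypothetical and collapse the
job machinery into one function; whether the clean callback also runs
after a failed pre-run is not shown by this message, so the sketch
assumes it does not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduction of the three driver callbacks. */
typedef struct AmendOps {
    int  (*amend_pre_run)(void *opaque);  /* main loop, under BQL */
    int  (*co_amend)(void *opaque);       /* job coroutine, I/O */
    void (*amend_clean)(void *opaque);    /* main loop, under BQL */
} AmendOps;

/* Run the callbacks in order; a pre-run failure stops before the job. */
int run_amend_job(const AmendOps *ops, void *opaque)
{
    int ret;

    if (ops->amend_pre_run) {
        ret = ops->amend_pre_run(opaque);
        if (ret < 0) {
            return ret;     /* reported early, the job never starts */
        }
    }

    ret = ops->co_amend(opaque);

    if (ops->amend_clean) {
        ops->amend_clean(opaque);
    }
    return ret;
}

/* Demo callbacks: each appends one letter to a trace string. */
static int  demo_pre_run(void *opaque) { strcat(opaque, "P"); return 0; }
static int  demo_denied(void *opaque)  { (void)opaque; return -1; }
static int  demo_amend(void *opaque)   { strcat(opaque, "A"); return 0; }
static void demo_clean(void *opaque)   { strcat(opaque, "C"); }

const AmendOps demo_ops        = { demo_pre_run, demo_amend, demo_clean };
const AmendOps demo_denied_ops = { demo_denied,  demo_amend, demo_clean };
```

With demo_ops the trace reads pre-run, amend, clean; with
demo_denied_ops the error comes back before the amend step runs.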

After this patch, permission failures occur early in the blockdev-amend
job to update a LUKS volume's keys. iotest 296 must now expect them in
x-blockdev-amend's QMP reply instead of waiting for the actual job to
fail later.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Message-Id: <20220209105452.1694545 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20220304153729 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging

hw/nvme updates

- add enhanced protection information (64-bit guard)

# gpg: Signature made Fri 04 Mar 2022 06:23:36 GMT
# gpg:                using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9
# gpg: Good signature from "Klaus Jensen <[email protected]>" [unknown]
# gpg:                 aka "Klaus Jensen <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468  4272 63D5 6FC5 E55D A838
#      Subkey fingerprint: 5228 33AA 75E2 DCE6 A247  66C0 4DE1 AF31 6D4F 0DE9

* remotes/nvme/tags/nvme-next-pull-request:
  hw/nvme: 64-bit pi support
  hw/nvme: add pi tuple size helper
  hw/nvme: add support for the lbafee hbs feature
  hw/nvme: move format parameter parsing
  hw/nvme: add host behavior support feature
  hw/nvme: move dif/pi prototypes into dif.h

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-nios-20220303' into staging

Rewrite nios2 interrupt handling

# gpg: Signature made Thu 03 Mar 2022 19:52:33 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth-gitlab/tags/pull-nios-20220303:
  target/nios2: Rewrite interrupt handling
  target/nios2: Special case ipending in rdctl and wrctl
  target/nios2: Split mmu_write
  target/nios2: Hoist R_ZERO check in rdctl
  target/nios2: Only build mmu.c for system mode
  target/nios2: Replace MMU_LOG with tracepoints
  target/nios2: Remove mmu_read_debug

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20220303' into staging

Fifth RISC-V PR for QEMU 7.0

* Fixup checks for ext_zb[abcs]
* Add AIA support for virt machine
* Increase maximum number of CPUs in virt machine
* Fixup OpenTitan SPI address
* Add support for zfinx, zdinx and zhinx{min} extensions

# gpg: Signature made Thu 03 Mar 2022 05:26:55 GMT
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <[email protected]>" [full]
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8  CE8F 21E1 0D29 DF97 7054

* remotes/alistair/tags/pull-riscv-to-apply-20220303:
  target/riscv: expose zfinx, zdinx, zhinx{min} properties
  target/riscv: add support for zhinx/zhinxmin
  target/riscv: add support for zdinx
  target/riscv: add support for zfinx
  target/riscv: hardwire mstatus.FS to zero when enable zfinx
  target/riscv: add cfg properties for zfinx, zdinx and zhinx{min}
  hw: riscv: opentitan: fixup SPI addresses
  hw/riscv: virt: Increase maximum number of allowed CPUs
  docs/system: riscv: Document AIA options for virt machine
  hw/riscv: virt: Add optional AIA IMSIC support to virt machine
  hw/intc: Add RISC-V AIA IMSIC device emulation
  hw/riscv: virt: Add optional AIA APLIC support to virt machine
  target/riscv: fix inverted checks for ext_zb[abcs]

Signed-off-by: Peter Maydell <[email protected]>

target/nios2: Rewrite interrupt handling

Previously, we would avoid setting CPU_INTERRUPT_HARD when interrupts
are disabled at a particular point in time, instead queuing the value
into cpu->irq_pending. This is more complicated than required.

Instead, set CPU_INTERRUPT_HARD any time there is a pending interrupt,
and exclusively check for interrupts disabled in nios2_cpu_exec_interrupt.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Special case ipending in rdctl and wrctl

It was never correct to be able to write to ipending.
Until the rest of the irq code is tidied, the read of
ipending will generate an "unnecessary" mask.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Split mmu_write

Create three separate functions for the three separate registers.
Avoid extra dispatch through op_helper.c.
Dispatch to the correct function in translation.
Clean up the ifdefs in wrctl.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Hoist R_ZERO check in rdctl

This will avoid having to replicate the check to additional cases.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Only build mmu.c for system mode

We can thus remove an ifdef covering the entire file.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Replace MMU_LOG with tracepoints

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/nios2: Remove mmu_read_debug

This functionality can be had via plugins, if desired.
In the meantime, it is unused code.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20220302' into staging

target-arm queue:
* mps3-an547: Add missing user ahb interfaces
* hw/arm/mps2-tz.c: Update AN547 documentation URL
* hw/input/tsc210x: Don't abort on bad SPI word widths
* hw/i2c: flatten pca954x mux device
* target/arm: Support PSCI 1.1 and SMCCC 1.0
* target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
* tests/qtest: add qtests for npcm7xx sdhci
* Implement FEAT_LVA
* Implement FEAT_LPA
* Implement FEAT_LPA2 (but do not enable it yet)
* Report KVM's actual PSCI version to guest in dtb
* ui/cocoa.m: Fix updateUIInfo threading issues
* ui/cocoa.m: Remove unnecessary NSAutoreleasePools

# gpg: Signature made Wed 02 Mar 2022 20:52:06 GMT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Peter Maydell <[email protected]>" [ultimate]
# gpg:                 aka "Peter Maydell <[email protected]>" [ultimate]
# gpg:                 aka "Peter Maydell <[email protected]>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20220302: (26 commits)
  ui/cocoa.m: Remove unnecessary NSAutoreleasePools
  ui/cocoa.m: Fix updateUIInfo threading issues
  target/arm: Report KVM's actual PSCI version to guest in dtb
  target/arm: Implement FEAT_LPA2
  target/arm: Advertise all page sizes for -cpu max
  target/arm: Validate tlbi TG matches translation granule in use
  target/arm: Fix TLBIRange.base for 16k and 64k pages
  target/arm: Introduce tlbi_aa64_get_range
  target/arm: Extend arm_fi_to_lfsc to level -1
  target/arm: Implement FEAT_LPA
  target/arm: Implement FEAT_LVA
  target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
  target/arm: Honor TCR_ELx.{I}PS
  target/arm: Use MAKE_64BIT_MASK to compute indexmask
  target/arm: Pass outputsize down to check_s2_mmu_setup
  target/arm: Move arm_pamax out of line
  target/arm: Fault on invalid TCR_ELx.TxSZ
  target/arm: Set TCR_EL1.TSZ for user-only
  hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
  tests/qtest: add qtests for npcm7xx sdhci
  ...

Signed-off-by: Peter Maydell <[email protected]>

hw/nvme: 64-bit pi support

This adds support for one possible new protection information format
introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard
and 48-bit reference tag. This version does not support storage tags.

Like the CRC16 support already present, this uses a software
implementation of CRC64 (so it is naturally pretty slow). But it's good
enough for verification purposes.
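
To give an idea of what a software CRC64 looks like, here is a bitwise
sketch. It is not the code added by this patch. The parameters
(reflected polynomial 0x9A6C9329AC4BC9B5, all-ones initial value and
final inversion) are assumed to be those of the NVMe 64-bit guard, and
crc64_nvme_bitwise() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed reflected polynomial of the NVMe 64-bit CRC guard. */
#define CRC64_POLY_REFLECTED UINT64_C(0x9A6C9329AC4BC9B5)

/* Bit-at-a-time CRC64: simple and slow, one shift per input bit. */
uint64_t crc64_nvme_bitwise(const uint8_t *buf, size_t len)
{
    uint64_t crc = ~UINT64_C(0);            /* initial value: all ones */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out the low bit and fold in the polynomial if set. */
            crc = (crc >> 1) ^ ((crc & 1) ? CRC64_POLY_REFLECTED : 0);
        }
    }
    return ~crc;                            /* final inversion */
}
```

A table-driven variant would process a byte per step instead of a bit,
at the cost of a 2 KiB lookup table.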

This may go nicely hand-in-hand with the support that Keith submitted
for the Linux kernel[1].

[1]: https://lore.kernel.org/linux-nvme/20220126165214 [email protected]/T/

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Naveen Nagar <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

hw/nvme: add pi tuple size helper

A subsequent patch will introduce a new tuple size; so add a helper and
use that instead of sizeof() and magic numbers.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

hw/nvme: add support for the lbafee hbs feature

Add support for up to 64 LBA formats through the LBAFEE field of the
Host Behavior Support feature.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Naveen Nagar <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

hw/nvme: move format parameter parsing

There is no need to extract the format command parameters for each
namespace. Move it to the entry point.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

hw/nvme: add host behavior support feature

Add support for getting and setting the Host Behavior Support feature.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Naveen Nagar <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

hw/nvme: move dif/pi prototypes into dif.h

Move dif/pi data structures and inlines to dif.h.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>

target/riscv: expose zfinx, zdinx, zhinx{min} properties

Co-authored-by: ardxwe <[email protected]>
Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: add support for zhinx/zhinxmin

  - update extension check REQUIRE_ZHINX_OR_ZFH and REQUIRE_ZFH_OR_ZFHMIN_OR_ZHINX_OR_ZHINXMIN
  - update half float point register read/write
  - disable nanbox_h check

Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Acked-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: add support for zdinx

  - update extension check REQUIRE_ZDINX_OR_D
  - update double float point register read/write

Co-authored-by: ardxwe <[email protected]>
Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: add support for zfinx

  - update extension check REQUIRE_ZFINX_OR_F
  - update single float point register read/write
  - disable nanbox_s check

Co-authored-by: ardxwe <[email protected]>
Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Acked-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: hardwire mstatus.FS to zero when enable zfinx

Co-authored-by: ardxwe <[email protected]>
Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: add cfg properties for zfinx, zdinx and zhinx{min}

Co-authored-by: ardxwe <[email protected]>
Signed-off-by: Weiwei Li <[email protected]>
Signed-off-by: Junqiang Wang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220211043920 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

hw: riscv: opentitan: fixup SPI addresses

Update the SPI_DEVICE, SPI_HOST0 and SPI_HOST1 base addresses, and
add these as unimplemented devices.

The addresses are taken from [1].

[1] https://github.com/lowRISC/opentitan/blob/6c317992fbd646818b34f2a2dbf44bc850e461e4/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h#L107

Signed-off-by: Wilfred Mallawa <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Bin Meng <[email protected]>
Message-Id: <20220218063839 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

hw/riscv: virt: Increase maximum number of allowed CPUs

To facilitate software development of RISC-V systems with a large number
of HARTs, we increase the maximum number of allowed CPUs to 512 (2^9).

We also add detailed source-level comments about the limit defines that
affect physical address space utilization.

Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Frank Chang <[email protected]>
Message-Id: <20220220085526 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

docs/system: riscv: Document AIA options for virt machine

We have two new machine options "aia" and "aia-guests" available
for the RISC-V virt machine so let's document these options.

Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Frank Chang <[email protected]>
Message-Id: <20220220085526 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

hw/riscv: virt: Add optional AIA IMSIC support to virt machine

We extend the virt machine to emulate both AIA IMSIC and AIA APLIC
devices only when the "aia=aplic-imsic" parameter is passed along
with the machine name on the QEMU command line. The AIA IMSIC is
only a per-HART MSI controller, so we use the AIA APLIC in MSI mode
to forward all wired interrupts as MSIs to the AIA IMSIC.

We also provide the "aia-guests=<xyz>" parameter, which can be used
to specify the number of VS-level AIA IMSIC guest MMIO pages for
each HART.

Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Alistair Francis <[email protected]>
Message-Id: <20220220085526 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

hw/intc: Add RISC-V AIA IMSIC device emulation

The RISC-V AIA (Advanced Interrupt Architecture) defines a new
interrupt controller for MSIs (message-signaled interrupts) called
IMSIC (Incoming Message-Signaled Interrupt Controller). The IMSIC
is a per-HART device and also supports virtualization of MSIs using
dedicated VS-level guest interrupt files.

This patch adds device emulation for RISC-V AIA IMSIC which
supports M-level, S-level, and VS-level MSIs.

Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Frank Chang <[email protected]>
Message-Id: <20220220085526 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

hw/riscv: virt: Add optional AIA APLIC support to virt machine

We extend the virt machine to emulate AIA APLIC devices only when the
"aia=aplic" parameter is passed along with the machine name on the QEMU
command line. When "aia=none" is given or the option is not specified,
we fall back to the original PLIC device emulation.

Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220220085526 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: fix inverted checks for ext_zb[abcs]

While changing to the use of cfg_ptr, the conditions for REQUIRE_ZB[ABCS]
inadvertently became inverted and slipped through the initial testing (which
used RV64GC_XVentanaCondOps as a target).
This fixes the regression.

Tested against SPEC2017 w/ GCC 12 (prerelease) for RV64GC_zba_zbb_zbc_zbs.
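
As a rough illustration of the kind of inversion described above, the
sketch below contrasts a correct extension guard with an inverted one.
All names (DemoCfg, DemoCtx, DEMO_REQUIRE_ZBA, demo_trans_sh1add) are
hypothetical stand-ins and not the actual QEMU definitions; only the
shape of the check is meant to match.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU structures, reduced to one flag. */
typedef struct { bool ext_zba; } DemoCfg;
typedef struct { const DemoCfg *cfg_ptr; } DemoCtx;

/* Correct guard: reject the instruction when the extension is NOT enabled. */
#define DEMO_REQUIRE_ZBA(ctx) do {          \
        if (!(ctx)->cfg_ptr->ext_zba) {     \
            return false;                   \
        }                                   \
    } while (0)

/* Inverted guard: rejects the instruction when the extension IS enabled. */
#define DEMO_REQUIRE_ZBA_INVERTED(ctx) do { \
        if ((ctx)->cfg_ptr->ext_zba) {      \
            return false;                   \
        }                                   \
    } while (0)

/* A translator returns true when the instruction is accepted. */
static bool demo_trans_sh1add(const DemoCtx *ctx)
{
    DEMO_REQUIRE_ZBA(ctx);
    return true;
}

/* Same translator with the inverted guard, to show the regression. */
static bool demo_trans_sh1add_inverted(const DemoCtx *ctx)
{
    DEMO_REQUIRE_ZBA_INVERTED(ctx);
    return true;
}
```

A target without the extension enabled would not notice the inverted
guard rejecting instructions it never uses, which is consistent with the
regression slipping through testing on a target that does not enable
these extensions.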

Fixes: f2a32bec8f0da99 ("target/riscv: access cfg structure through DisasContext")
Signed-off-by: Philipp Tomsich <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-Id: <20220203153946.2676353 [email protected]>
Signed-off-by: Alistair Francis <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-migration-20220302b' into staging

Migration/HMP/Virtio pull 2022-03-02

A bit of a mix this time:
  * Minor fixes from myself, Hanna, and Jack
  * VNC password rework by Stefan and Fabian
  * Postcopy changes from Peter X that are
    the start of a larger series to come
  * Removing the prehistoric load_state_old
    code from Peter M

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
# gpg: Signature made Wed 02 Mar 2022 18:25:12 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-migration-20220302b:
  migration: Remove load_state_old and minimum_version_id_old
  tests: Pass in MigrateStart** into test_migrate_start()
  migration: Add migration_incoming_transport_cleanup()
  migration: postcopy_pause_fault_thread() never fails
  migration: Enlarge postcopy recovery to capture !-EIO too
  migration: Move static var in ram_block_from_stream() into global
  migration: Add postcopy_thread_create()
  migration: Dump ramblock and offset too when non-same-page detected
  migration: Introduce postcopy channels on dest node
  migration: Tracepoint change in postcopy-run bottom half
  migration: Finer grained tracepoints for POSTCOPY_LISTEN
  migration: Dump sub-cmd name in loadvm_process_command tp
  migration/rdma: set the REUSEADDR option for destination
  qapi/monitor: allow VNC display id in set/expire_password
  qapi/monitor: refactor set/expire_password with enums
  monitor/hmp: add support for flag argument with value
  virtiofsd: Let meson check for statx.stx_mnt_id
  clock-vmstate: Add missing END_OF_LIST

Signed-off-by: Peter Maydell <[email protected]>

ui/cocoa.m: Remove unnecessary NSAutoreleasePools

In commit 6e657e64cdc478 in 2013 we added some autorelease pools to
deal with complaints from macOS when we made calls into Cocoa from
threads that didn't have automatically created autorelease pools.
Later on, macOS got stricter about forbidding cross-thread Cocoa
calls, and in commit 5588840ff77800e839d8 we restructured the code to
avoid them. This left the autorelease pool creation in several
functions without any purpose; delete it.

We still need the pool in cocoa_refresh() for the clipboard related
code which is called directly there.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Akihiko Odaki <[email protected]>
Tested-by: Akihiko Odaki <[email protected]>
Message-id: 20220224101330 [email protected]

ui/cocoa.m: Fix updateUIInfo threading issues

The updateUIInfo method makes Cocoa API calls.  It also calls back
into QEMU functions like dpy_set_ui_info().  To do this safely, we
need to follow two rules:
* Cocoa API calls are made on the Cocoa UI thread
* When calling back into QEMU we must hold the iothread lock

Fix the places where we got this wrong, by taking the iothread lock
while executing updateUIInfo, and moving the call in cocoa_switch()
inside the dispatch_async block.

Some of the Cocoa UI methods which call updateUIInfo are invoked as
part of the initial application startup, while we're still doing the
little cross-thread dance described in the comment just above
call_qemu_main().  This meant they were calling back into the QEMU UI
layer before we'd actually finished initializing our display and
registered the DisplayChangeListener, which isn't really valid.  Once
updateUIInfo takes the iothread lock, we no longer get away with
this, because during this startup phase the iothread lock is held by
the QEMU main-loop thread which is waiting for us to finish our
display initialization.  So we must suppress updateUIInfo until
applicationDidFinishLaunching allows the QEMU main-loop thread to
continue.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Akihiko Odaki <[email protected]>
Tested-by: Akihiko Odaki <[email protected]>
Message-id: 20220224101330 [email protected]

target/arm: Report KVM's actual PSCI version to guest in dtb

When we're using KVM, the PSCI implementation is provided by the
kernel, but QEMU has to tell the guest about it via the device tree.
Currently we look at the KVM_CAP_ARM_PSCI_0_2 capability to determine
if the kernel is providing at least PSCI 0.2, but if the kernel
provides a newer version than that we will still only tell the guest
it has PSCI 0.2. (This is fairly harmless; it just means the guest
won't use newer parts of the PSCI API.)
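
A minimal sketch of mapping a PSCI version number to the most specific
device tree compatible string, assuming the usual PSCI_VERSION encoding
(major in bits [31:16], minor in bits [15:0]). The helper names are
hypothetical and this is not the QEMU code; a real dtb would normally
also list the older strings as fallbacks.

```c
#include <stdint.h>

/* PSCI_VERSION encoding: major in bits [31:16], minor in bits [15:0]. */
#define DEMO_PSCI_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))

/*
 * Pick the most specific compatible string for a given PSCI version.
 * The binding only distinguishes pre-0.2, 0.2-compatible and
 * 1.0-compatible implementations.
 */
static const char *demo_psci_compatible(uint32_t version)
{
    if (version >= DEMO_PSCI_VERSION(1, 0)) {
        return "arm,psci-1.0";
    }
    if (version >= DEMO_PSCI_VERSION(0, 2)) {
        return "arm,psci-0.2";
    }
    return "arm,psci";
}
```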

The kernel exposes the specific PSCI version it is implementing via
the ONE_REG API; use this to report in the dtb that the PSCI
implementation is 1.0-compatible if appropriate. (The device tree
binding currently only distinguishes "pre-0.2", "0.2-compatible" and
"1.0-compatible".)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Reviewed-by: Akihiko Odaki <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-id: 20220224134655.1207865 [email protected]

target/arm: Implement FEAT_LPA2

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
4k or 16k pages.

This introduces the DS bit to TCR_ELx, which is RES0 unless the
page size is enabled and supports LPA2, resulting in the effective
value of DS for a given table walk. The DS bit changes the format
of the page table descriptor slightly, moving the SH field out to
TCR so that all pages have the same shareability and repurposing
those bits of the page table descriptor for the highest bits of
the output address.

Do not yet enable FEAT_LPA2; we need extra plumbing to avoid
tickling an old kernel bug.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Advertise all page sizes for -cpu max

We support 16k pages, but do not advertise that in ID_AA64MMFR0.

The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
to the same support as stage1 lookups. This setting is deprecated, so
indicate support for all stage2 page sizes directly.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Validate tlbi TG matches translation granule in use

For FEAT_LPA2, we will need other ARMVAParameters, which themselves
depend on the translation granule in use. We might as well validate
that the given TG matches; the architecture "does not require that
the instruction invalidates any entries" if this is not true.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Fix TLBIRange.base for 16k and 64k pages

The shift of the BaseADDR field depends on the translation
granule in use.
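
A simplified sketch of why the shift depends on the granule, assuming
the range TLBI operand layout with BaseADDR in bits [36:0] and TG in
bits [47:46] (1 = 4k, 2 = 16k, 3 = 64k). The function names are
hypothetical, and sign extension and FEAT_LPA2 handling are left out.

```c
#include <stdint.h>

/* Extract 'len' bits of 'value' starting at bit 'start' (len < 64). */
static uint64_t demo_extract64(uint64_t value, unsigned start, unsigned len)
{
    return (value >> start) & ((UINT64_C(1) << len) - 1);
}

/* TG encoding 1, 2, 3 maps to page shifts 12, 14, 16. */
static unsigned demo_tg_to_page_shift(unsigned tg)
{
    return (tg - 1) * 2 + 12;
}

/* Compute the base address of the range from the register operand. */
static uint64_t demo_tlbi_range_base(uint64_t value)
{
    unsigned tg = (unsigned)demo_extract64(value, 46, 2);
    uint64_t base_addr = demo_extract64(value, 0, 37);

    if (tg == 0) {
        /* Reserved encoding; returning 0 is a choice made for this sketch. */
        return 0;
    }
    /* BaseADDR is in units of the granule, so the shift follows TG. */
    return base_addr << demo_tg_to_page_shift(tg);
}
```

Shifting by a fixed 4k page shift would give the right base only for
TG = 1, which is the case the 16k and 64k granules break.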

Fixes: 84940ed8255 ("target/arm: Add support for FEAT_TLBIRANGE")
Reported-by: Peter Maydell <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Introduce tlbi_aa64_get_range

Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
returning a structure containing both results. Pass in the
ARMMMUIdx, rather than the digested two_ranges boolean.

This is in preparation for FEAT_LPA2, where the interpretation
of 'value' depends on the effective value of DS for the regime.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Extend arm_fi_to_lfsc to level -1

With FEAT_LPA2, rather than introducing translation level 4,
we introduce level -1, below the current level 0. Extend
arm_fi_to_lfsc to handle these faults.

Assert that this new translation level does not leak into
fault types for which it is not defined, which allows some
masking of fi->level to be removed.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Implement FEAT_LPA

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
64k pages. The only thing left at this point is to handle the
extra bits in the TTBR and in the table descriptors.

Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
mask out the high bits when writing to those registers, so no changes
are required there.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Implement FEAT_LVA

This feature is relatively small, as it applies only to
64k pages and thus requires no additional changes to the
table descriptor walking algorithm, only a change to the
minimum TSZ (which is the inverse of the maximum virtual
address space size).

Note that this feature widens VBAR_ELx, but we already
treat the register as being 64 bits wide.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA

The original A.a revision of the AArch64 ARM required that we
force-extend the addresses in these registers from 49 bits.
This language has been loosened via a combination of IMPLEMENTATION
DEFINED and CONSTRAINED UNPREDICTABLE to allow consideration of
the entire aligned address.

This means that we do not have to consider whether or not FEAT_LVA
is enabled, and decide from which bit an address might need to be
extended.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Honor TCR_ELx.{I}PS

This field controls the output (intermediate) physical address size
of the translation process. v8 requires that we raise an AddressSize
fault if the page tables are programmed incorrectly, such that any
intermediate descriptor address, or the final translated address,
is out of range.

Add a PS field to ARMVAParameters, and properly compute outputsize
in get_phys_addr_lpae. Test the descaddr as extracted from TTBR
and from page table entries.
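
A minimal sketch of the range check, assuming the architectural PS
encodings 0..6 map to 32, 36, 40, 42, 44, 48 and 52 bits. The names are
hypothetical and this is not the get_phys_addr_lpae code; a real
implementation also caps the result at the implemented physical address
size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Output address size in bits for each TCR_ELx.{I}PS encoding 0..6. */
static const unsigned demo_ps_to_bits[] = { 32, 36, 40, 42, 44, 48, 52 };

static unsigned demo_outputsize(unsigned ps)
{
    unsigned max = sizeof(demo_ps_to_bits) / sizeof(demo_ps_to_bits[0]) - 1;

    if (ps > max) {
        /* Simplification for this sketch: clamp reserved encodings. */
        ps = max;
    }
    return demo_ps_to_bits[ps];
}

/* True when a descriptor or output address has bits set above outputsize. */
static bool demo_address_size_fault(uint64_t descaddr, unsigned ps)
{
    return (descaddr >> demo_outputsize(ps)) != 0;
}
```

The same test would apply to the address taken from TTBR and to the
address taken from each page table entry during the walk.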

Restrict descaddrmask so that we won't raise the fault for v7.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Use MAKE_64BIT_MASK to compute indexmask

The macro is a bit more readable than the inlined computation.
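
For illustration, the sketch below compares an inlined low-bit mask with
a macro of the same shape as MAKE_64BIT_MASK. DEMO_MAKE_64BIT_MASK is a
local copy written for this example, assumed to match the QEMU macro,
and the two helper functions are hypothetical.

```c
#include <stdint.h>

/*
 * Mask of 'length' bits starting at bit 'shift'.
 * Valid for 1 <= length <= 64 and shift + length <= 64.
 */
#define DEMO_MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Inlined computation of a mask covering the low 'bits' bits. */
static uint64_t demo_indexmask_inline(unsigned bits)
{
    return (1ULL << bits) - 1;
}

/* The same mask expressed through the macro. */
static uint64_t demo_indexmask_macro(unsigned bits)
{
    return DEMO_MAKE_64BIT_MASK(0, bits);
}
```

Both forms agree for 1 to 63 bits; the macro form additionally handles a
full 64-bit mask, where the inlined shift would be undefined.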

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Pass outputsize down to check_s2_mmu_setup

Pass down the width of the output address from translation.
For now this is still just PAMax, but a subsequent patch will
compute the correct value from TCR_ELx.{I}PS.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Move arm_pamax out of line

We will shortly share parts of this function with other portions
of address translation.

Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Fault on invalid TCR_ELx.TxSZ

Without FEAT_LVA, the behaviour of programming an invalid value
is IMPLEMENTATION DEFINED. With FEAT_LVA, programming an invalid
minimum value requires a Translation fault.

It is most self-consistent to choose to generate the fault always.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Set TCR_EL1.TSZ for user-only

Set this as the kernel would, to 48 bits, to keep the computation
of the address space correct for PAuth.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>

Add new macros to manipulate signed fields within the register.
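
A minimal sketch of what signed field extract and deposit amount to for
a 32-bit register. The function names are hypothetical and these are not
the FIELD_SEX<N> or FIELD_SDP<N> macros themselves. The extract relies on
arithmetic right shift of negative values, which is implementation-defined
in C but provided by the common compilers.

```c
#include <stdint.h>

/*
 * Sign-extending extract of 'len' bits starting at bit 'shift'.
 * Requires len >= 1 and shift + len <= 32.
 */
static int32_t demo_sextract32(uint32_t reg, unsigned shift, unsigned len)
{
    /* Move the field to the top, then shift back down arithmetically. */
    return (int32_t)(reg << (32 - len - shift)) >> (32 - len);
}

/* Deposit the low 'len' bits of a signed value into the field. */
static uint32_t demo_sdeposit32(uint32_t reg, unsigned shift, unsigned len,
                                int32_t val)
{
    uint32_t mask = (~UINT32_C(0) >> (32 - len)) << shift;

    return (reg & ~mask) | (((uint32_t)val << shift) & mask);
}
```

Depositing -3 into a 4-bit field and extracting it again yields -3,
whereas an unsigned extract of the same field would yield 13.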

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20220301215958 [email protected]
Suggested-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

tests/qtest: add qtests for npcm7xx sdhci

Reviewed-by: Hao Wu <[email protected]>
Reviewed-by: Chris Rauer <[email protected]>
Signed-off-by: Shengtan Mao <[email protected]>
Signed-off-by: Patrick Venture <[email protected]>
Message-id: 20220225174451 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()

handle_simd_shift_fpint_conv() was accidentally freeing the TCG
temporary tcg_fpstatus too early, before the last use of it. Move
the free down to where it belongs.

Signed-off-by: Wentao_Liang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
[PMM: cleaned up commit message]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Support PSCI 1.1 and SMCCC 1.0

Support the latest PSCI on TCG and HVF. A 64-bit function called from
AArch32 now returns NOT_SUPPORTED, which is necessary to adhere to SMC
Calling Convention 1.0. The implementation is still not compliant with
SMCCC 1.3, since it does not implement the mandatory functions.

Signed-off-by: Akihiko Odaki <[email protected]>
Message-id: 20220213035753 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
[PMM: update MISMATCH_CHECK checks on PSCI_VERSION macros to match]
Signed-off-by: Peter Maydell <[email protected]>

hw/i2c: flatten pca954x mux device

Previously this device created N subdevices which each owned an i2c bus.
Now this device simply owns the N i2c busses directly.

Tested: Verified devices behind mux are still accessible via qmp and i2c
from within an arm32 SoC.

Reviewed-by: Hao Wu <[email protected]>
Signed-off-by: Patrick Venture <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20220202164533.1283668 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/input/tsc210x: Don't abort on bad SPI word widths

The tsc210x doesn't support anything other than 16-bit reads on the
SPI bus, but the guest can program the SPI controller to attempt
them anyway. If this happens, don't abort QEMU, just log this as
a guest error.

This fixes our machine_arm_n8x0.py:N8x0Machine.test_n800
acceptance test, which hits this assertion.

The reason we hit the assertion is because the guest kernel thinks
there is a TSC2005 on this SPI bus address, not a TSC210x.  (The n810
*does* have a TSC2005 at this address.) The TSC2005 supports the
24-bit accesses which the guest driver makes, and the TSC210x does
not (that is, our TSC210x emulation is not missing support for a word
width the hardware can handle).  It's not clear whether the problem
here is that the guest kernel incorrectly thinks the n800 has the
same device at this SPI bus address as the n810, or that QEMU's n810
board model doesn't get the SPI devices right.  At this late date
there no longer appears to be any reliable information on the web
about the hardware behaviour, but I am inclined to think this is a
guest kernel bug.  In any case, we prefer not to abort QEMU for
guest-triggerable conditions, so logging the error is the right thing
to do.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/736
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 20220221140750 [email protected]

hw/arm/mps2-tz.c: Update AN547 documentation URL

The AN547 application note URL has changed: update our comment
accordingly. (Rev B is still downloadable from the old URL,
but there is a new Rev C of the document now.)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20220221094144 [email protected]

mps3-an547: Add missing user ahb interfaces

With these interfaces missing, TFM would delegate peripherals 0, 1,
2, 3 and 8, and qemu would ignore the delegation of interface 8, as
it thought interface 4 was eth & USB.

This patch corrects this behavior and allows TFM to delegate the
eth & USB peripheral to NS mode.

(The old QEMU behaviour was based on revision B of the AN547
appnote; revision C corrects this error in the documentation,
and this commit brings QEMU in to line with how the FPGA
image really behaves.)

Signed-off-by: Jimmy Brisson <[email protected]>
Message-id: 20220210210227.3203883 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
[PMM: added commit message note clarifying that the old behaviour
was a docs issue, not because there were two different versions
of the FPGA image]
Signed-off-by: Peter Maydell <[email protected]>

migration: Remove load_state_old and minimum_version_id_old

There are no longer any VMStateDescription structs in the tree which
use the load_state_old support for custom handling of incoming
migration from very old QEMU. Remove the mechanism entirely.

This includes removing one stray useless setting of
minimum_version_id_old in a VMStateDescription with no load_state_old
function, which crept in after the global weeding-out of them in
commit 17e313406126.

Signed-off-by: Peter Maydell <[email protected]>
Message-Id: <20220215175705.3846411 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Francisco Iglesias <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: Pass in MigrateStart** into test_migrate_start()

test_migrate_start() releases the MigrateStart structure that is passed
in. However, that is not obvious to the caller, because after the call
returns the pointer can still be referenced by the caller. This can
easily be a source of use-after-free.

Pass in a double pointer instead, so that we can safely clear the
caller's pointer after the struct is released.
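
The double-pointer pattern can be sketched as follows. DemoArgs,
demo_args_new() and demo_start() are hypothetical names used only for
this example; they are not the test helpers from the tree.

```c
#include <stdlib.h>

/* Hypothetical argument struct owned by the caller until demo_start(). */
typedef struct DemoArgs {
    int hide_stderr;
} DemoArgs;

static DemoArgs *demo_args_new(void)
{
    return calloc(1, sizeof(DemoArgs));
}

/*
 * Consume the arguments: use them, free them, and clear the caller's
 * pointer so a later dereference fails fast instead of touching freed
 * memory.
 */
static int demo_start(DemoArgs **args)
{
    int hide = (*args)->hide_stderr;

    free(*args);
    *args = NULL;
    return hide;
}
```

After demo_start(&args) returns, args is NULL in the caller, so the
transfer of ownership is visible at the call site.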

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
dgilbert: Fixup apply since I didn't take 24/25

migration: Add migration_incoming_transport_cleanup()

Add a helper to clean up the transport listener.

When doing so, we should also nullify the cleanup hook and the data, so
that it is safe to call the helper multiple times.
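
A minimal sketch of such an idempotent cleanup helper. The struct and
function names below are hypothetical stand-ins chosen for this example,
not the actual migration code.

```c
#include <stddef.h>

/* Hypothetical state holding a cleanup hook and its data. */
typedef struct DemoIncoming {
    void (*transport_cleanup)(void *data);
    void *transport_data;
} DemoIncoming;

/* Run the hook at most once, then clear both fields. */
static void demo_transport_cleanup(DemoIncoming *mis)
{
    if (mis->transport_cleanup) {
        mis->transport_cleanup(mis->transport_data);
    }
    mis->transport_cleanup = NULL;
    mis->transport_data = NULL;
}

/* Example hook that counts how many times it has been invoked. */
static void demo_count_cleanup(void *data)
{
    ++*(int *)data;
}
```

Because the hook is cleared after the first call, a second call to
demo_transport_cleanup() is a no-op.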

Move the socket_address_list cleanup along with it, because that list is a
mirror of the listener channels and exists only for query-migrate. Hence
anyone who wants to clean up the listener transport should always want to
clean up the socket list too.

No functional change intended.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: postcopy_pause_fault_thread() never fails

Per the title, remove the return code and simplify the callers as the errors
will never be triggered. No functional change intended.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Enlarge postcopy recovery to capture !-EIO too

We used to have quite a few places that checked for -EIO, and that was the
only way to trigger postcopy recovery. That is based on the assumption that
we only return -EIO for channel issues.

It works in 99.99% of cases, but logically it does not cover some corner
cases. For example, ram_block_from_stream() could fail because of an
interrupted network, in which case -EINVAL is returned instead of -EIO.

I remember Dave Gilbert pointing that out before, but somehow it was
overlooked. I have not encountered anything other than the -EIO error either.

However, we had better fix this before it triggers a rare VM data loss during
live migration.

To cover as many of those cases as possible, remove the -EIO restriction on
triggering the postcopy recovery, because even if it's not a channel failure,
we can't do anything better than halting QEMU anyway - the corpse of the
process may even be used by a good hand to dig out useful memory regions, or
the admin could simply kill the process later on.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Move static var in ram_block_from_stream() into global

A static variable is very unfriendly to threading of ram_block_from_stream().
Move it into MigrationIncomingState.

Pass the incoming state pointer to ram_block_from_stream() at both call
sites.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Add postcopy_thread_create()

Postcopy creates threads. A common pattern is to initialize a semaphore and
use it to synchronize with the new thread. Namely, we have fault_thread_sem
and listen_thread_sem, and they are only used for this.

Make it shared infrastructure so it's easier to create yet another thread.
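
A rough sketch of the create-and-wait pattern, assuming POSIX threads
and unnamed POSIX semaphores (as available on Linux). All names are
hypothetical; this is not the postcopy_thread_create() implementation.

```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical per-thread state shared between creator and thread. */
typedef struct DemoThreadState {
    sem_t created;   /* posted by the new thread once it is running */
    int started;     /* set by the new thread before posting */
} DemoThreadState;

/* Example thread body: announce that we are up, then exit. */
static void *demo_thread_fn(void *opaque)
{
    DemoThreadState *st = opaque;

    st->started = 1;
    sem_post(&st->created);
    return NULL;
}

/* Create the thread and block until it has signalled that it is running. */
static int demo_thread_create(DemoThreadState *st, pthread_t *thread,
                              void *(*fn)(void *))
{
    if (sem_init(&st->created, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(thread, NULL, fn, st) != 0) {
        sem_destroy(&st->created);
        return -1;
    }
    while (sem_wait(&st->created) != 0) {
        /* Retry if the wait was interrupted by a signal. */
    }
    return 0;
}

/* Join the thread and release the semaphore once nobody uses it. */
static void demo_thread_join(DemoThreadState *st, pthread_t thread)
{
    pthread_join(thread, NULL);
    sem_destroy(&st->created);
}
```

Keeping the semaphore handling inside one helper means each new thread
only needs to supply its own body and post once it is ready.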

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Dump ramblock and offset too when non-same-page detected

In ram_load_postcopy() we try to detect the non-same-page case and dump an
error. This error is very helpful for debugging. Add the ramblock and offset
to the error log too.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
dgilbert: Fix up long line

migration: Introduce postcopy channels on dest node

Postcopy handles huge pages in a special way, such that currently we can only
have one "channel" to transfer the page.

That is because when we install pages using UFFDIO_COPY, we need to have the
whole huge page ready. It also means we need a temp huge page while receiving
the whole content of the page.

Currently all maintenance around this tmp page is global: first we
allocate a temp huge page, then we maintain its status mostly within
ram_load_postcopy().

To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caching, one for each channel.

Meanwhile we need to maintain the tmp huge page status per-channel too.

To give some example, some local variables maintained in ram_load_postcopy()
are listed; they are responsible for maintaining temp huge page status:

  - all_zero:     this keeps whether this huge page contains all zeros
  - target_pages: this counts how many target pages have been copied
  - host_page:    this keeps the host ptr for the page to install

Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
need one structure to keep the state around.
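
A minimal sketch of a per-channel structure along these lines. The
field names follow the list above, but the type name DemoTmpPage, the
helper functions, and the use of malloc() as a stand-in for the real
huge page allocation are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-channel temp huge page state. */
typedef struct DemoTmpPage {
    void *tmp_huge_page;     /* buffer that collects one whole huge page */
    void *host_page;         /* host pointer of the page to install */
    unsigned target_pages;   /* number of target pages copied so far */
    bool all_zero;           /* whether the huge page is all zeros so far */
} DemoTmpPage;

/* Free 'channels' entries and the array itself. */
static void demo_tmp_pages_free(DemoTmpPage *pages, unsigned channels)
{
    if (!pages) {
        return;
    }
    for (unsigned i = 0; i < channels; i++) {
        free(pages[i].tmp_huge_page);
    }
    free(pages);
}

/* Allocate one state structure and one temp buffer per channel. */
static DemoTmpPage *demo_tmp_pages_new(unsigned channels, size_t huge_page_size)
{
    DemoTmpPage *pages = calloc(channels, sizeof(*pages));

    if (!pages) {
        return NULL;
    }
    for (unsigned i = 0; i < channels; i++) {
        pages[i].tmp_huge_page = malloc(huge_page_size);
        if (!pages[i].tmp_huge_page) {
            demo_tmp_pages_free(pages, channels);
            return NULL;
        }
        pages[i].all_zero = true;
    }
    return pages;
}
```

With one channel this degenerates to the previous single global temp
page; more channels simply mean more array entries.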

For vanilla postcopy, obviously there's only one channel.  It contains both
precopy and postcopy pages.

This patch teaches the dest migration node to become aware of the possible
number of postcopy channels by introducing the "postcopy_channels" variable.
Its value is calculated when setting up postcopy on the dest node (during the
POSTCOPY_LISTEN phase).

Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd).  In this patch there's a TODO marked
for that; so far the number of channels is always set to 1.

We need to send one "host huge page" on one channel only and we cannot split
it, because otherwise the data of the same huge page could be spread over more
than one channel, and we would need more complicated logic to manage that.  One
temp host huge page for each channel will be enough for us for now.

Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for later patches where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
  dgilbert: Fixed up long line

migration: Tracepoint change in postcopy-run bottom half

Remove the two old tracepoints, which are right next to each other:

trace_loadvm_postcopy_handle_run_cpu_sync()
trace_loadvm_postcopy_handle_run_vmstart()

Add trace_loadvm_postcopy_handle_run_bh() with a finer granule trace.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Finer grained tracepoints for POSTCOPY_LISTEN

Enabling postcopy listening takes a few steps. Add a few tracepoints so that
they are ready for some basic measurements of those steps.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Dump sub-cmd name in loadvm_process_command tp

It'll be easier to read the name rather than index of sub-cmd when debugging.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20220301083925 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/rdma: set the REUSEADDR option for destination

We hit the following error while testing the RDMA transport.  After a
migration error, the mgmt daemon picks a migration port and the destination
fails with:

incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

Retrying with another port, e.g. -incoming rdma:[::]:8103, sometimes works and
sometimes needs further attempts with other port numbers.

Set the REUSEADDR option for the destination.  This allows the address to be
reused and avoids the rdma_bind_addr error.
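
The effect of an address-reuse option can be sketched with the plain TCP
analogue, SO_REUSEADDR, which compiles without RDMA hardware or libraries.
This is not the code from the patch: librdmacm offers a comparable option
through rdma_set_option(), but the message does not say which call is used.
listen_reusable() and reuseaddr_enabled() are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP listening socket on the loopback address with SO_REUSEADDR
 * set before bind().  Port 0 asks the kernel for any free port.
 * Returns the socket fd, or -1 on failure.
 */
static int listen_reusable(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* The option must be set before bind() to have any effect. */
    int reuse = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
static int reuseaddr_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) {
        return -1;
    }
    return value != 0;
}
```

Without the option, a listener restarted shortly after a failed attempt can be
refused the same address while the old one is still being torn down, which
matches the retry-with-another-port pattern described above.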

Signed-off-by: Jack Wang <[email protected]>
Message-Id: <20220208085640 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
dgilbert: Fixed up some tabs

qapi/monitor: allow VNC display id in set/expire_password

It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...

It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.

For HMP, the display is specified using the "-d" value flag.

For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.

Signed-off-by: Stefan Reiter <[email protected]>
[FE: update "Since: " from 6.2 to 7.0
make @connected a common member of @SetPasswordOptions]
Signed-off-by: Fabian Ebner <[email protected]>
Message-Id: <20220225084949 [email protected]>
Acked-by: Markus Armbruster <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

qapi/monitor: refactor set/expire_password with enums

'protocol' and 'connected' are better suited as enums than as strings,
make use of that. No functional change intended.
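
Why an enum with 'keep' first is convenient can be sketched in C.  The type and
function names below are hypothetical, and the value names other than 'keep'
are assumed for illustration; the generated QAPI types may look different.

```c
#include <stddef.h>
#include <string.h>

/* 'keep' comes first, so a zero-initialised value already means "keep". */
typedef enum DemoConnectedAction {
    DEMO_ACTION_KEEP = 0,
    DEMO_ACTION_FAIL,
    DEMO_ACTION_DISCONNECT,
    DEMO_ACTION__MAX
} DemoConnectedAction;

static const char *const demo_action_names[DEMO_ACTION__MAX] = {
    [DEMO_ACTION_KEEP]       = "keep",
    [DEMO_ACTION_FAIL]       = "fail",
    [DEMO_ACTION_DISCONNECT] = "disconnect",
};

/*
 * Convert a string to the enum.  A NULL string selects the default.
 * Returns 0 on success, -1 if the string is not a known value.
 */
static int demo_parse_action(const char *s, DemoConnectedAction *out)
{
    if (s == NULL) {
        *out = DEMO_ACTION_KEEP;
        return 0;
    }
    for (int i = 0; i < DEMO_ACTION__MAX; i++) {
        if (strcmp(s, demo_action_names[i]) == 0) {
            *out = (DemoConnectedAction)i;
            return 0;
        }
    }
    return -1;
}
```

With strings, every consumer has to repeat the comparison and handle typos
itself; with an enum, invalid input is rejected once at the parsing boundary.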

Suggested-by: Markus Armbruster <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Stefan Reiter <[email protected]>
[FE: update "Since: " from 6.2 to 7.0
put 'keep' first in enum to ease use as a default]
Signed-off-by: Fabian Ebner <[email protected]>
Message-Id: <20220225084949 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp: add support for flag argument with value

Adds support for the "-xs" parameter type, where "-x" denotes a flag
name and the "s" suffix indicates that this flag is supposed to take
an arbitrary string parameter.

These parameters are always optional; the entry in the qdict is omitted if the
flag is not given.
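
The behaviour of an optional flag that carries a string value can be sketched
as follows.  demo_flag_value() is a hypothetical stand-alone helper, not the
monitor's argument parser, and it assumes whitespace-separated arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look for "-<flag> <value>" in a whitespace-separated argument string.
 * On a match, copy the value into out (truncated to outlen - 1 characters)
 * and return true.  If the flag is absent or has no value, leave out
 * untouched and return false, mirroring "the entry is omitted".
 */
static bool demo_flag_value(const char *args, char flag,
                            char *out, size_t outlen)
{
    const char *p = args;

    while (*p) {
        p += strspn(p, " \t");                 /* skip blanks */
        size_t toklen = strcspn(p, " \t");     /* length of this token */

        if (toklen == 2 && p[0] == '-' && p[1] == flag) {
            p += toklen;
            p += strspn(p, " \t");
            size_t vallen = strcspn(p, " \t"); /* length of the value */
            if (vallen == 0 || outlen == 0) {
                return false;                  /* flag given without value */
            }
            if (vallen >= outlen) {
                vallen = outlen - 1;
            }
            memcpy(out, p, vallen);
            out[vallen] = '\0';
            return true;
        }
        p += toklen;
    }
    return false;
}
```

For example, calling it with the arguments "-d vnc2 secret" and the flag 'd'
yields the value "vnc2", while the arguments "secret" alone yield no entry.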

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Stefan Reiter <[email protected]>
[FE: fixed typo pointed out by Eric Blake
use s instead of V to indicate string parameter]
Signed-off-by: Fabian Ebner <[email protected]>
Message-Id: <20220225084949 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Let meson check for statx.stx_mnt_id

In virtiofsd, we assume that the presence of the STATX_MNT_ID macro
implies existence of the statx.stx_mnt_id field. Unfortunately, that is
not necessarily the case: glibc has introduced the macro in its commit
88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field
is still missing from its own headers.

Let meson.build actually check for both STATX_MNT_ID and
statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present.
Then, use this config macro in virtiofsd.
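
On the C side, guarding on the config macro rather than on STATX_MNT_ID alone
might look like the sketch below.  demo_get_mnt_id() is a hypothetical helper
and not the virtiofsd code; only CONFIG_STATX_MNT_ID comes from this message.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#ifdef CONFIG_STATX_MNT_ID
#include <fcntl.h>
#include <sys/stat.h>
#endif

/*
 * Fetch the mount ID of path via statx().
 * Returns 0 on success, -errno on failure, and -ENOTSUP when the build-time
 * check did not find both the macro and the struct field.
 */
static int demo_get_mnt_id(const char *path, uint64_t *mnt_id)
{
#ifdef CONFIG_STATX_MNT_ID
    struct statx stx;

    if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, STATX_MNT_ID, &stx) < 0) {
        return -errno;
    }
    /* The kernel reports which fields it actually filled in. */
    if (!(stx.stx_mask & STATX_MNT_ID)) {
        return -ENOTSUP;
    }
    *mnt_id = stx.stx_mnt_id;
    return 0;
#else
    (void)path;
    (void)mnt_id;
    return -ENOTSUP;
#endif
}
```

Because the field access sits behind the config macro, a libc that defines
STATX_MNT_ID without providing stx_mnt_id no longer breaks the build.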

Closes: https://gitlab.com/qemu-project/qemu/-/issues/882
Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20220223092340 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

clock-vmstate: Add missing END_OF_LIST

Add the missing VMSTATE_END_OF_LIST to vmstate_muldiv
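
Why a missing terminator matters can be shown with a generic
sentinel-terminated table.  DemoField, DEMO_END_OF_LIST() and the field names
are hypothetical stand-ins for this sketch, not the real VMState definitions.

```c
#include <stddef.h>

/* Hypothetical field descriptor; a NULL name marks the end of the list. */
typedef struct DemoField {
    const char *name;
    size_t size;
} DemoField;

#define DEMO_END_OF_LIST() { NULL, 0 }

static const DemoField demo_muldiv_fields[] = {
    { "multiplier", sizeof(unsigned int) },
    { "divider",    sizeof(unsigned int) },
    DEMO_END_OF_LIST()   /* without this, the walker below overruns the array */
};

/* Walk the table until the sentinel; there is no separate length argument. */
static size_t demo_count_fields(const DemoField *fields)
{
    size_t n = 0;

    while (fields[n].name != NULL) {
        n++;
    }
    return n;
}
```

Since the walker relies only on the sentinel, dropping it makes the loop read
past the end of the array, which is undefined behaviour and may go unnoticed
until the memory layout changes.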

Fixes: 99abcbc7600 ("clock: Provide builtin multiplier/divider")
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20220111101934 [email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Luc Michel <[email protected]>
Cc: [email protected]
Signed-off-by: Dr. David Alan Gilbert <[email protected]>