Git Repo - qemu.git/log

target/arm: Reuse sve_probe_page for gather first-fault loads

This avoids the need for a separate set of helpers to implement
no-fault semantics, and will enable MTE in the future.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Use SVEContLdSt for contiguous stores

Follow the model set up for contiguous loads. This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Update contiguous first-fault and no-fault loads

With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines. With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Use SVEContLdSt for multi-register contiguous loads

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Handle watchpoints in sve_ld1_r

Handle all of the watchpoints for active elements all at once,
before we've modified the vector register. This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Use SVEContLdSt in sve_ld1_r

First use of the new helper functions, so we can remove the
unused markup. No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Adjust interface of sve_ld1_host_fn

The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.

Replace each call with the simplest possible loop over active
elements.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Add sve infrastructure for page lookup

For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.

Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.

Temporarily mark the functions unused to avoid Werror.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Drop manual handling of set/clear_helper_retaddr

Since we converted back to cpu_*_data_ra, we do not need to
do this ourselves.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn

Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.

Since fb901c905dc3, cpu_mem_index is now a simple extract
from env->hflags and not a large computation. Which means
that it's now more work to pass around this value than it
is to recompute it.

This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

accel/tcg: Add endian-specific cpu_{ld, st}* operations

We currently have target-endian versions of these operations,
but no easy way to force a specific endianness. This can be
helpful if the target has endian-specific operations, or a mode
that swaps endianness.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

accel/tcg: Add probe_access_flags

This new interface will allow targets to probe for a page
and then handle watchpoints themselves. This will be most
useful for vector predicated memory operations, where one
page lookup can be used for many operations, and one test
can avoid many watchpoint checks.

Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

accel/tcg: Adjust probe_access call to page_check_range

We have validated that addr+size does not cross a page boundary.
Therefore we need to validate exactly one page. We can achieve
that passing any value 1 <= x <= size to page_check_range.

Passing 1 will simplify the next patch.

Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

accel/tcg: Add block comment for probe_access

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

exec: Fix cpu_watchpoint_address_matches address length

The only caller of cpu_watchpoint_address_matches passes
TARGET_PAGE_SIZE, so the bug is not currently visible.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

exec: Add block comments for watchpoint routines

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200508154359 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/timer/nrf51_timer: Add trace event of counter value update

Add trace event to display timer's counter value updates.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 20200504072822 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/timer/nrf51_timer: Display timer ID in trace events

The NRF51 series SoC have 3 timer peripherals, each having
4 counters. To help differentiate which peripheral is accessed,
display the timer ID in the trace events.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 20200504072822 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/arm/nrf51: Add NRF51_PERIPHERAL_SIZE definition

On the NRF51 series, all peripherals have a fixed I/O size
of 4KiB. Define NRF51_PERIPHERAL_SIZE and use it.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 20200504072822 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

aspeed: sdmc: Implement AST2600 locking behaviour

The AST2600 handles this differently with the extra 'hardlock' state, so
move the testing to the soc specific class' write callback.

Signed-off-by: Joel Stanley <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Message-id: 20200505090136 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

aspeed: Support AST2600A1 silicon revision

There are minimal differences from Qemu's point of view between the A0
and A1 silicon revisions.

As the A1 exercises different code paths in u-boot it is desirable to
emulate that instead.

Signed-off-by: Joel Stanley <[email protected]>
Reviewed-by: Andrew Jeffery <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Message-id: 20200504093703 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Drop access_el3_aa32ns_aa64any()

Calling access_el3_aa32ns() works for AArch32 only cores
but it does not handle 32-bit EL2 on top of 64-bit EL3
for mixed 32/64-bit cores.

Merge access_el3_aa32ns_aa64any() into access_el3_aa32ns()
and only use the latter.

Fixes: 68e9c2fe65 ("target-arm: Add VTCR_EL2")
Reported-by: Laurent Desnogues <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>
Message-id: 20200505141729 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

aspeed: Add boot stub for smp booting

This is a boot stub that is similar to the code u-boot runs, allowing
the kernel to boot the secondary CPU.

u-boot works as follows:

1. Initialises the SMP mailbox area in the SCU at 0x1e6e2180 with default values

2. Copies a stub named 'mailbox_insn' from flash to the SCU, just above the
    mailbox area

3. Sets AST_SMP_MBOX_FIELD_READY to a magic value to indicate the
    secondary can begin execution from the stub

4. The stub waits until the AST_SMP_MBOX_FIELD_GOSIGN register is set to
    a magic value

5. Jumps to the address in AST_SMP_MBOX_FIELD_ENTRY, starting Linux

Linux indicates it is ready by writing the address of its entrypoint
function to AST_SMP_MBOX_FIELD_ENTRY and the 'go' magic number to
AST_SMP_MBOX_FIELD_GOSIGN. The secondary CPU sees this at step 4 and
breaks out of it's loop.

To be compatible, a fixed qemu stub is loaded into the mailbox area. As
qemu can ensure the stub is loaded before execution starts, we do not
need to emulate the AST_SMP_MBOX_FIELD_READY behaviour of u-boot. The
secondary CPU's program counter points to the beginning of the stub,
allowing qemu to start secondaries at step four.

Reboot behaviour is preserved by resetting AST_SMP_MBOX_FIELD_GOSIGN
when the secondaries are reset.

This is only configured when the system is booted with -kernel and qemu
does not execute u-boot first.

Reviewed-by: Cédric Le Goater <[email protected]>
Tested-by: Cédric Le Goater <[email protected]>
Signed-off-by: Joel Stanley <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qcow2: Fix preallocation on block devices
- backup: Make sure that source and target size match
- vmdk: Fix zero cluster handling
- Follow-up cleanups and fixes for the truncate changes
- iotests: Skip more tests if required drivers are missing

# gpg: Signature made Fri 08 May 2020 13:39:55 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (30 commits)
  block: Drop unused .bdrv_has_zero_init_truncate
  vhdx: Rework truncation logic
  parallels: Rework truncation logic
  ssh: Support BDRV_REQ_ZERO_WRITE for truncate
  sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
  rbd: Support BDRV_REQ_ZERO_WRITE for truncate
  nfs: Support BDRV_REQ_ZERO_WRITE for truncate
  file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
  gluster: Drop useless has_zero_init callback
  qcow2: Fix preallocation on block devices
  iotests/055: Use cache.no-flush for vmdk target
  iotests: Backup with different source/target size
  backup: Make sure that source and target size match
  backup: Improve error for bdrv_getlength() failure
  iotests/283: Use consistent size for source and target
  iotests: vmdk: Enable zeroed_grained=on by default
  vmdk: Flush only once in vmdk_L2update()
  vmdk: Don't update L2 table for zero write on zero cluster
  vmdk: Fix partial overwrite of zero cluster
  vmdk: Fix zero cluster allocation
  ...

Signed-off-by: Peter Maydell <[email protected]>

block: Drop unused .bdrv_has_zero_init_truncate

Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.

What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Acked-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vhdx: Rework truncation logic

The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written. But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

parallels: Rework truncation logic

The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero. Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

ssh: Support BDRV_REQ_ZERO_WRITE for truncate

Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Reviewed-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate

Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

rbd: Support BDRV_REQ_ZERO_WRITE for truncate

Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

nfs: Support BDRV_REQ_ZERO_WRITE for truncate

Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

file-win32: Support BDRV_REQ_ZERO_WRITE for truncate

When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

gluster: Drop useless has_zero_init callback

block.c already defaults to 0 if we don't provide a callback; there's
no need to write a callback that always fails.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Message-Id: <20200428202905 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Fix preallocation on block devices

Calling bdrv_getlength() to get the pre-truncate file size will not
really work on block devices, because they have always the same length,
and trying to write beyond it will fail with a rather cryptic error
message.

Instead, we should use qcow2_get_last_cluster() and bdrv_getlength()
only as a fallback.

Before this patch:
$ truncate -s 1G test.img
$ sudo losetup -f --show test.img
/dev/loop0
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount
structures: No space left on device

With this patch:
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize
underlying file: Preallocation mode 'full' unsupported for this
non-regular file

So as you can see, it still fails, but now the problem is missing
support on the block device level, so we at least get a better error
message.

Note that we cannot preallocate block devices on truncate by design,
because we do not know what area to preallocate. Their length is always
the same, the truncate operation does not change it.

Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200505141801.1096763 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/055: Use cache.no-flush for vmdk target

055 uses the backup block job to create a compressed backup of an
$IMGFMT image with both qcow2 and vmdk targets. However, cluster
allocation in vmdk is very slow because it flushes the image file after
each L2 update.

There is no reason why we need this level of safety in this test, so
let's disable flushes for vmdk. For the blockdev-backup tests this is
achieved by simply adding the cache.no-flush=on to the drive_add() for
the target. For drive-backup, the caching flags are copied from the
source node, so we'll also add the flag to the source node, even though
it is not vmdk.

This can make the test run significantly faster (though it doesn't make
a difference on tmpfs). In my usual setup it goes from ~45s to ~15s.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200505064618 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Backup with different source/target size

This tests that the backup job catches situations where the target node
has a different size than the source node. It must also forbid resize
operations when the job is already running.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430142755 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Make sure that source and target size match

Since the introduction of a backup filter node in commit 00e30f05d, the
backup block job crashes when the target image is smaller than the
source image because it will try to write after the end of the target
node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
would have caught this and errored out gracefully.)

We can fix this and even do better than the old behaviour: Check that
source and target have the same image size at the start of the block job
and unshare BLK_PERM_RESIZE. (This permission was already unshared
before the same commit 00e30f05d, but the BlockBackend that was used to
make the restriction was removed without a replacement.) This will
immediately error out when starting the job instead of only when writing
to a block that doesn't exist in the target.

Longer target than source would technically work because we would never
write to blocks that don't exist, but semantically these are invalid,
too, because a backup is supposed to create a copy, not just an image
that starts with a copy.

Fixes: 00e30f05de1d19586345ec373970ef4c192c6270
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593
Cc: [email protected]
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430142755 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Improve error for bdrv_getlength() failure

bdrv_get_device_name() will be an empty string with modern management
tools that don't use -drive. Use bdrv_get_device_or_node_name() instead
so that the node name is used if the BlockBackend is anonymous.

While at it, start with upper case to make the message consistent with
the rest of the function.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Message-Id: <20200430142755 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/283: Use consistent size for source and target

The test case forgot to specify the null-co size for the target node.
When adding a check to backup that both sizes match, this would fail
because of the size mismatch and not the behaviour that the test really
wanted to test.

Fixes: a541fcc27c98b96da187c7d4573f3270f3ddd283
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430142755 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: vmdk: Enable zeroed_grained=on by default

In order to avoid bitrot in the zero cluster code in VMDK, enable
zeroed_grain=on by default for the tests.

059 now unsets the default options because zeroed_grain=on works only
with some subformats and the test case tests many different subformats,
including those for which it doesn't work.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Flush only once in vmdk_L2update()

If we have a backup L2 table, we currently flush once after writing to
the active L2 table and again after writing to the backup table. A
single flush is enough and makes things a little less slow.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Don't update L2 table for zero write on zero cluster

If a cluster is already zeroed, we don't have to call vmdk_L2update(),
which is rather slow because it flushes the image file.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Fix partial overwrite of zero cluster

When overwriting a zero cluster, we must not perform copy-on-write from
the backing file, but from a zeroed buffer.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Fix zero cluster allocation

m_data must contain valid data even for zero clusters when no cluster
was allocated in the image file. Without this, zero writes segfault with
images that have zeroed_grain=on.

For zero writes, we don't want to allocate a cluster in the image file
even in compressed files.

Fixes: 524089bce43fd1cd3daaca979872451efa2cf7c6
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Rename VmdkMetaData.valid to new_allocation

m_data is used for zero clusters even though valid == 0. It really only
means that a new cluster was allocated in the image file. Rename it to
reflect this.

While at it, change it from int to bool, too.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20200430133007 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Avoid integer wraparound in qcow2_co_truncate()

After commit f01643fb8b47e8a70c04bbf45e0f12a9e5bc54de when an image is
extended and BDRV_REQ_ZERO_WRITE is set then the new clusters are
zeroized.

The code however does not detect correctly situations when the old and
the new end of the image are within the same cluster. The problem can
be reproduced with these steps:

   qemu-img create -f qcow2 backing.qcow2 1M
   qemu-img create -f qcow2 -F qcow2 -b backing.qcow2 top.qcow2
   qemu-img resize --shrink top.qcow2 520k
   qemu-img resize top.qcow2 567k

In the last step offset - zero_start causes an integer wraparound.

Signed-off-by: Alberto Garcia <[email protected]>
Message-Id: <20200504155217 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/113: mark bochs as required to support whitelisting

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/109: mark required formats as required to support whitelisting

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/055: skip vmdk target tests if vmdk is not whitelisted

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/055: refactor compressed backup to vmdk

Instead of looping in each test, let's better refactor vmdk target case
as a subclass.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/041: drop self.assert_no_active_block_jobs()

Drop check for no block-jobs: it's obvious that there no jobs
immediately after vm.launch().

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/148: use skip_if_unsupported

Skip test-case with quorum if quorum is not whitelisted.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/082: require bochs

Test fails if bochs not whitelisted, so, skip it in this case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: handle tmpfs

Some tests requires O_DIRECT, or want it by default. Introduce smarter
O_DIRECT handling:

- Check O_DIRECT in common.rc, if it is requested by selected
cache-mode.

- Support second fall-through argument in _default_cache_mode

Inspired-by: Max's 23e1d054112cec1e
Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20200430124713 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200507a' into staging

Migration pull 2020-05-07

Mostly tidy-ups, but two new features:
  cpu-throttle-tailslow for making a gentler throttle
  xbzrle encoding rate measurement for getting a feal for xbzrle
performance.

# gpg: Signature made Thu 07 May 2020 18:00:27 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20200507a:
  migration/multifd: Do error_free after migrate_set_error to avoid memleaks
  migration/multifd: fix memleaks in multifd_new_send_channel_async
  migration/xbzrle: add encoding rate
  migration/rdma: fix a memleak on error path in rdma_start_incoming_migration
  migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
  migration/throttle: Add cpu-throttle-tailslow migration parameter
  migration/colo: Add missing error-propagation code
  docs/devel/migration: start a debugging section
  migration: move the units of migrate parameters from milliseconds to ms
  monitor/hmp-cmds: add hmp_handle_error() for hmp_migrate_set_speed()
  migration/migration: improve error reporting for migrate parameters
  migration: fix bad indentation in error_report()

Signed-off-by: Peter Maydell <[email protected]>

migration/multifd: Do error_free after migrate_set_error to avoid memleaks

When error happen in multifd_send_thread, it use error_copy to set migrate error in
multifd_send_terminate_threads(). We should call error_free after it.

Similarly, fix another two places in multifd_recv_thread/multifd_save_cleanup.

The leak stack:
Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
    #1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
    #2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
    #3 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #4 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
    #5 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
    #6 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
    #7 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
    #8 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
    #9 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
    #10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
    #11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)

Indirect leak of 52 byte(s) in 1 object(s) allocated from:
    #0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28)
    #1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9)
    #2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
    #3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
    #4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
    #5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
    #6 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #7 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
    #8 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
    #9 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
    #10 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
    #11 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
    #12 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
    #13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
    #14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200506095416 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/multifd: fix memleaks in multifd_new_send_channel_async

When error happen in multifd_new_send_channel_async, 'sioc' will not be used
to create the multifd_send_thread. Let's free it to avoid a memleak. And also
do error_free after migrate_set_error() to avoid another leak in the same place.

The leak stack:
Direct leak of 2880 byte(s) in 8 object(s) allocated from:
    #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
    #1 0x7f20b44df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
    #2 0x564133bce18b in object_new_with_type /mnt/sdb/backup/qemu/qom/object.c:683
    #3 0x564133eea950 in qio_channel_socket_new /mnt/sdb/backup/qemu/io/channel-socket.c:56
    #4 0x5641339cfe4f in socket_send_channel_create /mnt/sdb/backup/qemu/migration/socket.c:37
    #5 0x564133a10328 in multifd_save_setup /mnt/sdb/backup/qemu/migration/multifd.c:772
    #6 0x5641339cebed in migrate_fd_connect /mnt/sdb/backup/qemu/migration/migration.c:3530
    #7 0x5641339d15e4 in migration_channel_connect /mnt/sdb/backup/qemu/migration/channel.c:92
    #8 0x5641339cf5b7 in socket_outgoing_migration /mnt/sdb/backup/qemu/migration/socket.c:108

Direct leak of 384 byte(s) in 8 object(s) allocated from:
    #0 0x7f20b5118cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
    #1 0x7f20b44df22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
    #2 0x56413406fc17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
    #3 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #4 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
    #5 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
    #6 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
    #7 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145
    #8 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168

Indirect leak of 360 byte(s) in 8 object(s) allocated from:
    #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
    #1 0x7f20af901817 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d817)
    #2 0x7f20b451fa6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
    #3 0x7f20b44f8cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
    #4 0x7f20b44f8d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
    #5 0x56413406fc86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
    #6 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #7 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379
    #8 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458
    #9 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105
    #10 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145
    #11 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200506095416 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/xbzrle: add encoding rate

Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Message-Id: <1588208375 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/rdma: fix a memleak on error path in rdma_start_incoming_migration

'rdma->host' is malloced in qemu_rdma_data_init, but forgot to free on the error
path in rdma_start_incoming_migration(), this patch fix that.

The leak stack:
Direct leak of 2 byte(s) in 1 object(s) allocated from:
    #0 0x7fb7add18ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
    #1 0x7fb7ad0df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
    #2 0x7fb7ad0f8b32 in g_strdup (/lib64/libglib-2.0.so.0+0x6cb32)
    #3 0x55a0464a0f6f in qemu_rdma_data_init /mnt/sdb/qemu/migration/rdma.c:2647
    #4 0x55a0464b0e76 in rdma_start_incoming_migration /mnt/sdb/qemu/migration/rdma.c:4020
    #5 0x55a0463f898a in qemu_start_incoming_migration /mnt/sdb/qemu/migration/migration.c:365
    #6 0x55a0458c75d3 in qemu_init /mnt/sdb/qemu/softmmu/vl.c:4438
    #7 0x55a046a3d811 in main /mnt/sdb/qemu/softmmu/main.c:48
    #8 0x7fb7a8417872 in __libc_start_main (/lib64/libc.so.6+0x23872)
    #9 0x55a04536b26d in _start (/mnt/sdb/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x286926d)

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200420102727 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/ram: Consolidate variable reset after placement in ram_load_postcopy()

Let's consolidate resetting the variables.

Cc: "Dr. David Alan Gilbert" <[email protected]>
Cc: Juan Quintela <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20200421085300 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Fixup for context conflicts with 91ba442

migration/throttle: Add cpu-throttle-tailslow migration parameter

At the tail stage of throttling, the Guest is very sensitive to
CPU percentage while the @cpu-throttle-increment is excessive
usually at tail stage.

If this parameter is true, we will compute the ideal CPU percentage
used by the Guest, which may exactly make the dirty rate match the
dirty rate threshold. Then we will choose a smaller throttle increment
between the one specified by @cpu-throttle-increment and the one
generated by ideal CPU percentage.

Therefore, it is compatible to traditional throttling, meanwhile
the throttle increment won't be excessive at tail stage. This may
make migration time longer, and is disabled by default.

Signed-off-by: Keqian Zhu <[email protected]>
Message-Id: <20200413101508 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/colo: Add missing error-propagation code

Running the coccinelle script produced:

  $ spatch \
    --macro-file scripts/cocci-macro-file.h --include-headers \
    --sp-file scripts/coccinelle/find-missing-error_propagate.cocci \
    --keep-comments --smpl-spacing --dir .
  HANDLING: ./migration/colo.c
  [[manual check required: error_propagate() might be missing in migrate_set_block_enabled() ./migration/colo.c:439:4]]

Add the missing error_propagate() after review.

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20200413205250 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

docs/devel/migration: start a debugging section

Explain how to use analyze-migration.py, this may help.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20200330174852 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: move the units of migrate parameters from milliseconds to ms

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Message-Id: <474bb6cf67defb8be9de5035c11aee57a680557a.1585641083 [email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: add hmp_handle_error() for hmp_migrate_set_speed()

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Message-Id: <305323f835436023c53d759f5ab18af3ec874183.1585641083 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/migration: improve error reporting for migrate parameters

use QERR_INVALID_PARAMETER_VALUE instead of
"Parameter '%s' expects" for consistency.

Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <4ce71da4a5f98ad6ead0806ec71043473dcb4c07.1585641083 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: fix bad indentation in error_report()

bad indentation conflicts with CODING_STYLE doc.

Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <09f7529c665cac0c6a5e032ac6fdb6ca701f7e37.1585329482 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging

Misc crypto subsystem fixes

* Improve error message for large files when creating LUKS volumes
* Expand crypto hash benchmark coverage
* Misc code refactoring with no functional change

# gpg: Signature made Thu 07 May 2020 12:57:02 BST
# gpg:                using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>" [full]
# gpg:                 aka "Daniel P. Berrange <[email protected]>" [full]
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/qcrypto-next-pull-request:
  crypto: extend hash benchmark to cover more algorithms
  block: luks: better error message when creating too large files
  crypto: Redundant type conversion for AES_KEY pointer
  crypto/secret: fix inconsequential errors.
  crypto: fix getter of a QCryptoSecret's property

Signed-off-by: Peter Maydell <[email protected]>

crypto: extend hash benchmark to cover more algorithms

Extend the hash benchmark so that it can validate all algorithms
supported by QEMU instead of being limited to sha256.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

block: luks: better error message when creating too large files

Currently if you attampt to create too large file with luks you
get the following error message:

Formatting 'test.luks', fmt=luks size=17592186044416 key-secret=sec0
qemu-img: test.luks: Could not resize file: File too large

While for raw format the error message is
qemu-img: test.img: The image size is too large for file format 'raw'

The reason for this is that qemu-img checks for errono of the failure,
and presents the later error when it is -EFBIG

However crypto generic code 'swallows' the errno and replaces it
with -EIO.

As an attempt to make it better, we can make luks driver,
detect -EFBIG and in this case present a better error message,
which is what this patch does

The new error message is:

qemu-img: error creating test.luks: The requested file size is too large

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534898
Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

crypto: Redundant type conversion for AES_KEY pointer

We can delete the redundant type conversion if
we set the the AES_KEY parameter with 'const' in
qcrypto_cipher_aes_ecb_(en|de)crypt() function.

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Chen Qun <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

crypto/secret: fix inconsequential errors.

Change condition from QCRYPTO_SECRET_FORMAT_RAW
to QCRYPTO_SECRET_FORMAT_BASE64 in if-operator, because
this is potential error if you add another format value.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Alexey Krasikov <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

crypto: fix getter of a QCryptoSecret's property

This fixes the condition-check done by the "loaded" property
getter, such that the property returns true even when the
secret is loaded by the 'file' option.

Signed-off-by: Tong Ho <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.1-20200507' into staging

ppc patch queue for 2020-04-07

First pull request for qemu-5.1.  This includes:
* Removal of all remaining cases where we had CAS triggered reboots
* A number of improvements to NMI injection
* Support for partition scoped radix translation in softmmu
* Some fixes for NVDIMM handling
* A handful of other minor fixes

# gpg: Signature made Thu 07 May 2020 06:00:55 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>" [full]
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>" [full]
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-5.1-20200507:
  target-ppc: fix rlwimi, rlwinm, rlwnm for Clang-9
  spapr_nvdimm: Tweak error messages
  spapr_nvdimm.c: make 'label-size' mandatory
  target/ppc: Add support for Radix partition-scoped translation
  target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped translation
  target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped' bool
  target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation
  spapr: Don't allow unplug of NVLink2 devices
  target/ppc: Assert if HV mode is set when running under a pseries machine
  target/ppc: Introduce a relocation bool in ppc_radix64_handle_mmu_fault()
  target/ppc: Enforce that the root page directory size must be at least 5
  spapr: Drop CAS reboot flag
  spapr/cas: Separate CAS handling from rebuilding the FDT
  spapr: Simplify selection of radix/hash during CAS
  ppc/pnv: Add support for NMI interface
  ppc/spapr: tweak change system reset helper
  spapr: Don't check capabilities removed between CAS calls
  target/ppc: Improve syscall exception logging

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200506' into staging

Add tcg_gen_gvec_dup_imm
Misc tcg patches

# gpg: Signature made Wed 06 May 2020 19:23:43 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20200506:
  tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
  tcg: Add load_dest parameter to GVecGen2
  tcg: Improve vector tail clearing
  tcg: Add tcg_gen_gvec_dup_tl
  tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i
  tcg: Use tcg_gen_gvec_dup_imm in logical simplifications
  target/arm: Use tcg_gen_gvec_dup_imm
  target/ppc: Use tcg_gen_gvec_dup_imm
  target/s390x: Use tcg_gen_gvec_dup_imm
  tcg: Add tcg_gen_gvec_dup_imm

Signed-off-by: Peter Maydell <[email protected]>

target-ppc: fix rlwimi, rlwinm, rlwnm for Clang-9

Starting with Clang v9, -Wtype-limits is implemented and triggers a
few "result of comparison is always true" errors when compiling PPC32
targets.

The comparisons seem to be necessary only on PPC64, since the
else branch in PPC32 only has a "g_assert_not_reached();" in all cases.

This patch restructures the code so that the actual if/else is done on a
local flag variable, that is set accordingly for PPC64, and always
true for PPC32.

Signed-off-by: Daniele Buono <[email protected]>
Message-Id: <20200505183818 [email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr_nvdimm: Tweak error messages

The restrictions here (which are checked at pre-plug time) are PAPR
specific, rather than being inherent to the NVDIMM devices. Adjust the
error messages to be clearer about this.

Signed-off-by: David Gibson <[email protected]>

spapr_nvdimm.c: make 'label-size' mandatory

The pseries machine does not support NVDIMM modules without label.
Attempting to do so, even if the overall block size is aligned with
256MB, will seg fault the guest kernel during NVDIMM probe. This
can be avoided by forcing 'label-size' to always be present for
sPAPR NVDIMMs.

The verification was put before the alignment check because the
presence of label-size affects the alignment calculation, so
it's not optimal to warn the user about an alignment error,
then about the lack of label-size, then about a new alignment
error when the user sets a label-size.

Signed-off-by: Daniel Henrique Barboza <[email protected]>
Message-Id: <20200413203628 [email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Add support for Radix partition-scoped translation

The Radix tree translation model currently supports process-scoped
translation for the PowerNV machine (Hypervisor mode) and for the
pSeries machine (Guest mode). Guests running under an emulated
Hypervisor (PowerNV machine) require a new type of Radix translation,
called partition-scoped, which is missing today.

The Radix tree translation is a 2 steps process. The first step,
process-scoped translation, converts an effective Address to a guest
real address, and the second step, partition-scoped translation,
converts a guest real address to a host real address.

There are difference cases to covers :

* Hypervisor real mode access: no Radix translation.

* Hypervisor or host application access (quadrant 0 and 3) with
  relocation on: process-scoped translation.

* Guest OS real mode access: only partition-scoped translation.

* Guest OS real or guest application access (quadrant 0 and 3) with
  relocation on: both process-scoped translation and partition-scoped
  translations.

* Hypervisor access in quadrant 1 and 2 with relocation on: both
  process-scoped translation and partition-scoped translations.

The radix tree partition-scoped translation is performed using tables
pointed to by the first double-word of the Partition Table Entries and
process-scoped translation uses tables pointed to by the Process Table
Entries (second double-word of the Partition Table Entries).

Both partition-scoped and process-scoped translations process are
identical and thus the radix tree traversing code is largely reused.
However, errors in partition-scoped translations generate hypervisor
exceptions.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200403140056 [email protected]>
[dwg: Fixup from Greg Kurz folded in]
Signed-off-by: David Gibson <[email protected]>

target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped translation

The ppc_radix64_walk_tree() routine walks through the nested radix
tables to look for a PTE.

Split it in two and introduce a new routine ppc_radix64_next_level()
which we will use for partition-scoped Radix translation when
translating the process tree addresses. The prototypes are slightly
change to use a 'AddressSpace *' parameter, instead of a 'PowerPCCPU *'
which is not required, and to return an error code instead of a PTE
value. It clarifies error handling in the callers.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200403140056 [email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped' bool

This prepares ground for partition-scoped Radix translation.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Message-Id: <20200403140056 [email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation

This is moving code under a new ppc_radix64_xlate() routine shared by
the MMU Radix page fault handler and the 'get_phys_page_debug' PPC
callback. The difference being that 'get_phys_page_debug' does not
generate exceptions.

The specific part of process-scoped Radix translation is moved under
ppc_radix64_process_scoped_xlate() in preparation of the future support
for partition-scoped Radix translation. Routines raising the exceptions
now take a 'cause_excp' bool to cover the 'get_phys_page_debug' case.

It should be functionally equivalent.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200403140056 [email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Don't allow unplug of NVLink2 devices

Currently, we can't properly handle unplug of NVLink2 devices, because we
don't have code to tear down their special memory resources. There's not
a lot of impetus to implement that: since hardware NVLink2 devices can't
be hot unplugged, the guest side drivers don't usually support unplug
anyway.

Therefore, simply prevent unplug of NVLink2 devices.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>

target/ppc: Assert if HV mode is set when running under a pseries machine

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200330094946 [email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Introduce a relocation bool in ppc_radix64_handle_mmu_fault()

It will ease the introduction of new routines for partition-scoped
Radix translation.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200330094946 [email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Enforce that the root page directory size must be at least 5

According to the ISA the root page directory size of a radix tree for
either process- or partition-scoped translation must be >= 5.

Thus add this to the list of conditions checked when validating the
partition table entry in validate_pate();

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20200330094946 [email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Drop CAS reboot flag

The CAS reboot flag is false by default and all the locations that
could set it to true have been dropped. This means that all code
blocks depending on the flag being set is dead code and the other
code blocks should be executed always.

Just do that and drop the now uneeded CAS reboot flag. Fix a
comment on the way to make checkpatch happy.

Signed-off-by: Greg Kurz <[email protected]>
Message-Id: <158514994893.478799.11772512888322840990 [email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr/cas: Separate CAS handling from rebuilding the FDT

At the moment "ibm,client-architecture-support" ("CAS") is implemented
in SLOF and QEMU assists via the custom H_CAS hypercall which copies
an updated flatten device tree (FDT) blob to the SLOF memory which
it then uses to update its internal tree.

When we enable the OpenFirmware client interface in QEMU, we won't need
to copy the FDT to the guest as the client is expected to fetch
the device tree using the client interface.

This moves FDT rebuild out to a separate helper which is going to be
called from the "ibm,client-architecture-support" handler and leaves
writing FDT to the guest in the H_CAS handler.

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Message-Id: <20200310050733 [email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Message-Id: <158514994229.478799.2178881312094922324 [email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Simplify selection of radix/hash during CAS

The guest can select the MMU mode by setting bits 0-1 of byte 24
in OV5 to to 0b00 for hash or 0b01 for radix. As required by the
architecture, we terminate the boot process if any other value
is found there.

The usual way to negotiate features in OV5 is basically ANDing
the bitfield provided by the guest and the bitfield of features
supported by QEMU, previously populated at machine init.

For some not documented reason, MMU is treated differently : bit 1
of byte 24 (the radix/hash bit) is cleared from the guest OV5 and
explicitely set in the final negotiated OV5 if radix was requested.

Since the only expected input from the guest is the radix/hash bit
being set or not, it seems more appropriate to handle this like we
do for XIVE.

Set the radix bit in spapr->ov5 at machine init if it has a chance
to work (ie. power9, either TCG or a radix capable KVM) and rely
exclusively on spapr_ovec_intersect() to set the radix bit in
spapr->ov5_cas.

Signed-off-by: Greg Kurz <[email protected]>
Message-Id: <158514993621.478799.4204740354545734293 [email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc/pnv: Add support for NMI interface

This implements the NMI interface for the PNV machine, similarly to
commit 3431648272d ("spapr: Add support for new NMI interface") for
SPAPR.

Signed-off-by: Nicholas Piggin <[email protected]>
Message-Id: <20200325144147 [email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc/spapr: tweak change system reset helper

Rather than have the helper take an optional vector address
override, instead have its caller modify env->nip itself.
This is more consistent when adding pnv nmi support, and also
with mce injection added later.

Signed-off-by: Nicholas Piggin <[email protected]>
Message-Id: <20200325144147 [email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Don't check capabilities removed between CAS calls

We currently check if some capability in OV5 was removed by the guest
since the previous CAS, and we trigger a CAS reboot in that case. This
was required because it could call for a device-tree property or node
removal, that we didn't support until recently (see commit 6787d27b04a7
"spapr: add option vector handling in CAS-generated resets" for details).

Now that we render a full FDT at CAS and that SLOF is able to handle
node removal, we don't need to do a CAS reset in this case anymore.
Also, this check can only return true if the guest has already called
CAS since the last full system reset (otherwise spapr->ov5_cas is
empty). Linux doesn't do that so this can be considered as dead code
for the vast majority of existing setups.

Drop the check. Since the only use of the ov5_cas_old variable is
precisely the check itself, drop the variable as well.

Signed-off-by: Greg Kurz <[email protected]>
Message-Id: <158514993021.478799.10928618293640651819 [email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Improve syscall exception logging

system calls (at least in Linux) use registers r3-r8 for inputs, so
include those registers in the dump.

This also adds a mode for PAPR hcalls, which have a different calling
convention.

Signed-off-by: Nicholas Piggin <[email protected]>
Message-Id: <20200317054918 [email protected]>
Signed-off-by: David Gibson <[email protected]>

Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2020-05-06-1' into staging

Merge tpm 2020/05/06 v1

# gpg: Signature made Wed 06 May 2020 15:16:17 BST
# gpg:                using RSA key B818B9CADF9089C2D5CEC66B75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE  C66B 75AD 6580 2A0B 4211

* remotes/stefanberger/tags/pull-tpm-2020-05-06-1:
  hw: add compat machines for 5.1
  hw/arm/virt: Remove the compat forcing tpm-tis-device PPI to off
  tpm: tpm-tis-device: set PPI to false by default

Signed-off-by: Peter Maydell <[email protected]>

tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}

For the benefit of compatibility of function pointer types,
we have standardized on int32_t and int64_t as the integral
argument to tcg expanders.

We converted most of them in 474b2e8f0f7, but missed the rotates.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add load_dest parameter to GVecGen2

We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i. This will make some SVE2 insns easier
to parameterize.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Improve vector tail clearing

Better handling of non-power-of-2 tails as seen with Arm 8-byte
vector operations.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add tcg_gen_gvec_dup_tl

For use when a target needs to pass a configure-specific
target_ulong value to duplicate.

Reviewed-by: LIU Zhiwei <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i

These interfaces are now unused.

Reviewed-by: LIU Zhiwei <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Use tcg_gen_gvec_dup_imm in logical simplifications

Replace the outgoing interface.

Reviewed-by: LIU Zhiwei <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>