Dongjiu Geng [Tue, 12 May 2020 03:06:07 +0000 (11:06 +0800)]
ACPI: Record Generic Error Status Block(GESB) table
kvm_arch_on_sigbus_vcpu() error injection uses source_id as an
index into etc/hardware_errors to find the Error Status Data
Block entry corresponding to the error source. The supported
source_id values should therefore be assigned here and not be
changed afterwards, to make sure that the guest writes errors
into the expected Error Status Data Block.
Before QEMU writes a new error to the ACPI table, it checks whether
the previous error has been acknowledged. If it has not been
acknowledged, the new error is ignored and not recorded. As for the
error section type, QEMU models it as a memory section error.
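As a rough sketch of that acknowledgement check (the helper names read_ack_addr_for() and write_memory_cper() are hypothetical, and only cpu_physical_memory_read()/write() are existing QEMU APIs; the real logic lives in QEMU's GHES code):

    /* Sketch only: check the Read Ack Register for this source before
     * recording a new CPER error; a zero value means the previous error
     * has not been acknowledged yet. */
    static int record_error_sketch(uint8_t source_id, uint64_t paddr)
    {
        uint64_t ack_addr = read_ack_addr_for(source_id);   /* hypothetical lookup */
        uint64_t ack = 0;

        cpu_physical_memory_read(ack_addr, &ack, sizeof(ack));
        if (!ack) {
            return -1;               /* previous error not acknowledged: drop it */
        }
        ack = 0;                     /* re-arm; the guest sets it back when done */
        cpu_physical_memory_write(ack_addr, &ack, sizeof(ack));
        return write_memory_cper(source_id, paddr);   /* hypothetical CPER writer */
    }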
Dongjiu Geng [Tue, 12 May 2020 03:06:06 +0000 (11:06 +0800)]
KVM: Move hwpoison page related functions into kvm-all.c
kvm_hwpoison_page_add() and kvm_unpoison_all() will both
be used by the X86 and ARM platforms, so move them into
"accel/kvm/kvm-all.c" to avoid duplicated code.
For architectures that don't use the poison-list functionality,
the reset handler will harmlessly do nothing, so let's register
the kvm_unpoison_all() function in the generic kvm_init() function.
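Here is a self-contained sketch of the shared poison-list idea (the real code in accel/kvm/kvm-all.c uses QEMU's QLIST macros, ram_addr_t and qemu_register_reset(), so the names and types below are simplified):

    #include <stdint.h>
    #include <stdlib.h>

    struct poison_page {
        uint64_t ram_addr;
        struct poison_page *next;
    };

    static struct poison_page *poison_list;

    /* Remember a poisoned page, at most once. */
    static void hwpoison_page_add(uint64_t ram_addr)
    {
        struct poison_page *p;

        for (p = poison_list; p; p = p->next) {
            if (p->ram_addr == ram_addr) {
                return;                 /* already recorded */
            }
        }
        p = malloc(sizeof(*p));
        if (p) {
            p->ram_addr = ram_addr;
            p->next = poison_list;
            poison_list = p;
        }
    }

    /* Registered as a reset handler; harmless when the list is empty. */
    static void unpoison_all(void)
    {
        while (poison_list) {
            struct poison_page *p = poison_list;
            poison_list = p->next;
            free(p);
        }
    }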
Dongjiu Geng [Tue, 12 May 2020 03:06:05 +0000 (11:06 +0800)]
ACPI: Record the Generic Error Status Block address
Record the GESB address via a fw_cfg file; when recording
an error to CPER, this address is used to find the
Generic Error Data Entries and write the error.
In order to avoid migration failure, make the hardware
error table address part of the GED device instead
of a global variable, so that the address is migrated
to the target QEMU.
Dongjiu Geng [Tue, 12 May 2020 03:06:04 +0000 (11:06 +0800)]
ACPI: Build Hardware Error Source Table
This patch builds the Hardware Error Source Table (HEST) via fw_cfg blobs.
For now it only supports ARMv8 SEA, a type of Generic Hardware Error
Source version 2 (GHESv2) error source. The supported types can be
extended later if needed. For the CPER section, it is currently a
memory section, because the kernel mainly wants userspace to handle
memory errors.
This patch follows the ACPI 6.2 specification to build the Hardware Error
Source table. For more detailed information, please refer to
document: docs/specs/acpi_hest_ghes.rst
The build_ghes_hw_error_notification() helper adds a Hardware
Error Notification to the ACPI tables without using packed C structures,
and avoids endianness issues because the API doesn't need explicit conversion.
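A standalone illustration of that approach, assuming nothing about QEMU's actual helper names: append each field byte by byte in little-endian order into a flat buffer, so no packed struct or host-endianness conversion is ever needed.

    #include <stddef.h>
    #include <stdint.h>

    /* Append 'size' bytes of 'value' in little-endian order; returns the new
     * write position.  Illustrative only, not QEMU's table-build helper. */
    static size_t append_le(uint8_t *buf, size_t pos, uint64_t value, int size)
    {
        for (int i = 0; i < size; i++) {
            buf[pos++] = (uint8_t)(value >> (8 * i));
        }
        return pos;
    }

For example, a 32-bit notification type followed by a 64-bit address would simply be two calls with sizes 4 and 8.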
Dongjiu Geng [Tue, 12 May 2020 03:06:03 +0000 (11:06 +0800)]
ACPI: Build related register address fields via hardware error fw_cfg blob
This patch builds the error_block_address and read_ack_register fields
in the hardware errors table; the error_block_address points to a Generic
Error Status Block (GESB) via bios_linker. The maximum size for one GESB
is 1 KiB. For more detailed information, please refer to
document: docs/specs/acpi_hest_ghes.rst
For now we only support one error source; if necessary, we can extend it
to support more.
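A rough sketch of the per-source layout described above, assuming a single error source (the real blob is a flat byte array whose address fields are patched by the BIOS linker, so the struct below is only an illustration):

    #include <stdint.h>

    #define STATUS_BLOCK_SIZE 1024        /* 1 KiB per Generic Error Status Block */

    typedef struct {
        uint64_t error_block_address;     /* patched to point at status_block */
        uint64_t read_ack_register;       /* guest acknowledges errors here   */
        uint8_t  status_block[STATUS_BLOCK_SIZE];
    } HardwareErrorSourceSketch;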
Sonora Pass is a 2-socket x86 motherboard designed by Facebook
and supported by OpenBMC. The strapping configuration was obtained
from hardware, and the i2c configuration is based on a dts found at:
target/arm: Swap argument order for VSHL during decode
Rather than perform the argument swap during code generation,
perform it during decode. This means it doesn't have to be
special cased later, and we can share code with aarch64 code
generation. Hopefully the decode comment addresses any confusion
that might arise in between.
target/arm: Remove unnecessary range check for VSHL
In 1dc8425e551, while converting to gvec, I added an extra range check
against the shift count. This was unnecessary because the encoding of
the shift count produces 0 to the element size - 1.
The functions eliminate duplication of the special cases for
this operation. They match up with the GVecGen2iFn typedef.
Add out-of-line helpers. We got away with only having inline
expanders because the neon vector size is only 16 bytes, and
we know that the inline expansion will always succeed.
When we reuse this for SVE, tcg-gvec-op may decide to use an
out-of-line helper due to longer vector lengths.
Create vectorized versions of handle_shri_with_rndacc
for shift+round and shift+round+accumulate. Add out-of-line
helpers in preparation for longer vector lengths from SVE.
Peter Maydell [Thu, 7 May 2020 13:47:55 +0000 (14:47 +0100)]
target/arm: Use correct GDB XML for M-profile cores
GDB's remote protocol requires M-profile cores to use the feature
name 'org.gnu.gdb.arm.m-profile' instead of the 'org.gnu.gdb.arm.core'
feature used for A- and R-profile cores. We weren't doing this, which
meant GDB treated our M-profile cores like A-profile ones. This mostly
doesn't matter, but for instance means that it doesn't correctly
handle backtraces where an M-profile exception frame is involved.
Ship a copy of GDB's arm-m-profile.xml and use it on the M-profile
cores. The integer registers have the same offsets as in
arm-core.xml, but register 25 is the M-profile XPSR rather than the
A-profile CPSR, so we need to update arm_cpu_gdb_read_register() and
arm_cpu_gdb_write_register() to handle XPSR reads and writes.
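As a sketch of the register-25 special case (arm_feature(), xpsr_read() and cpsr_read() are existing QEMU helpers; the wrapper function and its signature are invented for this example):

    static uint32_t reg25_value_sketch(CPUARMState *env)
    {
        /* Register 25 is XPSR on M-profile cores, CPSR on A/R-profile cores. */
        return arm_feature(env, ARM_FEATURE_M) ? xpsr_read(env) : cpsr_read(env);
    }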
Peter Maydell [Thu, 14 May 2020 09:58:30 +0000 (10:58 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/9p-next-2020-05-14' into staging
Changes:
- Christian Schoenebeck is now co-maintainer for 9pfs
- relax checks for O_NOATIME
- minor documentation updates
# gpg: Signature made Thu 14 May 2020 08:14:37 BST
# gpg: using RSA key B4828BAF943140CEF2A3491071D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <[email protected]>" [full]
# gpg: aka "Gregory Kurz <[email protected]>" [full]
# gpg: aka "[jpeg image of size 3330]" [full]
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/9p-next-2020-05-14:
xen-9pfs: Fix log messages of reply errors
9pfs: local: ignore O_NOATIME if we don't have permissions
qemu-options.hx: 9p: clarify -virtfs vs. -fsdev
MAINTAINERS: Upgrade myself as 9pfs co-maintainer
If delivery of some 9pfs response fails for some reason, log the
error message by mentioning the 9P protocol reply type, not the
client's request type. The latter could misleadingly suggest that the
error already occurred while handling the request input.
Omar Sandoval [Thu, 14 May 2020 06:06:43 +0000 (08:06 +0200)]
9pfs: local: ignore O_NOATIME if we don't have permissions
QEMU's local 9pfs server passes through O_NOATIME from the client. If
the QEMU process doesn't have permissions to use O_NOATIME (namely, it
does not own the file nor have the CAP_FOWNER capability), the open will
fail. This causes issues when, from the client's point of view, it
believes it has permission to use O_NOATIME (e.g., a process running as
root in the virtual machine). Additionally, overlayfs on Linux opens
files on the lower layer using O_NOATIME, so in this case a 9pfs mount
can't be used as a lower layer for overlayfs (cf.
https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c
and https://github.com/NixOS/nixpkgs/issues/54509).
Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
network filesystems. open(2) notes that O_NOATIME "may not be effective
on all filesystems. One example is NFS, where the server maintains the
access time." This means that we can honor it when possible but fall
back to ignoring it.
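A self-contained sketch of that fallback (not the exact 9pfs code): try the open with O_NOATIME, and if it fails with EPERM, retry without the flag, treating it as the hint it effectively is.

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    static int open_noatime_best_effort(const char *path, int flags, mode_t mode)
    {
        int fd = open(path, flags, mode);

        if (fd == -1 && errno == EPERM && (flags & O_NOATIME)) {
            /* Not the file owner and no CAP_FOWNER: drop the hint and retry. */
            fd = open(path, flags & ~O_NOATIME, mode);
        }
        return fd;
    }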
The docs are ambiguous about the difference (or actually the
equivalence) between the options '-virtfs' and '-fsdev'. So clarify that
'-virtfs' is actually just a convenience shortcut for its
generalized form '-fsdev' in conjunction with '-device virtio-9p-pci'.
While at it, also be a bit more descriptive about what 9pfs is
actually used for.
Denis Plotnikov [Thu, 7 May 2020 08:25:20 +0000 (11:25 +0300)]
qcow2: add zstd cluster compression
zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.
The performance test results:
The test compresses and decompresses a qemu qcow2 image with a freshly
installed rhel-7.6 guest.
Image cluster size: 64K. Image on-disk size: 2.2G
The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.
Denis Plotnikov [Thu, 7 May 2020 08:25:18 +0000 (11:25 +0300)]
qcow2: introduce compression type feature
The patch adds some preparation parts for the incompatible compression type
feature to qcow2, allowing the use of different compression methods for
(de)compressing image clusters.
It is implied that the compression type is set at image creation and
can only be changed later by image conversion; thus the compression type
defines the single compression algorithm used for the image and,
therefore, for all image clusters.
The goal of the feature is to add support for other compression methods
to qcow2, for example ZSTD, which is more efficient at compression than ZLIB.
The default compression is ZLIB. Images created with the ZLIB compression type
are backward compatible with older qemu versions.
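For illustration, the selection can be thought of as a one-byte enumerator in the image header; the names and values below are an assumption based on this series, not a definitive encoding.

    /* Assumed encoding of the new header byte (illustrative only). */
    typedef enum Qcow2CompressionTypeSketch {
        QCOW2_COMPRESSION_ZLIB = 0,   /* default, backward compatible */
        QCOW2_COMPRESSION_ZSTD = 1,   /* guarded by an incompatible feature bit */
    } Qcow2CompressionTypeSketch;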
Adding the compression type breaks a number of tests, because the
compression type is now reported on image creation and there are some
changes to the qcow2 header size and offsets.
The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
affected tests: 031, 036, 061, 080
  header_size += 8:           1 byte compression type + 7 bytes padding
  feature_table += 48:        incompatible feature 'compression type'
  backing_file_offset += 56:  (8 + 48 -> header change + feature table change)
* add "compression type" for test output matching when it isn't filtered
affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
242, 255, 274, 280
Peter Maydell [Tue, 12 May 2020 16:00:10 +0000 (17:00 +0100)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2020-05-08-1' into staging
Merge tpm 2020/05/08 v3
# gpg: Signature made Tue 12 May 2020 16:50:34 BST
# gpg: using RSA key B818B9CADF9089C2D5CEC66B75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2020-05-08-1:
hw/tpm: fix usage of bool in tpm-tis.c
Jafar Abdi [Sat, 23 Mar 2019 14:26:37 +0000 (17:26 +0300)]
hw/tpm: fix usage of bool in tpm-tis.c
Clean up wrong usage of FALSE and TRUE in places that use "bool" from stdbool.h.
FALSE and TRUE (with capital letters) are the constants defined by glib for
use with glib's "gboolean" type. But some parts of the code also use
TRUE and FALSE for variables that are declared as "bool" (the type from <stdbool.h>).
Peter Maydell [Mon, 11 May 2020 13:34:27 +0000 (14:34 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200511' into staging
target-arm queue:
aspeed: Add boot stub for smp booting
target/arm: Drop access_el3_aa32ns_aa64any()
aspeed: Support AST2600A1 silicon revision
aspeed: sdmc: Implement AST2600 locking behaviour
nrf51: Tracing cleanups
target/arm: Improve handling of SVE loads and stores
target/arm: Don't show TCG-only CPUs in KVM-only QEMU builds
hw/arm/musicpal: Map the UART devices unconditionally
target/arm: Fix tcg_gen_gvec_dup_imm vs DUP (indexed)
target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA
* remotes/pmaydell/tags/pull-target-arm-20200511: (34 commits)
target/arm: Fix tcg_gen_gvec_dup_imm vs DUP (indexed)
target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA
hw/arm/musicpal: Map the UART devices unconditionally
target/arm: Restrict TCG cpus to TCG accel
target/arm/cpu: Restrict v8M IDAU interface to Aarch32 CPUs
target/arm/cpu: Use ARRAY_SIZE() to iterate over ARMCPUInfo[]
target/arm: Make set_feature() available for other files
target/arm/kvm: Inline set_feature() calls
target/arm: Remove sve_memopidx
target/arm: Reuse sve_probe_page for gather loads
target/arm: Reuse sve_probe_page for scatter stores
target/arm: Reuse sve_probe_page for gather first-fault loads
target/arm: Use SVEContLdSt for contiguous stores
target/arm: Update contiguous first-fault and no-fault loads
target/arm: Use SVEContLdSt for multi-register contiguous loads
target/arm: Handle watchpoints in sve_ld1_r
target/arm: Use SVEContLdSt in sve_ld1_r
target/arm: Adjust interface of sve_ld1_host_fn
target/arm: Add sve infrastructure for page lookup
target/arm: Drop manual handling of set/clear_helper_retaddr
...
hw/arm/musicpal: Map the UART devices unconditionally
I can't find proper documentation or a datasheet, but it is likely
that the MMIO-mapped serial device in the 0x80000000..0x8000ffff
range belongs to the SoC address space and is thus always mapped on
the memory bus.
Map the devices on the bus regardless of whether a chardev is attached to them.
Follow the model set up for contiguous loads. This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.
target/arm: Update contiguous first-fault and no-fault loads
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines. With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.
Handle all of the watchpoints for active elements all at once,
before we've modified the vector register. This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.
First use of the new helper functions, so we can remove the
unused markup. No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.
The current interface includes a loop; change it to load a
single element. We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.
Replace each call with the simplest possible loop over active
elements.
target/arm: Add sve infrastructure for page lookup
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed. We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.
Create a structure that holds the bounds of active elements,
and metadata for two pages. Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.
Temporarily mark the functions unused to avoid Werror.
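A trimmed-down sketch of such a structure (field and type names are illustrative, not the ones used in target/arm/sve_helper.c):

    #include <stdint.h>

    /* Per-page metadata for one predicated contiguous access. */
    typedef struct {
        void *host;        /* host pointer for RAM, or NULL for the slow path */
        int   flags;       /* TLB flags for this page (e.g. watchpoint bits)  */
    } PageSketch;

    /* Bounds of the active elements plus metadata for the (at most) two
     * pages the access can touch. */
    typedef struct {
        int        reg_off_first;   /* offset of the first active element */
        int        reg_off_last;    /* offset of the last active element  */
        int        page_split;      /* offset where page[1] starts, or -1 */
        PageSketch page[2];
    } ContLdStSketch;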
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.
Since fb901c905dc3, cpu_mmu_index is now a simple extract
from env->hflags and not a large computation, which means
that it's now more work to pass this value around than it
is to recompute it.
This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.
We currently have target-endian versions of these operations,
but no easy way to force a specific endianness. This can be
helpful if the target has endian-specific operations, or a mode
that swaps endianness.
This new interface will allow targets to probe for a page
and then handle watchpoints themselves. This will be most
useful for vector predicated memory operations, where one
page lookup can be used for many operations, and one test
can avoid many watchpoint checks.
accel/tcg: Adjust probe_access call to page_check_range
We have validated that addr+size does not cross a page boundary.
Therefore we need to validate exactly one page. We can achieve
that by passing any value 1 <= x <= size to page_check_range.
hw/timer/nrf51_timer: Display timer ID in trace events
The NRF51 series SoCs have 3 timer peripherals, each having
4 counters. To help differentiate which peripheral is accessed,
display the timer ID in the trace events.
Joel Stanley [Thu, 9 Apr 2020 06:31:37 +0000 (16:01 +0930)]
aspeed: Add boot stub for smp booting
This is a boot stub that is similar to the code u-boot runs, allowing
the kernel to boot the secondary CPU.
u-boot works as follows:
1. Initialises the SMP mailbox area in the SCU at 0x1e6e2180 with default values
2. Copies a stub named 'mailbox_insn' from flash to the SCU, just above the
mailbox area
3. Sets AST_SMP_MBOX_FIELD_READY to a magic value to indicate the
secondary can begin execution from the stub
4. The stub waits until the AST_SMP_MBOX_FIELD_GOSIGN register is set to
a magic value
5. Jumps to the address in AST_SMP_MBOX_FIELD_ENTRY, starting Linux
Linux indicates it is ready by writing the address of its entrypoint
function to AST_SMP_MBOX_FIELD_ENTRY and the 'go' magic number to
AST_SMP_MBOX_FIELD_GOSIGN. The secondary CPU sees this at step 4 and
breaks out of its loop.
To be compatible, a fixed qemu stub is loaded into the mailbox area. As
qemu can ensure the stub is loaded before execution starts, we do not
need to emulate the AST_SMP_MBOX_FIELD_READY behaviour of u-boot. The
secondary CPU's program counter points to the beginning of the stub,
allowing qemu to start secondaries at step four.
Reboot behaviour is preserved by resetting AST_SMP_MBOX_FIELD_GOSIGN
when the secondaries are reset.
This is only configured when the system is booted with -kernel and qemu
does not execute u-boot first.
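A conceptual C rendering of what the loaded stub does, purely as a sketch: the mailbox pointers and the 'go' magic value below are placeholders, and the real stub is a handful of ARM instructions rather than C.

    #include <stdint.h>

    /* Placeholders for the AST_SMP_MBOX_FIELD_GOSIGN / AST_SMP_MBOX_FIELD_ENTRY
     * locations in the SCU mailbox area and for the 'go' magic number. */
    extern volatile uint32_t *mbox_gosign;
    extern volatile uint32_t *mbox_entry;
    #define GO_MAGIC 0xdeadbeefu   /* illustrative value only */

    static void secondary_stub_sketch(void)
    {
        while (*mbox_gosign != GO_MAGIC) {
            /* step 4: spin until Linux writes the go magic */
        }
        /* step 5: jump to the entry point Linux stored in the mailbox */
        ((void (*)(void))(uintptr_t)*mbox_entry)();
    }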
Peter Maydell [Fri, 8 May 2020 13:29:18 +0000 (14:29 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qcow2: Fix preallocation on block devices
- backup: Make sure that source and target size match
- vmdk: Fix zero cluster handling
- Follow-up cleanups and fixes for the truncate changes
- iotests: Skip more tests if required drivers are missing
* remotes/kevin/tags/for-upstream: (30 commits)
block: Drop unused .bdrv_has_zero_init_truncate
vhdx: Rework truncation logic
parallels: Rework truncation logic
ssh: Support BDRV_REQ_ZERO_WRITE for truncate
sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
rbd: Support BDRV_REQ_ZERO_WRITE for truncate
nfs: Support BDRV_REQ_ZERO_WRITE for truncate
file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
gluster: Drop useless has_zero_init callback
qcow2: Fix preallocation on block devices
iotests/055: Use cache.no-flush for vmdk target
iotests: Backup with different source/target size
backup: Make sure that source and target size match
backup: Improve error for bdrv_getlength() failure
iotests/283: Use consistent size for source and target
iotests: vmdk: Enable zeroed_grained=on by default
vmdk: Flush only once in vmdk_L2update()
vmdk: Don't update L2 table for zero write on zero cluster
vmdk: Fix partial overwrite of zero cluster
vmdk: Fix zero cluster allocation
...
Eric Blake [Tue, 28 Apr 2020 20:29:05 +0000 (15:29 -0500)]
block: Drop unused .bdrv_has_zero_init_truncate
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:04 +0000 (15:29 -0500)]
vhdx: Rework truncation logic
The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written. But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:03 +0000 (15:29 -0500)]
parallels: Rework truncation logic
The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero. Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.
Eric Blake [Tue, 28 Apr 2020 20:29:02 +0000 (15:29 -0500)]
ssh: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.
Eric Blake [Tue, 28 Apr 2020 20:29:01 +0000 (15:29 -0500)]
sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.
Eric Blake [Tue, 28 Apr 2020 20:29:00 +0000 (15:29 -0500)]
rbd: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.
Eric Blake [Tue, 28 Apr 2020 20:28:59 +0000 (15:28 -0500)]
nfs: Support BDRV_REQ_ZERO_WRITE for truncate
Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.
Eric Blake [Tue, 28 Apr 2020 20:28:58 +0000 (15:28 -0500)]
file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).
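The common pattern behind these truncate patches, sketched under the assumption that the driver already knows the backend zero-fills on growth (backend_zero_fills_on_growth() is hypothetical; supported_truncate_flags and BDRV_REQ_ZERO_WRITE are the block-layer names used by the series):

    static void advertise_zero_write_sketch(BlockDriverState *bs)
    {
        if (backend_zero_fills_on_growth(bs)) {   /* hypothetical probe */
            /* Growth already reads back as zeroes, so the flag comes for free. */
            bs->supported_truncate_flags |= BDRV_REQ_ZERO_WRITE;
        }
    }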
Max Reitz [Tue, 5 May 2020 14:18:01 +0000 (16:18 +0200)]
qcow2: Fix preallocation on block devices
Calling bdrv_getlength() to get the pre-truncate file size will not
really work on block devices, because they always have the same length,
and trying to write beyond it will fail with a rather cryptic error
message.
Instead, we should use qcow2_get_last_cluster() and bdrv_getlength()
only as a fallback.
Before this patch:
$ truncate -s 1G test.img
$ sudo losetup -f --show test.img
/dev/loop0
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount
structures: No space left on device
With this patch:
$ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M
Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536
preallocation=full lazy_refcounts=off refcount_bits=16
qemu-img: /dev/loop0: Could not resize image: Failed to resize
underlying file: Preallocation mode 'full' unsupported for this
non-regular file
So as you can see, it still fails, but now the problem is missing
support on the block device level, so we at least get a better error
message.
Note that we cannot preallocate block devices on truncate by design,
because we do not know what area to preallocate. Their length is always
the same; the truncate operation does not change it.
Kevin Wolf [Tue, 5 May 2020 06:46:18 +0000 (08:46 +0200)]
iotests/055: Use cache.no-flush for vmdk target
055 uses the backup block job to create a compressed backup of an
$IMGFMT image with both qcow2 and vmdk targets. However, cluster
allocation in vmdk is very slow because it flushes the image file after
each L2 update.
There is no reason why we need this level of safety in this test, so
let's disable flushes for vmdk. For the blockdev-backup tests this is
achieved by simply adding the cache.no-flush=on to the drive_add() for
the target. For drive-backup, the caching flags are copied from the
source node, so we'll also add the flag to the source node, even though
it is not vmdk.
This can make the test run significantly faster (though it doesn't make
a difference on tmpfs). In my usual setup it goes from ~45s to ~15s.
Kevin Wolf [Thu, 30 Apr 2020 14:27:55 +0000 (16:27 +0200)]
iotests: Backup with different source/target size
This tests that the backup job catches situations where the target node
has a different size than the source node. It must also forbid resize
operations when the job is already running.
Kevin Wolf [Thu, 30 Apr 2020 14:27:54 +0000 (16:27 +0200)]
backup: Make sure that source and target size match
Since the introduction of a backup filter node in commit 00e30f05d, the
backup block job crashes when the target image is smaller than the
source image because it will try to write after the end of the target
node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer
would have caught this and errored out gracefully.)
We can fix this and even do better than the old behaviour: Check that
source and target have the same image size at the start of the block job
and unshare BLK_PERM_RESIZE. (This permission was already unshared
before the same commit 00e30f05d, but the BlockBackend that was used to
make the restriction was removed without a replacement.) This will
immediately error out when starting the job instead of only when writing
to a block that doesn't exist in the target.
Longer target than source would technically work because we would never
write to blocks that don't exist, but semantically these are invalid,
too, because a backup is supposed to create a copy, not just an image
that starts with a copy.
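A simplified sketch of the new early check (not the exact block/backup.c code; bdrv_getlength() and error_setg() are existing QEMU APIs):

    static bool backup_sizes_match_sketch(BlockDriverState *bs,
                                          BlockDriverState *target, Error **errp)
    {
        int64_t src_len = bdrv_getlength(bs);
        int64_t tgt_len = bdrv_getlength(target);

        if (src_len < 0 || tgt_len < 0) {
            error_setg(errp, "Could not get image size");
            return false;
        }
        if (src_len != tgt_len) {
            error_setg(errp, "Source and target image have different sizes");
            return false;
        }
        return true;
    }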
Kevin Wolf [Thu, 30 Apr 2020 14:27:53 +0000 (16:27 +0200)]
backup: Improve error for bdrv_getlength() failure
bdrv_get_device_name() will be an empty string with modern management
tools that don't use -drive. Use bdrv_get_device_or_node_name() instead
so that the node name is used if the BlockBackend is anonymous.
While at it, start with upper case to make the message consistent with
the rest of the function.
Kevin Wolf [Thu, 30 Apr 2020 14:27:52 +0000 (16:27 +0200)]
iotests/283: Use consistent size for source and target
The test case forgot to specify the null-co size for the target node.
When adding a check to backup that both sizes match, this would fail
because of the size mismatch and not the behaviour that the test really
wanted to test.
Kevin Wolf [Thu, 30 Apr 2020 13:30:07 +0000 (15:30 +0200)]
iotests: vmdk: Enable zeroed_grain=on by default
In order to avoid bitrot in the zero cluster code in VMDK, enable
zeroed_grain=on by default for the tests.
059 now unsets the default options because zeroed_grain=on works only
with some subformats and the test case tests many different subformats,
including those for which it doesn't work.
Kevin Wolf [Thu, 30 Apr 2020 13:30:06 +0000 (15:30 +0200)]
vmdk: Flush only once in vmdk_L2update()
If we have a backup L2 table, we currently flush once after writing to
the active L2 table and again after writing to the backup table. A
single flush is enough and makes things a little less slow.
Kevin Wolf [Thu, 30 Apr 2020 13:30:03 +0000 (15:30 +0200)]
vmdk: Fix zero cluster allocation
m_data must contain valid data even for zero clusters when no cluster
was allocated in the image file. Without this, zero writes segfault with
images that have zeroed_grain=on.
For zero writes, we don't want to allocate a cluster in the image file
even in compressed files.
Kevin Wolf [Thu, 30 Apr 2020 13:30:02 +0000 (15:30 +0200)]
vmdk: Rename VmdkMetaData.valid to new_allocation
m_data is used for zero clusters even though valid == 0. It really only
means that a new cluster was allocated in the image file. Rename it to
reflect this.
However, the code does not correctly detect situations where the old and
the new end of the image are within the same cluster. The problem can
be reproduced with these steps: