Git Repo - qemu.git/log

tests/acceptance: add base class record/replay kernel tests

This patch adds a base class for testing kernel boot recording and replaying.
Each test has a recording phase and a replaying phase.
The virtual machines just boot the kernel and do not interact with
the network.
The structure and image links for the tests are borrowed from
boot_linux_console.py.
Each test checks for the message pattern at the end of the kernel
boot in both record and replay modes. In replay mode QEMU is also
expected to finish execution automatically.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Tested-by: Philippe Mathieu-Daude <[email protected]>
Message-Id: <159073589099.20809.14078431743098373301.stgit@pasha-ThinkPad-X280>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
[PMD: Keep imports sorted alphabetically]
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>

MAINTAINERS: Add an entry to review Avocado based acceptance tests

    Acceptance tests can test any piece of the QEMU codebase.
    As such, the directory holding them does not belong to a specific
    subsystem with designated maintainers.

    Each subsystem covered by a test is welcome to add the test path
    to its section.
    See for example commits 71b290e70, b11785ca2 or 5d480ddde.

Add an entry to allow reviewers to be notified when acceptance /
integration tests are added or modified.
The designated reviewers are not maintainers; subsystem maintainers
are expected to merge their tests.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Eduardo Habkost <[email protected]>
Acked-by: Cleber Rosa <[email protected]>
Message-Id: <20200129212345 [email protected]>
Message-Id: <20200605165656 [email protected]>

qht: Fix threshold rate calculation

tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long'
  to 'double' changes value from 18446744073709551615
  to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
        *threshold = rate * UINT64_MAX;
                          ~ ^~~~~~~~~~

Fix this by splitting the 64-bit constant into two halves,
each of which is individually perfectly representable, the
sum of which produces the correct arithmetic result.

This is very likely just a sticking plaster over some underlying
incorrect code, but it will suppress the warning for the moment.

Cc: Emilio G. Cota <[email protected]>
Reported-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
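
A minimal sketch of the splitting idea described in the qht commit above,
assuming the rate lies in [0, 1). The helper name scale_rate() and the exact
pair of constants are illustrative assumptions, not taken from
tests/qht-bench.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a rate in [0, 1) to a 64-bit threshold without ever converting
 * UINT64_MAX itself to double. The two constants add up to UINT64_MAX
 * and have at most 48 significant bits each, so both convert to double
 * exactly. scale_rate() is a hypothetical name.
 */
static uint64_t scale_rate(double rate)
{
    /*
     * rate == 1.0 is excluded here: the sum would round up to 2^64,
     * which does not fit in a uint64_t.
     */
    assert(rate >= 0.0 && rate < 1.0);
    return (uint64_t)(rate * 0xffff000000000000ull +
                      rate * 0x0000ffffffffffffull);
}
```

The excluded rate of 1.0 hints at why the commit message calls the change a
sticking plaster: the sum of the two products is still rounded to a double
before it is converted back to an integer.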

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20200618' into staging

s390x update:
- update Linux headers to 5.8-rc1 (for vfio-ccw path handling)
- vfio-ccw: add support for path handling
- documentation fix

# gpg: Signature made Thu 18 Jun 2020 16:36:04 BST
# gpg:                using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Cornelia Huck <[email protected]>" [marginal]
# gpg:                 aka "Cornelia Huck <[email protected]>" [full]
# gpg:                 aka "Cornelia Huck <[email protected]>" [full]
# gpg:                 aka "Cornelia Huck <[email protected]>" [marginal]
# gpg:                 aka "Cornelia Huck <[email protected]>" [marginal]
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20200618:
  docs/s390x: fix vfio-ap device_del description
  vfio-ccw: Add support for the CRW region and IRQ
  s390x/css: Refactor the css_queue_crw() routine
  vfio-ccw: Refactor ccw irq handler
  vfio-ccw: Add support for the schib region
  vfio-ccw: Refactor cleanup of regions
  Linux headers: update

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Thu 18 Jun 2020 14:16:22 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request: (33 commits)
  net: Drop the NetLegacy structure, always use Netdev instead
  net: Drop the legacy "name" parameter from the -net option
  hw/net/e1000e: Do not abort() on invalid PSRCTL register value
  colo-compare: Fix memory leak in packet_enqueue()
  net/colo-compare.c: Correct ordering in complete and finalize
  net/colo-compare.c: Check that colo-compare is active
  net/colo-compare.c: Only hexdump packets if tracing is enabled
  net/colo-compare.c: Fix deadlock in compare_chr_send
  chardev/char.c: Use qemu_co_sleep_ns if in coroutine
  net/colo-compare.c: Create event_bh with the right AioContext
  net: use peer when purging queue in qemu_flush_or_purge_queue_packets()
  net: cadence_gem: Fix RX address filtering
  net: cadence_gem: TX_LAST bit should be set by guest
  net: cadence_gem: Update the reset value for interrupt mask register
  net: cadnece_gem: Update irq_read_clear field of designcfg_debug1 reg
  net: cadence_gem: Add support for jumbo frames
  net: cadence_gem: Fix up code style
  net: cadence_gem: Move tx/rx packet buffert to CadenceGEMState
  net: cadence_gem: Set ISR according to queue in use
  net: cadence_gem: Define access permission for interrupt registers
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200617a' into staging

Migration (and HMP and virtiofs) pull 2020-06-17

Migration:
   HMP/migration and test changes from Mao Zhongyi
   multifd fix from Laurent Vivier
HMP
   qom-set partial reversion/change from David Hildenbrand
      now you need -j to pass json format, but it's regained the
      old 100M type format.
  Memory leak fix from Pan Nengyuan

Virtiofs
  fchmod seccomp fix from Max Reitz

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
# gpg: Signature made Wed 17 Jun 2020 19:34:58 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20200617a:
  migration: fix multifd_send_pages() next channel
  docs/xbzrle: update 'cache miss rate' and 'encoding rate' to docs
  monitor/hmp-cmds: improvements for the 'info migrate'
  monitor/hmp-cmds: add 'goto end' to reduce duplicate code.
  monitor/hmp-cmds: delete redundant Error check before invoke hmp_handle_error()
  monitor/hmp-cmds: don't silently output when running 'migrate_set_downtime' fails
  monitor/hmp-cmds: add units for migrate_parameters
  tests/migration: fix unreachable path in stress test
  tests/migration: mem leak fix
  hmp: Make json format optional for qom-set
  qom-hmp-cmds: fix a memleak in hmp_qom_get
  virtiofsd: Whitelist fchmod

Signed-off-by: Peter Maydell <[email protected]>

net: Drop the NetLegacy structure, always use Netdev instead

Now that the "name" parameter is gone, there is hardly any difference
between NetLegacy and Netdev anymore, so we can drop NetLegacy and always
use Netdev to simplify the code quite a bit.

The only two differences that were really left between Netdev and NetLegacy:

1) NetLegacy does not allow a "hubport" type. We can continue to block
   this with a simple check in net_client_init1() for this type.

2) The "id" parameter was optional in NetLegacy (and an internal id
   was chosen via assign_name() during initialization), but it is mandatory
   for Netdev. To keep the visitor code from bailing out here, we now have
   to add an internal id to the QemuOpts earlier.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: Drop the legacy "name" parameter from the -net option

It's been deprecated since QEMU v3.1, so it's time to finally
remove it. The "id" parameter can simply be used instead.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

hw/net/e1000e: Do not abort() on invalid PSRCTL register value

libFuzzer found using 'qemu-system-i386 -M q35':

qemu: hardware error: e1000e: PSRCTL.BSIZE0 cannot be zero
CPU #0:
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
==1988== ERROR: libFuzzer: deadly signal
    #6 0x7fae4d3ea894 in __GI_abort (/lib64/libc.so.6+0x22894)
    #7 0x563f4cc59a1d in hw_error (qemu-fuzz-i386+0xe8ca1d)
    #8 0x563f4d7c93f2 in e1000e_set_psrctl (qemu-fuzz-i386+0x19fc3f2)
    #9 0x563f4d7b798f in e1000e_core_write (qemu-fuzz-i386+0x19ea98f)
    #10 0x563f4d7afc46 in e1000e_mmio_write (qemu-fuzz-i386+0x19e2c46)
    #11 0x563f4cc9a0a7 in memory_region_write_accessor (qemu-fuzz-i386+0xecd0a7)
    #12 0x563f4cc99c13 in access_with_adjusted_size (qemu-fuzz-i386+0xeccc13)
    #13 0x563f4cc987b4 in memory_region_dispatch_write (qemu-fuzz-i386+0xecb7b4)

It simply sent the following 2 I/O commands to the e1000e
PCI BAR #2 I/O region:

  writew 0x0100 0x0c00 # RCTL =   E1000_RCTL_DTYP_MASK
  writeb 0x2170 0x00   # PSRCTL = 0

  static void
  e1000e_set_psrctl(E1000ECore *core, int index, uint32_t val)
  {
      if (core->mac[RCTL] & E1000_RCTL_DTYP_MASK) {

          if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
              hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
          }

Instead of calling hw_error(), which aborts the process (it is
meant for fatal CPU error conditions, not for device logging),
log the invalid request with qemu_log_mask(LOG_GUEST_ERROR)
and return, ignoring the request.

Cc: [email protected]
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
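
A minimal, self-contained sketch of the "log and ignore" pattern described in
the e1000e commit above. The Core type, the register indices, the
guest_errors counter and log_guest_error() are hypothetical stand-ins (the
latter for qemu_log_mask(LOG_GUEST_ERROR, ...)), and the mask values are
assumptions; this is not the code in hw/net/e1000e_core.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the QEMU definitions; indices are hypothetical and
 * the mask values are assumptions made for this sketch. */
enum { RCTL, PSRCTL, NUM_REGS };
#define E1000_RCTL_DTYP_MASK      0x00000C00u
#define E1000_PSRCTL_BSIZE0_MASK  0x0000007Fu

typedef struct {
    uint32_t mac[NUM_REGS];
    unsigned guest_errors;      /* hypothetical counter, for illustration */
} Core;

/* Stand-in for qemu_log_mask(LOG_GUEST_ERROR, ...). */
static void log_guest_error(Core *core, const char *msg)
{
    core->guest_errors++;
    fprintf(stderr, "guest error: %s\n", msg);
}

static void set_psrctl(Core *core, uint32_t val)
{
    assert(core != NULL);
    if (core->mac[RCTL] & E1000_RCTL_DTYP_MASK) {
        if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
            log_guest_error(core, "PSRCTL.BSIZE0 cannot be zero");
            return;             /* ignore the request, keep running */
        }
    }
    core->mac[PSRCTL] = val;
}
```

With this shape, a guest (or fuzzer) that writes an invalid value leaves the
register untouched and the process alive.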

colo-compare: Fix memory leak in packet_enqueue()

Fix the "pkt" memory leak in packet_enqueue().
The allocated "pkt" needs to be freed if the colo compare
primary or secondary queue is too big.

Replace the error_report of full queue with a trace event.

Signed-off-by: Derek Su <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
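
The leak fixed in the colo-compare commit above follows a common shape: an
object is allocated, and an early-return path forgets to free it. A minimal
sketch under assumed names (Packet, Queue, MAX_QUEUE_SIZE, enqueue_packet()
and drain() are all hypothetical, not the colo-compare data structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_QUEUE_SIZE 4        /* hypothetical limit for the sketch */

typedef struct Packet {
    int size;
} Packet;

typedef struct {
    Packet *items[MAX_QUEUE_SIZE];
    int len;
} Queue;

/* Returns true if a new packet was queued. If the queue is full, the
 * freshly allocated packet is freed before returning. */
static bool enqueue_packet(Queue *q, int size)
{
    Packet *pkt = malloc(sizeof(*pkt));

    assert(pkt != NULL);
    pkt->size = size;

    if (q->len >= MAX_QUEUE_SIZE) {
        free(pkt);              /* the fix: do not leak on this path */
        return false;
    }
    q->items[q->len++] = pkt;
    return true;
}

/* Release everything still queued. */
static void drain(Queue *q)
{
    while (q->len > 0) {
        free(q->items[--q->len]);
    }
}
```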

net/colo-compare.c: Correct ordering in complete and finalize

In colo_compare_complete, insert CompareState into net_compares
only after everything has been initialized.
In colo_compare_finalize, remove CompareState from net_compares
before anything is deinitialized.

Signed-off-by: Lukas Straub <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net/colo-compare.c: Check that colo-compare is active

If the colo-compare object is removed before failover and a
checkpoint happens, qemu crashes because it tries to lock
the destroyed event_mtx in colo_notify_compares_event.

Fix this by checking if everything is initialized by
introducing a new variable colo_compare_active which
is protected by a new mutex colo_compare_mutex. The new mutex
also protects against concurrent access of the net_compares
list and makes sure that colo_notify_compares_event isn't
active while we destroy event_mtx and event_complete_cond.

With this, it is also possible again to use colo without
colo-compare (periodic mode) and to use multiple colo-compare
objects for multiple network interfaces.

Signed-off-by: Lukas Straub <[email protected]>
Tested-by: Lukas Straub <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net/colo-compare.c: Only hexdump packets if tracing is enabled

Else the log will be flooded if there is a lot of network
traffic.

Signed-off-by: Lukas Straub <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net/colo-compare.c: Fix deadlock in compare_chr_send

The chr_out chardev is connected to a filter-redirector
running in the main loop. qemu_chr_fe_write_all might block
here in compare_chr_send if the (socket-)buffer is full.
If another filter-redirector in the main loop wants to
send data to chr_pri_in it might also block if the buffer
is full. This leads to a deadlock because both event loops
get blocked.

Fix this by converting compare_chr_send to a coroutine and
putting the packets in a send queue.

Signed-off-by: Lukas Straub <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Tested-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

chardev/char.c: Use qemu_co_sleep_ns if in coroutine

To be able to convert compare_chr_send to a coroutine in the
next commit, use qemu_co_sleep_ns if in coroutine.

Signed-off-by: Lukas Straub <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net/colo-compare.c: Create event_bh with the right AioContext

qemu_bh_new will set the bh to be executed in the main
loop. This causes crashes as colo_compare_handle_event assumes
that it has exclusive access to the queues, which are also
concurrently accessed in the iothread.

Create the bh with the AioContext of the iothread to fulfill
these assumptions and fix the crashes. This is safe, because
the bh already takes the appropriate locks.

Signed-off-by: Lukas Straub <[email protected]>
Reviewed-by: Zhang Chen <[email protected]>
Reviewed-by: Derek Su <[email protected]>
Tested-by: Derek Su <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: use peer when purging queue in qemu_flush_or_purge_queue_packets()

The sender of a packet is checked in qemu_net_queue_purge(), but
we use the NetClientState rather than its peer when trying to purge the
incoming queue in qemu_flush_or_purge_packets(). This triggers the assert
in virtio_net_reset since the sender check cannot pass:

hw/net/virtio-net.c:533: void virtio_net_reset(VirtIODevice *): Assertion
`!virtio_net_get_subqueue(nc)->async_tx.elem' failed.
#9 0x55a33fa31b78 in virtio_net_reset hw/net/virtio-net.c:533:13
#10 0x55a33fc88412 in virtio_reset hw/virtio/virtio.c:1919:9
#11 0x55a341d82764 in virtio_bus_reset hw/virtio/virtio-bus.c:95:9
#12 0x55a341dba2de in virtio_pci_reset hw/virtio/virtio-pci.c:1824:5
#13 0x55a341db3e02 in virtio_pci_common_write hw/virtio/virtio-pci.c:1252:13
#14 0x55a33f62117b in memory_region_write_accessor memory.c:496:5
#15 0x55a33f6205e4 in access_with_adjusted_size memory.c:557:18
#16 0x55a33f61e177 in memory_region_dispatch_write memory.c:1488:16

Reproducer:
https://www.mail-archive.com/[email protected]/msg701914.html

Fix by using the peer.

Reported-by: "Alexander Bulekov" <[email protected]>
Acked-by: Alexander Bulekov <[email protected]>
Fixes: ca77d85e1dbf9 ("net: complete all queued packets on VM stop")
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Fix RX address filtering

Two defects are fixed:

1/ Detection of multicast frames
2/ Treating drop of mis-addressed frames as non-error

Signed-off-by: Tong Ho <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
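
For reference, multicast detection on an Ethernet destination address comes
down to one bit. The sketch below shows the general rule only; the function
names and the RxKind enum are hypothetical and this is not the cadence_gem
filtering code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* An Ethernet destination is a group (multicast) address when the
 * least-significant bit of its first octet is set. */
static bool is_multicast(const uint8_t *dst)
{
    assert(dst != NULL);
    return (dst[0] & 0x01) != 0;
}

/* Broadcast is the all-ones address. It also has the group bit set,
 * so it has to be tested before the generic multicast case. */
static bool is_broadcast(const uint8_t *dst)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

    assert(dst != NULL);
    return memcmp(dst, bcast, sizeof(bcast)) == 0;
}

typedef enum { RX_UNICAST, RX_MULTICAST, RX_BROADCAST } RxKind;

static RxKind classify(const uint8_t *dst)
{
    if (is_broadcast(dst)) {
        return RX_BROADCAST;
    }
    if (is_multicast(dst)) {
        return RX_MULTICAST;
    }
    return RX_UNICAST;
}
```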

net: cadence_gem: TX_LAST bit should be set by guest

The TX_LAST bit should not be set by hardware; it is set by the guest to
mark the last bd of the frame.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Update the reset value for interrupt mask register

Mask all interrupts on reset.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadnece_gem: Update irq_read_clear field of designcfg_debug1 reg

Advertise support of clear-on-read for ISR registers.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Add support for jumbo frames

Add a property "jumbo-max-len", which sets the default value of jumbo frames
up to 16,383 bytes. Add frame length checks for standard and jumbo
frames.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Fix up code style

Fix the code style for register definitions.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Move tx/rx packet buffert to CadenceGEMState

Move these buffers to CadenceGEMState, as their size will increase
further when jumbo frame support is added.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Set ISR according to queue in use

Set the ISR according to the queue in use, and add interrupt support for
all queues.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Define access permission for interrupt registers

The Q1 to Q7 ISRs are clear-on-read, the IER/IDR registers
are write-only, and the mask registers are read-only.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
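
The three access classes named in the commit above can be modelled in a few
lines. This is a simplified sketch with hypothetical types (RegAccess, Reg)
and helpers (reg_read(), reg_write()); it ignores any side effects a real
IER/IDR write has on the mask register, and it is not how cadence_gem.c
stores its permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-register access classes for this sketch. */
typedef enum { REG_RW, REG_RO, REG_WO, REG_CLEAR_ON_READ } RegAccess;

typedef struct {
    uint32_t value;
    RegAccess access;
} Reg;

/* Guest read: write-only registers read as zero; clear-on-read
 * registers return their value once and are then cleared. */
static uint32_t reg_read(Reg *r)
{
    uint32_t v;

    assert(r != NULL);
    if (r->access == REG_WO) {
        return 0;
    }
    v = r->value;
    if (r->access == REG_CLEAR_ON_READ) {
        r->value = 0;
    }
    return v;
}

/* Guest write: read-only registers ignore the write. */
static void reg_write(Reg *r, uint32_t val)
{
    assert(r != NULL);
    if (r->access == REG_RO) {
        return;
    }
    r->value = val;
}
```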

net: cadence_gem: Fix irq update w.r.t queue

Set IRQs specific to each queue; the present implementation sets the q1 IRQ
based on the q0 status.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: cadence_gem: Fix the queue address update during wrap around

During wrap-around and reset, queues point to the initial base
address of queue 0, irrespective of which queue we are dealing with.
Fix it by assigning the proper base address every time.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
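
A minimal sketch of the per-queue wrap-around described above, assuming each
queue keeps its own ring base address. Rings, NUM_QUEUES and wrap_queue() are
hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 8            /* hypothetical queue count */

typedef struct {
    uint32_t base[NUM_QUEUES];  /* per-queue descriptor ring base */
    uint32_t cur[NUM_QUEUES];   /* per-queue current descriptor */
} Rings;

/* On wrap-around (or reset), restart queue q at its own base. */
static void wrap_queue(Rings *r, int q)
{
    assert(r != NULL);
    assert(q >= 0 && q < NUM_QUEUES);
    r->cur[q] = r->base[q];     /* the buggy form would use base[0] */
}
```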

net: cadence_gem: Fix debug statements

Enabling debug breaks the build. Fix the debug statements and make them
always compilable. Fix a few statements to use sized integer casts.

Signed-off-by: Sai Pavan Boddu <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
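
One common way to keep debug statements "always compilable" is to guard them
with a runtime-looking if on a compile-time constant instead of an #ifdef, so
the compiler still type-checks the format strings when debugging is off. The
macro names GEM_DEBUG and DB_PRINT and the sum_lengths() helper are
assumptions for this sketch, not necessarily what the commit uses.

```c
#include <assert.h>
#include <stdio.h>

#ifndef GEM_DEBUG
#define GEM_DEBUG 0             /* hypothetical switch; set to 1 to trace */
#endif

/* The body is always parsed and type-checked; with GEM_DEBUG == 0 the
 * branch is dead code that the compiler can drop. */
#define DB_PRINT(...) do { \
    if (GEM_DEBUG) { \
        fprintf(stderr, "%s: ", __func__); \
        fprintf(stderr, __VA_ARGS__); \
    } \
} while (0)

/* Toy user of the macro: add up descriptor lengths. */
static unsigned sum_lengths(const unsigned *len, unsigned n)
{
    unsigned total = 0;

    assert(len != NULL || n == 0);
    for (unsigned i = 0; i < n; i++) {
        total += len[i];
        DB_PRINT("desc %u len %u total %u\n", i, len[i], total);
    }
    return total;
}
```

A format string that no longer matches its arguments then produces a
diagnostic in every build, not only when someone enables debugging.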

hw/net/tulip: Log descriptor overflows

Log with GUEST_ERROR what the guest is doing wrong.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

hw/net/tulip: Fix 'Descriptor Error' definition

Bit #14 is "DE" for 'Descriptor Error':

  When set, indicates a frame truncation caused by a frame
  that does not fit within the current descriptor buffers,
  and that the 21143 does not own the next descriptor.

  [Table 4-1. RDES0 Bit Fields Description]

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

Fix tulip breakage

The tulip network driver in a qemu-system-hppa emulation is broken in
the sense that bigger network packets are no longer received, and
thus even running e.g. "apt update" inside the VM fails.

The breakage was introduced by commit 8ffb7265af ("check frame size and
r/w data length") which added checks to prevent accesses outside of the
rx/tx buffers.

But the new checks were implemented incorrectly. The variable rx_frame_len
counts backwards, from rx_frame_size down to zero, and the variable len
is never bigger than rx_frame_len, so out-of-bounds accesses just can't
happen and the checks are unnecessary.
On the contrary, the checks now prevent bigger packets from being moved
into the rx buffers.

This patch reverts the wrong checks and was successfully tested with a
qemu-system-hppa emulation.

Fixes: 8ffb7265af ("check frame size and r/w data length")
Buglink: https://bugs.launchpad.net/bugs/1874539
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
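
The argument in the tulip commit above (a counter that runs from
rx_frame_size down to zero, and a chunk length that never exceeds it) can be
illustrated with a small model. copy_chunk() and its parameters are
hypothetical; only the counting-down idea is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one chunk of a received frame into a descriptor buffer.
 * *remaining counts down from frame_size to zero, like rx_frame_len
 * in the commit message; all other names are hypothetical. */
static size_t copy_chunk(const unsigned char *frame, size_t frame_size,
                         size_t *remaining,
                         unsigned char *buf, size_t buf_size)
{
    size_t len, offset;

    assert(*remaining <= frame_size);
    len = buf_size < *remaining ? buf_size : *remaining;
    offset = frame_size - *remaining;

    /* len <= *remaining, so offset + len <= frame_size always holds. */
    assert(offset + len <= frame_size);
    memcpy(buf, frame + offset, len);
    *remaining -= len;
    return len;
}
```

Because the invariant follows from how len is chosen, an extra bounds check
on top of it can only reject valid transfers, which matches the breakage
described above.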

virtio-net: align RSC fields with updated virtio-net header

Remove duplicated RSC definitions. Rename the fields to match
the ones defined in the Linux header.

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

virtio-net: add migration support for RSS and hash report

Save and restore RSS/hash report configuration.

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

vmstate.h: provide VMSTATE_VARRAY_UINT16_ALLOC macro

Similar to VMSTATE_VARRAY_UINT32_ALLOC, but the size is
a 16-bit field.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

virtio-net: reference implementation of hash report

Suggest VIRTIO_NET_F_HASH_REPORT if specified in device
parameters.
If VIRTIO_NET_F_HASH_REPORT is set, the device extends the
configuration space. If the feature is negotiated, the packet
layout is extended to accommodate the hash information. In this
case, deliver the packet's hash value and report type in the
virtio header extension.
For configuration, use the same procedure as already used for
RSS. We add two fields in rss_data that control what the device
does with the calculated hash if rss_data.enabled is set. If the
field 'populate' is set, the hash is set in the packet; if the
field 'redirect' is set, the hash is used to decide which queue
to place the packet in.

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

tap: allow extended virtio header with hash info

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

virtio-net: implement RX RSS processing

If VIRTIO_NET_F_RSS is negotiated and RSS is enabled, process
incoming packets, calculate each packet's hash and place the
packet into the respective RX virtqueue.

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
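
A rough sketch of the queue-selection step in RSS: the packet hash indexes an
indirection table whose length is a power of two. The hash computation itself
is omitted, and RssConfig, INDIRECTION_LEN, default_queue and
pick_rx_queue() are hypothetical names for illustration, not the virtio-net
implementation. Falling back to a single default queue when RSS is disabled
is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDIRECTION_LEN 8       /* hypothetical; must be a power of two */

typedef struct {
    int enabled;
    uint16_t indirection[INDIRECTION_LEN];
    uint16_t default_queue;     /* used when no hash applies */
} RssConfig;

/* Map a packet hash to an RX queue index. 'hashable' says whether a
 * hash could be computed for this packet type. */
static uint16_t pick_rx_queue(const RssConfig *rss, uint32_t hash,
                              int hashable)
{
    assert(rss != NULL);
    assert((INDIRECTION_LEN & (INDIRECTION_LEN - 1)) == 0);
    if (!rss->enabled || !hashable) {
        return rss->default_queue;
    }
    return rss->indirection[hash & (INDIRECTION_LEN - 1)];
}
```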

virtio-net: implement RSS configuration command

Optionally report RSS feature.
Handle RSS configuration command and keep RSS parameters
in virtio-net device context.

Signed-off-by: Yuri Benditovich <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- enhance handling of size-related BlockConf properties
- nvme: small fixes, refactoring and cleanups
- virtio-blk: On restart, process queued requests in the proper context
- icount: make dma reads deterministic
- iotests: Some fixes for rarely run cases
- .gitignore: Ignore storage-daemon files
- Minor code cleanups

# gpg: Signature made Wed 17 Jun 2020 15:47:19 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (43 commits)
  iotests: Add copyright line in qcow2.py
  iotests/{190,291}: compat=0.10 is unsupported
  iotests/229: data_file is unsupported
  iotests/292: data_file is unsupported
  iotests/041: Skip test_small_target for qed
  iotests.py: Add skip_for_formats() decorator
  block: lift blocksize property limit to 2 MiB
  qdev-properties: add getter for size32 and blocksize
  block: make BlockConf size props 32bit and accept size suffixes
  qdev-properties: make blocksize accept size suffixes
  qdev-properties: add size32 property type
  qdev-properties: blocksize: use same limits in code and description
  block: consolidate blocksize properties consistency checks
  virtio-blk: store opt_io_size with correct size
  .gitignore: Ignore storage-daemon files
  hw/block/nvme: verify msix_init_exclusive_bar() return value
  hw/block/nvme: add msix_qsize parameter
  hw/block/nvme: Verify msix_vector_use() returned value
  hw/block/nvme: factor out controller identify setup
  hw/block/nvme: do cmb/pmr init as part of pci init
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/microvm-20200617-pull-request' into staging

microvm: memory config tweaks

# gpg: Signature made Wed 17 Jun 2020 13:28:44 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>" [full]
# gpg:                 aka "Gerd Hoffmann <[email protected]>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/microvm-20200617-pull-request:
  microvm: move virtio base to 0xfeb00000
  x86: move max-ram-below-4g to pc
  microvm: drop max-ram-below-4g support
  microvm: use 3G split unconditionally

Signed-off-by: Peter Maydell <[email protected]>

docs/s390x: fix vfio-ap device_del description

device_del requires an id and not a sysfsfile.

Fixes: bac03ec72f1b ("s390x/vfio-ap: document hot plug/unplug of vfio-ap device")
Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-Id: <20200617160604 [email protected]>
[CH: add missing '$']
Signed-off-by: Cornelia Huck <[email protected]>

vfio-ccw: Add support for the CRW region and IRQ

The crw region can be used to obtain information about
Channel Report Words (CRW) from vfio-ccw driver.

Currently only channel-path related CRWs are passed to
QEMU from vfio-ccw driver.

Signed-off-by: Farhan Ali <[email protected]>
Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20200505125757 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: Refactor the css_queue_crw() routine

We have a use case (vfio-ccw) where a CRW is already built and
ready to use. Rather than teasing out the components just to
reassemble it later, let's rework this code so we can queue a
fully-qualified CRW directly.

Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20200505125757 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

vfio-ccw: Refactor ccw irq handler

Make it easier to add new ones in the future.

Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20200505125757 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

vfio-ccw: Add support for the schib region

The schib region can be used to obtain the latest SCHIB from the host
passthrough subchannel. Since the guest SCHIB is virtualized,
we currently only update the path related information so that the
guest is aware of any path related changes when it issues the
'stsch' instruction.

Signed-off-by: Farhan Ali <[email protected]>
Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20200505125757 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

vfio-ccw: Refactor cleanup of regions

While we're at it, add a g_free() for the async_cmd_region that
is the last thing currently created. g_free() knows how to handle
NULL pointers, so this makes it easier to remember what cleanups
need to be performed when new regions are added.

Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20200505125757 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

Linux headers: update

Update against Linux 5.8-rc1.

Signed-off-by: Cornelia Huck <[email protected]>

configure: Add -Wno-psabi

On aarch64, gcc 9.3 is generating

qemu/exec.c: In function ‘address_space_translate_iommu’:
qemu/exec.c:431:28: note: parameter passing for argument of type \
  ‘MemTxAttrs’ {aka ‘struct MemTxAttrs’} changed in GCC 9.1

and many other repetitions.  This structure, and the functions
amongst which it is passed, are not part of a QEMU public API.
Therefore we do not care how the compiler passes the argument,
so long as the compiler is self-consistent.

The only portion of QEMU which does have a public API, and so
must have a stable ABI, is "qemu/plugin.h".  We test this by
forcing -Wpsabi in tests/plugin/Makefile.

Buglink: https://bugs.launchpad.net/qemu/+bug/1881552
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200617201309.1640952 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

configure: Disable -Wtautological-type-limit-compare

Clang 10 enables this by default with -Wtype-limit.

All of the instances flagged by this Werror so far have been
cases in which we really do want the compiler to optimize away
the test completely. Disabling the warning will avoid having
to add ifdefs to work around this.

Cc: Eric Blake <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1878628
Acked-by: Thomas Huth <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200617201309.1640952 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

configure: Clean up warning flag lists

Use a helper function to tidy the assembly of gcc_flags.
Separate flags that disable warnings from those that enable,
and sort the disable warnings to the end.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200617201309.1640952 [email protected]
Suggested-by: Eric Blake <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

migration: fix xbzrle encoding rate calculation

Clang 10 reports an error about an implicit conversion from "unsigned
long" to "double". Simply make the encoding rate 0
when the encoded_size is 0.
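
A rough illustration of the guard described above, not the actual patch:
the helper name encoding_rate() and its parameters are hypothetical. It
returns 0 when nothing was encoded, instead of dividing by zero or falling
back to a UINT64_MAX sentinel, which a double cannot represent exactly.

```c
#include <stdint.h>

/* Hypothetical helper: ratio of unencoded to encoded bytes for one period. */
double encoding_rate(uint64_t unencoded_size, uint64_t encoded_size)
{
    if (encoded_size == 0) {
        /* Nothing was encoded: report 0 rather than a huge sentinel value. */
        return 0.0;
    }
    /* Explicit casts keep the integer-to-double conversion intentional. */
    return (double)unencoded_size / (double)encoded_size;
}
```

For example, encoding_rate(4096, 1024) yields 4.0, while
encoding_rate(4096, 0) yields 0.0.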

Fixes: e460a4b1a4
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reported-by: Richard Henderson <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200617201309.1640952 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

fpu/softfloat: Silence 'bitwise negation of boolean expression' warning

When building with clang version 10.0.0-4ubuntu1, we get:

    CC      lm32-softmmu/fpu/softfloat.o
  fpu/softfloat.c:3365:13: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
      absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven );
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  fpu/softfloat.c:3423:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
          absZ0 &= ~ ( ( (uint64_t) ( absZ1<<1 ) == 0 ) & roundNearestEven );
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  ...

  fpu/softfloat.c:4273:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation]
          zSig1 &= ~ ( ( zSig2 + zSig2 == 0 ) & roundNearestEven );
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix by rewriting the fishy bitwise AND of two bools as an int.
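
One way to express the same operation without negating a boolean, as a
sketch only: the helper name is hypothetical and the variable names are
borrowed from the diagnostics above. The flagged expression evaluates to
~1 on an exact tie in round-to-nearest-even mode and to ~0 otherwise, so
it can be written as a conditional integer mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: on an exact tie (round bits == 0x40) in
 * round-to-nearest-even mode, clear the least significant bit.
 */
uint32_t clear_lsb_on_tie(uint32_t absZ, int roundBits, bool roundNearestEven)
{
    if (roundBits == 0x40 && roundNearestEven) {
        absZ &= ~UINT32_C(1);   /* plain integer mask, no bool arithmetic */
    }
    return absZ;
}
```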

Suggested-by: Eric Blake <[email protected]>
Buglink: https://bugs.launchpad.net/bugs/1881004
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20200617201309.1640952 [email protected]
Message-Id: <20200528155420 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

migration: fix multifd_send_pages() next channel

multifd_send_pages() loops around the available channels.
The next channel to use between two calls to multifd_send_pages() is stored
inside a local static variable, next_channel.

It works well, except if the number of channels decreases between two calls
to multifd_send_pages(). In this case, the loop can try to access the
data of a channel that doesn't exist anymore.

The problem can be triggered if we start a migration with a given number of
channels and then we cancel the migration to restart it with a lower number.
This generally ends with an error like:
qemu-system-ppc64: .../util/qemu-thread-posix.c:77: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.

This patch fixes the error by capping next_channel with the current number
of channels before using it.
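
The capping can be sketched as follows. This is an illustration under
assumptions, not the patch itself: pick_channel() and its channel_count
argument are hypothetical, and channel_count is assumed to be at least 1.

```c
/* Index remembered between calls, like the static variable described above. */
static int next_channel;

/* Hypothetical round-robin picker that tolerates a shrinking channel count. */
int pick_channel(int channel_count)
{
    int chosen;

    /* Cap the remembered index in case there are fewer channels now. */
    next_channel %= channel_count;

    chosen = next_channel;
    next_channel = (chosen + 1) % channel_count;
    return chosen;
}
```

With four channels, three calls return 0, 1 and 2 and leave the remembered
index at 3. If the next call sees only two channels, the index is capped
to 1 instead of pointing at a channel that no longer exists.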

Signed-off-by: Laurent Vivier <[email protected]>
Message-Id: <20200617113154 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

docs/xbzrle: update 'cache miss rate' and 'encoding rate' to docs

Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <20200603080904 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: improvements for the 'info migrate'

When running:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
...
xbzrle transferred: 640892 kbytes
xbzrle pages: 16645936 pages
xbzrle cache miss: 1525426
xbzrle cache miss rate: 0.09
xbzrle encoding rate: 91.42
xbzrle overflow: 40896
...
compression pages: 377710 pages
compression busy: 0
compression busy rate: 0.00
compressed size: 463169457
compression rate: 3.33

Add units for 'xbzrle cache miss' and 'compressed size' to
make them easier to read.

Suggested-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <20200603080904 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: add 'goto end' to reduce duplicate code.

Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <20200603080904 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: delete redundant Error check before invoke hmp_handle_error()

hmp_handle_error() already checks the Error internally.

Signed-off-by: Mao Zhongyi <[email protected]>
Message-Id: <20200603080904 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: don't silently output when running 'migrate_set_downtime' fails

Although 'migrate_set_downtime' has been deprecated and replaced
with 'migrate_set_parameter downtime_limit', it has not been
completely eliminated, possibly due to compatibility with older
versions. I think that as long as this old command still works, we
should report an appropriate message when something goes wrong instead
of staying silent.

before:
(qemu) migrate_set_downtime -1
(qemu)

after:
(qemu) migrate_set_downtime -1
Error: Parameter 'downtime_limit' expects an integer in the range of 0 to 2000 seconds

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20200603080904 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

monitor/hmp-cmds: add units for migrate_parameters

When running:
(qemu) info migrate_parameters
announce-initial: 50 ms
announce-max: 550 ms
announce-step: 100 ms
compress-wait-thread: on
...
max-bandwidth: 33554432 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
...
xbzrle-cache-size: 67108864

Add units for the parameters 'x-checkpoint-delay' and
'xbzrle-cache-size' to make them easier to read. Also change
'milliseconds' to 'ms' to keep the same style.

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Message-Id: <20200603080904 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests/migration: fix unreachable path in stress test

stressone() and stress() only return on failure, because the test
otherwise runs forever. Change the return type of both functions to
void so that exit_failure() becomes the exit path of main().

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20200603080904 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests/migration: mem leak fix

'data' may be leaked, so use the glib macro g_autofree, as
recommended by CODING_STYLE.rst, to automatically release
the memory returned from g_malloc().

Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20200603080904 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

hmp: Make json format optional for qom-set

Commit 7d2ef6dcc1cf ("hmp: Simplify qom-set") switched to the json
parser, making it possible to specify complex types. However, with this
change it is no longer possible to specify proper sizes (e.g., 2G, 128M),
making the interface harder to use for properties that consume sizes.

Let's switch back to the previous handling and allow passing
json via the "-j" parameter.

Cc: Philippe Mathieu-Daudé <[email protected]>
Cc: Markus Armbruster <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: "Daniel P. Berrangé" <[email protected]>
Cc: Eduardo Habkost <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20200610075153 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

qom-hmp-cmds: fix a memleak in hmp_qom_get

'obj' is not freed at the end of hmp_qom_get(). Fix that.

The leak stack:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f4e3a779ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
    #1 0x7f4e398f91d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
    #2 0x55c9fd9a3999 in qstring_from_substr /build/qemu/src/qobject/qstring.c:45
    #3 0x55c9fd894bd3 in qobject_output_type_str /build/qemu/src/qapi/qobject-output-visitor.c:175
    #4 0x55c9fd894bd3 in qobject_output_type_str /build/qemu/src/qapi/qobject-output-visitor.c:168
    #5 0x55c9fd88b34d in visit_type_str /build/qemu/src/qapi/qapi-visit-core.c:308
    #6 0x55c9fd59aa6b in property_get_str /build/qemu/src/qom/object.c:2064
    #7 0x55c9fd5adb8a in object_property_get_qobject /build/qemu/src/qom/qom-qobject.c:38
    #8 0x55c9fd4a029d in hmp_qom_get /build/qemu/src/qom/qom-hmp-cmds.c:66

Fixes: 89cf4fe34f4
Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200603070338 [email protected]>
Reviewed-by: Li Qiang <[email protected]>
Tested-by: Li Qiang <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Whitelist fchmod

lo_setattr() invokes fchmod() in a rarely used code path, so it should
be whitelisted or virtiofsd will crash with EBADSYS.

Said code path can be triggered for example as follows:

On the host, in the shared directory, create a file with the setuid bit
set and a security.capability xattr:
(1) # touch foo
(2) # chmod u+s foo
(3) # setcap '' foo

Then in the guest let some process truncate that file after it has
dropped all of its capabilities (at least CAP_FSETID):

int main(int argc, char *argv[])
{
    capng_setpid(getpid());
    capng_clear(CAPNG_SELECT_BOTH);
    capng_updatev(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE, 0);
    capng_apply(CAPNG_SELECT_BOTH);

    ftruncate(open(argv[1], O_RDWR), 0);
}

This will cause the guest kernel to drop the setuid bit (i.e. perform a
mode change) as part of the truncate (where FATTR_FH is set), and that
will cause virtiofsd to invoke fchmod() instead of fchmodat().

(A similar configuration exists further below with futimens() vs.
utimensat(), but the former is not a syscall but just a wrapper for the
latter, so no further whitelisting is required.)

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1842667
Reported-by: Qian Cai <[email protected]>
Cc: [email protected]
Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200608093111 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Vivek Goyal <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200617-pull-request' into staging

usb-host: add hostdevice property, workaround libusb bug

# gpg: Signature made Wed 17 Jun 2020 11:47:37 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>" [full]
# gpg:                 aka "Gerd Hoffmann <[email protected]>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/usb-20200617-pull-request:
  usb-host: workaround libusb bug
  usb: add hostdevice property to usb-host

Signed-off-by: Peter Maydell <[email protected]>

iotests: Add copyright line in qcow2.py

The file qcow2.py was originally contributed in 2012 by Kevin Wolf,
but was not given traditional boilerplate headers at the time. The
missing license was just rectified (commit 16306a7b39) using the
project-default GPLv2+, but as Vladimir is not at Red Hat, he did not
add a Copyright line. All earlier contributions have come from CC'd
authors, where all but Stefan used a Red Hat address at the time of
the contribution, and that copyright carries over to the split to
qcow2_format.py (d5262c7124).

CC: Kevin Wolf <[email protected]>
CC: Stefan Hajnoczi <[email protected]>
CC: Eduardo Habkost <[email protected]>
CC: Max Reitz <[email protected]>
CC: Philippe Mathieu-Daudé <[email protected]>
CC: Paolo Bonzini <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20200609205944.3549240 [email protected]>
Acked-by: Stefan Hajnoczi <[email protected]>
Acked-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/{190,291}: compat=0.10 is unsupported

Fixes: 5d72c68b49769c927e90b78af6d90f6a384b26ac
Fixes: cf2d1203dcfc2bf964453d83a2302231ce77f2dc
Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200617104822 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/229: data_file is unsupported

Fixes: d89ac3cf305b28c024a76805a84d75c0ee1e786f
Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200617104822 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/292: data_file is unsupported

Fixes: e4d7019e1a81c61de6a925c3ac5bb6e62ea21b29
Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200617104822 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests/041: Skip test_small_target for qed

qed does not support shrinking images, so the test_small_target method
should be skipped to keep 041 passing.

Fixes: 16cea4ee1c8e5a69a058e76f426b2e17974d8d7d
Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200617104822 [email protected]>
Tested-by: Thomas Huth <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests.py: Add skip_for_formats() decorator

Sometimes, we want to skip some test methods for certain formats. This
decorator allows that.

Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20200617104822 [email protected]>
Tested-by: Thomas Huth <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: lift blocksize property limit to 2 MiB

Logical and physical block sizes in QEMU are limited to 32 KiB.

This appears unnecessarily tight, and we have found bigger block sizes
handy at times.

Lift the limitation up to 2 MiB which appears to be good enough for
everybody, and matches the qcow2 cluster size limit.

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qdev-properties: add getter for size32 and blocksize

Add getter for size32, and use it for blocksize, too.

In its human-readable branch, it reports approximate size in
human-readable units next to the exact byte value, like the getter for
64bit size does.

Adjust the expected test output accordingly.

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: make BlockConf size props 32bit and accept size suffixes

Convert all size-related properties in BlockConf to 32bit. This will
accommodate bigger block sizes (in a followup patch). This also allows
them all to accept size suffixes, either via DEFINE_PROP_BLOCKSIZE
or via DEFINE_PROP_SIZE32.

Also, since min_io_size is exposed to the guest by scsi and virtio-blk
devices as a uint16_t in units of logical blocks, introduce an
additional check in blkconf_blocksizes to prevent its silent truncation.
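
A minimal sketch of such a truncation check, assuming sizes in bytes.
The function name is hypothetical and this is not the code added to
blkconf_blocksizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does min_io_size, in logical blocks, fit in uint16_t? */
bool min_io_size_fits(uint32_t min_io_size, uint32_t logical_block_size)
{
    if (logical_block_size == 0) {
        return false;   /* avoid dividing by zero on an unset block size */
    }
    return min_io_size / logical_block_size <= UINT16_MAX;
}
```

With 512-byte logical blocks, a min_io_size of 33553920 bytes (65535
blocks) still fits, while 33554432 bytes (65536 blocks) does not.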

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qdev-properties: make blocksize accept size suffixes

It appears convenient to be able to specify physical_block_size and
logical_block_size using common size suffixes.

Teach the blocksize property setter to interpret them. Also express the
upper and lower limits in the respective units.

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qdev-properties: add size32 property type

Introduce size32 property type which handles size suffixes (k, m, g)
just like size property, but is uint32_t rather than uint64_t. It's
going to be useful for properties that are byte sizes but are inherently
32bit, like BlkConf.opt_io_size or .discard_granularity (they are
switched to this new property type in a followup commit).
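
To illustrate the idea, here is a sketch of parsing a size with a k, m or
g suffix into a 32-bit value. It is an assumption-laden example rather
than the property setter itself: parse_size32() is a hypothetical name,
suffixes are treated as binary multipliers, and only decimal integers are
accepted.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 and fills *out, or -1 on any error. */
int parse_size32(const char *str, uint32_t *out)
{
    char *end;
    unsigned long long value;
    unsigned shift = 0;

    errno = 0;
    value = strtoull(str, &end, 10);
    if (end == str || errno == ERANGE) {
        return -1;                      /* no digits, or out of range */
    }

    switch (tolower((unsigned char)*end)) {
    case 'k': shift = 10; end++; break;
    case 'm': shift = 20; end++; break;
    case 'g': shift = 30; end++; break;
    default: break;
    }
    if (*end != '\0') {
        return -1;                      /* trailing garbage */
    }
    if (value > (UINT32_MAX >> shift)) {
        return -1;                      /* would not fit in 32 bits */
    }

    *out = (uint32_t)(value << shift);
    return 0;
}
```

With this sketch, "4k" parses to 4096 and "2M" to 2097152, while "4g" is
rejected because the result does not fit in a uint32_t.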

The getter for size32 is left out for a separate patch as its benefit is
less obvious, and it affects test output; for now the regular uint32
getter is used.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qdev-properties: blocksize: use same limits in code and description

Make it easier (more visible) to maintain the limits on the blocksize
properties in sync with the respective description, by using macros both
in the code and in the description.

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: consolidate blocksize properties consistency checks

Several block device properties related to blocksize configuration must
be in a certain relationship with respect to each other: physical block must be no
smaller than logical block; min_io_size, opt_io_size, and
discard_granularity must be a multiple of a logical block.

To ensure these requirements are met, add corresponding consistency
checks to blkconf_blocksizes, adjusting its signature to communicate
possible error to the caller. Also remove the now redundant consistency
checks from the specific devices.
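
The relationships listed above could be checked along these lines. This
is a sketch under assumptions: the struct and function names are
hypothetical, all sizes are in bytes, and the handling of "unset"
sentinel values is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the block configuration, sizes in bytes. */
typedef struct {
    uint32_t logical_block_size;
    uint32_t physical_block_size;
    uint32_t min_io_size;
    uint32_t opt_io_size;
    uint32_t discard_granularity;
} BlockSizes;

/* Returns NULL when consistent, else a message naming the first violation. */
const char *check_block_sizes(const BlockSizes *c)
{
    if (c->logical_block_size == 0) {
        return "logical_block_size must be non-zero";
    }
    if (c->physical_block_size < c->logical_block_size) {
        return "physical_block_size must not be smaller than logical_block_size";
    }
    if (c->min_io_size % c->logical_block_size != 0) {
        return "min_io_size must be a multiple of logical_block_size";
    }
    if (c->opt_io_size % c->logical_block_size != 0) {
        return "opt_io_size must be a multiple of logical_block_size";
    }
    if (c->discard_granularity % c->logical_block_size != 0) {
        return "discard_granularity must be a multiple of logical_block_size";
    }
    return NULL;
}
```

Returning a message to the caller mirrors the idea of reporting the error
from one central place instead of repeating the checks in every device.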

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

virtio-blk: store opt_io_size with correct size

The width of opt_io_size in virtio_blk_config is 32bit. However, it's
written with virtio_stw_p; this may result in value truncation and, on
big-endian systems with legacy virtio, in completely bogus readings in
the guest.

Use the appropriate accessor to store it.
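
The truncation can be demonstrated with plain byte stores. The helpers
below are hypothetical stand-ins for 16-bit and 32-bit little-endian
accessors, not QEMU's virtio accessors.

```c
#include <stdint.h>

/* Store a 16-bit value little-endian, as a 16-bit accessor would. */
void store_le16(uint8_t *field, uint16_t value)
{
    field[0] = (uint8_t)(value & 0xff);
    field[1] = (uint8_t)(value >> 8);
}

/* Store a 32-bit value little-endian, matching the field width. */
void store_le32(uint8_t *field, uint32_t value)
{
    field[0] = (uint8_t)(value & 0xff);
    field[1] = (uint8_t)((value >> 8) & 0xff);
    field[2] = (uint8_t)((value >> 16) & 0xff);
    field[3] = (uint8_t)(value >> 24);
}

/* Read the 32-bit little-endian field back, as the guest would. */
uint32_t load_le32(const uint8_t *field)
{
    return (uint32_t)field[0] |
           ((uint32_t)field[1] << 8) |
           ((uint32_t)field[2] << 16) |
           ((uint32_t)field[3] << 24);
}
```

Writing 0x00012000 into a zeroed 4-byte field through store_le16() leaves
0x2000 for the reader, because the upper 16 bits are lost; store_le32()
preserves the full value.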

Signed-off-by: Roman Kagan <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Message-Id: <20200528225516.1676602 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

.gitignore: Ignore storage-daemon files

The files are generated.

Fixes: 2af282ec51a ("qemu-storage-daemon: Add --monitor option")
Cc: Kevin Wolf <[email protected]>
Signed-off-by: Roman Bolshakov <[email protected]>
Message-Id: <20200612105830 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: verify msix_init_exclusive_bar() return value

Pass an Error to msix_init_exclusive_bar() and check it.

Signed-off-by: Klaus Jensen <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: add msix_qsize parameter

Decouple the requested maximum number of ioqpairs (param max_ioqpairs)
from the number of MSI-X interrupt vectors by introducing a new
msix_qsize parameter and initialize MSI-X with that. This allows
emulating a device that has fewer vectors than I/O queue pairs and also
allows more than 2048 queue pairs. To keep the device behaving as
previously, use a msix_qsize default of 65 (default max_ioqpairs + 1).

This decoupling was actually suggested by Maxim some time ago in a
slightly different context, so adding a Suggested-by.

Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: Verify msix_vector_use() returned value

msix_vector_use() returns -EINVAL on error. Assert it won't.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out controller identify setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: do cmb/pmr init as part of pci init

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out pmr setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out cmb setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out pci setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out namespace setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: add namespace helpers

Introduce some small helpers to make the next patches easier on the eye.

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out block backend setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out device state setup

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: factor out property/constraint checks

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: remove redundant cmbloc/cmbsz members

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: add max_ioqpairs device parameter

The num_queues device parameter has a slightly confusing meaning because
it accounts for the admin queue pair which is not really optional.
Secondly, it is really a maximum value of queues allowed.

Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: fix pin-based interrupt behavior

First, since the device only supports MSI-X or pin-based interrupts, if
MSI-X is not enabled, it should not accept interrupt vectors different
from 0 when creating completion queues.

Secondly, the irq_status NvmeCtrl member is meant to be compared to the
INTMS register, so it should only be 32 bits wide. And it is really only
useful when used with multi-message MSI.

Third, since we do not force a 1-to-1 correspondence between cqid and
interrupt vector, the irq_status register should not have bits set
according to cqid, but according to the associated interrupt vector.

Fix these issues, but keep irq_status available so we can easily support
multi-message MSI down the line.
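
The three points can be sketched as follows. All type and function names
here are hypothetical simplifications, not the device model's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified controller state. */
typedef struct {
    bool msix_enabled;
    uint32_t irq_status;    /* one bit per vector, 32 bits wide like INTMS */
} Ctrl;

/* Hypothetical, simplified completion queue. */
typedef struct {
    uint16_t cqid;
    uint16_t vector;
} CompletionQueue;

/* Without MSI-X, only vector 0 is acceptable for a new completion queue. */
bool vector_is_acceptable(const Ctrl *n, uint16_t vector)
{
    return n->msix_enabled || vector == 0;
}

/* Record a pending interrupt by vector, not by completion queue id. */
void irq_assert(Ctrl *n, const CompletionQueue *cq)
{
    if (cq->vector < 32) {
        n->irq_status |= UINT32_C(1) << cq->vector;
    }
}

/* The pin is raised while any unmasked status bit is set. */
bool pin_is_raised(const Ctrl *n, uint32_t intms)
{
    return (n->irq_status & ~intms) != 0;
}
```

A completion queue with cqid 5 and vector 0 therefore sets bit 0 of
irq_status, not bit 5, and masking bit 0 in INTMS lowers the pin.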

Fixes: 5e9aa92eb1a5 ("hw/block: Fix pin-based interrupt behaviour of NVMe")
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Marcel Apfelbaum <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: refactor nvme_addr_read

Pull the controller memory buffer check to its own function. The check
will be used on its own in later patches.

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: use constants in identify

Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/nvme: move device parameters to separate struct

Move device configuration parameters to separate struct to make it
explicit what is configurable and what is set internally.

Signed-off-by: Klaus Jensen <[email protected]>
Signed-off-by: Klaus Jensen <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <20200609190333 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>