Pavel Dovgalyuk [Fri, 29 May 2020 07:04:51 +0000 (10:04 +0300)]
tests/acceptance: add base class record/replay kernel tests
This patch adds a base for testing kernel boot recording and replaying.
Each test has the phase of recording and phase of replaying.
Virtual machines just boot the kernel and do not interact with
the network.
Structure and image links for the tests are borrowed from boot_linux_console.py
Testing controls the message pattern at the end of the kernel
boot for both record and replay modes. In replay mode QEMU is also
intended to finish the execution automatically.
MAINTAINERS: Add an entry to review Avocado based acceptance tests
Acceptance tests can test any piece of the QEMU codebase.
As such, the directory holding them does not belong to a specific
subsystem with designated maintainers.
Each subsystem covered by a test is welcomed to add the test path
to its section.
See for example commits 71b290e70, b11785ca2 or 5d480ddde.
Add an entry for to allow reviewers to be notified when acceptance /
integration tests are added or modified.
The designated reviewers are not maintainers, subsystem maintainers
are expected to merge their tests.
tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long'
to 'double' changes value from 18446744073709551615
to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
*threshold = rate * UINT64_MAX;
~ ^~~~~~~~~~
Fix this by splitting the 64-bit constant into two halves,
each of which is individually perfectly representable, the
sum of which produces the correct arithmetic result.
This is very likely just a sticking plaster over some underlying
incorrect code, but it will suppress the warning for the moment.
* remotes/cohuck/tags/s390x-20200618:
docs/s390x: fix vfio-ap device_del description
vfio-ccw: Add support for the CRW region and IRQ
s390x/css: Refactor the css_queue_crw() routine
vfio-ccw: Refactor ccw irq handler
vfio-ccw: Add support for the schib region
vfio-ccw: Refactor cleanup of regions
Linux headers: update
Peter Maydell [Thu, 18 Jun 2020 15:52:09 +0000 (16:52 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 18 Jun 2020 14:16:22 BST
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request: (33 commits)
net: Drop the NetLegacy structure, always use Netdev instead
net: Drop the legacy "name" parameter from the -net option
hw/net/e1000e: Do not abort() on invalid PSRCTL register value
colo-compare: Fix memory leak in packet_enqueue()
net/colo-compare.c: Correct ordering in complete and finalize
net/colo-compare.c: Check that colo-compare is active
net/colo-compare.c: Only hexdump packets if tracing is enabled
net/colo-compare.c: Fix deadlock in compare_chr_send
chardev/char.c: Use qemu_co_sleep_ns if in coroutine
net/colo-compare.c: Create event_bh with the right AioContext
net: use peer when purging queue in qemu_flush_or_purge_queue_packets()
net: cadence_gem: Fix RX address filtering
net: cadence_gem: TX_LAST bit should be set by guest
net: cadence_gem: Update the reset value for interrupt mask register
net: cadnece_gem: Update irq_read_clear field of designcfg_debug1 reg
net: cadence_gem: Add support for jumbo frames
net: cadence_gem: Fix up code style
net: cadence_gem: Move tx/rx packet buffert to CadenceGEMState
net: cadence_gem: Set ISR according to queue in use
net: cadence_gem: Define access permission for interrupt registers
...
Peter Maydell [Thu, 18 Jun 2020 14:30:13 +0000 (15:30 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200617a' into staging
Migration (and HMP and virtiofs) pull 2020-06-17
Migration:
HMP/migration and test changes from Mao Zhongyi
multifd fix from Laurent Vivier
HMP
qom-set partial reversion/change from David Hildenbrand
now you need -j to pass json format, but it's regained the
old 100M type format.
Memory leak fix from Pan Nengyuan
Virtiofs
fchmod seccomp fix from Max Reitz
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
# gpg: Signature made Wed 17 Jun 2020 19:34:58 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20200617a:
migration: fix multifd_send_pages() next channel
docs/xbzrle: update 'cache miss rate' and 'encoding rate' to docs
monitor/hmp-cmds: improvements for the 'info migrate'
monitor/hmp-cmds: add 'goto end' to reduce duplicate code.
monitor/hmp-cmds: delete redundant Error check before invoke hmp_handle_error()
monitor/hmp-cmds: don't silently output when running 'migrate_set_downtime' fails
monitor/hmp-cmds: add units for migrate_parameters
tests/migration: fix unreachable path in stress test
tests/migration: mem leak fix
hmp: Make json format optional for qom-set
qom-hmp-cmds: fix a memleak in hmp_qom_get
virtiofsd: Whitelist fchmod
Thomas Huth [Mon, 18 May 2020 18:01:03 +0000 (20:01 +0200)]
net: Drop the NetLegacy structure, always use Netdev instead
Now that the "name" parameter is gone, there is hardly any difference
between NetLegacy and Netdev anymore, so we can drop NetLegacy and always
use Netdev to simplify the code quite a bit.
The only two differences that were really left between Netdev and NetLegacy:
1) NetLegacy does not allow a "hubport" type. We can continue to block
this with a simple check in net_client_init1() for this type.
2) The "id" parameter was optional in NetLegacy (and an internal id
was chosen via assign_name() during initialization), but it is mandatory
for Netdev. To avoid that the visitor code bails out here, we have to
add an internal id to the QemuOpts already earlier now.
2813 static void
2814 e1000e_set_psrctl(E1000ECore *core, int index, uint32_t val)
2815 {
2816 if (core->mac[RCTL] & E1000_RCTL_DTYP_MASK) {
2817
2818 if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
2819 hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
2820 }
Instead of calling hw_error() which abort the process (it is
meant for CPU fatal error condition, not for device logging),
log the invalid request with qemu_log_mask(LOG_GUEST_ERROR)
and return, ignoring the request.
Derek Su [Fri, 22 May 2020 07:53:57 +0000 (15:53 +0800)]
colo-compare: Fix memory leak in packet_enqueue()
The patch is to fix the "pkt" memory leak in packet_enqueue().
The allocated "pkt" needs to be freed if the colo compare
primary or secondary queue is too big.
Replace the error_report of full queue with a trace event.
Lukas Straub [Fri, 22 May 2020 07:53:56 +0000 (15:53 +0800)]
net/colo-compare.c: Correct ordering in complete and finalize
In colo_compare_complete, insert CompareState into net_compares
only after everything has been initialized.
In colo_compare_finalize, remove CompareState from net_compares
before anything is deinitialized.
Lukas Straub [Fri, 22 May 2020 07:53:55 +0000 (15:53 +0800)]
net/colo-compare.c: Check that colo-compare is active
If the colo-compare object is removed before failover and a
checkpoint happens, qemu crashes because it tries to lock
the destroyed event_mtx in colo_notify_compares_event.
Fix this by checking if everything is initialized by
introducing a new variable colo_compare_active which
is protected by a new mutex colo_compare_mutex. The new mutex
also protects against concurrent access of the net_compares
list and makes sure that colo_notify_compares_event isn't
active while we destroy event_mtx and event_complete_cond.
With this it also is again possible to use colo without
colo-compare (periodic mode) and to use multiple colo-compare
for multiple network interfaces.
Lukas Straub [Fri, 22 May 2020 07:53:53 +0000 (15:53 +0800)]
net/colo-compare.c: Fix deadlock in compare_chr_send
The chr_out chardev is connected to a filter-redirector
running in the main loop. qemu_chr_fe_write_all might block
here in compare_chr_send if the (socket-)buffer is full.
If another filter-redirector in the main loop want's to
send data to chr_pri_in it might also block if the buffer
is full. This leads to a deadlock because both event loops
get blocked.
Fix this by converting compare_chr_send to a coroutine and
putting the packets in a send queue.
Lukas Straub [Fri, 22 May 2020 07:53:51 +0000 (15:53 +0800)]
net/colo-compare.c: Create event_bh with the right AioContext
qemu_bh_new will set the bh to be executed in the main
loop. This causes crashes as colo_compare_handle_event assumes
that it has exclusive access the queues, which are also
concurrently accessed in the iothread.
Create the bh with the AioContext of the iothread to fulfill
these assumptions and fix the crashes. This is safe, because
the bh already takes the appropriate locks.
Jason Wang [Mon, 11 May 2020 04:04:53 +0000 (12:04 +0800)]
net: use peer when purging queue in qemu_flush_or_purge_queue_packets()
The sender of packet will be checked in the qemu_net_queue_purge() but
we use NetClientState not its peer when trying to purge the incoming
queue in qemu_flush_or_purge_packets(). This will trigger the assert
in virtio_net_reset since we can't pass the sender check:
hw/net/virtio-net.c:533: void virtio_net_reset(VirtIODevice *): Assertion
`!virtio_net_get_subqueue(nc)->async_tx.elem' failed.
#9 0x55a33fa31b78 in virtio_net_reset hw/net/virtio-net.c:533:13
#10 0x55a33fc88412 in virtio_reset hw/virtio/virtio.c:1919:9
#11 0x55a341d82764 in virtio_bus_reset hw/virtio/virtio-bus.c:95:9
#12 0x55a341dba2de in virtio_pci_reset hw/virtio/virtio-pci.c:1824:5
#13 0x55a341db3e02 in virtio_pci_common_write hw/virtio/virtio-pci.c:1252:13
#14 0x55a33f62117b in memory_region_write_accessor memory.c:496:5
#15 0x55a33f6205e4 in access_with_adjusted_size memory.c:557:18
#16 0x55a33f61e177 in memory_region_dispatch_write memory.c:1488:16
Sai Pavan Boddu [Tue, 12 May 2020 14:54:44 +0000 (20:24 +0530)]
net: cadence_gem: Fix the queue address update during wrap around
During wrap around and reset, queues are pointing to initial base
address of queue 0, irrespective of what queue we are dealing with.
Fix it by assigning proper base address every time.
When set, indicates a frame truncation caused by a frame
that does not fit within the current descriptor buffers,
and that the 21143 does not own the next descriptor.
The tulip network driver in a qemu-system-hppa emulation is broken in
the sense that bigger network packages aren't received any longer and
thus even running e.g. "apt update" inside the VM fails.
The breakage was introduced by commit 8ffb7265af ("check frame size and
r/w data length") which added checks to prevent accesses outside of the
rx/tx buffers.
But the new checks were implemented wrong. The variable rx_frame_len
counts backwards, from rx_frame_size down to zero, and the variable len
is never bigger than rx_frame_len, so accesses just can't happen and the
checks are unnecessary.
On the contrary the checks now prevented bigger packages to be moved
into the rx buffers.
This patch reverts the wrong checks and were sucessfully tested with a
qemu-system-hppa emulation.
virtio-net: reference implementation of hash report
Suggest VIRTIO_NET_F_HASH_REPORT if specified in device
parameters.
If the VIRTIO_NET_F_HASH_REPORT is set,
the device extends configuration space. If the feature
is negotiated, the packet layout is extended to
accomodate the hash information. In this case deliver
packet's hash value and report type in virtio header
extension.
Use for configuration the same procedure as already
used for RSS. We add two fields in rss_data that
controls what the device does with the calculated hash
if rss_data.enabled is set. If field 'populate' is set
the hash is set in the packet, if field 'redirect' is
set the hash is used to decide the queue to place the
packet to.
If VIRTIO_NET_F_RSS negotiated and RSS is enabled, process
incoming packets, calculate packet's hash and place the
packet into respective RX virtqueue.
Peter Maydell [Thu, 18 Jun 2020 11:15:33 +0000 (12:15 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- enhance handling of size-related BlockConf properties
- nvme: small fixes, refactoring and cleanups
- virtio-blk: On restart, process queued requests in the proper context
- icount: make dma reads deterministic
- iotests: Some fixes for rarely run cases
- .gitignore: Ignore storage-daemon files
- Minor code cleanups
* remotes/kevin/tags/for-upstream: (43 commits)
iotests: Add copyright line in qcow2.py
iotests/{190,291}: compat=0.10 is unsupported
iotests/229: data_file is unsupported
iotests/292: data_file is unsupported
iotests/041: Skip test_small_target for qed
iotests.py: Add skip_for_formats() decorator
block: lift blocksize property limit to 2 MiB
qdev-properties: add getter for size32 and blocksize
block: make BlockConf size props 32bit and accept size suffixes
qdev-properties: make blocksize accept size suffixes
qdev-properties: add size32 property type
qdev-properties: blocksize: use same limits in code and description
block: consolidate blocksize properties consistency checks
virtio-blk: store opt_io_size with correct size
.gitignore: Ignore storage-daemon files
hw/block/nvme: verify msix_init_exclusive_bar() return value
hw/block/nvme: add msix_qsize parameter
hw/block/nvme: Verify msix_vector_use() returned value
hw/block/nvme: factor out controller identify setup
hw/block/nvme: do cmb/pmr init as part of pci init
...
* remotes/kraxel/tags/microvm-20200617-pull-request:
microvm: move virtio base to 0xfeb00000
x86: move max-ram-below-4g to pc
microvm: drop max-ram-below-4g support
microvm: use 3G split unconditionally
Eric Farman [Tue, 5 May 2020 12:57:56 +0000 (14:57 +0200)]
s390x/css: Refactor the css_queue_crw() routine
We have a use case (vfio-ccw) where a CRW is already built and
ready to use. Rather than teasing out the components just to
reassemble it later, let's rework this code so we can queue a
fully-qualified CRW directly.
Farhan Ali [Tue, 5 May 2020 12:57:54 +0000 (14:57 +0200)]
vfio-ccw: Add support for the schib region
The schib region can be used to obtain the latest SCHIB from the host
passthrough subchannel. Since the guest SCHIB is virtualized,
we currently only update the path related information so that the
guest is aware of any path related changes when it issues the
'stsch' instruction.
Eric Farman [Tue, 5 May 2020 12:57:53 +0000 (14:57 +0200)]
vfio-ccw: Refactor cleanup of regions
While we're at it, add a g_free() for the async_cmd_region that
is the last thing currently created. g_free() knows how to handle
NULL pointers, so this makes it easier to remember what cleanups
need to be performed when new regions are added.
qemu/exec.c: In function ‘address_space_translate_iommu’:
qemu/exec.c:431:28: note: parameter passing for argument of type \
‘MemTxAttrs’ {aka ‘struct MemTxAttrs’} changed in GCC 9.1
and many other repetitions. This structure, and the functions
amongst which it is passed, are not part of a QEMU public API.
Therefore we do not care how the compiler passes the argument,
so long as the compiler is self-consistent.
The only portion of QEMU which does have a public api, and so
must have a stable abi, is "qemu/plugin.h". We test this by
forcing -Wpsabi in tests/plugin/Makefile.
Clang 10 enables this by default with -Wtype-limit.
All of the instances flagged by this Werror so far have been
cases in which we really do want the compiler to optimize away
the test completely. Disabling the warning will avoid having
to add ifdefs to work around this.
Use a helper function to tidy the assembly of gcc_flags.
Separate flags that disable warnings from those that enable,
and sort the disable warnings to the end.
Wei Wang [Wed, 17 Jun 2020 20:13:05 +0000 (13:13 -0700)]
migration: fix xbzrle encoding rate calculation
It's reported an error of implicit conversion from "unsigned long" to
"double" when compiling with Clang 10. Simply make the encoding rate 0
when the encoded_size is 0.
Laurent Vivier [Wed, 17 Jun 2020 11:31:54 +0000 (13:31 +0200)]
migration: fix multifd_send_pages() next channel
multifd_send_pages() loops around the available channels,
the next channel to use between two calls to multifd_send_pages() is stored
inside a local static variable, next_channel.
It works well, except if the number of channels decreases between two calls
to multifd_send_pages(). In this case, the loop can try to access the
data of a channel that doesn't exist anymore.
The problem can be triggered if we start a migration with a given number of
channels and then we cancel the migration to restart it with a lower number.
This ends generally with an error like:
qemu-system-ppc64: .../util/qemu-thread-posix.c:77: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
This patch fixes the error by capping next_channel with the current number
of channels before using it.
Mao Zhongyi [Wed, 3 Jun 2020 08:08:59 +0000 (16:08 +0800)]
monitor/hmp-cmds: don't silently output when running 'migrate_set_downtime' fails
Although 'migrate_set_downtime' has been deprecated and replaced
with 'migrate_set_parameter downtime_limit', it has not been
completely eliminated, possibly due to compatibility with older
versions. I think as long as this old parameter is running, we
should report appropriate message when something goes wrong, not
be silent.
before:
(qemu) migrate_set_downtime -1
(qemu)
after:
(qemu) migrate_set_downtime -1
Error: Parameter 'downtime_limit' expects an integer in the range of 0 to 2000 seconds
Mao Zhongyi [Wed, 3 Jun 2020 08:08:58 +0000 (16:08 +0800)]
monitor/hmp-cmds: add units for migrate_parameters
When running:
(qemu) info migrate_parameters
announce-initial: 50 ms
announce-max: 550 ms
announce-step: 100 ms
compress-wait-thread: on
...
max-bandwidth: 33554432 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
...
xbzrle-cache-size: 67108864
add units for the parameters 'x-checkpoint-delay' and
'xbzrle-cache-size', it's easier to read, also move
milliseconds to ms to keep the same style.
Mao Zhongyi [Wed, 3 Jun 2020 08:08:57 +0000 (16:08 +0800)]
tests/migration: fix unreachable path in stress test
If stressone() or stress() exits it's because of a failure
because the test runs forever otherwise, so change stressone
and stress type to void to make the exit_failure() as the exit
function of main().
Mao Zhongyi [Wed, 3 Jun 2020 08:08:56 +0000 (16:08 +0800)]
tests/migration: mem leak fix
‘data’ has the possibility of memory leaks, so use the
glib macros g_autofree recommended by CODING_STYLE.rst
to automatically release the memory that returned from
g_malloc().
Commit 7d2ef6dcc1cf ("hmp: Simplify qom-set") switched to the json
parser, making it possible to specify complex types. However, with this
change it is no longer possible to specify proper sizes (e.g., 2G, 128M),
turning the interface harder to use for properties that consume sizes.
Let's switch back to the previous handling and allow to specify passing
json via the "-j" parameter.
Pan Nengyuan [Wed, 3 Jun 2020 07:03:38 +0000 (03:03 -0400)]
qom-hmp-cmds: fix a memleak in hmp_qom_get
'obj' forgot to free at the end of hmp_qom_get(). Fix that.
The leak stack:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f4e3a779ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8)
#1 0x7f4e398f91d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5)
#2 0x55c9fd9a3999 in qstring_from_substr /build/qemu/src/qobject/qstring.c:45
#3 0x55c9fd894bd3 in qobject_output_type_str /build/qemu/src/qapi/qobject-output-visitor.c:175
#4 0x55c9fd894bd3 in qobject_output_type_str /build/qemu/src/qapi/qobject-output-visitor.c:168
#5 0x55c9fd88b34d in visit_type_str /build/qemu/src/qapi/qapi-visit-core.c:308
#6 0x55c9fd59aa6b in property_get_str /build/qemu/src/qom/object.c:2064
#7 0x55c9fd5adb8a in object_property_get_qobject /build/qemu/src/qom/qom-qobject.c:38
#8 0x55c9fd4a029d in hmp_qom_get /build/qemu/src/qom/qom-hmp-cmds.c:66
Max Reitz [Mon, 8 Jun 2020 09:31:11 +0000 (11:31 +0200)]
virtiofsd: Whitelist fchmod
lo_setattr() invokes fchmod() in a rarely used code path, so it should
be whitelisted or virtiofsd will crash with EBADSYS.
Said code path can be triggered for example as follows:
On the host, in the shared directory, create a file with the sticky bit
set and a security.capability xattr:
(1) # touch foo
(2) # chmod u+s foo
(3) # setcap '' foo
Then in the guest let some process truncate that file after it has
dropped all of its capabilities (at least CAP_FSETID):
This will cause the guest kernel to drop the sticky bit (i.e. perform a
mode change) as part of the truncate (where FATTR_FH is set), and that
will cause virtiofsd to invoke fchmod() instead of fchmodat().
(A similar configuration exists further below with futimens() vs.
utimensat(), but the former is not a syscall but just a wrapper for the
latter, so no further whitelisting is required.)
Eric Blake [Tue, 9 Jun 2020 20:59:44 +0000 (15:59 -0500)]
iotests: Add copyright line in qcow2.py
The file qcow2.py was originally contributed in 2012 by Kevin Wolf,
but was not given traditional boilerplate headers at the time. The
missing license was just rectified (commit 16306a7b39) using the
project-default GPLv2+, but as Vladimir is not at Red Hat, he did not
add a Copyright line. All earlier contributions have come from CC'd
authors, where all but Stefan used a Red Hat address at the time of
the contribution, and that copyright carries over to the split to
qcow2_format.py (d5262c7124).
Roman Kagan [Thu, 28 May 2020 22:55:14 +0000 (01:55 +0300)]
block: make BlockConf size props 32bit and accept size suffixes
Convert all size-related properties in BlockConf to 32bit. This will
accommodate bigger block sizes (in a followup patch). This also allows
to make them all accept size suffixes, either via DEFINE_PROP_BLOCKSIZE
or via DEFINE_PROP_SIZE32.
Also, since min_io_size is exposed to the guest by scsi and virtio-blk
devices as an uint16_t in units of logical blocks, introduce an
additional check in blkconf_blocksizes to prevent its silent truncation.
Roman Kagan [Thu, 28 May 2020 22:55:12 +0000 (01:55 +0300)]
qdev-properties: add size32 property type
Introduce size32 property type which handles size suffixes (k, m, g)
just like size property, but is uint32_t rather than uint64_t. It's
going to be useful for properties that are byte sizes but are inherently
32bit, like BlkConf.opt_io_size or .discard_granularity (they are
switched to this new property type in a followup commit).
The getter for size32 is left out for a separate patch as its benefit is
less obvious, and it affects test output; for now the regular uint32
getter is used.
Roman Kagan [Thu, 28 May 2020 22:55:11 +0000 (01:55 +0300)]
qdev-properties: blocksize: use same limits in code and description
Make it easier (more visible) to maintain the limits on the blocksize
properties in sync with the respective description, by using macros both
in the code and in the description.
Several block device properties related to blocksize configuration must
be in certain relationship WRT each other: physical block must be no
smaller than logical block; min_io_size, opt_io_size, and
discard_granularity must be a multiple of a logical block.
To ensure these requirements are met, add corresponding consistency
checks to blkconf_blocksizes, adjusting its signature to communicate
possible error to the caller. Also remove the now redundant consistency
checks from the specific devices.
Roman Kagan [Thu, 28 May 2020 22:55:09 +0000 (01:55 +0300)]
virtio-blk: store opt_io_size with correct size
The width of opt_io_size in virtio_blk_config is 32bit. However, it's
written with virtio_stw_p; this may result in value truncation, and on
big-endian systems with legacy virtio in completely bogus readings in
the guest.
Klaus Jensen [Tue, 9 Jun 2020 19:03:32 +0000 (21:03 +0200)]
hw/block/nvme: add msix_qsize parameter
Decouple the requested maximum number of ioqpairs (param max_ioqpairs)
from the number of MSI-X interrupt vectors by introducing a new
msix_qsize parameter and initialize MSI-X with that. This allows
emulating a device that has fewer vectors than I/O queue pairs and also
allows more than 2048 queue pairs. To keep the device behaving as
previously, use a msix_qsize default of 65 (default max_ioqpairs + 1).
This decoupling was actually suggested by Maxim some time ago in a
slightly different context, so adding a Suggested-by.
Klaus Jensen [Tue, 9 Jun 2020 19:03:19 +0000 (21:03 +0200)]
hw/block/nvme: add max_ioqpairs device parameter
The num_queues device paramater has a slightly confusing meaning because
it accounts for the admin queue pair which is not really optional.
Secondly, it is really a maximum value of queues allowed.
Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.
Klaus Jensen [Tue, 9 Jun 2020 19:03:18 +0000 (21:03 +0200)]
hw/block/nvme: fix pin-based interrupt behavior
First, since the device only supports MSI-X or pin-based interrupt, if
MSI-X is not enabled, it should not accept interrupt vectors different
from 0 when creating completion queues.
Secondly, the irq_status NvmeCtrl member is meant to be compared to the
INTMS register, so it should only be 32 bits wide. And it is really only
useful when used with multi-message MSI.
Third, since we do not force a 1-to-1 correspondence between cqid and
interrupt vector, the irq_status register should not have bits set
according to cqid, but according to the associated interrupt vector.
Fix these issues, but keep irq_status available so we can easily support
multi-message MSI down the line.