virtio-net: support RSC v4/v6 tcp traffic for Windows HCK
This commit adds implementation of RX packets
coalescing, compatible with requirements of Windows
Hardware compatibility kit.
The device enables feature VIRTIO_NET_F_RSC_EXT in
host features if it supports extended RSC functionality
as defined in the specification.
This feature requires at least one of VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6. Windows guest driver acks
this feature only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
is also present.
If the guest driver acks VIRTIO_NET_F_RSC_EXT feature,
the device coalesces TCPv4 and TCPv6 packets (if
respective VIRTIO_NET_F_GUEST_TSO feature is on,
populates extended RSC information in virtio header
and sets VIRTIO_NET_HDR_F_RSC_INFO bit in header flags.
The device does not recalculate checksums in the coalesced
packet, so they are not valid.
In this case:
All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
50000(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '300000(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '500000':
The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.
'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.
A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.
An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.
Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet
Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.
Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.
All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.
Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.
Difference between ip v4 and v6 processing:
Fragment length in ipv4 header includes itself, while it's not
included for ipv6, thus means ipv6 can carry a real 65535 payload.
Note that main goal of implementing this feature in software
is to create reference setup for certification tests. In such
setups guest migration is not required, so the coalesced packets
not yet delivered to the guest will be lost in case of migration.
Igor Mammedov [Thu, 27 Dec 2018 14:13:34 +0000 (15:13 +0100)]
tests: acpi: use AcpiSdtTable::aml instead of AcpiSdtTable::header::signature
AcpiSdtTable::header::signature is the only remained field from
AcpiTableHeader structure used by tests. Instead of using packed
structure to access signature, just read it directly from table
blob and remove no longer used AcpiSdtTable::header / union and
keep only AcpiSdtTable::aml byte array.
Igor Mammedov [Thu, 27 Dec 2018 14:13:33 +0000 (15:13 +0100)]
tests: acpi: squash sanitize_fadt_ptrs() into test_acpi_fadt_table()
some parts of sanitize_fadt_ptrs() do redundant job
- locating FADT
- checking original checksum
There is no need to do it as test_acpi_fadt_table() already does that,
so drop duplicate code and move remaining fixup code into
test_acpi_fadt_table().
Igor Mammedov [Thu, 27 Dec 2018 14:13:32 +0000 (15:13 +0100)]
tests: smbios: fetch whole table in one step instead of reading it step by step
replace a bunch of ACPI_READ_ARRAY/ACPI_READ_FIELD macro, that read
SMBIOS table field by field with one memread() to fetch whole table
at once and drop no longer used ACPI_READ_ARRAY/ACPI_READ_FIELD macro.
Igor Mammedov [Thu, 27 Dec 2018 14:13:31 +0000 (15:13 +0100)]
tests: acpi: reuse fetch_table() in vmgenid-test
Move fetch_table() into acpi-utils.c renaming it to acpi_fetch_table()
and reuse it in vmgenid-test that reads RSDT and then tables it references,
to find and parse VMGNEID SSDT.
While at it wrap RSDT referenced tables enumeration into FOREACH macro
(similar to what we do with QLIST_FOREACH & co) to reuse it with bios and
vmgenid tests.
Eric Blake [Fri, 11 Jan 2019 20:13:30 +0000 (14:13 -0600)]
qemu.py: Fix error message when qemu dies from signal
When qemu dies from a signal, the python code gets a negative
value for exitcode; but signal numbers are positive. Copy the
pattern used in qemu-iotests/iotests.py for reporting a positive
value.
This change adds a regression test for that fix. It starts
QEMU with a 2GiB dummy initrd, and checks that it evaluates the
file size correctly and prints an accurate message.
Cleber Rosa [Fri, 9 Nov 2018 15:07:10 +0000 (10:07 -0500)]
check-help: visual and content improvements
The "check" target is not a target that will run all other tests
listed, so in order to be accurate it's necessary to list those that
will run. The same is true for "check-clean".
Then, to give a better visual impression of the differences in the
various targets, let's add empty lines.
Finally, a small (and hopeful) grammar fix from a non-native speaker.
Cleber Rosa [Fri, 9 Nov 2018 15:07:09 +0000 (10:07 -0500)]
Travis CI: make specified Python versions usable on jobs
For the two Python jobs, which seem to have the goal of making sure
QEMU builds successfully on the 3.0-3.6 spectrum of Python 3 versions,
the specified version is only applicable if a Python virtual
environment is used. To do that, it's necessary to define the
(primary?) language of the job to be Python.
Also, Travis doesn't have a 3.0 Python installation available for the
chosen distro, 3.4 being the lower version available.
Reference: https://docs.travis-ci.com/user/languages/python/#specifying-python-versions Signed-off-by: Cleber Rosa <[email protected]>
Message-Id: <20181109150710[email protected]> Reviewed-by: Alex Bennée <[email protected]>
[ehabkost: Now 3.4 is the lowest Python version available] Signed-off-by: Eduardo Habkost <[email protected]>
fixup! Travis CI: make specified Python versions usable on jobs
Cleber Rosa [Fri, 9 Nov 2018 15:07:07 +0000 (10:07 -0500)]
configure: keep track of Python version
Some functionality is dependent on the Python version
detected/configured on configure. While it's possible to run the
Python version later and check for the version, doing it once is
preferable. Also, it's a relevant information to keep in build logs,
as the overall behavior of the build can be affected by it.
block/mirror: fix and improve do_sync_target_write
Use bdrv_dirty_bitmap_next_dirty_area() instead of
bdrv_dirty_iter_next_area(), because of the following problems of
bdrv_dirty_iter_next_area():
1. Using HBitmap iterators we should carefully handle unaligned offset,
as first call to hbitmap_iter_next() may return a value less than
original offset (actually, it will be original offset rounded down to
bitmap granularity). This handling is not done in
do_sync_target_write().
look at the code:
if (max_offset == iter->bitmap->size) {
/* If max_offset points to the image end, round it up by the
* bitmap granularity */
gran_max_offset = ROUND_UP(max_offset, granularity);
} else {
gran_max_offset = max_offset;
}
ret = hbitmap_iter_next(&iter->hbi, false);
if (ret < 0 || ret + granularity > gran_max_offset) {
return false;
}
and assume that max_offset != iter->bitmap->size but still unaligned.
if 0 < ret < max_offset we found dirty area, but the function can
return false in this case (if ret + granularity > max_offset).
3. bdrv_dirty_iter_next_area() uses inefficient loop to find the end of
the dirty area. Let's use more efficient hbitmap_next_zero instead
(bdrv_dirty_bitmap_next_dirty_area() do so)
The function alters bdrv_dirty_iter_next_area(), which is wrong and
less efficient (see further commit
"block/mirror: fix and improve do_sync_target_write" for description).
Peter Maydell [Tue, 15 Jan 2019 18:32:57 +0000 (18:32 +0000)]
Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates
Gerd Hoffmann (1):
slirp: add tftp tracing
Marc-André Lureau (61):
slirp: associate slirp_output callback with the Slirp context
slirp: remove do_pty from fork_exec()
slirp: replace ex_pty with ex_chardev
slirp: use a dedicated field for chardev pointer
slirp: remove unused EMU_RSH
slirp: rename /extra/chardev
slirp: move internal function declarations
slirp: remove Monitor dependency, return a string for info
slirp: fix slirp_add_exec() leaks
slirp: replace the poor-man string split with g_strsplit()
slirp: remove dead declarations
slirp: move socket pair creation in helper function
slirp: remove unused M_TRAILINGSPACE
slirp: use a callback structure to interface with qemu
slirp: remove PROBE_CONN dead-code
slirp: remove FULL_BOLT
slirp: remove the disabled readv()/writev() code path
slirp: remove HAVE_SYS_SIGNAL_H
slirp: remove unused HAVE_SYS_BITYPES_H
slirp: remove NO_UNIX_SOCKETS
slirp: remove unused HAVE_SYS_STROPTS_H
slirp: remove unused HAVE_ARPA_INET_H
slirp: remove unused HAVE_SYS_WAIT_H
slirp: remove unused HAVE_SYS_SELECT_H
slirp: remove HAVE_SYS_IOCTL_H
slirp: remove HAVE_SYS_FILIO_H
slirp: remove unused DECLARE_IOVEC
slirp: remove unused HAVE_INET_ATON
slirp: replace HOST_WORDS_BIGENDIAN with glib equivalent
slirp: replace SIZEOF_CHAR_P with glib equivalent
slirp: replace compile time DO_KEEPALIVE
slirp: remove unused global slirp_instance
slirp: replace error_report() with g_critical()
slirp: improve a bit the debug macros
slirp: add a callback to log guest errors
slirp: remove #if notdef dead code
slirp: remove unused sbflush()
slirp: NULL is defined by stddef.h
slirp: remove dead TCP_ACK_HACK code
slirp: replace ARRAY_SIZE with G_N_ELEMENTS
net: do not depend on slirp internals
glib-compat: add g_spawn_async_with_fds() fallback
slirp: simplify fork_exec()
slirp: replace error_report() with g_critical()
slirp: drop <Vista compatibility
slirp: rename exec_list
slirp: use virtual time for packet expiration
slirp: replace a fprintf with g_critical()
slirp: replace some fprintf() with DEBUG_MISC
slirp: replace a DEBUG block with WITH_ICMP_ERROR_MSG
slirp: no need to make DPRINTF conditional on DEBUG
slirp: always build with debug statements
slirp: introduce SLIRP_DEBUG environment variable
slirp: use %p for pointers format
slirp: remove remaining DEBUG blocks
slirp: replace DEBUG_ARGS with DEBUG_ARG
slirp: factor out guestfwd addition checks
slirp: add clock_get_ns() callback
build-sys: use a separate slirp-obj-y && slirp.mo
slirp: set G_LOG_DOMAIN
slirp: call into g_debug() for DEBUG macros
Prasad J Pandit (1):
slirp: check data length while emulating ident function
Samuel Thibault (2):
slirp: Enable fork_exec support on Windows
slirp: Mark debugging calls as unlikely
# gpg: Signature made Mon 14 Jan 2019 22:52:32 GMT
# gpg: using RSA key DB550E89F0FA54F3
# gpg: Good signature from "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82 304B D017 8C76 7D06 9EE6
# Subkey fingerprint: E61D BB15 D417 2BDE C97E 92D9 DB55 0E89 F0FA 54F3
* remotes/thibault/tags/samuel-thibault: (65 commits)
slirp: check data length while emulating ident function
slirp: Mark debugging calls as unlikely
slirp: call into g_debug() for DEBUG macros
slirp: set G_LOG_DOMAIN
build-sys: use a separate slirp-obj-y && slirp.mo
slirp: add clock_get_ns() callback
slirp: factor out guestfwd addition checks
slirp: replace DEBUG_ARGS with DEBUG_ARG
slirp: remove remaining DEBUG blocks
slirp: use %p for pointers format
slirp: introduce SLIRP_DEBUG environment variable
slirp: always build with debug statements
slirp: no need to make DPRINTF conditional on DEBUG
slirp: replace a DEBUG block with WITH_ICMP_ERROR_MSG
slirp: replace some fprintf() with DEBUG_MISC
slirp: replace a fprintf with g_critical()
slirp: use virtual time for packet expiration
slirp: rename exec_list
slirp: drop <Vista compatibility
slirp: Enable fork_exec support on Windows
...
Peter Maydell [Tue, 15 Jan 2019 14:19:18 +0000 (14:19 +0000)]
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-01-14' into staging
nbd patches for 2019-01-14
Promote bitmap/NBD interfaces to stable for use in incremental
backups. Add 'qemu-nbd --bitmap'.
- John Snow: 0/11 bitmaps: remove x- prefix from QMP api
- Philippe Mathieu-Daudé: qemu-nbd: Rename 'exp' variable clashing with math::exp() symbol
- Eric Blake: 0/8 Promote x-nbd-server-add-bitmap to stable
# gpg: Signature made Mon 14 Jan 2019 16:13:45 GMT
# gpg: using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>"
# gpg: aka "Eric Blake (Free Software Programmer) <[email protected]>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2019-01-14:
qemu-nbd: Add --bitmap=NAME option
nbd: Merge nbd_export_bitmap into nbd_export_new
nbd: Remove x-nbd-server-add-bitmap
nbd: Allow bitmap export during QMP nbd-server-add
nbd: Merge nbd_export_set_name into nbd_export_new
nbd: Only require disabled bitmap for read-only exports
nbd: Forbid nbd-server-stop when server is not running
nbd: Add some error case testing to iotests 223
qemu-nbd: Rename 'exp' variable clashing with math::exp() symbol
iotests: add iotest 236 for testing bitmap merge
iotests: implement pretty-print for log and qmp_log
iotests: change qmp_log filters to expect QMP objects only
iotests: remove default filters from qmp_log
iotests: add qmp recursive sorting function
iotests: add filter_generated_node_ids
iotests.py: don't abort if IMGKEYSECRET is undefined
block: remove 'x' prefix from experimental bitmap APIs
blockdev: n-ary bitmap merge
block/dirty-bitmap: remove assertion from restore
blockdev: abort transactions in reverse order
Top changeset contributors by employer
Red Hat 3091 (43.3%)
Linaro 1201 (16.8%)
(None) 484 (6.8%)
IBM 426 (6.0%)
Academics (various) 186 (2.6%)
Virtuozzo 172 (2.4%)
Wave Computing 118 (1.7%)
Igalia 109 (1.5%)
Xilinx 102 (1.4%)
Cadence Design Systems 80 (1.1%)
Top lines changed by employer
Red Hat 140523 (30.3%)
Cadence Design Systems 81010 (17.5%)
Linaro 78098 (16.8%)
Wave Computing 33134 (7.1%)
IBM 18918 (4.1%)
SiFive 14436 (3.1%)
Academics (various) 11995 (2.6%)
(None) 11458 (2.5%)
Virtuozzo 10770 (2.3%)
Oracle 6698 (1.4%)
# gpg: Signature made Mon 14 Jan 2019 16:08:52 GMT
# gpg: using RSA key FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-misc-gitdm-next-140119-1:
MAINTAINERS: add myself as a route for gitdm updates
contrib/gitdm: add another name to WaveComp map
contrib/gitdm: add two more IBM'ers to group-map-ibm
contrib/gitdm: Add other IBMers
contrib/gitdm: add Nokia and Proxmox to the domain-map
Igor Mammedov [Thu, 27 Dec 2018 14:13:30 +0000 (15:13 +0100)]
tests: acpi: reuse fetch_table() for fetching FACS and DSDT
It allows to remove a bit more of code duplication and
reuse common utility to get ACPI tables from guest (modulo RSDP).
While at it, consolidate signature checking into fetch_table() instead
of open-codding it.
Considering FACS is special and doesn't have checksum, make checksum
validation optin, the same goes for signature verification.
PS:
By pure accident, patch also fixes FACS not being tested against
reference table since it wasn't added to data::tables list.
But we managed not to regress it since reference file was added
by commit
(d25979380 acpi unit-test: add test files)
back in 2013
Igor Mammedov [Thu, 27 Dec 2018 14:13:29 +0000 (15:13 +0100)]
tests: acpi: simplify rsdt handling
RSDT referenced tables always have length at offset 4 and checksum at
offset 9, that's enough for reusing fetch_table() and replacing custom
RSDT fetching code with it.
While at it
* merge fetch_rsdt_referenced_tables() into test_acpi_rsdt_table()
* drop test_data::rsdt_table/rsdt_tables_addr/rsdt_tables_nr since
we need this data only for duration of test_acpi_rsdt_table() to
fetch other tables and use locals instead.
Igor Mammedov [Thu, 27 Dec 2018 14:13:28 +0000 (15:13 +0100)]
tests: acpi: make sure FADT is fetched only once
Whole FADT is fetched as part of RSDT referenced tables in
fetch_rsdt_referenced_tables() albeit a bit later than when FADT
is partially parsed in fadt_fetch_facs_and_dsdt_ptrs().
However there is no reason for calling fetch_rsdt_referenced_tables()
so late, just move it right after we fetched RSDT and before
fadt_fetch_facs_and_dsdt_ptrs(). That way we can reuse whole FADT
fetched by fetch_rsdt_referenced_tables() and avoid duplicate
custom fields fetching in fadt_fetch_facs_and_dsdt_ptrs().
While at it rename fadt_fetch_facs_and_dsdt_ptrs() to
test_acpi_fadt_table(). The follow up patch will merge
fadt_fetch_facs_and_dsdt_ptrs() into test_acpi_rsdt_table(),
so that we would end up calling only test_acpi_FOO_table()
for consistency for tables that require special processing.
Igor Mammedov [Thu, 27 Dec 2018 14:13:27 +0000 (15:13 +0100)]
tests: acpi: use AcpiSdtTable::aml in consistent way
Currently in the 1st case we store table body fetched from QEMU in
AcpiSdtTable::aml minus it's header but in the 2nd case when we
load reference aml from disk, it holds whole blob including header.
More over in the 1st case, we read header in separate AcpiSdtTable::header
structure and then jump over hoops to fixup tables and combine both.
Treat AcpiSdtTable::aml as whole table blob approach in both cases
and when fetching tables from QEMU, first get table length and then
fetch whole table into AcpiSdtTable::aml instead if doing it field
by field.
As result
* AcpiSdtTable::aml is used in consistent manner
* FADT fixups use offsets from spec instead of being shifted by
header length
* calculating checksums and dumping blobs becomes simpler
Li Qiang [Sat, 15 Dec 2018 12:03:52 +0000 (04:03 -0800)]
vhost-user: fix ioeventfd_enabled
Currently, the vhost-user-test assumes the eventfd is available.
However it's not true because the accel is qtest. So the
'vhost_set_vring_file' will not add fds to the msg and the server
side of vhost-user-test will be broken. The bug is in 'ioeventfd_enabled'.
We should make this function return true if not using kvm accel.
Jian Wang [Sat, 22 Dec 2018 10:27:28 +0000 (18:27 +0800)]
qemu: avoid memory leak while remove disk
Memset vhost_dev to zero in the vhost_dev_cleanup function.
This causes dev.vqs to be NULL, so that
vqs does not free up space when calling the g_free function.
This will result in a memory leak. But you can't release vqs
directly in the vhost_dev_cleanup function, because vhost_net
will also call this function, and vhost_net's vqs is assigned by array.
In order to solve this problem, we first save the pointer of vqs,
and release the space of vqs after vhost_dev_cleanup is called.
It's been marked as deprecated in QEMU v2.6.0 already, so really nobody
should use the legacy "ivshmem" device anymore (but use ivshmem-plain or
ivshmem-doorbell instead). Time to remove the deprecated device now.
Belatedly also update a mention of the deprecated "ivshmem" in the file
docs/specs/ivshmem-spec.txt to "ivshmem-doorbell". Missed in commit 5400c02b90b ("ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem").
Dongli Zhang [Sun, 16 Dec 2018 23:34:39 +0000 (07:34 +0800)]
msix: make pba size math more uniform
In msix_exclusive_bar the bar_pba_size is more than what the pba is
expected to have, although this never affects the bar size.
Specifically, the math in msix_init_exclusive_bar allocates too much
memory in some cases.
For example consider nentries = 8. msix_exclusive_bar will give us
bar_pba_size = 16. So 16 bytes. However 8 bytes would be enough - this
is all that the spec requires.
So in practice bar_pba_size sometimes allocates an extra 8 bytes but
never more.
Since each MSIX entry size is 16 bytes, and since we make sure that
table+pba is a power of two, this always leaves a multiple of 16 bytes
for the PBA, so extra 8 bytes have no effect.
However, its ugly to have pba size temporary variable have an incorrect
value. For consistency switch to the formula used in msix_init.
We better stop right away. For now, errors would be partially ignored
(so the guest might get informed or the device might get unplugged),
although actual plug/unplug will be reported as failed to the user.
While at it, properly move the check to the pre_plug handler for the plug
case, as we can test the slot state before the device will be realized.
slirp: check data length while emulating ident function
While emulating identification protocol, tcp_emu() does not check
available space in the 'sc_rcv->sb_data' buffer. It could lead to
heap buffer overflow issue. Add check to avoid it.
This will allow to have cflags for the whole slirp.mo -objs.
It makes it possible to build tests that links only with
slirp-obj-y (and not the whole common-obj).
It is also a step towards building slirp as a shared library, although
this requires a bit more thoughts to build with
net/slirp.o (CONFIG_SLIRP would need to be 'm') and other build issues.
Peter Maydell [Mon, 14 Jan 2019 19:28:12 +0000 (19:28 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-next-140119-1' into staging
A bunch of fixes for testing:
- Various Travis updates
- "stable" SID snapshot for docker
- avoid :latest docker tags
- g_usleep fix for some tests
# gpg: Signature made Mon 14 Jan 2019 14:59:35 GMT
# gpg: using RSA key FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-next-140119-1: (21 commits)
Revert "tests: Disable qht-bench parallel test when using gprof"
tests: use g_usleep instead of rem = sleep(time)
tests/docker: remove SID_AGE test hack
tests/docker: update our Travis image
travis: bump to Xenial baseline
docker: Use a stable snapshot for Debian Sid
travis: remove matrix settings that duplicate global settings
travis: run tests in verbose mode
travis: stop using container based envs
travis: stop redefining the script commands
travis: use homebrew addon for MacOSX
travis: don't clone git submodules upfront
travis: standardize the syntax used for env variables
travis: define all the build matrix entries in one place
travis: add whitespace between each major section & matrix entry
tests: use in-place sed magic for enabling deb-src in travis image
tests: update Fedora i386 cross image to Fedora 29
tests: update Fedora dockerfile to use Fedora 29
tests: remove obsolete 'debian' dockerfile
tests: run ldconfig after installing extra software
...
Peter Maydell [Mon, 14 Jan 2019 17:35:00 +0000 (17:35 +0000)]
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging
x86 queue, 2019-01-14
* Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
(Borislav Petkov)
* host-phys-bits-limit option for better control of 5-level EPT
(Eduardo Habkost)
* Disable MPX support on named CPU models (Paolo Bonzini)
* expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX
as feature words (Vitaly Kuznetsov)
# gpg: Signature made Mon 14 Jan 2019 14:33:55 GMT
# gpg: using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-next-pull-request:
i386/kvm: add a comment explaining why .feat_names are commented out for Hyper-V feature bits
x86: host-phys-bits-limit option
target/i386: Disable MPX support on named CPU models
target-i386: Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
i386/kvm: expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX as feature words
Eric Blake [Fri, 11 Jan 2019 19:47:20 +0000 (13:47 -0600)]
qemu-nbd: Add --bitmap=NAME option
Having to fire up qemu, then use QMP commands for nbd-server-start
and nbd-server-add, just to expose a persistent dirty bitmap, is
rather tedious. Make it possible to expose a dirty bitmap using
just qemu-nbd (of course, for now this only works when qemu-nbd is
visiting a BDS formatted as qcow2).
Of course, any good feature also needs unit testing, so expand
iotest 223 to cover it.
Eric Blake [Fri, 11 Jan 2019 19:47:19 +0000 (13:47 -0600)]
nbd: Merge nbd_export_bitmap into nbd_export_new
We only have one caller that wants to export a bitmap name,
which it does right after creation of the export. But there is
still a brief window of time where an NBD client could see the
export but not the dirty bitmap, which a robust client would
have to interpret as meaning the entire image should be treated
as dirty. Better is to eliminate the window entirely, by
inlining nbd_export_bitmap() into nbd_export_new(), and refusing
to create the bitmap in the first place if the requested bitmap
can't be located.
We also no longer need logic for setting a different bitmap
name compared to the bitmap being exported.
Eric Blake [Fri, 11 Jan 2019 19:47:18 +0000 (13:47 -0600)]
nbd: Remove x-nbd-server-add-bitmap
Now that nbd-server-add can do the same functionality (well, other
than making the exported bitmap name different than the underlying
bitamp - but we argued that was not essential, since it is just as
easy to create a new non-persistent bitmap with the desired name),
we no longer need the experimental separate command.
Eric Blake [Fri, 11 Jan 2019 19:47:17 +0000 (13:47 -0600)]
nbd: Allow bitmap export during QMP nbd-server-add
With the experimental x-nbd-server-add-bitmap command, there was
a window of time where an NBD client could see the export but not
the associated dirty bitmap, which can cause a client that planned
on using the dirty bitmap to be forced to treat the entire image
as dirty as a safety fallback. Furthermore, if the QMP client
successfully exports a disk but then fails to add the bitmap, it
has to take on the burden of removing the export. Since we don't
allow changing the exposed dirty bitmap (whether to a different
bitmap, or removing advertisement of the bitmap), it is nicer to
make the bitmap tied to the export at the time the export is
created, with automatic failure to export if the bitmap is not
available.
The experimental command included an optional 'bitmap-export-name'
field for remapping the name exposed over NBD to be different from
the bitmap name stored on disk. However, my libvirt demo code
for implementing differential backups on top of persistent bitmaps
did not need to take advantage of that feature (it is instead
possible to create a new temporary bitmap with the desired name,
use block-dirty-bitmap-merge to merge one or more persistent
bitmaps into the temporary, then associate the temporary with the
NBD export, if control is needed over the exported bitmap name).
Hence, I'm not copying that part of the experiment over to the
stable addition. For more details on the libvirt demo, see
https://www.redhat.com/archives/libvir-list/2018-October/msg01254.html,
https://kvmforum2018.sched.com/event/FzuB/facilitating-incremental-backup-eric-blake-red-hat
This patch focuses on the user interface, and reduces (but does
not completely eliminate) the window where an NBD client can see
the export but not the dirty bitmap, with less work to clean up
after errors. Later patches will add further cleanups now that
this interface is declared stable via a single QMP command,
including removing the race window.
Eric Blake [Fri, 11 Jan 2019 19:47:16 +0000 (13:47 -0600)]
nbd: Merge nbd_export_set_name into nbd_export_new
The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, the first call to
nbd_export_close() drops the second reference while removing
the export from the list. This is in part because the QAPI
NbdServerRemoveNode enum documents the possibility of adding a
mode where we could do a soft disconnect: preventing new clients,
but waiting for existing clients to gracefully quit, based on
the mode used when calling nbd_export_close().
But in spite of all that, note that we never change the name of
an NBD export while it is exposed, which means it is easier to
just inline the process of setting the name as part of creating
the export.
Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front. Note that for creation, all callers pass a
non-NULL name, (passing NULL at creation was for old style
servers, but we removed support for that in commit 7f7dfe2a),
so we can add an assert and do things unconditionally; but for
cleanup, because of the dual nature of nbd_export_close(), we
still have to be careful to avoid use-after-free. Along the
way, add a comment reminding ourselves of the potential of
adding a middle mode disconnect.
Eric Blake [Fri, 11 Jan 2019 19:47:15 +0000 (13:47 -0600)]
nbd: Only require disabled bitmap for read-only exports
Our initial implementation of x-nbd-server-add-bitmap put
in a restriction because of incremental backups: in that
usage, we are exporting one qcow2 file (the temporary overlay
target of a blockdev-backup sync:none job) and a dirty bitmap
owned by a second qcow2 file (the source of the
blockdev-backup, which is the backing file of the temporary).
While both qcow2 files are still writable (the target in
order to capture copy-on-write of old contents, and the
source in order to track live guest writes in the meantime),
the NBD client expects to see constant data, including the
dirty bitmap. An enabled bitmap in the source would be
modified by guest writes, which is at odds with the NBD
export being a read-only constant view, hence the initial
code choice of enforcing a disabled bitmap (the intent is
that the exposed bitmap was disabled in the same transaction
that started the blockdev-backup job, although we don't want
to track enough state to actually enforce that).
However, consider the case of a bitmap contained in a read-only
node (including when the bitmap is found in a backing layer of
the active image). Because the node can't be modified, the
bitmap won't change due to writes, regardless of whether it is
still enabled. Forbidding the export unless the bitmap is
disabled is awkward, paritcularly since we can't change the
bitmap to be disabled (because the node is read-only).
Alternatively, consider the case of live storage migration,
where management directs the destination to create a writable
NBD server, then performs a drive-mirror from the source to
the target, prior to doing the rest of the live migration.
Since storage migration can be time-consuming, it may be wise
to let the destination include a dirty bitmap to track which
portions it has already received, where even if the migration
is interrupted and restarted, the source can query the
destination block status in order to potentially minimize
re-sending data that has not changed in the meantime on a
second attempt. Such code has not been written, and might not
be trivial (after all, a cluster being marked dirty in the
bitmap does not necessarily guarantee it has the desired
contents), but it makes sense that letting an active dirty
bitmap be exposed and changing alongside writes may prove
useful in the future.
Solve both issues by gating the restriction against a
disabled bitmap to only happen when the caller has requested
a read-only export, and where the BDS that owns the bitmap
(whether or not it is the BDS handed to nbd_export_new() or
from its backing chain) is still writable. We could drop
the check altogether (if management apps are prepared to
deal with a changing bitmap even on a read-only image), but
for now keeping a check for the read-only case still stands
a chance of preventing management errors.
Update iotest 223 to show the looser behavior by leaving
a bitmap enabled the whole run; note that we have to tear
down and re-export a node when handling an error.
Eric Blake [Fri, 11 Jan 2019 19:47:14 +0000 (13:47 -0600)]
nbd: Forbid nbd-server-stop when server is not running
Since we already forbid other nbd-server commands when not
in the right state, it is unlikely that any caller was relying
on a second stop to behave as a silent no-op. Update iotest
223 to show the improved behavior.
Eric Blake [Fri, 11 Jan 2019 19:47:13 +0000 (13:47 -0600)]
nbd: Add some error case testing to iotests 223
Testing success paths is important, but it's also nice to highlight
expected failure handling, to show that we don't crash, and so that
upcoming tests that change behavior can demonstrate the resulting
effects on error paths.
Add the following errors:
Attempting to export without a running server
Attempting to start a second server
Attempting to export a bad node name
Attempting to export a name that is already exported
Attempting to export an enabled bitmap
Attempting to remove an already removed export
Attempting to quit server a second time
All of these properly complain except for a second server-stop,
which will be fixed next.
qemu-nbd: Rename 'exp' variable clashing with math::exp() symbol
The use of a variable named 'exp' prevents includes to import <math.h>.
Rename it to avoid:
qemu-nbd.c:64:19: error: ‘exp’ redeclared as different kind of symbol
static NBDExport *exp;
^~~
In file included from /usr/include/features.h:428,
from /usr/include/bits/libc-header-start.h:33,
from /usr/include/stdint.h:26,
from /usr/lib/gcc/x86_64-redhat-linux/8/include/stdint.h:9,
from /source/qemu/include/qemu/osdep.h:80,
from /source/qemu/qemu-nbd.c:19:
/usr/include/bits/mathcalls.h:95:1: note: previous declaration of ‘exp’ was here
__MATHCALL_VEC (exp,, (_Mdouble_ __x));
^~~~~~~~~~~~~~
John Snow [Fri, 21 Dec 2018 09:35:28 +0000 (04:35 -0500)]
iotests: implement pretty-print for log and qmp_log
If iotests have lines exceeding >998 characters long, git doesn't
want to send it plaintext to the list. We can solve this by allowing
the iotests to use pretty printed QMP output that we can match against
instead.
John Snow [Fri, 21 Dec 2018 09:35:27 +0000 (04:35 -0500)]
iotests: change qmp_log filters to expect QMP objects only
As laid out in the previous commit's message:
```
Several places in iotests deal with serializing objects into JSON
strings, but to add pretty-printing it seems desirable to localize
all of those cases.
log() seems like a good candidate for that centralized behavior.
log() can already serialize json objects, but when it does so,
it assumes filters=[] operates on QMP objects, not strings.
qmp_log currently operates by dumping outgoing and incoming QMP
objects into strings and filtering them assuming that filters=[]
are string filters.
```
Therefore:
Change qmp_log to treat filters as if they're always qmp object filters,
then change the logging call to rely on log()'s ability to serialize QMP
objects, so we're not duplicating that effort.
Add a qmp version of filter_testfiles and adjust the only caller using
it for qmp_log to use the qmp version.
John Snow [Fri, 21 Dec 2018 09:35:26 +0000 (04:35 -0500)]
iotests: remove default filters from qmp_log
Several places in iotests deal with serializing objects into JSON
strings, but to add pretty-printing it seems desirable to localize
all of those cases.
log() seems like a good candidate for that centralized behavior.
log() can already serialize json objects, but when it does so,
it assumes filters=[] operates on QMP objects, not strings.
qmp_log currently operates by dumping outgoing and incoming QMP
objects into strings and filtering them assuming that filters=[]
are string filters.
To have qmp_log use log's serialization, qmp_log will need to
accept only qmp filters, not text filters.
However, only a single caller of qmp_log actually requires any
filters at all. I remove the default filter and add it explicitly
to the caller in preparation for refactoring qmp_log to use rich
filters instead.
test 206 is amended to name the filter explicitly and the default
is removed.
John Snow [Fri, 21 Dec 2018 09:35:25 +0000 (04:35 -0500)]
iotests: add qmp recursive sorting function
Python before 3.6 does not sort dictionaries (including kwargs).
Therefore, printing QMP objects involves sorting the keys to have
a predictable ordering in the iotests output. This means that
iotests output will sometimes show arguments in an order not
specified by the test author.
Presently, we accomplish this by using json.dumps' sort_keys argument,
where we only serialize the arguments dictionary, but not the command.
However, if we want to pretty-print QMP objects being sent to the
QEMU process, we need to build the entire command before logging it.
Ordinarily, this would then involve "arguments" being sorted above
"execute", which would necessitate a rather ugly and harder-to-read
change to many iotests outputs.
To facilitate pretty-printing AND maintaining predictable output AND
having "arguments" sort after "execute", add a custom sort function
that takes a dictionary and recursively builds an OrderedDict that
maintains the specific key order we wish to see in iotests output.
The qmp_log function uses this to build a QMP object that keeps
"execute" above "arguments", but sorts all keys and keys in any
subdicts in "arguments" lexicographically to maintain consistent
iotests output, with no incompatible changes to any current test.
John Snow [Fri, 21 Dec 2018 09:35:23 +0000 (04:35 -0500)]
iotests.py: don't abort if IMGKEYSECRET is undefined
Instead of using os.environ[], use .get with a default of empty string
to match the setup in check to allow us to import the iotests module
(for debugging, say) without needing a crafted environment just to
import the module.
John Snow [Fri, 21 Dec 2018 09:35:22 +0000 (04:35 -0500)]
block: remove 'x' prefix from experimental bitmap APIs
The 'x' prefix was added because I was uncertain of the direction we'd
take for the libvirt API. With the general approach solidified, I feel
comfortable committing to this API for 4.0.
John Snow [Fri, 21 Dec 2018 09:35:21 +0000 (04:35 -0500)]
blockdev: n-ary bitmap merge
Especially outside of transactions, it is helpful to provide
all-or-nothing semantics for bitmap merges. This facilitates
the coalescing of multiple bitmaps into a single target for
the "checkpoint" interpretation when assembling bitmaps that
represent arbitrary points in time from component bitmaps.
This is an incompatible change from the preliminary version
of the API.
John Snow [Fri, 11 Jan 2019 17:59:16 +0000 (11:59 -0600)]
blockdev: abort transactions in reverse order
Presently, we abort transactions in the same order they were processed in.
Bitmap commands, though, attempt to restore backup data structures on abort.
That's not valid, they need to be aborted in reverse chronological order.
Replace the QSIMPLEQ data structure with a QTAILQ one, so we can iterate
in reverse for the abort phase of the transaction.
Alex Bennée [Fri, 11 Jan 2019 13:50:02 +0000 (13:50 +0000)]
tests: use g_usleep instead of rem = sleep(time)
Relying on sleep to always return having slept isn't safe as a signal
may have occurred. If signals are constantly incoming the program will
never reach its termination condition. This is believed to be the
mechanism causing time outs for qht-test in Travis.
The glib g_usleep() deals with all of this for us so lets use it instead.
The Debian Sid repository is not garanteed to be stable, as his
'unstable' name suggest :)
To allow quick testing, packages are pushed various time a day,
which my be annoying when trying to use it for stable development
(which is not recommended, but Sid provides edge packages we use
for testing).
Debian provides repositories snapshots which are suitable for our
use. Pick a recent date that works. When required, update to newer
releases will be easy.
This fixes current issues with this image:
$ make docker-image-debian-sid
[...]
The following packages have unmet dependencies:
build-essential : Depends: dpkg-dev (>= 1.17.11) but it is not going to be installed
git : Depends: perl but it is not going to be installed
Depends: liberror-perl but it is not going to be installed
pkg-config : Depends: libdpkg-perl but it is not going to be installed
texinfo : Depends: perl (>= 5.26.2-6) but it is not going to be installed
Depends: libtext-unidecode-perl but it is not going to be installed
Depends: libxml-libxml-perl but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Travis sometimes fails a build because it produces no console output for
over 10 minutes. If this is due to a genuine hang, it would be useful to
have used verbose test output to see where it failed. If this is just
due to tests being very slow, having verbose output might allow the
build to succeed.
"Container-based infrastructure is currently being deprecated.
Please remove any sudo: false keys in your .travis.yml file
to use the default fully-virtualized Linux infrastructure
instead."
One of the matrix entries redefines the script command in order to add
the ${MAKEFLAGS} variable. Ideally ${MAKEFLAGS} would be referenced by
the definition of the ${TEST_CMD} env variable, but this isn't possible
in travis. ${MAKEFLAGS} exists to eliminate duplication of flags in
every "make" command, but this cure causes a worse problem, namely the
reduplication of the "script" command. It is simpler to just insert "-j3"
directly into any "make" command.