Peter Maydell [Mon, 16 Sep 2019 14:15:36 +0000 (15:15 +0100)]
target/arm/arm-semi: Factor out implementation of SYS_CLOSE
Currently for the semihosting calls which take a file descriptor
(SYS_CLOSE, SYS_WRITE, SYS_READ, SYS_ISTTY, SYS_SEEK, SYS_FLEN)
we have effectively two implementations, one for real host files
and one for when we indirect via the gdbstub. We want to add a
third one to deal with the magic :semihosting-features file.
Instead of having a three-way if statement in each of these
cases, factor out the implementation of the calls to separate
functions which we dispatch to via function pointers selected
via the GuestFDType for the guest fd.
In this commit, we set up the framework for the dispatch,
and convert the SYS_CLOSE call to use it.
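A minimal sketch of the dispatch framework, with illustrative names (the
table layout here is an assumption, not necessarily the exact QEMU symbols):

    /* One function table per GuestFDType; only SYS_CLOSE is wired up here. */
    typedef struct GuestFDFunctions {
        uint32_t (*closefn)(ARMCPU *cpu, GuestFD *gf);
        /* readfn, writefn, isattyfn, seekfn, flenfn follow as each
         * remaining SYS_* call is converted. */
    } GuestFDFunctions;

    static const GuestFDFunctions guestfd_fns[] = {
        [GuestFDHost] = { .closefn = host_closefn },
        [GuestFDGDB]  = { .closefn = gdb_closefn  },
    };

    /* The SYS_CLOSE case in do_arm_semihosting() then reduces to: */
    return guestfd_fns[gf->type].closefn(cpu, gf);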
Peter Maydell [Mon, 16 Sep 2019 14:15:35 +0000 (15:15 +0100)]
target/arm/arm-semi: Use set_swi_errno() in gdbstub callback functions
When we are routing semihosting operations through the gdbstub, the
work of sorting out the return value and setting errno if necessary
is done by callback functions which are invoked by the gdbstub code.
Clean up some ifdeffery in those functions by having them call
set_swi_errno() to set the semihosting errno.
Peter Maydell [Mon, 16 Sep 2019 14:15:34 +0000 (15:15 +0100)]
target/arm/arm-semi: Restrict use of TaskState*
The semihosting code needs access to the linux-user-only
TaskState pointer so it can set the semihosting errno per-thread
for linux-user mode. At the moment we do this by having some
ifdefs so that we define a 'ts' local in do_arm_semihosting()
which is either a real TaskState * or just a CPUARMState *,
depending on which mode we're compiling for.
This is awkward if we want to refactor do_arm_semihosting()
into other functions which might need to be passed the TaskState.
Restrict usage of the TaskState local by:
* making set_swi_errno() always take the CPUARMState pointer
and (for the linux-user version) get TaskState from that
* creating a new get_swi_errno() which reads the errno
* having the two semihosting calls which need the TaskState
for other purposes (SYS_GET_CMDLINE and SYS_HEAPINFO)
define a variable with scope restricted to just that code
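A hedged sketch of the two helpers in their linux-user flavour (assumes
env_cpu() and the TaskState hanging off CPUState::opaque; not verbatim):

    static target_ulong set_swi_errno(CPUARMState *env, target_ulong code)
    {
        if (code == (target_ulong)-1) {
            /* Capture the host errno per-thread via the TaskState. */
            TaskState *ts = env_cpu(env)->opaque;
            ts->swi_errno = errno;
        }
        return code;
    }

    static target_ulong get_swi_errno(CPUARMState *env)
    {
        TaskState *ts = env_cpu(env)->opaque;
        return ts->swi_errno;
    }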
Peter Maydell [Mon, 16 Sep 2019 14:15:33 +0000 (15:15 +0100)]
target/arm/arm-semi: Make semihosting code hand out its own file descriptors
Currently the Arm semihosting code returns the guest file descriptors
(handles) which are simply the fd values from the host OS or the
remote gdbstub. Part of the semihosting 2.0 specification requires
that we implement special handling of opening a ":semihosting-features"
filename. Guest fds which result from opening the special file
won't correspond to host fds, so to ensure that we don't end up
with duplicate fds we need to have QEMU code control the allocation
of the fd values we give the guest.
Add in an abstraction layer which lets us allocate new guest FD
values, and translate from a guest FD value back to the host one.
This also fixes an odd hole where a semihosting guest could
use the semihosting API to read, write or close file descriptors
that it had never allocated but which were being used by QEMU itself.
(This isn't a security hole, because enabling semihosting permits
the guest to do arbitrary file access to the whole host filesystem,
and so should only be done if the guest is completely trusted.)
Currently the only kind of guest fd is one which maps to a
host fd, but in a following commit we will add one which maps
to the :semihosting-features magic data.
If the guest is migrated with an open semihosting file descriptor
then subsequent attempts to use the fd will all fail; this is
not a change from the previous situation (where the host fd
being used on the source end would not be re-opened on the
destination end).
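A sketch of the allocation side of such a layer, GArray-based and with
illustrative names (close in spirit to the description above, not verbatim):

    typedef enum GuestFDType {
        GuestFDUnused = 0,
        GuestFDHost,
        GuestFDGDB,
    } GuestFDType;

    typedef struct GuestFD {
        GuestFDType type;
        int hostfd;                     /* valid only for GuestFDHost */
    } GuestFD;

    static GArray *guestfd_array;

    /* Return a fresh guest fd value; QEMU, not the host, picks it. */
    static int alloc_guestfd(void)
    {
        guint i;

        if (!guestfd_array) {
            /* Entries are zero-initialized, i.e. type GuestFDUnused. */
            guestfd_array = g_array_new(FALSE, TRUE, sizeof(GuestFD));
        }
        /* Skip slot 0 so guest handles are always non-zero. */
        for (i = 1; i < guestfd_array->len; i++) {
            GuestFD *gf = &g_array_index(guestfd_array, GuestFD, i);
            if (gf->type == GuestFDUnused) {
                return i;
            }
        }
        g_array_set_size(guestfd_array, i + 1);
        return i;
    }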
Peter Maydell [Mon, 16 Sep 2019 14:15:32 +0000 (15:15 +0100)]
target/arm/arm-semi: Correct comment about gdb syscall races
In arm_gdb_syscall() we have a comment suggesting a race
because the syscall completion callback might not happen
before the gdb_do_syscallv() call returns. The comment is
correct that the callback may not happen but incorrect about
the effects. Correct it and note the important caveat that
callers must never do any work of any kind after return from
arm_gdb_syscall() that depends on its return value.
Peter Maydell [Mon, 16 Sep 2019 14:15:31 +0000 (15:15 +0100)]
target/arm/arm-semi: Always set some kind of errno for failed calls
If we fail a semihosting call we should always set the
semihosting errno to something; we were failing to do
this for some of the "check inputs for sanity" cases.
Peter Maydell [Mon, 16 Sep 2019 14:15:30 +0000 (15:15 +0100)]
target/arm/arm-semi: Capture errno in softmmu version of set_swi_errno()
The set_swi_errno() function is called to capture the errno
from a host system call, so that we can return -1 from the
semihosting function and later allow the guest to get a more
specific error code with the SYS_ERRNO function. It comes in
two versions, one for user-only and one for softmmu. We forgot
to capture the errno in the softmmu version; fix the error.
(Semihosting calls directed to gdb are unaffected because
they go through a different code path that captures the
error return from the gdbstub call in arm_semi_cb() or
arm_semi_flen_cb().)
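The shape of the softmmu fix, as a hedged sketch (assumes the file's
existing syscall_err static; the essential change is recording errno):

    static target_ulong syscall_err;

    static inline uint32_t set_swi_errno(CPUARMState *env, uint32_t code)
    {
        if (code == (uint32_t)-1) {
            syscall_err = errno;   /* previously errno was silently dropped */
        }
        return code;
    }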
Peter Maydell [Tue, 8 Oct 2019 17:17:40 +0000 (18:17 +0100)]
hw/net/lan9118.c: Switch to transaction-based ptimer API
Switch the lan9118 code away from bottom-half based
ptimers to the new transaction-based ptimer API. This just requires
adding begin/commit calls around the various places that modify the
ptimer state, and using the new ptimer_init() function to create the
timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:39 +0000 (18:17 +0100)]
hw/watchdog/cmsdk-apb-watchdog.c: Switch to transaction-based ptimer API
Switch the cmsdk-apb-watchdog code away from bottom-half based
ptimers to the new transaction-based ptimer API. This just requires
adding begin/commit calls around the various places that modify the
ptimer state, and using the new ptimer_init() function to create the
timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:38 +0000 (18:17 +0100)]
hw/timer/mss-timer.c: Switch to transaction-based ptimer API
Switch the mss-timer code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:37 +0000 (18:17 +0100)]
hw/timer/imx_gpt.c: Switch to transaction-based ptimer API
Switch the imx_gpt.c code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:36 +0000 (18:17 +0100)]
hw/timer/imx_epit.c: Switch to transaction-based ptimer API
Switch the imx_epit.c code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:33 +0000 (18:17 +0100)]
hw/timer/exynos4210_pwm.c: Switch to transaction-based ptimer API
Switch the exynos4210_pwm code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:30 +0000 (18:17 +0100)]
hw/timer/exynos4210_mct.c: Switch GFRC to transaction-based ptimer API
We want to switch the exynos MCT code away from bottom-half based ptimers to
the new transaction-based ptimer API. The MCT is complicated
and uses multiple different ptimers, so it's clearer to switch
it a piece at a time. Here we change over only the GFRC.
Peter Maydell [Tue, 8 Oct 2019 17:17:29 +0000 (18:17 +0100)]
hw/timer/digic-timer.c: Switch to transaction-based ptimer API
Switch the digic-timer.c code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:28 +0000 (18:17 +0100)]
hw/timer/cmsdk-apb-timer.c: Switch to transaction-based ptimer API
Switch the cmsdk-apb-timer code away from bottom-half based ptimers
to the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:27 +0000 (18:17 +0100)]
hw/timer/cmsdk-apb-dualtimer.c: Switch to transaction-based ptimer API
Switch the cmsdk-apb-dualtimer code away from bottom-half based
ptimers to the new transaction-based ptimer API. This just requires
adding begin/commit calls around the various places that modify the
ptimer state, and using the new ptimer_init() function to create the
timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:26 +0000 (18:17 +0100)]
hw/timer/arm_mptimer.c: Switch to transaction-based ptimer API
Switch the arm_mptimer.c code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:25 +0000 (18:17 +0100)]
hw/timer/allwinner-a10-pit.c: Switch to transaction-based ptimer API
Switch the allwinner-a10-pit code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:24 +0000 (18:17 +0100)]
hw/arm/musicpal.c: Switch to transaction-based ptimer API
Switch the musicpal code away from bottom-half based ptimers to
the new transaction-based ptimer API. This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:23 +0000 (18:17 +0100)]
hw/timer/arm_timer.c: Switch to transaction-based ptimer API
Switch the arm_timer.c code away from bottom-half based ptimers
to the new transaction-based ptimer API. This just requires
adding begin/commit calls around the various arms of
arm_timer_write() that modify the ptimer state, and using the
new ptimer_init() function to create the timer.
Peter Maydell [Tue, 8 Oct 2019 17:17:22 +0000 (18:17 +0100)]
tests/ptimer-test: Switch to transaction-based ptimer API
Convert the ptimer test cases to the transaction-based ptimer API,
by changing to ptimer_init(), dropping the now-unused QEMUBH
variables, and surrounding each set of changes to the ptimer
state in ptimer_transaction_begin/commit calls.
Peter Maydell [Tue, 8 Oct 2019 17:17:21 +0000 (18:17 +0100)]
ptimer: Provide new transaction-based API
Provide the new transaction-based API. If a ptimer is created
using ptimer_init() rather than ptimer_init_with_bh(), then
instead of providing a QEMUBH, it provides a pointer to the
callback function directly, and has opted into the transaction
API. All calls to functions which modify ptimer state:
- ptimer_set_period()
- ptimer_set_freq()
- ptimer_set_limit()
- ptimer_set_count()
- ptimer_run()
- ptimer_stop()
must be between matched calls to ptimer_transaction_begin()
and ptimer_transaction_commit(). When ptimer_transaction_commit()
is called it will evaluate the state of the timer after all the
changes in the transaction, and call the callback if necessary.
In the old API the individual update functions generally would
call ptimer_trigger() immediately, which would schedule the QEMUBH.
In the new API the update functions will instead defer the
"set s->next_event and call ptimer_reload()" work to
ptimer_transaction_commit().
Because ptimer_trigger() can now immediately call into the
device code which may then call other ptimer functions that
update ptimer_state fields, we must be more careful in
ptimer_reload() not to cache fields from ptimer_state across
the ptimer_trigger() call. (This was harmless with the QEMUBH
mechanism as the BH would not be invoked until much later.)
We use assertions to check that:
* the functions modifying ptimer state are not called outside
a transaction block
* ptimer_transaction_begin() and _commit() calls are paired
* the transaction API is not used with a QEMUBH ptimer
There is some slight repetition of code:
* most of the set functions have similar looking "if s->bh
call ptimer_reload, otherwise set s->need_reload" code
* ptimer_init() and ptimer_init_with_bh() have similar code
We deliberately don't try to avoid this repetition, because
it will all be deleted when the QEMUBH version of the API
is removed.
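In practice a conversion to the new API looks like this sketch (device
callback and field names are illustrative):

    /* Creation: pass the callback directly instead of a QEMUBH. */
    s->timer = ptimer_init(mydev_timer_tick, s, PTIMER_POLICY_DEFAULT);

    /* Every state modification must sit inside a transaction: */
    ptimer_transaction_begin(s->timer);
    ptimer_set_freq(s->timer, 1000000);
    ptimer_set_limit(s->timer, s->reload, 1);
    ptimer_run(s->timer, 0);              /* 0 = periodic mode */
    ptimer_transaction_commit(s->timer);  /* may invoke mydev_timer_tick */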
Peter Maydell [Tue, 8 Oct 2019 17:17:20 +0000 (18:17 +0100)]
ptimer: Rename ptimer_init() to ptimer_init_with_bh()
Currently the ptimer design uses a QEMU bottom-half as its
mechanism for calling back into the device model using the
ptimer when the timer has expired. Unfortunately this design
is fatally flawed, because it means that there is a lag
between the ptimer updating its own state and the device
callback function updating device state, and guest accesses
to device registers between the two can return inconsistent
device state.
We want to replace the bottom-half design with one where
the guest device's callback is called either immediately
(when the ptimer triggers by timeout) or when the device
model code closes a transaction-begin/end section (when the
ptimer triggers because the device model changed the
ptimer's count value or other state). As the first step,
rename ptimer_init() to ptimer_init_with_bh(), to free up
the ptimer_init() name for the new API. We can then convert
all the ptimer users away from ptimer_init_with_bh() before
removing it entirely.
(Commit created with
git grep -l ptimer_init | xargs sed -i -e 's/ptimer_init/ptimer_init_with_bh/'
and three overlong lines folded by hand.)
Eric Auger [Thu, 3 Oct 2019 15:46:40 +0000 (17:46 +0200)]
ARM: KVM: Check KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 for smp_cpus > 256
Host kernels within [4.18, 5.3] report an erroneous KVM_MAX_VCPUS=512
for ARM. The actual capability to instantiate more than 256 vcpus
was fixed in 5.4 with the upgrade of the KVM_IRQ_LINE ABI to support
vcpu id encoded on 12 bits instead of 8 and a redistributor consuming
a single KVM IO device instead of 2.
So let's check this capability when attempting to use more than 256
vcpus within any ARM kvm accelerated machine.
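The check is of roughly this shape, in kvm_arch_init(MachineState *ms,
KVMState *s) (a hedged sketch, not the verbatim patch):

    /* More than 256 vcpus requires the 12-bit KVM_IRQ_LINE vcpu encoding. */
    if (ms->smp.cpus > 256 &&
        !kvm_check_extension(s, KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)) {
        error_report("Using more than 256 vcpus requires a host kernel "
                     "with KVM_CAP_ARM_IRQ_LINE_LAYOUT_2");
        ret = -EINVAL;
    }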
Eric Auger [Thu, 3 Oct 2019 15:46:39 +0000 (17:46 +0200)]
intc/arm_gic: Support IRQ injection for more than 256 vcpus
Host kernels that expose the KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 capability
allow injection of interrupts along with vcpu ids larger than 255.
Let's encode the vcpu id on 12 bits according to the upgraded KVM_IRQ_LINE
ABI when needed.
Given that we have two callsites that need to assemble
the value for kvm_set_irq(), a new helper routine, kvm_arm_set_irq(),
is introduced.
Without this patch, qemu exits with a "kvm_set_irq: Invalid argument"
message.
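A hedged sketch of the helper, assuming the upgraded uapi bit layout
(vcpu2_index in bits [31:28], irq_type in [27:24], vcpu_index in [23:16],
irq_number in [15:0]):

    int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level)
    {
        int kvm_irq = (irqtype << KVM_ARM_IRQ_TYPE_SHIFT) | irq;

        /* Split the vcpu id across the 8-bit and 4-bit fields. */
        kvm_irq |= ((cpu % 256) << KVM_ARM_IRQ_VCPU_SHIFT) |
                   ((cpu / 256) << KVM_ARM_IRQ_VCPU2_SHIFT);

        return kvm_set_irq(kvm_state, kvm_irq, !!level);
    }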
Peter Maydell [Tue, 15 Oct 2019 12:25:05 +0000 (13:25 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- block: Fix crash with qcow2 partial cluster COW with small cluster
sizes (misaligned write requests with BDRV_REQ_NO_FALLBACK)
- qcow2: Fix integer overflow potentially causing corruption with huge
requests
- vhdx: Detect truncated image files
- tools: Support help options for --object
- Various block-related replay improvements
- iotests/028: Fix for long $TEST_DIRs
# gpg: Signature made Mon 14 Oct 2019 17:02:54 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
iotests: Test large write request to qcow2 file
qcow2: Limit total allocation range to INT_MAX
qemu-nbd: Support help options for --object
qemu-img: Support help options for --object
qemu-io: Support help options for --object
vl: Split off user_creatable_print_help()
iotests/028: Fix for long $TEST_DIRs
block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK
replay: add BH oneshot event for block layer
replay: finish record/replay before closing the disks
replay: don't drain/flush bdrv queue while RR is working
replay: update docs for record/replay with block devices
replay: disable default snapshot for record/replay
block: implement bdrv_snapshot_goto for blkreplay
block/vhdx: add check for truncated image files
Stefan Hajnoczi [Wed, 9 Oct 2019 13:51:54 +0000 (14:51 +0100)]
trace: add --group=all to tracing.txt
tracetool needs to know the group name ("all", "root", or a specific
subdirectory). Also remove the stdin redirection because tracetool.py
needs the path to the trace-events file. Update the documentation.
Max Reitz [Thu, 10 Oct 2019 10:08:58 +0000 (12:08 +0200)]
iotests: Test large write request to qcow2 file
Without HEAD^, the following happens when you attempt a large write
request to a qcow2 file such that the number of bytes covered by all
clusters involved in a single allocation will exceed INT_MAX:
(A) handle_alloc_space() decides to fill the whole area with zeroes and
fails because bdrv_co_pwrite_zeroes() fails (the request is too
large).
(B) If handle_alloc_space() does not do anything, but merge_cow()
decides that the requests can be merged, it will create a too long
IOV that later cannot be written.
(C) Otherwise, all parts will be written separately, so those requests
will work.
In either B or C, though, qcow2_alloc_cluster_link_l2() will have an
overflow: We use an int (i) to iterate over nb_clusters, and then
calculate the L2 entry based on "i << s->cluster_bits" -- which will
overflow if the range covers more than INT_MAX bytes. This then leads
to image corruption because the L2 entry will be wrong (it will be
recognized as a compressed cluster).
Even if that were not the case, the .cow_end area would be empty
(because handle_alloc() will cap avail_bytes and nb_bytes at INT_MAX, so
their difference (which is the .cow_end size) will be 0).
So this test checks that on such large requests, the image will not be
corrupted. Unfortunately, we cannot check whether COW will be handled
correctly, because that data is discarded when it is written to null-co
(but we have to use null-co, because writing 2 GB of data in a test is
not quite reasonable).
Max Reitz [Thu, 10 Oct 2019 10:08:57 +0000 (12:08 +0200)]
qcow2: Limit total allocation range to INT_MAX
When the COW areas are included, the size of an allocation can exceed
INT_MAX. This is kind of limited by handle_alloc() in that it already
caps avail_bytes at INT_MAX, but the number of clusters still reflects
the original length.
This can have all sorts of effects, ranging from the storage layer write
call failing to image corruption. (If there were no image corruption,
then I suppose there would be data loss because the .cow_end area is
forced to be empty, even though there might be something we need to
COW.)
Fix all of it by limiting nb_clusters so the equivalent number of bytes
will not exceed INT_MAX.
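The arithmetic at the heart of both commits, reduced to two lines
(assuming 64k clusters, so a 2 GB allocation means i reaches 32768):

    int i = 32768, cluster_bits = 16;
    uint64_t bad  = i << cluster_bits;            /* int overflow, then widened */
    uint64_t good = (uint64_t)i << cluster_bits;  /* widen first: 0x80000000 */

The cap itself is then of the form
nb_clusters = MIN(nb_clusters, INT_MAX >> cluster_bits), so that
nb_clusters << cluster_bits can never exceed INT_MAX.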
Kevin Wolf [Fri, 11 Oct 2019 19:49:17 +0000 (21:49 +0200)]
qemu-nbd: Support help options for --object
Instead of parsing help options as normal object properties and
returning an error, provide the same help functionality as the system
emulator in qemu-nbd, too.
Kevin Wolf [Fri, 11 Oct 2019 19:49:17 +0000 (21:49 +0200)]
qemu-img: Support help options for --object
Instead of parsing help options as normal object properties and
returning an error, provide the same help functionality as the system
emulator in qemu-img, too.
Kevin Wolf [Fri, 11 Oct 2019 19:49:17 +0000 (21:49 +0200)]
qemu-io: Support help options for --object
Instead of parsing help options as normal object properties and
returning an error, provide the same help functionality as the system
emulator in qemu-io, too.
Kevin Wolf [Fri, 11 Oct 2019 17:20:12 +0000 (19:20 +0200)]
vl: Split off user_creatable_print_help()
Printing help for --object is something that we not only want in the
system emulator, but also in tools that support --object. Move it into a
separate function in qom/object_interfaces.c to make the code accessible
for tools.
Max Reitz [Fri, 11 Oct 2019 12:18:08 +0000 (14:18 +0200)]
iotests/028: Fix for long $TEST_DIRs
For long test image paths, the order of the "Formatting" line and the
"(qemu)" prompt after a drive_backup HMP command may be reversed. In
fact, the interaction between the prompt and the line may lead to
the "Formatting" line not being greppable at all after "read"-ing it (if
the prompt injects an IFS character into the "Formatting" string).
So just wait until we get a prompt. At that point, the block job must
have been started, so "info block-jobs" will only return "No active
jobs" once it is done.
block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK
The reason is that when writing to an unallocated cluster we try to
skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK
instead, resulting in a write request that is too small (2KB cluster
size vs 4KB required alignment).
Pavel Dovgalyuk [Tue, 17 Sep 2019 11:58:19 +0000 (14:58 +0300)]
replay: add BH oneshot event for block layer
Replay is capable of recording normal BH events, but sometimes
single-use callbacks are scheduled with the aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
The block layer uses these events for calling the completion function,
and replaying these calls makes the execution deterministic.
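The call-site change in the block layer, as a hedged sketch (the wrapper
name is from this series; its signature is assumed to mirror
aio_bh_schedule_oneshot):

    /* Before: scheduled directly, invisible to record/replay. */
    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), cb, opaque);

    /* After: the replay-aware wrapper records or replays the event. */
    replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs), cb, opaque);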
Pavel Dovgalyuk [Tue, 17 Sep 2019 11:58:13 +0000 (14:58 +0300)]
replay: finish record/replay before closing the disks
After recent updates, block devices cannot be closed on qemu exit,
because block requests are still being polled while replay is not finished.
Therefore we now stop recording the execution before closing the block devices.
Pavel Dovgalyuk [Tue, 17 Sep 2019 11:58:08 +0000 (14:58 +0300)]
replay: don't drain/flush bdrv queue while RR is working
In record/replay mode the bdrv queue is controlled by the replay
mechanism, which does not allow saving or loading snapshots
while the bdrv queue is not empty. Stopping the VM is not blocked by a
nonempty queue, but flushing the queue there is still impossible,
because it may cause deadlocks in replay mode.
This patch disables bdrv_drain_all and bdrv_flush_all in
record/replay mode.
Stopping the machine while IO requests are not finished is needed
for debugging: e.g., a breakpoint may be set at a specified step,
and forcing the IO requests to finish may break the determinism
of the execution.
Pavel Dovgalyuk [Tue, 17 Sep 2019 11:57:51 +0000 (14:57 +0300)]
block: implement bdrv_snapshot_goto for blkreplay
This patch enables making snapshots of block devices that use
blkreplay.
Implementing bdrv_snapshot_goto is required so that the snapshot can
be loaded without calling .bdrv_open, which blkreplay does not implement.
Peter Lieven [Tue, 10 Sep 2019 15:26:22 +0000 (17:26 +0200)]
block/vhdx: add check for truncated image files
qemu is currently not able to detect truncated vhdx image files.
Add a basic check that all allocated blocks are reachable at open
time, and report all errors during bdrv_co_check.
Peter Maydell [Mon, 14 Oct 2019 15:09:52 +0000 (16:09 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20191011a' into staging
Migration pull 2019-10-11
Mostly cleanups and minor fixes
[Note I'm seeing a hang on the aarch64 hosted x86-64 tcg migration
test in xbzrle; but I'm seeing that on current head as well]
# gpg: Signature made Fri 11 Oct 2019 20:14:31 BST
# gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20191011a: (21 commits)
migration: Support gtree migration
migration/multifd: pages->used would be cleared when attach to multifd_send_state
migration/multifd: initialize packet->magic/version once at setup stage
migration/multifd: use pages->allocated instead of the static max
migration/multifd: fix a typo in comment of multifd_recv_unfill_packet()
migration/postcopy: check PostcopyState before setting to POSTCOPY_INCOMING_RUNNING
migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup
migration/postcopy: postpone setting PostcopyState to END
migration/postcopy: mis->have_listen_thread check will never be touched
migration: report SaveStateEntry id and name on failure
migration: pass in_postcopy instead of check state again
migration/postcopy: fix typo in mark_postcopy_blocktime_begin's comment
migration/postcopy: map large zero page in postcopy_ram_incoming_setup()
migration/postcopy: allocate tmp_page in setup stage
migration: Don't try and recover return path in non-postcopy
rcu: Use automatic rc_read unlock in core memory/exec code
migration: Use automatic rcu_read unlock in rdma.c
migration: Use automatic rcu_read unlock in ram.c
migration: Fix missing rcu_read_unlock
rcu: Add automatically released rcu_read_lock variants
...
Alex Bennée [Tue, 1 Oct 2019 16:04:26 +0000 (17:04 +0100)]
cpus: kick all vCPUs when running thread=single
qemu_cpu_kick is used for a number of reasons including to indicate
there is work to be done. However when thread=single the old
qemu_cpu_kick_rr_cpu only advanced the vCPU to the next executing one
which can lead to a hang in the case that:
a) the kick is from outside the vCPUs (e.g. iothread)
b) the timers are paused (i.e. iothread calling run_on_cpu)
To avoid this, let's split qemu_cpu_kick_rr_cpu into two functions:
one for the timer, which continues to advance to the next timeslice,
and another for all other kicks.
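A sketch of the resulting split (the function names here are illustrative
guesses; bodies hedged, not verbatim):

    /* Timer path: keep advancing round-robin to the next vCPU. */
    static void qemu_cpu_kick_rr_next_cpu(void)
    {
        CPUState *cpu;
        do {
            cpu = atomic_mb_read(&tcg_current_rr_cpu);
            if (cpu) {
                cpu_exit(cpu);
            }
        } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
    }

    /* All other kicks: kick every vCPU so none can be missed. */
    static void qemu_cpu_kick_rr_cpus(void)
    {
        CPUState *cpu;
        CPU_FOREACH(cpu) {
            cpu_exit(cpu);
        }
    }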
tcg/ppc: Update vector support for v3.00 load/store
These new instructions are a mix of those like LXSD that are
conditional only on MSR.VEC and those like LXV that are
conditional on MSR.VEC for TX=1. Thus, in the end, we can
consider all of these as Altivec instructions.
These new instructions are conditional only on MSR.VEC and
are thus part of the Altivec instruction set, and not VSX.
This includes negation and compare not equal.
These new instructions are conditional on MSR.FP when TX=0 and
MSR.VEC when TX=1. Since we only care about the Altivec registers,
and force TX=1, we can consider these to be Altivec instructions.
Since Altivec is true for any use of vector types, we only need
to test have_isa_2_07.
This includes moves to and from the integer registers.
These new instructions are conditional only on MSR.VSX and
are thus part of the VSX instruction set, and not Altivec.
This includes double-word loads and stores.
These new instructions are conditional only on MSR.VEC and
are thus part of the Altivec instruction set, and not VSX.
This includes lots of double-word arithmetic and a few extra
logical operations.
The VSX instruction set instructions include double-word loads and
stores, double-word load and splat, double-word permute, and bit
select. All of which require multiple operations in the Altivec
instruction set.
Because the VSX registers map %vsr32 to %vr0, and we have no current
intention or need to use vector registers outside %vr0-%vr19, force
on the {ax,bx,cx,tx} bits within the added VSX insns so that we don't
have to otherwise modify the VR[TABC] macros.
tcg/ppc: Add support for load/store/logic/comparison
Add various bits and pieces related mostly to load and store
operations. In that context, logic, compare, and splat Altivec
instructions are used, and, therefore, the support for emitting
them is included in this patch too.
Introduce all of the flags required to enable tcg backend vector support,
and a runtime flag to indicate the host supports Altivec instructions.
For now, do not actually set have_isa_altivec to true, because we have not
yet added all of the code to actually generate all of the required insns.
However, we must define these flags in order to disable ifndefs that create
stub versions of the functions added here.
The change to tcg_out_movi works around a buglet in tcg.c wherein if we
do not define tcg_out_dupi_vec we get a declared but not defined Werror,
but if we only declare it we get a defined but not used Werror. We need
this change to tcg_out_movi eventually anyway, so it's no biggie.
Previously we've been hard-coding knowledge that Power7 has ISEL, but
it was an optional instruction before that. Use the AT_HWCAP2 bit,
when present, to properly determine support.
Peter Maydell [Mon, 14 Oct 2019 12:34:39 +0000 (13:34 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/9p-next-2019-10-10' into staging
The most notable change is that we now detect cross-device setups in the
host since it may cause inode number collision and mayhem in the guest.
A new fsdev property is added for the user to choose the appropriate
policy to handle that: either remap all inode numbers or fail I/Os to
another host device or just print out a warning (default behaviour).
This is also my last PR as _active_ maintainer of 9pfs.
# gpg: Signature made Thu 10 Oct 2019 12:14:07 BST
# gpg: using RSA key B4828BAF943140CEF2A3491071D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <[email protected]>" [full]
# gpg: aka "Gregory Kurz <[email protected]>" [full]
# gpg: aka "[jpeg image of size 3330]" [full]
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/9p-next-2019-10-10:
MAINTAINERS: Downgrade status of virtio-9p to "Odd Fixes"
9p: Use variable length suffixes for inode remapping
9p: stat_to_qid: implement slow path
9p: Added virtfs option 'multidevs=remap|forbid|warn'
9p: Treat multiple devices on one export as an error
fsdev: Add return value to fsdev_throttle_parse_opts()
9p: Simplify error path of v9fs_device_realize_common()
9p: unsigned type for type, version, path
Peter Maydell [Mon, 14 Oct 2019 11:26:37 +0000 (12:26 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-10' into staging
Block patches:
- Parallelized request handling for qcow2
- Backup job refactoring to use a filter node instead of before-write
notifiers
- Add discard accounting information to file-posix nodes
- Allow trivial reopening of nbd nodes
- Some iotest fixes
Peter Maydell [Mon, 14 Oct 2019 09:42:35 +0000 (10:42 +0100)]
Merge remote-tracking branch 'remotes/davidhildenbrand/tags/s390x-tcg-2019-10-10' into staging
- MMU DAT translation rewrite and cleanup
- Implement more TCG CPU features related to the MMU (e.g., IEP)
- Add the current instruction length to unwind data and clean up
- Resolve one TODO for the MVCL instruction
* remotes/davidhildenbrand/tags/s390x-tcg-2019-10-10: (31 commits)
s390x/tcg: MVCL: Exit to main loop if requested
target/s390x: Remove ILEN_UNWIND
target/s390x: Remove ilen argument from trigger_pgm_exception
target/s390x: Remove ilen argument from trigger_access_exception
target/s390x: Remove ILEN_AUTO
target/s390x: Rely on unwinding in s390_cpu_virt_mem_rw
target/s390x: Rely on unwinding in s390_cpu_tlb_fill
target/s390x: Simplify helper_lra
target/s390x: Remove fail variable from s390_cpu_tlb_fill
target/s390x: Return exception from translate_pages
target/s390x: Return exception from mmu_translate
target/s390x: Remove exc argument to mmu_translate_asce
target/s390x: Return exception from mmu_translate_real
target/s390x: Handle tec in s390_cpu_tlb_fill
target/s390x: Push trigger_pgm_exception lower in s390_cpu_tlb_fill
target/s390x: Use tcg_s390_program_interrupt in TCG helpers
target/s390x: Remove ilen parameter from s390_program_interrupt
target/s390x: Remove ilen parameter from tcg_s390_program_interrupt
target/s390x: Add ilen to unwind data
s390x/cpumodel: Add new TCG features to QEMU cpu model
...
If iothread_run() checks iothread->stopping before the iothread_join()
thread sets stopping to true, then aio_notify() may be optimized away
and iothread_run() hangs forever in aio_poll().
The correct way to change iothread->stopping is from a BH that executes
within iothread_run(). This ensures that iothread->stopping is checked
after we set it to true.
This was already fixed for ./iothread.c (note this is a different source
file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
fix iothread_stop() race condition"), but not for tests/iothread.c.
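The BH-based pattern, mirroring the cited fix as a hedged sketch for
tests/iothread.c:

    static void iothread_stop_bh(void *opaque)
    {
        IOThread *iothread = opaque;
        /* Runs inside iothread_run()'s context, so the flag is always
         * observed after being set and the event loop exits. */
        iothread->stopping = true;
    }

    void iothread_join(IOThread *iothread)
    {
        aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
        qemu_thread_join(&iothread->thread);
    }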
Eric Auger [Fri, 11 Oct 2019 12:17:24 +0000 (14:17 +0200)]
migration: Support gtree migration
Introduce support for GTree migration. A custom save/restore
is implemented. Each item consists of a key and a data value.
If the key is a pointer to an object, 2 VMSDs are passed into
the GTree VMStateField.
When putting the items, the tree is traversed in sorted order by
g_tree_foreach.
On the get() path, gtrees must be allocated using the proper
key compare, key destroy and value destroy. This must be handled
beforehand, for example in a pre_load method.
Tests are added to test save/dump of structs containing gtrees
including the virtio-iommu domain/mappings scenario.
Wei Yang [Fri, 11 Oct 2019 08:50:48 +0000 (16:50 +0800)]
migration/multifd: use pages->allocated instead of the static max
multifd_send_fill_packet() prepares the metadata for the following
pages to transfer. It is more proper to fill in pages->allocated instead
of the static maximum value, especially as we want to support flexible
packet sizes.
Wei Yang [Thu, 10 Oct 2019 01:13:16 +0000 (09:13 +0800)]
migration/postcopy: check PostcopyState before setting to POSTCOPY_INCOMING_RUNNING
Currently, we blindly set PostcopyState to RUNNING, even if we found that
the previous state is not LISTENING. This will lead to a corner case.
First let's look at the code flow:
qemu_loadvm_state_main()
    ret = loadvm_process_command()
        loadvm_postcopy_handle_run()
            return -1;
    if (ret < 0) {
        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING)
            ...
    }
From the above snippet, the corner case is that loadvm_postcopy_handle_run()
always sets the state to RUNNING and only then checks the previous state. If
the previous state is not LISTENING, it will return -1. But at that
moment, PostcopyState has already been set to RUNNING.
Then ret is checked in qemu_loadvm_state_main(); when it is -1,
PostcopyState is checked. The current logic would pause postcopy and retry
if PostcopyState is RUNNING. This is not what we expect, because
postcopy is not active yet.
This patch makes sure the state is set to RUNNING only if the previous
state is LISTENING, by checking the state first.
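The reordered logic, as a hedged sketch (check the old state first and
transition only on success; return values simplified):

    static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
    {
        PostcopyState ps = postcopy_state_get();

        trace_loadvm_postcopy_handle_run();
        if (ps != POSTCOPY_INCOMING_LISTENING) {
            error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
            return -1;              /* PostcopyState left untouched now */
        }

        postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
        /* ... schedule the bottom half that starts the VM ... */
        return 0;
    }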
Wei Yang [Sun, 6 Oct 2019 00:02:48 +0000 (08:02 +0800)]
migration/postcopy: postpone setting PostcopyState to END
There are two places that call postcopy_ram_incoming_cleanup():
    postcopy_ram_listen_thread, on migration success
    loadvm_postcopy_handle_listen, on setup failure
On success, the vm will never accept another migration. On failure,
PostcopyState is transited from LISTENING to END and would be checked in
qemu_loadvm_state_main(). If PostcopyState is RUNNING, migration would
be paused and retried.
Currently PostcopyState is set to END in function
postcopy_ram_incoming_cleanup(). With above analysis, we can take this
step out and postpone this till the end of listen thread to indicate the
listen thread is done.
Wei Yang [Sun, 6 Oct 2019 00:02:47 +0000 (08:02 +0800)]
migration/postcopy: mis->have_listen_thread check will never be touched
If mis->have_listen_thread is true, the current PostcopyState
must be LISTENING or RUNNING, while the check at the beginning of the
function makes sure the state transition only happens when the previous
PostcopyState is ADVISE or DISCARD.
Wei Yang [Sat, 5 Oct 2019 13:50:21 +0000 (21:50 +0800)]
migration/postcopy: map large zero page in postcopy_ram_incoming_setup()
postcopy_ram_incoming_setup() and postcopy_ram_incoming_cleanup() are
counterparts. It is reasonable to map/unmap the large zero page in these
two functions respectively.
Wei Yang [Sat, 5 Oct 2019 13:50:20 +0000 (21:50 +0800)]
migration/postcopy: allocate tmp_page in setup stage
During migration, a tmp page is allocated so that a whole host page
can be placed there during postcopy.
Currently the page is allocated during the load stage, which is a little
late. More importantly, if the allocation fails, the error is not
checked properly: even if the page is NULL, we would still use it.
This patch moves the allocation to the setup stage; if it fails, an
error message is printed and the caller will notice it.
migration: Don't try and recover return path in non-postcopy
In normal precopy we can't do reconnection recovery - but we also
don't need to, since you can just rerun migration.
At the moment if the 'return-path' capability is on, we use
the return path in precopy to give a positive 'OK' to the end
of migration; however if migration fails then we fall into
the postcopy recovery path and hang. This fixes it by only
running the return path in the postcopy case.