Git Repo - qemu.git/log

cli qmp: Mark --preconfig, exit-preconfig experimental

Committing to the current --preconfig / exit-preconfig interface
before it has seen any use is premature. Mark both as experimental,
the former in documentation, the latter by renaming it to
x-exit-preconfig.

See the previous commit for more detailed rationale.

Signed-off-by: Markus Armbruster <[email protected]>
Message-Id: <20180705091402 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Acked-by: Eduardo Habkost <[email protected]>
Acked-by: Igor Mammedov <[email protected]>
[Straightforward conflict with commit 514337c142f resolved]

qapi: Do not expose "allow-preconfig" in query-qmp-schema

According to commit 047f7038f58, option --preconfig

    [...] allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
    allowing the configuration of QEMU from QMP before the machine
    jumps into board initialization code of machine_run_board_init()

    The intent is to allow management to query machine state and
    additionally configure it using previous query results within one
    QEMU instance (i.e. eliminate the need to start QEMU twice, 1st to
    query board specific parameters and 2nd for actual VM start using
    query results for additional parameters).

The implementation is a bit of a hack: it splices in an additional
main loop before machine creation, in special runstate preconfig.  New
command exit-preconfig exits that main loop.  QEMU continues
initializing, creates the machine, and runs the good old main loop.
The replacement of the main loop is transparent to monitors.

Sadly, some commands expect initialization to be complete.  Running
them in --preconfig's main loop violates their preconditions.  Since
we don't really know which commands are safe, we use a whitelist.
This drags the concept of run state into the QMP core.

The whitelist is done as a command flag in the QAPI schema (commit
d6fe3d02e9a).  Drags the concept of run state further into the QAPI
language.

The command flag is exposed in query-qmp-schema (also commit
d6fe3d02e9a).  This makes it ABI.

I consider the whole thing an offensively ugly hack, but sometimes an
ugly hack is the best we can do to solve a problem people have.

The need described by the commit message quote above is genuine.  The
proper solution would be a main loop that permits complete
configuration via QMP.  This is out of reach, thus the hack.

However, even though the need is genuine, it isn't urgent: libvirt is
not going to use this anytime soon.  Baking a hack into ABI before it
has any users is a bad idea.

This commit reverts the parts of commit d6fe3d02e9a that affect ABI
via query-qmp-schema.  The commit did the following:

(1) Add command flag 'allow-preconfig' to the QAPI schema language

(2) Pass it to code generators

(3) Have the commands.py code generator pass it to the command
    registry (so commit 047f7038f58 can use it as whitelist)

(4) Add 'allow-preconfig' to SchemaInfoCommand (neglecting to update
    qapi-code-gen.txt section "Client JSON Protocol introspection")

(5) Set 'allow-preconfig': true for commands qmp_capabilities,
    query-commands, query-command-line-options, query-status

Revert exactly (4), plus a bit of documentation added to
qemu-tech.info in commit 047f7038f58.

Shrinks query-qmp-schema's output from 126.5KiB to 121.8KiB for me.

Signed-off-by: Markus Armbruster <[email protected]>
Message-Id: <20180705091402 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Acked-by: Eduardo Habkost <[email protected]>
Acked-by: Igor Mammedov <[email protected]>
[Straightforward conflict with commit d626b6c1ae7 resolved]

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180716' into staging

ppc patch queue 2018-07-16

Here's my first hard freeze pull request for qemu-3.0.  This contains
an assortment of bugfixes. Several are for regressions, others are for
bugs that I think are significant enough to address during hard freeze.

# gpg: Signature made Mon 16 Jul 2018 09:28:37 BST
# gpg:                using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-3.0-20180716:
  sm501: Fix warning about unreachable code
  sam460ex: Correct use after free error
  etsec: fix IRQ (un)masking
  ppc/xics: fix ICP reset path
  spapr: Correct inverted test in spapr_pc_dimm_node()
  sm501: Update screen on frame buffer address change

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.0-pull-request' into staging

Some fixes for linux-user:
- workaround for CMSG_NXTHDR bug
- two patches for ppc64/ppc64le host:
  fix fcntl() with *LK64 commands
  (seen when dpkg wants to lock the DB)
  fix reserved_va alignment (ppc64 needs
  a 64kB alignment)
- convert a forgotten fcntl() to safe_fcntl()

# gpg: Signature made Sun 15 Jul 2018 20:51:19 BST
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-3.0-pull-request:
  Zero out the host's `msg_control` buffer
  linux-user: fix mmap_find_vma_reserved()
  linux-user: convert remaining fcntl() to safe_fcntl()
  linux-user: ppc64: use the correct values for F_*LK64s

Signed-off-by: Peter Maydell <[email protected]>

sm501: Fix warning about unreachable code

Coverity warned that the false arm of conditional expression is
unreachable when it is inside an if with the same condition.
Remove the unreachable code to avoid the warning.

Fixes: CID 1394215
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: BALATON Zoltan <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: David Gibson <[email protected]>

sam460ex: Correct use after free error

Commit 51b0d834c changed error handling to report file name in error
message but forgot to move freeing it after usage. Noticed by Coverity.

Fixes: CID 1394217
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: BALATON Zoltan <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: David Gibson <[email protected]>

etsec: fix IRQ (un)masking

Interrupt conditions occurring while masked are not being
signaled when later unmasked.
The fix is to raise/lower IRQs when IMASK is changed.

To avoid problems like this in future, consolidate
IRQ pin update logic in one function.

Also fix probable typo "IEVENT_TXF | IEVENT_TXF",
and update IRQ pins on reset.

Signed-off-by: Michael Davidsaver <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc/xics: fix ICP reset path

Recent cleanup in commit a028dd423ee6 dropped the ICPStateClass::reset
handler. It is now up to child ICP classes to call the DeviceClass::reset
handler of the parent class, thanks to device_class_set_parent_reset().
This is a better object programming pattern, but unfortunately it causes
QEMU to crash during CPU hotplug:

(qemu) device_add host-spapr-cpu-core,id=core1,core-id=1
Segmentation fault (core dumped)

When the hotplug path tries to reset the ICP device, we end up calling:

static void icp_kvm_reset(DeviceState *dev)
{
    ICPStateClass *icpc = ICP_GET_CLASS(dev);

    icpc->parent_reset(dev);

but icpc->parent_reset is NULL... This happens because icp_kvm_class_init()
calls:

    device_class_set_parent_reset(dc, icp_kvm_reset,
                                  &icpc->parent_reset);

but dc->reset, ie, DeviceClass::reset for the TYPE_ICP type, is
itself NULL.

This patch hence sets DeviceClass::reset for the TYPE_ICP type to
point to icp_reset(). It then registers a reset handler that calls
DeviceClass::reset. If the ICP subtype has configured its own reset
handler with device_class_set_parent_reset(), this ensures it will
be called first and it can then call ICPStateClass::parent_reset
safely. This fixes the reset path for the TYPE_KVM_ICP type, which
is the only subtype that defines its own reset function.

Reported-by: Satheesh Rajendran <[email protected]>
Suggested-by: David Gibson <[email protected]>
Fixes: a028dd423ee6dfd091a8c63028240832bf10f671
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Correct inverted test in spapr_pc_dimm_node()

This function was introduced between v2.11 and v2.12 to replace obsolete
ways of specifying the NUMA nodes for DIMMs. It's used to find the correct
node for an LMB, by locating which DIMM object it lies within.

Unfortunately, one of the checks is inverted, so we check whether the
address is less than two different things, rather than actually checking
a range. This introduced a regression, meaning that after a reboot qemu
will advertise incorrect node information for memory to the guest.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>

sm501: Update screen on frame buffer address change

When the guest changes the address of the frame buffer we need to
refresh the screen to correctly display the new content. This fixes
display update problems when changing between screens on AmigaOS.

Signed-off-by: BALATON Zoltan <[email protected]>
Signed-off-by: David Gibson <[email protected]>

Zero out the host's `msg_control` buffer

If this is not done, qemu would drop any control message after the first
one.

This is because glibc's `CMSG_NXTHDR` macro accesses the uninitialized
cmsghdr's length field in order to find out if the message fits into the
`msg_control` buffer, wrongly assuming that it doesn't because the
length field contains garbage. Accessing the length field is fine for
completed messages we receive from the kernel, but is - as far as I know
- not needed since the kernel won't return such an invalid cmsghdr in
the first place.

This is tracked as this glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=13500

It's probably also a good idea to bail with an error if `CMSG_NXTHDR`
returns NULL but `TARGET_CMSG_NXTHDR` doesn't (ie. we still expect
cmsgs).

Signed-off-by: Jonas Schievink <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20180711221244 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: fix mmap_find_vma_reserved()

The value given by mmap_find_vma_reserved() is used with mmap(),
so it is needed to be aligned with the host page size.

Since commit 18e80c55bb, reserved_va is only aligned to TARGET_PAGE_SIZE,
and it works well if this size is greater or equal to the host page size.

But ppc64 hosts have 64kB page size and when we start a 4kiB page size
guest (like i386), it fails when it tries to mmap the stack:

mmap stack: Invalid argument

Fixes: 18e80c55bb (linux-user: Tidy and enforce reserved_va initialization)
Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20180714193553 [email protected]>

linux-user: convert remaining fcntl() to safe_fcntl()

Commit 435da5e709 didn't convert a fcntl() call to safe_fcntl()
for TARGET_NR_fcntl64 case. There is no reason to not use it
in this case.

Fixes: 435da5e709 linux-user: Use safe_syscall wrapper for fcntl
Signed-off-by: Laurent Vivier <[email protected]>
Message-Id: <20180713125805 [email protected]>

linux-user: ppc64: use the correct values for F_*LK64s

Qemu includes the glibc headers for the host defines and target headers are
part of the qemu source themselves. The glibc has the F_GETLK64, F_SETLK64
and F_SETLKW64 defined to 12, 13 and 14 for all archs in
sysdeps/unix/sysv/linux/bits/fcntl-linux.h. The linux kernel generic
definition for F_*LK is 5, 6 & 7 and F_*LK64* is 12,13, and 14 as seen in
include/uapi/asm-generic/fcntl.h. On 64bit machine, by default the kernel
assumes all F_*LK to 64bit calls and doesnt support use of F_*LK64* as
can be seen in include/linux/fcntl.h in linux source.

On x86_64 host, the values for F_*LK64* are set to 5, 6 and 7
explicitly in /usr/include/x86_64-linux-gnu/bits/fcntl.h by the glibc.
Whereas, a PPC64 host doesn't have such a definition in
/usr/include/powerpc64le-linux-gnu/bits/fcntl.h by the glibc. So,
the sources on PPC64 host sees the default value of F_*LK64*
as 12, 13 & 14(fcntl-linux.h).

Since the 64bit kernel doesnt support 12, 13 & 14; the glibc fcntl syscall
implementation(__libc_fcntl*(), __fcntl64_nocancel) does the F_*LK64* value
convertion back to F_*LK* values on PPC64 as seen in
sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h with FCNTL_ADJUST_CMD()
macro. Whereas on x86_64 host the values for F_*LK64* are set to 5, 6 and 7
and no adjustments are needed.

Since qemu doesnt use the glibc fcntl, but makes the safe_syscall* on its
own, the PPC64 qemu is calling the syscall with 12, 13, and 14(without
adjustment) and they all fail. The fcntl calls to F_GETLK/F_SETLK|W all
fail by all pplications run on PPC64 host user emulation.

The fix here could be to see why on PPC64 the glibc is still keeping
F_*LK64* different from F_*LK and why adjusting them to 5, 6 and 7 before
the syscall for PPC only. See if we can make the
/usr/include/powerpc64le-linux-gnu/bits/fcntl.h to have the values
5, 6 & 7 just like x86_64 and remove the adjustment code in glibc. That
way, qemu sources see the kernel supported values in glibc headers.

OR

On PPC64 host, qemu sources see both F_*LK & F_*LK64* as same and set to
12, 13 and 14 because __USE_FILE_OFFSET64 is defined in qemu
sources(also refer sysdeps/unix/sysv/linux/bits/fcntl-linux.h).
Do the value adjustment just like it is done by glibc source by using
F_GETLK value of 5. That way, we make the syscalls with the actual
supported values in Qemu. The patch is taking this approach.

Signed-off-by: Shivaprasad G Bhat <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <153148521235.87746.14142430397318741182 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

docs: Grammar and spelling fixes

Signed-off-by: Ville Skyttä <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20180612065150 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Check correct file type (regular file for 'file',
  character or block device for 'host_device'/'host_cdrom')
- scsi-disk: Block Device Characteristics emulation fix
- qemu-img: Consider required alignment for sparse area detection
- Documentation and test improvements

# gpg: Signature made Thu 12 Jul 2018 17:29:17 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  qemu-img: align result of is_allocated_sectors
  scsi-disk: Block Device Characteristics emulation fix
  iotests: add test 226 for file driver types
  file-posix: specify expected filetypes
  qemu-img: Document copy offloading implications with -S and -c
  iotests: nbd: Stop qemu-nbd before remaking image
  iotests: 153: Fix dead code

Signed-off-by: Peter Maydell <[email protected]>

qemu-img: align result of is_allocated_sectors

We currently don't enforce that the sparse segments we detect during convert are
aligned. This leads to unnecessary and costly read-modify-write cycles either
internally in Qemu or in the background on the storage device as nearly all
modern filesystems or hardware have a 4k alignment internally.

This patch modifies is_allocated_sectors so that its *pnum result will always
end at an alignment boundary. This way all requests will end at an alignment
boundary. The start of all requests will also be aligned as long as the results
of get_block_status do not lead to an unaligned offset.

The number of RMW cycles when converting an example image [1] to a raw device that
has 4k sector size is about 4600 4k read requests to perform a total of about 15000
write requests. With this path the additional 4600 read requests are eliminated while
the number of total write requests stays constant.

[1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

scsi-disk: Block Device Characteristics emulation fix

The current BDC VPD page (page 0xb1) is too short. This can be
seen running sg_utils:

$ sg_vpd --page=bdc /dev/sda
Block device characteristics VPD page (SBC):
Block device characteristics VPD page length too short=8

By the SCSI spec, the expected size of the SBC page is 0x40.
There is no telling how the guest will behave with a shorter
message - it can ignore it, or worse, make (wrong)
assumptions.

This patch fixes the emulation by setting the size to 0x40.
This is the output of the previous sg_vpd command after
applying it:

$ sg_vpd --page=bdc /dev/sda -v
    inquiry cdb: 12 01 b1 00 fc 00
Block device characteristics VPD page (SBC):
   [PQual=0  Peripheral device type: disk]
  Medium rotation rate is not reported
  Product type: Not specified
  WABEREQ=0
  WACEREQ=0
  Nominal form factor not reported
  FUAB=0
  VBULS=0

To improve readability, this patch also adds the VBULS value
explictly and add comments on the existing fields we're
setting.

Signed-off-by: Daniel Henrique Barboza <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: add test 226 for file driver types

Test that we're rejecting what we ought to for file,
host_driver and host_cdrom drivers. Test that we're
seeing the deprecated message for block and chardevs
on the file driver.

Signed-off-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

file-posix: specify expected filetypes

Adjust each caller of raw_open_common to specify if they are expecting
host and character devices or not. Tighten expectations of file types upon
open in the common code and refuse types that are not expected.

This has two effects:

(1) Character and block devices are now considered deprecated for the
'file' driver, which expects only S_IFREG, and
(2) no file-posix driver (file, host_cdrom, or host_device) can open
directories now.

I don't think there's a legitimate reason to open directories as if
they were files. This prevents QEMU from opening and attempting to probe
a directory inode, which can break in exciting ways. One of those ways
is lseek on ext4/xfs, which will return 0x7fffffffffffffff as the file
size instead of EISDIR. This can coax QEMU into responding with a
confusing "file too big" instead of "Hey, that's not a file".

See: https://bugs.launchpad.net/qemu/+bug/1739304/
Signed-off-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: Document copy offloading implications with -S and -c

Explicitly enabling zero detection or compression suppresses copy
offloading during convert. Document it.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: nbd: Stop qemu-nbd before remaking image

197 is one example where _make_test_img is used twice without stopping
the NBD server in between. An error will occur like this:

    @@ -26,9 +26,13 @@

     === Partial final cluster ===

    +qemu-img: TEST_DIR/t.IMGFMT: Failed to get "resize" lock
    +Is another process using the image?
     Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1024
    +Failed to find an available port: Address already in use
     read 1024/1024 bytes at offset 0

Patch _make_test_img to stop the old qemu-nbd before starting a new one,
which fixes this problem, and similarly 215.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: 153: Fix dead code

This step was left behind my mistake. As suggested by the echoed text,
the intention was to test two devices with the same image, with
different options. The behavior should be the same as two QEMU
processes. Complete it.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

ui/cocoa.m: replace scrollingDeltaY with deltaY

The NSEvent class method scrollingDeltaY is available
for Mac OS 10.7 and newer. Since QEMU supports Mac OS
10.5 and up, we need to be using a method that is
available on these version of Mac OS X. The deltaY
method is a method that does almost the same thing as
scrollingDeltaY and is available on Mac OS 10.5 and
up. So we can replace scrollingDeltaY with deltaY.

We only check deltaY's value if it is not zero
because zero means that the scrolling increment was
sufficiently fine that it was only reported in scrollingDeltaY,
or that the scrolling was horizontal.

Signed-off-by: John Arbuckle <[email protected]>
Message-id: 20180709150235 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
[PMM: tweak commit message and comment a little]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20180712' into staging

pull-seccomp-20180712

# gpg: Signature made Thu 12 Jul 2018 13:55:34 BST
# gpg:                using RSA key DF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <[email protected]>"
# Primary key fingerprint: D67E 1B50 9374 86B4 0723  DBAB DF32 E7C0 F0FF F9A2

* remotes/otubo/tags/pull-seccomp-20180712:
  seccomp: allow sched_setscheduler() with SCHED_IDLE policy

Signed-off-by: Peter Maydell <[email protected]>

seccomp: allow sched_setscheduler() with SCHED_IDLE policy

Current and upcoming mesa releases rely on a shader disk cash. It uses
a thread job queue with low priority, set with
sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by
the "resourcecontrol" seccomp qemu filter.

Since it should be safe to allow lowering thread priority, let's allow
scheduling thread to idle policy.

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1594456

Signed-off-by: Marc-André Lureau <[email protected]>
Acked-by: Eduardo Otubo <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20180712' into staging

- fix confusion around sizes in storage attribute migration
- remove NULL check on error_propagate() in virtio-ccw

# gpg: Signature made Thu 12 Jul 2018 10:27:28 BST
# gpg:                using RSA key DECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20180712:
  error: Remove NULL checks on error_propagate() calls
  s390x/storage attributes: fix CMMA_BLOCK_SIZE usage

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20180711.1' into staging

VFIO fixes 2018-07-11

- Avoid RAMBlock segfault in option ROM teardown for vfio-pci devices
   (Cédric Le Goater)

# gpg: Signature made Wed 11 Jul 2018 20:44:44 BST
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-fixes-20180711.1:
  vfio/pci: do not set the PCIDevice 'has_rom' attribute

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2018-07-11' into staging

Monitor patches for 2018-07-11

# gpg: Signature made Wed 11 Jul 2018 20:12:31 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg:                 aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-monitor-2018-07-11:
  monitor: fix double-free of request error

Signed-off-by: Peter Maydell <[email protected]>

vfio/pci: do not set the PCIDevice 'has_rom' attribute

PCI devices needing a ROM allocate an optional MemoryRegion with
pci_add_option_rom(). pci_del_option_rom() does the cleanup when the
device is destroyed. The only action taken by this routine is to call
vmstate_unregister_ram() which clears the id string of the optional
ROM RAMBlock and now, also flags the RAMBlock as non-migratable. This
was recently added by commit b895de502717 ("migration: discard
non-migratable RAMBlocks"), .

VFIO devices do their own loading of the PCI option ROM in
vfio_pci_size_rom(). The memory region is switched to an I/O region
and the PCI attribute 'has_rom' is set but the RAMBlock of the ROM
region is not allocated. When the associated PCI device is deleted,
pci_del_option_rom() calls vmstate_unregister_ram() which tries to
flag a NULL RAMBlock, leading to a SEGV.

It seems that 'has_rom' was set to have memory_region_destroy()
called, but since commit 469b046ead06 ("memory: remove
memory_region_destroy") this is not necessary anymore as the
MemoryRegion is freed automagically.

Remove the PCIDevice 'has_rom' attribute setting in vfio.

Fixes: b895de502717 ("migration: discard non-migratable RAMBlocks")
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

monitor: fix double-free of request error

qmp_error_response() will free the given error. Fix double-free in
later qmp_request_free().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20180705164201 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Fixes: 1cc37471525d03f963bc71d724f0dc9eab888fc1
Signed-off-by: Markus Armbruster <[email protected]>

error: Remove NULL checks on error_propagate() calls

Patch created mechanically by rerunning:

  $  spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \
            --macro-file scripts/cocci-macro-file.h \
            --dir . --in-place

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20180705155811 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/storage attributes: fix CMMA_BLOCK_SIZE usage

The macro CMMA_BLOCK_SIZE was defined but not used, and a hardcoded
value was instead used in the code.

This patch fixes the value of CMMA_BLOCK_SIZE and uses it in the
appropriate place in the code, and fixes another case of hardcoded
value in the KVM backend, replacing it with the more appropriate
constant KVM_S390_CMMA_SIZE_MAX.

Signed-off-by: Claudio Imbrenda <[email protected]>
Message-Id: <1530787170 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

Update version for v3.0.0-rc0 release

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Copy offloading fixes for when the copy increases the image size
- Temporary revert of the removal of deprecated -drive options
- Fix request serialisation in the image fleecing scenario
- Fix copy-on-read crash with unaligned image size
- Fix another drain crash

# gpg: Signature made Tue 10 Jul 2018 16:37:52 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  block: Use common write req handling in truncate
  block: Fix bdrv_co_truncate overlap check
  block: Use common req handling in copy offloading
  block: Use common req handling for discard
  block: Fix handling of image enlarging write
  block: Extract common write req handling
  block: Use uint64_t for BdrvTrackedRequest byte fields
  block: Use BdrvChild to discard
  block: Add copy offloading trace points
  block: Prefix file driver trace points with "file_"
  Revert "block: Remove deprecated -drive geometry options"
  Revert "block: Remove deprecated -drive option addr"
  Revert "block: Remove deprecated -drive option serial"
  Revert "block: Remove dead deprecation warning code"
  block/blklogwrites: Make sure the log sector size is not too small
  qapi/block-core.json: Add missing documentation for blklogwrites log-append option
  block/backup: fix fleecing scheme: use serialized writes
  block: add BDRV_REQ_SERIALISING flag
  block: split flags in copy_range
  block/io: fix copy_range
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180710a' into staging

Migration pull 2018-07-10 (for 3.0)

Migration fixes and migration test fixes, mostly
around postcopy and postcopy recovery

# gpg: Signature made Tue 10 Jul 2018 16:27:19 BST
# gpg:                using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20180710a:
  migration: reorder MIG_CMD_POSTCOPY_RESUME
  tests: hide stderr for postcopy recovery test
  tests: add postcopy recovery test
  tests: introduce wait_for_migration_status()
  tests: introduce migrate_query*() helpers
  tests: allow migrate() to take extra flags
  tests: introduce migrate_postcopy_* helpers
  migration: show pause/recover state on dst host
  migration: fix incorrect bitmap size calculation
  migration: loosen recovery check when load vm
  migration: simplify check to use qemu file buffer
  migration: unify incoming processing
  migration: unbreak postcopy recovery
  migration: move income process out of multifd
  migration: delay postcopy paused state

Signed-off-by: Peter Maydell <[email protected]>

block: Use common write req handling in truncate

Truncation is the last to convert from open coded req handling to
reusing helpers. This time the permission check in prepare has to adapt
to the new caller: it checks a different permission bit, and doesn't
trigger the before write notifier.

Also, truncation should always trigger a bs->total_sectors update and in
turn call parent resize_cb. Update the condition in finish helper, too.

It's intended to do a duplicated bs->read_only check before calling
bdrv_co_write_req_prepare() so that we can be more informative with the
error message, as bdrv_co_write_req_prepare() doesn't have Error
parameter.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Fix bdrv_co_truncate overlap check

If we are growing the image and potentially using preallocation for the
new area, we need to make sure that no write requests are made to the
"preallocated" area which is [@old_size, @offset), not
[@offset, offset * 2 - @old_size).

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use common req handling in copy offloading

This brings the request handling logic inline with write and discard,
fixing write_gen, resize_cb, dirty bitmaps and image size refreshing.
The last of these issues broke iotest case 222, which is now fixed.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use common req handling for discard

Reuse the new bdrv_co_write_req_prepare/finish helpers. The variation
here is that discard requests don't affect bs->wr_highest_offset, and it
cannot extend the image.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

migration: reorder MIG_CMD_POSTCOPY_RESUME

It was accidently added before MIG_CMD_PACKAGED so it might break
command compatibility when we run postcopy migration between old/new
QEMUs. Fix that up quickly before the QEMU 3.0 release.

Reported-by: Lukáš Doktor <[email protected]>
Suggested-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710094424 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: hide stderr for postcopy recovery test

We dumped something when network failure happens.  We should avoid those
messages to be dumped when running the tests:

  $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
  /x86_64/migration/postcopy/recovery: qemu-system-x86_64: check_section_footer: Read section footer failed: -5
  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
  OK

After the patch:

  $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
  /x86_64/migration/postcopy/recovery: OK

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: add postcopy recovery test

Test the postcopy recovery procedure by emulating a network failure
using migrate-pause command.

Tested-by: Balamuruhan S <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: introduce wait_for_migration_status()

It's generalized from wait_for_migration_complete() to allow us to wait
for any migration status besides failure.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Balamuruhan S <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: introduce migrate_query*() helpers

Introduce helpers to query migration states and use it.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Balamuruhan S <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: allow migrate() to take extra flags

For example, we can pass in '"resume": true' to resume a migration.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Balamuruhan S <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

tests: introduce migrate_postcopy_* helpers

Separate the old postcopy UNIX socket test into three steps, provide a
helper for each step. With these helpers, we can do more compliated
tests like postcopy recovery, while keep the codes shared.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Balamuruhan S <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Fix up merge with 2e295789 / Skip tests for ppc tcg

block: Fix handling of image enlarging write

Two problems exist when a write request that enlarges the image (i.e.
write beyond EOF) finishes:

1) parent is not notified about size change;
2) dirty bitmap is not resized although we try to set the dirty bits;

Fix them just like how bdrv_co_truncate works.

Reported-by: Kevin Wolf <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Extract common write req handling

As a mechanical refactoring patch, this is the first step towards
unified and more correct write code paths. This is helpful because
multiple BlockDriverState fields need to be updated after modifying
image data, and it's hard to maintain in multiple places such as copy
offload, discard and truncate.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use uint64_t for BdrvTrackedRequest byte fields

This matches the types used for bytes in the rest parts of block layer.
In the case of bdrv_co_truncate, new_bytes can be the image size which
probably doesn't fit in a 32 bit int.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use BdrvChild to discard

Other I/O functions are already using a BdrvChild pointer in the API, so
make discard do the same. It makes it possible to initiate the same
permission checks before doing I/O, and much easier to share the
helper functions for this, which will be added and used by write,
truncate and copy range paths.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add copy offloading trace points

A few trace points that can help reveal what is happening in a copy
offloading I/O path.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Prefix file driver trace points with "file_"

With in one module, trace points usually have a common prefix named
after the module name. paio_submit and paio_submit_co are the only two
trace points so far in the two file protocol drivers. As we are adding
more, having a common prefix here is better so that trace points can be
enabled with a glob. Rename them.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Revert "block: Remove deprecated -drive geometry options"

This reverts commit a7aff6dd10b16b67e8b142d0c94c5d92c3fe88f6.

Hold off removing this for one more QEMU release (current libvirt
release still uses it.)

Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Revert "block: Remove deprecated -drive option addr"

This reverts commit eae3bd1eb7c6b105d30ec06008b3bc3dfc5f45bb.

Reverted to avoid conflicts for geometry options revert.

Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Revert "block: Remove deprecated -drive option serial"

This reverts commit b0083267444a5e0f28391f6c2831a539f878d424.

Hold off removing this for one more QEMU release (current libvirt
release still uses it.)

Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Revert "block: Remove dead deprecation warning code"

This reverts commit 6266e900b8083945cb766b45c124fb3c42932cb3.

Some deprecated -drive options were still in use by libvirt, only
fixed with libvirt commit b340c6c614 ("qemu: format serial and geometry
on frontend disk device"), which is not yet in any released version
of libvirt.

So let's hold off removing the deprecated options for one more QEMU
release.

Reported-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

migration: show pause/recover state on dst host

These two states will be missing when doing "query-migrate" on
destination VM. Add these states so that we can get the query results
as expected.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: fix incorrect bitmap size calculation

The calculation on size of received bitmap is incorrect for postcopy
recovery. Here we wanted to let the size to cover all the valid bits in
the bitmap, we should use DIV_ROUND_UP() instead of a division.

For example, a RAMBlock with size=4K (which contains only one single 4K
page) will have nbits=1, then nbits/8=0, then the real bitmap won't be
sent to source at all.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: loosen recovery check when load vm

We were checking against -EIO, assuming that it will cover all IO
failures. But actually it is not. One example is that in
qemu_loadvm_section_start_full() we can have tons of places that will
return -EINVAL even if the error is caused by IO failures on the
network.

Let's loosen the recovery check logic here to cover all the error cases
happened by removing the explicit check against -EIO. After all we
won't lose anything here if any other failure happened.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: simplify check to use qemu file buffer

Firstly, renaming the old matching_page_sizes variable to
matches_target_page_size, which suites more to what it did (it only
checks against target page size rather than multiple page sizes).
Meanwhile, simplify the check logic a bit, and enhance the comments.
Should have no functional change.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180710091902 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: unify incoming processing

This is the 2nd patch to unbreak postcopy recovery.

Let's unify the migration_incoming_process() call at a single place
rather than calling it in connection setup codes. This fixes a problem
that we will go into incoming migration procedure even if we are trying
to recovery from a paused postcopy migration.

Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180627132246 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: unbreak postcopy recovery

The whole postcopy recovery logic was accidentally broken.  We need to
fix it in two steps.

This is the first step that we should do the recovery when needed.  It
was bypassed before after commit 36c2f8be2c.

Introduce postcopy_try_recovery() helper for the postcopy recovery
logic.  Call it both in migration_fd_process_incoming() and
migration_ioc_process_incoming().

Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180627132246 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: move income process out of multifd

Move the call to migration_incoming_process() out of multifd code. It's
a bit strange that we can migration generic calls in multifd code.
Instead, let multifd_recv_new_channel() return a boolean showing whether
it's ready to continue the incoming migration.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180627132246 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: delay postcopy paused state

Before this patch we firstly setup the postcopy-paused state then we
clean up the QEMUFile handles. That can be racy if there is a very fast
"migrate-recover" command running in parallel. Fix that up.

Reported-by: Peter Maydell <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20180627132246 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

block/blklogwrites: Make sure the log sector size is not too small

The sector size needs to be large enough to accommodate the data
structures for the log super block and log write entries. This was
previously not properly checked, which made it possible to cause
QEMU to badly misbehave.

Signed-off-by: Ari Sundholm <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qapi/block-core.json: Add missing documentation for blklogwrites log-append option

This was accidentally omitted. Thanks to Eric Blake for spotting this.

Signed-off-by: Ari Sundholm <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/backup: fix fleecing scheme: use serialized writes

Fleecing scheme works as follows: we want a kind of temporary snapshot
of active drive A. We create temporary image B, with B->backing = A.
Then we start backup(sync=none) from A to B. From this point, B reads
as point-in-time snapshot of A (A continues to be active drive,
accepting guest IO).

This scheme needs some additional synchronization between reads from B
and backup COW operations, otherwise, the following situation is
theoretically possible:

(assume B is qcow2, client is NBD client, reading from B)

1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
   goes up to l2 table loading (assume cache miss)

2) guest write => backup COW => qcow2 write =>
   try to take qcow2 mutex => waiting

3. l2 table loaded, we see that cluster is UNALLOCATED, go to
   "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
   bdrv_co_preadv(bs->backing, ...)

4) aha, mutex unlocked, backup COW continues, and we finally finish
   guest write and change cluster in our active disk A

5. actually, do bdrv_co_preadv(bs->backing, ...) and read
   _new updated_ data.

To avoid this, let's make backup writes serializing, to not intersect
with reads from B.

Note: we expand range of handled cases from (sync=none and
B->backing = A) to just (A in backing chain of B), to finally allow
safe reading from B during backup for all cases when A in backing chain
of B, i.e. B formally looks like point-in-time snapshot of A.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: add BDRV_REQ_SERIALISING flag

Serialized writes should be used in copy-on-write of backup(sync=none)
for image fleecing scheme.

We need to change an assert in bdrv_aligned_pwritev, added in
28de2dcd88de. The assert may fail now, because call to
wait_serialising_requests here may become first call to it for this
request with serializing flag set. It occurs if the request is aligned
(otherwise, we should already set serializing flag before calling
bdrv_aligned_pwritev and correspondingly waited for all intersecting
requests). However, for aligned requests, we should not care about
outdating of previously read data, as there no such data. Therefore,
let's just update an assert to not care about aligned requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: split flags in copy_range

Pass read flags and write flags separately. This is needed to handle
coming BDRV_REQ_NO_SERIALISING clearly in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/io: fix copy_range

Here two things are fixed:

1. Architecture

On each recursion step, we go to the child of src or dst, only for one
of them. So, it's wrong to create tracked requests for both on each
step. It leads to tracked requests duplication.

2. Wait for serializing requests on write path independently of
BDRV_REQ_NO_SERIALISING

Before commit 9ded4a01149 "backup: Use copy offloading",
BDRV_REQ_NO_SERIALISING was used for only one case: read in
copy-on-write operation during backup. Also, the flag was handled only
on read path (in bdrv_co_preadv and bdrv_aligned_preadv).

After 9ded4a01149, flag is used for not waiting serializing operations
on backup target (in same case of copy-on-write operation). This
behavior change is unsubstantiated and potentially dangerous, let's
drop it and add additional asserts and documentation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: 222: Don't run with luks

Luks needs special parameters to operate the image. Since this test is
focusing on image fleecing, skip skip that format.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

target/arm: Use correct mmu_idx for exception-return unstacking

For M-profile exception returns, the mmu index to use for exception
return unstacking is supposed to be that of wherever we are returning to:
* if returning to handler mode, privileged
* if returning to thread mode, privileged or unprivileged depending on
CONTROL.nPRIV for the destination security state

We were passing the wrong thing as the 'priv' argument to
arm_v7m_mmu_idx_for_secstate_and_priv(). The effect was that guests
which programmed the MPU to behave differently for privileged and
unprivileged code could get spurious MemManage Unstack exceptions.

Reported-by: Adithya Baglody <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 20180709124535 [email protected]

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.0-pull-request' into staging

Sanitize linux-user stdout

# gpg: Signature made Tue 10 Jul 2018 07:23:34 BST
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-3.0-pull-request:
  linux-user: Report error message on stderr, rather than stdout
  linux-user: Do not report "syscall not implemented" by default
  linux-user: Do not report "Unsupported syscall" by default

Signed-off-by: Peter Maydell <[email protected]>

block: Fix copy-on-read crash with partial final cluster

If the virtual disk size isn't aligned to full clusters,
bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full
cluster completed, which will let it run into an assertion failure:

qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.

Check for EOF, assert that we read at least as much as the read request
originally wanted to have (which is true at EOF because otherwise
bdrv_check_byte_request() would already have returned an error) and
return success early even though we couldn't copy the full cluster.

Signed-off-by: Kevin Wolf <[email protected]>

test-bdrv-drain: Test bdrv_append() to drained node

Signed-off-by: Kevin Wolf <[email protected]>

block: Poll after drain on attaching a node

Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks')
removed polling in bdrv_child_cb_drained_begin() on the grounds that the
original bdrv_drain() already will poll and BdrvChildRole.drained_begin
calls must not cause graph changes (and therefore must not call
aio_poll() or the recursion through the graph will break.

This reasoning is correct for calls through bdrv_do_drained_begin().
However, BdrvChildRole.drained_begin is also called when a node that is
already in a drained section (i.e. bdrv_do_drained_begin() has already
returned and therefore can't poll any more) is attached to a new parent.
In this case, we must explicitly poll to have all requests completed
before the drained new child can be attached to the parent.

In bdrv_replace_child_noperm(), we know that we're not inside the
recursion of bdrv_do_drained_begin() because graph changes are not
allowed there, and bdrv_replace_child_noperm() is a graph change. The
call of BdrvChildRole.drained_begin() must therefore be followed by a
BDRV_POLL_WHILE() that waits for the completion of requests.

Reported-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-07-09' into staging

Block patches for 3.0-rc0:
- qcow2 metadata overlap protection for the persistent bitmap directory
- Various bug fixes

# gpg: Signature made Mon 09 Jul 2018 19:54:38 BST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/xanclic/tags/pull-block-2018-07-09:
  qcow2: add overlap check for bitmap directory
  iotests: Add VMDK backing file correlation test
  vmdk: Fix possible segfault with non-VMDK backing
  raw: Drop superfluous semicolon
  qcow2: Drop unreachable break
  file-posix: Fix fd_open check in raw_co_copy_range_to
  qcow2: Drop unused cluster_data

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-sh4-20180709' into staging

Fix translation for gUSA regions.

# gpg: Signature made Mon 09 Jul 2018 18:46:02 BST
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-sh4-20180709:
  target/sh4: Fix translator.c assertion failure for gUSA

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine/NUMA fixes for -rc0

* Properly free device_memory at machine_finalize()
* Fix implicit NUMA initialization regression (for machines with
  auto_enable_numa_with_memhp=true)

# gpg: Signature made Mon 09 Jul 2018 18:40:38 BST
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()
  machine: properly free device_memory

Signed-off-by: Peter Maydell <[email protected]>

qcow2: add overlap check for bitmap directory

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20180705151515 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Add VMDK backing file correlation test

This new test verifies that VMDK backing file reads fail when the
backing file has a non-matching CID. This includes non-VMDK backing
files.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20180702210721 [email protected]
Signed-off-by: Max Reitz <[email protected]>

vmdk: Fix possible segfault with non-VMDK backing

VMDK performs a probing check in vmdk_co_create_opts() to prevent the
user from assigning non-VMDK files as a backing file, because it only
supports VMDK backing files.  However, with the @backing runtime option,
it is possible to assign arbitrary nodes as backing nodes, regardless of
what the image header says.  Therefore, VMDK may not just access backing
nodes assuming they are VMDK nodes -- which it does, because it needs to
compare the backing file's CID with the overlay's parentCID value, and
naturally the backing file only has a CID when it's a VMDK file.
Instead, it should report the CID of non-VMDK backing files not to match
the overlay because clearly a non-present CID does not match.

Without this change, vmdk_read_cid() reads from the backing file's
bs->file, which may be NULL (in which case we get a segfault).  Also, it
interprets bs->opaque as a BDRVVmdkState and then reads from the
.desc_offset field, which usually will just return some arbitrary value
which then results in either garbage to be read, or bdrv_pread() to
return an error, both of which result in a non-matching CID to be
reported.

(In a very unlikely case, we could read something that looks like a
VMDK descriptor, and then get a CID which might actually match.  But
that is highly unlikely, and the only result would be that VMDK accepts
the backing file which is not too bad (albeit unintentional).)

((And in theory, the seek to .desc_offset might leak data from another
block driver's opaque object.  But then again, the user should realize
very quickly that a non-VMDK backing file does not work (because the
read will very likely fail, due to the reasons given above), so this
should not be exploitable.))

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20180702210721 [email protected]
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

raw: Drop superfluous semicolon

Reported-by: Max Reitz <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-id: 20180702025836 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Drop unreachable break

Reported-by: Max Reitz <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-id: 20180702025836 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

file-posix: Fix fd_open check in raw_co_copy_range_to

One of them is a typo. But update both to be more readable.

Reported-by: Kevin Wolf <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-id: 20180702025836 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Drop unused cluster_data

Reported-by: Max Reitz <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-id: 20180702025836 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()

Commit 7a3099fc9c5c("numa: postpone options post-processing till machine_run_board_init()")
broke the commit 7b8be49d36fc("NUMA: Enable adding NUMA node implicitly").

The machine_run_board_init() doesn't do NUMA setup if nb_numa_nodes=0,
but the numa_complete_configuration need add a new node if memory hotplug
is enabled (slots > 0) even nb_numa_nodes=0.

So, Remove the check for numa_complete_configuration() to fix this.

Fixes 7a3099fc9c5c("numa: postpone options post-processing till machine_run_board_init()")
Signed-off-by: Dou Liyang <[email protected]>
Message-Id: <20180704132239 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

machine: properly free device_memory

Machines might have inititalized device_memory if they support memory
devices, so let's properly free it.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20180702094152 [email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target/sh4: Fix translator.c assertion failure for gUSA

The translator loop does not allow the tb_start hook to set
dc->base.is_jmp; the only hook allowed to do that is translate_insn.

Split the work between init_disas_context where we validate
the gUSA parameters, and translate_insn where we emit code.

Tested-by: Alex Bennée <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging

x86 fix for -rc0

* Fix EPYC-IBPB compat code

# gpg: Signature made Mon 09 Jul 2018 18:21:27 BST
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-next-pull-request:
  pc: Fix typo on PC_COMPAT_2_12

Signed-off-by: Peter Maydell <[email protected]>

pc: Fix typo on PC_COMPAT_2_12

I forgot a hyphen when amending the compat code on commit
e0051647 ("i386: Enable TOPOEXT feature on AMD EPYC CPU").

Fixes: e00516475c270dcb6705753da96063f95699abf2
Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20180703011026 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

translate-all: honour CF_NOCACHE in tb_gen_code

This fixes a record-replay regression introduced by 95590e2
("translate-all: discard TB when tb_link_page returns an existing
matching TB", 2018-06-15). The problem is that code using CF_NOCACHE
assumes that the TB returned from tb_gen_code is always a
newly-generated one. This assumption, however, was broken in
the aforementioned commit.

Fix it by honouring CF_NOCACHE, so that tb_gen_code always
returns a newly-generated TB when CF_NOCACHE is passed to it.
Do this by avoiding the TB hash table if CF_NOCACHE is set.

Reported-by: Pavel Dovgalyuk <[email protected]>
Tested-by: Pavel Dovgalyuk <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Tested-by: Alistair Francis <[email protected]>
Message-id: 1530806837 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180709' into staging

target-arm queue:
* hw/net/dp8393x: don't make prom region 'nomigrate'
* boards.h: Remove doc comment reference to nonexistent function
* hw/sd/omap_mmc: Split 'pseudo-reset' from 'power-on-reset'
* target/arm: Fix do_predset for large VL
* tcg: Restrict check_size_impl to multiples of the line size
* target/arm: Suppress Coverity warning for PRF
* hw/timer/cmsdk-apb-timer: fix minor corner-case bugs and
   suppress spurious warnings when running Linux's timer driver
* hw/arm/smmu-common: Fix devfn computation in smmu_iommu_mr

# gpg: Signature made Mon 09 Jul 2018 14:53:38 BST
# gpg:                using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20180709:
  hw/net/dp8393x: don't make prom region 'nomigrate'
  boards.h: Remove doc comment reference to nonexistent function
  hw/sd/omap_mmc: Split 'pseudo-reset' from 'power-on-reset'
  target/arm: Fix do_predset for large VL
  tcg: Restrict check_size_impl to multiples of the line size
  target/arm: Suppress Coverity warning for PRF
  hw/timer/cmsdk-apb-timer: run or stop timer on writes to RELOAD and VALUE
  hw/timer/cmsdk-apb-timer: Correctly identify and set one-shot mode
  hw/timer/cmsdk-apb-timer: Correct ptimer policy settings
  ptimer: Add TRIGGER_ONLY_ON_DECREMENT policy option
  hw/arm/smmu-common: Fix devfn computation in smmu_iommu_mr

Signed-off-by: Peter Maydell <[email protected]>

hw/net/dp8393x: don't make prom region 'nomigrate'

Currently we use memory_region_init_rom_nomigrate() to create
the "dp3893x-prom" memory region, and we don't manually register
it with vmstate_register_ram(). This currently means that its
contents are migrated but as a ram block whose name is the empty
string; in future it may mean they are not migrated at all. Use
memory_region_init_ram() instead.

Note that this is a a cross-version migration compatibility break
for the MIPS "magnum" and "pica61" machines.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>
Message-id: 20180706174309 [email protected]

boards.h: Remove doc comment reference to nonexistent function

commit b08199c6fbea1 accidentally added a reference to a doc
comment to a nonexistent memory_region_allocate_aux_memory().
This was a leftover from a previous version of the patchset
which defined memory_region_allocate_aux_memory() for
"allocate RAM MemoryRegion and register it for migration"
and left "memory_region_init_ram()" with its original semantics
of "allocate RAM MR but do not register for migration". In
the end we decided on the approach of "memory_region_init_ram()
registers the MR for migration, and memory_region_init_ram_nomigrate()
is a new function which does not", but this comment change
got left in by mistake. Revert that part of the commit.

Reported-by: Thomas Huth <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Message-id: 20180702130605 [email protected]

hw/sd/omap_mmc: Split 'pseudo-reset' from 'power-on-reset'

DeviceClass::reset models a "cold power-on" reset which can
also be used to powercycle a device; but there is no "hot reset"
(a.k.a. soft-reset) method available.

The OMAP MMC Power-Up Control bit is not designed to powercycle
a card, but to disable it without powering it off (pseudo-reset):

  Multimedia Card (MMC/SD/SDIO) Interface [SPRU765A]

  MMC_CON[11] Power-Up Control (POW)
  This bit must be set to 1 before any valid transaction to either
  MMC/SD or SPI memory cards.
  When 1, the card is considered powered-up and the controller core
  is enabled.
  When 0, the card is considered powered-down (system dependent),
  and the controller core logic is in pseudo-reset state. This is,
  the MMC_STAT flags and the FIFO pointers are reset, any access to
  MMC_DATA[DATA] has no effect, a write into the MMC.CMD register
  is ignored, and a setting of MMC_SPI[STR] to 1 is ignored.

By splitting the 'pseudo-reset' code out of the 'power-on' reset
function, this patch fixes a latent bug in omap_mmc_write(MMC_CON)i
recently exposed by ecd219f7abb.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20180706162155 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Fix do_predset for large VL

Use MAKE_64BIT_MASK instead of open-coding. Remove an odd
vector size check that is unlikely to be more profitable
than 3 64-bit integer stores. Correct the iteration for WORD
to avoid writing too much data.

Fixes RISU tests of PTRUE for VL 256.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Tested-by: Alex Bennée <[email protected]>
Message-id: 20180705191929 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

tcg: Restrict check_size_impl to multiples of the line size

Normally this is automatic in the size restrictions that are placed
on vector sizes coming from the implementation. However, for the
legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
24 bytes of the vector register. Without this check, do_dup selects
TCG_TYPE_V128 and clears only 16 bytes.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Tested-by: Alex Bennée <[email protected]>
Message-id: 20180705191929 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

target/arm: Suppress Coverity warning for PRF

These instructions must perform the sve_access_check, but
since they are implemented as NOPs there is no generated
code to elide when the access check fails.

Fixes: Coverity issues 1393780 & 1393779.
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>