Git Repo - qemu.git/log

9pfs: introduce relative_openat_nofollow() helper

When using the passthrough security mode, symbolic links created by the
guest are actual symbolic links on the host file system.

Since the resolution of symbolic links during path walk is supposed to
occur on the client side. The server should hence never receive any path
pointing to an actual symbolic link. This isn't guaranteed by the protocol
though, and malicious code in the guest can trick the server to issue
various syscalls on paths whose one or more elements are symbolic links.
In the case of the "local" backend using the "passthrough" or "none"
security modes, the guest can directly create symbolic links to arbitrary
locations on the host (as per spec). The "mapped-xattr" and "mapped-file"
security modes are also affected to a lesser extent as they require some
help from an external entity to create actual symbolic links on the host,
i.e. another guest using "passthrough" mode for example.

The current code hence relies on O_NOFOLLOW and "l*()" variants of system
calls. Unfortunately, this only applies to the rightmost path component.
A guest could maliciously replace any component in a trusted path with a
symbolic link. This could allow any guest to escape a virtfs shared folder.

This patch introduces a variant of the openat() syscall that successively
opens each path element with O_NOFOLLOW. When passing a file descriptor
pointing to a trusted directory, one is guaranteed to be returned a
file descriptor pointing to a path which is beneath the trusted directory.
This will be used by subsequent patches to implement symlink-safe path walk
for any access to the backend.

Symbolic links aren't the only threats actually: a malicious guest could
change a path element to point to other types of file with undesirable
effects:
- a named pipe or any other thing that would cause openat() to block
- a terminal device which would become QEMU's controlling terminal

These issues can be addressed with O_NONBLOCK and O_NOCTTY.

Two helpers are introduced: one to open intermediate path elements and one
to open the rightmost path element.

Suggested-by: Jann Horn <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
(renamed openat_nofollow() to relative_openat_nofollow(),
assert path is relative and doesn't contain '//',
fixed side-effect in assert, Greg Kurz)
Signed-off-by: Greg Kurz <[email protected]>

9pfs: remove side-effects in local_open() and local_opendir()

If these functions fail, they should not change *fs. Let's use local
variables to fix this.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

9pfs: remove side-effects in local_init()

If this function fails, it should not modify *ctx.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

9pfs: local: move xattr security ops to 9p-xattr.c

These functions are always called indirectly. It really doesn't make sense
for them to sit in a header file.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20170227-1' into staging

gtk: fix kbd on xwayland
vnc: fix double free issues
opengl improvements

# gpg: Signature made Mon 27 Feb 2017 16:11:30 GMT
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-ui-20170227-1:
  vnc: fix double free issues
  spice: add display & head options
  ui: Use XkbGetMap and XkbGetNames instead of XkbGetKeyboard
  gtk-egl: add scanout_disable support
  sdl2: add scanout_disable support
  spice: add scanout_disable support
  virtio-gpu: use dpy_gl_scanout_disable
  console: add dpy_gl_scanout_disable
  console: rename dpy_gl_scanout to dpy_gl_scanout_texture

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2017-02-27-1' into staging

Merge qcrypto 2017/02/27 v1

# gpg: Signature made Mon 27 Feb 2017 13:37:34 GMT
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg:                 aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qcrypto-2017-02-27-1:
  crypto: assert cipher algorithm is always valid
  crypto: fix leak in ivgen essiv init

Signed-off-by: Peter Maydell <[email protected]>

vnc: fix double free issues

Reported by Coverity: CID 1371242, 1371243, 1371244.

Cc: Paolo Bonzini <[email protected]>
Cc: Peter Maydell <[email protected]>
Cc: Daniel P. Berrange <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487682332 [email protected]

spice: add display & head options

This allows to specify display and head to use, simliar to vnc.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 1487663858 [email protected]

ui: Use XkbGetMap and XkbGetNames instead of XkbGetKeyboard

XkbGetKeyboard does not work in XWayland and even on non-Wayland
X11 servers its use is discouraged:

  https://bugs.freedesktop.org/show_bug.cgi?id=89240

This resolves a problem whereby QEMU prints

  "could not lookup keycode name"

on startup when running under XWayland. Keymap handling is
however still broken after this commit, since Xwayland is
reporting a keymap we can't handle

  "unknown keycodes `(unnamed)', please report to [email protected]"

NB, native Wayland support (which is the default under GTK3) is
not affected - only XWayland (which can be requested with GDK_BACKEND
on GTK3, and is the only option for GTK2).

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170227132343 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

gtk-egl: add scanout_disable support

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

sdl2: add scanout_disable support

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

spice: add scanout_disable support

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

virtio-gpu: use dpy_gl_scanout_disable

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

console: add dpy_gl_scanout_disable

Helper function (and DisplayChangeListenerOps ptr) to disable scanouts.
Replaces using dpy_gl_scanout_texture with 0x0 size and no texture
specified.

Allows cleanups to make the io and gfx emulation code more readable.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

console: rename dpy_gl_scanout to dpy_gl_scanout_texture

We'll add a variant which accepts dmabufs soon. Change
the name so we can easily disturgish the two variants.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 1487669841 [email protected]

crypto: assert cipher algorithm is always valid

Crypto routines 'qcrypto_cipher_get_block_len' and
'qcrypto_cipher_get_key_len' return non-zero cipher block and key
lengths from static arrays 'alg_block_len[]' and 'alg_key_len[]'
respectively. Returning 'zero(0)' value from either of them would
likely lead to an error condition.

Signed-off-by: Prasad J Pandit <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>

crypto: fix leak in ivgen essiv init

On error path, the 'salt' doesn't been freed thus leading
a memory leak. This patch avoid this.

Signed-off-by: Li Qiang <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>

tests-aio-multithread: use atomic_read properly

nodes[id].next is written by other threads. If atomic_read is not used
(matching atomic_set in mcs_mutex_lock!) the compiler can optimize the
whole "if" away!

Reported-by: Alex Bennée <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Tested-by: Greg Kurz <[email protected]>
Message-id: 20170227111726 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/artyom/tags/pull-sun4v-20170226' into staging

Pull request for Niagara patches 2017 02 26

# gpg: Signature made Sun 26 Feb 2017 21:56:06 GMT
# gpg:                using RSA key 0x3360C3F7411A125F
# gpg: Good signature from "Artyom Tarasenko <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2AD8 6149 17F4 B2D7 05C0  BB12 3360 C3F7 411A 125F

* remotes/artyom/tags/pull-sun4v-20170226:
  niagara: check if a serial port is available
  niagara: fail if a firmware file is missing

Signed-off-by: Peter Maydell <[email protected]>

niagara: check if a serial port is available

Reported-by: Markus Armbruster <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Artyom Tarasenko <[email protected]>

niagara: fail if a firmware file is missing

fail if a firmware file is missing and not qtest_enabled(),
the later is necessary to allow some basic tests if
firmware is not available

Suggested-by: Peter Maydell <[email protected]>
Signed-off-by: Artyom Tarasenko <[email protected]>

Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging

slirp updates

# gpg: Signature made Sun 26 Feb 2017 14:40:00 GMT
# gpg:                using RSA key 0xB0A51BF58C9179C5
# gpg: Good signature from "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg:                 aka "Samuel Thibault <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: AEBF 7448 FAB9 453A 4552  390E B0A5 1BF5 8C91 79C5

* remotes/thibault/tags/samuel-thibault:
  slirp: tcp_listen(): Don't try to close() an fd we never opened
  slirp: Convert mbufs to use g_malloc() and g_free()
  slirp: Check qemu_socket() return value in udp_listen()

Signed-off-by: Peter Maydell <[email protected]>

slirp: tcp_listen(): Don't try to close() an fd we never opened

Coverity points out (CID 1005725) that an error-exit path in tcp_listen()
will try to close(s) even if the reason it got there was that the
qemu_socket() failed and s was never opened. Not only that, this isn't even
the right function to use, because we need closesocket() to do the right
thing on Windows. Change to using the right function and only calling it if
needed.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Samuel Thibault <[email protected]>

slirp: Convert mbufs to use g_malloc() and g_free()

The mbuf code currently doesn't check the result of doing a malloc()
or realloc() of its data (spotted by Coverity, CID 1238946).
Since the m_inc() API assumes that extending an mbuf must succeed,
just convert to g_malloc() and g_free().

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Samuel Thibault <[email protected]>

slirp: Check qemu_socket() return value in udp_listen()

Check the return value from qemu_socket() rather than trying to
pass it to bind() as an fd argument even if it's negative.
This wouldn't have caused any negative consequences, because
it won't be a valid fd number and the bind call will fail;
but Coverity complains (CID 1005723).

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Samuel Thibault <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Fri 24 Feb 2017 18:08:26 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  tests: Use opened block node for block job tests
  vvfat: Use opened node as backing file
  block: Add bdrv_new_open_driver()
  block: Factor out bdrv_open_driver()
  block: Use BlockBackend for image probing
  block: Factor out bdrv_open_child_bs()
  block: Attach bs->file only during .bdrv_open()
  block: Pass BdrvChild to bdrv_truncate()
  mirror: Resize active commit base in mirror_run()
  qcow2: Use BB for resizing in qcow2_amend_options()
  blockdev: Use BlockBackend to resize in qmp_block_resize()
  iotests: Fix another race in 030
  qemu-img: Improve documentation for PREALLOC_MODE_FALLOC
  qemu-img: Truncate before full preallocation
  qemu-img: Add tests for raw image preallocation
  qemu-img: Do not truncate before preallocation
  qemu-iotests: redirect nbd server stdout to /dev/null
  qemu-iotests: add ability to exclude certain protocols from tests
  qemu-iotests: Test 137 only supports 'file' protocol

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Fri 24 Feb 2017 17:45:53 GMT
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  RBD: Add support readv,writev for rbd
  block/nfs: try to avoid the bounce buffer in pwritev
  block/nfs: convert to preadv / pwritev

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/yongbok/tags/mips-20170224-2' into staging

MIPS patches 2017-02-24-2

CHanges:
* Add the Boston board with fixing the make check issue on 32-bit hosts.

# gpg: Signature made Fri 24 Feb 2017 11:43:45 GMT
# gpg:                using RSA key 0x2238EB86D5F797C2
# gpg: Good signature from "Yongbok Kim <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 8600 4CF5 3415 A5D9 4CFA  2B5C 2238 EB86 D5F7 97C2

* remotes/yongbok/tags/mips-20170224-2:
  hw/mips: MIPS Boston board support

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-240217-1' into staging

This is the MTTCG pull-request as posted yesterday.

# gpg: Signature made Fri 24 Feb 2017 11:17:51 GMT
# gpg:                using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* remotes/stsquad/tags/pull-mttcg-240217-1: (24 commits)
  tcg: enable MTTCG by default for ARM on x86 hosts
  hw/misc/imx6_src: defer clearing of SRC_SCR reset bits
  target-arm: ensure all cross vCPUs TLB flushes complete
  target-arm: don't generate WFE/YIELD calls for MTTCG
  target-arm/powerctl: defer cpu reset work to CPU context
  cputlb: introduce tlb_flush_*_all_cpus[_synced]
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: add tlb_flush_by_mmuidx async routines
  cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap
  cputlb: introduce tlb_flush_* async work.
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: add assert_cpu_is_self checks
  tcg: handle EXCP_ATOMIC exception for system emulation
  tcg: enable thread-per-vCPU
  tcg: enable tb_lock() for SoftMMU
  tcg: remove global exit_request
  tcg: drop global lock during TCG code execution
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: add options for enabling MTTCG
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170224' into staging

A selection of s390x patches:
- cleanups, fixes and improvements
- program check loop detection (useful with the corresponding kernel
  patch)
- wire up virtio-crypto for ccw
- and finally support many virtqueues for virtio-ccw

# gpg: Signature made Fri 24 Feb 2017 09:19:19 GMT
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20170224:
  s390x/css: handle format-0 TIC CCW correctly
  s390x/arch_dump: pass cpuid into notes sections
  s390x/arch_dump: use proper note name and note size
  virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues
  s390x: bump ADAPTER_ROUTES_MAX_GSI
  virtio-ccw: check flic->adapter_routes_max_batch
  s390x: add property adapter_routes_max_batch
  virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND
  virtio-ccw: add virtio-crypto-ccw device
  virtio-ccw: handle virtio 1 only devices
  s390x/flic: fail migration on source already
  s390x/kvm: detect some program check loops
  s390x/s390-virtio: get rid of DPRINTF

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into staging

Docker testing and shippable patches

Hi Peter,

These are testing and build automation patches:

- Shippable.com powered CI config
- Docker cross build
- Fixes and MAINTAINERS tweaks.

# gpg: Signature made Fri 24 Feb 2017 06:31:10 GMT
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/for-upstream:
  docker: Install python2 explicitly in docker image
  MAINTAINERS: merge Build and test automation with Docker tests
  .shippable.yml: new CI provider
  new: debian docker targets for cross-compiling
  tests/docker: add basic user mapping support

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/armbru/tags/pull-util-2017-02-23' into staging

option cutils: Fix and clean up number conversions

# gpg: Signature made Thu 23 Feb 2017 19:41:17 GMT
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg:                 aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-util-2017-02-23: (24 commits)
  option: Fix checking of sizes for overflow and trailing crap
  util/cutils: Change qemu_strtosz*() from int64_t to uint64_t
  util/cutils: Return qemu_strtosz*() error and value separately
  util/cutils: Let qemu_strtosz*() optionally reject trailing crap
  qemu-img: Wrap cvtnum() around qemu_strtosz()
  test-cutils: Drop suffix from test_qemu_strtosz_simple()
  test-cutils: Use qemu_strtosz() more often
  util/cutils: Drop QEMU_STRTOSZ_DEFSUFFIX_* macros
  util/cutils: New qemu_strtosz()
  util/cutils: Rename qemu_strtosz() to qemu_strtosz_MiB()
  util/cutils: New qemu_strtosz_metric()
  test-cutils: Cover qemu_strtosz() around range limits
  test-cutils: Cover qemu_strtosz() with trailing crap
  test-cutils: Cover qemu_strtosz() invalid input
  test-cutils: Add missing qemu_strtosz()... endptr checks
  option: Fix to reject invalid and overflowing numbers
  util/cutils: Clean up control flow around qemu_strtol() a bit
  util/cutils: Clean up variable names around qemu_strtol()
  util/cutils: Rename qemu_strtoll(), qemu_strtoull()
  util/cutils: Rewrite documentation of qemu_strtol() & friends
  ...

Signed-off-by: Peter Maydell <[email protected]>

RBD: Add support readv,writev for rbd

Rbd can do readv and writev directly, so wo do not need to transform
iov to buf or vice versa any more.

Signed-off-by: tianqing <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>

block/nfs: try to avoid the bounce buffer in pwritev

if the passed qiov contains exactly one iov we can
pass the buffer directly.

Signed-off-by: Peter Lieven <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Message-id: 1487349541 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

block/nfs: convert to preadv / pwritev

Signed-off-by: Peter Lieven <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Message-id: 1487349541 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20170223.0' into staging

VFIO updates 2017-02-23

- Report qdev_unplug errors (Alex Williamson)
- Fix ecap ID 0 handling, improve comment (Alex Williamson)
- Disable IGD stolen memory in UPT mode too (Xiong Zhang)

# gpg: Signature made Thu 23 Feb 2017 19:04:17 GMT
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20170223.0:
  vfio/pci-quirks.c: Disable stolen memory for igd VFIO
  vfio/pci: Improve extended capability comments, skip masked caps
  vfio/pci: Report errors from qdev_unplug() via device request

Signed-off-by: Peter Maydell <[email protected]>

tests: Use opened block node for block job tests

blk_insert_bs() and block job related functions will soon require an
opened block node (permission calculations will involve the block
driver), so let our tests be consistent with the real users in this
respect.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

vvfat: Use opened node as backing file

We should not try to assign a not yet opened node as the backing file,
because as soon as the permission system is added it will fail. The
just added bdrv_new_open_driver() function is the right tool to open a
file with an internal driver, use it.

In case anyone wonders whether that magic fake backing file to trigger a
special action on 'commit' actually works today: No, not for me. One
reason is that we've been adding a raw format driver on top for several
years now and raw doesn't support commit. Other reasons include that the
backing file isn't writable and the driver doesn't support reopen, and
it's also size 0 and the driver doesn't support bdrv_truncate. All of
these are easily fixable, but then 'commit' ended up in an infinite loop
deep in the vvfat code for me, so I thought I'd best leave it alone. I'm
not really sure what it was supposed to do anyway.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Add bdrv_new_open_driver()

This function allows to create more or less normal BlockDriverStates
even for BlockDrivers that aren't globally registered (e.g. helper
filters for block jobs).

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Factor out bdrv_open_driver()

This is a function that doesn't do any option parsing, but just does
some basic BlockDriverState setup and calls the .bdrv_open() function of
the block driver.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Use BlockBackend for image probing

This fixes the use of a parent-less BdrvChild in bdrv_open_inherit() by
converting it into a BlockBackend. Which is exactly what it should be,
image probing is an external, standalone user of a node. The requests
can't be considered to originate from the format driver node because
that one isn't even opened yet.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Factor out bdrv_open_child_bs()

This is the part of bdrv_open_child() that opens a BDS with option
inheritance, but doesn't attach it as a child to the parent yet.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Attach bs->file only during .bdrv_open()

The way that attaching bs->file worked was a bit unusual in that it was
the only child that would be attached to a node which is not opened yet.
Because of this, the block layer couldn't know yet which permissions the
driver would eventually need.

This patch moves the point where bs->file is attached to the beginning
of the individual .bdrv_open() implementations, so drivers already know
what they are going to do with the child. This is also more consistent
with how driver-specific children work.

For a moment, bdrv_open() gets its own BdrvChild to perform image
probing, but instead of directly assigning this BdrvChild to the BDS, it
becomes a temporary one and the node name is passed as an option to the
drivers, so that they can simply use bdrv_open_child() to create another
reference for their own use.

This duplicated child for (the not opened yet) bs is not the final
state, a follow-up patch will change the image probing code to use a
BlockBackend, which is completely independent of bs.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Pass BdrvChild to bdrv_truncate()

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

mirror: Resize active commit base in mirror_run()

This is more consistent with the commit block job, and it moves the code
to a place where we already have the necessary BlockBackends to resize
the base image when bdrv_truncate() is changed to require a BdrvChild.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

qcow2: Use BB for resizing in qcow2_amend_options()

In order to able to convert bdrv_truncate() to take a BdrvChild and
later to correctly check the resize permission here, we need to use a
BlockBackend for resizing the image.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

blockdev: Use BlockBackend to resize in qmp_block_resize()

In order to be able to do permission checking and to keep working with
the BdrvChild based bdrv_truncate() that this involves, we need to
create a temporary BlockBackend to resize the image.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests: Fix another race in 030

We can't rely on a non-paused job to be present and running for us.
Assume that if the job is not present that it completed already.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: Improve documentation for PREALLOC_MODE_FALLOC

Now that we are truncating the file in both PREALLOC_MODE_FULL and
PREALLOC_MODE_OFF, not truncating in PREALLOC_MODE_FALLOC looks odd.
Add a comment explaining why we do not truncate in this case.

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: Truncate before full preallocation

In a previous commit (qemu-img: Do not truncate before preallocation) we
moved truncate to the PREALLOC_MODE_OFF branch to avoid slowdown in
posix_fallocate().

However this change is not optimal when using PREALLOC_MODE_FULL, since
knowing the final size from the beginning could allow the file system
driver to do less allocations and possibly avoid fragmentation of the
file.

Now we truncate also before doing full preallocation.

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: Add tests for raw image preallocation

Add tests for creating raw image with and without the preallocation
option.

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: Do not truncate before preallocation

When using file system that does not support fallocate() (e.g. NFS <
4.2), truncating the file only when preallocation=OFF speeds up creating
raw file.

Here is example run, tested on Fedora 24 machine, creating raw file on
NFS version 3 server.

$ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real 0m21.185s
user 0m0.022s
sys 0m0.574s

$ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real 0m11.601s
user 0m0.016s
sys 0m0.525s

$ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s

real 0m16.104s
user 0m0.009s
sys 0m0.220s

Running with strace we can see that without this change we do one
pread() and one pwrite() for each block. With this change, we do only
one pwrite() per block.

$ strace ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 8192
...
pread64(9, "\0", 1, 4095)               = 1
pwrite64(9, "\0", 1, 4095)              = 1
pread64(9, "\0", 1, 8191)               = 1
pwrite64(9, "\0", 1, 8191)              = 1

$ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192
...
pwrite64(9, "\0", 1, 4095)              = 1
pwrite64(9, "\0", 1, 8191)              = 1

This happens because posix_fallocate is checking if each block is
allocated before writing a byte to the block, and when truncating the
file before preallocation, all blocks are unallocated.

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: redirect nbd server stdout to /dev/null

Some iotests (e.g. 174) try to filter the output of _make_test_image by
piping the stdout. Pipe the server stdout to /dev/null, so that filter
pipe does not need to wait until process completion.

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: add ability to exclude certain protocols from tests

Add the ability for shell script tests to exclude specific
protocols. This is useful to allow all protocols except ones known to
not support a feature used in the test (e.g. .bdrv_create).

Signed-off-by: Jeff Cody <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test 137 only supports 'file' protocol

Since test 137 make uses of qcow2.py, only local files are supported.

Signed-off-by: Jeff Cody <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-02-22' into staging

QAPI patches for 2017-02-22

# gpg: Signature made Wed 22 Feb 2017 19:12:27 GMT
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg:                 aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2017-02-22:
  block: Don't bother asserting type of output visitor's output
  monitor: Clean up handle_hmp_command() a bit
  tests: Don't check qobject_type() before qobject_to_qbool()
  tests: Don't check qobject_type() before qobject_to_qfloat()
  tests: Don't check qobject_type() before qobject_to_qint()
  tests: Don't check qobject_type() before qobject_to_qstring()
  tests: Don't check qobject_type() before qobject_to_qlist()
  Don't check qobject_type() before qobject_to_qdict()
  test-qmp-event: Simplify and tighten event_test_emit()
  libqtest: Clean up qmp_response() a bit
  check-qjson: Simplify around compare_litqobj_to_qobj()
  check-qdict: Tighten qdict_crumple_test_recursive() some
  check-qdict: Simplify qdict_crumple_test_recursive()
  qdict: Make qdict_get_qlist() safe like qdict_get_qdict()
  net: Flatten simple union NetLegacyOptions
  numa: Flatten simple union NumaOptions

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-cve-2017-2620-20170224-1' into staging

cirrus: add blit_is_unsafe call to cirrus_bitblt_cputovideo (CVE-2017-2620)

# gpg: Signature made Fri 24 Feb 2017 13:42:39 GMT
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-cve-2017-2620-20170224-1:
  cirrus: add blit_is_unsafe call to cirrus_bitblt_cputovideo (CVE-2017-2620)

Signed-off-by: Peter Maydell <[email protected]>

cirrus: add blit_is_unsafe call to cirrus_bitblt_cputovideo (CVE-2017-2620)

CIRRUS_BLTMODE_MEMSYSSRC blits do NOT check blit destination
and blit width, at all. Oops. Fix it.

Security impact: high.

The missing blit destination check allows to write to host memory.
Basically same as CVE-2014-8106 for the other blit variants.

Cc: [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20170223-1' into staging

usb: ohci bugfix, switch core to unrealize, xhci property cleanup

# gpg: Signature made Thu 23 Feb 2017 15:37:57 GMT
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-usb-20170223-1:
  xhci: properties cleanup
  usb: ohci: fix error return code in servicing td
  usb: replace handle_destroy with unrealize

Signed-off-by: Peter Maydell <[email protected]>

hw/mips: MIPS Boston board support

Introduce support for emulating the MIPS Boston development board. The
Boston board is built around an FPGA & 3 PCIe controllers, one of which
is connected to an Intel EG20T Platform Controller Hub. It is used
during the development & debug of new CPUs and the software intended to
run on them, and is essentially the successor to the older MIPS Malta
board.

This patch does not implement the EG20T, instead connecting an already
supported ICH-9 AHCI controller. Whilst this isn't accurate it's enough
for typical stock Boston software (eg. Linux kernels) to work with hard
disks given that both the ICH-9 & EG20T implement the AHCI
specification.

Boston boards typically boot kernels in the FIT image format, and this
patch will treat kernels provided to QEMU as such. When loading a kernel
directly, the board code will generate minimal firmware much as the
Malta board code does. This firmware will set up the CM, CPC & GIC
register base addresses then set argument registers & jump to the kernel
entry point. Alternatively, bootloader code may be loaded using the bios
argument in which case no firmware will be generated & execution will
proceed from the start of the boot code at the default MIPS boot
exception vector (offset 0x1fc00000 into (c)kseg1).

Currently real Boston boards are always used with FPGA bitfiles that
include a Global Interrupt Controller (GIC), so the interrupt
configuration is only defined for such cases. Therefore the board will
only allow use of CPUs which implement the CPS components, including the
GIC, and will otherwise exit with a message.

Signed-off-by: Paul Burton <[email protected]>
Reviewed-by: Yongbok Kim <[email protected]>
[[email protected]:
  isolated boston machine support for mips64el.
  updated for recent Chardev changes.
  ignore missing bios/kernel for qtest.
  added default -drive to if=ide explicitly.
  changed default memory size into 1G due to make check failure
  on 32-bit hosts]
Signed-off-by: Yongbok Kim <[email protected]>

tcg: enable MTTCG by default for ARM on x86 hosts

This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

  - The ARM translate.c/translate-64.c have been converted to
    - use MTTCG safe atomic primitives
    - emit the appropriate barrier ops
  - The ARM machine has been updated to
    - hold the BQL when modifying shared cross-vCPU state
    - defer powerctl changes to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Acked-by: Peter Maydell <[email protected]>
Tested-by: Pranith Kumar <[email protected]>
Reviewed-by: Pranith Kumar <[email protected]>

hw/misc/imx6_src: defer clearing of SRC_SCR reset bits

The arm_reset_cpu/set_cpu_on/set_cpu_off() functions do their work
asynchronously in the target vCPUs context. As a result we need to
ensure the SRC_SCR reset bits correctly report the reset status at the
right time. To do this we defer the clearing of the bit with an async
job which will run after the work queued by ARM powerctl functions.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

target-arm: ensure all cross vCPUs TLB flushes complete

Previously flushes on other vCPUs would only get serviced when they
exited their TranslationBlocks. While this isn't overly problematic it
violates the semantics of TLB flush from the point of view of source
vCPU.

To solve this we call the cputlb *_all_cpus_synced() functions to do
the flushes which ensures all flushes are completed by the time the
vCPU next schedules its own work. As the TLB instructions are modelled
as CP writes the TB ends at this point meaning cpu->exit_request will
be checked before the next instruction is executed.

Deferring the work until the architectural sync point is a possible
future optimisation.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

target-arm: don't generate WFE/YIELD calls for MTTCG

The WFE and YIELD instructions are really only hints and in TCG's case
they were useful to move the scheduling on from one vCPU to the next. In
the parallel context (MTTCG) this just causes an unnecessary cpu_exit
and contention of the BQL.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

target-arm/powerctl: defer cpu reset work to CPU context

When switching a new vCPU on we want to complete a bunch of the setup
work before we start scheduling the vCPU thread. To do this cleanly we
defer vCPU setup to async work which will run the vCPUs execution
context as the thread is woken up. The scheduling of the work will kick
the vCPU awake.

This avoids potential races in MTTCG system emulation.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

cputlb: introduce tlb_flush_*_all_cpus[_synced]

This introduces support to the cputlb API for flushing all CPUs TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves.

An additional variant of the API (_synced) will cause the source vCPUs
work to be scheduled as "safe work". The result will be all the flush
operations will be complete by the time the originating vCPU executes
its safe work. The calling implementation can either end the TB
straight away (which will then pick up the cpu->exit_request on
entering the next block) or defer the exit until the architectural
sync point (usually a barrier instruction).

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

cputlb: atomically update tlb fields used by tlb_reset_dirty

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.

The dirty helper function is made static as it isn't used outside of
cputlb.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

cputlb: add tlb_flush_by_mmuidx async routines

This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the var_list of MMU indexes into a form that can be punted
to an asynchronous job.

The pending_tlb_flush field on QOM's CPU structure also becomes a
bitfield rather than a boolean.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap

While the vargs approach was flexible the original MTTCG ended up
having munge the bits to a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.

For ARM some the resulting flushes end up being quite long so to aid
readability I've tended to move the index shifting to a new line so
all the bits being or-ed together line up nicely, for example:

    tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                             (1 << ARMMMUIdx_S1SE1) |
                             (1 << ARMMMUIdx_S1SE0));

Signed-off-by: Alex Bennée <[email protected]>
[AT: SPARC parts only]
Reviewed-by: Artyom Tarasenko <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
[PM: ARM parts only]
Reviewed-by: Peter Maydell <[email protected]>

cputlb: introduce tlb_flush_* async work.

Some architectures allow to flush the tlb of other VCPUs. This is not a problem
when we have only one thread for all VCPUs but it definitely needs to be an
asynchronous work when we are in true multithreaded work.

We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TB's at the same time. The alternative would
be to use proper atomic primitives to clear the tlb entries en-mass.

This patch doesn't do anything to protect other cputlb function being
called in MTTCG mode making cross vCPU changes.

Signed-off-by: KONRAD Frederic <[email protected]>
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

cputlb: tweak qemu_ram_addr_from_host_nofail reporting

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

cputlb: add assert_cpu_is_self checks

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

tcg: handle EXCP_ATOMIC exception for system emulation

The patch enables handling atomic code in the guest. This should be
preferably done in cpu_handle_exception(), but the current assumptions
regarding when we can execute atomic sections cause a deadlock.

The current mechanism discards the flags which were set in atomic
execution. We ensure they are properly saved by calling the
cc->cpu_exec_enter/leave() functions around the loop.

As we are running cpu_exec_step_atomic() from the outermost loop we
need to avoid an abort() when single stepping over atomic code since
debug exception longjmp will point to the the setlongjmp in
cpu_exec(). We do this by setting a new jmp_env so that it jumps back
here on an exception.

Signed-off-by: Pranith Kumar <[email protected]>
[AJB: tweak title, merge with new patches, add mmap_lock]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
CC: Paolo Bonzini <[email protected]>

tcg: enable thread-per-vCPU

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.

The user to switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple test kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. This doesn't
automatically enable MTTCG until default_mttcg_enabled() is updated to
check the configuration is supported.

Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

tcg: enable tb_lock() for SoftMMU

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of linux-user mmap_lock() which as the name suggests serialises updates
to the structure as a result of guest mmap operations.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

tcg: remove global exit_request

There are now only two uses of the global exit_request left.

The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.

We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along. As an cpu->exit_request may legitimately get squashed
while processing the EXCP_INTERRUPT exception we also check
cpu->queued_work_first to ensure queued work is expedited as soon as
possible.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

tcg: drop global lock during TCG code execution

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka <[email protected]>
Message-Id: <1439220437 [email protected]>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <[email protected]>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <[email protected]>
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Pranith Kumar <[email protected]>
[PM: target-arm changes]
Acked-by: Peter Maydell <[email protected]>

tcg: rename tcg_current_cpu to tcg_current_rr_cpu

..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

For the time being the setting of the global exit_request remains.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Pranith Kumar <[email protected]>

tcg: add kick timer for single-threaded vCPU emulation

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Pranith Kumar <[email protected]>

tcg: add options for enabling MTTCG

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic <[email protected]>
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

tcg: move TCG_MO/BAR types into own file

We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies just move these types into their own minimal header.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

mttcg: Add missing tb_lock/unlock() in cpu_exec_step()

The recent patch enabling lock assertions uncovered the missing lock
acquisition in cpu_exec_step(). This patch adds them.

Signed-off-by: Pranith Kumar <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

mttcg: translate-all: Enable locking debug in a debug build

Enable tcg lock debug asserts in a debug build by default instead of
relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
asserts, so this patch removes DEBUG_LOCKING and enable these asserts
in a debug build.

CC: Richard Henderson <[email protected]>
Signed-off-by: Pranith Kumar <[email protected]>
[AJB: tweak ifdefs so can be early in series]
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

docs: new design document multi-thread-tcg.txt

This documents the current design for upgrading TCG emulation to take
advantage of modern CPUs by running a thread-per-CPU. The document goes
through the various areas of the code affected by such a change and
proposes design requirements for each part of the solution.

The text marked with (Current solution[s]) to document what the current
approaches being used are.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170222' into staging

ppc patch queue for 2017-02-22

This pull request has:
   * Yet more POWER9 instruction implementations
   * Some extensions to the softfloat code which are necesssary for
     some of those instructions
   * Some preliminary patches in preparation for POWER9 softmmu
     implementation
   * Igor Mammedov's cleanups to unify hotplug cpu handling across
     architectures
   * Assorted bugfixes

The softfloat and cpu hotplug changes aren't entirely ppc specific (in
fact the hotplug stuff contains some pc specific patches).  However
they're included here because ppc is one of the main beneficiaries,
and the series depend on some ppc specific patches.

# gpg: Signature made Wed 22 Feb 2017 06:29:47 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.9-20170222: (43 commits)
  hw/ppc/ppc405_uc.c: Avoid integer overflows
  hw/ppc/spapr: Check for valid page size when hot plugging memory
  target-ppc: fix Book-E TLB matching
  hw/net/spapr_llan: 6 byte mac address device tree entry
  machine: replace query_hotpluggable_cpus() callback with has_hotpluggable_cpus flag
  machine: unify [pc_|spapr_]query_hotpluggable_cpus() callbacks
  spapr: reuse machine->possible_cpus instead of cores[]
  change CPUArchId.cpu type to Object*
  pc: pass apic_id to pc_find_cpu_slot() directly so lookup could be done without CPU object
  pc: calculate topology only once when possible_cpus is initialised
  pc: move pcms->possible_cpus init out of pc_cpus_init()
  machine: move possible_cpus to MachineState
  hw/pci-host/prep: Do not use hw_error() in realize function
  target/ppc/POWER9: Direct all instr and data storage interrupts to the hypv
  target/ppc/POWER9: Adapt LPCR handling for POWER9
  target/ppc/POWER9: Add ISAv3.00 MMU definition
  target/ppc: Fix LPCR DPFD mask define
  target-ppc: Add xscvqpudz and xscvqpuwz instructions
  target-ppc: Implement round to odd variants of quad FP instructions
  softfloat: Add float128_to_uint32_round_to_zero()
  ...

Signed-off-by: Peter Maydell <[email protected]>

s390x/css: handle format-0 TIC CCW correctly

For TIC CCW, bit positions 8-32 of the format-1 CCW must contain zeros;
otherwise, a program-check condition is generated. For format-0 TIC CCWs,
bits 32-63 are ignored.

To convert TIC from format-0 CCW to format-1 CCW correctly, let's clear
bits 8-32 to guarantee compatibility.

Reviewed-by: Pierre Morel <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/arch_dump: pass cpuid into notes sections

we need to pass the cpuid into the pid field of the notes
section, otherwise the notes for different CPUs all have 0:

e.g. objdump -h shows:
old:
  5 .reg-s390-prefix/0 00000004  0000000000000000  0000000000000000
  6 .reg-s390-prefix 00000004  0000000000000000  0000000000000000
21 .reg-s390-prefix/0 00000004  0000000000000000  0000000000000000
new:
  5 .reg-s390-prefix/1 00000004  0000000000000000  0000000000000000
  6 .reg-s390-prefix 00000004  0000000000000000  0000000000000000
21 .reg-s390-prefix/2 00000004  0000000000000000  0000000000000000

Reported-by: Philipp Rudo <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/arch_dump: use proper note name and note size

In binutils/libbfd (bfd/elf.c) it is enforced that all s390
specific ELF notes like e.g. NT_S390_PREFIX or NT_S390_CTRS
have "LINUX" specified as note name and that the namesz is
6. Otherwise the notes are ignored.

QEMU currently uses "CORE" for these notes. Up to now this has
not been a real problem because the dump analysis tool "crash"
does handle that. But it will break all programs that use libbfd
for processing ELF notes.

So fix this and use "LINUX" for all s390 specific notes to comply
with libbfd. Also set the correct namesz.

Reported-by: Philipp Rudo <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues

The maximal number of virtqueues per device can be limited on a per
transport basis. For virtio-ccw this limit is defined by
VIRTIO_CCW_QUEUE_MAX, however the limitation used to come form the
number of adapter routes supported by flic (via notifiers).

Recently the limitation of the flic was adjusted so that it can
accommodate VIRTIO_QUEUE_MAX queues, and is in the meanwhile checked for
separately too.

Let us remove the transport specific limitation of virtio-ccw by
dropping VIRTIO_CCW_QUEUE_MAX and using VIRTIO_QUEUE_MAX instead.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: bump ADAPTER_ROUTES_MAX_GSI

Let's increase ADAPTER_ROUTES_MAX_GSI to VIRTIO_QUEUE_MAX which is the
largest demand foreseeable at the moment. Let us add a compatibility
macro for the previous machines so client code can maintain backwards
migration compatibility

To not mess up migration compatibility for virtio-ccw
VIRTIO_CCW_QUEUE_MAX is left at it's current value, and will be dropped
when virtio-ccw is converted to use the capability of the flic
introduced by this patch.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: check flic->adapter_routes_max_batch

Currently VIRTIO_CCW_QUEUE_MAX is defined as ADAPTER_ROUTES_MAX_GSI.
That is when checking queue max we implicitly check the constraint
concerning the number of adapter routes. This won't be satisfactory any
more (due to backward migration considerations) if ADAPTER_ROUTES_MAX_GSI
changes (ADAPTER_ROUTES_MAX_GSI is going to change because we want to
support up to VIRTIO_QUEUE_MAX queues per virtio-ccw device).

Let us introduce a check on a recently introduce flic property which
gives us the compatibility machine aware limit on adapter routes.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: add property adapter_routes_max_batch

To make virtio-ccw supports more that 64 virtqueues we will have to
increase ADAPTER_ROUTES_MAX_GSI which is currently limiting the number if
possible adapter routes. Of course increasing the number of supported
routes can break backwards migration.

Let us introduce a compatibility property adapter_routes_max_batch so
client code can use the some old limit if in compatibility mode and
retain the migration compatibility.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND

We cannot support more than 64 virtqueues with the 64 bits provided by
classic indicators. If a driver tries to setup classic indicators
(which it is free to do even for virtio-1 devices) for a device with
more than 64 virtqueues, we should reject the attempt so that the
driver does not end up with an unusable device.

This is in preparation for bumping the number of supported virtqueues
on the ccw transport.

Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: add virtio-crypto-ccw device

Wire up virtio-crypto for the CCW based VIRTIO.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: handle virtio 1 only devices

As a preparation for wiring-up virtio-crypto, the first non-transitional
virtio device on the ccw transport, let us introduce a mechanism for
disabling revision 0. This is more or less equivalent with disabling
legacy as revision 0 is legacy only, and legacy drivers use the revision
0 exclusively.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/flic: fail migration on source already

Current code puts a 'FLIC_FAILED' marker into the migration stream
to indicate something went wrong while saving flic state and fails
load if it encounters that marker. VMState's put routine recently
gained the ability to return error codes (but did not wire it up
yet).

In order to be able to reap the benefits of returning an error and
failing migration on the source already once this gets wired up
in core, return an error in addition to storing 'FLIC_FAILED'.

Suggested-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
Reviewed-by: Jens Freimann <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>

s390x/kvm: detect some program check loops

Sometimes (e.g. early boot) a guest is broken in such ways that it loops
100% delivering operation exceptions (illegal operation) but the pgm new
PSW is not set properly. This will result in code being read from
address zero, which usually contains another illegal op. Let's detect
this case and put the guest in crashed state. Instead of only detecting
this for address zero apply a heuristic that will work for any program
check new psw so that it will also reach the crashed state if you
provide some random elf file to the -kernel option.
We do not want guest problem state to be able to trigger a guest panic,
e.g. by faulting on an address that is the same as the program check
new PSW, so we check for the problem state bit being off.

With this we
a: get rid of CPU consumption of such broken guests
b: keep the program old PSW. This allows to find out the original illegal
operation - making debugging such early boot issues much easier than
with single stepping

This relies on the kernel using a similar heuristic and passing such
operation exceptions to user space.

Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/s390-virtio: get rid of DPRINTF

The DPRINTF approach is likely to introduce bitrot, and the preferred
way for debugging is tracing anyway. Fortunately, there are no users
(left), so nuke it.

Signed-off-by: Cornelia Huck <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

docker: Install python2 explicitly in docker image

Python is no longer installed implicitly, but the QEMU build system
requires it. List it in PACKAGES.

Reported-by: Auger Eric <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-Id: <20170222021801 [email protected]>
Tested-by: Eric Auger <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>

MAINTAINERS: merge Build and test automation with Docker tests

The docker framework is really just another piece in the build
automation puzzle so lets merge it together. For added bonus I've also
included the Travis and Patchew status links. The Shippable links will
be added later once mainline tests have been configured and setup.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-Id: <20170220105139 [email protected]>
Signed-off-by: Fam Zheng <[email protected]>