Greg Kurz [Sun, 26 Feb 2017 22:42:03 +0000 (23:42 +0100)]
9pfs: introduce relative_openat_nofollow() helper
When using the passthrough security mode, symbolic links created by the
guest are actual symbolic links on the host file system.
Since the resolution of symbolic links during path walk is supposed to
occur on the client side. The server should hence never receive any path
pointing to an actual symbolic link. This isn't guaranteed by the protocol
though, and malicious code in the guest can trick the server to issue
various syscalls on paths whose one or more elements are symbolic links.
In the case of the "local" backend using the "passthrough" or "none"
security modes, the guest can directly create symbolic links to arbitrary
locations on the host (as per spec). The "mapped-xattr" and "mapped-file"
security modes are also affected to a lesser extent as they require some
help from an external entity to create actual symbolic links on the host,
i.e. another guest using "passthrough" mode for example.
The current code hence relies on O_NOFOLLOW and "l*()" variants of system
calls. Unfortunately, this only applies to the rightmost path component.
A guest could maliciously replace any component in a trusted path with a
symbolic link. This could allow any guest to escape a virtfs shared folder.
This patch introduces a variant of the openat() syscall that successively
opens each path element with O_NOFOLLOW. When passing a file descriptor
pointing to a trusted directory, one is guaranteed to be returned a
file descriptor pointing to a path which is beneath the trusted directory.
This will be used by subsequent patches to implement symlink-safe path walk
for any access to the backend.
Symbolic links aren't the only threats actually: a malicious guest could
change a path element to point to other types of file with undesirable
effects:
- a named pipe or any other thing that would cause openat() to block
- a terminal device which would become QEMU's controlling terminal
These issues can be addressed with O_NONBLOCK and O_NOCTTY.
Two helpers are introduced: one to open intermediate path elements and one
to open the rightmost path element.
Suggested-by: Jann Horn <[email protected]> Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]>
(renamed openat_nofollow() to relative_openat_nofollow(),
assert path is relative and doesn't contain '//',
fixed side-effect in assert, Greg Kurz) Signed-off-by: Greg Kurz <[email protected]>
* remotes/kraxel/tags/pull-ui-20170227-1:
vnc: fix double free issues
spice: add display & head options
ui: Use XkbGetMap and XkbGetNames instead of XkbGetKeyboard
gtk-egl: add scanout_disable support
sdl2: add scanout_disable support
spice: add scanout_disable support
virtio-gpu: use dpy_gl_scanout_disable
console: add dpy_gl_scanout_disable
console: rename dpy_gl_scanout to dpy_gl_scanout_texture
on startup when running under XWayland. Keymap handling is
however still broken after this commit, since Xwayland is
reporting a keymap we can't handle
NB, native Wayland support (which is the default under GTK3) is
not affected - only XWayland (which can be requested with GDK_BACKEND
on GTK3, and is the only option for GTK2).
Crypto routines 'qcrypto_cipher_get_block_len' and
'qcrypto_cipher_get_key_len' return non-zero cipher block and key
lengths from static arrays 'alg_block_len[]' and 'alg_key_len[]'
respectively. Returning 'zero(0)' value from either of them would
likely lead to an error condition.
Paolo Bonzini [Mon, 27 Feb 2017 11:17:26 +0000 (12:17 +0100)]
tests-aio-multithread: use atomic_read properly
nodes[id].next is written by other threads. If atomic_read is not used
(matching atomic_set in mcs_mutex_lock!) the compiler can optimize the
whole "if" away!
Peter Maydell [Sun, 26 Feb 2017 22:40:23 +0000 (22:40 +0000)]
Merge remote-tracking branch 'remotes/artyom/tags/pull-sun4v-20170226' into staging
Pull request for Niagara patches 2017 02 26
# gpg: Signature made Sun 26 Feb 2017 21:56:06 GMT
# gpg: using RSA key 0x3360C3F7411A125F
# gpg: Good signature from "Artyom Tarasenko <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2AD8 6149 17F4 B2D7 05C0 BB12 3360 C3F7 411A 125F
* remotes/artyom/tags/pull-sun4v-20170226:
niagara: check if a serial port is available
niagara: fail if a firmware file is missing
Peter Maydell [Sun, 26 Feb 2017 16:38:40 +0000 (16:38 +0000)]
Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates
# gpg: Signature made Sun 26 Feb 2017 14:40:00 GMT
# gpg: using RSA key 0xB0A51BF58C9179C5
# gpg: Good signature from "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: aka "Samuel Thibault <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82 304B D017 8C76 7D06 9EE6
# Subkey fingerprint: AEBF 7448 FAB9 453A 4552 390E B0A5 1BF5 8C91 79C5
* remotes/thibault/tags/samuel-thibault:
slirp: tcp_listen(): Don't try to close() an fd we never opened
slirp: Convert mbufs to use g_malloc() and g_free()
slirp: Check qemu_socket() return value in udp_listen()
Peter Maydell [Sat, 4 Feb 2017 23:08:35 +0000 (23:08 +0000)]
slirp: tcp_listen(): Don't try to close() an fd we never opened
Coverity points out (CID 1005725) that an error-exit path in tcp_listen()
will try to close(s) even if the reason it got there was that the
qemu_socket() failed and s was never opened. Not only that, this isn't even
the right function to use, because we need closesocket() to do the right
thing on Windows. Change to using the right function and only calling it if
needed.
Peter Maydell [Sat, 4 Feb 2017 23:08:34 +0000 (23:08 +0000)]
slirp: Convert mbufs to use g_malloc() and g_free()
The mbuf code currently doesn't check the result of doing a malloc()
or realloc() of its data (spotted by Coverity, CID 1238946).
Since the m_inc() API assumes that extending an mbuf must succeed,
just convert to g_malloc() and g_free().
Peter Maydell [Sat, 4 Feb 2017 23:08:33 +0000 (23:08 +0000)]
slirp: Check qemu_socket() return value in udp_listen()
Check the return value from qemu_socket() rather than trying to
pass it to bind() as an fd argument even if it's negative.
This wouldn't have caused any negative consequences, because
it won't be a valid fd number and the bind call will fail;
but Coverity complains (CID 1005723).
Peter Maydell [Sun, 26 Feb 2017 12:26:37 +0000 (12:26 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Fri 24 Feb 2017 18:08:26 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
tests: Use opened block node for block job tests
vvfat: Use opened node as backing file
block: Add bdrv_new_open_driver()
block: Factor out bdrv_open_driver()
block: Use BlockBackend for image probing
block: Factor out bdrv_open_child_bs()
block: Attach bs->file only during .bdrv_open()
block: Pass BdrvChild to bdrv_truncate()
mirror: Resize active commit base in mirror_run()
qcow2: Use BB for resizing in qcow2_amend_options()
blockdev: Use BlockBackend to resize in qmp_block_resize()
iotests: Fix another race in 030
qemu-img: Improve documentation for PREALLOC_MODE_FALLOC
qemu-img: Truncate before full preallocation
qemu-img: Add tests for raw image preallocation
qemu-img: Do not truncate before preallocation
qemu-iotests: redirect nbd server stdout to /dev/null
qemu-iotests: add ability to exclude certain protocols from tests
qemu-iotests: Test 137 only supports 'file' protocol
* remotes/cody/tags/block-pull-request:
RBD: Add support readv,writev for rbd
block/nfs: try to avoid the bounce buffer in pwritev
block/nfs: convert to preadv / pwritev
Peter Maydell [Sat, 25 Feb 2017 21:15:14 +0000 (21:15 +0000)]
Merge remote-tracking branch 'remotes/yongbok/tags/mips-20170224-2' into staging
MIPS patches 2017-02-24-2
CHanges:
* Add the Boston board with fixing the make check issue on 32-bit hosts.
# gpg: Signature made Fri 24 Feb 2017 11:43:45 GMT
# gpg: using RSA key 0x2238EB86D5F797C2
# gpg: Good signature from "Yongbok Kim <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 8600 4CF5 3415 A5D9 4CFA 2B5C 2238 EB86 D5F7 97C2
* remotes/yongbok/tags/mips-20170224-2:
hw/mips: MIPS Boston board support
Peter Maydell [Sat, 25 Feb 2017 18:43:52 +0000 (18:43 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-240217-1' into staging
This is the MTTCG pull-request as posted yesterday.
# gpg: Signature made Fri 24 Feb 2017 11:17:51 GMT
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-mttcg-240217-1: (24 commits)
tcg: enable MTTCG by default for ARM on x86 hosts
hw/misc/imx6_src: defer clearing of SRC_SCR reset bits
target-arm: ensure all cross vCPUs TLB flushes complete
target-arm: don't generate WFE/YIELD calls for MTTCG
target-arm/powerctl: defer cpu reset work to CPU context
cputlb: introduce tlb_flush_*_all_cpus[_synced]
cputlb: atomically update tlb fields used by tlb_reset_dirty
cputlb: add tlb_flush_by_mmuidx async routines
cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap
cputlb: introduce tlb_flush_* async work.
cputlb: tweak qemu_ram_addr_from_host_nofail reporting
cputlb: add assert_cpu_is_self checks
tcg: handle EXCP_ATOMIC exception for system emulation
tcg: enable thread-per-vCPU
tcg: enable tb_lock() for SoftMMU
tcg: remove global exit_request
tcg: drop global lock during TCG code execution
tcg: rename tcg_current_cpu to tcg_current_rr_cpu
tcg: add kick timer for single-threaded vCPU emulation
tcg: add options for enabling MTTCG
...
Peter Maydell [Sat, 25 Feb 2017 17:48:49 +0000 (17:48 +0000)]
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170224' into staging
A selection of s390x patches:
- cleanups, fixes and improvements
- program check loop detection (useful with the corresponding kernel
patch)
- wire up virtio-crypto for ccw
- and finally support many virtqueues for virtio-ccw
# gpg: Signature made Fri 24 Feb 2017 09:19:19 GMT
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg: aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20170224:
s390x/css: handle format-0 TIC CCW correctly
s390x/arch_dump: pass cpuid into notes sections
s390x/arch_dump: use proper note name and note size
virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues
s390x: bump ADAPTER_ROUTES_MAX_GSI
virtio-ccw: check flic->adapter_routes_max_batch
s390x: add property adapter_routes_max_batch
virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND
virtio-ccw: add virtio-crypto-ccw device
virtio-ccw: handle virtio 1 only devices
s390x/flic: fail migration on source already
s390x/kvm: detect some program check loops
s390x/s390-virtio: get rid of DPRINTF
Peter Maydell [Sat, 25 Feb 2017 16:37:32 +0000 (16:37 +0000)]
Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into staging
Docker testing and shippable patches
Hi Peter,
These are testing and build automation patches:
- Shippable.com powered CI config
- Docker cross build
- Fixes and MAINTAINERS tweaks.
# gpg: Signature made Fri 24 Feb 2017 06:31:10 GMT
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/for-upstream:
docker: Install python2 explicitly in docker image
MAINTAINERS: merge Build and test automation with Docker tests
.shippable.yml: new CI provider
new: debian docker targets for cross-compiling
tests/docker: add basic user mapping support
Peter Maydell [Fri, 24 Feb 2017 18:34:27 +0000 (18:34 +0000)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-util-2017-02-23' into staging
option cutils: Fix and clean up number conversions
# gpg: Signature made Thu 23 Feb 2017 19:41:17 GMT
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-util-2017-02-23: (24 commits)
option: Fix checking of sizes for overflow and trailing crap
util/cutils: Change qemu_strtosz*() from int64_t to uint64_t
util/cutils: Return qemu_strtosz*() error and value separately
util/cutils: Let qemu_strtosz*() optionally reject trailing crap
qemu-img: Wrap cvtnum() around qemu_strtosz()
test-cutils: Drop suffix from test_qemu_strtosz_simple()
test-cutils: Use qemu_strtosz() more often
util/cutils: Drop QEMU_STRTOSZ_DEFSUFFIX_* macros
util/cutils: New qemu_strtosz()
util/cutils: Rename qemu_strtosz() to qemu_strtosz_MiB()
util/cutils: New qemu_strtosz_metric()
test-cutils: Cover qemu_strtosz() around range limits
test-cutils: Cover qemu_strtosz() with trailing crap
test-cutils: Cover qemu_strtosz() invalid input
test-cutils: Add missing qemu_strtosz()... endptr checks
option: Fix to reject invalid and overflowing numbers
util/cutils: Clean up control flow around qemu_strtol() a bit
util/cutils: Clean up variable names around qemu_strtol()
util/cutils: Rename qemu_strtoll(), qemu_strtoull()
util/cutils: Rewrite documentation of qemu_strtol() & friends
...
Kevin Wolf [Mon, 16 Jan 2017 16:17:38 +0000 (17:17 +0100)]
tests: Use opened block node for block job tests
blk_insert_bs() and block job related functions will soon require an
opened block node (permission calculations will involve the block
driver), so let our tests be consistent with the real users in this
respect.
Kevin Wolf [Wed, 15 Feb 2017 19:51:58 +0000 (20:51 +0100)]
vvfat: Use opened node as backing file
We should not try to assign a not yet opened node as the backing file,
because as soon as the permission system is added it will fail. The
just added bdrv_new_open_driver() function is the right tool to open a
file with an internal driver, use it.
In case anyone wonders whether that magic fake backing file to trigger a
special action on 'commit' actually works today: No, not for me. One
reason is that we've been adding a raw format driver on top for several
years now and raw doesn't support commit. Other reasons include that the
backing file isn't writable and the driver doesn't support reopen, and
it's also size 0 and the driver doesn't support bdrv_truncate. All of
these are easily fixable, but then 'commit' ended up in an infinite loop
deep in the vvfat code for me, so I thought I'd best leave it alone. I'm
not really sure what it was supposed to do anyway.
Kevin Wolf [Wed, 18 Jan 2017 16:16:41 +0000 (17:16 +0100)]
block: Add bdrv_new_open_driver()
This function allows to create more or less normal BlockDriverStates
even for BlockDrivers that aren't globally registered (e.g. helper
filters for block jobs).
Kevin Wolf [Wed, 18 Jan 2017 14:51:56 +0000 (15:51 +0100)]
block: Factor out bdrv_open_driver()
This is a function that doesn't do any option parsing, but just does
some basic BlockDriverState setup and calls the .bdrv_open() function of
the block driver.
Kevin Wolf [Fri, 17 Feb 2017 17:39:24 +0000 (18:39 +0100)]
block: Use BlockBackend for image probing
This fixes the use of a parent-less BdrvChild in bdrv_open_inherit() by
converting it into a BlockBackend. Which is exactly what it should be,
image probing is an external, standalone user of a node. The requests
can't be considered to originate from the format driver node because
that one isn't even opened yet.
Kevin Wolf [Fri, 16 Dec 2016 17:52:37 +0000 (18:52 +0100)]
block: Attach bs->file only during .bdrv_open()
The way that attaching bs->file worked was a bit unusual in that it was
the only child that would be attached to a node which is not opened yet.
Because of this, the block layer couldn't know yet which permissions the
driver would eventually need.
This patch moves the point where bs->file is attached to the beginning
of the individual .bdrv_open() implementations, so drivers already know
what they are going to do with the child. This is also more consistent
with how driver-specific children work.
For a moment, bdrv_open() gets its own BdrvChild to perform image
probing, but instead of directly assigning this BdrvChild to the BDS, it
becomes a temporary one and the node name is passed as an option to the
drivers, so that they can simply use bdrv_open_child() to create another
reference for their own use.
This duplicated child for (the not opened yet) bs is not the final
state, a follow-up patch will change the image probing code to use a
BlockBackend, which is completely independent of bs.
Kevin Wolf [Fri, 17 Feb 2017 10:11:28 +0000 (11:11 +0100)]
mirror: Resize active commit base in mirror_run()
This is more consistent with the commit block job, and it moves the code
to a place where we already have the necessary BlockBackends to resize
the base image when bdrv_truncate() is changed to require a BdrvChild.
Kevin Wolf [Fri, 17 Feb 2017 09:58:25 +0000 (10:58 +0100)]
qcow2: Use BB for resizing in qcow2_amend_options()
In order to able to convert bdrv_truncate() to take a BdrvChild and
later to correctly check the resize permission here, we need to use a
BlockBackend for resizing the image.
Kevin Wolf [Fri, 17 Feb 2017 10:23:33 +0000 (11:23 +0100)]
blockdev: Use BlockBackend to resize in qmp_block_resize()
In order to be able to do permission checking and to keep working with
the BdrvChild based bdrv_truncate() that this involves, we need to
create a temporary BlockBackend to resize the image.
Nir Soffer [Fri, 17 Feb 2017 00:51:27 +0000 (02:51 +0200)]
qemu-img: Improve documentation for PREALLOC_MODE_FALLOC
Now that we are truncating the file in both PREALLOC_MODE_FULL and
PREALLOC_MODE_OFF, not truncating in PREALLOC_MODE_FALLOC looks odd.
Add a comment explaining why we do not truncate in this case.
Nir Soffer [Fri, 17 Feb 2017 00:51:26 +0000 (02:51 +0200)]
qemu-img: Truncate before full preallocation
In a previous commit (qemu-img: Do not truncate before preallocation) we
moved truncate to the PREALLOC_MODE_OFF branch to avoid slowdown in
posix_fallocate().
However this change is not optimal when using PREALLOC_MODE_FULL, since
knowing the final size from the beginning could allow the file system
driver to do less allocations and possibly avoid fragmentation of the
file.
Now we truncate also before doing full preallocation.
Nir Soffer [Fri, 3 Feb 2017 19:50:37 +0000 (21:50 +0200)]
qemu-img: Do not truncate before preallocation
When using file system that does not support fallocate() (e.g. NFS <
4.2), truncating the file only when preallocation=OFF speeds up creating
raw file.
Here is example run, tested on Fedora 24 machine, creating raw file on
NFS version 3 server.
$ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc
real 0m21.185s
user 0m0.022s
sys 0m0.574s
$ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc
real 0m11.601s
user 0m0.016s
sys 0m0.525s
$ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s
real 0m16.104s
user 0m0.009s
sys 0m0.220s
Running with strace we can see that without this change we do one
pread() and one pwrite() for each block. With this change, we do only
one pwrite() per block.
This happens because posix_fallocate is checking if each block is
allocated before writing a byte to the block, and when truncating the
file before preallocation, all blocks are unallocated.
Jeff Cody [Tue, 14 Feb 2017 18:15:07 +0000 (13:15 -0500)]
qemu-iotests: redirect nbd server stdout to /dev/null
Some iotests (e.g. 174) try to filter the output of _make_test_image by
piping the stdout. Pipe the server stdout to /dev/null, so that filter
pipe does not need to wait until process completion.
Jeff Cody [Tue, 14 Feb 2017 16:21:17 +0000 (11:21 -0500)]
qemu-iotests: add ability to exclude certain protocols from tests
Add the ability for shell script tests to exclude specific
protocols. This is useful to allow all protocols except ones known to
not support a feature used in the test (e.g. .bdrv_create).
Peter Maydell [Fri, 24 Feb 2017 15:00:51 +0000 (15:00 +0000)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-02-22' into staging
QAPI patches for 2017-02-22
# gpg: Signature made Wed 22 Feb 2017 19:12:27 GMT
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-02-22:
block: Don't bother asserting type of output visitor's output
monitor: Clean up handle_hmp_command() a bit
tests: Don't check qobject_type() before qobject_to_qbool()
tests: Don't check qobject_type() before qobject_to_qfloat()
tests: Don't check qobject_type() before qobject_to_qint()
tests: Don't check qobject_type() before qobject_to_qstring()
tests: Don't check qobject_type() before qobject_to_qlist()
Don't check qobject_type() before qobject_to_qdict()
test-qmp-event: Simplify and tighten event_test_emit()
libqtest: Clean up qmp_response() a bit
check-qjson: Simplify around compare_litqobj_to_qobj()
check-qdict: Tighten qdict_crumple_test_recursive() some
check-qdict: Simplify qdict_crumple_test_recursive()
qdict: Make qdict_get_qlist() safe like qdict_get_qdict()
net: Flatten simple union NetLegacyOptions
numa: Flatten simple union NumaOptions
Paul Burton [Thu, 8 Sep 2016 14:51:58 +0000 (15:51 +0100)]
hw/mips: MIPS Boston board support
Introduce support for emulating the MIPS Boston development board. The
Boston board is built around an FPGA & 3 PCIe controllers, one of which
is connected to an Intel EG20T Platform Controller Hub. It is used
during the development & debug of new CPUs and the software intended to
run on them, and is essentially the successor to the older MIPS Malta
board.
This patch does not implement the EG20T, instead connecting an already
supported ICH-9 AHCI controller. Whilst this isn't accurate it's enough
for typical stock Boston software (eg. Linux kernels) to work with hard
disks given that both the ICH-9 & EG20T implement the AHCI
specification.
Boston boards typically boot kernels in the FIT image format, and this
patch will treat kernels provided to QEMU as such. When loading a kernel
directly, the board code will generate minimal firmware much as the
Malta board code does. This firmware will set up the CM, CPC & GIC
register base addresses then set argument registers & jump to the kernel
entry point. Alternatively, bootloader code may be loaded using the bios
argument in which case no firmware will be generated & execution will
proceed from the start of the boot code at the default MIPS boot
exception vector (offset 0x1fc00000 into (c)kseg1).
Currently real Boston boards are always used with FPGA bitfiles that
include a Global Interrupt Controller (GIC), so the interrupt
configuration is only defined for such cases. Therefore the board will
only allow use of CPUs which implement the CPS components, including the
GIC, and will otherwise exit with a message.
Signed-off-by: Paul Burton <[email protected]> Reviewed-by: Yongbok Kim <[email protected]>
[[email protected]:
isolated boston machine support for mips64el.
updated for recent Chardev changes.
ignore missing bios/kernel for qtest.
added default -drive to if=ide explicitly.
changed default memory size into 1G due to make check failure
on 32-bit hosts] Signed-off-by: Yongbok Kim <[email protected]>
Alex Bennée [Thu, 23 Feb 2017 18:29:27 +0000 (18:29 +0000)]
tcg: enable MTTCG by default for ARM on x86 hosts
This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:
- The ARM translate.c/translate-64.c have been converted to
- use MTTCG safe atomic primitives
- emit the appropriate barrier ops
- The ARM machine has been updated to
- hold the BQL when modifying shared cross-vCPU state
- defer powerctl changes to async safe work
All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.
Alex Bennée [Thu, 23 Feb 2017 18:29:26 +0000 (18:29 +0000)]
hw/misc/imx6_src: defer clearing of SRC_SCR reset bits
The arm_reset_cpu/set_cpu_on/set_cpu_off() functions do their work
asynchronously in the target vCPUs context. As a result we need to
ensure the SRC_SCR reset bits correctly report the reset status at the
right time. To do this we defer the clearing of the bit with an async
job which will run after the work queued by ARM powerctl functions.
Alex Bennée [Thu, 23 Feb 2017 18:29:25 +0000 (18:29 +0000)]
target-arm: ensure all cross vCPUs TLB flushes complete
Previously flushes on other vCPUs would only get serviced when they
exited their TranslationBlocks. While this isn't overly problematic it
violates the semantics of TLB flush from the point of view of source
vCPU.
To solve this we call the cputlb *_all_cpus_synced() functions to do
the flushes which ensures all flushes are completed by the time the
vCPU next schedules its own work. As the TLB instructions are modelled
as CP writes the TB ends at this point meaning cpu->exit_request will
be checked before the next instruction is executed.
Deferring the work until the architectural sync point is a possible
future optimisation.
Alex Bennée [Thu, 23 Feb 2017 18:29:24 +0000 (18:29 +0000)]
target-arm: don't generate WFE/YIELD calls for MTTCG
The WFE and YIELD instructions are really only hints and in TCG's case
they were useful to move the scheduling on from one vCPU to the next. In
the parallel context (MTTCG) this just causes an unnecessary cpu_exit
and contention of the BQL.
Alex Bennée [Thu, 23 Feb 2017 18:29:23 +0000 (18:29 +0000)]
target-arm/powerctl: defer cpu reset work to CPU context
When switching a new vCPU on we want to complete a bunch of the setup
work before we start scheduling the vCPU thread. To do this cleanly we
defer vCPU setup to async work which will run the vCPUs execution
context as the thread is woken up. The scheduling of the work will kick
the vCPU awake.
This avoids potential races in MTTCG system emulation.
Alex Bennée [Thu, 23 Feb 2017 18:29:22 +0000 (18:29 +0000)]
cputlb: introduce tlb_flush_*_all_cpus[_synced]
This introduces support to the cputlb API for flushing all CPUs TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves.
An additional variant of the API (_synced) will cause the source vCPUs
work to be scheduled as "safe work". The result will be all the flush
operations will be complete by the time the originating vCPU executes
its safe work. The calling implementation can either end the TB
straight away (which will then pick up the cpu->exit_request on
entering the next block) or defer the exit until the architectural
sync point (usually a barrier instruction).
Alex Bennée [Thu, 23 Feb 2017 18:29:21 +0000 (18:29 +0000)]
cputlb: atomically update tlb fields used by tlb_reset_dirty
The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.
To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.
On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.
The dirty helper function is made static as it isn't used outside of
cputlb.
Alex Bennée [Thu, 23 Feb 2017 18:29:20 +0000 (18:29 +0000)]
cputlb: add tlb_flush_by_mmuidx async routines
This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the var_list of MMU indexes into a form that can be punted
to an asynchronous job.
The pending_tlb_flush field on QOM's CPU structure also becomes a
bitfield rather than a boolean.
Alex Bennée [Thu, 23 Feb 2017 18:29:19 +0000 (18:29 +0000)]
cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap
While the vargs approach was flexible the original MTTCG ended up
having munge the bits to a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.
For ARM some the resulting flushes end up being quite long so to aid
readability I've tended to move the index shifting to a new line so
all the bits being or-ed together line up nicely, for example:
KONRAD Frederic [Thu, 23 Feb 2017 18:29:18 +0000 (18:29 +0000)]
cputlb: introduce tlb_flush_* async work.
Some architectures allow to flush the tlb of other VCPUs. This is not a problem
when we have only one thread for all VCPUs but it definitely needs to be an
asynchronous work when we are in true multithreaded work.
We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TB's at the same time. The alternative would
be to use proper atomic primitives to clear the tlb entries en-mass.
This patch doesn't do anything to protect other cputlb function being
called in MTTCG mode making cross vCPU changes.
Signed-off-by: KONRAD Frederic <[email protected]>
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée <[email protected]> Reviewed-by: Richard Henderson <[email protected]>
This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.
Alex Bennée [Thu, 23 Feb 2017 18:29:16 +0000 (18:29 +0000)]
cputlb: add assert_cpu_is_self checks
For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.
Pranith Kumar [Thu, 23 Feb 2017 18:29:15 +0000 (18:29 +0000)]
tcg: handle EXCP_ATOMIC exception for system emulation
The patch enables handling atomic code in the guest. This should be
preferably done in cpu_handle_exception(), but the current assumptions
regarding when we can execute atomic sections cause a deadlock.
The current mechanism discards the flags which were set in atomic
execution. We ensure they are properly saved by calling the
cc->cpu_exec_enter/leave() functions around the loop.
As we are running cpu_exec_step_atomic() from the outermost loop we
need to avoid an abort() when single stepping over atomic code since
debug exception longjmp will point to the the setlongjmp in
cpu_exec(). We do this by setting a new jmp_env so that it jumps back
here on an exception.
Alex Bennée [Thu, 23 Feb 2017 18:29:14 +0000 (18:29 +0000)]
tcg: enable thread-per-vCPU
There are a couple of changes that occur at the same time here:
- introduce a single vCPU qemu_tcg_cpu_thread_fn
One of these is spawned per vCPU with its own Thread and Condition
variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
single threaded function.
- the TLS current_cpu variable is now live for the lifetime of MTTCG
vCPU threads. This is for future work where async jobs need to know
the vCPU context they are operating in.
The user to switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple test kvm-unit-test like:
Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.
We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. This doesn't
automatically enable MTTCG until default_mttcg_enabled() is updated to
check the configuration is supported.
Alex Bennée [Thu, 23 Feb 2017 18:29:13 +0000 (18:29 +0000)]
tcg: enable tb_lock() for SoftMMU
tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of linux-user mmap_lock() which as the name suggests serialises updates
to the structure as a result of guest mmap operations.
Alex Bennée [Thu, 23 Feb 2017 18:29:12 +0000 (18:29 +0000)]
tcg: remove global exit_request
There are now only two uses of the global exit_request left.
The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.
The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.
We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along. As an cpu->exit_request may legitimately get squashed
while processing the EXCP_INTERRUPT exception we also check
cpu->queued_work_first to ensure queued work is expedited as soon as
possible.
Jan Kiszka [Thu, 23 Feb 2017 18:29:11 +0000 (18:29 +0000)]
tcg: drop global lock during TCG code execution
This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.
Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:
20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm
20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm
The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond
32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm
32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm
We don't benefit significantly, though, when the guest is not fully
loading a host CPU.
Signed-off-by: Jan Kiszka <[email protected]>
Message-Id: <1439220437[email protected]>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <[email protected]>
[EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <[email protected]>
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes] Signed-off-by: Alex Bennée <[email protected]> Reviewed-by: Richard Henderson <[email protected]> Reviewed-by: Pranith Kumar <[email protected]>
[PM: target-arm changes] Acked-by: Peter Maydell <[email protected]>
Alex Bennée [Thu, 23 Feb 2017 18:29:10 +0000 (18:29 +0000)]
tcg: rename tcg_current_cpu to tcg_current_rr_cpu
..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.
qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.
For the time being the setting of the global exit_request remains.
Alex Bennée [Thu, 23 Feb 2017 18:29:09 +0000 (18:29 +0000)]
tcg: add kick timer for single-threaded vCPU emulation
Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.
KONRAD Frederic [Thu, 23 Feb 2017 18:29:08 +0000 (18:29 +0000)]
tcg: add options for enabling MTTCG
We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.
As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.
Signed-off-by: KONRAD Frederic <[email protected]>
[AJB: move to -accel tcg,thread=multi|single, defaults] Signed-off-by: Alex Bennée <[email protected]> Reviewed-by: Richard Henderson <[email protected]>
Alex Bennée [Thu, 23 Feb 2017 18:29:07 +0000 (18:29 +0000)]
tcg: move TCG_MO/BAR types into own file
We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies just move these types into their own minimal header.
Pranith Kumar [Thu, 23 Feb 2017 18:29:05 +0000 (18:29 +0000)]
mttcg: translate-all: Enable locking debug in a debug build
Enable tcg lock debug asserts in a debug build by default instead of
relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
asserts, so this patch removes DEBUG_LOCKING and enable these asserts
in a debug build.
Alex Bennée [Thu, 23 Feb 2017 18:29:04 +0000 (18:29 +0000)]
docs: new design document multi-thread-tcg.txt
This documents the current design for upgrading TCG emulation to take
advantage of modern CPUs by running a thread-per-CPU. The document goes
through the various areas of the code affected by such a change and
proposes design requirements for each part of the solution.
The text marked with (Current solution[s]) to document what the current
approaches being used are.
Peter Maydell [Fri, 24 Feb 2017 10:13:57 +0000 (10:13 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170222' into staging
ppc patch queue for 2017-02-22
This pull request has:
* Yet more POWER9 instruction implementations
* Some extensions to the softfloat code which are necesssary for
some of those instructions
* Some preliminary patches in preparation for POWER9 softmmu
implementation
* Igor Mammedov's cleanups to unify hotplug cpu handling across
architectures
* Assorted bugfixes
The softfloat and cpu hotplug changes aren't entirely ppc specific (in
fact the hotplug stuff contains some pc specific patches). However
they're included here because ppc is one of the main beneficiaries,
and the series depend on some ppc specific patches.
* remotes/dgibson/tags/ppc-for-2.9-20170222: (43 commits)
hw/ppc/ppc405_uc.c: Avoid integer overflows
hw/ppc/spapr: Check for valid page size when hot plugging memory
target-ppc: fix Book-E TLB matching
hw/net/spapr_llan: 6 byte mac address device tree entry
machine: replace query_hotpluggable_cpus() callback with has_hotpluggable_cpus flag
machine: unify [pc_|spapr_]query_hotpluggable_cpus() callbacks
spapr: reuse machine->possible_cpus instead of cores[]
change CPUArchId.cpu type to Object*
pc: pass apic_id to pc_find_cpu_slot() directly so lookup could be done without CPU object
pc: calculate topology only once when possible_cpus is initialised
pc: move pcms->possible_cpus init out of pc_cpus_init()
machine: move possible_cpus to MachineState
hw/pci-host/prep: Do not use hw_error() in realize function
target/ppc/POWER9: Direct all instr and data storage interrupts to the hypv
target/ppc/POWER9: Adapt LPCR handling for POWER9
target/ppc/POWER9: Add ISAv3.00 MMU definition
target/ppc: Fix LPCR DPFD mask define
target-ppc: Add xscvqpudz and xscvqpuwz instructions
target-ppc: Implement round to odd variants of quad FP instructions
softfloat: Add float128_to_uint32_round_to_zero()
...
Dong Jia Shi [Tue, 21 Feb 2017 09:09:04 +0000 (10:09 +0100)]
s390x/css: handle format-0 TIC CCW correctly
For TIC CCW, bit positions 8-32 of the format-1 CCW must contain zeros;
otherwise, a program-check condition is generated. For format-0 TIC CCWs,
bits 32-63 are ignored.
To convert TIC from format-0 CCW to format-1 CCW correctly, let's clear
bits 8-32 to guarantee compatibility.
s390x/arch_dump: use proper note name and note size
In binutils/libbfd (bfd/elf.c) it is enforced that all s390
specific ELF notes like e.g. NT_S390_PREFIX or NT_S390_CTRS
have "LINUX" specified as note name and that the namesz is
6. Otherwise the notes are ignored.
QEMU currently uses "CORE" for these notes. Up to now this has
not been a real problem because the dump analysis tool "crash"
does handle that. But it will break all programs that use libbfd
for processing ELF notes.
So fix this and use "LINUX" for all s390 specific notes to comply
with libbfd. Also set the correct namesz.
Halil Pasic [Mon, 7 Dec 2015 15:45:17 +0000 (16:45 +0100)]
virtio-ccw: support VIRTIO_QUEUE_MAX virtqueues
The maximal number of virtqueues per device can be limited on a per
transport basis. For virtio-ccw this limit is defined by
VIRTIO_CCW_QUEUE_MAX, however the limitation used to come form the
number of adapter routes supported by flic (via notifiers).
Recently the limitation of the flic was adjusted so that it can
accommodate VIRTIO_QUEUE_MAX queues, and is in the meanwhile checked for
separately too.
Let us remove the transport specific limitation of virtio-ccw by
dropping VIRTIO_CCW_QUEUE_MAX and using VIRTIO_QUEUE_MAX instead.
Halil Pasic [Fri, 9 Dec 2016 19:00:21 +0000 (20:00 +0100)]
s390x: bump ADAPTER_ROUTES_MAX_GSI
Let's increase ADAPTER_ROUTES_MAX_GSI to VIRTIO_QUEUE_MAX which is the
largest demand foreseeable at the moment. Let us add a compatibility
macro for the previous machines so client code can maintain backwards
migration compatibility
To not mess up migration compatibility for virtio-ccw
VIRTIO_CCW_QUEUE_MAX is left at it's current value, and will be dropped
when virtio-ccw is converted to use the capability of the flic
introduced by this patch.
Halil Pasic [Fri, 9 Dec 2016 18:58:10 +0000 (19:58 +0100)]
virtio-ccw: check flic->adapter_routes_max_batch
Currently VIRTIO_CCW_QUEUE_MAX is defined as ADAPTER_ROUTES_MAX_GSI.
That is when checking queue max we implicitly check the constraint
concerning the number of adapter routes. This won't be satisfactory any
more (due to backward migration considerations) if ADAPTER_ROUTES_MAX_GSI
changes (ADAPTER_ROUTES_MAX_GSI is going to change because we want to
support up to VIRTIO_QUEUE_MAX queues per virtio-ccw device).
Let us introduce a check on a recently introduce flic property which
gives us the compatibility machine aware limit on adapter routes.
Halil Pasic [Fri, 9 Dec 2016 12:51:46 +0000 (13:51 +0100)]
s390x: add property adapter_routes_max_batch
To make virtio-ccw supports more that 64 virtqueues we will have to
increase ADAPTER_ROUTES_MAX_GSI which is currently limiting the number if
possible adapter routes. Of course increasing the number of supported
routes can break backwards migration.
Let us introduce a compatibility property adapter_routes_max_batch so
client code can use the some old limit if in compatibility mode and
retain the migration compatibility.
Halil Pasic [Thu, 10 Dec 2015 11:55:06 +0000 (12:55 +0100)]
virtio-ccw: Check the number of vqs in CCW_CMD_SET_IND
We cannot support more than 64 virtqueues with the 64 bits provided by
classic indicators. If a driver tries to setup classic indicators
(which it is free to do even for virtio-1 devices) for a device with
more than 64 virtqueues, we should reject the attempt so that the
driver does not end up with an unusable device.
This is in preparation for bumping the number of supported virtqueues
on the ccw transport.
Halil Pasic [Tue, 14 Feb 2017 13:06:07 +0000 (14:06 +0100)]
virtio-ccw: handle virtio 1 only devices
As a preparation for wiring-up virtio-crypto, the first non-transitional
virtio device on the ccw transport, let us introduce a mechanism for
disabling revision 0. This is more or less equivalent with disabling
legacy as revision 0 is legacy only, and legacy drivers use the revision
0 exclusively.
Cornelia Huck [Wed, 25 Jan 2017 15:01:03 +0000 (16:01 +0100)]
s390x/flic: fail migration on source already
Current code puts a 'FLIC_FAILED' marker into the migration stream
to indicate something went wrong while saving flic state and fails
load if it encounters that marker. VMState's put routine recently
gained the ability to return error codes (but did not wire it up
yet).
In order to be able to reap the benefits of returning an error and
failing migration on the source already once this gets wired up
in core, return an error in addition to storing 'FLIC_FAILED'.
Sometimes (e.g. early boot) a guest is broken in such ways that it loops
100% delivering operation exceptions (illegal operation) but the pgm new
PSW is not set properly. This will result in code being read from
address zero, which usually contains another illegal op. Let's detect
this case and put the guest in crashed state. Instead of only detecting
this for address zero apply a heuristic that will work for any program
check new psw so that it will also reach the crashed state if you
provide some random elf file to the -kernel option.
We do not want guest problem state to be able to trigger a guest panic,
e.g. by faulting on an address that is the same as the program check
new PSW, so we check for the problem state bit being off.
With this we
a: get rid of CPU consumption of such broken guests
b: keep the program old PSW. This allows to find out the original illegal
operation - making debugging such early boot issues much easier than
with single stepping
This relies on the kernel using a similar heuristic and passing such
operation exceptions to user space.
Cornelia Huck [Thu, 26 Jan 2017 16:47:40 +0000 (17:47 +0100)]
s390x/s390-virtio: get rid of DPRINTF
The DPRINTF approach is likely to introduce bitrot, and the preferred
way for debugging is tracing anyway. Fortunately, there are no users
(left), so nuke it.
Alex Bennée [Mon, 20 Feb 2017 10:51:39 +0000 (10:51 +0000)]
MAINTAINERS: merge Build and test automation with Docker tests
The docker framework is really just another piece in the build
automation puzzle so lets merge it together. For added bonus I've also
included the Travis and Patchew status links. The Shippable links will
be added later once mainline tests have been configured and setup.