Peter Maydell [Mon, 24 Oct 2016 17:26:59 +0000 (18:26 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Mon 24 Oct 2016 17:02:47 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (23 commits)
block/replication: Clarify 'top-id' parameter usage
block: More operations for meta dirty bitmap
tests: Add test code for hbitmap serialization
block: BdrvDirtyBitmap serialization interface
hbitmap: serialization
block: Assert that bdrv_release_dirty_bitmap succeeded
block: Add two dirty bitmap getters
block: Support meta dirty bitmap
tests: Add test code for meta bitmap
HBitmap: Introduce "meta" bitmap to track bit changes
block: Hide HBitmap in block dirty bitmap interface
quorum: do not allocate multiple iovecs for FIFO strategy
quorum: change child_iter to children_read
iotests: Do not rely on unavailable domains in 162
iotests: Remove raciness from 162
qemu-nbd: Add --fork option
qemu-iotests: Test I/O in a single drive from a throttling group
throttle: Correct access to wrong BlockBackendPublic structures
qapi: fix memory leak in bdrv_image_info_specific_dump
block: improve error handling in raw_open
...
Kevin Wolf [Mon, 24 Oct 2016 16:02:26 +0000 (18:02 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into queue-block
Block patches for master
# gpg: Signature made Mon Oct 24 17:56:44 2016 CEST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2016-10-24:
block/replication: Clarify 'top-id' parameter usage
block: More operations for meta dirty bitmap
tests: Add test code for hbitmap serialization
block: BdrvDirtyBitmap serialization interface
hbitmap: serialization
block: Assert that bdrv_release_dirty_bitmap succeeded
block: Add two dirty bitmap getters
block: Support meta dirty bitmap
tests: Add test code for meta bitmap
HBitmap: Introduce "meta" bitmap to track bit changes
block: Hide HBitmap in block dirty bitmap interface
quorum: do not allocate multiple iovecs for FIFO strategy
quorum: change child_iter to children_read
Fam Zheng [Thu, 13 Oct 2016 21:58:30 +0000 (17:58 -0400)]
block: More operations for meta dirty bitmap
Callers can create an iterator of meta bitmap with
bdrv_dirty_meta_iter_new(), then use the bdrv_dirty_iter_* operations on
it. Meta iterators are also counted by bitmap->active_iterators.
Also add a couple of functions to retrieve granularity and count.
Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
saved to linear sequence of bits independently of endianness and bitmap
array element (unsigned long) size. Therefore Little Endian is chosen.
These functions are appropriate for dirty bitmap migration, restoring
the bitmap in several steps is available. To save performance, every
step writes only the last level of the bitmap. All other levels are
restored by hbitmap_deserialize_finish() as a last step of restoring.
So, HBitmap is inconsistent while restoring.
Fam Zheng [Thu, 13 Oct 2016 21:58:26 +0000 (17:58 -0400)]
block: Assert that bdrv_release_dirty_bitmap succeeded
We use a loop over bs->dirty_bitmaps to make sure the caller is
only releasing a bitmap owned by bs. Let's also assert that in this case
the caller is releasing a bitmap that does exist.
Fam Zheng [Thu, 13 Oct 2016 21:58:21 +0000 (17:58 -0400)]
block: Hide HBitmap in block dirty bitmap interface
HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.
A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block/dirty-bitmap.c.
Paolo Bonzini [Wed, 5 Oct 2016 16:35:27 +0000 (18:35 +0200)]
quorum: do not allocate multiple iovecs for FIFO strategy
In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.
The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them in separate functions.
Max Reitz [Wed, 28 Sep 2016 20:46:44 +0000 (22:46 +0200)]
iotests: Do not rely on unavailable domains in 162
There are some (mostly ISP-specific) name servers who will redirect
non-existing domains to special hosts. In this case, we will get a
different error message when trying to connect to such a host, which
breaks test 162.
162 needed this specific error message so it can confirm that qemu was
indeed trying to connect to the user-specified port. However, we can
also confirm this by setting up a local NBD server on exactly that port;
so we can fix the issue by doing just that.
Max Reitz [Wed, 28 Sep 2016 20:46:42 +0000 (22:46 +0200)]
qemu-nbd: Add --fork option
Using the --fork option, one can make qemu-nbd fork the worker process.
The original process will exit on error of the worker or once the worker
enters the main loop.
Alberto Garcia [Mon, 17 Oct 2016 15:46:03 +0000 (18:46 +0300)]
qemu-iotests: Test I/O in a single drive from a throttling group
iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.
This is useful to test that the round robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.
Alberto Garcia [Mon, 17 Oct 2016 15:46:02 +0000 (18:46 +0300)]
throttle: Correct access to wrong BlockBackendPublic structures
In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.
This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.
Halil Pasic [Tue, 11 Oct 2016 14:12:35 +0000 (16:12 +0200)]
block: improve error handling in raw_open
Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was set
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.
Kevin Wolf [Fri, 7 Oct 2016 15:05:04 +0000 (17:05 +0200)]
block: Remove "options" indirection from blockdev-add
Now that QAPI supports boxed types, we can have unions at the top level
of a command, so let's put our real options directly there for
blockdev-add instead of having a single "options" dict that contains the
real arguments.
blockdev-add is still experimental and we already made substantial
changes to the API recently, so we're free to make changes like this
one, too.
Xu Tian [Sun, 9 Oct 2016 09:17:27 +0000 (17:17 +0800)]
block: failed qemu-img command should return non-zero exit code
If the backing file cannot be opened when doing qemu-img rebase, the
variable 'ret' was not assigned a non-zero value, and the qemu-img
process terminated with exit code zero. Fix this.
* remotes/bonzini/tags/for-upstream: (50 commits)
exec.c: workaround regression caused by alignment change in d2f39ad
char: remove explicit_be_open from CharDriverState
char: use common error path in qmp_chardev_add
char: replace avail_connections
char: remove unused qemu_chr_fe_event
char: use an enum for CHR_EVENT
char: remove unused CHR_EVENT_FOCUS
char: move fe_open in CharBackend
char: remove explicit_fe_open, use a set_handlers argument
char: rename chr_close/chr_free
char: move front end handlers in CharBackend
tests: start chardev unit tests
char: make some qemu_chr_fe skip if no driver
char: replace qemu_chr_claim/release with qemu_chr_fe_init/deinit
vhost-user: only initialize queue 0 CharBackend
char: fold qemu_chr_set_handlers in qemu_chr_fe_set_handlers
char: use qemu_chr_fe* functions with CharBackend argument
colo: claim in find_and_check_chardev
char: rename some frontend functions
char: remaining switch to CharBackend in frontend
...
Haozhong Zhang [Mon, 24 Oct 2016 12:49:37 +0000 (20:49 +0800)]
exec.c: workaround regression caused by alignment change in d2f39ad
Commit d2f39ad "exec.c: Ensure right alignment also for file backed ram"
added an additional alignment requirement on the size of backend file
besides the previous page size. On x86, the alignment is changed from
4KB in QEMU 2.6 to 2MB in QEMU 2.7.
This change breaks certain usages in QEMU 2.7 on x86, e.g.
-object memory-backend-file,id=mem1,mem-path=/tmp/,size=$SZ
-device pc-dimm,id=dimm1,memdev=mem1
where $SZ is multiple of 4KB but not 2MB (e.g. 1023M). QEMU 2.7
reports the following error message and aborts:
qemu-system-x86_64: -device pc-dimm,memdev=mem1,id=nv1: backend memory size must be multiple of 0x200000
The same regression may also happen in other platforms as indicated by
Igor Mammedov. This change is however necessary for s390 according to
the commit message of d2f39ad, so we workaround the regression by taking
the change only on s390.
No need to count the users of a CharDriverState, it can rely on the fact
of whether there is a CharBackend associated or if there is enough space
in the muxer.
Simplify and fold chr_mux_new_fe() in qemu_chr_fe_init() since there is
a single user now. Also switch from fprintf to raising error instead.
I introduced this function in d61b0c9a2f7f, but it isn't
used. Furthermore, it was incomplete, as it would need to translate QEMU
chr events to Spice port events.
(presumably it was used in the follow-up NBD-spice series that was not
completed: http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02024.html)
Since the hanlders are associated with a CharBackend, rather than the
CharDriverState, it is more appropriate to store in CharBackend. This
avoids the handler copy dance in qemu_chr_fe_set_handlers() then
mux_chr_update_read_handler(), by storing the CharBackend pointer
directly.
Also a mux CharDriver should go through mux->backends[focused], since
chr->be will stay NULL. Before that, it was possible to call
chr->handler by mistake with surprising results, for ex through
qemu_chr_be_can_write(), which would result in calling the last set
handler front end, not the one with focus.
In most cases, front ends do not care about the side effect of
CharBackend, so we can simply skip the checks and call the qemu_chr_fe
functions even without associated CharDriver.
char: replace qemu_chr_claim/release with qemu_chr_fe_init/deinit
Now that all front end use qemu_chr_fe_init(), we can move chardev
claiming in init(), and add a function deinit() to release the chardev
and cleanup handlers.
The qemu_chr_fe_claim_no_fail() for property are gone, since the
property will raise an error instead. In other cases, where there is
already an error path, an error is raised instead. Finally, other cases
are handled by &error_abort in qemu_chr_fe_init().
All the queues share the same chardev. Initialize only the first queue
CharBackend, and pass it to other queues. This will allow to claim the
chardev only once in a later change.
char: use qemu_chr_fe* functions with CharBackend argument
This also switches from qemu_chr_add_handlers() to
qemu_chr_fe_set_handlers(). Note that qemu_chr_fe_set_handlers() now
takes the focus when fe_open (qemu_chr_add_handlers() did take the
focus)
Store the property in a CharBackend instead of CharDriverState*. This
also replace systematically chr by chr.chr to access the
CharDriverState*. The following patches will replace it with calls to
qemu_chr_fe CharBackend functions.
This new structure is meant to keep the details associated with a char
driver usage. On initialization, it gets a tag from the mux backend.
It can change its handlers thanks to qemu_chr_fe_set_handlers().
This structure is introduced so that all frontend will be moved to hold
and use a CharBackend. This will allow to better track char usage and
allocation, and help prevent some memory leaks or corruption.
Make qemu_chr_add_handlers_full() aware of mux handling. This allows
introduction of a tag associated with the fe handlers and a
qemu_chr_set_handlers() function to set the handler for a particular
tag. That will allow to get rid of qemu_chr_add_handlers*() in later
changes, in favor of qemu_chr_fe_set_handler().
To this end, chr_update_read_handler callback is enhanced with a tag
argument, and mux_chr_update_read_handler() is splitted in new
functions: mux_chr_new_handler_tag(), mux_chr_set_handlers(),
mux_set_focus().
Paolo Bonzini [Sun, 23 Oct 2016 15:42:22 +0000 (17:42 +0200)]
xilinx: fix buffer overflow on realize
ASAN complains about buffer overflow when running:
aarch64-softmmu/qemu-system-aarch64 -machine xilinx-zynq-a9
==476==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000035e38 at pc 0x000000f75253 bp 0x7ffc597e0ec0 sp 0x7ffc597e0eb0
READ of size 8 at 0x602000035e38 thread T0
#0 0xf75252 in xilinx_spips_realize hw/ssi/xilinx_spips.c:623
#1 0xb9ef6c in device_set_realized hw/core/qdev.c:918
#2 0x129ae01 in property_set_bool qom/object.c:1854
#3 0x1296e70 in object_property_set qom/object.c:1088
#4 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
#5 0x1297168 in object_property_set_bool qom/object.c:1157
#6 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
#7 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
#8 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
#9 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
#10 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)
#11 0x41d0a8 in _start (/home/elmarco/src/qemu/aarch64-softmmu/qemu-system-aarch64+0x41d0a8)
0x602000035e38 is located 0 bytes to the right of 8-byte region [0x602000035e30,0x602000035e38)
allocated by thread T0 here:
#0 0x7f970b014e60 in malloc (/lib64/libasan.so.3+0xc6e60)
#1 0x7f96f15b0e18 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee18)
#2 0xb9ef6c in device_set_realized hw/core/qdev.c:918
#3 0x129ae01 in property_set_bool qom/object.c:1854
#4 0x1296e70 in object_property_set qom/object.c:1088
#5 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
#6 0x1297168 in object_property_set_bool qom/object.c:1157
#7 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
#8 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
#9 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
#10 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
#11 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)
s->spi is allocated with the size of num_busses which may be 1 (by
default). Change to use a loop up to s->num_busses also for the
call to ssi_auto_connect_slaves().
The CharDriverState.init() callback is no longer set since commit a61ae7f88ce and thus unused. The only user, the malta FGPA display has
been converted to use an event "opened" callback instead.
malta: replace chr init by CHR_EVENT_OPENED handler
The CharDriverState.init() callback was introduced in commit ceecf1d158. It is only called from text_console_do_init(), but it is no
longer set since commit a61ae7f88 (init assignment has been removed by
accident).
It seems correct to use an event callback instead and print the console
text on CHR_EVENT_OPENED. That way we can remove the single user of
CharDriverState init().
Since commit b6607a1a204d, serial_hds_isa_init() was introduced to
factor out serial_isa_init() loops. However, sun4uv shouldn't start from
0 when there is a mm serial on 0 already. Add a "from" argument to
serial_hds_isa_init().
Found by reviewing the code, win_stdio_close() is called by
qemu_chr_free() which then call qemu_chr_free_common() taking care of
freeing CharDriverState*.
char: serial: check divider value against baud base
16550A UART device uses an oscillator to generate frequencies
(baud base), which decide communication speed. This speed could
be changed by dividing it by a divider. If the divider is
greater than the baud base, speed is set to zero, leading to a
divide by zero error. Add check to avoid it.
Paolo Bonzini [Thu, 22 Sep 2016 14:23:06 +0000 (16:23 +0200)]
memory: add a per-AddressSpace list of listeners
This speeds up MEMORY_LISTENER_CALL noticeably. Right now,
with many PCI devices you have N regions added to M AddressSpaces
(M = # PCI devices with bus-master enabled) and each call looks
up the whole listener list, with at least M listeners in it.
Because most of the regions in N are BARs, which are also roughly
proportional to M, the whole thing is O(M^3). This changes it
to O(M^2), which is the best we can do without rewriting the
whole thing.
Paolo Bonzini [Thu, 15 Sep 2016 13:16:00 +0000 (15:16 +0200)]
tcg: try sti when moving a constant into a dead memory temp
This comes from free from unifying tcg_reg_alloc_mov and
tcg_reg_alloc_movi's handling of TEMP_VAL_CONST. It triggers
often on moves to cc_dst, such as the following translation
of "sub $0x3c,%esp":
Paolo Bonzini [Wed, 12 Oct 2016 07:23:39 +0000 (09:23 +0200)]
target-i386: fix 32-bit addresses in LEA
This was found with test-i386. The issue is that instructions
such as
addr32 lea (%eax), %rax
did not perform a 32-bit extension, because the LEA translation
skipped the gen_lea_v_seg step. That step does not just add
segments, it also takes care of extending from address size to
pointer size.
Emilio G. Cota [Fri, 14 Oct 2016 09:54:51 +0000 (11:54 +0200)]
qht-bench: relax test_start/stop atomic accesses
test_start/stop are used only as flags to loop on. Barriers are unnecessary,
since no dependent data is transferred among threads apart from the flags
themselves.
This commit relaxes the three accesses to test_start/stop that were
not yet relaxed.
Paolo Bonzini [Mon, 19 Sep 2016 09:36:44 +0000 (11:36 +0200)]
atomic: base mb_read/mb_set on load-acquire and store-release
This introduces load-acquire and store-release operations in QEMU.
For now, just use them as an implementation detail of atomic_mb_read
and atomic_mb_set.
Since docs/atomics.txt documents that atomic_mb_read only synchronizes
with an atomic_mb_set of the same variable, we can use the new implementation
everywhere instead of seq-cst loads and stores.
Paolo Bonzini [Mon, 19 Sep 2016 09:10:57 +0000 (11:10 +0200)]
qemu-thread: use acquire/release to clarify semantics of QemuEvent
Do not use the somewhat mysterious atomic_mb_read/atomic_mb_set,
instead make sure that the operations on QemuEvent are annotated
with the desired acquire and release semantics.
In particular, qemu_event_set wakes up the waiting thread, so it must
be a release from the POV of the waker (compare with qemu_mutex_unlock).
And it actually needs a full barrier, because that's the only thing that
provides something like a "load-release".
Use smp_mb_acquire until we have atomic_load_acquire and
atomic_store_release in atomic.h.
Thomas Huth [Wed, 5 Oct 2016 09:54:44 +0000 (11:54 +0200)]
Put the copyright information on a separate line
The output string QEMU with "--version" is very long, it does
not fit into a normal line of a terminal window anymore. By
putting the copyright information on a separate line instead,
the output looks much nicer.
Roy Shterman [Sun, 9 Oct 2016 08:14:56 +0000 (11:14 +0300)]
block/iscsi: Adding new iSER transport layer option
iSER is a new transport layer supported in Libiscsi,
iSER provides a zero-copy RDMA capable interface that can
improve performance.
In order to use the new iSER transport one need to have RDMA supported HW
and to choose iser as the protocol name in Libiscsi URI.
For now iSER memory buffers are pre-allocated and pre-registered,
hence in order to work with iSER from QEMU, one need to enable
MEMLOCK attribute in the VM to be large enough for all iSER buffers and RDMA
resources.
Roy Shterman [Sun, 9 Oct 2016 08:14:55 +0000 (11:14 +0300)]
block/iscsi: Introducing new zero-copy API
A new API to deploy zero-copy command submission. The new API takes I/O
vectors list and number of I/O vectors to submit as input parameters
when initiating the command. New API must be used if working with
iSER transport option.
Peter Maydell [Mon, 24 Oct 2016 09:26:44 +0000 (10:26 +0100)]
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20161021-tag' into staging
Xen 2016/10/21
# gpg: Signature made Fri 21 Oct 2016 20:52:42 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg: aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20161021-tag:
xen_platform: SUSE xenlinux unplug for emulated PCI
xen_platform: unplug also SCSI disks
xen-usb: do not reference PAGE_SIZE
Alex Bennée [Mon, 10 Oct 2016 15:46:25 +0000 (16:46 +0100)]
kvm-all: don't use stale dbg_data->cpu
The changes to run_on_cpu and friends mean that all helpers are passed
the CPUState of vCPU they are running on. The conversion missed the
field in commit e0eeb4a21a3ca4b296220ce4449d8acef9de9049 which
introduced bugs.
Olaf Hering [Fri, 21 Oct 2016 12:37:07 +0000 (14:37 +0200)]
xen_platform: SUSE xenlinux unplug for emulated PCI
Implement SUSE specific unplug protocol for emulated PCI devices
in PVonHVM guests. Its a simple 'outl(1, (ioaddr + 4));'.
This protocol was implemented and used since Xen 3.0.4.
It is used in all SUSE/SLES/openSUSE releases up to SLES11SP3 and
openSUSE 12.3.
In addition old (pre-2011) VMDP versions are handled as well.
Olaf Hering [Fri, 21 Oct 2016 12:37:06 +0000 (14:37 +0200)]
xen_platform: unplug also SCSI disks
Using 'vdev=sd[a-o]' will create an emulated LSI controller, which can
be used by the emulated BIOS to boot from disk. If the HVM domU has also
PV driver the disk may appear twice in the guest. To avoid this an
unplug of the emulated hardware is needed, similar to what is done for
IDE and NIC drivers already.
Since the SCSI controller provides only disks the entire controller can
be unplugged at once.
Impact of the change for classic and pvops based guest kernels:
Peter Maydell [Fri, 21 Oct 2016 12:49:58 +0000 (13:49 +0100)]
Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160921' into staging
Linux-user changes, mostly bugfixes and adding support for some
new syscalls and some obscure syscalls as well. Includes some
missed patches from earlier rounds, and dropping unicore32 target.
v2: fix the syslog patch and test build with clang-3.8
v3: drop ustat patch
# gpg: Signature made Fri 21 Oct 2016 13:38:06 BST
# gpg: using RSA key 0xB44890DEDE3C9BC0
# gpg: Good signature from "Riku Voipio <[email protected]>"
# gpg: aka "Riku Voipio <[email protected]>"
# Primary key fingerprint: FF82 03C8 C391 98AE 0581 41EF B448 90DE DE3C 9BC0
* remotes/riku/tags/pull-linux-user-20160921: (21 commits)
linux-user: disable unicore32 linux-user build
linux-user: added support for pwritev() system call.
linux-user: added support for preadv() system call.
linux-user: Fix fadvise64() syscall support for Mips32
linux-user: Redirect termbits.h for Mips64 to termbits.h for Mips32
linux-user: Update ioctls definitions for Mips32
linux-user: Update mips_syscall_args[] array in main.c
linux-user: Add support for syncfs() syscall
linux-user: Add support for clock_adjtime() syscall
linux-user: Fix definition of target_sigevent for 32-bit guests
linux-user: use libc wrapper instead of direct mremap syscall
linux-user: Don't use alloca() for epoll_wait's epoll event array
linux-user: add RTA_PRIORITY in netlink
linux-user: add kcmp() syscall
linux-user: sparc64: Use correct target SHMLBA in shmat()
linux-user: Remove a duplicate item from strace.list
linux-user: Fix syslog() syscall support
linux-user: Fix socketcall() syscall support
linux-user: Fix msgrcv() and msgsnd() syscalls support
linux-user: Fix mq_open() syscall support
...
In order to cleanup linux-user, we need support for most relatively
modern syscalls. unicore32 lacks support for syscalls like
epoll_pwait, preventing cleaning up the CONFIG_EPOLL mess.
This patch can be reverted when unicore32 starts either supporting
the syscalls as defined in mainline kernel, or the oldabi interface
gains support for syscalls supported since at kernel 2.6.19 / glibc 2.6
Dejan Jovicevic [Tue, 11 Oct 2016 09:52:47 +0000 (11:52 +0200)]
linux-user: added support for pwritev() system call.
This system call performs the same task as the writev() system call,
with the exception of having the fourth argument, offset, which
specifes the file offset at which the input operation is to be performed.
Because of this, the pwritev() implementation is based on the writev()
implementation in linux-user mode.
But, since pwritev() is implemented in the kernel as a 5-argument syscall,
5 arguments are needed to be handled as input and passed to the host
syscall.
The pos_l and pos_h argument of the safe_pwritev() are of type unsigned
long, which can be of different sizes on different platforms. The input
arguments are converted to the appropriate host size when passed to
safe_pwritev().
Dejan Jovicevic [Tue, 11 Oct 2016 09:52:46 +0000 (11:52 +0200)]
linux-user: added support for preadv() system call.
This system call performs the same task as the readv() system call,
with the exception of having the fourth argument, offset, which
specifes the file offset at which the input operation is to be performed.
Because of this, the preadv() implementation is based on the readv()
implementation in linux-user mode.
But, since preadv() is implemented in the kernel as a 5-argument syscall,
5 arguments are needed to be handled as input and passed to the host
syscall.
The pos_l and pos_h argument of the safe_preadv() are of type unsigned
long, which can be of different sizes on different platforms. The input
arguments are converted to the appropriate host size when passed to
safe_preadv().
linux-user: Fix fadvise64() syscall support for Mips32
By looking at the file arch/mips/kernel/scall32-o32.S in Linux
kernel, it can be deduced that, for Mips32 platform, syscall
corresponding to number _NR_fadvise64 as defined in kernel file
arch/mips/include/uapi/asm/unistd.h translates to kernel function
sys_fadvise64_64, and that argument layout for this system call is
as follows:
The same argument layout can be deduced from glibc code, and
relevant commit messages in linux kernel and glibc.
The fix is to change TARGET_NR_fadvise64 to TARGET_NR_fadvise64_64
in Mips32 syscall numbers table. Array mips_syscall_args[] in
linux-user/main.c also already have "fadvise64_64" (and not
"fadvise64") in corresponding place for the syscall number in
question, so no change for linux-user/main.c.
This patch also fixes the failure LTP test posix_fadvise03, if
executed on Qemu-emulated Mips32 platform (user mode).
linux-user: Redirect termbits.h for Mips64 to termbits.h for Mips32
linux-user/mips64/termbits.h and linux-user/mips/termbits.h
originate from the same files in Linux kernel. There is no plan
to split original headers in Linux kernel into Mips32 and Mips64
versions any time soon. Therefore, it is better not to have
separate Mips32 and Mips64 variants in Qemu.
This patch makes these two files effectively the same, allowing the
mainenance by changing only a single file. (This is already done in
the same fashion for some other headers in same directories.)
linux-user: Update mips_syscall_args[] array in main.c
Array mips_syscall_args[] determines number of arguments for each
syscall on Mips32. It wasn't updated with newer syscalls. Also,
preadv and pwritev have 5 arguments, not 6.
This patch implements Qemu user mode syncfs() syscall support. Syscall
syncfs() syncs the filesystem containing file determined by the open
file descriptor passed as the argument to syncfs().
The implementation consists of a straightforward invocation of host's
syncfs(). Configure and strace support is included as well.
Peter Maydell [Fri, 2 Sep 2016 17:40:01 +0000 (18:40 +0100)]
linux-user: Fix definition of target_sigevent for 32-bit guests
The sigevent structure includes a union with some fields which
are pointers. For the QEMU target_sigevent structure we must
represent these as abi_ulongs, not host function pointers.
This error was causing the compiler to believe it should 8-align
the _sigev_un union on a 64-bit host, which meant that the
code in target_to_host_sigevent() was looking at the wrong
offset to find the _tid field, and timer_create() would
spuriously fail with EINVAL.
This fixes the final loose end noted in LP:1042388.
While we're editing the structure, switch the 'int32_t' fields
to 'abi_int'; this will only matter for guests with non-standard
integer alignment like m68k.
Felix Janda [Fri, 30 Sep 2016 23:39:27 +0000 (19:39 -0400)]
linux-user: use libc wrapper instead of direct mremap syscall
This commit essentially reverts commit 3af72a4d98dca033492102603734cbc63cd2694a, which has replaced
five-argument calls to mremap() by direct mremap syscalls for
compatibility with glibc older than version 2.4.
The direct syscall was buggy for 64bit targets on 32bit hosts
because of the default integer type promotions. Since glibc-2.4
is now a decade old, we can remove this workaround.
Peter Maydell [Mon, 18 Jul 2016 14:36:00 +0000 (15:36 +0100)]
linux-user: Don't use alloca() for epoll_wait's epoll event array
The epoll event array which epoll_wait() allocates has a size
determined by the guest which could potentially be quite large.
Use g_try_new() rather than alloca() so that we can fail more
cleanly if the guest hands us an oversize value. (ENOMEM is
not a documented return value for epoll_wait() but in practice
some kernel configurations can return it -- see for instance
sys_oabi_epoll_wait() on ARM.)
This rearrangement includes fixing a bug where we were
incorrectly passing a negative length to unlock_user() in
the error-exit codepath.
Peter Maydell [Tue, 4 Oct 2016 13:13:46 +0000 (14:13 +0100)]
linux-user: sparc64: Use correct target SHMLBA in shmat()
In commit 40df8c0c0722 support was added for target-specific
handling of SHMLBA. Unfortunately the sparc64-specific part
of the change got lost somewhere between the patch being
posted to the list and going into master:
http://patchwork.ozlabs.org/patch/646980/
http://patchwork.ozlabs.org/patch/673339/
linux-user: Remove a duplicate item from strace.list
There is a duplicate item in strace.list. It is benign, but it
shouldn't be there, since it may lead to confusion and even bugs
in the future. It is the only duplicate in strace.list. This
patch removes it.
There are currently several problems related to syslog() support.
For example, if the second argument "bufp" of target syslog() syscall
is NULL, the current implementation always returns error code EFAULT.
However, NULL is a perfectly valid value for the second argument for
many use cases of this syscall. This is, for example, visible from
this excerpt of man page for syslog(2):
> EINVAL Bad arguments (e.g., bad type; or for type 2, 3, or 4, buf is
> NULL, or len is less than zero; or for type 8, the level is
> outside the range 1 to 8).
Moreover, the argument "bufp" is ignored for all cases of values of the
first argument, except 2, 3 and 4. This means that for such cases
(the first argument is not 2, 3 or 4), there is no need to pass "buf"
between host and target, and it can be set to NULL while calling host's
syslog(), without loss of emulation accuracy.
Note also that if "bufp" is NULL and the first argument is 2, 3 or 4, the
correct returned error code is EINVAL, not EFAULT.
All these details are reflected in this patch.
"#ifdef TARGET_NR_syslog" is also proprerly inserted when needed.
Support for Qemu's "-strace" switch for syslog() syscall is included too.
LTP tests syslog11 and syslog12 pass with this patch (while fail without
it), on any platform.
Changes to original patch by Riku Voipio:
fixed error paths in TARGET_SYSLOG_ACTION_READ_ALL to match
Since not all Linux host platforms support socketcall() (most notably
Intel), do_socketcall() function in Qemu's syscalls.c is implemented to
mirror the corespondant implementation of socketcall() in Linux kernel,
and to utilise individual socket operations that are supported on all
Linux platforms. (see kernel source file net/socket.c, definition of
socketcall).
However, error codes produced by Qemu implementation are wrong for the
cases of invalid values of the first argument. Also, naming of constants
is not consistent with kernel one, and not consistant with Qemu convention
of prefixing such constants with "TARGET_". This patch in that light
brings do_socketcall() closer to its kernel counterpart, and in that way
fixes the errors and yields more consisrtent Qemu code.
There were also three missing cases (among 20) for strace support for
socketcall(). The array that contains pointers for appropriate printing
functions is updated with 3 elements, however pointers to functions are
left NULL, and its implementation is left for future.
Also, this patch fixes failure of LTP test socketcall02, if executed on some
Qemu emulated sywstems (uer mode).
linux-user: Fix msgrcv() and msgsnd() syscalls support
If syscalls msgrcv() and msgsnd() fail, they return E2BIG, EACCES,
EAGAIN, EFAULT, EIDRM, EINTR, EINVAL, ENOMEM, or ENOMSG.
By examining negative scenarios of these syscalls for Mips, it was
established that ENOMSG does not have the same value accross all
platforms, but it is nevertheless not included for conversion in
the correspondant conversion table defined in linux-user/syscall.c.
This is certainly a bug, since it leads to the incorrect emulation
of msgrcv() and msgsnd() for scenarios involving ENOMSG.
This patch fixes this by extending the conversion table to include
ENOMSG.
Also, LTP test msgrcv04 will be fixed for some platforms.