Git Repo - qemu.git/log

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 24 Oct 2016 17:02:47 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (23 commits)
  block/replication: Clarify 'top-id' parameter usage
  block: More operations for meta dirty bitmap
  tests: Add test code for hbitmap serialization
  block: BdrvDirtyBitmap serialization interface
  hbitmap: serialization
  block: Assert that bdrv_release_dirty_bitmap succeeded
  block: Add two dirty bitmap getters
  block: Support meta dirty bitmap
  tests: Add test code for meta bitmap
  HBitmap: Introduce "meta" bitmap to track bit changes
  block: Hide HBitmap in block dirty bitmap interface
  quorum: do not allocate multiple iovecs for FIFO strategy
  quorum: change child_iter to children_read
  iotests: Do not rely on unavailable domains in 162
  iotests: Remove raciness from 162
  qemu-nbd: Add --fork option
  qemu-iotests: Test I/O in a single drive from a throttling group
  throttle: Correct access to wrong BlockBackendPublic structures
  qapi: fix memory leak in bdrv_image_info_specific_dump
  block: improve error handling in raw_open
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'mreitz/tags/pull-block-2016-10-24' into queue-block

Block patches for master

# gpg: Signature made Mon Oct 24 17:56:44 2016 CEST
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2016-10-24:
  block/replication: Clarify 'top-id' parameter usage
  block: More operations for meta dirty bitmap
  tests: Add test code for hbitmap serialization
  block: BdrvDirtyBitmap serialization interface
  hbitmap: serialization
  block: Assert that bdrv_release_dirty_bitmap succeeded
  block: Add two dirty bitmap getters
  block: Support meta dirty bitmap
  tests: Add test code for meta bitmap
  HBitmap: Introduce "meta" bitmap to track bit changes
  block: Hide HBitmap in block dirty bitmap interface
  quorum: do not allocate multiple iovecs for FIFO strategy
  quorum: change child_iter to children_read

Signed-off-by: Kevin Wolf <[email protected]>

block/replication: Clarify 'top-id' parameter usage

The replication driver only supports the 'top-id' parameter for the
secondary side; it must not be supplied for the primary side.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Changlong Xie <[email protected]>
Message-id: 1476247808 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: More operations for meta dirty bitmap

Callers can create an iterator of meta bitmap with
bdrv_dirty_meta_iter_new(), then use the bdrv_dirty_iter_* operations on
it. Meta iterators are also counted by bitmap->active_iterators.

Also add a couple of functions to retrieve granularity and count.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

tests: Add test code for hbitmap serialization

Signed-off-by: Fam Zheng <[email protected]>
[Fixed minor constant issue. --js]
Signed-off-by: John Snow <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: BdrvDirtyBitmap serialization interface

Several functions to provide necessary access to BdrvDirtyBitmap for
block-migration.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
[Add the "finish" parameters. - Fam]
Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

hbitmap: serialization

Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
saved to linear sequence of bits independently of endianness and bitmap
array element (unsigned long) size. Therefore Little Endian is chosen.

These functions are appropriate for dirty bitmap migration, restoring
the bitmap in several steps is available. To save performance, every
step writes only the last level of the bitmap. All other levels are
restored by hbitmap_deserialize_finish() as a last step of restoring.
So, HBitmap is inconsistent while restoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
[Fix left shift operand to 1UL; add "finish" parameter. - Fam]
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Assert that bdrv_release_dirty_bitmap succeeded

We use a loop over bs->dirty_bitmaps to make sure the caller is
only releasing a bitmap owned by bs. Let's also assert that in this case
the caller is releasing a bitmap that does exist.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Add two dirty bitmap getters

For dirty bitmap users to get the size and the name of a
BdrvDirtyBitmap.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Support meta dirty bitmap

The added group of operations enables tracking of the changed bits in
the dirty bitmap.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

tests: Add test code for meta bitmap

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

HBitmap: Introduce "meta" bitmap to track bit changes

Upon each bit toggle, the corresponding bit in the meta bitmap will be
set.

Signed-off-by: Fam Zheng <[email protected]>
[Amended text inline. --js]
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Hide HBitmap in block dirty bitmap interface

HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block/dirty-bitmap.c.

Two current users are converted too.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: John Snow <[email protected]>
Message-id: 1476395910 [email protected]
Signed-off-by: Max Reitz <[email protected]>

quorum: do not allocate multiple iovecs for FIFO strategy

In FIFO mode there are no parallel reads, hence there is no need to
allocate separate buffers and clone the iovecs.

The two cases of quorum_aio_cb are now even more different, and
most of quorum_aio_finalize is only needed in one of them, so split
them in separate functions.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-id: 1475685327 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

quorum: change child_iter to children_read

This simplifies a bit the code by using the usual C "inclusive start,
exclusive end" pattern for ranges.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-id: 1475685327 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: Do not rely on unavailable domains in 162

There are some (mostly ISP-specific) name servers who will redirect
non-existing domains to special hosts. In this case, we will get a
different error message when trying to connect to such a host, which
breaks test 162.

162 needed this specific error message so it can confirm that qemu was
indeed trying to connect to the user-specified port. However, we can
also confirm this by setting up a local NBD server on exactly that port;
so we can fix the issue by doing just that.

Reported-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Remove raciness from 162

With qemu-nbd's new --fork option, we no longer need to launch it the
hacky way.

Suggested-by: Sascha Silbe <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-nbd: Add --fork option

Using the --fork option, one can make qemu-nbd fork the worker process.
The original process will exit on error of the worker or once the worker
enters the main loop.

Suggested-by: Sascha Silbe <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test I/O in a single drive from a throttling group

iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.

This is useful to test that the round robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

throttle: Correct access to wrong BlockBackendPublic structures

In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.

This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.

Cc: [email protected]
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qapi: fix memory leak in bdrv_image_info_specific_dump

The 'obj' result of the visitor was not properly freed, like done in
other places doing a similar job.

Signed-off-by: Pino Toscano <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: improve error handling in raw_open

Make raw_open for POSIX more consistent in handling errors by setting
the error object also when qemu_open fails. The error object was set
generally set in case of errors, but I guess this case was overlooked.
Do the same for win32.

Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Sascha Silbe <[email protected]>
Tested-by: Marc Hartmayer <[email protected]> (POSIX only)
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Remove "options" indirection from blockdev-add

Now that QAPI supports boxed types, we can have unions at the top level
of a command, so let's put our real options directly there for
blockdev-add instead of having a single "options" dict that contains the
real arguments.

blockdev-add is still experimental and we already made substantial
changes to the API recently, so we're free to make changes like this
one, too.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

qcow2: Support BDRV_REQ_MAY_UNMAP

Handling this is similar to what is done to the L2 entry in the case of
compressed clusters.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: failed qemu-img command should return non-zero exit code

If the backing file cannot be opened when doing qemu-img rebase, the
variable 'ret' was not assigned a non-zero value, and the qemu-img
process terminated with exit code zero. Fix this.

Signed-off-by: Xu Tian <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* KVM run_on_cpu fix (Alex)
* atomic usage fixes (Emilio, me)
* hugetlbfs alignment fix (Haozhong)
* CharBackend refactoring (Marc-André)
* test-i386 fixes (me)
* MemoryListener optimizations (me)
* Miscellaneous bugfixes (me)
* iSER support (Roy)
* --version formatting (Thomas)

# gpg: Signature made Mon 24 Oct 2016 14:46:19 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (50 commits)
  exec.c: workaround regression caused by alignment change in d2f39ad
  char: remove explicit_be_open from CharDriverState
  char: use common error path in qmp_chardev_add
  char: replace avail_connections
  char: remove unused qemu_chr_fe_event
  char: use an enum for CHR_EVENT
  char: remove unused CHR_EVENT_FOCUS
  char: move fe_open in CharBackend
  char: remove explicit_fe_open, use a set_handlers argument
  char: rename chr_close/chr_free
  char: move front end handlers in CharBackend
  tests: start chardev unit tests
  char: make some qemu_chr_fe skip if no driver
  char: replace qemu_chr_claim/release with qemu_chr_fe_init/deinit
  vhost-user: only initialize queue 0 CharBackend
  char: fold qemu_chr_set_handlers in qemu_chr_fe_set_handlers
  char: use qemu_chr_fe* functions with CharBackend argument
  colo: claim in find_and_check_chardev
  char: rename some frontend functions
  char: remaining switch to CharBackend in frontend
  ...

Signed-off-by: Peter Maydell <[email protected]>

exec.c: workaround regression caused by alignment change in d2f39ad

Commit d2f39ad "exec.c: Ensure right alignment also for file backed ram"
added an additional alignment requirement on the size of backend file
besides the previous page size. On x86, the alignment is changed from
4KB in QEMU 2.6 to 2MB in QEMU 2.7.

This change breaks certain usages in QEMU 2.7 on x86, e.g.
-object memory-backend-file,id=mem1,mem-path=/tmp/,size=$SZ
-device pc-dimm,id=dimm1,memdev=mem1
where $SZ is multiple of 4KB but not 2MB (e.g. 1023M). QEMU 2.7
reports the following error message and aborts:
qemu-system-x86_64: -device pc-dimm,memdev=mem1,id=nv1: backend memory size must be multiple of 0x200000

The same regression may also happen in other platforms as indicated by
Igor Mammedov. This change is however necessary for s390 according to
the commit message of d2f39ad, so we workaround the regression by taking
the change only on s390.

Signed-off-by: Haozhong Zhang <[email protected]>
Reported-by: "Xu, Anthony" <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove explicit_be_open from CharDriverState

It's only used in qmp_chardev_add(), so use a create() argument instead.

Also switched to typedef functions for CharDriverParse/CharDriverCreate.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: use common error path in qmp_chardev_add

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: replace avail_connections

No need to count the users of a CharDriverState, it can rely on the fact
of whether there is a CharBackend associated or if there is enough space
in the muxer.

Simplify and fold chr_mux_new_fe() in qemu_chr_fe_init() since there is
a single user now. Also switch from fprintf to raising error instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove unused qemu_chr_fe_event

I introduced this function in d61b0c9a2f7f, but it isn't
used. Furthermore, it was incomplete, as it would need to translate QEMU
chr events to Spice port events.

(presumably it was used in the follow-up NBD-spice series that was not
completed: http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02024.html)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: use an enum for CHR_EVENT

This may help to catch unhandled cases, and avoid having to maintain
numbering.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove unused CHR_EVENT_FOCUS

Usage has long been removed, since commit f220174de8d9.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: move fe_open in CharBackend

The fe_open state belongs to front end.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022100951 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove explicit_fe_open, use a set_handlers argument

No need to keep explicit_fe_open around if it affects only a
qemu_chr_fe_set_handlers(). Use an additional argument instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: rename chr_close/chr_free

The function is used to free the backend opaque pointer, let's name it
accordingly.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: move front end handlers in CharBackend

Since the hanlders are associated with a CharBackend, rather than the
CharDriverState, it is more appropriate to store in CharBackend. This
avoids the handler copy dance in qemu_chr_fe_set_handlers() then
mux_chr_update_read_handler(), by storing the CharBackend pointer
directly.

Also a mux CharDriver should go through mux->backends[focused], since
chr->be will stay NULL. Before that, it was possible to call
chr->handler by mistake with surprising results, for ex through
qemu_chr_be_can_write(), which would result in calling the last set
handler front end, not the one with focus.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

tests: start chardev unit tests

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: make some qemu_chr_fe skip if no driver

In most cases, front ends do not care about the side effect of
CharBackend, so we can simply skip the checks and call the qemu_chr_fe
functions even without associated CharDriver.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: replace qemu_chr_claim/release with qemu_chr_fe_init/deinit

Now that all front end use qemu_chr_fe_init(), we can move chardev
claiming in init(), and add a function deinit() to release the chardev
and cleanup handlers.

The qemu_chr_fe_claim_no_fail() for property are gone, since the
property will raise an error instead. In other cases, where there is
already an error path, an error is raised instead. Finally, other cases
are handled by &error_abort in qemu_chr_fe_init().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vhost-user: only initialize queue 0 CharBackend

All the queues share the same chardev. Initialize only the first queue
CharBackend, and pass it to other queues. This will allow to claim the
chardev only once in a later change.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: fold qemu_chr_set_handlers in qemu_chr_fe_set_handlers

qemu_chr_add_handlers*() have been removed in previous change, so the
common qemu_chr_set_handlers() is no longer needed.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: use qemu_chr_fe* functions with CharBackend argument

This also switches from qemu_chr_add_handlers() to
qemu_chr_fe_set_handlers(). Note that qemu_chr_fe_set_handlers() now
takes the focus when fe_open (qemu_chr_add_handlers() did take the
focus)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

colo: claim in find_and_check_chardev

This factors out claiming of chardev, and changes the call to
non-fatal to return an error like the rest of the chardev checks.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: rename some frontend functions

qemu_chr_accept_input() and qemu_chr_disconnect() are only used by
frontend, so use qemu_chr_fe prefix.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remaining switch to CharBackend in frontend

Similar to previous change, for the remaining CharDriverState front ends
users.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: replace PROP_CHR with CharBackend

Store the property in a CharBackend instead of CharDriverState*. This
also replace systematically chr by chr.chr to access the
CharDriverState*. The following patches will replace it with calls to
qemu_chr_fe CharBackend functions.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: start converting mux driver to use CharBackend

Start using qemu_chr_fe* CharBackend functions:
initialize a CharBackend and use qemu_chr_fe_set_handlers().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: introduce CharBackend

This new structure is meant to keep the details associated with a char
driver usage. On initialization, it gets a tag from the mux backend.
It can change its handlers thanks to qemu_chr_fe_set_handlers().

This structure is introduced so that all frontend will be moved to hold
and use a CharBackend. This will allow to better track char usage and
allocation, and help prevent some memory leaks or corruption.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

mux: split mux_chr_update_read_handler()

Make qemu_chr_add_handlers_full() aware of mux handling. This allows
introduction of a tag associated with the fe handlers and a
qemu_chr_set_handlers() function to set the handler for a particular
tag. That will allow to get rid of qemu_chr_add_handlers*() in later
changes, in favor of qemu_chr_fe_set_handler().

To this end, chr_update_read_handler callback is enhanced with a tag
argument, and mux_chr_update_read_handler() is splitted in new
functions: mux_chr_new_handler_tag(), mux_chr_set_handlers(),
mux_set_focus().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

xilinx: fix buffer overflow on realize

ASAN complains about buffer overflow when running:
aarch64-softmmu/qemu-system-aarch64 -machine xilinx-zynq-a9

==476==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000035e38 at pc 0x000000f75253 bp 0x7ffc597e0ec0 sp 0x7ffc597e0eb0
READ of size 8 at 0x602000035e38 thread T0
    #0 0xf75252 in xilinx_spips_realize hw/ssi/xilinx_spips.c:623
    #1 0xb9ef6c in device_set_realized hw/core/qdev.c:918
    #2 0x129ae01 in property_set_bool qom/object.c:1854
    #3 0x1296e70 in object_property_set qom/object.c:1088
    #4 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
    #5 0x1297168 in object_property_set_bool qom/object.c:1157
    #6 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
    #7 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
    #8 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
    #9 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
    #10 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)
    #11 0x41d0a8 in _start (/home/elmarco/src/qemu/aarch64-softmmu/qemu-system-aarch64+0x41d0a8)

0x602000035e38 is located 0 bytes to the right of 8-byte region [0x602000035e30,0x602000035e38)
allocated by thread T0 here:
    #0 0x7f970b014e60 in malloc (/lib64/libasan.so.3+0xc6e60)
    #1 0x7f96f15b0e18 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee18)
    #2 0xb9ef6c in device_set_realized hw/core/qdev.c:918
    #3 0x129ae01 in property_set_bool qom/object.c:1854
    #4 0x1296e70 in object_property_set qom/object.c:1088
    #5 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
    #6 0x1297168 in object_property_set_bool qom/object.c:1157
    #7 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
    #8 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
    #9 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
    #10 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
    #11 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)

s->spi is allocated with the size of num_busses which may be 1 (by
default).  Change to use a loop up to s->num_busses also for the
call to ssi_auto_connect_slaves().

Reported-by: Marc-André Lureau <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove init callback

The CharDriverState.init() callback is no longer set since commit
a61ae7f88ce and thus unused. The only user, the malta FGPA display has
been converted to use an event "opened" callback instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

malta: replace chr init by CHR_EVENT_OPENED handler

The CharDriverState.init() callback was introduced in commit
ceecf1d158. It is only called from text_console_do_init(), but it is no
longer set since commit a61ae7f88 (init assignment has been removed by
accident).

It seems correct to use an event callback instead and print the console
text on CHR_EVENT_OPENED. That way we can remove the single user of
CharDriverState init().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

sun4uv: fix serial initialization regression

Since commit b6607a1a204d, serial_hds_isa_init() was introduced to
factor out serial_isa_init() loops. However, sun4uv shouldn't start from
0 when there is a mm serial on 0 already. Add a "from" argument to
serial_hds_isa_init().

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

ringbuf: fix chr_write return value

It should return the number of written bytes.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: remove use-after-free on win-stdio

Found by reviewing the code, win_stdio_close() is called by
qemu_chr_free() which then call qemu_chr_free_common() taking care of
freeing CharDriverState*.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

rng: remove unused included header

DEFINE_PROP_CHR is not used (rng is not of TYPE_DEVICE)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161022095318 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char.h: misc doc fix

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <20161011152012 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

char: serial: check divider value against baud base

16550A UART device uses an oscillator to generate frequencies
(baud base), which decide communication speed. This speed could
be changed by dividing it by a divider. If the divider is
greater than the baud base, speed is set to zero, leading to a
divide by zero error. Add check to avoid it.

Reported-by: Huawei PSIRT <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Message-Id: <1476251888 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: optimize memory_region_sync_dirty_bitmap

Avoid walking the FlatView of all address spaces. Most of the
address spaces will have no log_sync callback on their listeners.

Signed-off-by: Paolo Bonzini <[email protected]>

memory: optimize memory_global_dirty_log_sync

Only return a nonzero dirty_log_mask for RAM/ROM memory regions.

Signed-off-by: Paolo Bonzini <[email protected]>

memory: add a per-AddressSpace list of listeners

This speeds up MEMORY_LISTENER_CALL noticeably. Right now,
with many PCI devices you have N regions added to M AddressSpaces
(M = # PCI devices with bus-master enabled) and each call looks
up the whole listener list, with at least M listeners in it.
Because most of the regions in N are BARs, which are also roughly
proportional to M, the whole thing is O(M^3). This changes it
to O(M^2), which is the best we can do without rewriting the
whole thing.

Signed-off-by: Paolo Bonzini <[email protected]>

memory: eliminate global MemoryListeners

There is none, so just drop the code.

Signed-off-by: Paolo Bonzini <[email protected]>

tcg: try sti when moving a constant into a dead memory temp

This comes from free from unifying tcg_reg_alloc_mov and
tcg_reg_alloc_movi's handling of TEMP_VAL_CONST.  It triggers
often on moves to cc_dst, such as the following translation
of "sub $0x3c,%esp":

  before:                          after:
  subl   $0x3c,%ebp                subl   $0x3c,%ebp
  movl   %ebp,0x10(%r14)           movl   %ebp,0x10(%r14)
  movl   $0x3c,%ebx                movl   $0x3c,0x2c(%r14)
  movl   %ebx,0x2c(%r14)

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1473945360 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: fix 32-bit addresses in LEA

This was found with test-i386.  The issue is that instructions
such as

    addr32 lea (%eax), %rax

did not perform a 32-bit extension, because the LEA translation
skipped the gen_lea_v_seg step.  That step does not just add
segments, it also takes care of extending from address size to
pointer size.

Signed-off-by: Paolo Bonzini <[email protected]>

test-i386: fix bitrot for 64-bit

Signed-off-by: Paolo Bonzini <[email protected]>

qht-bench: relax test_start/stop atomic accesses

test_start/stop are used only as flags to loop on. Barriers are unnecessary,
since no dependent data is transferred among threads apart from the flags
themselves.

This commit relaxes the three accesses to test_start/stop that were
not yet relaxed.

Signed-off-by: Emilio G. Cota <[email protected]>

atomic: base mb_read/mb_set on load-acquire and store-release

This introduces load-acquire and store-release operations in QEMU.
For now, just use them as an implementation detail of atomic_mb_read
and atomic_mb_set.

Since docs/atomics.txt documents that atomic_mb_read only synchronizes
with an atomic_mb_set of the same variable, we can use the new implementation
everywhere instead of seq-cst loads and stores.

Signed-off-by: Paolo Bonzini <[email protected]>

rcu: simplify memory barriers

Thanks to the acquire semantics of qemu_event_reset and qemu_event_wait,
some memory barriers can be removed.

Signed-off-by: Paolo Bonzini <[email protected]>

qemu-thread: use acquire/release to clarify semantics of QemuEvent

Do not use the somewhat mysterious atomic_mb_read/atomic_mb_set,
instead make sure that the operations on QemuEvent are annotated
with the desired acquire and release semantics.

In particular, qemu_event_set wakes up the waiting thread, so it must
be a release from the POV of the waker (compare with qemu_mutex_unlock).
And it actually needs a full barrier, because that's the only thing that
provides something like a "load-release".

Use smp_mb_acquire until we have atomic_load_acquire and
atomic_store_release in atomic.h.

Signed-off-by: Paolo Bonzini <[email protected]>

atomic: introduce smp_mb_acquire and smp_mb_release

Signed-off-by: Paolo Bonzini <[email protected]>

Put the copyright information on a separate line

The output string QEMU with "--version" is very long, it does
not fit into a normal line of a terminal window anymore. By
putting the copyright information on a separate line instead,
the output looks much nicer.

Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1475661284 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

block/iscsi: Adding new iSER transport layer option

iSER is a new transport layer supported in Libiscsi,
iSER provides a zero-copy RDMA capable interface that can
improve performance.

In order to use the new iSER transport one need to have RDMA supported HW
and to choose iser as the protocol name in Libiscsi URI.

For now iSER memory buffers are pre-allocated and pre-registered,
hence in order to work with iSER from QEMU, one need to enable
MEMLOCK attribute in the VM to be large enough for all iSER buffers and RDMA
resources.

Signed-off-by: Roy Shterman <[email protected]>
Message-Id: <1476000896 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

block/iscsi: Introducing new zero-copy API

A new API to deploy zero-copy command submission. The new API takes I/O
vectors list and number of I/O vectors to submit as input parameters
when initiating the command. New API must be used if working with
iSER transport option.

Signed-off-by: Roy Shterman <[email protected]>
Message-Id: <1476000896 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20161021-tag' into staging

Xen 2016/10/21

# gpg: Signature made Fri 21 Oct 2016 20:52:42 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg:                 aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20161021-tag:
  xen_platform: SUSE xenlinux unplug for emulated PCI
  xen_platform: unplug also SCSI disks
  xen-usb: do not reference PAGE_SIZE

Signed-off-by: Peter Maydell <[email protected]>

rbd: shift byte count as a 64-bit value

Otherwise, reads of more than 2GB fail. Until commit
7bbca9e290a9c7c217b5a24fc6094e91e54bd05d, reads of 2^41
bytes succeeded at least theoretically.

In fact, pdiscard ought to receive a 64-bit integer as the
count for the same reason.

Reported by Coverity.

Fixes: 7bbca9e290a9c7c217b5a24fc6094e91e54bd05d
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

kvm-all: don't use stale dbg_data->cpu

The changes to run_on_cpu and friends mean that all helpers are passed
the CPUState of vCPU they are running on. The conversion missed the
field in commit e0eeb4a21a3ca4b296220ce4449d8acef9de9049 which
introduced bugs.

Reported-by: Claudio Imbrenda <[email protected]>
Tested-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <20161010154625 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

xen_platform: SUSE xenlinux unplug for emulated PCI

Implement SUSE specific unplug protocol for emulated PCI devices
in PVonHVM guests. Its a simple 'outl(1, (ioaddr + 4));'.
This protocol was implemented and used since Xen 3.0.4.
It is used in all SUSE/SLES/openSUSE releases up to SLES11SP3 and
openSUSE 12.3.
In addition old (pre-2011) VMDP versions are handled as well.

Signed-off-by: Olaf Hering <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xen_platform: unplug also SCSI disks

Using 'vdev=sd[a-o]' will create an emulated LSI controller, which can
be used by the emulated BIOS to boot from disk. If the HVM domU has also
PV driver the disk may appear twice in the guest. To avoid this an
unplug of the emulated hardware is needed, similar to what is done for
IDE and NIC drivers already.

Since the SCSI controller provides only disks the entire controller can
be unplugged at once.

Impact of the change for classic and pvops based guest kernels:

vdev=sda:disk0
before: pvops:   disk0=pv xvda + emulated sda
        classic: disk0=pv sda  + emulated sdq
after:  pvops:   disk0=pv xvda
        classic: disk0=pv sda

vdev=hda:disk0, vdev=sda:disk1
before: pvops:   disk0=pv xvda
                 disk1=emulated sda
        classic: disk0=pv hda
                 disk1=pv sda  + emulated sdq
after:  pvops:   disk0=pv xvda
                 disk1=not accessible by blkfront, index hda==index sda
        classic: disk0=pv hda
                 disk1=pv sda

vdev=hda:disk0, vdev=sda:disk1, vdev=sdb:disk2
before: pvops:   disk0=pv xvda
                 disk1=emulated sda
                 disk2=pv xvdb + emulated sdb
        classic: disk0=pv hda
                 disk1=pv sda  + emulated sdq
                 disk2=pv sdb  + emulated sdr
after:  pvops:   disk0=pv xvda
                 disk1=not accessible by blkfront, index hda==index sda
                 disk2=pv xvdb
        classic: disk0=pv hda
                 disk1=pv sda
                 disk2=pv sda

Signed-off-by: Olaf Hering <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xen-usb: do not reference PAGE_SIZE

PAGE_SIZE is undefined on ARM64. Use XC_PAGE_SIZE instead, which is
always 4096 even when page granularity is 64K.

For this to actually work with 64K pages, more changes are required.

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Release-acked-by: Wei Liu <[email protected]>

Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160921' into staging

Linux-user changes, mostly bugfixes and adding support for some
new syscalls and some obscure syscalls as well. Includes some
missed patches from earlier rounds, and dropping unicore32 target.

v2: fix the syslog patch and test build with clang-3.8
v3: drop ustat patch

# gpg: Signature made Fri 21 Oct 2016 13:38:06 BST
# gpg:                using RSA key 0xB44890DEDE3C9BC0
# gpg: Good signature from "Riku Voipio <[email protected]>"
# gpg:                 aka "Riku Voipio <[email protected]>"
# Primary key fingerprint: FF82 03C8 C391 98AE 0581  41EF B448 90DE DE3C 9BC0

* remotes/riku/tags/pull-linux-user-20160921: (21 commits)
  linux-user: disable unicore32 linux-user build
  linux-user: added support for pwritev() system call.
  linux-user: added support for preadv() system call.
  linux-user: Fix fadvise64() syscall support for Mips32
  linux-user: Redirect termbits.h for Mips64 to termbits.h for Mips32
  linux-user: Update ioctls definitions for Mips32
  linux-user: Update mips_syscall_args[] array in main.c
  linux-user: Add support for syncfs() syscall
  linux-user: Add support for clock_adjtime() syscall
  linux-user: Fix definition of target_sigevent for 32-bit guests
  linux-user: use libc wrapper instead of direct mremap syscall
  linux-user: Don't use alloca() for epoll_wait's epoll event array
  linux-user: add RTA_PRIORITY in netlink
  linux-user: add kcmp() syscall
  linux-user: sparc64: Use correct target SHMLBA in shmat()
  linux-user: Remove a duplicate item from strace.list
  linux-user: Fix syslog() syscall support
  linux-user: Fix socketcall() syscall support
  linux-user: Fix msgrcv() and msgsnd() syscalls support
  linux-user: Fix mq_open() syscall support
  ...

Signed-off-by: Peter Maydell <[email protected]>

linux-user: disable unicore32 linux-user build

In order to cleanup linux-user, we need support for most relatively
modern syscalls. unicore32 lacks support for syscalls like
epoll_pwait, preventing cleaning up the CONFIG_EPOLL mess.

This patch can be reverted when unicore32 starts either supporting
the syscalls as defined in mainline kernel, or the oldabi interface
gains support for syscalls supported since at kernel 2.6.19 / glibc 2.6

Cc: MPRC <[email protected]>
Cc: Xuetao Guan <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: added support for pwritev() system call.

This system call performs the same task as the writev() system call,
with the exception of having the fourth argument, offset, which
specifes the file offset at which the input operation is to be performed.
Because of this, the pwritev() implementation is based on the writev()
implementation in linux-user mode.

But, since pwritev() is implemented in the kernel as a 5-argument syscall,
5 arguments are needed to be handled as input and passed to the host
syscall.

The pos_l and pos_h argument of the safe_pwritev() are of type unsigned
long, which can be of different sizes on different platforms. The input
arguments are converted to the appropriate host size when passed to
safe_pwritev().

Signed-off-by: Dejan Jovicevic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: added support for preadv() system call.

This system call performs the same task as the readv() system call,
with the exception of having the fourth argument, offset, which
specifes the file offset at which the input operation is to be performed.
Because of this, the preadv() implementation is based on the readv()
implementation in linux-user mode.

But, since preadv() is implemented in the kernel as a 5-argument syscall,
5 arguments are needed to be handled as input and passed to the host
syscall.

The pos_l and pos_h argument of the safe_preadv() are of type unsigned
long, which can be of different sizes on different platforms. The input
arguments are converted to the appropriate host size when passed to
safe_preadv().

Signed-off-by: Dejan Jovicevic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Fix fadvise64() syscall support for Mips32

By looking at the file arch/mips/kernel/scall32-o32.S in Linux
kernel, it can be deduced that, for Mips32 platform, syscall
corresponding to number _NR_fadvise64 as defined in kernel file
arch/mips/include/uapi/asm/unistd.h translates to kernel function
sys_fadvise64_64, and that argument layout for this system call is
as follows:

              0             32 0             32
             +----------------+----------------+
      (arg1) |       fd       |     __pad      | (arg2)
             +----------------+----------------+
      (arg3) |             buffer              | (arg4)
             +----------------+----------------+
      (arg5) |               len               | (arg6)
             +----------------+----------------+
      (arg7) |     advise     |    not used    | (arg8)
             +----------------+----------------+

The same argument layout can be deduced from glibc code, and
relevant commit messages in linux kernel and glibc.

The fix is to change TARGET_NR_fadvise64 to TARGET_NR_fadvise64_64
in Mips32 syscall numbers table. Array mips_syscall_args[] in
linux-user/main.c also already have "fadvise64_64" (and not
"fadvise64") in corresponding place for the syscall number in
question, so no change for linux-user/main.c.

This patch also fixes the failure LTP test posix_fadvise03, if
executed on Qemu-emulated Mips32 platform (user mode).

Signed-off-by: Aleksandar Rikalo <[email protected]>
Signed-off-by: Miroslav Tisma <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Redirect termbits.h for Mips64 to termbits.h for Mips32

linux-user/mips64/termbits.h and linux-user/mips/termbits.h
originate from the same files in Linux kernel. There is no plan
to split original headers in Linux kernel into Mips32 and Mips64
versions any time soon. Therefore, it is better not to have
separate Mips32 and Mips64 variants in Qemu.

This patch makes these two files effectively the same, allowing the
mainenance by changing only a single file. (This is already done in
the same fashion for some other headers in same directories.)

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Update ioctls definitions for Mips32

Update linux-user/mips/termbits.h with ioctl definitions from kernel
file arch/mips/include/uapi/asm/ioctls.h.

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Update mips_syscall_args[] array in main.c

Array mips_syscall_args[] determines number of arguments for each
syscall on Mips32. It wasn't updated with newer syscalls. Also,
preadv and pwritev have 5 arguments, not 6.

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Add support for syncfs() syscall

This patch implements Qemu user mode syncfs() syscall support. Syscall
syncfs() syncs the filesystem containing file determined by the open
file descriptor passed as the argument to syncfs().

The implementation consists of a straightforward invocation of host's
syncfs(). Configure and strace support is included as well.

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Add support for clock_adjtime() syscall

This patch implements Qemu user mode clock_adjtime() syscall support.

The implementation is based on invocation of host's clock_adjtime().

Signed-off-by: Aleksandar Rikalo <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Fix definition of target_sigevent for 32-bit guests

The sigevent structure includes a union with some fields which
are pointers. For the QEMU target_sigevent structure we must
represent these as abi_ulongs, not host function pointers.

This error was causing the compiler to believe it should 8-align
the _sigev_un union on a 64-bit host, which meant that the
code in target_to_host_sigevent() was looking at the wrong
offset to find the _tid field, and timer_create() would
spuriously fail with EINVAL.

This fixes the final loose end noted in LP:1042388.

While we're editing the structure, switch the 'int32_t' fields
to 'abi_int'; this will only matter for guests with non-standard
integer alignment like m68k.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: use libc wrapper instead of direct mremap syscall

This commit essentially reverts commit
3af72a4d98dca033492102603734cbc63cd2694a, which has replaced
five-argument calls to mremap() by direct mremap syscalls for
compatibility with glibc older than version 2.4.

The direct syscall was buggy for 64bit targets on 32bit hosts
because of the default integer type promotions. Since glibc-2.4
is now a decade old, we can remove this workaround.

Signed-off-by: Felix Janda <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Don't use alloca() for epoll_wait's epoll event array

The epoll event array which epoll_wait() allocates has a size
determined by the guest which could potentially be quite large.
Use g_try_new() rather than alloca() so that we can fail more
cleanly if the guest hands us an oversize value. (ENOMEM is
not a documented return value for epoll_wait() but in practice
some kernel configurations can return it -- see for instance
sys_oabi_epoll_wait() on ARM.)

This rearrangement includes fixing a bug where we were
incorrectly passing a negative length to unlock_user() in
the error-exit codepath.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: add RTA_PRIORITY in netlink

Used by fedora21 on ppc64 in the network initialization

Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: add kcmp() syscall

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: sparc64: Use correct target SHMLBA in shmat()

In commit 40df8c0c0722 support was added for target-specific
handling of SHMLBA. Unfortunately the sparc64-specific part
of the change got lost somewhere between the patch being
posted to the list and going into master:
http://patchwork.ozlabs.org/patch/646980/
http://patchwork.ozlabs.org/patch/673339/

Add the accidentally-dropped code.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Remove a duplicate item from strace.list

There is a duplicate item in strace.list. It is benign, but it
shouldn't be there, since it may lead to confusion and even bugs
in the future. It is the only duplicate in strace.list. This
patch removes it.

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Fix syslog() syscall support

There are currently several problems related to syslog() support.

For example, if the second argument "bufp" of target syslog() syscall
is NULL, the current implementation always returns error code EFAULT.
However, NULL is a perfectly valid value for the second argument for
many use cases of this syscall. This is, for example, visible from
this excerpt of man page for syslog(2):

> EINVAL Bad arguments (e.g., bad type; or for type 2, 3, or 4, buf is
> NULL, or len is less than zero; or for type 8, the level is
> outside the range 1 to 8).

Moreover, the argument "bufp" is ignored for all cases of values of the
first argument, except 2, 3 and 4. This means that for such cases
(the first argument is not 2, 3 or 4), there is no need to pass "buf"
between host and target, and it can be set to NULL while calling host's
syslog(), without loss of emulation accuracy.

Note also that if "bufp" is NULL and the first argument is 2, 3 or 4, the
correct returned error code is EINVAL, not EFAULT.

All these details are reflected in this patch.

"#ifdef TARGET_NR_syslog" is also proprerly inserted when needed.

Support for Qemu's "-strace" switch for syslog() syscall is included too.

LTP tests syslog11 and syslog12 pass with this patch (while fail without
it), on any platform.

Changes to original patch by Riku Voipio:

fixed error paths in TARGET_SYSLOG_ACTION_READ_ALL to match

http://lxr.free-electrons.com/source/kernel/printk/printk.c?v=4.7#L1335

Should fix also the build error in:

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03721.html

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Fix socketcall() syscall support

Since not all Linux host platforms support socketcall() (most notably
Intel), do_socketcall() function in Qemu's syscalls.c is implemented to
mirror the corespondant implementation of socketcall() in Linux kernel,
and to utilise individual socket operations that are supported on all
Linux platforms. (see kernel source file net/socket.c, definition of
socketcall).

However, error codes produced by Qemu implementation are wrong for the
cases of invalid values of the first argument. Also, naming of constants
is not consistent with kernel one, and not consistant with Qemu convention
of prefixing such constants with "TARGET_". This patch in that light
brings do_socketcall() closer to its kernel counterpart, and in that way
fixes the errors and yields more consisrtent Qemu code.

There were also three missing cases (among 20) for strace support for
socketcall(). The array that contains pointers for appropriate printing
functions is updated with 3 elements, however pointers to functions are
left NULL, and its implementation is left for future.

Also, this patch fixes failure of LTP test socketcall02, if executed on some
Qemu emulated sywstems (uer mode).

Signed-off-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Fix msgrcv() and msgsnd() syscalls support

If syscalls msgrcv() and msgsnd() fail, they return E2BIG, EACCES,
EAGAIN, EFAULT, EIDRM, EINTR, EINVAL, ENOMEM, or ENOMSG.

By examining negative scenarios of these syscalls for Mips, it was
established that ENOMSG does not have the same value accross all
platforms, but it is nevertheless not included for conversion in
the correspondant conversion table defined in linux-user/syscall.c.
This is certainly a bug, since it leads to the incorrect emulation
of msgrcv() and msgsnd() for scenarios involving ENOMSG.

This patch fixes this by extending the conversion table to include
ENOMSG.

Also, LTP test msgrcv04 will be fixed for some platforms.

Signed-off-by: Aleksandar Markovic <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>