Peter Maydell [Tue, 18 Apr 2017 15:18:15 +0000 (16:18 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into staging
# gpg: Signature made Tue 18 Apr 2017 15:58:32 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/block-pull-request:
block: Drain BH in bdrv_drained_begin
block: Walk bs->children carefully in bdrv_drain_recurse
During block job completion, nothing prevents
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which causes trouble in code paths such as this one:
block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but this doesn't happen because it runs as a
main loop BH.
Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well. To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.
As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:
#0 0x0000555555aa79f3 in bdrv_drain_recurse
#1 0x0000555555aa825d in bdrv_drained_begin
#2 0x0000555555aa8449 in bdrv_drain
#3 0x0000555555a9c356 in blk_drain
#4 0x0000555555aa3cfd in mirror_drain
#5 0x0000555555a66e11 in block_job_detach_aio_context
#6 0x0000555555a62f4d in bdrv_detach_aio_context
#7 0x0000555555a63116 in bdrv_set_aio_context
#8 0x0000555555a9d326 in blk_set_aio_context
#9 0x00005555557e38da in virtio_blk_data_plane_stop
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
#13 0x00005555559f6a18 in virtio_pci_reset
#14 0x00005555559139a9 in qdev_reset_one
#15 0x0000555555916738 in qbus_walk_children
#16 0x0000555555913318 in qdev_walk_children
#17 0x0000555555916738 in qbus_walk_children
#18 0x00005555559168ca in qemu_devices_reset
#19 0x000055555581fcbb in pc_machine_reset
#20 0x00005555558a4d96 in qemu_system_reset
#21 0x000055555577157a in main_loop_should_exit
#22 0x000055555577157a in main_loop
#23 0x000055555577157a in main
The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.
block: Walk bs->children carefully in bdrv_drain_recurse
The recursive bdrv_drain_recurse may run a block job completion BH that
drops nodes. The coming changes will make that more likely, and a
use-after-free would happen without this patch.
Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
QLIST_FOREACH_SAFE to prevent such a case from happening.
Since bdrv_unref accesses global state that is not protected by the
AioContext lock, we cannot use bdrv_ref/bdrv_unref unconditionally.
Fortunately, the protection is not needed in an IOThread, because only
the main loop can modify the graph, and it does so with the AioContext
lock held.
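As an illustration of the pattern described above (stash the successor
before the callback runs, and keep a reference on the current node
across it), here is a minimal, self-contained C sketch; the types,
names and list layout are made up for the example and are not QEMU's
actual bdrv_drain_recurse() code:

#include <stdlib.h>

typedef struct Node {
    int refcnt;
    struct Node *next;          /* sibling in the parent's children list */
} Node;

static void node_ref(Node *n)   { n->refcnt++; }
static void node_unref(Node *n) { if (--n->refcnt == 0) { free(n); } }

/* Stand-in for draining one child; in QEMU this may run a completion
 * BH that detaches or deletes children of the list being walked. */
static void drain_one(Node *n) { (void)n; }

static void drain_children(Node *head)
{
    Node *child = head;

    while (child) {
        Node *next = child->next;   /* stash the successor (FOREACH_SAFE) */
        node_ref(child);            /* keep the current node alive */
        drain_one(child);
        node_unref(child);
        child = next;
    }
}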
Greg Kurz [Mon, 17 Apr 2017 08:53:23 +0000 (10:53 +0200)]
9pfs: local: set the path of the export root to "."
The local backend was recently converted to using "at*()" syscalls in order
to ensure all accesses happen below the shared directory. This requires that
we only pass relative paths, otherwise the dirfd argument to the "at*()"
syscalls is ignored and the path is treated as an absolute path in the host.
This is actually the case for paths in all fids, with the notable exception
of the root fid, whose path is "/". This causes the following backend ops to
act on the "/" directory of the host instead of the virtfs shared directory
when the export root is involved:
- lstat
- chmod
- chown
- utimensat
For example, chmod /9p_mount_point in the guest will be converted to
chmod / on the host. This could cause security issues with a
privileged QEMU.
All "*at()" syscalls are being passed an open file descriptor. In the case
of the export root, this file descriptor points to the path in the host that
was passed to -fsdev.
The fix is thus as simple as changing the path of the export root fid to be
"." instead of "/".
Max Reitz [Tue, 11 Apr 2017 14:50:50 +0000 (16:50 +0200)]
block/io: Comment out permission assertions
In case of block migration, there may be writes to BlockBackends that do
not have the write permission taken. Before this issue is fixed (which
is not going to happen in 2.9), we therefore cannot assert that this is
the case.
Kevin Wolf [Tue, 11 Apr 2017 14:08:53 +0000 (16:08 +0200)]
sheepdog: Fix crash in co_read_response()
This fixes a regression introduced in commit 9d456654.
aio_co_wake() can only be used to reenter a coroutine that was already
previously entered, otherwise co->ctx is uninitialised and we access
garbage. Using it immediately after qemu_coroutine_create() like in
co_read_response() is wrong and causes segfaults.
Replace the call with aio_co_enter(), which gets an explicit AioContext
parameter and works even for new coroutines.
Peter Maydell [Tue, 11 Apr 2017 13:53:32 +0000 (14:53 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2017-04-11' into staging
Block patches for 2.9.0-rc4
# gpg: Signature made Tue 11 Apr 2017 14:40:07 BST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2017-04-11:
iscsi: Fix iscsi_create
throttle: Remove block from group on hot-unplug
block: pass the right options for BlockDriver.bdrv_open()
iscsi: Fix iscsi_create
Since d5895fcb (iscsi: Split URL into individual options), creating a
qcow2 image on an iscsi LUN fails:
qemu-img create -f qcow2 iscsi://$SERVER/$IQN/0 1G
qemu-img: iscsi://$SERVER/$IQN/0: Could not create image: Invalid
argument
The problem is that iscsi_open now expects transport_name, portal and
target to have already been parsed into structured options by
iscsi_parse_filename, but that function is not called from
iscsi_create.
Eric Blake [Thu, 6 Apr 2017 19:08:47 +0000 (14:08 -0500)]
throttle: Remove block from group on hot-unplug
When a block device that is part of a throttle group is hot-unplugged,
we forgot to remove it from the throttle group. This leaves stale
memory around, and causes an easily reproducible crash.
block: pass the right options for BlockDriver.bdrv_open()
raw_open() expects the caller to always pass in the right actual
@options parameter. But when trying to apply a snapshot on an RBD
image, bdrv_snapshot_goto() calls raw_open() (by calling the
bdrv_open callback on the BlockDriver) with a NULL @options, which
results in a segmentation fault.
For the other non-raw format drivers, it also makes sense to pass in
the actual options, although they don't trigger the problem so far.
Let's prepare an @options by adding the "file" key-value pair to a
copy of the actual options that were given for the node (i.e.
bs->options), and pass it to the callback.
BlockDriver.bdrv_open() expects bs->file to be NULL and just
overwrites it with the result from bdrv_open_child(). That means we
should actually make sure it's NULL because otherwise the child BDS
will have a reference count that is 1 too high. So we unconditionally
invoke bdrv_unref_child() before calling BlockDriver.bdrv_open(), and
we wrap everything in bdrv_ref()/bdrv_unref() so the BDS isn't
deleted in the meantime.
Peter Maydell [Tue, 11 Apr 2017 12:27:05 +0000 (13:27 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into staging
# gpg: Signature made Tue 11 Apr 2017 13:10:55 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/block-pull-request:
sheepdog: Use bdrv_coroutine_enter before BDRV_POLL_WHILE
block: Fix bdrv_co_flush early return
block: Use bdrv_coroutine_enter to start I/O coroutines
qemu-io-cmds: Use bdrv_coroutine_enter
blockjob: Use bdrv_coroutine_enter to start coroutine
block: Introduce bdrv_coroutine_enter
async: Introduce aio_co_enter
coroutine: Extract qemu_aio_coroutine_enter
tests/block-job-txn: Don't start block job before adding to txn
block: Quiesce old aio context during bdrv_set_aio_context
block: Make bdrv_parent_drained_begin/end public
block: Fix bdrv_co_flush early return
bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for
BDRV_POLL_WHILE to work, even for the shortcut case where flush is
unnecessary. Move the if block to below bdrv_dec_in_flight, and while
at it fix the variable declaration position.
block: Use bdrv_coroutine_enter to start I/O coroutines
BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling
the main context, which relies on the yielded coroutine continuing on bs->ctx
before notifying qemu_aio_context with bdrv_wakeup().
Thus, using qemu_coroutine_enter to start I/O is wrong: if the
coroutine is entered from the main loop, co->ctx will be
qemu_aio_context, and as a result of the "release, poll, acquire" loop
of BDRV_POLL_WHILE, race conditions happen when both the main thread
and the iothread access the same BDS.
Note that in the above case, bdrv_drained_begin() doesn't do the
"release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0.
Fix this by using bdrv_coroutine_enter to enter the coroutine in the
right context.
iotests 109 output is updated because the coroutine reenter flow during
mirror job completion is different (now through co_queue_wakeup, instead
of the unconditional qemu_coroutine_switch before), making the final
job len different.
blockjob: Use bdrv_coroutine_enter to start coroutine
Resuming, and especially starting, the block job coroutine can happen
in the main thread. However, the coroutine's "home" ctx should be set
to the same context as job->blk. Use bdrv_coroutine_enter to ensure
that.
tests/block-job-txn: Don't start block job before adding to txn
Previously, the job could already complete before test_block_job_start
returned; as a result, the transactional state of other jobs added to
the same txn later could not be handled correctly.
Move the block_job_start() calls to the callers, after the
block_job_txn_add_job() calls.
block: Quiesce old aio context during bdrv_set_aio_context
The fact that the bs->aio_context is changing can confuse the dataplane
iothread, because of the now fine granularity aio context lock.
bdrv_drain should rather be a bdrv_drained_begin/end pair, but since
bs->aio_context is changing, we can just use aio_disable_external and
bdrv_parent_drained_begin.
Peter Maydell [Mon, 10 Apr 2017 15:08:37 +0000 (16:08 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
Fixes a memory leak.
# gpg: Signature made Mon 10 Apr 2017 13:20:39 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: xattr: fix memory leak in v9fs_list_xattr
- dropped new feature patches
- last minute typo fix from Nikunj A Dadhania <[email protected]>
# gpg: Signature made Mon 10 Apr 2017 11:38:10 BST
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-mttcg-fixups-for-rc2-100417-1:
replay: assert time only goes forward
cpus: call cpu_update_icount on read
cpu-exec: update icount after each TB_EXIT
cpus: introduce cpu_update_icount helper
cpus: don't credit executed instructions before they have run
cpus: move icount preparation out of tcg_exec_cpu
cpus: check cpu->running in cpu_get_icount_raw()
cpus: remove icount handling from qemu_tcg_cpu_thread_fn
target/i386/misc_helper: wrap BQL around another IRQ generator
cpus: fix wrong define name
scripts/qemugdb/mtree.py: fix up mtree dump
Peter Maydell [Mon, 3 Apr 2017 13:04:15 +0000 (14:04 +0100)]
configure: on Windows minimum glib version must be 2.30
In the 2.7 release we stated in the ChangeLog that the
minimum glib version for Windows hosts was 2.30, but we
didn't update configure to enforce this because we were
very close to the release at the point where we noticed
the issue, and it only affected building the test suite.
We then forgot that we needed to do it. Fix the omission.
(The reason for the 2.30 requirement is use of
g_dir_make_tmp() -- our fallback implementation uses
mkdtemp(), which isn't available on Windows.)
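For reference, a hedged sketch of what such a fallback looks like; the
version check and template are illustrative rather than QEMU's actual
configure/test code:

#include <glib.h>
#include <stdlib.h>

static char *make_tmp_dir(void)
{
#if GLIB_CHECK_VERSION(2, 30, 0)
    /* available since glib 2.30, works on Windows too */
    return g_dir_make_tmp("qemu-test-XXXXXX", NULL);
#else
    /* POSIX-only fallback: mkdtemp() does not exist on Windows */
    char *tmpl = g_strdup("/tmp/qemu-test-XXXXXX");
    if (!mkdtemp(tmpl)) {
        g_free(tmpl);
        return NULL;
    }
    return tmpl;
#endif
}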
Alex Bennée [Wed, 5 Apr 2017 10:05:28 +0000 (11:05 +0100)]
replay: assert time only goes forward
If we find ourselves trying to add an event to the log where time has
gone backwards, it is because a vCPU event has occurred and the
main-loop is not yet aware of time moving forward. This should not
happen, and if it does it's better to fail early than to generate a
log that will have weird behaviour.
Alex Bennée [Wed, 5 Apr 2017 09:53:47 +0000 (10:53 +0100)]
cpus: call cpu_update_icount on read
This ensures each time the vCPU thread reads the icount we update the
master timers_state.qemu_icount field. This way, as long as updates
are in BQL protected sections (which they should be), the main-loop
can never come to update the log and find time has gone backwards.
Alex Bennée [Wed, 5 Apr 2017 11:35:48 +0000 (12:35 +0100)]
cpu-exec: update icount after each TB_EXIT
There is no particular reason we shouldn't update the global system
icount time as we exit each TranslationBlock run. This ensures the
main-loop doesn't have to wait until we exit to the outer loop for
executed instructions to be credited to timers_state.
The prepare_icount_for_run function is slightly tweaked to match the
logic we run in cpu_loop_exec_tb.
Alex Bennée [Wed, 5 Apr 2017 11:32:37 +0000 (12:32 +0100)]
cpus: introduce cpu_update_icount helper
By holding off updates to timers_state.qemu_icount we can run into
trouble when the non-vCPU thread needs to know the time. This helper
ensures we atomically update timers_state.qemu_icount based on what
has been executed so far.
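The core idea is to fold what has executed so far into the shared
counter with an atomic update, so another thread never observes time
going backwards. A plain C11 sketch of that idea (the names are
invented; this is not QEMU's cpu_update_icount()):

#include <stdatomic.h>
#include <stdint.h>

/* stand-in for timers_state.qemu_icount */
static _Atomic int64_t qemu_icount;

/* Called from the vCPU thread: credit only what has actually run. */
static void update_icount(int64_t executed)
{
    atomic_fetch_add(&qemu_icount, executed);
}

/* Safe to call from the main loop: always sees a monotonic value. */
static int64_t read_icount(void)
{
    return atomic_load(&qemu_icount);
}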
Alex Bennée [Fri, 31 Mar 2017 15:09:42 +0000 (16:09 +0100)]
cpus: don't credit executed instructions before they have run
Outside of the vCPU thread icount time will only be tracked against
timers_state.qemu_icount. We no longer credit cycles until they have
completed the run. Inside the vCPU thread we adjust for passage of
time by looking at how many have run so far. This is only valid inside
the vCPU thread while it is running.
Alex Bennée [Thu, 30 Mar 2017 17:49:22 +0000 (18:49 +0100)]
cpus: check cpu->running in cpu_get_icount_raw()
The lifetime of current_cpu is now the lifetime of the vCPU thread.
However get_icount_raw() can apply a fudge factor if called while code
is running to take into account the current executed instruction
count.
To ensure this is always the case we also check cpu->running.
Alex Bennée [Thu, 30 Mar 2017 17:32:29 +0000 (18:32 +0100)]
cpus: remove icount handling from qemu_tcg_cpu_thread_fn
We should never be running in multi-threaded mode with icount enabled.
There is no point calling handle_icount_deadline here so remove it and
assert !use_icount.
Alex Bennée [Tue, 21 Mar 2017 16:26:27 +0000 (16:26 +0000)]
scripts/qemugdb/mtree.py: fix up mtree dump
Since QEMU has been able to build with native Int128 support this was
broken as it attempts to fish values out of the non-existent
structure. Also the alias print was trying to make a %x out of
gdb.ValueType directly which didn't seem to work.
Peter Maydell [Fri, 7 Apr 2017 14:23:48 +0000 (15:23 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer fixes for 2.9.0-rc4
# gpg: Signature made Fri 07 Apr 2017 13:44:17 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
mirror: Fix aio context of mirror_top_bs
block: Assert attached child node has right aio context
block: Fix unpaired aio_disable_external in external snapshot
block: Don't check permissions for copy on read
qemu-img: img_create does not support image-opts, fix docs
iotests: Add mirror tests for orphaned source
block/mirror: Fix use-after-free
commit: Set commit_top_bs->total_sectors
commit: Set commit_top_bs->aio_context
block: Ignore guest dev permissions during incoming migration
block: Fix unpaired aio_disable_external in external snapshot
bdrv_replace_child_noperm tries to hand over the quiesce_counter state
from the old bs to the new one, but if they are not in the same aio
context this causes an imbalance.
Fix this by setting the correct aio context before calling
bdrv_append().
Kevin Wolf [Fri, 7 Apr 2017 10:29:05 +0000 (12:29 +0200)]
block: Don't check permissions for copy on read
The assertion is currently failing. We can't require callers to have
write permissions when all they are doing is a read, so comment it out.
Add a FIXME comment in the code so that the check is re-enabled when
copy on read is refactored into its own filter driver.
Jeff Cody [Thu, 6 Apr 2017 17:45:42 +0000 (13:45 -0400)]
qemu-img: img_create does not support image-opts, fix docs
The documentation and help for qemu-img claims that 'qemu-img create'
will take the '--image-opts' argument. This is not true, so this
patch removes those claims.
Max Reitz [Mon, 3 Apr 2017 17:51:49 +0000 (19:51 +0200)]
block/mirror: Fix use-after-free
If @bs does not have any parents, the only reference to @mirror_top_bs
will be held by the BlockJob object after the bdrv_unref() following
block_job_create(). However, if block_job_create() fails, this reference
will not exist and @mirror_top_bs will have been deleted when we
goto fail.
The issue comes back at all later entries to the fail label: We delete
the BlockJob object before rolling back our changes to the node graph.
This means that we will delete @mirror_top_bs in the process.
All in all, whenever @bs does not have any parents and we go down the
fail path we will dereference @mirror_top_bs after it has been deleted.
Fix this by invoking bdrv_unref() only when block_job_create() was
successful and by bdrv_ref()'ing @mirror_top_bs in the fail path before
deleting the BlockJob object. Finally, bdrv_unref() it at the end of the
fail path after we actually no longer need it.
Kevin Wolf [Thu, 6 Apr 2017 17:07:14 +0000 (19:07 +0200)]
commit: Set commit_top_bs->total_sectors
Like in the mirror filter driver, we also need to set the image size for
the commit filter driver. This is less likely to be a problem in
practice than for the mirror because we're not at the active layer here,
but attaching new parents to a node in the middle of the chain is
possible, so the size needs to be correct anyway.
Kevin Wolf [Tue, 4 Apr 2017 15:29:03 +0000 (17:29 +0200)]
block: Ignore guest dev permissions during incoming migration
Usually guest devices don't like other writers to the same image, so
they use blk_set_perm() to prevent this from happening. In the migration
phase before the VM is actually running, though, they don't have a
problem with writes to the image. On the other hand, storage migration
needs to be able to write to the image in this phase, so the restrictive
blk_set_perm() call of qdev devices breaks it.
This patch flags all BlockBackends with a qdev device as
blk->disable_perm during incoming migration, which means that the
requested permissions are stored in the BlockBackend, but not actually
applied to its root node yet.
Once migration has finished and the VM should be resumed, the
permissions are applied. If they cannot be applied (e.g. because the NBD
server used for block migration hasn't been shut down), resuming the VM
fails.
Since commit cd958edb1fae85d, same size console resize is skipped. This
change broke QXL incoming migration in VGA mode:
qemu_spice_display_switch() is no longer called during qxl_post_load(),
because the default message surface is of the same size, and during
displaychangelistener registration, PCIQXLDevice.mode is
QXL_MODE_UNDEFINED. This triggers a later crash on refresh:
==2634== Invalid read of size 4
==3516== at 0x65F3050: pixman_image_get_data (in /usr/lib64/libpixman-1.so.0.34.0)
==3516== by 0x6F0CEB: qemu_spice_create_update (spice-display.c:215)
==3516== by 0x6F1CC7: qemu_spice_display_refresh (spice-display.c:502)
==3516== by 0x58CF77: display_refresh (qxl.c:1948)
==3516== by 0x6E8084: do_safe_dpy_refresh (console.c:1591)
==3516== by 0x6E80D5: dpy_refresh (console.c:1604)
==3516== by 0x6E4508: gui_update (console.c:201)
==3516== by 0x81898E: timerlist_run_timers (qemu-timer.c:536)
==3516== by 0x8189D6: qemu_clock_run_timers (qemu-timer.c:547)
==3516== by 0x818D98: qemu_clock_run_all_timers (qemu-timer.c:662)
==3516== by 0x81952A: main_loop_wait (main-loop.c:514)
==3516== by 0x4ADD29: main_loop (vl.c:1898)
One way to solve this is to explicitly call qemu_spice_display_switch()
when entering VGA mode, which happens during qxl_post_load().
Fixes:
"null pointer access on migration resume of systemrescuecd boot menu with qxl-vga"
https://bugs.launchpad.net/qemu/+bug/1679126
https://bugzilla.redhat.com/show_bug.cgi?id=1438566
Alex Williamson [Thu, 6 Apr 2017 22:03:26 +0000 (16:03 -0600)]
vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk
The NVIDIA BAR5 quirk is targeting an ioport BAR. Some older devices
have a BAR5 which is not ioport and can induce a segfault here. Test
the BAR type to skip these devices.
Paolo Bonzini [Wed, 5 Apr 2017 08:11:36 +0000 (10:11 +0200)]
tco: do not generate an NMI
This behavior is not indicated in the datasheet and can confuse the OS.
The TCO can trap NMIs from SERR# or IOCHK# and convert them to SMIs; but
any other TCO event is either delivered as an SMI or completely disabled.
Peter Maydell [Tue, 4 Apr 2017 17:00:23 +0000 (18:00 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
Some 9pfs bugs fixes: potential hang at reset, migration blocker leak.
# gpg: Signature made Tue 04 Apr 2017 17:07:55 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: clear migration blocker at session reset
9pfs: fix multiple flush for same request
Greg Kurz [Tue, 4 Apr 2017 16:06:01 +0000 (18:06 +0200)]
9pfs: clear migration blocker at session reset
The migration blocker survives a device reset: if the guest mounts a 9p
share and then gets rebooted with system_reset, it will be unmigratable
until it remounts and umounts the 9p share again.
This happens because the migration blocker is supposed to be cleared when
we put the last reference on the root fid, but virtfs_reset() wrongly calls
free_fid() instead of put_fid().
This patch fixes virtfs_reset() so that it honors the way fids are
supposed to be manipulated: first get a reference and later put it
back when you're done.
Greg Kurz [Tue, 4 Apr 2017 16:06:01 +0000 (18:06 +0200)]
9pfs: fix multiple flush for same request
If a client tries to flush the same outstanding request several times, only
the first flush completes. Subsequent ones keep waiting for the request
completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU
to hang when draining active PDUs the next time the device is reset.
Let each flush request wake up the next one, if any. The last waiter
frees the cancelled PDU.
pci: Only unmap bus_master_enabled_region if it was added previously
Normally pci_init_bus_master() would be called either via
bus->machine_done.notify or directly from do_pci_register_device().
However, if a device's realize() fails, pci_init_bus_master() is not
called, and do_pci_unregister_device() fails on
memory_region_del_subregion() as the region was never mapped.
This adds a check that the subregion was mapped before unmapping it.
The qio_dns_resolver_lookup_sync() method is required to be a no-op
for socket kinds that don't require name resolution. Thus the KIND_FD
handling should not return an error.
Wang guang [Mon, 3 Apr 2017 11:05:21 +0000 (12:05 +0100)]
io: fix incoming client socket initialization
The channel socket was initialized manually, but forgot to set
QIO_CHANNEL_FEATURE_SHUTDOWN. Thus, the colo_process_incoming_thread
would hang at recvmsg. This patch just calls qio_channel_socket_new to
get the channel, which sets QIO_CHANNEL_FEATURE_SHUTDOWN already.
Peter Maydell [Fri, 31 Mar 2017 12:36:41 +0000 (13:36 +0100)]
tests/libqtest.c: Delete possible stale unix sockets
Occasionally if a test crashes or is interrupted by the user
at the wrong moment it could leave behind a stale UNIX
socket in /tmp/. This will then cause a subsequent test
run to fail spuriously with
tests/libqtest.c:70:init_socket: assertion failed (ret != -1): (-1 != -1)
if it happens to reuse the same PID.
Defend against this by deleting any stray stale socket before
trying to open the new ones for this test.
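The defensive measure is the usual pattern for UNIX sockets: remove
any leftover socket file before binding a fresh one. A self-contained
sketch of that pattern (simplified, not libqtest's actual code):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    unlink(path);           /* delete a possible stale socket; ignore errors */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}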
main-loop: Acquire main_context lock around os_host_main_loop_wait.
When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle. With virt-rescue >= 1.37 you may be able to observe the
hang by doing:
$ virt-rescue -e ^] --scratch
><rescue> while true; do ls -l /usr/bin; done
The hang in virt-rescue can be resolved by pressing a key on the
serial console.
Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console. Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:
$ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)
During investigation of this bug, Paolo Bonzini wrote:
> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext. However QEMU is not doing that, instead it is taking its
> own mutex. So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.
This patch exactly implements Paolo's suggestion in that paragraph.
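In glib terms the change boils down to bracketing the poll with an
acquire/release of the default main context. A minimal sketch of that
pattern (not the literal os_host_main_loop_wait() code):

#include <glib.h>

static void wait_once(GPollFD *fds, guint nfds, gint timeout)
{
    GMainContext *context = g_main_context_default();

    g_main_context_acquire(context);  /* tell glib this thread owns the context */
    g_poll(fds, nfds, timeout);
    g_main_context_release(context);
}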
This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing. I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
Reported-by: Richard W.M. Jones <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Richard W.M. Jones <[email protected]>
Message-Id: <20170331205133[email protected]>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended]
Signed-off-by: Paolo Bonzini <[email protected]>
Peter Maydell [Mon, 3 Apr 2017 15:43:39 +0000 (16:43 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2017-04-03' into staging
Block patches for 2.9-rc3
# gpg: Signature made Mon 03 Apr 2017 16:29:49 BST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2017-04-03:
block/parallels: Avoid overflows
iotests: Improve image-clear tests on non-aligned image
qcow2: Discard unaligned tail when wiping image
iotests: fix 097 when run with qcow
qemu-io-cmds: Assert that global and nofile commands don't use ct->perms
sheepdog: Fix blockdev-add
nbd: Tidy up blockdev-add interface
sockets: New helper socket_address_crumple()
qapi-schema: SocketAddressFlat variants 'vsock' and 'fd'
gluster: Prepare for SocketAddressFlat extension
block: Document -drive problematic code and bugs
io vnc sockets: Clean up SocketAddressKind switches
char: Fix socket with "type": "vsock" address
nbd sockets vnc: Mark problematic address family tests TODO
block: add missed aio_context_acquire into release_drive
Max Reitz [Fri, 31 Mar 2017 17:05:12 +0000 (19:05 +0200)]
block/parallels: Avoid overflows
Change the types of variables in allocate_clusters() to int64_t so we do
not have to worry about potential overflows.
Add an assertion that our accesses to s->bat[] do not result in a buffer
overflow and that the implicit conversion performed when invoking
bat_entry_off() does not result in an integer overflow.
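The class of bug being avoided is easy to demonstrate in isolation: if
the multiplication is done in a 32-bit type, it wraps before the
result is widened. The values below are arbitrary, not taken from the
parallels driver:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t sectors = 9000000;       /* ~4.6 GB worth of 512-byte sectors */
    uint32_t sector_size = 512;

    uint32_t narrow = sectors * sector_size;           /* wraps around 2^32 */
    int64_t  wide   = (int64_t)sectors * sector_size;  /* computed in 64 bits */

    printf("narrow=%" PRIu32 " wide=%" PRId64 "\n", narrow, wide);
    return 0;
}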
Eric Blake [Fri, 31 Mar 2017 18:53:56 +0000 (13:53 -0500)]
iotests: Improve image-clear tests on non-aligned image
Tweak 097 and 176 to operate on an image that is not cluster-aligned,
to give further coverage of clearing out an entire image, including
the recent fix to eliminate the difference between fast path (97) and
slow (176) for qcow2. Also tested on qcow (97 only, since qcow lacks
snapshots).
Eric Blake [Fri, 31 Mar 2017 18:53:55 +0000 (13:53 -0500)]
qcow2: Discard unaligned tail when wiping image
There is a subtle difference between the fast (qcow2v3 with no
extra data) and slow path (qcow2v2 format [aka 0.10], or when a
snapshot is present) of qcow2_make_empty(). The slow path fails
to discard the final (partial) cluster of an unaligned image.
The problem stems from the fact that qcow2_discard_clusters() was
silently ignoring sub-cluster head and tail on unaligned requests.
A quick audit of all callers shows that qcow2_snapshot_create() has
always passed a cluster-aligned request since the call was added
in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned
request since commit ecdbead taught the block layer about preferred
discard alignment; and qcow2_make_empty() was fixed to pass an
aligned start (but not necessarily end) in commit a3e1505.
Asserting that the start is always aligned also points out that we
now have a dead check: rounding the end offset down can never result
in a value less than the aligned start offset (the check was rendered
dead with commit ecdbead). Meanwhile, we do not want to round the
end cluster down in the one case of the end offset matching the
(unaligned) file size - that final partial cluster should still be
discarded.
With those fixes in place, the fast and slow paths are back in sync
at discarding an entire image; the next patch will update
qemu-iotests to ensure we don't regress.
Note that bdrv_co_pdiscard ignores ALL partial cluster requests,
including the partial cluster at the end of an image; it can be
argued that the partial cluster at the end should be special-cased
so that a guest issuing discard requests at proper alignments
everywhere else can likewise empty the entire image. But that
optimization is left for another day.
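The alignment rule described above can be written out as a small
helper: round the start up and the end down to cluster boundaries,
except that a request ending exactly at the unaligned image size keeps
its partial tail. This is an illustrative sketch, not the qcow2 code
itself:

#include <stdint.h>

static void clamp_discard(uint64_t offset, uint64_t bytes,
                          uint64_t cluster_size, uint64_t image_size,
                          uint64_t *start, uint64_t *end)
{
    /* ignore a partial head cluster: round the start up */
    *start = (offset + cluster_size - 1) / cluster_size * cluster_size;
    /* ignore a partial tail cluster: round the end down ... */
    *end = (offset + bytes) / cluster_size * cluster_size;
    /* ... unless the request ends at the (unaligned) image size */
    if (offset + bytes == image_size) {
        *end = image_size;
    }
}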
iotests: fix 097 when run with qcow
The commit "qcow2: Don't strand clusters near 2G intervals during
commit" extended the 097 test case so that it did two passes, once
with an internal snapshot, once without.
qcow (v1) does not support internal snapshots, so this change
broke test 097 when run against qcow.
This splits 097 in two, creating a new 176 that tests the
internal snapshot codepath, effectively putting 097 back
to its content before the above commit.
Peter Maydell [Fri, 31 Mar 2017 13:38:49 +0000 (14:38 +0100)]
qemu-io-cmds: Assert that global and nofile commands don't use ct->perms
It would be a bug for a command with the CMD_NOFILE_OK or
CMD_FLAG_GLOBAL flags set to also set the ct->perms field,
because the former says "OK for a file not to be open"
but the latter is a check on a file.
Add an assertion in qemuio_add_command() so we can catch that
sort of buggy command definition immediately rather than it
being a bug that only manifests when a particular set of
command line options is used.
(Coverity gets confused about this (CID 1371723) and reports
that we might dereference a NULL blk pointer in this case,
because it can't tell that that code path never happens with
the cmdinfo_t that we have. This commit won't help unconfuse
it, but it does fix the underlying issue.)
Commit 831acdc "sheepdog: Implement bdrv_parse_filename()" and commit d282f34 "sheepdog: Support blockdev-add" have different ideas on how
the QemuOpts parameters for the server address are named. Fix that.
While there, rename BlockdevOptionsSheepdog member addr to server, for
consistency with BlockdevOptionsSsh, BlockdevOptionsGluster,
BlockdevOptionsNbd.
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces, and convert all internal interfaces to
SocketAddressFlat.
BlockdevOptionsNbd is an external interface using SocketAddress. We
already use SocketAddressFlat elsewhere in blockdev-add. Replace it
by SocketAddressFlat while we can (it's new in 2.9) for simplicity and
consistency.
Since the internal interfaces still take SocketAddress, this requires
conversion function socket_address_crumple(). It'll go away when I
update the interfaces.
Unfortunately, SocketAddress is also visible in -drive since 2.8.
Nobody should be using it, as it's fairly new and has never been
documented, so adding still more compatibility gunk to keep it working
isn't worth the trouble; the -drive syntax for these options changes
along with it.
Because of this interface change, iotest 147 has to be adapted.
Unfortunately, we cannot just flatten all of the addresses because
nbd-server-start still takes a plain SocketAddress. Therefore, we need
both and this is most easily achieved by writing the SocketAddress into
the code and flattening it where necessary.
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces. New ones should use SocketAddressFlat.
I further intend to convert all internal interfaces to
SocketAddressFlat. This helper should go away then.
qapi-schema: SocketAddressFlat variants 'vsock' and 'fd'
Note that the new variants are impossible in qemu_gluster_glfs_init(),
because the gconf->server can only come from qemu_gluster_parse_uri()
or qemu_gluster_parse_json(), and neither can create anything but
'inet' or 'unix'.
qemu_gluster_glfs_init() and qemu_gluster_parse_json() rely on the
fact that SocketAddressFlatType has only two members
SOCKET_ADDRESS_FLAT_TYPE_INET and SOCKET_ADDRESS_FLAT_TYPE_UNIX.
Correct, but won't stay correct. Make them more robust.
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, a few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
io vnc sockets: Clean up SocketAddressKind switches
We have quite a few switches over SocketAddressKind. Some have case
labels for all enumeration values, others rely on a default label.
Some abort when the value isn't a valid SocketAddressKind, others
report an error then.
Unify as follows. Always provide case labels for all enumeration
values, to clarify intent. Abort when the value isn't a valid
SocketAddressKind, because the program state is messed up then.
char: Fix socket with "type": "vsock" address
Crashes because SocketAddress_to_str() is blissfully unaware of
SOCKET_ADDRESS_KIND_VSOCK. Fix that. Pick the output format to match
socket_parse(), just like the existing formats.
nbd sockets vnc: Mark problematic address family tests TODO
Certain features make sense only with certain address families. For
instance, passing file descriptors requires AF_UNIX. Testing
SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious,
but problematic: it can't recognize AF_UNIX when type ==
SOCKET_ADDRESS_KIND_FD.
Mark such tests of saddr->type TODO. We may want to check the address
family with getsockname() there.
Denis V. Lunev [Tue, 28 Mar 2017 16:12:46 +0000 (19:12 +0300)]
block: add missed aio_context_acquire into release_drive
Recently we experienced a hang with iothreads enabled, with the
following call trace:
Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
0 ppoll () from /lib64/libc.so.6
2 qemu_poll_ns () at qemu-timer.c:313
3 aio_poll () at aio-posix.c:457
4 bdrv_flush () at block/io.c:2641
5 bdrv_close () at block.c:2143
6 bdrv_delete () at block.c:2352
7 bdrv_unref () at block.c:3429
8 blk_remove_bs () at block/block-backend.c:427
9 blk_delete () at block/block-backend.c:178
10 blk_unref () at block/block-backend.c:226
11 object_property_del_all () at qom/object.c:399
12 object_finalize () at qom/object.c:461
13 object_unref () at qom/object.c:898
14 object_property_del_child () at qom/object.c:422
15 qmp_marshal_device_del () at qmp-marshal.c:1145
16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
Technically bdrv_flush() gets stuck in
while (rwco.ret == NOT_DONE) {
aio_poll(aio_context, true);
}
but rwco.ret is equal to 0, thus we have a missed wakeup. Code
investigation reveals that we have not performed aio_context_acquire()
on this call stack.
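A sketch of the shape of the fix, assuming QEMU's internal
block-backend and AioContext APIs (this is not the literal
release_drive() hunk):

#include "sysemu/block-backend.h"
#include "block/aio.h"

static void release_drive_locked(BlockBackend *blk)
{
    AioContext *ctx = blk_get_aio_context(blk);

    aio_context_acquire(ctx);
    blk_unref(blk);     /* may recurse into bdrv_close() -> bdrv_flush() */
    aio_context_release(ctx);
}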
libusbx doesn't exist any more, the fork got merged back into libusb.
So stop using LIBUSBX_API_VERSION and use LIBUSB_API_VERSION instead.
For backward compatibility, alias LIBUSB_API_VERSION to
LIBUSBX_API_VERSION in case we find LIBUSB_API_VERSION isn't defined.
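The compatibility alias amounts to a small preprocessor fallback; a
sketch of the idea (treat the header name and exact placement as
illustrative):

#include <libusb.h>

#ifndef LIBUSB_API_VERSION
#define LIBUSB_API_VERSION LIBUSBX_API_VERSION
#endif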
Peter Maydell [Fri, 31 Mar 2017 14:31:11 +0000 (15:31 +0100)]
disas/cris.c: Avoid unintentional sign extension
Commit 001ebaca7b11 fixed some unintended sign extension issues
spotted by Coverity (CID 1005402, 1005403), but didn't catch
all of them. Fix the rest, so we behave consistently whether
'long' is 32 bit or 64 bit.
Peter Maydell [Tue, 28 Mar 2017 10:58:38 +0000 (11:58 +0100)]
configure: Mark SPARC as supported
Thanks to John Paul Adrian Glaubitz <[email protected]>
and the Debian Project, we now have access to a SPARC Linux
system we can use for build testing. Move SPARC back into
the "supported" list.
Peter Maydell [Thu, 30 Mar 2017 10:52:31 +0000 (11:52 +0100)]
tcg/sparc: Zero extend address argument to ld/st helpers
The C store helper functions take the address argument as a
target_ulong type; if this is 32 bit but the host is 64 bit
then the SPARC calling convention requires that the caller
must zero extend the value. We weren't doing this, which
meant we could pass values to the caller with high bits set
and QEMU would crash if it was compiled with optimizations.
In particular, the i386 BIOS would not start.
Peter Maydell [Thu, 30 Mar 2017 10:52:30 +0000 (11:52 +0100)]
tcg/sparc: Zero extend data argument to store helpers
The C store helper functions take the data argument as a uint8_t,
uint16_t, etc depending on the store size. The SPARC calling
convention requires that data types smaller than the register
size must be extended by the caller. We weren't doing this,
which meant that if QEMU was compiled with optimizations enabled
we could end up storing incorrect values to guest memory.
(In particular the i386 guest BIOS would crash on startup.)
Add code to the trampolines that call the store helpers to
do the zero extension as required.
Paolo Bonzini [Mon, 3 Apr 2017 11:41:28 +0000 (13:41 +0200)]
exec: revert MemoryRegionCache
MemoryRegionCache did not know about virtio support for IOMMUs (because the
two features were developed at the same time). Revert MemoryRegionCache
to "normal" address_space_* operations for 2.9, as it is simpler than
undoing the virtio patches.
* remotes/kraxel/tags/pull-fixes-20170403-1:
vnc: allow to connect with add_client when -vnc none
Fix input-linux reading from device
xhci: flush dequeue pointer to endpoint context
Peter Maydell [Mon, 3 Apr 2017 10:15:33 +0000 (11:15 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170403' into staging
ppc patch queue 2017-04-03
A single bugfix in this pull request, for an ugly assert() failure, if
the user ignores the information in query-hotpluggable-cpus and tries
to hot add CPUs to pseries with bad parameters.
vnc: allow to connect with add_client when -vnc none
Do not skip VNC initialization, in particular of auth method when vnc is
configured without sockets, since we should still allow connections
through QMP add_client.
Javier Celaya [Mon, 27 Mar 2017 18:26:24 +0000 (20:26 +0200)]
Fix input-linux reading from device
The evdev devices in input-linux.c are read in blocks of one whole
event. If there are not enough bytes available, they are discarded
instead of being kept for the next read operation. This results in
lost events, or even non-working devices.
This patch keeps track of the number of bytes needed to fill up a
whole event, and then handles it.
Changes from v1 to v2:
- Fix: Calculate offset on each iteration
Changes from v2 to v3:
- Fix coding style
- Store offset instead of bytes to be read
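A self-contained sketch of the offset bookkeeping described above:
keep a per-device offset into a struct input_event and only act once
the whole structure has been filled. The helper name and surrounding
plumbing are invented, not input-linux.c's actual code:

#include <linux/input.h>
#include <unistd.h>

struct evdev_reader {
    struct input_event ev;
    size_t offset;          /* bytes of 'ev' filled so far */
};

/* Returns 1 when a complete event is ready in r->ev, 0 otherwise. */
static int evdev_read_one(struct evdev_reader *r, int fd)
{
    ssize_t n = read(fd, (char *)&r->ev + r->offset,
                     sizeof(r->ev) - r->offset);

    if (n <= 0) {
        return 0;           /* nothing new; keep the partial bytes */
    }
    r->offset += (size_t)n;
    if (r->offset < sizeof(r->ev)) {
        return 0;           /* still a partial event */
    }
    r->offset = 0;          /* full event assembled; reset for the next one */
    return 1;
}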
Gerd Hoffmann [Fri, 31 Mar 2017 10:25:21 +0000 (12:25 +0200)]
xhci: flush dequeue pointer to endpoint context
When done processing an endpoint ring we must update the dequeue
pointer in the endpoint context in guest memory. This is needed to
make sure the guest has a correct view of things, and also to make
live migration work properly, because xhci post_load restores a lot of
the state from xhci data structures in guest memory.
Add xhci_set_ep_state() call to do that.
The recursive calls stopped by commit
ddb603ab6c981c1d67cb42266fc700c33e5b2d8f had the (unintentional) side
effect of hiding this bug. xhci_set_ep_state() was called before
processing, to set the state to running, which updated the dequeue
pointer too.
David Gibson [Sun, 2 Apr 2017 06:14:30 +0000 (16:14 +1000)]
pseries: Enforce homogeneous threads-per-core
For reasons that may be useful in future, CPU core objects, as used on
the pseries machine type, have their own nr-threads property,
potentially allowing cores with different numbers of threads in the
same system.
If the user/management uses the values specified in query-hotpluggable-cpus
as they're expected to do, this will never matter in practice. But that's
not actually enforced - it's possible to manually specify a core with
a different number of threads from that in -smp. That will confuse the
platform - most immediately, this can be used to create a CPU thread with
index above max_cpus which leads to an assertion failure in
spapr_cpu_core_realize().
For now, enforce that all cores must have the same, standard, number of
threads.
Tejaswini Poluri [Tue, 28 Mar 2017 07:19:43 +0000 (12:49 +0530)]
target-i386: fix "info lapic" segfault on isapc
Start QEMU with
"qemu-system-x86_64 -nographic -M isapc -serial none-monitor stdio"
and enter "info lapic" at the monitor prompt ⇒
Segmentation fault