Git Repo - qemu.git/log

hw/acpi: remove dead acpi code

Signed-off-by: Aleksandr Bezzubikov <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

fw_cfg: move setting of FW_CFG_VERSION_DMA bit to fw_cfg_init1()

The setting of the FW_CFG_VERSION_DMA bit is the same across both the
TYPE_FW_CFG_MEM and TYPE_FW_CFG_IO devices, so unify the logic in
fw_cfg_init1().

Signed-off-by: Mark Cave-Ayland <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Gabriel Somlo <[email protected]>

fw_cfg: don't map the fw_cfg IO ports in fw_cfg_io_realize()

As indicated by Laszlo it is a QOM bug for the realize() method to actually
map the device. Set up the IO regions within fw_cfg_io_realize() and defer
the mapping with sysbus_add_io() to the caller, as already done in
fw_cfg_init_mem_wide().

This makes the iobase and dma_iobase properties now obsolete so they can be
removed.

Signed-off-by: Mark Cave-Ayland <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Gabriel Somlo <[email protected]>

i386/kvm/pci-assign: Use errp directly rather than local_err

In assigned_device_pci_cap_init(), first, error messages are filled
to a local_err variable, then through error_propagate() pass to
the parameter of errp. It leads to cumbersome code. In order to
avoid the extra local_err and error_propagate(), drop it and use
errp instead.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

i386/kvm/pci-assign: Fix return type of verify_irqchip_kernel()

When the function no success value to transmit, it usually make the
function return void. It has turned out not to be a success, because
it means that the extra local_err variable and error_propagate() will
be needed. It leads to cumbersome code, therefore, transmit success/
failure in the return value is worth. So fix the return type to avoid
it.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Convert shpc_init() to Error

In order to propagate error message better, convert shpc_init() to
Error also convert the pci_bridge_dev_initfn() to realize.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Convert to realize

Convert i82801b11, io3130_upstream, io3130_downstream and
pcie_root_port devices to realize.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Replace pci_add_capability2() with pci_add_capability()

After the patch 'Make errp the last parameter of pci_add_capability()',
pci_add_capability() and pci_add_capability2() now do exactly the same.
So drop the wrapper pci_add_capability() of pci_add_capability2(), then
replace the pci_add_capability2() with pci_add_capability() everywhere.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Eduardo Habkost <[email protected]>
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Make errp the last parameter of pci_add_capability()

Add Error argument for pci_add_capability() to leverage the errp
to pass info on errors. This way is helpful for its callers to
make a better error handling when moving to 'realize'.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Fix the wrong assertion.

pci_add_capability returns a strictly positive value on success,
correct asserts.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Add comment for pci_add_capability2()

Comments for pci_add_capability2() to explain the return
value. This may help to make a correct return value check
for its callers.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Clean up error checking in pci_add_capability()

On success, pci_add_capability2() returns a positive value. On
failure, it sets an error and return a negative value.

pci_add_capability() laboriously checks this behavior. No other
caller does. Drop the checks from pci_add_capability().

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

intel_iommu: relax iq tail check on VTD_GCMD_QIE enable

The VT-d spec (section 6.5.2) prescribes software to zero the
Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE
Global Command Register bit. Windows Server 2012 R2 and possibly
other older Windows versions violate the protocol and set a
non-zero queue tail first, which in effect makes them crash early
on boot with -device intel-iommu,intremap=on.

This commit relaxes the check and instead of failing to enable
VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail
register was set just after enabling VTD_GCMD_QIE
(see vtd_handle_iqt_write).

Signed-off-by: Ladi Prosek <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/pci-bridge/dec: Classify the DEC PCI bridge as bridge device

This way the bridge shows up in the correct section of the
"-device help" text.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Marcel Apfelbaum <[email protected]>

virtio-net: enable configurable tx queue size

This patch enables the virtio-net tx queue size to be configurable
between 256 (the default queue size) and 1024 by the user when the
vhost-user backend is used.

Currently, the maximum tx queue size for other backends is 512 due
to the following limitations:
- QEMU backend: the QEMU backend implementation in some cases may
send 1024+1 iovs to writev.
- Vhost_net backend: there are possibilities that the guest sends
a vring_desc of memory which crosses a MemoryRegion thereby
generating more than 1024 iovs after translation from guest-physical
address in the backend.

Signed-off-by: Wei Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170603' into staging

Queued TCG patches

# gpg: Signature made Fri 30 Jun 2017 20:03:53 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-tcg-20170603:
  tcg: consistently access cpu->tb_jmp_cache atomically
  gen-icount: use tcg_ctx.tcg_env instead of cpu_env
  gen-icount: add missing inline to gen_tb_end

Signed-off-by: Peter Maydell <[email protected]>

tcg: consistently access cpu->tb_jmp_cache atomically

Some code paths can lead to atomic accesses racing with memset()
on cpu->tb_jmp_cache, which can result in torn reads/writes
and is undefined behaviour in C11.

These torn accesses are unlikely to show up as bugs, but from code
inspection they seem possible. For example, tb_phys_invalidate does:
    /* remove the TB from the hash list */
    h = tb_jmp_cache_hash_func(tb->pc);
    CPU_FOREACH(cpu) {
        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
            atomic_set(&cpu->tb_jmp_cache[h], NULL);
        }
    }
Here atomic_set might race with a concurrent memset (such as the
ones scheduled via "unsafe" async work, e.g. tlb_flush_page) and
therefore we might end up with a torn pointer (or who knows what,
because we are under undefined behaviour).

This patch converts parallel accesses to cpu->tb_jmp_cache to use
atomic primitives, thereby bringing these accesses back to defined
behaviour. The price to pay is to potentially execute more instructions
when clearing cpu->tb_jmp_cache, but given how infrequently they happen
and the small size of the cache, the performance impact I have measured
is within noise range when booting debian-arm.

Note that under "safe async" work (e.g. do_tb_flush) we could use memset
because no other vcpus are running. However I'm keeping these accesses
atomic as well to keep things simple and to avoid confusing analysis
tools such as ThreadSanitizer.

Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1497486973 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

gen-icount: use tcg_ctx.tcg_env instead of cpu_env

We are relying on cpu_env being defined as a global, yet most
targets (i.e. all but arm/a64) have it defined as a local variable.
Luckily all of them use the same "cpu_env" name, but really
compilation shouldn't break if the name of that local variable
changed.

Fix it by using tcg_ctx.tcg_env, which all targets set in their
translate_init function. This change also helps paving the way
for the upcoming "translation loop common to all targets" work.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1497639397 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

gen-icount: add missing inline to gen_tb_end

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1497639397 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into staging

# gpg: Signature made Fri 30 Jun 2017 15:08:45 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/block-pull-request:
  block: Exploit BDRV_BLOCK_EOF for larger zero blocks
  block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.10-pull-request' into staging

# gpg: Signature made Fri 30 Jun 2017 13:30:44 BST
# gpg:                using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-for-2.10-pull-request:
  target/m68k: add fmovem
  target/m68k: add explicit single and double precision operations (part 2)
  target/m68k: add fsglmul and fsgldiv
  softfloat: define floatx80_round()
  target/m68k: add explicit single and double precision operations
  target/m68k: add fmovecr
  target/m68k: add fscc.

Signed-off-by: Peter Maydell <[email protected]>

block: Exploit BDRV_BLOCK_EOF for larger zero blocks

When we have a BDS with unallocated clusters, but asking the status
of its underlying bs->file or backing layer encounters an end-of-file
condition, we know that the rest of the unallocated area will read as
zeroes. However, pre-patch, this required two separate calls to
bdrv_get_block_status(), as the first call stops at the point where
the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen
the results of the primary status if the secondary status already
includes BDRV_BLOCK_ZERO.

In turn, this fixes a TODO mentioned in iotest 154, where we can now
see that all sectors in a partial cluster at the end of a file read
as zero when coupling the shorter backing file's status along with our
knowledge that the remaining sectors came from an unallocated cluster.

Also, note that the loop in bdrv_co_get_block_status_above() had an
inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO
but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read
zeroes merely because our unallocated clusters lie beyond the backing
file's shorter length), we still ended up probing the backing layer
even though we already had a good answer.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20170505021500 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>

block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()

Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a
shortcut for subsequent operations, there are also some optimizations
that are made easier if we can quickly tell that *pnum will advance
us to the end of a file, via a new BDRV_BLOCK_EOF which gets set
by the block layer.

This just plumbs up the new bit; subsequent patches will make use
of it.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20170505021500 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Fri 30 Jun 2017 12:46:17 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  virtio-pci: use ioeventfd even when KVM is disabled
  tests: fix virtio-net-test ISR dependence
  tests: fix virtio-blk-test ISR dependence
  tests: fix virtio-scsi-test ISR dependence
  libqos: add virtio used ring support
  libqos: fix typo in virtio.h QVirtQueue->used comment
  virtio-blk: trace vdev so devices can be distinguished

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170630' into staging

ppc patch queue 2017-06-30

  * More DRC cleanups, these now actually fix a few bugs
  * Properly implements the openpic timers (they now count and
    generate interrupts)
  * Fixes for XICS migration
  * Fixes for migration of POWER9 RPT guests
  * The last of the compatibility mode rework

# gpg: Signature made Fri 30 Jun 2017 10:52:25 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.10-20170630: (21 commits)
  spapr: Clean up DRC set_isolation_state() path
  spapr: Clean up DRC set_allocation_state path
  spapr: Make DRC reset force DRC into known state
  spapr: Split DRC release from DRC detach
  spapr: Eliminate DRC 'signalled' state variable
  spapr: Start hotplugged PCI devices in ISOLATED state
  target-ppc: Enable open-pic timers to count and generate interrupts
  hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements
  spapr: prevent QEMU crash when CPU realization fails
  target/ppc: Proper cleanup when ppc_cpu_realizefn fails
  spapr: fix migration of ICPState objects from/to older QEMU
  xics: directly register ICPState objects to vmstate
  target/ppc: Fix return value in tcg radix mmu fault handler
  target/ppc/excp_helper: Take BQL before calling cpu_interrupt()
  spapr: Fix migration of Radix guests
  spapr: Add a "no HPT" encoding to HTAB migration stream
  ppc: Rework CPU compatibility testing across migration
  pseries: Reset CPU compatibility mode
  pseries: Move CPU compatibility property to machine
  qapi: add explicit null to string input and output visitors
  ...

Signed-off-by: Peter Maydell <[email protected]>

virtio-pci: use ioeventfd even when KVM is disabled

Old kvm.ko versions only supported a tiny number of ioeventfds so
virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.

Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
always returns 0. Since commit 8c56c1a592b5092d91da8d8943c17777d6462a6f
("memory: emulate ioeventfd") it has been possible to use ioeventfds in
qtest or TCG mode.

This patch makes -device virtio-blk-pci,iothread=iothread0 work even
when KVM is disabled.

I have tested that virtio-blk-pci works under TCG both with and without
iothread.

This patch fixes qemu-iotests 068, which was accidentally merged early
despite the dependency on ioeventfd.

Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Message-id: 20170615163813 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

tests: fix virtio-net-test ISR dependence

Use the new used ring APIs instead of assuming ISR being set means the
request has completed.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

tests: fix virtio-blk-test ISR dependence

Use the new used ring APIs instead of assuming ISR being set means the
request has completed.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

tests: fix virtio-scsi-test ISR dependence

Use the new used ring APIs instead of assuming ISR being set means the
request has completed.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

libqos: add virtio used ring support

Existing tests do not touch the virtqueue used ring.  Instead they poll
the virtqueue ISR register and peek into their request's device-specific
status field.

It turns out that the virtqueue ISR register can be set to 1 more than
once for a single notification (see commit
83d768b5640946b7da55ce8335509df297e2c7cd "virtio: set ISR on dataplane
notifications").  This causes problems for tests that assume a 1:1
correspondence between the ISR being 1 and request completion.

Peeking at device-specific status fields is also problematic if the
device has no field that can be abused for EINPROGRESS polling
semantics.  This is the case if all the field's values may be set by the
device; there's no magic constant left for polling.

It's time to process the used ring for completed requests, just like a
real virtio guest driver.  This patch adds the necessary APIs.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

libqos: fix typo in virtio.h QVirtQueue->used comment

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Eric Blake <[email protected]>
Tested-by: Kevin Wolf <[email protected]>
Message-id: 20170628184724 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

spapr: Clean up DRC set_isolation_state() path

There are substantial differences in the various paths through
set_isolation_state(), both for setting to ISOLATED versus UNISOLATED
state and for logical versus physical DRCs.

So, split the set_isolation_state() method into isolate() and unisolate()
methods, and give it different implementations for the two DRC types.

Factor some minimal common checks, including for valid indicator values
(which we weren't previously checking) into rtas_set_isolation_state().

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Michael Roth <[email protected]>

spapr: Clean up DRC set_allocation_state path

The allocation-state indicator should only actually be implemented for
"logical" DRCs, not physical ones.  Factor a check for this, and also for
valid indicator state values into rtas_set_allocation_state().  Because
they don't exist for physical DRCs, there's no reason that we'd ever want
more than one method implementation, so it can just be a plain function.

In addition, the setting to USABLE and setting to UNUSABLE paths in
set_allocation_state() don't actually have much in common.  So, split the
method separate functions for each parameter value (drc_set_usable()
and drc_set_unusable()).

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Michael Roth <[email protected]>

spapr: Make DRC reset force DRC into known state

The reset handler for DRCs attempts several state transitions which are
subject to various checks and restrictions.  But at reset time we know
there is no guest, so we can ignore most of the usual sequencing rules and
just set the DRC back to a known state.  In fact, it's safer to do so.

The existing code also has several redundant checks for
drc->awaiting_release inside a block which has already tested that.  This
patch removes those and sets the DRC to a fixed initial state based only
on whether a device is currently plugged or not.

With DRCs correctly reset to a state based on device presence, we don't
need to force state transitions as cold plugged devices are processed.
This allows us to remove all the callers of the set_*_state() methods from
outside spapr_drc.c.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Michael Roth <[email protected]>

spapr: Split DRC release from DRC detach

spapr_drc_detach() is called when qemu generic code requests a device be
unplugged. It makes a number of tests, which could well delay further
action until later, before actually detach the device from the DRC.

This splits out the part which actually removes the device from the DRC
into spapr_drc_release(). This will be useful for further cleanups.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Michael Roth <[email protected]>

spapr: Eliminate DRC 'signalled' state variable

The 'signalled' field in the DRC appears to be entirely a torturous
workaround for the fact that PCI devices were started in UNISOLATED state
for unclear reasons.

1) 'signalled' is already meaningless for logical (so far, all non PCI)
DRCs.  It's always set to true (at least at any point it might be tested),
and can't be assigned any real meaning due to the way signalling works for
logical DRCs.

2) For PCI DRCs, the only time signalled would be false is when non-zero
functions of a multifunction device are hotplugged, followed by function
zero (the other way around is explicitly not permitted). In that case the
secondary function DRCs are attached, but the notification isn't sent to
the guest until function 0 is plugged.

3) signalled being false is used to allow a DRC detach to switch mode
back to ISOLATED state, which allows a secondary function to be hotplugged
then unplugged with function 0 never inserted.  Without this a secondary
function starting in UNISOLATED state couldn't be detached again without
function 0 being inserted, all the functions configured by the guest, then
sent back to ISOLATED state.

4) But now that PCI DRCs start in ISOLATED state, there's nothing to be
done.  If the guest doesn't get the notification, it won't switch the
device to UNISOLATED state, so nothing prevents it from being unplugged.
If the guest does move it to UNISOLATED state without the signal (due to
a manual drmgr call, for instance) then it really isn't safe to unplug it.

So, this patch removes the signalled variable and all code related to it.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Michael Roth <[email protected]>

spapr: Start hotplugged PCI devices in ISOLATED state

PCI DRCs, and only PCI DRCs, are immediately moved to UNISOLATED isolation
state once the device is attached. This has been there from the initial
implementation, and it's not clear why.

The state diagram in PAPR 13.4 suggests PCI devices should start in
ISOLATED state until the guest moves them into UNISOLATED, and the code in
the guest-side drmgr tool seems to work that way too.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>

target-ppc: Enable open-pic timers to count and generate interrupts

Previously QEMU open-pic implemented the 4 open-pic timers including
all timer registers, but the timers did not "count" or generate any
interrupts. The patch makes the timers both count and generate
interrupts. The timer clock frequency is fixed at 25MHZ.

--

Responding to V2 patch comments.
- Simplify clock frequency logic and commentary.
- Remove camelCase variables.
- Timer objects now created at init rather than lazily.

Signed-off-by: Aaron Larson <[email protected]>
Signed-off-by: David Gibson <[email protected]>

hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements

In ppc_spapr_reset(), if the guest is using HPT, the code was executing:

    } else {
        spapr->patb_entry = 0;
        spapr_setup_hpt_and_vrma(spapr);
    }

And, at the end of spapr_setup_hpt_and_vrma:

    /* We're setting up a hash table, so that means we're not radix */
    spapr->patb_entry = 0;

Resulting in spapr->patb_entry being assigned to 0 twice in a row.

Given that 'spapr_setup_hpt_and_vrma' is also called inside
'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes
the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset
to avoid this behavior.

Signed-off-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: prevent QEMU crash when CPU realization fails

ICPState objects were being allocated before CPU thread realization.
However commit 9ed656631d73 (xics: setup cpu at realize time) reversed it
by allocating ICPState objects after CPU thread is realized. But it
didn't take care to fix the error path because of which we observe
a SIGSEGV when CPU thread realization fails during cold/hotplug.

Fix this by ensuring that we do object_unparent() of ICPState object
only in case when is was created earlier.

Signed-off-by: Bharata B Rao <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Proper cleanup when ppc_cpu_realizefn fails

If ppc_cpu_realizefn() fails after cpu_exec_realizefn() has been
called, we will have to undo whatever cpu_exec_realizefn() did
by explicitly calling cpu_exec_unrealizeffn() which is currently
missing. Failure to do this proper cleanup will result in CPU
which was never fully realized to linger on the cpus list causing
SIGSEGV later (for eg when running "info cpus").

Signed-off-by: Bharata B Rao <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: fix migration of ICPState objects from/to older QEMU

Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.

This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.

As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.

The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.

This is only available for pseries-2.9 and older machines, thanks to a
compat property.

Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

xics: directly register ICPState objects to vmstate

The ICPState objects are currently registered to vmstate as qdev objects.
Their instance ids are hence computed automatically in the migration code,
and thus depends on the order the CPU cores were plugged.

If the destination had its CPU cores plugged in a different order than the
source, then ICPState objects will have different instance_ids and load
the wrong state.

Since CPU objects have a reliable cpu_index which is already used as
instance_id in vmstate, let's use it for ICPState as well.

Please note that this doesn't break migration. Older machine types used to
allocate and realize all ICPState objects at machine init time, for the whole
lifetime of the machine. The qdev instance ids are thus 0,1,2... nr_servers
and happen to map to the vCPU indexes.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Fix return value in tcg radix mmu fault handler

The mmu fault handler should return 0 if it was able to successfully
handle the fault and a positive value otherwise.

Currently the tcg radix mmu fault handler will return 1 after
successfully handling a fault in virtual mode. This is incorrect
so fix it so that it returns 0 in this case.

The handler already correctly returns 0 when a fault was handled
in real mode and 1 if an interrupt was generated.

Fixes: d5fee0bbe68d ("target/ppc: Implement ISA V3.00 radix page fault handler")
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc/excp_helper: Take BQL before calling cpu_interrupt()

Since the introduction of MTTCG, using the msgsnd instruction
abort()s if being called without holding the BQL. So let's protect
that part of the code now with qemu_mutex_lock_iothread().

Buglink: https://bugs.launchpad.net/qemu/+bug/1694998
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Fix migration of Radix guests

Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Bharata B Rao <[email protected]>
Reviewed-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Add a "no HPT" encoding to HTAB migration stream

Add a "no HPT" encoding (using value -1) to the HTAB migration
stream (in the place of HPT size) when the guest doesn't allocate HPT.
This will help the target side to match target HPT with the source HPT
and thus enable successful migration.

Suggested-by: David Gibson <[email protected]>
Signed-off-by: Bharata B Rao <[email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc: Rework CPU compatibility testing across migration

Migrating between different CPU versions is a bit complicated for ppc.
A long time ago, we ensured identical CPU versions at either end by
checking the PVR had the same value.  However, this breaks under KVM
HV, because we always have to use the host's PVR - it's not
virtualized.  That would mean we couldn't migrate between hosts with
different PVRs, even if the CPUs are close enough to compatible in
practice (sometimes identical cores with different surrounding logic
have different PVRs, so this happens in practice quite often).

So, we removed the PVR check, but instead checked that several flags
indicating supported instructions matched.  This turns out to be a bad
idea, because those instruction masks are not architected information, but
essentially a TCG implementation detail.  So changes to qemu internal CPU
modelling can break migration - this happened between qemu-2.6 and
qemu-2.7.  That was addressed by 146c11f1 "target-ppc: Allow eventual
removal of old migration mistakes".

Now, verification of CPU compatibility across a migration basically doesn't
happen.  We simply ignore the PVR of the incoming migration, and hope the
cpu on the destination is close enough to work.

Now that we've cleaned up handling of processor compatibility modes
for pseries machine type, we can do better.  For new machine types
(pseries-2.10+) We allow migration if:

    * The source and destination PVRs are for the same type of CPU, as
      determined by CPU class's pvr_match function
OR  * When the source was in a compatibility mode, and the destination CPU
      supports the same compatibility mode

For older machine types we retain the existing behaviour - current CAS
code will usually set a compat mode which would break backwards
migration if we made them use the new behaviour. [Fixed from an
earlier version by Greg Kurz].

Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Suraj Jitindar Singh <[email protected]>
Tested-by: Andrea Bolognani <[email protected]>

pseries: Reset CPU compatibility mode

Currently, the CPU compatibility mode is set when the cpu is initialized,
then again when the guest negotiates features.  This means if a guest
negotiates a compatibility mode, then reboots, that compatibility mode
will be retained across the reset.

Usually that will get overridden when features are negotiated on the next
boot, but it's still not really correct.  This patch moves the initial set
up of the compatibility mode from cpu init to reset time.  The mode *is*
retained if the reboot was caused by the feature negotiation (it might
be important in that case, though it's unlikely).

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Tested-by: Andrea Bolognani <[email protected]>

pseries: Move CPU compatibility property to machine

Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor.  However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.

To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine.  Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.

The option was used with -cpu.  So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.

Signed-off-by: David Gibson <[email protected]>
Tested-by: Suraj Jitindar Singh <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Tested-by: Greg Kurz <[email protected]>
Tested-by: Andrea Bolognani <[email protected]>

qapi: add explicit null to string input and output visitors

This may be used for deprecated object properties that are kept for
backwards compatibility.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Tested-by: Andrea Bolognani <[email protected]>
Signed-off-by: David Gibson <[email protected]>

hw/ppc/prep: Remove superfluous call to soundhw_init()

When using the 40p machine, soundhw_init() is currently called twice,
one time from vl.c and one time from ibm_40p_init(). The call in
ibm_40p_init() was likely just a copy-and-paste from a old version
of the prep machine - but there the call to audio_init() (which was
the previous name of this function) has been removed many years ago
already, with commit b3e6d591b05538056d665572f3e3bbfb3cbb70e7
("audio: enable PCI audio cards for all PCI-enabled targets"), so
we certainly also do not need the soundhw_init() in the 40p function
anymore nowadays.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Sahid Ferdjaoui <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Reviewed-by: Hervé Poussineau <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/m68k: add fmovem

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20170628204241 [email protected]>

target/m68k: add explicit single and double precision operations (part 2)

Add fsabs, fdabs, fsneg, fdneg, fsmove and fdmove.

The value is converted using the new floatx80_round() function.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20170628204241 [email protected]>

target/m68k: add fsglmul and fsgldiv

fsglmul and fsgldiv truncate data to single precision before computing
results.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20170628204241 [email protected]>

softfloat: define floatx80_round()

Add a function to round a floatx80 to the defined precision
(floatx80_rounding_precision)

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Aurelien Jarno <[email protected]>
Message-Id: <20170628204241 [email protected]>

target/m68k: add explicit single and double precision operations

Add fssqrt, fdsqrt, fsadd, fdadd, fssub, fdsub, fsmul, fdmul,
fsdiv, fddiv.

The precision is managed using set_floatx80_rounding_precision().

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20170628204241 [email protected]>

target/m68k: add fmovecr

fmovecr moves a floating point constant from the
FPU ROM to a floating point register.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20170628204241 [email protected]>

target/m68k: add fscc.

use DisasCompare with FPU conditions in fscc and fbcc.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20170628204241 [email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20170629' into staging

HMP pull 2017-06-29

# gpg: Signature made Thu 29 Jun 2017 17:27:55 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-hmp-20170629:
  Add chardev-send-break monitor command
  monitor: Add -a (all) option to info registers

Signed-off-by: Peter Maydell <[email protected]>

Add chardev-send-break monitor command

Sending a break on a serial console can be useful for debugging the
guest. But not all chardev backends support sending breaks (only telnet
and mux do). The chardev-send-break command allows to send a break even
if using other backends.

Signed-off-by: Stefan Fritsch <[email protected]>
Acked-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170611074817 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Use 'send a break' in all 3 pieces of text as suggested by eblake

monitor: Add -a (all) option to info registers

The info registers command in the qemu monitor is used to dump register
values.

Currently this command uses the monitor cpu (which can be set by the
user) as the cpu for whose registers will be dumped. Sometimes it is
useful to see the registers for all cpus and currently this requires
setting the monitor cpu and the re-running the command for each cpu
in the system. I would be nice if there was an easier way to do this.

Add the "-a" option to the info registers command to dump the register
values for all cpus.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Message-Id: <20170608054116 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging

- fixes a minor bug that could possibly prevent old guests to remove
  directories
- makes default permissions for new files configurable from the cmdline
  when using mapped security modes
- handle transport errors
- g_malloc()+memcpy() converted to g_memdup()

# gpg: Signature made Thu 29 Jun 2017 14:12:42 BST
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg:                 aka "Greg Kurz <[email protected]>"
# gpg:                 aka "Greg Kurz <[email protected]>"
# gpg:                 aka "Gregory Kurz (Groug) <[email protected]>"
# gpg:                 aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  9pfs: handle transport errors in pdu_complete()
  xen-9pfs: disconnect if buffers are misconfigured
  virtio-9p: break device if buffers are misconfigured
  virtio-9p: message header is 7-byte long
  virtio-9p: record element after sanity checks
  9pfs: replace g_malloc()+memcpy() with g_memdup()
  9pfs: local: Add support for custom fmode/dmode in 9ps mapped security modes
  9pfs: local: remove: use correct path component

Signed-off-by: Peter Maydell <[email protected]>

ui/cocoa.m: Fix compatibility issue with Mac OS 10.9 and under

The [NSEvent modifierFlags] method returns an NSEventModifierFlags type value in Mac OS 10.10. It use to be of type NSUInteger. Replacing NSEventModifierFlags with NSUInteger allows for the cooca.m file to be compiled on older versions of Mac OS. This patch was been tested on Mac OS 10.6 and Mac OS 10.12 without problem.

Signed-off-by: John Arbuckle <[email protected]>
Message-id: F6C36C1A-4661-48F4-BEA6-3994889927D0@gmail.com
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

virtio-blk: trace vdev so devices can be distinguished

It is hard to analyze trace logs with multiple virtio-blk devices
because none of the trace events include the VirtIODevice *vdev.

This patch adds vdev so it's clear which device a request is associated
with.

I considered using VirtIOBlock *s instead but VirtIODevice *vdev is more
general and may be correlated with generic virtio trace events like
virtio_set_status.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 20170614092930 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

9pfs: handle transport errors in pdu_complete()

Contrary to what is written in the comment, a buggy guest can misconfigure
the transport buffers and pdu_marshal() may return an error. If this ever
happens, it is up to the transport layer to handle the situation (9P is
transport agnostic).

This fixes Coverity issue CID1348518.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>

xen-9pfs: disconnect if buffers are misconfigured

Implement xen_9pfs_disconnect by unbinding the event channels. On
xen_9pfs_free, call disconnect if any event channels haven't been
disconnected.

If the frontend misconfigured the buffers set the backend to "Closing"
and disconnect it. Misconfigurations include requesting a read of more
bytes than available on the ring buffer, or claiming to be writing more
data than available on the ring buffer.

Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

virtio-9p: break device if buffers are misconfigured

The 9P protocol is transport agnostic: if the guest misconfigured the
buffers, the best we can do is to set the broken flag on the device.

Signed-off-by: Greg Kurz <[email protected]>

virtio-9p: message header is 7-byte long

The 9p spec at http://man.cat-v.org/plan_9/5/intro reads:

"Each 9P message begins with a four-byte size field specify-
  ing the length in bytes of the complete message including
  the four bytes of the size field itself.  The next byte is
  the message type, one of the constants in the enumeration in
  the include file <fcall.h>.  The next two bytes are an iden-
  tifying tag, described below."

ie, each message starts with a 7-byte long header.

The core 9P code already assumes this pretty much everywhere. This patch
does the following:
- makes the assumption explicit in the common 9p.h header, since it isn't
  related to the transport
- open codes the header size in handle_9p_output() and hardens the sanity
  check on the space needed for the reply message

Signed-off-by: Greg Kurz <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>

virtio-9p: record element after sanity checks

If the guest sends a malformed request, we end up with a dangling pointer
in V9fsVirtioState. This doesn't seem to cause any bug, but let's remove
this side effect anyway.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>

9pfs: replace g_malloc()+memcpy() with g_memdup()

I found these pattern via grepping the source tree. I don't have a
coccinelle script for it!

Signed-off-by: Marc-André Lureau <[email protected]>

9pfs: local: Add support for custom fmode/dmode in 9ps mapped security modes

In mapped security modes, files are created with very restrictive
permissions (600 for files and 700 for directories). This makes
file sharing between virtual machines and users on the host rather
complicated. Imagine eg. a group of users that need to access data
produced by processes on a virtual machine. Giving those users access
to the data will be difficult since the group access mode is always 0.

This patch makes the default mode for both files and directories
configurable. Existing setups that don't know about the new parameters
keep using the current secure behavior.

Signed-off-by: Tobias Schramm <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

9pfs: local: remove: use correct path component

Commit a0e640a8 introduced a path processing error.
Pass fstatat the dirpath based path component instead
of the entire path.

Signed-off-by: Bruce Rogers <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170628' into staging

migration/next for 20170628

# gpg: Signature made Wed 28 Jun 2017 12:16:44 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <[email protected]>"
# gpg:                 aka "Juan Quintela <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20170628:
  exec: fix access to ram_list.dirty_memory when sync dirty bitmap
  migration: add "return-path" capability
  vmstate: error hint for failed equal checks
  migration: add comment for TYPE_MIGRATE
  migration: hmp: dump globals
  migration: merge enforce_config_section somewhat
  migration: move skip_section_footers
  migration: move skip_configuration out
  migration: move only_migratable to MigrationState
  migration: move global_state.optional out
  migration: let MigrationState be a qdev
  vl: clean up global property registration
  accel: introduce AccelClass.global_props
  machine: export register_compat_prop()

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170627-tag' into staging

Xen 2017/06/27

# gpg: Signature made Tue 27 Jun 2017 23:02:43 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg:                 aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20170627-tag:
  xen-disk: add support for multi-page shared rings
  xen-disk: only advertize feature-persistent if grant copy is not available
  xen/disk: don't leak stack data via response ring

Signed-off-by: Peter Maydell <[email protected]>

linux-user: Put PPC AT_IGNOREPPC auxv entries in the right place

The 32-bit PPC auxv is a bit complicated because in the
mists of time it used to be 16-aligned rather than directly
after the environment. Older glibc versions had code to
try to probe for whether it needed alignment or not:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/powerpc/dl-sysdep.c;hb=e84eabb3871c9b39e59323bf3f6b98c2ca9d1cd0
and the kernel has code which puts some magic entries at
the bottom to ensure that the alignment probe fails:
http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/include/asm/elf.h#L158

QEMU has similar code too, but it was broken by commit
7c4ee5bcc82e64, which changed elfload.c from filling in
the auxv starting at the highest address and working down
to starting at the lowest address and working up. This
means that the ARCH_DLINFO hook must now be invoked first
rather than last, and the entries in it for PPC must
be reversed so that the magic AT_IGNOREPPC entries come
at the lowest address in the auxv as they should.

The effect of this was that if running a guest binary that
used an old glibc with the alignment probing the guest ld.so
code would segfault if the size of the guest environment and
argv happened to put the auxv at an address that triggered
the alignment code in the guest glibc.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Tested-by: Richard Henderson <[email protected]>
Message-id: 1498582198 [email protected]

exec: fix access to ram_list.dirty_memory when sync dirty bitmap

In cpu_physical_memory_sync_dirty_bitmap(rb, start, ...), the 2nd
argument 'start' is relative to the start of the ramblock 'rb'. When
it's used to access the dirty memory bitmap of ram_list (i.e.
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]->blocks[]), an offset to
the start of all RAM (i.e. rb->offset) should be added to it, which has
however been missed since c/s 6b6712efcc. For a ramblock of host memory
backend whose offset is not zero, cpu_physical_memory_sync_dirty_bitmap()
synchronizes the incorrect part of the dirty memory bitmap of ram_list
to the per ramblock dirty bitmap. As a result, a guest with host
memory backend may crash after migration.

Fix it by adding the offset of ramblock when accessing the dirty memory
bitmap of ram_list in cpu_physical_memory_sync_dirty_bitmap().

Reported-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Haozhong Zhang <[email protected]>
Message-Id: <20170628083704 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Tested-by: Juan Quintela <[email protected]>
Tested-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: add "return-path" capability

When this capability is enabled, QEMU will use the return path even for
precopy migration. This is helpful at least in one case when destination
failed to load the image while source quited without confirmation. With
return path, source will wait for the last response from destination,
and if destination fails, it'll fail the migration on source, then the
guest can be run again on the source (rather than assuming to be good,
then the guest will be lost after source quits).

It needs to be enabled explicitly on source, otherwise disabled.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498472935 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

vmstate: error hint for failed equal checks

In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug,
but it's actually the best we can do. Especially in these cases a verbose
error message is required.

Let's introduce infrastructure for specifying a error hint to be used if
equal check fails. Let's do this by adding a parameter to the _EQUAL
macros called _err_hint. Also change all current users to pass NULL as
last parameter so nothing changes for them.

Signed-off-by: Halil Pasic <[email protected]>
Message-Id: <20170623144823 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: add comment for TYPE_MIGRATE

It'll be strange that the migration object inherits TYPE_DEVICE. Add
some explanations to it.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498634144 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: hmp: dump globals

Now we have some globals that can be configured for migration. Dump them
in HMP info migration for better debugging.

(we can also use this to monitor whether COMPAT fields are applied
correctly on compatible machines)

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: merge enforce_config_section somewhat

These two parameters:

- MachineState::enforce_config_section
- MigrationState::send_configuration

are playing similar role here. This patch merges the first one into
second, then we'll have a single place to reference whether we need to
send the configuration section.

I didn't remove the MachineState.enforce_config_section field since when
applying that machine property (in machine_set_property()) we haven't
yet initialized global properties and migration object. Then, it's
still not easy to pass that boolean to MigrationState at such an early
time.

A natural benefit for current patch is that now we kept the meaning of
"enforce-config-section" since it'll still have the highest
priority (that's what "enforce" mean I guess).

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: move skip_section_footers

Move it into MigrationState, revert its meaning and renaming it to
send_section_footer, with a property bound to it. Same trick is played
like previous patches.

Removing savevm_skip_section_footers().

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: move skip_configuration out

It was in SaveState but now moved to MigrationState altogether, reverted
its meaning, then renamed to "send_configuration". Again, using
HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for
xen_init().

Removing savevm_skip_configuration().

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: move only_migratable to MigrationState

One less global variable, and it does only matter with migration.

We keep the old "--only-migratable" option, but also now we support:

-global migration.only-migratable=true

Currently still keep the old interface.

Hmm, now vl.c has no way to access migrate_get_current(). Export a
function for it to setup only_migratable.

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: move global_state.optional out

Put it into MigrationState then we can use the properties to specify
whether to enable storing global state.

Removing global_state_set_optional() since now we can use HW_COMPAT_2_3
for x86/power, and AccelClass.global_props for Xen.

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: let MigrationState be a qdev

Let the old man "MigrationState" join the object family. Direct benefit
is that we can start to use all the property features derived from
current QDev, like: HW_COMPAT_* bits, command line setup for migration
parameters (so will never need to set them up each time using HMP/QMP,
this is really, really attractive for test writters), etc.

I see no reason to disallow this happen yet. So let's start from this
one, to see whether it would be anything good.

Now we init the MigrationState struct statically in main() to make sure
it's initialized after global properties are applied, since we'll use
them during creation of the object.

No functional change at all.

Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

vl: clean up global property registration

It's not that clear on how the global properties are registered to
global_props (and also its priority relationship). Let's provide a
single function to be called in main() for that, with comment to explain
it a bit.

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

accel: introduce AccelClass.global_props

Introduce this new field for the accelerator classes so that each
specific accelerator in the future can register its own global
properties to be used further by the system. It works just like how the
old machine compatible properties do, but only tailored for
accelerators.

Introduce register_compat_props_array() for it. Export it so that it may
be used in other codes as well in the future.

Suggested-by: Eduardo Habkost <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

machine: export register_compat_prop()

We have HW_COMPAT_*, however that's only bound to machines, not other
things (like accelerators). Behind it, it was register_compat_prop()
that played the trick. Let's export the function for further use
outside HW_COMPAT_* magic.

Meanwhile, move it to qdev-properties.c where seems more proper (since
it'll be used not only in machine codes).

Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1498536619 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

xen-disk: add support for multi-page shared rings

The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OS have support in their frontend
drivers.

This patch makes the necessary modifications to xen-disk support a shared
ring up to order 4 (i.e. 16 pages).

Signed-off-by: Paul Durrant <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>

xen-disk: only advertize feature-persistent if grant copy is not available

If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertized
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.

Signed-off-by: Paul Durrant <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>

xen/disk: don't leak stack data via response ring

Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other (Linux)
backends do. Build on the fact that all response structure flavors are
actually identical (aside from alignment and padding at the end).

This is XSA-216.

Reported by: Anthony Perard <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Anthony PERARD <[email protected]>

Merge remote-tracking branch 'remotes/edgar/tags/edgar/mmio-exec-v2.for-upstream' into staging

edgar/mmio-exec-v2.for-upstream

# gpg: Signature made Tue 27 Jun 2017 16:22:30 BST
# gpg:                using RSA key 0x29C596780F6BCA83
# gpg: Good signature from "Edgar E. Iglesias (Xilinx key) <[email protected]>"
# gpg:                 aka "Edgar E. Iglesias <[email protected]>"
# Primary key fingerprint: AC44 FEDC 14F7 F1EB EDBF  4151 29C5 9678 0F6B CA83

* remotes/edgar/tags/edgar/mmio-exec-v2.for-upstream:
  xilinx_spips: allow mmio execution
  exec: allow to get a pointer for some mmio memory region
  introduce mmio_interface
  qdev: add MemoryRegion property
  cputlb: fix the way get_page_addr_code fills the tlb
  cputlb: move get_page_addr_code
  cputlb: cleanup get_page_addr_code to use VICTIM_TLB_HIT

Signed-off-by: Peter Maydell <[email protected]>

xilinx_spips: allow mmio execution

This allows to execute from the lqspi area.

When the request_ptr is called the device loads 1024bytes from the SPI device.
Then this code can be executed by the guest.

Tested-by: Edgar E. Iglesias <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

exec: allow to get a pointer for some mmio memory region

This introduces a special callback which allows to run code from some MMIO
devices.

SysBusDevice with a MemoryRegion which implements the request_ptr callback will
be notified when the guest try to execute code from their offset. Then it will
be able to eg: pre-load some code from an SPI device or ask a pointer from an
external simulator, etc..

When the pointer or the data in it are no longer valid the device has to
invalidate it.

Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

introduce mmio_interface

This introduces mmio_interface object which contains a MemoryRegion
and can be hotplugged/hotunplugged.

Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

qdev: add MemoryRegion property

We need to pass a pointer to a MemoryRegion for mmio_interface.
So this just adds that.

Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

cputlb: fix the way get_page_addr_code fills the tlb

get_page_addr_code(..) does a cpu_ldub_code to fill the tlb:
This can lead to some side effects if a device is mapped at this address.

So this patch replaces the cpu_memory_ld by a tlb_fill.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

cputlb: move get_page_addr_code

This just moves the code before VICTIM_TLB_HIT macro definition
so we can use it.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: KONRAD Frederic <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>