Git Repo - qemu.git/log

spice: fix coverity reported defect in display code

Report:

1. Condition surface, taking false branch
406    if (surface && ssd->surface &&
407        surface_width(surface) == pixman_image_get_width(ssd->surface) &&
408        surface_height(surface) == pixman_image_get_height(ssd->surface)) {
409        /* no-resize fast path: just swap backing store */
...

10. alias_transfer: Assigning: ssd->ds = surface.
440    ssd->ds = surface;

11. var_deref_op: Dereferencing null pointer ssd->ds.
CID 1264334 (#1 of 1): Dereference after null check (FORWARD_NULL)
441    ssd->surface = pixman_image_ref(ssd->ds->image);

Fix:

Move code block dereferencing ssd->ds into the already existing
if (ssd->ds) { ... } block.

Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

spice: add unix address support

Teach qemu to set up a Spice server with a UNIX socket using the
following arguments -spice unix,addr=path.

Signed-off-by: Gerd Hoffmann <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-misc-20150120' into staging

Miscellaneous cross-tree patches:
* load/store helper cleanup
* drop TARGET_HAS_ICE define and checks
* scripts/qapi-types.py: Add dummy member to empty structs
* cpu_ldst.h: Don't define helpers if MMU_MODE*_SUFFIX not defined

# gpg: Signature made Tue 20 Jan 2015 15:43:38 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"

* remotes/pmaydell/tags/pull-misc-20150120:
  cpu_ldst.h: Don't define helpers if MMU_MODE*_SUFFIX not defined
  cpu_ldst.h, cpu-all.h, bswap.h: Update documentation on ld/st accessors
  cpu_ldst_template.h: Drop unused cpu_ldfq/stfq/ldfl/stfl accessors
  cpu_ldst.h: Drop unused _raw macros, saddr() and laddr()
  cpu_ldst_template.h: Use ld*_p directly rather than via ld*_raw macros
  cpu_ldst.h: Use inline functions for usermode cpu_ld/st accessors
  cpu_ldst.h: Remove unused very short ld*/st* defines
  cpu_ldst.h: Drop unused ld/st*_kernel defines
  target-mips: Don't use _raw load/store accessors
  linux-user/main.c (m68k): Use get_user_u16 rather than lduw in cpu_loop
  linux-user/vm86.c: Use cpu_ldl_data &c rather than plain ldl &c
  bsd-user/elfload.c: Don't use ldl() or ldq_raw()
  linux-user/elfload.c: Don't use _raw accessor functions
  target-sparc: Don't use {ld, st}*_raw functions
  monitor.c: Use ld*_p() instead of ld*_raw()
  cpu_ldst.h: Remove unused ldul_ macros
  exec.c: Drop TARGET_HAS_ICE define and checks
  scripts/qapi-types.py: Add dummy member to empty structs

Signed-off-by: Peter Maydell <[email protected]>

cpu_ldst.h: Don't define helpers if MMU_MODE*_SUFFIX not defined

Not all targets define a full set of suffix strings for the
NB_MMU_MODES that they have. In this situation, don't define any
helper functions for that mode, rather than defining helper functions
with no suffix at all. The MMU mode is still functional; it is merely
not directly accessible via cpu_ld*_MODE from target helper functions.

Also add an "NB_MMU_MODES >= 2" check to the definition of the mode 1
helpers -- some targets only define one MMU mode.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 1421432008 [email protected]

cpu_ldst.h, cpu-all.h, bswap.h: Update documentation on ld/st accessors

Add documentation of what the cpu_*_* accessors look like.
Correct some minor errors in the existing documentation of the
direct _p accessor family. Remove the near-duplicate comment
on the _p accessors from cpu-all.h and replace it with a reference
to the comment in bswap.h.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst_template.h: Drop unused cpu_ldfq/stfq/ldfl/stfl accessors

The cpu_ldfq/stfq/ldfl/stfl accessors for loading and storing
float32 and float64 are completely unused, so delete them.
(The union they use for converting from the float32/float64
type to uint32_t or uint64_t is the wrong way to do it anyway:
they should be using make_float* and float*_val.)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst.h: Drop unused _raw macros, saddr() and laddr()

The _raw macros and their helpers saddr() and laddr() are now
totally unused -- delete them.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst_template.h: Use ld*_p directly rather than via ld*_raw macros

The ld*_raw and st*_raw macros are now only used within the code
produced by cpu_ldst_template.h, and only in three places.
Expand these out to just call the ld_p and st_p functions directly.

Note that in all the callsites the address argument is a uintptr_t,
so we can drop that part of the double-cast used in the saddr() and
laddr() macros.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst.h: Use inline functions for usermode cpu_ld/st accessors

Use inline functions rather than macros for cpu_ld/st accessors
for the *-user configurations, as we already do for softmmu.
This has a two advantages:
* we can actually typecheck our arguments
* we don't need to leak the _raw macros everywhere

Since the _kernel functions were only used by target-i386/seg_helper.c,
put the definitions for them in that file too. (It already has the
similar template include code to define them for the softmmu case,
so it makes sense to have it deal with defining them for user-only.)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst.h: Remove unused very short ld*/st* defines

The very short ld*/st* defines are now not used anywhere; delete them.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst.h: Drop unused ld/st*_kernel defines

The ld*_kernel and st*_kernel defines are not used anywhere;
delete them.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

target-mips: Don't use _raw load/store accessors

Use cpu_*_data instead of the direct *_raw load/store accessors.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

linux-user/main.c (m68k): Use get_user_u16 rather than lduw in cpu_loop

In the m68k cpu_loop() use get_user_u16 to read the immediate for
the simcall rahter than lduw, to bring it into line with how other
archs do it and to remove another user of the ldl family of functions.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

linux-user/vm86.c: Use cpu_ldl_data &c rather than plain ldl &c

Use the cpu_ld*_data and cpu_st*_data family of functions to access
guest memory in vm86.c rather than the very short-named ldl/stl functions.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

bsd-user/elfload.c: Don't use ldl() or ldq_raw()

Use get_user_u64() and get_user_ual() instead of the ldl() and
ldq_raw() functions.

[Note that this change is not compile tested as it is actually
in dead code -- none of the bsd-user configurations are PPC.]

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

linux-user/elfload.c: Don't use _raw accessor functions

The _raw accessor functions are an implementation detail that has
leaked out to some callsites. Use get_user_u64() instead of ldq_raw().

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

target-sparc: Don't use {ld, st}*_raw functions

Instead of using the _raw family of ld/st accessor functions, use
cpu_*_data. All this code is CONFIG_USER_ONLY, so the two are the
same semantically, but the _raw functions are really a detail of
the implementation which has leaked into a few callsites like this one.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

monitor.c: Use ld*_p() instead of ld*_raw()

The monitor code for doing a memory_dump() was using ld*_raw() to do
target-CPU accesses out of a local buf[] array. The correct functions
for this purpose are ld*_p(), which take a host pointer, rather than
ld*_raw(), which take an integer representing a guest address and
are somewhat meaningless in softmmu configurations. Nobody noticed
because for softmmu the _raw functions are the same as ldl_p but
with some extra casts thrown in. Switch to using the correct functions
instead.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1421334118 [email protected]

cpu_ldst.h: Remove unused ldul_ macros

The five ldul_ macros are not used anywhere and are marked up with an XXX
comment. "ldul" is a non-standard prefix for our family of load instructions:
we don't mark 32-bit accesses for signedness because they return a 32 bit
quantity. So just delete them.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 1421334118 [email protected]

exec.c: Drop TARGET_HAS_ICE define and checks

The TARGET_HAS_ICE #define is intended to indicate whether a target-*
guest CPU implementation supports the breakpoint handling. However,
all our guest CPUs have that support (the only two which do not
define TARGET_HAS_ICE are unicore32 and openrisc, and in both those
cases the bp support is present and the lack of the #define is just
a bug). So remove the #define entirely: all new guest CPU support
should include breakpoint handling as part of the basic implementation.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 1420484960 [email protected]

scripts/qapi-types.py: Add dummy member to empty structs

Make sure that all generated C structs have at least one field; this
avoids potential issues with attempting to malloc space for
zero-length structs in C (g_malloc(sizeof struct) would return NULL).
It also avoids an incompatibility with C++ (where an empty struct is
size 1); that isn't important to us now but might be in future.

Generated empty structures look like this:
    struct Abort
    {
        char qapi_dummy_field_for_empty_struct;
    };

This silences clang warnings like:
./qapi-types.h:3752:1: warning: empty struct has size 0 in C, size 1 in C++ [-Wextern-c-compat]
struct Abort
^

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 1419359069 [email protected]

Merge remote-tracking branch 'remotes/sstabellini/xen-2015-01-20-v2' into staging

* remotes/sstabellini/xen-2015-01-20-v2:
  xen: add a lock for the mapcache
  xen: do not use __-named variables in mapcache
  Xen: Use the ioreq-server API when available
  Add device listener interface

Signed-off-by: Peter Maydell <[email protected]>

xen: add a lock for the mapcache

Extend the existing dummy mapcache_lock/unlock macros to cover all of
xen-mapcache.c. This prepares for unlocked memory access, when parts
of exec.c will not be protected by the BQL.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xen: do not use __-named variables in mapcache

Keep the namespace clean.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

Xen: Use the ioreq-server API when available

The ioreq-server API added to Xen 4.5 offers better security than
the existing Xen/QEMU interface because the shared pages that are
used to pass emulation request/results back and forth are removed
from the guest's memory space before any requests are serviced.
This prevents the guest from mapping these pages (they are in a
well known location) and attempting to attack QEMU by synthesizing
its own request structures. Hence, this patch modifies configure
to detect whether the API is available, and adds the necessary
code to use the API if it is.

Signed-off-by: Paul Durrant <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>

Add device listener interface

The Xen ioreq-server API, introduced in Xen 4.5, requires that PCI device
models explicitly register with Xen for config space accesses. This patch
adds a listener interface into qdev-core which can be used by the Xen
interface code to monitor for arrival and departure of PCI devices.

Signed-off-by: Paul Durrant <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-console-20150119-1' into staging

ui: add shared surface format negotiation.

# gpg: Signature made Mon 19 Jan 2015 12:47:36 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"

* remotes/kraxel/tags/pull-console-20150119-1:
  ui/sdl2: Support shared surface for more pixman formats
  ui/sdl: Support shared surface for more pixman formats
  ui/gtk: Support shared surface for most pixman formats
  ui/spice: Support shared surface for most pixman formats
  ui/vnc: Support shared surface for most pixman formats
  ui/pixman: add qemu_pixman_check_format
  ui: Add dpy_gfx_check_format() to check backend shared surface support
  ui: Make qemu_default_pixman_format() return 0 on unsupported formats

Signed-off-by: Peter Maydell <[email protected]>

ui/sdl2: Support shared surface for more pixman formats

Signed-off-by: Gerd Hoffmann <[email protected]>

ui/sdl: Support shared surface for more pixman formats

At least all the ones I've tested. We make the assumption that
SDL is going to be better at conversion than we are.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[ kraxel: minor format tweaks ]

Signed-off-by: Gerd Hoffmann <[email protected]>

ui/gtk: Support shared surface for most pixman formats

At least all the ones I've tested. We make the assumption that
pixman is going to be better at conversion than we are.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[ kraxel: just hook up qemu_pixman_check_format ]

Signed-off-by: Gerd Hoffmann <[email protected]>

ui/spice: Support shared surface for most pixman formats

Just hook up qemu_pixman_check_format.

Signed-off-by: Gerd Hoffmann <[email protected]>

ui/vnc: Support shared surface for most pixman formats

At least all the ones I've tested. We make the assumption that
pixman is going to be better at conversion than we are.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[ kraxel: just hook up qemu_pixman_check_format ]

Signed-off-by: Gerd Hoffmann <[email protected]>

ui/pixman: add qemu_pixman_check_format

Convinience check_format function for UIs using pixman.

Signed-off-by: Gerd Hoffmann <[email protected]>

ui: Add dpy_gfx_check_format() to check backend shared surface support

This allows VGA to decide whether to use a shared surface based on
whether the UI backend supports the format or not. Backends that
don't provide the new callback fallback to native 32 bpp which
is equivalent to what was supported before.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[ kraxel: fix console check, allow only 32 bpp as fallback ]

Signed-off-by: Gerd Hoffmann <[email protected]>

ui: Make qemu_default_pixman_format() return 0 on unsupported formats

In order to remove the logic for detecting supported shared
pixmap formats from device models, make qemu_default_pixman_format()
capable for failing by returning 0 which is not a possible format
value rather than asserting.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150116' into staging

target-arm queue:
* fix endianness handling in fwcfg wide registers
* fix broken crypto insn emulation on big endian hosts

# gpg: Signature made Fri 16 Jan 2015 12:04:08 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"

* remotes/pmaydell/tags/pull-target-arm-20150116:
fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
target-arm: crypto: fix BE host support

Signed-off-by: Peter Maydell <[email protected]>

fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()

(1) Let's contemplate what device endianness means, for a memory mapped
device register (independently of QEMU -- that is, on physical hardware).

It determines the byte order that the device will put on the data bus when
the device is producing a *numerical value* for the CPU. This byte order
may differ from the CPU's own byte order, therefore when software wants to
consume the *numerical value*, it may have to swap the byte order first.

For example, suppose we have a device that exposes in a 2-byte register
the number of sheep we have to count before falling asleep. If the value
is decimal 37 (0x0025), then a big endian register will produce [0x00,
0x25], while a little endian register will produce [0x25, 0x00].

If the device register is big endian, but the CPU is little endian, the
numerical value will read as 0x2500 (decimal 9472), which software has to
byte swap before use.

However... if we ask the device about who stole our herd of sheep, and it
answers "XY", then the byte representation coming out of the register must
be [0x58, 0x59], regardless of the device register's endianness for
numeric values. And, software needs to copy these bytes into a string
field regardless of the CPU's own endianness.

(2) QEMU's device register accessor functions work with *numerical values*
exclusively, not strings:

The emulated register's read accessor function returns the numerical value
(eg. 37 decimal, 0x0025) as a *host-encoded* uint64_t. QEMU translates
this value for the guest to the endianness of the emulated device register
(which is recorded in MemoryRegionOps.endianness). Then guest code must
translate the numerical value from device register to guest CPU
endianness, before including it in any computation (see (1)).

(3) However, the data register of the fw_cfg device shall transfer strings
*only* -- that is, opaque blobs. Interpretation of any given blob is
subject to further agreement -- it can be an integer in an independently
determined byte order, or a genuine string, or an array of structs of
integers (in some byte order) and fixed size strings, and so on.

Because register emulation in QEMU is integer-preserving, not
string-preserving (see (2)), we have to jump through a few hoops.

(3a) We defined the memory mapped fw_cfg data register as
DEVICE_BIG_ENDIAN.

The particular choice is not really relevant -- we picked BE only for
consistency with the control register, which *does* transfer integers --
but our choice affects how we must host-encode values from fw_cfg strings.

(3b) Since we want the fw_cfg string "XY" to appear as the [0x58, 0x59]
array on the data register, *and* we picked DEVICE_BIG_ENDIAN, we must
compose the host (== C language) value 0x5859 in the read accessor
function.

(3c) When the guest performs the read access, the immediate uint16_t value
will be 0x5958 (in LE guests) and 0x5859 (in BE guests). However, the
uint16_t value does not matter. The only thing that matters is the byte
pattern [0x58, 0x59], which the guest code must copy into the target
string *without* any byte-swapping.

(4) Now I get to explain where I screwed up. :(

When we decided for big endian *integer* representation in the MMIO data
register -- see (3a) --, I mindlessly added an indiscriminate
byte-swizzling step to the (little endian) guest firmware.

This was a grave error -- it violates (3c) --, but I didn't realize it. I
only saw that the code I otherwise intended for fw_cfg_data_mem_read():

    value = 0;
    for (i = 0; i < size; ++i) {
        value = (value << 8) | fw_cfg_read(s);
    }

didn't produce the expected result in the guest.

In true facepalm style, instead of blaming my guest code (which violated
(3c)), I blamed my host code (which was correct). Ultimately, I coded
ldX_he_p() into fw_cfg_data_mem_read(), because that happened to work.

Obviously (...in retrospect) that was wrong. Only because my host happened
to be LE, ldX_he_p() composed the (otherwise incorrect) host value 0x5958
from the fw_cfg string "XY". And that happened to compensate for the bogus
indiscriminate byte-swizzling in my guest code.

Clearly the current code leaks the host endianness through to the guest,
which is wrong. Any device should work the same regardless of host
endianness.

The solution is to compose the host-endian representation (2) of the big
endian interpretation (3a, 3b) of the fw_cfg string, and to drop the wrong
byte-swizzling in the guest (3c).

Brown paper bag time for me.

Signed-off-by: Laszlo Ersek <[email protected]>
Message-id: 1420024880 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

target-arm: crypto: fix BE host support

The crypto emulation code in target-arm/crypto_helper.c never worked
correctly on big endian hosts, due to the fact that it uses a union
of array types to convert between the native VFP register size (64
bits) and the types used in the algorithms (bytes and 32 bit words)

We cannot just swab between LE and BE when reading and writing the
registers, as the SHA code performs word additions, so instead, add
array accessors for the CRYPTO_STATE type whose LE and BE specific
implementations ensure that the correct array elements are referenced.

Signed-off-by: Ard Biesheuvel <[email protected]>
Acked-by: Laszlo Ersek <[email protected]>
Message-id: 1420208303 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/mig-2.3-1' into staging

A set of patches collected over the holidays.  Mix of optimizations and
fixes.

# gpg: Signature made Fri 16 Jan 2015 07:42:00 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"

* remotes/amit-migration/tags/mig-2.3-1:
  vmstate: type-check sub-arrays
  migration_cancel: shutdown migration socket
  Handle bi-directional communication for fd migration
  socket shutdown
  Tests: QEMUSizedBuffer/QEMUBuffer
  QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
  xbzrle: rebuild the cache_is_cached function
  xbzrle: optimize XBZRLE to decrease the cache misses

Signed-off-by: Peter Maydell <[email protected]>

vmstate: type-check sub-arrays

While we cannot check against the type of the full array, we can check
against the type of the fields.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

migration_cancel: shutdown migration socket

Force shutdown on migration socket on cancel to cause the cancel
to complete even if the socket is blocked on a dead network.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Amit Shah <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

Handle bi-directional communication for fd migration

libvirt prefers opening the TCP connection itself, for two reasons.
First, connection failed errors can be detected easier, without having
to parse qemu's error output.
Second, libvirt might be asked to secure the transfer by tunnelling the
communication through an TLS layer.
Therefore, libvirt opens the TCP connection itself and passes an FD to qemu
using QMP and a POSIX-specific mechanism.

Hence, in order to make the reverse-path work in such cases, qemu needs to
distinguish if the transmitted FD is a socket (reverse-path available)
or not (reverse-path might not be available) and use the corresponding
abstraction.

Signed-off-by: Cristian Klein <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Amit Shah <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

socket shutdown

Add QEMUFile interface to allow a socket to be 'shut down' - i.e. any
reads/writes will fail (and any blocking read/write will be woken).

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Amit Shah <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

Tests: QEMUSizedBuffer/QEMUBuffer

Modify some of tests/test-vmstate.c due to qemu_bufopen() change.
If you create a QEMUSizedBuffer yourself, you have to explicitly
free it.

Signed-off-by: Yang Hongyang <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: Juan Quintela <[email protected]>
Cc: Amit Shah <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

QEMUSizedBuffer: only free qsb that qemu_bufopen allocated

Only free qsb that qemu_bufopen allocated, and also allow
qemu_bufopen accept qsb as input for write operation. It
will make the API more logical:
1.If you create the QEMUSizedBuffer yourself, you need to
  free it by using qsb_free() but not depends on other API
  like qemu_fclose.
2.allow qemu_bufopen() accept QEMUSizedBuffer as input for
  write operation, otherwise, it will be a little strange
  for this API won't accept the second parameter.

This brings API change, since there are only 3
users of this API currently, this change only impact the
first one which will be fixed in patch 2 of this patchset,
so I think it is safe to do this change.

1     70  tests/test-vmstate.c <<open_mem_file_read>>
            return qemu_bufopen("r", qsb);
2    404  tests/test-vmstate.c <<test_save_noskip>>
            QEMUFile *fsave = qemu_bufopen("w", NULL);
3    424  tests/test-vmstate.c <<test_save_skip>>
            QEMUFile *fsave = qemu_bufopen("w", NULL);

Signed-off-by: Yang Hongyang <[email protected]>
Cc: Dr. David Alan Gilbert <[email protected]>
Cc: Juan Quintela <[email protected]>
Cc: Amit Shah <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

xbzrle: rebuild the cache_is_cached function

Rebuild the cache_is_cached function by cache_get_by_addr. And
drops the asserts because the caller is also asserting the same
thing.

Signed-off-by: ChenLiang <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

xbzrle: optimize XBZRLE to decrease the cache misses

Avoid hot pages being replaced by others to remarkably decrease cache
misses

Sample results with the test program which quote from xbzrle.txt ran in
vm:(migrate bandwidth:1GE and xbzrle cache size 8MB)

the test program:

include <stdlib.h>
include <stdio.h>
int main()
{
        char *buf = (char *) calloc(4096, 4096);
        while (1) {
            int i;
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
}

before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}

18k pages sent compressed in 52 seconds.
cache-miss-rate is 98.7%, totally miss.

after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}

194k pages sent compressed in 18 seconds.
The value of cache-miss-rate decrease to 48.59%.

Signed-off-by: ChenLiang <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-01-15' into staging

trivial patches for 2015-01-15

# gpg: Signature made Thu 15 Jan 2015 08:26:26 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <[email protected]>"
# gpg:                 aka "Michael Tokarev <[email protected]>"
# gpg:                 aka "Michael Tokarev <[email protected]>"

* remotes/mjt/tags/pull-trivial-patches-2015-01-15:
  vl.c: fix some alignment issues
  blizzard: do not depend on VGA internals
  Makefile: Remove config.status and common.env during 'make distclean'
  target-openrisc: bugfix for dec_sys to decode instructions correctly
  Do not hang on full PTY
  misc: Fix new typos in comments
  target-arm: Fix typo in comment (seperately -> separately)
  target-tricore: Fix new typos
  migration/qemu-file.c: Don't shift left into sign bit
  translate-all: Mark map_exec() with the 'unused' attribute
  tests/hd-geo-test.c: Remove unused test_image variable
  vt82c686: avoid out-of-bounds read

Signed-off-by: Peter Maydell <[email protected]>

vl.c: fix some alignment issues

The misalignment was caused by tabs which were used instead of spaces.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Stefan Weil <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

blizzard: do not depend on VGA internals

There is nothing that is used by this ARM-specific device.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Makefile: Remove config.status and common.env during 'make distclean'

config.status and tests/qemu-iotests/common.env are generated files
that should be deleted during 'make distclean'.

Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

target-openrisc: bugfix for dec_sys to decode instructions correctly

Fixed the decoding of "system" instructions (starting with 0x2)
in dec_sys() in translate.c. In particular, the l.trap instruction
is now correctly decoded, which enables for singlestepping and
breakpoints to be set in GDB.

Signed-off-by: David R. Morrison <[email protected]>
Acked-by: Jia Liu <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

Do not hang on full PTY

Signed-off-by: Don Slutz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

misc: Fix new typos in comments

recieve -> receive
suprise -> surprise

Cc: Igor Mammedov <[email protected]>
Cc: John Snow <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

target-arm: Fix typo in comment (seperately -> separately)

Cc: Peter Maydell <[email protected]>
Cc: Greg Bellows <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

target-tricore: Fix new typos

adress -> address
managment -> management

Cc: Bastian Koppelmann <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

migration/qemu-file.c: Don't shift left into sign bit

Add a cast in qemu_get_be32() to avoid shifting left into the sign
bit of a signed integer (which is undefined behaviour in C).

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

translate-all: Mark map_exec() with the 'unused' attribute

Mark map_exec() with the 'unused' attribute to avoid '-Wunused-function'
warnings on clang 3.4 or later. This means we don't need to mark it
'inline', which is what we were previously using to suppress the warning
(a trick which only works with gcc, not clang).

Signed-off-by: SeokYeon Hwang <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
[PMM: tweaked comment message a little]
Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

tests/hd-geo-test.c: Remove unused test_image variable

Remove unused variable test_image; this silences a clang warning.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>

vt82c686: avoid out-of-bounds read

superio_ioport_readb can read the 256th element of the array.
Coverity reports an out-of-bounds write in superio_ioport_writeb,
but it does not show the corresponding out-of-bounds read
because it cannot prove that it can happen. Fix the root
cause of the problem (zhanghailang's patch instead fixes
the logic in superio_ioport_writeb).

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: zhanghailiang <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
Cc: [email protected]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Mostly bugfixes and cleanups from qemu-devel.  Yet another small patch from
the record/replay series, and a few SCSI and i386 patches as well.

# gpg: Signature made Wed 14 Jan 2015 09:39:14 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer
  qemu-timer: rename timer_init to timer_init_tl
  scsi: fix cancellation when I/O was completed but DMA was not.
  rules.mak: Fix module build
  hw/scsi/lsi53c895a: add support for additional diag / debug registers
  qemu-common.h: optimise muldiv64 if int128 is available
  target-i386: do not memcpy in and out of xmm_regs
  target-i386: fix movntsd on big-endian hosts
  vl.c: fix regression when reading memory size from config file
  vl: Don't silently change topology when all -smp options were set
  vl: fix max_cpus check
  vl: Avoid unnecessary 'if' nesting
  9pfs: changed to use event_notifier instead of qemu_pipe
  vl.c: fix regression when reading machine type from config file
  char: restore stdio echo on resume from suspend.

Signed-off-by: Peter Maydell <[email protected]>

cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer

Fix mismatch between timer_new_ms and timer_mod.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qemu-timer: rename timer_init to timer_init_tl

timer_init is not called that often. Free the name for an equivalent
of timer_new.

Signed-off-by: Paolo Bonzini <[email protected]>

scsi: fix cancellation when I/O was completed but DMA was not.

Commit d577646 (scsi: Introduce scsi_req_cancel_complete, 2014-09-25)
was supposed to have no semantic change, but it missed a case. When
r->aiocb has already been NULLed, but DMA was not complete and the
SCSI layer was waiting for scsi_req_continue, after the patch the
SCSI layer will not call the .cancel callback of SCSIBusInfo.

Fixes: d5776465ee9a55815792efa34d79de240f4ffd99
Cc: [email protected]
Reported-by: Dr. David Alan Gilbert <[email protected]>
Tested-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

rules.mak: Fix module build

Module build is broken since commit c261d774fb ( rules.mak: Fix DSO
build by pulling in archive symbols). That commit added .mo placeholders
of DSO to -y variables, in order to pull stub symbols to executable. But
the placeholders are unintentionally expanded in -y, rather than
filtered out while linking.

Fix it by moving the -objs expanding to before inserting .mo
placeholders. Note that passing -cflags and -libs to member objects are
also moved to keep it happening before object expanding.

Reported-by: Bharata B Rao <[email protected]>
Tested-by: Bharata B Rao <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw/scsi/lsi53c895a: add support for additional diag / debug registers

Some ancient Linux kernels read from registers 0x09 and 0x3c-3f during
boot. According to the spec these registers are for diag and debug
purposes only. If they are absend qemu aborts on read.

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qemu-common.h: optimise muldiv64 if int128 is available

Let compiler do the job to optimise the function.

Signed-off-by: Frediano Ziglio <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Frediano Ziglio <[email protected]>

target-i386: do not memcpy in and out of xmm_regs

After the next patch, we will move the high parts of AVX and AVX512 registers
in the same array as the SSE registers. This will make it impossible to
memcpy an array of 128-bit values in and out of xmm_regs in one swoop.
Use a for loop instead.

Similarly, always use XMM_Q in translate.c. This avoids introducing bugs
such as the one fixed in the previous patch.

Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: fix movntsd on big-endian hosts

This was accessing an XMM register's low half without going through XMM_Q.

Cc: [email protected]
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vl.c: fix regression when reading memory size from config file

This is happening because an actual logic is performed on the memory
arguments inside the main's switch, disregarding the config file content.

Solved by extracting the logic on a separate function and calling it
after the switch.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/xen-2015-01-13' into staging

* remotes/sstabellini/xen-2015-01-13:
xen-hvm: increase maxmem before calling xc_domain_populate_physmap
xen-pt: Fix PCI devices re-attach failed

Signed-off-by: Peter Maydell <[email protected]>

xen-hvm: increase maxmem before calling xc_domain_populate_physmap

Increase maxmem before calling xc_domain_populate_physmap_exact to
avoid the risk of running out of guest memory. This way we can also
avoid complex memory calculations in libxl at domain construction
time.

This patch fixes an abort() when assigning more than 4 NICs to a VM.

Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Don Slutz <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Tue 13 Jan 2015 13:48:06 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/block-pull-request: (38 commits)
  NVMe: Set correct VS Value for 1.1 Compliant Controllers
  MAINTAINERS: Add migration/block* to block subsystem
  MAINTAINERS: Update email addresses for Chrysostomos Nanakos
  nvme: Fix get/set number of queues feature
  ide: Implement VPD response for ATAPI
  block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}
  block: limited request size in write zeroes unsupported path
  coroutine: try harder not to delete coroutines
  coroutine: drop qemu_coroutine_adjust_pool_size
  coroutine: rewrite pool to avoid mutex
  QSLIST: add lock-free operations
  test-coroutine: avoid overflow on 32-bit systems
  qemu-thread: add per-thread atexit functions
  coroutine-ucontext: use __thread
  qemu-iotests: Add supported os parameter for python tests
  qemu-iotests: Add "_supported_os Linux" to 058
  qemu-iotests: Replace "/bin/true" with "true"
  .gitignore: Ignore generated "common.env"
  libqos: Convert malloc-pc allocator to a generic allocator
  migration/block: fix pending() return value
  ...

Signed-off-by: Peter Maydell <[email protected]>

NVMe: Set correct VS Value for 1.1 Compliant Controllers

According to NVMe specifications Bits 15:08 represent Minor Version number.

Signed-off-by: Anubhav Rakshit <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

MAINTAINERS: Add migration/block* to block subsystem

We are moving block-migration.c to the separated migration directory,
keep this file watched by block maintainers is a good idea.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

MAINTAINERS: Update email addresses for Chrysostomos Nanakos

Remove first email address and let the one from which I am contributing.

Signed-off-by: Chrysostomos Nanakos <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

nvme: Fix get/set number of queues feature

According to the specification, the low 16 bits should contain the number of
I/O submission queues, and the high 16 bits should contain the number of
I/O completion queues.

Signed-off-by: Alex Friedman <[email protected]>
Acked-by: Keith Busch <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

ide: Implement VPD response for ATAPI

SCSI devices have multiple kinds of queries they need to respond
to, as defined in the "cmd inquiry" section in MMC-6 and SPC-3.

Relevent sections:
MMC-6 revision 2g:
      Non-VPD response data and pointer to SPC-3;
      Section 6.8 "Inquiry Command"
SPC-3 revision 23:
      Inquiry command and error handling:
      Section 6.4 "INQUIRY command"
      VPD data pages format:
      Section 7.6 "Vital product data parameters"

We implement these Vital Product Data queries for SCSI, but not for
ATAPI through IDE. The result is that if you are looking for the WWN
identifier via tools such as sg3_utils, you will be unable to query
our CD/DVD rom device to obtain it.

This patch adds the minimum number of mandatory responses as defined
by SPC-3, which include the "supported pages" response (page 0x00)
and the "Device Identification" response (page 0x83). It also correctly
responds when it receives a request for an illegal page to improve
error output from related tools.

The Device ID page contains an arbitrary list of identification
strings of various formats; the ID strings included in this patch
were chosen to mimic those provided by the libata driver when
emulating this SCSI query (model, serial, and wwn when present.)

Example:

# libata emulated response
[root@localhost ~]# sg_inq --id /dev/sda
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 24
    designator_type: vendor specific [0x0],  code_set: ASCII
    associated with the addressed logical unit
      vendor specific: QM00001
  Designation descriptor number 2, descriptor length: 72
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the addressed logical unit
      vendor id: ATA
      vendor specific: QEMU HARDDISK                           QM00001

# QEMU generated ATAPI response, with WWN
[root@localhost ~]# sg_inq --id /dev/sr0
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 24
    designator_type: vendor specific [0x0],  code_set: ASCII
    associated with the addressed logical unit
      vendor specific: QM00005
  Designation descriptor number 2, descriptor length: 72
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the addressed logical unit
      vendor id: ATA
      vendor specific: QEMU DVD-ROM                            QM00005
  Designation descriptor number 3, descriptor length: 12
    designator_type: NAA,  code_set: Binary
    associated with the addressed logical unit
      NAA 5, IEEE Company_id: 0xc50
      Vendor Specific Identifier: 0x15ea71bb
      [0x5000c50015ea71bb]

See also: hw/scsi/scsi-disk.c, scsi_disk_emulate_inquiry()

Signed-off-by: John Snow <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}

Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET,
block-commit involves two asymmetric devices.

This change is not user-visible (yet), because commit only works with
device names.

But once we enable backing reference in blockdev-add, or specifying
node-name in block-commit command, we don't want the user to start two
commit jobs on the same backing chain, which will corrupt things because
of the final bdrv_swap.

Before we have per category blockers, splitting this type is still
better.

[Resolved virtio-blk dataplane conflict by replacing
BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}.
They are safe since the block job runs in the same AioContext as the
dataplane IOThread.
--Stefan]

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: limited request size in write zeroes unsupported path

If bs->bl.max_write_zeroes is large and we end up in the unsupported
path we might allocate a lot of memory for the iovector and/or even
generate an oversized requests.

Fix this by limiting the request by the minimum of the reported
maximum transfer size or 16MB (32768 sectors).

Reported-by: Denis V. Lunev <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Message-id: 1420457389 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: try harder not to delete coroutines

Placing coroutines on the global pool should be preferrable, because it
can help all threads. But if the global pool is full, we can still
try to save some allocations by stashing completed coroutines on the
local pool. This is quite cheap too, because it does not require
atomic operations, and provides a gain of 15% in the best case.

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: drop qemu_coroutine_adjust_pool_size

This is not needed anymore. The new TLS-based algorithm is adaptive.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: rewrite pool to avoid mutex

This patch removes the mutex by using fancy lock-free manipulation of
the pool.  Lock-free stacks and queues are not hard, but they can suffer
from the ABA problem so they are better avoided unless you have some
deferred reclamation scheme like RCU.  Otherwise you have to stick
with adding to a list, and emptying it completely.  This is what this
patch does, by coupling a lock-free global list of available coroutines
with per-CPU lists that are actually used on coroutine creation.

Whenever the destruction pool is big enough, the next thread that runs
out of coroutines will steal the whole destruction pool.  This is positive
in two ways:

1) the allocation does not have to do any atomic operation in the fast
path, it's entirely using thread-local storage.  Once every POOL_BATCH_SIZE
allocations it will do a single atomic_xchg.  Release does an atomic_cmpxchg
loop, that hopefully doesn't cause any starvation, and an atomic_inc.

A later patch will also remove atomic operations from the release path,
and try to avoid the atomic_xchg altogether---succeeding in doing so if
all devices either use ioeventfd or are not submitting requests actively.

2) in theory this should be completely adaptive.  The number of coroutines
around should be a little more than POOL_BATCH_SIZE * number of allocating
threads; so this also empties qemu_coroutine_adjust_pool_size.  (The previous
pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit
more generous.  But if you actually have many high-iodepth disks, it's better
to put them in different iothreads, which will also use separate thread
pools and aio=native file descriptors).

This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33.
No matter if we end with some kind of coroutine bypass scheme or not,
it cannot hurt to optimize hot code.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

QSLIST: add lock-free operations

These operations are trivial to implement and do not have ABA problems.
They are enough to implement simple multiple-producer, single consumer
lock-free lists or, as in the next patch, the multiple consumers can
steal a whole batch of elements and process them at their leisure.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

test-coroutine: avoid overflow on 32-bit systems

unsigned long is not large enough to represent 1000000000 * duration there.
Just use floating point.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-thread: add per-thread atexit functions

Destructors are the main additional feature of pthread TLS compared
to __thread.  If we were using C++ (hint, hint!) we could have used
thread-local objects with a destructor.  Since we are not, instead,
we add a simple Notifier-based API.

Note that the notifier must be per-thread as well.  We can add a
global list as well later, perhaps.

The Win32 implementation has some complications because a) detached
threads used not to have a QemuThreadData; b) the main thread does
not go through win32_start_routine, so we have to use atexit too.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine-ucontext: use __thread

ELF thread local storage is about 10% faster on tests/test-coroutine's
perf/cost test. The timing on my machine is 190ns per iteration with
pthread TLS, 170 with ELF TLS.

Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
the model of coroutine-win32.c (including the important "noinline"
attribute!).

Platforms without thread-local storage (OpenBSD probably?) will need
a new-enough GCC for this to compile, in order to use the same emutls
support that Windows already relies on.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417518350 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Add supported os parameter for python tests

If I understand correctly, qemu-iotests never meant to be portable. We
only support Linux for all the shell cases, but didn't specify it for
python tests. Now add this and default all the python tests as Linux
only. If we cares enough later, we can override the parameter in
individual cases.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Add "_supported_os Linux" to 058

Other cases have this, and this test is not portable as well, as we want
to add "make check-block" to "make check", it shouldn't fail on Mac OS
X.

Reported-by: Peter Maydell <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Replace "/bin/true" with "true"

The former is not portable because on Mac OSX it is /usr/bin/true.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

.gitignore: Ignore generated "common.env"

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

xen-pt: Fix PCI devices re-attach failed

Use the 'xl pci-attach $DomU $BDF' command to attach more than
one PCI devices to the guest, then detach the devices with
'xl pci-detach $DomU $BDF', after that, re-attach these PCI
devices again, an error message will be reported like following:

    libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive
    an error message from QMP server: Duplicate ID 'pci-pt-03_10.1'
    for device.

If using the 'address_space_memory' as the parameter of
'memory_listener_register', 'xen_pt_region_del' will not be called
if the memory region's name is not 'xen-pci-pt-*' when the devices
is detached. This will cause the device's related QemuOpts object
not be released properly.

Using the device's address space can avoid such issue, because the
calling count of 'xen_pt_region_add' when attaching and the calling
count of 'xen_pt_region_del' when detaching is the same, so all the
memory region ref and unref by the 'xen_pt_region_add' and
'xen_pt_region_del' can be released properly.

Signed-off-by: Liang Li <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reported-by: Longtao Pang <[email protected]>

libqos: Convert malloc-pc allocator to a generic allocator

The allocator in malloc-pc has been extracted, so it can be used in every arch.
This operation showed that both the alloc and free functions can be also
generic.
Because of this, the QGuestAllocator has been removed from is function to wrap
the alloc and free function, and now just contains the allocator parameters.
As a result, only the allocator initalizer and unitializer are arch dependent.

Signed-off-by: Marc Marí <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

migration/block: fix pending() return value

Because of wrong return value of .save_live_pending() in
migration/block.c, migration finishes before the whole disk is
transferred. Such situation occurs when the migration process is fast
enough, for example when source and dest are on the same host.

If in the bulk phase we return something < max_size, we will skip
transferring the tail of the device. Currently we have "set pending to
BLOCK_SIZE if it is zero" for bulk phase, but there no guarantee, that
it will be < max_size.

True approach is to return, for example, max_size+1 when we are in the
bulk phase.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 1419933856 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

iotests: Filter out "I/O thread spun..." warning

Filter out the "main loop: WARNING: I/O thread spun for..." warning from
qemu output (it hardly matters for code specifically testing I/O).

Furthermore, use _filter_qemu in all the custom functions which run
qemu.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test blockdev-backup in 055

This applies cases on drive-backup on blockdev-backup, except cases with
target format and mode.

Also add a case to check source == target.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 1418899027 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Add blockdev-backup to transaction

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 1418899027 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qmp: Add command 'blockdev-backup'

Similar to drive-backup, but this command uses a device id as target
instead of creating/opening an image file.

Also add blocker on target bs, since the target is also a named device
now.

Add check and report error for bs == target which became possible but is
an illegal case with introduction of blockdev-backup.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 1418899027 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qapi: Comment version info in TransactionAction

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 1418899027 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: fix spoiling all dirty bitmaps by mirror and migration

Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
such mistakes in future, old functions bdrv_(set,reset)_dirty are made
static, for internal block usage.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
CC: John Snow <[email protected]>
CC: Fam Zheng <[email protected]>
CC: Denis V. Lunev <[email protected]>
CC: Stefan Hajnoczi <[email protected]>
CC: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1417081246 [email protected]
Signed-off-by: Max Reitz <[email protected]>