Git Repo - qemu.git/log

Merge remote-tracking branch 'aurel32/tags/pull-target-sh4-20170513' into staging

Queued target/sh4 patches

# gpg: Signature made Sat 13 May 2017 10:25:41 AM BST
# gpg:                using RSA key 0xBA9C78061DDD8C9B
# gpg: Good signature from "Aurelien Jarno <[email protected]>"
# gpg:                 aka "Aurelien Jarno <[email protected]>"
# gpg:                 aka "Aurelien Jarno <[email protected]>"
# Primary key fingerprint: 7746 2642 A9EF 94FD 0F77  196D BA9C 7806 1DDD 8C9B

* aurel32/tags/pull-target-sh4-20170513:
  target/sh4: use cpu_loop_exit_restore
  target/sh4: trap unaligned accesses
  target/sh4: movua.l is an SH4-A only instruction
  target/sh4: implement tas.b using atomic helper
  target/sh4: generate fences for SH4
  target/sh4: optimize gen_write_sr using extract op
  target/sh4: optimize gen_store_fpr64
  target/sh4: fold ctx->bstate = BS_BRANCH into gen_conditional_jump
  target/sh4: only save flags state at the end of the TB
  target/sh4: fix BS_EXCP exit
  target/sh4: fix BS_STOP exit
  target/sh4: move DELAY_SLOT_TRUE flag into a separate global
  target/sh4: do not include DELAY_SLOT_TRUE in the TB state
  target/sh4: get rid of DELAY_SLOT_CLEARME
  target/sh4: split ctx->flags into ctx->tbflags and ctx->envflags

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'rth/tags/pull-s390-20170512' into staging

Queued target/s390 patches

# gpg: Signature made Sat 13 May 2017 12:33:08 AM BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* rth/tags/pull-s390-20170512:
  target/s390x: implement serialization in BRANCH CONDITION
  target/s390x: fix SIGNAL PROCESSOR return value
  target/s390x: mask the SIGP order_code using SIGP_ORDER_MASK
  target/s390x: Use atomic operations for LOAD AND OP
  target/s390x: Use atomic operations for COMPARE SWAP
  target/s390x: Implement LOAD PAIR DISJOINT
  target/s390x: Diagnose specification exception for atomics
  target/s390x: Implement LOAD PROGRAM PARAMETER
  target/s390x: Implement STORE FACILITIES LIST EXTENDED

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'kraxel/tags/pull-usb-20170512-1' into staging

usb: bugfixes, doc update

# gpg: Signature made Fri 12 May 2017 01:20:29 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-usb-20170512-1:
  hw/usb/dev-serial: Do not try to set vendorid or productid properties
  xhci: relax link check
  usb-hub: clear PORT_STAT_SUSPEND on wakeup
  xhci: fix logging
  usb-redir: fix stack overflow in usbredir_log_data
  qemu-doc: Update to use the new way of attaching USB devices

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'kraxel/tags/pull-ui-20170512-1' into staging

ui: add egl-headless
ui: some vnc cleanups
ui: absolute events for input-linux

# gpg: Signature made Fri 12 May 2017 12:50:07 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-ui-20170512-1:
  vnc: replace hweight_long() with ctpopl()
  vnc: simple clean up
  opengl: add egl-headless display
  egl: explicitly ask for core context
  egl-helpers: add missing error check
  egl-helpers: fix display init for x11
  egl-helpers: drop support for gles and debug logging
  virtio-gpu: move virtio_gpu_gl_block
  ui: input-linux: Add absolute event support
  ui: Support non-zero minimum values for absolute input axes

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2017-05-11

Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification

# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
  migration/i386: Remove support for pre-0.12 formats
  vmstatification: i386 FPReg
  migration/i386: Remove old non-softfloat 64bit FP support
  tests: check -numa node,cpu=props_list usecase
  numa: add '-numa cpu,...' option for property based node mapping
  numa: remove node_cpu bitmaps as they are no longer used
  numa: use possible_cpus for not mapped CPUs check
  machine: call machine init from wrapper
  numa: remove no longer need numa_post_machine_init()
  tests: numa: add case for QMP command query-cpus
  QMP: include CpuInstanceProperties into query_cpus output output
  virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: add check that board supports cpu_index to node mapping
  virt-arm: add node-id property to CPU
  pc: add node-id property to CPU
  spapr: add node-id property to sPAPR core
  ...

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'kraxel/tags/pull-vga-20170511-1' into staging

make display updates thread safe, batch #2

# gpg: Signature made Thu 11 May 2017 03:41:51 PM BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* kraxel/tags/pull-vga-20170511-1:
  vga: fix display update region calculation
  sm501: make display updates thread safe
  tcx: make display updates thread safe
  cg3: make display updates thread safe

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170511' into staging

ppc patch queue for 2017-05-11

This pull request supersedes the one from yesterday (20170510), fixing
an important style bug in one patch, and adding an extra couple of
simple patches.

Highlights of this set:
  * Some fixes for POWER9
  * TCG support for POWER9 radix MMU
  * VGA rom for Mac machine types
  * Fixes for the XICS interrupt controller
  * MTTCG support for ppc targets

As suggested by Paolo, I've tried to add the Docker tests to my
standard pre-pull-request tests.  I haven't wholly suceeded; this has
been tested with some of the Docker images, but others I haven't
managed due to problems that as best I can tell are not due to
problems in this patch series.  I'll continue working on this for
future pull requests.  Specifically, 'travis', 'fedora', and 'centos6'
seem to work.  'min-glib' jammed while gtesting moxie, which seems
very unlikely to be caused by this series.  'ubuntu', 'debian' and
'debian-bootstrap' hit build errors almost immediately that look like
problems with the container configuration, and 'debian-*-cross' hit
build errors later on which also look like missing dependencies from
the container.

# gpg: Signature made Thu 11 May 2017 05:13:46 AM BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* dgibson/tags/ppc-for-2.10-20170511: (23 commits)
  target/ppc: Avoid printing wrong aliases in CPU help text
  pnv: Fix build failures on some host platforms
  target/ppc: Allow workarounds for POWER9 DD1
  spapr: Don't accidentally advertise HTM support on POWER9
  ppc: xics: fix compilation with CentOS 6
  target/ppc: Enable RADIX mmu mode for pseries TCG guest
  target/ppc: Implement ISA V3.00 radix page fault handler
  target/ppc: Change tlbie invalid fields for POWER9 support
  target/ppc: Update tlbie to check privilege level based on GTSE
  target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE
  ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs
  ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs
  Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule
  Add QemuMacDrivers as submodule
  ppc/xics: preserve P and Q bits for KVM IRQs
  ppc/xics: Fix stale irq->status bits after get
  target/ppc: do not reset reserve_addr in exec_enter
  tcg: enable MTTCG by default for PPC64 on x86
  cpus: Fix CPU unplug for MTTCG
  target/ppc: Generate fence operations
  ...

Signed-off-by: Stefan Hajnoczi <[email protected]>

target/sh4: use cpu_loop_exit_restore

Use cpu_loop_exit_restore when using cpu_restore_state and cpu_loop_exit
together.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: trap unaligned accesses

SH4 requires that memory accesses are naturally aligned, except for the
SH4-A movua.l instructions which can do unaligned loads.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: movua.l is an SH4-A only instruction

At the same time change the comment describing the instruction the same
way than other instruction, so that the code is easier to read and search.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: implement tas.b using atomic helper

We only emulate UP SH4, however as the tas.b instruction is used in the GNU
libc, this improve linux-user emulation.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: generate fences for SH4

synco is a SH4-A only instruction.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: optimize gen_write_sr using extract op

This doesn't change the generated code on x86, but optimizes it on most
RISC architectures and makes the code simpler to read.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: optimize gen_store_fpr64

Using extr and avoiding intermediate temps.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: fold ctx->bstate = BS_BRANCH into gen_conditional_jump

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: only save flags state at the end of the TB

There is no need to save flags when entering and exiting the delay slot.
They can be saved only when reaching the end of the TB. If the TB is
interrupted before by an exception, they will be restored using
restore_state_to_opc.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: fix BS_EXCP exit

In case of exception, there is no need to call tcg_gen_exit_tb as the
exception helper won't return.

Also fix a few cases where BS_BRANCH is called instead of BS_EXCP.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: fix BS_STOP exit

When stopping the translation because the state has changed, goto_tb
should not be used as it might link TB with different flags.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: move DELAY_SLOT_TRUE flag into a separate global

Instead of using one bit of the env flags to store the condition of the
next delay slot, use a separate global. It simplifies reading and
writing the flags variable and also removes some confusion between
ctx->envflags and env->flags.

Note that the global is first transfered to a temp in order to be
able to discard the global before the brcond.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: do not include DELAY_SLOT_TRUE in the TB state

DELAY_SLOT_TRUE is used as a dynamic condition for the branch after the
delay slot instruction. It is not used in code generation, so there is
no need to including in the TB state.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: get rid of DELAY_SLOT_CLEARME

Now that ctx->flags has been split, it becomes clear that
DELAY_SLOT_CLEARME has not impact on the code generation: in both case
ctx->envflags is cleared, either by clearing all the flags, or by
setting it to 0. This is left-over from pre-TCG era.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/sh4: split ctx->flags into ctx->tbflags and ctx->envflags

There is a confusion (and not only in the SH4 target) between tb->flags,
env->flags and ctx->flags. To avoid it, split ctx->flags into
ctx->tbflags and ctx->envflags. ctx->tbflags stays unchanged during the
whole TB translation, while ctx->envflags evolves and is kept in sync
with env->flags using TCG instructions. ctx->envflags now only contains
the part that of env->flags that is contained in the TB state, i.e. the
DELAY_SLOT* flags.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>

target/s390x: implement serialization in BRANCH CONDITION

Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <20170509082800 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: fix SIGNAL PROCESSOR return value

The SIGNAL PROCESSOR helper returns its value through the CC register.
set_cc_static should be called just after the helper.

Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <20170509082800 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: mask the SIGP order_code using SIGP_ORDER_MASK

For that move the definition from kvm.c to cpu.h

Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Aurelien Jarno <[email protected]>
Message-Id: <20170509082800 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Use atomic operations for LOAD AND OP

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Use atomic operations for COMPARE SWAP

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Implement LOAD PAIR DISJOINT

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Eric Bischoff <[email protected]>
Message-Id: <20170228120134 [email protected]>
[rth: Combine the two via insn->data; free the address temps.]
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Diagnose specification exception for atomics

All of the interlocked access facility instructions raise a
specification exception for unaligned accesses. Do this by
using the (previously unused) unaligned_access hook.

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Implement LOAD PROGRAM PARAMETER

Linux arch/s390/kernel/head(64).S uses LPP instruction if it is
available in facilities list provided by stfl/stfle instruction.
This is the case of newer z/System generations and their qemu
definition.

The description of LPP is at
http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Miroslav Benes <[email protected]>
Message-Id: <20170227085353 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Implement STORE FACILITIES LIST EXTENDED

At the same time, improve STORE FACILITIES LIST
so that we don't hard-code the list for all cpus.

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'tracing-pull-request' into staging

# gpg: Signature made Fri 12 May 2017 10:38:07 AM EDT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'tracing-pull-request':
  trace: add sanity check

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge tag 'block-pull-request' into staging

# gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request':
  aio: add missing aio_notify() to aio_enable_external()
  block: Simplify BDRV_BLOCK_RAW recursion
  coroutine: remove GThread implementation

Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream: (58 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...

Signed-off-by: Stefan Hajnoczi <[email protected]>

trace: add sanity check

If trace backend is set to TRACE_NOP, trace_get_vcpu_event_count
returns 0, cause bitmap_new call abort.

The abort can be triggered as follows:

  $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu
  $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G
  (gdb) bt
  #0  0x00007ffff04e25f7 in raise () from /lib64/libc.so.6
  #1  0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6
  #2  0x00005555559de905 in bitmap_new (nbits=<optimized out>)
      at /home/root/git/qemu2.git/include/qemu/bitmap.h:96
  #3  cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399
  #4  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341
  #5  0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341
  #6  0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560,
      type=type@entry=0x55555656bd30) at qom/object.c:376
  #7  0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484
  #8  0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu")
      at qom/object.c:494
  #9  0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0,
      errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101
  #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690)
      at /home/root/git/qemu2.git/hw/i386/pc.c:1184
  #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121
  #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562

Signed-off-by: Anthony Xu <[email protected]>
Message-id: 1494369432 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

aio: add missing aio_notify() to aio_enable_external()

The main loop uses aio_disable_external()/aio_enable_external() to
temporarily disable processing of external AioContext clients like
device emulation.

This allows monitor commands to quiesce I/O and prevent the guest from
submitting new requests while a monitor command is in progress.

The aio_enable_external() API is currently broken when an IOThread is in
aio_poll() waiting for fd activity when the main loop re-enables
external clients.  Incrementing ctx->external_disable_cnt does not wake
the IOThread from ppoll(2) so fd processing remains suspended and leads
to unresponsive emulated devices.

This patch adds an aio_notify() call to aio_enable_external() so the
IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.

The bug can be reproduced as follows:

  $ qemu -M accel=kvm -m 1024 \
         -object iothread,id=iothread0 \
         -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
         -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
         -device scsi-hd,id=scsi-hd0,drive=drive0 \
         -qmp tcp::5555,server,nowait

  $ scripts/qmp/qmp-shell localhost:5555
  (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
         mode=absolute-paths format=qcow2

After blockdev-snapshot-sync completes the SCSI disk will be
unresponsive.  This leads to request timeouts inside the guest.

Reported-by: Qianqian Zhu <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-id: 20170508180705 [email protected]
Suggested-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: Simplify BDRV_BLOCK_RAW recursion

Since we are already in coroutine context during the body of
bdrv_co_get_block_status(), we can shave off a few layers of
wrappers when recursing to query the protocol when a format driver
returned BDRV_BLOCK_RAW.

Note that we are already using the correct recursion later on in
the same function, when probing whether the protocol layer is sparse
in order to find out if we can add BDRV_BLOCK_ZERO to an existing
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 20170504173745 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: remove GThread implementation

The GThread implementation is not functional enough to actually
run QEMU reliably. While it was potentially useful for debugging,
we have a scripts/qemugdb/coroutine.py to enable tracing of
ucontext coroutines in GDB, so that removes the only reason for
GThread to exist.

Signed-off-by: Daniel P. Berrange <[email protected]>
Acked-by: Alex Bennée <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

vnc: replace hweight_long() with ctpopl()

ctpopl() has a better implementation than hweight_long() and ui/vnc.c
being the last user of hweight_long(), we can simply remove it.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Message-id: 1489415605 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

vnc: simple clean up

It is unnecessary to assign 'packed_bytes' to 'estimated_bytes', because 'estimated_bytes' unused after assignment.

Signed-off-by: Wei Qi <[email protected]>
Reviewed-by: Sahid Orentino Ferdjaoui <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

hw/usb/dev-serial: Do not try to set vendorid or productid properties

When starting QEMU with the legacy USB serial device like this:

qemu-system-x86_64 -usbdevice serial:vendorid=0x1234:stdio

it currently aborts since the vendorid property does not exist
anymore (it has been removed by commit f29783f72ea77dfbd7ea0c9):

Unexpected error in object_property_find() at qemu/qom/object.c:1008:
qemu-system-x86_64: -usbdevice serial:vendorid=0x1234:stdio: Property
'.vendorid' not found
Aborted (core dumped)

Fix this crash by issuing a more friendly error message instead
(and simplify the code also a little bit this way).

Signed-off-by: Thomas Huth <[email protected]>
Message-id: 1493883704 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

xhci: relax link check

The strict td link limit added by commit "05f43d4 xhci: limit the
number of link trbs we are willing to process" causes problems with
Windows guests. Let's raise the limit.

This change is analogous to:

  commit ab6b1105a2259c7072905887f71caa850ce63190
  Author: Gerd Hoffmann <[email protected]>
  Date:   Tue Mar 7 09:40:18 2017 +0100

      ohci: relax link check

Signed-off-by: Ladi Prosek <[email protected]>
Message-id: 20170512102100 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

usb-hub: clear PORT_STAT_SUSPEND on wakeup

The spec says:

  Suspend: (PORT_SUSPEND) This field indicates whether or not the device
  on this port is suspended. Setting this field causes the device to
  suspend by not propagating bus traffic downstream. This field may be
  reset by a request or by resume signaling from the device attached to
  the port.

I can't find any specific statement like "the PORT_SUSPEND field is reset
automatically on remote wakeup", but without this patch, the only way to
reset it is via the ClearPortFeature request so the ".. or by resume
signaling from the device" clause is clearly not implemented on the remote
wakeup path.

The default xhci Windows driver does not issue the ClearPortFeature request
and suspended devices attached to a hub don't properly get out of the
suspended state. Interestingly, the default uhci Windows driver *does*
issue the ClearPortFeature request and does not exhibit this problem.

Signed-off-by: Ladi Prosek <[email protected]>
Message-id: 20170511125314 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

xhci: fix logging

slotid and epid were deleted from XHCITransfer in commit d6fcb29.
Also deleting one unused forward declaration.

Signed-off-by: Ladi Prosek <[email protected]>
Message-id: 20170511125314 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

usb-redir: fix stack overflow in usbredir_log_data

Don't reinvent a broken wheel, just use the hexdump function we have.

Impact: low, broken code doesn't run unless you have debug logging
enabled.

Reported-by: 李强 <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 20170509110128 [email protected]

qemu-doc: Update to use the new way of attaching USB devices

The preferred way of adding USB devices is via "-device" and
"device_add" nowadays, so let's start to get rid of "-usbdevice"
and "usb_add" in the documentation. While we're at it, also
add the new USB devices there which have been added to QEMU
during the last years, and get rid of the old "vendorid" and
"productid" parameters of "-usbdevice serial" which have been
removed in QEMU version 0.14.0 already.

Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Message-id: 1494256429 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

opengl: add egl-headless display

Add egl-headless user interface.  It doesn't provide a real user
interface, it only provides opengl support using drm render nodes.
It will copy back the bits rendered by the guest using virgl back
to a DisplaySurface and kick the usual display update code paths,
so spice and vnc and screendump can pick it up.

Use it this way:
  qemu -display egl-headless -vnc $display
  qemu -display egl-headless -spice gl=off,$args

Note that you should prefer native spice opengl support (-spice
gl=on) if possible because that delivers better performance.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 20170505104101 [email protected]

egl: explicitly ask for core context

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 20170505104101 [email protected]

egl-helpers: add missing error check

Code didn't check for qemu_egl_init_dpy_mesa() failures, add it.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20170505104101 [email protected]

egl-helpers: fix display init for x11

When running on gtk we need X11 platform not mesa platform.
Create separate functions for mesa and x11 so we can keep
the egl #ifdef mess local to egl-helpers.c

Fixes: 0ea1523fb6703aa0dcd65e66b59e96fec028e60a
Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 20170505104101 [email protected]

egl-helpers: drop support for gles and debug logging

Leftover from the early opengl days.
Unused now, so delete the dead code.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Message-id: 20170505104101 [email protected]

virtio-gpu: move virtio_gpu_gl_block

Move to virtio-gpu-3d.c where all the other virgl code lives too.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20170505104101 [email protected]

migration/i386: Remove support for pre-0.12 formats

Remove support for versions of the CPU state prior to 11
which is the version used in qemu 0.12 - you'd be pretty
lucky if you got a migration stream to work from anything
that old anyway. This doesn't affect the machine type
definition in any way.

My main reason for doing this is the hack for sysenter_esp/eip
that uses .get/.put's in state versions less than 7 (that's
prior to somewhere before 0.10).

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170405190024 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

vmstatification: i386 FPReg

Convert the fpreg save/restore to use VMSTATE_ macros rather than
.get/.put.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170405190024 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

migration/i386: Remove old non-softfloat 64bit FP support

Long long ago, we used to support storing the x86 FP registers in
a 64bit format.

Then c31da136a0bf8caad70c348f5ffc283206e9c7fc in v0.14-rc0 removed
the last support for writing that in the migration format.
Even before that, it was only used if you had softfloat disabled
(i.e. !USE_X86LDOUBLE) so in practice use of it in even earlier
qemu is unlikely for most users.

Kill it off, it's complicated, and possibly broken.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170405190024 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

tests: check -numa node,cpu=props_list usecase

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: add '-numa cpu,...' option for property based node mapping

legacy cpu to node mapping is using cpu index values to map
VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
option. However cpu index is internal concept and QEMU users
have to guess /reimplement qemu's logic/ to map it to
a concrete cpu socket/core/thread to make sane CPUs
placement across numa nodes.

This patch allows to map cpu objects to numa nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).

At present valid properties/values to address CPUs could be
fetched using hotpluggable-cpus monitor/qmp command, it will
require user to start qemu twice when creating domain to fetch
possible CPUs for a machine type/-smp layout first and
then the second time with numa explicit mapping for actual
usage. The first step results could be saved and reused to
set/change mapping later as far as machine type/-smp stays
the same.

Proposed impl. supports exact and wildcard matching to
simplify CLI and allow to set mapping for a specific cpu
or group of cpu objects specified by matched properties.

For example:

   # exact mapping x86
   -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n

   # exact mapping SPAPR
   -numa cpu,node-id=x,core-id=y

   # wildcard mapping, all cpu objects that match socket-id=y
   # are mapped to node-id=x
   -numa cpu,node-id=x,socket-id=y

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1494415802 [email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: remove node_cpu bitmaps as they are no longer used

Postfactum "CPU(s) present in multiple NUMA nodes" check
was the last user of node_cpu bitmaps, but it's not need
as machine_set_cpu_numa_node() does the similar check at
the time mapping is set for cpus (i.e. when -numa cpus=
is parsed) and ensures that cpu can be mapped only to
one node.

Remove duplicate check based on node_cpu bitmaps and
since the last user is gone remove node_cpu as well,
which completes internal transition from legacy bitmap
based mapping storage to possible_cpus storage.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: use possible_cpus for not mapped CPUs check

and remove corresponding part in numa.c that uses
node_cpu bitmaps.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

machine: call machine init from wrapper

add machine_run_board_init() wrapper that calls machine
init for now but in follow up patches it will be used
to run generic machine code that should run before
machine init.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: remove no longer need numa_post_machine_init()

CPUState::numa_node is still in use but now it's set by
board when it creates CPU objects. So there isn't any
need to set it again after all CPU's are created,
since it's been already set.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

tests: numa: add case for QMP command query-cpus

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

QMP: include CpuInstanceProperties into query_cpus output output

if board supports CpuInstanceProperties, report them for
each CPU thread listed. Main motivation for this is to
provide these properties introspection via QMP interface
for using in test cases to verify numa node to cpu mapping,
which includes not only boards that support cpu hotplug
and have this info in query-hotpluggable-cpus (pc/spapr)
but also for boards that don't not support hotpluggable-cpus
but support numa mapping (virt-arm).

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <1494415802 [email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()

it's safe to remove thread node_id != core node_id error
branch as machine_set_cpu_numa_node() also does mismatch
check and is called even before any CPU is created.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: David Gibson <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: do default mapping based on possible_cpus instead of node_cpu bitmaps

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: mirror cpu to node mapping in MachineState::possible_cpus

Introduce machine_set_cpu_numa_node() helper that stores
node mapping for CPU in MachineState::possible_cpus.
CPU and node it belongs to is specified by 'props' argument.

Patch doesn't remove old way of storing mapping in
numa_info[X].node_cpu as removing it at the same time
makes patch rather big. Instead it just mirrors mapping
in possible_cpus and follow up per target patches will
switch to possible_cpus and numa_info[X].node_cpu will
be removed once there isn't any users left.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: add check that board supports cpu_index to node mapping

Default node mapping initialization already checks that board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for explicitly provided
mapping "-numa node,cpus=..."

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

virt-arm: add node-id property to CPU

it will allow switching from cpu_index to property based
numa mapping in follow up patches.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: add node-id property to CPU

it will allow switching from cpu_index to property based
numa mapping in follow up patches.

PS:
patch changes default value of CPUState::numa_node from 0
to CPU_UNSET_NUMA_NODE_ID. The only place for x86 that
would affected is monitor's 'infor numa' command which
uses that field. However legacy 0 value is still preserved
by pc_cpu_pre_plug() in this patch if user/numa.c hasn't
set it explicitly, so there is no change in behavior.

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1494415802 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

spapr: add node-id property to sPAPR core

it will allow switching from cpu_index to core based numa
mapping in follow up patches.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Message-Id: <1494415802 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

numa: move source of default CPUs to NUMA node mapping into boards

Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Message-Id: <1494415802 [email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

hw/arm/virt: explicitly allocate cpu_index for cpus

Currently cpu_index is implicitly auto assigned during
cpu.realize() time cpu_exec_realizefn()->cpu_list_add().

It happens to match index in possible_cpus so take
control over it and make board initialize cpu_index
to possible_cpus index explicitly. It will at least
document that board is in control of it and when
'-device cpu' support comes it will keep cpu_index
stable regardless of order cpus are created so it won't
break migration.
Within this series it will be used for internal
conversion from storing cpu_index based NUMA node
bitmaps to property based mapping with possible_cpus,
And will allow map cpu_index to a CPU entry in
possible_cpus array.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Message-Id: <1493816238 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

hw/arm/virt: use machine->possible_cpus for storing possible topology info

for now precalculate and store mp_afinity in possible_cpus
as ARM cpus don't have socket/core/thread-id properties yet.
In follow patches possible_cpus will be used for storing
and setting NUMA node mapping and replace legacy bitmap
based numa_info[node_id].node_cpu/numa_get_node_for_cpu()

For the lack of better idea, this patch cannibalizes
possible_cpus.cpus[x].props.thread_id so that
*_cpu_index_to_props() callback could return addressable
by props CPU which will be used by machine_set_cpu_numa_node()
in follow up patches to assign a CPU to node. But
cannibalizing is fine for now as that thread_id isn't exposed
to users (no hotpluggable_cpus callback support for ARM yet)
and it will be used only internally until 'device_add cpu'
is supported where we can decide on which properties to use.

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1493816238 [email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

hw/arm/virt: extract mp-affinity calculation in separate function

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1493816238 [email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

tests: add CPUs to numa node mapping test

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Message-Id: <1493816238 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

tests: acpi: extend cphp and memhp testcase with numa distance check

Signed-off-by: He Chen <[email protected]>
Message-Id: <1493803036 [email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
[ehabkost: regenerated tests/acpi-tst-data, included SLIT table]
Signed-off-by: Eduardo Habkost <[email protected]>

numa: equally distribute memory on nodes

When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.

This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.

We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier <[email protected]>
Message-Id: <20170502162955 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <[email protected]>

numa: Allow setting NUMA distance for different NUMA nodes

This patch is going to add SLIT table support in QEMU, and provides
additional option `dist` for command `-numa` to allow user set vNUMA
distance by QEMU command.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set distance among those nodes,
the QEMU command would like:

```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=3,val=21 \
```

Signed-off-by: He Chen <[email protected]>
Message-Id: <1493260558 [email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

maintainers: Add myself as linux-user reviewer

I volunteer to review linux-user patches.
Adding myself will help to not miss some of them.

Signed-off-by: Laurent Vivier <[email protected]>
Acked-by: Riku Voipio <[email protected]>
Message-id: 20170510153950 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

i386: rewrite way CPUID index is validated

Change the nested if statements into a flat format, to make
it clearer what validation / capping is being performed on
different CPUID index values.

NB this changes behaviour when "index > env->cpuid_xlevel2".
This won't have any guest-visible effect because no there is
no CPUID[0xC0000001] feature supported by TCG, and KVM code
will never call cpu_x86_cpuid() with such an index value.

Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <20170509132736 [email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block

Block patches for the block queue.

# gpg: Signature made Thu May 11 14:28:41 2017 CEST
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-05-11: (22 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...

Signed-off-by: Kevin Wolf <[email protected]>

MAINTAINERS: Add qemu-progress to the block layer

util/qemu-progress.c is currently unmaintained. The only user of its
functionality is qemu-img, so it effectively is part of the block layer.

Suggested-by: Fam Zheng <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20170428165517 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Discard/zero clusters by byte count

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Assert that cluster operations are aligned

We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Optimize write zero of unaligned tail cluster

We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes. The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.

However, note that for now, the special case of end-of-file is only
recognized if there is no backing file, or if the backing file has
the same length; that's because when the backing file is shorter
than the active layer, we don't have code in place to recognize
that reads of a sector unallocated at the top and beyond the backing
end-of-file are implicitly zero. It's not much of a real loss,
because most people don't use images that aren't cluster-aligned,
or where the active layer is a different size than the backing
layer (especially where the difference falls within a single cluster).

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Add test 179 to cover write zeroes with unmap

No tests were covering write zeroes with unmap. Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.

The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters - in part, the largest consecutive allocation
we tend to get is often bounded by the size covered by one L2
table).

Note that testing for zero clusters is tricky: 'qemu-io map'
reports whether data comes from the current layer of the image
(useful for sniffing out which regions of the file have
QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
while 'qemu-img map' sees "zero":true for both unallocated and
zero clusters for any qcow2 with no backing layer (so less useful
at detecting true zero clusters), but reliably shows mappings.
So we have to rely on both queries side-by-side at each point of
the test.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Improve _filter_qemu_img_map

Although _filter_qemu_img_map documents that it scrubs offsets, it
was only doing so for human mode. Of the existing tests using the
filter (97, 122, 150, 154, 176), two of them are affected, but it
does not hurt the validity of the tests to not require particular
mappings (another test, 66, uses offsets but intentionally does not
pass through _filter_qemu_img_map, because it checks that offsets
are unchanged before and after an operation).

Another justification for this patch is that it will allow a future
patch to utilize 'qemu-img map --output=json' to check the status of
preallocated zero clusters without regards to the mapping (since
the qcow2 mapping can be very sensitive to the chosen cluster size,
when preallocation is not in use).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Optimize zero_single_l2() to minimize L2 churn

Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.

Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (this is because the block
layer clears that flag if discard.* flags during open requested that
we never punch holes - see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.

Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated. But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request. Thus, this patch is intentionally limited to just clusters
that are already marked as zero.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Make distinction between zero cluster types obvious

Treat plain zero clusters differently from allocated ones, so that
we can simplify the logic of checking whether an offset is present.
Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.

I tried to arrange the enum so that we could use
'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
I didn't actually end up taking advantage of the layout.

In many cases, this leads to simpler code, by properly combining
cases (sometimes, both zero types pair together, other times,
plain zero is more like unallocated while allocated zero is more
like normal).

Signed-off-by: Eric Blake <[email protected]>
Message-id: 20170507000552 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Name typedef for cluster type

Although it doesn't add all that much type safety (this is C, after
all), it does add a bit of legibility to use the name QCow2ClusterType
instead of a plain int.

In particular, qcow2_get_cluster_offset() has an overloaded return
type; a QCow2ClusterType on success, and -errno on failure; keeping
the cluster type in a separate variable makes it slightly easier for
the next patch to make further computations based on the type.

Suggested-by: Max Reitz <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Message-id: 20170507000552 [email protected]
[mreitz: Use the new type in two more places (one of them pulled from
the next patch)]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Correctly report status of preallocated zero clusters

We were throwing away the preallocation information associated with
zero clusters. But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

count_contiguous_clusters_by_type() is now used only for unallocated
cluster runs, hence it gets renamed and tightened.

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json. The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: Update comments on BDRV_BLOCK_* meanings

We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly.  Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back a specific
BDS (not necessarily bs->file).  As the 8-way table is the
intended semantics, simplify the rest of the text to get rid of
the confusion.

ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it).  RAW is used only internally, but
by more than the raw driver.  Document these additional items on
the driver callback.

Suggested-by: Max Reitz <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Use consistent switch indentation

Fix a couple of inconsistent indentations, before an upcoming
patch further tweaks the switch statements.
(best viewed with 'git diff -b').

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170507000552 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: Nicer variable names in qcow2_update_snapshot_refcount()

In order to keep checkpatch happy when the next patch changes
indentation, we first have to shorten some long lines. The easiest
approach is to use a new variable in place of
'offset & L2E_OFFSET_MASK', except that 'offset' is the best name
for that variable. Change '[old_]offset' to '[old_]entry' to
make room.

While touching things, also fix checkpatch warnings about unusual
'for' statements.

Suggested by Max Reitz <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Message-id: 20170507000552 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

tests: Add coverage for recent block geometry fixes

Use blkdebug's new geometry constraints to emulate setups that
have needed past regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170429191419 [email protected]
[mreitz: For cooperation with image locking, add -r to the qemu-io
         invocation which verifies the image content]
Signed-off-by: Max Reitz <[email protected]>

blkdebug: Add ability to override unmap geometries

Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170429191419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

blkdebug: Simplify override logic

Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid. This however requires that the struct store
a 64-bit number, rather than a narrower type. Likewise, setting
a sane errno value in ret prior to the sequence of parsing and
jumping to out: on error makes it easier for the next patch to
add a chain of similar checks.

Signed-off-by: Eric Blake <[email protected]>
Message-id: 20170429191419 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

blkdebug: Add pass-through write_zero and discard support

In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170429191419 [email protected]
Signed-off-by: Max Reitz <[email protected]>