Git Repo - qemu.git/log

Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-10-24-1' into staging

Merge tpm 2017/10/24 v1

# gpg: Signature made Wed 25 Oct 2017 06:06:55 BST
# gpg:                using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE  C66B 75AD 6580 2A0B 4211

* remotes/stefanberger/tags/pull-tpm-2017-10-24-1:
  tpm: print buffers received from TPM when debugging
  vl: remove unnecessary #ifdef CONFIG_TPM
  tpm: remove unnecessary #ifdef CONFIG_TPM
  tpm: add stubs
  tpm: add missing include

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20171025' into staging

TCG patch queue

# gpg: Signature made Wed 25 Oct 2017 10:30:18 BST
# gpg:                using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20171025: (51 commits)
  translate-all: exit from tb_phys_invalidate if qht_remove fails
  tcg: Initialize cpu_env generically
  tcg: enable multiple TCG contexts in softmmu
  tcg: introduce regions to split code_gen_buffer
  translate-all: use qemu_protect_rwx/none helpers
  osdep: introduce qemu_mprotect_rwx/none
  tcg: allocate optimizer temps with tcg_malloc
  tcg: distribute profiling counters across TCGContext's
  tcg: introduce **tcg_ctxs to keep track of all TCGContext's
  gen-icount: fold exitreq_label into TCGContext
  tcg: define tcg_init_ctx and make tcg_ctx a pointer
  tcg: take tb_ctx out of TCGContext
  translate-all: report correct avg host TB size
  exec-all: rename tb_free to tb_remove
  translate-all: use a binary search tree to track TBs in TBContext
  tcg: Remove CF_IGNORE_ICOUNT
  tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
  cpu-exec: lookup/generate TB outside exclusive region during step_atomic
  tcg: check CF_PARALLEL instead of parallel_cpus
  target/sparc: check CF_PARALLEL instead of parallel_cpus
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20171023' into staging

migration/next for 20171023

# gpg: Signature made Mon 23 Oct 2017 17:05:14 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <[email protected]>"
# gpg:                 aka "Juan Quintela <[email protected]>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20171023: (21 commits)
  migration: Improve migration thread error handling
  qapi: Fix grammar in x-multifd-page-count descriptions
  migration: add bitmap for received page
  migration: introduce qemu_ufd_copy_ioctl helper
  migration: postcopy_place_page factoring out
  migration: new ram_init_bitmaps()
  migration: clean up xbzrle cache init/destroy
  migration: provide ram_state_cleanup
  migration: provide ram_state_init()
  migration: pause-before-switchover for postcopy
  migration: allow cancel to unpause
  migrate: HMP migate_continue
  migration: migrate-continue
  migration: Wait for semaphore before completing migration
  migration: Add 'pre-switchover' and 'device' statuses
  migration: Add 'pause-before-switchover' capability
  migration: Make cache_init() take an error parameter
  migration: Move xbzrle cache resize error handling to xbzrle_cache_resize
  migration: Make cache size elements use the right types
  migratiom: Remove max_item_age parameter
  ...

Signed-off-by: Peter Maydell <[email protected]>

tpm: print buffers received from TPM when debugging

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

vl: remove unnecessary #ifdef CONFIG_TPM

a stub is now provided.

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: remove unnecessary #ifdef CONFIG_TPM

Makefile.objs now checks for $(CONFIG_TPM).

Suggested-by: Stefan Berger <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: add stubs

Commit c37cacabf22 moved tpm_cleanup() in the main loop exit, however this
function is not available when compiling with --disable-tpm.

Provides necessary stubs to keep code clean of #ifdef'fery.

Reported-by: BALATON Zoltan <[email protected]>
Message-Id: <20171023102903.256AF7456A0@zero.eik.bme.hu>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

translate-all: exit from tb_phys_invalidate if qht_remove fails

Two or more threads might race while invalidating the same TB. We currently
do not check for this at all despite taking tb_lock, which means we would
wrongly invalidate the same TB more than once. This bug has actually been
hit by users: I recently saw a report on IRC, although I have yet to see
the corresponding test case.

Fix this by using qht_remove as the synchronization point; if it fails,
that means the TB has already been invalidated, and therefore there
is nothing left to do in tb_phys_invalidate.

Note that this solution works now that we still have tb_lock, and will
continue working once we remove tb_lock.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1508445114 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Initialize cpu_env generically

This is identical for each target. So, move the initialization to
common code. Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: enable multiple TCG contexts in softmmu

This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
        have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: introduce regions to split code_gen_buffer

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

translate-all: use qemu_protect_rwx/none helpers

The helpers require the address and size to be page-aligned, so
do that before calling them.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

osdep: introduce qemu_mprotect_rwx/none

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: allocate optimizer temps with tcg_malloc

Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
original     20.213321616                                  0.
tcg_malloc   20.441130078                           1.1270214
TCGContext   20.477846517                           1.3086662
g_malloc     20.780527895                           2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: distribute profiling counters across TCGContext's

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
TCGContext also includes. Note that tcg_table_op_count is brought
into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: introduce **tcg_ctxs to keep track of all TCGContext's

Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

gen-icount: fold exitreq_label into TCGContext

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: define tcg_init_ctx and make tcg_ctx a pointer

Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: take tb_ctx out of TCGContext

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

translate-all: report correct avg host TB size

Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are
incorrectly reporting the amount of translated code as well as
the average host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code;
  doing otherwise would mislead users into thinking "-tb-size" is not
  honoured.

- Expanding tb_tree_stats to accurately count the bytes of translated code on
  the host, and using this for reporting the average tb host size,
  as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

exec-all: rename tb_free to tb_remove

We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Suggested-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

translate-all: use a binary search tree to track TBs in TBContext

This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                          ( +-  0.57% )

- After:
       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                          ( +-  0.38% )

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove CF_IGNORE_ICOUNT

Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK

These flags are used by target/*/translate.c,
and affect code generation.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cpu-exec: lookup/generate TB outside exclusive region during step_atomic

Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.

While at it, merge cpu_exec_step into cpu_exec_step_atomic.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/sparc: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/sh4: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/m68k: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/i386: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/hppa: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/arm: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: convert tb->cflags reads to tb_cflags(tb)

Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:

FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)

perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Suggested-by: Richard Henderson <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Include CF_COUNT_MASK in CF_HASH_MASK

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add CPUState cflags_next_tb

We were generating code during tb_invalidate_phys_page_range,
check_watchpoint, cpu_io_recompile, and (seemingly) discarding
the TB, assuming that it would magically be picked up during
the next iteration through the cpu_exec loop.

Instead, record the desired cflags in CPUState so that we request
the proper TB so that there is no more magic.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Use offsets not indices for TCGv_*

Using the offset of a temporary, relative to TCGContext, rather than
its index means that we don't use 0. That leaves offset 0 free for
a NULL representation without having to leave index 0 unused.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

qom: Introduce CPUClass.tcg_initialize

Move target cpu tcg initialization to common code,
called from cpu_exec_realizefn.

Acked-by: Andreas Färber <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove TCGV_EQUAL*

When we used structures for TCGv_*, we needed a macro in order to
perform a comparison. Now that we use pointers, this is just clutter.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove GET_TCGV_* and MAKE_TCGV_*

The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.

The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Introduce temp_tcgv_{i32,i64,ptr}

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp}

Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Push tcg_ctx into tcg_gen_callN

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Push tcg_ctx into generator functions

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Use per-temp state data in optimize

While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove unused TCG_CALL_DUMMY_TCGV

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Change temp_allocate_frame arg to TCGTemp

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Avoid loops against variable bounds

Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration. This should allow the compiler to use low-overhead
looping constructs on some hosts.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Use per-temp state data in liveness

This avoids having to allocate external memory for each temporary.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Introduce temp_arg, export temp_idx

At the same time, drop the TCGContext argument and use tcg_ctx instead.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Return NULL temp for TCG_CALL_DUMMY_ARG

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add temp_global bit to TCGTemp

This avoids needing to test the index of a temp against nb_globals.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Introduce arg_temp

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Propagate TCGOp down to allocators

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Propagate args to op->args in tcg.c

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Propagate args to op->args in optimizer

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Merge opcode arguments into TCGOp

Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries. The result is actually a bit
smaller and should have slightly more cache locality.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tpm: add missing include

else file including "sysemu/tpm.h" fails to compile:

  In file included from qemu/stubs/tpm.c:2:0:
  qemu/include/sysemu/tpm.h:36:19: error: implicit declaration of function ‘object_resolve_path_type’ [-Werror=implicit-function-declaration]
       Object *obj = object_resolve_path_type("", TYPE_TPM_TIS, NULL);
                     ^~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/input-20171023-pull-request' into staging

input: fixes for ui input code and ps/2 keyboard (mostly sysrq key)

# gpg: Signature made Mon 23 Oct 2017 10:19:22 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/input-20171023-pull-request:
  ui: pull in latest keycodemapdb
  ui: normalize the 'sysrq' key into the 'print' key
  ps2: fix scancodes sent for Ctrl+Pause key combination
  ps2: fix scancodess sent for Pause key in AT set 1
  ps2: fix scancodes sent for Shift/Ctrl+Print key combination
  ps2: fix scancodes sent for Alt-Print key combination (aka SysRq)
  ui: use correct union field for key number
  ui: fix crash with sendkey and raw key numbers
  input: use hex in ps2 keycode trace events

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/usb-20171023-pull-request' into staging

usb: ccid fix.

# gpg: Signature made Mon 23 Oct 2017 09:45:00 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/usb-20171023-pull-request:
  usb-ccid: remove needless migration state code

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/fixes-20171023-pull-request' into staging

fixes for the fallout of the recent ui and keymap merges.

# gpg: Signature made Mon 23 Oct 2017 09:02:24 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/fixes-20171023-pull-request:
  scripts: don't throw away stderr when checking out git submodules
  ui: add qemu-keymap and shader to .gitignore
  configure: disable qemu-keymap for linux-user qemu

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/shorne/tags/openrisc-20171021-smp-pr' into staging

OpenRISC SMP patchset 20171021

# gpg: Signature made Fri 20 Oct 2017 22:51:16 BST
# gpg:                using RSA key 0xC3B31C2D5E6627E4
# gpg: Good signature from "Stafford Horne <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: D9C4 7354 AEF8 6C10 3A25  EFF1 C3B3 1C2D 5E66 27E4

* remotes/shorne/tags/openrisc-20171021-smp-pr:
  openrisc: Only kick cpu on timeout, not on update
  openrisc: Initial SMP support
  openrisc/cputimer: Perparation for Multicore
  target/openrisc: Make coreid and numcores variable
  openrisc/ompic: Add OpenRISC Multicore PIC (OMPIC)

Signed-off-by: Peter Maydell <[email protected]>

migration: Improve migration thread error handling

We now report errors also when we finish migration, not only on info
migrate. We plan to use this error from several places, and we want
the first error to happen to win, so we add an mutex to order it.

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>

qapi: Fix grammar in x-multifd-page-count descriptions

Reported-by: Eric Blake <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: add bitmap for received page

This patch adds ability to track down already received
pages, it's necessary for calculation vCPU block time in
postcopy migration feature, and for recovery after
postcopy migration failure.

Also it's necessary to solve shared memory issue in
postcopy livemigration. Information about received pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already received pages. fallocate syscall is required for
remmaped shared memory, due to remmaping itself blocks
ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT
error (struct page is exists after remmap).

Bitmap is placed into RAMBlock as another postcopy/precopy
related bitmaps.

Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Alexey Perevalov <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: introduce qemu_ufd_copy_ioctl helper

Just for placing auxilary operations inside helper,
auxilary operations like: track received pages,
notify about copying operation in futher patches.

Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Alexey Perevalov <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: postcopy_place_page factoring out

Need to mark copied pages as closer as possible to the place where it
tracks down. That will be necessary in futher patch.

Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Alexey Perevalov <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: new ram_init_bitmaps()

Rearrange the bitmap initialization and the first sync. Since at it,
make sure the locks are taken/released in correct order (I moved RCU
unlock upper - though it may not affect much).

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: clean up xbzrle cache init/destroy

Let's further simplify ram_init_all() and ram_save_cleanup() by abstract
all the XBZRLE related codes into their own functions.

When allocating xbzrle cache, we are always very careful on -ENOMEM;
which makes sense. Replacing the last g_malloc0() with g_try_malloc0(),
then refactor the logic a bit.

This patch should be fixing some memory leaks when some memory
allocation failed for XBZRLE in the past.

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: provide ram_state_cleanup

There are two Mutexes that are created but not yet destroyed for
RAMState. Fix that.

Since we are at it, provide helper function to clean up RAMState.

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: provide ram_state_init()

The old ram_state_init() is not really initializing the RAMState only,
but including lots of other stuff that is RAM-related. Renaming it to
ram_init_all(). Instead, provide a real ram_state_init().

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: pause-before-switchover for postcopy

Add pause-before-switchover support for postcopy.
After starting postcopy it will transition
active->pre-switchover->postcopy_active

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: allow cancel to unpause

If a migration_cancel is issued during the new paused state,
kick the pause_sem to get to unpause so it can cancel.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migrate: HMP migate_continue

HMP equivalent to the just added migrate-continue
Unpause a migrate paused at a given state.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: migrate-continue

A new qmp command allows the caller to continue from a given
paused state.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Wait for semaphore before completing migration

Wait for a semaphore before completing the migration,
if the previously added capability was enabled.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Add 'pre-switchover' and 'device' statuses

Add two statuses for use when the 'pause-before-switchover'
capability is enabled.

'pre-switchover' is the state that we wait in for management
to allow us to continue.
'device' is the state we enter while serialising the devices
after management gives us the OK.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Add 'pause-before-switchover' capability

When 'pause-before-switchover' is enabled, the outgoing migration
will pause before invalidating the block devices and serializing
the device state.
At this point the management layer gets the chance to clean up any
device jobs or other device users before the migration completes.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Make cache_init() take an error parameter

Once there, take a total size instead of the size of the pages. We
move the check that the new_size is bigger than one page from
xbzrle_cache_resize().

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
--

Fix typo spotted by Peter Xu

migration: Move xbzrle cache resize error handling to xbzrle_cache_resize

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Peter Xu <[email protected]>

migration: Make cache size elements use the right types

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>

migratiom: Remove max_item_age parameter

It was not used at all since commit:

27af7d6ea5015e5ef1f7985eab94a8a218267a2b

which replaced its use by the dirty sync count.

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Peter Xu <[email protected]>

migration: Fix migrate_test_apply for multifd parameters

They were missing when introduced on the tree

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Peter Xu <[email protected]>

ui: pull in latest keycodemapdb

Latest keycodemapdb has a fix for Sun keyboard Pause mapping
and backcompat fix for QEMU's treatment of 0xb7 as an alternative
to 0x54 for triggering Print/SysRq

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ui: normalize the 'sysrq' key into the 'print' key

The 'sysrq' key was mistakenly added to QEMU to deal with incorrect handling
of the 'print' key in the ps2 device:

  commit f2289cb6924afc97b2a75d21bfc9217024d11741
  Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
  Date:   Wed Jun 4 10:14:16 2008 +0000

    Add sysrq to key names known by "sendkey".

    Adding sysrq keycode to the table enabling running sysrq debugging in
    the guest via the monitor sendkey command, like:

    (qemu) sendkey alt-sysrq-t

    Tested on x86-64 target and Linux guest.

Signed-off-by: Ryan Harper <[email protected]>
The ps2 device is now fixed wrt modifiers and the 'print' key. Further the
handling of the 'sysrq' key has some problems of its own, documented in the
previous commit. To cleanup this mess, we convert any use of 'sysrq' into
'print' prior to dispatching the event to device models.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ps2: fix scancodes sent for Ctrl+Pause key combination

The 'Pause' key is special in the AT set 1 / set 2 scancode definitions.

An unmodified 'Pause' key is supposed to send

AT Set 1:  e1 1d 45 91 9d c5 (Down)  <nothing> (Up)
AT Set 2:  e1 14 77 e1 f0 14 f0 77 (Down)  <nothing> (Up)

which QEMU gets right. When combined with Ctrl (both left and right variants),
a different sequence is expected

AT Set 1:  e0 46 e0 c6 (Down)  <nothing> (Up)
AT Set 2:  e0 7e e0 f0 73 (Down)  <nothing> (Up)

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ps2: fix scancodess sent for Pause key in AT set 1

The ps2 device was previously fixed to send the special Pause/Print
scancode sequences in:

  commit 8c10e0baf0260b59a4e984744462a18016662e3e
  Author: Hervé Poussineau <[email protected]>
  Date:   Thu Sep 15 22:06:26 2016 +0200

    ps2: use QEMU qcodes instead of scancodes

The sequence used for Pause had a small typo in the AT set 1, with a 0xe1
accidentally changed to 0x91.  This is not immediately visible with Linux
guests since they run the ps2 device with AT set 2 scancodes.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ps2: fix scancodes sent for Shift/Ctrl+Print key combination

The 'Print' key is special in the AT set 1 / set 2 scancode definitions.

An unmodified 'Print' key is supposed to send

AT Set 1:  e0 2a e0 37 (Down)  e0 b7 e0 aa (Up)
AT Set 2:  e0 12 e0 7c (Down)  e0 f0 7c e0 f0 12 (Up)

which QEMU gets right. When combined with Shift/Ctrl (both left and right
variants), the leading two bytes should be dropped, resulting in

AT Set 1:  e0 37 (Down)  e0 b7 (Up)
AT Set 2:  e0 7c (Down)  e0 f0 7c (Up)

This difference is pretty benign, since of all the operating systems I have
checked (Linux, FreeBSD and OpenStack), none bother to check the leading two
bytes anyway. This change none the less makes the ps2 device better follow real
hardware behaviour.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ps2: fix scancodes sent for Alt-Print key combination (aka SysRq)

The 'Print' key is special in the AT set 1 / set 2 scancode definitions.

An unmodified 'Print' key is supposed to send

AT Set 1:  e0 2a e0 37 (Down)  e0 b7 e0 aa (Up)
AT Set 2:  e0 12 e0 7c (Down)  e0 f0 7c e0 f0 12 (Up)

which QEMU gets right. When pressed in combination with the 'Alt_L' or 'Alt_R'
keys (which signify SysRq), the scancodes are required to follow a different
scheme. With Alt_L, the expected sequences are

AT set 1:  38, 54 (Down) d4, b8 (Up)
AT set 2:  11, 84 (Down) f0 84, f0 11 (Up)

And with Alt_R

AT set 1:  e0 38, 54 (Down) d4, e0 b8 (Up)
AT set 2:  e0 11, 84 (Down) f0 84, f0 e0 11 (Up)

It is actually slightly more complicated than that, because (according results
of 'showkey -s', keyboards will in fact first release the currently pressed
modifier before sending the sequence above (which effectively re-presses &
then releases the modifier) and finally re-press the original modifier
afterwards. IOW, with Alt_L we need to send

AT set 1:  b8, 38, 54 (Down) d4, b8, 38 (Up)
AT set 2:  f0 11, 11, 84 (Down) f0 84, f0 11, 11 (Up)

And with Alt_R

AT set 1:  e0 b8, e0 38, 54 (Down) d4, e0 b8, e0 38 (Up)
AT set 2:  e0 f0 11, e0 11, 84 (Down) f0 84, e0 f0 11, e0 11 (Up)

The AT set 3 scancodes have no special handling for Alt-Print.

Rather than fixing the handling of the 'print' key in the ps2 driver to consider
the Alt modifiers, way back, a patch was commited that defined an extra 'sysrq'
key name:

  commit f2289cb6924afc97b2a75d21bfc9217024d11741
  Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
  Date:   Wed Jun 4 10:14:16 2008 +0000

    Add sysrq to key names known by "sendkey".

    Adding sysrq keycode to the table enabling running sysrq debugging in
    the guest via the monitor sendkey command, like:

    (qemu) sendkey alt-sysrq-t

    Tested on x86-64 target and Linux guest.

Signed-off-by: Ryan Harper <[email protected]>
With this patch QEMU would send

AT set 1:  38, 54 (Down) d4, b8 (Up)
AT set 2:  11, 84 (Down) f0 84, f0 11 (Up)

but this doesn't match what actual real keyboards send, as it is not releasing
the original modifier & pressing it again afterwards. In addition the original
problem remains, and a new problem was added:

  - The sequence 'alt-print-t' is still broken, acting as if 'print-t' was
    requested
  - The sequence 'sysrq-t' is broken, injecting an undefine scancode sequence
    tot he guest os (bare 0x54)

To deal with this mess we make these changes to the ps2 code, so that we track
the state of modifier keys (Alt, Shift, Ctrl - both left & right). Then we can
vary what scancodes are sent for Q_KEY_CODE_PRINT according to the Alt key
modifier state

Interestingly, it appears that of operating systems I've checked (Linux, FreeBSD
and OpenSolaris), none of them actually bother to validate the full sequences
for a unmodified 'Print' key. They all just ignore the leading "e0 2a" and
trigger based off "e0 37" alone. The latter two byte sequence is what keyboards
send with 'Print' is combined with 'Shift' or 'Ctrl' modifiers.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ui: use correct union field for key number

The code converting key numbers to QKeyCode in the 'input-send-event'
command mistakenly accessed the key->u.qcode union field instead of
the key->u.number field. This is harmless because the fields use the
same size datatype in both cases, but none the less it should be fixed
to avoid confusion.

Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ui: fix crash with sendkey and raw key numbers

Previously we enforced that all key events are using QKeyCodes
at time they are sent:

  commit af07e5ff02ae6d4258fc5331007811d0b1c4d35a
  Author: Daniel P. Berrange <[email protected]>
  Date:   Fri Sep 29 11:12:00 2017 +0100

    ui: convert key events to QKeyCodes immediately

This commit forget to fix the code for the legacy 'sendkey'
command which still accepts key numbers from the user, which
then need converting to QKeyCodes

Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

input: use hex in ps2 keycode trace events

Hardware scancodes are all documented in hex, so use that in trace
events to make it easier to understand.

Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171019142848 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

usb-ccid: remove needless migration state code

This code appears to be unused since its introduction. We need to keep
the state_vmstate field byte in VMState for compatibility reasons.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-id: 20171013125533 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

scripts: don't throw away stderr when checking out git submodules

The stderr from git is important if git fails to checkout modules
due to network problems, or other unexpected errors.

Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Message-id: 20171020130748 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

ui: add qemu-keymap and shader to .gitignore

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171020070914 [email protected]

configure: disable qemu-keymap for linux-user qemu

We don't need qemu-keymap when we build only linux-user qemu.

When we compile in static mode, the libxkbcommon is detected
by configure if the shared one is available, but cannot
be linked if the static version is not available.

As we don't need it for qemu-linux-user, and we generally need
a static link to use it in a chroot, disable qemu-keymap in
this case.

Signed-off-by: Laurent Vivier <[email protected]>
Message-id: 20171019191606 [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>

openrisc: Only kick cpu on timeout, not on update

Previously we were kicking the cpu on every update. This caused
problems noticeable in SMP configurations where one CPU got pinned
continuously servicing timer exceptions.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>

openrisc: Initial SMP support

Wire in ompic and add basic support for SMP.  The OpenRISC is special in
that interrupts for devices are routed to each core's PIC.  This is
achieved using the qemu_irq_split utility, but this currently limits
OpenRISC to 2 cores.

This models the reference architecture described in the OpenRISC spec
1.2 proposal.

  https://github.com/stffrdhrn/doc/raw/arch-1.2-proposal/openrisc-arch-1.2-rev0.pdf

The changes to the intialization of the sim include:

CPU Reset
o Reset each cpu to the bootstrap PC rather than only a single cpu as
   done before.
o During Kernel loading the bootstrap PC is saved in a static global.

Network Initialization
o Connect the interrupt to each CPU
o Use more simple sysbus_mmio_map() rather than memory_region_add_subregion()

Sim Initialization
o Initialize the pic and tick timer per cpu
o Wire in the OMPIC if SMP is enabled
o Wire the serial irq to each CPU using qemu_irq_split()

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>

openrisc/cputimer: Perparation for Multicore

In order to support multicore system we move some of the previously
static state variables into the state of each core.

On the other hand in order to allow timers to be synced between each
code the ttcr (tick timer count register) is moved out of the core.
This is not as per real hardware spec which has a separate timer counter
per core, but it seems the most simple way to keep each clock in sync.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>