Git Repo - qemu.git/log

Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging

cmpxchg emulation of atomics, v8

# gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-atomic-20161026: (37 commits)
  target-alpha: Emulate LL/SC using cmpxchg helpers
  target-alpha: Introduce MMU_PHYS_IDX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  linux-user: remove handling of aarch64's EXCP_STREX
  linux-user: remove handling of ARM's EXCP_STREX
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: Rearrange aa32 load and store functions
  tests: add atomic_add-bench
  target-i386: remove helper_lock()
  target-i386: emulate XCHG using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  tcg: Emit barriers with parallel_cpus
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Wed 26 Oct 2016 03:19:06 BST
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  colo-proxy: fix memory leak
  net: rtl8139: limit processing of ring descriptors
  net: vmxnet: initialise local tx descriptor
  e1000e: Don't zero out buffer address in rx descriptor
  net: rocker: set limit to DMA buffer size
  net: eepro100: fix memory leak in device uninit
  tap-bsd: OpenBSD uses tap(4) now
  net: pcnet: fix source formatting and indentation
  net: pcnet: check rx/tx descriptor ring length

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-part1-pull-request' into staging

# gpg: Signature made Tue 25 Oct 2016 19:58:46 BST
# gpg:                using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-part1-pull-request: (23 commits)
  target-m68k: Optimize gen_flush_flags
  target-m68k: Optimize some comparisons
  target-m68k: Use setcond for scc
  target-m68k: Introduce DisasCompare
  target-m68k: Reorg flags handling
  target-m68k: Remove incorrect clearing of cc_x
  target-m68k: Some fixes to SR and flags management
  target-m68k: Print flags properly
  target-m68k: update CPU flags management
  target-m68k: don't update cc_dest in helpers
  target-m68k: update move to/from ccr/sr
  target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
  target-m68k: Replace helper_xflag_lt with setcond
  target-m68k: allow to update flags with operation on words and bytes
  target-m68k: REG() macro cleanup
  target-m68k: set PAGE_BITS to 12 for m68k
  target-m68k: define operand sizes
  target-m68k: set disassembler mode to 680x0 or coldfire
  target-m68k: introduce read_imXX() functions
  target-m68k: manage scaled index
  ...

Signed-off-by: Peter Maydell <[email protected]>

target-alpha: Emulate LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. However, portable parallel
code is written assuming only cmpxchg which means that in
practice this is a viable alternative.

Signed-off-by: Richard Henderson <[email protected]>

target-alpha: Introduce MMU_PHYS_IDX

Rather than using helpers for physical accesses, use a mmu index.
The primary cleanup is with store-conditional on physical addresses.

Signed-off-by: Richard Henderson <[email protected]>

target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}

The exception is not emitted anymore; remove it and the associated
TCG variables.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

linux-user: remove handling of aarch64's EXCP_STREX

The exception is not emitted anymore.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

linux-user: remove handling of ARM's EXCP_STREX

The exception is not emitted anymore.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

target-arm: emulate aarch64's LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-arm: emulate SWP with atomic_xchg helper

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-arm: emulate LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

               atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                               Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Enforce alignment for ldrexd.]

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-arm: Rearrange aa32 load and store functions

Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tests: add atomic_add-bench

With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]

Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

target-i386: remove helper_lock()

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate XCHG using atomic helper

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed BTX ops using atomic helpers

[rth: Avoid redundant qemu_ld in locked case. Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed XADD using atomic helper

[rth: Move load of reg value to common location.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed NEG using cmpxchg helper

[rth: Move redundant qemu_load out of cmpxchg loop.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed NOT using atomic helper

[rth: Avoid qemu_load that's redundant with the atomic op.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed INC using atomic helper

[rth: Merge gen_inc_locked back into gen_inc to share cc update.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed OP instructions using atomic helpers

[rth: Eliminate some unnecessary temporaries.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers

The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
BAR
} else {
FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <1467054136 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Emit barriers with parallel_cpus

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add CONFIG_ATOMIC64

Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation. Do so by dropping back to single-step.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add atomic128 helpers

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction. Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add atomic helpers

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Tidy some macros

TGT_LE and TGT_BE are not size dependent and do not need to be
redefined. The others are no longer used at all.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Move most of iotlb code out of line

Saves 2k code size off of a cold path.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Remove includes from softmmu_template.h

We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Move probe_write out of softmmu_template.h

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Replace SHIFT with DATA_SIZE

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

linux-user: enable parallel code generation on clone

The variable parallel_cpus controls the generation of thread aware
atomic code. We only need to set it once we clone our first thread.
At this point any existing translations need to be thrown away.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add EXCP_ATOMIC

When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

int128: Add int128_make128

Allows Int128 to be used more generally, rather than having to
begin with 64-bit inputs and accumulate.

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

int128: Use __int128 if available

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

exec: Avoid direct references to Int128 parts

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

atomics: Add __nocheck atomic operations

While the check against sizeof(void *) is appropriate for
normal usage within qemu, there are places in which we want
wider operaions and have checked for their existance.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

atomics: add atomic_op_fetch variants

This paves the way for upcoming work.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

atomics: add atomic_xor

This paves the way for upcoming work.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1467054136 [email protected]>

atomics: Add parameters to macros

Making these functional rather than object macros will
prevent later problems with complex macro expansion.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

colo-proxy: fix memory leak

Fix memory leak in colo-compare.c and filter-rewriter.c
Report by Coverity and add some comments.

Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: zhanghailiang <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: rtl8139: limit processing of ring descriptors

RTL8139 ethernet controller in C+ mode supports multiple
descriptor rings, each with maximum of 64 descriptors. While
processing transmit descriptor ring in 'rtl8139_cplus_transmit',
it does not limit the descriptor count and runs forever. Add
check to avoid it.

Reported-by: Andrew Henderson <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: vmxnet: initialise local tx descriptor

In Vmxnet3 device emulator while processing transmit(tx) queue,
when it reaches end of packet, it calls vmxnet3_complete_packet.
In that local 'txcq_descr' object is not initialised, which could
leak host memory bytes a guest.

Reported-by: Li Qiang <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Reviewed-by: Dmitry Fleytman <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

e1000e: Don't zero out buffer address in rx descriptor

The e1000e emulation zeroes out any used rx descriptor and then writes a
completely newly constructed value there. By doing this, it doesn't only
update the write-back area of the descriptors (as it's supposed to do),
but it also clears the buffer address, which real hardware doesn't do.

The spec explicitly mentions in chapter 7.1.8 that it is valid for a
driver to reuse a descriptor and only update the status field while
doing so, i.e. reusing the old buffer address:

    If software statically allocates buffers, and uses memory read to
    check for completed descriptors, it simply has to zero the status
    byte in the descriptor to make it ready for reuse by hardware.

This patch fixes the behaviour to leave the buffer address in
descriptors unchanged even after the descriptor has been used.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Dmitry Fleytman <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: rocker: set limit to DMA buffer size

Rocker network switch emulator has test registers to help debug
DMA operations. While testing host DMA access, a buffer address
is written to register 'TEST_DMA_ADDR' and its size is written to
register 'TEST_DMA_SIZE'. When performing TEST_DMA_CTRL_INVERT
test, if DMA buffer size was greater than 'INT_MAX', it leads to
an invalid buffer access. Limit the DMA buffer size to avoid it.

Reported-by: Huawei PSIRT <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: eepro100: fix memory leak in device uninit

The exit dispatch of eepro100 network card device doesn't free
the 's->vmstate' field which was allocated in device realize thus
leading a host memory leak. This patch avoid this.

Signed-off-by: Li Qiang <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

tap-bsd: OpenBSD uses tap(4) now

Update the tap-bsd code now that OpenBSD uses tap(4).

Signed-off-by: Brad Smith <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: pcnet: fix source formatting and indentation

Fix indentations and source format at few places. Add braces
around 'if' and 'while' statements.

Signed-off-by: Prasad J Pandit <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: pcnet: check rx/tx descriptor ring length

The AMD PC-Net II emulator has set of control and status(CSR)
registers. Of these, CSR76 and CSR78 hold receive and transmit
descriptor ring length respectively. This ring length could range
from 1 to 65535. Setting ring length to zero leads to an infinite
loop in pcnet_rdra_addr() or pcnet_transmit(). Add check to avoid it.

Reported-by: Li Qiang <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

target-m68k: Optimize gen_flush_flags

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Optimize some comparisons

Signed-off-by: Richard Henderson <[email protected]>
[laurent: fixed VC and VS: assign v1, not v2]
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Use setcond for scc

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Introduce DisasCompare

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Reorg flags handling

Separate all ccr bits. Continue to batch updates via cc_op.

Signed-off-by: Richard Henderson <[email protected]>
Fix gen_logic_cc() to really extend the size of the result.
Fix gen_get_ccr(): update cc_op as it is used by the helper.
Factorize flags computing and src/ccr cleanup

Signed-off-by: Laurent Vivier <[email protected]>
target-m68k: sr/ccr cleanup

Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Remove incorrect clearing of cc_x

The CF docs certainly doesnt suggest this is true.

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Some fixes to SR and flags management

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Print flags properly

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: update CPU flags management

Copied from target-i386

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: don't update cc_dest in helpers

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: update move to/from ccr/sr

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()

Update cc_op directly from tcg_gen_insn_start() and
restore_state_to_opc()

Copied from target-i386

Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Replace helper_xflag_lt with setcond

Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: allow to update flags with operation on words and bytes

Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: REG() macro cleanup

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: set PAGE_BITS to 12 for m68k

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: define operand sizes

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: set disassembler mode to 680x0 or coldfire

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: introduce read_imXX() functions

Read a 8, 16 or 32bit immediat constant.

An immediate constant is stored in the instruction opcode and
can be in one or two extension words.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: manage scaled index

Scaled index is not supported by 68000, 68008, and 68010.

EA = (bd + PC) + Xn.SIZE*SCALE + od

Ignore it:

M68000 FAMILY PROGRAMMER’S REFERENCE MANUAL
2.4 BRIEF EXTENSION WORD FORMAT COMPATIBILITY

"If the MC68000 were to execute an instruction that
encoded a scaling factor, the scaling factor would be
ignored and would not access the desired memory address.
The earlier microprocessors do not recognize the brief
extension word formats implemented by newer processors.
Although they can detect illegal instructions, they do not
decode invalid encodings of the brief extension word formats
as exceptions."

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: define m680x0 CPUs and features

This patch defines height new features:

    - M68K_FEATURE_SCALED_INDEX, scaled address index register
    - M68K_FEATURE_LONG_MULDIV, 32bit multiply/divide
    - M68K_FEATURE_QUAD_MULDIV, 64bit multiply/divide
    - M68K_FEATURE_BCCL, long conditional branches
    - M68K_FEATURE_BITFIELD, bit field instructions
    - M68K_FEATURE_FPU, FPU instructions
    - M68K_FEATURE_CAS, cas instruction
    - M68K_FEATURE_BKPT, bkpt instruction

Original patch from Andreas Schwab <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target-m68k: Build the opcode table only once to avoid multithreading issues

Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target-m68k: fix DEBUG_DISPATCH

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-10-25' into staging

QAPI patches for 2016-10-25

# gpg: Signature made Tue 25 Oct 2016 16:56:27 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg:                 aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2016-10-25:
  qdict: implement a qdict_crumple method for un-flattening a dict
  qapi: don't pass two copies of TestInputVisitorData to tests
  qapi: rename QmpOutputVisitor to QObjectOutputVisitor
  qapi: rename QmpInputVisitor to QObjectInputVisitor
  qapi: rename *qmp-*-visitor* to *qobject-*-visitor*
  qapi: add trace events for visitor
  trivial: Restore blank line in qapi-schema

Signed-off-by: Peter Maydell <[email protected]>

qdict: implement a qdict_crumple method for un-flattening a dict

The qdict_flatten() method will take a dict whose elements are
further nested dicts/lists and flatten them by concatenating
keys.

The qdict_crumple() method aims to do the reverse, taking a flat
qdict, and turning it into a set of nested dicts/lists. It will
apply nesting based on the key name, with a '.' indicating a
new level in the hierarchy. If the keys in the nested structure
are all numeric, it will create a list, otherwise it will create
a dict.

If the keys are a mixture of numeric and non-numeric, or the
numeric keys are not in strictly ascending order, an error will
be reported.

As an example, a flat dict containing

{
   'foo.0.bar': 'one',
   'foo.0.wizz': '1',
   'foo.1.bar': 'two',
   'foo.1.wizz': '2'
}

will get turned into a dict with one element 'foo' whose
value is a list. The list elements will each in turn be
dicts.

{
   'foo': [
     { 'bar': 'one', 'wizz': '1' },
     { 'bar': 'two', 'wizz': '2' }
   ],
}

If the key is intended to contain a literal '.', then it must
be escaped as '..'. ie a flat dict

  {
     'foo..bar': 'wizz',
     'bar.foo..bar': 'eek',
     'bar.hello': 'world'
  }

Will end up as

  {
     'foo.bar': 'wizz',
     'bar': {
        'foo.bar': 'eek',
        'hello': 'world'
     }
  }

The intent of this function is that it allows a set of QemuOpts
to be turned into a nested data structure that mirrors the nesting
used when the same object is defined over QMP.

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1475246744 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[Parameter recursive dropped along with its tests; whitespace style
touched up]
Signed-off-by: Markus Armbruster <[email protected]>

qapi: don't pass two copies of TestInputVisitorData to tests

The input_visitor_test_add() method was accepting an instance
of 'TestInputVisitorData' and passing it as the 'user_data'
parameter to test functions. The main 'TestInputVisitorData'
instance that was actually used, was meanwhile being allocated
automatically by the test framework fixture setup.

The 'user_data' parameter is going to be needed for tests
added in later patches, so getting rid of the current mistaken
usage now allows this.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1475246744 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>

qapi: rename QmpOutputVisitor to QObjectOutputVisitor

The QmpOutputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one wants a QObject. Rename it
to better reflect its functionality as a generic QAPI
to QObject converter.

The commit before previous renamed the files, this one renames C
identifiers.

Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1475246744 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[Split into file rename and identifier rename]
Signed-off-by: Markus Armbruster <[email protected]>

qapi: rename QmpInputVisitor to QObjectInputVisitor

The QmpInputVisitor has no direct dependency on QMP. It is
valid to use it anywhere that one has a QObject. Rename it
to better reflect its functionality as a generic QObject
to QAPI converter.

The previous commit renamed the files, this one renames C identifiers.

Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1475246744 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[Straightforwardly rebased, split into file and identifier rename]
Signed-off-by: Markus Armbruster <[email protected]>

qapi: rename *qmp-*-visitor* to *qobject-*-visitor*

The QMP visitors have no direct dependency on QMP. It is
valid to use them anywhere that one has a QObject. Rename them
to better reflect their functionality as a generic QObject
to QAPI converter.

This is the first of three parts: rename the files. The next two
parts will rename C identifiers. The split is necessary to make git
rename detection work.

Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[Split into file and identifier rename, two comments touched up]
Signed-off-by: Markus Armbruster <[email protected]>

qapi: add trace events for visitor

Allow tracing of the operation of visitors

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-Id: <1475246744 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[visit_type_uint8() & friends rearranged slightly for clarity]
Signed-off-by: Markus Armbruster <[email protected]>

trivial: Restore blank line in qapi-schema

Commit de63ab6 accidentally undid part of commit a43edcf,
because the two patches were written in parallel, and the
blank line was not noticed as a casualty of merge conflicts.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476739794 [email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>

tests: Restore check-qdict unit test

Commit ea3af47 accidentally dropped check-qdict from the list of unit
tests. Put it back.

Signed-off-by: Markus Armbruster <[email protected]>
Message-id: 1477386565 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

x86 and CPU queue, 2016-10-24

x2APIC support to APIC code, cpu_exec_init() refactor on all
architectures, and other x86 changes.

# gpg: Signature made Mon 24 Oct 2016 20:51:14 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  exec: call cpu_exec_exit() from a CPU unrealize common function
  exec: move cpu_exec_init() calls to realize functions
  exec: split cpu_exec_init()
  pc: q35: Bump max_cpus to 288
  pc: Require IRQ remapping and EIM if there could be x2APIC CPUs
  pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
  Increase MAX_CPUMASK_BITS from 255 to 288
  pc: Clarify FW_CFG_MAX_CPUS usage comment
  pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode
  pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
  pc: apic_common: Restore APIC ID to initial ID on reset
  pc: apic_common: Extend APIC ID property to 32bit
  pc: Leave max apic_id_limit only in legacy cpu hotplug code
  acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
  pc: acpi: x2APIC support for SRAT table
  pc: acpi: x2APIC support for MADT table and _MAT method

Conflicts:
target-arm/cpu.c

Signed-off-by: Peter Maydell <[email protected]>

exec: call cpu_exec_exit() from a CPU unrealize common function

As cpu_exec_exit() mirrors the cpu_exec_realizefn(),
rename it as cpu_exec_unrealizefn().

Create and register a cpu_common_unrealizefn() function for
the CPU device class and call cpu_exec_unrealizefn() from
this function.

Remove cpu_exec_exit() from cpu_common_finalize()
(which mirrors init, not realize), and as x86_cpu_unrealizefn()
and ppc_cpu_unrealizefn() overwrite the device class unrealize function,
add a call to a parent_unrealize pointer.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

exec: move cpu_exec_init() calls to realize functions

Modify all CPUs to call it from XXX_cpu_realizefn() function.

Remove all the cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27)

for arm:

Setting of cpu->mp_affinity is moved from arm_cpu_initfn()
to arm_cpu_realizefn() as setting of cpu_index is now done
in cpu_exec_realizefn(). To avoid to overwrite an user defined
value, we set it to an invalid value by default, and update
it in realize function only if the value is still invalid.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

exec: split cpu_exec_init()

Put in cpu_exec_initfn() what initializes the CPU,
and leave in cpu_exec_init() what adds it to the environment.

As cpu_exec_initfn() is called by all XX_cpu_initfn(), call it
directly in cpu_common_initfn().
cpu_exec_init() is now a realize function, it will be renamed
to cpu_exec_realizefn() and moved to the XX_cpu_realizefn()
function in a following patch.

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: q35: Bump max_cpus to 288

Along with it for machine versions 2.7 and older keep
it at 255.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: Require IRQ remapping and EIM if there could be x2APIC CPUs

It would prevent starting guest with incorrect configs
where interrupts couldn't be delivered to CPUs with
APIC IDs > 255.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs

Currently firmware uses 1 byte at 0x5F offset in RTC CMOS
to get number of CPUs present at boot. However 1 byte is
not enough to handle more than 255 CPUs. So add a new
fw_cfg file that would allow QEMU to tell it.
For compat reasons add file only for machine types that
support more than 255 CPUs.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Increase MAX_CPUMASK_BITS from 255 to 288

so that it would be possible to increase maxcpus limit
for x86 target. Keep spapr/virt_arm at limit they used
to have 255.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: Clarify FW_CFG_MAX_CPUS usage comment

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode

SDM: x2APIC State Transitions:
State Changes From xAPIC Mode to x2APIC Mode
"
Any APIC ID value written to the memory-mapped
local APIC ID register is not preserved
"

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Radim Krčmář <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: apic_common: Restore APIC ID to initial ID on reset

APIC ID should be restored to initial APIC ID
state after Reset and Power-On.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: apic_common: Extend APIC ID property to 32bit

ACPI ID is 32 bit wide on CPUs with x2APIC support.
Extend 'id' property to support it.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: Leave max apic_id_limit only in legacy cpu hotplug code

That's enough to make old code that depends on it
to prevent QEMU starting with more than 255 CPUs.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254

Switch to modern cpu hotplug at machine startup time if
a cpu present at boot has apic-id in range unsupported
by legacy cpu hotplug interface (i.e. > 254), to avoid
killing QEMU from legacy cpu hotplug code with error:
"acpi: invalid cpu id: #apic-id#"

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: acpi: x2APIC support for SRAT table

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

pc: acpi: x2APIC support for MADT table and _MAT method

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20161024' into staging

target-arm queue:
* support variable (runtime-determined) page sizes, for a
   nearly-20% speedup of TCG for ARMv7 and v8 CPUs with 4K pages
* ptimer: add tests, support more flexible behaviour around
   what happens on the "zero" tick, use ptimer for a9gtimer
* virt: ACPI: Add IORT Structure definition
* i2c: Fix SMBus read transactions to avoid double events
* timer: stm32f2xx_timer: add check for prescaler value
* QOMify musicpal, pxa2xx_gpio, strongarm, pl110
* target-arm: Implement new HLT trap for semihosting
* i2c: Add asserts for second smbus i2c_start_transfer()

# gpg: Signature made Mon 24 Oct 2016 18:24:17 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20161024: (32 commits)
  i2c: Add asserts for second smbus i2c_start_transfer()
  target-arm: Implement new HLT trap for semihosting
  hw/display: QOM'ify pl110.c
  hw/arm: QOM'ify strongarm.c
  hw/arm: QOM'ify pxa2xx_gpio.c
  hw/arm: QOM'ify musicpal.c
  timer: stm32f2xx_timer: add check for prescaler value
  i2c: Fix SMBus read transactions to avoid double events
  timer: a9gtimer: remove loop to auto-increment comparator
  ARM: Virt: ACPI: Build an IORT table with RC and ITS nodes
  ACPI: Add IORT Structure definition
  tests: Add tests for the ARM MPTimer
  arm_mptimer: Convert to use ptimer
  tests: ptimer: Replace 10000 with 1
  tests: ptimer: Change the copyright comment
  tests: ptimer: Add tests for "no counter round down" policy
  hw/ptimer: Add "no counter round down" policy
  tests: ptimer: Add tests for "no immediate reload" policy
  hw/ptimer: Add "no immediate reload" policy
  tests: ptimer: Add tests for "no immediate trigger" policy
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 24 Oct 2016 17:02:47 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (23 commits)
  block/replication: Clarify 'top-id' parameter usage
  block: More operations for meta dirty bitmap
  tests: Add test code for hbitmap serialization
  block: BdrvDirtyBitmap serialization interface
  hbitmap: serialization
  block: Assert that bdrv_release_dirty_bitmap succeeded
  block: Add two dirty bitmap getters
  block: Support meta dirty bitmap
  tests: Add test code for meta bitmap
  HBitmap: Introduce "meta" bitmap to track bit changes
  block: Hide HBitmap in block dirty bitmap interface
  quorum: do not allocate multiple iovecs for FIFO strategy
  quorum: change child_iter to children_read
  iotests: Do not rely on unavailable domains in 162
  iotests: Remove raciness from 162
  qemu-nbd: Add --fork option
  qemu-iotests: Test I/O in a single drive from a throttling group
  throttle: Correct access to wrong BlockBackendPublic structures
  qapi: fix memory leak in bdrv_image_info_specific_dump
  block: improve error handling in raw_open
  ...

Signed-off-by: Peter Maydell <[email protected]>