Git Repo - qemu.git/log

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging

Update min required crypto library versions

The min required versions for crypto libraries are now

- gnutls >= 3.1.18
- nettle >= 2.7.1
- gcrypt >= 1.5.0

# gpg: Signature made Fri 19 Oct 2018 14:42:35 BST
# gpg:                using RSA key BE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg:                 aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/qcrypto-next-pull-request:
  crypto: require nettle >= 2.7.1 for building QEMU
  crypto: require libgcrypt >= 1.5.0 for building QEMU
  crypto: require gnutls >= 3.1.18 for building QEMU

Signed-off-by: Peter Maydell <[email protected]>

osdep: Work around MinGW assert

In several places we use assert(FEATURE), and assume that if FEATURE
is disabled, all following code is removed as unreachable. Which allows
us to compile-out functions that are only present with FEATURE, and
have a link-time failure if the functions remain used.

MinGW does not mark its internal function _assert() as noreturn, so the
compiler cannot see when code is unreachable, which leads to link errors
for this host that are not present elsewhere.

The current build-time failure concerns 62823083b8a2, but I remember
having seen this same error before. Fix it once and for all for MinGW.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20181022181623 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.1-pull-request' into staging

A series to enable ioctl usbfs in linux-user

# gpg: Signature made Fri 19 Oct 2018 13:18:53 BST
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-3.1-pull-request:
  linux-user: Implement special usbfs ioctls.
  linux-user: Define ordinary usbfs ioctls.
  linux-user: Check for Linux USBFS in configure

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* RTC fixes (Artem)
* icount fixes (Artem)
* rr fixes (Pavel, myself)
* hotplug cleanup (Igor)
* SCSI fixes (myself)
* 4.20-rc1 KVM header update (myself)
* coalesced PIO support (Peng Hao)
* HVF fixes (Roman B.)
* Hyper-V refactoring (Roman K.)
* Support for Hyper-V IPI (Vitaly)

# gpg: Signature made Fri 19 Oct 2018 12:47:58 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (47 commits)
  replay: pass raw icount value to replay_save_clock
  target/i386: kvm: just return after migrate_add_blocker failed
  hyperv_testdev: add SynIC message and event testmodes
  hyperv: process POST_MESSAGE hypercall
  hyperv: add support for KVM_HYPERV_EVENTFD
  hyperv: process SIGNAL_EVENT hypercall
  hyperv: add synic event flag signaling
  hyperv: add synic message delivery
  hyperv: make overlay pages for SynIC
  hyperv: only add SynIC in compatible configurations
  hyperv: qom-ify SynIC
  hyperv:synic: split capability testing and setting
  i386: add hyperv-stub for CONFIG_HYPERV=n
  default-configs: collect CONFIG_HYPERV* in hyperv.mak
  hyperv: factor out arch-independent API into hw/hyperv
  hyperv: make hyperv_vp_index inline
  hyperv: split hyperv-proto.h into x86 and arch-independent parts
  hyperv: rename kvm_hv_sint_route_set_sint
  hyperv: make HvSintRoute reference-counted
  hyperv: address HvSintRoute by X86CPU pointer
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20181018' into staging

Queued tcg patches.

# gpg: Signature made Fri 19 Oct 2018 07:03:20 BST
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20181018: (21 commits)
  cputlb: read CPUTLBEntry.addr_write atomically
  target/s390x: Check HAVE_ATOMIC128 and HAVE_CMPXCHG128 at translate
  target/s390x: Skip wout, cout helpers if op helper does not return
  target/s390x: Split do_cdsg, do_lpq, do_stpq
  target/s390x: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128
  target/ppc: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128
  target/arm: Check HAVE_CMPXCHG128 at translate time
  target/arm: Convert to HAVE_CMPXCHG128
  target/i386: Convert to HAVE_CMPXCHG128
  tcg: Split CONFIG_ATOMIC128
  tcg: Add tlb_index and tlb_entry helpers
  cputlb: serialize tlb updates with env->tlb_lock
  cputlb: fix assert_cpu_is_self macro
  exec: introduce tlb_init
  target/unicore32: remove tlb_flush from uc32_init_fn
  target/alpha: remove tlb_flush from alpha_cpu_initfn
  tcg: distribute tcg_time into TCG contexts
  tcg: plug holes in struct TCGProfile
  tcg: fix use of uninitialized variable under CONFIG_PROFILER
  tcg: access cpu->icount_decr.u16.high with atomics
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Fri 19 Oct 2018 04:16:03 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request: (26 commits)
  qemu-options: Fix bad "macaddr" property in the documentation
  e1000: indicate dropped packets in HW counters
  net: ignore packet size greater than INT_MAX
  pcnet: fix possible buffer overflow
  rtl8139: fix possible out of bound access
  ne2000: fix possible out of bound access in ne2000_receive
  clean up callback when del virtqueue
  docs: Add COLO status diagram to COLO-FT.txt
  COLO: quick failover process by kick COLO thread
  COLO: notify net filters about checkpoint/failover event
  filter-rewriter: handle checkpoint and failover event
  filter: Add handle_event method for NetFilterClass
  COLO: flush host dirty ram from cache
  savevm: split the process of different stages for loadvm/savevm
  qapi: Add new command to query colo status
  qapi/migration.json: Rename COLO unknown mode to none mode.
  qmp event: Add COLO_EXIT event to notify users while exited COLO
  COLO: Flush memory data from ram cache
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load dirty pages into SVM's RAM cache firstly
  ...

Signed-off-by: Peter Maydell <[email protected]>

crypto: require nettle >= 2.7.1 for building QEMU

nettle 2.7.1 was released in 2013 and all the distros that are build
target platforms for QEMU [1] include it:

  RHEL-7: 2.7.1
  Debian (Stretch): 3.3
  Debian (Jessie): 2.7.1
  OpenBSD (ports): 3.4
  FreeBSD (ports): 3.4
  OpenSUSE Leap 15: 3.4
  Ubuntu (Xenial): 3.2
  macOS (Homebrew): 3.4

Based on this, it is reasonable to require nettle >= 2.7.1 in QEMU
which allows for some conditional version checks in the code to be
removed.

[1] https://qemu.weilnetz.de/doc/qemu-doc.html#Supported-build-platforms

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

linux-user: Implement special usbfs ioctls.

Userspace submits a USB Request Buffer to the kernel, optionally
discards it, and finally reaps the URB. Thunk buffers from target
to host and back.

Tested by running an i386 scanner driver on ARMv7 and by running
the PowerPC lsusb utility on x86_64. The discardurb ioctl is
not exercised in these tests.

Signed-off-by: Cortland Tölva <[email protected]>
Message-Id: <20181008163521 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Define ordinary usbfs ioctls.

Provide ioctl definitions for the generic thunk mechanism to
convert most usbfs calls. Calculate arg size at runtime.

Signed-off-by: Cortland Tölva <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20181008163521 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Check for Linux USBFS in configure

In preparation for adding user mode emulation support for the
Linux usbfs interface, check for its kernel header.

Signed-off-by: Cortland Tölva <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20181008163521 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

replay: pass raw icount value to replay_save_clock

This avoids lock recursion when REPLAY_CLOCK is called inside the
timers spinlock.

Signed-off-by: Paolo Bonzini <[email protected]>

target/i386: kvm: just return after migrate_add_blocker failed

When migrate_add_blocker failed, the invtsc_mig_blocker is not
appended so no need to remove. This can save several instructions.

Signed-off-by: Li Qiang <[email protected]>
Message-Id: <20181006091816 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv_testdev: add SynIC message and event testmodes

Add testmodes for SynIC messages and events. The message or event
connection setup / teardown is initiated by the guest via new control
codes written to the test device port. Then the test connections bounce
the respective operations back to the guest, i.e. the incoming messages
are posted or the incoming events are signaled on the configured vCPUs.

Signed-off-by: Roman Kagan <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: process POST_MESSAGE hypercall

Add handling of POST_MESSAGE hypercall. For that, add an interface to
regsiter a handler for the messages arrived from the guest on a
particular connection id (IOW set up a message connection in Hyper-V
speak).

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: add support for KVM_HYPERV_EVENTFD

When setting up a notifier for Hyper-V event connection, try to use the
KVM-assisted one first, and fall back to userspace handling of the
hypercall if the kernel doesn't provide the requested feature.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: process SIGNAL_EVENT hypercall

Add handling of SIGNAL_EVENT hypercall. For that, provide an interface
to associate an EventNotifier with an event connection number, so that
it's signaled when the SIGNAL_EVENT hypercall with the matching
connection ID is called by the guest.

Support for using KVM functionality for this will be added in a followup
patch.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: add synic event flag signaling

Add infrastructure to signal SynIC event flags by atomically setting the
corresponding bit in the event flags page and firing a SINT if
necessary.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: add synic message delivery

Add infrastructure to deliver SynIC messages to the SynIC message page.

Note that KVM may also want to deliver (SynIC timer) messages to the
same message slot.

The problem is that the access to a SynIC message slot is controlled by
the value of its .msg_type field which indicates if the slot is being
owned by the hypervisor (zero) or by the guest (non-zero).

This leaves no room for synchronizing multiple concurrent producers.

The simplest way to deal with this for both KVM and QEMU is to only
deliver messages in the vcpu thread.  KVM already does this; this patch
makes it for QEMU, too.

Specifically,

- add a function for posting messages, which only copies the message
   into the staging buffer if its free, and schedules a work on the
   corresponding vcpu to actually deliver it to the guest slot;

- instead of a sint ack callback, set up the sint route with a message
   status callback.  This function is called in a bh whenever there are
   updates to the message slot status: either the vcpu made definitive
   progress delivering the message from the staging buffer (succeeded or
   failed) or the guest issued EOM; the status is passed as an argument
   to the callback.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: make overlay pages for SynIC

Per Hyper-V spec, SynIC message and event flag pages are to be
implemented as so called overlay pages.  That is, they are owned by the
hypervisor and, when mapped into the guest physical address space,
overlay the guest physical pages such that

1) the overlaid guest page becomes invisible to the guest CPUs until the
   overlay page is turned off
2) the contents of the overlay page is preserved when it's turned off
   and back on, even at a different address; it's only zeroed at vcpu
   reset

This particular nature of SynIC message and event flag pages is ignored
in the current code, and guest physical pages are used directly instead.
This happens to (mostly) work because the actual guests seem not to
depend on the features listed above.

This patch implements those pages as the spec mandates.

Since the extra RAM regions, which introduce migration incompatibility,
are only added at SynIC object creation which only happens when
hyperv_synic_kvm_only == false, no extra compat logic is necessary.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: only add SynIC in compatible configurations

Certain configurations do not allow SynIC to be used in QEMU.  In
particular,

- when hyperv_vpindex is off, SINT routes can't be used as they refer to
  the destination vCPU by vp_index

- older KVM (which doesn't expose KVM_CAP_HYPERV_SYNIC2) zeroes out
  SynIC message and event pages on every msr load, breaking migration

OTOH in-KVM users of SynIC -- SynIC timers -- do work in those
configurations, and we shouldn't stop the guest from using them.

To cover both scenarios, introduce an X86CPU property that makes CPU
init code to skip creation of the SynIC object (and thus disables any
SynIC use in QEMU) but keeps the KVM part of the SynIC working.
The property is clear by default but is set via compat logic for older
machine types.

As a result, when hv_synic and a modern machine type are specified, QEMU
will refuse to run unless vp_index is on and the kernel is recent
enough.  OTOH with an older machine type QEMU will run fine with
hv_synic=on against an older kernel and/or without vp_index enabled but
will disallow the in-QEMU uses of SynIC (in e.g. VMBus).

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: qom-ify SynIC

Make Hyper-V SynIC a device which is attached as a child to a CPU. For
now it only makes SynIC visibile in the qom hierarchy, and maintains its
internal fields in sync with the respecitve msrs of the parent cpu (the
fields will be used in followup patches).

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv:synic: split capability testing and setting

Put a bit more consistency into handling KVM_CAP_HYPERV_SYNIC capability,
by checking its availability and determining the feasibility of hv-synic
property first, and enabling it later.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082217 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

i386: add hyperv-stub for CONFIG_HYPERV=n

This will allow to build slightly leaner QEMU that supports some HyperV
features of KVM (e.g. SynIC timers, PV spinlocks, APIC assists, etc.)
but nothing else on the QEMU side.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082041 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

default-configs: collect CONFIG_HYPERV* in hyperv.mak

Accumulate HYPERV config options in a dedicated file. There are only
two so far; more will be added later.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082041 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: factor out arch-independent API into hw/hyperv

A significant part of hyperv.c is not actually tied to x86, and can
be moved to hw/.

This will allow to maintain most of Hyper-V and VMBus
target-independent, and to avoid conflicts with inclusion of
arch-specific headers down the road in VMBus implementation.

Also this stuff can now be opt-out with CONFIG_HYPERV.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082041 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: make hyperv_vp_index inline

Also make the inverse function, hyperv_find_vcpu, static as it's not
used outside hyperv.c

This paves the way to making hyperv.c built optionally.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082041 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: split hyperv-proto.h into x86 and arch-independent parts

Some parts of the Hyper-V hypervisor-guest interface appear to be
target-independent, so move them into a proper header.

Not that Hyper-V ARM64 emulation is around the corner but it seems more
conveninent to have most of Hyper-V and VMBus target-independent, and
allows to avoid conflicts with inclusion of arch-specific headers down
the road in VMBus implementation.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921082041 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: rename kvm_hv_sint_route_set_sint

There's nothing kvm-specific in it so follow the suite and replace
"kvm_hv" prefix with "hyperv".

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: make HvSintRoute reference-counted

Multiple entities (e.g. VMBus devices) can use the same SINT route. To
make their lives easier in maintaining SINT route ownership, make it
reference-counted. Adjust the respective API names accordingly.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: address HvSintRoute by X86CPU pointer

Use X86CPU pointer to refer to the respective HvSintRoute instead of
vp_index. This is more convenient and also paves the way for future
enhancements.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: allow passing arbitrary data to sint ack callback

Make sint ack callback accept an opaque pointer, that is stored on
sint_route at creation time.

This allows for more convenient interaction with the callback.

Besides, nothing outside hyperv.c should need to know the layout of
HvSintRoute fields any more so its declaration can be removed from the
header.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: synic: only setup ack notifier if there's a callback

There's no point setting up an sint ack notifier if no callback is
specified.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv: cosmetic: g_malloc -> g_new

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv_testdev: drop unnecessary includes

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hyperv_testdev: refactor for better maintainability

Make hyperv_testdev slightly easier to follow and enhance in future.
For that, put the hyperv sint routes (wrapped in a helper structure) on
a linked list rather than a fixed-size array. Besides, this way
HvSintRoute can be treated as an opaque structure, allowing for easier
refactoring of the core Hyper-V SynIC code in followup pathches.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20180921081836 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

scsi-disk: fix rerror/werror=ignore

rerror=ignore was returning true from scsi_handle_rw_error but the callers were not
calling scsi_req_complete when rerror=ignore returns true (this is the correct thing
to do when true is returned after executing a passthrough command). Fix this by
calling it in scsi_handle_rw_error.

Signed-off-by: Paolo Bonzini <[email protected]>

scsi-disk: fix double completion of failing passthrough requests

If a command fails with a sense that scsi_sense_buf_to_errno converts to
ECANCELED/EAGAIN/ENOTCONN or with a unit attention, scsi_req_complete is
called twice. This caused a crash.

Reported-by: Wangguang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

hw: edu: drop DO_UPCAST

Signed-off-by: Li Qiang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

call HotplugHandler->plug() as the last step in device realization

When [2] was fixed it was agreed that adding and calling post_plug()
callback after device_reset() was low risk approach to hotfix issue
right before release. So it was merged instead of moving already
existing plug() callback after device_reset() is called which would
be more risky and require all plug() callbacks audit.

Looking at the current plug() callbacks, it doesn't seem that moving
plug() callback after device_reset() is breaking anything, so here
goes agreed upon [3] proper fix which essentially reverts [1][2]
and moves plug() callback after device_reset().
This way devices always comes to plug() stage, after it's been fully
initialized (including being reset), which fixes race condition [2]
without need for an extra post_plug() callback.

1. (25e897881 "qdev: add HotplugHandler->post_plug() callback")
2. (8449bcf94 "virtio-scsi: fix hotplug ->reset() vs event race")
3. https://www.mail-archive.com/[email protected]/msg549915.html

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1539696820 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Tested-by: Pierre Morel<[email protected]>
Acked-by: Pierre Morel<[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vl, qapi: offset calculation in RTC_CHANGE event reverted

Return value of qemu_timedate_diff(), used for calculation offset in
QAPI 'RTC_CHANGE' event, restored to keep compatibility. Since it
wasn't documented that difference is relative to host clock
advancement, this change also adds important note to 'RTC_CHANGE'
event description to highlight established implementation specifics.

Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <1fc12c77e8b7115d3842919a8b586d9cbe4efca6.1539846575 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Fixes RTC bug with base datetime shifts in clock=vm

This makes all current "-rtc" option parameters combinations produce
fixed/unambiguous RTC timedate reference for hardware emulation
frontends.
It restores determinism of guest execution when used with clock=vm and
specified base <datetime> value.

Buglink: https://bugs.launchpad.net/qemu/+bug/1797033
Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <1d963c3e013dfedafa1f6edb9fb219b7e49e39da.1539846575 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vl: refactor -rtc option references

Improve code readability and prepare for fixing bug #1797033

Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <9330a48899f997431a34460014886d118a7c0960.1539846575 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vl: improve/fix documentation related to RTC function

Documentation describing -rtc option updated to better match current
implementation and highlight some important specifics.

Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <1b245c6c0803d4bf11dcbf9eb32f34af8c2bd0b4.1539846575 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

i386: hvf: Remove hvf_disabled

accel_init_machine sets *(acc->allowed) to true if acc->init_machine(ms)
succeeds. There's no need to have both hvf_allowed and hvf_disabled.

Signed-off-by: Roman Bolshakov <[email protected]>
Message-Id: <20181018143051 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

i386: hvf: Fix register refs if REX is present

According to Intel(R)64 and IA-32 Architectures Software Developer's
Manual, the following one-byte registers should be fetched when REX
prefix is present (sorted by reg encoding index):
AL, CL, DL, BL, SPL, BPL, SIL, DIL, R8L - R15L

The first 8 are fetched if REX.R is zero, the last 8 if non-zero.

The following registers should be fetched for instructions without REX
prefix (also sorted by reg encoding index):
AL, CL, DL, BL, AH, CH, DH, BH

Current emulation code doesn't handle accesses to SPL, BPL, SIL, DIL
when REX is present, thefore an instruction 40883e "mov %dil,(%rsi)" is
decoded as "mov %bh,(%rsi)".

That caused an infinite loop in vp_reset:
https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg03293.html

Signed-off-by: Roman Bolshakov <[email protected]>
Message-Id: <20181018134401 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

i386/kvm: add support for Hyper-V IPI send

Hyper-V PV IPI support is merged to KVM, enable the feature in Qemu. When
enabled, this allows Windows guests to send IPIs to other vCPUs with a
single hypercall even when there are >64 vCPUs in the request.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Roman Kagan <[email protected]>
Message-Id: <20181009130853 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

replay: don't process events at virtual clock checkpoint

As QEMU becomes more multi-threaded and non-synchronized, checkpoints
move from thread to thread. And the event queue that processed at checkpoints
should belong to the same thread in both record and replay executions.
This patch disables asynchronous event processing at virtual clock
checkpoint, because it may be invoked in different threads at record and
replay. This patch is temporary fix until the checkpoints are completely
refactored.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
Message-Id: <20181018063345.7433.11678.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: add q35 0xcf8 port as coalesced_pio

Signed-off-by: Peng Hao <[email protected]>
Message-Id: <1539795177 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: add i440fx 0xcf8 port as coalesced_pio

Signed-off-by: Peng Hao <[email protected]>
Message-Id: <1539795177 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: add rtc 0x70 port as coalesced_pio

Signed-off-by: Peng Hao <[email protected]>
Message-Id: <1539890353 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386 : add coalesced_pio API

the primary API realization.

Signed-off-by: Peng Hao <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Message-Id: <1539795177 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

linux-headers: update to 4.20-rc1

This brings in eVMCS and coalesced PIO support, as well as other features we do
not support yet.

Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: kvm: do not initialize padding fields

The exception.pad field is going to be renamed to pending in an upcoming
header file update. Remove the unnecessary initialization; it was
introduced to please valgrind (commit 7e680753cfa2) but they were later
rendered unnecessary by commit 076796f8fd27f4d, which added the "= {}"
initializer to the declaration of "events". Therefore the patch does
not change behavior in any way.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qemu-timer: avoid checkpoints for virtual clock timers in external subsystems

Adds EXTERNAL attribute definition to qemu timers subsystem and assigns
it to virtual clock timers, used in slirp (ICMP IPv6) and ui (key queue).
Virtual clock processing in rr mode can use this attribute instead of a
separate clock type.

Fixes: 87f4fe7653baf55b5c2f2753fe6003f473c07342
Fixes: 775a412bf83f6bc0c5c02091ee06cf649b34c593
Fixes: 9888091404a702d7ec79d51b088d994b9fc121bd
Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <e771f96ab94e86b54b9a783c974f2af3009fe5d1.1539764043 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qemu-timer: introduce timer attributes

Attributes are simple flags, associated with individual timers for their
whole lifetime. They intended to be used to mark individual timers for
special handling when they fire.

New/init functions family in timer interface updated and refactored (new
'attribute' argument added, timer_list replaced with timer_list_group+type
combinations, comments improved to avoid info duplication). Also existing
aio interface extended with attribute-enabled variants of functions,
which create/initialize timers.

Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <f47b81dbce734e9806f9516eba8ca588e6321c2f.1539764043 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert some patches from recent [PATCH v6] "Fixing record/replay and adding reverse debugging"

That patch series introduced new virtual clock type for use in external
subsystems. It breaks desired behavior in non-record/replay usage
scenarios due to a small change to existing behavior. Processing of
virtual timers belonging to new clock type is kicked off to the main
loop, which makes these timers asynchronous with vCPU thread and,
in icount mode, with whole guest execution. This breaks expected
determinism in non-record/replay icount mode of emulation where these
"external subsystems" are isolated from the host (i.e. they are
external only to guest core, not to the entire emulation environment).

Example for slirp ("user" backend for network device):
User runs qemu in icount mode with rtc clock=vm without any external
communication interfaces but with "-netdev user,restrict=on". It expects
deterministic execution, because network services are emulated inside
qemu and isolated from host. There are no reasons to get reply from DHCP
server with different delay or something like that.

The next patches revert reimplements the same changes in a better way.
This reverts commit 87f4fe7653baf55b5c2f2753fe6003f473c07342.
This reverts commit 775a412bf83f6bc0c5c02091ee06cf649b34c593.
This reverts commit 9888091404a702d7ec79d51b088d994b9fc121bd.

Signed-off-by: Artem Pisarenko <[email protected]>
Message-Id: <18b1e7c8f155fe26976f91be06bde98eef6f8751.1539764043 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

es1370: more fixes for ADC_FRAMEADR and ADC_FRAMECNT

They are not consecutive with DAC1_FRAME* and DAC2_FRAME*; Coverity
still complains about es1370_read, while es1370_write was fixed in
commit cf9270e5220671f49cc238deaf6136669cc07ae1.

Fixes: 154c1d1f960c5147a3f8ef00907504112f271cd8
Signed-off-by: Paolo Bonzini <[email protected]>

crypto: require libgcrypt >= 1.5.0 for building QEMU

libgcrypt 1.5.0 was released in 2011 and all the distros that are build
target platforms for QEMU [1] include it:

  RHEL-7: 1.5.3
  Debian (Stretch): 1.7.6
  Debian (Jessie): 1.6.3
  OpenBSD (ports): 1.8.2
  FreeBSD (ports): 1.8.3
  OpenSUSE Leap 15: 1.8.2
  Ubuntu (Xenial): 1.6.5
  macOS (Homebrew): 1.8.3

Based on this, it is reasonable to require libgcrypt >= 1.5.0 in QEMU
which allows for some conditional version checks in the code to be
removed.

[1] https://qemu.weilnetz.de/doc/qemu-doc.html#Supported-build-platforms

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

crypto: require gnutls >= 3.1.18 for building QEMU

gnutls 3.0.0 was released in 2011 and all the distros that are build
target platforms for QEMU [1] include it:

  RHEL-7: 3.1.18
  Debian (Stretch): 3.5.8
  Debian (Jessie): 3.3.8
  OpenBSD (ports): 3.5.18
  FreeBSD (ports): 3.5.18
  OpenSUSE Leap 15: 3.6.2
  Ubuntu (Xenial): 3.4.10
  macOS (Homebrew): 3.5.19

Based on this, it is reasonable to require gnutls >= 3.1.18 in QEMU
which allows for all conditional version checks in the code to be
removed.

[1] https://qemu.weilnetz.de/doc/qemu-doc.html#Supported-build-platforms

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.1-pull-request' into staging

Add a workaround for clang bug and remove misleading comment (sparc)

# gpg: Signature made Thu 18 Oct 2018 20:00:17 BST
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier <[email protected]>"
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-3.1-pull-request:
  linux-user/sparc/signal.c: Remove unnecessary comment
  linux-user: Suppress address-of-packed-member warnings in __get/put_user_e

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-october-2018-part1-v2' into staging

MIPS queue October 2018, part1, v2

# gpg: Signature made Thu 18 Oct 2018 19:39:00 BST
# gpg:                using RSA key D4972A8967F75A65
# gpg: Good signature from "Aleksandar Markovic <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8526 FBF1 5DA3 811F 4A01  DD75 D497 2A89 67F7 5A65

* remotes/amarkovic/tags/mips-queue-october-2018-part1-v2: (28 commits)
  target/mips: Add opcodes for nanoMIPS EVA instructions
  target/mips: Fix misplaced 'break' in handling of NM_SHRA_R_PH
  target/mips: Fix emulation of microMIPS R6 <SELEQZ|SELNEZ>.<D|S>
  target/mips: Implement hardware page table walker for MIPS32
  target/mips: Add reset state for PWSize and PWField registers
  target/mips: Add CP0 PWCtl register
  target/mips: Add CP0 PWSize register
  target/mips: Add CP0 PWField register
  target/mips: Add CP0 PWBase register
  target/mips: Add CP0 Config2 to DisasContext
  target/mips: Improve DSP R2/R3-related naming
  target/mips: Add availability control for DSP R3 ASE
  target/mips: Add bit definitions for DSP R3 ASE
  target/mips: Reorganize bit definitions for insn_flags (ISAs/ASEs flags)
  target/mips: Increase 'supported ISAs/ASEs' flag holder size
  target/mips: Add opcode values of MXU ASE
  target/mips: Add organizational chart of MXU ASE
  target/mips: Add assembler mnemonics list for MXU ASE
  target/mips: Add basic description of MXU ASE
  target/mips: Add a comment before each CP0 register section in cpu.h
  ...

Signed-off-by: Peter Maydell <[email protected]>

qemu-options: Fix bad "macaddr" property in the documentation

When using the "-device" option, the property is called "mac".
"macaddr" is only used for the legacy "-net nic" option.

Reported-by: Harald Hoyer <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

e1000: indicate dropped packets in HW counters

The e1000 emulation silently discards RX packets if there's
insufficient space in the ring buffer. This leads to errors
on higher-level protocols in the guest, with no indication
about the error cause.

This patch increments the "Missed Packets Count" (MPC) and
"Receive No Buffers Count" (RNBC) HW counters in this case.
As the emulation has no FIFO for buffering packets that can't
immediately be pushed to the guest, these two registers are
practically equivalent (see 10.2.7.4, 10.2.7.33 in
https://www.intel.com/content/www/us/en/embedded/products/networking/82574l-gbe-controller-datasheet.html).

On a Linux guest, the register content will be reflected in
the "rx_missed_errors" and "rx_no_buffer_count" stats from
"ethtool -S", and in the "missed" stat from "ip -s -s link show",
giving at least some hint about the error cause inside the guest.

If the cause is known, problems like this can often be avoided
easily, by increasing the number of RX descriptors in the guest
e1000 driver (e.g under Linux, "e1000.RxDescriptors=1024").

The patch also adds a qemu trace message for this condition.

Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: ignore packet size greater than INT_MAX

There should not be a reason for passing a packet size greater than
INT_MAX. It's usually a hint of bug somewhere, so ignore packet size
greater than INT_MAX in qemu_deliver_packet_iov()

CC: [email protected]
Reported-by: Daniel Shapira <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

pcnet: fix possible buffer overflow

In pcnet_receive(), we try to assign size_ to size which converts from
size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: [email protected]
Reported-by: Daniel Shapira <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

rtl8139: fix possible out of bound access

In rtl8139_do_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: [email protected]
Reported-by: Daniel Shapira <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

ne2000: fix possible out of bound access in ne2000_receive

In ne2000_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: [email protected]
Reported-by: Daniel Shapira <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

clean up callback when del virtqueue

Before, we did not clear callback like handle_output when delete
the virtqueue which may result be segmentfault.
The scene is as follows:
1. Start a vm with multiqueue vhost-net,
2. then we write VIRTIO_PCI_GUEST_FEATURES in PCI configuration to
triger multiqueue disable in this vm which will delete the virtqueue.
In this step, the tx_bh is deleted but the callback virtio_net_handle_tx_bh
still exist.
3. Finally, we write VIRTIO_PCI_QUEUE_NOTIFY in PCI configuration to
notify the deleted virtqueue. In this way, virtio_net_handle_tx_bh
will be called and qemu will be crashed.

Although the way described above is uncommon, we had better reinforce it.

CC: [email protected]
Signed-off-by: liujunjie <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

docs: Add COLO status diagram to COLO-FT.txt

This diagram make user better understand COLO.
Suggested by Markus Armbruster.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: quick failover process by kick COLO thread

COLO thread may sleep at qemu_sem_wait(&s->colo_checkpoint_sem),
while failover works begin, It's better to wakeup it to quick
the process.

Signed-off-by: zhanghailiang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: notify net filters about checkpoint/failover event

Notify all net filters about the checkpoint and failover event.

Signed-off-by: zhanghailiang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

filter-rewriter: handle checkpoint and failover event

After one round of checkpoint, the states between PVM and SVM
become consistent, so it is unnecessary to adjust the sequence
of net packets for old connections, besides, while failover
happens, filter-rewriter will into failover mode that needn't
handle the new TCP connection.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

filter: Add handle_event method for NetFilterClass

Filter needs to process the event of checkpoint/failover or
other event passed by COLO frame.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: flush host dirty ram from cache

Don't need to flush all VM's ram from cache, only
flush the dirty pages since last checkpoint

Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: zhanghailiang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

savevm: split the process of different stages for loadvm/savevm

There are several stages during loadvm/savevm process. In different stage,
migration incoming processes different types of sections.
We want to control these stages more accuracy, it will benefit COLO
performance, we don't have to save type of QEMU_VM_SECTION_START
sections everytime while do checkpoint, besides, we want to separate
the process of saving/loading memory and devices state.

So we add three new helper functions: qemu_load_device_state() and
qemu_savevm_live_state() to achieve different process during migration.

Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
public, and simplify the codes of qemu_save_device_state() by calling the
wrapper qemu_savevm_state_header().

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

qapi: Add new command to query colo status

Libvirt or other high level software can use this command query colo status.
You can test this command like that:
{'execute':'query-colo-status'}

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

qapi/migration.json: Rename COLO unknown mode to none mode.

Suggested by Markus Armbruster rename COLO unknown mode to none mode.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

qmp event: Add COLO_EXIT event to notify users while exited COLO

If some errors happen during VM's COLO FT stage, it's important to
notify the users of this event. Together with 'x-colo-lost-heartbeat',
Users can intervene in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: Flush memory data from ram cache

During the time of VM's running, PVM may dirty some pages, we will transfer
PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint
time. So, the content of SVM's RAM cache will always be same with PVM's memory
after checkpoint.

Instead of flushing all content of PVM's RAM cache into SVM's MEMORY,
we do this in a more efficient way:
Only flush any page that dirtied by PVM since last checkpoint.
In this way, we can ensure SVM's memory same with PVM's.

Besides, we must ensure flush RAM cache before load device state.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

ram/COLO: Record the dirty pages that SVM received

We record the address of the dirty pages that received,
it will help flushing pages that cached into SVM.

Here, it is a trick, we record dirty pages by re-using migration
dirty bitmap. In the later patch, we will start the dirty log
for SVM, just like migration, in this way, we can record both
the dirty pages caused by PVM and SVM, we only flush those dirty
pages from RAM cache while do checkpoint.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: Load dirty pages into SVM's RAM cache firstly

We should not load PVM's state directly into SVM, because there maybe some
errors happen when SVM is receving data, which will break SVM.

We need to ensure receving all data before load the state into SVM. We use
an extra memory to cache these data (PVM's ram). The ram cache in secondary side
is initially the same as SVM/PVM's memory. And in the process of checkpoint,
we cache the dirty pages of PVM into this ram cache firstly, so this ram cache
always the same as PVM's memory at every checkpoint, then we flush this cached ram
to SVM after we receive all PVM's state.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: Remove colo_state migration struct

We need to know if migration is going into COLO state for
incoming side before start normal migration.

Instead by using the VMStateDescription to send colo_state
from source side to destination side, we use MIG_CMD_ENABLE_COLO
to indicate whether COLO is enabled or not.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: Add block replication into colo process

Make sure master start block replication after slave's block
replication started.

Besides, we need to activate VM's blocks before goes into
COLO state.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

COLO: integrate colo compare with colo frame

For COLO FT, both the PVM and SVM run at the same time,
only sync the state while it needs.

So here, let SVM runs while not doing checkpoint, change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_semd and
colo_delay_timer, fix them here.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

colo-compare: use notifier to notify packets comparing result

It's a good idea to use notifier to notify COLO frame of
inconsistent packets comparing.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

colo-compare: implement the process of checkpoint

While do checkpoint, we need to flush all the unhandled packets,
By using the filter notifier mechanism, we can easily to notify
every compare object to do this process, which runs inside
of compare threads as a coroutine.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table

We add almost full TCP state machine in filter-rewriter, except
TCPS_LISTEN and some simplify in VM active close FIN states.
The reason for this simplify job is because guest kernel will track
the TCP status and wait 2MSL time too, if client resend the FIN packet,
guest will resend the last ACK, so we needn't wait 2MSL time in filter-rewriter.

After a net connection is closed, we didn't clear its related resources
in connection_track_table, which will lead to memory leak.

Let's track the state of net connection, if it is closed, its related
resources will be cleared up.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

cputlb: read CPUTLBEntry.addr_write atomically

Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.12% )
    31,574,905,303      cycles                    #    4.217 GHz                      ( +-  0.12% )
    57,097,908,812      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,255,415,367      branches                  # 1369.747 M/sec                    ( +-  0.08% )
       173,278,962      branch-misses             #    1.69% of all branches          ( +-  0.18% )

       7.504481349 seconds time elapsed                                          ( +-  0.14% )

- After:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,478,476,520      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,017,330,084      instructions              #    1.81  insns per cycle          ( +-  0.05% )
    10,251,929,667      branches                  # 1373.804 M/sec                    ( +-  0.05% )
       173,023,787      branch-misses             #    1.69% of all branches          ( +-  0.11% )

       7.474970463 seconds time elapsed                                          ( +-  0.07% )

2. SPEC06int:
                                              SPEC06int (test set)
                                           [Y axis: Speedup over master]
  1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
       |                                                                                                  |
   1.1 +-+.................................+++.............................+  tlb-lock-v2 (m+++x)       +-+
       |                                +++ |                   +++        tlb-lock-v3 (spinl|ck)         |
       |                    +++          |  |     +++    +++     |                           |            |
  1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
       |      ###         ++#| #         |# |# ***### +++### +++#+#     |     +++     |     #|#    ###    |
     1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
       |    *+* #    #++# ***  #   #### ***  # * *++# ****+# *| * # ****|#   |# #    #|#    #+#    # #    |
  0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
       |    * * #    #  # *|*  #   #  # *|*  # * *  # *++* # *  * # *  * # * |* #  ++# #    # #  *** #    |
       |    * * #  ++#  # *+*  #   #  # *|*  # * *  # *  * # *  * # *  * # *++* # **** #  ++# #  * * #    |
   0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
       |    * * #  ***  # * *  #  |#  # *+*  # * *  # *  * # *  * # *  * # *  * # *++* #   |# #  * * #    |
  0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
       |    * * #  *+*  # * *  # *|*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
       |    * * #  * *  # * *  # *+*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
   0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
       |    * * #  * *  # * *  # * *  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # *  * #  * * #    |
  0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

  png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
  a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20181016153840 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Check HAVE_ATOMIC128 and HAVE_CMPXCHG128 at translate

Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Skip wout, cout helpers if op helper does not return

When op raises an exception, it may not have initialized the output
temps that would be written back by wout or cout.

Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Split do_cdsg, do_lpq, do_stpq

Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/s390x: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128

Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/ppc: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/arm: Check HAVE_CMPXCHG128 at translate time

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/arm: Convert to HAVE_CMPXCHG128

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target/i386: Convert to HAVE_CMPXCHG128

Reviewed-by: Emilio G. Cota <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Split CONFIG_ATOMIC128

GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.

Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add tlb_index and tlb_entry helpers

Isolate the computation of an index from an address into a
helper before we change that function.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
[ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ]
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20181009175129 [email protected]>

cputlb: serialize tlb updates with env->tlb_lock

Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.

Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.

Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.

Tested-by: Alex Bennée <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20181009174557 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: fix assert_cpu_is_self macro

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Emilio G. Cota <[email protected]>
Message-Id: <20181009174557 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>