* remotes/bonzini/tags/for-upstream: (31 commits)
docs: create config/, devel/ and spin/ subdirectories
cpus: reset throttle_thread_scheduled after sleep
kvm: don't register smram_listener when smm is off
nbd: make it thread-safe, fix qcow2 over nbd
target/i386: Add GDB XML description for SSE registers
i386/kvm: do not zero out segment flags if segment is unusable or not present
edu: fix memory leak on msi_broken platforms
linuxboot_dma: compile for i486
kvmclock: update system_time_msr address forcibly
nbd: Fully initialize client in case of failed negotiation
sockets: improve error reporting if UNIX socket path is too long
i386: fix read/write cr with icount option
target/i386: use multiple CPU AddressSpaces
target/i386: enable A20 automatically in system management mode
virtio-scsi: Unset hotplug handler when unrealize
exec: simplify phys_page_find() params
nbd/client.c: use errp instead of LOG
nbd: add errp to read_sync, write_sync and drop_sync
nbd: add errp parameter to nbd_wr_syncv()
nbd: read_sync and friends: return 0 on success
...
Paolo Bonzini [Tue, 6 Jun 2017 14:46:26 +0000 (16:46 +0200)]
docs: create config/, devel/ and spin/ subdirectories
Developer documentation should be its own manual. As a start, move all
developer-oriented files to a separate directory.
Also move non-text files to their own directories: docs/config/ for
QEMU -readconfig input, and docs/spin/ for formal models to be used
with the SPIN model checker.
Felipe Franciosi [Fri, 19 May 2017 21:29:50 +0000 (22:29 +0100)]
cpus: reset throttle_thread_scheduled after sleep
Currently, the throttle_thread_scheduled flag is reset back to 0 before
sleeping (as part of the throttling logic). Given that throttle_timer
(well, any timer) may tick with a slight delay, it so happens that under
heavy throttling (ie. close or on CPU_THROTTLE_PCT_MAX) the tick may
schedule a further cpu_throttle_thread() work item after the flag reset,
but before the previous sleep completed. This results on the vCPU thread
sleeping continuously for potentially several seconds in a row.
The chances of that happening can be drastically minimised by resetting
the flag after the sleep.
Gonglei [Thu, 1 Jun 2017 11:35:15 +0000 (19:35 +0800)]
kvm: don't register smram_listener when smm is off
If the user set disable smm by '-machine smm=off', we
should not register smram_listener so that we can
avoid waster memory in kvm since the added sencond
address space.
Meanwhile we should assign value of the global kvm_state
before invoking the kvm_arch_init(), because
pc_machine_is_smm_enabled() may use it by kvm_has_mm().
Paolo Bonzini [Thu, 1 Jun 2017 10:44:56 +0000 (12:44 +0200)]
nbd: make it thread-safe, fix qcow2 over nbd
NBD is not thread safe, because it accesses s->in_flight without
a CoMutex. Fixing this will be required for multiqueue.
CoQueue doesn't have spurious wakeups but, when another coroutine can
run between qemu_co_queue_next's wakeup and qemu_co_queue_wait's
re-locking of the mutex, the wait condition can become false and
a loop is necessary.
In fact, it turns out that the loop is necessary even without this
multi-threaded scenario. A particular sequence of coroutine wakeups
is happening ~80% of the time when starting a guest with qcow2 image
served over NBD (i.e. qemu-nbd --format=raw, and QEMU's -drive option
has -format=qcow2). This patch fixes that issue too.
target/i386: Add GDB XML description for SSE registers
Add an XML description for SSE registers (XMM+MXCSR) for both X86
and X86-64 architectures in the GDB stub:
- configure: Define gdb_xml_files for the X86 targets (32 and 64bit).
- gdb-xml/i386-32bit-sse.xml & gdb-xml/i386-64bit-sse.xml: The XML files
that contain a description of the XMM + MXCSR registers.
- gdb-xml/i386-32bit.xml & gdb-xml/i386-64bit.xml: wrappers that include
the XML file of the core registers and the other XML file of the SSE registers.
- target/i386/cpu.c: Modify the gdb_core_xml_file to the new XML wrapper,
modify the gdb_num_core_regs to fit the registers number defined in each
XML file.
Roman Pen [Thu, 1 Jun 2017 08:56:04 +0000 (10:56 +0200)]
i386/kvm: do not zero out segment flags if segment is unusable or not present
This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
was taken on userspace stack. The root cause lies in the specific AMD CPU
behaviour which manifests itself as unusable segment attributes on SYSRET[2].
Here in this patch flags are not touched even segment is unusable or is not
present, therefore CPL (which is stored in DPL field) should not be lost and
will be successfully restored on kvm/svm kernel side.
Also current patch should not break desired behavior described in this commit:
4cae9c97967a ("target-i386: kvm: clear unusable segments' flags in migration")
since present bit will be dropped if segment is unusable or is not present.
This is the second part of the whole fix of the corresponding problem [1],
first part is related to kvm/svm kernel side and does exactly the same:
segment attributes are not zeroed out.
Paolo Bonzini [Wed, 31 May 2017 12:56:37 +0000 (14:56 +0200)]
edu: fix memory leak on msi_broken platforms
If msi_init fails, the thread has already been created and the
mutex/condvar are not destroyed. Initialize everything only
after the point where pci_edu_realize cannot fail.
Paolo Bonzini [Wed, 31 May 2017 12:37:15 +0000 (14:37 +0200)]
linuxboot_dma: compile for i486
The ROM uses the cmovne instruction, which is new in Pentium Pro and does not
work when running QEMU with "-cpu 486". Avoid producing that instruction.
Denis Plotnikov [Mon, 29 May 2017 10:49:04 +0000 (13:49 +0300)]
kvmclock: update system_time_msr address forcibly
Do an update of system_time_msr address every time before reading
the value of tsc_timestamp from guest's kvmclock page.
There is no other code paths which ensure that qemu has an up-to-date
value of system_time_msr. So, force this update on guest's tsc_timestamp
reading.
This bug causes effect on those nested setups which turn off TPR access
interception for L2 guests and that access being intercepted by L0 doesn't
show up in L1.
Linux bootstrap initiate kvmclock before APIC initializing causing TPR access.
That's why on L1 guests, having TPR interception turned on for L2, the effect
of the bug is not revealed.
This patch fixes this problem by making sure it knows the correct
system_time_msr address every time it is needed.
Eric Blake [Sat, 27 May 2017 03:04:21 +0000 (22:04 -0500)]
nbd: Fully initialize client in case of failed negotiation
If a non-NBD client connects to qemu-nbd, we would end up with
a SIGSEGV in nbd_client_put() because we were trying to
unregister the client's association to the export, even though
we skipped inserting the client into that list. Easy trigger
in two terminals:
nmap claims that it thinks it connected to a pago-services1
server (which probably means nmap could be updated to learn the
NBD protocol and give a more accurate diagnosis of the open
port - but that's not our problem), then terminates immediately,
so our call to nbd_negotiate() fails. The fix is to reorder
nbd_co_client_start() to ensure that all initialization occurs
before we ever try talking to a client in nbd_negotiate(), so
that the teardown sequence on negotiation failure doesn't fault
while dereferencing a half-initialized object.
While debugging this, I also noticed that nbd_update_server_watch()
called by nbd_client_closed() was still adding a channel to accept
the next client, even when the state was no longer RUNNING. That
is fixed by making nbd_can_accept() pay attention to the current
state.
sockets: improve error reporting if UNIX socket path is too long
The 'struct sockaddr_un' only allows 108 bytes for the socket
path.
If the user supplies a path, QEMU uses snprintf() to silently
truncate it when too long. This is undesirable because the user
will then be unable to connect to the path they asked for.
If the user doesn't supply a path, QEMU builds one based on
TMPDIR, but if that leads to an overlong path, it mistakenly
uses error_setg_errno() with a stale errno value, because
snprintf() does not set errno on truncation.
In solving this the code needed some refactoring to ensure we
don't pass 'un.sun_path' directly to any APIs which expect
NUL-terminated strings, because the path is not required to
be terminated.
Mihail Abakumov [Fri, 19 May 2017 09:36:15 +0000 (12:36 +0300)]
i386: fix read/write cr with icount option
Running Windows with icount causes a crash in instruction of write cr.
This patch fixes it.
Reading and writing cr cause an icount read because there are called
cpu_get_apic_tpr and cpu_set_apic_tpr functions. So, there is need
gen_io_start()/gen_io_end() calls.
Peter Maydell [Wed, 7 Jun 2017 15:29:29 +0000 (16:29 +0100)]
arm_gicv3: Fix ICC_BPR1 reset value when EL3 not implemented
If EL3 is not implemented (ie only one security state) then the
one and only ICC_BPR1 register behaves like the Non-secure
ICC_BPR1 in an EL3-present configuration. In particular, its
reset value is GIC_MIN_BPR_NS, not GIC_MIN_BPR.
Correct the erroneous reset value; this fixes a problem where
we might hit the assert added in commit a89ff39ee901.
# gpg: Signature made Wed 07 Jun 2017 10:02:01 BST
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <[email protected]>"
# gpg: aka "Juan Quintela <[email protected]>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/juanquintela/tags/migration/20170607:
qemu/migration: fix the double free problem on from_src_file
ram: Make RAMState dynamic
ram: Use MigrationStats for statistics
ram: Move ZERO_TARGET_PAGE inside XBZRLE
ram: Call migration_page_queue_free() at ram_migration_cleanup()
ram: We only print throttling information sometimes
ram: Unfold get_xbzrle_cache_stats() into populate_ram_info()
Peter Maydell [Wed, 7 Jun 2017 10:16:22 +0000 (11:16 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Wed 07 Jun 2017 04:29:20 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
Revert "Change net/socket.c to use socket_*() functions" again
net/rocker: Cleanup the useless return value check
* remotes/rth/tags/pull-s390-20170606: (70 commits)
target/s390x: addressing exceptions are suppressing
target/s390x: mark ETF2 and ETF2-ENH facilities as available
target/s390x: check alignment in CDSG in the !CONFIG_ATOMIC128 case
target/s390x: implement STORE PAIR TO QUADWORD
target/s390x: implement LOAD PAIR FROM QUADWORD
target/s390x: implement TRANSLATE ONE/TWO TO ONE/TWO
target/s390x: implement TEST DECIMAL
target/s390x: implement UNPACK UNICODE
target/s390x: implement UNPACK ASCII
target/s390x: implement PACK UNICODE
target/s390x: implement PACK ASCII
target/s390x: implement MOVE LONG UNICODE
target/s390x: implement COMPARE LOGICAL LONG UNICODE
target/s390x: improve MOVE LONG and MOVE LONG EXTENDED
target/s390x: fix adj_len_to_page
target/s390x: implement COMPARE LOGICAL LONG
target/s390x: fix COMPARE LOGICAL LONG EXTENDED
target/s390x: improve 24-bit and 31-bit lengths read/write
target/s390x: improve 24-bit and 31-bit addresses write
target/s390x: improve 24-bit and 31-bit addresses read
...
QingFeng Hao [Tue, 6 Jun 2017 05:24:38 +0000 (07:24 +0200)]
qemu/migration: fix the double free problem on from_src_file
In load_snapshot, mis->from_src_file is freed twice, the first free is by
qemu_fclose, the second is by migration_incoming_state_destroy and
it causes Illegal instruction exception. The fix is just to remove the
first free.
This problem is found by qemu-iotests case 068 since commit
"660819b migration: shut src return path unconditionally". The error is:
068 1s ... - output mismatch (see 068.out.bad)
--- tests/qemu-iotests/068.out 2017-05-06 01:00:26.417270437 +0200
+++ 068.out.bad 2017-06-03 13:59:55.360274640 +0200
@@ -6,6 +6,8 @@
QEMU X.Y.Z monitor - type 'help' for more information
(qemu) savevm 0
(qemu) quit
+./common.config: line 107: 242472 Illegal instruction (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
+ echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) quit
-*** done
+(qemu) *** done
Juan Quintela [Tue, 6 Jun 2017 17:49:03 +0000 (19:49 +0200)]
ram: Use MigrationStats for statistics
RAM Statistics need to survive migration to make info migrate work, so we
need to store them outside of RAMState. As we already have an struct
with those fields, just used them. (MigrationStats and XBZRLECacheStats).
This code changed net/socket.c from using socket()+connect(),
to using socket_connect(). In theory this is great, but in
practice this has completely broken the ability to connect
the frontend and backend:
The old code would call net_socket_fd_init() synchronously,
while letting the connect() complete in the backgorund. The
new code moved net_socket_fd_init() so that it is only called
after connect() completes in the background.
Thus at the time we initialize the NIC frontend, the backend
does not exist.
The socket_connect() conversion as done is a bad fit for the
current code, since it did not try to change the way it deals
with async connection completion. Rather than try to fix this,
just revert the socket_connect() conversion entirely.
The code is about to be converted to use QIOChannel which
will let the problem be solved in a cleaner manner. This
revert is more suitable for stable branches in the meantime.
Mao Zhongyi [Wed, 24 May 2017 02:57:18 +0000 (10:57 +0800)]
net/rocker: Cleanup the useless return value check
None of pci_dma_read()'s callers check the return value except
rocker. There is no need to check it because it always return
0. So the check work is useless. Remove it entirely.
target/s390x: addressing exceptions are suppressing
We have to make the address in the old PSW point at the next
instruction, as addressing exceptions are suppressing and not
nullifying.
I assume that there are a lot of other broken cases (as most instructions
we care about are suppressing) - all trigger_pgm_exception() specifying
and explicit number or ILEN_LATER look suspicious, however this is another
story that might require bigger changes (and I have to understand when
the address might already have been incremented first).
This is needed to make an upcoming kvm-unit-test work.
Aurelien Jarno [Sun, 4 Jun 2017 20:20:34 +0000 (22:20 +0200)]
target/s390x: check alignment in CDSG in the !CONFIG_ATOMIC128 case
The CDSG instruction requires a 16-byte alignement, as expressed in
the MO_ALIGN_16 passed to helper_atomic_cmpxchgo_be_mmu. In the non
parallel case, use check_alignment to enforce this.
Aurelien Jarno [Wed, 31 May 2017 22:01:20 +0000 (00:01 +0200)]
target/s390x: implement COMPARE LOGICAL LONG UNICODE
For that we need to make program_interrupt available to qemu-user.
Fortunately there is almost nothing to change as both kvm_enabled and
CONFIG_KVM evaluate to false in that case.
Aurelien Jarno [Wed, 31 May 2017 22:01:19 +0000 (00:01 +0200)]
target/s390x: improve MOVE LONG and MOVE LONG EXTENDED
As MVCL and MVCLE only differ by their operands, use a common
do_mvcl helper. Optimize it calling fast_memmove and fast_memset.
Correctly write back addresses. Check that r1 and r2/r3 registers
are even.
Aurelien Jarno [Wed, 31 May 2017 22:01:16 +0000 (00:01 +0200)]
target/s390x: fix COMPARE LOGICAL LONG EXTENDED
There are multiple issues with the COMPARE LOGICAL LONG EXTENDED
instruction:
- The test between the two operands is inverted, leading to an inversion
of the cc values 1 and 2.
- The address and length of an operand continue to be decreased after
reaching the end of this operand. These values are then wrong write
back to the registers.
- We should limit the amount of bytes to process, so that interrupts can
be served correctly.
At the same time rename dest into src1 and src into src3 to match the
operand names and make the code less confusing.
Aurelien Jarno [Wed, 31 May 2017 22:01:13 +0000 (00:01 +0200)]
target/s390x: improve 24-bit and 31-bit addresses read
Improve fix_address to also handle the 24-bit mode. Rename fix_address
to wrap_address to better explain what is changed.
Replace the calls to get_address with x2 = 0 and b2 = 0 by
call to wrap_address, leading to the removal of this function. Rename
get_address_31fix into get_address.
Thomas Huth [Thu, 25 May 2017 09:22:12 +0000 (11:22 +0200)]
target/s390x/cpu_models: Allow some additional feature bits for the "qemu" CPU
Currently we only present the plain z900 feature bits to the guest,
but QEMU already emulates some additional features (but not all of
the next CPU generation, so we can not use the next CPU level as
default yet). Since newer Linux kernels are checking the feature bits
and refuse to work if a required feature is missing, it would be nice
to have a way to present more of the supported features when we are
running with the "qemu" CPU.
This patch now adds the supported features to the "full_feat" bitmap,
so that additional features can be enabled on the command line now,
for example with:
target/s390x: Re-implement a few EXECUTE target insns directly
While the previous patch is required for proper conformance,
the vast majority of target insns are MVC and XC for implementing
memmove and memset respectively. The next most common are CLC,
TR, and SVC.
Implementing these (and a few others for which we already have
an implementation) directly is faster than going through full
translation to a TB.
target/s390x: Implement EXECUTE via new TranslationBlock
Previously, helper_ex would construct the insn and then implement
the insn via direct calls other helpers. This was sufficient to
boot Linux but that is all.
It is easy enough to go the whole nine yards by stashing state for
EXECUTE within the cpu, and then rely on a new TB to be created
that properly and completely interprets the insn.
(1) The OR of the low bits or R1 into INSN were not being done
consistently; it was forgotten along all but the SVC path.
(2) The setting of ILEN was wrong on SVC path for EXRL.
(3) The data load for ICM read too much.
Fix these by consolidating data load at the beginning, using
get_ilen to control the number of bytes loaded, and ORing in
the byte from R1. Use extract64 from the full aligned insn
to extract arguments.
Pass in ILEN rather than RET as the more natural way to give
the required data along the SVC path.
Modify ENV->CC_OP directly rather than include it in the
functional interface.