block/gluster: add support to choose libgfapi logfile
currently all the libgfapi logs defaults to '/dev/stderr' as it was hardcoded
in a call to glfs logging api. When the debug level is chosen to DEBUG/TRACE,
gfapi logs will be huge and fill/overflow the console view.
This patch provides a commandline option to mention log file path which helps
in logging to the specified file and also help in persisting the gfapi logs.
Peter Maydell [Mon, 12 Sep 2016 11:48:47 +0000 (12:48 +0100)]
Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-09-12-1' into staging
Merge qcrypto 2016/09/12 v1
# gpg: Signature made Mon 12 Sep 2016 12:02:20 BST
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg: aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qcrypto-2016-09-12-1:
crypto: report enum strings instead of values in errors
crypto: fix building complaint
crypto: ensure XTS is only used with ciphers with 16 byte blocks
crypto: report enum strings instead of values in errors
Several error messages print out the raw enum value, which
is less than helpful to users, as these values are not
documented, nor stable across QEMU releases. Switch to use
the enum string instead.
The nettle impl also had two typos where it mistakenly
said "algorithm" instead of "mode", and actually reported
the algorithm value too.
gnutls commit 846753877d renamed LIBGNUTLS_VERSION_NUMBER to GNUTLS_VERSION_NUMBER.
If using gnutls before that verion, we'll get the below warning:
crypto/tlscredsx509.c:618:5: warning: "GNUTLS_VERSION_NUMBER" is not defined
Because gnutls 3.x still defines LIBGNUTLS_VERSION_NUMBER for back compat, Let's
use LIBGNUTLS_VERSION_NUMBER instead of GNUTLS_VERSION_NUMBER to fix building
complaint.
crypto: ensure XTS is only used with ciphers with 16 byte blocks
The XTS cipher mode needs to be used with a cipher which has
a block size of 16 bytes. If a mis-matching block size is used,
the code will either corrupt memory beyond the IV array, or
not fully encrypt/decrypt the IV.
This fixes a memory corruption crash when attempting to use
cast5-128 with xts, since the former has an 8 byte block size.
A test case is added to ensure the cipher creation fails with
such an invalid combination.
Peter Maydell [Mon, 12 Sep 2016 10:25:40 +0000 (11:25 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio,vhost,pc: fixes and updates
balloon fixes wrt migration
virtio-vsock device support
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 09 Sep 2016 22:36:13 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
vhost-vsock: add virtio sockets device
tests/acpi: speedup acpi tests
virtio-pci: minor refactoring
vhost: don't set vring call if no vector
virtio-pci: error out when both legacy and modern modes are disabled
virtio-balloon: fix stats vq migration
virtio: add virtqueue_rewind()
virtio-balloon: discard virtqueue element on reset
virtio: zero vq->inuse in virtio_reset()
virtio-pci: reduce modern_mem_bar size
target-i386: present virtual L3 cache info for vcpus
pc: Add 2.8 machine
virtio-pci: use size from correct structure
virtio: Tell the user what went wrong when event_notifier_init failed
Stefan Hajnoczi [Tue, 16 Aug 2016 12:27:22 +0000 (13:27 +0100)]
vhost-vsock: add virtio sockets device
Implement the new virtio sockets device for host<->guest communication
using the Sockets API. Most of the work is done in a vhost kernel
driver so that virtio-vsock can hook into the AF_VSOCK address family.
The QEMU vhost-vsock device handles configuration and live migration
while the rx/tx happens in the vhost_vsock.ko Linux kernel driver.
The vsock device must be given a CID (host-wide unique address):
Jason Wang [Mon, 1 Aug 2016 08:07:58 +0000 (16:07 +0800)]
vhost: don't set vring call if no vector
We used to set vring call fd unconditionally even if guest driver does
not use MSIX for this vritqueue at all. This will cause lots of
unnecessary userspace access and other checks for drivers does not use
interrupt at all (e.g virtio-net pmd). So check and clean vring call
fd if guest does not use any vector for this virtqueue at
all.
Perf diffs (on rx) shows lots of cpus wasted on vhost_signal() were saved:
Greg Kurz [Fri, 9 Sep 2016 09:00:59 +0000 (11:00 +0200)]
virtio-pci: error out when both legacy and modern modes are disabled
Without presuming if we got there because of a user mistake or some
more subtle bug in the tooling, it really does not make sense to
implement a non-functional device.
The statistics virtqueue is not migrated properly because virtio-balloon
does not include s->stats_vq_elem in the migration stream.
After migration the statistics virtqueue hangs because the host never
completes the last element (s->stats_vq_elem is NULL on the destination
QEMU). Therefore the guest never submits new elements and the virtqueue
is hung.
Instead of changing the migration stream format in an incompatible way,
detect the migration case and rewind the virtqueue so the last element
can be completed.
Stefan Hajnoczi [Wed, 7 Sep 2016 15:20:48 +0000 (17:20 +0200)]
virtio: add virtqueue_rewind()
virtqueue_discard() requires a VirtQueueElement but virtio-balloon does
not migrate its in-use element. Introduce a new function that is
similar to virtqueue_discard() but doesn't require a VirtQueueElement.
This will allow virtio-balloon to access element again after migration
with the usual proviso that the guest may have modified the vring since
last time.
virtio-balloon: discard virtqueue element on reset
The one pending element is being freed but not discarded on device
reset, which causes svq->inuse to creep up, eventually hitting the
"Virtqueue size exceeded" error.
Properly discarding the element on device reset makes sure that its
buffers are unmapped and the inuse counter stays balanced.
Stefan Hajnoczi [Wed, 7 Sep 2016 15:51:25 +0000 (11:51 -0400)]
virtio: zero vq->inuse in virtio_reset()
vq->inuse must be zeroed upon device reset like most other virtqueue
fields.
In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests cannot
not be leaked!).
In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests. Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up. Therefore
vq->inuse is not decremented during reset.
This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.
I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.
Currently each VQ Notification Virtio Capability is allocated
on a different page. The idea is to enable split drivers within
guests, however there are no known plans to do that.
The allocation will result in a 8MB BAR, more than various
guest firmwares pre-allocates for PCI Bridges hotplug process.
Reserve 4 bytes per VQ by default and add a new parameter
"page-per-vq" to be used with split drivers.
target-i386: present virtual L3 cache info for vcpus
Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:
In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.
For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
No-L3 With-L3(applied this patch)
cpu0: 363890 44582
cpu1: 373405 43109
cpu2: 340783 43797
cpu3: 333854 43409
cpu4: 327170 40038
cpu5: 325491 39922
cpu6: 319129 42391
cpu7: 306480 41035
cpu8: 161139 32188
cpu9: 164649 31024
cpu10: 149823 30398
cpu11: 149823 32455
cpu12: 164830 35143
cpu13: 172269 35805
cpu14: 179979 33898
cpu15: 194505 32754
avg: 268963.6 40129.8
The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.
For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).
PIO MR registration should use size from the correct notify struct.
Doesn't affect any visible behaviour because the field values are the
same (both are 4).
Thomas Huth [Mon, 27 Jun 2016 22:12:03 +0000 (00:12 +0200)]
virtio: Tell the user what went wrong when event_notifier_init failed
event_notifier_init() can fail in real life, for example when there
are not enough open file handles available (EMFILE) when using a lot
of devices. So instead of leaving the average user with a cryptic
error number only, print out a proper error message with strerror()
instead, so that the user has a better way to figure out what is
going on and that using "ulimit -n" might help here for example.
Peter Maydell [Fri, 9 Sep 2016 11:49:41 +0000 (12:49 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging
# gpg: Signature made Fri 09 Sep 2016 05:54:35 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-pull-request:
docker: silence debootstrap when --quiet is given
docker: build debootstrap after cloning
docker: make sure debootstrap is at least 1.0.67
docker: print warning if EXECUTABLE is not set when building debootstrap image
docker: debian-bootstrap.pre: print helpful message if DEB_ARCH/DEB_TYPE unset
docker: debian-bootstrap.pre: print error messages to stderr
docker: avoid dependency on 'realpath' package
docker.py: don't hang on large docker output
docker: Add a glib2-2.22 image
Peter Maydell [Fri, 5 Aug 2016 10:43:20 +0000 (11:43 +0100)]
qtest.c: Allow zero size in memset qtest commands
Some tests use the qtest protocol "memset" command with a zero
size, expecting it to do nothing. However in the current code this
will result in calling memset() with a NULL pointer, which is
undefined behaviour. Detect and specially handle zero sizes to
avoid this.
Peter Maydell [Thu, 8 Sep 2016 14:22:50 +0000 (15:22 +0100)]
Merge remote-tracking branch 'remotes/elmarco/tags/leak-pull-request' into staging
Pull request
v2:
- dropped "tests: fix small leak in test-io-channel-command" that Daniel Berrange will pick
- fixed "tests: add qtest_add_data_func_full" to work with glib < 2.26
# gpg: Signature made Thu 08 Sep 2016 15:16:54 BST
# gpg: using RSA key 0xDAE8E10975969CE5
# gpg: Good signature from "Marc-André Lureau <[email protected]>"
# gpg: aka "Marc-André Lureau <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5
Allows one to specify a destroy function for the test data.
Add a fallback using glib g_test_add_vtable() internal function, whose
signature changed over time. Tested with glib 2.22, 2.26 and 2.48, which
according to git log should be enough to cover all variations.
machine_class_base_init() member name is allocated by
machine_class_base_init(), but not freed by
machine_class_finalize(). Simply freeing there doesn't work,
because DEFINE_PC_MACHINE() overwrites it with a literal string.
Fix DEFINE_PC_MACHINE() not to overwrite it, and add the missing
free to machine_class_finalize().
The isa_register_portio_list() function allocates ioports
data/state. Let's keep the reference to this data on some owner. This
isn't enough to fix leaks, but at least, ASAN stops complaining of
direct leaks. Further cleanup would require calling
portio_list_del/destroy().
If we silence docker when --quiet is given, we should also silence the
.pre script (i.e. debootstrap).
Only discards stdout, so some diagnostics (e.g. from git clone) are
still printed. Most of the verbose output is gone however and this way
we still have a chance to see error messages.
When using the git version of debootstrap (because no usable version
of debootstrap was installed on the host), we need to run 'make' so
that devices.tar.gz gets built. Otherwise the first debootstrap stage
will fail without printing any error message.
debootstrap prior to 1.0.67 generated an empty sources.list during
foreign bootstraps (Debian#732255 [1]). Fall back to the git checkout
if the installed debootstrap version is too old.
docker: print warning if EXECUTABLE is not set when building debootstrap image
Building the debian-debootstrap image will usually fail if EXECUTABLE
isn't set (when using the Makefile). Warn the user in this case so
they know why it's failing.
docker: debian-bootstrap.pre: print helpful message if DEB_ARCH/DEB_TYPE unset
The debian-bootstrap image doesn't choose a default architecture and
distribution version, instead the user has to set both DEB_ARCH and
DEB_TYPE in the environment. Print a reasonably helpful message if
either of them isn't set instead of complaining about "qemu-" being
missing or erroring out because we cannot cd to the mirror URL.
Unlike Popen.communicate(), subprocess.call() doesn't read from the
stdout file descriptor. If the child process produces more output than
fits into the pipe buffer, it will block indefinitely.
If we don't intend to consume the output, just send it straight to
/dev/null to avoid this issue.
Peter Maydell [Thu, 8 Sep 2016 10:28:11 +0000 (11:28 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20160907' into staging
ppc patch queue for 2016-Sep-7
This is my first pull request for the newly opened qemu-2.8 tree. It
contains a heap of things that were too late for 2.7 and have been
queued for a while. In particular:
* A number of preliminary patches for the powernv machine type
* A substantial cleanup of exception handling which will be
necessary to support running a TCG with hypervisor
facilities
* A start on support for POWER9
* Some TCG implementations for new POWER9 instructions
* Some TCG and related cleanups in preparation for POWER9
* Some assorted TCG optimizations
* An implementation of the H_CHANGE_LOGICAL_LAN_MAC hypercall
which allows the MAC address to be changed on the PAPR virtual
NIC.
* Add some extra test cases for several machines (this isn't
strictly in the ppc code, but is most value to ppc)
NOTE: This pull request supersedes ppc-for-2.8-20160906, which had
some problems. Changes:
* Dropped BenH's lmw/stmw speedups, which break for
qemu-system-ppc64 on BE hosts
* A small fix to Thomas' serial output test to avoid a warning on
the isapc machine type.
* Some trivial checkpatch fixes
Note that some of the patches in this series still have large numbers
of checkpatch warnings. This is because they're moving existing code
that predates most of the checkpatch style conventions.
* remotes/dgibson/tags/ppc-for-2.8-20160907: (64 commits)
tests: Check serial output of firmware boot of some machines
tests: Resort check-qtest entries in Makefile.include
spapr: implement H_CHANGE_LOGICAL_LAN_MAC h_call
ppc: Improve a few more helper flags
ppc: Improve the exception helpers flags
ppc: Improve flags for helpers loading/writing the time facilities
ppc: Don't generate dead code on unconditional branches
ppc: Stop dumping state on all exceptions in linux-user
ppc: Fix catching some segfaults in user mode
ppc: Fix macio ESCC legacy mapping
hw/ppc: add a ppc_create_page_sizes_prop() helper routine
hw/ppc: use error_report instead of fprintf
ppc: Rename #include'd .c files to .inc.c
target-ppc: add extswsli[.] instruction
target-ppc: add vsrv instruction
target-ppc: add vslv instruction
target-ppc: add vcmpnez[b,h,w][.] instructions
target-ppc: add vabsdu[b,h,w] instructions
target-ppc: add dtstsfi[q] instructions
target-ppc: implement branch-less divd[o][.]
...
Thomas Huth [Sat, 3 Sep 2016 09:57:51 +0000 (11:57 +0200)]
tests: Check serial output of firmware boot of some machines
Some of the machines that we have got a firmware image for write
some output to the serial console while booting up. We can use
this output to make sure that the machine is basically working,
so this adds a test that checks the output of these machines
for some well-known "magic" strings.
Thomas Huth [Sat, 3 Sep 2016 09:57:50 +0000 (11:57 +0200)]
tests: Resort check-qtest entries in Makefile.include
The rather random list of check-qtest-xxx entries caused some
confusion in the past, where to use "=" and where to use "+="
(see commits 0ccac16f59462b8e2b9afbc1 and 1f5c1cfbaec0792cd2e5da
for example).
Sorting the check-qtest-xxx entries by architecure instead and
using some empty lines inbetween should help to ease this
situation a little bit, so that it is hopefully now obvious
that new tests should be added with "+=" instead of "=".
While we are at it, this patch also comments out two of the
"gcov-files-..." lines since the corresponding m48t59-test is
disabled for sparc and sparc64, too.
Mostly turn "store" type of helpers into TCG_CALL_NO_WG because
they can take exceptions. Also fixup_thrm doesn't read nor write
the tracked environment.
ppc: Improve flags for helpers loading/writing the time facilities
Those helpers never load from or store to the TCG tracked environment,
not do they generate synchronous exceptions (they might generate an
asynchronous interrupt but that's not an issue here).
ppc: Stop dumping state on all exceptions in linux-user
Other archs don't do it, some programs catch signals just fine
and those dumps just clutter the output. Keep the dumps for cases
that aren't supposed to happen such as unknown codes.
The usermode "translate" code generates an error code value that
has the "is_write" bit set, which causes our switch/case to miss
and display "Invalid segfault errno" and a spurrious second state
dump. Fix it.
The current mapping, while correct for the base ports (which is all the
driver uses these days), is wrong for the extended registers.
I suspect the bugs come from incorrect tables in the CHRP IO Ref document,
I have verified the new values here match Apple's MacTech.pdf.
Note: Nothing that I know of actually uses these registers so it's not a
huge deal, but this patch has the added advantage of adding comments to
document what the registers are.
vcmpnezb[.]: Vector Compare Not Equal or Zero Byte
vcmpnezh[.]: Vector Compare Not Equal or Zero Halfword
vcmpnezw[.]: Vector Compare Not Equal or Zero Word
While implementing modulo instructions figured out that the
implementation uses many branches. Change the logic to achieve the
branch-less code. Undefined value is set to dividend in case of invalid
input.
The current alignment exception generation tries to load the opcode
to put in DSISR from a context where a cpu_ldl_code() is really not
a good idea. It might fault and longjmp out and that's not something
we want happening here.
Instead, pass the releavant opcode bits via the error_code.
There are a couple of cases of alignment interrupts that won't set
anything, the ones coming from access to direct store segments, but
that doesn't happen in practice, nobody used direct store segments
and they are gone from newer chips.
ppc: Don't update NIP in facility unavailable interrupts
This is no longer necessary as the helpers will properly retrieve
the return address when needed. Also remove gen_update_current_nip()
which didn't seem to make much sense to me.
Instead, pass GETPC() result to the corresponding helpers. This
requires a bit of fiddling to get the PC (hopefully) right in
the case where we generate a program check, though the hacks there
are temporary, a subsequent patch will clean this all up by always
having the nip already set to the right instruction when taking
the fault.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[dwg: Fix trivial checkpatch warning] Signed-off-by: David Gibson <[email protected]>
We don't implement imprecise FP exceptions and using store_current
which sets SRR1 to the *previous* instruction never makes sense
for these. So let's be truthful and make them precise, which is
allowed by the architecture.