Peter Maydell [Fri, 5 Sep 2014 15:03:56 +0000 (16:03 +0100)]
Merge remote-tracking branch 'remotes/afaerber/tags/qom-cpu-for-peter' into staging
QOM CPUState and X86CPU
* Include exception state in CPU VMState
* Fix -cpu *,migratable=foo
* Error out on unknown -cpu *,+foo,-bar
# gpg: Signature made Fri 05 Sep 2014 15:38:14 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <[email protected]>"
# gpg: aka "Andreas Färber <[email protected]>"
* remotes/afaerber/tags/qom-cpu-for-peter:
target-i386: Reject invalid CPU feature names on the command-line
target-i386: Support migratable=no properly
exec: Save CPUState::exception_index field
Eduardo Habkost [Wed, 20 Aug 2014 20:30:12 +0000 (17:30 -0300)]
target-i386: Support migratable=no properly
When the "migratable" property was implemented, the behavior was tested
by changing the default on the code, but actually using the option on
the command-line (e.g. "-cpu host,migratable=false") doesn't work as
expected. This is a regression for a common use case of "-cpu host",
which is to enable features that are supported by the host CPU + kernel
before feature-specific code is added to QEMU.
Fix this by initializing the feature words for "-cpu host" on
x86_cpu_parse_featurestr(), right after parsing the CPU options.
Pavel Dovgaluk [Thu, 31 Jul 2014 05:41:17 +0000 (09:41 +0400)]
exec: Save CPUState::exception_index field
This patch adds a subsection with exception_index field to the VMState for
correct saving the CPU state.
Without this patch, simulator could miss the pending exception in the saved
virtual machine state.
If we need to, we should use the pixman formats instead but for
now this is unused except in commented out code so take it out
to avoid further confusion about surface endianness.
Gerd Hoffmann [Thu, 19 Jun 2014 06:52:17 +0000 (08:52 +0200)]
console: add dpy_gfx_update_dirty
Calls dpy_gfx_update for all dirty scanlines. Works for
DisplaySurfaces backed by guest memory (i.e. the ones created
using qemu_create_displaysurface_guestmem).
Gerd Hoffmann [Thu, 19 Jun 2014 06:46:08 +0000 (08:46 +0200)]
console: add qemu_create_displaysurface_guestmem
This patch adds a qemu_create_displaysurface_guestmem helper function.
Works simliar to qemu_create_displaysurface_from, but accepts a
guest address instead of a host pointer and it handles
cpu_physical_memory_{map,unmap} for you.
Gerd Hoffmann [Wed, 18 Jun 2014 09:03:15 +0000 (11:03 +0200)]
console: stop using PixelFormat
With this patch the qemu console core stops using PixelFormat and pixman
format codes side-by-side, pixman format code is the primary way to
specify the DisplaySurface format:
* DisplaySurface stops carrying a PixelFormat field.
* qemu_create_displaysurface_from() expects a pixman format now.
Functions to convert PixelFormat to pixman_format_code_t (and back)
exist for those who still use PixelFormat. As PixelFormat allows
easy access to masks and shifts it will probably continue to exist.
Gerd Hoffmann [Wed, 18 Jun 2014 09:07:50 +0000 (11:07 +0200)]
console: reimplement qemu_default_pixelformat
Use the new qemu_pixelformat_from_pixman and qemu_default_pixman_format
functions to reimplement qemu_default_pixelformat
(qemu_different_endianness_pixelformat too).
Sebastian Tanase [Mon, 28 Jul 2014 11:39:14 +0000 (13:39 +0200)]
pty: Fix byte loss bug when connecting to pty
When trying to print data to the pty, we first check if it is connected.
If not, we try to reconnect, but we drop the pending data even if we
have successfully reconnected; this makes us lose the first byte of the very
first transmission.
This small fix addresses the issue by checking once more if the pty is connected
after having tried to reconnect.
Peter Maydell [Fri, 5 Sep 2014 11:26:33 +0000 (12:26 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-cve-2014-3615-20140905-1' into staging
CVE-2014-3615: fix sanity checks in vbe (bochs dispi) and spice.
# gpg: Signature made Fri 05 Sep 2014 12:18:04 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg: aka "Gerd Hoffmann <[email protected]>"
# gpg: aka "Gerd Hoffmann (private) <[email protected]>"
* remotes/kraxel/tags/pull-cve-2014-3615-20140905-1:
spice: make sure we don't overflow ssd->buf
vbe: rework sanity checks
vbe: make bochs dispi interface return the correct memory size with qxl
Related spice-only bug. We have a fixed 16 MB buffer here, being
presented to the spice-server as qxl video memory in case spice is
used with a non-qxl card. It's also used with qxl in vga mode.
When using display resolutions requiring more than 16 MB of memory we
are going to overflow that buffer. In theory the guest can write,
indirectly via spice-server. The spice-server clears the memory after
setting a new video mode though, triggering a segfault in the overflow
case, so qemu crashes before the guest has a chance to do something
evil.
Fix that by switching to dynamic allocation for the buffer.
Peter Maydell [Thu, 4 Sep 2014 18:41:15 +0000 (19:41 +0100)]
Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging
QOM infrastructure fixes and device conversions
* Cleanups for recursive device unrealization
# gpg: Signature made Thu 04 Sep 2014 18:17:35 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <[email protected]>"
# gpg: aka "Andreas Färber <[email protected]>"
* remotes/afaerber/tags/qom-devices-for-peter:
qdev: Add cleanup logic in device_set_realized() to avoid resource leak
qdev: Use NULL instead of local_err for qbus_child unrealize
qdev: Use error_abort instead of using local_err
memory: Remove object_property_add_child_array()
qom: Add automatic arrayification to object_property_add()
machine: Clean up -machine handling
qom: Make object_child_foreach() safe for objects removal
Peter Maydell [Thu, 4 Sep 2014 17:34:28 +0000 (18:34 +0100)]
Merge remote-tracking branch 'remotes/kvaneesh/for-upstream' into staging
* remotes/kvaneesh/for-upstream:
hw/9pfs: Don't return type from host in readdir on local 9p filesystem
hw/9pfs: Use little-endian format for xattr values
qdev: Add cleanup logic in device_set_realized() to avoid resource leak
At present, this function doesn't have partial cleanup implemented,
which will cause resource leaks in some scenarios.
Example:
1. Assume that "dc->realize(dev, &local_err)" executes successful
and local_err == NULL;
2. device hotplug in hotplug_handler_plug() executes but fails
(it is prone to occur). Then local_err != NULL;
3. error_propagate(errp, local_err) and return. But the resources
which have been allocated in dc->realize() will be leaked.
Simple backtrace:
dc->realize()
|->device_realize
|->pci_qdev_init()
|->do_pci_register_device()
|->etc.
Add fuller cleanup logic which assures that function can
goto appropriate error label as local_err population is
detected at each relevant point.
qdev: Use NULL instead of local_err for qbus_child unrealize
Forcefully unrealize all children regardless of errors in earlier
iterations (if any). We should keep going with cleanup operation
rather than report an error immediately. Therefore store the first
child unrealization failure and propagate it at the end. We also
forcefully unregister vmsd and unrealize actual object, too.
Peter Maydell [Thu, 4 Sep 2014 16:39:07 +0000 (17:39 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging
Net patches
# gpg: Signature made Thu 04 Sep 2014 17:32:44 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
* remotes/stefanha/tags/net-pull-request:
virtio-net: purge outstanding packets when starting vhost
net: complete all queued packets on VM stop
net: invoke callback when purging queue
virtio: don't call device on !vm_running
virtio-net: don't run bh on vm stopped
net: Forbid dealing with packets when VM is not running
devices rely on packet callbacks eventually running,
but we violate this rule whenever we purge the queue.
To fix, invoke callbacks on all packets on purge.
Set length to 0, this way callers can detect that
this happened and re-queue if necessary.
is incomplete: BH might execute within the same main loop iteration but
after vmstop, so in theory, we might trigger an assertion.
I was unable to reproduce this in practice,
but it seems clear enough that the potential is there, so worth fixing.
Bastian Blank [Fri, 22 Aug 2014 09:22:21 +0000 (13:22 +0400)]
hw/9pfs: Don't return type from host in readdir on local 9p filesystem
When using mapped mode in 9pfs, readdir implementation
should not return file type in d_type from the host
readdir, instead, it should use the type stored in
the extended attributes. Since d_type is optional
and reading ext attrs for every readdir is expensive,
it should be sufficient to just set d_type to DT_UNKNOWN,
so guest will know to look it up separately.
This error can not happen normally. If it happens, it indicates
something very wrong, we should abort QEMU. Moreover, the
user can only refer to /machine/peripheral or /objects, not
/machine/unattached.
While at it, remove superfluous check about local_err.
qom: Add automatic arrayification to object_property_add()
If "[*]" is given as the last part of a QOM property name, treat that
as an array property. The added property is given the first available
name, replacing the * with a decimal number counting from 0.
First add with name "foo[*]" will be "foo[0]". Second "foo[1]" and so
on.
Callers may inspect the ObjectProperty * return value to see what
number the added property was given.
Andreas Färber [Wed, 23 Jul 2014 14:16:26 +0000 (16:16 +0200)]
machine: Clean up -machine handling
Since commit c4090f8, -object options are no longer handled through
object_set_property(), so clean up -object leftovers by renaming the
function and dropping special-casing of qom-type and id properties.
zhanghailiang [Tue, 26 Aug 2014 08:06:17 +0000 (16:06 +0800)]
net: Forbid dealing with packets when VM is not running
For all NICs(except virtio-net) emulated by qemu,
Such as e1000, rtl8139, pcnet and ne2k_pci,
Qemu can still receive packets when VM is not running.
If this happened in *migration's* last PAUSE VM stage, but
before the end of the migration, the new receiving packets will possibly dirty
parts of RAM which has been cached in *iovec*(will be sent asynchronously) and
dirty parts of new RAM which will be missed.
This will lead serious network fault in VM.
To avoid this, we forbid receiving packets in generic net code when
VM is not running.
Bug reproduction steps:
(1) Start a VM which configured at least one NIC
(2) In VM, open several Terminal and do *Ping IP -i 0.1*
(3) Migrate the VM repeatedly between two Hosts
And the *PING* command in VM will very likely fail with message:
'Destination HOST Unreachable', the NIC in VM will stay unavailable unless you
run 'service network restart'
Peter Maydell [Thu, 4 Sep 2014 11:20:41 +0000 (12:20 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features
A bunch of bugfixes - these will make sense for 2.1.1
Initial Intel IOMMU support.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Wed 03 Sep 2014 14:41:23 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
acpi-build: Set FORCE_APIC_CLUSTER_MODEL bit for FADT flags
vhost-scsi: init backend features earlier
vhost_net: init acked_features to backend_features
vhost_net: start/stop guest notifiers properly
I accidentally merged the wrong version of a pull request
which had a buggy version of this patch. Reverting the
buggy version means we can then cleanly merge in the correct
pull with the corrected change.
Gerd Hoffmann [Tue, 26 Aug 2014 13:35:23 +0000 (15:35 +0200)]
vbe: rework sanity checks
Plug a bunch of holes in the bochs dispi interface parameter checking.
Add a function doing verification on all registers. Call that
unconditionally on every register write. That way we should catch
everything, even changing one register affecting the valid range of
another register.
Some of the holes have been added by commit e9c6149f6ae6873f14a12eea554925b6aa4c4dec. Before that commit the
maximum possible framebuffer (VBE_DISPI_MAX_XRES * VBE_DISPI_MAX_YRES *
32 bpp) has been smaller than the qemu vga memory (8MB) and the checking
for VBE_DISPI_MAX_XRES + VBE_DISPI_MAX_YRES + VBE_DISPI_MAX_BPP was ok.
Some of the holes have been there forever, such as
VBE_DISPI_INDEX_X_OFFSET and VBE_DISPI_INDEX_Y_OFFSET register writes
lacking any verification.
Security impact:
(1) Guest can make the ui (gtk/vnc/...) use memory rages outside the vga
frame buffer as source -> host memory leak. Memory isn't leaked to
the guest but to the vnc client though.
(2) Qemu will segfault in case the memory range happens to include
unmapped areas -> Guest can DoS itself.
The guest can not modify host memory, so I don't think this can be used
by the guest to escape.
Gerd Hoffmann [Tue, 26 Aug 2014 12:16:30 +0000 (14:16 +0200)]
vbe: make bochs dispi interface return the correct memory size with qxl
VgaState->vram_size is the size of the pci bar. In case of qxl not the
whole pci bar can be used as vga framebuffer. Add a new variable
vbe_size to handle that case. By default (if unset) it equals
vram_size, but qxl can set vbe_size to something else.
This makes sure VBE_DISPI_INDEX_VIDEO_MEMORY_64K returns correct results
and sanity checks are done with the correct size too.
zhanghailiang [Fri, 29 Aug 2014 03:52:51 +0000 (11:52 +0800)]
acpi-build: Set FORCE_APIC_CLUSTER_MODEL bit for FADT flags
If we start Windows 2008 R2 DataCenter with number of cpu less than 8,
The system will use APIC Flat Logical destination mode as default configuration,
Which has an upper limit of 8 CPUs.
The fault is that VM can not show all processors within Task Manager if
we hot-add cpus when the number of cpus in VM extends the limit of 8.
If we use cluster destination model, the problem will be solved.
Note:
This flag was introduced later than ACPI v1.0 specification while QEMU
generates v1.0 tables only, but...
linux kernel ignores this flag, so patch has no influence on it.
Tested with Win[XPsp3|Srv2003EE|Srv2008DC|Srv2008R2|Srv2012R2], there
isn't BSODs and guests boot just fine. In cases guest doesn't support
cpu-hotplug, cpu becomes visible after reboot and in case the guest
supports cpu-hotplug, it works as expected with this patch.
As vhost core can use backend_features during init, clear it earlier to
avoid using uninitialized memory.
This use would be harmless since vhost scsi ignores the result
anyway, but initializing earlier will help prevent valgrind errors,
and make scsi and net behave similarly.
Jason Wang [Wed, 3 Sep 2014 06:25:30 +0000 (14:25 +0800)]
vhost_net: init acked_features to backend_features
commit 2e6d46d77ed328d34a94688da8371bcbe243479b (vhost: add
vhost_get_features and vhost_ack_features) removes the step that
initializes the acked_features to backend_features.
As this field is now uninitialized, vhost initialization will sometimes
fail.
Jason Wang [Tue, 19 Aug 2014 04:56:29 +0000 (12:56 +0800)]
vhost_net: start/stop guest notifiers properly
commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 "vhost: multiqueue
support" changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.
To fix this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.
hw/9pfs: Use little-endian format for xattr values
With security_model=mapped-xattr, we encode the uid,gid and other file
attributes as extended attributes of the file. We save them under
user.virtfs.* namespace.
Use little-endian encoding for on-disk values. This enables us to export
the same directory from both little-endian and big-endian hosts.
NOTE: This will break big-endian host that have virtFS exports
using security model mapped-xattr. They will have to use external tools
to convert the xattr to little-endian format.
Peter Maydell [Mon, 16 Jun 2014 15:47:49 +0000 (16:47 +0100)]
slirp: Honour vlan/stack in hostfwd_remove commands
The hostfwd_add and hostfwd_remove monitor commands allow the user
to optionally specify a vlan/stack tuple. hostfwd_add honours this,
but hostfwd_remove does not (it looks up the tuple but then ignores
the SlirpState it has looked up and always uses the first stack
in the list anyway). Correct this to honour what the user requested.
Chen Fan [Mon, 18 Aug 2014 06:46:35 +0000 (14:46 +0800)]
hmp: fix MemdevList memory leak
the memdev_list in hmp_info_memdev() is never freed.
so we use existent method qapi_free_MemdevList() to free it.
and also we can use qapi_free_MemdevList() to replace list loops
to clean up the memdev list in error path.
Gonglei [Mon, 25 Aug 2014 02:01:27 +0000 (10:01 +0800)]
Fix debug print warning
Steps:
1.enable qemu debug print, using simply scprit as below:
grep "//#define DEBUG" * -rl | xargs sed -i "s/\/\/#define DEBUG/#define DEBUG/g"
2. make -j
3. get some warning:
hw/i2c/pm_smbus.c: In function 'smb_ioport_writeb':
hw/i2c/pm_smbus.c:142: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/i2c/pm_smbus.c:142: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/i2c/pm_smbus.c: In function 'smb_ioport_readb':
hw/i2c/pm_smbus.c:209: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/intc/i8259.c: In function 'pic_ioport_read':
hw/intc/i8259.c:373: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/input/pckbd.c: In function 'kbd_write_command':
hw/input/pckbd.c:232: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'
hw/input/pckbd.c: In function 'kbd_write_data':
hw/input/pckbd.c:333: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'
hw/isa/apm.c: In function 'apm_ioport_writeb':
hw/isa/apm.c:44: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/isa/apm.c:44: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/isa/apm.c: In function 'apm_ioport_readb':
hw/isa/apm.c:67: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'hwaddr'
hw/timer/mc146818rtc.c: In function 'cmos_ioport_write':
hw/timer/mc146818rtc.c:394: warning: format '%02x' expects type 'unsigned int', but argument 3 has type 'uint64_t'
hw/i386/pc.c: In function 'port92_write':
hw/i386/pc.c:479: warning: format '%02x' expects type 'unsigned int', but argument 2 has type 'uint64_t'
Peter Maydell [Tue, 2 Sep 2014 15:07:31 +0000 (16:07 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features
A bunch of bugfixes - these will make sense for 2.1.1
Initial Intel IOMMU support.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Tue 02 Sep 2014 16:05:04 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
vhost_net: start/stop guest notifiers properly
pci: avoid losing config updates to MSI/MSIX cap regs
virtio-net: don't run bh on vm stopped
ioh3420: remove unused ioh3420_init() declaration
vhost_net: cleanup start/stop condition
intel-iommu: add IOTLB using hash table
intel-iommu: add context-cache to cache context-entry
intel-iommu: add supports for queued invalidation interface
intel-iommu: fix coding style issues around in q35.c and machine.c
intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "iommu" as a switch
intel-iommu: add DMAR table to ACPI tables
intel-iommu: introduce Intel IOMMU (VT-d) emulation
iommu: add is_write as a parameter to the translate function of MemoryRegionIOMMUOps
Jason Wang [Tue, 19 Aug 2014 04:56:29 +0000 (12:56 +0800)]
vhost_net: start/stop guest notifiers properly
commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
support changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.
To adapt this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is used to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.
pci: avoid losing config updates to MSI/MSIX cap regs
Since
commit 95d658002401e2e47a5404298ebe9508846e8a39
msi: Invoke msi/msix_write_config from PCI core
msix config writes are lost, the value written is always 0.
is incomplete: BH might execute within the same main loop iteration but
after vmstop, so in theory, we might trigger an assertion.
I was unable to reproduce this in practice,
but it seems clear enough that the potential is there, so worth fixing.
Checking vhost device internal state in vhost_net looks like
a layering violation since vhost_net does not
set this flag: it is set and tested by vhost.c.
There seems to be no reason to check this:
caller in virtio net uses its own flag,
vhost_started, to ensure vhost is started/stopped
as appropriate.
Xin Tong [Tue, 5 Aug 2014 01:35:23 +0000 (20:35 -0500)]
implementing victim TLB for QEMU system emulated TLB
QEMU system mode page table walks are expensive. Taken by running QEMU
qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a
4-level page tables in guest Linux OS takes ~450 X86 instructions on
average.
QEMU system mode TLB is implemented using a directly-mapped hashtable.
This structure suffers from conflict misses. Increasing the
associativity of the TLB may not be the solution to conflict misses as
all the ways may have to be walked in serial.
A victim TLB is a TLB used to hold translations evicted from the
primary TLB upon replacement. The victim TLB lies between the main TLB
and its refill path. Victim TLB is of greater associativity (fully
associative in this patch). It takes longer to lookup the victim TLB,
but its likely better than a full page table walk. The memory
translation path is changed as follows :
Before Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. TLB refill.
5. Do the memory access.
6. Return to code cache.
After Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. Victim TLB lookup.
5. If victim TLB misses, TLB refill
6. Do the memory access.
7. Return to code cache
The advantage is that victim TLB can offer more associativity to a
directly mapped TLB and thus potentially fewer page table walks while
still keeping the time taken to flush within reasonable limits.
However, placing a victim TLB before the refill path increase TLB
refill path as the victim TLB is consulted before the TLB refill. The
performance results demonstrate that the pros outweigh the cons.
some performance results taken on SPECINT2006 train
datasets and kernel boot and qemu configure script on an
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine are shown in the
Google Doc link below.
In summary, victim TLB improves the performance of qemu-system-x86_64 by
11% on average on SPECINT2006, kernelboot and qemu configscript and with
highest improvement of in 26% in 456.hmmer. And victim TLB does not result
in any performance degradation in any of the measured benchmarks. Furthermore,
the implemented victim TLB is architecture independent and is expected to
benefit other architectures in QEMU as well.
Although there are measurement fluctuations, the performance
improvement is very significant and by no means in the range of
noises.
Peter Maydell [Mon, 1 Sep 2014 12:57:45 +0000 (13:57 +0100)]
Merge remote-tracking branch 'remotes/borntraeger/tags/kvm-s390-20140901' into staging
s390x/kvm: Several updates/fixes/features
1. s390x/kvm: avoid synchronize_rcu's in kernel
----------------------------------------------
The first patches change s390x/kvm code to issue VCPU specific ioctls
from the VCPU thread. This will avoid unnecessary synchronize_rcu in
the kernel, which caused a noticably slowdown with many guest CPUs.
It speeds up all start/restart/reset operations involving cpus
drastically.
2. s390-ccw.img: block size and DASD format support
---------------------------------------------------
The second part changes the s390-ccw bios to IPL (boot) more disk
formats than before. Furthermore a small fix is made to the console
output of the bios.
3. s390: Support for Hotplug of Standby Memory
----------------------------------------------
The third part adds support in s390 for a pool of standby memory,
which can be set online/offline by the guest (ie, via chmem).
The standby pool of memory is allocated as the difference between
the initial memory setting and the maxmem setting.
As part of this work, additional results are provided for the
Read SCP Information SCLP, and new implentation is added for the
Read Storage Element Information, Attach Storage Element,
Assign Storage and Unassign Storage SCLPs, which enables the s390
guest to manipulate the standby memory pool.
This patchset is based on work originally done by Jeng-Fang (Nick)
Wang.
This will allocate 1024M of active memory, and another 1024M
of standby memory. Example output from s390-tools lsmem:
=============================================================================
0x0000000000000000-0x000000000fffffff 256 online no 0-127
0x0000000010000000-0x000000001fffffff 256 online yes 128-255
0x0000000020000000-0x000000003fffffff 512 online no 256-511
0x0000000040000000-0x000000007fffffff 1024 offline - 512-1023
The guest can dynamically enable part or all of the standby pool
via the s390-tools chmem, for example:
chmem -e 512M
And can attempt to dynamically disable:
chmem -d 512M
4. s390x/gdb: various fixes
---------------------------
* Patch 1 fixes a bug where the cc was changed accidentally.
* Patch 2 adds the gdb feature XML files for s390x
* Patch 3 Define acr and fpr registers as coprocessor registers. This allows us
to reuse the feature XML files.
* Patch 4 whitespace fixes
# gpg: Signature made Mon 01 Sep 2014 12:53:39 BST using RSA key ID B5A61C7C
# gpg: Can't check signature: public key not found
* remotes/borntraeger/tags/kvm-s390-20140901:
s390x/gdb: coding style fixes
s390x/gdb: generate target.xml and handle fp/ac as coprocessors
s390x/gdb: add the feature xml files for s390x
s390x/gdb: don't touch the cc if tcg is not enabled
sclp-s390: Add memory hotplug SCLPs
s390-virtio: Apply same memory boundaries as virtio-ccw
virtio-ccw: Include standby memory when calculating storage increment
sclp-s390: Add device to manage s390 memory hotplug
pc-bios/s390-ccw.img binary update
pc-bios/s390-ccw: Do proper console setup
pc-bios/s390-ccw: IPL from DASD with format variations
pc-bios/s390-ccw Really big EAV ECKD DASD handling
pc-bios/s390-ccw Improve ECKD informational message
pc-bios/s390-ccw: handle more ECKD DASD block sizes
pc-bios/s390-ccw: support all virtio block size
s390x/kvm: execute the first cpu reset on the vcpu thread
s390x/kvm: execute "system reset" cpu resets on the vcpu thread
s390x/kvm: execute sigp orders on the target vcpu thread
s390x/kvm: run guest triggered resets on the target vcpu thread
Gerd Hoffmann [Fri, 29 Aug 2014 07:27:52 +0000 (09:27 +0200)]
qxl-render: add more sanity checks
Damn, the dirty rectangle values are signed integers. So the checks
added by commit 788fbf042fc6d5aaeab56757e6dad622ac5f0c21 are not good
enough, we also have to make sure they are not negative.
[ Note: There must be something broken in spice-server so we get
negative values in the first place. Bug opened:
https://bugzilla.redhat.com/show_bug.cgi?id=1135372 ]
s390x/gdb: generate target.xml and handle fp/ac as coprocessors
This patch reduces the core registers to the psw and the general purpose
registers. The fpc and ac registers are handled as coprocessors registers by gdb.
This allows to reuse the feature xml files taken from gdb without further
modification and is what other architectures do.
The target.xml is now generated and provided to the gdb client. Therefore, the
client doesn't have to guess which registers are available at which logical
register number.
Matthew Rosato [Thu, 28 Aug 2014 15:25:35 +0000 (11:25 -0400)]
sclp-s390: Add memory hotplug SCLPs
Add memory information to read SCP info and add handlers for
Read Storage Element Information, Attach Storage Element,
Assign Storage and Unassign Storage.
Matthew Rosato [Thu, 28 Aug 2014 15:25:34 +0000 (11:25 -0400)]
s390-virtio: Apply same memory boundaries as virtio-ccw
Although s390-virtio won't support memory hotplug, it should
enforce the same memory boundaries so that it can use shared codepaths
(like read_SCP_info).
pc-bios/s390-ccw: Do proper console setup
pc-bios/s390-ccw: support all virtio block size
pc-bios/s390-ccw: handle more ECKD DASD block sizes
pc-bios/s390-ccw Improve ECKD informational message
pc-bios/s390-ccw Really big EAV ECKD DASD handling
pc-bios/s390-ccw: IPL from DASD with format variations
pc-bios/s390-ccw: IPL from DASD with format variations
There are two known cases of DASD format where signatures are
incomplete or absent:
1. result of <dasdfmt -d ldl -L ...> (ECKD_LDL_UNLABELED)
2. CDL with zero keys in IPL1 and IPL2 records
Now the code attempts to
1. find zIPL and use SCSI layout
2. find IPL1 and use CDL layout
3. find CMS1 and use LDL layout
3. find LNX1 and use LDL layout
4. find zIPL and use unlabeled LDL layout
5. find zIPL and use CDL layout
6. die
in this sequence.
pc-bios/s390-ccw: handle more ECKD DASD block sizes
Using dasdfmt(8) to format a DASD allows to choose a block size.
There are four supported values: 512, 1024, 2048, and 4096 bytes
per block. Each block size leads to selection of new count of
sectors per track. The head count remains always the same: 15.
This empiric knowledge is used to detect ECKD DASD to IPL from.
The block size value may be given "as is" OR as a base value and
a shift count (exponent). So, we have to use calculation to get
the proper number in the code.
The main expression reads as
(blk_cfg.blk_size << blk_cfg.physical_block_exp)
E.g., various combinations between blk_size=1/physical_block_exp=12
and blk_size=4096/physical_block_exp=0 are valid for 4K blocks.
s390x/kvm: execute the first cpu reset on the vcpu thread
As all full cpu resets currently call into the kernel to do initial cpu reset,
let's run this reset (triggered by cpu_s390x_init()) on the proper vcpu thread.
s390x/kvm: run guest triggered resets on the target vcpu thread
Currently, load_normal_reset() and modified_clear_reset() as triggered
by a guest vcpu will initiate cpu resets on the current vcpu thread for
all cpus. The reset should happen on the individual vcpu thread
instead, so let's use run_on_cpu() for this.
This avoids calls to synchronize_rcu() in the kernel.
Peter Maydell [Fri, 29 Aug 2014 17:40:04 +0000 (18:40 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request
# gpg: Signature made Fri 29 Aug 2014 17:25:58 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
* remotes/stefanha/tags/block-pull-request: (35 commits)
quorum: Fix leak of opts in quorum_open
blkverify: Fix leak of opts in blkverify_open
nfs: Fix leak of opts in nfs_file_open
curl: Don't deref NULL pointer in call to aio_poll.
curl: Allow a cookie or cookies to be sent with http/https requests.
virtio-blk: allow drive_del with dataplane
block: acquire AioContext in do_drive_del()
linux-aio: avoid deadlock in nested aio_poll() calls
qemu-iotests: add multiwrite test cases
block: fix overlapping multiwrite requests
nbd: Follow the BDS' AIO context
block: Add AIO context notifiers
nbd: Drop nbd_can_read()
sheepdog: fix a core dump while do auto-reconnecting
aio-win32: add support for sockets
qemu-coroutine-io: fix for Win32
AioContext: introduce aio_prepare
aio-win32: add aio_set_dispatching optimization
test-aio: test timers on Windows too
AioContext: export and use aio_dispatch
...