s390x/gdb: synchronize cpu state after modifying acrs
Whenever we touch the access control registers, we have to make sure that
the values will make it into kvm. Otherwise the change will simply be lost.
When synchronizing qemu and kvm, a normal KVM_PUT_RUNTIME_STATE does not take
care of these registers. Let's simply trigger a KVM_PUT_FULL_STATE sync,
so the values will directly be written to kvm. The performance overhead can
be ignored and this is much cleaner than manually writing these registers to kvm
via our two supported ways.
commit fa92e218df1d ("s390x/ipl: avoid sign extension") introduced
a regression:
qemu-system-s390x -drive file=image.qcow,format=qcow2
does not boot, the bios states
"No virtio-blk device found!"
adding bootindex=1 does boot.
The reason is that the uint32_t as return value will not do the right
thing for the return -1 (default without bootindex).
The bios itself, will interpret a 64bit -1 as autodetect (but it will
interpret 32bit -1 as ccw device address ff.ff.ffff)
Thomas Huth [Thu, 11 Dec 2014 13:25:12 +0000 (14:25 +0100)]
s390x/virtio-ccw: add virtio set-revision call
Handle the virtio-ccw revision according to what the guest sets.
When revision 1 is selected, we have a virtio-1 standard device
with byteswapping for the virtio rings.
When a channel gets disabled, we have to revert to the legacy behavior
in case the next user of the device does not negotiate the revision 1
anymore (e.g. the boot firmware uses revision 1, but the operating
system only uses the legacy mode).
We have to consume the outstanding service interrupt after each
service call, otherwise a correct implementation will return
CC=2 on subsequent service calls.
Cornelia Huck [Wed, 24 Jun 2015 08:57:23 +0000 (10:57 +0200)]
css: mss/mcss-e vs. migration
Our main channel_subsys structure is not a device (yet), but we need
to setup mss/mcss-e again if the guest had enabled it before. Use
a hack that should catch most configurations (assuming that the guest
will have enabled at least one device in higher subchannel sets or
channel subsystems if it enabled the functionality.)
Cornelia Huck [Tue, 23 Jun 2015 13:46:31 +0000 (15:46 +0200)]
virtio-ccw: complete handling of guest-initiated resets
For a guest-initiated reset, we need to not only reset the virtio device,
but also reset the VirtioCcwDevice into a clean state. This includes
resetting the indicators, or else a guest will not be able to e.g.
switch from classic interrupts to adapter interrupts.
Split off this routine into a new function virtio_ccw_reset_virtio()
to make the distinction between resetting the virtio-related devices
and the base subchannel device clear.
Peter Maydell [Mon, 29 Jun 2015 16:03:20 +0000 (17:03 +0100)]
Merge remote-tracking branch 'remotes/vivier/tags/pull-m68k-20150629' into staging
Trivial m68k cleanup
# gpg: Signature made Mon Jun 29 16:38:40 2015 BST using DSA key ID ABF36C53
# gpg: Good signature from "Laurent Vivier <[email protected]>"
# gpg: aka "Laurent Vivier <[email protected]>"
# gpg: aka "Laurent Vivier <[email protected]>"
# gpg: aka "Laurent Vivier (Red Hat) <[email protected]>"
# gpg: aka "[jpeg image of size 3881]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 9EC7 B78A C0AC E697 5E4B BDE3 34A4 F6C9 ABF3 6C53
* remotes/vivier/tags/pull-m68k-20150629:
m68k: remove useless parameter op_size from gen_lea_indexed()
m68k: remove useless file m68k-qreg.h
m68k: is_mem is useless
Peter Maydell [Fri, 26 Jun 2015 14:57:43 +0000 (15:57 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pci fixes, enhancements
Almost exclusively bugfixes, though in this case,
we are adding functionality to the pxb in order
to make OVMF work on it.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri Jun 26 14:43:27 2015 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
Fix glib_subprocess test
hw/pci-bridge: format special OFW unit address for PXB host
hw/core: explicit OFW unit address callback for SysBusDeviceClass
hw/pci-bridge: disable SHPC in PXB
hw/pci-bridge: introduce "shpc" property
hw/pci: introduce shpc_present() helper function
hw/pci-bridge: add macro for "msi" property
hw/pci-bridge: add macro for "chassis_nr" property
hw/pci-bridge: expose _test parameter in SHPC_VMSTATE()
migration: introduce VMSTATE_BUFFER_UNSAFE_INFO_TEST()
add pci-bridge-seat
pc: cleanup and convert TMP ACPI device description to AML API
MAINTAINERS: add ACPI entry
vhost: correctly pass error to caller in vhost_dev_enable_notifiers()
balloon: add a feature bit to let Guest OS deflate balloon on oom
qdev: fix OVERFLOW_BEFORE_WIDEN
virito-pci: fix OVERRUN problem
Peter Maydell [Fri, 26 Jun 2015 13:40:47 +0000 (14:40 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150626' into staging
target-arm queue:
* Change the virt board's default interface type for block devices to virtio
* Improve some error messages that will now be triggered by some incorrect
but previously worked-by-accident command lines
* Print ELR if we're doing debug logging of AArch64 exception entry
* Handle the "completely empty semihosting commandline" correctly for
softmmu (we already did for linux-user)
* Add GICv2m description to ACPI tables for virt board
* Fix some incorrect table revision entries in virt board ACPI tables
# gpg: Signature made Fri Jun 26 14:29:39 2015 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
* remotes/pmaydell/tags/pull-target-arm-20150626:
hw/arm/virt: Make block devices default to virtio
qdev-properties-system: Improve error message for drive assignment conflict
qdev-properties-system: Change set_pointer's parse callback to use Error
target-arm: A64: Print ELR when taking exceptions
target-arm: default empty semihosting cmdline
hw/arm/virt-acpi-build: Add GICv2m description in ACPI MADT table
hw/arm/virt-acpi-build: Fix table revision and some comments
Peter Maydell [Fri, 26 Jun 2015 13:22:37 +0000 (14:22 +0100)]
hw/arm/virt: Make block devices default to virtio
Now we have virtio-pci, we can make the virt board's default block
device type be IF_VIRTIO. This allows users to use simplified
command lines that don't have to explicitly create virtio-pci-blk
devices; the -hda &c very short options now also work.
This means we also need to set no_cdrom to avoid getting a
default cdrom device -- this is needed because the virtio-blk
device will fail if it is connected to a block backend with
no media, which is what the default cdrom device typically is.
Providing a cdrom with media via -cdrom will succeed, but silently
create a device with non-removable medium. this is probably
not really what the user wants, but is the best we can do now.
Note that this change means that some command lines which used
to work (by accident) will stop working. Where a drive was connected
manually to a device but without 'if=none' being specified, we
used to treat this as an IDE drive, which we would then not autoplug
because the board doesn't support IDE. Now we will treat it as a
virtio disk and autoplug it, which means the attempt to use the
drive manually will fail:
qemu-system-arm: -drive file=img.qcow2,id=foo: Drive 'foo' is already
in use because it has been automatically connected to another device
(did you need 'if=none' in the drive options?)
The command line will have to be changed to include 'if=none', as the
error message suggests.
Peter Maydell [Fri, 26 Jun 2015 13:22:36 +0000 (14:22 +0100)]
qdev-properties-system: Improve error message for drive assignment conflict
If the user forgot if=none on their drive specification they're likely
to get an error message because the drive is assigned once automatically
by QEMU and once by the manual id=/drive= user command line specification.
Improve the error message produced in this case to explicitly guide the
user towards if=none.
We rephrase the "drive conflict but not for an if=something" error as
well to keep the wording in line.
The two cases that change are:
(1) Drive specified as to be auto-connected and also manually connected
(and the board does handle this if= type):
Previously:
qemu-system-x86_64: -device ide-hd,drive=foo: Property 'ide-hd.drive'
can't take value 'foo', it's in use
Now:
qemu-system-x86_64: -device ide-hd,drive=foo: Drive 'foo' is already in
use because it has been automatically connected to another device (did
you need 'if=none' in the drive options?)
(2) Drive specified to be manually connected in two different ways:
Peter Maydell [Fri, 26 Jun 2015 13:22:36 +0000 (14:22 +0100)]
qdev-properties-system: Change set_pointer's parse callback to use Error
Instead of having set_pointer() call a parse callback which returns
an error number that we then convert to an Error string with
error_set_from_qdev_prop_error(), make the parse callback take an
Error** and set the error itself. This will allow parse routines
to provide more helpful error messages than the generic ones.
Soren Brinkmann [Fri, 26 Jun 2015 13:22:36 +0000 (14:22 +0100)]
target-arm: A64: Print ELR when taking exceptions
When taking an exception print the content of the exception link
register. This is useful especially for synchronous exceptions because
in that case this registers holds the address of the instruction that
generated the exception.
Peter Maydell [Fri, 26 Jun 2015 10:32:58 +0000 (11:32 +0100)]
Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150626' into staging
MIPS patches 2015-06-26
Changes:
* MIPS UHI semihosting support
* microMIPS32 R6 support
# gpg: Signature made Fri Jun 26 10:42:33 2015 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8DD3 2F98 5495 9D66 35D4 4FC0 5211 8E3C 0B29 DA6B
* remotes/lalrae/tags/mips-20150626:
target-mips: add mips32r6-generic CPU definition
target-mips: microMIPS32 R6 POOL16{A, C} instructions
target-mips: microMIPS32 R6 Major instructions
target-mips: microMIPS32 R6 POOL32{I, C} instructions
target-mips: microMIPS32 R6 POOL32F instructions
target-mips: microMIPS32 R6 POOL32A{XF} instructions
target-mips: microMIPS32 R6 branches and jumps
target-mips: add microMIPS32 R6 opcode enum
target-mips: signal RI for removed instructions in microMIPS R6
target-mips: raise RI exceptions when FIR.PS = 0
target-mips: rearrange gen_compute_compact_branch
target-mips: refactor {D}LSA, {D}ALIGN, {D}BITSWAP
target-mips: remove an unused argument
target-mips: add microMIPS TLBINV, TLBINVF
target-mips: fix {RD, WR}PGPR in microMIPS
target-mips: convert host to MIPS errno values when required
target-mips: add Unified Hosting Interface (UHI) support
target-mips: remove identical code in different branch
hw/mips: Do not clear BEV for MIPS malta kernel load
include/softmmu-semi.h: Make semihosting support 64-bit clean
Yongbok Kim [Wed, 24 Jun 2015 23:24:18 +0000 (00:24 +0100)]
target-mips: raise RI exceptions when FIR.PS = 0
64-bit paired-single (PS) floating point data type is optional in the
pre-Release 6.
It has to raise RI exception when PS type is not implemented. (FIR.PS = 0)
(The PS data type is removed in the Release 6.)
Loongson-2E and Loongson-2F don't have any implementation field in
FCSR0(FIR) but do support PS data format, therefore for these cores RI will
not be signalled regardless of PS bit.
Leon Alrae [Fri, 19 Jun 2015 10:08:43 +0000 (11:08 +0100)]
target-mips: add Unified Hosting Interface (UHI) support
Add UHI semihosting support for MIPS. QEMU run with "-semihosting" option
will alter the behaviour of SDBBP 1 instruction -- UHI operation will be
called instead of generating a debug exception.
Also tweak Malta's pseudo-bootloader. On CPU reset the $4 register is set
to -1 if semihosting arguments are passed to indicate that the UHI
operations should be used to obtain input arguments.
Matthew Fortune [Fri, 19 Jun 2015 10:08:41 +0000 (11:08 +0100)]
hw/mips: Do not clear BEV for MIPS malta kernel load
The BEV flag controls whether the boot exception vector is still
in place when starting a kernel. When cleared the exception vector
at EBASE (or hard coded address of 0x80000000) is used instead.
The early stages of the linux kernel would benefit from BEV still
being set to ensure any faults get handled by the boot rom exception
handlers. This is a moot point for system qemu as there aren't really
any BEV handlers, but there are other good reasons to change this...
The UHI (semi-hosting interface) defines special behaviours depending
on whether an application starts in an environment with BEV set or
cleared. When BEV is set then UHI assumes that a bootloader is
relatively dumb and has no advanced exception handling logic.
However, when BEV is cleared then UHI assumes that the bootloader
has the ability to handle UHI exceptions with its exception handlers
and will unwind and forward UHI SYSCALL exceptions to the exception
vector that was installed prior to running the application.
Peter Maydell [Thu, 25 Jun 2015 13:03:55 +0000 (14:03 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging
# gpg: Signature made Wed Jun 24 16:37:23 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
* remotes/stefanha/tags/net-pull-request:
net: simplify net_client_init1()
net: drop if expression that is always true
net: raise an error if -net type is invalid
net: replace net_client_init1() netdev whitelist with blacklist
net: add missing "netmap" to host_net_devices[]
Peter Maydell [Thu, 25 Jun 2015 10:19:46 +0000 (11:19 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Wed Jun 24 16:27:53 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
* remotes/stefanha/tags/block-pull-request:
virito-blk: drop duplicate check
qemu-iotests: fix 051.out after qdev error message change
iov: don't touch iov in iov_send_recv()
raw-posix: Introduce hdev_is_sg()
raw-posix: Use DPRINTF for DEBUG_FLOPPY
raw-posix: DPRINTF instead of DEBUG_BLOCK_PRINT
Fix migration in case of scsi-generic
block: Use bdrv_is_sg() everywhere
nvme: Fix memleak in nvme_dma_read_prp
vvfat: add a label option
util/hbitmap: Add an API to reset all set bits in hbitmap
virtio-blk: Use blk_drain() to drain IO requests
block-backend: Introduce blk_drain()
throttle: Check current timers before updating any_timer_armed[]
block: Let bdrv_drain_all() to call aio_poll() for each AioContext
Stefan Hajnoczi [Tue, 23 Jun 2015 14:56:09 +0000 (15:56 +0100)]
qemu-iotests: fix 051.out after qdev error message change
Commit f006cf7fa9a63ba8e4ccf57d46231ce594301727 ("qdev-monitor:
Propagate errors through qdev_device_add()") dropped a meaningless error
message. This change in output caused qemu-iotests 051 to fail:
QEMU_PROG: -device ide-drive,drive=disk: Device initialization failed.
-QEMU_PROG: -device ide-drive,drive=disk: Device 'ide-drive' could not be initialized
A typo means that the tests dependent on glib with subprocess
support are never run.
Fixes: 9d41401b90fa10b335d2e739149d36437cfbf622 Signed-off-by: Dr. David Alan Gilbert <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
Laszlo Ersek [Fri, 19 Jun 2015 02:40:17 +0000 (04:40 +0200)]
hw/pci-bridge: format special OFW unit address for PXB host
We have agreed that OpenFirmware device paths in the "bootorder" fw_cfg
file should follow the pattern
/pci@i0cf8,%x/...
for devices that live behind an extra root bus. The extra root bus in
question is the %x'th among the extra root buses. (In other words, %x
gives the position of the affected extra root bus relative to the other
extra root buses, in bus_nr order.) %x starts at 1, and is formatted in
hex.
The portion of the unit address that comes before the comma is dynamically
taken from the main host bridge, similarly to sysbus_get_fw_dev_path().
Laszlo Ersek [Fri, 19 Jun 2015 02:40:16 +0000 (04:40 +0200)]
hw/core: explicit OFW unit address callback for SysBusDeviceClass
The sysbus_get_fw_dev_path() function formats OpenFirmware device path
nodes ("driver-name@unit-address") for sysbus devices. The first choice
for "unit-address" is the base address of the device's first MMIO region.
The second choice is its first IO port.
However, if two sysbus devices with the same "driver-name" lack both MMIO
and PIO resources, then there is no good way to distinguish them based on
their OFW nodes, because in this case unit-address is omitted completely
for both devices. An example is TYPE_PXB_HOST ("pxb-host").
For the sake of such devices, introduce the explicit_ofw_unit_address()
"virtual member function". With this function, each sysbus device in the
same SysBusDeviceClass can state its own address.
Laszlo Ersek [Fri, 19 Jun 2015 02:40:14 +0000 (04:40 +0200)]
hw/pci-bridge: disable SHPC in PXB
OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
Bus driver globally signals the firmware that PCI enumeration and resource
allocation have completed. At this point QEMU regenerates the ACPI payload
in an fw_cfg read callback, and this is when the PXB's _CRS gets
populated.
Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
the root bus's command register, *unlike* under SeaBIOS. The consequences
unfold as follows:
- When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
because pci_update_mappings() --> pci_bar_address() calculated it as
PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
- Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
the _CRS, *despite* having been programmed in PCI config space.
- Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
root bus's DWordMemory descriptor.
- Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
within the PXB's config space, and notice that it conflicts with the
main root bus's memory resource descriptors. Linux reports
pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
0x88200000-0x882000ff 64bit]
pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
with PCI Bus 0000:00 [mem
0x88200000-0xfebfffff]
This device cannot find enough free resources that it can use. If you
want to use this device, you will need to disable one of the other
devices on this system. (Code 12)
This issue was apparently encountered earlier, see the "hack" in:
and the current hole-punching logic in build_crs() and build_ssdt() is
probably supposed to remedy exactly that problem -- however, for OVMF they
don't work, because at the end of the PCI enumeration and resource
allocation, which cues the ACPI linker/loader client, the command register
is clear.
The "shpc" property of "pci-bridge", introduced in the previous patches,
allows us to disable the standard hotplug controller cleanly, eliminating
the SHPC bar and the conflict.
Laszlo Ersek [Fri, 19 Jun 2015 02:40:09 +0000 (04:40 +0200)]
hw/pci-bridge: expose _test parameter in SHPC_VMSTATE()
Change the signature of the function-like macro SHPC_VMSTATE(), so that we
can produce and expect this field conditionally in the migration stream,
starting with an upcoming patch.
There is no _TEST() variant of VMSTATE_BUFFER_UNSAFE_INFO() yet, but we'll
soon need it. Introduce it and rebase the original
VMSTATE_BUFFER_UNSAFE_INFO() on top.
The parameter order of the new function-like macro follows that of
VMSTATE_SINGLE_TEST(): "_test" is introduced between "_state" and
"_version".
util/qemu-sockets: improve ai_flag hints for ipv6 hosts
*) Do not use AI_ADDRCONFIG on listening sockets, because this flag
makes it impossible to explicitly listen on '127.0.0.1' if no global
ipv4 address is configured additionally, making this a very
uncomfortable option.
*) Add AI_V4MAPPED hint for connecting sockets.
If your system is globally only connected via ipv6 you often still want
to be able to use '127.0.0.1' and 'localhost' (even if localhost doesn't
also have an ipv6 entry).
For example, PVE - unless explicitly asking for insecure mode - uses
ipv4 loopback addresses with QEMU for live migrations tunneled over SSH.
These fail to start because AI_ADDRCONFIG makes getaddrinfo refuse to
work with '127.0.0.1'.
As for the AI_V4MAPPED flag: glibc uses it by default, and providing
non-0 flags removes it. I think it makes sense to use it.
I also want to point out that glibc explicitly sidesteps POSIX standards
when passing 0 as hints by then assuming both AI_V4MAPPED and
AI_ADDRCONFIG (the latter being a rather weird choice IMO), while
according to POSIX.1-2001 it should be assumed 0. (glibc considers its
choice an improvement.)
Since either AI_CANONNAME or AI_PASSIVE are passed in our cases, glibc's
default flags in turn are disabled again unless explicitly added, which
I do with this patch.
Fam Zheng [Fri, 22 May 2015 05:35:07 +0000 (13:35 +0800)]
Makefile: Fix "make cscope TAGS"
Cscope and TAGS files work in source directory rather than the build
directory, also, don't ask users to run configure first, because they
may have an out of tree build.
Recent commit 3751d7c "vl: allow full-blown QemuOpts syntax for
-global" overloaded its existing argument syntax DRIVER.PROP=VALUE
with QemuOpts syntax. Unambigious as long as no DRIVER contains '='.
Its documentation claims that "the two syntaxes are equivalent."
Improve it to spell out how exactly the old syntax gets desugared into
the new one.
Michael Tokarev [Wed, 17 Jun 2015 18:02:03 +0000 (21:02 +0300)]
libcacard: pkgconfig: tidy dependent libs
libcacard.pc file lists only one package in Requires
field, which is nss, while glib-2.0 is also a requiriment.
Furthermore, for libraries used internally by the library
(this is the way nss and glib are used by libcacard),
Requires.private shold be used instead of Requires.
Fix both issues.
This does not affect linking of qemu because it links
with objects from libcacard directly.
When loading migration fails due to a disagreement about
PCI config data we don't currently get any errors explaining
that was the cause of the problem or which byte in the config
data was at fault.
Michael Tokarev [Wed, 3 Jun 2015 14:37:27 +0000 (17:37 +0300)]
remove libdecnumber/dpd/decimal128Local.h
Commit 72ac97cdfc added two equivalent versions of decimal128Local.h,
one in libdecnumber/dpd/ and another in include/libdecnumber/dpd/.
Being identical by the code, the two files however differs in the
licensing terms. The one in libdecnumber/dpd/ (which is being
removed by this patch) is licensed as GPL3.1 (plus gcc runtime
exception), which, as far as I know, is not compatible with GPL-2.
This file is not used (it is included from
include/libdecnumber/dpd/decimal128.h, so version in include/ is
used).
More, the version in include/ can also be removed, since none
of the 3 defines from that file are actually used by the code.
Even more, one of the defines from there, decimal128SetSign,
is redefined (to equivalent value) in libdecnumber/dpd/decimal128.c,
but again, never used.
Peter Maydell [Tue, 23 Jun 2015 16:46:20 +0000 (17:46 +0100)]
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-220615-3' into staging
xen-220615, more SOB lines
# gpg: Signature made Tue Jun 23 17:19:08 2015 BST using RSA key ID 70E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
* remotes/sstabellini/tags/xen-220615-3:
Revert "xen-hvm: increase maxmem before calling xc_domain_populate_physmap"
xen/pass-through: constify some static data
xen/pass-through: log errno values rather than function return ones
xen/pass-through: ROM BAR handling adjustments
xen/pass-through: fold host PCI command register writes
The original commit fixes a bug when assigning a large number of
devices which require option roms to a guest. (One known
configuration that needs extra memory is having more than 4 emulated
NICs assigned. Three or fewer NICs seems to work without this
functionality.)
However, by unilaterally increasing maxmem, it introduces two
problems.
First, now libxl's calculation of the required maxmem during migration
is broken -- any guest which exercised this functionality will fail on
migration. (Guests which have the default number of devices are not
affected.)
Secondly, it makes it impossible for a higher-level toolstack or
administer to predict how much memory a VM will actually use, making
it much more difficult to effectively use all of the memory on a
machine.
Jan Beulich [Fri, 5 Jun 2015 12:04:55 +0000 (13:04 +0100)]
xen/pass-through: constify some static data
This is done indirectly by adjusting two typedefs and helps emphasizing
that the respective tables aren't supposed to be modified at runtime
(as they may be shared between devices).
Jan Beulich [Mon, 8 Jun 2015 13:11:51 +0000 (14:11 +0100)]
xen/pass-through: ROM BAR handling adjustments
Expecting the ROM BAR to be written with an all ones value when sizing
the region is wrong - the low bit has another meaning (enable/disable)
and bits 1..10 are reserved. The PCI spec also mandates writing all
ones to just the address portion of the register.
Use suitable constants also for initializing the ROM BAR register field
description.
The code introduced to address XSA-126 allows simplification of other
code in xen_pt_initfn(): All we need to do is update "cmd" suitably,
as it'll be written back to the host register near the end of the
function anyway.
Igor Mammedov [Tue, 9 Jun 2015 03:31:53 +0000 (05:31 +0200)]
pc: cleanup and convert TMP ACPI device description to AML API
remove some code duplication in acpi-build.c and drop 5
ASL and binary blobs files with TPM ACPI device description,
replacing them with 1 small hunk written in AML API.
Igor agreed to help review ACPI patches, add an entry to MAINTAINERS
with all ACPI stuff I could think of.
Note: I listed ARM ACPI files here just to make sure we are Cc'd, no
plan to maintain ACPI for ARM through my tree :)
Jason Wang [Fri, 29 May 2015 06:13:14 +0000 (14:13 +0800)]
vhost: correctly pass error to caller in vhost_dev_enable_notifiers()
We override the error value r in fail_vq, this will cause the caller
can't detect the failure which may cause the caller may disable the
notifiers twice if vhost is failed to start. Fix this by using another
variable to keep track the return value of set_host_notifier().
Denis V. Lunev [Mon, 15 Jun 2015 10:52:52 +0000 (13:52 +0300)]
balloon: add a feature bit to let Guest OS deflate balloon on oom
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless it
is often the case that these control tools does not have enough time to
react on fast changing memory load. As a result OS runs out of memory and
invokes OOM-killer. The balancing of memory by use of the virtio balloon
should not cause the termination of processes while there are pages in the
balloon. Now there is no way for virtio balloon driver to free memory at
the last moment before some process get killed by OOM-killer.
This does not provide a security breach as balloon itself is running
inside Guest OS and is working in the cooperation with the host. Thus
some improvements from Guest side should be considered as normal.
To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make the
system return and retry the allocation that forced the out of memory
killer to run.
This behavior should be enabled if and only if appropriate feature bit
is set on the device. It is off by default.
This functionality was recently merged into vanilla Linux.
Until now, an SG device was identified only by checking if its path
started with "/dev/sg". Then, hdev_open() would set the bs->sg flag
accordingly. The patch relies on the actual properties of the device
instead of the specified file path.
To this end, test for an SG device (e.g. /dev/sg0) by ensuring that
all of the following holds:
- The specified file name corresponds to a character device
- The device supports the SG_GET_VERSION_NUM ioctl
- The device supports the SG_GET_SCSI_ID ioctl