Paolo Bonzini [Tue, 22 Aug 2017 05:08:27 +0000 (07:08 +0200)]
scsi: move non-emulation specific code to scsi/
util/scsi.c includes some SCSI code that is shared by block/iscsi.c and
hw/scsi, but the introduction of the persistent reservation helper
will add many more instances of this. There is also include/block/scsi.h,
which actually is not part of the core block layer.
The persistent reservation manager will also need a home. A scsi/
directory provides one for both the aforementioned shared code and
the PR manager code.
Paolo Bonzini [Tue, 22 Aug 2017 07:31:36 +0000 (09:31 +0200)]
scsi: rename scsi_build_sense to scsi_convert_sense
After introducing the scsi/ subdirectory, there will be a scsi_build_sense
function that is the same as scsi_req_build_sense but without needing
a SCSIRequest. The existing scsi_build_sense function gets in the way,
remove it.
virtio-scsi: Add virtqueue_size parameter allowing virtqueue size to be set.
Since Linux switched to blk-mq as the default in Linux commit 5c279bd9e406 ("scsi: default to scsi-mq"), virtio-scsi LUNs consume
about 10x as much guest kernel memory.
This commit allows you to choose the virtqueue size for each
virtio-scsi-pci controller like this:
-device virtio-scsi-pci,id=scsi,virtqueue_size=16
The default is still 128 as before. Using smaller virtqueue_size
allows many more disks to be added to small memory virtual machines.
For a 1 vCPU, 500 MB, no swap VM I observed:
With scsi-mq enabled (upstream kernel): 175 disks
-"- ditto -"- virtqueue_size=64: 318 disks
-"- ditto -"- virtqueue_size=16: 775 disks
With scsi-mq disabled (kernel before 5c279bd9e406): 1755 disks
Note that to have any effect, this requires a kernel patch:
Joseph Myers [Fri, 11 Aug 2017 14:23:35 +0000 (14:23 +0000)]
target/i386: fix phminposuw in-place operation
The SSE4.1 phminposuw instruction finds the minimum 16-bit element in
the source vector, putting the value of that element in the low 16
bits of the destination vector, the index of that element in the next
three bits and zeroing the rest of the destination. The helper for
this operation fills the destination from high to low, meaning that
when the source and destination are the same register, the minimum
source element can be overwritten before it is copied to the
destination. This patch fixes it to fill the destination from low to
high instead, so the minimum source element is always copied first.
This fixes one gcc test failure in my GCC 6-based testing (and so
concludes the present sequence of patches, as I don't have any further
gcc test failures left in that testing that I attribute to QEMU bugs).
Joseph Myers [Thu, 10 Aug 2017 21:40:41 +0000 (21:40 +0000)]
target/i386: fix pcmpxstrx substring search
One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri /
pcmpistrm instructions does a substring search. The implementation of
this case in the pcmpxstrx helper is incorrect. The operation in this
case is a search for a string (argument d to the helper) in another
string (argument s to the helper); if a copy of d at a particular
position would run off the end of s, the resulting output bit should
be 0 whether or not the strings match in the region where they
overlap, but the QEMU implementation was wrongly comparing only up to
the point where s ends and counting it as a match if an initial
segment of d matched a terminal segment of s. Here, "run off the end
of s" means that some byte of d would overlap some byte outside of s;
thus, if d has zero length, it is considered to match everywhere,
including after the end of s. This patch fixes the implementation to
correspond with the proper instruction semantics. This fixes four gcc
test failures in my GCC 6-based testing.
Joseph Myers [Thu, 10 Aug 2017 00:24:23 +0000 (00:24 +0000)]
target/i386: fix packusdw in-place operation
The SSE4.1 packusdw instruction combines source and destination
vectors of signed 32-bit integers into a single vector of unsigned
16-bit integers, with unsigned saturation. When the source and
destination are the same register, this means each 32-bit element of
that register is used twice as an input, to produce two of the 16-bit
output elements, and so if the operation is carried out
element-by-element in-place, no matter what the order in which it is
applied to the elements, the first element's operation will overwrite
some future input. The helper for packssdw avoids this issue by
computing the result in a local temporary and copying it to the
destination at the end; this patch fixes the packusdw helper to do
likewise. This fixes three gcc test failures in my GCC 6-based
testing.
Joseph Myers [Tue, 8 Aug 2017 23:51:29 +0000 (23:51 +0000)]
target/i386: set rip_offset for further SSE instructions
It turns out that my recent fix to set rip_offset when emulating some
SSE4.1 instructions needs generalizing to cover a wider class of
instructions. Specifically, every instruction in the sse_op_table7
table, coming from various instruction set extensions, has an 8-bit
immediate operand that comes after any memory operand, and so needs
rip_offset set for correctness if there is a memory operand that is
rip-relative, and my patch only set it for a subset of those
instructions. This patch moves the rip_offset setting to cover the
wider class of instructions, so fixing 9 further gcc testsuite
failures in my GCC 6-based testing. (I do not know whether there
might be still further classes of instructions missing this setting.)
The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
or 4-byte inputs and sign-extend or zero-extend them to a wider vector
output. The associated helpers for these instructions do the
extension on each element in turn, starting with the lowest. If the
input and output are the same register, this means that all the input
elements after the first have been overwritten before they are read.
This patch makes the helpers extend starting with the highest element,
not the lowest, to avoid such overwriting. This fixes many GCC test
failures (161 in the gcc testsuite in my GCC 6-based testing) when
testing with a default CPU setting enabling those instructions.
* remotes/jnsnow/tags/ide-pull-request:
hw/block/fdc: Convert to realize
hw/ide: Convert DeviceClass init to realize
AHCI: remove DPRINTF macro
AHCI: pretty-print FIS to buffer instead of stderr
AHCI: Rework IRQ constants
AHCI: Replace DPRINTF with trace-events
IDE: replace DEBUG_AIO with trace events
ATAPI: Replace DEBUG_IDE_ATAPI with tracing events
IDE: add tracing for data ports
IDE: Add register hints to tracing
IDE: replace DEBUG_IDE with tracing system
hw/ide/microdrive: Mark the dscm1xxxx device with user_creatable = false
ide: ahci: unparent children buses before freeing their memory
Mao Zhongyi [Mon, 18 Sep 2017 14:05:13 +0000 (22:05 +0800)]
hw/ide: Convert DeviceClass init to realize
Replace init with realize in IDEDeviceClass, which has errp
as a parameter. So all the implementations now use error_setg
instead of error_report for reporting error.
John Snow [Mon, 18 Sep 2017 19:01:26 +0000 (15:01 -0400)]
AHCI: pretty-print FIS to buffer instead of stderr
The current FIS printing routines dump the FIS to screen. adjust this
such that it dumps to buffer instead, then use this ability to have
FIS dump mechanisms via trace-events instead of compiled defines.
John Snow [Mon, 18 Sep 2017 19:01:26 +0000 (15:01 -0400)]
AHCI: Rework IRQ constants
Create a new enum so that we can name the IRQ bits, which will make debugging
them a little nicer if we can print them out. Not handled in this patch, but
this will make it possible to get a nice debug printf detailing exactly which
status bits are set, as it can be multiple at any given time.
As a consequence of this patch, it is no longer possible to set multiple IRQ
codes at once, but nothing was utilizing this ability anyway.
John Snow [Mon, 18 Sep 2017 19:01:26 +0000 (15:01 -0400)]
ATAPI: Replace DEBUG_IDE_ATAPI with tracing events
As part of the ongoing effort to modernize the tracing facilities for
the IDE family of devices, remove PRINTFs in the ATAPI device with
actual tracing events.
Igor Mammedov [Mon, 18 Sep 2017 19:01:25 +0000 (15:01 -0400)]
ide: ahci: unparent children buses before freeing their memory
Fixes read after freeing error reported
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04243.html
Message-Id: <59a56959-ca12-ea75-33fa-ff07eba1b090@redhat.com>
ich9-ahci device creates ide buses and attaches them as QOM children
at realize time, however it forgets to properly clean them up
at unrealize time and frees memory containing these children,
with following call-chain:
qdev_device_add()
object_property_set_bool('realized', true)
device_set_realized()
...
pci_qdev_realize() -> pci_ich9_ahci_realize() -> ahci_realize()
...
s->dev = g_new0(AHCIDevice, ports);
...
AHCIDevice *ad = &s->dev[i];
ide_bus_new(&ad->port, sizeof(ad->port), qdev, i, 1);
^^^ creates bus in memory allocated by above gnew()
and adds it as child propety to ahci device
...
hotplug_handler_plug(); -> goto post_realize_fail;
pci_qdev_unrealize() -> pci_ich9_uninit() -> ahci_uninit()
...
g_free(s->dev);
^^^ free memory that holds children busses
return with error from device_set_realized()
As result later when qdev_device_add() tries to unparent ich9-ahci
after failed device_set_realized(),
object_unparent() -> object_property_del_child()
iterates over existing QOM children including buses added by
ide_bus_new() and tries to unparent them, which causes access to
freed memory where they where located.
Kevin Wolf [Mon, 18 Sep 2017 05:25:24 +0000 (07:25 +0200)]
qemu.py: Fix syntax error
Python requires parentheses around multiline expression. This fixes the
breakage of all Python-based qemu-iotests cases that was introduced in
commit dab91d9aa0.
Peter Maydell [Sun, 17 Sep 2017 15:24:48 +0000 (16:24 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170917' into staging
tcg queued patches
# gpg: Signature made Sun 17 Sep 2017 16:03:28 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20170917:
tcg/mips: Fully convert tcg_target_op_def
tcg/sparc: Fully convert tcg_target_op_def
tcg/ppc: Fully convert tcg_target_op_def
tcg/arm: Fully convert tcg_target_op_def
tcg/aarch64: Fully convert tcg_target_op_def
tcg: Fix types in tcg_regset_{set,reset}_reg
tcg: Remove tcg_regset_set32
tcg: Remove tcg_regset_{or,and,andnot,not}
tcg: Remove tcg_regset_set
tcg: Remove tcg_regset_clear
tcg: Add tcg_op_supported
accel/tcg: move USER code to user-exec.c
accel/tcg: move atomic_template.h to accel/tcg/
accel/tcg: move tcg-runtime to accel/tcg/
accel/tcg: move user-exec to accel/tcg/
accel/tcg: move softmmu_template.h to accel/tcg/
tcg/ppc: disable atomic write check on ppc32
Peter Maydell [Sat, 16 Sep 2017 13:36:16 +0000 (14:36 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging
Python queue, 2017-09-15
# gpg: Signature made Sat 16 Sep 2017 00:14:01 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/python-next-pull-request:
qemu.py: include debug information on launch error
qemu.py: improve message on negative exit code
qemu.py: use os.path.null instead of /dev/null
qemu.py: avoid writing to stdout/stderr
qemu.py: fix is_running() return before first launch()
qtest.py: Few pylint/style fixes
qmp.py: Avoid overriding a builtin object
qmp.py: Avoid "has_key" usage
qmp.py: Use object-based class for QEMUMonitorProtocol
qmp.py: Couple of pylint/style fixes
qemu.py: Use custom exceptions rather than Exception
qemu.py: Simplify QMP key-conversion
qemu.py: Use iteritems rather than keys()
qemu|qtest: Avoid dangerous arguments
qemu.py: Pylint/style fixes
qemu.py: include debug information on launch error
When launching a VM, if an exception happens and the VM is not
initiated, it might be useful to see the qemu command line and
the qemu command output.
This patch creates that message. Notice that self._iolog needs to be
cleaned up in the beginning of the launch() to make sure we will not
expose the qemu log from a previous launch if the current one fails.
This module should not write directly to stdout/stderr. Instead, it
should either raise exceptions or just log the messages and let the
callers handle them and decide what to do. For example, scripts could
choose to send the log messages stderr or/and write them to a file if
verbose or debugging mode is enabled.
This patch replaces the writes to stderr by an exception in the
send_fd_scm() when _socket_scm_helper is not set or not present. In the
same method, the subprocess Popen will now redirect the stdout/stderr to
logging.debug instead of writing to system stderr. As consequence, since
the Popen.communicate() is now used (in order to get the stdout), the
further call to wait() became redundant and was replaced by
Popen.returncode.
The shutdown() message on negative exit code will now be logged
to logging.warn instead of written to system stderr.
Lukáš Doktor [Fri, 18 Aug 2017 14:26:12 +0000 (16:26 +0200)]
qmp.py: Avoid overriding a builtin object
The "id" is a builtin method to get object's identity and should not be
overridden. This might bring some issues in case someone was directly
calling "cmd(..., id=id)" but I haven't found such usage on brief search
for "cmd\(.*id=".
Lukáš Doktor [Fri, 18 Aug 2017 14:26:09 +0000 (16:26 +0200)]
qmp.py: Couple of pylint/style fixes
No actual code changes, just initializing attributes earlier to avoid
AttributeError on early introspection, a few pylint/style fixes and
docstring clarifications.
Lukáš Doktor [Fri, 18 Aug 2017 14:26:08 +0000 (16:26 +0200)]
qemu.py: Use custom exceptions rather than Exception
The naked Exception should not be widely used. It makes sense to be a
bit more specific and use better-suited custom exceptions. As a benefit
we can store the full reply in the exception in case someone needs it
when catching the exception.
Lukáš Doktor [Fri, 18 Aug 2017 14:26:07 +0000 (16:26 +0200)]
qemu.py: Simplify QMP key-conversion
The QMP key conversion consist of '_'s to be replaced with '-'s, which
can easily be done by a single `str.replace` method which is faster and
does not require `string` module import.
In this case the `args` is actually copied so it would be safe to keep
it, but it's not a good practice to keep it. The same issue applies in
inherited qtest module.
# gpg: Signature made Fri 15 Sep 2017 09:21:15 BST
# gpg: using RSA key 0xDF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2
* remotes/otubo/tags/pull-seccomp-20170915:
buildsys: Move seccomp cflags/libs to per object
seccomp: add resourcecontrol argument to command line
seccomp: add spawn argument to command line
seccomp: add elevateprivileges argument to command line
seccomp: add obsolete argument to command line
seccomp: changing from whitelist to blacklist
* remotes/huth/tags/check-20170915:
qtest: Avoid passing raw strings through hmp()
libqtest: Remove dead qtest_instances variable
numa-test: Use hmp()
qtest: Don't perform side effects inside assertion
test-qga: Kill broken and dead QGA_TEST_SIDE_EFFECTING code
tests: Fix broken ivshmem-server-msi/-irq tests
tests/libqtest: Use a proper error message if QTEST_QEMU_BINARY is missing
tests/test-hmp: Remove puv3 and tricore_testboard from the blacklist
tests: Introduce generic device hot-plug/hot-unplug functions
* remotes/dgibson/tags/ppc-for-2.11-20170915:
ppc/kvm: use kvm_vm_check_extension() in kvmppc_is_pr()
spapr_events: use QTAILQ_FOREACH_SAFE() in spapr_clear_pending_events()
spapr_cpu_core: cleaning up qdev_get_machine() calls
spapr_pci: don't create 64-bit MMIO window if we don't need to
spapr_pci: convert sprintf() to g_strdup_printf()
spapr_cpu_core: fail gracefully with non-pseries machine types
xics: fix several error leaks
vfio, spapr: Fix levels calculation
spapr_pci: handle FDT creation errors with _FDT()
spapr_pci: use the common _FDT() helper
spapr: fix CAS-generated reset
ppc/xive: fix OV5_XIVE_EXPLOIT bits
spapr: only update SDR1 once per-cpu during CAS
spapr_pci: use g_strdup_printf()
spapr_pci: drop useless check in spapr_populate_pci_child_dt()
spapr_pci: drop useless check in spapr_phb_vfio_get_loc_code()
hw/ppc/spapr.c: cleaning up qdev_get_machine() calls
net: Add SunGEM device emulation as found on Apple UniNorth
trace: Immediately apply per-vCPU state changes if a vCPU is being created
Right now, function trace_event_set_vcpu_state_dynamic() asynchronously enables
events in the case a vCPU is executing TCG code. If the vCPU is being created
this makes some events like "guest_cpu_enter" to not be traced.
Like many other libraries, libseccomp cflags and libs should only apply
to the building of necessary objects. Do so in the usual way with the
help of per object variables.
Eduardo Otubo [Mon, 13 Mar 2017 21:18:51 +0000 (22:18 +0100)]
seccomp: add resourcecontrol argument to command line
This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It
blacklists all process affinity and scheduler priority system calls to
avoid any bigger of the process.
Eduardo Otubo [Mon, 13 Mar 2017 21:16:01 +0000 (22:16 +0100)]
seccomp: add spawn argument to command line
This patch adds [,spawn=deny] argument to `-sandbox on' option. It
blacklists fork and execve system calls, avoiding Qemu to spawn new
threads or processes.
Eduardo Otubo [Mon, 13 Mar 2017 21:13:27 +0000 (22:13 +0100)]
seccomp: add elevateprivileges argument to command line
This patch introduces the new argument
[,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows
or denies Qemu process to elevate its privileges by blacklisting all
set*uid|gid system calls. The 'children' option will let forks and
execves run unprivileged.
Eduardo Otubo [Wed, 1 Mar 2017 22:17:29 +0000 (23:17 +0100)]
seccomp: add obsolete argument to command line
This patch introduces the argument [,obsolete=allow] to the `-sandbox on'
option. It allows Qemu to run safely on old system that still relies on
old system calls.
Eduardo Otubo [Tue, 28 Feb 2017 20:13:12 +0000 (21:13 +0100)]
seccomp: changing from whitelist to blacklist
This patch changes the default behavior of the seccomp filter from
whitelist to blacklist. By default now all system calls are allowed and
a small black list of definitely forbidden ones was created.
Eric Blake [Mon, 11 Sep 2017 17:20:14 +0000 (12:20 -0500)]
qtest: Avoid passing raw strings through hmp()
hmp() passes its string argument through the sprintf() family;
with a proper attribute, gcc -Wformat warns us when we do something
dangerous like passing a non-constant format string. Fortunately,
all our strings were safe, but checking whether the string can
contain an unintended % is easy to avoid and therefore worth doing.
Eric Blake [Mon, 11 Sep 2017 17:19:49 +0000 (12:19 -0500)]
libqtest: Remove dead qtest_instances variable
Prior to commit 063c23d9, we were tracking a list of parallel
qtest objects, in order to safely clean up a SIGABRT handler
only after the last connection quits. But when we switched to
more of glib's infrastructure, the list became dead code that
is never assigned to.
Eric Blake [Mon, 11 Sep 2017 17:19:45 +0000 (12:19 -0500)]
test-qga: Kill broken and dead QGA_TEST_SIDE_EFFECTING code
Back when the test was introduced, in commit 62c39b307, the
test was set up to run qemu-ga directly on the host performing
the test, and defaults to limiting itself to safe commands. At
the time, it was envisioned that setting QGA_TEST_SIDE_EFFECTING
in the environment could cover a few more commands, while noting
the potential danger of those side effects running in the host.
But this has NEVER been tested: if you enable the environment
variable, the test WILL fail. One obvious reason: if you are not
running as root, you'll probably get a permission failure when
trying to freeze the file systems, or when changing system time.
Less obvious: if you run the test as root (wow, you're brave), you
could end up hanging if the test tries to log things to a
temporarily frozen filesystem. But the cutest reason of all: if
you get past the above hurdles, the test uses invalid JSON in
test_qga_fstrim() (missing '' around the dictionary key 'minimum'),
and will thus fail an assertion in qmp_fd().
Rather than leave this untested time-bomb in place, rip it out.
Hopefully, as originally envisioned, we can find an opportunity
to test an actual sandboxed guest where the guest-agent has
full permissions and will not unduly affect the host running
the test - if so, 'git revert' can be used if desired, for
salvaging any useful parts of this attempt.
Thomas Huth [Tue, 29 Aug 2017 17:03:51 +0000 (19:03 +0200)]
tests: Fix broken ivshmem-server-msi/-irq tests
Broken with commit b4ba67d9a7025 ("libqos: Change PCI accessors to take
opaque BAR handle") a while ago, but nobody noticed since the tests are
not run by default: The msix_pba_bar is not correctly initialized
anymore if bir_pba has the same value as bir_table. With this fix,
"make check SPEED=slow" should work fine again.
Thomas Huth [Mon, 28 Aug 2017 10:25:45 +0000 (12:25 +0200)]
tests/libqtest: Use a proper error message if QTEST_QEMU_BINARY is missing
The user can currently still cause an abort() if running certain tests
(like the prom-env-test) without setting the QTEST_QEMU_BINARY first.
A similar problem has been fixed with commit 7c933ad61b8f3f51337
already, but forgot to also take care of the qtest_get_arch() function,
so let's introduce a proper wrapper around getenv("QTEST_QEMU_BINARY")
that can be used in both locations now.
Thomas Huth [Thu, 24 Aug 2017 05:00:11 +0000 (07:00 +0200)]
tests/test-hmp: Remove puv3 and tricore_testboard from the blacklist
The problem with puv3 has been fixed with 0ac241bcf9f9d99a252a352a162f
('unicore32: abort when entering "x 0" on the monitor') and the problem
with tricore_testboard has been fixed with b190f477e29c7cd03a8fee49c96d
('qemu-system-tricore: segfault when entering "x 0" on the monitor').
A lot of tests provide code for adding and removing a device via the
device_add and device_del QMP commands. Maintaining this code in so many
places is cumbersome and error-prone (some of the code parts check the
responses for device deletion in an incorrect way, for example, we've got
to deal with both, error code and DEVICE_DEL event here). So let's provide
some proper generic functions for adding and removing a device instead.
The code for correctly unplugging a device has been taken from a patch
from Peter Xu.
Greg Kurz [Thu, 14 Sep 2017 10:48:04 +0000 (12:48 +0200)]
ppc/kvm: use kvm_vm_check_extension() in kvmppc_is_pr()
If the host has both KVM PR and KVM HV loaded and we pass:
-machine pseries,accel=kvm,kvm-type=PR
the kvmppc_is_pr() returns false instead of true. Since the helper
is mostly used as fallback, it doesn't have any real impact with
recent kernels. A notable exception is the workaround to allow
migration between compatible hosts with different PVRs (eg, POWER8
and POWER8E), since KVM still doesn't provide a way to check if a
specific PVR is supported (see commit c363a37a450f for details).
According to the official KVM API documentation [1], KVM_PPC_GET_PVINFO
is "vm ioctl", but we check it as a global ioctl. The following function
in KVM is hence called with kvm == NULL and considers we're in HV mode.
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
/* Assume we're using HV mode when the HV module is loaded */
int hv_enabled = kvmppc_hv_ops ? 1 : 0;
if (kvm) {
/*
* Hooray - we know which VM type we're running on. Depend on
* that rather than the guess above.
*/
hv_enabled = is_kvmppc_hv_enabled(kvm);
}
Let's use kvm_vm_check_extension() to fix the issue.
Greg Kurz [Tue, 12 Sep 2017 06:51:21 +0000 (08:51 +0200)]
spapr_cpu_core: cleaning up qdev_get_machine() calls
This patch removes the qdev_get_machine() calls that are made
in spapr_cpu_core.c in situations where we can get an existing
pointer for the MachineState by either passing it as an argument
to the function or by using other already available pointers.
Credits to Daniel Henrique Barboza for the idea and the changelog
text.
This happens because we always create a 64-bit MMIO window, even if
we didn't explicitely requested it (ie, mem64_win_size == 0) and the
32-bit window is below 2GiB. It doesn't seem to have an impact on the
guest though because spapr_populate_pci_dt() doesn't advertise the
bogus windows when mem64_win_size == 0.
Since these memory regions don't induce any state, we can safely
choose to not create them when their address is equal to -1,
without breaking migration from existing setups.
Greg Kurz [Mon, 11 Sep 2017 20:52:06 +0000 (22:52 +0200)]
spapr_cpu_core: fail gracefully with non-pseries machine types
Since commit 7cca3e466eb0 ("ppc: spapr: Move VCPU ID calculation into
sPAPR"), QEMU aborts when started with a *-spapr-cpu-core device and
a non-pseries machine.
Let's rely on the already existing call to object_dynamic_cast() instead
of using the SPAPR_MACHINE() macro.
Greg Kurz [Mon, 11 Sep 2017 22:04:40 +0000 (00:04 +0200)]
xics: fix several error leaks
If object_property_get_link() fails then it allocates an error, which
must be freed before returning. The error_get_pretty() function is
merely an accessor to the error message and doesn't free anything.
The error.h header indicates how to do it right:
* Pass an existing error to the caller with the message modified:
* error_propagate(errp, err);
* error_prepend(errp, "Could not frobnicate '%s': ", name);
The existing tries to round up the number of pages but @pages is always
calculated as the rounded up value minus one which makes ctz64() always
return 0 and have create.levels always set 1.
This removes wrong "-1" and allows having more than 1 levels. This becomes
handy for >128GB guests with standard 64K pages as this requires blocks
with zone order 9 and the popular limit of CONFIG_FORCE_MAX_ZONEORDER=9
means that only blocks up to order 8 are allowed.
Greg Kurz [Sat, 9 Sep 2017 15:06:33 +0000 (17:06 +0200)]
spapr_pci: handle FDT creation errors with _FDT()
libfdt failures when creating the FDT should cause QEMU to terminate.
Let's use the _FDT() macro which does just that instead of propagating
the error to the caller. spapr_populate_pci_child_dt() no longer needs
to return a value in this case.
Note that, on the way, this get rids of the following nonsensical lines:
Greg Kurz [Sat, 9 Sep 2017 15:06:25 +0000 (17:06 +0200)]
spapr_pci: use the common _FDT() helper
All other users in hw/ppc already consider an error when building
the FDT to be fatal, even on hotplug paths. There's no valid reason
for spapr_pci to behave differently. So let's used the common _FDT()
helper which terminates QEMU when libfdt fails.
The OV5_MMU_RADIX_300 requires special handling in the CAS negotiation
process. It is cleared from the option vector of the guest before
evaluating the changes and re-added later. But, when testing for a
possible CAS reset :
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.
The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
- 0b10 XIVE legacy or exploitation mode
The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only