spapr_iommu: Realloc guest visible TCE table when hot(un)plugging vfio-pci
This replaces g_malloc() with spapr_tce_alloc_table() as this is
the standard way of allocating tables and this allows moving the table
back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
support is not present in the KVM.
Although spapr_tce_alloc_table() is expected to fail with EBUSY
if called when previous fd is not closed yet, in practice we will not
see it because cap_spapr_vfio is false at the moment.
KONRAD Frederic [Mon, 7 Aug 2017 15:50:46 +0000 (17:50 +0200)]
booke206: fix tlbnps for fixed size TLB
Some OS don't populate the TSIZE field when using a fixed size TLB which result
in a 1KB TLB. When the TLB is a fixed size TLB the TSIZE field should be
ignored.
KONRAD Frederic [Mon, 7 Aug 2017 15:50:45 +0000 (17:50 +0200)]
booke206: fix booke206_tlbnps for mav 2.0
This fixes booke206_tlbnps for MAV 2.0 by checking the MMUCFG register and
return directly the right tlbnps instead of computing it from non existing
field.
Sam Bobroff [Wed, 9 Aug 2017 05:38:56 +0000 (15:38 +1000)]
ppc: spapr: Make VCPU ID handling private to SPAPR
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
Sam Bobroff [Thu, 3 Aug 2017 06:28:44 +0000 (16:28 +1000)]
ppc: spapr: Rename cpu_dt_id to vcpu_id
This field actually records the VCPU ID used by KVM and, although the
value is also used in the device tree it is primarily the VCPU ID so
rename it as such.
Sam Bobroff [Thu, 3 Aug 2017 06:28:36 +0000 (16:28 +1000)]
e500: Use cpu_index instead of vcpu_dt_id
The e500 platform code uses the function ppc_get_vcpu_dt_id() to get
an id to put in its device tree. Which seems like it makes sense, but
ppc_get_vcpu_dt_id() is actually badly named - it only differs from
cpu_index in cases where you're running on KVM HV and the host's
number of threads differs from the guests. Since KVM HV only supports
PAPR, not e500, it doesn't make sense to use it here.
Simply use the cpu_index instead (which is 'i' in this context
because qemu_get_cpu(i) returns the cpu with cpu_index == i).
Greg Kurz [Tue, 25 Jul 2017 17:59:44 +0000 (19:59 +0200)]
spapr_drc: add unrealize method to physical DRC class
When hot-unplugging a PHB, all its PCI DRC connectors get unrealized. This
patch adds an unrealize method to the physical DRC class, in order to undo
registrations performed in realize_physical().
Greg Kurz [Tue, 25 Jul 2017 17:59:18 +0000 (19:59 +0200)]
spapr_pci: parent the MSI memory region to the PHB
This memory region should be owned by the PHB. This ensures the PHB
cannot be finalized as long as the the region is guest visible, or
used by a CPU or a device.
Greg Kurz [Tue, 25 Jul 2017 17:58:53 +0000 (19:58 +0200)]
spapr_drc: use g_strdup_printf() instead of snprintf()
Passing a stack allocated buffer of arbitrary length to snprintf()
without checking the return value can cause the resultant strings
to be silently truncated.
Greg Kurz [Tue, 25 Jul 2017 17:58:40 +0000 (19:58 +0200)]
spapr_iommu: use g_strdup_printf() instead of snprintf()
Passing a stack allocated buffer of arbitrary length to snprintf()
without checking the return value can cause the resultant strings
to be silently truncated.
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].
At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.
In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.
A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.
With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.
This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:
- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.
- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.
This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.
hw/ppc/spapr_drc.c: change spapr_drc_needed to use drc->dev
This patch makes a small fix in 'spapr_drc_needed' to change how we detect
if a DRC has a device attached. Previously it used dr_entity_sense for this,
which works for physical DRCs.
However, for logical DRCs, it didn't cover the case where a logical DRC has
a drc->dev but the state is LOGICAL_UNUSABLE (e.g. a hotplugged CPU before
CAS). In this case, the dr_entity_sense of this DRC returns UNUSABLE and the
code was considering that there were no dev attached, making spapr_drc_needed
return 'false' when in fact we would like to migrate the DRC.
Changing it to check for drc->dev instead works for all DRC types.
At this point the conversion is a wash. Loading of TB+ofs is
smaller, but the actual return address from exit_tb is larger.
There are a few more insns required to transition between TBs.
But the expectation is that accesses to the constant pool will
on the whole be smaller.
We are not going to use ldrd for loading the comparator
for 32-bit guests, so don't limit cmp_off to 8 bits then.
This eliminates one insn in the tlb load for some guests.
Use UBFX to avoid limitation on CPU_TLB_BITS. Since we're dropping
the initial shift, we need to replace the page masking. We can use
MOVW+BIC to do this without shifting. The result is the same size
as the armv6 path with one less conditional instruction.
A new shared header tcg-pool.inc.c adds new_pool_label,
for registering a tcg_target_ulong to be emitted after
the generated code, plus relocation data to install a
pointer to the data.
A new pointer is added to the TCGContext, so that we
dump the constant pool as data, not code.
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.
tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.
While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.
Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
This opens the possibility for TCG_TARGET_HAS_direct_jump to be
a runtime decision -- based on host cpu capabilities, the size of
code_gen_buffer, or a future debugging switch.
Peter Maydell [Tue, 8 Aug 2017 12:42:52 +0000 (13:42 +0100)]
target/alpha: Switch to do_transaction_failed() hook
Switch the alpha target from the old unassigned_access hook
to the new do_transaction_failed hook. This allows us to
resolve a ??? in the old hook implementation.
The only part of the alpha target that does physical
memory accesses is reading the page table -- add a
TODO comment there to the effect that we should handle
bus faults on page table walks. (Since the palcode
doesn't actually do anything useful on a bus fault anyway
it's a bit moot for now.)
Peter Maydell [Mon, 4 Sep 2017 17:19:00 +0000 (18:19 +0100)]
configure: Drop AIX host support
Nobody has mentioned AIX host support on the mailing list for years,
and we have no test systems for it so it is most likely broken.
We've advertised in configure for two releases now that we plan
to drop support for this host OS, and have had no complaints.
Drop the AIX host support code.
We can also drop the now-unused AIX version of sys_cache_info().
Note that the _CALL_AIX define used in the PPC tcg backend is
also used for Linux PPC64, and so that code should not be removed.
* remotes/ericb/tags/pull-nbd-2017-09-06:
nbd: Use new qio_channel_*_all() functions
io: Add new qio_channel_read{, v}_all_eof functions
io: Yield rather than wait when already in coroutine
iotests: blacklist 194 with the luks driver
iotests: rewrite 192 to use _launch_qemu to fix LUKS support
Peter Maydell [Thu, 7 Sep 2017 15:42:55 +0000 (16:42 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170907' into staging
target-arm:
* cleanups converting to DEFINE_PROP_LINK
* allwinner-a10: mark as not user-creatable
* initial patches working towards ARMv8M support
* implement generating aborts on memory transaction failures
* make BXJ behave correctly (ie not UNDEF) on ARMv6-and-later
* remotes/pmaydell/tags/pull-target-arm-20170907: (31 commits)
target/arm: Add Jazelle feature
target/arm: Implement new do_transaction_failed hook
hw/arm: Set ignore_memory_transaction_failures for most ARM boards
boards.h: Define new flag ignore_memory_transaction_failures
target/arm: Implement BXNS, and banked stack pointers
target/arm: Move regime_is_secure() to target/arm/internals.h
target/arm: Make CFSR register banked for v8M
target/arm: Make MMFAR banked for v8M
target/arm: Make CCR register banked for v8M
target/arm: Make MPU_CTRL register banked for v8M
target/arm: Make MPU_RNR register banked for v8M
target/arm: Make MPU_RBAR, MPU_RLAR banked for v8M
target/arm: Make MPU_MAIR0, MPU_MAIR1 registers banked for v8M
target/arm: Make VTOR register banked for v8M
nvic: Add NS alias SCS region
target/arm: Make CONTROL register banked for v8M
target/arm: Make FAULTMASK register banked for v8M
target/arm: Make PRIMASK register banked for v8M
target/arm: Make BASEPRI register banked for v8M
target/arm: Add MMU indexes for secure v8M
...
Peter Maydell [Thu, 7 Sep 2017 14:26:06 +0000 (15:26 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170906a' into staging
migration pull 2017-09-06
# gpg: Signature made Wed 06 Sep 2017 19:39:23 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20170906a:
migration: dump str in migrate_set_state trace
snapshot/tests: Try loadvm twice
migration: Reset rather than destroy main_thread_load_event
runstate/migrate: Two more transitions
host-utils: Simplify pow2ceil()
host-utils: Proactively fix pow2floor(), switch to unsigned
xbzrle: Drop unused cache_resize()
migration: Report when bdrv_inactivate_all fails
Peter Maydell [Thu, 7 Sep 2017 13:34:25 +0000 (14:34 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/pull-tgt-20170906' into staging
tcg generic translate loop v15
# gpg: Signature made Wed 06 Sep 2017 17:02:31 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tgt-20170906: (32 commits)
target/arm: Perform per-insn cross-page check only for Thumb
target/arm: Split out thumb_tr_translate_insn
target/arm: Move ss check to init_disas_context
target/arm: [a64] Move page and ss checks to init_disas_context
target/arm: [tcg] Port to generic translation framework
target/arm: [tcg,a64] Port to disas_log
target/arm: [tcg] Port to disas_log
target/arm: [tcg,a64] Port to tb_stop
target/arm: [tcg] Port to tb_stop
target/arm: [tcg,a64] Port to translate_insn
target/arm: [tcg] Port to translate_insn
target/arm: [tcg,a64] Port to breakpoint_check
target/arm: [tcg,a64] Port to insn_start
target/arm: [tcg] Port to insn_start
target/arm: [tcg] Port to tb_start
target/arm: [tcg,a64] Port to init_disas_context
target/arm: [tcg] Port to init_disas_context
target/arm: [tcg] Port to DisasContextBase
target/i386: [tcg] Port to generic translation framework
target/i386: [tcg] Port to disas_log
...
This adds a feature bit indicating support of the (trivial) Jazelle
implementation if ARM_FEATURE_V6 is set or if the processor is arm926
or arm1026. This fixes the issue that any BXJ instruction will
result in an illegal_op. BXJ instructions will now check if the
architecture supports ARM_FEATURE_JAZELLE.
Peter Maydell [Thu, 7 Sep 2017 12:54:54 +0000 (13:54 +0100)]
hw/arm: Set ignore_memory_transaction_failures for most ARM boards
Set the MachineClass flag ignore_memory_transaction_failures
for almost all ARM boards. This means they retain the legacy
behaviour that accesses to unimplemented addresses will RAZ/WI
rather than aborting, when a subsequent commit adds support
for external aborts.
The exceptions are:
* virt -- we know that guests won't try to prod devices
that we don't describe in the device tree or ACPI tables
* mps2 -- this board was written to use unimplemented-device
for all the ranges with devices we don't yet handle
New boards should not set the flag, but instead be written
like the mps2.
Peter Maydell [Thu, 7 Sep 2017 12:54:54 +0000 (13:54 +0100)]
boards.h: Define new flag ignore_memory_transaction_failures
Define a new MachineClass field ignore_memory_transaction_failures.
If this is flag is true then the CPU will ignore memory transaction
failures which should cause the CPU to take an exception due to an
access to an unassigned physical address; the transaction will
instead return zero (for a read) or be ignored (for a write). This
should be set only by legacy board models which rely on the old
RAZ/WI behaviour for handling devices that QEMU does not yet model.
New board models should instead use "unimplemented-device" for all
memory ranges where the guest will attempt to probe for a device that
QEMU doesn't implement and a stub device is required.
We need this for ARM boards, where we're about to implement support for
generating external aborts on memory transaction failures. Too many
of our legacy board models rely on the RAZ/WI behaviour and we
would break currently working guests when their "probe for device"
code provoked an external abort rather than a RAZ.
Peter Maydell [Thu, 7 Sep 2017 12:54:54 +0000 (13:54 +0100)]
target/arm: Implement BXNS, and banked stack pointers
Implement the BXNS v8M instruction, which is like BX but will do a
jump-and-switch-to-NonSecure if the branch target address has bit 0
clear.
This is the first piece of code which implements "switch to the
other security state", so the commit also includes the code to
switch the stack pointers around, which is the only complicated
part of switching security state.
BLXNS is more complicated than just "BXNS but set the link register",
so we leave it for a separate commit.
Peter Maydell [Thu, 7 Sep 2017 12:54:54 +0000 (13:54 +0100)]
target/arm: Make CCR register banked for v8M
Make the CCR register banked if v8M security extensions are enabled.
This is slightly more complicated than the other "add banking"
patches because there is one bit in the register which is not
banked. We keep the live data in the NS copy of the register,
and adjust it on register reads and writes. (Since we don't
currently implement the behaviour that the bit controls, there
is nowhere else that needs to care.)
This patch includes the enforcement of the bits which are newly
RES1 in ARMv8M.
Peter Maydell [Thu, 7 Sep 2017 12:54:53 +0000 (13:54 +0100)]
target/arm: Make MPU_RBAR, MPU_RLAR banked for v8M
Make the MPU registers MPU_MAIR0 and MPU_MAIR1 banked if v8M security
extensions are enabled.
We can freely add more items to vmstate_m_security without
breaking migration compatibility, because no CPU currently
has the ARM_FEATURE_M_SECURITY bit enabled and so this
subsection is not yet used by anything.
Peter Maydell [Thu, 7 Sep 2017 12:54:53 +0000 (13:54 +0100)]
nvic: Add NS alias SCS region
For v8M the range 0xe002e000..0xe002efff is an alias region which
for secure accesses behaves like a NonSecure access to the main
SCS region. (For nonsecure accesses including when the security
extension is not implemented, it is RAZ/WI.)
Peter Maydell [Thu, 7 Sep 2017 12:54:52 +0000 (13:54 +0100)]
target/arm: Make FAULTMASK register banked for v8M
Make the FAULTMASK register banked if v8M security extensions are enabled.
Note that we do not yet implement the functionality of the new
AIRCR.PRIS bit (which allows the effect of the NS copy of FAULTMASK to
be restricted).
This patch includes the code to determine for v8M which copy
of FAULTMASK should be updated on exception exit; further
changes will be required to the exception exit code in general
to support v8M, so this is just a small piece of that.
The v8M ARM ARM introduces a notation where individual paragraphs
are labelled with R (for rule) or I (for information) followed
by a random group of subscript letters. In comments where we want
to refer to a particular part of the manual we use this convention,
which should be more stable across document revisions than using
section or page numbers.
Peter Maydell [Thu, 7 Sep 2017 12:54:52 +0000 (13:54 +0100)]
target/arm: Add MMU indexes for secure v8M
Now that MPU lookups can return different results for v8M
when the CPU is in secure vs non-secure state, we need to
have separate MMU indexes; add the secure counterparts
to the existing three M profile MMU indexes.
Peter Maydell [Thu, 7 Sep 2017 12:54:52 +0000 (13:54 +0100)]
target/arm: Add state field, feature bit and migration for v8M secure state
As the first step in implementing ARM v8M's security extension:
* add a new feature bit ARM_FEATURE_M_SECURITY
* add the CPU state field that indicates whether the CPU is
currently in the secure state
* add a migration subsection for this new state
(we will add the Secure copies of banked register state
to this subsection in later patches)
* add a #define for the one new-in-v8M exception type
* make the CPU debug log print S/NS status
Peter Maydell [Thu, 7 Sep 2017 12:54:51 +0000 (13:54 +0100)]
target/arm: Implement ARMv8M's PMSAv8 registers
As part of ARMv8M, we need to add support for the PMSAv8 MPU
architecture.
PMSAv8 differs from PMSAv7 both in register/data layout (for instance
using base and limit registers rather than base and size) and also in
behaviour (for example it does not have subregions); rather than
trying to wedge it into the existing PMSAv7 code and data structures,
we define separate ones.
This commit adds the data structures which hold the state for a
PMSAv8 MPU and the register interface to it. The implementation of
the MPU behaviour will be added in a subsequent commit.
Thomas Huth [Thu, 7 Sep 2017 12:54:51 +0000 (13:54 +0100)]
hw/arm/allwinner-a10: Mark the allwinner-a10 device with user_creatable = false
QEMU currently exits unexpectedly when the user accidentially
tries to do something like this:
$ aarch64-softmmu/qemu-system-aarch64 -S -M integratorcp -nographic
QEMU 2.9.93 monitor - type 'help' for more information
(qemu) device_add allwinner-a10
Unsupported NIC model: smc91c111
Exiting just due to a "device_add" should not happen. Looking closer
at the the realize and instance_init function of this device also
reveals that it is using serial_hds and nd_table directly there, so
this device is clearly not creatable by the user and should be marked
accordingly.
Eric Blake [Tue, 5 Sep 2017 19:11:14 +0000 (14:11 -0500)]
nbd: Use new qio_channel_*_all() functions
Rather than open-coding our own read/write-all functions, we
can make use of the recently-added qio code. It slightly
changes the error message in one of the iotests.
Eric Blake [Tue, 5 Sep 2017 19:11:13 +0000 (14:11 -0500)]
io: Add new qio_channel_read{, v}_all_eof functions
Some callers want to distinguish between clean EOF (no bytes read)
vs. a short read (at least one byte read, but EOF encountered
before reaching the desired length), as it allows clients the
ability to do a graceful shutdown when a server shuts down at
defined safe points in the protocol, rather than treating all
shutdown scenarios as an error due to EOF. However, we don't want
to require all callers to have to check for early EOF. So add
another wrapper function that can be used by the callers that care
about the distinction.
Eric Blake [Tue, 5 Sep 2017 19:11:12 +0000 (14:11 -0500)]
io: Yield rather than wait when already in coroutine
The new qio_channel_{read,write}{,v}_all functions are documented
as yielding until data is available. When used on a blocking
channel, this yield is done via qio_channel_wait() which spawns
a nested event loop under the hood (so it is that secondary loop
which yields as needed); but if we are already in a coroutine (at
which point QIO_CHANNEL_ERR_BLOCK is only possible if we are a
non-blocking channel), we want to yield the current coroutine
instead of spawning a nested event loop.