Peter Maydell [Mon, 31 Oct 2016 11:12:02 +0000 (11:12 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20161028' into staging
target-arm queue:
* Fix reset GPIO handling for spitz, tosa boards
* virt: add 'pmu' property for configuring whether to expose the
vPMU to the guest
* char: cadence: correct reset value for baud rate registers
* versatilepb: do not run if user asks for more than 256MB RAM
* pxa2xx: Set value default values for CCCR and CKEN on PXA255
* arm: cubieboard: Add support for initrd
* i.MX: Fix GPIO ISR register write
* remotes/pmaydell/tags/pull-target-arm-20161028:
hw/arm/tosa: Fix reset handling
hw/arm/spitz: Fix reset handling
arm: virt: add PMU property to mach-virt machine type
arm: Add an option to turn on/off vPMU support
char: cadence: correct reset value for baud rate registers
versatilepb: do not run if user asks for more than 256MB RAM
hw/arm/pxa2xx: Set value default values for CCCR and CKEN on PXA255
arm: cubieboard: Add support for initrd
i.MX: Fix GPIO ISR register write
Peter Maydell [Mon, 31 Oct 2016 10:10:16 +0000 (10:10 +0000)]
Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into staging
# gpg: Signature made Fri 28 Oct 2016 15:47:39 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/for-upstream:
aio: convert from RFifoLock to QemuRecMutex
qemu-thread: introduce QemuRecMutex
iothread: release AioContext around aio_poll
block: only call aio_poll on the current thread's AioContext
qemu-img: call aio_context_acquire/release around block job
qemu-io: acquire AioContext
block: prepare bdrv_reopen_multiple to release AioContext
replication: pass BlockDriverState to reopen_backing_file
iothread: detach all block devices before stopping them
aio: introduce qemu_get_current_aio_context
sheepdog: use BDRV_POLL_WHILE
nfs: use BDRV_POLL_WHILE
nfs: move nfs_set_events out of the while loops
block: introduce BDRV_POLL_WHILE
qed: Implement .bdrv_drain
block: change drain to look only at one child at a time
block: add BDS field to count in-flight requests
mirror: use bdrv_drained_begin/bdrv_drained_end
blockjob: introduce .drain callback for jobs
replication: interrupt failover if the main device is closed
Alex Bennée [Fri, 28 Oct 2016 13:25:59 +0000 (14:25 +0100)]
net: split colo_compare_pkt_info into two trace events
It seems there is a limit to the number of arguments a UST trace event
can take and at 11 the previous trace command broke the build. Split the
trace into a src pkt and dst pkt trace to fix this.
* remotes/kraxel/tags/pull-ui-20161028-1:
curses: Use cursesw instead of curses
curses: fix left/right arrow translation
ui/gtk: Fix non-working DELETE key
gtk: fix compilation warning with gtk 3.22.2
Defer BrlAPI tty acquisition to when guest starts using device
Add dots keypresses support to the baum braille device
* remotes/vivier/tags/m68k-part2-pull-request:
MAINTAINERS: update M68K entry
target-m68k: immediate ops manage word and byte operands
target-m68k: cmp manages word and bytes operands
target-m68k: add/sub manage word and byte operands
target-m68k: add addressing modes to neg
target-m68k: introduce byte and word cc_ops
target-m68k: some bit ops cleanup
target-m68k: suba/adda can manage word operand
target-m68k: and can manage word and byte operands
target-m68k: or can manage word and byte operands
target-m68k: eor can manage word and byte operands
target-m68k: add addressing modes to not
target-m68k: Inline addx, subx, negx
target-m68k: add dbcc
target-m68k: add addressing modes to scc
target-m68k: add exg ops
target-m68k: add linkl
target-m68k: add bkpt instruction
Peter Maydell [Fri, 28 Oct 2016 15:31:59 +0000 (16:31 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20161028' into staging
ppc patch queue 2016-10-28
This pull request supersedes and extends the one from 2016-10-26
(which had a build bug).
Highlights:
* SLOF (pseries guest firmware) update
* Enable a number of extra testcases on ppc / pseries
* Added the 'powernv' machine type
- Almost enough to be minimally usable
- But still missing necessary interrupt controller updates
* Cleanup and consolidation of NVRAM handling on several platforms
with related firmware
* Substantial cleanup to device tree construction
* Some more POWER9 instruction emulation
* Cleanup to handling of pseries option vectors and CAS reboot
handling (host/guest feature negotiation mechanism)
* Significant cleanups to handling of PCI devices in test cases
* New hotplug event infrastructure
* Memory hot unplug support for pseries
* Several bug fixes
The NVRAM cleanup affects some Sun sparc platforms as well as ppc
ones, but have been tested by the sparc maintainer (Mark Cave-Ayland).
The test additions also include substantial general changes to the
test framework that aren't strictly ppc related. They don't seem to
break tests on other platforms, they're for the benefit of enabling
tests on ppc and there isn't a specific maintainer for them, so
they're included in this tree.
* remotes/dgibson/tags/ppc-for-2.8-20161028: (73 commits)
ppc: allow certain HV interrupts to be delivered to guests
spapr: Memory hot-unplug support
spapr: use count+index for memory hotplug
spapr: Add DRC count indexed hotplug identifier type
spapr: add hotplug interrupt machine options
spapr_events: add support for dedicated hotplug event source
spapr: update spapr hotplug documentation
target-ppc: Add xvcmpnesp, xvcmpnedp instructions
target-ppc: add xscmp[eq,gt,ge,ne]dp instructions
tests: Add pseries machine to the prom-env-test, too
spapr_nvram: Pre-initialize the NVRAM to support the -prom-env parameter
libqos: Change PCI accessors to take opaque BAR handle
tests: Don't assume structure of PCI IO base in ahci-test
tests: Use qpci_mem{read,write} in ivshmem-test
libqos: Add 64-bit PCI IO accessors
tests: Clean up IO handling in ide-test
libqos: Implement mmio accessors in terms of mem{read,write}
libqos: Add streaming accessors for PCI MMIO
tests: Adjust tco-test to use qpci_legacy_iomap()
libqos: Better handling of PCI legacy IO
...
Guenter Roeck [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
hw/arm/tosa: Fix reset handling
Using the CPU reset handler for resets triggered by writing into
gpio pins other than GPIO01 is not appropriate and does not work,
since the reset triggered by writing into GPIO01 is configurable.
Use a separate reset handler for tosa to reset the entire system
and not just the CPU.
Guenter Roeck [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
hw/arm/spitz: Fix reset handling
Using the CPU reset handler for resets triggered by writing into
gpio pins other than GPIO01 is not appropriate and does not work,
since the reset triggered by writing into GPIO01 is configurable.
Use a separate reset handler for spitz to reset the entire system
and not just the CPU.
Wei Huang [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
arm: virt: add PMU property to mach-virt machine type
CPU vPMU is now turned ON by default, but this feature wasn't introduced
until virt-2.7 machine type. To solve this problem, this patch adds a
PMU option in machine state, which is used to control CPU's vPMU status.
This PMU option is not exposed to command line and is turned off in
virt-2.6 machine type.
Wei Huang [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
arm: Add an option to turn on/off vPMU support
This patch adds a pmu=[on/off] option to enable/disable vPMU support
in guest vCPU. It allows virt tools, such as libvirt, to determine the
exsitence of vPMU and configure it. Note this option is only available
for cortex-a57/cortex-53/ host CPUs, but unavailable on ARMv7 and other
processors. Also even though "pmu=" option is available for TCG mode,
setting it doesn't turn PMU on.
char: cadence: correct reset value for baud rate registers
The Cadence UART device emulator stores 'baud rate generator'
and 'baud rate divider' values, used in computing speed, in two
registers. The device specification defines their range and
their reset value. Use their correct value when resetting the
device in cadence_uart_reset.
versatilepb: do not run if user asks for more than 256MB RAM
The versatilepb physical address space layout only has
a 256MB region for RAM before the devices. Without a guard
on the amount of RAM requested by the user we would happily
create a RAM area that overlapped with the devices, resulting
in very confusing behaviour (typically a guest crash).
Report the problem to the user if they try to request more
RAM than the board can handle (as we do already for some
other board models).
Guenter Roeck [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
hw/arm/pxa2xx: Set value default values for CCCR and CKEN on PXA255
The code used default values for PXA270 to configure CCCR. For PXA255,
the resulting register value is invalid (unsupported) and resulted
in a division by zero in the Linux kernel. Use default values from
datasheet instead.
Guenter Roeck [Fri, 28 Oct 2016 13:12:31 +0000 (14:12 +0100)]
i.MX: Fix GPIO ISR register write
Writing the ISR register is supposed to clear interrupt status bits,
not to set them.
This patch makes '-M sabrelite' work without devicetree changes (Linux
kernel versions 3.18 to 4.7 with imx_v6_v7_defconfig and up to v4.8 with
multi_v7_defconfig; mainline has different problems).
Peter Maydell [Fri, 28 Oct 2016 14:30:55 +0000 (15:30 +0100)]
Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2016-10-27-1' into staging
Merge qio 2016/10/27 v1
# gpg: Signature made Thu 27 Oct 2016 13:54:03 BST
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg: aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qio-2016-10-27-1:
main: set names for main loop sources created
vnc: set name for all I/O channels created
migration: set name for all I/O channels created
char: set name for all I/O channels created
nbd: set name for all I/O channels created
io: add ability to set a name for IO channels
io: Add a QIOChannelSocket cleanup test
io: set LISTEN flag explicitly for listen sockets
io: Introduce a qio_channel_set_feature() helper
io: Use qio_channel_has_feature() where applicable
io: Fix double shift usages on QIOChannel features
Paolo Bonzini [Thu, 27 Oct 2016 10:49:06 +0000 (12:49 +0200)]
iothread: release AioContext around aio_poll
This is the first step towards having fine-grained critical sections in
dataplane threads, which will resolve lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.
Because AioContext does not use contention callbacks anymore, the
unit test has to be changed.
Paolo Bonzini [Thu, 27 Oct 2016 10:49:05 +0000 (12:49 +0200)]
block: only call aio_poll on the current thread's AioContext
aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.
The bug remains latent as long as all of it is called within
aio_context_acquire/aio_context_release, but this will change soon.
To fix this, if bdrv_drain is called from outside the I/O thread,
signal the main AioContext through a dummy bottom half. The event
loop then only runs in the I/O thread.
Paolo Bonzini [Thu, 27 Oct 2016 10:49:02 +0000 (12:49 +0200)]
block: prepare bdrv_reopen_multiple to release AioContext
After the next patch bdrv_drain_all will have to be called without holding any
AioContext. Prepare to do this by adding an AioContext argument to
bdrv_reopen_multiple.
Paolo Bonzini [Thu, 27 Oct 2016 10:49:00 +0000 (12:49 +0200)]
iothread: detach all block devices before stopping them
Soon bdrv_drain will not call aio_poll itself on iothreads. If block
devices are left hanging off the iothread's AioContext, there will be no
one to do I/O for those poor devices.
Paolo Bonzini [Thu, 27 Oct 2016 10:48:55 +0000 (12:48 +0200)]
block: introduce BDRV_POLL_WHILE
We want the BDS event loop to run exclusively in the iothread that
owns the BDS's AioContext. This macro will provide the synchronization
between the two event loops; for now it just wraps the common idiom
of a while loop around aio_poll.
Fam Zheng [Thu, 27 Oct 2016 10:48:54 +0000 (12:48 +0200)]
qed: Implement .bdrv_drain
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. To
comply with the bdrv_drain semantics, we should make sure it remains
deleted once .bdrv_drain is called.
The change to qed_need_check_timer_cb is needed because bdrv_qed_drain
is called after s->bs has been drained, and should not operate on it;
instead it should operate on the BdrvChild-ren exclusively. Doing so
is easy because QED does not have a bdrv_co_flush_to_os callback, hence
all that is needed to flush it is to ensure writes have reached the disk.
Based on commit df9a681dc9a (which however included some unrelated
hunks, possibly due to a merge failure or an overlooked squash).
The patch was reverted because at the time bdrv_qed_drain could call
qed_plug_allocating_write_reqs while an allocating write was queued.
This however is not possible anymore after the previous patch;
.bdrv_drain is only called after all writes have completed at the
QED level, and its purpose is to trigger metadata writes in bs->file.
Paolo Bonzini [Thu, 27 Oct 2016 10:48:53 +0000 (12:48 +0200)]
block: change drain to look only at one child at a time
bdrv_requests_pending is checking children to also wait until internal
requests (such as metadata writes) have completed. However, checking
children is in general overkill. Children requests can be of two kinds:
- requests caused by an operation on bs, e.g. a bdrv_aio_write to bs
causing a write to bs->file->bs. In this case, the parent's in_flight
count will always be incremented by at least one for every request in
the child.
- asynchronous metadata writes or flushes. Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.
This patch therefore changes bdrv_drain to finish I/O in the parent
(after which the parent's in_flight will be locked to zero), call
bdrv_drain (after which the parent will not generate I/O on the child
anymore), and then wait for internal I/O in the children to complete.
Paolo Bonzini [Thu, 27 Oct 2016 10:48:52 +0000 (12:48 +0200)]
block: add BDS field to count in-flight requests
Unlike tracked_requests, this field also counts throttled requests,
and remains non-zero if an AIO operation needs a BH to be "really"
completed.
With this change, it is no longer necessary to have a dummy
BdrvTrackedRequest for requests that are never serialising, and
it is no longer necessary to poll the AioContext once after
bdrv_requests_pending(bs) returns false.
Paolo Bonzini [Thu, 27 Oct 2016 10:48:50 +0000 (12:48 +0200)]
blockjob: introduce .drain callback for jobs
This is required to decouple block jobs from running in an
AioContext. With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.
The solution is to first wait until all I/O operations are
complete; then loop in the main thread for the block job to
complete entirely.
Paolo Bonzini [Thu, 27 Oct 2016 10:48:49 +0000 (12:48 +0200)]
replication: interrupt failover if the main device is closed
Without this change, there is a race condition in tests/test-replication.
Depending on how fast the failover job (active commit) runs, there is a
chance of two bad things happening:
1) replication_done can be called after the secondary has been closed
and hence when the BDRVReplicationState is not valid anymore.
2) two copies of the active disk are present during the
/replication/secondary/stop test (that test runs immediately after
/replication/secondary/start, which tests failover). This causes the
corruption detector to fire.
* remotes/jnsnow/tags/ide-pull-request:
qemu-iotests: Test creating floppy drives
fdc: Move qdev properties to FloppyDrive
fdc: Add a floppy drive qdev
fdc: Add a floppy qbus
macio: switch over to new byte-aligned DMA helpers
dma-helpers: explicitly pass alignment into DMA helpers
Samuel Thibault [Sat, 15 Oct 2016 19:53:05 +0000 (21:53 +0200)]
curses: Use cursesw instead of curses
Use ncursesw package instead of curses on non-mingw, and check a few
functions.
Also take cflags from pkg-config, since cursesw headers may be in a
separate, non-default directory.
Thomas Huth [Thu, 27 Oct 2016 12:17:27 +0000 (14:17 +0200)]
ui/gtk: Fix non-working DELETE key
GTK generates key events for the delete key with key->string[0] = 0x7f
... but this does not work right with the readline_handle_byte()
function in util/readline.c, since this treats the keycode 127 as
backspace. So let's add a special case for the GTK delete key to make
this key behave right in the monitor interface of the GTK ui.
Nicholas Piggin [Thu, 27 Oct 2016 12:50:58 +0000 (23:50 +1100)]
ppc: allow certain HV interrupts to be delivered to guests
ppc hypervisors have delivered system reset and machine check exception
interrupts to guests in some situations (e.g., see FWNMI feature of LoPAPR,
or NMI injection in QEMU).
These exceptions are architected to set the HV bit in hardware, however
when injected into a guest, the HV bit should be cleared. Current code
masks off the HV bit before setting the new MSR, however this happens after
the interrupt delivery model has calculated delivery mode for the exception.
This can result in the guest's MSR LE bit being lost.
Account for this in the exception handler and don't set HV bit for guest
delivery.
Also add another sanity check to ensure similar bugs get caught.
Bharata B Rao [Thu, 27 Oct 2016 02:20:30 +0000 (21:20 -0500)]
spapr: Memory hot-unplug support
Add support to hot remove pc-dimm memory devices.
Since we're introducing a machine-level unplug_request hook, we also
had handling for CPU unplug there as well to ensure CPU unplug
continues to work as it did before.
Signed-off-by: Bharata B Rao <[email protected]>
* add hooks to CAS/cmdline enablement of hotplug ACR support
* add hook for CPU unplug Signed-off-by: Michael Roth <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: David Gibson <[email protected]>
spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type
dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.
With newer support for dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unecessary work. This patch makes use of that approach when the
capability is available.
Bharata B Rao [Thu, 27 Oct 2016 02:20:28 +0000 (21:20 -0500)]
spapr: Add DRC count indexed hotplug identifier type
Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.
Signed-off-by: Bharata B Rao <[email protected]>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
used in PAPR hotplug ACR Signed-off-by: Michael Roth <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: David Gibson <[email protected]>
If false, QEMU will force the use of "legacy" style hotplug events,
which are surfaced through EPOW events instead of a dedicated
hot plug event source, and lack certain features necessary, mainly,
for memory unplug support.
If true, QEMU will enable support for "modern" dedicated hot plug
event source. Note that we will still default to "legacy" style unless
the guest advertises support for the "modern" hotplug events via
ibm,client-architecture-support hcall during early boot.
For pseries-2.7 and earlier we default to false, for newer machine
types we default to true.
Michael Roth [Thu, 27 Oct 2016 02:20:26 +0000 (21:20 -0500)]
spapr_events: add support for dedicated hotplug event source
Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.
Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.
Michael Roth [Thu, 27 Oct 2016 02:20:25 +0000 (21:20 -0500)]
spapr: update spapr hotplug documentation
This updates the existing documentation to reflect recent updates to
the hotplug event structure, which are in draft form but slated
for inclusion in PAPR/LoPAPR.
David Gibson [Fri, 28 Oct 2016 00:17:31 +0000 (11:17 +1100)]
tests: Add pseries machine to the prom-env-test, too
Now that we also support the "-prom-env" parameter for the pseries
machine, we can enable this test for this machine, too. Since booting
with TCG is rather slow with the pseries machine, we also enable
the "-nodefaults" parameter for this test now, so that SLOF does not
have to check that much devices during boot and thus runs a little
bit faster.
Signed-off-by: Thomas Huth <[email protected]>
[dwg: Don't add -nodefaults to the command line, it causes extra warnings
for the sparc testcases] Signed-off-by: David Gibson <[email protected]>
Thomas Huth [Wed, 26 Oct 2016 07:35:40 +0000 (09:35 +0200)]
spapr_nvram: Pre-initialize the NVRAM to support the -prom-env parameter
In case we do not load the NVRAM contents from a file and the user
specified the "-prom-env" parameter, use the new CHRP NVRAM helper
functions to pre-initialize the NVRAM partitions, so that the SLOF
firmware now can pick up the environment variables from the -prom-env
parameter, too.
David Gibson [Mon, 24 Oct 2016 04:52:06 +0000 (15:52 +1100)]
libqos: Change PCI accessors to take opaque BAR handle
The usual use model for the libqos PCI functions is to map a specific PCI
BAR using qpci_iomap() then pass the returned token into IO accessor
functions. This, and the fact that iomap() returns a (void *) which
actually contains a PCI space address, kind of suggests that the return
value from iomap is supposed to be an opaque token.
..except that the callers expect to be able to add offsets to it. Which
also assumes the compiler will support pointer arithmetic on a (void *),
and treat it as working with byte offsets.
To clarify this situation change iomap() and the IO accessors to take
a definitely opaque BAR handle (enforced with a wrapper struct) along with
an offset within the BAR. This changes both the functions and all the
callers.
There were a number of places that checked if iomap() returned non-NULL,
and or initialized it to NULL before hand. Since iomap() already assert()s
if it fails to map the BAR, these tests were mostly pointless and are
removed.
David Gibson [Mon, 24 Oct 2016 04:50:22 +0000 (15:50 +1100)]
tests: Don't assume structure of PCI IO base in ahci-test
In a couple of places ahci-test makes assumptions about how the tokens
returned from qpci_iomap() are formatted in ways it probably shouldn't.
First in verify_state() it uses a non-NULL token to indicate that the AHCI
device has been enabled (part of enabling is to iomap()). This changes it
to use an explicit 'enabled' flag instead.
Second, it uses the fact that the token contains a PCI address, stored when
the BAR is mapped during initialization to check that the BAR has the same
value after a migration. This changes it to explicitly read the BAR
register before and after the migration and compare.
Together, these changes will make the test more robust against changes to
the internals of the libqos PCI layer.
David Gibson [Tue, 18 Oct 2016 05:18:45 +0000 (16:18 +1100)]
tests: Use qpci_mem{read,write} in ivshmem-test
ivshmem implements a block of shared memory in a PCI BAR. Currently our
test case accesses this using qtest_mem{read,write}. However, deducing
the correct addresses for these requires making assumptions about the
internel format returned by qpci_iomap(), along with some ugly casts.
This patch changes the test to use the new qpci_mem{read,write} interfaces
which is neater.
David Gibson [Wed, 19 Oct 2016 04:00:21 +0000 (15:00 +1100)]
libqos: Add 64-bit PCI IO accessors
Currently the libqos PCI layer includes accessor helpers for 8, 16 and 32
bit reads and writes. It's likely that we'll want 64-bit accesses in the
future (plenty of modern peripherals will have 64-bit reigsters). This
adds them.
For PIO (not MMIO) accesses on the PC backend, this is implemented as two
32-bit ins or outs. That's not ideal but AFAICT x86 doesn't have 64-bit
versions of in and out.
This patch also converts the single current user of 64-bit accesses -
virtio-pci.c to use the new mechanism, rather than a sequence of 8 byte
reads.
David Gibson [Thu, 20 Oct 2016 23:46:37 +0000 (10:46 +1100)]
tests: Clean up IO handling in ide-test
ide-test uses many explicit inb() / outb() operations for its IO, which
means it's not portable to non-x86 platforms. This cleans it up to use
the libqos PCI accessors instead.
David Gibson [Wed, 19 Oct 2016 03:20:44 +0000 (14:20 +1100)]
libqos: Implement mmio accessors in terms of mem{read,write}
In the libqos PCI code we now have accessors both for registers (byte
significance preserving) and for streaming data (byte address order
preserving). These exist in both the interface for qtest drivers and in
the machine specific backends.
However, the register-style accessors aren't actually necessary in the
backend. They can be implemented in terms of the byte address order
preserving accessors by the libqos wrappers. This works because PCI is
always little endian.
This does assume that the back end byte address order preserving accessors
will perform the equivalent of a single bus transaction for short lengths.
This is the case, and in fact they currently end up using the same
cpu_physical_memory_rw() implementation within the qtest accelerator.
David Gibson [Wed, 19 Oct 2016 03:19:47 +0000 (14:19 +1100)]
libqos: Add streaming accessors for PCI MMIO
Currently PCI memory (aka MMIO) space is accessed via a set of readb/writeb
style accessors. This is what we want for accessing discrete registers of
a certain size. However, there are a few cases where we instead need a
"bag of bytes" style streaming interface to PCI MMIO space. This can be
either for streaming data style registers or when there's actual memory
rather than registers in PCI space, for example frame buffers or ivshmem.
This patch adds backend callbacks, and libqos wrappers for this type of
byte address order preserving accesses.
David Gibson [Wed, 19 Oct 2016 10:49:39 +0000 (21:49 +1100)]
tests: Adjust tco-test to use qpci_legacy_iomap()
Avoid tco-test making assumptions about the internal format of the address
tokens passed to PCI IO accessors, by using the new qpci_legacy_iomap()
function.
David Gibson [Wed, 19 Oct 2016 06:43:42 +0000 (17:43 +1100)]
libqos: Better handling of PCI legacy IO
The usual model for PCI IO with libqos is to use qpci_iomap() to map a
specific BAR for a PCI device, then perform IOs within that BAR using
qpci_io_{read,write}*().
However, certain devices also have legacy PCI IO. In this case, instead of
(or as well as) being accessed via PCI BARs, the device can be accessed
via certain well-known, fixed addresses in PCI IO space.
Two existing tests use legacy PCI IO, and take different flawed approaches
to it:
* tco-test manually constructs a tco_io_base value instead of calling
qpci_iomap(), which assumes internal knowledge of the structure of
the value it shouldn't have
* ide-test uses direct in*() and out*() calls instead of using
qpci_io_*() accessors, meaning it's not portable to non-x86 machine
types.
This patch implements a new qpci_iomap_legacy() interface which gets a
handle in the same format as qpci_iomap() but refers to a region in
the legacy PIO space. For a device which has the same registers
available both in a BAR and in legacy space (quite common), this
allows the same test code to test both options with just a different
iomap() at the beginning.
David Gibson [Wed, 19 Oct 2016 03:06:51 +0000 (14:06 +1100)]
libqos: Move BAR assignment to common code
The PCI backends in libqos each supply an iomap() and iounmap() function
which is used to set up a specified PCI BAR. But PCI BAR allocation takes
place entirely within PCI space, so doesn't really need per-backend
versions. For example, Linux includes generic BAR allocation code used on
platforms where that isn't done by firmware.
This patch merges the BAR allocation from the two existing backends into a
single simplified copy. The back ends just need to set up some parameters
describing the window of PCI IO and PCI memory addresses which are
available for allocation. Like both the existing versions the new one uses
a simple bump allocator.
Note that (again like the existing versions) this doesn't really handle
64-bit memory BARs properly. It is actually used for such a BAR by the
ivshmem test, and apparently the 32-bit MMIO BAR logic is close enough to
work, as long as the BAR isn't too big. Fixing that to properly handle
64-bit BAR allocation is a problem for another time.
David Gibson [Tue, 18 Oct 2016 06:02:49 +0000 (17:02 +1100)]
libqos: Handle PCI IO de-multiplexing in common code
The PCI IO space (aka PIO, aka legacy IO) and PCI memory space (aka MMIO)
are distinct address spaces by the PCI spec (although parts of one might be
aliased to parts of the other in some cases).
However, qpci_io_read*() and qpci_io_write*() can perform accesses to
either space depending on parameter. That's convenient for test case
drivers, since there are a fair few devices which can be controlled via
either a PIO or MMIO BAR but with an otherwise identical driver.
This is implemented by having addresses below 64kiB treated as PIO, and
those above treated as MMIO. This works because low addresses in memory
space are generally reserved for DMA rather than MMIO.
At the moment, this demultiplexing must be handled by each PCI backend
(pc and spapr, so far). There's no real reason for this - the current
encoding is likely to work for all platforms, and even if it doesn't we
can still use a more complex common encoding since the value returned from
iomap are semi-opaque.
This patch moves the demultiplexing into the common part of the libqos PCI
code, with the backends having simpler, separate accessors for PIO and
MMIO space. This also means we have a way of explicitly accessing either
space if it's necessary for some special case.
David Gibson [Thu, 20 Oct 2016 03:08:07 +0000 (14:08 +1100)]
libqos: Give qvirtio_config_read*() consistent semantics
The 'addr' parameter to qvirtio_config_read*() doesn't have a consistent
meaning: when using the virtio-pci versions, it's a full PCI space address,
but for virtio-mmio, it's an offset from the device's base mmio address.
This means that the callers need to do different things to calculate the
addresses in the two cases, which rather defeats the purpose of function
pointer backends.
All the current users of these functions are using them to retrieve
variables from the device specific portion of the virtio config space.
So, this patch alters the semantics to always be an offset into that
device specific config area.
Hervé Poussineau [Tue, 25 Oct 2016 07:01:01 +0000 (09:01 +0200)]
adb: change handler only when recognized
ADB devices must take new handler into account only when they recognize it.
This lets operating systems probe for valid/invalid handles, to know device capabilities.
Add a FIXME in keyboard handler, which should use a different translation
table depending of the selected handler.
ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.
Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.
Michael Roth [Tue, 25 Oct 2016 04:47:29 +0000 (23:47 -0500)]
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Michael Roth [Tue, 25 Oct 2016 04:47:28 +0000 (23:47 -0500)]
spapr_hcall: use spapr_ovec_* interfaces for CAS options
Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.
As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.
Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.
Michael Roth [Tue, 25 Oct 2016 04:47:27 +0000 (23:47 -0500)]
spapr_ovec: initial implementation of option vector helpers
PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.
Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".
The logical format of these these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Thus the bits of each of
these bytes are stored in reverse order. Additionally, the first
byte of each option vector is encodes the length of the option vector,
so byte offsets begin at 1, and bit offset at 0.
This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
options vectors and testing for individual capabilities.
Cc: Bharata B Rao <[email protected]> Signed-off-by: Michael Roth <[email protected]>
[dwg: Tweaked double-include protection to not trigger a checkpatch
false positive] Signed-off-by: David Gibson <[email protected]>
David Gibson [Thu, 20 Oct 2016 05:05:00 +0000 (16:05 +1100)]
pseries: Remove spapr_create_fdt_skel()
For historical reasons construction of the guest device tree in spapr is
divided between spapr_create_fdt_skel() which is called at init time, and
spapr_build_fdt() which runs at reset time. Over time, more and more
things have needed to be moved to reset time.
Previous cleanups mean the only things left in spapr_create_fdt_skel() are
the properties of the root node itself. Finish consolidating these two
parts of device tree construction, by moving this to the start of
spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely.
David Gibson [Thu, 20 Oct 2016 05:01:17 +0000 (16:01 +1100)]
pseries: Consolidate construction of /vdevice device tree node
Construction of the /vdevice node (and its children) is divided between
spapr_create_fdt_skel() (at init time), which creates the base node, and
spapr_populate_vdevice() (at reset time) which creates the nodes for each
individual virtual device.
This consolidates both into a single function called from
spapr_build_fdt().
David Gibson [Thu, 20 Oct 2016 04:59:36 +0000 (15:59 +1100)]
pseries: Move /hypervisor node construction to fdt_build_fdt()
Currently the /hypervisor device tree node is constructed in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, move it to a function called from spapr_build_fdt().
David Gibson [Thu, 20 Oct 2016 04:56:48 +0000 (15:56 +1100)]
pseries: Move /event-sources construction to spapr_build_fdt()
The /event-sources device tree node is built from spapr_create_fdt_skel().
As part of consolidating device tree construction to reset time, this moves
it to spapr_build_fdt().
David Gibson [Thu, 20 Oct 2016 04:55:36 +0000 (15:55 +1100)]
pseries: Consolidate construction of /rtas device tree node
For historical reasons construction of the /rtas node in the device
tree (amongst others) is split into several places. In particular
it's split between spapr_create_fdt_skel(), spapr_build_fdt() and
spapr_rtas_device_tree_setup().
In fact, as well as adding the actual RTAS tokens to the device tree,
spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity
property, which despite going in the /rtas node, doesn't have a lot to
do with RTAS.
This patch consolidates the code constructing /rtas together into a new
spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to
spapr_dt_rtas_tokens() and now only adds the token properties.
David Gibson [Mon, 24 Oct 2016 01:05:57 +0000 (12:05 +1100)]
pseries: Consolidate construction of /chosen device tree node
For historical reasons, building the /chosen node in the guest device tree
is split across several places and includes both parts which write the DT
sequentially and others which use random access functions.
This patch consolidates construction of the node into one place, using
random access functions throughout.
David Gibson [Thu, 20 Oct 2016 05:07:56 +0000 (16:07 +1100)]
pseries: Move construction of /interrupt-controller fdt node
Currently the device tree node for the XICS interrupt controller is in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, this moves it to a function called from spapr_build_fdt().
In addition we move the actual code into hw/intc/xics_spapr.c with the
rest of the PAPR specific interrupt controller code.
David Gibson [Thu, 20 Oct 2016 04:37:41 +0000 (15:37 +1100)]
pseries: Consolidate RTAS loading
At each system reset, the pseries machine needs to load RTAS (the runtime
portion of the guest firmware) into the VM. This means copying
the actual RTAS code into guest memory, and also updating the device
tree so that the guest OS and boot firmware can locate it.
For historical reasons the copy and update to the device tree were in
different parts of the code. This cleanup brings them both together in
an spapr_load_rtas() function.