Peter Maydell [Tue, 5 Feb 2019 18:25:07 +0000 (18:25 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190205' into staging
target-arm queue:
* Implement Armv8.5-BTI extension for system emulation mode
* Implement the PR_PAC_RESET_KEYS prctl() for linux-user mode's Armv8.3-PAuth support
* Support TBI (top-byte-ignore) properly for linux-user mode
* gdbstub: allow killing QEMU via vKill command
* hw/arm/boot: Support DTB autoload for firmware-only boots
* target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
* remotes/pmaydell/tags/pull-target-arm-20190205: (22 commits)
target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
hw/arm/boot: Support DTB autoload for firmware-only boots
hw/arm/boot: Clarify why arm_setup_firmware_boot() doesn't set env->boot_info
hw/arm/boot: Factor out "set up firmware boot" code
hw/arm/boot: Factor out "direct kernel boot" code into its own function
hw/arm/boot: Fix block comment style in arm_load_kernel()
gdbstub: allow killing QEMU via vKill command
target/arm: Enable TBI for user-only
target/arm: Compute TB_FLAGS for TBI for user-only
target/arm: Clean TBI for data operations in the translator
target/arm: Add TBFLAG_A64_TBID, split out gen_top_byte_ignore
tests/tcg/aarch64: Add pauth smoke test
linux-user: Implement PR_PAC_RESET_KEYS
target/arm: Enable BTI for -cpu max
target/arm: Set btype for indirect branches
target/arm: Reset btype for direct branches
target/arm: Default handling of BTYPE during translation
target/arm: Cache the GP bit for a page in MemTxAttrs
exec: Add target-specific tlb bits to MemTxAttrs
target/arm: Add BT and BTYPE to tb->flags
...
* remotes/cohuck/tags/s390x-20190205:
s390x/pci: Unplug remaining requested devices on pcihost reset
s390x/pci: Warn when adding PCI devices without the 'zpci' feature
s390x/pci: Fix hotplugging of PCI bridges
s390x/pci: Fix primary bus number for PCI bridges
s390x/tcg: Don't model FP registers as globals
s390x/pci: mark zpci devices as unmigratable
s390x/pci: Drop release timer and replace it with a flag
s390x/pci: Introduce unplug requests and split unplug handler
s390x: remove direct reference to mem_path global from s390x code
target/s390x: define TCG_GUEST_DEFAULT_MO for MTTCG
Peter Maydell [Tue, 5 Feb 2019 16:52:42 +0000 (16:52 +0000)]
target/arm: Make FPSCR/FPCR trapped-exception bits RAZ/WI
The {IOE, DZE, OFE, UFE, IXE, IDE} bits in the FPSCR/FPCR are for
enabling trapped IEEE floating point exceptions (where IEEE exception
conditions cause a CPU exception rather than updating the FPSR status
bits). QEMU doesn't implement this (and nor does the hardware we're
modelling), but for implementations which don't implement trapped
exception handling these control bits are supposed to be RAZ/WI.
This allows guest code to test for whether the feature is present
by trying to write to the bit and checking whether it sticks.
QEMU is incorrectly making these bits read as written. Make them
RAZ/WI as the architecture requires.
In particular this was causing problems for the NetBSD automatic
test suite.
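
The RAZ/WI behaviour described above can be sketched as a mask applied on every write. This is a minimal illustration, not QEMU's actual code: the struct and function names are hypothetical, and only the bit positions (IOE=8, DZE=9, OFE=10, UFE=11, IXE=12, IDE=15) are taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trap-enable bit positions: IOE=8, DZE=9, OFE=10, UFE=11, IXE=12, IDE=15. */
#define FP_TRAP_ENABLE_MASK ((0x1fu << 8) | (1u << 15))

/* Hypothetical model of the control register; not QEMU's real CPU state. */
struct fp_ctrl {
    uint32_t value;
};

/* Write-ignore: drop the trap-enable bits before storing the value. */
void fp_ctrl_write(struct fp_ctrl *r, uint32_t val)
{
    assert(r != NULL);
    r->value = val & ~FP_TRAP_ENABLE_MASK;
}

/* Read-as-zero follows from never storing the masked bits. */
uint32_t fp_ctrl_read(const struct fp_ctrl *r)
{
    assert(r != NULL);
    return r->value;
}

/* Guest-style probe: set one enable bit and check whether it sticks. */
bool fp_traps_supported(struct fp_ctrl *r)
{
    fp_ctrl_write(r, fp_ctrl_read(r) | (1u << 8));
    return (fp_ctrl_read(r) & (1u << 8)) != 0;
}
```

With the mask in place the probe reports that trapped exceptions are not implemented, which is what a guest test suite would expect on such an implementation.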
Peter Maydell [Tue, 5 Feb 2019 16:52:42 +0000 (16:52 +0000)]
hw/arm/boot: Support DTB autoload for firmware-only boots
The arm_boot_info struct has a skip_dtb_autoload flag: if this is
set to true by the board code then arm_load_kernel() will not
load the DTB itself, but will leave this for the board code to
do itself later. However, the check for this is done in a
code path which is only executed for the case where we load
a kernel image file. If we're taking the "boot via firmware"
code path then the flag isn't honoured and the DTB is never
loaded.
We didn't notice this because the only real user of "boot
via firmware" that cares about the DTB is the virt board
(for UEFI boot), and that always wants skip_dtb_autoload
anyway. But the SBSA reference board model we're planning to
add will want the flag to behave correctly.
Now we've refactored the arm_load_kernel() function, the
fix is simple: drop the early 'return' so we fall into
the same "load the DTB" code the boot-direct-kernel path uses.
Peter Maydell [Tue, 5 Feb 2019 16:52:41 +0000 (16:52 +0000)]
hw/arm/boot: Factor out "direct kernel boot" code into its own function
Factor out the "direct kernel boot" code path from arm_load_kernel()
into its own function; this function is getting long enough that
the code flow is a bit confusing.
This commit only moves code around; no semantic changes.
We leave the "load the dtb" code in arm_load_kernel() -- this
is currently only used by the "direct kernel boot" path, but
this is a bug which we will fix shortly.
Peter Maydell [Tue, 5 Feb 2019 16:52:41 +0000 (16:52 +0000)]
hw/arm/boot: Fix block comment style in arm_load_kernel()
Fix the block comment style in arm_load_kernel() to QEMU's
current style preferences. This will allow us to do some
refactoring of this function without checkpatch complaining
about the code-motion patches.
Max Filippov [Tue, 5 Feb 2019 16:52:41 +0000 (16:52 +0000)]
gdbstub: allow killing QEMU via vKill command
With multiprocess extensions, gdb uses the 'vKill' packet instead of 'k' to
kill the inferior. Handle 'vKill' the same way 'k' is handled when only a
single process is present.
Fixes: 7cf48f6752e5 ("gdbstub: add multiprocess support to
(f|s)ThreadInfo and ThreadExtraInfo")
Peter Maydell [Tue, 5 Feb 2019 16:52:40 +0000 (16:52 +0000)]
target/arm: Compute TB_FLAGS for TBI for user-only
Enables, but does not turn on, TBI for CONFIG_USER_ONLY.
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-id: 20190204132126[email protected]
[PMM: adjusted #ifdeffery to placate clang, which otherwise complains
about static functions that are unused in the CONFIG_USER_ONLY build]
Signed-off-by: Peter Maydell <[email protected]>
target/arm: Clean TBI for data operations in the translator
This will allow TBI to be used in user-only mode, as well as
avoid ping-ponging the softmmu TLB when TBI is in use. It
will also enable other armv8 extensions.
target/arm: Default handling of BTYPE during translation
The branch target exception for guarded pages has high priority,
and only 8 instructions are valid for that case. Perform this
check before doing any other decode.
Clear BTYPE after all insns that neither set BTYPE nor exit via
exception (DISAS_NORETURN).
Not yet handled are insns that exit via DISAS_NORETURN for some
other reason, like direct branches.
Peter Maydell [Tue, 5 Feb 2019 16:52:19 +0000 (16:52 +0000)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc, virtio: fixes, cleanups, features
vhost user blk discard/write zeroes features
misc cleanups and fixes all over the place
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Tue 05 Feb 2019 16:00:20 GMT
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
contrib/libvhost-user: cleanup casts
r2d: fix build on mingw
mmap-alloc: fix hugetlbfs misaligned length in ppc64
mmap-alloc: unfold qemu_ram_mmap()
i386, acpi: cleanup build_facs by removing second unused argument
fw_cfg: fix the life cycle and the name of "qemu_extra_params_fw"
acpi: Make TPM 2.0 with TIS available as MSFT0101
hw/virtio: Use CONFIG_VIRTIO_PCI switch instead of CONFIG_PCI
vhost-user-blk: add discard/write zeroes features support
contrib/vhost-user-blk: fix the compilation issue
pci/msi: export msi_is_masked()
intel_iommu: reset intr_enabled when system reset
intel_iommu: fix operator in vtd_switch_address_space
hw: virtio-pci: drop DO_UPCAST
include: update Linux headers to 4.21-rc1/5.0-rc1
scripts/update-linux-headers.sh: adjust for Linux 4.21-rc1 (or 5.0-rc1)
contrib/libvhost-user: switch to uint64_t
virtio: add checks for the size of the indirect table
However, we still need to consider the underlying huge page size
during munmap() because it requires that both address and length be a
multiple of the underlying huge page size for Huge TLB mappings.
Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES
section of the munmap(2) manual:
"For munmap(), addr and length must both be a multiple of the
underlying huge page size."
On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB
mappings because the mapped segment can be aligned with the underlying
huge page size, not aligned with the native system page size, as
returned by getpagesize().
This has the side effect of not releasing huge pages back to the pool
after a hugetlbfs file-backed memory device is hot-unplugged.
This patch fixes the situation in qemu_ram_mmap() and
qemu_ram_munmap() by considering the underlying page size on ppc64.
After this patch, memory hot-unplug releases huge pages back to the
pool.
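
The alignment requirement quoted from munmap(2) can be sketched as a rounding step applied to the length before unmapping. The helper name is hypothetical and the page sizes used with it are example values only; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of pagesize, which must be a power of two. */
size_t round_up_to_page(size_t len, size_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    return (len + pagesize - 1) & ~(pagesize - 1);
}
```

A caller would pass the underlying huge page size of the mapping rather than the value returned by getpagesize(), so that the length handed to munmap() is a multiple of the huge page size.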
Unfold parts of qemu_ram_mmap() for the sake of understanding, moving
declarations to the top, and keeping architecture-specifics in the
ifdef-else blocks. No changes in the function behaviour.
Give ptr and ptr1 meaningful names:
ptr -> guardptr : pointer to the PROT_NONE guard region
ptr1 -> ptr : pointer to the mapped memory returned to caller
Laszlo Ersek [Fri, 18 Jan 2019 22:31:52 +0000 (23:31 +0100)]
fw_cfg: fix the life cycle and the name of "qemu_extra_params_fw"
Commit 19bcc4bc3213 ("fw_cfg: Make qemu_extra_params_fw locally",
2019-01-04) changed the storage duration of the "qemu_extra_params_fw"
array from static to automatic. This broke the interface contract on the
fw_cfg_add_file() function, which is documented as follows, in
"include/hw/nvram/fw_cfg.h":
> [...] The data referenced by the starting pointer is only linked, NOT
> copied, into the data structure of the fw_cfg device. [...]
As a result, when guest firmware fetches the "etc/boot-menu-wait" fw_cfg
file, it now sees garbage. Fix the regression by changing the storage
duration to allocated. (The call is reached at most once, on the realize
path of the board-specific fw_cfg sysbus device.)
While at it, clean up the name and the assignment of the object as well.
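
The storage-duration problem can be sketched with a toy registry that, like the documented fw_cfg_add_file() contract, links the caller's buffer instead of copying it. Every name below is hypothetical; the point is only that the buffer must outlive the function that registers it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry: the data pointer is linked, NOT copied. */
struct linked_file {
    const void *data;
    size_t len;
};

struct linked_file registry;

void add_file_linked(const void *data, size_t len)
{
    assert(data != NULL);
    registry.data = data;
    registry.len = len;
}

void register_boot_menu_wait(uint16_t wait)
{
    /*
     * Allocated storage stays valid after this function returns.
     * A local (automatic) array here would leave a dangling pointer
     * in the registry. The buffer is intentionally never freed.
     */
    uint8_t *buf = malloc(sizeof(wait));
    assert(buf != NULL);
    memcpy(buf, &wait, sizeof(wait));
    add_file_linked(buf, sizeof(wait));
}

/* Reads the value back later, as a consumer of the registry would. */
uint16_t read_boot_menu_wait(void)
{
    uint16_t v;
    assert(registry.data != NULL && registry.len == sizeof(v));
    memcpy(&v, registry.data, sizeof(v));
    return v;
}
```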
Stefan Berger [Fri, 25 Jan 2019 21:00:58 +0000 (16:00 -0500)]
acpi: Make TPM 2.0 with TIS available as MSFT0101
This patch makes a TPM 2.0 with TIS interface available under the
HID 'MSFT0101'. This is supported by Linux, and Windows now also
recognizes the TPM 2.0 with TIS interface. Leave the TPM 1.2 as before.
Thomas Huth [Fri, 25 Jan 2019 12:56:00 +0000 (13:56 +0100)]
hw/virtio: Use CONFIG_VIRTIO_PCI switch instead of CONFIG_PCI
For downstream s390x builds, we'd like to be able to build QEMU with
CONFIG_VIRTIO_PCI disabled (since virtio-ccw is used here instead),
but still with CONFIG_PCI enabled. This currently fails since the
virtio-*-pci.o files are still included in the build, but virtio-pci.o
is missing. Use the right config switch CONFIG_VIRTIO_PCI to exclude
the virtio-*-pci.o files from the build.
Changpeng Liu [Wed, 16 Jan 2019 05:19:30 +0000 (13:19 +0800)]
vhost-user-blk: add discard/write zeroes features support
Linux commit 1f23816b8 "virtio_blk: add discard and write zeroes support"
added the support in the guest kernel; enable support for these features
in the vhost-user-blk driver as well. Also enable the DISCARD and
WRITE ZEROES commands in the test example utility.
Peter Xu [Wed, 16 Jan 2019 03:08:13 +0000 (11:08 +0800)]
intel_iommu: reset intr_enabled when system reset
This was found while I was debugging another problem. No bug has been
reported against this so far, but we'd better reset the IR status
correctly after a system reset.
Peter Xu [Wed, 16 Jan 2019 03:08:12 +0000 (11:08 +0800)]
intel_iommu: fix operator in vtd_switch_address_space
When calculating use_iommu, we wanted to first detect whether DMAR is
enabled, then check whether PT is enabled if DMAR is enabled. However
in the current code we used "&" rather than "&&" so the ordering
requirement is lost (instead it'll be an "AND" operation). This could
cause errors to be dumped on the QEMU console when rebooting a guest with
both an assigned device and a vIOMMU.
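
The difference between the two operators can be sketched in isolation. The functions below are hypothetical stand-ins, not the intel_iommu code; the counter only exists to show when the right-hand check runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the pass-through check runs; for illustration only. */
int pt_checks;

/* Hypothetical check that is only meaningful when DMAR is enabled. */
bool pt_enabled(void)
{
    pt_checks++;
    return false;
}

/* Bitwise AND: both operands are always evaluated. */
bool use_iommu_bitwise(bool dmar_enabled)
{
    return dmar_enabled & !pt_enabled();
}

/* Logical AND: the PT check is skipped when DMAR is disabled. */
bool use_iommu_logical(bool dmar_enabled)
{
    return dmar_enabled && !pt_enabled();
}
```

Both variants return the same value, but only the logical form guarantees that the second check is not performed while DMAR is off.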
* remotes/kraxel/tags/ui-20190205-pull-request:
keymap: fix keyup mappings
keymap: pass full keyboard state to keysym2scancode
kbd-state: use state tracker for vnc
kbd-state: use state tracker for gtk
sdl2: use only QKeyCode in sdl2_process_key()
kbd-state: use state tracker for sdl2
sdl2: remove sdl2_reset_keys() function
kbd-state: add keyboard state tracker
ui/egl-helpers: Augment parameter list of egl_texture_blend() to convey scales of viewport
ui/cocoa.m: Fix macOS 10.14 deprecation warnings
ui/sdl_keysym: Remove obsolete SDL1.2 related code
ui: listen for GDK_SMOOTH_SCROLL events
ui: don't send any event if delta_y == 0
Remove deprecated -no-frame option
Gerd Hoffmann [Tue, 22 Jan 2019 09:28:14 +0000 (10:28 +0100)]
keymap: fix keyup mappings
It is possible that the modifier state on keyup is different from the
modifier state on keydown. In that case the keycode lookup can end up
with different keys in case multiple keysym -> keycode mappings exist,
because it picks the mapping depending on modifier state.
To fix that, change the lookup logic for keyup events. Instead of
looking at the modifier state, check the key state and prefer a keycode
where the key is in "down" state right now.
Gerd Hoffmann [Tue, 22 Jan 2019 09:28:09 +0000 (10:28 +0100)]
kbd-state: use state tracker for sdl2
Use the new keyboard state tracker for sdl2. We can drop the modifier
state tracking from sdl2. The keyup code is also simpler: the state tracker
will take care not to send suspicious keyup events to the guest.
Gerd Hoffmann [Tue, 22 Jan 2019 09:28:07 +0000 (10:28 +0100)]
kbd-state: add keyboard state tracker
Now that most user interfaces are using QKeyCodes it is easier to have
common keyboard code useable by all user interfaces.
This patch adds helper code to track the state of all keyboard keys,
using a bitmap indexed by QKeyCode. Modifier state is tracked too,
as separate bitmap. That makes checking modifier state easier.
Likewise we can easily apply special handling for capslock & numlock
(toggles on keypress) and ctrl + shift (we have two keys for that).
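
A bitmap indexed by key code, as described above, might look like the following sketch. The type, the key count and the function names are assumptions made for illustration. Unlike a real tracker, this version drops every event that does not change state, including repeated keydown events.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KBD_KEY_COUNT 256   /* hypothetical number of key codes */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct key_state {
    unsigned long down[(KBD_KEY_COUNT + WORD_BITS - 1) / WORD_BITS];
};

/* Test the bit for one key code. */
bool key_is_down(const struct key_state *s, unsigned key)
{
    assert(s != NULL && key < KBD_KEY_COUNT);
    return (s->down[key / WORD_BITS] >> (key % WORD_BITS)) & 1ul;
}

/* Returns true if the event changes state and should be forwarded. */
bool key_update(struct key_state *s, unsigned key, bool down)
{
    if (key_is_down(s, key) == down) {
        return false;   /* e.g. a keyup for a key that is not down */
    }
    s->down[key / WORD_BITS] ^= 1ul << (key % WORD_BITS);
    return true;
}
```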
Chen Zhang [Fri, 25 Jan 2019 07:47:23 +0000 (15:47 +0800)]
ui/egl-helpers: Augment parameter list of egl_texture_blend() to convey scales of viewport
This helps the gtk-egl display show scaled DMABuf cursor images when the
gtk window is zoomed. A default scale of (1.0, 1.0) is presumed for
call sites where no scaling is needed.
Peter Maydell [Tue, 5 Feb 2019 09:35:53 +0000 (09:35 +0000)]
Merge remote-tracking branch 'remotes/xtensa/tags/20190204-xtensa' into staging
target/xtensa: SMP updates and various fixes
- fix CPU wakeup on runstall changes; expose runstall as an IRQ line;
- place mini-bootloader at the BSP reset vector;
- expose CPU core frequency in XTFPGA board FPGA register;
- rearrange access to external interrupts of xtensa cores;
- add MX interrupt distributor and use it on SMP XTFPGA boards;
- add test_mmuhifi_c3 xtensa core variant;
- raise number of CPUs that can be instantiated on XTFPGA boards.
* remotes/xtensa/tags/20190204-xtensa:
hw/xtensa: xtfpga: raise CPU number limit
target/xtensa: add test_mmuhifi_c3 core
hw/xtensa: xtfpga: use MX PIC for SMP
target/xtensa: add MX interrupt controller
target/xtensa: expose core runstall as an IRQ line
target/xtensa: rearrange access to external interrupts
target/xtensa: drop function xtensa_timer_irq
target/xtensa: fix access to the INTERRUPT SR
hw/xtensa: xtfpga: use core frequency
hw/xtensa: xtfpga: fix bootloader placement in SMP
target/xtensa: add qemu_cpu_kick to xtensa_runstall
s390x/pci: Unplug remaining requested devices on pcihost reset
When resetting the guest we should unplug and remove all devices that
are still pending.
With this patch, the requested device will be unplugged on reboot
(S390_RESET_EXTERNAL and S390_RESET_REIPL, which reset the pcihost bridge
via qemu_devices_reset()).
This approach is similar to what's done for acpi PCI hotplug in
acpi_pcihp_reset() -> acpi_pcihp_update() ->
acpi_pcihp_update_hotplug_bus() -> acpi_pcihp_eject_slot().
s390_pci_generate_plug_event()'s will still be generated, I guess this
is not an issue. The same thing would happen right now when unplugging
a device just before starting the guest.
s390x/pci: Warn when adding PCI devices without the 'zpci' feature
We decided to always create the PCI host bridge, even if 'zpci' is not
enabled (due to migration compatibility). This however right now allows
to add zPCI/PCI devices to a VM although the guest will never actually see
them, confusing people that are using a simple CPU model that has no
'zpci' enabled - "Why isn't this working" (David Hildenbrand)
Let's check for 'zpci' and at least print a warning that this will not
work as expected. We could also bail out, however that might break
existing QEMU commandlines.
When hotplugging a PCI bridge right now to the root port, we resolve
pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
really only works right now when hotplugging to another bridge.
Instead, we have to properly check if we are already at the root.
Let's cleanup the code while at it a bit and factor out updating the
subordinate bus number into a separate function. The check for
"old_nr < nr" is right now not strictly necessary, but makes it more
obvious what is actually going on.
Most probably fixing up the topology is not our responsibility when
hotplugging. The guest has to sort this out. But let's keep it for now
and only fix current code to not crash.
The primary bus number corresponds always to the bus number of the
bus the bridge is attached to.
Right now, if we have two bridges attached to the same bus (e.g. root
bus) this is however not the case. The first bridge will have primary
bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment.
While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff.
Setting it temporarily to that value (as discussed e.g. in [1]), is
only relevant for a running system that probes the buses. The value is
effectively unused for us just doing a DFS.
Also add a comment why we have to reassign during every reset (which I
found to be surprising).
Please note that hotplugging of bridges is in general still broken, will
be fixed next.
Brendan Shanks [Fri, 1 Feb 2019 07:12:25 +0000 (23:12 -0800)]
ui/cocoa.m: Fix macOS 10.14 deprecation warnings
macOS 10.14 deprecated NSOnState/NSOffState in favour of
NSControlStateValueOn/NSControlStateValueOff. Use the new constants,
and #define them to the old ones when compiling against a pre-10.13 SDK.
Also [NSGraphicsContext graphicsPort] is now deprecated, use
[NSGraphicsContext CGContext] when available.
Sergio Lopez [Mon, 4 Feb 2019 12:20:43 +0000 (13:20 +0100)]
ui: don't send any event if delta_y == 0
When the user raises their fingers from the touchpad, we may receive a
GDK_SMOOTH_SCROLL event with delta_y == 0. Avoid generating a WHEEL_UP
event in this situation.
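
The intended mapping can be sketched as a small function. The enum and function names are hypothetical, and the direction chosen for positive and negative deltas is an assumption; only the "no event for zero" rule comes from the description above.

```c
#include <assert.h>

enum wheel_event { WHEEL_NONE, WHEEL_UP, WHEEL_DOWN };

/* Map a smooth-scroll delta to a wheel event; zero produces no event. */
enum wheel_event wheel_from_delta(double delta_y)
{
    if (delta_y == 0) {
        return WHEEL_NONE;
    }
    /* Assumed direction: positive delta scrolls down, negative scrolls up. */
    return delta_y > 0 ? WHEEL_DOWN : WHEEL_UP;
}
```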
Thomas Huth [Tue, 5 Feb 2019 07:29:29 +0000 (08:29 +0100)]
Remove deprecated -no-frame option
The -no-frame option has been deprecated with QEMU v2.12. It was only
useful with SDL1.2 - now that we've removed support for SDL1.2, we
can certainly remove the -no-frame option, too.
We have several paranoid checks for ioc != NULL. But ioc may become
NULL only on close, which should not happen during requests handling.
Also, we check ioc only sometimes, not after each yield, which is
inconsistent. Let's drop these checks. However, for safety, let's leave
asserts instead.
block/nbd-client: split channel errors from export errors
To implement nbd reconnect in further patches, we need to distinguish
error codes, returned by nbd server, from channel errors, to reconnect
only in the latter case.
We generally do very similar things around nbd_read: error_prepend
specifying what we have tried to read, and be_to_cpu conversion of
integers.
So, it seems reasonable to move common things to helper functions,
which:
1. simplify code a bit
2. generalize nbd_read error descriptions, all starting with
"Failed to read"
3. make it more difficult to forget to convert things from BE
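
A helper of the kind listed above might look like this sketch. It reads from an in-memory buffer instead of a network channel, and all names are hypothetical; it only illustrates combining the read, the uniform "Failed to read" description and the big-endian conversion in one place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory source standing in for the network channel. */
struct byte_source {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/*
 * Read a 32-bit big-endian integer into *out. On a short read, store a
 * message starting with "Failed to read" in err and return -1.
 */
int read_be32(struct byte_source *src, const char *desc,
              uint32_t *out, char *err, size_t errlen)
{
    const uint8_t *p;

    assert(src != NULL && out != NULL && err != NULL);
    if (src->len - src->pos < 4) {
        snprintf(err, errlen, "Failed to read %s", desc);
        return -1;
    }
    p = src->data + src->pos;
    /* Big-endian to host order, independent of host endianness. */
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    src->pos += 4;
    return 0;
}
```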
Eric Blake [Fri, 25 Jan 2019 23:48:37 +0000 (17:48 -0600)]
qemu-nbd: Deprecate qemu-nbd --partition
The existing qemu-nbd --partition code claims to handle logical
partitions up to 8, since its introduction in 2008 (commit 7a5ca86).
However, the implementation is bogus (actual MBR logical partitions
form a sort of linked list, with one partition per extended table
entry, rather than four logical partitions in a single extended
table), making the code unlikely to work for anything beyond -P5 on
actual guest images. What's more, the code does not support GPT
partitions, which are becoming more popular, and maintaining device
subsetting in both NBD and the raw device is unnecessary duplication
of effort (even if it is not too difficult).
Note that obtaining the offsets of a partition (MBR or GPT) can be
learned by using 'qemu-nbd -c /dev/nbd0 file.qcow2 && sfdisk --dump
/dev/nbd0', but by the time you've done that, you might as well
just mount /dev/nbd0p1 that the kernel creates for you instead of
bothering with qemu exporting a subset. Or, keeping to just
user-space code, use nbdkit's partition filter, which has already
known both GPT and primary MBR partitions for a while, and was
just recently enhanced to support arbitrary logical MBR partitions.
Start the clock on the deprecation cycle, with examples of how
to accomplish device subsetting without using -P.
As floating point registers overlay some vector registers and we want
to make use of the general tcg_gvec infrastructure that assumes vectors
are not stored in globals but in memory, don't model floating point
registers as globals anymore. This is then similar to how arm handles
it.
Reading/writing a floating point register means reading/writing memory now.
Break up ugly in2_x2() handling that modifies both, in1 and in2 into
in2_x2l and in2_x2h. This makes things more readable. Also, in1_x1() is
ugly as it touches out/out2, get rid of that and use prep_x1() instead.
As we are no longer able to use the original global variables for
out/out2, we have to use new temporary variables and write from them to
the target registers using wout_ helpers.
E.g. an instruction that reads and writes x1 will use
- prep_x1 to get the values into out/out2
- wout_x1 to write the values from out/out2
This special handling is needed for x1 as it is often used along with
other inputs, so in1/in2 is already used.
Jason Wang [Wed, 30 Jan 2019 03:14:27 +0000 (11:14 +0800)]
test-filter-mirror: pass UNIX domain socket through fd
The test tries to let qemu's server mode process the connection,
which turns out to be racy after commit 8258292e18c3 ("monitor: Remove
"x-oob", offer capability "oob" unconditionally"). This is because the
filter may try to mirror packets before the UNIX socket object is
ready (connected set to true) from qemu's point of view. In this case
the packet is dropped silently.
Fix this by passing a pre-connected socket created by socketpair() to
qemu through fd.
Thomas Huth [Mon, 4 Feb 2019 08:25:43 +0000 (09:25 +0100)]
tests/docker/test-mingw and docs: Remove --with-sdlabi=2.0
Patchew currently reports failures with the mingw docker test. This
is due to the --with-sdlabi=2.0 configure flag, which does not exist anymore.
Remove this remnant from the docker test and the docs now.
Cornelia Huck [Fri, 1 Feb 2019 12:29:08 +0000 (13:29 +0100)]
s390x/pci: mark zpci devices as unmigratable
We currently don't migrate any state for zpci devices, which are
coupled with standard pci devices. This means funny things happen
when we e.g. try to migrate with a virtio-pci device but the s390x-
specific zpci state is not migrated (vfio-pci is not affected, as
it is not migratable anyway.)
Until this is fixed, mark zpci devices as unmigratable.
s390x/pci: Drop release timer and replace it with a flag
Let's handle it similar to x86 ACPI PCI code and don't use a timer.
Instead, remember if an unplug request is pending and keep it pending
for eternity. (a follow up patch will process the request on
reboot).
We expect that a guest that is up and running, will process the unplug
request and trigger the unplug. This is normal operation, no timer needed.
If the guest does not react, this usually means something in the guest
is going wrong. Simply removing the device after 30 seconds does not
really sound like a good idea. It might sometimes be wanted, but I
consider this rather an "opt-in" decision as it might harm a guest not
prepared for it.
If we ever actually want a "forced/surprise removal", we will have to
implement something on top of the existing "device_del" framework. E.g.
also x86 might want to do a forced/surprise removal of PCI devices under
some conditions. "device_del X, forced=true" could be an option and will
require changes to the hotplug handler infrastructure.
This will then move the responsibility on when to do a forced removal
to a higher level. Doing a forced removal right now over-complicates
things and doesn't really seem to be required.
s390x/pci: Introduce unplug requests and split unplug handler
PCI on s390x is really weird and how it was modeled in QEMU might not have
been the right choice. Anyhow, right now it is the case that:
- Hotplugging a PCI device will silently create a zPCI device
(if none is provided)
- Hotunplugging a zPCI device will unplug the PCI device (if any)
- Hotunplugging a PCI device will unplug also the zPCI device
As far as I can see, we can no longer change this behavior. But we
should fix it.
Both device types are handled via a single hotplug handler call. This
is problematic for various reasons:
1. Unplugging via the zPCI device allows to unplug devices that are not
hot removable. (check performed in qdev_unplug()) - bad.
2. Hotplug handler chains are not possible for the unplug case. In the
future, the machine might want to override hotplug handlers, to
process device specific stuff and to then branch off to the actual
hotplug handler. We need separate hotplug handler calls for both the
PCI and zPCI device to make this work reliably. All other PCI
implementations are already prepared to handle this correctly, only
s390x is missing.
Therefore, introduce the unplug_request handler and properly perform
unplug checks by redirecting to the separate unplug_request handlers.
When finally unplugging, perform two separate hotplug_handler_unplug()
calls, first for the PCI device, followed by the zPCI device. This now
nicely splits unplugging paths for both devices.
The redirect part is a little hairy, as the user is allowed to trigger
unplug either via the PCI or the zPCI device. So redirect always to the
PCI unplug request handler first and remember if that check has been
performed in the zPCI device. Redirect then to the zPCI device unplug
request handler to perform the magic. Remembering that we already
checked the PCI device breaks the redirect loop.
Igor Mammedov [Wed, 30 Jan 2019 07:55:06 +0000 (08:55 +0100)]
s390x: remove direct reference to mem_path global from s390x code
I plan to deprecate the -mem-path option and replace it with memory-backend;
for that it's necessary to get rid of the mem_path global variable.
Do it for the s390x case, replacing it with an alternative way to enable
the 1Mb hugepages capability.
To do that, replace qemu_mempath_getpagesize() with qemu_getrampagesize(),
which also checks for -mem-path provided RAM.
Alex Bennée [Fri, 18 Jan 2019 17:18:48 +0000 (17:18 +0000)]
target/s390x: define TCG_GUEST_DEFAULT_MO for MTTCG
MTTCG should be enabled by default whenever the memory model allows
it. s390x was missing its definition of TCG_GUEST_DEFAULT_MO meaning
the user had to manually specify --accel tcg,thread=multi.
Paul Durrant [Thu, 31 Jan 2019 15:33:16 +0000 (15:33 +0000)]
xen-block: handle resize callback
Some frontend drivers will handle dynamic resizing of PV disks, so set up
the BlockDevOps resize_cb() method during xen_block_realize() to allow
this to be done.
Paul Durrant [Tue, 22 Jan 2019 15:53:46 +0000 (15:53 +0000)]
xen: fix xen-bus state model to allow frontend re-connection
There is a flaw in the xen-bus state model. To allow a frontend to re-
connect the backend state of an online XenDevice is transitioned from
Closed to InitWait, but this is currently done unilaterally which is
incorrect. The backend state should remain Closed until the frontend state
transitions to Initialising.
This patch removes the automatic backend state transition from
xen_device_backend_state_changed() and, instead, adds an extra check in
xen_device_frontend_state_changed() to determine whether a frontend is
trying to re-connect to a previously Closed XenDevice. Only if this is
found to be the case is the backend state transitioned from Closed to
InitWait. Note that this transition will be common amongst all XenDevice
classes and hence xen_device_frontend_state_changed() returns immediately
afterwards without calling into the XenDeviceClass frontend_changed()
method.
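
The corrected transition rule can be sketched as a pure function of the two states. The enum and function names are hypothetical and the sketch ignores everything else the real handlers do; it only captures that the backend leaves Closed when, and only when, the frontend moves to Initialising on an online device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the xenbus states named in the description. */
enum xb_state {
    XB_INITIALISING,
    XB_INIT_WAIT,
    XB_CONNECTED,
    XB_CLOSED,
};

/* Backend state to use after the frontend state has changed. */
enum xb_state backend_after_frontend_change(enum xb_state backend,
                                            enum xb_state frontend,
                                            bool online)
{
    /* Re-connection: only a frontend entering Initialising reopens us. */
    if (online && backend == XB_CLOSED && frontend == XB_INITIALISING) {
        return XB_INIT_WAIT;
    }
    return backend;
}
```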
Peter Maydell [Mon, 4 Feb 2019 10:33:40 +0000 (10:33 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20190204' into staging
ppc patch queue 2019-02-04
Here's the next batch of ppc target and spapr related changes.
Highlights are:
* A number of endianness handling cleanups from Mark Cave-Ayland
* Updated Mac VGA driver
* Updated SLOF image
* Some XIVE cleanups and small fixes
* ppc4xx cleanups and fixes from BALATON Zoltan
There are a few changes not technically in the ppc target code:
* Several MAINTAINERS updates
* Fixes for unmapping of hugepages on power hosts
The latter is included because it's primarily of interest for ppc KVM setups.
* remotes/dgibson/tags/ppc-for-4.0-20190204: (37 commits)
mmap-alloc: fix hugetlbfs misaligned length in ppc64
mmap-alloc: unfold qemu_ram_mmap()
hw/ppc: Don't include m48t59.h if it is not necessary
spapr_pci: Fix endianness in assigned-addresses property
target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c
target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c
target/ppc: simplify VEXT_SIGNED macro in int_helper.c
target/ppc: eliminate use of EL_IDX macros from int_helper.c
target/ppc: eliminate use of HI_IDX and LO_IDX macros from int_helper.c
target/ppc: rework vmul{e,o}{s,u}{b,h,w} instructions to use Vsr* macros
target/ppc: rework vmrg{l,h}{b,h,w} instructions to use Vsr* macros
hw/ppc/spapr: Add support for "-vga cirrus"
QemuMacDrivers: update qemu_vga.ndrv to 90c488d built from submodule
MAINTAINERS: add myself as maintainer for Mac Old World and New World machines
spapr: Drop unused parameters from fdt building helper
MAINTAINERS: Merge the two e500 sections
MAINTAINERS: XIVE is an interrupt controller, not a machine
hw/ppc: Move ppc40x_*reset() functions from ppc405_uc.c to ppc.c
ppc: remove the interrupt presenters from under PowerPCCPU
target/ppc: implement complete set of Vsr* macros
...
However, we still need to consider the underlying huge page size
during munmap() because it requires that both address and length be a
multiple of the underlying huge page size for Huge TLB mappings.
Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES
section of the munmap(2) manual:
"For munmap(), addr and length must both be a multiple of the
underlying huge page size."
On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB
mappings because the mapped segment can be aligned with the underlying
huge page size, not aligned with the native system page size, as
returned by getpagesize().
This has the side effect of not releasing huge pages back to the pool
after a hugetlbfs file-backed memory device is hot-unplugged.
This patch fixes the situation in qemu_ram_mmap() and
qemu_ram_munmap() by considering the underlying page size on ppc64.
After this patch, memory hot-unplug releases huge pages back to the
pool.
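The munmap(2) rule quoted above can be illustrated with a small sketch. The helper names round_up_to_pagesize() and is_valid_hugetlb_unmap() are hypothetical and are not part of QEMU; the page size is supplied by the caller and is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): round len up to the next
 * multiple of pagesize. pagesize is assumed to be a power of two.
 */
static size_t round_up_to_pagesize(size_t len, size_t pagesize)
{
    return (len + pagesize - 1) & ~(pagesize - 1);
}

/*
 * Hypothetical check mirroring the munmap(2) note: for Huge TLB mappings
 * both the address and the length must be multiples of the huge page size.
 */
static bool is_valid_hugetlb_unmap(uintptr_t addr, size_t len, size_t pagesize)
{
    return (addr & (pagesize - 1)) == 0 && (len & (pagesize - 1)) == 0;
}
```

For example, assuming a 16 MiB huge page, a length of 16 MiB plus one 64 KiB native page is not a multiple of the huge page size, so a munmap() call with that length would not satisfy the rule; rounding the length with the huge page size instead gives 32 MiB.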
Unfold parts of qemu_ram_mmap() for the sake of understanding, moving
declarations to the top, and keeping architecture-specifics in the
ifdef-else blocks. No changes in the function behaviour.
Give ptr and ptr1 meaningful names:
ptr -> guardptr : pointer to the PROT_NONE guard region
ptr1 -> ptr : pointer to the mapped memory returned to caller
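The roles of the two pointers can be sketched as follows. This is a simplified illustration using anonymous memory, not the actual qemu_ram_mmap() code: the function name map_aligned_with_guard() is hypothetical, size is assumed to be a multiple of the native page size, and align is assumed to be a power of two no smaller than the native page size.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical sketch: returns an align-aligned mapping followed by a guard page. */
static void *map_aligned_with_guard(size_t size, size_t align)
{
    size_t pagesize = (size_t)getpagesize();
    size_t total = size + align;
    size_t offset;
    void *guardptr;
    void *ptr;

    /* guardptr: reserve address space only, nothing in it is accessible. */
    guardptr = mmap(NULL, total, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guardptr == MAP_FAILED) {
        return MAP_FAILED;
    }

    /* Distance from guardptr to the next align boundary (0 if aligned). */
    offset = (align - ((uintptr_t)guardptr & (align - 1))) & (align - 1);

    /* ptr: the usable memory, placed inside the reserved region. */
    ptr = mmap((char *)guardptr + offset, size, PROT_READ | PROT_WRITE,
               MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        munmap(guardptr, total);
        return MAP_FAILED;
    }

    /* Release the unused head of the reservation. */
    if (offset > 0) {
        munmap(guardptr, offset);
    }

    /* Keep one PROT_NONE page after the mapping and release the rest. */
    total -= offset;
    if (total > size + pagesize) {
        munmap((char *)ptr + size + pagesize, total - size - pagesize);
    }

    return ptr;
}
```

A caller that later unmaps this region has to release size plus the trailing guard page, which is why the page size used for the guard matters on the unmap side as well.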
spapr_pci: Fix endianness in assigned-addresses property
reg->phys_hi and assigned->phys_hi are big endian but we do an extra
byteswap anyway when copying reg->phys_hi to assigned->phys_hi.
To make things slightly messier, we also add a relocatable bit (b_n()),
although in the right endianness.
This fixes endianness of assigned->phys_hi.
This is unlikely to produce any visible difference though as we should end up
there only in the case of PCI hotplug and even then I am not sure if
(d->io_regions[i].addr == PCI_BAR_UNMAPPED) == true.
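The general rule behind this fix can be sketched in portable C: a value that is already big endian is copied as-is, and only host-order constants are converted before being OR-ed in. All names below (host_to_be32(), be32_to_host(), EXAMPLE_FLAG, copy_phys_hi()) are hypothetical and chosen for illustration; the flag value is an assumption and this is not the spapr_pci code itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order value as big-endian bytes. */
static uint32_t host_to_be32(uint32_t host)
{
    uint8_t bytes[4] = {
        (uint8_t)(host >> 24), (uint8_t)(host >> 16),
        (uint8_t)(host >> 8), (uint8_t)host
    };
    uint32_t be;

    memcpy(&be, bytes, sizeof(be));
    return be;
}

/* Hypothetical helper: read big-endian bytes back into host order. */
static uint32_t be32_to_host(uint32_t be)
{
    uint8_t bytes[4];

    memcpy(bytes, &be, sizeof(bytes));
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

/* Assumed flag value, in host order, used only for this illustration. */
#define EXAMPLE_FLAG 0x80000000u

/*
 * The source field is already big endian, so it is copied without any
 * byteswap; only the host-order flag is converted before being OR-ed in.
 */
static uint32_t copy_phys_hi(uint32_t reg_phys_hi_be)
{
    return reg_phys_hi_be | host_to_be32(EXAMPLE_FLAG);
}
```

Swapping the already big-endian source a second time would leave the copied field in host order on a little-endian host, which is the kind of mistake described above.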
Mark Cave-Ayland [Wed, 30 Jan 2019 20:36:37 +0000 (20:36 +0000)]
target/ppc: remove ROTRu32 and ROTRu64 macros from int_helper.c
Richard points out that these macros suffer from a -fsanitize=shift bug in that
they improperly handle n == 0 turning it into a shift by 32/64 respectively.
Replace them with QEMU's existing ror32() and ror64() functions instead.
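The n == 0 problem can be illustrated with a minimal sketch. A rotate written as (v >> n) | (v << (32 - n)) shifts left by the full width when n is 0, which is undefined behaviour in C. The functions below are hypothetical stand-ins written for this note, not QEMU's ror32()/ror64() themselves; they mask both shift counts so that every count stays below the type width.

```c
#include <stdint.h>

/*
 * Hypothetical rotate-right sketch: both shift counts are masked, so
 * n == 0 yields two shifts by 0 instead of a shift by 32.
 */
static uint32_t rotr32_sketch(uint32_t word, unsigned int n)
{
    return (word >> (n & 31)) | (word << (-n & 31));
}

/* Same idea for 64-bit values: n == 0 never becomes a shift by 64. */
static uint64_t rotr64_sketch(uint64_t word, unsigned int n)
{
    return (word >> (n & 63)) | (word << (-n & 63));
}
```

Because n is unsigned, -n is well defined and (-n & 31) equals (32 - n) for n between 1 and 31, while giving 0 for n == 0.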
Mark Cave-Ayland [Wed, 30 Jan 2019 20:36:36 +0000 (20:36 +0000)]
target/ppc: simplify VEXT_SIGNED macro in int_helper.c
As pointed out by Richard: it does not need the mask argument, nor does it need
the recast argument. The masking is implied by the cast argument, and the
recast is implied by the assignment.
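The reasoning can be shown with a small sketch that is independent of the actual macro. The function name sign_extend_low_byte() is hypothetical. The narrowing cast keeps only the low 8 bits (this conversion is implementation-defined in ISO C; GCC and Clang reduce the value modulo 2^8), and converting the result back to the wider type on return performs the sign extension, so no explicit mask or second cast is needed.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: the cast to int8_t discards the upper bits,
 * and the conversion to uint32_t sign-extends the result.
 */
static uint32_t sign_extend_low_byte(uint32_t value)
{
    return (uint32_t)(int8_t)value;
}
```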
Mark Cave-Ayland [Wed, 30 Jan 2019 20:36:34 +0000 (20:36 +0000)]
target/ppc: eliminate use of HI_IDX and LO_IDX macros from int_helper.c
The original purpose of these macros was to correctly reference the high and low
parts of the VSRs regardless of the host endianness.
Replace these direct references to high and low parts with the relevant VsrD
macro instead, and completely remove the now-unused HI_IDX and LO_IDX macros.
Mark Cave-Ayland [Wed, 30 Jan 2019 20:36:32 +0000 (20:36 +0000)]
target/ppc: rework vmrg{l,h}{b,h,w} instructions to use Vsr* macros
The current implementations make use of the endian-specific macros MRGLO/MRGHI
and also reference HI_IDX and LO_IDX directly to calculate array offsets.
Rework the implementation to use the Vsr* macros so that these per-endian
references can be removed.
Thomas Huth [Wed, 30 Jan 2019 13:36:39 +0000 (14:36 +0100)]
hw/ppc/spapr: Add support for "-vga cirrus"
The cirrus VGA card has been enabled in the PPC builds with
commit 29f9cef39eb1ae55e82c ("ppc: Include vga cirrus card into
the compiling process") last year. It also works on the pseries
machine, and even SLOF contains support for this card, so we can
also support it for the "-vga" parameter here.
Mark Cave-Ayland [Mon, 28 Jan 2019 21:21:56 +0000 (21:21 +0000)]
MAINTAINERS: add myself as maintainer for Mac Old World and New World machines
I've unofficially been doing most of the work on the Mac machines for a while
now, so update MAINTAINERS to reflect this. David is still happy to be listed
as a reviewer as per our discussion at KVM Forum.
spapr: Drop unused parameters from fdt building helper
spapr_load_rtas() now handles the RTAS address and size information in the
FDT, so drop them from spapr_build_fdt().
While we are here, fix a small typo.
Fixes: 3f5dabceba24 "pseries: Consolidate construction of /rtas device tree node"
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: Greg Kurz
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: David Gibson
Thomas Huth [Wed, 30 Jan 2019 16:22:25 +0000 (17:22 +0100)]
MAINTAINERS: Merge the two e500 sections
There is currently a "e500" machine section and a "ppce500" device
section in the maintainers file - with some oddities: The wildcard
in the device section also covers the files from the machine section.
And hw/pci-host/ppce500.c is in the device section, while its header
is in the machine section.
This is really quite confusing, and I don't see a reason why we really
need two sections here, so let's simply merge them.
Thomas Huth [Wed, 30 Jan 2019 15:45:40 +0000 (16:45 +0100)]
MAINTAINERS: XIVE is an interrupt controller, not a machine
The "XIVE" section is currently listed in the "PowerPC Machines"
section, which is weird, since this is an interrupt controller
device. Move it to the "Devices" section instead.