Git Repo - qemu.git/log

pc: exit QEMU if compat machine doesn't support memory hotplug

... if user attempts to start it with memory hotplug enabled.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: add 'etc/reserved-memory-end' fw_cfg interface for SeaBIOS

'etc/reserved-memory-end' will allow QEMU to tell the BIOS where PCI
BAR mappings can safely start in high memory.

This allows the BIOS to start mapping 64-bit PCI BARs at an address
where they won't conflict with other mappings QEMU might place before it.

That permits QEMU to reserve extra address space before the
64-bit PCI hole for memory hotplug.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: exit QEMU if number of slots more than supported 256

... which is imposed by current naming scheme of ACPI memory devices.
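
As an illustration of how a naming scheme can impose a 256-slot ceiling, the sketch below assumes each ACPI memory device gets a four-character name made of a two-letter prefix and a two-digit hexadecimal slot number. The "MP" prefix, the constant, and the helper name are assumptions for illustration and are not taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit: two hex digits can encode slots 0x00..0xFF. */
#define SKETCH_MAX_SLOTS 256

/*
 * Build a 4-character ACPI-style name for a slot, e.g. "MP0A".
 * Returns false when the slot number does not fit in two hex digits.
 * buf must have room for 5 bytes (4 characters plus the terminator).
 */
static bool sketch_slot_name(unsigned slot, char buf[5])
{
    static const char hex[] = "0123456789ABCDEF";

    assert(buf != NULL);
    if (slot >= SKETCH_MAX_SLOTS) {
        return false;
    }
    buf[0] = 'M';
    buf[1] = 'P';
    buf[2] = hex[slot >> 4];
    buf[3] = hex[slot & 0xF];
    buf[4] = '\0';
    return true;
}
```

Under these assumptions slot 255 still gets a name, while slot 256 has no two-digit encoding and is rejected.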

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: initialize memory hotplug address space

initialize and map hotplug memory address space container
into guest's RAM address space.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc-dimm: do not allow setting an in-use memdev

Using the same memdev backend more than once will trigger an
assertion at MemoryRegion mapping time because it's already
mapped. Prevent it by checking that the associated MemoryRegion
is not mapped.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
MST: tweak commit log

memory: add memory_region_is_mapped() API

which allows checking whether a MemoryRegion is already mapped.
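
A minimal sketch of such a check, assuming a region records the container it has been mapped into. SketchRegion and both helpers are toy stand-ins for illustration, not QEMU's MemoryRegion or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory region; not QEMU's MemoryRegion. */
typedef struct SketchRegion {
    struct SketchRegion *container; /* NULL while the region is unmapped */
} SketchRegion;

/* A region counts as mapped once it has been placed in a container. */
static bool sketch_region_is_mapped(const SketchRegion *r)
{
    assert(r != NULL);
    return r->container != NULL;
}

/* Map r into container, refusing to map the same region twice. */
static bool sketch_region_map(SketchRegion *container, SketchRegion *r)
{
    if (sketch_region_is_mapped(r)) {
        return false; /* already in use: report it instead of asserting */
    }
    r->container = container;
    return true;
}
```

A caller can test the region first and reject a second use with an error rather than hitting an assertion at mapping time.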

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: implement pc-dimm device abstraction

Each hotplug-able memory slot is a PCDIMMDevice.
A hot-add operation for a memory device:
- creates a new PCDIMMDevice and makes the hotplug controller map it into
guest address space

Hotplug operations are done through normal device_add commands.
For migration case, all hotplugged memory devices on source should be
specified on target's command line using '-device' option with
properties set to the same values as on source.

To simplify review, patch introduces only PCDIMMDevice QOM skeleton that
will be extended by following patches to implement actual memory hotplug
and related functions.

Signed-off-by: Vasilis Liaskovitis <[email protected]>
Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

qdev: expose DeviceState.hotplugged field as a property

so that management could detect via QOM interface if device was
hotplugged

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

qdev: hotplug for bus-less devices

Add get_hotplug_handler() method to machine, and
make bus-less device use it during hotplug
as a means to discover a hotplug handler controller.
The returned controller is used to perform hotplug
actions.
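
The lookup described above might be sketched as follows. All types and names here are hypothetical; only the idea of a machine-level get_hotplug_handler() hook for bus-less devices comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a hotplug controller, a device and a machine. */
typedef struct SketchHandler {
    const char *name;
} SketchHandler;

typedef struct SketchDevice {
    SketchHandler *bus_handler; /* NULL for a bus-less device */
} SketchDevice;

typedef struct SketchMachine {
    /* Optional hook that picks a controller for a bus-less device. */
    SketchHandler *(*get_hotplug_handler)(struct SketchMachine *m,
                                          SketchDevice *dev);
    SketchHandler *board_handler;
} SketchMachine;

/* Example hook: always hand out the board-level controller. */
static SketchHandler *sketch_board_hook(SketchMachine *m, SketchDevice *dev)
{
    (void)dev;
    return m->board_handler;
}

/* Prefer the bus's controller; otherwise ask the machine, if it has a hook. */
static SketchHandler *sketch_find_handler(SketchMachine *m, SketchDevice *dev)
{
    assert(m != NULL && dev != NULL);
    if (dev->bus_handler != NULL) {
        return dev->bus_handler;
    }
    if (m->get_hotplug_handler != NULL) {
        return m->get_hotplug_handler(m, dev);
    }
    return NULL; /* no controller available: hotplug cannot proceed */
}
```

Keeping the hook optional means machines that do not implement it behave as before for bus-attached devices.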

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vl.c: extend -m option to support options for memory hotplug

Add following parameters:
"slots" - total number of hotplug memory slots
"maxmem" - maximum possible memory

"slots" and "maxmem" should go in pair and "maxmem" should be greater
than "mem" for memory hotplug to be enabled.
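
The pairing rule can be sketched as a small validation function. Treating 0 as "option not given" and reporting a violated rule as invalid are assumptions of this sketch; the names are hypothetical and the real option parsing may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    SKETCH_HOTPLUG_DISABLED, /* neither "slots" nor "maxmem" given */
    SKETCH_HOTPLUG_ENABLED,  /* both given and maxmem > mem */
    SKETCH_HOTPLUG_INVALID   /* only one given, or maxmem <= mem */
} SketchHotplugState;

/* Classify a -m style configuration; all sizes are in bytes. */
static SketchHotplugState sketch_check_mem_opts(uint64_t mem,
                                                uint64_t maxmem,
                                                uint64_t slots)
{
    assert(mem > 0); /* a guest needs some initial RAM */
    if (maxmem == 0 && slots == 0) {
        return SKETCH_HOTPLUG_DISABLED;
    }
    if (maxmem == 0 || slots == 0) {
        return SKETCH_HOTPLUG_INVALID; /* the two options must come as a pair */
    }
    if (maxmem <= mem) {
        return SKETCH_HOTPLUG_INVALID; /* no room left to hotplug into */
    }
    return SKETCH_HOTPLUG_ENABLED;
}
```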

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
MST: fix build on 32 bit

add memdev backend infrastructure

Provides framework for splitting host RAM allocation/
policies into a separate backend that could be used
by devices.

Initially only a legacy RAM backend is provided, which
uses the memory_region_init_ram() allocator and is compatible
with every CLI option that affects memory_region_init_ram().

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vl.c: daemonize before guest memory allocation

memory allocated for guest before QEMU is daemonized and then mapped
later in guest's address space after it is daemonized, leads to EPT
violation and QEMU aborts.

To avoid this and similar issues switch to daemonized mode early
before applying/processing other options.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

object_add: allow completion handler to get canonical path

Add object to /objects before calling user_creatable_complete()
handler, so that object might be able to call
object_get_canonical_path() in its completion handler.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: ACPI BIOS: use enum for defining memory affinity flags

replace magic numbers with enum describing Flags field of
memory affinity in SRAT table.
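
A sketch of what such an enum could look like, following the bit layout of the Flags field in the ACPI 5.0 Memory Affinity Structure (bit 0 Enabled, bit 1 Hot Pluggable, bit 2 NonVolatile). The identifier names are illustrative and may not match the ones the patch introduces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of the SRAT Memory Affinity Structure (illustrative names). */
typedef enum {
    SKETCH_MEM_AFFINITY_NOFLAGS      = 0,
    SKETCH_MEM_AFFINITY_ENABLED      = 1 << 0,
    SKETCH_MEM_AFFINITY_HOTPLUGGABLE = 1 << 1,
    SKETCH_MEM_AFFINITY_NON_VOLATILE = 1 << 2
} SketchMemoryAffinityFlags;

/* Flags for an enabled range that also accepts hotplugged memory. */
static uint32_t sketch_hotplug_range_flags(void)
{
    return SKETCH_MEM_AFFINITY_ENABLED | SKETCH_MEM_AFFINITY_HOTPLUGGABLE;
}

/* Test a single flag without resorting to a magic number. */
static bool sketch_flag_is_set(uint32_t flags, SketchMemoryAffinityFlags f)
{
    return (flags & (uint32_t)f) != 0;
}
```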

MemoryAffinityFlags enum will define flags described by:
ACPI spec 5.0, "5.2.16.2 Memory Affinity Structure",
"Table 5-69 Flags - Memory Affinity Structure"

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: create custom generic PC machine type

it will be used for PC specific options/variables

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Peter Crosthwaite <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/memory' into staging

* remotes/bonzini/memory:
  memory: Don't call memory_region_update_coalesced_range if nothing changed
  memory: MemoryRegion: rename parent to container
  memory: MemoryRegion: factor out memory region re-adder
  memory: MemoryRegion: factor out subregion add functionality
  qtest: fix qtest_clock_warp() for no deadline case
  exec: dummy_section: Pass address space through.
  memory: Simplify mr_add_subregion() if-else
  memory: Don't update all memory region when ioeventfd changed
  unset RAMBlock idstr when unregister MemoryRegion
  exec: introduce qemu_ram_unset_idstr() to unset RAMBlock idstr
  MAINTAINERS: Add myself as Memory API maintainer

Signed-off-by: Peter Maydell <[email protected]>

memory: Don't call memory_region_update_coalesced_range if nothing changed

With huge number of PCI devices in the system (for example, 200
virtio-blk-pci), this unconditional call can slow down emulation of
irrelevant PCI operations drastically, such as a BAR update on a device
that has no coalescing region. So avoid it.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: MemoryRegion: rename parent to container

Avoid confusion with the QOM parent.

Reviewed-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: MemoryRegion: factor out memory region re-adder

memory_region_set_address is mostly just a function that deletes and
re-adds a memory region. Factor this generic functionality out into a
re-usable function. This prepares support for further QOMification
of MemoryRegion.

Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: MemoryRegion: factor out subregion add functionality

Split off the core looping code that actually adds subregions into
its own function. This prepares support for Memory Region qomification
where setting the MR address or parent via QOM will back onto this more
minimal function.

Signed-off-by: Peter Crosthwaite <[email protected]>
[Rename new function. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/riku/linux-user-for-upstream' into staging

* remotes/riku/linux-user-for-upstream:
  User mode support for Linux ELF files with no section header
  linux-user: Return correct errno for unsupported netlink socket
  linux-user: Don't overrun guest buffer in sched_getaffinity
  linux-user/uname: Return correct uname string for x86_64
  linux-user: fix gcc-4.9 compiler error on __{get,put}_user
  signal/ppc/do_setcontext remove __get_user return check
  signal/sparc64_set_context: remove __get_user checks
  signal/ppc/{save,restore}_user_regs remove __put/get error checks
  signal/all/setup_frame remove __put_user checks
  signal/all/do_sigreturn - remove __get_user checks
  signal/all/do_sigaltstack remove __get_user value check
  signal/sparc/restore_fpu_state: remove
  signal/all: remove return value from restore_sigcontext
  signal/all: remove return value from setup_sigcontext
  signal/all: remove return value from copy_siginfo_to_user
  signal/x86/setup_frame: __put_user cleanup
  signal/all: remove __get/__put_user return value reading

Signed-off-by: Peter Maydell <[email protected]>

qtest: fix qtest_clock_warp() for no deadline case

Use dedicated qemu_soonest_timeout() instead of MIN().
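
To see why a plain MIN() misbehaves, the sketch below assumes that a timeout of -1 means "no deadline". Comparing the values as unsigned turns -1 into the largest possible value, so any real deadline wins. The helper names are hypothetical and only mirror the idea behind qemu_soonest_timeout().

```c
#include <stdint.h>

/* Plain minimum: picks -1 ("no deadline") over any real deadline. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Soonest of two timeouts where -1 means "no deadline".
 * Viewed as unsigned, -1 is the largest value, so it only wins
 * when both inputs are -1.
 */
static int64_t sketch_soonest_timeout(int64_t t1, int64_t t2)
{
    return ((uint64_t)t1 < (uint64_t)t2) ? t1 : t2;
}
```

With these definitions, SKETCH_MIN(-1, 100) yields -1 and hides the pending deadline, while sketch_soonest_timeout(-1, 100) yields 100.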

Signed-off-by: Sergey Fedorov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec: dummy_section: Pass address space through.

Rather than use the global singleton.

Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Simplify mr_add_subregion() if-else

This if else is not needed. The previous call to memory_region_add
(whether _overlap or not) will always set priority and may_overlap
to desired values. And it's not possible to get here without having
called memory_region_add_subregion due to the null guard on parent.
So we can just directly call memory_region_add_subregion_common.

Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

memory: Don't update all memory region when ioeventfd changed

Memory mappings don't rely on ioeventfds, so there is no need
to destroy and rebuild them when manipulating ioeventfds;
doing so sacrifices performance.

According to test results, each ioeventfd deletion takes
about 5ms, of which rebuilding the memory mappings takes
about 4ms. When a VM with many NICs and vmchannels is migrating,
there can be many ioeventfd deletions, which increases
downtime remarkably.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Herongguang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

unset RAMBlock idstr when unregister MemoryRegion

Signed-off-by: Hu Tao <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec: introduce qemu_ram_unset_idstr() to unset RAMBlock idstr

Signed-off-by: Hu Tao <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

MAINTAINERS: Add myself as Memory API maintainer

I'm not including Avi since he has already removed himself from the
KVM entry. I'm not going to commit my patches without review.

Acked-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

User mode support for Linux ELF files with no section header

In user mode Linux, Qemu currently refuses to load ELF files that do not
contain section headers (ehdr->e_shentsize == 0). Since section headers are not
required in order to load an ELF file, simply removing the e_shentsize check in
elf_check_ehdr() allows ELF binaries with no section headers to be run properly
in user mode:

Signed-off-by: Craig Heffner <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Return correct errno for unsupported netlink socket

This fixes "Cannot open audit interface - aborting." when the
EAFNOSUPPORT errno differs between the target and host
architectures (e.g. mips target and x86_64 host).

Signed-off-by: Ed Swierk <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: Don't overrun guest buffer in sched_getaffinity

If the guest's "long" type is smaller than the host's, then
our sched_getaffinity wrapper needs to round the buffer size
up to a multiple of the host sizeof(long). This means that when
we copy the data back from the host buffer to the guest's
buffer there might be more than we can fit. Rather than
overflowing the guest's buffer, handle this case by returning
EINVAL or ignoring the unused extra space, as appropriate.

Note that only guests using the syscall interface directly might
run into this bug -- the glibc wrappers around it will always
use a buffer whose size is a multiple of 8 regardless of guest
architecture.
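
The rounding and the overrun check described above can be sketched roughly as follows. The helper names, the used_bytes parameter, and the -1 return convention are assumptions for illustration; the actual wrapper in linux-user is structured differently.

```c
#include <stddef.h>

/* Round a guest-supplied size up to a multiple of the host's sizeof(long). */
static size_t sketch_round_up_host_long(size_t guest_size)
{
    const size_t align = sizeof(long);

    return (guest_size + align - 1) / align * align;
}

/*
 * Decide how many bytes may be copied back to the guest.
 *   host_ret   - bytes the host call filled in
 *   guest_size - size of the guest's buffer
 *   used_bytes - bytes that actually carry CPU bits (hypothetical input)
 * Returns the byte count to copy, or -1 for an EINVAL-style failure.
 */
static long sketch_copy_back_len(size_t host_ret, size_t guest_size,
                                 size_t used_bytes)
{
    if (host_ret <= guest_size) {
        return (long)host_ret;   /* everything fits */
    }
    if (used_bytes > guest_size) {
        return -1;               /* meaningful data would be cut off */
    }
    return (long)guest_size;     /* only unused padding is dropped */
}
```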

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user/uname: Return correct uname string for x86_64

We were returning the incorrect uname string (with a hyphen, not
an underscore) for x86_64. Fix this by removing the x86_64 special
case, since the default "just use UNAME_MACHINE" behaviour suffices.
This leaves cpu_to_uname_machine() special cases for only those
architectures which need to vary the string based on runtime CPU
features.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

linux-user: fix gcc-4.9 compiler error on __{get,put}_user

gcc-4.9 finds unused operand:

linux-user/syscall.c: In function ‘host_to_target_stat64’:
linux-user/qemu.h:301:19: error: right-hand operand of comma expression
has no effect [-Werror=unused-value]
((hptr), (x)), 0)

Just removing the right-hand operand is no good; it will cause an error later:

linux-user/main.c: In function ‘arm_kernel_cmpxchg64_helper’:
linux-user/qemu.h:330:15: error: void value not ignored as it ought to be
__ret = __put_user((x), __hptr); \

Thus, remove setting __ret from __get_user and __put_user, and
set the right-hand operand to (void)0 to make it clear that these
never return anything.

This commit depends on the signal.c cleanup, to ensure bisectable
version history.
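
A much simplified sketch of the (void)0 idea. The macros below are hypothetical stand-ins; the real __get_user and __put_user also deal with access sizes and byte order.

```c
#include <stdint.h>

/*
 * Simplified stand-ins for the user-access macros.
 * The trailing (void)0 makes the whole expression void, so the
 * compiler rejects any attempt to read a "return value" from it.
 */
#define sketch_put_user(x, hptr) (*(hptr) = (x), (void)0)
#define sketch_get_user(x, hptr) ((x) = *(hptr), (void)0)

/* Store a value through the macro and read it back. */
static int32_t sketch_roundtrip(int32_t value)
{
    int32_t slot = 0;
    int32_t out = 0;

    sketch_put_user(value, &slot);
    sketch_get_user(out, &slot);
    return out;
}
```

With macros shaped like this, a statement such as "err |= sketch_put_user(v, p);" no longer compiles, which is why the callers had to be cleaned up first.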

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Cc: Richard Henderson <[email protected]>

signal/ppc/do_setcontext remove __get_user return check

The last remaining check for return value of __get_user.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Cc: Alexander Graf <[email protected]>

signal/sparc64_set_context: remove __get_user checks

Remove checks of __get_user and the err variable
used to control flow with it.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/ppc/{save,restore}_user_regs remove __put/get error checks

As __get_user and __put_user do not return errors, remove the
if checks from around them. This allows making the save/restore
functions void.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Cc: Alexander Graf <[email protected]>

signal/all/setup_frame remove __put_user checks

Remove "if(__put_user" checks and their related error paths
for all architecture's setup_frame, setup_rt_frame and similar.

Remove the unlock_user_struct when the only way to end up there is
from failed lock_user_struct.

Remove err variable if there are no users for it in the function
anymore.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all/do_sigreturn - remove __get_user checks

Remove "if(__get_user" checks and their related error paths
for all architecture's do_sigreturn. Remove the unlock_user_struct
when the only way to end up there is from failed lock_user_struct.

v3: remove unnecessary sigsegv label as suggested by Peter

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all/do_sigaltstack remove __get_user value check

Access is already checked in the lock_user_struct
call before.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/sparc/restore_fpu_state: remove

A function never called from anywhere, obviously half-complete.
Remove function and if someone wants to complete this, please
check the old version out of git history.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all: remove return value from restore_sigcontext

make most implementations of restore_sigcontext void and
remove checking its return value from functions calling
restore_sigcontext.

The exception is the X86 version of the function that is
too different from others to deal in this way, and arm
version, to keep possibility of erroring out from failed
valid_user_regs.

v3: keep arm valid_user_regs for filling in near future.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all: remove return value from setup_sigcontext

Make all implementations of setup_sigcontext void and
remove checking its return value from functions calling
setup_sigcontext.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all: remove return value from copy_siginfo_to_user

Since copy_siginfo_to_user always returns 0, make it void
and remove any checks for return value from calling functions.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/x86/setup_frame: __put_user cleanup

Remove the remaining check for __put_user return
value, and all the checks for err variable which
isn't set anywhere anymore.

Now we can only end up in give_sigsegv due to failed
lock_user_struct - thus we remove the unlock_user_struct
to avoid unlocking a region never locked.

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

signal/all: remove __get/__put_user return value reading

Remove all the simple cases of reading the return value
of __get_user and __put_user.

We set err = 0 in sparc versions of do_sigreturn and
sparc64_set_context to avoid compile error, but otherwise this patch is
just general removal of err |= __get_user ... idiom.

v2: remove err variable from target_rt_restore_ucontext

Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging

Patch queue for ppc - 2014-06-16

This pull request brings a lot of fun things. Among others we have

  - e500: u-boot firmware support
  - sPAPR: magic page enablement
  - sPAPR: add "compat" CPU option to support older guests
  - sPAPR: refactorings in preparation for VFIO
  - POWER8 live migration
  - mac99: expose bus frequency
  - little endian core dump, gdb and disas support
  - new ppc64le-linux-user target
  - DFP emulation
  - bug fixes

# gpg: Signature made Mon 16 Jun 2014 12:28:32 BST using RSA key ID 03FEDC60
# gpg: Can't check signature: public key not found

* remotes/agraf/tags/signed-ppc-for-upstream: (156 commits)
  spapr_pci: Advertise MSI quota
  PPC: KVM: Make pv hcall endian agnostic
  powerpc: use float64 for frsqrte
  spapr: Add kvm-type property
  spapr: Create SPAPRMachine struct
  linux-user: Tell guest about big host page sizes
  spapr_hcall: Add address-translation-mode-on-interrupt resource in H_SET_MODE
  spapr_hcall: Split h_set_mode()
  target-ppc: Enable DABRX SPR and limit it to <=POWER7
  target-ppc: Enable PPR and VRSAVE SPRs migration
  target-ppc: Add POWER8's Event Based Branch (EBB) control SPRs
  KVM: target-ppc: Enable TM state migration
  target-ppc: Add POWER8's TM SPRs
  target-ppc: Add POWER8's MMCR2/MMCRS SPRs
  target-ppc: Enable FSCR facility check for TAR
  target-ppc: Add POWER8's FSCR SPR
  target-ppc: Add POWER8's TIR SPR
  target-ppc: Refactor class init for POWER7/8
  target-ppc: Switch POWER7/8 classes to use correct PMU SPRs
  target-ppc: Make use of gen_spr_power5p_lpar() for POWER7/8
  ...

Signed-off-by: Peter Maydell <[email protected]>

rules.mak: remove $(sort) from extract-libs

Duplicate removal was added to extract-libs in order to avoid including
the same library multiple times into the linking command line; this could
potentially happen when using "foo.mo-libs" (which adds the library to
all components, causing it to appear N times if the module is composed
of N objects). However, sorting and removing duplicates causes problems
with static linking, and also with space-separated linker options as
found in some Mac OS X packaging systems. Furthermore, the "optimization"
is really a non-problem since we do not expect .mo modules to be composed
of many files.

Reported-by: Sean Bruno <[email protected]>
Tested-by: Sean Bruno <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-id: 1402929805 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Block pull request

# gpg: Signature made Mon 16 Jun 2014 12:22:22 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/block-pull-request: (39 commits)
  QemuOpts: cleanup tmp 'allocated' member from QemuOptsList
  cleanup QEMUOptionParameter
  vpc.c: replace QEMUOptionParameter with QemuOpts
  vmdk.c: replace QEMUOptionParameter with QemuOpts
  vhdx.c: replace QEMUOptionParameter with QemuOpts
  vdi.c: replace QEMUOptionParameter with QemuOpts
  ssh.c: replace QEMUOptionParameter with QemuOpts
  sheepdog.c: replace QEMUOptionParameter with QemuOpts
  rbd.c: replace QEMUOptionParameter with QemuOpts
  raw_bsd.c: replace QEMUOptionParameter with QemuOpts
  raw-win32.c: replace QEMUOptionParameter with QemuOpts
  raw-posix.c: replace QEMUOptionParameter with QemuOpts
  qed.c: replace QEMUOptionParameter with QemuOpts
  qcow2.c: replace QEMUOptionParameter with QemuOpts
  QemuOpts: export qemu_opt_find
  qcow.c: replace QEMUOptionParameter with QemuOpts
  nfs.c: replace QEMUOptionParameter with QemuOpts
  iscsi.c: replace QEMUOptionParameter with QemuOpts
  gluster.c: replace QEMUOptionParameter with QemuOpts
  cow.c: replace QEMUOptionParameter with QemuOpts
  ...

Signed-off-by: Peter Maydell <[email protected]>

spapr_pci: Advertise MSI quota

Hotplug of multiple disks fails due to the MSI vector quota check.
The number of MSI vectors defaults to 8, allowing only 4 devices.
This happens on a RHEL6.5 guest. RHEL7 and SLES11 guests fall back
to INTX.

One way to work around the issue is to increase the total number of MSIs,
so that the MSI quota check allows us to hotplug multiple disks.

This sets the quota to the maximum number of interrupts XICS has,
which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c
to xics.h for wider visibility.

Signed-off-by: Badari Pulavarty <[email protected]>
[aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size]
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

PPC: KVM: Make pv hcall endian agnostic

There were a few revisions of the Linux kernel that incorrectly swapped
the hcall instructions when they saw ePAPR compliant hypercalls.

We already have fixups for those in place when running with PR KVM, but
HV KVM and systems that don't implement hypercalls at all are still broken
because they fall back to the QEMU implementation of fallback hypercalls.

So let's make the fallback hypercall instruction path endian agnostic. This
only really works well for 64bit guests, but I don't think there are any 32bit
systems left that don't implement real pv hcall support, so we'll never get
into this code path.

Signed-off-by: Alexander Graf <[email protected]>

powerpc: use float64 for frsqrte

Remove the code that reduces the result to float32 as the frsqrte
instruction is defined to return a double-precision estimate of
the reciprocal square root.

Although reducing the fractional part is harmless (as the estimation
must have at least 12 bits of precision according to the old PEM),
reducing the exponent range is not correct.

Signed-off-by: Tristan Gingold <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

spapr: Add kvm-type property

The kvm-type machine option was left out when MachineState was
introduced, preventing the kvm-type option from being used. Add the
missing property to the sPAPR machine class, so it can be used.

Signed-off-by: Eduardo Habkost <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

spapr: Create SPAPRMachine struct

Signed-off-by: Eduardo Habkost <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

linux-user: Tell guest about big host page sizes

We tell the guest its page size via AUX vectors. The guest process then uses
this page size as information on which boundaries it can mmap() things.

However, if the host has a bigger page size granularity than the guest, it can
not fulfill these mmap() requests - which falls apart when MAP_FIXED is passed
to mmap.

So in that case, let the guest know that we're running on a bigger page size
granularity than the target would require.

This fixes running qemu-ppc (TARGET_PAGE_SIZE=4k) on a 64k page size ppc64 host
for me.
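
The choice of which page size to report can be sketched as taking the larger of the two granularities. The function name is hypothetical, and how the value reaches the AUX vector is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Page size to advertise to the guest process: never smaller than
 * what the host can actually honour for mmap() placement.
 */
static size_t sketch_reported_page_size(size_t target_page_size,
                                        size_t host_page_size)
{
    assert(target_page_size > 0 && host_page_size > 0);
    return host_page_size > target_page_size ? host_page_size
                                             : target_page_size;
}
```

For a 4k target on a 64k host this reports 64k, so the guest only asks for fixed mappings on boundaries the host can satisfy.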

Signed-off-by: Alexander Graf <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

spapr_hcall: Add address-translation-mode-on-interrupt resource in H_SET_MODE

This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
the H_SET_MODE, for POWER8 (PowerISA 2.07) only.

This defines AIL flags for LPCR special register.

This changes @excp_prefix according to the mode, takes effect in TCG.

This turns on support of a new capability flag, PPC2_ISA207S, for TCG.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

spapr_hcall: Split h_set_mode()

This moves H_SET_MODE_RESOURCE_LE handler to a separate function
as there are other "resources" coming and this is going to become ugly.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Enable DABRX SPR and limit it to <=POWER7

This adds DABRX SPR.

As DABR(X) are present in POWER CPUs up to POWER7 only and POWER8 does not
have them (as it implements a more powerful facility instead), this limits
DABR/DABRX registration to CPUs up to and including POWER7.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Enable PPR and VRSAVE SPRs migration

This hooks SPR with their "KVM set_one_reg" counterparts which enables
their migration.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add POWER8's Event Based Branch (EBB) control SPRs

POWER8 supports the Event-Based Branch Facility (EBB). It is controlled via
a set of SPRs, access to which should generate a "Facility Unavailable"
interrupt if the facilities are not enabled in FSCR for problem state.

This adds EBB SPRs.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

KVM: target-ppc: Enable TM state migration

This adds migration support for registers saved before Transactional
Memory (TM) transaction started.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add POWER8's TM SPRs

This adds TM (Transactional Memory) SPRs.

This adds generic spr_read_prev_upper32()/spr_write_prev_upper32() to
handle upper half SPRs such as TEXASRU which is upper half of TEXASR.
Since this is not the only register like that and their numbers go
consecutively, it makes sense to generalize the helpers.
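
In plain C, outside of TCG code generation, the upper-half access pattern amounts to the following sketch. The function names are hypothetical and operate on a value rather than on a CPU register.

```c
#include <stdint.h>

/* Read the upper 32 bits of a 64-bit register value. */
static uint32_t sketch_read_upper32(uint64_t full)
{
    return (uint32_t)(full >> 32);
}

/* Replace the upper 32 bits, leaving the lower half untouched. */
static uint64_t sketch_write_upper32(uint64_t full, uint32_t upper)
{
    return (full & 0xFFFFFFFFull) | ((uint64_t)upper << 32);
}
```

Because the helpers depend only on the full-width register, the same pair can serve any SPR whose upper half is exposed under the next SPR number.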

This adds a gen_msr_facility_check() helper whose purpose is to generate
the Facility Unavailable exception if the facility is disabled.
It is a copy of gen_fscr_facility_check() but it checks for enabled
facility in MSR rather than FSCR/HFSCR. It still sets the interrupt cause
in FSCR/HFSCR (whichever is passed to the helper).

This adds spr_read_tm/spr_write_tm/spr_read_tm_upper32/spr_write_tm_upper32
which are used for TM SPRs.

This adds TM-related MSR bit definitions. This enables TM in POWER8 CPU class'
msr_mask.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add POWER8's MMCR2/MMCRS SPRs

This adds POWER8 specific PMU MMCR2/MMCRS SPRs.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Enable FSCR facility check for TAR

This makes user-privileged read/write fail if TAR facility is not enabled
in FSCR.

Since this is the very first check for a facility enabled in FSCR,
this also adds gen_fscr_facility_check() for use in spr_write_tar()/
spr_read_tar().

This enables TAR in FSCR for user mode unconditionally.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add POWER8's FSCR SPR

This adds an FSCR (Facility Status and Control Register) SPR. This defines
names for FSCR bits.

This defines new exception type - POWERPC_EXCP_FU - "facility unavailable" (FU).
This registers an interrupt vector for it at 0xF60 as PowerISA defines.

This adds a TCG helper_fscr_facility_check() helper to raise an exception
if the facility is not enabled. It updates the interrupt cause field
in FSCR. This adds TCG translation block generation code. The helper
may be used for HFSCR too as it has the same format.

The helper raising FU exceptions is not used by this patch but will be
in the next ones.

This adds gen_update_current_nip() to update NIP in DisasContext.
This helper is not used now and will be called before checking for
a condition for throwing an FU exception.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add POWER8's TIR SPR

This adds TIR (Thread Identification Register) SPR first defined for server
CPUs in PowerISA 2.07.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Refactor class init for POWER7/8

This extends init_proc_book3s_64 to support POWER7 and POWER8.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Switch POWER7/8 classes to use correct PMU SPRs

This replaces gen_spr_7xx() call (which registers 32bit SPRs) with
gen_spr_book3s_pmu() call.

This removes SPR_7XX_PMC5/6 as they are for 32bit and gen_spr_book3s_pmu()
already registers correct PMC5/6 SPRs.

This removes explicit MMCRA registration as gen_spr_book3s_pmu() does it
anyway.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Make use of gen_spr_power5p_lpar() for POWER7/8

This makes use of generic gen_spr_power5p_lpar() which registers LPCR SPR.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Make use of gen_spr_book3s_altivec() for POWER7/8

This replaces VRSAVE registration and vscr_init() call with
gen_spr_book3s_altivec() which is generic and does the same thing if
insns_flags has PPC_ALTIVEC bit set (which POWER7/8 have set).

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Move POWER7/8 CFAR/DSCR/CTRL/PPR/PCR SPR registration to helpers

This moves CFAR/DSCR/CTRL/PPR/PCR SPRs to helpers. Later these helpers
will be called from generalized init_proc_book3s_64().

This switches init_proc_POWER7() to use generalized gen_spr_book3s_common()
which registers CTRL SPR under slightly different names. No change in
behaviour or non-debug output is expected.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Move POWER8 Target Address Register (TAR) to a helper

This moves TAR SPR to a helper. Later this helper will be
called from generalized init_proc_book3s_64().

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Move POWER7/8 PIR/PURR/SPURR SPR registration to helpers

This moves PIR/PURR/SPURR SPRs to helpers. Later these helpers will be
called from generalized init_proc_book3s_64().

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Enable PMU SPRs migration

This enables PMU SPRs migration by hooking hypv privileged versions with
"KVM one reg" IDs.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Remove check_pow_970FX

After merging 970s into one class, check_pow_970() is used for all of them.
Since POWER5+ is no different in the matter of supported power modes,
let's use the same check_pow() callback for POWER5+ too.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Introduce and reuse generalized init_proc_book3s_64()

At the moment every POWER CPU family has its own init_proc_POWERX function.
E500 already has a common init function, so we try to do the same thing here.

This introduces BOOK3S_CPU_TYPE enum with 2 values - 970 and POWER5+.

This introduces generalized init_proc_book3s_64() which accepts a CPU type
as a parameter.

This uses new init function for 970 and POWER5+ CPU classes.

970 and POWER5+ use the same CPU class initialization except 3 things:
1. logical partitioning is controlled by LPCR (POWER5+) and HID4 (970)
SPRs;
2. 970 does not have EAR (External Access Register) SPR and PowerISA 2.03
defines one so keep it only for POWER5+;
3. POWER5+ does not have ALTIVEC so insns_flags does not have PPC_ALTIVEC
flag set and gen_spr_book3s_altivec() won't init ALTIVEC for POWER5+.
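
The three differences listed above can be sketched as a single function
parameterized by CPU type. This is an illustration only: the enum
constant names, the flag value and the result struct are hypothetical
stand-ins, not the definitions used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU type enum and the ALTIVEC flag. */
typedef enum {
    BOOK3S_CPU_970,
    BOOK3S_CPU_POWER5PLUS,
} Book3sCpuType;

#define INSN_FLAG_ALTIVEC (1u << 0)

/* Records which SPR groups the generalized init would register. */
typedef struct {
    bool has_hid4;    /* 970: logical partitioning via HID4 */
    bool has_lpcr;    /* POWER5+: logical partitioning via LPCR */
    bool has_ear;     /* External Access Register, POWER5+ only */
    bool has_altivec; /* driven by insns_flags, not by the CPU type */
} Book3sSprs;

static Book3sSprs init_book3s_64(Book3sCpuType type, uint32_t insns_flags)
{
    Book3sSprs sprs = { false, false, false, false };

    switch (type) {
    case BOOK3S_CPU_970:
        sprs.has_hid4 = true;
        break;
    case BOOK3S_CPU_POWER5PLUS:
        sprs.has_lpcr = true;
        sprs.has_ear = true;
        break;
    }
    /* ALTIVEC setup depends only on the instruction flags. */
    sprs.has_altivec = (insns_flags & INSN_FLAG_ALTIVEC) != 0;
    return sprs;
}
```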

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add HID4 SPR for PPC970

Previously LPCR was registered for the 970 class which was wrong as
it does not have LPCR. Instead, HID4 is used which this patch registers.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add PMC7/8 to 970 class

Compared to PowerISA-compliant CPUs, the 970 family has most of the same
PMU SPRs plus PMC7/8, which are present only on 970 and not on POWER5 and
later CPUs.

Since we are changing SPRs for Book3s/970 families, let's add them too.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add PMC5/6, SDAR and MMCRA to 970 family

MMCR0, MMCR1, MMCRA, PMC1..6, SIAR, SDAR are defined for 970 and PowerISA
CPUs. Since we are building common infrastructure for SPRs initialization
to share it between 970 and POWER5+/7/..., let's add missing SPRs to
the 970 family. Later rework of CPU class initialization will use those
for all PowerISA CPUs.

This adds new SPRs and enables writing to Uxxxx SPRs from supermode.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add "POWER" prefix to MMCRA PMU registers

Since we started adding "POWER" prefix to 64bit PMU SPRs, let's finish
the transition and fix MMCRA and define a supermode version of it.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Copy and split gen_spr_7xx() for 970

This stops using 7xx common SPRs init function and adds separate set
of helpers for 970.

This does not copy ICTC SPR as neither 970 manual nor PowerISA mention it.

This defines 970/book3s PMU SPRs constants as they differ from the ones
used for 7XX.

This creates 2 helpers for PMU SPRs, one for supermode privileged SPRs and
one for user privileged SPRs, as "sup" versions can be shared across
the family while "user" versions will behave differently starting with
POWER8 (which will be addressed later).

This allows writing to Uxxxx SPRs from supermode. spr_write_ureg() is
implemented for this as a copy of already existing spr_read_ureg().
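
A minimal sketch of the read/write mirroring, assuming that each user
SPR sits at a fixed offset of 0x10 below its supermode copy. The
offset, the array size and the function names are assumptions made for
illustration and are not taken from this patch.

```c
#include <stdint.h>

/* Assumed: a user SPR number is 0x10 below its supermode copy. */
#define UREG_OFFSET 0x10u
#define SPR_COUNT   1024u

/* Reading a user SPR returns the value held in the supermode copy. */
static uint64_t read_ureg(const uint64_t *spr, unsigned sprn)
{
    return spr[sprn + UREG_OFFSET];
}

/* Writing a user SPR stores into the supermode copy, mirroring the read. */
static void write_ureg(uint64_t *spr, unsigned sprn, uint64_t value)
{
    spr[sprn + UREG_OFFSET] = value;
}
```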

This allows writing to supervisor's SIAR - it used to be disabled
when gen_spr_7xx() was used.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Make UCTRL a mirror of CTRL

This changes UCTRL SPR to read from its supermode copy.

This enables reading from UCTRL in user mode.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Refactor PPC970

This splits one init_proc_970() into a set of small helpers. Later
init_proc_970() will be generalized and will call different set of helpers
depending on the current CPU class.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Merge 970FX and 970MP into a single 970 class

The differences between classes were:
1. SLB size, was 32 for 970 and 64 for others, should be 64 for all;
2. check_pow() callback, HID0 format is the same so should be the same
0x01C00000 which means "deep nap", "doze" and "nap" bits set;
3. LPCR - 970 does not have it but 970MP had one (by mistake).

This fixes wrong differences and makes one 970 class.

This fixes wrong registration of LPCR which is not present on 970.

This defines HID0 bits and uses them in check_pow_970().
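
The 0x01C00000 value from item 2 can be built from three adjacent HID0
bits. In the sketch below the combined mask comes from this message,
but the assignment of each name to a specific bit and the function
name are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignment; only the combined value is stated above. */
#define HID0_DEEPNAP  (1u << 24)
#define HID0_DOZE     (1u << 23)
#define HID0_NAP      (1u << 22)
#define HID0_POW_MASK (HID0_DEEPNAP | HID0_DOZE | HID0_NAP)

/* True if any of the three power-saving mode bits is set in HID0. */
static bool hid0_requests_power_saving(uint32_t hid0)
{
    return (hid0 & HID0_POW_MASK) != 0;
}
```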

This does not copy MSR_SHV (Hypervisor State, HV) bit from 970FX to
970 class as we do not emulate hypervisor in QEMU anyway.

This does not remove check_pow_970FX now as it is still used by the POWER5+
class; this will be addressed later.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Rename 7XX/60x/74XX/e600 PMU SPRs

As defined in Linux kernel, PMC*, SIAR, MMCR0/1 have different numbers
for 32 and 64 bit POWERPC. We are going to support 64bit versions too so
let's rename 32bit ones to avoid confusion.

This is a mechanical patch so it does not fix the obvious mistake with these
registers in POWER7 yet; this will be fixed later.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Fix Temporary Variable Leak in bctar

Fix a temporary variable leak detected in the bctar instruction:

Opcode 13 10 11 (4d910460) leaked temporaries

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

PPC: e500: Merge 32 and 64 bit SPE emulation

Today we have a lot of conditional code in the SPE emulation depending on
whether we have 64bit GPRs or not.

Unfortunately the assumption that we can just recycle the 64bit GPR
implementation is wrong. Normal SPE implementations maintain the upper 32 bits
on all non-SPE instructions which then only modify the low 32 bits. However
all instructions we model that adhere to the normal SF based switching don't
care whether they operate on 32 or 64 bit registers and just always use the full
64 bits.

So let's remove that dubious SPE optimization and revert everything to the same
code path the 32bit target code was taking. That way we get rid of differences
between the two implementations, but will get a slight performance hit when
emulating SPE instructions.

This fixes SPE emulation with qemu-system-ppc64 for me.

Signed-off-by: Alexander Graf <[email protected]>

PPC: spapr: Expose /hypervisor node in device tree

PR KVM supports an ePAPR compliant hypercall interface in parallel to the
normal sPAPR one. Expose the ePAPR /hypervisor node and properties to the
guest so it can use it.

This enables magic page sharing on PR KVM with -M pseries.

However we had a few nasty bugs in the magic page implementation on vcpus
newer than 970 (p7, p8) that KVM now has workarounds for. It indicates that
it does have these workarounds through the PPC_FIXUP_HCALL capability.

To not expose broken guest kernels to issues on host kernels that don't
have the fixups in place, we don't expose working hypercall instructions
when the fixups are not available so that the guest can never activate the
magic page.

Signed-off-by: Alexander Graf <[email protected]>

KVM: PPC: Expose fixup hcall capability

New kvm versions expose a PPC_FIXUP_HCALL capability. Make it visible to
machine code so we can take decisions based on it.

Signed-off-by: Alexander Graf <[email protected]>

linux-headers: update linux headers to kvm/next

This updates the kvm headers to commit 820b3fcd in kvm/next.

Signed-off-by: Alexander Graf <[email protected]>

linux-headers: include psci.h

The kvm headers now have a dependency on psci.h, sync it into our linux
header copy as well.

Signed-off-by: Alexander Graf <[email protected]>

PPC: SPE: Fix high-bits bitmask

The SPE emulation code wants to access the highest 32bits of a 64bit register
and uses the andi TCG instruction for that. Unfortunately it masked with the
wrong mask. Fix the mask to actually cover the upper 32 bits.
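
As a plain C illustration of what the corrected mask has to do (this is
not the TCG code from the patch, and the function names are
hypothetical):

```c
#include <stdint.h>

/* A mask that covers exactly the upper 32 bits of a 64-bit register. */
#define HIGH32_MASK 0xFFFFFFFF00000000ULL

/* Keep the upper half of the register and clear the lower half. */
static uint64_t keep_high32(uint64_t reg)
{
    return reg & HIGH32_MASK;
}

/* Replace the lower half while preserving the upper half. */
static uint64_t merge_low32(uint64_t reg, uint32_t low)
{
    return (reg & HIGH32_MASK) | low;
}
```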

This fixes simple multiplication tests with SPE guests for me.

Signed-off-by: Alexander Graf <[email protected]>

PPC: e500: Fix TLB lookup for 32bit CPUs

When we run 32bit guest CPUs (or 32bit guest code on 64bit CPUs) on
qemu-system-ppc64 the TLB lookup will use the full effective address
as pointer.

However, only the lower 32 bits are valid when MSR.CM = 0. Check for
that condition.
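
A minimal sketch of that condition, assuming the computation mode is
available as a boolean; the function name is hypothetical and the
actual lookup code is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the address to use for the TLB lookup. With MSR.CM = 0 only
 * the lower 32 bits of the effective address are significant.
 */
static uint64_t tlb_lookup_address(uint64_t ea, bool msr_cm)
{
    return msr_cm ? ea : (ea & 0xFFFFFFFFULL);
}
```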

This makes QEMU boot an e500v2 guest with more than 1G of RAM for me.

Signed-off-by: Alexander Graf <[email protected]>

hw/pci-host/ppce500: Fix typo in vmstate definition

Fix a typo in the ppce500_pci vmstate definition which meant that
we were migrating the struct pci_inbound using the vmstate for
pci_outbound. Fortunately the two structures have exactly the same
format at the moment (four uint32_ts) so this was harmless, and
we can correct the typo without a migration compatibility
break because the vmstate name doesn't go out on the wire.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Store Quadword Conditional Drops Size Bit

The size and register information are encoded into the reserve_info field
of CPU state in the store conditional translation code. Specifically, the
size is shifted left by 5 bits (see target-ppc/translate.c gen_conditional_store).

The user-mode store conditional code erroneously extracts the size by ANDing
with a 4 bit mask; this breaks if size >= 16.

Eliminate the mask to make the extraction of size mirror its encoding.
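
The encoding and the failure mode can be sketched as follows. The shift
by 5 and the 4 bit mask come from the description above; placing the
register number in the low 5 bits and all function names are
assumptions made for illustration.

```c
#include <stdint.h>

#define RESERVE_SIZE_SHIFT 5
#define RESERVE_REG_MASK   0x1Fu  /* assumed: register in the low 5 bits */

/* Pack size and register number the way the translation code is described. */
static uint32_t encode_reserve_info(uint32_t size, uint32_t reg)
{
    return (size << RESERVE_SIZE_SHIFT) | (reg & RESERVE_REG_MASK);
}

/* Extraction that mirrors the encoding: no mask on the size. */
static uint32_t reserve_size(uint32_t info)
{
    return info >> RESERVE_SIZE_SHIFT;
}

static uint32_t reserve_reg(uint32_t info)
{
    return info & RESERVE_REG_MASK;
}

/* The erroneous variant: a 4 bit mask cannot represent a size of 16. */
static uint32_t reserve_size_masked(uint32_t info)
{
    return (info >> RESERVE_SIZE_SHIFT) & 0xFu;
}
```

With a quadword store (size 16) the masked variant yields 0, while the
unmasked extraction returns 16.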

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Confirm That .bss Pages Are Valid

The existing code does a check to ensure that a .bss region is properly
mmap'd. When additional mmap is required, the (guest) pages are also
validated. However, this code has a bug: when host page size is larger
than target page size, it is possible for the .bss pages to already be
(host) mapped but the guest .bss pages may not be valid.

The check to mmap additional space is separated from the flagging of the
target (guest) pages, thus ensuring that both aspects are done properly.

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Support VSX in PPC User Mode

Some modern tool chains use VSX instructions. Therefore attempt to enable the VSX MSR
bit by default, just like similar bits (FP, VEC, SPE, etc.).

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Add a new user mode target for little-endian PPC64.

Signed-off-by: Doug Kwan <[email protected]>
Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Allow little-endian user mode.

This allows running PPC64 little-endian in user mode if target is configured
that way. In PPC64 LE user mode we set MSR.LE during initialization.

Signed-off-by: Doug Kwan <[email protected]>
Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Support little-endian PPC64 in user mode.

Look at ELF header to determine ABI version on PPC64. This is required
for executing the first instruction correctly. Also print correct machine
name in uname() system call.
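
A sketch of the ABI check, assuming the ABI version is held in the low
two bits of the ELF header's e_flags field. The function names are
hypothetical. Under the older ABI the entry point refers to a function
descriptor rather than to code, which is why the version matters for
the first instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: low two bits of e_flags carry the PPC64 ABI version. */
#define EF_PPC64_ABI 0x3u

static unsigned ppc64_abi_version(uint32_t e_flags)
{
    return e_flags & EF_PPC64_ABI;
}

/* ABI versions below 2 use a function descriptor at the entry point. */
static bool entry_is_function_descriptor(uint32_t e_flags)
{
    return ppc64_abi_version(e_flags) < 2;
}
```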

Signed-off-by: Doug Kwan <[email protected]>
Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

PPC: e500: Fix MMUCSR0 emulation

A "mtspr SPRMMUCSR0, reg" always flushed TLB0,
because it passed the SPR number 0x3f4 to the flush routine.
But we want to flush either TLB0 or TLB1 depending on the GPR value.
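
To illustrate why passing the SPR number selected TLB0 every time, here
is a sketch that decodes the flush request from a value. The bit
positions for the two flush-invalidate flags are assumptions, as are
the struct and function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flush-invalidate bit positions in MMUCSR0. */
#define MMUCSR0_TLB1FI 0x2u
#define MMUCSR0_TLB0FI 0x4u

typedef struct {
    bool tlb0;
    bool tlb1;
} FlushRequest;

/* Decode which TLB arrays the written value asks to flush. */
static FlushRequest decode_mmucsr0(uint32_t value)
{
    FlushRequest req;

    req.tlb0 = (value & MMUCSR0_TLB0FI) != 0;
    req.tlb1 = (value & MMUCSR0_TLB1FI) != 0;
    return req;
}
```

Under these assumed bit positions, decoding the SPR number 0x3f4
instead of the GPR value sets only the TLB0 flag, which matches the
behaviour described above.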

Signed-off-by: Alex Zuepke <[email protected]>
[agraf: change subject line, fix TCGv size mismatch]
Signed-off-by: Alexander Graf <[email protected]>