Peter Maydell [Mon, 13 Jun 2016 10:22:05 +0000 (11:22 +0100)]
linux-user: Use safe_syscall wrapper for fcntl
Use the safe_syscall wrapper for fcntl. This is straightforward now
that we always use 'struct fcntl64' on the host, as we don't need
to select whether to call the host's fcntl64 or fcntl syscall
(a detail that the libc previously hid for us).
Peter Maydell [Mon, 13 Jun 2016 10:22:05 +0000 (11:22 +0100)]
linux-user: Use __get_user() and __put_user() to handle structs in do_fcntl()
Use the __get_user() and __put_user() to handle reading and writing the
guest structures in do_ioctl(). This has two benefits:
* avoids possible errors due to misaligned guest pointers
* correctly sign extends signed fields (like l_start in struct flock)
which might be different sizes between guest and host
To do this we abstract out into copy_from/to_user functions. We
also standardize on always using host flock64 and the F_GETLK64
etc flock commands, as this means we always have 64 bit offsets
whether the host is 64-bit or 32-bit and we don't need to support
conversion to both host struct flock and struct flock64.
In passing we fix errors in converting l_type from the host to
the target (where we were doing a byteswap of the host value
before trying to do the convert-bitmasks operation rather than
otherwise, and inexplicably shifting left by 1); these were
accidentally left over when the original simple "just shift by 1"
arm<->x86 conversion of commit 43f238d was changed to the more
general scheme of using target_to_host_bitmask() functions in 2ba7f73.
Peter Maydell [Mon, 13 Jun 2016 10:22:05 +0000 (11:22 +0100)]
linux-user: Avoid possible misalignment in host_to_target_siginfo()
host_to_target_siginfo() is implemented by a combination of
host_to_target_siginfo_noswap() followed by tswap_siginfo().
The first of these two functions assumes that the target_siginfo_t
it is writing to is correctly aligned, but the pointer passed
into host_to_target_siginfo() is directly from the guest and
might be misaligned. Use a local variable to avoid this problem.
(tswap_siginfo() does now correctly handle a misaligned destination.)
We have to add a memset() to host_to_target_siginfo_noswap()
to avoid some false positive "may be used uninitialized" warnings
from gcc about subfields of the _sifields union if it chooses to
inline both tswap_siginfo() and host_to_target_siginfo_noswap()
into host_to_target_siginfo().
Peter Maydell [Thu, 23 Jun 2016 10:53:14 +0000 (11:53 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' into staging
ppc patch queue for 2016-06-23
Currently outstanding patches for spapr, target-ppc and related
devices. This batch has:
* Significant new progress towards full support for hypervisor
mode
* Assorted bugfixes
* Some preliminary patches towards dynamic DMA window support
The last involves a change to memory.c, which Paolo has said I can
take through this tree.
# gpg: Signature made Thu 23 Jun 2016 06:47:53 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg: aka "David Gibson (Red Hat) <[email protected]>"
# gpg: aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160623:
ppc: Disable huge page support if it is not available for main RAM
ppc: Add P7/P8 Power Management instructions
ppc: Move exception generation code out of line
ppc: Turn a bunch of booleans from int to bool
ppc: Add real mode CI load/store instructions for P7 and P8
ppc: Rework generation of priv and inval interrupts
ppc: Fix generation if ISI/DSI vs. HV mode
ppc: Fix POWER7 and POWER8 exception definitions
ppc: fix exception model for HV mode
ppc: define a default LPCR value
ppc: Fix rfi/rfid/hrfi/... emulation
memory: Add reporting of supported page sizes
ppc: Improve emulation of THRM registers
target-ppc: Fix rlwimi, rlwinm, rlwnm again
ppc64: disable gen_pause() for linux-user mode
tests: Use '+=' to add additional tests, not '='
powerpc/mm: Update the WIMG check during H_ENTER
* remotes/kraxel/tags/pull-usb-20160622-2:
usb-uas: hotplug support
usb-bot: hotplug support
usb: Add QOM property "attached".
usb: make USBDevice->attached bool
usb-storage: qcow2 encryption support is finally gone, zap dead code
Thomas Huth [Wed, 22 Jun 2016 08:50:05 +0000 (10:50 +0200)]
ppc: Disable huge page support if it is not available for main RAM
On powerpc, we must only signal huge page support to the guest if
all memory areas are capable of supporting huge pages. The commit 2d103aae8765 ("fix hugepage support when using memory-backend-file")
already fixed the case when the user specified the mem-path property
for NUMA memory nodes instead of using the global "-mem-path" option.
However, there is one more case where it currently can go wrong.
When specifying additional memory DIMMs without using NUMA, e.g.
the code in getrampagesize() currently assumes that huge pages
are possible since they are enabled for the mem1 object. But
since the main RAM is not backed by a huge page filesystem,
the guest Linux kernel then crashes very quickly after being
started. So in case the we've got "normal" memory without NUMA
and without the global "-mem-path" option, we must not announce
huge pages to the guest. Since this is likely a mis-configuration
by the user, also spill out a message in this case.
There's no point inlining this, if you hit the exception case you exit
anyway, and not inlining saves about 100K of code size (and cache
footprint).
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[clg: removed '__attribute__((noinline))' from original patch ] Signed-off-by: Cédric Le Goater <[email protected]> Signed-off-by: David Gibson <[email protected]>
ppc: Rework generation of priv and inval interrupts
Recent server processors use the Hypervisor Emulation Assistance
interrupt for illegal instructions and *some* type of SPR accesses.
Also the code was always generating inval instructions even for priv
violations due to setting the wrong flags
Finally, the checking for PR/HV was open coded everywhere.
This reworks it all, using little helper macros for checking, and
adding the HV interrupt (which gets converted back to program check
in the slow path of excp_helper.c on CPUs that don't want it).
Under some circumstances, we need to direct ISI and DSI interrupts
at the hypervisor, turning them into HISI/HDSI, and using different
SPRs (HDSISR and HDAR) depending on the combination of MSR_DR and
the corresponding VPM bits in LPCR.
This moves part of the code into helpers that are fixed to select
the right exception type and registers. On pre-P7 processors, LPCR
is 0 which provides the old behaviour of directing the interrupts
at the supervisor.
Thanks to Andrei Warkentin for finding a bug when HV=1
This properly implements LPES0 handling for HV vs. !HV mode and
removes the unsupported LPES1. This has been removed from the specs
since ISA v2.07.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[clg: AIL implementation was fixed in commit 5c94b2a5e5ef. This patch
only contains the bits of the original patch related to LPES0
handling, adapted commit log.
fixed checkpatch.pl errors. ] Signed-off-by: Cédric Le Goater <[email protected]> Signed-off-by: David Gibson <[email protected]>
This allows us to set the appropriate LPCR bits which will be used
when fixing the exception model for the HV mode.
Signed-off-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: David Gibson <[email protected]>
[clg: previous commit 26a7f1291bb5 did not include the LPCR setting as
it was not needed at the time, adapted commit log ] Signed-off-by: Cédric Le Goater <[email protected]> Signed-off-by: David Gibson <[email protected]>
This reworks emulation of the various "rfi" variants. I removed
some masking bits that I couldn't make sense of, the only bit that
I am aware we should mask here is POW, the CPU's MSR mask should
take care of the rest.
This also fixes some problems when running 32-bit userspace under
a 64-bit kernel.
This patch broke 32bit OpenBIOS when run under a 970 cpu. A fix was
proposed here :
Signed-off-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: David Gibson <[email protected]>
[clg: updated the commit log with the reference of the openbios fix ] Signed-off-by: Cédric Le Goater <[email protected]>
[dwg: Remove hunk which disabled rfi on 64-bit CPUS. The change was
correct, but we need to fix OpenBIOS before applying it] Signed-off-by: David Gibson <[email protected]>
Gerd Hoffmann [Wed, 15 Jun 2016 09:46:57 +0000 (11:46 +0200)]
usb: Add QOM property "attached".
USB devices in attached state are visible to the guest. This patch adds
a QOM property for this. Write access is opt-in per device. Some
devices manage attached state automatically (usb-host, usb-serial,
usb-redir), so we can't enable write access universally but have to do
it on a case by case base. So far, no device opts in.
Juergen Gross [Mon, 13 Jun 2016 09:12:21 +0000 (11:12 +0200)]
xen: move xen_sysdev to xen_backend.c
Commit 9432e53a5bc88681b2d3aec4dac9db07c5476d1b added xen_sysdev as a
system device to serve as an anchor for removable virtual buses. This
introduced a build failure for non-x86 builds with CONFIG_XEN_BACKEND
set, as xen_sysdev was defined in a x86 specific file while being
consumed in an architecture independent source.
Move the xen_sysdev definition and initialization to xen_backend.c to
avoid the build failure.
Juergen Gross [Mon, 20 Jun 2016 07:55:43 +0000 (09:55 +0200)]
xen: fix qdisk BLKIF_OP_DISCARD for 32/64 word size mix
In case the word size of the domU and qemu running the qdisk backend
differ BLKIF_OP_DISCARD will not work reliably, as the request
structure in the ring have different layouts for different word size.
Correct this by copying the request structure in case of different
word size element by element in the BLKIF_OP_DISCARD case, too.
Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
uses when translating, however this information is not available outside
the translate context for various checks.
This adds a get_min_page_size callback to MemoryRegionIOMMUOps and
a wrapper for it so IOMMU users (such as VFIO) can know the minimum
actual page size supported by an IOMMU.
As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE
as fallback.
This removes vfio_container_granularity() and uses new helper in
memory_region_iommu_replay() when replaying IOMMU mappings on added
IOMMU memory region.
The 75x and 74xx processors have some thermal monitoring SPRs that
some OSes such as MacOS do use. Our current "dumb" implementation
isn't good enough and will cause some versions of MacOS to hang during
boot.
This lifts an improved emulation from MacOnLinux and adapts it to
qemu, thus fixing the problem.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[dwg: Fixed typo in comment, a number of minor checkpatch warnings,
and a compile failure with CONFIG_USER_ONLY] Signed-off-by: David Gibson <[email protected]>
In 63ae0915f8ec, I arranged to use a 32-bit rotate, without
considering the effect of a mask value that wraps around to
the high bits of the word.
[dwg: In 2e11b15 this was partially fixed, but an edge case was still
incorrect, which this fixes]
Signed-off-by: Richard Henderson <[email protected]>
[dwg: Folded with a revert of 2e11b15, an earlier buggy version of
this patch which already went upstream] Tested-by: Anton Blanchard <[email protected]> Signed-off-by: David Gibson <[email protected]>
Otherwise tight loops at smt_low for example, which OPAL does,
eat so much CPU that we can't boot a kernel anymore. With that,
I can boot 8 CPUs just fine with powernv.
Signed-off-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: David Gibson <[email protected]>
We can fix that by preventing to send EXCP_HLT in the case of linux-user mode,
as the main loop doesn't know how to manage it.
Thomas Huth [Fri, 17 Jun 2016 13:16:17 +0000 (15:16 +0200)]
tests: Use '+=' to add additional tests, not '='
The recent commit that added the prom-env-test accidentially
overwrote the check-qtest-ppc-y, check-qtest-ppc64-y and
check-qtest-sparc-y variables instead of extending them.
Fixes: fcbf4a3c0c576eec1321f9cff4fa0dd8e0b1a82f Signed-off-by: Thomas Huth <[email protected]> Signed-off-by: David Gibson <[email protected]>
Support for 0 value for memeory coherence is optional and with ppc64
we can always enable memory coherence. Linux kernel did that during
the development of 4.7 kernel. But that resulted in failure in Qemu
in H_ENTER hcall due to below check. The mentioned change was reverted
in the kernel and kernel right now enable memory coherence only if
cache inhibited is not set. Nevertheless update qemu WIMG flag check
to cover the case where we enable memory coherence along with cache
inhibited flag.
In order to handle older and newer kernel version consider both Cache
inhibitted and (cache inhibitted | memory conference) as valid values
for wimg flags.
Peter Maydell [Mon, 20 Jun 2016 21:30:34 +0000 (22:30 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Mon 20 Jun 2016 21:29:27 BST
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/tracing-pull-request: (42 commits)
trace: split out trace events for linux-user/ directory
trace: split out trace events for qom/ directory
trace: split out trace events for target-ppc/ directory
trace: split out trace events for target-s390x/ directory
trace: split out trace events for target-sparc/ directory
trace: split out trace events for net/ directory
trace: split out trace events for audio/ directory
trace: split out trace events for ui/ directory
trace: split out trace events for hw/alpha/ directory
trace: split out trace events for hw/arm/ directory
trace: split out trace events for hw/acpi/ directory
trace: split out trace events for hw/vfio/ directory
trace: split out trace events for hw/s390x/ directory
trace: split out trace events for hw/pci/ directory
trace: split out trace events for hw/ppc/ directory
trace: split out trace events for hw/9pfs/ directory
trace: split out trace events for hw/i386/ directory
trace: split out trace events for hw/isa/ directory
trace: split out trace events for hw/sd/ directory
trace: split out trace events for hw/sparc/ directory
...
Mark Cave-Ayland [Mon, 20 Jun 2016 20:55:16 +0000 (21:55 +0100)]
MAINTAINERS: add Artyom Tarasenko as SPARC maintainer
Artyom has been working on QEMU's SPARC emulation for several years, providing
initial support for Solaris under qemu-system-sparc and more recently bugfixes
for qemu-system-sparc64 and TCG patch reviews. As work progresses on improving
emulation for sun4u machines and beyond, Artyom has agreed to take on
co-maintainership of SPARC with a focus on 64-bit architecture.
Peter Maydell [Mon, 20 Jun 2016 17:14:26 +0000 (18:14 +0100)]
Merge remote-tracking branch 'remotes/mwalle/tags/lm32-queue/20160620' into staging
lm32/milkymist: some qomifying
# gpg: Signature made Mon 20 Jun 2016 17:27:53 BST
# gpg: using RSA key 0xB458ABB0D8D378E3
# gpg: Good signature from "Michael Walle <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2190 3E48 4537 A7C2 90CE 3EB2 B458 ABB0 D8D3 78E3
trace: add build framework for merging trace-events files
Switch make rules over to use trace-events-all as the
master trace events input file. Add rule that will
construct trace-events-all from $(trace-events-y).
Lluís Vilanova [Thu, 9 Jun 2016 17:31:47 +0000 (19:31 +0200)]
trace: [all] Add "guest_mem_before" event
The event is described in "trace-events". Note that the "MO_AMASK" flag
is not traced, since it does not seem to affect the visible semantics of
instructions.
[s/inline inline/inline/ to fix clang build.
--Stefan]
Peter Maydell [Mon, 20 Jun 2016 15:19:18 +0000 (16:19 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-06-20' into staging
Error reporting patches for 2016-06-20
# gpg: Signature made Mon 20 Jun 2016 15:56:15 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-error-2016-06-20:
log: Fix qemu_set_log_filename() error handling
log: Fix qemu_set_dfilter_ranges() error reporting
log: Plug memory leak on multiple -dfilter
coccinelle: Remove unnecessary variables for function return value
error: Remove unnecessary local_err variables
error: Remove NULL checks on error_propagate() calls
vl: Error messages need to go to stderr, fix some
When qemu_set_log_filename() detects an invalid file name, it reports
an error, closes the log file (if any), and starts logging to stderr
(unless daemonized or nothing is being logged).
This is wrong. Asking for an invalid log file on the command line
should be fatal. Asking for one in the monitor should fail without
messing up an existing logfile.
Fix by converting qemu_set_log_filename() to Error. Pass it
&error_fatal, except for hmp_logfile report errors.
This also permits testing without a subprocess, so do that.
Eduardo Habkost [Mon, 13 Jun 2016 21:57:58 +0000 (18:57 -0300)]
coccinelle: Remove unnecessary variables for function return value
Use Coccinelle script to replace 'ret = E; return ret' with
'return E'. The script will do the substitution only when the
function return type and variable type are the same.
Manual fixups:
* audio/audio.c: coding style of "read (...)" and "write (...)"
* block/qcow2-cluster.c: wrap line to make it shorter
* block/qcow2-refcount.c: change indentation of wrapped line
* target-tricore/op_helper.c: fix coding style of
"remainder|quotient"
* target-mips/dsp_helper.c: reverted changes because I don't
want to argue about checkpatch.pl
* ui/qemu-pixman.c: fix line indentation
* block/rbd.c: restore blank line between declarations and
statements
Reviewed-by: Eric Blake <[email protected]> Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <1465855078[email protected]> Reviewed-by: Markus Armbruster <[email protected]>
[Unused Coccinelle rule name dropped along with a redundant comment;
whitespace touched up in block/qcow2-cluster.c; stale commit message
paragraph deleted] Signed-off-by: Markus Armbruster <[email protected]>
Lluís Vilanova [Thu, 9 Jun 2016 17:31:41 +0000 (19:31 +0200)]
exec: [tcg] Track which vCPU is performing translation and execution
Information is tracked inside the TCGContext structure, and later used
by tracing events with the 'tcg' and 'vcpu' properties.
The 'cpu' field is used to check tracing of translation-time
events ("*_trans"). The 'tcg_env' field is used to pass it to
execution-time events ("*_exec").
Stefan Hajnoczi [Thu, 16 Jun 2016 16:56:29 +0000 (17:56 +0100)]
backup: follow AioContext change gracefully
Move s->target to the new AioContext when there is an AioContext change.
The backup_run() coroutine does not use asynchronous I/O so there is no
need to wait for in-flight requests in a BlockJobDriver->pause()
callback.
Guest writes are intercepted by the backup job. Treat them as guest
activity and do it even while the job is paused. This is necessary
since the only alternative would be to fail a job that experienced guest
writes during pause once the job is resumed. In practice the guest
writes don't interfere with AioContext switching since bdrv_drain() is
used by bdrv_set_aio_context().
Loops already contain pause points because of block_job_sleep_ns() calls
in the yield_and_check() helper function. It is necessary to convert a
raw qemu_coroutine_yield() to block_job_yield() so the
MIRROR_SYNC_MODE_NONE case can pause.
Stefan Hajnoczi [Thu, 16 Jun 2016 16:56:26 +0000 (17:56 +0100)]
block: use safe iteration over AioContext notifiers
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called. In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.
Use safe iteration to avoid crashing when the notifier list is modified
during iteration. We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.
The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.