David Gibson [Wed, 12 Oct 2011 19:10:30 +0000 (19:10 +0000)]
pseries: Update SLOF firmware image
This patch is a general update to the SLOF firmware image used on the
pseries machine. This doesn't contain updates for specific features but
contains a number of bugfixes and enhancements in the main SLOF tree from
Thomas Huth.
David Gibson [Mon, 10 Oct 2011 18:31:01 +0000 (18:31 +0000)]
pseries: Add device tree properties for VMX/VSX and DFP under kvm
Sufficiently recent PAPR specifications define properties "ibm,vmx"
and "ibm,dfp" on the CPU node which advertise whether the VMX vector
extensions (or the later VSX version) and/or the Decimal Floating
Point operations from IBM's recent POWER CPUs are available.
Currently we do not put these in the guest device tree and the guest
kernel will consequently assume they are not available. This is good,
because they are not supported under TCG. VMX is similar enough to
Altivec that it might be trivial to support, but VSX and DFP would
both require significant work to support in TCG.
However, when running under kvm on a host which supports these
instructions, there's no reason not to let the guest use them. This
patch, therefore, checks for the relevant support on the host CPU
and, if present, advertises them to the guest as well.
David Gibson [Mon, 10 Oct 2011 18:31:00 +0000 (18:31 +0000)]
ppc: Generalize the kvmppc_get_clockfreq() function
Currently the kvmppc_get_clockfreq() function reads the host's clock
frequency from /proc/device-tree, which is useful to past to the guest
in KVM setups. However, there are some other host properties
advertised in the device tree which can also be relevant to the
guests.
This patch, therefore, replaces kvmppc_get_clockfreq() which can
retrieve any named, single integer property from the host device
tree's CPU node.
Set an invalid-bits mask for each SPE instructions
SPE instructions are defined by pairs. Currently, the invalid-bits mask is set
for the first instruction, but the second one can have a different mask.
David Gibson [Thu, 29 Sep 2011 21:39:12 +0000 (21:39 +0000)]
pseries: Use Book3S-HV TCE acceleration capabilities
The pseries machine of qemu implements the TCE mechanism used as a
virtual IOMMU for the PAPR defined virtual IO devices. Because the
PAPR spec only defines a small DMA address space, the guest VIO
drivers need to update TCE mappings very frequently - the virtual
network device is particularly bad. This means many slow exits to
qemu to emulate the H_PUT_TCE hypercall.
Sufficiently recent kernels allow this to be mitigated by implementing
H_PUT_TCE in the host kernel. To make use of this, however, qemu
needs to initialize the necessary TCE tables, and map them into itself
so that the VIO device implementations can retrieve the mappings when
they access guest memory (which is treated as a virtual DMA
operation).
This patch adds the necessary calls to use the KVM TCE acceleration.
If the kernel does not support acceleration, or there is some other
error creating the accelerated TCE table, then it will still fall back
to full userspace TCE implementation.
David Gibson [Thu, 29 Sep 2011 21:39:11 +0000 (21:39 +0000)]
pseries: Allow KVM Book3S-HV on PPC970 CPUS
At present, using the hypervisor aware Book3S-HV KVM will only work
with qemu on POWER7 CPUs. PPC970 CPUs also have hypervisor
capability, but they lack the VRMA feature which makes assigning guest
memory easier.
In order to allow KVM Book3S-HV on PPC970, we need to specially
allocate the first chunk of guest memory (the "Real Mode Area" or
RMA), so that it is physically contiguous.
Sufficiently recent host kernels allow such contiguous RMAs to be
allocated, with a kvm capability advertising whether the feature is
available and/or necessary on this hardware. This patch enables qemu
to use this support, thus allowing kvm acceleration of pseries qemu
machines on PPC970 hardware.
David Gibson [Thu, 29 Sep 2011 21:39:10 +0000 (21:39 +0000)]
pseries: Support SMT systems for KVM Book3S-HV
Alex Graf has already made qemu support KVM for the pseries machine
when using the Book3S-PR KVM variant (which runs the guest in
usermode, emulating supervisor operations). This code allows gets us
very close to also working with KVM Book3S-HV (using the hypervisor
capabilities of recent POWER CPUs).
This patch moves us another step towards Book3S-HV support by
correctly handling SMT (multithreaded) POWER CPUs. There are two
parts to this:
* Querying KVM to check SMT capability, and if present, adjusting the
cpu numbers that qemu assigns to cause KVM to assign guest threads
to cores in the right way (this isn't automatic, because the POWER
HV support has a limitation that different threads on a single core
cannot be in different guests at the same time).
* Correctly informing the guest OS of the SMT thread to core mappings
via the device tree.
this patch fix multiple issues with VirtFS tracing.
a) Add tracepoint to the correct code path. We handle error in complete_pdu
b) Fix indentation in python script
c) Fix variable naming issue in python script
While ALIGNADDR was implemented out-of-line, ALIGNADDRL was not
implemeneted at all. However, this is a very simple operation
so we're better off doing this inline.
target-sparc: Do exceptions management fully inside the helpers.
This reduces the size of the individual translation blocks, since
we only emit a single call for each FOP rather than three. In
addition, clear_float_exceptions expands inline to a single byte store.
target-sparc: Change fpr representation to doubles.
This allows a more efficient representation for 64-bit hosts.
It should be about the same for 32-bit hosts, as we can still
access the individual pieces of the double.
target-sparc: Add accessors for double-precision fpr access.
Begin using i64 quantities to manipulate double-precision values.
On a 64-bit host this will, for the moment, generate less efficient
code; on a 32-bit host code quality should be largely unchanged.
Code quality for 64-bit will be adjusted with a subsequent patch.
target-sparc: Add accessors for single-precision fpr access.
Load, store, and "create destination". This version attempts to
change the behaviour of the translator as little as possible. We
previously used cpu_tmp32 as the temporary destination, and we
continue to use that. This will eventually allow a change in
representation of the fprs.
Change the name of the cpu_fpr array to make certain that all
instances are converted.
Blue Swirl [Mon, 1 Aug 2011 09:20:58 +0000 (09:20 +0000)]
Sparc: avoid AREG0 for softint op helpers and Leon cache control
Make softint op helpers and Leon cache irq manager take a parameter
for CPUState instead of relying on global env. Move the functions
to int{32,64}_helper.c.
Peter Maydell [Thu, 29 Sep 2011 14:48:12 +0000 (15:48 +0100)]
linux-user: Fix broken "-version" option
Fix the "-version" option, which was accidentally broken in commit fc9c541:
* exit after printing version information rather than proceeding
blithely onward (and likely printing the full usage message)
* correct the cut-n-paste error in the usage message for it
* don't insist on the presence of a following argument for
options which don't take an argument (this was preventing
'qemu-arm -version' from working)
* remove a spurious argc check from the beginning of main() which
meant 'QEMU_VERSION=1 qemu-arm' didn't work.
Paolo Bonzini [Thu, 20 Oct 2011 11:16:25 +0000 (13:16 +0200)]
block: change discard to co_discard
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines. For qcow2, this means taking the lock
around the operation. raw-posix remains synchronous.
The bdrv_discard callback is then unused and can be eliminated.
Paolo Bonzini [Thu, 20 Oct 2011 11:16:24 +0000 (13:16 +0200)]
block: change flush to co_flush
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines. For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.
The bdrv_flush callback is then unused and can be eliminated.
Paolo Bonzini [Thu, 20 Oct 2011 11:16:23 +0000 (13:16 +0200)]
block: take lock around bdrv_write implementations
This does the first part of the conversion to coroutines, by
wrapping bdrv_write implementations to take the mutex.
Drivers that implement bdrv_write rather than bdrv_co_writev can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
Paolo Bonzini [Thu, 20 Oct 2011 11:16:22 +0000 (13:16 +0200)]
block: take lock around bdrv_read implementations
This does the first part of the conversion to coroutines, by
wrapping bdrv_read implementations to take the mutex.
Drivers that implement bdrv_read rather than bdrv_co_readv can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
raw-win32 does not need the lock, because it cannot yield.
nbd also doesn't probably, but better be safe.
Paolo Bonzini [Thu, 20 Oct 2011 11:16:21 +0000 (13:16 +0200)]
block: add a CoMutex to synchronous read drivers
The big conversion of bdrv_read/write to coroutines caused the two
homonymous callbacks in BlockDriver to become reentrant. It goes
like this:
1) bdrv_read is now called in a coroutine, and calls bdrv_read or
bdrv_pread.
2) the nested bdrv_read goes through the fast path in bdrv_rw_co_entry;
3) in the common case when the protocol is file, bdrv_co_do_readv calls
bdrv_co_readv_em (and from here goes to bdrv_co_io_em), which yields
until the AIO operation is complete;
4) if bdrv_read had been called from a bottom half, the main loop
is free to iterate again: a device model or another bottom half
can then come and call bdrv_read again.
This applies to all four of read/write/flush/discard. It would also
apply to is_allocated, but it is not used from within coroutines:
besides qemu-img.c and qemu-io.c, which operate synchronously, the
only user is the monitor. Copy-on-read will introduce a use in the
block layer, and will require converting it.
The solution is "simply" to convert all drivers to coroutines! We
just need to add a CoMutex that is taken around affected operations.
If that can happen, however, the code is bogus. vmdk_parent_open
reads from bs->file:
if (bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE) != DESC_SIZE) {
but it is always called with s->desc_offset == 0 and with the same
bs->file. So the data that vmdk_parent_open reads comes always from the
same place, and anyway there is only one place where it can write it,
namely bs->backing_file.
So, if it cannot happen, the patched code is okay.
It is also possible that the recursive call can happen, but only once. In
that case there would still be a bug in vmdk_open_desc_file setting
s->desc_offset = 0, but the patched code is okay.
Finally, in the case where multiple recursive calls can happen the code
would need to be rewritten anyway. It is likely that this would anyway
involve adding several parameters to vmdk_parent_open, and calling it from
vmdk_open_vmdk4.
Kevin Wolf [Thu, 20 Oct 2011 14:37:26 +0000 (16:37 +0200)]
pc: Fix floppy drives with if=none
Commit 63ffb564 broke floppy devices specified on the command line like
-drive file=...,if=none,id=floppy -global isa-fdc.driveA=floppy because it
relies on drive_get() which works only with -fda/-drive if=floppy.
This patch resembles what we're already doing for IDE, i.e. remember the floppy
device that was created and use that to extract the BlockDriverStates where
needed.
Kevin Wolf [Tue, 18 Oct 2011 15:12:44 +0000 (17:12 +0200)]
qcow2: Fix bdrv_write_compressed error handling
If during allocation of compressed clusters the cluster was already allocated
uncompressed, fail and properly release the l2_table (the latter avoids a
failed assertion).
While at it, make it return some real error numbers instead of -1.
Kevin Wolf [Tue, 18 Oct 2011 14:41:45 +0000 (16:41 +0200)]
fdc: Fix floppy port I/O
The floppy device was broken by commit 212ec7ba (fdc: Convert to
isa_register_portio_list). While the old interface provided the port number
relative to the floppy drive's io_base, the new one provides the real port
number, so we need to apply a bitmask now to get the register number.
Stefan Hajnoczi [Mon, 17 Oct 2011 10:32:13 +0000 (12:32 +0200)]
block: drop redundant bdrv_flush implementation
Block drivers now only need to provide either of .bdrv_co_flush,
.bdrv_aio_flush() or for legacy drivers .bdrv_flush(). Remove
the redundant .bdrv_flush() implementations.
[Paolo Bonzini: change raw driver to bdrv_co_flush]
Paolo Bonzini [Mon, 17 Oct 2011 10:32:12 +0000 (12:32 +0200)]
block: unify flush implementations
Add coroutine support for flush and apply the same emulation that
we already do for read/write. bdrv_aio_flush is simplified to always
go through a coroutine.
Kevin Wolf [Fri, 21 Oct 2011 10:16:44 +0000 (12:16 +0200)]
xen_disk: Always set feature-barrier = 1
The synchronous .bdrv_flush callback doesn't exist any more and a device really
shouldn't poke into the block layer internals anyway. All drivers are supposed
to have a correctly working bdrv_flush, so let's just hard-code this.
Peter Maydell [Tue, 18 Oct 2011 15:12:54 +0000 (16:12 +0100)]
hw/omap2: Wire up the IRQ for the 2430's fifth GPIO module
The OMAP2430 version of the omap-gpio device has five GPIO modules,
not four like the other OMAP2 versions; wire up the fifth module's
IRQ line correctly.
In file included from trace.c:2:0:
trace.h: In function ‘trace_v9fs_attach’:
trace.h:2850:9: error: too many arguments for format [-Werror=format-extra-args]
trace.h: In function ‘trace_v9fs_wstat’:
trace.h:3039:9: error: too many arguments for format [-Werror=format-extra-args]
trace.h: In function ‘trace_v9fs_mkdir’:
trace.h:3088:9: error: too many arguments for format [-Werror=format-extra-args]
trace.h: In function ‘trace_v9fs_mkdir_return’:
trace.h:3095:9: error: too many arguments for format [-Werror=format-extra-args]
Fix the format strings and also use %u instead of %d for unsigned values
in the changed strings. There are more minor errors of this kind
which I did not fix because that would make the review more difficult.