Stefan Hajnoczi [Fri, 17 Dec 2010 12:01:50 +0000 (12:01 +0000)]
virtio-pci: Use ioeventfd for virtqueue notify
Virtqueue notify is currently handled synchronously in userspace virtio. This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.
On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution. This model is similar to
how vhost receives virtqueue notifies.
The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.
Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.
For virtio-net, this also seems to interact with the guest stack in strange
ways so that TCP throughput for small message sizes (~200bytes)
is harmed. Only enable ioeventfd for virtio-blk for now.
Care must be taken not to interfere with vhost-net, which uses host
notifiers. If the set_host_notifier() API is used by a device
virtio-pci will disable virtio-ioeventfd and let the device deal with
host notifiers as it wishes.
Finally, there used to be a limit of 6 KVM io bus devices inside the
kernel. On such a kernel, don't use ioeventfd for virtqueue host
notification since the limit is reached too easily. This ensures that
existing vhost-net setups (which always use ioeventfd) have ioeventfds
available so they can continue to work.
After migration and on VM change state (running/paused) virtio-ioeventfd
will enable/disable itself.
Stefan Hajnoczi [Mon, 10 Jan 2011 11:50:05 +0000 (13:50 +0200)]
kvm: test for ioeventfd support on old kernels
There used to be a limit of 6 KVM io bus devices in the kernel.
On such a kernel, we can't use many ioeventfds for host notification
since the limit is reached too easily.
Stefan Hajnoczi [Fri, 17 Dec 2010 12:01:49 +0000 (12:01 +0000)]
virtio-pci: Rename bugs field to flags
The VirtIOPCIProxy bugs field is currently used to enable workarounds
for older guests. Rename it to flags so that other per-device behavior
can be tracked.
A later patch uses the flags field to remember whether ioeventfd should
be used for virtqueue host notification.
Gerd Hoffmann [Thu, 6 Jan 2011 14:14:39 +0000 (15:14 +0100)]
vga: tag as not hotplugable.
This patch tags all vga cards as not hotpluggable. The qemu
standard vga will never ever be hotpluggable. For cirrus + vmware
it might be possible to get that work some day. Todays we can't
handle that for a number of reasons though.
Aurelien Jarno [Thu, 6 Jan 2011 21:43:13 +0000 (22:43 +0100)]
slirp: fix unaligned access in bootp code
Slirp code tries to be smart an avoid data copy by using pointer to
the data. This solution leads to unaligned access, in this case
preq_addr, which is a 32-bit long structure. There is no real point
of avoiding data copy in a such case, as the value itself is smaller
or the same size as a pointer.
The patch replaces pointers to the preq_addr structure by the strcture
itself, and use the address 0.0.0.0 if no address has been requested
(this is not a valid address in such a request). It compares it with
htonl(0L) for correctness reasons, in case a code checker look for such
mistakes. It also uses memcpy() for copying the data, which takes care
of alignement issues.
This fixes an unaligned access on IA64 host while requesting a DHCP
address.
Aurelien Jarno [Thu, 6 Jan 2011 21:43:13 +0000 (22:43 +0100)]
tcg/arm: improve constant loading
Improve constant loading in two ways:
- On all ARM versions, it's possible to load 0xffffff00 = -0x100 using
the mvn rd, #0. Fix the conditions.
- On <= ARMv6 versions, where movw and movt are not available, load the
constants using mov and orr with rotations depending on the constant
to load. This is very useful for example to load constants where the
low byte is 0. This reduce the generated code size by about 7%.
Aurelien Jarno [Sun, 9 Jan 2011 22:53:45 +0000 (23:53 +0100)]
target-sh4: improve TLB
SH4 is using 16-bit instructions which means most of the constants are
loaded through a constant pool at the end of the subroutine. The same
memory page is therefore accessed in exec and read mode.
With the current implementation, a QEMU TLB entry is set to read or
read/write mode after an UTLB search and to exec mode after an ITLB
search, which causes a lot of TLB exceptions to switch from read or
read/write to exec and vice versa.
This patch optimizes that by already setting the QEMU TLB entry in read
or read/write mode when an UTLB entry is copied into ITLB (during an
ITLB miss). This improve the emulation speed by about 14%.
Aurelien Jarno [Sun, 9 Jan 2011 22:53:45 +0000 (23:53 +0100)]
target-sh4: implement writes to mmaped ITLB
Some Linux kernels seems to implement ITLB/UTLB flushing through by
writing all TLB entries through the memory mapped interface instead
of writing one to MMUCR.TI.
Implement memory mapped ITLB write interface so that such kernels can
boot. This fixes https://bugs.launchpad.net/bugs/700774 .
Aurelien Jarno [Thu, 6 Jan 2011 21:43:13 +0000 (22:43 +0100)]
tcg/arm: fix branch target change during code retranslation
QEMU uses code retranslation to restore the CPU state when an exception
happens. For it to work the retranslation must not modify the generated
code. This is what is currently implemented in ARM TCG.
However on CPU that don't have icache/dcache/memory synchronised like
ARM, this requirement is stronger and code retranslation must not modify
the generated code "atomically", as the cache line might be flushed
at any moment (interrupt, exception, task switching), even if not
triggered by QEMU. The probability for this to happen is very low, and
depends on cache size and associativiy, machine load, interrupts, so the
symptoms are might happen randomly.
This requirement is currently not followed in tcg/arm, for the
load/store code, which basically has the following structure:
1) tlb access code is written
2) conditional fast path code is written
3) branch is written with a temporary target
4) slow path code is written
5) branch target is updated
The cache lines corresponding to the retranslated code is not flushed
after code retranslation as the generated code is supposed to be the
same. However if the cache line corresponding to the branch instruction
is flushed between step 3 and 5, and is not flushed again before the
code is executed again, the branch target is wrong. In the guest, the
symptoms are MMU page fault at a random addresses, which leads to
kernel page fault or segmentation faults.
The patch fixes this issue by avoiding writing the branch target until
it is known, that is by writing only the branch instruction first, and
later only the offset.
This fixes booting linux guests on ARM hosts (tested: arm, i386, mips,
mipsel, sh4, sparc).
Aurelien Jarno [Sat, 8 Jan 2011 15:25:48 +0000 (16:25 +0100)]
Merge branch 'linux-user-for-upstream' of git://gitorious.org/qemu-maemo/qemu
* 'linux-user-for-upstream' of git://gitorious.org/qemu-maemo/qemu:
Remove dead code for ARM semihosting commandline handling
Fix commandline handling for ARM semihosted executables
linux-user: Fix incorrect NaN detection in ARM nwfpe emulation
softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan()
linux-user: Implement FS_IOC_FIEMAP ioctl
linux-user: Support ioctls whose parameter size is not constant
linux-user: Implement sync_file_range{,2} syscalls
Fix commandline handling for ARM semihosted executables
Use the copy of the command line that loader_build_argptr() sets up in guest
memory as the command line to return from the ARM SYS_GET_CMDLINE semihosting
call. Previously we were using a pointer to memory which had already been
freed before the guest program started.
This fixes https://bugs.launchpad.net/qemu/+bug/673613 .
Peter Maydell [Thu, 6 Jan 2011 18:34:44 +0000 (18:34 +0000)]
linux-user: Fix incorrect NaN detection in ARM nwfpe emulation
The code in the linux-user ARM nwfpe emulation was incorrectly
checking only for quiet NaNs when it should have been checking
for any kind of NaN. This is probably because the code in
question was taken from the Linux kernel, whose copy of the
softfloat library had been modified so that float*_is_nan()
returned true for all NaNs, not just quiet ones. The qemu
equivalent function is float*_is_any_nan(), so use that.
NB that this code is really obsolete since nobody uses FPE
for actual arithmetic now; this is just cleanup following
the recent renaming of the NaN related functions.
Peter Maydell [Thu, 6 Jan 2011 15:04:18 +0000 (15:04 +0000)]
linux-user: Implement FS_IOC_FIEMAP ioctl
Implement the FS_IOC_FIEMAP ioctl using the new support for
custom handling of ioctls; this is needed because the struct
that is passed includes a variable-length array.
Peter Maydell [Thu, 6 Jan 2011 15:04:17 +0000 (15:04 +0000)]
linux-user: Support ioctls whose parameter size is not constant
Some ioctls (for example FS_IOC_FIEMAP) use structures whose size is
not constant. The generic argument conversion code in do_ioctl()
cannot handle this, so add support for implementing a special-case
handler for a particular ioctl which does the conversion itself.
Peter Maydell [Thu, 6 Jan 2011 19:37:55 +0000 (19:37 +0000)]
target-arm: wire up the softfloat flush_input_to_zero flag
Wire up the new softfloat support for flushing input denormals
to zero on ARM. The FPSCR FZ bit enables flush-to-zero for
both inputs and outputs, but the reporting of when inputs are
flushed to zero is via a separate IDC bit rather than the UFC
(underflow) bit used when output denormals are flushed to zero.
Peter Maydell [Thu, 6 Jan 2011 19:37:54 +0000 (19:37 +0000)]
target-arm: Set softfloat cumulative exc flags from correct FPSCR bits
When handling a write to the ARM FPSCR, set the softfloat cumulative
exception flags from the cumulative flags in the FPSCR, not the
exception-enable bits. Also don't apply a mask: vfp_exceptbits_to_host
will only look at the correct bits anyway.
Peter Maydell [Thu, 6 Jan 2011 19:37:53 +0000 (19:37 +0000)]
softfloat: Implement flushing input denormals to zero
Add support to softfloat for flushing input denormal float32 and float64
to zero. softfloat's existing 'flush_to_zero' flag only flushes denormals
to zero on output. Some CPUs need input denormals to be flushed before
processing as well. Implement this, using a new status flag to enable it
and a new exception status bit to indicate when it has happened. Existing
CPUs should be unaffected as there is no behaviour change unless the
mode is enabled.
Aurelien Jarno [Thu, 6 Jan 2011 18:53:56 +0000 (19:53 +0100)]
target-arm: fix SMMLA/SMMLS instructions
SMMLA and SMMLS are broken on both in normal and thumb mode, that is
both (different) implementations are wrong. They try to avoid a 64-bit
add for the rounding, which is not trivial if you want to support both
SMMLA and SMMLS with the same code.
The code below uses the same implementation for both modes, using the
code from the ARM manual. It also fixes the thumb decoding that was a
mix between normal and thumb mode.
This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/629298
Blue Swirl [Thu, 6 Jan 2011 18:25:37 +0000 (18:25 +0000)]
block: delete a write-only variable
Avoid a warning with GCC 4.6.0:
/src/qemu/block.c: In function 'bdrv_img_create':
/src/qemu/block.c:2862:25: error: variable 'fmt' set but not used [-Werror=unused-but-set-variable]
Aurelien Jarno [Thu, 6 Jan 2011 14:38:19 +0000 (15:38 +0100)]
softfloat: use float{32,64,x80,128}_maybe_silence_nan()
Use float{32,64,x80,128}_maybe_silence_nan() instead of toggling the
sNaN bit manually. This allow per target implementation of sNaN to qNaN
conversion.
Aurelien Jarno [Thu, 6 Jan 2011 14:38:19 +0000 (15:38 +0100)]
softfloat: fix float{32,64}_maybe_silence_nan() for MIPS
On targets that define sNaN with the sNaN bit as one, simply clearing
this bit may correspond to an infinite value.
Convert it to a default NaN if SNAN_BIT_IS_ONE, as it corresponds to
the MIPS implementation, the only emulated CPU with SNAN_BIT_IS_ONE.
When other CPU of this type are added, this might be updated to include
more cases.
Aurelien Jarno [Thu, 6 Jan 2011 14:38:18 +0000 (15:38 +0100)]
target-ppc: remove PRECISE_EMULATION define
The PRECISE_EMULATION is "hardcoded" to one in target-ppc/exec.h and not
something easily tunable. Remove it and non-precise emulation code as
it doesn't make a noticeable difference in speed. People wanting speed
improvement should use softfloat-native instead.
Alex Williamson [Tue, 4 Jan 2011 19:38:02 +0000 (12:38 -0700)]
rtl8139: Use subsection to restrict migration after hotplug
rtl8139 includes a cpu_register_io_memory acquired value in it's
migration data. This is not only unecessary, but we should treat
these values as unique to the VM instances since the value depends
on call order. In most cases, this miraculously still works.
However, if devices are added or removed from the system, it may
represent an ordering change, which could cause the target rtl8139
device to make use of another device's cpu_register_io_memory value.
If we detect that a hot-add/remove has occured, include a subsection
to restrict migrations only to driver versions known to include this
fix.
Alex Williamson [Tue, 4 Jan 2011 19:37:50 +0000 (12:37 -0700)]
qdev: Track runtime machine modifications
Create a trivial interface to track whether the machine has been
modified since boot. Adding or removing devices will trigger this
to return true. An example usage scenario for such an interface is
the rtl8139 driver which includes a cpu_register_io_memory() value
in it's migration stream. For the majority of migrations, where
no hotplug has occured in the machine, this works correctly. Once
the machine is modified, we can use this interface to detect that
and include a subsection for the device to prevent migrations to
rtl8139 versions with this bug.
It uses the color expansion rop with the source pitch set to 0. This is
something allowed, as the manual explicitely says "When the source of
color-expand data is display memory, the source pitch is ignored.".
This patch fixes this regression by computing sx, sy and others
variables only if they are going to be used later, that is for a plain
copy ROP. It basically consists in moving code.
Aurelien Jarno [Tue, 4 Jan 2011 20:58:24 +0000 (21:58 +0100)]
Fix curses on big endian hosts
On big endian hosts, the curses interface is unusable: the emulated
graphic card only displays garbage, while the monitor interface displays
nothing (or rather only spaces).
The curses interface is waiting for data in native endianness, so
console_write_ch() should not do any conversion. The conversion should
be done when reading the video buffer in hw/vga.c. I supposed this
buffer is in little endian mode, though it's not impossible that the
data is actually in guest endianness. I currently have no big endian
guest to way (they all switch to graphic mode immediately).
Michael Walle [Tue, 4 Jan 2011 00:48:55 +0000 (01:48 +0100)]
noaudio: correctly account acquired samples
This will fix the return value of the function which otherwise returns too
many samples because sw->total_hw_samples_acquired isn't correctly
accounted.
Peter Maydell [Thu, 16 Dec 2010 11:51:17 +0000 (11:51 +0000)]
softfloat: abstract out target-specific NaN propagation rules
IEEE754 doesn't specify precisely what NaN should be returned as
the result of an operation on two input NaNs. This is therefore
target-specific. Abstract out the code in propagateFloat*NaN()
which was implementing the x87 propagation rules, so that it
can be easily replaced on a per-target basis.
Peter Maydell [Fri, 17 Dec 2010 15:56:06 +0000 (15:56 +0000)]
softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan()
The softfloat functions float*_is_nan() were badly misnamed,
because they return true only for quiet NaNs, not for all NaNs.
Rename them to float*_is_quiet_nan() to more accurately reflect
what they do.
This change was produced by:
perl -p -i -e 's/_is_nan/_is_quiet_nan/g' $(git grep -l is_nan)
(with the results manually checked.)
Aurelien Jarno [Tue, 28 Dec 2010 16:46:59 +0000 (17:46 +0100)]
TCG: Improve tb_phys_hash_func()
Most of emulated CPU have instructions aligned on 16 or 32 bits, while
on others GCC tries to align the target jump location. This means that
1/2 or 3/4 of tb_phys_hash entries are never used.
Update the hash function tb_phys_hash_func() to ignore the two lowest
bits of the address. This brings a 6% speed-up when booting a MIPS
image.
Aurelien Jarno [Fri, 31 Dec 2010 16:50:27 +0000 (17:50 +0100)]
target-arm: fix UMAAL instruction
UMAAL should use unsigned multiply instead of signed.
This patch fixes this issue by handling UMAAL separately from
UMULL/UMLAL/SMULL/SMLAL as these instructions are different
enough. It also explicitly list instructions in case and catch
nonexistent instruction as illegal. Also fixes a few style issues.
This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/696015
Aurelien Jarno [Sat, 25 Dec 2010 22:25:47 +0000 (23:25 +0100)]
target-sparc: fix udiv(cc) and sdiv(cc)
Since commit 5a4bb580cdb10b066f9fd67658b31cac4a4ea5e5, Xorg crashes on
a Debian Etch image. The commit itself is fine, but it triggers a bug
due to wrong computation of flags for udiv(cc) and sdiv(cc).
This patch only compute cc_src2 for the cc version of udiv/sdiv. It
also moves the update of cc_dst and cc_op to the helper, as it is
faster doing it here when there is already an helper.
Luiz Capitulino [Wed, 15 Dec 2010 19:56:18 +0000 (17:56 -0200)]
Fix migrate set speed doc arg
We used to ignore any fractional part in 0.13, but due to recent
changes (started with 9f9b17a4f0865286391e4d3a0a735230122a2289)
migrate_set_speed will reject the fractional part.
We don't expect existing clients to be relying on this, but we
need to update the documentation to reflect the change.
Peter Maydell [Tue, 7 Dec 2010 14:13:45 +0000 (14:13 +0000)]
target-arm: Correct result in saturating cases for VQSHL of s8/16/32
Where VQSHL of a signed 8/16/32 bit value saturated, the result
value was not being calculated correctly (it should be either
the minimum or maximum value for the size of the signed type).
Juha Riihimäki [Tue, 7 Dec 2010 14:13:42 +0000 (14:13 +0000)]
target-arm: Fix VQSHL of signed 64 bit values
Add a missing '-' which meant that we were misinterpreting the shift
argument for VQSHL of 64 bit signed values and treating almost every
shift value as if it were an extremely large right shift.
pci: fix migration path for devices behind bridges
The device path used for migration is currently broken for
for all devices behind a nested bridge.
Replace this by a hierarchical list of slot/function numbers, walking
the path from root down to device. Add :00 after the domain number
so that if there are no nested bridges, this is compatible
with what we have now.
Note: as pointed out by Gleb, using openfirmware paths
might be cleaner, doing this would break compatibility though,
and the IDs used are not guest or user visible at all,
so breaking the compatibility is probably not worth it.
Aurelien Jarno [Sat, 25 Dec 2010 21:56:32 +0000 (22:56 +0100)]
target-mips: fix host CPU consumption when guest is idle
When the CPU is in wait state, do not wake-up if an interrupt can't be
taken. This avoid host CPU running at 100% if a device (e.g. timer) has
an interrupt line left enabled.
Also factorize code to check if interrupts are enabled in
cpu_mips_hw_interrupts_pending().
Stefan Weil [Thu, 16 Dec 2010 18:33:22 +0000 (19:33 +0100)]
qdev: sysbus_get_default must not return a NULL pointer (fix regression)
Every system should have some sort of main system bus,
so sysbus_get_default should always return a valid bus.
Without this patch, at least mipssim and malta no longer
start but raise a null pointer access exception (caused by
commit ec990eb622ad46df5ddcb1e94c418c271894d416).