Stefan Weil [Sat, 5 Jan 2013 08:33:43 +0000 (09:33 +0100)]
hw/i386: Fix broken build for non POSIX hosts
pc-testdev.c cannot be compiled with MinGW (and other non POSIX hosts):
CC i386-softmmu/hw/i386/../pc-testdev.o
qemu/hw/i386/../pc-testdev.c:38:22: warning: sys/mman.h: file not found
qemu/hw/i386/../pc-testdev.c: In function ‘test_flush_page’:
qemu/hw/i386/../pc-testdev.c:103: warning: implicit declaration of function ‘mprotect’
...
pc_fw_add_pflash_drv() ignores qemu_find_file() failure, and happily
creates a drive without a medium.
When pc_system_flash_init() asks for its size, bdrv_getlength() fails
with -ENOMEDIUM, which isn't checked either. It fails relatively
cleanly only because -ENOMEDIUM isn't a multiple of 4096:
$ qemu-system-x86_64 -S -vnc :0 -bios nonexistant
qemu: PC system firmware (pflash) must be a multiple of 0x1000
[Exit 1 ]
Fix this by moving the label to the end of the line, this way the
libvirt parser does still recognise the message. libvirt looks
for "char device redirected to ${ptsname}<whitespace>".
Stefan Hajnoczi [Thu, 3 Jan 2013 10:56:16 +0000 (11:56 +0100)]
dataplane: use linux-headers/ for virtio includes
The hw/dataplane/vring.c code includes linux/virtio_ring.h. Ensure that
we use linux-headers/ instead of the system-wide headers, which may be
out-of-date on older distros.
This resolves the following build error on Debian 6:
CC hw/dataplane/vring.o
cc1: warnings being treated as errors
hw/dataplane/vring.c: In function 'vring_enable_notification':
hw/dataplane/vring.c:71: error: implicit declaration of function 'vring_avail_event'
hw/dataplane/vring.c:71: error: nested extern declaration of 'vring_avail_event'
hw/dataplane/vring.c:71: error: lvalue required as left operand of assignment
Note that we now build dataplane/ for each target instead of only once.
There is no way around this since linux-headers/ is only available for
per-target objects - and it's how virtio, vfio, kvm, and friends are
built.
Michael Tokarev [Mon, 31 Dec 2012 11:30:31 +0000 (15:30 +0400)]
savevm.c: cleanup system includes
savevm.c suffers from the same problem as some other files.
Some years ago savevm.c was created from vl.c, moving some
code from there into a separate file. At that time, all
includes were just copied from vl.c to savevm.c, without
checking which ones are needed and which are not.
But actually most of that stuff is _not_ needed. More, some
stuff is wrong, for example, *BSD #ifdef'ery around <util.h>
vs <libutil.h> - for one, it fails to build on Debian/kFreebsd.
Just remove all this. Maybe there's a possibility to clean
it up further - like removing <windows.h> (and maybe including
winsock.h for htons etc), and maybe it's possible to remove
some internal #includes too, but I didn't check this.
While at it, remove duplicate #include of qemu/timer.h.
Curses display requires stdin/out to stay on the terminal,
so -daemonize makes no sense in this case. Instead of
leaving display uninitialized like is done since 995ee2bf469de6bb,
explicitly detect this case earlier and error out.
-nographic can actually be used with -daemonize, by redirecting
everything to a null device, but the problem is that according
to documentation and historical behavour, -nographic redirects
guest ports to stdin/out, which, again, makes no sense in case
of -daemonize. Since -nographic is a legacy option, don't bother
fixing this case (to allow -nographic and -daemonize by redirecting
guest ports to null instead of stdin/out in this case), but disallow
it completely instead, to stop garbling host terminal.
If no display display needed and user wants to use -nographic,
the right way to go is to use
-serial null -parallel null -monitor none -display none -vga none
instead of -nographic.
Also prevent the same issue -- it was possible to get garbled
host tty after
-nographic -daemonize
and it is still possible to have it by using
-serial stdio -daemonize
Fix this by disallowing opening stdio chardev when -daemonize
is specified.
Liu Yuan [Mon, 17 Dec 2012 06:17:26 +0000 (14:17 +0800)]
sheepdog: don't update inode when create_and_write fails
For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap
to avoid the scenario that the object is allocated but wasn't created at the
server side. This will result in VM's IO error on the failed object.
liguang [Mon, 17 Dec 2012 01:49:23 +0000 (09:49 +0800)]
qemu-img: report size overflow error message
qemu-img will complain when qcow or qcow2
size overflow for 64 bits, report the right
message in this condition.
$./qemu-img create -f qcow2 /tmp/foo 0x10000000000000000
before change:
qemu-img: Invalid image size specified! You may use k, M, G or T suffixes for
qemu-img: kilobytes, megabytes, gigabytes and terabytes.
after change:
qemu-img: Image size must be less than 8 EiB!
[Resolved conflict with a9300911 goto removal -- Stefan]
The virtio-blk-data-plane feature is easy to integrate into
hw/virtio-blk.c. The data plane can be started and stopped similar to
vhost-net.
Users can take advantage of the virtio-blk-data-plane feature using the
new -device virtio-blk-pci,x-data-plane=on property.
The x-data-plane name was chosen because at this stage the feature is
experimental and likely to see changes in the future.
If the VM configuration does not support virtio-blk-data-plane an error
message is printed. Although we could fall back to regular virtio-blk,
I prefer the explicit approach since it prompts the user to fix their
configuration if they want the performance benefit of
virtio-blk-data-plane.
Limitations:
* Only format=raw is supported
* Live migration is not supported
* Block jobs, hot unplug, and other operations fail with -EBUSY
* I/O throttling limits are ignored
* Only Linux hosts are supported due to Linux AIO usage
Stefan Hajnoczi [Wed, 14 Nov 2012 14:39:30 +0000 (15:39 +0100)]
dataplane: add virtio-blk data plane code
virtio-blk-data-plane is a subset implementation of virtio-blk. It only
handles read, write, and flush requests. It does this using a dedicated
thread that executes an epoll(2)-based event loop and processes I/O
using Linux AIO.
This approach performs very well but can be used for raw image files
only. The number of IOPS achieved has been reported to be several times
higher than the existing virtio-blk implementation.
Eventually it should be possible to unify virtio-blk-data-plane with the
main body of QEMU code once the block layer and hardware emulation is
able to run outside the global mutex.
Stefan Hajnoczi [Mon, 10 Dec 2012 12:14:39 +0000 (13:14 +0100)]
virtio-blk: restore VirtIOBlkConf->config_wce flag
Two slightly different versions of a patch to conditionally set
VIRTIO_BLK_F_CONFIG_WCE through the "config-wce" qdev property have been
applied (ea776abca and eec7f96c2). David Gibson
<[email protected]> noticed that the "config-wce"
property is broken as a result and fixed it recently.
The fix sets the host_features VIRTIO_BLK_F_CONFIG_WCE bit from a qdev
property. Unfortunately, the virtio device then has no chance to test
for the presence of the feature bit during virtio_blk_init().
Therefore, reinstate the VirtIOBlkConf->config_wce flag. Drop the
duplicate qdev property to set the host_features bit. The
VirtIOBlkConf->config_wce flag will be used by virtio-blk-data-plane in
a later patch.
Stefan Hajnoczi [Thu, 22 Nov 2012 15:06:06 +0000 (16:06 +0100)]
iov: add qemu_iovec_concat_iov()
The qemu_iovec_concat() function copies a subset of a QEMUIOVector. The
new qemu_iovec_concat_iov() function does the same for a iov/cnt pair.
It is easy to define qemu_iovec_concat() in terms of
qemu_iovec_concat_iov(). The existing code is mostly unchanged, except
for the assertion src->size >= soffset, which cannot be efficiently
checked upfront on a iov/cnt pair. Instead we assert upon hitting the
end of src with an unsatisfied soffset.
Stefan Hajnoczi [Wed, 14 Nov 2012 14:30:09 +0000 (15:30 +0100)]
dataplane: add Linux AIO request queue
The IOQueue has a pool of iocb structs and a function to add new
read/write requests. Multiple requests can be added before calling the
submit function to actually tell the host kernel to begin I/O. This
allows callers to batch requests and submit them in one go.
Stefan Hajnoczi [Wed, 14 Nov 2012 14:23:00 +0000 (15:23 +0100)]
dataplane: add event loop
Outside the safety of the global mutex we need to poll on file
descriptors. I found epoll(2) is a convenient way to do that, although
other options could replace this module in the future (such as an
AioContext-based loop or glib's GMainLoop).
One important feature of this small event loop implementation is that
the loop can be terminated in a thread-safe way. This allows QEMU to
stop the data plane thread cleanly.
Stefan Hajnoczi [Wed, 14 Nov 2012 14:15:50 +0000 (15:15 +0100)]
dataplane: add virtqueue vring code
The virtio-blk-data-plane cannot access memory using the usual QEMU
functions since it executes outside the global mutex and the memory APIs
are this time are not thread-safe.
This patch introduces a virtqueue module based on the kernel's vhost
vring code. The trick is that we map guest memory ahead of time and
access it cheaply outside the global mutex.
Once the hardware emulation code can execute outside the global mutex it
will be possible to drop this code.
Stefan Hajnoczi [Tue, 20 Nov 2012 09:30:08 +0000 (10:30 +0100)]
dataplane: add host memory mapping code
The data plane thread needs to map guest physical addresses to host
pointers. Normally this is done with cpu_physical_memory_map() but the
function assumes the global mutex is held. The data plane thread does
not touch the global mutex and therefore needs a thread-safe memory
mapping mechanism.
Hostmem registers a MemoryListener similar to how vhost collects and
pushes memory region information into the kernel. There is a
fine-grained lock on the regions list which is held during lookup and
when installing a new regions list.
When the physical memory map changes the MemoryListener callbacks are
invoked. They build up a new list of memory regions which is finally
installed when the list has been completed.
Stefan Hajnoczi [Wed, 14 Nov 2012 10:43:23 +0000 (11:43 +0100)]
raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled. This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.
Anthony Liguori [Wed, 2 Jan 2013 14:01:36 +0000 (08:01 -0600)]
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci,virtio
This optimizes MSIX handling in virtio-pci.
Also included is pci express capability bugfix.
Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Anthony Liguori <[email protected]>
* mst/tags/for_anthony:
virtio-pci: don't poll masked vectors
msix: expose access to masked/pending state
msi: add API to get notified about pending bit poll
pcie: Fix bug in pcie_ext_cap_set_next
virtio: make bindings typesafe
target-mips: Use EXCP_SC rather than a magic number
From the discussion on the ML [1], the exception limit defined by
magic number 0x100 is actually EXCP_SC defined in cpu.h. Replace the
magic number with EXCP_SC. Remove "#if 1 .. #endif" as well.
Jovanovic, Petar [Tue, 11 Dec 2012 15:06:35 +0000 (15:06 +0000)]
target-mips: Make repl_ph to sign extend to target-long
The immediate value is 9bits, should sign-extend to 16bits. The return value to
register should sign-extend to target_long, as Richard says, removing an
unnecessary cast works fun.
Petar Jovanovic [Mon, 10 Dec 2012 15:28:17 +0000 (16:28 +0100)]
target-mips: Fix for helpers for EXTR_* instructions
The change removes some unnecessary and incorrect code for EXTR_S.H.
Further, it corrects the mask for shift value in the EXTR_ instructions. It also
extends the existing tests so they trigger the issues corrected with the change.
Petar Jovanovic [Thu, 6 Dec 2012 19:30:35 +0000 (20:30 +0100)]
target-mips: Fix incorrect reads and writes to DSPControl register
Upper 4 bits of ccond (bits 31..28 ) of DSPControl register are not used in
the MIPS32 architecture. They are used in the MIPS64 architecture. For MIPS32
these bits must be written as zero, and return zero on read.
The change fixes writes (WRDSP) and reads (RDDSP) to the register. It also fixes
the tests that use these instructions, and makes them smaller and simpler.
Blue Swirl [Fri, 28 Dec 2012 16:08:23 +0000 (16:08 +0000)]
Merge branch 'qom-cpu' of git://repo.or.cz/qemu/afaerber
* 'qom-cpu' of git://repo.or.cz/qemu/afaerber:
MAINTAINERS: Include X86CPU in CPU maintenance area
cpu: Move kvm_run into CPUState
cpu: Move kvm_state field into CPUState
ppc_booke: Pass PowerPCCPU to ppc_booke_timers_init()
ppc4xx_devs: Return PowerPCCPU from ppc4xx_init()
ppc_booke: Pass PowerPCCPU to {decr,fit,wdt} timer callbacks
ppc: Pass PowerPCCPU to [h]decr timer callbacks
ppc: Pass PowerPCCPU to [h]decr callbacks
ppc: Pass PowerPCCPU to ppc_set_irq()
kvm: Pass CPUState to kvm_vcpu_ioctl()
kvm: Pass CPUState to kvm_arch_*
cpu: Move kvm_fd into CPUState
qdev-properties.c: Separate core from the code used only by qemu-system-*
qdev: Coding style fixes
cpu: Introduce CPUListState struct
target-alpha: Add support for -cpu ?
target-alpha: Turn CPU definitions into subclasses
target-alpha: Avoid leaking the alarm timer over reset
alpha: Pass AlphaCPU array to Typhoon
target-alpha: Let cpu_alpha_init() return AlphaCPU
At the moment, when irqfd is in use but a vector is masked,
qemu will poll it and handle vector masks in userspace.
Since almost no one ever looks at the pending bits,
it is better to defer this until pending bits
are actually read.
Implement this optimization using the new poll notifier.
Robert Schiele [Tue, 4 Dec 2012 15:58:08 +0000 (16:58 +0100)]
configure: allow disabling pixman if not needed
When we build neither any system emulation targets nor the tools there
is actually no need for pixman library. In that case do not enforce
presence of that library on the system.
* Do not depend on "qemu-timer-common.o".
* Use "$(obj)" in rules to refer to the build sub-directory.
* Remove dependencies against "$(GENERATED_HEADERS)".
Eduardo Habkost [Thu, 20 Dec 2012 18:43:48 +0000 (16:43 -0200)]
target-i386: CPUID: return highest basic leaf if eax > cpuid_xlevel
This fixes a subtle bug. A bug that probably won't cause trouble for any
existing OS, but a bug anyway:
Intel SDM Volume 2, CPUID Instruction states:
> Two types of information are returned: basic and extended function
> information. If a value entered for CPUID.EAX is higher than the maximum
> input value for basic or extended function for that processor then the
> data for the highest basic information leaf is returned. For example,
> using the Intel Core i7 processor, the following is true:
>
> CPUID.EAX = 05H (* Returns MONITOR/MWAIT leaf. *)
> CPUID.EAX = 0AH (* Returns Architectural Performance Monitoring leaf. *)
> CPUID.EAX = 0BH (* Returns Extended Topology Enumeration leaf. *)
> CPUID.EAX = 0CH (* INVALID: Returns the same information as CPUID.EAX = 0BH. *)
> CPUID.EAX = 80000008H (* Returns linear/physical address size data. *)
> CPUID.EAX = 8000000AH (* INVALID: Returns same information as CPUID.EAX = 0BH. *)
AMD's CPUID Specification, on the other hand, is less specific:
> The CPUID instruction supports two sets or ranges of functions,
> standard and extended.
>
> • The smallest function number of the standard function range is
> Fn0000_0000. The largest function num- ber of the standard function
> range, for a particular implementation, is returned in CPUID
> Fn0000_0000_EAX.
>
> • The smallest function number of the extended function range is
> Fn8000_0000. The largest function num- ber of the extended function
> range, for a particular implementation, is returned in CPUID
> Fn8000_0000_EAX.
>
> Functions that are neither standard nor extended are undefined and
> should not be relied upon.
QEMU's behavior matched Intel's specification before, but this was
changed by commit b3baa152aaef1905876670590275c2dd0bbb088c. This patch
restores the behavior documented by Intel when cpuid_xlevel2 is 0.
The existing behavior when cpuid_xlevel2 is set (falling back to
level=cpuid_xlevel) is being kept, as I couldn't find any public
documentation on the CPUID 0xC0000000 function range on Centaur CPUs.
Lei Li [Fri, 21 Dec 2012 04:26:38 +0000 (12:26 +0800)]
qemu-char: Inherit ptys and improve output from -serial pty
Changes since V1:
- Avoid crashing since qemu_opts_id() may return null on some
systems according to Markus's suggestion.
When controlling a qemu instance from another program, it's
hard to know which serial port or monitor device is redirected
to which pty. With more than one device using "pty" a lot of
guesswork is involved.
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
Although we can find out what everything else is connected to
by the "info chardev" with "-monitor stdio" in the command line,
It'd be very useful to be able to have qemu inherit pseudo-tty
file descriptors so they could just be specified on the command
line like:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
Liming Wang [Fri, 21 Dec 2012 08:56:58 +0000 (16:56 +0800)]
net: add missing include file
To fix building error:
CC net/vde.o
net/vde.c: In function ‘vde_cleanup’:
net/vde.c:65:5: error: implicit declaration of function ‘qemu_set_fd_handler’ [-Werror=implicit-function-declaration]
net/vde.c:65:5: error: nested extern declaration of ‘qemu_set_fd_handler’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
translate-all.c: Use tb1->phys_hash_next directly in tb_remove
When tb_remove was first commited at fd6ce8f6, there were three different
calls pass different names to offsetof. In current codebase, the other two
calls are replaced with tb_page_remove. There is no need to have a general
tb_remove. Omit passing the third parameter and using tb1->phys_hash_next
directly.
Paolo Bonzini [Thu, 20 Dec 2012 11:29:19 +0000 (12:29 +0100)]
build: fix includes for VNC
vnc-tls.h is included by vnc.h, and it includes gnutls/gnutls.h.
Hence, GnuTLS header files are needed by all files that include
vnc.h, most notably qmp.c. Move these flags to QEMU_CFLAGS for
simplicity.
Juan Quintela [Wed, 19 Dec 2012 08:55:50 +0000 (09:55 +0100)]
migration: merge QEMUFileBuffered into MigrationState
Avoid splitting the state of outgoing migration, more or less arbitrarily,
between two data structures. QEMUFileBuffered anyway is used only during
migration.
Juan Quintela [Mon, 10 Dec 2012 12:27:50 +0000 (13:27 +0100)]
ram: refactor ram_save_block() return value
It could only return 0 if we only found dirty xbzrle pages that hadn't
changed (i.e. they were written with the same content). We don't care
about that case, it is the same than nothing dirty.
So now the return of the function is how much have it written, nothing
else. Adjust callers.
And we also made ram_save_iterate() return the number of transferred
bytes, not the number of transferred pages.
Juan Quintela [Wed, 17 Oct 2012 22:00:59 +0000 (00:00 +0200)]
ram: optimize migration bitmap walking
Instead of testing each page individually, we search what is the next
dirty page with a bitmap operation. We have to reorganize the code to
move from a "for" loop, to a while(dirty) loop.
Juan Quintela [Fri, 21 Sep 2012 09:18:18 +0000 (11:18 +0200)]
savevm: New save live migration method: pending
Code just now does (simplified for clarity)
if (qemu_savevm_state_iterate(s->file) == 1) {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
Problem here is that qemu_savevm_state_iterate() returns 1 when it
knows that remaining memory to sent takes less than max downtime.
But this means that we could end spending 2x max_downtime, one
downtime in qemu_savevm_iterate, and the other in
qemu_savevm_state_complete.
Changed code to:
pending_size = qemu_savevm_state_pending(s->file, max_size);
DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
if (pending_size >= max_size) {
ret = qemu_savevm_state_iterate(s->file);
} else {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
So what we do is: at current network speed, we calculate the maximum
number of bytes we can sent: max_size.
Then we ask every save_live section how much they have pending. If
they are less than max_size, we move to complete phase, otherwise we
do an iterate one.
This makes things much simpler, because now individual sections don't
have to caluclate the bandwidth (it was implossible to do right from
there).