Avi Kivity [Tue, 26 Jul 2011 11:26:13 +0000 (14:26 +0300)]
memory: transaction API
Allow changes to the memory hierarchy to be accumulated and
made visible all at once. This reduces computational effort,
especially when an accelerator (e.g. kvm) is involved.
Useful when a single register update causes multiple changes
to an address space.
The single pass algorithm removed a1, added b2, then removed a2 and a3,
which caused the wrong memory map to be built. The two pass algorithm
removes a1, a2, and a3, then adds b1.
Avi Kivity [Tue, 26 Jul 2011 11:26:11 +0000 (14:26 +0300)]
memory: add ioeventfd support
As with the rest of the memory API, the caller associates an eventfd
with an address, and the memory API takes care of registering or
unregistering when the address is made visible or invisible to the
guest.
Avi Kivity [Tue, 26 Jul 2011 11:26:07 +0000 (14:26 +0300)]
memory: late initialization of ram_addr
For non-RAM memory regions, we cannot tell whether this is an I/O region
or an MMIO region. Since the qemu backing registration is different for
the two, we have to defer initialization until we know which address
space we are in.
These shenanigans will be removed once the backing registration is unified
with the memory API.
Avi Kivity [Tue, 26 Jul 2011 11:26:05 +0000 (14:26 +0300)]
memory: abstract address space operations
Prepare for multiple address space support by abstracting away the details
of registering a memory range with qemu's flat representation into an
AddressSpace object.
Note operations which are memory specific are not abstracted, since they will
never be called on I/O address spaces anyway.
Avi Kivity [Tue, 26 Jul 2011 11:26:04 +0000 (14:26 +0300)]
Internal interfaces for memory API
get_system_memory() provides the root of the memory hierarchy.
This interface is intended to be private between memory.c and exec.c.
If this file is included elsewhere, it should be regarded as a bug (or
TODO item). However, it will be temporarily needed for the conversion
to hierarchical memory routing.
Avi Kivity [Tue, 26 Jul 2011 11:26:03 +0000 (14:26 +0300)]
memory: merge adjacent segments of a single memory region
Simple implementations of memory routers, for example the Cirrus VGA memory banks
or the 440FX PAM registers can generate adjacent memory regions which are contiguous.
Detect these and merge them; this saves kvm memory slots and shortens lookup times.
Avi Kivity [Tue, 26 Jul 2011 11:26:01 +0000 (14:26 +0300)]
Hierarchical memory region API
The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled. This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.
Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.
Jan Kiszka [Sun, 24 Jul 2011 17:38:36 +0000 (19:38 +0200)]
qdev: Reset hot-plugged devices
Device models rely on the core invoking their reset handlers after init.
We do this in the cold-plug case, but so far we miss this step after
hot-plug.
Blue Swirl [Sat, 23 Jul 2011 21:21:14 +0000 (21:21 +0000)]
simpletrace: suppress a warning from unused variable
Avoid this warning:
CC simpletrace.o
/src/qemu/simpletrace.c: In function 'writeout_thread':
/src/qemu/simpletrace.c:122:12: error: variable 'unused' set but not used [-Werror=unused-but-set-variable]
by adding GCC attribute unused to the variable.
Blue Swirl [Sat, 23 Jul 2011 20:04:29 +0000 (20:04 +0000)]
Wrap recv to avoid warnings
Avoid warnings like these by wrapping recv():
CC slirp/ip_icmp.o
/src/qemu/slirp/ip_icmp.c: In function 'icmp_receive':
/src/qemu/slirp/ip_icmp.c:418:5: error: passing argument 2 of 'recv' from incompatible pointer type [-Werror]
/usr/local/lib/gcc/i686-mingw32msvc/4.6.0/../../../../i686-mingw32msvc/include/winsock2.h:547:32: note: expected 'char *' but argument is of type 'struct icmp *'
Jan Kiszka [Fri, 17 Jun 2011 09:25:49 +0000 (11:25 +0200)]
Register Linux dyntick timer as per-thread signal
Derived from kvm-tool patch
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/74309
Ingo Molnar pointed out that sending the timer signal to the whole
process, just blocking it everywhere, is suboptimal with an increasing
number of threads. QEMU is also using this pattern so far.
Linux provides a (non-portable) way to restrict the signal to a single
thread: We can use SIGEV_THREAD_ID unless we are forced to emulate
signalfd via an additional thread. That case could theoretically be
optimized as well, but it doesn't look worth bothering.
Jan Kiszka [Mon, 20 Jun 2011 12:06:28 +0000 (14:06 +0200)]
mc146818rtc: Handle host clock resets
Make use of the new clock reset notifier to update the RTC whenever
rtc_clock is the host clock and that happens to jump backward. This
avoids that the RTC stalls for the period the host clock was set back.
Jan Kiszka [Mon, 20 Jun 2011 12:06:27 +0000 (14:06 +0200)]
qemu-timer: Introduce clock reset notifier
QEMU_CLOCK_HOST is based on the system time which may jump backward in
case the admin or NTP adjusts it. RTC emulations and other device models
can suffer in this case as timers will stall for the period the clock
was tuned back.
This adds a detection mechanism that checks on every host clock readout
if the new time is before the last result. If that is the case a
notifier list is informed. Device models interested in this event can
register a notifier with the clock.
Jan Kiszka [Mon, 20 Jun 2011 12:06:26 +0000 (14:06 +0200)]
notifier: Pass data argument to callback
This allows to pass additional information to the notifier callback
which is useful if sender and receiver do not share any other distinct
data structure.
Indentation is off, and "dev-prop-int" suggests these are properties
you can configure with -device, which isn't the case. The other
buses' print_dev() callbacks don't do that. For instance, PCI's
output looks like this:
class Ethernet controller, addr 00:03.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0xffffffffffffffff [0x1e]
bar 1: mem at 0xffffffffffffffff [0xffe]
bar 6: mem at 0xffffffffffffffff [0xfffe]
Change virtser_bus_dev_print() to that style. Result:
dev: virtconsole, id ""
dev-prop: is_console = 1
dev-prop: nr = 0
dev-prop: chardev = <null>
dev-prop: name = <null>
port 0, guest on, host off, throttle off
Introduce a 'client_add' monitor command accepting an open FD
Allow client connections for VNC and socket based character
devices to be passed in over the monitor using SCM_RIGHTS.
One intended usage scenario is to start QEMU with VNC on a
UNIX domain socket. An unprivileged user which cannot access
the UNIX domain socket, can then connect to QEMU's VNC server
by passing an open FD to libvirt, which passes it onto QEMU.
In this case 'protocol' can be 'vnc' or 'spice', or the name
of a character device (eg from -chardev id=XXXX)
The 'skipauth' parameter can be used to skip any configured
VNC authentication scheme, which is useful if the mgmt layer
talking to the monitor has already authenticated the client
in another way.
Store VNC auth scheme per-client as well as per-server
A future patch will introduce a situation where different
clients may have different authentication schemes set.
When a new client arrives, copy the 'auth' and 'subauth'
fields from VncDisplay into the client's VncState, and
use the latter in all authentication functions.
* ui/vnc.h: Add 'auth' and 'subauth' to VncState
* ui/vnc-auth-sasl.c, ui/vnc-auth-vencrypt.c,
ui/vnc.c: Make auth functions pull auth scheme
from VncState instead of VncDisplay
Wen Congyang [Fri, 17 Jun 2011 02:25:22 +0000 (10:25 +0800)]
do not reset no_shutdown after we shutdown the vm
Daniel P. Berrange sent a libvirt's patch to support
reboots with the QEMU driver. He implements it in
json model like this:
1. add -no-shutdown in the qemu's option:
qemu -no-shutdown xxxx
2. shutdown the vm by monitor command system_powerdown
3. wait for shutdown event
4. reset the vm by monitor command system_reset
no_shutdown will be reset to 0 if the vm is powered down.
We only can reboot the vm once.
If no_shutdown is not reset to 0, we can reboot the vm
many times.
Kevin Wolf [Wed, 1 Jun 2011 11:29:11 +0000 (13:29 +0200)]
qemu-char: Print strerror message on failure
The only way for chardev drivers to communicate an error was to return a NULL
pointer, which resulted in an error message that said _that_ something went
wrong, but not _why_.
This patch changes the interface to return 0/-errno and updates
qemu_chr_open_opts to use strerror to display a more helpful error message.
Paolo Bonzini [Thu, 9 Jun 2011 11:10:25 +0000 (13:10 +0200)]
qemu-timer: change unix timer to dynticks
A timer that wakes up every millisecond puts a lot of stress on the
iothread. The large amount of IPIs causes very high context switch
activity, making emulation slow and the UI unusable. This is by the
way the same reason why the Windows timers were switched to dynticks.
Paolo Bonzini [Thu, 9 Jun 2011 11:10:24 +0000 (13:10 +0200)]
iothread: replace fair_mutex with a condition variable
This conveys the intention better, and scales to more than >1
threads contending the mutex with the iothread (as long as all
of them have a "quiescent point" like the TCG thread has).
Also, on Mac OS X the fair_mutex somehow didn't work as intended
and deadlocked.
Paolo Bonzini [Fri, 15 Jul 2011 15:10:15 +0000 (17:10 +0200)]
report serial devices created with -device in the PIIX4 config space
Serial and parallel devices created with -device are not reported in
the PIIX4 configuration space, and are hence not picked up by the DSDT.
This upsets Windows, which hides them altogether from the guest.
To avoid this, check at the end of machine initialization whether the
corresponding I/O ports have been registered. The new function in
ioport.c does this; this also requires a tweak to isa_unassign_ioport.
I left the comment in piix4_pm_initfn since the registers I moved do
seem to match the 82371AB datasheet. There are some quirks though.
We are setting this bit:
"Device 8 EIO Enable (EIO_EN_DEV8)—R/W. 1=Enable PCI access to the
device 8 enabled I/O ranges to be claimed by PIIX4 and forwarded
to the ISA/EIO bus. 0=Disable. The LPT_MON_EN must be set to enable
the decode."
but not LPT_MON_EN (bit 18 at 50h):
LPT Port Enable (LPT_MON_EN)—R/W. 1=Enable accesses to parallel
port address range (LPT_DEC_SEL) to generate a device 8 (parallel
port) decode event. 0=Disable.
We're also setting the LPT_DEC_SEL field (that's the 0x60 written to
63h) to 11, which means reserved, rather than to 01 (378h-37Fh).
Likewise we're not setting SA_MON_EN, SB_MON_EN (respectively bit 14
and bit 16 at address 50h) for the serial ports. However, we're setting
COMA_DEC_SEL and COMB_DEC_SEL correctly, unlike the corresponding register
for the parallel port.
All these fields are left as they are, since they are probably only
meant to be used in the DSDT.
Jan Kiszka [Wed, 20 Jul 2011 10:20:22 +0000 (12:20 +0200)]
net: Consistently use qemu_macaddr_default_if_unset
Drop the open-coded MAC assignment from net_init_nic and replace it with
standard qemu_macaddr_default_if_unset which is also used by qdev. That
avoid creating colliding MACs when instantiating NICs via different
mechanisms.
This change requires to store the MAC as MACAddr in NICInfo, and the
remaining nd_table users need to be updated.
Jan Kiszka [Wed, 20 Jul 2011 10:20:21 +0000 (12:20 +0200)]
net: Dump client type 'info network'
Include the client type name into the output of 'info network'. The
result looks like this:
(qemu) info network
VLAN 0 devices:
rtl8139.0: type=nic,model=rtl8139,macaddr=52:54:00:12:34:57
Devices not on any VLAN:
virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ network1: type=tap,fd=5
Jan Kiszka [Wed, 20 Jul 2011 10:20:20 +0000 (12:20 +0200)]
net: Refactor net_client_types
Position entries of net_client_types according to the corresponding
values of NET_CLIENT_TYPE_*. The array size is now defined by
NET_CLIENT_TYPE_MAX. This will allow to obtain entries based on type
value in later patches.
At this chance rename NET_CLIENT_TYPE_SLIRP to NET_CLIENT_TYPE_USER for
the sake of consistency.
Jan Kiszka [Wed, 20 Jul 2011 10:20:18 +0000 (12:20 +0200)]
slirp: Forward ICMP echo requests via unprivileged sockets
Linux 3.0 gained support for unprivileged ICMP ping sockets. Use this
feature to forward guest pings to the outer world. The host admin has to
set the ping_group_range in order to grant access to those sockets. To
allow ping for the users group (GID 100):
Jan Kiszka [Wed, 20 Jul 2011 10:20:17 +0000 (12:20 +0200)]
slirp: Put forked exec into separate process group
Recent smb daemons tend to terminate themselves via a process group
SIGTERM. If the daemon is still in qemu's group by that time, qemu will
die as well. Avoid this by always pushing fork_exec processes into a
group of their own, not just (unused) type 2 execs.
Jan Kiszka [Wed, 20 Jul 2011 10:20:14 +0000 (12:20 +0200)]
slirp: Canonicalize restrict syntax
All other boolean arguments accept on|off - except for slirp's restrict.
Fix that while still accepting the formerly allowed yes|y|no|n, but
reject everything else. This avoids accidentally allowing external
connections because syntax errors were so far interpreted as
'restrict=no'.
Jan Kiszka [Sat, 23 Jul 2011 10:38:37 +0000 (12:38 +0200)]
Generalize -machine command line option
-machine somehow suggests that it selects the machine, but it doesn't.
Fix that before this command is set in stone.
Actually, -machine should supersede -M and allow to introduce arbitrary
per-machine options to the command line. That will change the internal
realization again, but we will be able to keep the user interface
stable.
Hans de Goede [Tue, 19 Jul 2011 09:04:10 +0000 (11:04 +0200)]
USB: add usb network redirection support
This patch adds support for a usb-redir device, which takes a chardev
as a communication channel to an actual usbdevice using the usbredir protocol.
Compiling the usb-redir device requires usbredir-0.3 to be installed for
the usbredir protocol parser, usbredir-0.3 also contains a server for
redirecting usb traffic from an actual usb device. You can get the 0.3
release of usbredir here:
http://people.fedoraproject.org/~jwrdegoede/usbredir-0.3.tar.bz2
(getting a more formal site for it is a WIP)
Example usage:
1) Start usbredirserver for a usb device:
sudo usbredirserver 045e:0772
2) Start qemu with usb2 support + a chardev talking to usbredirserver +
a usb-redir device using this chardev:
qemu ... \
-readconfig docs/ich9-ehci-uhci.cfg \
-chardev socket,id=usbredirchardev,host=localhost,port=4000 \
-device usb-redir,chardev=usbredirchardev,id=usbredirdev
SPARC64: implement addtional MMU faults related to nonfaulting load
This patch implements MMU faults caused by TTE.NFO and TTE.E:
- access other than nonfaulting load to a page marked NFO should
raise data_access_exception
- nonfaulting load to a page marked with E bit should raise
data_access_exception
To distinguish nonfaulting loads, this patch extends (abuses?) the rw
argument of get_physical_address_data(). rw is set to 4 on nonfaulting
loads.
SPARC64: implement MMU miss traps on nonfaulting loads
Nonfaulting loads should raise fast_data_access_MMU_miss traps as
normal loads do. It is up to the guest OS kernel that detect MMU misses
on nonfaulting load instructions and make them complete without signaling.
SPARC64: fix fault status overwritten on nonfaulting load
cpu_get_phys_page_nofault() calls get_physical_address() twice,
that results in overwriting the fault status in the SFSR.
We need this change in order for nonfaulting loads to raising MMU faults
as normal loads do.
Also removed the call to cpu_get_physical_page_desc() since we are
going to modify nonfaulting loads raising MMU faults.
SPARC64: split cpu_get_phys_page_debug() from cpu_get_phys_page_nofault()
This patch makes cpu_get_phys_page_debug() independent from
cpu_get_phys_page_nofault() in advance of implementing nonfaulting load.
This also modifies cpu_get_phys_page_nofault() to be compiled only on
TARGET_SPARC64 because it is not required on SPARC32.
Michael Roth [Wed, 20 Jul 2011 20:19:37 +0000 (15:19 -0500)]
guest agent: qemu-ga daemon
This is the actual guest daemon, it listens for requests over a
virtio-serial/isa-serial/unix socket channel and routes them through
to dispatch routines, and writes the results back to the channel in
a manner similar to QMP.
Michael Roth [Tue, 19 Jul 2011 19:50:42 +0000 (14:50 -0500)]
qapi: add qapi-commands.py code generator
This is the code generator for qapi command marshaling/dispatch.
Currently only generators for synchronous qapi/qmp functions are
supported. This script generates the following files:
$(prefix)qmp-marshal.c: command marshal/dispatch functions for each
QMP command defined in the schema. Functions
generated by qapi-visit.py are used to
convert qobjects recieved from the wire into
function parameters, and uses the same
visiter functions to convert native C return
values to qobjects from transmission back
over the wire.
$(prefix)qmp-commands.h: Function prototypes for the QMP commands
specified in the schema.
$(prefix) is used in the same manner as with qapi-types.py
Michael Roth [Tue, 19 Jul 2011 19:50:41 +0000 (14:50 -0500)]
qapi: add qapi-visit.py code generator
This is the code generator for qapi visiter functions used to
marshal/unmarshal/dealloc qapi types. It generates the following 2
files:
$(prefix)qapi-visit.c: visiter function for a particular c type, used
to automagically convert qobjects into the
corresponding C type and vice-versa, and well
as for deallocation memory for an existing C
type
$(prefix)qapi-visit.h: declarations for previously mentioned visiter
functions
Michael Roth [Tue, 19 Jul 2011 19:50:40 +0000 (14:50 -0500)]
qapi: add qapi-types.py code generator
This is the code generator for qapi types. It will generation the
following files:
$(prefix)qapi-types.h - C types corresponding to types defined in
the schema you pass in
$(prefix)qapi-types.c - Cleanup functions for the above C types
The $(prefix) is used to as a namespace to keep the generated code from
one schema/code-generation separated from others so code and be
generated from multiple schemas with clobbering previously created code.
Michael Roth [Tue, 19 Jul 2011 19:50:37 +0000 (14:50 -0500)]
qapi: add QMP dispatch functions
Given an object recieved via QMP, this code uses the dispatch table
provided by qmp_registry.c to call the corresponding marshalling/dispatch
function and format return values/errors for delivery to the QMP.
Currently only synchronous QMP functions are supported, but this will
also be used for async QMP functions and QMP guest proxy dispatch as
well.
Michael Roth [Tue, 19 Jul 2011 19:50:34 +0000 (14:50 -0500)]
qapi: add QMP output visitor
Type of Visiter class that serves as the inverse of the input visitor:
it takes a series of native C types and uses their values to construct a
corresponding QObject. The command marshaling/dispatcher functions will
use this to convert the output of QMP functions into a QObject that can
be sent over the wire.
Michael Roth [Tue, 19 Jul 2011 19:50:33 +0000 (14:50 -0500)]
qapi: add QMP input visitor
A type of Visiter class that is used to walk a qobject's
structure and assign each entry to the corresponding native C type.
Command marshaling function will use this to pull out QMP command
parameters recieved over the wire and pass them as native arguments
to the corresponding C functions.
Michael Roth [Tue, 19 Jul 2011 19:50:32 +0000 (14:50 -0500)]
qapi: add QAPI visitor core
Base definitions/includes for Visiter interface used by generated
visiter/marshalling code.
Includes a GenericList type. Our lists require an embedded element.
Since these types are generated, if you want to use them in a different
type of data structure, there's no easy way to add another embedded
element. The solution is to have non-embedded lists and that what this is.
Anthony Liguori [Tue, 19 Jul 2011 19:50:29 +0000 (14:50 -0500)]
Add hard build dependency on glib
GLib is an extremely common library that has a portable thread implementation
along with tons of other goodies.
GLib and GObject have a fantastic amount of infrastructure we can leverage in
QEMU including an object oriented programming infrastructure.
Short term, it has a very nice thread pool implementation that we could leverage
in something like virtio-9p. It also has a test harness implementation that
this series will use.