Avi Kivity [Mon, 2 Jan 2012 11:12:08 +0000 (13:12 +0200)]
Direct dispatch through MemoryRegion
Now that all mmio goes through MemoryRegions, we can convert
io_mem_opaque to be a MemoryRegion pointer, and remove the thunks
that convert from old-style CPU{Read,Write}MemoryFunc to MemoryRegionOps.
Avi Kivity [Sun, 1 Jan 2012 22:32:15 +0000 (00:32 +0200)]
Convert IO_MEM_{RAM,ROM,UNASSIGNED,NOTDIRTY} to MemoryRegions
Convert the fixed-address IO_MEM_RAM, IO_MEM_ROM, IO_MEM_UNASSIGNED,
and IO_MEM_NOTDIRTY io handlers to MemoryRegions. These aren't real
regions, since they are never added to the memory hierarchy, but they
allow reuse of the dispatch functionality.
Avi Kivity [Sun, 1 Jan 2012 16:24:24 +0000 (18:24 +0200)]
Fix wrong region_offset when overlaying a page with another
cpu_register_physical_memory_log() does not update region_offset
if a page was previously registered for the same address. This
could cause mmio accesses going to the wrong place, by using the
old region_offset.
Avi Kivity [Sun, 1 Jan 2012 16:02:31 +0000 (18:02 +0200)]
memory: remove MemoryRegion::backend_registered
backend_registered was used to lazify the process of registering an
mmio region, since the it is different for the I/O address space and
the memory address space. However, it also makes registration dependent
on the region being visible in the address space. This is not the case
for "fake" regions, like watchpoints or IO_MEM_UNASSIGNED.
Remove backend_registered and always initialize the region. If it turns
out to be part of the I/O address space, we've wasted an I/O slot, but
that's not too bad. In any case this will be optimized later on.
Avi Kivity [Sun, 20 Nov 2011 15:52:22 +0000 (17:52 +0200)]
exec: make phys_page_find() return a temporary
Instead of returning a PhysPageDesc pointer, return a temporary.
This lets us move away from actually storing PhysPageDesc's, and
instead sythesising them when needed.
Avi Kivity [Wed, 21 Dec 2011 11:37:56 +0000 (13:37 +0200)]
Remove support for version 3 ram_load
Version 3 ram_load depends on ram_addrs, which are not stable. Version 4
was introduced in 0.13 (and RHEL 6), so this means live migration from 0.12
and earlier to 1.1 or later will not work.
Avi Kivity [Wed, 21 Dec 2011 11:22:16 +0000 (13:22 +0200)]
Sort RAMBlocks by ID for migration, not by ram_addr
ram_addr is (a) unstable (b) going away. Sort by idstr instead.
Commit b2e0a138e initially introduced the sorting for the purpose
of improving debuggability. After this patch, the order is still
stable, but perhaps less usable by a human.
Avi Kivity [Tue, 20 Dec 2011 13:59:12 +0000 (15:59 +0200)]
vmstate, memory: decouple vmstate from memory API
Currently creating a memory region automatically registers it for
live migration. This differs from other state (which is enumerated
in a VMStateDescription structure) and ties the live migration code
into the memory core.
Decouple the two by introducing a separate API, vmstate_register_ram(),
for registering a RAM block for migration. Currently the same
implementation is reused, but later it can be moved into a separate list,
and registrations can be moved to VMStateDescription blocks.
Anthony Liguori [Tue, 3 Jan 2012 20:39:05 +0000 (14:39 -0600)]
Merge remote-tracking branch 'qemu-kvm/memory/page_desc' into staging
* qemu-kvm/memory/page_desc: (22 commits)
Remove cpu_get_physical_page_desc()
sparc: avoid cpu_get_physical_page_desc()
virtio-balloon: avoid cpu_get_physical_page_desc()
vhost: avoid cpu_get_physical_page_desc()
kvm: avoid cpu_get_physical_page_desc()
memory: remove CPUPhysMemoryClient
xen: convert to MemoryListener API
memory: temporarily add memory_region_get_ram_addr()
xen, vga: add API for registering the framebuffer
vhost: convert to MemoryListener API
kvm: convert to MemoryListener API
kvm: switch kvm slots to use host virtual address instead of ram_addr_t
memory: add API for observing updates to the physical memory map
memory: replace cpu_physical_sync_dirty_bitmap() with a memory API
framebuffer: drop use of cpu_physical_sync_dirty_bitmap()
loader: remove calls to cpu_get_physical_page_desc()
framebuffer: drop use of cpu_get_physical_page_desc()
memory: introduce memory_region_find()
memory: add memory_region_is_logging()
memory: add memory_region_is_rom()
...
Avi Kivity [Tue, 27 Dec 2011 14:02:16 +0000 (16:02 +0200)]
Fix qapi code generation wrt parallel build
Make's multiple output syntax
x.c x.h: x.template
gen < x.template
actually invokes the command once for x.c and once for x.h (with differing $@
in each invocation). During a parallel build, the two commands may be invoked
in parallel; this opens up a race, where the second invocation trashes a file
supposedly produced during the first, and now in use by a dependent command.
The various qapi code generators are susceptible to this; fix by making them
generate just one file per invocation.
Anthony Liguori [Tue, 27 Dec 2011 14:53:35 +0000 (08:53 -0600)]
Merge remote-tracking branch 'aneesh/for-upstream' into staging
* aneesh/for-upstream:
scripts/analyse-9p-simpletrace.py: Add symbolic names for 9p operations.
hw/9pfs: iattr_valid flags are kernel internal flags map them to 9p values.
hw/9pfs: Use the correct signed type for different variables
hw/9pfs: replace iovec manipulation with QEMUIOVector
Anthony Liguori [Tue, 27 Dec 2011 14:52:42 +0000 (08:52 -0600)]
Merge remote-tracking branch 'bonzini/nbd-for-anthony' into staging
* bonzini/nbd-for-anthony: (26 commits)
nbd: add myself as maintainer
qemu-nbd: throttle requests
qemu-nbd: asynchronous operation
qemu-nbd: add client pointer to NBDRequest
qemu-nbd: move client handling to nbd.c
qemu-nbd: use common main loop
link the main loop and its dependencies into the tools
qemu-nbd: introduce NBDRequest
qemu-nbd: introduce NBDExport
qemu-nbd: introduce nbd_do_receive_request
qemu-nbd: more robust handling of invalid requests
qemu-nbd: introduce nbd_do_send_reply
qemu-nbd: simplify nbd_trip
move corking functions to osdep.c
qemu-nbd: remove data_size argument to nbd_trip
qemu-nbd: remove offset argument to nbd_trip
Update ioctl order in nbd_init() to detect EBUSY
nbd: add support for NBD_CMD_TRIM
nbd: add support for NBD_CMD_FLUSH
nbd: add support for NBD_CMD_FLAG_FUA
...
qemu-kvm passes numa/SRAT topology information for smp_cpus to SeaBIOS. However
SeaBIOS always expects to setup max_cpus number of SRAT cpu entries
(MaxCountCPUs variable in build_srat function of Seabios). When qemu-kvm runs
with smp_cpus != max_cpus (e.g. -smp 2,maxcpus=4), Seabios will mistakenly use
memory SRAT info for setting up CPU SRAT entries for the offline CPUs. Wrong
SRAT memory entries are also created. This breaks NUMA in a guest.
Fix by setting up SRAT info for max_cpus in qemu-kvm.
Jan Kiszka [Wed, 26 Oct 2011 11:09:45 +0000 (13:09 +0200)]
kvm: x86: Drop redundant apic base and tpr update from kvm_get_sregs
The latter was already commented out, the former is redundant as well.
We always get the latest changes after return from the guest via
kvm_arch_post_run.
Paolo Bonzini [Mon, 19 Sep 2011 13:25:40 +0000 (15:25 +0200)]
qemu-nbd: throttle requests
Limiting the number of in-flight requests is implemented very simply
with a can_read callback. It does not require a semaphore, unlike the
client side in block/nbd.c, because we can throttle directly the creation
of coroutines. The client side can have a coroutine created at any time
when an I/O request is made.
Paolo Bonzini [Mon, 19 Sep 2011 13:19:27 +0000 (15:19 +0200)]
qemu-nbd: asynchronous operation
Using coroutines enable asynchronous operation on both the network and
the block side. Network can be owned by two coroutines at the same time,
one writing and one reading. On the send side, mutual exclusion is
guaranteed by a CoMutex. On the receive side, mutual exclusion is
guaranteed because new coroutines immediately start receiving data,
and no new coroutines are created as long as the previous one is receiving.
Between receive and send, qemu-nbd can have an arbitrary number of
in-flight block transfers. Throttling is implemented by the next
patch.
Paolo Bonzini [Fri, 7 Oct 2011 14:47:56 +0000 (16:47 +0200)]
qemu-nbd: add client pointer to NBDRequest
By attaching a client to an NBDRequest, we can avoid passing around the
socket descriptor and data buffer.
Also, we can now manage the reference count for the client in
nbd_request_get/put request instead of having to do it ourselved in
nbd_read. This simplifies things when coroutines are used.
Paolo Bonzini [Mon, 19 Sep 2011 12:33:23 +0000 (14:33 +0200)]
qemu-nbd: move client handling to nbd.c
This patch sets up the fd handler in nbd.c instead of qemu-nbd.c. It
introduces NBDClient, which wraps the arguments to nbd_trip in a single
structure, so that we can add a notifier to it. This way, qemu-nbd can
know about disconnections.
Paolo Bonzini [Mon, 12 Sep 2011 14:20:11 +0000 (16:20 +0200)]
link the main loop and its dependencies into the tools
Using the main loop code from QEMU enables tools to operate fully
asynchronously. Advantages include better Windows portability (for some
definition of portability) over glib's.
Paolo Bonzini [Mon, 19 Sep 2011 12:18:33 +0000 (14:18 +0200)]
qemu-nbd: introduce NBDRequest
Move the buffer from NBDExport to a new structure, so that it will be
possible to have multiple in-flight requests for the same export
(and for the same client too---we get that for free).
Paolo Bonzini [Mon, 19 Sep 2011 12:25:30 +0000 (14:25 +0200)]
qemu-nbd: introduce nbd_do_send_reply
Group the sending of a reply and the associated data into a new function.
Without corking, the caller would be forced to leave 12 free bytes at the
beginning of the data pointer. Not too ugly, but still ugly. :)
Using nbd_do_send_reply everywhere will help when the routine will set up
the write handler that re-enters the send coroutine.
Chunyan Liu [Fri, 2 Dec 2011 15:27:54 +0000 (23:27 +0800)]
Update ioctl order in nbd_init() to detect EBUSY
Update ioctl(s) in nbd_init() to detect device busy early.
Current nbd_init() issues NBD_CLEAR_SOCKET before NBD_SET_SOCKET, if issuing
"qemu-nbd -c /dev/nbd0 disk.img" twice, the second time won't detect EBUSY in
nbd_init(), but in nbd_client will report EBUSY and do clear socket (the 1st
time command will be affacted too because of no socket any more.)
Paolo Bonzini [Sat, 10 Sep 2011 13:06:52 +0000 (15:06 +0200)]
nbd: allow multiple in-flight requests
Allow sending up to 16 requests, and drive the replies to the coroutine
that did the request. The code is written to be exactly the same as
before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex
and state).
Paolo Bonzini [Thu, 8 Sep 2011 11:46:25 +0000 (13:46 +0200)]
sheepdog: move coroutine send/recv function to generic code
Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.
The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines. It
returns the number of bytes written before encountering an
EAGAIN. The specificity of yielding on EAGAIN is entirely in
qemu-coroutine.c.
Amit Shah [Wed, 21 Dec 2011 06:58:29 +0000 (12:28 +0530)]
virtio-serial-bus: Ports are expected to implement 'have_data' callback
There's no need to check if ports can accept any incoming data from the
guest each time the guest sends data. Check if the port implements such
functionality during port initialisation.
Amit Shah [Wed, 21 Dec 2011 06:58:28 +0000 (12:28 +0530)]
virtio-console: Properly initialise class methods
The earlier code really was a hack: initialising class methods in an
object init function as noted by Anthony.
The motivation for that was to not have the virtio-serial-bus call into
the callback functions if there was no chardev backend registered.
However, that really wasn't a worthwhile optimisation, and definitely
not one that was well-implemented. Get rid of it.
Amit Shah [Wed, 21 Dec 2011 06:58:27 +0000 (12:28 +0530)]
virtio-console: Check if chardev backends available before calling into them
For the callback functions invoked by the virtio-serial-bus code, check
if we have chardev backends registered before we call into the chardev
functions.
scripts/analyse-9p-simpletrace.py: Add symbolic names for 9p operations.
Currently, we just print the numerical value of 9p operation identifier in
case of RERROR which is less meaningful for readability. Mapping 9p
operation ids to symbolic names provides a better tracelog:
RERROR (tag = 1 , id = TWALK , err = " No such file or directory ")
RERROR (tag = 1 , id = TUNLINKAT , err = " Directory not empty ")
This patch provides a dictionary of all possible 9p operation symbols mapped
to their numerical identifiers which are likely to be used in future at
various places in this script.
Stefan Hajnoczi [Wed, 21 Dec 2011 07:07:22 +0000 (12:37 +0530)]
hw/9pfs: replace iovec manipulation with QEMUIOVector
The v9fs_read() and v9fs_write() functions rely on iovec[] manipulation
code should be replaced with QEMUIOVector to avoid duplicating code.
In the future it may be possible to make the code even more concise by
using QEMUIOVector consistently across virtio and 9pfs.
The "v" format specifier for pdu_marshal() and pdu_unmarshal() is
dropped since it does not actually pack/unpack anything. The specifier
was also not implemented to update the offset variable and could only be
used at the end of a format string, another sign that this shouldn't
really be a format specifier. Instead, see the new
v9fs_init_qiov_from_pdu() function.
This change avoids a possible iovec[] buffer overflow when indirect
vrings are used since the number of vectors is now limited by the
underlying VirtQueueElement and cannot be out-of-bounds.
Peter Maydell [Sun, 18 Dec 2011 20:37:59 +0000 (21:37 +0100)]
hw/sd.c: Correct handling of APP_CMD status bit
Fix some bugs in our implementation of the APP_CMD status bit:
* the response to an ACMD should have APP_CMD set, not cleared
* if an illegal ACMD is sent then the next command should be
handled as a normal command
This requires that we split "card is expecting an ACMD" from
the state of the APP_CMD status bit (the latter indicates
both "expecting ACMD" and "that was an ACMD").
Peter Maydell [Sun, 18 Dec 2011 20:37:58 +0000 (21:37 +0100)]
hw/sd.c: Correct handling of type B SD status bits
Correct how we handle the type B ("cleared on valid command")
status bits. In particular, the CURRENT_STATE bits in a response
should be the state of the card when it received that command,
not the state when it received the preceding command. (This is
one of the issues noted in LP:597641.)
Peter Maydell [Sun, 18 Dec 2011 20:37:56 +0000 (21:37 +0100)]
hw/sd.c: Handle CRC and locked-card errors in normal code path
Handle returning CRC and locked-card errors in the same code path
we use for other responses. This makes no difference in behaviour
but means that these error responses will be printed by the debug
logging code.
Peter Maydell [Sun, 18 Dec 2011 20:37:55 +0000 (21:37 +0100)]
hw/sd.c: Handle illegal commands in sd_do_command
Add an extra sd_illegal value to the sd_rsp_type_t enum so that
sd_app_command() and sd_normal_command() can tell sd_do_command()
that the command was illegal. This is needed so we can do things
like reset certain status bits only on receipt of a valid command.
For the moment, just use it to pull out the setting of the
ILLEGAL_COMMAND status bit into sd_do_command().
Peter Maydell [Sun, 18 Dec 2011 20:37:53 +0000 (21:37 +0100)]
hw/sd.c: On CRC error, set CRC error status bit rather than clearing it
If we fail to validate the CRC for an SD command we should be setting
COM_CRC_ERROR, not clearing it. (This bug actually has no effect currently
because sd_req_crc_validate() always returns success.)
Peter Maydell [Sun, 18 Dec 2011 20:37:51 +0000 (21:37 +0100)]
hw/sd.c: Fix the set of commands which are failed when card is locked
Fix bugs in the code determining whether to accept a command when the
SD card is locked. Most notably, we had the condition completely
reversed, so we would accept all the commands we should refuse and
refuse all the commands we should accept. Correct this by refactoring
the enormous if () clause into a separate function.
We had also missed ACMD42 off the list of commands which are accepted
in locked state: add it.
This is one of the two problems reported in LP:597641.
Alon Levy [Tue, 20 Dec 2011 11:41:04 +0000 (13:41 +0200)]
g_thread_init users: don't call it if glib >= 2.31
since commit f9b29ca03 included in release 2.31 (docs below say 2.32 but
that is not correct) and onwards g_thread_init is deprecated and calling
it is not required:
g_thread_init has been deprecated since version 2.32 and should not be
used in newly-written code. This function is no longer necessary. The
GLib threading system is automatically initialized at the start of your
program.
Fixes bulid failure when warnings are treated as errors on fedora 17.
I only tested the change to vl.c, and copy pasted to the two other
locations (couldn't decide if a wrapper for calling g_thread_init is
uglier).
Hervé Poussineau [Wed, 30 Nov 2011 20:35:38 +0000 (21:35 +0100)]
net: store guest timestamp in dump file instead of time since guest startup
Stored dates are no more 1970-01-01 (+ run time), but have a real meaning.
If someone wants to have comparable timestamps accross boots, it is
possible to start qemu with -rtc to give the startup date.