Paolo Bonzini [Thu, 18 Oct 2012 14:49:29 +0000 (16:49 +0200)]
qmp: add pull_event function
Unlike get_events, this function makes it easy to process
one event at a time. This is useful in the mirroring test cases, where
we want to process just one event (BLOCK_JOB_ERROR) and leave the others
to a helper function.
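The difference between the two calls can be sketched as follows. This is
only an illustration: EventQueue is a hypothetical stand-in for the event
buffer of a QMP connection, not the class in qmp.py, and it never blocks
waiting for new events.

```python
from collections import deque


class EventQueue:
    """Hypothetical stand-in for a QMP connection's pending-event buffer."""

    def __init__(self, events=()):
        self._events = deque(events)

    def get_events(self):
        """Return every pending event and clear the queue."""
        events = list(self._events)
        self._events.clear()
        return events

    def pull_event(self):
        """Return only the oldest pending event, or None if there is none."""
        return self._events.popleft() if self._events else None


queue = EventQueue([
    {"event": "BLOCK_JOB_ERROR"},
    {"event": "BLOCK_JOB_READY"},
])
first = queue.pull_event()      # the test case handles this one itself
remaining = queue.get_events()  # a helper function can consume the rest
```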
Paolo Bonzini [Thu, 18 Oct 2012 14:49:28 +0000 (16:49 +0200)]
mirror: add support for on-source-error/on-target-error
Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires starting again
with a full copy. Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.
The 'ignore' policy will leave the sector dirty, so that it will be
retried later. Thus, it will not cause corruption.
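A minimal sketch of why 'ignore' is safe, assuming the job tracks dirty
sectors in a set. The names mirror_pass and FlakyTarget are hypothetical,
and raising the exception stands in for the 'report' policy; the real job
is written in C and reports errors through the block job machinery.

```python
import errno


def mirror_pass(source, target, dirty, on_target_error="report"):
    """Copy each dirty sector once; return the sectors that are still dirty.

    `target` is a hypothetical object whose write() may raise OSError.
    """
    still_dirty = set()
    for sector in sorted(dirty):
        try:
            target.write(sector, source[sector])
        except OSError:
            if on_target_error == "ignore":
                still_dirty.add(sector)  # leave it dirty, retry on a later pass
                continue
            raise                        # 'report': surface the error
    return still_dirty


class FlakyTarget:
    """Fails the first write to each sector listed in `fail_once`."""

    def __init__(self, fail_once):
        self.data = {}
        self._fail_once = set(fail_once)

    def write(self, sector, payload):
        if sector in self._fail_once:
            self._fail_once.discard(sector)
            raise OSError(errno.ENOSPC, "No space left on device")
        self.data[sector] = payload


source = {0: b"a", 1: b"b", 2: b"c"}
target = FlakyTarget(fail_once={1})
dirty = mirror_pass(source, target, {0, 1, 2}, on_target_error="ignore")
after_first_pass = set(dirty)
dirty = mirror_pass(source, target, dirty, on_target_error="ignore")
```

Because the failed sector stays in the dirty set, the second pass copies it
and the target ends up identical to the source.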
Paolo Bonzini [Thu, 18 Oct 2012 14:49:25 +0000 (16:49 +0200)]
mirror: implement completion
Switching to the target of the migration is done mostly asynchronously,
and reported to management via the BLOCK_JOB_COMPLETED event; the only
synchronous phase is opening the backing files. bdrv_open_backing_file
can always be done, even for migration of the full image (aka sync:
'full'). In this case, qmp_drive_mirror will create the target disk
with no backing file at all, and bdrv_open_backing_file will be a no-op.
Paolo Bonzini [Thu, 18 Oct 2012 14:49:23 +0000 (16:49 +0200)]
mirror: introduce mirror job
This patch adds the implementation of a new job that mirrors a disk to
a new image while letting the guest continue using the old image.
The target is treated as a "black box" and data is copied from the
source to the target in the background. This can be used for several
purposes, including storage migration, continuous replication, and
observation of the guest I/O in an external program. It is also a
first step in replacing the inefficient block migration code that is
part of QEMU.
The job is possibly never-ending, but it is logically structured into
two phases: 1) copy all data as fast as possible until the target
first gets in sync with the source; 2) keep target in sync and
ensure that reopening to the target gets a correct (full) copy
of the source data.
The second phase is indicated by the progress in "info block-jobs"
reporting the current offset to be equal to the length of the file.
When the job is cancelled in the second phase, QEMU will run the
job until the source is clean and quiescent, then it will report
successful completion of the job.
In other words, the BLOCK_JOB_CANCELLED event means that the target
may _not_ be consistent with a past state of the source; the
BLOCK_JOB_COMPLETED event means that the target is consistent with
a past state of the source. (Note that it could already happen
that management lost the race against QEMU and got a completion
event instead of cancellation).
It is not yet possible to complete the job and switch over to the target
disk. The next patches will fix this and add many refinements to the
basic idea introduced here. These include improved error management,
some tunable knobs and performance optimizations.
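The two phases and the cancellation semantics described above can be
modelled roughly as below. MirrorSketch is a hypothetical toy: it copies
one sector per step and uses a Python set where the real job uses a dirty
bitmap, so it only illustrates which event is reported when.

```python
class MirrorSketch:
    """Toy model of the mirror job's two phases (not the QEMU code)."""

    def __init__(self, source):
        self.source = source
        self.target = {}
        self.dirty = set(source)  # everything is dirty at the start
        self.synced = False       # becomes True on entering phase 2

    def guest_write(self, sector, payload):
        """The guest keeps writing to the source while the job runs."""
        self.source[sector] = payload
        self.dirty.add(sector)

    def step(self):
        """Copy one dirty sector, if any, and update the phase."""
        if self.dirty:
            sector = min(self.dirty)
            self.dirty.discard(sector)
            self.target[sector] = self.source[sector]
        if not self.dirty:
            self.synced = True

    def cancel(self):
        """Return the event name that cancellation would produce."""
        if not self.synced:
            return "BLOCK_JOB_CANCELLED"  # target may be inconsistent
        while self.dirty:                 # drain until the source is clean
            self.step()
        return "BLOCK_JOB_COMPLETED"


early = MirrorSketch({0: b"a", 1: b"b"})
early.step()
early_event = early.cancel()

late = MirrorSketch({0: b"a", 1: b"b"})
late.step()
late.step()                   # target first gets in sync: phase 2
late.guest_write(1, b"B")     # new guest I/O dirties a sector again
late_event = late.cancel()
```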
Paolo Bonzini [Mon, 23 Jul 2012 13:15:47 +0000 (15:15 +0200)]
block: introduce BLOCK_JOB_READY event
Even for jobs that need to be manually completed, management may want
to take care of the completion itself, without requiring the user to issue
a command to terminate the job. In this case we want to avoid having
management poll us continuously, waiting for completion to become available.
Thus, add a new event that signals the phase switch and the availability
of the block-job-complete command.
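From the management side, reacting to the event instead of polling might
look like the sketch below. The helper commands_for and the "drive0"
device name are hypothetical, and the layout of the event data and command
arguments (a "device" member in each) is an assumption for illustration.

```python
import json


def commands_for(events, device):
    """Yield the QMP commands a management client might send in response."""
    for event in events:
        is_ready = event.get("event") == "BLOCK_JOB_READY"
        same_device = event.get("data", {}).get("device") == device
        if is_ready and same_device:
            yield {"execute": "block-job-complete",
                   "arguments": {"device": device}}


events = [
    {"event": "BLOCK_JOB_ERROR", "data": {"device": "drive0"}},
    {"event": "BLOCK_JOB_READY", "data": {"device": "drive0"}},
]
sent = [json.dumps(cmd, sort_keys=True)
        for cmd in commands_for(events, "drive0")]
```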
Paolo Bonzini [Thu, 18 Oct 2012 14:49:21 +0000 (16:49 +0200)]
block: add block-job-complete
While streaming can be dropped as soon as it has progressed through the whole
image, mirroring needs to be completed manually for two reasons: 1) so that
management knows exactly when the VM switches to the target; 2) because
for other use cases such as replication, we may leave the operation running
for the whole life of the virtual machine.
Add a new block job command that manually completes background operations.
Paolo Bonzini [Thu, 18 Oct 2012 14:49:18 +0000 (16:49 +0200)]
block: introduce new dirty bitmap functionality
Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.
Paolo Bonzini [Thu, 18 Oct 2012 14:49:17 +0000 (16:49 +0200)]
block: add bdrv_open_backing_file
Mirroring runs without the backing file so that it can be copied outside
QEMU. However, we need to add it at the time the job is completed and
QEMU switches to the target. Factor out the common bits of opening an
image and completing a mirroring operation.
The new function does not assume that the file is closed immediately after
it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.
Corey Bryant [Thu, 18 Oct 2012 19:19:34 +0000 (15:19 -0400)]
qemu-config: Add new -add-fd command line option
This option can be used for passing file descriptors on the
command line. It mirrors the existing add-fd QMP command which
allows an fd to be passed to QEMU via SCM_RIGHTS and added to an
fd set.
This can be combined with commands such as -drive to link file
descriptors in an fd set to a drive:
This example adds dups of fds 3 and 4, and the accompanying opaque
strings to the fd set with ID=2. qemu_open() already knows how
to handle a filename of this format. qemu_open() searches the
corresponding fd set for an fd and when it finds a match, QEMU
goes on to use a dup of that fd just like it would have used an
fd that it opened itself.
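The lookup-and-dup behaviour can be sketched as below. Two things are
assumptions, since the example is missing from this log: the
"/dev/fdset/<id>" filename form, and that "a match" means an fd whose
access mode equals the requested one. The function open_from_fdset and the
dictionary layout are hypothetical.

```python
import fcntl
import os
import tempfile

FDSET_PREFIX = "/dev/fdset/"  # assumed filename form, see the note above


def open_from_fdset(fdsets, filename, flags):
    """Return a dup of the first fd in the named set whose access mode matches.

    `fdsets` is a hypothetical {set_id: [fd, ...]} mapping.
    """
    if not filename.startswith(FDSET_PREFIX):
        raise ValueError("not an fd set path: %r" % filename)
    set_id = int(filename[len(FDSET_PREFIX):])
    wanted = flags & os.O_ACCMODE
    for fd in fdsets.get(set_id, []):
        if fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE == wanted:
            return os.dup(fd)  # the caller owns the dup, the set keeps fd
    raise FileNotFoundError("no matching fd in set %d" % set_id)


with tempfile.NamedTemporaryFile() as tmp:
    ro = os.open(tmp.name, os.O_RDONLY)
    rw = os.open(tmp.name, os.O_RDWR)
    fdsets = {2: [ro, rw]}
    dup_fd = open_from_fdset(fdsets, "/dev/fdset/2", os.O_RDWR)
    os.write(dup_fd, b"ok")
    written = os.pread(rw, 2, 0)
    for fd in (ro, rw, dup_fd):
        os.close(fd)
```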
Corey Bryant [Thu, 18 Oct 2012 19:19:33 +0000 (15:19 -0400)]
monitor: Prevent removing fd from set during init
If an fd is added to an fd set via the command line, and it is not
referenced by another command line option (e.g. -drive), then clean
it up after QEMU initialization is complete.
Corey Bryant [Thu, 18 Oct 2012 19:19:32 +0000 (15:19 -0400)]
monitor: Enable adding an inherited fd to an fd set
qmp_add_fd() gets an fd that was received over a socket with
SCM_RIGHTS and adds it to an fd set. This patch adds support
that will enable adding an fd that was inherited on the
command line to an fd set.
Note: All of the code added to monitor_fdset_add_fd(), with the
exception of the error path for non-valid fdset-id, is code motion
from qmp_add_fd().
Corey Bryant [Thu, 18 Oct 2012 19:19:31 +0000 (15:19 -0400)]
monitor: Allow add-fd to any specified fd set
The first call to add an fd to an fd set was previously not
allowed to choose the fd set ID. The ID was generated as
the first available and ensuing calls could add more fds by
specifying the fd set ID. This change allows users to
choose the fd set ID on the first call.
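The new behaviour amounts to making the set ID optional on every call,
including the first. A rough sketch, with a hypothetical add_fd helper and
plain integers standing in for real file descriptors:

```python
def add_fd(fdsets, fd, fdset_id=None):
    """Add fd to a set, creating the set if needed; return the set ID used."""
    if fdset_id is None:
        # No ID requested: generate the first available one.
        fdset_id = 0
        while fdset_id in fdsets:
            fdset_id += 1
    elif fdset_id < 0:
        raise ValueError("fdset-id must be non-negative")
    fdsets.setdefault(fdset_id, []).append(fd)
    return fdset_id


fdsets = {}
generated = add_fd(fdsets, 10)             # ID generated for the caller
chosen = add_fd(fdsets, 11, fdset_id=5)    # first call picks its own ID
again = add_fd(fdsets, 12, fdset_id=5)     # later calls reuse that ID
next_free = add_fd(fdsets, 13)
```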
Stefan Hajnoczi [Wed, 17 Oct 2012 12:02:31 +0000 (14:02 +0200)]
qemu-img: Add --backing-chain option to info command
The qemu-img info --backing-chain option enumerates the backing file
chain. For example, for base.qcow2 <- snap1.qcow2 <- snap2.qcow2 the
output becomes:
$ qemu-img info --backing-chain snap2.qcow2
image: snap2.qcow2
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 196K
cluster_size: 65536
backing file: snap1.qcow2
backing file format: qcow2
Jeff Cody [Tue, 16 Oct 2012 19:49:10 +0000 (15:49 -0400)]
block: in commit, determine base image from the top image
This simplifies some code and error checking, and also fixes a bug.
bdrv_find_backing_image() should only be passed absolute filenames,
or filenames relative to the chain. In the QMP message handler for
block commit, when looking up the base do so from the determined top
image, so we know it is reachable from top.
Some of the error messages put out by block-commit have changed
slightly, which causes two test cases for block-commit to fail.
This patch updates the test cases to look for the correct error
output.
Jeff Cody [Tue, 16 Oct 2012 19:49:09 +0000 (15:49 -0400)]
block: make bdrv_find_backing_image compare canonical filenames
Currently, bdrv_find_backing_image compares bs->backing_file with
what is passed in as a backing_file name. Mismatches may occur,
however, when bs->backing_file and backing_file are not both
absolute or relative.
Use path_combine() to make sure any relative backing filenames are
relative to the current image filename being searched, and then use
realpath() to make all comparisons based on absolute filenames.
If either backing_file or bs->backing_file is determined to be a
protocol, then no filename normalization is performed.
This also changes bdrv_find_backing_image to no longer be recursive,
but iterative.
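The comparison strategy can be illustrated as follows. This is a sketch
under stated assumptions: os.path.join and os.path.realpath stand in for
path_combine() and realpath(), the chain is a plain list of
(filename, backing_file) pairs, and is_protocol is a rough hypothetical
check rather than QEMU's own test.

```python
import os


def is_protocol(name):
    """Rough stand-in: treat 'scheme:...' names as protocols, not paths."""
    return ":" in name.split("/", 1)[0]


def combine(image_path, name):
    """Resolve `name` relative to the directory holding `image_path`."""
    return os.path.join(os.path.dirname(image_path), name)


def find_backing_image(chain, wanted):
    """Iterate over (filename, backing_file) pairs, top image first.

    Returns the canonical name of the matching backing image, or None.
    """
    for filename, backing_file in chain:
        if not backing_file:
            break
        if is_protocol(wanted) or is_protocol(backing_file):
            # No normalization for protocols: compare the names as given.
            if wanted == backing_file:
                return backing_file
            continue
        candidate = os.path.realpath(combine(filename, backing_file))
        if candidate == os.path.realpath(combine(filename, wanted)):
            return candidate
    return None


chain = [
    ("/images/snaps/snap2.qcow2", "snap1.qcow2"),
    ("/images/snaps/snap1.qcow2", "../base.qcow2"),
    ("/images/base.qcow2", ""),
]
by_absolute = find_backing_image(chain, "/images/base.qcow2")
by_relative = find_backing_image(chain, "snap1.qcow2")
```

Here the absolute name matches a backing file that the chain records as
"../base.qcow2", which a plain string comparison would miss.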
Alex Bligh [Tue, 16 Oct 2012 12:46:18 +0000 (13:46 +0100)]
qemu-img rebase: use empty string to rebase without backing file
This patch allows an empty filename to be passed as the new base image name
for qemu-img rebase to mean base the image on no backing file (i.e.
independent of any backing file). According to Eric Blake, qemu-img rebase
already supports this when '-u' is used; this adds support when -u is not
used.
Jeff Cody [Mon, 15 Oct 2012 20:58:02 +0000 (16:58 -0400)]
qmp: fix __accept() in qmp.py
In QEMUMonitorProtocol, commit e9d17b6 removed the __sockfile creation
from __negotiate_capabilities(), which breaks __accept(). This causes
failures in qemu-io python based tests (i.e. tests 030 and 040).
This patch creates the sockfile in __accept() as well.
Avi Kivity [Tue, 23 Oct 2012 10:30:10 +0000 (12:30 +0200)]
Rename target_phys_addr_t to hwaddr
target_phys_addr_t is unwieldy, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards-conformant hwaddr.
Outstanding patchsets can be fixed up with the command
Anthony Liguori [Mon, 22 Oct 2012 19:49:18 +0000 (14:49 -0500)]
Merge remote-tracking branch 'qemu-kvm/memory/urgent' into staging
* qemu-kvm/memory/urgent:
memory: abort if a memory region is destroyed during a transaction
i440fx: avoid destroying memory regions within a transaction
memory: Make eventfd adhere to device endianness
Gerd Hoffmann [Wed, 17 Oct 2012 07:54:19 +0000 (09:54 +0200)]
serial: split serial.c
Split serial.c into serial.c, serial.h and serial-isa.c. While creating
the serial.h header file, also move the serial prototypes from pc.h to
the new serial.h. The latter leads to s/pc.h/serial.h/ in the many
boards that just want the serial bits from pc.h.
Luiz Capitulino [Fri, 5 Oct 2012 19:47:57 +0000 (16:47 -0300)]
Call MADV_HUGEPAGE for guest RAM allocations
This makes it possible for QEMU to use transparent huge pages (THP)
when transparent_hugepage/enabled=madvise. Otherwise THP is only
used when it's enabled system wide.
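The same hint can be given from Python, which makes for a compact
illustration. This is a sketch only: QEMU does this in C, the function
name alloc_guest_ram is hypothetical, and mmap.MADV_HUGEPAGE is available
only on Linux with Python 3.8 or later, so the code falls back silently
when the hint is unavailable or rejected.

```python
import mmap


def alloc_guest_ram(size):
    """Map anonymous memory and hint that huge pages are welcome.

    Returns (mapping, advised); advised is False where the hint is
    unavailable or rejected, which is harmless.
    """
    ram = mmap.mmap(-1, size)
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    advised = False
    if advice is not None:
        try:
            ram.madvise(advice)  # applies to the whole mapping
            advised = True
        except OSError:
            pass                 # e.g. kernel built without THP
    return ram, advised


ram, advised = alloc_guest_ram(4 * 1024 * 1024)
```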
Anthony Liguori [Mon, 22 Oct 2012 18:26:23 +0000 (13:26 -0500)]
Merge remote-tracking branch 'quintela/migration-next-20121017' into staging
* quintela/migration-next-20121017: (41 commits)
cpus: create qemu_in_vcpu_thread()
savevm: make qemu_file_put_notify() return errors
savevm: un-export qemu_file_set_error()
block-migration: handle errors with the return codes correctly
block-migration: Switch meaning of return value
block-migration: make flush_blks() return errors
buffered_file: buffered_put_buffer() don't need to set last_error
savevm: Only qemu_fflush() can generate errors
savevm: make qemu_fill_buffer() be consistent
savevm: unexport qemu_ftell()
savevm: unfold qemu_fclose_internal()
savevm: make qemu_fflush() return an error code
savevm: Remove qemu_fseek()
virtio-net: use qemu_get_buffer() in a temp buffer
savevm: unexport qemu_fflush
migration: make migrate_fd_wait_for_unfreeze() return errors
buffered_file: make buffered_flush return the error code
buffered_file: callers of buffered_flush() already check for errors
buffered_file: We can access directly to bandwidth_limit
buffered_file: unfold migrate_fd_close
...
Anthony Liguori [Mon, 22 Oct 2012 18:26:07 +0000 (13:26 -0500)]
Merge remote-tracking branch 'qemu-kvm/memory/dma' into staging
* qemu-kvm/memory/dma: (23 commits)
pci: honor PCI_COMMAND_MASTER
pci: give each device its own address space
memory: add address_space_destroy()
dma: make dma access its own address space
memory: per-AddressSpace dispatch
s390: avoid reaching into memory core internals
memory: use AddressSpace for MemoryListener filtering
memory: move tcg flush into a tcg memory listener
memory: move address_space_memory and address_space_io out of memory core
memory: manage coalesced mmio via a MemoryListener
xen: drop no-op MemoryListener callbacks
kvm: drop no-op MemoryListener callbacks
xen_pt: drop no-op MemoryListener callbacks
vfio: drop no-op MemoryListener callbacks
memory: drop no-op MemoryListener callbacks
memory: provide defaults for MemoryListener operations
memory: maintain a list of address spaces
memory: export AddressSpace
memory: prepare AddressSpace for exporting
xen_pt: use separate MemoryListeners for memory and I/O
...
Avi Kivity [Wed, 3 Oct 2012 15:17:27 +0000 (17:17 +0200)]
pci: give each device its own address space
Accesses from different devices can resolve differently
(depending on bridge settings, iommus, and PCI_COMMAND_MASTER), so
set up an address space for each device.
Currently iommus are expressed outside the memory API, so this doesn't
work if an iommu is present.
Avi Kivity [Wed, 3 Oct 2012 14:22:53 +0000 (16:22 +0200)]
memory: per-AddressSpace dispatch
Currently we use a global radix tree to dispatch memory access. This only
works with a single address space; to support multiple address spaces we
make the radix tree a member of AddressSpace (via an intermediate structure
AddressSpaceDispatch to avoid exposing too many internals).
A side effect is that address_space_io also gains a dispatch table. When
we remove all the pre-memory-API I/O registrations, we can use that for
dispatching I/O and get rid of the original I/O dispatch.
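The structural change, one dispatch table per address space instead of a
single global one, can be sketched as below. Note the swap: a sorted list
with binary search is used here in place of the radix tree, and
AddressSpaceSketch and the region names are hypothetical.

```python
import bisect


class AddressSpaceSketch:
    """Each address space owns its own dispatch table (toy model)."""

    def __init__(self, name):
        self.name = name
        self._starts = []   # region start addresses, kept sorted
        self._regions = []  # (start, end, handler), same order as _starts

    def map(self, start, size, handler):
        """Register a non-overlapping region handled by `handler`."""
        idx = bisect.bisect_left(self._starts, start)
        self._starts.insert(idx, start)
        self._regions.insert(idx, (start, start + size, handler))

    def dispatch(self, addr):
        """Route an access to the region containing addr, if any."""
        idx = bisect.bisect_right(self._starts, addr) - 1
        if idx >= 0:
            start, end, handler = self._regions[idx]
            if addr < end:
                return handler(addr - start)
        return None


memory = AddressSpaceSketch("memory")
device = AddressSpaceSketch("pci-device")
memory.map(0x1000, 0x100, lambda offset: ("ram", offset))
device.map(0x1000, 0x100, lambda offset: ("bridge-window", offset))

from_cpu = memory.dispatch(0x1010)
from_device = device.dispatch(0x1010)
```

The same address resolves differently depending on which address space
performs the access, which a single global table cannot express.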
Avi Kivity [Tue, 2 Oct 2012 16:54:45 +0000 (18:54 +0200)]
memory: move tcg flush into a tcg memory listener
We plan to make the core listener listen to all address spaces; this
will cause many more flushes than necessary. Prepare for that by
moving the flush into a tcg-specific listener.
Later we can avoid registering the listener if tcg is disabled.
Avi Kivity [Tue, 2 Oct 2012 16:21:54 +0000 (18:21 +0200)]
memory: manage coalesced mmio via a MemoryListener
Instead of calling a global function on coalesced mmio changes, which
routes the call to kvm if enabled, add coalesced mmio hooks to
MemoryListener and make kvm use that instead.
The motivation is support for multiple address spaces (which means we
need to filter the call on the right address space) but the result
is cleaner as well.
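A rough model of the listener-based design, assuming hooks that default
to no-ops and a per-listener address space filter. The class names, the
notify helper and the string address space names are hypothetical; the
real MemoryListener is a C struct of callbacks.

```python
class MemoryListenerSketch:
    """Base listener: every hook defaults to a no-op."""

    def __init__(self, address_space=None):
        self.address_space = address_space  # None means "all spaces"

    def coalesced_mmio_add(self, start, size):
        pass

    def coalesced_mmio_del(self, start, size):
        pass


class KvmListenerSketch(MemoryListenerSketch):
    """Overrides only the hooks it cares about."""

    def __init__(self, address_space):
        super().__init__(address_space)
        self.zones = []

    def coalesced_mmio_add(self, start, size):
        self.zones.append((start, size))

    def coalesced_mmio_del(self, start, size):
        self.zones.remove((start, size))


def notify(listeners, address_space, hook, *args):
    """Call `hook` on every listener registered for this address space."""
    for listener in listeners:
        if listener.address_space in (None, address_space):
            getattr(listener, hook)(*args)


kvm = KvmListenerSketch("memory")
listeners = [kvm, MemoryListenerSketch("io")]
notify(listeners, "memory", "coalesced_mmio_add", 0xA0000, 0x1000)
notify(listeners, "io", "coalesced_mmio_add", 0x3F8, 8)  # filtered out for kvm
zones_after_add = list(kvm.zones)
notify(listeners, "memory", "coalesced_mmio_del", 0xA0000, 0x1000)
```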
Michael Tokarev [Sun, 21 Oct 2012 18:52:54 +0000 (22:52 +0400)]
fix CONFIG_QEMU_HELPERDIR generation again
commit 38f419f35225 fixed a breakage with CONFIG_QEMU_HELPERDIR
which was introduced by 8bf188aa18ef7a8. But while that fix was
technically correct, all other similar variables are handled
differently. Make it consistent, and let scripts/create_config
expand and capitalize the variable properly like for all other
qemu_*dir variables.
Peter Maydell [Thu, 18 Oct 2012 13:11:35 +0000 (14:11 +0100)]
qemu-log: Add new log category for guest bugs
Add a new category for device models to log guest behaviour
which is likely to be a guest bug of some kind (accessing
nonexistent registers, reading 32 bit wide registers with
a byte access, etc). Making this its own log category allows
those who care (mostly guest OS authors) to see the complaints
without bothering most users.
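The idea of a separate category can be sketched with a bitmask, as below.
The constant names, bit values and the MaskedLog class are hypothetical
and chosen for illustration; they are not the identifiers added by the
patch.

```python
LOG_GUEST_BUG = 1 << 0   # hypothetical bit for "probable guest bug"
LOG_OTHER = 1 << 1       # hypothetical bit for some unrelated category


class MaskedLog:
    """Record a message only if its category is enabled."""

    def __init__(self, enabled=0):
        self.enabled = enabled
        self.lines = []

    def log_mask(self, mask, message):
        if self.enabled & mask:
            self.lines.append(message)


def read_register(log, offset, known_registers):
    """Device model read: complain about nonexistent registers."""
    if offset not in known_registers:
        log.log_mask(LOG_GUEST_BUG, "read of nonexistent register 0x%x" % offset)
        return 0
    return known_registers[offset]


quiet = MaskedLog(enabled=LOG_OTHER)        # most users
verbose = MaskedLog(enabled=LOG_GUEST_BUG)  # a guest OS author
registers = {0x0: 0x1234}
read_register(quiet, 0x40, registers)
read_register(verbose, 0x40, registers)
```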
Subroutines do their own local temporary management.
Within disas_sparc_insn we limit the existence of the variable
to OP=2 insns, and delay initialization as late as is reasonable
for the specific XOP.
exec: Allocate code_gen_prologue from code_gen_buffer
We had a hack for arm and sparc, allocating code_gen_prologue to a
special section, which honestly does no good in certain cases.
We've already got limits on code_gen_buffer_size to ensure that all
TBs can use direct branches between themselves; reuse this limit to
ensure the prologue is also reachable.
As a bonus, we get to avoid marking a page of the main executable's
data segment as executable.
exec: Do not use absolute address hints for code_gen_buffer with -fpie
The hard-coded addresses inside alloc_code_gen_buffer only make sense
if we're building an executable that will actually run at the address
we've put into the linker scripts.
When we're building with -fpie, the executable will run at some
random location chosen by the kernel. We get better placement for
the code_gen_buffer if we allow the kernel to place the memory,
as it will tend to place it near the executable, based on the
PROT_EXEC bit.
Since code_gen_prologue is always inside the executable, this effect
is easily seen at the end of most TBs, with the exit_tb opcode, and
with any calls to helper functions.
Eduardo Habkost [Mon, 15 Oct 2012 20:22:02 +0000 (17:22 -0300)]
create struct for machine initialization arguments
This should help us to:
- More easily add or remove machine initialization arguments without
having to change every single machine init function;
- More easily make mechanical changes involving the machine init
functions in the future;
- Let machine initialization forward the init arguments to other
functions more easily.
This change was a half-mechanical process: first the struct was added,
along with initialization of the ram_size, boot_device, kernel_*,
initrd_*, and cpu_model local variables in all functions. Then the
compiler helped me locate the local variables that are unused, so they
could be removed.
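The benefit of bundling the arguments can be shown with a small sketch.
The field list follows the variables named above, but the class name, the
exact kernel_* and initrd_* field names and the helper functions are
hypothetical; the real change is a C struct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineInitArgs:
    """Hypothetical bundle of machine initialization arguments."""
    ram_size: int
    boot_device: str
    kernel_filename: Optional[str] = None
    kernel_cmdline: Optional[str] = None
    initrd_filename: Optional[str] = None
    cpu_model: Optional[str] = None


def describe_kernel(args):
    """A helper receives the whole bundle and picks what it needs."""
    if args.kernel_filename is None:
        return "no kernel"
    return "%s (%s)" % (args.kernel_filename, args.kernel_cmdline or "")


def machine_init(args):
    """A board init function takes one argument and forwards it as-is."""
    return {
        "ram": args.ram_size,
        "cpu": args.cpu_model or "default",
        "kernel": describe_kernel(args),
    }


machine = machine_init(MachineInitArgs(
    ram_size=128 * 1024 * 1024,
    boot_device="c",
    kernel_filename="vmlinuz",
    kernel_cmdline="console=ttyS0",
))
```

Adding a new argument then means adding one field, without touching the
signature of every machine init function.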
Michael Roth [Mon, 8 Oct 2012 20:45:49 +0000 (15:45 -0500)]
tci: fix build breakage for target-sparc
commit c28ae41 introduced GETPC() usage for sparc, which is currently
not defined when building with --enable-tcg-interpreter. Add sparc to
the list of targets we selectively define GETPC() for.
Jan Kiszka [Wed, 17 Oct 2012 17:09:25 +0000 (19:09 +0200)]
configure: Fix CONFIG_QEMU_HELPERDIR generation
We need to evaluate $libexecdir in configure, otherwise we literally end
up with "${prefix}/libexec" instead of the absolute path as
CONFIG_QEMU_HELPERDIR.
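The effect of evaluating the variable can be illustrated outside the
shell. In this sketch string.Template plays the role of shell expansion,
and the "/usr/local" prefix and the expand helper are assumptions made for
the example.

```python
from string import Template


def expand(value, variables):
    """Expand ${name} references repeatedly until none are left."""
    for _ in range(10):  # bound the loop against cyclic definitions
        expanded = Template(value).safe_substitute(variables)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError("variable references did not settle")


variables = {"prefix": "/usr/local", "libexecdir": "${prefix}/libexec"}
unevaluated = variables["libexecdir"]          # what ended up in the config
helperdir = expand("${libexecdir}", variables)  # what we actually want
```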
Peter Maydell [Thu, 4 Oct 2012 15:22:01 +0000 (16:22 +0100)]
qemu-options.hx: Change from recommending '?' to 'help'
Update the -help output and documentation so that it recommends
'help' rather than '?' for the various "list valid values for this
option" cases. '?' is deprecated (as it can fail confusingly if
not quoted), so it's better to steer users towards 'help'. ('?'
still works, for backwards compatibility.)
This is the -help option part of the change otherwise done in
commit c8057f9, since we are now past release 1.2 and free to
change our help text without worrying about breaking libvirt.
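Why an unquoted '?' can fail confusingly: the shell treats it as a glob
that matches any one-character filename in the current directory. The
sketch below imitates that expansion with fnmatch; shell_expand and the
directory listings are hypothetical.

```python
import fnmatch


def shell_expand(word, directory_entries):
    """Mimic unquoted glob expansion: matches replace the word, else keep it."""
    matches = sorted(fnmatch.filter(directory_entries, word))
    return matches if matches else [word]


# In a directory containing a file named "a", an unquoted ? is replaced.
surprising = shell_expand("?", ["a", "disk.img"])
# Without a one-character filename, the ? is passed through unchanged.
lucky = shell_expand("?", ["disk.img"])
# 'help' contains no glob characters, so it is not subject to this.
robust = shell_expand("help", ["a", "disk.img"])
```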