Igor Mammedov [Mon, 21 Jan 2013 14:06:38 +0000 (15:06 +0100)]
target-i386: Set custom features/properties without intermediate x86_def_t
Move custom features parsing after built-in cpu_model defaults are set
and set custom features directly on CPU instance. That allows to make a
clear distinction between built-in cpu model defaults that eventually
should go into class_init() and extra property setting which is done
after defaults are set on CPU instance.
Impl. details:
* use object_property_parse() property setter so it would be a mechanical
change to switch to global properties later.
* And after all current features/properties are converted into static
properties, it will take a trivial patch to switch to global properties.
Which will allow to:
* get CPU instance initialized with all parameters passed on -cpu ...
cmd. line from object_new() call.
* call cpu_model/featurestr parsing only once before CPUs are created
* open a road for removing CPUxxxState.cpu_model_str field, when other
CPUs are similarly converted to subclasses and static properties.
- re-factor error handling, to use Error instead of fprintf()s, since
it is anyway passed in for property setter.
Igor Mammedov [Mon, 21 Jan 2013 14:06:37 +0000 (15:06 +0100)]
target-i386: Remove vendor_override field from CPUX86State
Commit 8935499831312 makes cpuid return to guest host's vendor value
instead of built-in one by default if kvm_enabled() == true and allows
to override this behavior if 'vendor' is specified on -cpu command line.
But every time guest calls cpuid to get 'vendor' value, host's value is
read again and again in default case.
It complicates semantics of vendor property and makes it harder to use.
Instead of reading 'vendor' value from host every time cpuid[vendor] is
called, override 'vendor' value only once in cpu_x86_find_by_name(), when
built-in CPU model is found and if(kvm_enabled() == true).
It provides the same default semantics
if (kvm_enabled() == true) vendor = host's vendor
else vendor = built-in vendor
and then later:
if (custom vendor) vendor = custom vendor
'vendor' value is overridden when user provides it on -cpu command line,
and there is no need for vendor_override field anymore, remove it.
Igor Mammedov [Mon, 21 Jan 2013 14:06:36 +0000 (15:06 +0100)]
target-i386: Replace uint32_t vendor fields by vendor string in x86_def_t
Vendor property setter takes string as vendor value but cpudefs
use uint32_t vendor[123] fields to define vendor value. It makes it
difficult to unify and use property setter for values from cpudefs.
Simplify code by using vendor property setter, vendor[123] fields
are converted into vendor[13] array to keep its value. And vendor
property setter is used to access/set value on CPU.
- Make for() cycle reusable for the next patch by adding
x86_cpu_vendor_words2str()
Intel's CPUID spec[1] says:
"
5.1.1 ...
These registers contain the ASCII string: GenuineIntel
...
"
List[2] of known vendor values shows that they all are 12 ASCII
characters long, padded where necessary with space.
Current supported values are all ASCII characters packed in
ebx, edx, ecx. So lets state that QEMU supports 12 printable ASCII
characters packed in ebx, edx, ecx registers for cpuid(0) instruction.
Andreas Färber [Mon, 21 Jan 2013 00:02:28 +0000 (01:02 +0100)]
target-i386: Drop redundant list of CPU definitions
It is no longer needed since dropping cpudef config file support.
Cleaning this up removes knowledge about other models from x86_def_t,
in preparation for reusing x86_def_t as intermediate step towards pure
QOM X86CPU subclasses.
Andreas Färber [Mon, 21 Jan 2013 00:00:24 +0000 (01:00 +0100)]
target-i386: Simplify cpu_x86_find_by_name()
Catch NULL name argument early to avoid repeated checks.
Similarly, check for -cpu host early and untangle from iterating through
model definitions. This prepares for introducing X86CPU subclasses.
Eduardo Habkost [Tue, 22 Jan 2013 20:25:09 +0000 (18:25 -0200)]
pc: Generate APIC IDs according to CPU topology
This keeps compatibility on machine-types pc-1.2 and older, and prints a
warning in case the requested configuration won't get the correct
topology.
I couldn't think of a better way to warn about broken topology when in
compat mode other than using error_report(). The warning message will
probably be buried in a log file somewhere, but it's better than
nothing.
Eduardo Habkost [Wed, 23 Jan 2013 17:58:27 +0000 (15:58 -0200)]
target-i386: Topology & APIC ID utility functions
This introduces utility functions for the APIC ID calculation, based on:
Intel® 64 Architecture Processor Topology Enumeration
http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
The code should be compatible with AMD's "Extended Method" described at:
AMD CPUID Specification (Publication #25481)
Section 3: Multiple Core Calcuation
as long as:
- nr_threads is set to 1;
- OFFSET_IDX is assumed to be 0;
- CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to
apicid_core_width().
Eduardo Habkost [Wed, 23 Jan 2013 17:51:18 +0000 (15:51 -0200)]
pc: Set fw_cfg data based on APIC ID calculation
This changes FW_CFG_MAX_CPUS and FW_CFG_NUMA to use apic_id_for_cpu(),
so the NUMA table can be based on the APIC IDs, instead of CPU index
(SeaBIOS knows nothing about CPU indexes, just APIC IDs).
Eduardo Habkost [Tue, 22 Jan 2013 20:25:02 +0000 (18:25 -0200)]
target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index
The CPU ID in KVM is supposed to be the APIC ID, so change the
KVM_CREATE_VCPU call to match it. The current behavior didn't break
anything yet because today the APIC ID is assumed to be equal to the CPU
index, but this won't be true in the future.
Eduardo Habkost [Thu, 17 Jan 2013 20:59:29 +0000 (18:59 -0200)]
pc: Reverse pc_init_pci() compatibility logic
Currently, the pc-1.4 machine init function enables PV EOI and then
calls the pc-1.2 machine init function. The problem with this approach
is that now we can't enable any additional compatibility code inside the
pc-1.2 init function because it would end up enabling the compatibility
behavior on pc-1.3 and pc-1.4 as well.
This reverses the logic so that the pc-1.2 machine init function will
disable PV EOI, and then call the pc-1.4 machine init function.
This way we can change older machine-types to enable compatibility
behavior, and the newer machine-types (pc-1.3, pc-q35-1.4 and
pc-i440fx-1.4) would just use the default behavior.
(This means that one nice side-effect of this change is that pc-q35-1.4
will get PV EOI enabled by default, too)
It would be interesting to eventually change pc_init_pci_no_kvmclock()
and pc_init_isa() to reuse pc_init_pci_1_2() as well (so we don't need
to duplicate compatibility code on those two functions). But this will
be probably much easier to do after we create a PCInitArgs struct for
the PC initialization arguments, and/or after we use global-properties
to implement the compatibility modes present in pc_init_pci_1_2().
Eduardo Habkost [Thu, 17 Jan 2013 20:59:28 +0000 (18:59 -0200)]
target-i386: Don't set any KVM flag by default if KVM is disabled
This is a cleanup that tries to solve two small issues:
- We don't need a separate kvm_pv_eoi_features variable just to keep a
constant calculated at compile-time, and this style would require
adding a separate variable (that's declared twice because of the
CONFIG_KVM ifdef) for each feature that's going to be
enabled/disabled by machine-type compat code.
- The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
even when KVM is disabled at runtime. This small inconsistency in
the cpuid_kvm_features field isn't a problem today because
cpuid_kvm_features is ignored by the TCG code, but it may cause
unexpected problems later when refactoring the CPUID handling code.
This patch eliminates the kvm_pv_eoi_features variable and simply uses
kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
behavior of enable_kvm_pv_eoi() clearer and easier to understand.
Andreas Färber [Thu, 17 Jan 2013 16:30:08 +0000 (17:30 +0100)]
target-openrisc: Clean up triple QOM casts
Instead of calling openrisc_env_get_cpu(), casting to CPU() via the
ENV_GET_CPU() compatibility macro and casting back to OPENRISC_CPU(),
just call openrisc_env_get_cpu() directly.
ENV_GET_CPU() is meant as workaround for target-independent code only.
Andreas Färber [Sat, 5 Jan 2013 13:14:27 +0000 (14:14 +0100)]
target-openrisc: Drop OpenRISCCPUList
It was missed in 92a3136174f60ee45b113296cb2c2a5225b00369 (cpu:
Introduce CPUListState struct) because its naming did not match the
*CPUListState pattern. Use the generalized CPUListState instead.
Default to moving back to the IDLE state after the COLLECTING_DATA
state. For a well behaved guest this patch has no consequence, but
A bad guest could crash QEMU by using one of the erase commands
followed by a longer than 5 byte argument (undefined behaviour).
xilinx_ethlite: Flush queued packets on SW service
Software services a received packet by clearing the CTRL_S bit in the RX_CTRLn
register. If this bit is cleared, flush any packets queued for the device.
The eth_can_rx() function only checks the first buffers status ("ping"). The
controller should be able to receive into "pong" when ping-pong is enabled.
Checks the active buffer (either "ping" or "pong") when determining can_rx()
rather than just testing "ping".
Blue Swirl [Sat, 26 Jan 2013 14:18:28 +0000 (14:18 +0000)]
Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf:
PPC: e500: Select MPIC v4.2 on ppce500 platform
PPC: e500: fix mpic_iack address
openpic: add basic support for MPIC v4.2
openpic: fix timer address decoding
openpic: fix remaining issues from idr-to-destmask conversion
pseries: Adjust default VIO address allocations to play better with libvirt
pseries: Improve handling of multiple PCI host bridges
target-ppc: Give a meaningful error if too many threads are specified
cuda: Move ADB bus into CUDA state
adb: QOM'ify ADB devices
adb: QOM'ify Apple Desktop Bus
cuda: QOM'ify CUDA
ide/macio: QOM'ify MacIO IDE
mac_nvram: QOM'ify MacIO NVRAM
mac_nvram: Mark as Big Endian
mac_nvram: Clean up public API
macio: Split MacIO in two
macio: Delay qdev init until all fields are initialized
macio: QOM'ify some more
ppc: Move Mac machines to hw/ppc/
Andreas Färber [Sat, 26 Jan 2013 11:45:14 +0000 (12:45 +0100)]
tests: Add gcov support for x86_64 qtest
Since x86_64 is a superset of i386 and reuses all its test cases, adopt
all the i386 gcov source files as well, substituting their paths
appropriately.
Michael Tokarev [Fri, 25 Jan 2013 17:23:24 +0000 (21:23 +0400)]
vmware_vga: fix out of bounds and invalid rects updating
This is a follow up for several attempts to fix this issue.
Previous incarnations:
1. http://thread.gmane.org/gmane.linux.ubuntu.bugs.general/3156089
https://bugs.launchpad.net/bugs/918791
"qemu-kvm dies when using vmvga driver and unity in the guest" bug.
Fix by Serge Hallyn:
https://launchpadlibrarian.net/94916786/qemu-vmware.debdiff
This fix is incomplete, since it does not check width and height
for being negative. Serge weren't sure if that's the right place
to fix it, maybe the fix should be up the stack somewhere.
2. http://thread.gmane.org/gmane.comp.emulators.qemu/166064
by Marek Vasut: "vmware_vga: Redraw only visible area"
This one adds the (incomplete) check to vmsvga_update_rect_delayed(),
the routine just queues the rect updating but does no interesting
stuff. It is also incomplete in the same way as patch by Serge,
but also does not touch width&height at all after adjusting x&y,
which is wrong.
As far as I can see, when processing guest requests, the device
places them into a queue (vmsvga_update_rect_delayed()) and
processes this queue in different place/time, namely, in
vmsvga_update_rect(). Sometimes, vmsvga_update_rect() is
called directly, without placing the request to the gueue.
This is the place this patch changes, which is the last
(deepest) in the stack. I'm not sure if this is the right
place still, since it is possible we have some queue optimization
(or may have in the future) which will be upset by negative/wrong
values here, so maybe we should check for validity of input
right when receiving request from the guest (and maybe even
use unsigned types there). But I don't know the protocol
and implementation enough to have a definitive answer.
But since vmsvga_update_rect() has other sanity checks already,
I'm adding the missing ones there as well.
Cc'ing BALATON Zoltan and Andrzej Zaborowski who shows in `git blame'
output and may know something in this area.
If this patch is accepted, it should be applied to all active
stable branches (at least since 1.1, maybe even before), with
minor context change (ds_get_*(s->vga.ds) => s->*). I'm not
Cc'ing -stable yet, will do it explicitly once the patch is
accepted.
BTW, these checks use fprintf(stderr) -- it should be converted
to something more appropriate, since stderr will most likely
disappear somewhere.
Peter Maydell [Thu, 24 Jan 2013 19:02:28 +0000 (19:02 +0000)]
hw/arm_boot: Align device tree to 4KB boundary, not page
Align the device tree blob to a 4KB boundary, not to QEMU's
idea of a page boundary -- the latter is the smallest possible
page size for the architecture, which on ARM is 1KB.
The documentation for Linux does not impose separation
or alignment requirements on the device tree blob, but
in practice some kernels will happily trash the entire
page the initrd ends in after they have finished uncompressing
the initrd. So 4KB-align the DTB to ensure it does not get
trampled by these kernels.
Andreas Färber [Thu, 24 Jan 2013 15:47:55 +0000 (16:47 +0100)]
make_device_config.sh: Fix target path in generated dependency file
config-devices.mak.d is included from Makefile.target, i.e. from inside
the *-softmmu/ directory. It included the directory path, so never
applied to the actual ./config-devices.mak. Symptoms were spurious
build failures due to missing dependency on default-configs/pci.mak.
Fix this by using `basename` to strip the directory path.
fw_cfg: Splash image loader can overrun a stack variable, fix
read_splashfile() passes the address of an int variable as size_t *
parameter to g_file_get_contents(), with a cast to gag the compiler.
No problem on machines where sizeof(size_t) == sizeof(int).
Happens to work on my x86_64 box (64 bit little endian): the least
significant 32 bits of the file size end up in the right place
(caller's variable file_size), and the most significant 32 bits
clobber a place that gets assigned to before its next use (caller's
variable file_type).
I'd expect it to break on a 64 bit big-endian box.
Fix up the variable types and drop the problematic cast.
Peter Maydell [Mon, 21 Jan 2013 12:50:56 +0000 (12:50 +0000)]
hw/pxa2xx_timer: Explicitly mark fallthroughs
Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
case THINGY_3: idx++;
case THINGY_2: idx++;
case THINGY_1: idx++;
case THINGY_0: return s->thingy[idx];
Peter Maydell [Mon, 21 Jan 2013 12:50:55 +0000 (12:50 +0000)]
hw/smc91c111: Add explicit 'return' rather than relying on fallthrough
Add an explicit 'return' statement to a case in smc91c111_readb
rather than relying on fallthrough to the following case's
return statement, for code clarity and to placate static analysers.
Peter Maydell [Mon, 21 Jan 2013 12:50:53 +0000 (12:50 +0000)]
hw/omap_dma, hw/omap_spi: Explicitly mark fallthroughs
Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
case THINGY_3: idx++;
case THINGY_2: idx++;
case THINGY_1: idx++;
case THINGY_0: return s->thingy[idx];
Peter Maydell [Mon, 21 Jan 2013 12:50:52 +0000 (12:50 +0000)]
hw/omap1.c: Add fallthrough markers and breaks
Explicitly mark cases where we are deliberately falling
through to the following code. In one case we insert a
'break' instead of falling through to a 'break', as this
seems slightly clearer.
Peter Maydell [Mon, 21 Jan 2013 12:50:51 +0000 (12:50 +0000)]
hw/arm_sysctl.c: Add missing 'break' statements
Add some break statements that were accidentally omitted
from some cases of arm_sysctl_write(). The omission was
harmless because in both cases the following case did
an immediate break, but adding the breaks explicitly
placates static analysers and avoids weird behaviour if
the following register is ever implemented as something
other than a no-op.
Michael Tokarev [Sat, 19 Jan 2013 14:58:09 +0000 (18:58 +0400)]
link seccomp only with softmmu targets
Now, if seccomp is detected, it is linked into every executable,
but is used only by softmmu targets (from vl.c). So link it
only where it is actually needed.
Paolo Bonzini [Sat, 19 Jan 2013 10:06:48 +0000 (11:06 +0100)]
build: remove extra-obj-y
extra-obj-y is somewhat complicated to understand. Replace it with a
special CONFIG_ALL symbol that is defined only at toplevel.
This limits the case of directories defining more than one
*-obj-y target.
Paolo Bonzini [Sat, 19 Jan 2013 10:06:47 +0000 (11:06 +0100)]
build: remove universal-obj-y
All of universal-obj-y, user-obj-y (right now unused) and common-obj-y can
be unified into common-obj-y if we take care of defining CONFIG_SOFTMMU
and CONFIG_USER_ONLY in the toplevel makefile. This is similar to how
we define symbols for hardware components.
Paolo Bonzini [Sat, 19 Jan 2013 10:06:45 +0000 (11:06 +0100)]
build: move around libcacard-y definition
It is also needed if !CONFIG_SOFTMMU, unlike everything that surrounds it. Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Blue Swirl <[email protected]>
Scott Wood [Mon, 21 Jan 2013 15:53:55 +0000 (15:53 +0000)]
PPC: e500: Select MPIC v4.2 on ppce500 platform
The compatible string is changed to fsl,mpic on all e500 platforms, to
advertise the existence of BRR1. This matches what the device tree will
have on real hardware.
With MPIC v4.2 max_cpu can be increased from 15 to 32.
Scott Wood [Mon, 21 Jan 2013 15:53:53 +0000 (15:53 +0000)]
openpic: add basic support for MPIC v4.2
Besides the new value in the version register, this provides:
- ILR support, which includes:
- IDR becoming a pure CPU bitmap, allowing 32 CPUs
- machine check output support (though other parts of QEMU need to
be fixed for it to do something other than immediately reboot the
guest)
- dummy error interrupt support (EISR0/EIMR0 read as zero)
- actually all FSL MPICs get all summary registers returning zero for now,
which includes EISR0/EIMR0
Various refactoring is done to support these changes and to ease
new functionality (e.g. a more flexible way of declaring regions).
Just as the code was already not a full implementation of MPIC v2.0,
this is not a full implementation of MPIC v4.2 -- e.g. it still has only
one bank of MSIs.
Scott Wood [Mon, 21 Jan 2013 15:53:52 +0000 (15:53 +0000)]
openpic: fix timer address decoding
The timer memory range begins at 0x10f0, so that address 0x1120 shows
up as 0x30, 0x1130 shows up as 0x40, etc. However, the address
decoding (other than TFRR) is not adjusted for this, causing the
wrong registers to be accessed.
Scott Wood [Mon, 21 Jan 2013 15:53:51 +0000 (15:53 +0000)]
openpic: fix remaining issues from idr-to-destmask conversion
openpic_update_irq() was checking idr rather than destmask, treating
it as if it were a simple bitmap of cpus. Changed to use destmask.
IPI delivery was removing bits directly from .idr, without calling
write_IRQreg_idr so that the change could be conveyed to destmask.
Changed to use destmask directly.
Save/restore destmask when serializing, as due to the IPI change it
cannot be reproduced from idr.
David Gibson [Wed, 23 Jan 2013 17:20:43 +0000 (17:20 +0000)]
pseries: Adjust default VIO address allocations to play better with libvirt
Currently, if VIO devices for pseries don't have addresses explicitly
allocated, they get automatically numbered from 0x1000. This is in the
same general range that libvirt will typically assign VIO device addresses.
That means that if there is a device libvirt doesn't know about, and it
gets an address assigned before the libvirt assigned devices are processed,
we can end up with an address conflict (qemu will abort with an error).
While the real solution is to teach libvirt about the other devices, so it
can correctly manage the whole allocation, this patch reduces the interim
inconvenience by moving qemu allocations to a range that libvirt is less
likely to conflict with.
Because the guest gets the device addresses through the device tree, these
addresses are truly arbitrary and can be changed without breaking guests.
David Gibson [Wed, 23 Jan 2013 17:20:39 +0000 (17:20 +0000)]
pseries: Improve handling of multiple PCI host bridges
Multiple - even many - PCI host bridges (i.e. PCI domains) are very
common on real PAPR compliant hardware. For reasons related to the
PAPR specified IOMMU interfaces, PCI device assignment with VFIO will
generally require at least two (virtual) PHBs and possibly more
depending on which devices are assigned.
At the moment the qemu PAPR PCI code will not deal with this well,
leaving several crucial parameters of PHBs other than the default one
uninitialized. This patch reworks the code to allow this.
Every PHB needs a unique BUID (Bus Unit Identifier, the id used for
the PAPR PCI related interfaces) and a unique LIOBN (Logical IO Bus
Number, the id used for the PAPR IOMMU related interfaces). In
addition they need windows in CPU real address space to access PCI
memory space, PCI IO space and MSIs. Properties are added to the PCI
host bridge qdevice to allow configuration of all these.
To simplify configuration of multiple PHBs for common cases, a
convenience "index" property is also added. This can be set instead
of the low-level properties, and will generate suitable values for the
other parameters, different for each index value.
Mike Qiu [Wed, 23 Jan 2013 17:20:38 +0000 (17:20 +0000)]
target-ppc: Give a meaningful error if too many threads are specified
Currently the target-ppc tcg code only supports a single thread. You can
specify more, but they're treated identically to multiple cores. On KVM
we obviously can't support more threads than the hardware; if more are
specified it will cause strange and cryptic errors.
This patch clarifies the situation by giving a simple meaningful error if
more threads are specified than we can support.
Andreas Färber [Wed, 23 Jan 2013 23:04:04 +0000 (23:04 +0000)]
adb: QOM'ify ADB devices
They were not qdev'ified before. Derive ADBDevice from DeviceState and
convert reset callbacks to DeviceClass::reset, ADBDevice::opaque pointer
to ADBDevice subtypes for mouse and keyboard and adb_{kbd,mouse}_init()
to regular qdev functions.
Fixing Coding Style issues and splitting keyboard and mouse off into
their own files is left for a later point in time.
Andreas Färber [Wed, 23 Jan 2013 23:04:03 +0000 (23:04 +0000)]
adb: QOM'ify Apple Desktop Bus
It was not a qbus before, turn it into a first-class bus and initialize
it properly from CUDA. Leave it a global variable as long as devices are
not QOM'ified yet.
The qmp monitor command to mirror a disk was passing -1 for size
along with the disk's backing file. This size of the resulting disk
is the size of the backing file, which is incorrect if the disk
has been resized. Therefore we should always pass in the size of
the current disk.
Jason Baron [Fri, 4 Jan 2013 19:44:42 +0000 (14:44 -0500)]
ahci: Add migration support
Jason tested these patches by migrating Windows 7 and Fedora 17 guests
(while under I/O) on both piix with ahci attached and on q35 (which has
a built-in AHCI controller).
Paolo Bonzini [Tue, 22 Jan 2013 14:01:12 +0000 (15:01 +0100)]
hbitmap: add assertion on hbitmap_iter_init
hbitmap_iter_init causes an out-of-bounds access when the "first"
argument is or greater than or equal to the size of the bitmap.
Forbid this with an assertion, and remove the failing testcase.
Paolo Bonzini [Tue, 22 Jan 2013 14:01:11 +0000 (15:01 +0100)]
mirror: do nothing on zero-sized disk
On a zero-sized disk we need to break out of the job successfully
before bdrv_dirty_iter_init is called, otherwise you will get an
assertion failure with the next patch.
Stefan Weil [Thu, 17 Jan 2013 20:45:24 +0000 (21:45 +0100)]
block: Add special error code for wrong format
The block drivers need a special error code for "wrong format".
From the available error codes EMEDIUMTYPE fits best.
It is not available on all platforms, so a definition in
qemu-common.h and a specific error report are needed.
Paolo Bonzini [Tue, 22 Jan 2013 08:03:15 +0000 (09:03 +0100)]
mirror: support arbitrarily-sized iterations
Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks. This limits the number of I/O operations and makes
mirroring efficient even with a small granularity. Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.
Paolo Bonzini [Tue, 22 Jan 2013 08:03:14 +0000 (09:03 +0100)]
mirror: support more than one in-flight AIO operation
With AIO support in place, we can start copying more than one chunk
in parallel. This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.
Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.
In addition, two different iterations on the HBitmap may want to
copy the same cluster. We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.
Paolo Bonzini [Tue, 22 Jan 2013 08:03:12 +0000 (09:03 +0100)]
mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target. However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.
Paolo Bonzini [Mon, 21 Jan 2013 16:09:46 +0000 (17:09 +0100)]
mirror: allow customizing the granularity
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Paolo Bonzini [Mon, 21 Jan 2013 16:09:43 +0000 (17:09 +0100)]
mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
Paolo Bonzini [Mon, 21 Jan 2013 16:09:40 +0000 (17:09 +0100)]
add hierarchical bitmap data type and test cases
HBitmaps provides an array of bits. The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.
In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level. When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).
Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):
bits 0-57 => word in the last bitmap | bits 58-63 => bit in the word
bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits. To move down, you shift the index left
similarly, and add the word index within the group. Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.
Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
When iterating on a bitmap, each bit (on any level) is only visited
once. Hence, The total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps. Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.
Luiz Capitulino [Sat, 1 Dec 2012 02:14:57 +0000 (00:14 -0200)]
balloon: re-enable balloon stats
The statistics are now available through device properties via a
polling mechanism. First a client has to enable polling, then it
can query available stats.
Polling is enabled by setting an update interval (in seconds)
to a property named guest-stats-polling-interval, like this:
Jeff Cody [Fri, 18 Jan 2013 17:45:35 +0000 (12:45 -0500)]
block: Monitor command commit neglects to report some errors
The non-live bdrv_commit() function may return one of the following
errors: -ENOTSUP, -EBUSY, -EACCES, -EIO. The only error that is
checked in the HMP handler is -EBUSY, so the monitor command 'commit'
silently fails for all error cases other than 'Device is in use'.
Report error using monitor_printf() and strerror(), and convert existing
qerror_report() calls in do_commit() to monitor_printf().