Eduardo Habkost [Thu, 9 Jan 2014 19:12:42 +0000 (17:12 -0200)]
pc: Save size of RAM below 4GB
The ram_below_4g value will be useful in other places, such as the ACPI
table code, and other code that currently requires passing
below_4g_mem_size around in function arguments.
Marcel Apfelbaum [Tue, 21 Jan 2014 16:37:51 +0000 (18:37 +0200)]
hw/pci: fix error flow in pci multifunction init
Scenario:
- There is a non-multifunction PCI device A on 00:0X.0.
- Hot-plug another multifunction PCI device B at 00:0X.1.
- The operation fails, as expected.
- Try to hot-plug device B 2-3 more times; QEMU crashes.
Reason: The error path leaves B's address space in the global address-space
list, but the device object is freed. Fixed that.
Igor Mammedov [Thu, 9 Jan 2014 16:36:35 +0000 (17:36 +0100)]
pc: PIIX DSDT: exclude CPU/PCI hotplug & GPE0 IO range from PCI bus resources
.. so that they won't be used by PCI devices.
Note:
Resort to concatenating templates with the preprocessor's help,
because the 1.0b spec doesn't support ConcatenateResTemplate;
as a result, Windows XP fails to execute the PCI0._CRS method if
ConcatenateResTemplate() is used.
This enables support for device hotplug behind
PCI bridges. Bridge devices themselves need
to be pre-configured on the QEMU command line.
Design:
- at machine init time, assign "bsel" property to bridges with
hotplug support
- dynamically (at ACPI table read time) generate ACPI code to handle
hotplug events for each bridge with a "bsel" property
Note: ACPI doesn't support adding or removing bridges by hotplug.
We detect and prevent removal of bridges by hotplug,
unless they were added by hotplug previously
(and so, are not described by ACPI).
Add an ACPI-based PCI hotplug library with bridge hotplug
support.
Design:
- each bus gets assigned a "bsel" property.
- ACPI code writes this number
to a new BNUM register, then uses the existing
UP/DOWN registers to probe slot status;
to eject, it writes the number to the BNUM register,
then writes the slot to the existing EJ register.
The interface is actually backwards-compatible with
existing PIIX4 ACPI (though not migration compatible).
This is split out from PIIX4 codebase so we can
reuse it for Q35 as well.
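The BNUM scheme described above can be modelled as a tiny register block: the guest selects a bus by writing its "bsel" to BNUM, and the existing UP/DOWN/EJ registers then refer to that bus. This is a minimal C sketch, not QEMU's code; HotplugRegs, MAX_BSEL, the per-bus bitmaps and the helper names are hypothetical, and the value written to EJ is assumed to be a bitmap of slots.

```c
#include <stdint.h>

#define MAX_BSEL 8 /* hypothetical number of hotplug-capable buses */

/* Hypothetical model of the hotplug registers; each bus is indexed by its "bsel". */
typedef struct {
    uint32_t bnum;              /* BNUM: bus selected by the guest's ACPI code */
    uint32_t up[MAX_BSEL];      /* UP: slots with a pending insertion, per bus */
    uint32_t down[MAX_BSEL];    /* DOWN: slots with a pending removal, per bus */
    uint32_t ejected[MAX_BSEL]; /* slots ejected through EJ, per bus */
} HotplugRegs;

/* Guest writes a bus selector to BNUM; out-of-range values are ignored. */
static void write_bnum(HotplugRegs *r, uint32_t bsel)
{
    if (bsel < MAX_BSEL) {
        r->bnum = bsel;
    }
}

/* UP/DOWN reads report the slot status of the currently selected bus. */
static uint32_t read_up(const HotplugRegs *r)   { return r->up[r->bnum]; }
static uint32_t read_down(const HotplugRegs *r) { return r->down[r->bnum]; }

/* Writing a slot bitmap to EJ ejects those slots on the selected bus. */
static void write_ej(HotplugRegs *r, uint32_t slots)
{
    r->ejected[r->bnum] |= slots;
    r->down[r->bnum] &= ~slots;
}
```

In this model a zero-initialised BNUM selects bus 0, so code that never touches BNUM keeps seeing a single bus through UP/DOWN/EJ.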
Igor Mammedov [Thu, 9 Jan 2014 16:36:33 +0000 (17:36 +0100)]
pc: make: fix dependencies: rebuild when included file is changed
Some *.dsl files include other *.dsl files, but there were no
dependencies, so when an included file changed, the target table
wasn't rebuilt. Fix this by using the same automatic dependency
generation as for C files.
Marcel Apfelbaum [Thu, 16 Jan 2014 15:50:48 +0000 (17:50 +0200)]
acpi unit-test: do not fail on asl mismatch
The ASL comparison will break every time the ACPI
tables are updated. This may break git bisect.
Instead of failing, print a warning on stderr
listing the retained ASL files, so they can be
compared offline.
Marcel Apfelbaum [Thu, 26 Dec 2013 14:54:24 +0000 (16:54 +0200)]
acpi unit-test: added script to rebuild the expected aml files
The ACPI unit test will fail every time the ACPI tables change.
This script rebuilds the expected AML files, so the test
will pass. It also validates the modifications.
Marcel Apfelbaum [Thu, 26 Dec 2013 14:54:23 +0000 (16:54 +0200)]
acpi unit-test: extract iasl executable from configuration
The test checked whether iasl is installed by running "iasl"
and checking the error output.
It is better to use the iasl executable as it appears
in the configuration.
Marcel Apfelbaum [Thu, 26 Dec 2013 14:54:21 +0000 (16:54 +0200)]
acpi unit-test: compare DSDT and SSDT tables against expected values
This test will run only if iasl is installed on the host machine.
The test plan:
1. Dumps the ACPI tables as AML to disk.
2. Runs iasl to disassemble the tables into ASL files.
3. Runs iasl to disassemble the offline (expected) AML files into ASL files.
4. Compares the ASL files.
The test runs for both default machine and q35.
In case the test fails, it can be easily tweaked to
show the differences between the ASL files and
understand the issue.
Gabriel L. Somlo [Sun, 22 Dec 2013 15:34:56 +0000 (10:34 -0500)]
Add DSDT node for AppleSMC
AppleSMC (-device isa-applesmc) is required to boot OS X guests.
OS X expects an SMC node to be present in the ACPI DSDT. This patch
adds an SMC node to the DSDT and, before booting the guest, dynamically
patches the return value of SMC._STA to 0x0B if the chip is present,
or to 0x00 otherwise.
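The two values differ only in the _STA status bits, which the ACPI specification defines as present (bit 0), enabled (bit 1), shown in UI (bit 2) and functioning (bit 3), so 0x0B means present, enabled and functioning but hidden from the UI. A minimal sketch of choosing and patching that byte; patch_smc_sta() is hypothetical, and it assumes the offset of the returned constant inside the AML blob is already known.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* _STA status bits as defined by the ACPI specification */
#define STA_PRESENT     (1u << 0)
#define STA_ENABLED     (1u << 1)
#define STA_SHOW_IN_UI  (1u << 2)
#define STA_FUNCTIONING (1u << 3)

/* 0x0B = present | enabled | functioning (not shown in UI); 0x00 = absent */
static uint8_t smc_sta_value(bool smc_present)
{
    return smc_present ? (uint8_t)(STA_PRESENT | STA_ENABLED | STA_FUNCTIONING)
                       : 0x00;
}

/* Overwrite the constant returned by SMC._STA inside an AML blob.
 * 'sta_offset' is assumed to be known (hypothetical helper). */
static int patch_smc_sta(uint8_t *aml, size_t aml_len, size_t sta_offset,
                         bool smc_present)
{
    if (sta_offset >= aml_len) {
        return -1; /* offset outside the table */
    }
    aml[sta_offset] = smc_sta_value(smc_present);
    return 0;
}
```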
Laszlo Ersek [Tue, 17 Dec 2013 00:37:06 +0000 (01:37 +0100)]
Python-lang gdb script to extract x86_64 guest vmcore from qemu coredump
When qemu dies unexpectedly, for example in response to an explicit
abort() call, or (more importantly) when an external signal is delivered
to it that results in a coredump, sometimes it is useful to extract the
guest vmcore from the qemu process' memory image. The guest vmcore might
help understand an emulation problem in qemu, or help debug the guest.
This script reimplements (and cuts many features of) the
qmp_dump_guest_memory() command in gdb/Python,
working off the saved memory image of the qemu process. The docstring in
the patch (serving as gdb help text) describes the limitations relative to
the QMP command.
Dependencies of qmp_dump_guest_memory() have been reimplemented as needed.
I sought to follow the general structure, sticking to original function
names where possible. However, keeping it simple prevailed in some places.
The patch has been tested with a 4 VCPU, 768 MB, RHEL-6.4
(2.6.32-358.el6.x86_64) guest:
- <name_dropping>I asked for Dave Anderson's help with verifying the
extracted vmcore, and his comments make me think I should post
this.</name_dropping>
Anthony Liguori [Fri, 24 Jan 2014 23:52:44 +0000 (15:52 -0800)]
Merge remote-tracking branch 'qemu-kvm/uq/master' into staging
* qemu-kvm/uq/master:
kvm: always update the MPX model specific register
KVM: fix addr type for KVM_IOEVENTFD
KVM: Retry KVM_CREATE_VM on EINTR
mempath prefault: fix off-by-one error
kvm: x86: Separately write feature control MSR on reset
roms: Flush icache when writing roms to guest memory
target-i386: clear guest TSC on reset
target-i386: do not special case TSC writeback
target-i386: Intel MPX
Anthony Liguori [Fri, 24 Jan 2014 23:50:14 +0000 (15:50 -0800)]
Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
scsi: Support TEST UNIT READY in the dummy LUN0
block: add .bdrv_reopen_prepare() stub for iscsi
virtio-scsi: Prevent assertion on missed events
virtio-scsi: Cleanup of I/Os that never started
scsi: Assign cancel_io vector for scsi_disk_emulate_ops
Conflicts:
block/iscsi.c
aliguori: resolve trivial merge conflict in block/iscsi.c
Anthony Liguori [Fri, 24 Jan 2014 23:43:30 +0000 (15:43 -0800)]
Merge remote-tracking branch 'kwolf/tags/for-anthony' into staging
Block patches
# gpg: Signature made Fri 24 Jan 2014 08:40:53 AM PST using RSA key ID C88F2FD6
# gpg: Can't check signature: public key not found
* kwolf/tags/for-anthony: (93 commits)
block: Switch bdrv_io_limits_intercept() to byte granularity
qemu-iotests: Test pwritev RMW logic
qemu-io: New command 'sleep'
blkdebug: Make required alignment configurable
iscsi: Set bs->request_alignment
block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
block: Make bdrv_pread() a bdrv_prwv_co() wrapper
block: Change coroutine wrapper to byte granularity
block: Assert serialisation assumptions in pwritev
block: Align requests in bdrv_co_do_pwritev()
block: Allow wait_serialising_requests() at any point
block: Make overlap range for serialisation dynamic
block: Generalise and optimise COR serialisation
block: Make zero-after-EOF work with larger alignment
block: Allow waiting for overlapping requests between begin/end
block: Switch BdrvTrackedRequest to byte granularity
block: Introduce bdrv_co_do_pwritev()
block: write: Handle COR dependency after I/O throttling
block: Introduce bdrv_aligned_pwritev()
block: Introduce bdrv_co_do_preadv()
...
Kevin Wolf [Thu, 16 Jan 2014 12:29:10 +0000 (13:29 +0100)]
block: Switch bdrv_io_limits_intercept() to byte granularity
Request sizes used to be rounded down to whole sectors, which
allowed the I/O limit to be bypassed. Now all requests are accounted
for with their exact byte size.
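To see why rounding down let requests slip past the limit, compare the two accounting rules in isolation. The function names are hypothetical and the sketch ignores the rest of the throttling logic; it only shows that a stream of 511-byte requests is charged nothing under the old rule.

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Old accounting: the request size was converted to whole sectors,
 * rounding down, so any partial sector went uncounted. */
static uint64_t accounted_bytes_old(uint64_t bytes)
{
    return bytes / SECTOR_SIZE * SECTOR_SIZE;
}

/* New accounting: the exact byte count is charged against the limit. */
static uint64_t accounted_bytes_new(uint64_t bytes)
{
    return bytes;
}

/* Total bytes charged for n requests of the same size. */
static uint64_t charged_total(uint64_t (*acct)(uint64_t), uint64_t n,
                              uint64_t bytes)
{
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; i++) {
        total += acct(bytes);
    }
    return total;
}
```

With 1000 requests of 511 bytes each, charged_total(accounted_bytes_old, 1000, 511) is 0, while the exact rule charges 511000 bytes.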
Kevin Wolf [Tue, 14 Jan 2014 12:44:35 +0000 (13:44 +0100)]
blkdebug: Make required alignment configurable
The new 'align' option of blkdebug can be used in order to emulate
backends with a required 4k alignment on hosts which only really require
512 byte alignment.
Kevin Wolf [Tue, 14 Jan 2014 10:41:35 +0000 (11:41 +0100)]
block: Assert serialisation assumptions in pwritev
If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.
However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.
This is an important assumption that must not be broken, or data
corruption will happen. Document it with some assertions.
Kevin Wolf [Tue, 3 Dec 2013 15:34:41 +0000 (16:34 +0100)]
block: Align requests in bdrv_co_do_pwritev()
This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a read-modify-write (RMW).
Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).
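The RMW idea can be shown against a toy in-memory backend that rejects unaligned access. This is a simplified sketch, not bdrv_co_do_pwritev(): ToyDisk and its helpers are hypothetical, there is no request serialisation, and it reads the whole covering aligned region rather than only the head and tail blocks.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-memory backend that only accepts aligned requests. */
typedef struct {
    uint8_t *data;
    uint64_t size;
    uint64_t align; /* plays the role of bs->request_alignment */
} ToyDisk;

static int toy_aligned_io(ToyDisk *d, uint64_t off, uint8_t *buf,
                          uint64_t len, int is_write)
{
    if (off % d->align || len % d->align || off + len > d->size) {
        return -1; /* backend rejects unaligned or out-of-range access */
    }
    if (is_write) {
        memcpy(d->data + off, buf, len);
    } else {
        memcpy(buf, d->data + off, len);
    }
    return 0;
}

/* Unaligned write implemented as read-modify-write of the aligned
 * region covering [off, off + len). */
static int toy_pwrite(ToyDisk *d, uint64_t off, const uint8_t *buf,
                      uint64_t len)
{
    if (len == 0) {
        return 0;
    }
    uint64_t start = off - off % d->align;
    uint64_t end = off + len;
    if (end % d->align) {
        end += d->align - end % d->align;
    }
    uint8_t *bounce = malloc(end - start);
    if (!bounce) {
        return -1;
    }
    int ret = toy_aligned_io(d, start, bounce, end - start, 0);     /* read */
    if (ret == 0) {
        memcpy(bounce + (off - start), buf, len);                    /* modify */
        ret = toy_aligned_io(d, start, bounce, end - start, 1);     /* write */
    }
    free(bounce);
    return ret;
}
```

Writing 2 bytes at offset 3 with a 4-byte alignment turns into one aligned read and one aligned write of bytes 0..7, leaving the neighbouring bytes untouched.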
Kevin Wolf [Fri, 13 Dec 2013 12:04:35 +0000 (13:04 +0100)]
block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().
Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.
Kevin Wolf [Wed, 4 Dec 2013 16:08:50 +0000 (17:08 +0100)]
block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.
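A rough model of the change: the granularity the overlap check rounds to becomes a parameter instead of always being the cluster size. The names are hypothetical, and the sketch only widens the serialising request's range before comparing it with another request's exact byte range; it is not QEMU's tracked-request code.

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t round_down_u64(uint64_t x, uint64_t a) { return x - x % a; }
static uint64_t round_up_u64(uint64_t x, uint64_t a) { return round_down_u64(x + a - 1, a); }

/* Widen the serialising request to 'align' (a cluster for Copy on Read,
 * a sector for alignment RMW) and test it against another request. */
static bool serialising_overlap(uint64_t ser_off, uint64_t ser_len,
                                uint64_t align,
                                uint64_t other_off, uint64_t other_len)
{
    uint64_t start = round_down_u64(ser_off, align);
    uint64_t end = round_up_u64(ser_off + ser_len, align);
    return start < other_off + other_len && other_off < end;
}
```

A 512-byte request at offset 0 conflicts with a request at offset 4096 when rounded to a 64 KiB cluster, but not when rounded to a 512-byte sector.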
Kevin Wolf [Wed, 4 Dec 2013 15:43:44 +0000 (16:43 +0100)]
block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are then checked for overlaps.
This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.
Also remove COR from function and variable names because this
functionality can be useful in other contexts.
Kevin Wolf [Wed, 4 Dec 2013 11:13:10 +0000 (12:13 +0100)]
block: Make zero-after-EOF work with larger alignment
Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.
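The rounding rule can be sketched as a pure function. bytes_to_read() is hypothetical: given an aligned request and the image length, it returns how much to ask the driver for, with the remainder of the buffer assumed to be zero-filled by the caller.

```c
#include <stdint.h>

/* For an aligned read of 'bytes' at 'offset' on an image of 'total_len'
 * bytes, return how much to request from the driver. The length past
 * EOF is rounded up to 'align' (not to 512) so the request stays aligned. */
static uint64_t bytes_to_read(uint64_t offset, uint64_t bytes,
                              uint64_t total_len, uint64_t align)
{
    if (offset >= total_len) {
        return 0; /* entirely past EOF: caller zero-fills everything */
    }
    uint64_t max = total_len - offset;
    max = (max + align - 1) / align * align;
    return max < bytes ? max : bytes;
}
```

For a 5000-byte file and a 4096-byte alignment, rounding to 512 would give a 5120-byte read, which is not a multiple of 4096; rounding to the alignment gives 8192.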
Kevin Wolf [Tue, 3 Dec 2013 13:40:18 +0000 (14:40 +0100)]
block: Introduce bdrv_co_do_pwritev()
This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, it is just a function taking byte offsets; it
doesn't align anything yet.
Kevin Wolf [Tue, 3 Dec 2013 13:30:44 +0000 (14:30 +0100)]
block: write: Handle COR dependency after I/O throttling
First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.
Kevin Wolf [Tue, 3 Dec 2013 13:02:23 +0000 (14:02 +0100)]
block: Introduce bdrv_aligned_pwritev()
This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment from the
part that needs to be executed afterwards and passes the request to the
BlockDriver.
Kevin Wolf [Mon, 2 Dec 2013 15:09:46 +0000 (16:09 +0100)]
block: Introduce bdrv_co_do_preadv()
Similar to bdrv_pread(), which aligns byte-aligned requests to 512-byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.
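The alignment step amounts to padding the request with a head before the caller's offset and a tail after its end. A minimal sketch of that arithmetic; AlignedRequest and align_request() are hypothetical names, and the bounce-buffer handling around them is left out.

```c
#include <stdint.h>

typedef struct {
    uint64_t offset; /* aligned start actually sent to the driver */
    uint64_t bytes;  /* aligned length actually sent to the driver */
    uint64_t head;   /* padding bytes before the caller's offset */
    uint64_t tail;   /* padding bytes after the caller's end */
} AlignedRequest;

static AlignedRequest align_request(uint64_t offset, uint64_t bytes,
                                    uint64_t align)
{
    AlignedRequest r;
    uint64_t end = offset + bytes;
    r.head = offset % align;
    r.tail = (align - end % align) % align;
    r.offset = offset - r.head;
    r.bytes = r.head + bytes + r.tail;
    return r;
}
```

For example, a 100-byte read at offset 1000 with 512-byte alignment becomes a 1024-byte read at offset 512, and the caller's data starts at byte 488 of the bounce buffer.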
Kevin Wolf [Mon, 2 Dec 2013 14:07:48 +0000 (15:07 +0100)]
block: Introduce bdrv_aligned_preadv()
This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment from the
part that needs to be executed afterwards and passes the request to the
BlockDriver.
Paolo Bonzini [Tue, 29 Nov 2011 11:42:20 +0000 (12:42 +0100)]
raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.
While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.
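The fallback ("see what alignment it takes for O_DIRECT to succeed") amounts to trying increasing alignments until a read works. This sketch hides the actual O_DIRECT pread behind a hypothetical callback so the search itself can be shown; the 512 to 4096 range and the fake device are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: attempts an O_DIRECT read with the given
 * alignment and reports whether it succeeded. */
typedef bool (*TryAlignedRead)(size_t align, void *opaque);

/* Return the smallest power-of-two alignment in 512..4096 for which
 * a read succeeds, or 0 if none does. */
static size_t probe_request_alignment(TryAlignedRead try_read, void *opaque)
{
    for (size_t align = 512; align <= 4096; align *= 2) {
        if (try_read(align, opaque)) {
            return align;
        }
    }
    return 0;
}

/* Stand-in for a real device: succeeds once align reaches its sector size. */
static bool fake_device_read(size_t align, void *opaque)
{
    return align >= *(const size_t *)opaque;
}
```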
Paolo Bonzini [Tue, 29 Nov 2011 10:35:47 +0000 (11:35 +0100)]
block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host. The next patches will make
QEMU aware of the host-provided values, so make this clear.
The alignment is also not about memory buffers, but about the sectors on
the disk, so change the documentation of the field.
At this point, the field is set by the device emulation, but completely
ignored by the block layer.
Kevin Wolf [Thu, 28 Nov 2013 09:23:32 +0000 (10:23 +0100)]
block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.
The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.
This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything other than 512.
While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.
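In plain POSIX terms, allocating a buffer that honours a driver-reported memory alignment looks roughly like this. blockalign_sketch() is a hypothetical stand-in for qemu_blockalign(), and the 512-byte fallback when no alignment is reported is an assumption based on the text above.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Allocate 'size' bytes aligned to the driver's opt_mem_alignment,
 * falling back to 512 bytes if the driver reports nothing. */
static void *blockalign_sketch(size_t opt_mem_alignment, size_t size)
{
    size_t align = opt_mem_alignment ? opt_mem_alignment : 512;
    void *p = NULL;
    /* posix_memalign() needs a power-of-two multiple of sizeof(void *) */
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}
```

The buffer is released with free() as usual.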
Kevin Wolf [Thu, 5 Dec 2013 12:01:46 +0000 (13:01 +0100)]
block: Detect unaligned length in bdrv_qiov_is_aligned()
For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.
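A minimal sketch of the combined check over a plain struct iovec array; qiov_is_aligned_sketch() is a hypothetical name, not the QEMU function, but it applies the same two conditions: every base address and every length must be a multiple of the alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

static bool qiov_is_aligned_sketch(const struct iovec *iov, int niov,
                                   size_t align)
{
    for (int i = 0; i < niov; i++) {
        if ((uintptr_t)iov[i].iov_base % align) {
            return false; /* misaligned base address */
        }
        if (iov[i].iov_len % align) {
            return false; /* misaligned length: the case this patch adds */
        }
    }
    return true;
}
```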
Kevin Wolf [Wed, 11 Dec 2013 18:50:32 +0000 (19:50 +0100)]
block: Inherit opt_transfer_length
When there is a format driver on top of the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.
Kevin Wolf [Wed, 11 Dec 2013 18:26:16 +0000 (19:26 +0100)]
block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to be called from other operations that may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen).
Kevin Wolf [Fri, 24 Jan 2014 13:00:43 +0000 (14:00 +0100)]
block: Fix bdrv_commit return value
bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.
Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.
Also clean up the handling of I/O errors to return the real error code
instead of -EIO.
This updates the documentation for committing snapshot images.
Specifically, this highlights what happens when the base image
is either smaller or larger than the snapshot image being committed.
In the case of the base image being smaller, it is resized to the
larger size of the snapshot image. In the case of the base image
being larger, it is not resized automatically, but once the commit
has completed it is safe for the user to truncate the base image.
Jeff Cody [Fri, 24 Jan 2014 14:02:36 +0000 (09:02 -0500)]
block: resize backing image during active layer commit, if needed
If the top image to commit is the active layer, and also larger than
the base image, then an I/O error will likely be returned during
block-commit.
For instance, if we have a base image with a virtual size of 10G, and an
active layer image of size 20G, then committing the snapshot via
'block-commit' will likely fail.
This will automatically attempt to resize the base image, if the
active layer image to be committed is larger.
Jeff Cody [Fri, 24 Jan 2014 14:02:35 +0000 (09:02 -0500)]
block: resize backing file image during offline commit, if necessary
Currently, if an image file is logically larger than its backing file,
committing it via 'qemu-img commit' will fail.
For instance, if we have a base image with a virtual size of 10G, and a
snapshot image of size 20G, then committing the snapshot offline with
'qemu-img commit' will likely fail.
This will automatically attempt to resize the base image, if the
snapshot image to be committed is larger.
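For a raw base file, whose virtual size equals its file length, the "grow if smaller, never shrink" rule could be sketched with fstat() and ftruncate(). grow_base_if_needed() and demo_grow() are hypothetical; QEMU itself resizes through its block layer, and formats such as qcow2 have a virtual size distinct from the file length.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow the base file to 'top_len' bytes if it is shorter; a larger base
 * is left alone. Returns 0 on success, -1 on error. */
static int grow_base_if_needed(int base_fd, off_t top_len)
{
    struct stat st;
    if (fstat(base_fd, &st) < 0) {
        return -1;
    }
    if (st.st_size >= top_len) {
        return 0; /* base already large enough: never shrink it */
    }
    return ftruncate(base_fd, top_len);
}

/* Usage: size a scratch base file, apply the rule, report the new size. */
static long demo_grow(off_t base_len, off_t top_len)
{
    FILE *f = tmpfile();
    if (!f) {
        return -1;
    }
    int fd = fileno(f);
    struct stat st;
    long result = -1;
    if (ftruncate(fd, base_len) == 0 && grow_base_if_needed(fd, top_len) == 0
        && fstat(fd, &st) == 0) {
        result = (long)st.st_size;
    }
    fclose(f);
    return result;
}
```

demo_grow(100, 200) grows the scratch file to 200 bytes, while demo_grow(300, 200) leaves it at 300.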
Peter Maydell [Fri, 24 Jan 2014 13:56:17 +0000 (14:56 +0100)]
block/curl: Implement the libcurl timer callback interface
libcurl versions 7.16.0 and later have a timer callback interface which
must be implemented in order for libcurl to make forward progress (it
will sometimes rely on being called back on the timeout if there are
no file descriptors registered). Implement the callback, and use a
QEMU AIO timer to ensure we prod libcurl again when it asks us to.
Based on Peter's original patch plus my fix to add curl_multi_timeout_do.
Should compile just fine even on older versions of libcurl.
Benoît Canet [Thu, 23 Jan 2014 20:31:35 +0000 (21:31 +0100)]
qmp: Allow to change password on named block driver states.
Signed-off-by: Benoit Canet <[email protected]> Reviewed-by: Fam Zheng <[email protected]>
There were two candidate ways to implement named node manipulation:
Luiz proposed 1, said 2 was an abuse of the QMP interface, and proposed
rewriting the QMP block interface for 2.0.
Luiz does not like the fact that in 1, two fields are optional but one of
them must be specified, leading to an abuse of the QMP semantics.
Kevin argued that 2 was a clear abuse of the device field and would not be
practical when quickly reading a log file, because the user would read "device"
and think that a device is being manipulated when it's in fact a node name.
The documentation of 1 makes it pretty clear to the user what to do.
Kevin argued that all BlockDriverStates are nodes, including device ones, so 2
does not make sense.
Kevin also argued that rewriting the QMP block interface would not make the
current one disappear.
Kevin pushed the argument that making the QAPI generator compatible with the
semantics of the operation would need a rewrite that no one has done yet.
A vote was held on the list to choose the version to use, and 1 won.
For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."
Zhang Min [Thu, 23 Jan 2014 07:59:16 +0000 (15:59 +0800)]
drive mirror:fix memory leak
mirror_iteration() calls qemu_iovec_init(), which allocates memory for
op->qiov.iov. When the write request completes, mirror_iteration_done()
frees only op, not op->qiov.iov, which causes a memory leak.
It should use qemu_iovec_destroy() to free op->qiov.
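The ownership rule behind the fix: whatever initialises op->qiov must be paired with a destroy before op itself is freed. A minimal sketch with hypothetical stand-in types (MiniQiov for QEMUIOVector, MirrorOpSketch for the mirror operation) and a counter that makes a leak observable.

```c
#include <stdlib.h>
#include <sys/uio.h>

static int live_iov_arrays; /* number of iovec arrays currently allocated */

typedef struct {
    struct iovec *iov;
    int nalloc;
} MiniQiov; /* hypothetical stand-in for QEMUIOVector */

static int mini_qiov_init(MiniQiov *q, int alloc_hint)
{
    q->iov = calloc(alloc_hint, sizeof(*q->iov));
    if (!q->iov) {
        return -1;
    }
    q->nalloc = alloc_hint;
    live_iov_arrays++;
    return 0;
}

static void mini_qiov_destroy(MiniQiov *q)
{
    if (q->iov) {
        free(q->iov);
        live_iov_arrays--;
    }
    q->iov = NULL;
    q->nalloc = 0;
}

typedef struct {
    MiniQiov qiov;
} MirrorOpSketch; /* hypothetical stand-in for the mirror operation */

static MirrorOpSketch *op_new(int nvecs)
{
    MirrorOpSketch *op = malloc(sizeof(*op));
    if (!op) {
        return NULL;
    }
    if (mini_qiov_init(&op->qiov, nvecs) < 0) {
        free(op);
        return NULL;
    }
    return op;
}

/* Completion: release the vector array first, then the op itself. */
static void op_done(MirrorOpSketch *op)
{
    mini_qiov_destroy(&op->qiov);
    free(op);
}
```

Dropping the mini_qiov_destroy() call from op_done() would leave live_iov_arrays above zero after every completed operation, which is the leak described above.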
Hu Tao [Tue, 21 Jan 2014 03:30:02 +0000 (11:30 +0800)]
qcow2: fix wrong value of L1E_OFFSET_MASK, L2E_OFFSET_MASK and REFT_OFFSET_MASK
According to the qcow2 spec, the offset fields in L1, L2 and refcount table
entries start at bit 9. The offset is a cluster offset, and the smallest
possible cluster size is 512 bytes.
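For L1 and L2 entries, the qcow2 spec places the host offset in bits 9 to 55, so the mask must keep bit 9: with 512-byte clusters, an offset can be any multiple of 0x200. A small sketch that derives the mask from the bit range and applies it; the helper names are hypothetical, and flag bits at the top of the entry are simply masked off.

```c
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set; requires lo <= hi <= 63. */
static uint64_t bit_range_mask(unsigned lo, unsigned hi)
{
    uint64_t up_to_hi = (hi == 63) ? ~0ULL : (1ULL << (hi + 1)) - 1;
    uint64_t below_lo = (1ULL << lo) - 1;
    return up_to_hi & ~below_lo;
}

/* L1/L2 entry offset field: bits 9..55 per the qcow2 spec */
#define ENTRY_OFFSET_MASK 0x00fffffffffffe00ULL

static uint64_t entry_offset(uint64_t entry)
{
    return entry & ENTRY_OFFSET_MASK;
}
```

An offset of 0x200 (the second 512-byte cluster) survives the mask, while bit 8 (0x100) does not.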
Stefan Hajnoczi [Mon, 13 Jan 2014 10:47:39 +0000 (18:47 +0800)]
dataplane: fix shadowed return value
Propagate the error return value from get_indirect(). This bug was
introduced in commit 4d684832 ("vring: create a common function to parse
descriptors").
Peter Feiner [Wed, 8 Jan 2014 19:43:25 +0000 (19:43 +0000)]
block: fix backing file segfault
When a backing file is opened such that (1) a protocol is directly
used as the block driver and (2) the block driver has bdrv_file_open,
bdrv_open_backing_file segfaults. The problem arises because
bdrv_open_common returns without setting bd->backing_hd->file.
To effect (1), you seem to have to use the -F flag in qemu-img. There
are several block drivers that satisfy (2), such as "file" and "nbd".
Here are some concrete examples:
#!/bin/bash
echo Test file format
./qemu-img create -f file base.file 1m
./qemu-img create -f qcow2 -F file -o backing_file=base.file \
    file-overlay.qcow2
./qemu-img convert -O raw file-overlay.qcow2 file-convert.raw
echo Test nbd format
SOCK=$PWD/nbd.sock
./qemu-img create -f raw base.raw 1m
./qemu-nbd -t -k $SOCK base.raw &
trap "kill $!" EXIT
while ! test -e $SOCK; do sleep 1; done
./qemu-img create -f qcow2 -F nbd -o backing_file=nbd:unix:$SOCK \
    nbd-overlay.qcow2
./qemu-img convert -O raw nbd-overlay.qcow2 nbd-convert.raw
Without this patch, the two qemu-img convert commands segfault.
Max Reitz [Fri, 20 Dec 2013 18:28:23 +0000 (19:28 +0100)]
iotests: Test new blkdebug/blkverify interface
Add a test for the new blkdebug/blkverify interface.
This test is not written in Python, although it uses QMP. This is
because it invokes the qemu-io HMP command, which outputs errors to
stderr instead of returning them through QMP. Filtering and testing that
output is easier in a shell script than with the Python infrastructure.
Max Reitz [Fri, 20 Dec 2013 18:28:20 +0000 (19:28 +0100)]
qemu-io: Make filename optional
Giving a filename is actually not essential, since it can be specified
through the options as well. On the contrary: sometimes a filename must
not be given.
Max Reitz [Fri, 20 Dec 2013 18:28:17 +0000 (19:28 +0100)]
blkverify: Don't require protocol filename
If the filename is not prefixed by "blkverify:" in
blkverify_parse_filename(), the blkverify driver was not selected
through that protocol prefix, but by an explicit command line (or QMP)
option (like driver=blkverify).
If blkverify_parse_filename() has been called, a filename has been
given. If it is not prefixed, it is probably really just a plain
filename. This is no problem, since we can use it as the test image
filename and rely on the user to specify the raw image filename through
the new corresponding option.