Max Reitz [Tue, 13 Jun 2017 20:21:03 +0000 (22:21 +0200)]
block/qcow2: Add qcow2_refcount_area()
This function creates a collection of self-describing refcount
structures (including a new refcount table) at the end of a qcow2 image
file. Optionally, these structures can also describe a number of
additional clusters beyond themselves; this will be important for
preallocated truncation, which will place the data clusters and L2
tables there.
For now, we can use this function to replace the part of
alloc_refcount_block() that grows the refcount table (from which it is
actually derived).
Max Reitz [Tue, 13 Jun 2017 20:21:01 +0000 (22:21 +0200)]
block/qcow2: Lock s->lock in preallocate()
preallocate() is and will be called only from places that do not
otherwise need to lock s->lock: Currently that is qcow2_create2(), as of
a future patch it will be called from qcow2_truncate(), too.
It therefore makes sense to move locking that mutex into preallocate()
itself.
Max Reitz [Tue, 13 Jun 2017 20:21:00 +0000 (22:21 +0200)]
block/qcow2: Generalize preallocate()
This patch adds two new parameters to the preallocate() function so we
will be able to use it not just for preallocating a new image but also
for preallocated image growth.
The offset parameter allows the caller to specify a virtual offset from
which to start preallocating. For newly created images this is always 0,
but for preallocating growth this will be the old image length.
The new_length parameter specifies the supposed new length of the image
(basically the "end offset" for preallocation). During image truncation,
bdrv_getlength() will return the old image length so we cannot rely on
its return value then.
Max Reitz [Tue, 13 Jun 2017 20:20:58 +0000 (22:20 +0200)]
block/file-posix: Generalize raw_regular_truncate
Currently, raw_regular_truncate() is intended for setting the size of a
newly created file. However, we also want to use it for truncating an
existing file in which case only the newly added space (when growing)
should be preallocated.
This also means that if resizing failed, we should try to restore the
original file size. This is important when using preallocation.
Max Reitz [Tue, 13 Jun 2017 20:20:56 +0000 (22:20 +0200)]
block/file-posix: Small fixes in raw_create()
Variables should be declared at the start of a block, and if a certain
parameter value is not supported it may be better to return -ENOTSUP
instead of -EINVAL.
Max Reitz [Tue, 13 Jun 2017 20:20:53 +0000 (22:20 +0200)]
block: Add PreallocMode to bdrv_truncate()
For block drivers that just pass a truncate request to the underlying
protocol, we can now pass the preallocation mode instead of aborting if
it is not PREALLOC_MODE_OFF.
Max Reitz [Tue, 13 Jun 2017 20:20:52 +0000 (22:20 +0200)]
block: Add PreallocMode to BD.bdrv_truncate()
Add a PreallocMode parameter to the bdrv_truncate() function implemented
by each block driver. Currently, we always pass PREALLOC_MODE_OFF and no
driver accepts anything else.
Stefan Hajnoczi [Wed, 5 Jul 2017 12:57:37 +0000 (13:57 +0100)]
qemu-iotests: support per-format golden output files
Some tests produce format-dependent output. Either the difference is
filtered out and ignored, or the test case is format-specific so we
don't need to worry about per-format output differences.
There is a third case: the test script is the same for all image formats
and the format-dependent output is relevant. An ugly workaround is to
copy-paste the test into multiple per-format test cases. This
duplicates code and is not maintainable.
This patch allows test cases to add per-format golden output files so a
single test case can work correctly when format-dependent output must be
checked:
123.out.qcow2
123.out.raw
123.out.vmdk
...
This naming scheme is not composable with 123.out.nocache or 123.pc.out,
two other scenarios where output files are split. I don't think it
matters since few test cases need these features.
Stefan Hajnoczi [Wed, 5 Jul 2017 12:57:36 +0000 (13:57 +0100)]
qemu-img: add measure subcommand
The measure subcommand calculates the size required by a new image file.
This can be used by users or management tools that need to allocate
space on an LVM volume, SAN LUN, etc before creating or converting an
image file.
Stefan Hajnoczi [Wed, 5 Jul 2017 12:57:34 +0000 (13:57 +0100)]
qcow2: extract image creation option parsing
The image creation options parsed by qcow2_create() are also needed to
implement .bdrv_measure(). Extract the parsing code, including input
validation.
Stefan Hajnoczi [Wed, 5 Jul 2017 12:57:33 +0000 (13:57 +0100)]
qcow2: make refcount size calculation conservative
The refcount metadata size calculation is inaccurate and can produce
numbers that are too small. This is bad because we should calculate a
conservative number - one that is guaranteed to be large enough.
This patch switches the approach to a fixed point calculation because
the existing equation is hard to solve when inaccuracies are taken care
of.
Stefan Hajnoczi [Wed, 5 Jul 2017 12:57:30 +0000 (13:57 +0100)]
block: add bdrv_measure() API
bdrv_measure() provides a conservative maximum for the size of a new
image. This information is handy if storage needs to be allocated (e.g.
a SAN or an LVM volume) ahead of time.
Eric Blake [Mon, 3 Jul 2017 18:09:50 +0000 (13:09 -0500)]
tests: Avoid non-portable 'echo -ARG'
POSIX says that backslashes in the arguments to 'echo', as well as
any use of 'echo -n' and 'echo -e', are non-portable; it recommends
people should favor 'printf' instead. This is definitely true where
we do not control which shell is running (such as in makefile snippets
or in documentation examples). But even for scripts where we
require bash (and therefore, where echo does what we want by default),
it is still possible to use 'shopt -s xpg_echo' to change bash's
behavior of echo. And setting a good example never hurts when we are
not sure if a snippet will be copied from a bash-only script to a
general shell script (although I don't change the use of non-portable
\e for ESC when we know the running shell is bash).
Replace 'echo -n "..."' with 'printf %s "..."', and 'echo -e "..."'
with 'printf %b "...\n"', with the optimization that the %s/%b
argument can be omitted if the string being printed is a strict
literal with no '%', '$', or '`' (we could technically also make
this optimization when there are $ or `` substitutions but where
we can prove their results will not be problematic, but proving
that such substitutions are safe makes the patch less trivial
compared to just being consistent).
In the qemu-iotests check script, fix unusual shell quoting
that would result in word-splitting if 'date' outputs a space.
In test 051, take an opportunity to shorten the line.
In test 068, get rid of a pointless second invocation of bash.
Max Reitz [Sun, 2 Jul 2017 15:05:09 +0000 (17:05 +0200)]
iotests: Use absolute paths for executables
A user may specify a relative path for accessing qemu, qemu-img, etc.
through environment variables ($QEMU_PROG and friends) or a symlink.
If a test decides to change its working directory, relative paths will
cease to work, however. Work around this by making all of the paths to
programs that should undergo testing absolute. Besides "realpath", we
also have to use "type -p" to support programs in $PATH.
As a side effect, this fixes specifying these programs as symlinks for
out-of-tree builds: Before, you would have to create two symlinks, one
in the build and one in the source tree (the first one for common.config
to find, the second one for the iotest to use). Now it is sufficient to
create one in the build tree because common.config will resolve it.
iotests: chown LUKS device before qemu-io launches
On some distros, whenever you close a block device file
descriptor there is a udev rule that resets the file
permissions. This can race with the test script when
we run qemu-io multiple times against the same block
device. Occasionally the second qemu-io invocation
will find udev has reset the permissions causing failure.
iotests: reduce PBKDF iterations when testing LUKS
By default the PBKDF algorithm used with LUKS is tuned
based on the number of iterations to produce 1 second
of running time. This makes running the I/O test with
the LUKS format orders of magnitude slower than with
qcow2/raw formats.
When creating LUKS images, set the iteration time to
a 10ms to reduce the time overhead for LUKS, since
security does not matter in I/O tests.
Previously a full 'check -luks' would take
$ time ./check -luks
Passed all 22 tests
real 23m9.988s
user 21m46.223s
sys 0m22.841s
Now it takes
$ time ./check -luks
Passed all 22 tests
real 4m39.235s
user 3m29.590s
sys 0m24.234s
Still slow compared to qcow2/raw, but much improved
none the less.
The tests 033, 140, 145 and 157 were all broken
when run with LUKS, since they did not correctly use
the required image opts args syntax to specify the
decryption secret. Further, the 120 test simply does
not make sense to run with luks, as the scenario
exercised is not relevant.
The test 181 was broken when run with LUKS because
it didn't take account of fact that $TEST_IMG was
already in image opts syntax. The launch_qemu
helper also didn't register the secret object
providing the LUKS password.
While the qemu-img dd command does accept --image-opts
this is not sufficient to make it work with the LUKS
image yet. This is because bdrv_create() still always
requires the non-image-opts syntax.
This will be needed to check some restrictions before making bitmap
persistent in qmp-block-dirty-bitmap-add (this functionality will be
added by future patch)
New field BdrvDirtyBitmap.persistent means, that bitmap should be saved
by format driver in .bdrv_close and .bdrv_inactivate. No format driver
supports it for now.
Add format driver handler, which should mark loaded read-only
bitmaps as 'IN_USE' in the image and unset read_only field in
corresponding BdrvDirtyBitmap's.
Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They
are loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.
block/dirty-bitmap: add readonly field to BdrvDirtyBitmap
It will be needed in following commits for persistent bitmaps.
If bitmap is loaded from read-only storage (and we can't mark it
"in use" in this storage) corresponding BdrvDirtyBitmap should be
read-only.
Add bitmap extension as specified in docs/specs/qcow2.txt.
For now, just mirror extension header into Qcow2 state and check
constraints. Also, calculate refcounts for qcow2 bitmaps, to not break
qemu-img check.
For now, disable image resize if it has bitmaps. It will be fixed later.
A bitmap directory entry is sometimes called a 'bitmap header'. This
patch leaves only one name - 'bitmap directory entry'. The name 'bitmap
header' creates misunderstandings with 'qcow2 header' and 'qcow2 bitmap
header extension' (which is extension of qcow2 header)
sochin.jiang [Mon, 26 Jun 2017 11:04:24 +0000 (19:04 +0800)]
mirror: Fix inconsistent backing AioContext for after mirroring
mirror_complete opens the backing chain, which should have the same
AioContext as the top when using iothreads. Make the code guarantee
this, which fixes a failed assertion in bdrv_attach_child.
qcow2: report encryption specific image information
Currently 'qemu-img info' reports a simple "encrypted: yes"
field. This is not very useful now that qcow2 can support
multiple encryption formats. Users want to know which format
is in use and some data related to it.
Wire up usage of the qcrypto_block_get_info() method so that
'qemu-img info' can report about the encryption format
and parameters in use
While the crypto layer uses a fixed option name "key-secret",
the upper block layer may have a prefix on the options. e.g.
"encrypt.key-secret", in order to avoid clashes between crypto
option names & other block option names. To ensure the crypto
layer can report accurate error messages, we must tell it what
option name prefix was used.
Now that all encryption keys must be provided upfront via
the QCryptoSecret API and associated block driver properties
there is no need for any explicit encryption handling APIs
in the block layer. Encryption can be handled transparently
within the block driver. We only retain an API for querying
whether an image is encrypted or not, since that is a
potentially useful piece of metadata to report to the user.
Now that qcow & qcow2 are wired up to get encryption keys
via the QCryptoSecret object, nothing is relying on the
interactive prompting for passwords. All the code related
to password prompting can thus be ripped out.
The legacy "encryption=on" parameter still results in
creation of the old qcow2 AES format (and is equivalent
to the new 'encryption-format=aes'). e.g. the following are
equivalent:
With the LUKS format it is necessary to store the LUKS
partition header and key material in the QCow2 file. This
data can be many MB in size, so cannot go into the QCow2
header region directly. Thus the spec defines a FDE
(Full Disk Encryption) header extension that specifies
the offset of a set of clusters to hold the FDE headers,
as well as the length of that region. The LUKS header is
thus stored in these extra allocated clusters before the
main image payload.
Aside from all the cryptographic differences implied by
use of the LUKS format, there is one further key difference
between the use of legacy AES and LUKS encryption in qcow2.
For LUKS, the initialiazation vectors are generated using
the host physical sector as the input, rather than the
guest virtual sector. This guarantees unique initialization
vectors for all sectors when qcow2 internal snapshots are
used, thus giving stronger protection against watermarking
attacks.
qcow2: extend specification to cover LUKS encryption
Update the qcow2 specification to describe how the LUKS header is
placed inside a qcow2 file, when using LUKS encryption for the
qcow2 payload instead of the legacy AES-CBC encryption
qcow2: convert QCow2 to use QCryptoBlock for encryption
This converts the qcow2 driver to make use of the QCryptoBlock
APIs for encrypting image content, using the legacy QCow2 AES
scheme.
With this change it is now required to use the QCryptoSecret
object for providing passwords, instead of the current block
password APIs / interactive prompting.
The test 087 could be simplified since there is no longer a
difference in behaviour when using blockdev_add with encrypted
images for the running vs stopped CPU state.
qcow2: make qcow2_encrypt_sectors encrypt in place
Instead of requiring separate input/output buffers for
encrypting data, change qcow2_encrypt_sectors() to assume
use of a single buffer, encrypting in place. The current
callers all used the same buffer for input/output already.
qcow: convert QCow to use QCryptoBlock for encryption
This converts the qcow driver to make use of the QCryptoBlock
APIs for encrypting image content. This is only wired up to
permit use of the legacy QCow encryption format. Users who wish
to have the strong LUKS format should switch to qcow2 instead.
With this change it is now required to use the QCryptoSecret
object for providing passwords, instead of the current block
password APIs / interactive prompting.
Though note that running QEMU system emulators with the AES
encryption is no longer supported, so while the above syntax
is valid, QEMU will refuse to actually run the VM in this
particular example.
Likewise when creating images with the legacy AES-CBC format
Instead of requiring separate input/output buffers for
encrypting data, change encrypt_sectors() to assume
use of a single buffer, encrypting in place. One current
caller uses the same buffer for input/output already
and the other two callers are easily converted to do so.
block: deprecate "encryption=on" in favor of "encrypt.format=aes"
Historically the qcow & qcow2 image formats supported a property
"encryption=on" to enable their built-in AES encryption. We'll
soon be supporting LUKS for qcow2, so need a more general purpose
way to enable encryption, with a choice of formats.
This introduces an "encrypt.format" option, which will later be
joined by a number of other "encrypt.XXX" options. The use of
a "encrypt." prefix instead of "encrypt-" is done to facilitate
mapping to a nested QAPI schema at later date.
The qcow driver refuses to open images which are less than
2 bytes in size, but will happily create such images. Add
a check in the create path to avoid this discrepancy.
qcow: document another weakness of qcow AES encryption
Document that use of guest virtual sector numbers as the basis for
the initialization vectors is a potential weakness, when combined
with internal snapshots or multiple images using the same passphrase.
This fixes the formatting of the itemized list too.
When integrating the crypto support with qcow/qcow2, we don't
want to use the bare LUKS option names "hash-alg", "key-secret",
etc. We need to namespace them to match the nested QAPI schema.
e.g. "encrypt.hash-alg", "encrypt.key-secret"
so that they don't clash with any general qcow options at a later
date.
block: expose crypto option names / defs to other drivers
The block/crypto.c defines a set of QemuOpts that provide
parameters for encryption. This will also be needed by
the qcow/qcow2 integration, so expose the relevant pieces
in a new block/crypto.h header. Some helper methods taking
QemuOpts are changed to take QDict to simplify usage in
other places.
Paolo Bonzini [Tue, 11 Jul 2017 10:00:49 +0000 (12:00 +0200)]
build: disable Xen on ARM
While ARM could present the xenpv machine, it does not and trying to enable
it breaks compilation. Revert to the previous test which only looked at
$target_name, not $cpu.
Peter Maydell [Mon, 10 Jul 2017 17:13:03 +0000 (18:13 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170710a' into staging
Migration pull 2017-07-10
# gpg: Signature made Mon 10 Jul 2017 18:04:57 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20170710a:
migration: Make compression_threads use save/load_setup/cleanup()
migration: Convert ram to use new load_setup()/load_cleanup()
migration: Create load_setup()/cleanup() methods
migration: Rename cleanup() to save_cleanup()
migration: Rename save_live_setup() to save_setup()
doc: update TYPE_MIGRATION documents
doc: add item for "-M enforce-config-section"
vl: move global property, migrate init earlier
migration: fix handling for --only-migratable
Juan Quintela [Wed, 28 Jun 2017 09:52:27 +0000 (11:52 +0200)]
migration: Convert ram to use new load_setup()/load_cleanup()
Once there, I rename ram_migration_cleanup() to ram_save_cleanup().
Notice that this is the first pass, and I only passed XBZRLE to the
new scheme. Moved decoded_buf to inside XBZRLE struct.
As a bonus, I don't have to export xbzrle functions from ram.c.
loaded_data pointer was needed because called can change it (dave)
spell loaded correctly in comment (dave)
Message-Id: <20170628095228[email protected]> Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Move the printing of the error message so we can print the device
giving the error.
Add call to postcopy stuff
Message-Id: <20170628095228[email protected]> Reviewed-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Peter Xu [Wed, 5 Jul 2017 08:21:21 +0000 (16:21 +0800)]
vl: move global property, migrate init earlier
Currently drive_init_func() may call migrate_get_current() while the
migrate object is still not ready yet at that time. Move the migration
object init earlier, along with the global properties, right after
acceleration init.
This fixes a breakage for iotest 055, which caused an assertion failure.
Alex Williamson [Mon, 10 Jul 2017 16:39:43 +0000 (10:39 -0600)]
vfio/pci: Fixup v0 PCIe capabilities
Intel 82599 VFs report a PCIe capability version of 0, which is
invalid. The earliest version of the PCIe spec used version 1. This
causes Windows to fail startup on the device and it will be disabled
with error code 10. Our choices are either to drop the PCIe cap on
such devices, which has the side effect of likely preventing the guest
from discovering any extended capabilities, or performing a fixup to
update the capability to the earliest valid version. This implements
the latter.
Alex Williamson [Mon, 10 Jul 2017 16:39:43 +0000 (10:39 -0600)]
vfio: Test realized when using VFIOGroup.device_list iterator
VFIOGroup.device_list is effectively our reference tracking mechanism
such that we can teardown a group when all of the device references
are removed. However, we also use this list from our machine reset
handler for processing resets that affect multiple devices. Generally
device removals are fully processed (exitfn + finalize) when this
reset handler is invoked, however if the removal is triggered via
another reset handler (piix4_reset->acpi_pcihp_reset) then the device
exitfn may run, but not finalize. In this case we hit asserts when
we start trying to access PCI helpers since much of the PCI state of
the device is released. To resolve this, add a pointer to the Object
DeviceState in our common base-device and skip non-realized devices
as we iterate.
* remotes/ericb/tags/pull-nbd-2017-07-10-v2:
nbd: use generic trace subsystem instead of TRACE macro
nbd: refactor tracing
nbd/server: rename clientflags var in nbd_negotiate_options
nbd/server: fix TRACE in nbd_negotiate_send_rep_len
nbd/client: refactor TRACE of NBD_MAGIC
nbd/common: nbd_tls_handshake: remove extra TRACE
nbd/server: add errp to nbd_send_reply()
nbd/server: use errp instead of LOG
nbd/server: refactor nbd_negotiate
nbd/server: nbd_negotiate: return 1 on NBD_OPT_ABORT
MAINTAINERS: Promote NBD to supported, with new maintainer