Git Repo - qemu.git/log

qcow2: add iotests to cover LUKS encryption support

This extends the 087 iotest to cover LUKS encryption when doing
blockdev-add.

Two further tests are added to validate read/write of LUKS
encrypted images with a single file and with a backing file.

Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: add support for LUKS encryption format

This adds support for using LUKS as an encryption format
with the qcow2 file, using the new encrypt.format parameter
to request "luks" format. e.g.

  # qemu-img create --object secret,data=123456,id=sec0 \
       -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \
       test.qcow2 10G

The legacy "encryption=on" parameter still results in
creation of the old qcow2 AES format (and is equivalent
to the new 'encrypt.format=aes'). e.g. the following are
equivalent:

  # qemu-img create --object secret,data=123456,id=sec0 \
       -f qcow2 -o encryption=on,encrypt.key-secret=sec0 \
       test.qcow2 10G

  # qemu-img create --object secret,data=123456,id=sec0 \
       -f qcow2 -o encrypt.format=aes,encrypt.key-secret=sec0 \
       test.qcow2 10G

With the LUKS format it is necessary to store the LUKS
partition header and key material in the QCow2 file. This
data can be many MB in size, so cannot go into the QCow2
header region directly. Thus the spec defines a FDE
(Full Disk Encryption) header extension that specifies
the offset of a set of clusters to hold the FDE headers,
as well as the length of that region. The LUKS header is
thus stored in these extra allocated clusters before the
main image payload.
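
The layout arithmetic for this region can be sketched as follows. The
struct mirrors the two fields of the FDE header extension described
above (offset and length), and the helper rounds the LUKS header length
up to whole clusters. The names QCow2FDEHeaderExt, fde_region_size and
fde_ext_is_aligned are hypothetical, and this is an illustration rather
than the qcow2 driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory view of the FDE header extension. */
typedef struct QCow2FDEHeaderExt {
    uint64_t offset; /* file offset of the first FDE header cluster */
    uint64_t length; /* length of the LUKS header data in bytes */
} QCow2FDEHeaderExt;

/* Round the header length up to a whole number of clusters: this is
 * the space allocated before the main image payload. */
static uint64_t fde_region_size(uint64_t header_len, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    return (header_len + cluster_size - 1) / cluster_size * cluster_size;
}

/* The region is made of whole clusters, so its offset must be
 * cluster aligned. */
static int fde_ext_is_aligned(const QCow2FDEHeaderExt *ext,
                              uint64_t cluster_size)
{
    return ext->offset % cluster_size == 0;
}
```

With 64 KiB clusters, a header of 2 MiB plus one byte needs 2 MiB plus
one extra 64 KiB cluster.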

Aside from all the cryptographic differences implied by
use of the LUKS format, there is one further key difference
between the use of legacy AES and LUKS encryption in qcow2.
For LUKS, the initialization vectors are generated using
the host physical sector as the input, rather than the
guest virtual sector. This guarantees unique initialization
vectors for all sectors when qcow2 internal snapshots are
used, thus giving stronger protection against watermarking
attacks.
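
Why the IV input matters can be shown with a plain64-style IV, i.e.
the 64-bit sector number stored little-endian in an otherwise zeroed
16-byte block. This is a simplified sketch, not QEMU's IV generator;
make_plain64_iv and ivs_collide are hypothetical helpers. Two internal
snapshots may hold the same guest sector at different host sectors:
keying on the guest sector repeats the IV, keying on the host sector
does not.

```c
#include <stdint.h>
#include <string.h>

#define IV_LEN 16 /* AES block size */

/* Simplified plain64-style IV: little-endian 64-bit sector number,
 * zero padded to the cipher block size. */
static void make_plain64_iv(uint64_t sector, uint8_t iv[IV_LEN])
{
    memset(iv, 0, IV_LEN);
    for (int i = 0; i < 8; i++) {
        iv[i] = (uint8_t)(sector >> (8 * i));
    }
}

/* Returns 1 if data at the two sectors would share the same IV. */
static int ivs_collide(uint64_t sector_a, uint64_t sector_b)
{
    uint8_t a[IV_LEN], b[IV_LEN];

    make_plain64_iv(sector_a, a);
    make_plain64_iv(sector_b, b);
    return memcmp(a, b, IV_LEN) == 0;
}
```

For example, if guest sector 100 is stored at host sector 4096 in the
active image and at host sector 8192 in a snapshot, ivs_collide(100, 100)
is 1 (guest-based, legacy AES) while ivs_collide(4096, 8192) is 0
(host-based, LUKS).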

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: extend specification to cover LUKS encryption

Update the qcow2 specification to describe how the LUKS header is
placed inside a qcow2 file, when using LUKS encryption for the
qcow2 payload instead of the legacy AES-CBC encryption.

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: convert QCow2 to use QCryptoBlock for encryption

This converts the qcow2 driver to make use of the QCryptoBlock
APIs for encrypting image content, using the legacy QCow2 AES
scheme.

With this change it is now required to use the QCryptoSecret
object for providing passwords, instead of the current block
password APIs / interactive prompting.

  $QEMU \
    -object secret,id=sec0,file=/home/berrange/encrypted.pw \
    -drive file=/home/berrange/encrypted.qcow2,encrypt.key-secret=sec0

The test 087 could be simplified since there is no longer a
difference in behaviour when using blockdev_add with encrypted
images for the running vs stopped CPU state.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: make qcow2_encrypt_sectors encrypt in place

Instead of requiring separate input/output buffers for
encrypting data, change qcow2_encrypt_sectors() to assume
use of a single buffer, encrypting in place. The current
callers all used the same buffer for input/output already.
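
A minimal sketch of the in-place calling convention follows. A toy XOR
transform keyed on the sector number stands in for the real cipher; it
is not secure and only shows the buffer handling. The function name
encrypt_sectors_in_place is hypothetical. A single buffer is read and
overwritten sector by sector, so callers no longer pass a separate
output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical in-place transform: buf holds nb_sectors sectors and
 * is overwritten with the result. The toy XOR "cipher" is its own
 * inverse, so applying it twice restores the original data. */
static void encrypt_sectors_in_place(uint64_t sector_num, uint8_t *buf,
                                     size_t nb_sectors, uint8_t key)
{
    for (size_t s = 0; s < nb_sectors; s++) {
        uint8_t *sector = buf + s * SECTOR_SIZE;
        uint8_t tweak = (uint8_t)(key ^ (sector_num + s));

        for (size_t i = 0; i < SECTOR_SIZE; i++) {
            sector[i] ^= tweak;
        }
    }
}
```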

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow: convert QCow to use QCryptoBlock for encryption

This converts the qcow driver to make use of the QCryptoBlock
APIs for encrypting image content. This is only wired up to
permit use of the legacy QCow encryption format. Users who wish
to have the strong LUKS format should switch to qcow2 instead.

With this change it is now required to use the QCryptoSecret
object for providing passwords, instead of the current block
password APIs / interactive prompting.

  $QEMU \
    -object secret,id=sec0,file=/home/berrange/encrypted.pw \
    -drive file=/home/berrange/encrypted.qcow,encrypt.format=aes,encrypt.key-secret=sec0

Though note that running QEMU system emulators with the AES
encryption is no longer supported, so while the above syntax
is valid, QEMU will refuse to actually run the VM in this
particular example.

Likewise, when creating images with the legacy AES-CBC format:

  qemu-img create -f qcow \
    --object secret,id=sec0,file=/home/berrange/encrypted.pw \
    -o encrypt.format=aes,encrypt.key-secret=sec0 \
    /home/berrange/encrypted.qcow 64M

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow: make encrypt_sectors encrypt in place

Instead of requiring separate input/output buffers for
encrypting data, change encrypt_sectors() to assume
use of a single buffer, encrypting in place. One current
caller uses the same buffer for input/output already
and the other two callers are easily converted to do so.

Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: deprecate "encryption=on" in favor of "encrypt.format=aes"

Historically the qcow & qcow2 image formats supported a property
"encryption=on" to enable their built-in AES encryption. We'll
soon be supporting LUKS for qcow2, so we need a more general-purpose
way to enable encryption, with a choice of formats.

This introduces an "encrypt.format" option, which will later be
joined by a number of other "encrypt.XXX" options. The use of
an "encrypt." prefix instead of "encrypt-" is done to facilitate
mapping to a nested QAPI schema at a later date.

e.g. the preferred syntax is now

  qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2 10G

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: skip 048 with qcow which doesn't support resize

Test 048 is designed to verify data preservation during an
image resize. The qcow (v1) format implementation has never
supported resize, so the test always fails.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: skip 042 with qcow which doesn't support zero sized images

Test 042 is designed to verify operation with zero sized images.
Such images are not supported with qcow (v1), so this test has
always failed.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow: require image size to be > 1 for new images

The qcow driver refuses to open images which are less than
2 bytes in size, but will happily create such images. Add
a check in the create path to avoid this discrepancy.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow: document another weakness of qcow AES encryption

Document that use of guest virtual sector numbers as the basis for
the initialization vectors is a potential weakness, when combined
with internal snapshots or multiple images using the same passphrase.
This fixes the formatting of the itemized list too.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: add ability to set a prefix for opt names

When integrating the crypto support with qcow/qcow2, we don't
want to use the bare LUKS option names "hash-alg", "key-secret",
etc. We need to namespace them to match the nested QAPI schema.

e.g. "encrypt.hash-alg", "encrypt.key-secret"

so that they don't clash with any general qcow options at a later
date.
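
The namespacing amounts to a simple string operation: prepend the
prefix when defining an option, and strip it again when handing the
options to the crypto layer. The helpers prefixed_name and strip_prefix
below are hypothetical and only illustrate the idea; they are not the
QemuOpts or QDict API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<prefix><name>", e.g. "encrypt." + "key-secret".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int prefixed_name(char *out, size_t out_len,
                         const char *prefix, const char *name)
{
    int n = snprintf(out, out_len, "%s%s", prefix, name);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* If opt starts with prefix, return the bare name after it, else NULL.
 * Options without the prefix (general qcow options) are left alone. */
static const char *strip_prefix(const char *opt, const char *prefix)
{
    size_t len = strlen(prefix);

    return strncmp(opt, prefix, len) == 0 ? opt + len : NULL;
}
```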

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: expose crypto option names / defs to other drivers

block/crypto.c defines a set of QemuOpts that provide
parameters for encryption. This will also be needed by
the qcow/qcow2 integration, so expose the relevant pieces
in a new block/crypto.h header. Some helper methods taking
QemuOpts are changed to take QDict to simplify usage in
other places.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170623162419 [email protected]
Signed-off-by: Max Reitz <[email protected]>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20170710.0' into staging

VFIO fixes 2017-07-10

- Don't iterate over non-realized devices (Alex Williamson)
- Add PCIe capability version fixup (Alex Williamson)

# gpg: Signature made Mon 10 Jul 2017 20:06:11 BST
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20170710.0:
  vfio/pci: Fixup v0 PCIe capabilities
  vfio: Test realized when using VFIOGroup.device_list iterator

Signed-off-by: Peter Maydell <[email protected]>

build: disable Xen on ARM

While ARM could present the xenpv machine, it does not, and trying to enable
it breaks compilation. Revert to the previous test which only looked at
$target_name, not $cpu.

Fixes: 3b6b75506de44c5070639943c30a0ad5850f5d02
Reported-by: Alex Bennée <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-id: 20170711100049 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170710a' into staging

Migration pull 2017-07-10

# gpg: Signature made Mon 10 Jul 2017 18:04:57 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20170710a:
  migration: Make compression_threads use save/load_setup/cleanup()
  migration: Convert ram to use new load_setup()/load_cleanup()
  migration: Create load_setup()/cleanup() methods
  migration: Rename cleanup() to save_cleanup()
  migration: Rename save_live_setup() to save_setup()
  doc: update TYPE_MIGRATION documents
  doc: add item for "-M enforce-config-section"
  vl: move global property, migrate init earlier
  migration: fix handling for --only-migratable

Signed-off-by: Peter Maydell <[email protected]>

migration: Make compression_threads use save/load_setup/cleanup()

While at it, be consistent and use
compress_thread_{save,load}_{setup,cleanup}.

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170628095228 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Convert ram to use new load_setup()/load_cleanup()

While at it, I renamed ram_migration_cleanup() to ram_save_cleanup().
Note that this is the first pass, and I only converted XBZRLE to the
new scheme. I moved decoded_buf inside the XBZRLE struct.
As a bonus, I don't have to export xbzrle functions from ram.c.

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
--

loaded_data pointer was needed because the caller can change it (dave)
spell loaded correctly in comment (dave)
Message-Id: <20170628095228 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Create load_setup()/cleanup() methods

We need to do things at load time and at cleanup time.
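
A simplified model of the new hook pair, assuming a per-device handler
table where both hooks are optional: setup runs before any data is
loaded and names the failing device on error, and cleanup calls every
hook that is present. The types and function names are hypothetical
and much smaller than QEMU's SaveVMHandlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler table: both hooks are optional. */
typedef struct LoadHandlers {
    const char *idstr;                 /* device name, used in errors */
    int (*load_setup)(void *opaque);   /* returns 0 on success */
    int (*load_cleanup)(void *opaque);
    void *opaque;
} LoadHandlers;

/* Call every load_setup hook; stop at the first failure and report
 * which device caused it. */
static int load_state_setup(LoadHandlers *h, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (h[i].load_setup) {
            int ret = h[i].load_setup(h[i].opaque);
            if (ret < 0) {
                fprintf(stderr, "load setup failed for %s\n", h[i].idstr);
                return ret;
            }
        }
    }
    return 0;
}

/* Call every load_cleanup hook that is present. */
static void load_state_cleanup(LoadHandlers *h, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (h[i].load_cleanup) {
            h[i].load_cleanup(h[i].opaque);
        }
    }
}

/* Example hooks that count calls through an int counter. */
static int count_setup(void *opaque) { (*(int *)opaque)++; return 0; }
static int count_cleanup(void *opaque) { (*(int *)opaque)++; return 0; }
static int fail_setup(void *opaque) { (void)opaque; return -1; }
```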

Signed-off-by: Juan Quintela <[email protected]>
--

Move the printing of the error message so we can print the device
giving the error.
Add call to postcopy stuff
Message-Id: <20170628095228 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Rename cleanup() to save_cleanup()

We need a cleanup for loads, so we rename here to be consistent.

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
--

Rename htab_cleanup to htab_save_cleanup as Dave suggested
Message-Id: <20170628095228 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Rename save_live_setup() to save_setup()

We are now going to use it for more than saving live regions.
While at it, rename qemu_savevm_state_begin() to qemu_savevm_state_setup().

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20170628095228 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

doc: update TYPE_MIGRATION documents

[Peter collected Eduardo's patch comment and formatted into patch]

Suggested-by: Eduardo Habkost <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1499242883 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

doc: add item for "-M enforce-config-section"

It was never documented, and now we have one more parameter for it (which
obsoletes this one). Document it properly.

Suggested-by: Eduardo Habkost <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1499396048 [email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Removed 'Although now' commit message as per Eduardo's review

vl: move global property, migrate init earlier

Currently drive_init_func() may call migrate_get_current() before the
migration object is ready. Move the migration object init earlier,
along with the global properties, right after acceleration init.

This fixes a breakage for iotest 055, which caused an assertion failure.

Reported-by: Max Reitz <[email protected]>
Reported-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Tested-by: QingFeng Hao <[email protected]>
Fixes: 3df663 ("migration: move only_migratable to MigrationState")
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1499242883 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: fix handling for --only-migratable

The MigrationState object is not ready at that time, so we'll get an
assertion. Use qemu_global_option() instead.

Reported-by: Eduardo Habkost <[email protected]>
Suggested-by: Eduardo Habkost <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Fixes: 3df663e ("migration: move only_migratable to MigrationState")
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <1499242883 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

vfio/pci: Fixup v0 PCIe capabilities

Intel 82599 VFs report a PCIe capability version of 0, which is
invalid.  The earliest version of the PCIe spec used version 1.  This
causes Windows to fail startup on the device and it will be disabled
with error code 10.  Our choices are either to drop the PCIe cap on
such devices, which has the side effect of likely preventing the guest
from discovering any extended capabilities, or performing a fixup to
update the capability to the earliest valid version.  This implements
the latter.
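
The fixup itself is a bit-field rewrite. In the PCI Express
Capabilities register (the 16-bit register at offset 2 of the PCIe
capability), bits 3:0 hold the capability version. A minimal sketch;
the helper name pcie_flags_fixup_v0 and the local mask constant are
hypothetical, not the vfio/pci.c code.

```c
#include <assert.h>
#include <stdint.h>

#define PCIE_FLAGS_VERS_MASK 0x000f /* capability version, bits 3:0 */

/* If a device reports the invalid version 0, rewrite it to version 1
 * and leave all other bits of the register untouched. */
static uint16_t pcie_flags_fixup_v0(uint16_t flags)
{
    if ((flags & PCIE_FLAGS_VERS_MASK) == 0) {
        flags = (uint16_t)((flags & ~PCIE_FLAGS_VERS_MASK) | 1);
    }
    return flags;
}
```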

Signed-off-by: Alex Williamson <[email protected]>

vfio: Test realized when using VFIOGroup.device_list iterator

VFIOGroup.device_list is effectively our reference tracking mechanism
such that we can tear down a group when all of the device references
are removed.  However, we also use this list from our machine reset
handler for processing resets that affect multiple devices.  Generally
device removals are fully processed (exitfn + finalize) when this
reset handler is invoked; however, if the removal is triggered via
another reset handler (piix4_reset->acpi_pcihp_reset) then the device
exitfn may run, but not finalize.  In this case we hit asserts when
we start trying to access PCI helpers since much of the PCI state of
the device is released.  To resolve this, add a pointer to the Object
DeviceState in our common base-device and skip non-realized devices
as we iterate.
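
A reduced model of the iteration: each list entry carries a pointer
back to its owning device state, and the reset loop skips entries
whose device has been unrealized but not yet finalized. The types and
group_reset below are hypothetical stand-ins for VFIOGroup, VFIODevice
and DeviceState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct FakeDeviceState {
    bool realized;
} FakeDeviceState;

typedef struct FakeVFIODevice {
    FakeDeviceState *dev;         /* back-pointer to the owning device */
    struct FakeVFIODevice *next;  /* next entry in the group list */
    int resets;                   /* how many times reset was applied */
} FakeVFIODevice;

/* Reset every device in the list, skipping those whose exitfn has run
 * but which are still linked because finalize has not happened yet.
 * Returns the number of devices reset. */
static int group_reset(FakeVFIODevice *head)
{
    int count = 0;

    for (FakeVFIODevice *d = head; d; d = d->next) {
        if (!d->dev->realized) {
            continue;
        }
        d->resets++;
        count++;
    }
    return count;
}
```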

Signed-off-by: Alex Williamson <[email protected]>

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-07-10-v2' into staging

nbd patches for 2017-07-10

- Eric Blake: MAINTAINERS: Promote NBD to supported, with new maintainer
- Vladimir Sementsov-Ogievskiy: [00/10] nbd refactoring part 2

# gpg: Signature made Mon 10 Jul 2017 15:59:18 BST
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <[email protected]>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-07-10-v2:
  nbd: use generic trace subsystem instead of TRACE macro
  nbd: refactor tracing
  nbd/server: rename clientflags var in nbd_negotiate_options
  nbd/server: fix TRACE in nbd_negotiate_send_rep_len
  nbd/client: refactor TRACE of NBD_MAGIC
  nbd/common: nbd_tls_handshake: remove extra TRACE
  nbd/server: add errp to nbd_send_reply()
  nbd/server: use errp instead of LOG
  nbd/server: refactor nbd_negotiate
  nbd/server: nbd_negotiate: return 1 on NBD_OPT_ABORT
  MAINTAINERS: Promote NBD to supported, with new maintainer

Signed-off-by: Peter Maydell <[email protected]>

nbd: use generic trace subsystem instead of TRACE macro

Let NBD use the trace mechanisms already present in qemu. Now you can
use the -trace option of qemu, or the -T/--trace option of qemu-img,
qemu-io, and qemu-nbd, to select nbd traces. For qemu, the QMP commands
trace-event-{get,set}-state can also toggle tracing on the fly.

Example:
qemu-nbd --trace 'nbd_*' <image file> # enables all nbd traces

Recompilation with CFLAGS=-DDEBUG_NBD is no longer needed; furthermore,
the DEBUG_NBD macro is removed from the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
[eblake: minor tweaks to a couple of traces]
Signed-off-by: Eric Blake <[email protected]>

nbd: refactor tracing

Reorganize traces: move, reword, add information, drop extra ones.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: rename clientflags var in nbd_negotiate_options

Rename 'clientflags' to just 'option'. This variable has nothing to do
with flags, but is a single integer representing the option requested
by the client.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: fix TRACE in nbd_negotiate_send_rep_len

Fix wrong order of TRACE arguments.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/client: refactor TRACE of NBD_MAGIC

We are going to switch from the TRACE macro to trace points, and
this TRACE complicates things; this patch simplifies it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/common: nbd_tls_handshake: remove extra TRACE

The error is propagated to the caller, so the TRACE is not needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: add errp to nbd_send_reply()

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: use errp instead of LOG

Move to the modern errp scheme instead of just LOGging errors.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: refactor nbd_negotiate

Combine two successive "if (oldStyle) {...} else {...}" into one.

The block "if (client->tlscreds)" under "if (oldStyle)" is unreachable,
as we have "oldStyle = client->exp != NULL && !client->tlscreds;".
So, delete this block.
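
The reachability argument can be checked exhaustively: with oldStyle
defined as above, no combination of inputs makes both oldStyle and
tlscreds true. The sketch models the two conditions as booleans;
has_export and has_tlscreds are hypothetical stand-ins for
client->exp != NULL and client->tlscreds.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors: oldStyle = client->exp != NULL && !client->tlscreds; */
static bool old_style(bool has_export, bool has_tlscreds)
{
    return has_export && !has_tlscreds;
}

/* Returns true if a "tlscreds" check nested inside the oldStyle
 * branch could ever be taken for any input combination. */
static bool tls_block_reachable(void)
{
    for (int e = 0; e < 2; e++) {
        for (int t = 0; t < 2; t++) {
            if (old_style(e, t) && t) {
                return true;
            }
        }
    }
    return false;
}
```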

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd/server: nbd_negotiate: return 1 on NBD_OPT_ABORT

Separate the case when a client sends NBD_OPT_ABORT from all other
errors. It will be needed for the following patch, where errors will be
reported.
This particular case is not actually an error: it legitimately follows
the NBD protocol. Therefore it should not be reported as an error.
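
A sketch of how a caller can act on the three-way result, assuming
negative values for errors, 1 for a client that cleanly sent
NBD_OPT_ABORT, and 0 for a completed negotiation. The enum and
handle_negotiate_result are hypothetical, not the nbd/server.c code.

```c
#include <assert.h>

typedef enum {
    NEGOTIATE_REPORT_ERROR, /* genuine failure: report it */
    NEGOTIATE_QUIET_CLOSE,  /* client aborted as the protocol allows */
    NEGOTIATE_CONTINUE,     /* negotiation succeeded */
} NegotiateAction;

/* Map a negotiate return value onto what the caller should do:
 * ret < 0 is an error, ret == 1 is NBD_OPT_ABORT, ret == 0 is success. */
static NegotiateAction handle_negotiate_result(int ret)
{
    if (ret < 0) {
        return NEGOTIATE_REPORT_ERROR;
    }
    if (ret == 1) {
        return NEGOTIATE_QUIET_CLOSE;
    }
    return NEGOTIATE_CONTINUE;
}
```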

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20170707152918 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

MAINTAINERS: Promote NBD to supported, with new maintainer

We are promising more than just odd fixes, and Paolo is hoping
to offload the pull requests to me. Also, enough of NBD is related
to the block layer that it is worth including qemu-block on patches.

While at it, include blockdev-nbd.c and qemu-nbd.texi in the set
of maintained files.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20170707182151 [email protected]>
Acked-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 10 Jul 2017 12:26:44 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (40 commits)
  block: Make bdrv_is_allocated_above() byte-based
  block: Minimize raw use of bds->total_sectors
  block: Make bdrv_is_allocated() byte-based
  backup: Switch backup_run() to byte-based
  backup: Switch backup_do_cow() to byte-based
  backup: Switch block_backup.h to byte-based
  backup: Switch BackupBlockJob to byte-based
  block: Drop unused bdrv_round_sectors_to_clusters()
  mirror: Switch mirror_iteration() to byte-based
  mirror: Switch mirror_do_read() to byte-based
  mirror: Switch mirror_cow_align() to byte-based
  mirror: Update signature of mirror_clip_sectors()
  mirror: Switch mirror_do_zero_or_discard() to byte-based
  mirror: Switch MirrorBlockJob to byte-based
  commit: Switch commit_run() to byte-based
  commit: Switch commit_populate() to byte-based
  stream: Switch stream_run() to byte-based
  stream: Drop reached_end for stream_complete()
  stream: Switch stream_populate() to byte-based
  trace: Show blockjob actions via bytes, not sectors
  ...

Signed-off-by: Peter Maydell <[email protected]>

block: Make bdrv_is_allocated_above() byte-based

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.
Leave comments where we can further simplify by switching to
byte-based iterations, once later patches eliminate the need for
sector-aligned operations.
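
The "scaling at the callers followed by inverse scaling" pattern can be
sketched as a byte-based wrapper around a sector-based query that still
asserts sector alignment. sector_is_allocated is a hypothetical stand-in
for the old interface that treats sectors [0, alloc_end) as allocated;
BDRV_SECTOR_SIZE is redefined locally as 512 so the sketch stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512

/* Hypothetical sector-based query: sectors [0, alloc_end) are
 * allocated. *pnum gets the length of the run sharing sector_num's
 * state, capped at nb_sectors. */
static bool sector_is_allocated(int64_t alloc_end, int64_t sector_num,
                                int64_t nb_sectors, int64_t *pnum)
{
    bool allocated = sector_num < alloc_end;
    int64_t run = allocated ? alloc_end - sector_num : nb_sectors;

    *pnum = run < nb_sectors ? run : nb_sectors;
    return allocated;
}

/* Byte-based wrapper: scale bytes down to sectors, call the old code,
 * then scale the answer back up. Inputs must still be sector aligned. */
static bool byte_is_allocated(int64_t alloc_end, int64_t offset,
                              int64_t bytes, int64_t *pnum)
{
    int64_t nsec;
    bool ret;

    assert(offset % BDRV_SECTOR_SIZE == 0);
    assert(bytes % BDRV_SECTOR_SIZE == 0);
    ret = sector_is_allocated(alloc_end, offset / BDRV_SECTOR_SIZE,
                              bytes / BDRV_SECTOR_SIZE, &nsec);
    *pnum = nsec * BDRV_SECTOR_SIZE;
    return ret;
}
```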

For ease of review, bdrv_is_allocated() was tackled separately.

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Minimize raw use of bds->total_sectors

bdrv_is_allocated_above() was relying on intermediate->total_sectors,
which is a field that can have stale contents depending on the value
of intermediate->has_variable_length. An audit shows that we are safe
(we were first calling through bdrv_co_get_block_status() which in
turn calls bdrv_nb_sectors() and therefore just refreshed the current
length), but it's nicer to favor our accessor functions to avoid having
to repeat such an audit, even if it means refresh_total_sectors() is
called more frequently.

Suggested-by: John Snow <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Manos Pitsidianakis <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Make bdrv_is_allocated() byte-based

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned
on input and that *pnum is sector-aligned on return to the caller,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, this code adds usages like
DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
values, where the call might reasonably give non-aligned results
in the future; on the other hand, no rounding is needed for callers
that should just continue to work with byte alignment.
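
The rounding for callers that still want sectors is a ceiling
division, so that a trailing partial sector is still counted. The
DIV_ROUND_UP macro below follows the usual ((n) + (d) - 1) / (d)
definition and is reproduced locally so the sketch stands alone;
bytes_to_sectors_ceil is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Convert a byte count returned via *pnum into sectors, rounding up
 * so that a partial sector at the end is still covered. */
static int64_t bytes_to_sectors_ceil(int64_t bytes)
{
    return DIV_ROUND_UP(bytes, BDRV_SECTOR_SIZE);
}
```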

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_is_allocated().  But
some code, particularly bdrv_commit(), gets a lot simpler because it
no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset.  Leave comments where we can further
simplify once a later patch eliminates the need for sector-aligned
requests through bdrv_is_allocated().

For ease of review, bdrv_is_allocated_above() will be tackled
separately.

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Switch backup_run() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the internal
loop iteration of backups to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are cluster-aligned).
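
A sketch of such a byte-based loop, assuming a fixed cluster size and a
hypothetical per-cluster callback: the loop variable is a byte offset
advancing in cluster-sized steps, so every offset stays cluster aligned
without any sector arithmetic. run_by_bytes and count_bytes are
hypothetical, not backup.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cluster action; returns 0 on success. */
typedef int (*CopyClusterFn)(int64_t offset, int64_t bytes, void *opaque);

/* Walk [0, len) in cluster_size steps, tracking progress in bytes.
 * The last step may be short if len is not a cluster multiple. */
static int run_by_bytes(int64_t len, int64_t cluster_size,
                        CopyClusterFn fn, void *opaque)
{
    for (int64_t offset = 0; offset < len; offset += cluster_size) {
        int64_t bytes = len - offset < cluster_size ? len - offset
                                                    : cluster_size;
        int ret;

        assert(offset % cluster_size == 0);
        ret = fn(offset, bytes, opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: sums the bytes it was asked to copy. */
static int count_bytes(int64_t offset, int64_t bytes, void *opaque)
{
    (void)offset;
    *(int64_t *)opaque += bytes;
    return 0;
}
```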

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Switch backup_do_cow() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Switch block_backup.h to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Continue by converting
the public interface to backup jobs (no semantic change), including
a change to CowRequest to track by bytes instead of cluster indices.

Note that this does not change the difference between the public
interface (starting point, and size of the subsequent range) and
the internal interface (starting and end points).
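
The two range conventions differ only in how a range is expressed. A
minimal sketch converting a public (offset, length) pair into an
internal half-open [start, end) form and testing two ranges for
overlap; ByteRange, range_from_offset_len and ranges_overlap are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal form: half-open byte range [start, end). */
typedef struct ByteRange {
    int64_t start;
    int64_t end;
} ByteRange;

/* Public interface: starting offset plus length of the range. */
static ByteRange range_from_offset_len(int64_t offset, int64_t bytes)
{
    ByteRange r;

    assert(bytes >= 0);
    r.start = offset;
    r.end = offset + bytes;
    return r;
}

/* Two ranges conflict if their half-open intervals intersect. */
static bool ranges_overlap(ByteRange a, ByteRange b)
{
    return a.start < b.end && b.start < a.end;
}
```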

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Xie Changlong <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

backup: Switch BackupBlockJob to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Continue by converting an
internal structure (no semantic change), and all references to
tracking progress. Drop a redundant local variable bytes_per_cluster.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Drop unused bdrv_round_sectors_to_clusters()

Now that the last user [mirror_iteration()] has converted to using
bytes, we no longer need a function to round sectors to clusters.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch mirror_iteration() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity). Drop
the now-unused mirror_clip_sectors().

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch mirror_do_read() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function, preserving all existing semantics, and adding one more
assertion that things are still sector-aligned (so that conversions
to sectors in mirror_read_complete don't need to round).

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch mirror_cow_align() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors(). Some of the conversion is
a bit tricky, requiring temporaries to convert between units; it
will be cleared up in a following patch.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Update signature of mirror_clip_sectors()

Rather than having a void function that modifies its input
in-place as the output, change the signature to reduce a layer
of indirection and return the result.
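The signature change can be sketched with a hypothetical clipping helper that limits a request so it does not run past the end of a device of a given length. Neither function is the actual mirror code. They only contrast the in-place pointer style with the return-value style.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old style (hypothetical): modify the count in place through a pointer. */
static void clip_count_inplace(int64_t length, int64_t offset, int *count)
{
    *count = (int)MIN((int64_t)*count, length - offset);
}

/* New style (hypothetical): take the count by value and return the result. */
static int clip_count(int64_t length, int64_t offset, int count)
{
    return (int)MIN((int64_t)count, length - offset);
}
```

With the return-value form, a caller can write `n = clip_count(len, off, n);` without taking the address of a temporary.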

Suggested-by: John Snow <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch mirror_do_zero_or_discard() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch MirrorBlockJob to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
(necessary for interaction with sector-based dirty bitmaps, until
a later patch converts those to be byte-based) does not suffer from
truncation problems.
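The truncation concern can be illustrated with a sketch. It assumes 512-byte sectors, and the helper names are hypothetical rather than taken from mirror.c. Shifting a byte granularity right by the sector bits and back loses information only when the granularity is not a multiple of the sector size, so the assertion compares the round trip with the original value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* assumed 512-byte sectors */

/* True if converting 'granularity' (bytes) to sectors loses nothing. */
static bool granularity_is_sector_aligned(uint64_t granularity)
{
    return ((granularity >> SECTOR_BITS) << SECTOR_BITS) == granularity;
}

/* Convert bytes to sectors, asserting that no truncation happens. */
static uint64_t granularity_to_sectors(uint64_t granularity)
{
    assert(granularity_is_sector_aligned(granularity));
    return granularity >> SECTOR_BITS;
}
```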

[checkpatch has a false positive on use of MIN() in this patch]

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

commit: Switch commit_run() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the internal
loop iteration of committing to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

commit: Switch commit_populate() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

stream: Switch stream_run() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the internal
loop iteration of streaming to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

stream: Drop reached_end for stream_complete()

stream_complete() skips the work of rewriting the backing file if
the job was cancelled, if data->reached_end is false, or if there
was an error detected (non-zero data->ret) during the streaming.
But note that in stream_run(), data->reached_end is only set if the
loop ran to completion, and data->ret is only 0 in two cases:
either the loop ran to completion (possibly by cancellation, but
stream_complete checks for that), or we took an early goto out
because there is no bs->backing. Thus, we can preserve the same
semantics without the use of reached_end, by merely checking for
bs->backing (and logically, if there was no backing file, streaming
is a no-op, so there is no backing file to rewrite).

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

stream: Switch stream_populate() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

trace: Show blockjob actions via bytes, not sectors

Upcoming patches are going to switch to byte-based interfaces
instead of sector-based. Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.

The trace interface is low enough that there are no stability
guarantees, and therefore nothing wrong with changing our units,
even in cases like trace_backup_do_cow_skip() where we are not
changing the trace output. So make the tracing uniformly use
bytes.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blockjob: Track job ratelimits via bytes, not sectors

The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
cleaned up later when functions are converted to iterate over
images by bytes rather than by sectors.
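To show why keeping the limit in bytes is natural when the user supplies bytes/second, here is a hypothetical slice-based limiter. It is not QEMU's ratelimit implementation. It converts the per-second limit into a per-slice byte quota and returns a delay once a slice's quota is exceeded, and no sector scaling appears anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000LL

/* Hypothetical limiter: a byte quota per fixed-length time slice. */
typedef struct SimpleRateLimit {
    int64_t slice_ns;       /* length of one accounting slice */
    uint64_t slice_quota;   /* bytes allowed per slice */
    int64_t slice_end_ns;   /* end of the current slice */
    uint64_t dispatched;    /* bytes dispatched in the current slice */
} SimpleRateLimit;

static void rl_set_speed(SimpleRateLimit *rl, uint64_t bytes_per_sec,
                         int64_t slice_ns)
{
    rl->slice_ns = slice_ns;
    rl->slice_quota = bytes_per_sec * slice_ns / NS_PER_SEC;
    rl->slice_end_ns = 0;
    rl->dispatched = 0;
}

/* Account for 'bytes' at time now_ns; return how long to sleep, in ns. */
static int64_t rl_delay(SimpleRateLimit *rl, int64_t now_ns, uint64_t bytes)
{
    if (now_ns >= rl->slice_end_ns) {
        /* Start a fresh slice. */
        rl->slice_end_ns = now_ns + rl->slice_ns;
        rl->dispatched = 0;
    }
    rl->dispatched += bytes;
    if (rl->dispatched <= rl->slice_quota) {
        return 0;
    }
    /* Over quota: wait until the current slice ends. */
    return rl->slice_end_ns - now_ns;
}
```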

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blockdev: Print a warning for legacy drive options that belong to -device

We likely do not want to carry these legacy -drive options along forever.
Let's emit a deprecation warning for the -drive options that have a
replacement with the -device option, so that the (hopefully few) remaining
users are aware of this and can adapt their scripts / behaviour accordingly.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-img: drop -e and -6 options from the 'create' & 'convert' commands

The '-e' and '-6' options to the 'create' & 'convert' commands were
"deprecated" in favour of the more generic '-o' option many years ago:

  commit eec77d9e712bd4157a4e1c0b5a9249d168add738
  Author: Jes Sorensen <[email protected]>
  Date:   Tue Dec 7 17:44:34 2010 +0100

    qemu-img: Deprecate obsolete -6 and -e options

Except this was never actually a deprecation, which would imply giving
the user a warning while the functionality continues to work for a
number of releases before eventual removal. Instead the options were
immediately turned into an error + exit. Given that the functionality
is already broken, there's no point in keeping these pseudo-deprecation
messages around any longer.

Signed-off-by: Daniel P. Berrange <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: change OEM name to 'MSWIN4.1'

According to the specification:
"'MSWIN4.1' is the recommended setting, because it is the setting least likely
to cause compatibility problems. If you want to put something else in here,
that is your option, but the result may be that some FAT drivers might not
recognize the volume."

Specification: "FAT: General overview of on-disk format" v1.03, page 9
Signed-off-by: Hervé Poussineau <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: handle KANJI lead byte 0xe5

Specification: "FAT: General overview of on-disk format" v1.03, page 23
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: limit number of entries in root directory in FAT12/FAT16

The FAT12/FAT16 root directory is two sectors in size, which allows only 512 directory entries.
Prevent QEMU startup if too many files exist, instead of overflowing the root directory.

Also introduce the variable root_entries, which will be required for FAT32.

Fixes: https://bugs.launchpad.net/qemu/+bug/1599539/comments/4
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: correctly generate numeric-tail of short file names

More specifically:
- try without a numeric tail only if the LFN contained no characters that are invalid in short names
- start at ~1 (instead of ~0)
- handle numeric tails longer than one digit (i.e. 10 and above)

Windows 9x Scandisk no longer reports mismatches between short file names and
long file names for non-ASCII filenames.
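The numeric-tail rule can be sketched as follows. apply_numeric_tail() is a hypothetical helper, not the vvfat code. It assumes the basis is an already upper-cased base name of at most 8 characters without extension. The basis is truncated just enough that basis plus "~n" still fits in 8 characters, so multi-digit tails such as "~10" eat one more basis character. The logic that decides whether to try a name without a tail at all is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a short-name base of at most 8 chars ending in "~n".
 * 'basis' is an upper-cased base name without extension; 'out' must
 * hold 9 bytes. Returns 0 on success, -1 if n is 0 or too large. */
static int apply_numeric_tail(const char *basis, unsigned n, char out[9])
{
    char tail[9];
    int tail_len;
    size_t keep;

    if (n == 0) {
        return -1;              /* numeric tails start at ~1 */
    }
    tail_len = snprintf(tail, sizeof(tail), "~%u", n);
    if (tail_len < 0 || tail_len > 7) {
        return -1;              /* leave room for at least one basis char */
    }
    keep = strlen(basis);
    if (keep > (size_t)(8 - tail_len)) {
        keep = (size_t)(8 - tail_len);   /* truncate basis to make room */
    }
    memcpy(out, basis, keep);
    memcpy(out + keep, tail, (size_t)tail_len + 1);   /* includes NUL */
    return 0;
}
```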

Specification: "FAT: General overview of on-disk format" v1.03, page 31
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: correctly create base short names for non-ASCII filenames

More specifically, create the short name from the filename, and change the blacklist
of invalid chars to a whitelist of valid chars.

Windows 9x now also correctly sees long file names of filenames containing a space,
but Scandisk still complains about a mismatch between SFN and LFN.

[kwolf: Build fix for this intermediate patch (it included declarations
for variables that are only used in the next patch) ]

Specification: "FAT: General overview of on-disk format" v1.03, pages 30-31
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: correctly create long names for non-ASCII filenames

Assume that the input filename is encoded as UTF-8, and correctly convert it to UTF-16.
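For illustration, here is a hypothetical hand-written UTF-8 to UTF-16 converter. The real patch may rely on a library routine instead, for example GLib's g_utf8_to_utf16(). The sketch rejects malformed, overlong and surrogate sequences. It encodes code points above U+FFFF as surrogate pairs, which is what long-file-name entries need for characters outside the BMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert UTF-8 in src to UTF-16 code units in dst (capacity dst_cap).
 * Returns the number of code units written, or -1 on invalid input or
 * if dst is too small. */
static long utf8_to_utf16(const unsigned char *src, size_t src_len,
                          uint16_t *dst, size_t dst_cap)
{
    /* Smallest code point that may legally use 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0, out = 0;

    while (i < src_len) {
        unsigned char c = src[i];
        uint32_t cp;
        size_t extra;

        if (c < 0x80) {
            cp = c; extra = 0;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F; extra = 1;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F; extra = 2;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07; extra = 3;
        } else {
            return -1;                  /* invalid lead byte */
        }
        if (src_len - i <= extra) {
            return -1;                  /* truncated sequence */
        }
        for (size_t k = 1; k <= extra; k++) {
            if ((src[i + k] & 0xC0) != 0x80) {
                return -1;              /* not a continuation byte */
            }
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return -1;                  /* overlong, too large, or surrogate */
        }
        i += extra + 1;

        if (cp < 0x10000) {
            if (out + 1 > dst_cap) {
                return -1;
            }
            dst[out++] = (uint16_t)cp;
        } else {
            /* Encode as a UTF-16 surrogate pair. */
            if (out + 2 > dst_cap) {
                return -1;
            }
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)out;
}
```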

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: always create . and .. entries at first and in that order

readdir() doesn't always return the . and .. entries first, or in that order.
As a result they were not always created first in the directory, which causes
errors in file system checking utilities such as MS-DOS Scandisk.
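One way to enforce the required order is sketched below. put_dot_entries_first() is hypothetical, and the actual vvfat change may instead create the two entries explicitly and skip them when reading the host directory. The sketch takes the names in whatever order readdir() produced them, moves "." to slot 0 and ".." to slot 1, and keeps the relative order of everything else.

```c
#include <assert.h>
#include <string.h>

/* Reorder names so that "." is first and ".." second, preserving the
 * relative order of all other entries. Returns 0, or -1 if either
 * entry is missing. */
static int put_dot_entries_first(const char **names, size_t count)
{
    const char *wanted[2] = { ".", ".." };

    for (size_t slot = 0; slot < 2; slot++) {
        size_t j;
        for (j = slot; j < count; j++) {
            if (strcmp(names[j], wanted[slot]) == 0) {
                break;
            }
        }
        if (j == count) {
            return -1;
        }
        /* Shift entries [slot, j) up by one and place the match at slot. */
        const char *found = names[j];
        memmove(&names[slot + 1], &names[slot], (j - slot) * sizeof(names[0]));
        names[slot] = found;
    }
    return 0;
}
```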

Specification: "FAT: General overview of on-disk format" v1.03, page 25

Fixes: https://bugs.launchpad.net/qemu/+bug/1599539
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: fix field names in FAT12/FAT16 and FAT32 boot sectors

Specification: "FAT: General overview of on-disk format" v1.03, pages 11-13
Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: introduce offset_to_bootsector, offset_to_fat and offset_to_root_dir

- offset_to_bootsector is the number of sectors up to FAT bootsector
- offset_to_fat is the number of sectors up to first File Allocation Table
- offset_to_root_dir is the number of sectors up to root directory sector

Replace first_sectors_number - 1 by offset_to_bootsector.
Replace first_sectors_number by offset_to_fat.
Replace faked_sectors by offset_to_root_dir.

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: rename useless enumeration values

MODE_FAKED and MODE_RENAMED are not and were never used.

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: fix typos

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: replace tabs by 8 spaces

This was a complete mess. Of 2299 indented lines:
- 1329 used spaces only
- 617 used tabs only
- 353 used both spaces and tabs
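The three categories in the tally above (1329 + 617 + 353 = 2299) can be reproduced per line with a small classifier. It is a hypothetical sketch, not a tool used for the patch. It inspects only the leading whitespace of each line.

```c
#include <assert.h>

enum indent_kind { INDENT_NONE, INDENT_SPACES, INDENT_TABS, INDENT_MIXED };

/* Classify the leading whitespace of one line. */
static enum indent_kind classify_indent(const char *line)
{
    int spaces = 0, tabs = 0;

    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == ' ') {
            spaces = 1;
        } else {
            tabs = 1;
        }
    }
    if (spaces && tabs) {
        return INDENT_MIXED;
    }
    if (tabs) {
        return INDENT_TABS;
    }
    return spaces ? INDENT_SPACES : INDENT_NONE;
}
```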

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vvfat: fix qemu-img map and qemu-img convert

- bs->total_sectors is the number of sectors of the whole disk
- s->sector_count is the number of sectors of the FAT partition

This fixes the following assert in qemu-img map:
qemu-img.c:2641: get_block_status: Assertion `nb_sectors' failed.

This also fixes an infinite loop in qemu-img convert.

Fixes: 4480e0f924a42e1db8b8cfcac4d0634dd1bb27a0
Fixes: https://bugs.launchpad.net/qemu/+bug/1599539
Cc: [email protected]
Signed-off-by: Hervé Poussineau <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blkdebug: Support .bdrv_co_get_block_status

Without a passthrough status of BDRV_BLOCK_RAW, anything wrapped by
blkdebug appears 100% allocated as data. It is better to treat it the
same as the underlying file being wrapped.

Update iotest 177 for the new expected output.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Simplify use of BDRV_BLOCK_RAW

The lone caller that cares about a return of BDRV_BLOCK_RAW
(namely, io.c:bdrv_co_get_block_status) completely replaces the
return value, so there is no point in passing BDRV_BLOCK_DATA.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Guarantee that *file is set on bdrv_get_block_status()

We document that *file is valid if the return is not an error and
includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract
when a driver (such as blkdebug) lacks a callback.  This broke in
commit 67a0fd2 (v2.6), when we added the file parameter.

Enhance qemu-iotest 177 to cover this, using a sequence that would
print garbage or even SEGV, because it was dereferencing through
uninitialized memory.  [The resulting test output shows that we
have less-than-ideal block status from the blkdebug driver, but
that's a separate fix coming up soon.]

Setting *file on all paths that return BDRV_BLOCK_OFFSET_VALID is
enough to fix the crash, but we can go one step further: always
setting *file, even on error, means that a broken caller that
blindly dereferences file without checking for error is now more
likely to get a reliable SEGV instead of randomly acting on garbage,
making it easier to diagnose such buggy callers.  Adding an
assertion that file is set where expected doesn't hurt either.
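The "always set the output pointer" pattern can be sketched with hypothetical types and names: Node, query_status() and STATUS_OFFSET_VALID are not QEMU's block layer API. *file is written before any early return, so a caller that wrongly ignores the error gets a reliable NULL rather than stale stack garbage. The assertion documents the contract on the success path.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OFFSET_VALID 0x4   /* hypothetical flag bit */

typedef struct Node {
    struct Node *child;           /* hypothetical wrapped node, or NULL */
} Node;

/* Hypothetical status query: *file is set on every path, even on error. */
static int query_status(Node *bs, int simulate_error, Node **file)
{
    *file = NULL;                 /* written first, before any early return */
    if (simulate_error) {
        return -1;
    }
    *file = bs->child ? bs->child : bs;
    assert(*file != NULL);        /* OFFSET_VALID implies *file is set */
    return STATUS_OFFSET_VALID;
}
```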

CC: [email protected]
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-io: Don't die on second open

Most callback commands in qemu-io return 0 to keep the interpreter
loop running, or 1 to quit immediately.  However, open_f() just
passed through the return value of openfile(), which has different
semantics of returning 0 if a file was opened, or 1 on any failure.

As a result of mixing the return semantics, we are forcing the
qemu-io interpreter to exit early on any failures, which is rather
annoying when some of the failures are obviously trying to give
the user a hint of how to proceed (if we didn't then kill qemu-io
out from under the user's feet):

$ qemu-io
qemu-io> open foo
qemu-io> open foo
file open already, try 'help close'
$ echo $?
0

In general, we WANT openfile() to report failures, since it is the
function used in the form 'qemu-io -c "$something" no_such_file'
for performing one or more -c options on a single file, and it is
not worth attempting $something if the file itself cannot be opened.
So the solution is to fix open_f() to always return 0 (when we are
in interactive mode, even failure to open should not end the
session), and save the return value of openfile() for command line
use in main().
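The two return conventions can be contrasted in a sketch. openfile_sketch() and open_f_sketch() are hypothetical stand-ins, not the qemu-io functions. The file-opening helper reports success or failure, and main() could use that result for its exit status. The interactive command always returns 0 so that the interpreter loop keeps running after a failed open.

```c
#include <assert.h>
#include <string.h>

static int open_count;

/* Stand-in for openfile(): 0 if a file was opened, 1 on any failure. */
static int openfile_sketch(const char *name)
{
    if (open_count > 0 || strcmp(name, "no_such_file") == 0) {
        return 1;                 /* already open, or cannot be opened */
    }
    open_count++;
    return 0;
}

/* Interactive command: 0 keeps the interpreter running, 1 would quit.
 * A failed open is reported but must not end the session. */
static int open_f_sketch(const char *name)
{
    (void)openfile_sketch(name);
    return 0;
}
```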

Note, however, that we do have some qemu-iotests that do 'qemu-io
-c "open file" -c "$something"'; such tests will now proceed to
attempt $something whether or not the open succeeded, the same way
as if the two commands had been attempted in interactive mode.  As
such, the expected output for those tests has to be modified.  But it
also means that it is now possible to use -c close and have a single
qemu-io command line operate on more than one file even without
using interactive mode.  Although the '-c open' action is a subtle
change in behavior, remember that qemu-io is for debugging purposes,
so as long as it serves the needs of qemu-iotests while still being
reasonable for interactive use, it should not be a problem that we
are changing tests to the new behavior.

This has been awkward since at least as far back as commit
e3aff4f, in 2009.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170709' into staging

Queued TCG patches

# gpg: Signature made Mon 10 Jul 2017 08:31:44 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-tcg-20170709:
  tcg/mips: Bugfix for crash when running program with qemu-i386.
  util/cacheinfo: Fix warning generated by clang
  tcg/aarch64: Enable indirect jump path using LDR (literal)
  tcg/aarch64: Use ADRP+ADD to compute target address
  tcg/aarch64: Introduce and use long branch to register

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170707-tag' into staging

Xen 2017/07/07

# gpg: Signature made Fri 07 Jul 2017 19:21:22 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg:                 aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20170707-tag:
  xen/pt: Fixup addr validation in xen_pt_pci_config_access_check
  xen-platform: Cleanup network infrastructure when emulated NICs are unplugged
  xenfb: remove xen_init_display "temporary" hack

Signed-off-by: Peter Maydell <[email protected]>

tcg/mips: Bugfix for crash when running program with qemu-i386.

When running a hello-world program with qemu-i386 in linux-user
mode on a Loongson 3A3000, QEMU crashes. This patch fixes the bug.

Signed-off-by: Jiang Biao <[email protected]>
Message-Id: <1499669979 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

util/cacheinfo: Fix warning generated by clang

Clang generates the following warning on aarch64 host:

  CC      util/cacheinfo.o
/home/pranith/qemu/util/cacheinfo.c:121:48: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
        asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
                                               ^
/home/pranith/qemu/util/cacheinfo.c:121:28: note: use constraint modifier "w"
        asm volatile("mrs\t%0, ctr_el0" : "=r"(ctr));
                           ^~
                           %w0

The constraint modifier 'w' is not (yet?) accepted by gcc, so fix this by increasing the size of ctr instead.

Tested-by: Emilio G. Cota <[email protected]>
Reviewed-by: Emilio G. Cota <[email protected]>
Signed-off-by: Pranith Kumar <[email protected]>
Message-Id: <20170630153946 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg/aarch64: Enable indirect jump path using LDR (literal)

This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which of the
two paths performs better.

CC: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Pranith Kumar <[email protected]>
Message-Id: <20170630143614 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg/aarch64: Use ADRP+ADD to compute target address

We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction, which is used to align the above
instruction pair so that a single atomic instruction can patch
the destination offsets.
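The address split can be sketched as follows, with helper names that are hypothetical and not taken from tcg/aarch64. ADRP adds a signed 21-bit page count (4 KiB pages, about +/-4 GiB) to the ADRP instruction's own page address, and ADD supplies the low 12 bits. The pair is two 4-byte instructions. Aligning it to 8 bytes, which is what the NOP is for, is assumed here to let one naturally aligned 64-bit store rewrite both at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split 'target' into an ADRP page delta and an ADD low-12-bit immediate,
 * relative to the address 'pc' of the ADRP instruction. */
static bool adrp_add_split(uint64_t pc, uint64_t target,
                           int64_t *page_delta, uint32_t *lo12)
{
    int64_t delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);

    /* ADRP encodes a signed 21-bit page count. */
    if (delta < -(INT64_C(1) << 20) || delta >= (INT64_C(1) << 20)) {
        return false;
    }
    *page_delta = delta;
    *lo12 = (uint32_t)(target & 0xFFF);
    return true;
}

/* The 8-byte instruction pair can be patched with one aligned 64-bit
 * store only if it starts on an 8-byte boundary. */
static bool pair_patchable_atomically(uint64_t pair_addr)
{
    return (pair_addr & 7) == 0;
}
```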

CC: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Pranith Kumar <[email protected]>
Message-Id: <20170630143614 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg/aarch64: Introduce and use long branch to register

We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.
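The 128MB limit comes from the direct B instruction, which encodes a signed 26-bit word offset, i.e. byte offsets in [-2^27, 2^27). Below is a hypothetical sketch of the selection between a direct branch and a branch to register. The helper names are illustrative and not TCG's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* B encodes a signed 26-bit word offset: +/-128 MiB in bytes. */
static bool fits_direct_branch(int64_t offset)
{
    return (offset & 3) == 0 &&
           offset >= -(INT64_C(1) << 27) &&
           offset < (INT64_C(1) << 27);
}

typedef enum { EXIT_VIA_B, EXIT_VIA_BR } ExitKind;

/* Use a direct branch when in range, else load the address and BR. */
static ExitKind choose_exit(uint64_t pc, uint64_t target)
{
    int64_t offset = (int64_t)(target - pc);
    return fits_direct_branch(offset) ? EXIT_VIA_B : EXIT_VIA_BR;
}
```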

CC: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Pranith Kumar <[email protected]>
Message-Id: <20170630143614 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

xen/pt: Fixup addr validation in xen_pt_pci_config_access_check

xen_pt_pci_config_access_check rejects any addr >= 0xFF, but 0xFF is a
valid address and should not be ignored.
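Conventional PCI configuration space is 256 bytes, so valid offsets are 0x00 to 0xFF inclusive, and the correct bound is addr < 0x100 rather than addr < 0xFF. Here is a minimal sketch with a hypothetical function name. The length check is an extra assumption for illustration, not something this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_SPACE_SIZE 0x100   /* conventional PCI: 256 bytes */

/* Accept any offset 0x00..0xFF; also reject accesses that run past the end. */
static bool config_access_ok(uint32_t addr, uint32_t len)
{
    return addr < CONFIG_SPACE_SIZE && len <= CONFIG_SPACE_SIZE - addr;
}
```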

Signed-off-by: Anoob Soman <[email protected]>
Acked-by: Anthony PERARD <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xen-platform: Cleanup network infrastructure when emulated NICs are unplugged

When the guest unplugs the emulated NICs, clean up the peer for each NIC,
as it is no longer needed. Most importantly, this allows the tap
interfaces which QEMU holds open to be closed and removed.

Signed-off-by: Ross Lagerwall <[email protected]>
Acked-by: Anthony PERARD <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xenfb: remove xen_init_display "temporary" hack

Initialize xenfb properly, like all other backends, from its own
"initialise" function.

Remove the dependency of vkbd on vfb: use qemu_console_lookup_by_index
to find the principal console (to get the size of the screen) instead of
relying on a vfb backend to be available (which adds a dependency
between the two).

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>

Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170706' into staging

s390x/kvm/migration: fixes, enhancements and cleanups

- new email address for Cornelia
- Fixes: 3270, flic, virtio-scsi-ccw, ipl
- Enhancements, cpumodel, migration

# gpg: Signature made Thu 06 Jul 2017 08:18:19 BST
# gpg:                using RSA key 0x117BBC80B5A61C7C
# gpg: Good signature from "Christian Borntraeger (IBM) <[email protected]>"
# Primary key fingerprint: F922 9381 A334 08F9 DBAB  FBCA 117B BC80 B5A6 1C7C

* remotes/borntraeger/tags/s390x-20170706:
  hw/s390x/ipl: Fix endianness problem with netboot_start_addr
  virtio-scsi-ccw: use ioeventfd even when KVM is disabled
  s390x: return unavailable features via query-cpu-definitions
  s390x/MAINTAINERS: Update my email address
  s390x: fix realize inheritance for kvm-flic
  s390x: fix error propagation in kvm-flic's realize
  s390x/3270: fix instruction interception handler
  s390x: vmstatify config migration for virtio-ccw

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* qemu-thread portability improvement (Fam)
* virtio-scsi IOMMU fix (Jason)
* poisoning and common-obj-y cleanups (Thomas)
* initial Hypervisor.framework refactoring (Sergio)
* x86 TCG interrupt injection fixes (Wu Xiang, me)
* --disable-tcg support for x86 (Yang Zhong, me)
* various other bugfixes and cleanups (Daniel, Peter, Thomas)

# gpg: Signature made Wed 05 Jul 2017 08:12:56 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (42 commits)
  target/i386: add the CONFIG_TCG into Makefiles
  target/i386: add the tcg_enabled() in target/i386/
  target/i386: move TLB refill function out of helper.c
  target/i386: split cpu_set_mxcsr() and make cpu_set_fpuc() inline
  target/i386: make cpu_get_fp80()/cpu_set_fp80() static
  target/i386: move cpu_sync_bndcs_hflags() function
  tcg: add the CONFIG_TCG into Makefiles
  tcg: add CONFIG_TCG guards in headers
  exec: elide calls to tb_lock and tb_unlock
  tcg: move tb_lock out of translate-all.h
  tcg: add the tcg-stub.c file into accel/stubs/
  vapic: use tcg_enabled
  monitor: disable "info jit" and "info opcount" if !TCG
  tcg: make tcg_allowed global
  cpu: move interrupt handling out of translate-common.c
  tcg: move page_size_init() function
  vl: add tcg_enabled() for tcg related code
  vl: convert -tb-size to qemu_strtoul
  configure: add --disable-tcg configure option
  configure: early test for supported targets
  ...

Signed-off-by: Peter Maydell <[email protected]>

hw/s390x/ipl: Fix endianness problem with netboot_start_addr

The start address has to be stored in big endian byte order
in the iplb.ccw block for the guest.
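Here is a host-independent way to produce the big-endian layout the guest expects. It assumes netboot_start_addr is a 64-bit value, and store_be64() is a hypothetical helper. Inside QEMU, helpers such as cpu_to_be64() would normally be used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Store v into p[0..7] in big-endian order, whatever the host order is. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xFF);   /* least significant byte goes last */
        v >>= 8;
    }
}
```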

Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1499268345 [email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

virtio-scsi-ccw: use ioeventfd even when KVM is disabled

This patch is based on a similar patch from Stefan Hajnoczi -
commit c324fd0a39c ("virtio-pci: use ioeventfd even when KVM is disabled")

Do not check kvm_eventfds_enabled() when KVM is disabled since it
always returns 0. Since commit 8c56c1a592b5092d91da8d8943c17777d6462a6f
("memory: emulate ioeventfd") it has been possible to use ioeventfds in
qtest or TCG mode.

This patch makes -device virtio-scsi-ccw,iothread=iothread0 work even
when KVM is disabled.
We don't yet have an equivalent of "memory: emulate ioeventfd" for ccw,
but this doesn't hurt, and qemu-iotests 068 can pass when the iothread
arguments are skipped.

I have tested that virtio-scsi-ccw works under TCG both with and without
iothread.

This patch fixes qemu-iotests 068, which was accidentally merged early
despite the dependency on ioeventfd.

Signed-off-by: QingFeng Hao <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Message-Id: <20170704132350 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x: return unavailable features via query-cpu-definitions

The response for query-cpu-definitions didn't include the
unavailable-features field, which is used by libvirt to figure
out whether a certain cpu model is usable on the host.

The unavailable features are now computed by obtaining the host CPU
model and comparing it against the known CPU models. The comparison
takes into account the generation, the GA level and the feature
bitmaps. In the case of a CPU generation/GA level mismatch
a feature called "type" is reported to be missing.
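The comparison described above can be sketched with simplified, hypothetical types. The real s390x CPU model code uses its own structures, and the exact ordering rules assumed here for generation and GA level are illustrative. Missing features are the model's bits that the host lacks. A newer generation or GA level is reported as the pseudo-feature "type".

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_WORDS 2   /* hypothetical feature-bitmap size */

typedef struct CpuModelSketch {
    int gen;                    /* CPU generation */
    int ga;                     /* GA level within that generation */
    uint64_t feat[FEAT_WORDS];  /* feature bitmap */
} CpuModelSketch;

/* Return how many features 'model' needs that 'host' lacks, storing
 * them in missing[]. *type_missing is set when the model is a newer
 * generation or GA level than the host ("type" missing). */
static int unavailable_features(const CpuModelSketch *host,
                                const CpuModelSketch *model,
                                uint64_t missing[FEAT_WORDS],
                                int *type_missing)
{
    int count = 0;

    *type_missing = model->gen > host->gen ||
                    (model->gen == host->gen && model->ga > host->ga);
    for (int i = 0; i < FEAT_WORDS; i++) {
        missing[i] = model->feat[i] & ~host->feat[i];
        for (uint64_t w = missing[i]; w; w &= w - 1) {
            count++;            /* count set bits */
        }
    }
    return count + *type_missing;
}
```

A model would be usable='yes' exactly when the returned count is 0.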

As a result, the output of virsh domcapabilities would change
from something like
...
     <mode name='custom' supported='yes'>
      <model usable='unknown'>z10EC-base</model>
      <model usable='unknown'>z9EC-base</model>
      <model usable='unknown'>z196.2-base</model>
      <model usable='unknown'>z900-base</model>
      <model usable='unknown'>z990</model>
...
to
...
     <mode name='custom' supported='yes'>
      <model usable='yes'>z10EC-base</model>
      <model usable='yes'>z9EC-base</model>
      <model usable='no'>z196.2-base</model>
      <model usable='yes'>z900-base</model>
      <model usable='yes'>z990</model>
...

Signed-off-by: Viktor Mihajlovski <[email protected]>
Message-Id: <1499082529 [email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x/MAINTAINERS: Update my email address

Signed-off-by: Cornelia Huck <[email protected]>
Message-Id: <20170704092215 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x: fix realize inheritance for kvm-flic

Commit f6f4ce4211 ("s390x: add property adapter_routes_max_batch",
2016-12-09) introduces a realize function intended to be common to all
flic subclasses, but fails to make sure that kvm-flic, which had its own
realize, actually calls this common realize.

This omission fortunately does not result in a grave problem. The common
realize was only supposed to catch a possible programming mistake by
validating a value of a property set via the compat machine macros. Since
there was no programming mistake we don't need this fixed for stable.

Let's fix this problem by making sure kvm flic honors the realize of its
parent class.

Let us also improve on the error message we would hypothetically emit
when the validation fails.

Signed-off-by: Halil Pasic <[email protected]>
Fixes: f6f4ce4211 ("s390x: add property adapter_routes_max_batch")
Reviewed-by: Dong Jia Shi <[email protected]>
Reviewed-by: Yi Min Zhao <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x: fix error propagation in kvm-flic's realize

Ever since it was introduced by commit a2875e6f98 ("s390x/kvm:
implement floating-interrupt controller device", 2013-07-16), kvm-flic
has not properly failed realize when the KVM device cannot be created,
even though that device serves as its backend and is absolutely
essential for an operational kvm-flic.

Let's fix this by making sure we do proper error propagation in realize.

Signed-off-by: Halil Pasic <[email protected]>
Fixes: a2875e6f98 "s390x/kvm: implement floating-interrupt controller device"
Reviewed-by: Dong Jia Shi <[email protected]>
Reviewed-by: Yi Min Zhao <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x/3270: fix instruction interception handler

Commit bab482d7405f ("s390x/css: ccw translation infrastructure")
introduced instruction interception handlers for different types of
subchannels. Emulated 3270 devices must be assigned the virtual
subchannel handler during device realization; otherwise 3270 will
not work.

Fixes: bab482d7405f ("s390x/css: ccw translation infrastructure")
Reviewed-by: Jing Liu <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>