Git Repo - qemu.git/log

iotests: Add test for concurrent stream/commit

We already have 030 for that in general, but this tests very specific
cases of both jobs finishing concurrently.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

tests: Test mid-drain bdrv_replace_child_noperm()

Add a test for what happens when you call bdrv_replace_child_noperm()
for various drain situations ({old,new} child {drained,not drained}).

Most importantly, if both the old and the new child are drained, the
parent must not be undrained at any point.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

tests: Test polling in bdrv_drop_intermediate()

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Reduce (un)drains when replacing a child

Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.

This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way.  bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.

In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained.  So if anything, we have to drain the
parent before detaching the old child node.  Conversely, we have to
undrain it only after attaching the new child node.

Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node then drain it (if necessary), replace the child node, and then
undrain it.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Keep subtree drained in drop_intermediate

bdrv_drop_intermediate() calls BdrvChildRole.update_filename(). That
may poll, thus changing the graph, which potentially breaks the
QLIST_FOREACH_SAFE() loop.

Just keep the whole subtree drained. This is probably the right thing
to do anyway (dropping nodes while the subtree is not drained seems
wrong).

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Simplify bdrv_filter_default_perms()

The same change as commit 2b23f28639 ('block/copy-on-read: Fix
permissions for inactive node') made for the copy-on-read driver can be
made for bdrv_filter_default_perms(): Retaining the old permissions from
the BdrvChild if it is given complicates things unnecessarily when in
the end this only means that the options set in the c == NULL case (i.e.
during child creation) are retained.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests: Test migration with all kinds of filter nodes

This test case is motivated by commit 2b23f28639 ('block/copy-on-read:
Fix permissions for inactive node'). Instead of just testing
copy-on-read on migration, let's stack all sorts of filter nodes on top
of each other and try if the resulting VM can still migrate
successfully. For good measure, put everything into an iothread, because
why not?

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests: Move migration helpers to iotests.py

234 implements functions that are useful for doing migration between two
VMs. Move them to iotests.py so that other test cases can use them, too.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests/118: Add -blockdev based tests

The code path for -device drive=<node-name> or without a drive=...
option for empty drives, which is supposed to be used with -blockdev
differs enough from the -drive based path with a user-owned
BlockBackend, so we want to test both paths at least for the basic tests
implemented by TestInitiallyFilled and TestInitiallyEmpty.

This would have caught the bug recently fixed for inserting read-only
nodes into a scsi-cd created without a drive=... option.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests/118: Create test classes dynamically

We're getting a ridiculous number of child classes of
TestInitiallyFilled and TestInitiallyEmpty that differ only in a few
attributes that we want to test in all combinations.

Instead of explicitly writing down every combination, let's use a loop
and create those classes dynamically.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

iotests/118: Test media change for scsi-cd

The test covered only floppy and ide-cd. Add scsi-cd as well.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block/nbd: refactor nbd connection parameters

We'll need some connection parameters to be available all the time to
implement nbd reconnect. So, let's refactor them: define additional
parameters in BDRVNBDState, drop them from function parameters, drop
nbd_client_init and separate options parsing instead from nbd_open.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20190618114328 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
[eblake: Drop useless 'if' before object_unref]
Signed-off-by: Eric Blake <[email protected]>

block/nbd: add cmdline and qapi parameter reconnect-delay

Reconnect will be implemented in the following commit, so for now,
in semantics below, disconnect itself is a "serious error".

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190618114328 [email protected]>
[eblake: slipped from 4.1 to 4.2]
Signed-off-by: Eric Blake <[email protected]>

block/nbd: move from quit to state

To implement reconnect we need several states for the client:
CONNECTED, QUIT and two different CONNECTING states. CONNECTING states
will be added in the following patches. This patch implements CONNECTED
and QUIT.

QUIT means, that we should close the connection and fail all current
and further requests (like old quit = true).

CONNECTED means that connection is ok, we can send requests (like old
quit = false).

For receiving loop we use a comparison of the current state with QUIT,
because reconnect will be in the same loop, so it should be looping
until the end.

Opposite, for requests we use a comparison of the current state with
CONNECTED, as we don't want to send requests in future CONNECTING
states.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190618114328 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

block/nbd: use non-blocking io channel for nbd negotiation

No reason to use blocking channel for negotiation and we'll benefit in
further reconnect feature, as qio_channel reads and writes will do
qemu_coroutine_yield while waiting for io completion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190618114328 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

block/nbd: split connection_co start out of nbd_client_connect

nbd_client_connect is going to be used from connection_co, so, let's
refactor nbd_client_connect in advance, leaving io channel
configuration all in nbd_client_connect.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190618114328 [email protected]>
Signed-off-by: Eric Blake <[email protected]>

nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH

This helps to avoid extra io, allocations and memory copying.
We assume here that CMD_CACHE is always used with copy-on-read, as
otherwise it's a noop.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20190725100550 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Eric Blake <[email protected]>

block/stream: use BDRV_REQ_PREFETCH

This helps to avoid extra io, allocations and memory copying.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20190725100550 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
[eblake: fix comment grammar]
Signed-off-by: Eric Blake <[email protected]>

block: implement BDRV_REQ_PREFETCH

Do effective copy-on-read request when we don't need data actually. It
will be used for block-stream and NBD_CMD_CACHE.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20190725100550 [email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
[eblake: comment grammar fix]
Signed-off-by: Eric Blake <[email protected]>

qapi: Add InetSocketAddress member keep-alive

It's needed to provide keepalive for nbd client to track server
availability.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20190725094937 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Acked-by: Daniel P. Berrangé <[email protected]>
[eblake: Fix error message typo]
Signed-off-by: Eric Blake <[email protected]>

tests/libqtest: Make qmp_assert_success() independent from global_qtest

The normal libqtest library functions should never depend on global_qtest.
Pass in the test state via parameter instead. And while we're at it,
also rename this function to qtest_qmp_assert_success() to make it clear
that it is part of libqtest.

Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190813093047 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/libqtest: Make qtest_qmp_device_add/del independent from global_qtest

Generic library functions like qtest_qmp_device_add() and _del()
should not depend on the global_qtest variable. Pass the test
state via parameter instead.

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20190813093047 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/libqtest: Clean up qtest_cb_for_every_machine() wrt global_qtest

The generic libqtest library functions should not use functions that
require the global_qtest variable.

Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190813093047 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/libqtest: Remove unused function hmp()

No test is using hmp() anymore, and since this function uses the disliked
global_qtest variable, we should also make sure that nobody adds new code
with this function again. qtest_hmp() should be used instead.

Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20190813093047 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/libqos: Make virtio-pci code independent from global_qtest

The libqos library functions should never depend on global_qtest,
since these functions might be used in tests that track multiple
test states. So let's use the test state of the QPCIDevice instead.

Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <20190813093047 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/libqos: Make generic virtio code independent from global_qtest

The libqos library functions should never depend on global_qtest,
since these functions might be used in tests that track multiple
test states. Pass around a pointer to the QTestState instead.

Message-Id: <20190814195920 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests: Set read-zeroes on for null-co driver

This patch is to reduce the number of Valgrind report messages about
using uninitialized memory with the null-co driver. It helps to filter
real memory issues and is the same work done for the iotests with the
commit ID a6862418fec4072.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Andrey Shinkevich <[email protected]>
Message-Id: <1564404360 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

libqos: Account for the ctrl queue in virtio-net

The number of queues is 2n+1, where n == 1 when multiqueue is disabled

Signed-off-by: Alexander Oleinik <[email protected]>
Message-Id: <20190805032400 [email protected]>
[thuth: fixed "intefaces" typo]
Signed-off-by: Thomas Huth <[email protected]>

qtest: Rename qtest.c:qtest_init()

Both the qtest client, libqtest.c, and server, qtest.c, used the same
name for initialization functions which can cause confusion.

Signed-off-by: Alexander Oleinik <[email protected]>
Message-Id: <20190805031240 [email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

Open 4.2 development tree

Signed-off-by: Peter Maydell <[email protected]>

Update version for v4.1.0 release

Signed-off-by: Peter Maydell <[email protected]>

migration: add some multifd traces

Signed-off-by: Juan Quintela <[email protected]>
Message-Id: <20190814020218 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Make global sem_sync semaphore by channel

This makes easy to debug things because when you want for all threads
to arrive at that semaphore, you know which one your are waiting for.

Signed-off-by: Juan Quintela <[email protected]>
Message-Id: <20190814020218 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Add traces for multifd terminate threads

Signed-off-by: Juan Quintela <[email protected]>
Message-Id: <20190814020218 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

qemu-file: move qemu_{get,put}_counted_string() declarations

Move migration helpers for strings under include/, so they can be used
outside of migration/

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Message-Id: <20190808150325 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: use mis->bh instead of allocating a QEMUBH

For migration incoming side, it either quit in precopy or postcopy. It
is safe to use the mis->bh for both instead of allocating a dedicated
QEMUBH for postcopy.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20190805053146 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: rename migration_bitmap_sync_range to ramblock_sync_dirty_bitmap

Rename for better understanding of the code.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190808033155 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: update ram_counters for multifd sync packet

Multifd sync will send MULTIFD_FLAG_SYNC flag info to destination, add
these bytes to ram_counters record.

Signed-off-by: Ivan Ren <[email protected]>
Suggested-by: Wei Yang <[email protected]>
Message-Id: <1564464816 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: add speed limit for multifd migration

Limit the speed of multifd migration through common speed limitation
qemu file.

Signed-off-by: Ivan Ren <[email protected]>
Message-Id: <1564464816 [email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: add qemu_file_update_transfer interface

Add qemu_file_update_transfer for just update bytes_xfer for speed
limitation. This will be used for further migration feature such as
multifd migration.

Signed-off-by: Ivan Ren <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Message-Id: <1564464816 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: always initialise ram_counters for a new migration

This patch fix a multifd migration bug in migration speed calculation, this
problem can be reproduced as follows:
1. start a vm and give a heavy memory write stress to prevent the vm be
   successfully migrated to destination
2. begin a migration with multifd
3. migrate for a long time [actually, this can be measured by transferred bytes]
4. migrate cancel
5. begin a new migration with multifd, the migration will directly run into
   migration_completion phase

Reason as follows:

Migration update bandwidth and s->threshold_size in function
migration_update_counters after BUFFER_DELAY time:

    current_bytes = migration_total_bytes(s);
    transferred = current_bytes - s->iteration_initial_bytes;
    time_spent = current_time - s->iteration_start_time;
    bandwidth = (double)transferred / time_spent;
    s->threshold_size = bandwidth * s->parameters.downtime_limit;

In multifd migration, migration_total_bytes function return
qemu_ftell(s->to_dst_file) + ram_counters.multifd_bytes.
s->iteration_initial_bytes will be initialized to 0 at every new migration,
but ram_counters is a global variable, and history migration data will be
accumulated. So if the ram_counters.multifd_bytes is big enough, it may lead
pending_size >= s->threshold_size become false in migration_iteration_run
after the first migration_update_counters.

Signed-off-by: Ivan Ren <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Suggested-by: Wei Yang <[email protected]>
Message-Id: <1564741121 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: remove unused field bytes_xfer

MigrationState->bytes_xfer is only set to 0 in migrate_init().

Remove this unnecessary field.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190402003106 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

hmp: Remove migration capabilities from "info migrate"

With the growth of migration capabilities, it is not proper to display
them in "info migrate". Users are recommended to use "info
migrate_capabiltiies" to list them.

Signed-off-by: Wei Yang <[email protected]>
Suggested-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20190806003645 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: use QEMU_IS_ALIGNED to replace host_offset

Use QEMU_IS_ALIGNED for the check, it would be more consistent with
other align calculations.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190806004648 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: simplify calculation of run_start and fixup_start_addr

The purpose of the calculation is to find a HostPage which is partially
dirty.

  * fixup_start_addr points to the start of the HostPage to discard
  * run_start points to the next HostPage to check

While in the middle stage, there would two cases for run_start:

  * aligned with HostPage means this is not partially dirty
  * not aligned means this is partially dirty

When it is aligned, no work and calculation is necessary. run_start
already points to the start of next HostPage and is ready to continue.

When it is not aligned, the calculation could be simplified with:

  * fixup_start_addr = QEMU_ALIGN_DOWN(run_start, host_ratio)
  * run_start = QEMU_ALIGN_UP(run_start, host_ratio)

By doing so, run_start always points to the next HostPage to check.
fixup_start_addr always points to the HostPage to discard.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190806004648 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: make PostcopyDiscardState a static variable

In postcopy-ram.c, we provide three functions to discard certain
RAMBlock range:

  * postcopy_discard_send_init()
  * postcopy_discard_send_range()
  * postcopy_discard_send_finish()

Currently, we allocate/deallocate PostcopyDiscardState for each RAMBlock
on sending discard information to destination. This is not necessary and
the same data area could be reused for each RAMBlock.

This patch defines PostcopyDiscardState a static variable. By doing so:

  1) avoid memory allocation and deallocation to the system
  2) avoid potential failure of memory allocation
  3) hide some details for their users

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190724010721 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: extract ram_load_precopy

After cleanup, it would be clear to audience there are two cases
ram_load:

* precopy
* postcopy

And it is not necessary to check postcopy_running on each iteration for
precopy.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20190725002023 [email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: return -EINVAL directly when version_id mismatch

It is not reasonable to continue when version_id mismatch.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190722075339 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: equation is more proper than and to check LOADVM_QUIT

LOADVM_QUIT allows a command to quit all layers of nested loadvm loops,
while current return value check is not that proper even it works now.

Current return value check "ret & LOADVM_QUIT" would return true if
bit[0] is 1. This would be true when ret is -1 which is used to indicate
an error of handling a command.

Since there is only one place return LOADVM_QUIT and no other
combination of return value, use "ret == LOADVM_QUIT" would be more
proper.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190718064257 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: just pass RAMBlock is enough

RAMBlock->used_length is always passed to migration_bitmap_sync_range(),
which could be retrieved from RAMBlock.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190718012547 [email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: use migration_in_postcopy() to check POSTCOPY_ACTIVE

Use common helper function to check the state.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190719071129 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: start_postcopy could be true only when migrate_postcopy() return true

There is only one place to set start_postcopy to true,
qmp_migrate_start_postcopy(), which make sure start_postcopy could be
set to true when migrate_postcopy() return true.

So start_postcopy is true implies the other one.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190718083747 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: PostcopyState is already set in loadvm_postcopy_handle_advise()

PostcopyState is already set to ADVISE at the beginning of
loadvm_postcopy_handle_advise().

Remove the redundant set.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190711080816 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/savevm: move non SaveStateEntry condition check out of iteration

in_postcopy and iterable_only are not SaveStateEntry specific, it would
be more proper to check them out of iteration.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190709140924 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/savevm: split qemu_savevm_state_complete_precopy() into two parts

This is a preparation patch for further cleanup.

No functional change, just wrap two major part of
qemu_savevm_state_complete_precopy() into function.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190709140924 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/savevm: flush file for iterable_only case

It would be proper to flush file even for iterable_only case.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190709140924 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: do_fixup is true when host_offset is non-zero

This means it is not necessary to spare an extra variable to hold this
condition. Use host_offset directly is fine.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190710050814 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: reduce one operation to calculate fixup_start_addr

Use the same way for run_end to calculate run_start, which saves one
operation.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190710050814 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: discard_length must not be 0

Since we break the loop when there is no more page to discard, we are
sure the following process would find some page to discard.

It is not necessary to check it again.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190627020822 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: break the loop when there is no more page to discard

When one is equal or bigger then end, it means there is no page to
discard. Just break the loop in this case instead of processing it.

No functional change, just refactor it a little.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190627020822 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration/postcopy: the valid condition is one less then end

If one equals end, it means we have gone through the whole bitmap.

Use a more restrict check to skip a unnecessary condition.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190627020822 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: consolidate time info into populate_time_info

Consolidate time information fill up into its function for better
readability.

Signed-off-by: Wei Yang <[email protected]>
Message-Id: <20190716005411 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

hw/net: fix vmxnet3 live migration

At some point vmxnet3 live migration stopped working and git-bisect
didn't help finding a working version.
The issue is the PCI configuration space is not being migrated
successfully and MSIX remains masked at destination.

Remove the migration differentiation between PCI and PCIe since
the logic resides now inside VMSTATE_PCI_DEVICE.
Remove also the VMXNET3_COMPAT_FLAG_DISABLE_PCIE based differentiation
since at 'realize' time is decided if the device is PCI or PCIe,
then the above macro is enough.

Use the opportunity to move to the standard VMSTATE_MSIX
instead of the deprecated SaveVMHandlers.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Message-Id: <20190705010711 [email protected]>
Tested-by: Sukrit Bhatnagar <[email protected]>
Reviewed-by: Dmitry Fleytman <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

migration: Add error_desc for file channel errors

Currently, there is no information about error if outgoing migration was failed
because of file channel errors.
Example (QMP session):
-> { "execute": "migrate", "arguments": { "uri": "exec:head -c 1" }}
<- { "return": {} }
...
-> { "execute": "query-migrate" }
<- { "return": { "status": "failed" }} // There is not error's description

And even in the QEMU's output there is nothing.

This patch
1) Adds errp for the most of QEMUFileOps
2) Adds qemu_file_get_error_obj/qemu_file_set_error_obj
3) And finally using of qemu_file_get_error_obj in migration.c

And now, the status for the mentioned fail will be:
-> { "execute": "query-migrate" }
<- { "return": { "status": "failed",
"error-desc": "Unable to write to command: Broken pipe" }}

Signed-off-by: Yury Kotov <[email protected]>
Message-Id: <20190422103420 [email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

Update version for v4.1.0-rc5 release

Signed-off-by: Peter Maydell <[email protected]>

riscv: roms: Fix make rules for building sifive_u bios

Currently the make rules are wrongly using qemu/virt opensbi image
for sifive_u machine. Correct it.

Signed-off-by: Bin Meng <[email protected]>
Reviewed-by: Chih-Min Chao <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: 1564812484 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190813' into staging

ppc patch queue 2019-08-13 (last minute qemu-4.1 fixes)

Here's a very, very last minute pull request for qemu-4.1.  This fixes
two nasty bugs with the XIVE interrupt controller in "dual" mode
(where the guest decides which interrupt controller it wants to use).
One occurs when resetting the guest while I/O is active, and the other
with migration of hotplugged CPUs.

The timing here is very unfortunate.  Alas, we only spotted these bugs
very late, and I was sick last week, delaying analysis and fix even
further.

This series hasn't had nearly as much testing as I'd really like, but
I'd still like to squeeze it into qemu-4.1 if possible, since
definitely fixing two bad bugs seems like an acceptable tradeoff for
the risk of introducing different bugs.

# gpg: Signature made Tue 13 Aug 2019 07:56:42 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>" [full]
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>" [full]
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190813:
  spapr/xive: Fix migration of hot-plugged CPUs
  spapr: Reset CAS & IRQ subsystem after devices

Signed-off-by: Peter Maydell <[email protected]>

spapr/xive: Fix migration of hot-plugged CPUs

The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.

Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.

Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM")
Signed-off-by: Cédric Le Goater <[email protected]>
Message-Id: <20190813064853 [email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Reset CAS & IRQ subsystem after devices

This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model.  Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.

The problem is that in spapr_machine_reset() we:

1. Reset the CAS vector state
spapr_ovec_cleanup(spapr->ov5_cas);

2. Reset all devices
qemu_devices_reset()

3. Reset the irq subsystem
spapr_irq_reset();

However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state.  We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.

During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.

Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode.  kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1.  When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.

This change addresses the problem by delaying the CAS reset until
after the devices reset.  The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ.  The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.

We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types).  Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.

Cc: Cédric Le Goater <[email protected]>
Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
                 XIVE and XICS"
Signed-off-by: David Gibson <[email protected]>

display/bochs: fix pcie support

Set QEMU_PCI_CAP_EXPRESS unconditionally in init(), then clear it in
realize() in case the device is not connected to a PCIe bus.

This makes sure the pci config space allocation is big enough, so
accessing the PCIe extended config space doesn't overflow the pci
config space buffer.

PCI(e) config space is guest writable.  Writes are limited by
write mask (which probably is also filled with random stuff),
so the guest can only flip enabled bits.  But I suspect it
still might be exploitable, so rather serious because it might
be a host escape for the guest.  On the other hand the device
is probably not yet in widespread use.

(For a QEMU version without this commit, a mitigation for the
bug is available: use "-device bochs-display" as a conventional pci
device only.)

Cc: [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 20190812065221 [email protected]
Reviewed-by: Alex Williamson <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Update version for v4.1.0-rc4 release

Signed-off-by: Peter Maydell <[email protected]>

compat: disable edid on virtio-gpu base device

'edid' is a property of the virtio-gpu base device, so turning
it off on virtio-gpu-pci is not enough (it misses -ccw). Turn
it off on the base device instead.

Fixes: 0a71966253c8 ("edid: flip the default to enabled")
Signed-off-by: Cornelia Huck <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Message-id: 20190806115819 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-06' into staging

Block patches for 4.1.0-rc4:
- Fix the backup block job when using copy offloading
- Fix the mirror block job when using the write-blocking copy mode
- Fix incremental backups after the image has been grown with the
  respective bitmap attached to it

# gpg: Signature made Tue 06 Aug 2019 12:57:07 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Max Reitz <[email protected]>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2019-08-06:
  block/backup: disable copy_range for compressed backup
  iotests: Test unaligned blocking mirror write
  mirror: Only mirror granularity-aligned chunks
  iotests: Test incremental backup after truncation
  util/hbitmap: update orig_size on truncate
  iotests: Test backup job with two guest writes
  backup: Copy only dirty areas

Signed-off-by: Peter Maydell <[email protected]>

block/backup: disable copy_range for compressed backup

Enabled by default copy_range ignores compress option. It's definitely
unexpected for user.

It's broken since introduction of copy_range usage in backup in
9ded4a011496.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20190730163251 [email protected]
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Cc: [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Test unaligned blocking mirror write

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20190805113526 [email protected]
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

mirror: Only mirror granularity-aligned chunks

In write-blocking mode, all writes to the top node directly go to the
target. We must only mirror chunks of data that are aligned to the
job's granularity, because that is how the dirty bitmap works.
Therefore, the request alignment for writes must be the job's
granularity (in write-blocking mode).

Unfortunately, this forces all reads and writes to have the same
granularity (we only need this alignment for writes to the target, not
the source), but that is something to be fixed another time.

Cc: [email protected]
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20190805153308 [email protected]
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Fixes: d06107ade0ce74dc39739bac80de84b51ec18546
Signed-off-by: Max Reitz <[email protected]>

iotests: Test incremental backup after truncation

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20190805152840 [email protected]
Signed-off-by: Max Reitz <[email protected]>

util/hbitmap: update orig_size on truncate

Without this, hbitmap_next_zero and hbitmap_next_dirty_area are broken
after truncate. So, orig_size is broken since it's introduction in
76d570dc495c56bb.

Fixes: 76d570dc495c56bb
Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20190805120120 [email protected]
Reviewed-by: Max Reitz <[email protected]>
Cc: [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Test backup job with two guest writes

Perform two guest writes to not yet backed up areas of an image, where
the former touches an inner area of the latter.

Before HEAD^, copy offloading broke this in two ways:
(1) The target image differs from the reference image (what the source
    was when the backup started).
(2) But you will not see that in the failing output, because the job
    offset is reported as being greater than the job length.  This is
    because one cluster is copied twice, and thus accounted for twice,
    but of course the job length does not increase.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20190801173900 [email protected]
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Tested-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

backup: Copy only dirty areas

The backup job must only copy areas that the copy_bitmap reports as
dirty. This is always the case when using traditional non-offloading
backup, because it copies each cluster separately. When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.

Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.

Fixes: 9ded4a0114968e98b41494fc035ba14f84cdf700
Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20190801173900 [email protected]
Cc: [email protected]
Signed-off-by: Max Reitz <[email protected]>

Merge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20190803' into staging

A harmless build-sys patch that fixes a regression affecting Linux
distributions packaging QEMU.

# gpg: Signature made Sat 03 Aug 2019 09:24:15 BST
# gpg:                using RSA key E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <[email protected]>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* remotes/philmd-gitlab/tags/edk2-next-20190803:
  Makefile: remove DESTDIR from firmware file content

Signed-off-by: Peter Maydell <[email protected]>

Makefile: remove DESTDIR from firmware file content

The resulting firmware files should only contain the runtime path.
Fixes commit 26ce90fde5c ("Makefile: install the edk2 firmware images
and their descriptors")

Signed-off-by: Olaf Hering <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Message-Id: <20190530192812 [email protected]>
Fixes: https://bugs.launchpad.net/qemu/+bug/1838703
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>

target/arm: Avoid bogus NSACR traps on M-profile without Security Extension

In Arm v8.0 M-profile CPUs without the Security Extension and also in
v7M CPUs, there is no NSACR register. However, the code we have to handle
the FPU does not always check whether the ARM_FEATURE_M_SECURITY bit
is set before testing whether env->v7m.nsacr permits access to the
FPU. This means that for a CPU with an FPU but without the Security
Extension we would always take a bogus fault when trying to stack
the FPU registers on an exception entry.

We could fix this by adding extra feature bit checks for all uses,
but it is simpler to just make the internal value of nsacr 0xcff
("all non-secure accesses allowed"), since this is not guest
visible when the Security Extension is not present. This allows
us to continue to follow the Arm ARM pseudocode which takes a
similar approach. (In particular, in the v8.1 Arm ARM the register
is documented as reading as 0xcff in this configuration.)

Fixes: https://bugs.launchpad.net/qemu/+bug/1838475
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Damien Hedde <[email protected]>
Message-id: 20190801105742 [email protected]

Merge remote-tracking branch 'remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request' into staging

Slirp CVE-2019-14378 pull request

# gpg: Signature made Fri 02 Aug 2019 12:17:24 BST
# gpg:                using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Marc-André Lureau <[email protected]>" [full]
# gpg:                 aka "Marc-André Lureau <[email protected]>" [full]
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request:
  slirp: update with CVE-2019-14378 fix

Signed-off-by: Peter Maydell <[email protected]>

slirp: update with CVE-2019-14378 fix

Signed-off-by: Marc-André Lureau <[email protected]>

Update version for v4.1.0-rc3 release

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci: bugfix

A last minute fix to cross-version migration.
Better late than never.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Tue 30 Jul 2019 17:07:42 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg:                 aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  pcie_root_port: Disable ACS on older machines
  pcie_root_port: Allow ACS to be disabled

Signed-off-by: Peter Maydell <[email protected]>

pcie_root_port: Disable ACS on older machines

ACS got added in 4.0 unconditionally,  that broke older<->4.0 migration
where there was a PCIe root port.
Fix this by turning it off for 3.1 and older machines; note this
fixes compatibility for older QEMUs but breaks compatibility with 4.0
for older machine types.

    machine type    source qemu   dest qemu
       3.1             3.1           4.0        broken
       3.1             3.1           4.1rc2     broken
       3.1             3.1           4.1+this   OK ++
       3.1             4.0           4.1rc2     OK
       3.1             4.0           4.1+this   broken --
       4.0             4.0           4.1rc2     OK
       4.0             4.0           4.1+this   OK

So we gain and lose; the consensus seems to be treat this as a
fix for older machine types.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20190730093719 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie_root_port: Allow ACS to be disabled

ACS was added in 4.0 unconditionally, this breaks migration
compatibility.
Allow ACS to be disabled by adding a property that's
checked by pcie_root_port.

Unfortunately pcie-root-port doesn't have any instance data,
so there's no where for that flag to live, so stuff it into
PCIESlot.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20190730093719 [email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

target/arm: Deliver BKPT/BRK exceptions to correct exception level

Most Arm architectural debug exceptions (eg watchpoints) are ignored
if the configured "debug exception level" is below the current
exception level (so for example EL1 can't arrange to get debug exceptions
for EL2 execution). Exceptions generated by the BRK or BPKT instructions
are a special case -- they must always cause an exception, so if
we're executing above the debug exception level then we
must take them to the current exception level.

This fixes a bug where executing BRK at EL2 could result in an
exception being taken at EL1 (which is strictly forbidden by the
architecture).

Fixes: https://bugs.launchpad.net/qemu/+bug/1838277
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20190730132522 [email protected]

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- fdc: Fix inserting read-only media in empty drive

# gpg: Signature made Tue 30 Jul 2019 16:32:14 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests/118: Test inserting a read-only medium
  fdc: Fix inserting read-only media in empty drive

Signed-off-by: Peter Maydell <[email protected]>

iotests/118: Test inserting a read-only medium

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: John Snow <[email protected]>

fdc: Fix inserting read-only media in empty drive

In order to insert a read-only medium (i.e. a read-only block node) to
the BlockBackend of a floppy drive, we must not have taken write
permissions on that BlockBackend, or the operation will fail with the
error message "Block node is read-only".

The device already takes care to remove all permissions when the medium
is ejected, but the state isn't correct if the drive is initially empty:
It uses blk_is_read_only() to check whether write permissions should be
taken, but this function returns false for empty BlockBackends in the
common case.

Fix floppy_drive_realize() to avoid taking write permissions if the
drive is empty.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: John Snow <[email protected]>

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-07-30' into staging

Block patch for 4.1.0-rc3:
- Fix CID 1403771 in block/nvme.c

# gpg: Signature made Tue 30 Jul 2019 13:51:52 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Max Reitz <[email protected]>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2019-07-30:
  nvme: Limit blkshift to 12 (for 4 kB blocks)

Signed-off-by: Peter Maydell <[email protected]>

nvme: Limit blkshift to 12 (for 4 kB blocks)

Linux does not support blocks greater than 4 kB anyway, so we might as
well limit blkshift to 12 and thus save us from some potential trouble.

Reported-by: Peter Maydell <[email protected]>
Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20190730114812 [email protected]
Reviewed-by: Maxim Levitsky <[email protected]>
Coverity: CID 1403771
Signed-off-by: Max Reitz <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- scsi-cd: Fix inserting read-only media in empty drive
- block/copy-on-read: Fix permissions for inactive node
- Test case fixes

# gpg: Signature made Tue 30 Jul 2019 12:21:48 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  scsi-cd: Fix inserting read-only media in empty drive
  block/copy-on-read: Fix permissions for inactive node
Fixes: add read-zeroes to 051.out
  tests/multiboot: Fix load address of test kernels

Signed-off-by: Peter Maydell <[email protected]>

scsi-cd: Fix inserting read-only media in empty drive

scsi-disks decides whether it has a read-only device by looking at
whether the BlockBackend specified as drive=... is read-only. In the
case of an anonymous BlockBackend (with a node name specified in
drive=...), this is the read-only flag of the attached node. In the case
of an empty anonymous BlockBackend, it's always read-write because
nothing prevented it from being read-write.

This is a problem because scsi-cd would take write permissions on the
anonymous BlockBackend of an empty drive created without a drive=...
option. Using blockdev-insert-medium with a read-only node fails then
with the error message "Block node is read-only".

Fix scsi_realize() so that scsi-cd devices always take read-only
permissions on their BlockBackend instead.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1733920
Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

block/copy-on-read: Fix permissions for inactive node

The copy-on-read drive must not request the WRITE_UNCHANGED permission
for its child if the node is inactive, otherwise starting a migration
destination with -incoming will fail because the child cannot provide
write access yet:

qemu-system-x86_64: -blockdev copy-on-read,file=img,node-name=cor: Block node is read-only

Earlier QEMU versions additionally ran into an abort() on the migration
source side: bdrv_inactivate_recurse() failed to update permissions.
This is silently ignored today because it was only supposed to loosen
restrictions. This is the symptom that was originally reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1733022

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

Fixes: add read-zeroes to 051.out
The patch "iotests: Set read-zeroes on in null block driver for Valgrind"
with the commit ID a6862418fec4072 needs the change in 051.out when
compared against on the s390 system.

Fixes: a6862418fec40727b392c86dc13d9ec980efcb15
Reported-by: Christian Borntraeger <[email protected]>
Signed-off-by: Andrey Shinkevich <[email protected]>
Tested-by: Christian Borntraeger <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

tests/multiboot: Fix load address of test kernels

While older toolchains produced binaries where the physical load address
of ELF segments was the same as the virtual address, newer versions seem
to choose a different physical address if it isn't specified explicitly.
The means that the test kernel doesn't use the right addresses to access
e.g. format strings any more and the whole output disappears, causing
all test cases to fail.

Fix this by specifying the physical load address of sections explicitly.

Signed-off-by: Kevin Wolf <[email protected]>