Git Repo - qemu.git/log

hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false

The device uses serial_hds in its realize function and thus can't be
used twice. Apart from that, the comma in its name makes it quite hard
to use for the user anyway, since a comma is normally used to separate
the device name from its properties when using the "-device" parameter
or the "device_add" HMP command.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: 1506441116 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/sd: fix out-of-bounds check for multi block reads

The current code checks if the next block exceeds the size of the card.
This generates an error while reading the last block of the card.
Do the out-of-bounds check when starting to read a new block to fix this.

This issue became visible with increased error checking in Linux 4.13.

Cc: [email protected]
Signed-off-by: Michael Olbrich <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: 20170916091611 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

arm: Fix SMC reporting to EL2 when QEMU provides PSCI

This properly forwards SMC events to EL2 when PSCI is provided by QEMU
itself and, thus, ARM_FEATURE_EL3 is off.

Found and tested with the Jailhouse hypervisor. Solution based on
suggestions by Peter Maydell.

Signed-off-by: Jan Kiszka <[email protected]>
Message-id: 4f243068-aaea-776f-d18f-f9e05e7be9cd@siemens.com
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-10-06' into queue-block

Block patches

# gpg: Signature made Fri Oct  6 16:30:57 2017 CEST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-10-06:
  block/mirror: check backing in bdrv_mirror_top_flush
  qcow2: truncate the tail of the image file after shrinking the image
  qcow2: fix return error code in qcow2_truncate()
  iotests: Fix 195 if IMGFMT is part of TEST_DIR
  block/mirror: check backing in bdrv_mirror_top_refresh_filename
  block: support passthrough of BDRV_REQ_FUA in crypto driver
  block: convert qcrypto_block_encrypt|decrypt to take bytes offset
  block: convert crypto driver to bdrv_co_preadv|pwritev
  block: fix data type casting for crypto payload offset
  crypto: expose encryption sector size in APIs
  block: use 1 MB bounce buffers for crypto instead of 16KB

Signed-off-by: Kevin Wolf <[email protected]>

block/mirror: check backing in bdrv_mirror_top_flush

Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20170929152255 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: truncate the tail of the image file after shrinking the image

Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <[email protected]>
Reviewed-by: John Snow <[email protected]>
Message-id: 20170929121613 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: fix return error code in qcow2_truncate()

Signed-off-by: Pavel Butsykin <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 20170929121613 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Fix 195 if IMGFMT is part of TEST_DIR

do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing
qemu's command line and _filter_testdir only afterwards. Therefore, if
the image format is part of the test directory path, _filter_testdir
will no longer apply and the actual output will differ from the
reference output even in case of success.

For example, TEST_DIR might be "/tmp/test-qcow2", in which case
_filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no
longer recognized as the TEST_DIR by _filter_testdir.

Fix this by not applying _filter_imgfmt in do_run_qemu() but in
run_qemu() instead, and only after _filter_testdir.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20170927211334 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block/mirror: check backing in bdrv_mirror_top_refresh_filename

Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20170928120300 [email protected]
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: support passthrough of BDRV_REQ_FUA in crypto driver

The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver
as a passthrough to the underlying block driver.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: convert qcrypto_block_encrypt|decrypt to take bytes offset

Instead of sector offset, take the bytes offset when encrypting
or decrypting data.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: convert crypto driver to bdrv_co_preadv|pwritev

Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: fix data type casting for crypto payload offset

The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.

Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.

Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Signed-off-by: Max Reitz <[email protected]>

crypto: expose encryption sector size in APIs

While current encryption schemes all have a fixed sector size of
512 bytes, this is not guaranteed to be the case in future. Expose
the sector size in the APIs so the block layer can remove assumptions
about fixed 512 byte sectors.

Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: use 1 MB bounce buffers for crypto instead of 16KB

Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: 20170927125340 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: Add test 197 for covering copy-on-read

Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.

The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places).  In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.

Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting.  We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.

This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load.  This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
  $ (ulimit -S -v 1000000; ./check -qcow2 197)

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Perform copy-on-read in loop

Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: [email protected]
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add blkdebug hook for copy-on-read

Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Restore stty settings on completion

Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup.  This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').

It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember.  But running 'exec </dev/null' in the harness seems like
it might be too heavy of a hammer.  So I instead went the the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.

I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
$ (ulimit -S -v 500000; ./check -qcow2 1)

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Uniform handling of 0-length bdrv_get_block_status()

Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.

Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled. While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop). However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-io: Add -C for opening with copy-on-read

Make it easier to enable copy-on-read during iotests, by
exposing a new bool option to main and open.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

commit: Remove overlay_bs

We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.

bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.

The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.

With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

qemu-iotests: Test commit block job where top has two parents

Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Allow QMP pretty printing in common.qemu

QMP responses to certain commands can become quite long, which doesn't
only make reading them hard, but also means that the maximum line length
in patch emails can be exceeded. Allow tests to switch to QMP pretty
printing, which results in more, but shorter lines.

We also need to make sure to keep indentation in the response for this
to work as expected.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

commit: Support multiple roots above top node

This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <[email protected]>

block: Introduce BdrvChildRole.update_filename

There is no good reason for bdrv_drop_intermediate() to know the active
layer above the subchain it is operating on - even more so, because
the assumption that there is a single active layer above it is not
generally true.

In order to prepare removal of the active parameter, use a BdrvChildRole
callback to update the backing file string in the overlay image instead
of directly calling bdrv_change_backing_file().

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

qemu-iotests: merge "check" and "common"

"check" is full of qemu-iotests--specific details. Separating it
from "common" does not make much sense anymore.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: get rid of $iam

The variable is almost unused, and one of the two uses is actually
uninitialized.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: fix uninitialized variable

The variable is used in "common" but defined only after the file
is sourced.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: disintegrate more parts of common.config

Split "check" parts from tests part.

For the directory setup, the actual computation of directories goes
in "check", while the sanity checks go in the tests.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: do not include common.rc in "check"

It only provides functions used by the test programs.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: limit non-_PROG-suffixed variables to common.rc

These are never used by "check", with one exception that does not need
$QEMU_OPTIONS. Keep them in common.rc, which will be soon included only
by the tests.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: cleanup and fix search for programs

Instead of ./check failing when a binary is missing, we try each test
case now and each one fails with tons of test case diffs. Also, all the
variables were initialized by "check" prior to "common" being sourced,
and then (uselessly) checked for emptiness again in "check".

Centralize the search for programs in "common" (which will soon be
one with "check"), including the "realpath" invocation which can be done
just once in "check" rather than in the tests.

For qnio_server, move the detection to "common", simplifying
set_prog_path to stop handling the unused second argument, and
embedding the "realpath" pass.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: move "check" code out of common.rc

Some functions in common.rc are never used by the tests. Move
them out of that file and into common, which is already included
only by "check".

Code that actually *is* common to "check" and tests can be placed in
common.config.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: get rid of AWK_PROG

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: remove dead code

This includes shell function, shell variables and command line options
(randomize.awk does not exist).

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hw/block/onenand: Remove dead code block

The condition of the for-loop makes sure that b is always smaller
than s->blocks, so the "if (b >= s->blocks)" statement is completely
superfluous here.

Buglink: https://bugs.launchpad.net/qemu/+bug/1715007
Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Convert internal hbitmap size/granularity

Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity. It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Switch bdrv_set_dirty() to bytes

Both callers already had bytes available, but were scaling to
sectors. Move the scaling to internal code. In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Switch store_bitmap_data() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space. But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster. In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Switch load_bitmap_data() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Switch qcow2_measure() to byte-based iteration

This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: Switch mirror_dirty_init() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes

Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere. Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap. Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes

Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration. Both
callers were already using the result as a bool, so make that
explicit. Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Set iterator start by offset, not sector

All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration. Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes

Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap.  It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size.  The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere.  Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Track bitmap size by bytes

We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies). A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.

Use is_power_of_2() while at it, instead of open-coding that.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes

We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion. A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.

The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Avoid size query failure during truncate

We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors().  Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.

Note that bdrv_truncate() is still a bit awkward.  We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back).  But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

dirty-bitmap: Drop unused functions

We had several functions that no one is currently using, and which
use sector-based interfaces. I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qcow2: Ensure bitmap serialization is aligned

When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap). We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.

Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.

Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hbitmap: Rename serialization_granularity to serialization_align

The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Make bdrv_img_create() size selection easier to read

All callers of bdrv_img_create() pass in a size, or -1 to read the
size from the backing file. We then set that size as the QemuOpt
default, which means we will reuse that default rather than the
final parameter to qemu_opt_get_size() several lines later. But
it is rather confusing to read subsequent checks of 'size == -1'
when it looks (without seeing the full context) like size defaults
to 0; it also doesn't help that a size of 0 is valid (for some
formats).

Rework the logic to make things more legible.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Typo fix in copy_on_readv()

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171006' into staging

s390x changes:
- support for IDA (indirect addressing in ccws) via ccw data stream
- support for extended TOD-Clock (z14 feature)
- various fixes and improvements all over the place

# gpg: Signature made Fri 06 Oct 2017 10:52:22 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171006: (33 commits)
  hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
  s390x/tcg: initialize machine check queue
  s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
  s390x/sclp: Mark the sclp device with user_creatable = false
  s390/kvm: make TOD setting failures fatal for migration
  s390/kvm: Support for get/set of extended TOD-Clock for guest
  s390x/css: fix css migration compat handling
  s390x: sort some devices into categories
  s390x/tcg: make STFL store into the lowcore
  s390x: introduce and use S390_MAX_CPUS
  target/s390x: get rid of next_core_id
  s390x/cpumodel: fix max STFL(E) bit number
  s390x: raise CPU hotplug irq after really hotplugged
  MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
  s390x/3270: handle writes of arbitrary length
  s390x/3270: IDA support for 3270 via CcwDataStream
  Revert "s390x/ccw: create s390 phb conditionally"
  s390x/tcg: make idte/ipte use the new _real mmu
  s390x/tcg: make testblock use the new _real mmu
  s390x/tcg: make stora(g) use the new _real mmu
  ...

Signed-off-by: Peter Maydell <[email protected]>

hw/s390x: Mark the "sclpquiesce" device with user_creatable = false

The "sclpquiesce" device is just an internal device that should not be
created by the user directly. Though it currently does not seem to cause
any obvious trouble when the user instantiates an additional device, let's
better mark it with user_creatable = false to avoid unexpected behavior,
e.g. because the quiesce notifier gets registered multiple times.

Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1507193105 [email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: initialize machine check queue

Just as for external interrupts and I/O interrupts, we need to
initialize mchk_index during cpu reset.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable

A TYPE_SCLP_CPU_HOTPLUG device for handling cpu hotplug events
is already created by the sclp event facility. Adding a second
TYPE_SCLP_CPU_HOTPLUG device via -device sclp-cpu-hotplug creates
an ambiguity in raise_irq_cpu_hotplug(), leading to a crash once
a cpu is hotplugged.

To fix this, disallow creating a sclp-cpu-hotplug device manually.

Reviewed-by: Thomas Huth <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/sclp: Mark the sclp device with user_creatable = false

The "sclp" device is just an internal device that can not be instantiated
by the users. If they try to use it, they only get a simple error message:

$ qemu-system-s390x -nographic -device sclp
qemu-system-s390x: Option '-device s390-sclp-event-facility' cannot be
handled by this machine

Since sclp_init() tries to create a TYPE_SCLP_EVENT_FACILITY which is
a non-pluggable sysbus device, there is really no way that the "sclp"
device can be used by the user, so let's set the user_creatable = false
accordingly.

Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1507125199 [email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Farhan Ali <[email protected]>
Acked-by: Halil Pasic <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390/kvm: make TOD setting failures fatal for migration

If we fail to set a proper TOD clock on the target system, this can
already result in some problematic cases. We print several warn messages
on source and target in that case.

If kvm fails to set a nonzero epoch index, then we must ultimately fail
the migration as this will result in a giant time leap backwards. This
patch lets the migration fail if we can not set the guest time on the
target.

On failure the guest will resume normally on the original host machine.

Signed-off-by: Collin L. Walling <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
[split failure change from epoch index change, minor fixups]
Message-Id: <20171004105751 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390/kvm: Support for get/set of extended TOD-Clock for guest

Provides an interface for getting and setting the guest's extended
TOD-Clock via a single ioctl to kvm. If the ioctl fails because it
is not support by kvm, then we fall back to the old style of
retrieving the clock via two ioctls.

Signed-off-by: Collin L. Walling <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
[split failure change from epoch index change]
Message-Id: <20171004105751 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>
[some cosmetic fixes]

s390x/css: fix css migration compat handling

Commit e996583eb3 ("s390x/css: activate ChannelSubSys migration",
2017-07-11) was supposed to enable css migration for virtio-ccw
machines starting 2.10, but it ended up effectively enabling it
only for 2.10 as the registration of the appropriate VMStateDescription
happens in ccw_machine_2_10_instance_options which does not get
called for machines more recent than 2_10.

Let us move the corresponding chunk of code (which conditionally enables
the migration based on the value of the corresponding class property) to
ccw_init, which is called for each virtio-ccw machine instance.

Signed-off-by: Halil Pasic <[email protected]>
Reported-by: Thomas Huth <[email protected]>
Message-Id: <20171004110109 [email protected]>
Tested-by: Christian Borntraeger <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: sort some devices into categories

Add missing categorizations for some s390x devices:
- zpci device -> misc
- 3270 -> display
- vfio-ccw -> misc

Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: make STFL store into the lowcore

Using virtual memory access is wrong and will soon include low-address
protection checks, which is to be bypassed for STFL.

STFL is a privileged instruction and using LowCore requires
!CONFIG_USER_ONLY, so add the ifdef and move the declaration to the
right place.

This was originally part of a bigger STFL(E) refactoring.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170927170027 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: introduce and use S390_MAX_CPUS

Will be handy in the future.

Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170928134609 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

target/s390x: get rid of next_core_id

core_id is not needed by linux-user, as the core_id a.k.a. CPU address
is only accessible from kernel space.

Therefore, drop next_core_id and make cpu_index get autoassigned again
for linux-user.

While at it, shield core_id and cpuid completely from linux-user. cpuid
can also only be queried from kernel space.

Suggested-by: Igor Mammedov <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170928134609 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/cpumodel: fix max STFL(E) bit number

Not that it would matter in the near future, but it is actually 2048
bytes, therefore 16384 possible bits.

Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170928134609 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: raise CPU hotplug irq after really hotplugged

Let's move it into the machine, so we trigger the IRQ after setting
ms->possible_cpus (which SCLP uses to construct the list of
online CPUs).

This also fixes a problem reported by Thomas Huth, whereby qemu can be
crashed using the none machine

qemu-s390x-softmmu -M none -monitor stdio
-> device_add qemu-s390-cpu

Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170928134609 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h

Forgot it when factoring code out into these files. This is 100% s390x
KVM material.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170928134609 [email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/3270: handle writes of arbitrary length

The problem is, that the current implementation places unrealistic and
arbitrary constraints on the length of writes to the device (that is the
outbound requests), by asserting ccw.count being such that that even the
worst case escaped payload will fit an more or less arbitrary sized
buffer. Actually on protocol level there is nothing to justify such
a limitation.

Another strange thing is the return value which more or less reflects
the size (written) after escaping instead of before escaping. This
is strange, because this return value is used to calculate SCSW.count.

Let us teach 3270 how to deal with arbitrary long writes.

Signed-off-by: Halil Pasic <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Reported-by: Jason J . Herne <[email protected]>
Tested-by: Jason J . Herne <[email protected]>
Message-Id: <20170920172314 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/3270: IDA support for 3270 via CcwDataStream

Let us convert the 3270 code so it uses the recently introduced
CcwDataStream abstraction instead of blindly assuming direct data access.

This patch does not change behavior beyond introducing IDA support: for
direct data access CCWs everything stays as-is. (If there are bugs, they
are also preserved).

Signed-off-by: Halil Pasic <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Message-Id: <20170920172314 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

Revert "s390x/ccw: create s390 phb conditionally"

This reverts commit d32bd032d8fde41281aae34c16a4aa97e9acfeac.

Turns out that old QEMUs always created a pci host bridge
and for many CPU models the migration from old QEMUs to new
QEMUs will fail with
qemu-system-s390x: Unknown savevm section or instance 'PCIBUS' 0
qemu-system-s390x: load of migration failed: Invalid argument

As a quick fix we will revert the commit and always create the
pci host bridge.

Signed-off-by: Christian Borntraeger <[email protected]>
[fixed revert to keep the comment fixup, added a comment in the code]
Cc: Cornelia Huck <[email protected]>
Cc: David Hildenbrand <[email protected]>
Message-Id: <20170928131831 [email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: make idte/ipte use the new _real mmu

We don't wrap addresses in the mmu for the _real case, therefore the
behavior should be unchanged.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: make testblock use the new _real mmu

Low address protection checks will be moved into the mmu later.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: make stora(g) use the new _real mmu

As we properly handle the return address now, we can drop
potential_page_fault().

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: make lura(g) use the new _real mmu.

Looks like, lurag was not loading 64bit but only 32bit.

As we properly handle the return address now, we can drop
potential_page_fault().

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: add MMU for real addresses

This makes it easy to access real addresses (prefix) and in addition
checks for valid memory addresses, which is missing when using e.g.
stl_phys().

We can later reuse it to implement low address protection checks (then
we might even decide to introduce yet another MMU for absolute
addresses, just for handling storage keys and low address protection).

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: fix checking for invalid memory check

It should have been a >=, but let's directly perform a proper access
check to also be able to deal with hotplugged memory later.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170926183318 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: support ccw IDA

Let's add indirect data addressing support for our virtual channel
subsystem. This implementation does not bother with any kind of
prefetching. We simply step through the IDAL on demand.

Signed-off-by: Halil Pasic <[email protected]>
Message-Id: <20170921180841 [email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

390x/css: introduce maximum data address checking

The architecture mandates the addresses to be accessed on the first
indirection level (that is, the data addresses without IDA, and the
(M)IDAW addresses with (M)IDA) to be checked against an CCW format
dependent limit maximum address. If a violation is detected, the storage
access is not to be performed and a channel program check needs to be
generated. As of today, we fail to do this check.

Let us stick even closer to the architecture specification.

Signed-off-by: Halil Pasic <[email protected]>
Message-Id: <20170921180841 [email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

virtio-ccw: use ccw data stream

Replace direct access which implicitly assumes no IDA
or MIDA with the new ccw data stream interface which should
cope with these transparently in the future.

Note that checking the return code for ccw_dstream_* will be
done in a follow-on patch.

Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Message-Id: <20170921180841 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: use ccw data stream

Replace direct access which implicitly assumes no IDA
or MIDA with the new ccw data stream interface which should
cope with these transparently in the future.

Note that checking the return code for ccw_dstream_* will be
done in a follow-on patch.

Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Message-Id: <20170921180841 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: introduce css data stream

This is a preparation for introducing handling for indirect data
addressing and modified indirect data addressing (CCW). Here we introduce
an interface which should make the addressing scheme transparent for the
client code. Here we implement only the basic scheme (no IDA or MIDA).

Signed-off-by: Halil Pasic <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Reviewed-by: Pierre Morel <[email protected]>
Message-Id: <20170921180841 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/kvm: fix and cleanup storing CPU status

env->psa is a 64bit value, while we copy 4 bytes into the save area,
resulting always in 0 getting stored.

Let's try to reduce such errors by using a proper structure. While at
it, use correct cpu->be conversion (and get_psw_mask()), as we will be
reusing this code for TCG soon.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170922140338 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: use generic cpu_model parsing

Define default CPU type in generic way in machine class_init
and let common machine code handle cpu_model parsing.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Message-Id: <1505998749 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: add basic MSA features

The STFLE bits for the MSA (extension) facilities simply indicate that
the respective instructions can be executed. The QUERY subfunction can then
be used to identify which features exactly are available.

Availability of subfunctions can also vary on real hardware. For now, we
simply implement a CPU model without any available subfunctions except
QUERY (which is always around).

As all MSA functions behave quite similarly, we can use one translation
handler for now. Prepare the code for implementation of actual subfunctions.

At least MSA is helpful for now, as older Linux kernels require this
facility when compiled for a z9 model. Allow to enable the facilities
for the qemu cpu model.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170920153016 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: move wrap_address() to internal.h

We want to use it in another file.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170920153016 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: implement spm (SET PROGRAM MASK)

Missing and is used inside Linux in the context of CPACF.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20170920153016 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Thu 05 Oct 2017 15:25:21 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  checkpatch: fix incompatibility with old perl

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20171005' into staging

HMP pull 2017-10-05

# gpg: Signature made Thu 05 Oct 2017 11:50:13 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-hmp-20171005:
  hmp-commands-info: Change "@findex FOO" to "@findex info FOO"
  hmp-commands-info: Move Texinfo stanzas to conventional place
  hmp-commands-info: Fix "info rocker-FOO" misspellings
  hmp: Fix unknown command for subtable
  hmp: Missing handle_errors

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/usb-20171005-pull-request' into staging

usb bugfixes.

# gpg: Signature made Thu 05 Oct 2017 10:04:15 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/usb-20171005-pull-request:
  usb: fix host-stub.c build race
  usb: Use angle brackets for cacard include directive
  usb: fix libusb config variable name.
  hw/usb/bus: Remove bad object_unparent() from usb_try_create_simple()

Signed-off-by: Peter Maydell <[email protected]>

checkpatch: fix incompatibility with old perl

Do not use '/r' modifier which was introduced in perl 5.14.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Daniel P. Berrange <[email protected]>
Fixes: 3e5875afc0f ("checkpatch: check trace-events code style")
Tested-by: Alex Williamson <[email protected]>
Message-id: 20171004154420 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2017-10-04-1' into staging

Merge qio 2017/10/04 v1

# gpg: Signature made Wed 04 Oct 2017 13:23:04 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg:                 aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-2017-10-04-1:
  io: add trace events for websockets frame handling
  io: Attempt to send websocket close messages to client
  io: Reply to ping frames
  io: Ignore websocket PING and PONG frames
  io: Allow empty websocket payload
  io: Add support for fragmented websocket binary frames
  io: Small updates in preparation for websocket changes
  ui: Always remove an old VNC channel watch before adding a new one
  io: use case insensitive check for Connection & Upgrade websock headers
  io: include full error message in websocket handshake trace
  io: send proper HTTP response for websocket errors

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20171003.0' into staging

VFIO updates 2017-10-03

- NVIDIA GPUDirect Cliques experimental support (Alex Williamson)

# gpg: Signature made Tue 03 Oct 2017 22:28:43 BST
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20171003.0:
  vfio/pci: Add NVIDIA GPUDirect Cliques support
  vfio/pci: Add virtual capabilities quirk infrastructure
  vfio/pci: Do not unwind on error

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Tue 03 Oct 2017 19:53:34 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  aio: fix assert when remove poll during destroy
  iothread: delay the context release to finalize
  iothread: export iothread_stop()
  iothread: provide helpers for internal use
  qom: provide root container for internal objs

Signed-off-by: Peter Maydell <[email protected]>

hmp-commands-info: Change "@findex FOO" to "@findex info FOO"

qemu-doc has the monitor commands in the "Function Index". The "info
FOO" are listed as "FOO" there. List them as "info FOO" instead.

Signed-off-by: Markus Armbruster <[email protected]>
Message-Id: <20171002134538 [email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>