Git Repo - qemu.git/log

qcow2: Zero-initialise first cluster for new images

Strictly speaking, this is only required for has_zero_init() == false,
but it's easy enough to just do a cluster-aligned write that is padded
with zeros after the header.
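A minimal sketch of the idea follows. It is not the qcow2 driver code: the header is assumed to be a plain byte array, and the helper name build_first_cluster is hypothetical. Allocating the whole cluster zero-filled and copying the header to its start gives exactly the padded, cluster-aligned first write described above:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a cluster_size buffer whose first
 * header_len bytes are the header and whose remainder is zero.
 * Writing it at offset 0 overwrites any stale bytes that could
 * otherwise be mistaken for header extensions.
 * Returns NULL if the header does not fit or allocation fails.
 */
static uint8_t *build_first_cluster(const uint8_t *header, size_t header_len,
                                    size_t cluster_size)
{
    if (header_len > cluster_size) {
        return NULL;
    }
    /* calloc zero-fills, which provides the padding after the header */
    uint8_t *buf = calloc(1, cluster_size);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, header, header_len);
    return buf;
}
```

The caller writes the returned buffer at offset 0 and frees it afterwards.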

This fixes a problem where, after 'qemu-img create', qcow2 would attempt
to parse header extensions that are really just random leftover data.

Cc: [email protected]
Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: Close backing file early in bdrv_img_create

Leaving the backing file open although it is not needed anymore can
cause problems if it is opened through a block driver which allows
exclusive access only and if the create function of the block driver
used for the top image (the one being created) tries to close and reopen
the image file (which will include opening the backing file a second
time).

In particular, this will happen with a backing file opened through
qemu-nbd and using qcow2 as the top image file format (which reopens the
image to flush it to disk).

In addition, the BlockDriverState in bdrv_img_create() is used for the
backing file only; it should therefore be made local to the respective
block.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Wenchao Xia <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

scsi-disk: correctly implement WRITE SAME

Fetch the data to be written from the input buffer. If it is all zeroes,
we can use the write_zeroes call (possibly with the new MAY_UNMAP flag).
Otherwise, do as many write cycles as needed, writing 512k at a time.
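The dispatch described above can be sketched over an in-memory disk. The names buffer_all_zero and emulate_write_same, and the memory-backed "disk", are hypothetical stand-ins for the scsi-disk and block-layer calls; the sketch only shows the all-zero check and the 512 KiB write cycles:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRITE_SAME_CHUNK (512 * 1024)  /* write 512k at a time */

/* Return true if every byte of buf is zero. */
static bool buffer_all_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical WRITE SAME emulation: 'block' is the single logical block
 * from the data-out buffer, replicated over nb_blocks blocks at 'lba'.
 * Returns the number of write cycles used (0 means the zero path).
 */
static size_t emulate_write_same(uint8_t *disk, const uint8_t *block,
                                 size_t block_size, size_t lba,
                                 size_t nb_blocks)
{
    uint8_t *dst = disk + lba * block_size;

    if (buffer_all_zero(block, block_size)) {
        /* All zeroes: one write_zeroes-style operation is enough. */
        memset(dst, 0, nb_blocks * block_size);
        return 0;
    }

    /* Otherwise fill at most 512k worth of whole blocks per cycle. */
    size_t per_chunk = WRITE_SAME_CHUNK / block_size;
    if (per_chunk == 0) {
        per_chunk = 1;
    }
    size_t cycles = 0;
    for (size_t done = 0; done < nb_blocks; cycles++) {
        size_t n = nb_blocks - done < per_chunk ? nb_blocks - done : per_chunk;
        for (size_t i = 0; i < n; i++) {
            memcpy(dst + (done + i) * block_size, block, block_size);
        }
        done += n;
    }
    return cycles;
}
```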

Strictly speaking, this is still incorrect because a zero cluster should
only be written if the MAY_UNMAP flag is set. But this is a bug in qcow2
and the other formats, not in the SCSI code.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

scsi-disk: reject ANCHOR=1 for UNMAP and WRITE SAME commands

Since we report ANC_SUP==0 in VPD page B2h, we need to return
an error (ILLEGAL REQUEST/INVALID FIELD IN CDB) for all WRITE SAME
requests with ANCHOR==1.
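A sketch of the check, assuming the SBC-3 CDB layouts: opcodes 0x41 and 0x93 for WRITE SAME(10)/(16) and 0x42 for UNMAP, with ANCHOR in bit 4 of byte 1 for WRITE SAME and bit 0 of byte 1 for UNMAP. The function name is hypothetical; a true result is where the device would answer ILLEGAL REQUEST/INVALID FIELD IN CDB:

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcodes as assumed from SBC-3. */
#define WRITE_SAME_10 0x41
#define UNMAP         0x42
#define WRITE_SAME_16 0x93

/*
 * Return true if the CDB sets ANCHOR and must therefore be rejected
 * while ANC_SUP is reported as 0 in VPD page B2h.
 */
static bool cdb_has_anchor(const uint8_t *cdb)
{
    switch (cdb[0]) {
    case WRITE_SAME_10:
    case WRITE_SAME_16:
        return (cdb[1] & 0x10) != 0;   /* ANCHOR: byte 1, bit 4 */
    case UNMAP:
        return (cdb[1] & 0x01) != 0;   /* ANCHOR: byte 1, bit 0 */
    default:
        return false;
    }
}
```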

Inspired by a similar patch to the LIO in-kernel target.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

scsi-disk: catch write protection errors in UNMAP

This is the same as what is already done for WRITE SAME.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: 033 is fast

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

raw-posix: add support for write_zeroes on XFS and block devices

The code is similar to the implementation of discard and write_zeroes
with UNMAP.  However, failure must be propagated up to block.c.

The stale page cache problem can be reproduced as follows:

    # modprobe scsi-debug lbpws=1 lbprz=1
    # ./qemu-io /dev/sdXX
    qemu-io> write -P 0xcc 0 2M
    qemu-io> write -z 0 1M
    qemu-io> read -P 0x00 0 512
    Pattern verification failed at offset 0, 512 bytes
    qemu-io> read -v 0 512
    00000000:  cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    ...

    # ./qemu-io --cache=none /dev/sdXX
    qemu-io> write -P 0xcc 0 2M
    qemu-io> write -z 0 1M
    qemu-io> read -P 0x00 0 512
    qemu-io> read -v 0 512
    00000000:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    ...

And similarly with discard instead of "write -z".

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

raw-posix: implement write_zeroes with MAY_UNMAP for block devices

See the next commit for the description of the Linux kernel problem
that is worked around in raw_open_common.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

raw-posix: implement write_zeroes with MAY_UNMAP for files

Writing zeroes to a file can be done by punching a hole if
MAY_UNMAP is set.

Note that in this case ENOTSUP is not ignored, but makes
the block layer fall back to the generic implementation.
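A Linux-only sketch of this behaviour for a regular file, using fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The function name and the -errno return convention are assumptions. The EOPNOTSUPP branch (the same value as ENOTSUP on Linux) plays the role of falling back to the generic implementation, which here simply writes explicit zeroes:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Zero [offset, offset + len) of a regular file.  With may_unmap set,
 * try to punch a hole first; if that is unsupported, fall back to
 * writing zeroes.  Returns 0 or -errno.
 */
static int file_write_zeroes(int fd, off_t offset, off_t len, int may_unmap)
{
    if (may_unmap) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      offset, len) == 0) {
            return 0;
        }
        if (errno != EOPNOTSUPP && errno != ENOSYS) {
            return -errno;
        }
        /* Not supported here: use the generic path below. */
    }

    char zeroes[4096];
    memset(zeroes, 0, sizeof(zeroes));
    while (len > 0) {
        size_t n = len < (off_t)sizeof(zeroes) ? (size_t)len : sizeof(zeroes);
        ssize_t ret = pwrite(fd, zeroes, n, offset);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        offset += ret;
        len -= ret;
    }
    return 0;
}
```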

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/iscsi: check WRITE SAME support differently depending on MAY_UNMAP

The current check is right for MAY_UNMAP=1. For MAY_UNMAP=0, just
try and fall back to regular writes as soon as a WRITE SAME command
fails.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/iscsi: updated copyright

Added myself to reflect recent work on the iscsi block driver.

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/iscsi: remove .bdrv_has_zero_init

Since commit 3ac21627, the default value has been 0.

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block drivers: expose requirement for write same alignment from formats

This will let misaligned but large requests use zero clusters. This
is important because the cluster size is not guest visible.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block drivers: add discard/write_zeroes properties to bdrv_get_info implementation

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

vpc, vhdx: add get_info

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: make bdrv_co_do_write_zeroes stricter in producing aligned requests

Right now, bdrv_co_do_write_zeroes will only try to align the
beginning of the request. However, it is simpler for many
formats to expect the block layer to separate both the head *and*
the tail. This makes sure that the format's bdrv_co_write_zeroes
function will be called with aligned sector_num and nb_sectors for
the bulk of the request.
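The head/body/tail split can be sketched as follows, with the alignment expressed in sectors (for example 128 sectors for a 64 KiB cluster with 512-byte sectors). struct split and split_aligned are hypothetical names, not QEMU's; the idea is that only the body reaches the format's bdrv_co_write_zeroes, while head and tail are handled separately:

```c
#include <stdint.h>

/* Hypothetical description of how a request is split. */
struct split {
    int64_t head_sector, head_len;   /* unaligned prefix */
    int64_t body_sector, body_len;   /* aligned bulk */
    int64_t tail_sector, tail_len;   /* unaligned suffix */
};

/*
 * Split [sector_num, sector_num + nb_sectors) so that the body starts and
 * ends on a multiple of 'align' sectors.  If no aligned body fits, the
 * whole request is reported as head.  Assumes non-negative sectors.
 */
static struct split split_aligned(int64_t sector_num, int64_t nb_sectors,
                                  int64_t align)
{
    struct split s = {0};
    int64_t end = sector_num + nb_sectors;
    int64_t body_start = (sector_num + align - 1) / align * align;
    int64_t body_end = end / align * align;

    s.head_sector = sector_num;
    if (body_start >= body_end) {
        s.head_len = nb_sectors;
        return s;
    }
    s.head_len = body_start - sector_num;
    s.body_sector = body_start;
    s.body_len = body_end - body_start;
    s.tail_sector = body_end;
    s.tail_len = end - body_end;
    return s;
}
```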

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: handle ENOTSUP from discard in generic code

Similar to write_zeroes, let the generic code receive an ENOTSUP for
discard operations. Since bdrv_discard has advisory semantics,
we can just swallow the error.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add bdrv_aio_write_zeroes

This will be used by the SCSI layer.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add flags argument to bdrv_co_write_zeroes tracepoint

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add flags to BlockRequest

This lets bdrv_co_do_rw receive flags, so that it can be used for
zero writes.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: generalize BlockLimits handling to cover bdrv_aio_discard too

bdrv_co_discard only covers drivers which have a .bdrv_co_discard()
implementation, but not those with .bdrv_aio_discard(). Not very nice,
and easy to avoid.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

vmdk: Fix creating big description file

The buffer for the description file was 4096 bytes, which only covers a
few hundred extents. This changes the buffer to be dynamically allocated
with g_strdup_printf in order to support bigger cases.
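g_strdup_printf is a GLib function. As a dependency-free illustration of the same idea (allocate exactly what the formatted text needs instead of using a fixed 4096-byte buffer), the sketch below uses the standard C idiom of measuring with vsnprintf first; dup_printf is a hypothetical name:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Return a malloc'd string holding the formatted output; caller frees. */
static char *dup_printf(const char *fmt, ...)
{
    va_list ap, ap2;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    int len = vsnprintf(NULL, 0, fmt, ap);   /* measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }

    char *buf = malloc((size_t)len + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);   /* format for real */
    }
    va_end(ap2);
    return buf;
}
```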

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: remove unused CoQueue AioContext

The AioContext ctx field has apparently been unused in the qemu codebase
since 02ffb504485.

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

coroutine: remove qemu_co_queue_wait_insert_head

qemu_co_queue_wait_insert_head() is now unused in the qemu code base.

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Add sample image and test for VMDK version 3

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

vmdk: Allow read only open of VMDK version 3

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Filter out 'qemu-io> ' prompt

This removes the "qemu-io> " prompt from qemu-io output in _filter_qemu_io,
and updates all the output files with the following command:

cd tests/qemu-iotests && sed -i "s/qemu-io> //g" *.out

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Filter qemu-io output in 025

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use BDRV_O_NO_BACKING where appropriate

If you open an image temporarily just because you want to check its size
or get it flushed, there's no real reason to open the whole backing file
chain.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Benoit Canet <[email protected]>

qemu-iotests: Test snapshot mode

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

block: Enable BDRV_O_SNAPSHOT with driver-specific options

In the case of snapshot=on, don't rely on the backing file path in the
temporary image any more, but override the backing file with the given
set of options. This way, block drivers that don't use a file name can
be accessed with snapshot=on, for example:

-drive file.driver=nbd,file.host=localhost,snapshot=on

Which becomes internally something like:

file.filename=/tmp/vl.AWQZCu,backing.file.driver=nbd,backing.file.host=localhost

Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Make test case 030, 040 and 055 deterministic

Pause the drive before starting the block job, so that the test cannot
miss the block job.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotest: Add pause_drive and resume_drive methods

They wrap blkdebug "break" and "remove_break".

Add optional argument "resume" to cancel_and_wait().

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

blkdebug: add "remove_break" command

This adds a "remove_break" command, which is the reverse of the blkdebug
command "break": it removes all breakpoints with the given tag and resumes
all the requests.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Drop local version of cancel_and_wait from 040

iotests.py already has one.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

sheepdog: support user-defined redundancy option

Sheepdog supports two kinds of redundancy: full replication and erasure coding.

# create a fully replicated vdi with x copies
-o redundancy=x (1 <= x <= SD_MAX_COPIES)

# create an erasure-coded vdi with x data strips and y parity strips
-o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)

For example, to convert a vdi into sheepdog vdi 'test' with an 8:3 erasure coding scheme:

$ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test
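A sketch of validating the two option forms above. SD_MAX_COPIES and SD_EC_MAX_STRIP are passed in as parameters because their values are not given here; the function name and return convention are assumptions, not the sheepdog driver's code:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse "x" (full replication) or "x:y" (erasure coding).  On success
 * store the copy/data-strip count and parity count (0 for replication).
 */
static bool parse_redundancy(const char *opt, int max_copies,
                             int ec_max_strip, int *copies, int *parity)
{
    char *end;
    long x = strtol(opt, &end, 10);
    if (end == opt) {
        return false;
    }
    if (*end == '\0') {
        /* full replication: 1 <= x <= max_copies */
        if (x < 1 || x > max_copies) {
            return false;
        }
        *copies = (int)x;
        *parity = 0;
        return true;
    }
    if (*end != ':') {
        return false;
    }
    const char *p = end + 1;
    long y = strtol(p, &end, 10);
    if (end == p || *end != '\0') {
        return false;
    }
    /* erasure coding: x in {2,4,8,16}, 1 <= y < ec_max_strip */
    if (x != 2 && x != 4 && x != 8 && x != 16) {
        return false;
    }
    if (y < 1 || y >= ec_max_strip) {
        return false;
    }
    *copies = (int)x;
    *parity = (int)y;
    return true;
}
```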

Cc: Kevin Wolf <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Signed-off-by: Liu Yuan <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

sheepdog: refactor do_sd_create()

We can actually use BDRVSheepdogState *s to pass most of the parameters.

Cc: Kevin Wolf <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Signed-off-by: Liu Yuan <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qdict: Optimise qdict_do_flatten()

Nested QDicts used to be both entered recursively in order to move their
entries to the target QDict and also be moved themselves to the target
QDict like all other objects. This is harmless because for the top
level, qdict_do_flatten() will encounter the (now empty) QDict for a
second time and then delete it, but at the same time it's obviously
unnecessary overhead. Just delete nested QDicts directly after moving
all of their entries.

Reported-by: Laszlo Ersek <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qdict: Fix memory leak in qdict_do_flatten()

Reported-by: Laszlo Ersek <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

MAINTAINERS: add sheepdog development mailing list

This will help people find the mailing list relevant to sheepdog.

Cc: Stefan Hajnoczi <[email protected]>
Cc: Kevin Wolf <[email protected]>
Signed-off-by: Liu Yuan <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

COW: Extend checking allocated bits to beyond one sector

cow_co_is_allocated() only checks one sector's worth of allocated bits
before returning. This is allowed but (slightly) inefficient, so extend
it to check all of the file's metadata sectors.
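A sketch of a scan that does not stop at the end of one metadata sector, assuming the bitmap stores one bit per data sector with bit n in bit (n % 8) of byte n / 8. The function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit order: LSB first within each byte. */
static bool is_bit_set(const uint8_t *bitmap, int64_t n)
{
    return (bitmap[n / 8] >> (n % 8)) & 1;
}

/*
 * Report whether 'sector' is allocated and, in *pnum, how many consecutive
 * sectors (1 <= *pnum <= nb_sectors) share that state.  The scan may cross
 * metadata-sector boundaries (512 bytes = 4096 bits).
 */
static bool cow_is_allocated_run(const uint8_t *bitmap, int64_t sector,
                                 int64_t nb_sectors, int64_t *pnum)
{
    bool first = is_bit_set(bitmap, sector);
    int64_t n = 1;
    while (n < nb_sectors && is_bit_set(bitmap, sector + n) == first) {
        n++;
    }
    *pnum = n;
    return first;
}
```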

Signed-off-by: Charlie Shepherd <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
[kwolf: silenced compiler warning (-Wmaybe-uninitialized for changed)]
Signed-off-by: Kevin Wolf <[email protected]>

COW: Speed up writes

Process a whole sector's worth of COW bits by reading a sector, setting
the bits after skipping any already set bits, then writing it out again.
Make sure we only flush once before writing metadata, and only if we
need to write metadata.

Signed-off-by: Charlie Shepherd <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qapi: Change BlockDirtyInfo to list

We now have multiple dirty bitmaps in BDS; switch QAPI to allow querying
them (BlockInfo.dirty_bitmaps), and drop the old BlockInfo.dirty.

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: per caller dirty bitmap

Previously a BlockDriverState had only one dirty bitmap, so only one
caller (e.g. a block job) could keep track of writes. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller; the
lifecycle is managed with these new functions:

    bdrv_create_dirty_bitmap
    bdrv_release_dirty_bitmap

BdrvDirtyBitmap is a linked-list wrapper structure around HBitmap.

In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:

    bdrv_get_dirty
    bdrv_dirty_iter_init
    bdrv_get_dirty_count

bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
internally walk the list of all dirty bitmaps and set them one by one.
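A simplified sketch of the list-of-bitmaps design, using a plain byte array per bitmap instead of HBitmap. All type and function names are hypothetical stand-ins for BdrvDirtyBitmap and the bdrv_*_dirty_* functions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DirtyBitmap {
    uint8_t *bits;                /* one bit per sector */
    struct DirtyBitmap *next;
} DirtyBitmap;

typedef struct {
    DirtyBitmap *dirty_bitmaps;   /* list head, one entry per caller */
    int64_t nb_sectors;
} BlockState;

/* Each caller (e.g. a block job) gets its own bitmap. */
static DirtyBitmap *create_dirty_bitmap(BlockState *bs)
{
    DirtyBitmap *b = calloc(1, sizeof(*b));
    if (b == NULL) {
        return NULL;
    }
    b->bits = calloc((size_t)(bs->nb_sectors + 7) / 8, 1);
    if (b->bits == NULL) {
        free(b);
        return NULL;
    }
    b->next = bs->dirty_bitmaps;
    bs->dirty_bitmaps = b;
    return b;
}

static void release_dirty_bitmap(BlockState *bs, DirtyBitmap *bitmap)
{
    for (DirtyBitmap **p = &bs->dirty_bitmaps; *p; p = &(*p)->next) {
        if (*p == bitmap) {
            *p = bitmap->next;
            free(bitmap->bits);
            free(bitmap);
            return;
        }
    }
}

/* Prototype unchanged: one call marks the range in every bitmap. */
static void set_dirty(BlockState *bs, int64_t sector, int64_t nb_sectors)
{
    for (DirtyBitmap *b = bs->dirty_bitmaps; b; b = b->next) {
        for (int64_t i = sector; i < sector + nb_sectors; i++) {
            b->bits[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
}

/* Queries now name the caller's own bitmap. */
static bool get_dirty(const DirtyBitmap *b, int64_t sector)
{
    return (b->bits[sector / 8] >> (sector % 8)) & 1;
}
```

A bitmap created after some writes does not see them, which is what lets independent callers track writes from different starting points.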

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/stream: Don't stream unbacked devices

If a block device is unbacked, a streaming blockjob should immediately
finish instead of beginning to try to stream, then noticing the backing
file does not contain even the first sector (since it does not exist)
and then finishing normally.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Wenchao Xia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

sheepdog: implement .bdrv_get_allocated_file_size

With this patch, qemu-img info sheepdog:image will show disk size for sheepdog
images.

Cc: Kevin Wolf <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Cc: MORITA Kazutaka <[email protected]>
Signed-off-by: Liu Yuan <[email protected]>
Reviewed-by: MORITA Kazutaka <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Test coroutine execution order

This patch adds a test for coroutine execution order in test-coroutine -
this catches a bug in the CPC coroutine implementation.

Signed-off-by: Charlie Shepherd <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

util/error: Save errno from clobbering

There may be calls to error_setg() and especially error_setg_errno()
which blindly (and until now wrongly) assume these functions not to
clobber errno (e.g., they pass errno to error_setg_errno() and return
-errno afterwards). Instead of trying to find and fix all of these
constructs, just make sure error_setg() and error_setg_errno() indeed do
not clobber errno.
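The save-and-restore pattern can be sketched like this. report_error_errno and last_error are hypothetical names, not the util/error API; the point is that errno on return equals errno on entry, so a caller may still write "return -errno;" after reporting:

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_error[256];

/* vsnprintf() and strerror() may change errno, so preserve it. */
static void report_error_errno(int os_errno, const char *fmt, ...)
{
    int saved_errno = errno;
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(last_error, sizeof(last_error), fmt, ap);
    va_end(ap);
    if (n >= 0 && (size_t)n < sizeof(last_error)) {
        snprintf(last_error + n, sizeof(last_error) - (size_t)n, ": %s",
                 strerror(os_errno));
    }

    errno = saved_errno;
}
```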

Suggested-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Benoit Canet <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-img: conditionally zero out target on convert

If the target reports has_zero_init = 0 but supports efficiently
writing zeroes by unmapping, we call bdrv_make_zero to
avoid fully allocating the target. This currently works
only for iscsi. It can be extended to raw with
BLKDISCARDZEROES, for example.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qemu-img: add support for fully allocated images

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks

This patch does two things:
a) only make additional callouts if BDRV_BLOCK_ZERO is not already set.
b) use the newly introduced bdrv_unallocated_blocks_are_zero()
to return the zero state of an unallocated block. The previously used callout
to bdrv_has_zero_init() is only valid right after bdrv_create.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: introduce bdrv_make_zero

This patch adds a call to completely zero out a block device.
The operation is sped up by checking the block status and
only writing zeroes to the device where it does not already
return zeroes. Optionally, the zero writing can be sped up
further by setting the flag BDRV_REQ_MAY_UNMAP to emulate the
zero write by unmapping, if the driver supports it.
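A sketch of the skip-what-already-reads-as-zero loop over a hypothetical in-memory device. get_zero_status stands in for a block-status query and make_zero for bdrv_make_zero; the names and the Disk type are assumptions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

typedef struct {
    uint8_t *data;
    int64_t nb_sectors;
    int writes;          /* zero-write requests actually issued */
} Disk;

static bool sector_is_zero(const Disk *d, int64_t sector)
{
    const uint8_t *p = d->data + sector * SECTOR_SIZE;
    for (int i = 0; i < SECTOR_SIZE; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Does 'sector' read as zero, and for how many sectors (up to max)? */
static bool get_zero_status(const Disk *d, int64_t sector, int64_t max,
                            int64_t *pnum)
{
    bool zero = sector_is_zero(d, sector);
    int64_t n = 1;
    while (n < max && sector_is_zero(d, sector + n) == zero) {
        n++;
    }
    *pnum = n;
    return zero;
}

/* Zero the whole device, skipping ranges that already read as zero. */
static void make_zero(Disk *d)
{
    int64_t sector = 0;
    while (sector < d->nb_sectors) {
        int64_t n;
        bool zero = get_zero_status(d, sector, d->nb_sectors - sector, &n);
        if (!zero) {
            /* A real driver could honour BDRV_REQ_MAY_UNMAP here. */
            memset(d->data + sector * SECTOR_SIZE, 0,
                   (size_t)n * SECTOR_SIZE);
            d->writes++;
        }
        sector += n;
    }
}
```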

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

iscsi: add bdrv_co_write_zeroes

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

iscsi: simplify iscsi_co_discard

Now that bdrv_co_discard can handle limits, we do not need
the request split logic here anymore.
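The kind of splitting that the generic discard path now takes over can be sketched as below. split_by_limit and Req are hypothetical names, the limit stands in for a max_discard value from BlockLimits, and discard alignment is left out to keep the example short:

```c
#include <stdint.h>

typedef struct {
    int64_t sector;
    int64_t nb_sectors;
} Req;

/*
 * Split a discard into requests of at most max_discard sectors
 * (0 means unlimited).  Returns the number of requests stored in out,
 * or -1 if out is too small.
 */
static int split_by_limit(int64_t sector, int64_t nb_sectors,
                          int64_t max_discard, Req *out, int max_out)
{
    int n = 0;
    while (nb_sectors > 0) {
        int64_t num = nb_sectors;
        if (max_discard > 0 && num > max_discard) {
            num = max_discard;
        }
        if (n == max_out) {
            return -1;
        }
        out[n].sector = sector;
        out[n].nb_sectors = num;
        n++;
        sector += num;
        nb_sectors -= num;
    }
    return n;
}
```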

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

iscsi: set limits in BlockDriverState

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: honour BlockLimits in bdrv_co_discard

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: honour BlockLimits in bdrv_co_do_write_zeroes

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/raw: copy BlockLimits on raw_open

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add BlockLimits structure to BlockDriverState

This patch adds BlockLimits, which introduces discard and write_zeroes
limits and alignment information to the BlockDriverState.

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block/iscsi: add .bdrv_get_info

Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add wrappers for logical block provisioning information

This adds two wrappers to read the unallocated_blocks_are_zero and
can_write_zeroes_with_unmap info from the BDI. The wrappers are
required to check for the existence of a backing_hd and
whether the devices are opened with the correct flags.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add logical block provisioning info to BlockDriverInfo

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: introduce BDRV_REQ_MAY_UNMAP request flag

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: add flags to bdrv_*_write_zeroes

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: make BdrvRequestFlags public

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Lieven <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Open 2.0 development tree

Signed-off-by: Anthony Liguori <[email protected]>

Update version for 1.7.0 release

Signed-off-by: Anthony Liguori <[email protected]>

qemu-iotests: Fix test 041

Performing multiple drive-mirror blockjobs on the same qemu instance
results in the image file used for the block device being replaced by
the newly mirrored file, which is not what we want.

Fix this by performing one dedicated test per sync mode.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 1385407736 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

block/drive-mirror: Reuse backing HD for sync=none

For "none" sync mode in "absolute-paths" mode, the current image should
be used as the backing file for the newly created image.

The current behavior is:
a) If the image to be mirrored has a backing file, use that (which is
   wrong, since the operations recorded by "none" are applied to the
   image itself, not to its backing file).
b) If the image to be mirrored lacks a backing file, the target doesn't
   have one either (which is not really wrong, but not really right,
   either; "none" records a set of operations executed on the image
   file, therefore having no backing file to apply these operations on
   seems rather pointless).

For a, this is clearly a bugfix. For b, it is still a bugfix, although
it might break existing API - but since that case crashed qemu just
three weeks ago (before 1452686495922b81d6cf43edf025c1aef15965c0), we
can safely assume there is no such API relying on that case yet.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 1385407736 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

Update version for v1.7.0-rc2 release

curses: fixup SIGWINCH handler mess

Don't run code in the signal handler, only set a flag.
Use sigaction(2) to avoid non-portable signal(2) semantics.
Make #ifdefs less messy.

Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Message-id: 1385130903 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

qga: Fix two format strings for MinGW

Both code locations cause a compiler warning. Using "%s" instead of "%lu"
would result in a program crash if the wrong code were executed.

Signed-off-by: Stefan Weil <[email protected]>
Message-id: 1385409257 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

PPC: BookE: Make FIT/WDT timers at best millisecond grained

The default granularity for the FIT timer on 440 is on every 0x1000th
transition of TB from 0 to 1. Translated, that means 48828 times a second.

Since interrupts are quite expensive for 440 and we don't really care
about the accuracy of the FIT to that significance, let's force FIT and
WDT to at best millisecond granularity.
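A sketch of arming such a timer, assuming the event fires when a selected TB bit rises from 0 to 1 and that the TB frequency is known. The helper names are hypothetical; only the 1 ms floor is taken from the description above:

```c
#include <stdint.h>

/*
 * TB value at which 'bit' next rises from 0 to 1.  Bit b rises once
 * every 2^(b+1) ticks, at values congruent to 2^b modulo 2^(b+1).
 */
static uint64_t next_rising_edge(uint64_t tb, unsigned bit)
{
    uint64_t period = 1ULL << (bit + 1);
    uint64_t edge = (tb & ~(period - 1)) + (1ULL << bit);
    if (edge <= tb) {
        edge += period;
    }
    return edge;
}

/*
 * Host deadline in ns for that edge, never less than 1 ms ahead.
 * ticks * 1e9 fits in 64 bits for bit positions up to about 30.
 */
static int64_t fit_deadline_ns(int64_t now_ns, uint64_t tb, unsigned bit,
                               uint64_t tb_freq_hz)
{
    uint64_t ticks = next_rising_edge(tb, bit) - tb;
    int64_t delta_ns = (int64_t)(ticks * 1000000000ULL / tb_freq_hz);
    const int64_t min_ns = 1000000;   /* at best millisecond grained */

    if (delta_ns < min_ns) {
        delta_ns = min_ns;
    }
    return now_ns + delta_ns;
}
```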

This basically restores behavior as it was in QEMU 1.6, where timers
could only deal with millisecond granularities at all.

This patch greatly improves performance with the 440 target and restores
roughly the same performance level that QEMU 1.6 had for me.

Signed-off-by: Alexander Graf <[email protected]>
Message-id: 1385416015 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

PPC: Make BookE FIT/WDT timers more lazy

Today we fire FIT and WDT timer events every time the respective bit
position in TB flips from 0 -> 1.

However, there is no need to do this if the end result would be that
we're changing a TSR bit that is set to 1 to 1 again. No guest visible
change would have occurred.

So whenever we see that the TSR bit to our timer is already set, don't
even bother to update the timer that would potentially fire it off.

However, we do need to make sure that we update our timer that notifies
us of the TB flip when the respective TSR bit gets unset. In that case
we do care about the flip and need to notify the guest again. So add
a callback into our timer handlers when TSR bits get unset.

This improves performance for me when the guest is busy processing things.

Signed-off-by: Alexander Graf <[email protected]>
Message-id: 1385416015 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

acpi-build: fix support for glib < 2.22

glib < 2.22 does not have g_array_get_element_size,
limit its use (to check all elements are 1 byte
in size) to newer glib.

This fixes build on RHEL 5.3.

Reported-by: Richard Henderson <[email protected]>
Reported-by: Erik Rull <[email protected]>
Tested-by: Richard Henderson <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-id: 20131125220039 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

Merge remote-tracking branch 'mst/tags/for_anthony' into staging

pc very last minute fixes for 1.7

This has a fix for a crasher bug with pci bridges,
boot failure fix for s390 on 32 bit hosts,
and fixes build for hosts with old glib.

There's also a fix for --iasl configure flag - it can be used
to work around broken iasl on some systems either
by using a non-standard iasl or by disabling it.

I've also reverted an e1000/rtl mac programming change
that seems slightly wrong and too risky for 1.8.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon 25 Nov 2013 03:40:07 AM PST using RSA key ID D28D5469
# gpg: Can't check signature: public key not found

# By Michael S. Tsirkin (5) and Bandan Das (1)
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
  configure: make --iasl option actually work
  Revert "e1000/rtl8139: update HMP NIC when every bit is written"
  acpi-build: fix build on glib < 2.14
  acpi-build: fix build on glib < 2.22
  pci: unregister vmstate_pcibus on unplug
  s390x: fix flat file load on 32 bit systems

Message-id: 1385379990 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

Merge remote-tracking branch 'bonzini/tags/for-anthony' into staging

Here are a bunch of 1.7-tagged patches that I was afraid
were getting forgotten or that did not have a clear maintainer responsible
for making a pull request.

# gpg: Signature made Thu 21 Nov 2013 08:40:59 AM PST using RSA key ID 9B4D86F2
# gpg: Can't check signature: public key not found

# By Peter Maydell (3) and others
# Via Paolo Bonzini
* bonzini/tags/for-anthony:
  qga: Fix compiler warnings (missing format attribute, wrong format strings)
  mips jazz: do not raise data bus exception when accessing invalid addresses
  target-i386: yield to another VCPU on PAUSE
  rng-egd: offset the point when repeatedly read from the buffer
  rng-egd: remove redundant free
  target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()
  vfio-pci: Fix multifunction=on
  atomic.h: Fix build with clang
  pc: get rid of builtin pvpanic for "-M pc-1.5"
  configure: Explicitly set ARFLAGS so we can build with GNU Make 4.0
  sun4m: Add FCode ROM for TCX framebuffer

Message-id: 1385052578 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

Merge remote-tracking branch 'mdroth/qga-pull-2013-11-22' into staging

# By Tomoki Sekiyama
# Via Michael Roth
* mdroth/qga-pull-2013-11-22:
  qemu-ga: vss-win32: Install VSS provider COM+ application service

Message-id: 1385154505 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

Merge remote-tracking branch 'stefanha/net' into staging

# By Vlad Yasevich
# Via Stefan Hajnoczi
* stefanha/net:
  qdev-properties-system.c: Allow vlan or netdev for -device, not both

Message-id: 1385118544 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

configure: make --iasl option actually work

The --iasl option was added to the CC option parsing section by mistake;
it is not effective there, and attempts to use it cause
an 'unknown option' error.

Fix this up.

Tested-by: Marcel Apfelbaum <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

qemu-ga: vss-win32: Install VSS provider COM+ application service

Currently, qemu-ga for Windows fails to execute guest-fsfreeze-freeze when
no user is logged in to Windows, with an error message:
{"error":{"class":"GenericError",
"desc":"failed to add C:\\ to snapshotset: (error: 8004230f)"}}

To enable guest-fsfreeze-freeze/thaw when no user is logged in, this installs
a service to execute qemu-ga VSS provider COM+ application that has full
access privileges to the local system. The service will automatically be
removed when the COM+ application is deregistered.

This patch replaces the ICOMAdminCatalog interface with the ICOMAdminCatalog2
interface, which additionally contains the CreateServiceForApplication() method.

Signed-off-by: Tomoki Sekiyama <[email protected]>
Reviewed-by: Gal Hammer <[email protected]>
Reviewed-by: Yan Vugenfirer <[email protected]>
Tested-by: Yan Vugenfirer <[email protected]>
Signed-off-by: Michael Roth <[email protected]>

qdev-properties-system.c: Allow vlan or netdev for -device, not both

It is currently possible to specify things like:
-device e1000,netdev=foo,vlan=1
With this usage, whichever argument was specified last (vlan or netdev)
overwrites what was previously set and results in a non-working
configuration. Even worse, when used with multiqueue devices,
it causes a segmentation fault on exit in qemu_free_net_client.

This patch treats the above command line options as invalid and
generates an error at start-up.

Signed-off-by: Vlad Yasevich <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

qga: Fix compiler warnings (missing format attribute, wrong format strings)

gcc 4.8.2 reports this warning when extra warnings are enabled (-Wextra):

  CC    qga/commands.o
qga/commands.c: In function ‘slog’:
qga/commands.c:28:5: error:
function might be possible candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format]
     g_logv("syslog", G_LOG_LEVEL_INFO, fmt, ap);
     ^

gcc 4.8.2 reports this warning when slog is declared with the
gnu_printf format attribute:

qga/commands-posix.c: In function ‘qmp_guest_file_open’:
qga/commands-posix.c:404:5: warning:
format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ [-Wformat=]
     slog("guest-file-open, handle: %d", handle);
     ^

On 32-bit hosts there are three more warnings, which are also fixed here.
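
As a sketch of what the format attribute buys (slog_sketch and the last_log
capture buffer are hypothetical, not qga code): a printf-style logger
declared with the attribute lets the compiler check every caller, and an
int64_t argument is then printed portably with the PRId64 macro from
<inttypes.h>. The sketch uses the generic printf archetype rather than
gnu_printf, assuming that is acceptable for a compiler-neutral example.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capture buffer standing in for the syslog output. */
static char last_log[128];

/* Hypothetical stand-in for qga's slog(). format(printf, 1, 2) says that
 * argument 1 is a printf-style format string and the variadic arguments
 * start at position 2, so -Wformat can check every call site. */
static void slog_sketch(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

static void slog_sketch(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_log, sizeof(last_log), fmt, ap);
    va_end(ap);
}
```

With the attribute in place, slog_sketch("guest-file-open, handle: %d", handle)
with an int64_t handle is diagnosed by -Wformat; the portable form is
slog_sketch("guest-file-open, handle: %" PRId64, handle).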

Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

mips jazz: do not raise data bus exception when accessing invalid addresses

The MIPS Jazz chipset doesn't seem to raise data bus exceptions on invalid
accesses. However, there is no easy way to prevent QEMU from raising them.
Creating a big memory region for the whole address space doesn't prevent
the memory core from directly calling unassigned_mem_read/write, which in
turn calls cpu->do_unassigned_access, which (for MIPS CPUs) raises a data
bus exception.

This fixes a MIPS Jazz regression introduced in c658b94f6e8c206c59d02aa6fbac285b86b53d2c.

Signed-off-by: Hervé Poussineau <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: yield to another VCPU on PAUSE

After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.

The problem shows up as soon as FreeBSD turns out its periodic (~1 ms)
tick, but the timers are only the trigger for a pre-existing problem.

Before the offending patch, setting a timer did a timer_settime system call.

After the patch, setting the timer exits the event loop (which uses poll)
and reenters it with a new deadline.  This does not cause any slowdown; the
difference is between one system call (timer_settime) and a signal
delivery (SIGALRM) before the patch, and two system calls afterwards
(write to a pipe or eventfd + calling poll again when re-entering the
event loop).

Unfortunately, the exit/enter causes the main loop to grab the iothread
lock, which in turn kicks the VCPU thread out of execution.  This
causes TCG to execute the next VCPU in its round-robin scheduling of
VCPUs.  When the second VCPU is mostly unused, FreeBSD runs a "pause"
instruction in its idle loop which only burns cycles without any
progress.  As soon as the timer tick expires, the first VCPU runs
the interrupt handler, but it very soon sets the timer again, and QEMU
then goes back to doing nothing in the second VCPU.

The fix is to make the pause instruction do "cpu_loop_exit".
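
To make the scheduling argument concrete, here is a toy model, not QEMU
code (SLICE, steps_to_finish and the two-VCPU setup are assumptions for
illustration). A busy VCPU 0 and an idle VCPU 1 alternate round-robin with
fixed slices, and VCPU 1 executes PAUSE in a loop. When PAUSE ends the
slice immediately, in the spirit of cpu_loop_exit, VCPU 0 gets the host
back almost at once instead of waiting out a whole idle slice.

```c
#include <stdbool.h>

#define SLICE 1000  /* hypothetical steps per round-robin slice */

/* Returns how many emulation steps pass before VCPU 0 finishes `work`
 * units. VCPU 1 only executes PAUSE; if pause_yields is true, PAUSE ends
 * VCPU 1's slice immediately, otherwise it burns the whole slice. */
static long steps_to_finish(long work, bool pause_yields)
{
    long steps = 0;

    while (work > 0) {
        /* VCPU 0: useful work for up to one slice. */
        for (int i = 0; i < SLICE && work > 0; i++) {
            work--;
            steps++;
        }
        if (work == 0) {
            break;
        }
        /* VCPU 1: idle loop made of PAUSE instructions. */
        for (int i = 0; i < SLICE; i++) {
            steps++;  /* one PAUSE executed */
            if (pause_yields) {
                break;  /* leave the execution loop, next VCPU runs */
            }
        }
    }
    return steps;
}
```

For 3000 units of work, the model needs 5000 steps when PAUSE spins and
3002 when it yields.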

Reported-by: Luigi Rizzo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

rng-egd: offset the point when repeatedly read from the buffer

The buffer content might be read out in more than one chunk. Currently
we just repeatedly read the first data block, because the buffer offset
is missing.
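
A minimal sketch of the missing bookkeeping, assuming a simple buffer
struct (ReqBuffer and req_buffer_read are hypothetical names, not the
rng-egd code): each read starts at the saved offset and advances it, so
successive reads return successive blocks instead of the first block again.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer of already-received random data. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t offset;  /* bytes already handed out; invariant: offset <= size */
} ReqBuffer;

/* Copies up to `len` bytes into `out`, starting at the current offset,
 * and advances the offset. Returns the number of bytes copied. */
static size_t req_buffer_read(ReqBuffer *rb, unsigned char *out, size_t len)
{
    size_t avail = rb->size - rb->offset;

    if (len > avail) {
        len = avail;
    }
    memcpy(out, rb->data + rb->offset, len);
    rb->offset += len;
    return len;
}
```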

Cc: [email protected]
Signed-off-by: Amos Kong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

rng-egd: remove redundant free

We don't set a default chr_name, so the free is redundant.

Signed-off-by: Amos Kong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()

Fix build failures with clang when KVM is not enabled by
providing a stub version of kvm_arch_get_supported_cpuid().
We retain the compile time check that this function isn't
called when CONFIG_KVM is not set by guarding the stub with
ifndef __OPTIMIZE__ (we assume that an optimizing build will
do sufficient constant folding and dead code elimination to
remove the calls before linking).
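
The pattern can be sketched in a self-contained file. The names
get_supported_cpuid and cpuid_feature_mask, and the kvm_enabled() macro
hard-wired to 0, are hypothetical stand-ins for a build without
CONFIG_KVM. In an optimizing build __OPTIMIZE__ is defined, no stub is
compiled, and the dead call is expected to be removed; if a call survived,
linking would fail, which keeps the compile-time check.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: this build has no KVM support, so the check is constant. */
#define kvm_enabled() 0

uint32_t get_supported_cpuid(uint32_t function);

#ifndef __OPTIMIZE__
/* Unoptimized builds may still emit the dead call below, so provide a
 * stub to satisfy the linker. It must never actually be reached. */
uint32_t get_supported_cpuid(uint32_t function)
{
    (void)function;
    abort();
}
#endif

static uint32_t cpuid_feature_mask(uint32_t function)
{
    if (kvm_enabled()) {
        /* Dead without KVM; removed by constant folding when optimizing. */
        return get_supported_cpuid(function);
    }
    return 0xffffffffu;  /* hypothetical: no host-based filtering */
}
```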

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

vfio-pci: Fix multifunction=on

When an assigned device is initialized it copies the device config
space into the emulated config space. Unfortunately multifunction is
set up before the device initfn and gets clobbered. We need to
restore it just like pci-assign does.
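
A minimal sketch of the restore step, assuming a plain byte array for the
emulated config space (copy_config_keep_multifunction and its
multifunction flag are hypothetical; the header type register at offset
0x0e with the multi-function flag in bit 7 follows the PCI specification).
Clearing the bit when multifunction is off is an additional assumption of
this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_HEADER_TYPE                0x0e /* config space offset */
#define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80 /* bit 7 of the header type */

/* Copies the physical device's config space into the emulated copy, then
 * re-applies the multi-function bit that QEMU configured before init.
 * Assumption: the bit is also cleared when multifunction is off. */
static void copy_config_keep_multifunction(uint8_t *emulated,
                                           const uint8_t *device,
                                           size_t size, bool multifunction)
{
    memcpy(emulated, device, size);
    if (multifunction) {
        emulated[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
    } else {
        emulated[PCI_HEADER_TYPE] &= (uint8_t)~PCI_HEADER_TYPE_MULTI_FUNCTION;
    }
}
```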

Cc: [email protected]
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

atomic.h: Fix build with clang

clang defines __ATOMIC_SEQ_CST but its implementation of the
__atomic_exchange() builtin differs from that of gcc. Move the
__clang__ branch of the ifdef ladder to the top and fix its
implementation (there is no such builtin as __sync_exchange),
so we can compile with clang again.
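
An ifdef ladder of this shape might look like the sketch below. The macro
name my_atomic_xchg is hypothetical and this is not the actual atomic.h
text. __sync_swap is a clang-only builtin, while __atomic_exchange_n and
the __sync_* builtins are provided by GCC; the clang test comes first
because clang also defines __ATOMIC_SEQ_CST.

```c
/* Hypothetical atomic exchange macro: returns the old value of *ptr. */
#if defined(__clang__)
/* Checked first: clang also defines __ATOMIC_SEQ_CST. */
#define my_atomic_xchg(ptr, val) __sync_swap(ptr, val)
#elif defined(__ATOMIC_SEQ_CST)
/* GCC 4.7+ with the __atomic builtins. */
#define my_atomic_xchg(ptr, val) __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST)
#else
/* Older GCC: __sync_lock_test_and_set is only an acquire barrier,
 * so issue a full barrier first. */
#define my_atomic_xchg(ptr, val) \
    ({ __sync_synchronize(); __sync_lock_test_and_set(ptr, val); })
#endif
```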

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

pc: get rid of builtin pvpanic for "-M pc-1.5"

This causes two slight backwards-incompatibilities between "-M pc-1.5"
and 1.5's "-M pc":

(1) a fw_cfg file is removed with this patch. This is only a problem
if migration stops the virtual machine exactly during fw_cfg enumeration.

(2) after migration, a VM created without an explicit "-device pvpanic"
will stop reporting panics to management.

The first problem only occurs if migration is done at a very, very
early point (and I'm not sure it can happen in practice for reasonable-size
VMs, since it will likely take more time to send the RAM to the destination
than it will take for the BIOS to scan fw_cfg).

The second problem only occurs if the guest panics _and_ has a guest
driver _and_ management knows to look at the crash event, so it is
mostly theoretical at this point in time.

Thus keep the code simple, and pretend it was never broken.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

configure: Explicitly set ARFLAGS so we can build with GNU Make 4.0

Our rules.mak adds '-rR' to MAKEFLAGS to indicate that we will be
explicitly specifying everything and not relying on any default
variables or rules. However we were accidentally relying on the
default ARFLAGS ("rv"). This went unnoticed because of a bug in
GNU Make 3.82 and earlier which meant that adding -rR to MAKEFLAGS
only affected submakes, not the currently running instance.
Explicitly set ARFLAGS in config-host.mak, in the same way we
handle CFLAGS and LDFLAGS; this will allow us to work with
Make 4.0.

Thanks to Paul Smith for analyzing this bug for us.

Cc: [email protected]
Reported-by: Ken Moffat <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

sun4m: Add FCode ROM for TCX framebuffer

Upstream OpenBIOS now implements SBus probing in order to determine the
contents of a physical bus slot, which is required to allow OpenBIOS to
identify the framebuffer without help from the fw_cfg interface.

SBus probing works by detecting the presence of an FCode program
(effectively tokenised Forth) at the base address of each slot and, if
one is present, executing it so that it creates its own device node in
the OpenBIOS device tree.

The FCode ROM is generated as part of the OpenBIOS build and should
generally be updated at the same time.

Signed-off-by: Mark Cave-Ayland <[email protected]>
CC: Blue Swirl <[email protected]>
CC: Bob Breuer <[email protected]>
CC: Artyom Tarasenko <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Update version for 1.7.0-rc1 release

Signed-off-by: Anthony Liguori <[email protected]>

vfio-pci: Fix multifunction=on

When an assigned device is initialized it copies the device config
space into the emulated config space. Unfortunately multifunction is
set up before the device initfn and gets clobbered. We need to
restore it just like pci-assign does.

Signed-off-by: Alex Williamson <[email protected]>
Reviewed-by: Bandan Das <[email protected]>
Message-id: 20131112185059 [email protected]
Cc: [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

target-i386: Fix addr32 prefix in gen_lea_modrm

Fix the following run-test-x86_64 testsuite failures:

-lea (%%eax) = 0000000000000001
-lea (%%ebx) = 0000000000000002
-lea (%%ecx) = 0000000000000004
-lea (%%edx) = 0000000000000008
-lea (%%esi) = 0000000000000010
-lea (%%edi) = 0000000000000020
+lea (%%eax) = 0000abcc00000001
+lea (%%ebx) = 0000abcf00000002
+lea (%%ecx) = 0000abc900000004
+lea (%%edx) = 0000abc500000008
+lea (%%esi) = 0000abdd00000010
+lea (%%edi) = 0000abed00000020

In addition, reduce ifdeffery and minimize the number of TCG ops
produced during address computation.
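
For reference, the semantics the fix restores can be modelled in plain C
(lea_addr32 is a hypothetical helper, not TCG code). In 64-bit mode, an
addr32 (0x67) prefix makes the effective address a 32-bit quantity that is
zero-extended into the 64-bit destination, so stale upper bits in the base
register, like the 0000abcc in the failing output above, must not appear
in the result.

```c
#include <stdint.h>

/* Effective address of "lea disp(%reg32)" with the addr32 prefix in
 * 64-bit mode: compute modulo 2^32, then zero-extend to 64 bits. */
static uint64_t lea_addr32(uint64_t base, int32_t disp)
{
    uint32_t ea = (uint32_t)base + (uint32_t)disp;  /* wraps modulo 2^32 */

    return (uint64_t)ea;  /* zero-extension, upper half is always 0 */
}
```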

Signed-off-by: Richard Henderson <[email protected]>
Message-id: 1384219016 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

atomic.h: Fix build with clang

clang defines __ATOMIC_SEQ_CST but its implementation of the
__atomic_exchange() builtin differs from that of gcc. Move the
__clang__ branch of the ifdef ladder to the top and fix its
implementation (there is no such builtin as __sync_exchange),
so we can compile with clang again.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1382435921 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

target-i386: do not override nr_cores for -cpu host

Commit 787aaf5 (target-i386: forward CPUID cache leaves when -cpu host is
used, 2013-09-02) brings bits 31..26 of CPUID leaf 04h out of sync with
the APIC IDs that QEMU reserves for each package. This number must come
from "-smp" options rather than from the host CPUID.

It also turns out that this mismatch makes Windows Server 2012R2 fail
to boot.
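
A minimal sketch of the rule the commit restores (cpuid4_set_cores is a
hypothetical helper, not QEMU's code): keep the host's cache description in
EAX bits 25..0 of leaf 04h, but derive bits 31..26, which encode the
maximum number of addressable core IDs per package minus one, from the
"-smp" core count.

```c
#include <stdint.h>

/* Replaces EAX[31:26] of a CPUID leaf 04h value with nr_cores - 1.
 * Assumes 1 <= nr_cores <= 64, since the field is only 6 bits wide. */
static uint32_t cpuid4_set_cores(uint32_t host_eax, unsigned int nr_cores)
{
    uint32_t field = (uint32_t)(nr_cores - 1) & 0x3fu;

    return (host_eax & 0x03ffffffu) | (field << 26);
}
```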

Tested-by: Peter Lieven <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Benoit Canet <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Message-id: 1384879786 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

mips jazz: do not raise data bus exception when accessing invalid addresses

The MIPS Jazz chipset doesn't seem to raise data bus exceptions on invalid
accesses. However, there is no easy way to prevent QEMU from raising them.
Creating a big memory region for the whole address space doesn't prevent
the memory core from directly calling unassigned_mem_read/write, which in
turn calls cpu->do_unassigned_access, which (for MIPS CPUs) raises a data
bus exception.

This fixes a MIPS Jazz regression introduced in c658b94f6e8c206c59d02aa6fbac285b86b53d2c.

Signed-off-by: Hervé Poussineau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Hervé Poussineau <[email protected]>
Message-id: 1383603977 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>

target-i386: yield to another VCPU on PAUSE

After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.

The problem shows up as soon as FreeBSD turns out its periodic (~1 ms)
tick, but the timers are only the trigger for a pre-existing problem.

Before the offending patch, setting a timer did a timer_settime system call.

After the patch, setting the timer exits the event loop (which uses poll)
and reenters it with a new deadline.  This does not cause any slowdown; the
difference is between one system call (timer_settime) and a signal
delivery (SIGALRM) before the patch, and two system calls afterwards
(write to a pipe or eventfd + calling poll again when re-entering the
event loop).

Unfortunately, the exit/enter causes the main loop to grab the iothread
lock, which in turn kicks the VCPU thread out of execution.  This
causes TCG to execute the next VCPU in its round-robin scheduling of
VCPUs.  When the second VCPU is mostly unused, FreeBSD runs a "pause"
instruction in its idle loop which only burns cycles without any
progress.  As soon as the timer tick expires, the first VCPU runs
the interrupt handler, but it very soon sets the timer again, and QEMU
then goes back to doing nothing in the second VCPU.

The fix is to make the pause instruction do "cpu_loop_exit".

Cc: Richard Henderson <[email protected]>
Reported-by: Luigi Rizzo <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 1384948442 [email protected]
Signed-off-by: Anthony Liguori <[email protected]>