Git Repo - qemu.git/log

block: Lift more functions into BlockBackend

There are already some blk_aio_* functions, so we might as well have
blk_co_* functions (as far as we need them). This patch adds
blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which
is not a blk_co_* function but is needed nonetheless).

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1416309679 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

ahci: replace SATA FIS type magic numbers with constants

SATA 3.0 "10.3.1 FIS Type values" defines the constants used to
differentiate between FIS types.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: John Snow <[email protected]>
Message-id: 1415874281 [email protected]
Signed-off-by: Kevin Wolf <[email protected]>

ahci: avoid #ifdef DEBUG_AHCI bitrot

Debug code using #ifdef is susceptible to bitrot because the compiler
never checks the debug code.

This is easy to avoid, change the DPRINTF() macro to use if (DEBUG_AHCI)
and always give it a 0 or 1 value.

This also allows us to drop an #ifdef DEBUG_AHCI in ahci_start_dma()
since the compiler can now see the local variable is used.

The motivation for this change is a recent DEBUG_AHCI build failure due
to an outdated DPRINTF() format string. From now on the compiler will
catch these errors.

Cc: John Snow <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: John Snow <[email protected]>
Message-id: 1415874281 [email protected]
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Plain blkdebug filename generation

Add one test whether blkdebug is able to generate a plain filename if
given a configuration file and a file to be tested only; and add another
test whether blkdebug is able to do the same without being given a
configuration file.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 1415697825 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blkdebug: Simplify and improve filename generation

Instead of actually recreating the options from scratch, just reuse the
options given for creating the BDS, which are the configuration file
name and additional options. In case there are no additional options we
can thus create a plain filename.

This obviously results in a different output for qemu-iotest 099 which
exactly tests this filename generation. Fix it up as well.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 1415697825 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

monitor: Fix HMP tab completion

Commands with multiple boolean flag options (like 'info block') didn't
provide correct completion because only the first one was skipped.

Signed-off-by: Kevin Wolf <[email protected]>

block/hmp: Allow node-name in 'info block'

The optional parameter specifying a block device allows now to use a
node-name instead of a drive name (and therefore to inspect any node in
the graph). The new -n options allows listing all named nodes instead of
BlockBackends.

Signed-off-by: Kevin Wolf <[email protected]>

block/hmp: Allow info = NULL in print_block_info()

This allows printing infos of BlockDriverStates that aren't at the root
of the graph (and logically implementing a BlockBackend).

Signed-off-by: Kevin Wolf <[email protected]>

block/hmp: Factor out print_block_info()

The new function prints the info for a single BlockDriverState.

Signed-off-by: Kevin Wolf <[email protected]>

block/qapi: Add cache information to query-block

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

blockdev: acquire AioContext in change-backing-file

Add dataplane support to the change-backing-file QMP commands. By
acquiring the AioContext we avoid race conditions with the dataplane
thread which may also be accessing the BlockDriverState.

Note that this command operates on both bs and a node in its chain
(image_bs). The bdrv_chain_contains(bs, image_bs) check guarantees that
bs and image_bs are in the same AioContext.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blockdev: acquire AioContext in eject, change, and block_passwd

By acquiring the AioContext we avoid race conditions with the dataplane
thread which may also be accessing the BlockDriverState.

Fix up eject, change, and block_passwd in a single patch because
qmp_eject() and qmp_change_blockdev() both call eject_device(). Also
fix block_passwd while we're tackling a command that takes a block
encryption password.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blockdev: check for BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE

The BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE op blocker exists but was
never used! Let's fix that so snapshot delete can be blocked.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

blockdev: acquire AioContext in blockdev-snapshot-delete-internal-sync

Add dataplane support to the blockdev-snapshot-delete-internal-sync QMP
command. By acquiring the AioContext we avoid race conditions with the
dataplane thread which may also be accessing the BlockDriverState.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Use -qmp-pretty in 067

067 invokes query-block, resulting in a reference output with really
long lines (which may pose a problem in email patches and always poses a
problem when the output changes, because it is hard to see what has
actually changed). Use -qmp-pretty to mitigate this issue.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: _filter_qmp for pretty JSON output

_filter_qmp should be able to correctly filter out the QMP version
object for pretty JSON output.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

chardev: Add -qmp-pretty

Add a command line option for adding a QMP monitor using pretty JSON
formatting.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qjson: Drop trailing space for pretty formatting

For the pretty formatting, the functions converting QDicts and QLists to
JSON should not print a space after the comma separating objects,
because a newline will emitted immediately afterwards, making the
whitespace superfluous.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qmp: Add optional switch "query-nodes" in query-blockstats

This bool option will allow query all the node names. It iterates all
the BDSes that are assigned a name, also in this case don't query up the
backing chain.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Include "node-name" if present in query-blockstats

Node name is a better identifier of BDS.

We will want to query statistics of a BDS node buried in the BDS graph,
so reporting the node's name if there is one will do the trick.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add bdrv_get_node_name

This returns the node name of a BDS. Remove the TODO comment and expect
the callers to be explicit.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add bdrv_next_node

Similar to bdrv_next, this traverses through graph_bdrv_states. Will be
useful to enumerate all the named nodes.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Open 2.3 development tree

Signed-off-by: Peter Maydell <[email protected]>

Update version for v2.2.0 release

Signed-off-by: Peter Maydell <[email protected]>

Update version for v2.2.0-rc5 release

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-cve-2014-8106-20141204-1' into staging

cirrus: fix blit region check

# gpg: Signature made Thu 04 Dec 2014 11:54:57 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"

* remotes/kraxel/tags/pull-cve-2014-8106-20141204-1:
  cirrus: don't overflow CirrusVGAState->cirrus_bltbuf
  cirrus: fix blit region check

Signed-off-by: Peter Maydell <[email protected]>

Update version for v2.2.0-rc4 release

Signed-off-by: Peter Maydell <[email protected]>

vhost: Fix vhostfd leak in error branch

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Message-id: 1417166789 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

cirrus: don't overflow CirrusVGAState->cirrus_bltbuf

This is CVE-2014-8106.

Signed-off-by: Gerd Hoffmann <[email protected]>

cirrus: fix blit region check

Issues:
* Doesn't check pitches correctly in case it is negative.
* Doesn't check width at all.

Turn macro into functions while being at it, also factor out the check
for one region which we then can simply call twice for src + dst.

This is CVE-2014-8106.

Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

Fix for crash after migration in virtio-rng on bi-endian targets

VirtIO devices now remember which endianness they're operating in in order
to support targets which may have guests of either endianness, such as
powerpc.  This endianness state is transferred in a subsection of the
virtio device's information.

With virtio-rng this can lead to an abort after a loadvm hitting the
assert() in virtio_is_big_endian().  This can be reproduced by doing a
migrate and load from file on a bi-endian target with a virtio-rng device.
The actual guest state isn't particularly important to triggering this.

The cause is that virtio_rng_load_device() calls virtio_rng_process() which
accesses the ring and thus needs the endianness.  However,
virtio_rng_process() is called via virtio_load() before it loads the
subsections.  Essentially the ->load callback in VirtioDeviceClass should
only be used for actually reading the device state from the stream, not for
post-load re-initialization.

This patch fixes the bug by moving the virtio_rng_process() after the call
to virtio_load().  Better yet would be to convert virtio to use vmsd and
have the virtio_rng_process() as a post_load callback, but that's a bigger
project for another day.

This is bugfix, and should be considered for the 2.2 branch.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Message-id: 1417067290 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

virtio-net: fix unmap leak

virtio_net_handle_ctrl() and other functions that process control vq
request call iov_discard_front() which will shorten the iov. This will
lead unmapping in virtqueue_push() leaks mapping.

Fixes this by keeping the original iov untouched and using a temp variable
in those functions.

Cc: Wen Congyang <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Message-id: 1417082643 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hmp: fix regression of HMP device_del auto-completion

The commits:
- 6a1fa9f5 (monitor: add del completion for peripheral device)
- 66e56b13 (qdev: add qdev_build_hotpluggable_device_list helper)

cause a QEMU crash when trying to use HMP device_del auto-completion.
It can be easily reproduced by:
    <qemu-bin> -enable-kvm  ~/images/fedora.qcow2 -monitor stdio -device virtio-net-pci,id=vnet

    (qemu) device_del
    /home/mapfelba/git/upstream/qemu/hw/core/qdev.c:941:qdev_build_hotpluggable_device_list: Object 0x7f6ce04e4fe0 is not an instance of type device
    Aborted (core dumped)

The root cause is qdev_build_hotpluggable_device_list going recursively over
all peripherals and their children assuming all are devices. It doesn't work
since PCI devices have at least on child which is a memory region (bus master).

Solved by observing that all devices appear as direct children of
/machine/peripheral container. No need of going recursively
over all the children.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reported-by: Gal Hammer <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Message-id: 1417002601 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

qemu-timer: Avoid overflows when converting timeout to struct timespec

In qemu_poll_ns(), when we convert an int64_t nanosecond timeout into
a struct timespec, we may accidentally run into overflow problems if
the timeout is very long. This happens because the tv_sec field is a
time_t, which is signed, so we might end up setting it to a negative
value by mistake. This will result in what was intended to be a
near-infinite timeout turning into an instantaneous timeout, and we'll
busy loop. Cap the maximum timeout at INT32_MAX seconds (about 68 years)
to avoid this problem.

This specifically manifested on ARM hosts as an extreme slowdown on
guest shutdown (when the guest reprogrammed the PL031 RTC to not
generate alarms using a very long timeout) but could happen on other
hosts and guests too.

Reported-by: Christoffer Dall <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1416939705 [email protected]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

The final 2.2 patches from me.

# gpg: Signature made Wed 26 Nov 2014 11:12:25 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  s390x/kvm: Fix compile error
  fw_cfg: fix boot order bug when dynamically modified via QOM
  -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

Signed-off-by: Peter Maydell <[email protected]>

s390x/kvm: Fix compile error

commit a2b257d6212a "memory: expose alignment used for allocating RAM
as MemoryRegion API" triggered a compile error on KVM/s390x.

Fix the prototype and the implementation of legacy_s390_alloc.

Cc: Igor Mammedov <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

fw_cfg: fix boot order bug when dynamically modified via QOM

When we dynamically modify boot order, the length of
boot order will be changed, but we don't update
s->files->f[i].size with new length. This casuse
seabios read a wrong vale of qemu cfg file about
bootorder.

Cc: Gerd Hoffmann <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

-machine vmport=auto: Fix handling of VMWare ioport emulation for xen

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Changed vmport from a bool to an enum. Added the value "auto" to do
the old way. Move check of xen_enabled() back to pc_init1().

Acked-by: Eric Blake <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Don Slutz <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Update version for v2.2.0-rc3 release

Signed-off-by: Peter Maydell <[email protected]>

input: move input-send-event into experimental namespace

Ongoing discussions on how we are going to specify the console,
so tag the command as experiental so we can refine things in
the 2.3 development cycle.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 1416923657 [email protected]
[Spell out "not a stable API", and x- the QAPI schema, too]
Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Amos Kong <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, misc bugfixes

A bunch of bugfixes for 2.2.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon 24 Nov 2014 18:59:47 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg:                 aka "Michael S. Tsirkin <[email protected]>"

* remotes/mst/tags/for_upstream:
  pc: acpi: mark all possible CPUs as enabled in SRAT
  pcie: fix improper use of negative value
  pcie: fix typo in pcie_cap_deverr_init()
  target-i386: move generic memory hotplug methods to DSDTs
  acpi-build: mark RAM dirty on table update
  hw/pci: fix crash on shpc error flow
  pc: count in 1Gb hugepage alignment when sizing hotplug-memory container
  pc: explicitly check maxmem limit when adding DIMM
  pc: pc-dimm: use backend alignment during address auto allocation
  pc: align DIMM's address/size by backend's alignment value
  memory: expose alignment used for allocating RAM as MemoryRegion API
  pc: limit DIMM address and size to page aligned values
  pc: make pc_dimm_plug() more readble
  pc: kvm: check if KVM has free memory slots to avoid abort()
  qemu-char: fix tcp_get_fds

Signed-off-by: Peter Maydell <[email protected]>

pc: acpi: mark all possible CPUs as enabled in SRAT

If QEMU is started with -numa ... Windows only notices that
CPU has been hot-added but it will not online such CPUs.

It's caused by the fact that possible CPUs are flagged as
not enabled in SRAT and Windows honoring that information
doesn't use corresponding CPU.

ACPI 5.0 Spec regarding to flag says:
"
Table 5-47 Local APIC Flags
...
Enabled: if zero, this processor is unusable, and the operating system
support will not attempt to use it.
"

Fix QEMU to adhere to spec and mark possible CPUs as enabled
in SRAT.

With that Windows onlines hot-added CPUs as expected.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: fix improper use of negative value

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: fix typo in pcie_cap_deverr_init()

Reported-by:
https://bugs.launchpad.net/qemu/+bug/1393440

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

target-i386: move generic memory hotplug methods to DSDTs

This makes it simpler to keep the SSDT byte-for-byte identical for a
given machine type, which is a goal we want to have for 2.2 and newer
types.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi-build: mark RAM dirty on table update

acpi build modifies internal FW CFG RAM on first access
but we forgot to mark it dirty.
If this RAM has been migrated already, it won't be
migrated again, returning corrupted tables to guest.

Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/pci: fix crash on shpc error flow

If the pci bridge enters in error flow as part
of init process it will only delete the shpc mmio
subregion but not remove it from the properties list,
resulting in segmentation fault when the bridge runs
the exit function.

Example: add a pci bridge without specifing the chassis number:
    <qemu-bin> ... -device pci-bridge,id=p1
Result:
    (qemu) qemu-system-x86_64: -device pci-bridge,id=p1: Bridge chassis not specified. Each bridge is required to be assigned a unique chassis id > 0.
    qemu-system-x86_64: -device pci-bridge,id=p1: Device
    initialization failed.
    Segmentation fault (core dumped)

    if (child->class->unparent) {
    #0  0x00005555558d629b in object_finalize_child_property (obj=0x555556d2e830, name=0x555556d30630 "shpc-mmio[0]", opaque=0x555556a42fc8) at qom/object.c:1078
    #1  0x00005555558d4b1f in object_property_del_all (obj=0x555556d2e830) at qom/object.c:367
    #2  0x00005555558d4ca1 in object_finalize (data=0x555556d2e830) at qom/object.c:412
    #3  0x00005555558d55a1 in object_unref (obj=0x555556d2e830) at qom/object.c:720
    #4  0x000055555572c907 in qdev_device_add (opts=0x5555563544f0) at qdev-monitor.c:566
    #5  0x0000555555744f16 in device_init_func (opts=0x5555563544f0, opaque=0x0) at vl.c:2213
    #6  0x00005555559cf5f0 in qemu_opts_foreach (list=0x555555e0f8e0 <qemu_device_opts>, func=0x555555744efa <device_init_func>, opaque=0x0, abort_on_failure=1) at util/qemu-option.c:1057
    #7  0x000055555574a11b in main (argc=16, argv=0x7fffffffdde8, envp=0x7fffffffde70) at vl.c:423

Unparent the shpc mmio region as part of shpc cleanup.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Amos Kong <[email protected]>

pc: count in 1Gb hugepage alignment when sizing hotplug-memory container

if DIMMs with different size/alignment are interleaved
in creation order, it could lead to hotplug-memory
container fragmentation and following inability to use
all RAM upto maxmem.
For example:
    -m 4G,slots=3,maxmem=7G
    -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB
    -device pc-dimm,id=mem1,memdev=mem-1
    -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB
    -device pc-dimm,id=mem2,memdev=mem-2
    -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB
    -device pc-dimm,id=mem3,memdev=mem-3

fragments hotplug-memory container and doesn't allow
to use 1GB hugepage backend to consume remainig 1Gb.

To ease managment factor count in max 1Gb alignment for
each memory slot when sizing hotplug-memory region so
that regadless of fragmentaion it would be possible to
add max aligned DIMM.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: explicitly check maxmem limit when adding DIMM

Currently maxmem limit is not checked and depends on
hotplug region container not being able to fit more RAM
than maxmem. Do check explicitly so that it would
be possible to change hotplug container size later
to deal with fragmentation.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.2.0-rc3

# gpg: Signature made Mon 24 Nov 2014 12:52:23 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
Revert "qemu-img info: show nocow info"

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Three patches to fix ExtINT for the QEMU implementation of the local APIC.

# gpg: Signature made Mon 24 Nov 2014 13:38:36 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  apic: fix incorrect handling of ExtINT interrupts wrt processor priority
  apic: fix loss of IPI due to masked ExtINT
  apic: avoid getting out of halted state on masked PIC interrupts

Signed-off-by: Peter Maydell <[email protected]>

apic: fix incorrect handling of ExtINT interrupts wrt processor priority

This fixes another failure with ExtINT, demonstrated by QNX. The failure
mode is as follows:
- IPI sent to cpu 0 (bit set in APIC irr)
- IPI accepted by cpu 0 (bit cleared in irr, set in isr)
- IPI sent to cpu 0 (bit set in both irr and isr)
- PIC interrupt sent to cpu 0

The PIC interrupt causes CPU_INTERRUPT_HARD to be set, but
apic_irq_pending observes that the highest pending APIC interrupt priority
(the IPI) is the same as the processor priority (since the IPI is still
being handled), so apic_get_interrupt returns a spurious interrupt rather
than the pending PIC interrupt. The result is an endless sequence of
spurious interrupts, since nothing will clear CPU_INTERRUPT_HARD.

Instead, ExtINT interrupts should have ignored the processor priority.
Calling apic_check_pic early in apic_get_interrupt ensures that
apic_deliver_pic_intr is called instead of delivering the spurious
interrupt. apic_deliver_pic_intr then clears CPU_INTERRUPT_HARD if needed.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

apic: fix loss of IPI due to masked ExtINT

This patch fixes an obscure failure of the QNX kernel on QEMU x86 SMP.
In QNX, all hardware interrupts come via the PIC, and are delivered by
the cpu 0 LAPIC in ExtINT mode, while IPIs are delivered by the LAPIC
in fixed mode.

This bug happens as follows:
- cpu 0 masks a particular PIC interrupt
- IPI sent to cpu 0 (CPU_INTERRUPT_HARD is set)
- before the IPI is accepted, the masked interrupt line is asserted by the
device

Since the interrupt is masked, apic_deliver_pic_intr will clear
CPU_INTERRUPT_HARD. The IPI will still be set in the APIC irr, but since
CPU_INTERRUPT_HARD is not set the cpu will not notice. Depending on the
scenario this can cause a system hang, i.e. if cpu 0 is expected to unmask
the interrupt.

In order to fix this, do a full check of the APIC before an EXTINT
is acknowledged. This can result in clearing CPU_INTERRUPT_HARD, but
can also result in delivering the lost IPI.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

apic: avoid getting out of halted state on masked PIC interrupts

After the next patch, if a masked PIC interrupts causes CPU_INTERRUPT_POLL
to be set, the CPU will spuriously get out of halted state.  While this
is technically valid, we should avoid that.

Make CPU_INTERRUPT_POLL run apic_update_irq in the right thread and then
look at CPU_INTERRUPT_HARD.  If CPU_INTERRUPT_HARD does not get set,
do not report the CPU as having work.

Also move the handling of software-disabled APIC from apic_update_irq
to apic_irq_pending, and always trigger CPU_INTERRUPT_POLL.  This will
be important once we will add a case that resets CPU_INTERRUPT_HARD
from apic_update_irq.  We want to run it even if we go through
CPU_INTERRUPT_POLL, and even if the local APIC is software disabled.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "qemu-img info: show nocow info"

This reverts commit 000c4dfff4d7686e2fba3066a477a1290ed60622.

The main reason for reverting this commit before the 2.2 release is that
it adds a QAPI interface that we don't want to keep: The 'nocow' flag
doesn't generally make sense for block nodes, but only for the raw-posix
driver. It should therefore be part of ImageInfoSpecific rather than
ImageInfo.

The commit contains more problems, but unlike the API stability issue
they wouldn't justify reverting it.

Conflicts:
block/qapi.c

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

pc: pc-dimm: use backend alignment during address auto allocation

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: align DIMM's address/size by backend's alignment value

Performance wise it's better to align GVA by the backend's
page size.

Also do not allow to create DIMM device with suboptimal
size (i.e. not aligned to backends page size) to aviod
memory loss.

Do above only for 2.2 and newer machine types to avoid
breaking working configs with 2.1 machine type.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

memory: expose alignment used for allocating RAM as MemoryRegion API

introduce memory_region_get_alignment() that returns
underlying memory block alignment or 0 if it's not
relevant/implemented for backend.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: limit DIMM address and size to page aligned values

When running in KVM mode, kvm_set_phys_mem() will silently
fail if registered MemoryRegion address/size is not page
aligned. Causing memory hotplug failure in guest.

Mapping non aligned MemoryRegion in TCG mode 'works', but
sane guest OS still expects page aligned memory module
and fails to initialize it if it's not aligned.

So do not allow non aligned (i.e. valid) address/size
values for DIMM to avoid either KVM failure or guest
issues caused by it.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: make pc_dimm_plug() more readble

split addr initialization from declaration so that
later when new local vars are added property getter
wouldn't drift off of error check.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: kvm: check if KVM has free memory slots to avoid abort()

When more memory devices are used than available
KVM memory slots, QEMU crashes with:

kvm_alloc_slot: no free slot available
Aborted (core dumped)

Fix this by checking that KVM has a free slot before
attempting to map memory in guest address space.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

qemu-char: fix tcp_get_fds

tcp_get_fds API discards fds if there's more than 1 of these.

It's tricky to fix this without API changes in the generic case.

However, this API is only used by tests ATM, and tests know how
many fds they expect.

So let's not waste cycles trying to fix this properly:
simply assume at most 16 fds (tests use at most 8 now).
assert if some test tries to get more.

Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

# gpg: Signature made Fri 21 Nov 2014 11:12:37 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
  rtl8139: fix Pointer to local outside scope
  pcnet: fix Negative array index read
  net/socket: fix Uninitialized scalar variable
  net/slirp: fix memory leak

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-gtk-20141121-1' into staging

gtk: two bugfixes for 2.2.

# gpg: Signature made Fri 21 Nov 2014 07:38:45 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"

* remotes/kraxel/tags/pull-gtk-20141121-1:
  gtk: Don't crash if -nodefaults
  gtk: fix possible memory leak about local_err

Signed-off-by: Peter Maydell <[email protected]>

rtl8139: fix Pointer to local outside scope

Coverity spot:
Assigning: iov = struct iovec [3]({{buf, 12UL},
{(void *)dot1q_buf, 4UL},
{buf + 12, size - 12}})
(address of temporary variable of type struct iovec [3]).
out_of_scope: Temporary variable of type struct iovec [3] goes out of scope.

Pointer to local outside scope (RETURN_LOCAL)
use_invalid:
Using iov, which points to an out-of-scope temporary variable of type struct iovec [3].

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

pcnet: fix Negative array index read

s->xmit_pos maybe assigned to a negative value (-1),
but in this branch variable s->xmit_pos as an index to
array s->buffer. Let's add a check for s->xmit_pos.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

net/socket: fix Uninitialized scalar variable

If is_connected parameter is false, the saddr
variable will no initialize. Coverity report:
uninit_use: Using uninitialized value saddr.sin_port.

We don't need add saddr information to nc->info_str
when is_connected is false.

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

net/slirp: fix memory leak

commit b412eb61 introduce 'cmd:' target for guestfwd,
and fwd don't be used in this scenario, and will leak
memory in true branch with 'cmd:'. Let's allocate memory
for fwd variable just in else statement.

Cc: Alexander Graf <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

gtk: Don't crash if -nodefaults

This fixes a crash by just skipping the vte resize hack if cur is NULL.

Reproducer:

qemu-system-x86_64 -nodefaults

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

gtk: fix possible memory leak about local_err

local_err in gd_vc_gfx_init() is not freed, and we don't use it,
so remove it.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

hw/arm/virt: set stdout-path instead of linux,stdout-path

ePAPR 1.1 defines the stdout-path property, making the os-specific
linux,stdout-path property redundant. Change the DT setup for ARM virt
to use the generic property - supported by Linux since 3.15.

The old QEMU behaviour was not present in any released version of
QEMU, and was only added to QEMU after the kernel changed, so
this should not break any existing setups.

Signed-off-by: Leif Lindholm <[email protected]>
[PMM: add note to commit about the old behaviour never hving been
in a released version of QEMU]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging

Patch queue for ppc - 2014-11-20

Hopefully the last few fixups for 2.2:

  - KVM memory slot fix (should usually only occur on PPC)
  - e300 fix
  - Altivec mtvscr instruction fix

# gpg: Signature made Thu 20 Nov 2014 13:53:34 GMT using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <[email protected]>"
# gpg:                 aka "Alexander Graf <[email protected]>"

* remotes/agraf/tags/signed-ppc-for-upstream:
  target-ppc: Altivec's mtvscr Decodes Wrong Register
  kvm: Fix memory slot page alignment logic
  target-ppc: Fix breakpoint registers for e300

Signed-off-by: Peter Maydell <[email protected]>

target-ppc: Altivec's mtvscr Decodes Wrong Register

The Move to Vector Status and Control Register (mtvscr) instruction
uses VRB as the source register. Fix the code generator to correctly
decode the VRB field. That is, use "rB(ctx->opcode)" instead of
"rD(ctx->opcode)".

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

kvm: Fix memory slot page alignment logic

Memory slots have to be page aligned to get entered into KVM. There
is existing logic that tries to ensure that we pad memory slots that
are not page aligned to the biggest region that would still fit in the
alignment requirements.

Unfortunately, that logic is broken. It tries to calculate the start
offset based on the region size.

Fix up the logic to do the thing it was intended to do and document it
properly in the comment above it.

With this patch applied, I can successfully run an e500 guest with more
than 3GB RAM (at which point RAM starts overlapping subpage memory regions).

Cc: [email protected]
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Fix breakpoint registers for e300

In the previous patch, the registers were added to init_proc_G2LE
instead of init_proc_e300.

Signed-off-by: Fabien Chouteau <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2-2' into staging

Fix from a while back that unfortunately got ignored.  Dave Gilbert says
it may actually fix a case where autoconverge would break on a repeat
migration (and not just fix stats).

# gpg: Signature made Thu 20 Nov 2014 12:52:41 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"

* remotes/amit-migration/tags/for-2.2-2:
  migration: static variables will not be reset at second migration

Signed-off-by: Peter Maydell <[email protected]>

migration: static variables will not be reset at second migration

The static variables in migration_bitmap_sync will not be reset in
the case of a second attempted migration.

Signed-off-by: ChenLiang <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

Update version for v2.2.0-rc2 release

Signed-off-by: Peter Maydell <[email protected]>

hw/ide/core.c: Prevent SIGSEGV during migration

The other callers to blk_set_enable_write_cache() in this file
already check for s->blk == NULL.

Signed-off-by: Don Slutz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1416259239 [email protected]
Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

# gpg: Signature made Tue 18 Nov 2014 15:04:53 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
net: The third parameter of getsockname should be initialized

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Tue 18 Nov 2014 15:04:14 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/tracing-pull-request:
  Tracing: Fix simpletrace.py error on tcg enabled binary traces
  Tracing docs fix configure option and description

Signed-off-by: Peter Maydell <[email protected]>

net: The third parameter of getsockname should be initialized

Signed-off-by: zhanghailiang <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Tracing: Fix simpletrace.py error on tcg enabled binary traces

simpletrace.py does not recognize the tcg option while reading trace-events file. In result simpletrace does not work on binary traces and tcg enabled events. Moved transformation of tcg enabled events to _read_events() which is used by simpletrace.

Signed-off-by: Christoph Seifert <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Tracing docs fix configure option and description

Fix the example trace configure option.
Update the text to say that multiple backends are allowed and what
happens when multiple backends are enabled.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-id: 1412691161 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.2.0-rc2

# gpg: Signature made Tue 18 Nov 2014 11:32:55 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
  block/raw-posix: Catch fsync() errors
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Fix preallocating write() loop
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2' into staging

Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
migration by Michael S. Tsirkin.

# gpg: Signature made Tue 18 Nov 2014 11:23:00 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"

* remotes/amit-migration/tags/for-2.2:
  migration: fix parameter validation on ram load

Signed-off-by: Peter Maydell <[email protected]>

linux-headers: update to 3.18-rc5

This updates the Linux header to version 3.18-rc5, adding support for
(among other things) read-only memslots on ARM and arm64.

Signed-off-by: Ard Biesheuvel <[email protected]>
Message-id: 1416248898 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

migration: fix parameter validation on ram load

During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840
Note: follow-up patches add extra checks on each block->host access.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

block/raw-posix: Catch fsync() errors

fsync() may fail, and that case should be handled.

Reported-by: László Érsek <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/raw-posix: Only sync after successful preallocation

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/raw-posix: Fix preallocating write() loop

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

exec: Handle multipage ranges in invalidate_and_set_dirty()

The code in invalidate_and_set_dirty() needs to handle addr/length
combinations which cross guest physical page boundaries. This can happen,
for example, when disk I/O reads large blocks into guest RAM which previously
held code that we have cached translations for. Unfortunately we were only
checking the clean/dirty status of the first page in the range, and then
were calling a tb_invalidate function which only handles ranges that don't
cross page boundaries. Fix the function to deal with multipage ranges.

The symptoms of this bug were that guest code would misbehave (eg segfault),
in particular after a guest reboot but potentially any time the guest
reused a page of its physical RAM for new code.

Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1416167061 [email protected]

Merge remote-tracking branch 'mreitz/block' into queue-block

* mreitz/block:
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()

raw-posix: The SEEK_HOLE code is flawed, rewrite it

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

raw-posix: SEEK_HOLE suffices, get rid of FIEMAP

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

raw-posix: Fix comment for raw_co_get_block_status()

Missed in commit 705be72.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

target-arm: handle address translations that start at level 3

The ARMv8 address translation system defines that a page table walk
starts at a level which depends on the translation granule size
and the number of bits of virtual address that need to be resolved.
Where the translation granule is 64KB and the guest sets the
TCR.TxSZ field to between 35 and 39, it's actually possible to
start at level 3 (the final level). QEMU's implementation failed
to handle this case, and so we would set level to 2 and behave
incorrectly (including invoking the C undefined behaviour of
shifting left by a negative number). Correct the code that
determines the starting level to deal with the start-at-3 case,
by replacing the if-else ladder with an expression derived from
the ARM ARM pseudocode version.

This error was detected by the Coverity scan, which spotted
the potential shift by a negative number.

Signed-off-by: Peter Maydell <[email protected]>
Message-id: 1415890569 [email protected]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

A smattering of fixes for problems that Coverity reported.

# gpg: Signature made Mon 17 Nov 2014 17:03:25 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  hcd-musb: fix dereference null return value
  target-cris/translate.c: fix out of bounds read
  shpc: fix error propaagation
  qemu-char: fix MISSING_COMMA
  acl: fix memory leak
  nvme: remove superfluous check
  loader: fix NEGATIVE_RETURNS
  qga: fix false negative argument passing
  mips_mipssim: fix use-after-free for filename
  l2tpv3: fix fd leak
  l2tpv3: fix possible double free
  libcacard: fix resource leak

Signed-off-by: Peter Maydell <[email protected]>

hcd-musb: fix dereference null return value

usb_ep_get and usb_handle_packet can deal with a NULL device, but we have
to avoid dereferencing NULL pointers when building the id.

Thanks to Gonglei for an initial stab at fixing this.

Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging

Update OpenBIOS images

# gpg: Signature made Sat 15 Nov 2014 13:12:02 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"

* remotes/mcayland/tags/qemu-openbios-signed:
Update OpenBIOS images

Signed-off-by: Peter Maydell <[email protected]>