Git Repo - qemu.git/log

Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2017-11-20-tag' into staging

qemu-ga patch queue for 2.11

* fix potential overflow in network interface stats reporting

# gpg: Signature made Mon 20 Nov 2017 20:56:05 GMT
# gpg:                using RSA key 0x3353C9CEF108B584
# gpg: Good signature from "Michael Roth <[email protected]>"
# gpg:                 aka "Michael Roth <[email protected]>"
# gpg:                 aka "Michael Roth <[email protected]>"
# Primary key fingerprint: CEAC C9E1 5534 EBAB B82D  3FA0 3353 C9CE F108 B584

* remotes/mdroth/tags/qga-pull-2017-11-20-tag:
  qga: replace GetIfEntry with GetIfEntry2 for interface stats

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20171120' into staging

late linux-user fixes for Qemu 2.11

# gpg: Signature made Mon 20 Nov 2017 21:19:00 GMT
# gpg:                using RSA key 0xB44890DEDE3C9BC0
# gpg: Good signature from "Riku Voipio <[email protected]>"
# gpg:                 aka "Riku Voipio <[email protected]>"
# Primary key fingerprint: FF82 03C8 C391 98AE 0581  41EF B448 90DE DE3C 9BC0

* remotes/riku/tags/pull-linux-user-20171120:
  linux-user: Fix calculation of auxv length
  linux-user: Handle rt_sigaction correctly for SPARC
  linux-user/sparc: Put address for data faults where linux-user expects it
  linux-user/ppc: Report correct fault address for data faults
  linux-user/s390x: Mask si_addr for SIGSEGV
  linux-user: return EINVAL from prctl(PR_*_SECCOMP)
  linux-user: fix 'finshed' typo in comment
  linux-user/syscall.c: Handle SH4's exceptional alignment for p{read, write}64
  linux-user: Handle TARGET_MAP_STACK and TARGET_MAP_HUGETLB
  linux-user/hppa: Fix TARGET_F_RDLCK, TARGET_F_WRLCK, TARGET_F_UNLCK
  linux-user/hppa: Fix TARGET_MAP_TYPE
  linux-user/hppa: Fix typo for TARGET_NR_epoll_wait
  linux-user/hppa: Fix cpu_clone_regs
  linux-user/hppa: Fix TARGET_SA_* defines
  linux-user: Restrict usage of sa_restorer

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20171120' into staging

target-arm queue:
* hw/arm: Silence xlnx-ep108 deprecation warning during tests
* hw/arm/aspeed: Unlock SCU when running kernel
* arm: check regime, not current state, for ATS write PAR format
* nvic: Fix ARMv7M MPU_RBAR reads
* target/arm: Report GICv3 sysregs present in ID registers if needed

# gpg: Signature made Mon 20 Nov 2017 17:35:25 GMT
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# gpg:                 aka "Peter Maydell <[email protected]>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20171120:
  hw/arm: Silence xlnx-ep108 deprecation warning during tests
  hw/arm/aspeed: Unlock SCU when running kernel
  arm: check regime, not current state, for ATS write PAR format
  nvic: Fix ARMv7M MPU_RBAR reads
  target/arm: Report GICv3 sysregs present in ID registers if needed

Signed-off-by: Peter Maydell <[email protected]>

qga: replace GetIfEntry with GetIfEntry2 for interface stats

The data obtained by GetIfEntry is 32 bits, and it may overflow. Thus
using GetIfEntry2 instead of GetIfEntry.

Signed-off-by: ZhiPeng Lu <[email protected]>
*avoid CamelCase variable names
*update field names for MIB_IFROW -> MIB_IF_ROW2
*dynamically probe for GetIfIndex2 to deal with older OSs
*check return value from get_interface_index
Signed-off-by: Michael Roth <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171120-v1' into staging

Fix storing cpu status (both kvm and tcg), locking around diag 308
(tcg only) and a non-zero variable in the s390-ccw bios.

# gpg: Signature made Mon 20 Nov 2017 15:18:05 GMT
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171120-v1:
  pc-bios/s390-ccw.img: update image
  pc-bios/s390-ccw: Fix problem with invalid virtio-scsi LUN when rebooting
  s390x/tcg: fix DIAG 308 with > 1 VCPU (MTTCG)
  s390x: fix storing CPU status (again)

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20171120' into staging

ppc patch queue 2017-11-20

Here's the current queue of ppc patches.  These 2 patches are both
more complex than I'd ideally like this late in the 2.11 cycle.
However, they do fix important bugs, so I think it's worth it on
balance.

# gpg: Signature made Mon 20 Nov 2017 03:27:19 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20171120:
  spapr: reset DRCs after devices
  target/ppc: Update setting of cpu features to account for compat modes

Signed-off-by: Peter Maydell <[email protected]>

pc-bios/s390-ccw.img: update image

Contains the following commit:
- pc-bios/s390-ccw: Fix problem with invalid virtio-scsi LUN when rebooting

Signed-off-by: Cornelia Huck <[email protected]>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Mon 20 Nov 2017 03:28:54 GMT
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  hw/net/vmxnet3: Fix code to work on big endian hosts, too
  net: Transmit zero UDP checksum as 0xFFFF
  MAINTAINERS: Add missing entry for eepro100 emulation
  hw/net/eepro100: Fix endianness problem on big endian hosts
  Revert "Add new PCI ID for i82559a"
  colo-compare: fix the dangerous assignment

Signed-off-by: Peter Maydell <[email protected]>

linux-user: Fix calculation of auxv length

In commit 7c4ee5bcc82e643 we changed the order in which we construct
the AUXV, but forgot to adjust the calculation of the length. The
result is that we set info->auxv_len to a bogus and negative value,
and then later on the code in open_self_auxv() gets confused and
ends up presenting the guest with an empty file.

Since we now have to calculate the auxv length up-front as part
of figuring out how much we're going to put on the stack, set
info->auxv_len then; this allows us to assert that we put the
same number of entries into auxv as we pre-calculated, rather
than merely having a comment saying we need to do that.

Fixes: https://bugs.launchpad.net/qemu/+bug/1728116
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Riku Voipio <[email protected]>

hw/arm: Silence xlnx-ep108 deprecation warning during tests

The new deprecation warning for the xlnx-ep108 machine also pops up
during "make check" which is kind of confusing. Silence it if testing
mode is enabled.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Acked-by: Wei Huang <[email protected]>
Message-id: 1510846183 [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

hw/arm/aspeed: Unlock SCU when running kernel

The ASPEED hardware contains a lock register for the SCU that disables
any writes to the SCU when it is locked. The machine comes up with the
lock enabled, but on all known hardware u-boot will unlock it and leave
it unlocked when loading the kernel.

This means the kernel expects the SCU to be unlocked. When booting from
an emulated ROM the normal u-boot unlock path is executed. Things don't
go well when booting using the -kernel command line, as u-boot does not
run first.

Change behaviour so that when a kernel is passed to the machine, set the
reset value of the SCU to be unlocked.

Signed-off-by: Joel Stanley <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Message-id: 20171114122018 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

arm: check regime, not current state, for ATS write PAR format

In do_ats_write(), rather than using extended_addresses_enabled() to
decide whether the value we get back from get_phys_addr() is a 64-bit
format PAR or a 32-bit one, use arm_s1_regime_using_lpae_format().

This is not really the correct answer, because the PAR format
depends on the AT instruction being used, not just on the
translation regime. However getting this correct requires a
significant refactoring, so that get_phys_addr() returns raw
information about the fault which the caller can then assemble
into a suitable FSR/PAR/syndrome for its purposes, rather than
get_phys_addr() returning a pre-formatted FSR.

However this change at least improves the situation by making
the PAR work correctly for address translation operations done
at AArch64 EL2 on the EL2 translation regime. In particular,
this is necessary for Xen to be able to run in our emulation,
so this seems like a safer interim fix given that we are in freeze.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Message-id: 1509719814 [email protected]

nvic: Fix ARMv7M MPU_RBAR reads

Fix an incorrect mask expression in the handling of v7M MPU_RBAR
reads that meant that we would always report the ADDR field as zero.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-id: 1509732813 [email protected]

target/arm: Report GICv3 sysregs present in ID registers if needed

The CPU ID registers ID_AA64PFR0_EL1, ID_PFR1_EL1 and ID_PFR1
have a field for reporting presence of GICv3 system registers.
We need to report this field correctly in order for Xen to
work as a guest inside QEMU emulation. We mustn't incorrectly
claim the sysregs exist when they don't, though, or Linux will
crash.

Unfortunately the way we've designed the GICv3 emulation in QEMU
puts the system registers as part of the GICv3 device, which
may be created after the CPU proper has been realized. This
means that we don't know at the point when we define the ID
registers what the correct value is. Handle this by switching
them to calling a function at runtime to read the value, where
we can fill in the GIC field appropriately.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Tested-by: Stefano Stabellini <[email protected]>
Message-id: 1510066898 [email protected]

Revert "cpu-exec: don't overwrite exception_index"

This reverts commit e01cecabf3e04d22340d7e8b3616ef051c42c891,
which breaks booting of aarch64 Linux images.

Signed-off-by: Peter Maydell <[email protected]>

pc-bios/s390-ccw: Fix problem with invalid virtio-scsi LUN when rebooting

When rebooting a guest that has a virtio-scsi disk, the s390-ccw
bios sometimes bails out with an error message like this:

! SCSI cannot report LUNs: STATUS=02 RSPN=70 KEY=05 CODE=25 QLFR=00, sure !

Enabling the scsi_req* tracing in QEMU shows that the ccw bios is
trying to execute the REPORT LUNS SCSI command with a LUN != 0, and
this causes the SCSI command to fail.
Looks like we neither clear the BSS of the s390-ccw bios during reboot,
nor do we explicitly set the default_scsi_device.lun value to 0, so
this variable can contain random values from the OS after the reboot.
By setting this variable explicitly to 0, the problem is fixed and
the reboots always succeed.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1514352
Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <1510942228 [email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: fix DIAG 308 with > 1 VCPU (MTTCG)

Currently, multi threaded TCG with > 1 VCPU gets stuck during IPL, when
the bios tries to switch to the loaded kernel via DIAG 308.

As run_on_cpu() is used, we run into a deadlock after handling the reset.
We need the iolock (just like KVM).

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171116170526 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: fix storing CPU status (again)

Looks like the last fix + cleanup introduced another bug. (for now Linux
guests don't seem to care) - we store the crs into ars.

Fixes: 947a38bd6f13 ("s390x/kvm: fix and cleanup storing CPU status")
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171116170526 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

hw/net/vmxnet3: Fix code to work on big endian hosts, too

Since commit ab06ec43577177a442e8 we test the vmxnet3 device in the
pxe-tester, too (when running "make check SPEED=slow"). This now
revealed that the code is not working there if the host is a big
endian machine (for example ppc64 or s390x) - "make check SPEED=slow"
is now failing on such hosts.

The vmxnet3 code lacks endianness conversions in a couple of places.
Interestingly, the bitfields in the structs in vmxnet3.h already tried to
take care of the *bit* endianness of the C compilers - but the code missed
to change the *byte* endianness when reading or writing the corresponding
structs. So the bitfields are now wrapped into unions which allow to change
the byte endianness during runtime with the non-bitfield member of the union.
With these changes, "make check SPEED=slow" now properly works on big endian
hosts, too.

Reported-by: David Gibson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: Transmit zero UDP checksum as 0xFFFF

The checksum algorithm used by IPv4, TCP and UDP allows a zero value
to be represented by either 0x0000 and 0xFFFF. But per RFC 768, a zero
UDP checksum must be transmitted as 0xFFFF because 0x0000 is a special
value meaning no checksum.

Substitute 0xFFFF whenever a checksum is computed as zero when
modifying a UDP datagram header. Doing this on IPv4 and TCP checksums
is unnecessary but legal. Add a wrapper for net_checksum_finish() that
makes the substitution.

(We can't just change net_checksum_finish(), as that function is also
used by receivers to verify checksums, and in that case the expected
value is always 0x0000.)

Signed-off-by: Ed Swierk <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

MAINTAINERS: Add missing entry for eepro100 emulation

Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

hw/net/eepro100: Fix endianness problem on big endian hosts

Since commit 1865e288a823c764cd4344d ("Fix eepro100 simple transmission
mode"), the test/pxe-test is broken for the eepro100 device on big
endian hosts. However, it seems like that commit did not introduce the
problem, but just uncovered it: The EEPRO100State->tx.tbd_array_addr and
EEPRO100State->tx.tcb_bytes fields are already in host byte order, since
they have already been byte-swapped in the read_cb() function.
Thus byte-swapping them in tx_command() again results in the wrong
endianness. Removing the byte-swapping here fixes the pxe-test.

Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

Revert "Add new PCI ID for i82559a"

This reverts commit 5e89dc01133f8f5e621f6b66b356c6f37d31dafb since:

- we should use ID in the spec instead the one used by OEM
- in the future, we should allow changing id through either property
or EEPROM file.

Cc: Stefan Weil <[email protected]>
Cc: Michael Nawrocki <[email protected]>
Cc: Peter Maydell <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Weil <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

colo-compare: fix the dangerous assignment

Cc: Peter Maydell <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Zhang Chen <[email protected]>
Cc: Li Zhijian <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Fixes: 8ec14402029d783720f4312ed8a925548e1dad61
Reported-by: Peter Maydell <[email protected]>
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Mao Zhongyi <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

spapr: reset DRCs after devices

A DRC with a pending unplug request releases its associated device at
machine reset time.

In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.

This is exactly what happens with vhost backends, and QEMU aborts
with:

qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
`r >= 0' failed.

The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).

To avoid such situations, let's reset DRCs after all devices
have been reset.

Reported-by: Mallesh N. Koti <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Update setting of cpu features to account for compat modes

The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features
are used to communicate features of the cpu to the guest operating
system. The properties of each of these are determined based on the
selected cpu model and the availability of hypervisor features.
Currently the compatibility mode of the cpu is not taken into account.

The ibm,arch-vec-5-platform-support node is used to communicate the
level of support for various ISAv3 processor features to the guest
before CAS to inform the guests' request. The available mmu mode should
only be hash unless the cpu is a POWER9 which is not in a prePOWER9
compat mode, in which case the available modes depend on the
accelerator and the hypervisor capabilities.

The ibm,pa-featues node is used to communicate the level of cpu support
for various features to the guest os. This should only contain features
relevant to the operating mode of the processor, that is the selected
cpu model taking into account any compat mode. This means that the
compat mode should be taken into account when choosing the properties of
ibm,pa-features and they should match the compat mode selected, or the
cpu model selected if no compat mode.

Update the setting of these cpu features in the device tree as described
above to properly take into account any compat mode. We use the
ppc_check_compat function which takes into account the current processor
model and the cpu compat mode.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: David Gibson <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.11.0-rc2

# gpg: Signature made Fri 17 Nov 2017 17:58:36 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (25 commits)
  iotests: Make 087 pass without AIO enabled
  block: Make bdrv_next() keep strong references
  qcow2: Fix overly broad madvise()
  qcow2: Refuse to get unaligned offsets from cache
  qcow2: Add bounds check to get_refblock_offset()
  block: Guard against NULL bs->drv
  qcow2: Unaligned zero cluster in handle_alloc()
  qcow2: check_errors are fatal
  qcow2: reject unaligned offsets in write compressed
  iotests: Add test for failing qemu-img commit
  tests: Add check-qobject for equality tests
  iotests: Add test for non-string option reopening
  block: qobject_is_equal() in bdrv_reopen_prepare()
  qapi: Add qobject_is_equal()
  qapi/qlist: Add qlist_append_null() macro
  qapi/qnull: Add own header
  qcow2: fix image corruption on commit with persistent bitmap
  iotests: test clearing unknown autoclear_features by qcow2
  block: Fix permissions in image activation
  qcow2: fix image corruption after committing qcow2 image into base
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-17' into queue-block

Block patches for 2.11.0-rc2

# gpg: Signature made Fri Nov 17 18:22:07 2017 CET
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-11-17:
  iotests: Make 087 pass without AIO enabled
  block: Make bdrv_next() keep strong references
  qcow2: Fix overly broad madvise()
  qcow2: Refuse to get unaligned offsets from cache
  qcow2: Add bounds check to get_refblock_offset()
  block: Guard against NULL bs->drv
  qcow2: Unaligned zero cluster in handle_alloc()
  qcow2: check_errors are fatal
  qcow2: reject unaligned offsets in write compressed
  iotests: Add test for failing qemu-img commit
  tests: Add check-qobject for equality tests
  iotests: Add test for non-string option reopening
  block: qobject_is_equal() in bdrv_reopen_prepare()
  qapi: Add qobject_is_equal()
  qapi/qlist: Add qlist_append_null() macro
  qapi/qnull: Add own header

Signed-off-by: Kevin Wolf <[email protected]>

iotests: Make 087 pass without AIO enabled

If AIO has not been enabled in the qemu build that is to be tested, we
should skip the "aio=native without O_DIRECT" test instead of failing.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171115180732 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: Make bdrv_next() keep strong references

On one hand, it is a good idea for bdrv_next() to return a strong
reference because ideally nearly every pointer should be refcounted.
This fixes intermittent failure of iotest 194.

On the other, it is absolutely necessary for bdrv_next() itself to keep
a strong reference to both the BB (in its first phase) and the BDS (at
least in the second phase) because when called the next time, it will
dereference those objects to get a link to the next one.  Therefore, it
needs these objects to stay around until then.  Just storing the pointer
to the next in the iterator is not really viable because that pointer
might become invalid as well.

Both arguments taken together means we should probably just invoke
bdrv_ref() and blk_ref() in bdrv_next().  This means we have to assert
that bdrv_next() is always called from the main loop, but that was
probably necessary already before this patch and judging from the
callers, it also looks to actually be the case.

Keeping these strong references means however that callers need to give
them up if they decide to abort the iteration early.  They can do so
through the new bdrv_next_cleanup() function.

Suggested-by: Kevin Wolf <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110172545 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Fix overly broad madvise()

@mem_size and @offset are both size_t, thus subtracting them from one
another will just return a big size_t if mem_size < offset -- even more
obvious here because the result is stored in another size_t.

Checking that result to be positive is therefore not sufficient to
exclude the case that offset > mem_size. Thus, we currently sometimes
issue an madvise() over a very large address range.

This is triggered by iotest 163, but with -m64, this does not result in
tangible problems. But with -m32, this test produces three segfaults,
all of which are fixed by this patch.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171114184127 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Refuse to get unaligned offsets from cache

Instead of using an assertion, it is better to emit a corruption event
here.  Checking all offsets for correct alignment can be tedious and it
is easily possible to forget to do so.  qcow2_cache_do_get() is a
function every L2 and refblock access has to go through, so this is a
good central point to add such a check.

And for good measure, let us also add an assertion that the offset is
non-zero.  Making this a corruption event is not feasible, because a
zero offset usually means something special (such as the cluster is
unused), so all callers should be checking this anyway.  If they do not,
it is their fault, hence the assertion here.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110203111 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Add bounds check to get_refblock_offset()

Reported-by: R. Nageswara Sastry <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728661
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110203111 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block: Guard against NULL bs->drv

We currently do not guard everywhere against a NULL bs->drv where we
should be doing so.  Most of the places fixed here just do not care
about that case at all.

Some care implicitly, e.g. through a prior function call to
bdrv_getlength() which would always fail for an ejected BDS.  Add an
assert there to make it more obvious.

Other places seem to care, but do so insufficiently: Freeing clusters in
a qcow2 image is an error-free operation, but it may leave the image in
an unusable state anyway.  Giving qcow2_free_clusters() an error code is
not really viable, it is much easier to note that bs->drv may be NULL
even after a successful driver call.  This concerns bdrv_co_flush(), and
the way the check is added to bdrv_co_pdiscard() (in every iteration
instead of only once).

Finally, some places employ at least an assert(bs->drv); somewhere, that
may be reasonable (such as in the reopen code), but in
bdrv_has_zero_init(), it is definitely not.  Returning 0 there in case
of an ejected BDS saves us much headache instead.

Reported-by: R. Nageswara Sastry <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728660
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110203111 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Unaligned zero cluster in handle_alloc()

We should check whether the cluster offset we are about to use is
actually valid; that is, whether it is aligned to cluster boundaries.

Reported-by: R. Nageswara Sastry <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728643
Buglink: https://bugs.launchpad.net/qemu/+bug/1728657
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110203111 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: check_errors are fatal

When trying to repair a dirty image, qcow2_check() may apparently
succeed (no really fatal error occurred that would prevent the check
from continuing), but if check_errors in the result object is non-zero,
we cannot trust the image to be usable.

Reported-by: R. Nageswara Sastry <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728639
Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171110203111 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: reject unaligned offsets in write compressed

Misaligned compressed write is not supported.

Signed-off-by: Anton Nefedov <[email protected]>
Message-id: 1510654613 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: Add test for failing qemu-img commit

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20170616135847 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

tests: Add check-qobject for equality tests

Add a new test file (check-qobject.c) for unit tests that concern
QObjects as a whole.

Its only purpose for now is to test the qobject_is_equal() function.

Signed-off-by: Max Reitz <[email protected]>
Message-id: 20171114180128 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: Add test for non-string option reopening

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171114180128 [email protected]
Signed-off-by: Max Reitz <[email protected]>

block: qobject_is_equal() in bdrv_reopen_prepare()

Currently, bdrv_reopen_prepare() assumes that all BDS options are
strings. However, this is not the case if the BDS has been created
through the json: pseudo-protocol or blockdev-add.

Note that the user-invokable reopen command is an HMP command, so you
can only specify strings there. Therefore, specifying a non-string
option with the "same" value as it was when originally created will now
return an error because the values are supposedly similar (and there is
no way for the user to circumvent this but to just not specify the
option again -- however, this is still strictly better than just
crashing).

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-id: 20171114180128 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qapi: Add qobject_is_equal()

This generic function (along with its implementations for different
types) determines whether two QObjects are equal.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 20171114180128 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qapi/qlist: Add qlist_append_null() macro

Besides the macro itself, this patch also adds a corresponding
Coccinelle rule.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Message-id: 20171114180128 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qapi/qnull: Add own header

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 20171114180128 [email protected]
Signed-off-by: Max Reitz <[email protected]>

qcow2: fix image corruption on commit with persistent bitmap

If an image contains persistent bitmaps, we cannot use the
fast path of bdrv_make_empty() to clear the image during
qemu-img commit, because that will lose the clusters related
to the bitmaps.

Also leave a comment in qcow2_read_extensions to remind future
feature additions to think about fast-path removal, since we
just barely fixed the same bug for LUKS encryption.

It's a pain that qemu-img has not yet been taught to manipulate,
or even at a very minimum display, information about persistent
bitmaps; instead, we have to use QMP commands. It's also a
pain that only qeury-block and x-debug-block-dirty-bitmap-sha256
will allow bitmap introspection; but the former requires the
node to be hooked to a block device, and the latter is experimental.

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: test clearing unknown autoclear_features by qcow2

Test clearing unknown autoclear_features by qcow2 on incoming
migration.

[ kwolf: Fixed wait for destination VM startup ]

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

block: Fix permissions in image activation

Inactive images generally request less permissions for their image files
than they would if they were active (in particular, write permissions).
Activating the image involves extending the permissions, therefore.

drv->bdrv_invalidate_cache() can already require write access to the
image file, so we have to update the permissions earlier than that.
The current code does it only later, so we have to move up this part.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Max Reitz <[email protected]>

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-11-17' into staging

nbd patches for 2017-11-17

Eric Blake - nbd: Don't crash when server reports NBD_CMD_READ failure
Eric Blake - nbd/client: Use error_prepend() correctly
Eric Blake - nbd/client: Don't hard-disconnect on ESHUTDOWN from server
Eric Blake - nbd/server: Fix error reporting for bad requests

# gpg: Signature made Fri 17 Nov 2017 14:53:30 GMT
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <[email protected]>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-11-17:
  nbd/server: Fix error reporting for bad requests
  nbd/client: Don't hard-disconnect on ESHUTDOWN from server
  nbd/client: Use error_prepend() correctly
  nbd: Don't crash when server reports NBD_CMD_READ failure

Signed-off-by: Peter Maydell <[email protected]>

nbd/server: Fix error reporting for bad requests

The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).

The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.

Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow). Bonus:
dropping the up-front check lets us keep the connection alive
after NBD_CMD_WRITE, whereas before we would drop the
connection (of course, any client sending a packet that would
trigger the failure is already buggy, so it's also okay to
drop the connection, but better quality-of-implementation
never hurts).

Solve all of these issues by some code motion and improved
request validation.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20171115213557 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>

nbd/client: Don't hard-disconnect on ESHUTDOWN from server

The NBD spec says that a server may fail any transmission request
with ESHUTDOWN when it is apparent that no further request from
the client can be successfully honored.  The client is supposed
to then initiate a soft shutdown (wait for all remaining in-flight
requests to be answered, then send NBD_CMD_DISC).  However, since
qemu's server never uses ESHUTDOWN errors, this code was mostly
untested since its introduction in commit b6f5d3b5.

More recently, I learned that nbdkit as the NBD server is able to
send ESHUTDOWN errors, so I finally tested this code, and noticed
that our client was special-casing ESHUTDOWN to cause a hard
shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
if the server sends this error as a simple reply.  Further
investigation found that commit d2febedb introduced a regression
where structured replies behave differently than simple replies -
but that the structured reply behavior is more in line with the
spec (even if we still lack code in nbd-client.c to properly quit
sending further requests).  So this patch reverts the portion of
b6f5d3b5 that introduced an improper hard-disconnect special-case
at the lower level, and leaves the future enhancement of a nicer
soft-disconnect at the higher level for another day.

CC: [email protected]
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20171113194857 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>

nbd/client: Use error_prepend() correctly

When using error prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:

can't open device nbd://localhost:10809/: Failed to read dataUnexpected end-of-file before all bytes were read

Originally botched in commit e44ed99d, then several more instances
added in the meantime.

Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.

CC: [email protected]
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20171113152424 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

nbd: Don't crash when server reports NBD_CMD_READ failure

If a server fails a read, for example with EIO, but the connection
is still live, then we would crash trying to print a non-existent
error message in nbd_client_co_preadv(). For consistency, also
change the error printout in nbd_read_reply_entry(), although that
instance does not crash. Bug introduced in commit f140e300.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20171112013936 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>

qcow2: fix image corruption after committing qcow2 image into base

After committing the qcow2 image contents into the base image, qemu-img
will call bdrv_make_empty to drop the payload in the layered image.

When this is done for qcow2 images, it blows away the LUKS encryption
header, making the resulting image unusable. There are two codepaths
for emptying a qcow2 image, and the second (slower) codepath leaves
the LUKS header intact, so force use of that codepath.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Deprecate bdrv_set_read_only() and users

bdrv_set_read_only() is used by some block drivers to override the
read-only option given by the user. This is not how read-only images
generally work in QEMU: Instead of second guessing what the user really
meant (which currently includes making an image read-only even if the
user didn't only use the default, but explicitly said read-only=off), we
should error out if we can't provide what the user requested.

This adds deprecation warnings to all callers of bdrv_set_read_only() so
that the behaviour can be corrected after the usual deprecation period.

Signed-off-by: Kevin Wolf <[email protected]>

qcow2: don't permit changing encryption parameters

Currently if trying to change encryption parameters on a qcow2 image, qemu-img
will abort. We already explicitly check for attempt to change encrypt.format
but missed other parameters like encrypt.key-secret. Rather than list each
parameter, just blacklist changing of all parameters with a 'encrypt.' prefix.

Signed-off-by: Daniel P. Berrange <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Fix error path in bdrv_backing_update_filename()

error_setg_errno() takes a positive errno code. Spotted by Coverity
(CID 1381628).

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

qemu-iotests: Use -nographic in 182

This avoids that random UI frontend error messages end up in the output.
In particular, we were seeing this line in CI error logs:

+Unable to init server: Could not connect: Connection refused

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Kashyap Chamarthy <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>

replication: Fix replication open fail

replication_child_perm request write
permissions for all child which will lead bdrv_check_perm fail.
replication_child_perm() should request write
permissions only if it is writable itself.

Signed-off-by: Wang Guang <[email protected]>
Signed-off-by: Wang Yong <[email protected]>
Reviewed-by: Xie Changlong <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/ui-20171117-pull-request' into staging

sdl2 fixes for 2.11

# gpg: Signature made Fri 17 Nov 2017 10:06:27 GMT
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/ui-20171117-pull-request:
  sdl2: Fix broken display updating after the window is hidden
  sdl2: Do not leave grab when fullscreen
  sdl2: Fix dead keyboard after fullsceen
  sdl2: Use the same pointer show/hide logic for absolute and relative mode
  sdl2: Do not quit the emulator when an auxilliary window is closed

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: fixes for rc1

A bunch of fixes all over the place.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Thu 16 Nov 2017 16:37:21 GMT
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg:                 aka "Michael S. Tsirkin <[email protected]>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  tests/bios-tables-test: Fix endianess problems when passing data to iasl
  build-sys: restrict vmcoreinfo to fw_cfg+dma capable targets
  vmcoreinfo: put it in the 'misc' device category
  NUMA: Enable adding NUMA node implicitly
  tests/acpi-test-data: update _CRS in DSDT
  hw/pcie-pci-bridge: restrict to X86 and ARM
  hw/pci-host: Fix x86 Host Bridges 64bit PCI hole
  pci: Initialize pci_dev->name before use
  fix: unrealize virtio device if we fail to hotplug it

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Thu 16 Nov 2017 16:36:02 GMT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  throttle-groups: forget timer and schedule next TGM on detach

Signed-off-by: Peter Maydell <[email protected]>

tests/bios-tables-test: Fix endianess problems when passing data to iasl

The bios-tables-test was writing out files that we pass to iasl in
with the wrong endianness in the header when running on a big endian
host. So instead of storing mixed endian information in our structures,
let's keep everything in little endian and byte-swap it only when we
need a value in the code.

Reported-by: Daniel P. Berrange <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Tested-by: "Daniel P. Berrange" <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

build-sys: restrict vmcoreinfo to fw_cfg+dma capable targets

vmcoreinfo is built for all targets. However, it requires fw_cfg with
DMA operations support (write operation). Restrict vmcoreinfo exposure
to architectures that are supporting FW_CFG_DMA, that is arm-virt and
x86 only atm.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Tested-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vmcoreinfo: put it in the 'misc' device category

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

NUMA: Enable adding NUMA node implicitly

Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
* Windows: won't enable memory hotplug without SRAT table at all
* Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
* Windows: to enable memory hotplug
* Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <[email protected]>
Suggested-by: Igor Mammedov <[email protected]>
Signed-off-by: Dou Liyang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Eduardo Habkost <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Marcel Apfelbaum <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Thomas Huth <[email protected]>
Cc: Alistair Francis <[email protected]>
Cc: Takao Indoh <[email protected]>
Cc: Izumi Taku <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

tests/acpi-test-data: update _CRS in DSDT

commit dadf988e81b15065ac1d6dbaf4b87b5b80c7b670
hw/pci-host: Fix x86 Host Bridges 64bit PCI hole

Added a 64 bit hole to _CRS of PCI0.
Update the expected files accordingly.

Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/pcie-pci-bridge: restrict to X86 and ARM

The PCIE-PCI bridge is specific to "pure" PCIe systems
(on QEMU we have X86 and ARM), it does not make sense to
have it in other archs.

Reported-by: Thomas Huth <[email protected]>
Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Yongbok Kim <[email protected]>

hw/pci-host: Fix x86 Host Bridges 64bit PCI hole

Currently there is no MMIO range over 4G
reserved for PCI hotplug. Since the 32bit PCI hole
depends on the number of cold-plugged PCI devices
and other factors, it is very possible is too small
to hotplug PCI devices with large BARs.

Fix it by reserving 2G for I4400FX chipset
in order to comply with older Win32 Guest OSes
and 32G for Q35 chipset.

Even if the new defaults of pci-hole64-size will appear in
"info qtree" also for older machines, the property was
not implemented so no changes will be visible to guests.

Note this is a regression since prev QEMU versions had
some range reserved for 64bit PCI hotplug.

Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: Initialize pci_dev->name before use

This moves pci_dev->name initialization earlier so
pci_dev->bus_master_as could get a name instead of an empty string.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

fix: unrealize virtio device if we fail to hotplug it

If we fail to hotplug virtio-blk device and then suspend
or shutdown VM, qemu is likely to crash.

Re-production steps:
1. Run VM named vm001
2. Create a virtio-blk.xml which contains wrong configurations:
<disk device="lun" rawio="yes" type="block">
  <driver cache="none" io="native" name="qemu" type="raw" />
  <source dev="/dev/mapper/11-dm" />
  <target bus="virtio" dev="vdx" />
</disk>
3. Run command : virsh attach-device vm001 virtio-blk.xml
error: Failed to attach device from blk-scsi.xml
error: internal error: unable to execute QEMU command 'device_add': Please set scsi=off for virtio-blk devices in order to use virtio 1.0
it means hotplug virtio-blk device failed.
4. Suspend or shutdown VM will leads to qemu crash

Problem happens in virtio_vmstate_change which is called by
vm_state_notify:
vdev’s parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
virtio_vmstate_change is added to the list vm_change_state_head at virtio_blk_device_realize(virtio_init),
but after hotplug virtio-blk failed, virtio_vmstate_change will not be removed from vm_change_state_head.
Adding unrealize function of virtio-blk device can solve this problem.

Signed-off-by: linzhecheng <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Miscellaneous bugfixes

# gpg: Signature made Wed 15 Nov 2017 15:27:25 GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  fix scripts/update-linux-headers.sh here document
  exec: Do not resolve subpage in mru_section
  util/stats64: Fix min/max comparisons
  cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay
  cpu-exec: don't overwrite exception_index
  vhost-user-scsi: add missing virtqueue_size param
  target-i386: adds PV_TLB_FLUSH CPUID feature bit
  thread-posix: fix qemu_rec_mutex_trylock macro
  Makefile: simpler/faster "make help"
  ioapic/tracing: Remove last DPRINTFs
  Enable 8-byte wide MMIO for 16550 serial devices

Signed-off-by: Peter Maydell <[email protected]>

throttle-groups: forget timer and schedule next TGM on detach

tg->any_timer_armed[] must be cleared when detaching pending timers from
the AioContext.  Failure to do so leads to hung I/O because it looks
like there are still timers pending when in fact they have been removed.

Other ThrottleGroupMembers might have requests pending too so it's
necessary to schedule the next TGM so it can set a timer.

This patch fixes hung I/O when QEMU is launched with drives that are in
the same throttling group:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 &
  (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 &
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-id: 20171116112150 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20171115' into staging

User-mode memory helper fixes

# gpg: Signature made Wed 15 Nov 2017 12:32:33 GMT
# gpg:                using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20171115:
  target/arm: Fix GETPC usage in do_paired_cmpxchg64_l/be
  target/arm: Use helper_retaddr in stxp helpers
  tcg: Record code_gen_buffer address for user-only memory helpers

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-11-15-1' into staging

Merge tpm 2017/11/15 v1

# gpg: Signature made Wed 15 Nov 2017 11:51:47 GMT
# gpg:                using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE  C66B 75AD 6580 2A0B 4211

* remotes/stefanberger/tags/pull-tpm-2017-11-15-1:
  tpm_tis: Return 0 for every register in case of failure mode
  tpm_tis: Return TPM_VERSION_UNSPEC in case of BE failure
  tpm-emulator: protect concurrent ctrl_chr access
  specs: Extend TPM spec with TPM emulator description

Signed-off-by: Peter Maydell <[email protected]>

sdl2: Fix broken display updating after the window is hidden

With SDL 2.0.6, calling SDL_ShowWindow during SDL_WINDOWEVENT_HIDDEN
blocks all subsequent display updates.

Instead of trying to override the change, just update the scon->hidden
flag.

Signed-off-by: Jindrich Makovicka <[email protected]>
Message-Id: <20171112193032 [email protected]>

This is a partial revert of d3f3a0f453ea590be529079ae214c200bb5ecc1a,
which in turn is a workaround for a SDL bug. The bug is fixed in 2.0.6,
see https://bugzilla.libsdl.org/show_bug.cgi?id=3410

Signed-off-by: Gerd Hoffmann <[email protected]>

sdl2: Do not leave grab when fullscreen

Prevents displaying of a doubled mouse pointer when moving the pointer
to the screen edges when fullscreen.

Signed-off-by: Jindrich Makovicka <[email protected]>
Message-Id: <20171112193032 [email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

sdl2: Fix dead keyboard after fullsceen

Signed-off-by: Jindrich Makovicka <[email protected]>
Message-Id: <20171112193032 [email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

sdl2: Use the same pointer show/hide logic for absolute and relative mode

Also use a proper enum parameter for SDL_ShowCursor

Signed-off-by: Jindrich Makovicka <[email protected]>
Message-Id: <20171112193032 [email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

sdl2: Do not quit the emulator when an auxilliary window is closed

Signed-off-by: Jindrich Makovicka <[email protected]>
Message-Id: <20171112193032 [email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

fix scripts/update-linux-headers.sh here document

The minus sign after << causes the shell to strip only
preceding tabs, not spaces.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20171110090354 [email protected]>
Fixes: 40bf8e9aede0f9105a9e1e4aaf17b20aaa55f9a0
Reviewed-by: Roman Kagan <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Tested-by: Christian Borntraeger <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec: Do not resolve subpage in mru_section

This fixes a crash caused by picking the wrong memory region in
address_space_lookup_region seen with client code accessing a device
model that uses alias memory regions. The expensive part of
address_space_lookup_region anyway is phys_page_find; performance-wise
it is okay to repeat the subsequent subpage lookup.

Signed-off-by: BALATON Zoltan <[email protected]>
Message-Id: <20171114225941.072707456B5@zero.eik.bme.hu>
Signed-off-by: Paolo Bonzini <[email protected]>

tpm_tis: Return 0 for every register in case of failure mode

Rather than returning ~0, return 0 for every register in case of failure
mode. The '0' is better to indicate that there's no device there. It avoids
SeaBIOS detecting a device and getting stuck on it trying to read and write
its registers.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm_tis: Return TPM_VERSION_UNSPEC in case of BE failure

In case the backend has a failure, such as the tpm_emulator's CMD_INIT
failing, the TIS goes into failure mode and does not respond to reads
or writes to MMIO registers. In this case we need to prevent the ACPI
table from being added and the straight-forward way is to indicate that
there's no known TPM version being used.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm-emulator: protect concurrent ctrl_chr access

The control chardev is being used from the data thread to set the
locality of the next request. Altough the chr has a write mutex, we
may potentially read the reply from another thread request.

Add a mutex to protect from concurrent control commands.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

specs: Extend TPM spec with TPM emulator description

Following the recent extension of QEMU with a TPM emulator device,
update the specs describing for how to interact with the device.

The results of commands run inside a Linux VM are expected to be
similar to those when the TPM passthrough device is used, so we
just reuse that.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

target/arm: Fix GETPC usage in do_paired_cmpxchg64_l/be

Use of GETPC must be restricted to those functions that are
directly called from TCG generated code.

Reviewed-by: Alex Bennée <[email protected]>
Fixes: 2399d4e7cec22ecf1c51062d2ebfd45220dbaace
Signed-off-by: Richard Henderson <[email protected]>

target/arm: Use helper_retaddr in stxp helpers

We use raw memory primitives along the !parallel_cpus paths in order to
simplify the endianness handling. Because of that, we did not benefit
from the generic changes to cpu_ldst_user_only_template.h.

The simplest fix is to manipulate helper_retaddr here.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Record code_gen_buffer address for user-only memory helpers

When we handle a signal from a fault within a user-only memory helper,
we cannot cpu_restore_state with the PC found within the signal frame.
Use a TLS variable, helper_retaddr, to record the unwind start point
to find the faulting guest insn.

Tested-by: Alex Bennée <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reported-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

util/stats64: Fix min/max comparisons

stat64_min_slow() and stat64_max_slow() compare the wrong way. This
makes iotest 136 fail with clang and -m32.

Signed-off-by: Max Reitz <[email protected]>
Message-Id: <20171114232223 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Update version for v2.11.0-rc1 release

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2017-11-14' into staging

Block patches for 2.11.0-rc1

# gpg: Signature made Tue 14 Nov 2017 17:22:17 GMT
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <[email protected]>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2017-11-14:
  qemu-iotests: update unsupported image formats in 194
  block/parallels: add migration blocker
  block/parallels: Do not update header or truncate image when INMIGRATE
  block/vhdx.c: Don't blindly update the header
  iotests: 077: Filter out 'resume' lines
  block/snapshot: dirty all dirty bitmaps on snapshot-switch
  qcow2: Check that corrupted images can be repaired in iotest 060
  iotests: Use new-style NBD connections
  iotests: Make 136 less flaky
  iotests: Make 083 less flaky
  iotests: Make 055 less flaky
  iotests: Add missing 'blkdebug::' in 040
  iotests: Make 030 less flaky
  qcow2: Assert that the crypto header does not overlap other metadata
  qcow2: Add iotest for an empty refcount table
  qcow2: Add iotest for an image with header.refcount_table_offset == 0
  qcow2: Don't open images with header.refcount_table_clusters == 0
  qcow2: Prevent allocating compressed clusters at offset 0
  qcow2: Prevent allocating L2 tables at offset 0
  qcow2: Prevent allocating refcount blocks at offset 0

Signed-off-by: Peter Maydell <[email protected]>

qemu-iotests: update unsupported image formats in 194

Test 194 checks for 'luks' to exclude as an unsupported format,
However, most formats are unsupported, due to migration blockers.

Rather than specifying a blacklist of unsupported formats, whitelist
supported formats (specifically, qcow2, qed, raw, dmg).

Tested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
Message-id: 23ca18c7f843c86a28b1529ca9ac6db4b35ca0e4.1510059970 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block/parallels: add migration blocker

Migration does not work for parallels, and has been broken for a while
(see patch 'block/parallels: Do not update header or truncate image when
INMIGRATE'). The bdrv_invalidate_cache() method needs to be added for
migration to be supported. Until this is done, prohibit migration.

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 5e04a7c8a3089913fa58d484af42dab7993984ad.1510059970 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block/parallels: Do not update header or truncate image when INMIGRATE

If we write or modify the image file while the QEMU run state is
INMIGRATE, then the BDRV_O_INACTIVE BDS flag is set. This will cause
an assert, since the image is marked inactive. Make sure we obey this
flag.

Tested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
Message-id: 3996c930fa8cde8570b7a63032720d76a28fd78b.1510059970 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block/vhdx.c: Don't blindly update the header

The VHDX specification requires that before user data modification of
the vhdx image, the VHDX header file and data GUIDs need to be updated.
In vhdx_open(), if the image is set to RDWR, we go ahead and update the
header.

However, just because the image is set to RDWR does not mean we can go
ahead and write at this point - specifically, if the QEMU run state is
INMIGRATE, the underlying file BS may be set to inactive via the BDS
open flag of BDRV_O_INACTIVE.  Attempting to write under this condition
will cause an assert in bdrv_co_pwritev().

We can alternatively latch the first time the image is written.  And lo
and behold, we do just that, via vhdx_user_visible_write() in
vhdx_co_writev().  This means the call to vhdx_update_headers() in
vhdx_open() is likely just vestigial, and can be removed.

Reported-by: Alexey Kardashevskiy <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
Message-id: 659e4cdba6ef4c651737852777c8c93d27b38040.1510059970 [email protected]
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: 077: Filter out 'resume' lines

In the "Overlapping multiple requests" cases, the 3rd reqs (the break
point B) doesn't wait for the 2nd, and once resumed the I/O will just
continue.  This is because the 2nd is already waiting for the 1st, and
in wait_serialising_requests() there is:

    /* If the request is already (indirectly) waiting for us, or
     * will wait for us as soon as it wakes up, then just go on
     * (instead of producing a deadlock in the former case). */
    if (!req->waiting_for) {
        /* actually break */
        ...
    }

Consequently, the following "sleep 100; resume A" command races with the
completion of that request, and sometimes results in an unexpected
order of output:

> @@ -56,9 +56,9 @@
>  wrote XXX/XXX bytes at offset XXX
>  XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  blkdebug: Resuming request 'B'
> +blkdebug: Resuming request 'A'
>  wrote XXX/XXX bytes at offset XXX
>  XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -blkdebug: Resuming request 'A'
>  wrote XXX/XXX bytes at offset XXX
>  XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  wrote XXX/XXX bytes at offset XXX

Filter out the "Resuming request" lines to make the output
deterministic.

Reported-by: Patchew <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-id: 20171113150026 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

block/snapshot: dirty all dirty bitmaps on snapshot-switch

Snapshot-switch actually changes active state of disk so it should
reflect on dirty bitmaps. Otherwise next incremental backup using
these bitmaps will be invalid.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-id: 20171023092945 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

qcow2: Check that corrupted images can be repaired in iotest 060

We just fixed a few bugs that caused QEMU to crash when trying to
write to corrupted qcow2 images, and iotest 060 was expanded to test
all those scenarios.

In almost all cases the corrupted images can be repaired using
qemu-img, so this patch verifies that.

Signed-off-by: Alberto Garcia <[email protected]>
Message-id: 0b1b95340ecdfbc6927e36adf2fd42ae6198747a.1510143008 [email protected]
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

iotests: Use new-style NBD connections

Old-style NBD is deprecated upstream (it is documented, but no
longer implemented in the reference implementation), and it is
severely limited (it cannot support structured replies, which
means it cannot support efficient handling of zeroes), when
compared to new-style NBD. We are better off having our iotests
favor new-style everywhere (although some explicit tests,
particularly 83, still cover old-style for back-compat reasons);
this is as simple as supplying the empty string as the default
export name, as it does not change the URI needed to connect a
client to the server. This also gives us more coverage of the
just-added structured reply code, when not overriding $QEMU_NBD
to intentionally point to an older server.

Signed-off-by: Eric Blake <[email protected]>
Message-id: 20171109221216 [email protected]
Signed-off-by: Max Reitz <[email protected]>

iotests: Make 136 less flaky

136 executes some AIO requests without a final aio_flush; then it
advances the virtual clock and thus expects the last access time of the
device to be less than the current time when queried (i.e. idle_time_ns
to be greater than 0). However, without the aio_flush, some requests
may be settled after the clock_step invocation. In that case,
idle_time_ns would be 0 and the test fails.

Fix this by adding an aio_flush if any AIO request other than some other
aio_flush has been executed.

Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 20171109203025 [email protected]
Signed-off-by: Max Reitz <[email protected]>