In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h). It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now
recompiles only some 1700 objects. For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
others, they shrink only slightly.
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
The previous commits have left only the declaration of hw_error() in
hw/hw.h. This permits dropping most of its inclusions. Touching it
now recompiles less than 200 objects.
hw/hw.h used to include headers hardware emulation "usually" needs.
The previous commits removed all but one of them, to good effect.
Only qom/object.h is left. Remove that one, too.
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get VMStateDescription. The previous commit made
that unnecessary.
Include migration/vmstate.h only where it's still needed. Touching it
now recompiles only some 1600 objects.
migration: Move the VMStateDescription typedef to typedefs.h
We declare incomplete struct VMStateDescription in a couple of places
so we don't have to include migration/vmstate.h for the typedef.
That's fine with me. However, the next commit will drop
migration/vmstate.h from a massive number of compiles. Move the
typedef to qemu/typedefs.h now, so I don't have to insert struct in
front of VMStateDescription all over the place then.
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.
Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed. Touching it now recompiles only some 500 objects.
ide: Include hw/ide/internal a bit less outside hw/ide/
According to hw/ide/internal's file comment, only files in hw/ide/ are
supposed to include it. Drag reality slightly closer to supposition.
Three includes outside hw/ide remain: hw/arm/sbsa-ref.c,
include/hw/ide/pci.h, and include/hw/misc/macio/macio.h. Turns out
board code needs ide-internal.h to wire up IDE stuff. More cleanup is
needed. Left for another day.
In my "build everything" tree, changing migration/qemu-file-types.h
triggers a recompile of some 2600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).
The culprit is again hw/hw.h, which supposedly includes it for
convenience.
Include migration/qemu-file-types.h only where it's needed. Touching
it now recompiles less than 200 objects.
In my "build everything" tree, changing sysemu/reset.h triggers a
recompile of some 2600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
The main culprit is hw/hw.h, which supposedly includes it for
convenience.
Include sysemu/reset.h only where it's needed. Touching it now
recompiles less than 200 objects.
trace: Do not include qom/cpu.h into generated trace.h
docs/devel/tracing.txt explains "since many source files include
trace.h, [the generated trace.h use] a minimum of types and other
header files included to keep the namespace clean and compile times
and dependencies down."
Commit 4815185902 "trace: Add per-vCPU tracing states for events with
the 'vcpu' property" made them all include qom/cpu.h via
control-internal.h. qom/cpu.h in turn includes about thirty headers.
Ouch.
Per-vCPU tracing is currently not supported in sub-directories'
trace-events. In other words, qom/cpu.h can only be used in
trace-root.h, not in any trace.h.
Split trace/control-vcpu.h off trace/control.h and
trace/control-internal.h. Have the generated trace.h include
trace/control.h (which no longer includes qom/cpu.h), and trace-root.h
include trace/control-vcpu.h (which includes it).
The resulting improvement is a bit disappointing: in my "build
everything" tree, some 1100 out of 6600 objects (not counting tests
and objects that don't depend on qemu/osdep.h) depend on a trace.h,
and about 600 of them no longer depend on qom/cpu.h. But more than
1300 others depend on trace-root.h. More work is clearly needed.
Left for another day.
hw/tpm/trace-events uses TARGET_FMT_plx formats with uint64_t
arguments. That's wrong, TARGET_FMT_plx takes hwaddr. Since hwaddr
happens to be uint64_t, it works anyway. Messed up in commit ec427498da5, v2.12.0. Clean up by replacing TARGET_FMT_plx with its
macro expansion.
scripts/tracetool/format/log_stap.py (commit 62dd1048c0b, v4.0.0) has
a special case for TARGET_FMT_plx. Delete it.
When commit 5f7d05ecfda added QLIST_INSERT_HEAD_RCU() to qemu/queue.h,
it had to include qemu/atomic.h. Commit 341774fe6cc removed
QLIST_INSERT_HEAD_RCU() again, but neglected to remove the #include.
Do that now.
memory: Fix type of IOMMUMemoryRegionClass member @parent_class
TYPE_IOMMU_MEMORY_REGION is a direct subtype of TYPE_MEMORY_REGION.
Its instance struct is IOMMUMemoryRegion, and its first member is a
MemoryRegion. Correct. Its class struct is IOMMUMemoryRegionClass,
and its first member is a DeviceClass. Wrong. Messed up when commit 1221a474676 introduced the QOM type. It even included hw/qdev-core.h
just for that.
TYPE_MEMORY_REGION doesn't bother to define a class struct. This is
fine, it simply defaults to its super-type TYPE_OBJECT's class struct
ObjectClass. Changing IOMMUMemoryRegionClass's first member's type to
ObjectClass would be a minimal fix, if a bit brittle: if
TYPE_MEMORY_REGION ever acquired own class struct, we'd have to update
IOMMUMemoryRegionClass to use it.
Fix it the clean and robust way instead: give TYPE_MEMORY_REGION its
own class struct MemoryRegionClass now, and use it for
IOMMUMemoryRegionClass's first member.
Revert the include of hw/qdev-core.h, and fix the few files that have
come to rely on it.
In my "build everything" tree, changing a type in qapi/common.json
triggers a recompile of some 3600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).
One common dependency is QapiErrorClass: it's used only in in
qapi/error.h, which uses nothing else, and is widely included.
Move QapiErrorClass from common.json to new error.json. Touching
common.json now recompiles only some 2900 objects.
Some of the generated qapi-types-MODULE.h are included all over the
place. Changing a QAPI type can trigger massive recompiling. Top
scorers recompile more than 1000 out of some 6600 objects (not
counting tests and objects that don't depend on qemu/osdep.h):
Clean up headers to include generated QAPI headers only where needed.
Impact is negligible except for hw/qdev-properties.h.
This header includes qapi/qapi-types-block.h and
qapi/qapi-types-misc.h. They are used only in expansions of property
definition macros such as DEFINE_PROP_BLOCKDEV_ON_ERROR() and
DEFINE_PROP_OFF_AUTO(). Moving their inclusion from
hw/qdev-properties.h to the users of these macros avoids pointless
recompiles. This is how other property definition macros, such as
DEFINE_PROP_NETDEV(), already work.
Back in 2016, we discussed[1] rules for headers, and these were
generally liked:
1. Have a carefully curated header that's included everywhere first. We
got that already thanks to Peter: osdep.h.
2. Headers should normally include everything they need beyond osdep.h.
If exceptions are needed for some reason, they must be documented in
the header. If all that's needed from a header is typedefs, put
those into qemu/typedefs.h instead of including the header.
3. Cyclic inclusion is forbidden.
This patch gets include/ closer to obeying 2.
It's actually extracted from my "[RFC] Baby steps towards saner
headers" series[2], which demonstrates a possible path towards
checking 2 automatically. It passes the RFC test there.
Here's a very, very last minute pull request for qemu-4.1. This fixes
two nasty bugs with the XIVE interrupt controller in "dual" mode
(where the guest decides which interrupt controller it wants to use).
One occurs when resetting the guest while I/O is active, and the other
with migration of hotplugged CPUs.
The timing here is very unfortunate. Alas, we only spotted these bugs
very late, and I was sick last week, delaying analysis and fix even
further.
This series hasn't had nearly as much testing as I'd really like, but
I'd still like to squeeze it into qemu-4.1 if possible, since
definitely fixing two bad bugs seems like an acceptable tradeoff for
the risk of introducing different bugs.
Cédric Le Goater [Tue, 13 Aug 2019 06:48:53 +0000 (08:48 +0200)]
spapr/xive: Fix migration of hot-plugged CPUs
The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.
Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.
David Gibson [Tue, 13 Aug 2019 05:59:18 +0000 (15:59 +1000)]
spapr: Reset CAS & IRQ subsystem after devices
This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model. Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.
The problem is that in spapr_machine_reset() we:
1. Reset the CAS vector state
spapr_ovec_cleanup(spapr->ov5_cas);
2. Reset all devices
qemu_devices_reset()
3. Reset the irq subsystem
spapr_irq_reset();
However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state. We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.
During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.
Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode. kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1. When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.
This change addresses the problem by delaying the CAS reset until
after the devices reset. The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ. The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.
We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types). Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.
Cc: Cédric Le Goater <[email protected]> Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend" Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
XIVE and XICS" Signed-off-by: David Gibson <[email protected]>
Gerd Hoffmann [Mon, 12 Aug 2019 06:52:21 +0000 (08:52 +0200)]
display/bochs: fix pcie support
Set QEMU_PCI_CAP_EXPRESS unconditionally in init(), then clear it in
realize() in case the device is not connected to a PCIe bus.
This makes sure the pci config space allocation is big enough, so
accessing the PCIe extended config space doesn't overflow the pci
config space buffer.
PCI(e) config space is guest writable. Writes are limited by
write mask (which probably is also filled with random stuff),
so the guest can only flip enabled bits. But I suspect it
still might be exploitable, so rather serious because it might
be a host escape for the guest. On the other hand the device
is probably not yet in widespread use.
(For a QEMU version without this commit, a mitigation for the
bug is available: use "-device bochs-display" as a conventional pci
device only.)
Cornelia Huck [Tue, 6 Aug 2019 11:58:19 +0000 (13:58 +0200)]
compat: disable edid on virtio-gpu base device
'edid' is a property of the virtio-gpu base device, so turning
it off on virtio-gpu-pci is not enough (it misses -ccw). Turn
it off on the base device instead.
Peter Maydell [Tue, 6 Aug 2019 12:40:31 +0000 (13:40 +0100)]
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-06' into staging
Block patches for 4.1.0-rc4:
- Fix the backup block job when using copy offloading
- Fix the mirror block job when using the write-blocking copy mode
- Fix incremental backups after the image has been grown with the
respective bitmap attached to it
* remotes/maxreitz/tags/pull-block-2019-08-06:
block/backup: disable copy_range for compressed backup
iotests: Test unaligned blocking mirror write
mirror: Only mirror granularity-aligned chunks
iotests: Test incremental backup after truncation
util/hbitmap: update orig_size on truncate
iotests: Test backup job with two guest writes
backup: Copy only dirty areas
Max Reitz [Mon, 5 Aug 2019 15:33:08 +0000 (17:33 +0200)]
mirror: Only mirror granularity-aligned chunks
In write-blocking mode, all writes to the top node directly go to the
target. We must only mirror chunks of data that are aligned to the
job's granularity, because that is how the dirty bitmap works.
Therefore, the request alignment for writes must be the job's
granularity (in write-blocking mode).
Unfortunately, this forces all reads and writes to have the same
granularity (we only need this alignment for writes to the target, not
the source), but that is something to be fixed another time.
Without this, hbitmap_next_zero and hbitmap_next_dirty_area are broken
after truncate. So, orig_size is broken since it's introduction in 76d570dc495c56bb.
Max Reitz [Thu, 1 Aug 2019 17:39:00 +0000 (19:39 +0200)]
iotests: Test backup job with two guest writes
Perform two guest writes to not yet backed up areas of an image, where
the former touches an inner area of the latter.
Before HEAD^, copy offloading broke this in two ways:
(1) The target image differs from the reference image (what the source
was when the backup started).
(2) But you will not see that in the failing output, because the job
offset is reported as being greater than the job length. This is
because one cluster is copied twice, and thus accounted for twice,
but of course the job length does not increase.
Max Reitz [Thu, 1 Aug 2019 17:38:59 +0000 (19:38 +0200)]
backup: Copy only dirty areas
The backup job must only copy areas that the copy_bitmap reports as
dirty. This is always the case when using traditional non-offloading
backup, because it copies each cluster separately. When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.
Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.
Olaf Hering [Thu, 30 May 2019 19:28:11 +0000 (21:28 +0200)]
Makefile: remove DESTDIR from firmware file content
The resulting firmware files should only contain the runtime path.
Fixes commit 26ce90fde5c ("Makefile: install the edk2 firmware images
and their descriptors")
Peter Maydell [Thu, 1 Aug 2019 10:57:42 +0000 (11:57 +0100)]
target/arm: Avoid bogus NSACR traps on M-profile without Security Extension
In Arm v8.0 M-profile CPUs without the Security Extension and also in
v7M CPUs, there is no NSACR register. However, the code we have to handle
the FPU does not always check whether the ARM_FEATURE_M_SECURITY bit
is set before testing whether env->v7m.nsacr permits access to the
FPU. This means that for a CPU with an FPU but without the Security
Extension we would always take a bogus fault when trying to stack
the FPU registers on an exception entry.
We could fix this by adding extra feature bit checks for all uses,
but it is simpler to just make the internal value of nsacr 0xcff
("all non-secure accesses allowed"), since this is not guest
visible when the Security Extension is not present. This allows
us to continue to follow the Arm ARM pseudocode which takes a
similar approach. (In particular, in the v8.1 Arm ARM the register
is documented as reading as 0xcff in this configuration.)
ACS got added in 4.0 unconditionally, that broke older<->4.0 migration
where there was a PCIe root port.
Fix this by turning it off for 3.1 and older machines; note this
fixes compatibility for older QEMUs but breaks compatibility with 4.0
for older machine types.
machine type source qemu dest qemu
3.1 3.1 4.0 broken
3.1 3.1 4.1rc2 broken
3.1 3.1 4.1+this OK ++
3.1 4.0 4.1rc2 OK
3.1 4.0 4.1+this broken --
4.0 4.0 4.1rc2 OK
4.0 4.0 4.1+this OK
So we gain and lose; the consensus seems to be treat this as a
fix for older machine types.
ACS was added in 4.0 unconditionally, this breaks migration
compatibility.
Allow ACS to be disabled by adding a property that's
checked by pcie_root_port.
Unfortunately pcie-root-port doesn't have any instance data,
so there's no where for that flag to live, so stuff it into
PCIESlot.
Peter Maydell [Tue, 30 Jul 2019 13:25:22 +0000 (14:25 +0100)]
target/arm: Deliver BKPT/BRK exceptions to correct exception level
Most Arm architectural debug exceptions (eg watchpoints) are ignored
if the configured "debug exception level" is below the current
exception level (so for example EL1 can't arrange to get debug exceptions
for EL2 execution). Exceptions generated by the BRK or BPKT instructions
are a special case -- they must always cause an exception, so if
we're executing above the debug exception level then we
must take them to the current exception level.
This fixes a bug where executing BRK at EL2 could result in an
exception being taken at EL1 (which is strictly forbidden by the
architecture).
Kevin Wolf [Tue, 30 Jul 2019 13:37:08 +0000 (15:37 +0200)]
fdc: Fix inserting read-only media in empty drive
In order to insert a read-only medium (i.e. a read-only block node) to
the BlockBackend of a floppy drive, we must not have taken write
permissions on that BlockBackend, or the operation will fail with the
error message "Block node is read-only".
The device already takes care to remove all permissions when the medium
is ejected, but the state isn't correct if the drive is initially empty:
It uses blk_is_read_only() to check whether write permissions should be
taken, but this function returns false for empty BlockBackends in the
common case.
Fix floppy_drive_realize() to avoid taking write permissions if the
drive is empty.
Peter Maydell [Tue, 30 Jul 2019 11:25:34 +0000 (12:25 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- scsi-cd: Fix inserting read-only media in empty drive
- block/copy-on-read: Fix permissions for inactive node
- Test case fixes
# gpg: Signature made Tue 30 Jul 2019 12:21:48 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
scsi-cd: Fix inserting read-only media in empty drive
block/copy-on-read: Fix permissions for inactive node Fixes: add read-zeroes to 051.out
tests/multiboot: Fix load address of test kernels
Kevin Wolf [Mon, 29 Jul 2019 16:33:33 +0000 (18:33 +0200)]
scsi-cd: Fix inserting read-only media in empty drive
scsi-disks decides whether it has a read-only device by looking at
whether the BlockBackend specified as drive=... is read-only. In the
case of an anonymous BlockBackend (with a node name specified in
drive=...), this is the read-only flag of the attached node. In the case
of an empty anonymous BlockBackend, it's always read-write because
nothing prevented it from being read-write.
This is a problem because scsi-cd would take write permissions on the
anonymous BlockBackend of an empty drive created without a drive=...
option. Using blockdev-insert-medium with a read-only node fails then
with the error message "Block node is read-only".
Fix scsi_realize() so that scsi-cd devices always take read-only
permissions on their BlockBackend instead.
Kevin Wolf [Mon, 29 Jul 2019 10:45:14 +0000 (12:45 +0200)]
block/copy-on-read: Fix permissions for inactive node
The copy-on-read drive must not request the WRITE_UNCHANGED permission
for its child if the node is inactive, otherwise starting a migration
destination with -incoming will fail because the child cannot provide
write access yet:
qemu-system-x86_64: -blockdev copy-on-read,file=img,node-name=cor: Block node is read-only
Earlier QEMU versions additionally ran into an abort() on the migration
source side: bdrv_inactivate_recurse() failed to update permissions.
This is silently ignored today because it was only supposed to loosen
restrictions. This is the symptom that was originally reported here:
Fixes: add read-zeroes to 051.out
The patch "iotests: Set read-zeroes on in null block driver for Valgrind"
with the commit ID a6862418fec4072 needs the change in 051.out when
compared against on the s390 system.
Kevin Wolf [Mon, 22 Jul 2019 09:26:15 +0000 (11:26 +0200)]
tests/multiboot: Fix load address of test kernels
While older toolchains produced binaries where the physical load address
of ELF segments was the same as the virtual address, newer versions seem
to choose a different physical address if it isn't specified explicitly.
The means that the test kernel doesn't use the right addresses to access
e.g. format strings any more and the whole output disappears, causing
all test cases to fail.
Fix this by specifying the physical load address of sections explicitly.
Peter Maydell [Tue, 30 Jul 2019 08:43:32 +0000 (09:43 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pc: fixes
A couple of last minute bugfixes.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon 29 Jul 2019 22:13:22 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
pc-dimm: fix crash when invalid slot number is used
Revert "hw: report invalid disable-legacy|modern usage for virtio-1-only devs"
Revert "Revert "globals: Allow global properties to be optional""
Revert "hw: report invalid disable-legacy|modern usage for virtio-1-only devs"
This reverts commit f2784eed306449c3d04a71a05ed6463b8289aedf
since that accidentally removes the PCIe capabilities from virtio
devices because virtio_pci_dc_realize is called before the new 'mode'
flag is set.
Peter Maydell [Mon, 29 Jul 2019 11:04:53 +0000 (12:04 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 29 Jul 2019 09:30:48 BST
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net/colo-compare.c: Fix memory leak and code style issue.
net: tap: replace snprintf with g_strdup_printf calls
qemu-bridge-helper: move repeating code in parse_acl_file
qemu-bridge-helper: restrict interface name to IFNAMSIZ
e1000: don't raise interrupt in pre_save()
Peter Maydell [Mon, 29 Jul 2019 09:14:24 +0000 (10:14 +0100)]
Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-4.1-rc3' into staging
RISC-V Patch for 4.1-rc3
This contains a single patch that fixes the warning introduced as part
of the OpenSBI integration.
# gpg: Signature made Sat 27 Jul 2019 00:04:19 BST
# gpg: using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
# gpg: issuer "[email protected]"
# gpg: Good signature from "Palmer Dabbelt <[email protected]>" [unknown]
# gpg: aka "Palmer Dabbelt <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88 6DF8 EF4C A150 2CCB AB41
* remotes/palmer/tags/riscv-for-master-4.1-rc3:
riscv/boot: Fixup the RISC-V firmware warning
net: tap: replace snprintf with g_strdup_printf calls
When invoking qemu-bridge-helper in 'net_bridge_run_helper',
instead of using fixed sized buffers, use dynamically allocated
ones initialised and returned by g_strdup_printf().
qemu-bridge-helper: restrict interface name to IFNAMSIZ
The network interface name in Linux is defined to be of size
IFNAMSIZ(=16), including the terminating null('\0') byte.
The same is applied to interface names read from 'bridge.conf'
file to form ACL rules. If user supplied '--br=bridge' name
is not restricted to the same length, it could lead to ACL bypass
issue. Restrict interface name to IFNAMSIZ, including null byte.
Jason Wang [Wed, 10 Jul 2019 03:52:53 +0000 (11:52 +0800)]
e1000: don't raise interrupt in pre_save()
We should not raise any interrupt after VM has been stopped but this
is what e1000 currently did when mit timer is active in
pre_save(). Fixing this by scheduling a timer in post_load() which can
make sure the interrupt was raised when VM is running.
Greg Kurz [Wed, 24 Jul 2019 16:57:20 +0000 (18:57 +0200)]
xics/kvm: Fix fallback to emulated XICS
Commit 4812f2615288 tried to fix rollback path of xics_kvm_connect() but
it isn't enough. If we fail to create the KVM device, the guest fails
to boot later on with:
This happens because QEMU fell back on XICS emulation but didn't unregister
the RTAS calls from KVM. The emulated RTAS calls are hence never called and
the KVM ones return an error to the guest since the KVM device is absent.
The sanity checks in xics_kvm_disconnect() are abusive since we're freeing
the KVM device. Simply drop them.
Alistair Francis [Mon, 22 Jul 2019 20:20:40 +0000 (13:20 -0700)]
riscv/boot: Fixup the RISC-V firmware warning
Fix a typo in the warning message displayed to users, don't print the
message when running inside qtest and don't mention a specific QEMU
version for the deprecation.
Peter Maydell [Thu, 25 Jul 2019 13:16:45 +0000 (14:16 +0100)]
linux-user: Make sigaltstack stacks per-thread
The alternate signal stack set up by the sigaltstack syscall is
supposed to be per-thread. We were incorrectly implementing it as
process-wide. This causes problems for guest binaries that rely on
this. Notably the Go runtime does, and so we were seeing crashes
caused by races where two guest threads might incorrectly both
execute on the same stack simultaneously.
Replace the global target_sigaltstack_used with a field
sigaltstack_used in the TaskState, and make all the references to the
old global instead get a pointer to the TaskState and use the field.
Peter Maydell [Fri, 26 Jul 2019 15:23:07 +0000 (16:23 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190726' into staging
target-arm queue:
* Fix broken migration on pl330 device
* Fix broken migration on stellaris-input device
* Add type checks to vmstate varry macros to avoid this class of bugs
* hw/arm/boot: Fix some remaining cases where we would put the
initrd on top of the kernel image
Peter Maydell [Mon, 22 Jul 2019 15:18:04 +0000 (16:18 +0100)]
hw/arm/boot: Further improve initrd positioning code
In commit e6b2b20d9735d4ef we made the boot loader code try to avoid
putting the initrd on top of the kernel. However the expression used
to calculate the start of the initrd:
incorrectly uses 'kernel_size' as the offset within RAM of the
highest address to avoid. This is incorrect because the kernel
doesn't start at address 0, but slightly higher than that. This
means that we can still incorrectly end up overlaying the initrd on
the kernel in some cases, for example:
* The kernel's image_size is 0x0a7a8000
* The kernel was loaded at 0x40080000
* The end of the kernel is 0x4A828000
* The DTB was loaded at 0x4a800000
To get this right we need to track the actual highest address used
by the kernel and use that rather than kernel_size. We already
set image_low_addr and image_high_addr for ELF images; set them
also for the various other image types we support, and then use
image_high_addr as the lowest allowed address for the initrd.
(We don't use image_low_addr, but we set it for consistency
with the existing code path for ELF files.)
Peter Maydell [Mon, 22 Jul 2019 15:18:03 +0000 (16:18 +0100)]
hw/arm/boot: Rename elf_{low, high}_addr to image_{low, high}_addr
Rename the elf_low_addr and elf_high_addr variables to image_low_addr
and image_high_addr -- in the next commit we will extend them to
be set for other kinds of image file and not just ELF files.
Peter Maydell [Fri, 26 Jul 2019 14:40:28 +0000 (15:40 +0100)]
vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros
The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle
migrating a field which is an array of structs, but where instead of
migrating the entire array we only migrate a variable number of
elements of it.
The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle
migrating a field which is of pointer type, and points to a
dynamically allocated array of structs of variable size.
We weren't actually checking that the field passed to
VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that
accidentally using it where the _POINTER_ macro was intended would
compile but silently corrupt memory on migration.
Add type-checking that enforces that the field passed in is
really of the right array type. This applies to all the VMSTATE
macros which use flags including VMS_VARRAY_* but not VMS_POINTER.
Peter Maydell [Fri, 26 Jul 2019 14:40:28 +0000 (15:40 +0100)]
stellaris_input: Fix vmstate description of buttons field
gamepad_state::buttons is a pointer to an array of structs,
not an array of structs, so should be declared in the vmstate
with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
corrupt memory on incoming migration.
We bump the vmstate version field as the easiest way to
deal with the migration break, since migration wouldn't have
worked reliably before anyway.
Fix the pl330 main and queue vmstate description.
There were missing POINTER flags causing crashes during
incoming migration because:
+ PL330State chan field is a pointer to an array
+ PL330Queue queue field is a pointer to an array
Peter Maydell [Fri, 26 Jul 2019 09:53:20 +0000 (10:53 +0100)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2019-07-25-1' into staging
Merge tpm 2019/07/25 v1
# gpg: Signature made Thu 25 Jul 2019 16:40:54 BST
# gpg: using RSA key 75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2019-07-25-1:
tpm_emulator: Translate TPM error codes to strings
tpm: Exit in reset when backend indicates failure
Peter Maydell [Thu, 25 Jul 2019 15:38:24 +0000 (16:38 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pc: fixes, cleanups
A bunch of fixes all over the place.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Thu 25 Jul 2019 16:19:33 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio-balloon: free pbp more aggressively
virtio-balloon: don't track subpages for the PBP
virtio-balloon: Use temporary PBP only
virtio-balloon: Rework pbp tracking data
virtio-balloon: Better names for offset variables in inflate/deflate code
virtio-balloon: Simplify deflate with pbp
virtio-balloon: Fix QEMU crashes on pagesize > BALLOON_PAGE_SIZE
virtio-balloon: Fix wrong sign extension of PFNs
i386/acpi: show PCI Express bus on pxb-pcie expanders
ioapic: kvm: Skip route updates for masked pins
i386/acpi: fix gint overflow in crs_range_compare
docs: clarify multiqueue vs multiple virtqueues
Previous patches switched to a temporary pbp but that does not go far
enough: after device uses a buffer, guest is free to reuse it, so
tracking the page and freeing it later is wrong.
Free and reset the pbp after we push each element.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: [email protected] #v4.0.0 Cc: David Hildenbrand <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
As ramblocks cannot get removed/readded while we are processing a bulk
of inflation requests, there is no more need to track the page size
in form of the number of subpages.
We still have multiple issues in the current code
- The PBP is not freed during unrealize()
- The PBP is not reset on device resets: After a reset, the PBP is stale.
- We are not indicating VIRTIO_BALLOON_F_MUST_TELL_HOST, therefore
guests (esp. legacy guests) will reuse pages without deflating,
turning the PBP stale. Adding that would require compat handling.
Instead, let's use the PBP only temporarily, when processing one bulk of
inflation requests. This will keep guest_page_size > 4k working (with
Linux guests). There is nothing to do for deflation requests anymore.
The pbp is only used for a limited amount of time.
Using the address of a RAMBlock to test for a matching pbp is not really
safe. Instead, let's use the guest physical address of the base page
along with the page size (via the number of subpages).
Also, let's allocate the bitmap separately. This makes the code
easier to read and maintain - we can reuse bitmap_new().
Prepare the code to move the PBP out of the device.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: b27b32391404 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: [email protected] #v4.0.0 Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20190722134108[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
Let's simplify this - the case we are optimizing for is very hard to
trigger and not worth the effort. If we're switching from inflation to
deflation, let's reset the pbp.
virtio-balloon: Fix QEMU crashes on pagesize > BALLOON_PAGE_SIZE
We are using the wrong functions to set/clear bits, effectively touching
multiple bits, writing out of range of the bitmap, resulting in memory
corruptions. We have to use set_bit()/clear_bit() instead.
Can easily be reproduced by starting a qemu guest on hugetlbfs memory,
inflating the balloon. QEMU crashes. This never could have worked
properly - especially, also pages would have been discarded when the
first sub-page would be inflated (the whole bitmap would be set).
While testing I realized, that on hugetlbfs it is pretty much impossible
to discard a page - the guest just frees the 4k sub-pages in random order
most of the time. I was only able to discard a hugepage a handful of
times - so I hope that now works correctly.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: b27b32391404 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: [email protected] #v4.0.0 Acked-by: David Gibson <[email protected]> Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20190722134108[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
If we directly cast from int to uint64_t, we will first sign-extend to
an int64_t, which is wrong. We actually want to treat the PFNs like
unsigned values.
As far as I can see, this dates back to the initial virtio-balloon
commit, but wasn't triggered as fairly big guests would be required.
* remotes/bonzini/tags/for-upstream:
docs: correct kconfig option
i386/kvm: Do not sync nested state during runtime
virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi disk
Jan Kiszka [Sun, 2 Jun 2019 11:42:13 +0000 (13:42 +0200)]
ioapic: kvm: Skip route updates for masked pins
Masked entries will not generate interrupt messages, thus do no need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind