Commit 5643cc94ac1c ("virtio-gpu-3d: add support for second capability
set (v4)") updated virtio_gpu.h with a define that does not yet(?)
exist upstream resulting in build breakage every time Linux headers
are updated via the standard update script. Conditionally define this
within QEMU code instead to avoid future breakage.
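The conditional-define approach can be sketched as follows. This is only an illustration of the pattern: EXAMPLE_CAPSET_V2 and its value are placeholders invented for this sketch, not the define the commit touches.

```c
#include <assert.h>

/*
 * Hypothetical fallback: EXAMPLE_CAPSET_V2 stands in for a constant that an
 * imported header may or may not provide. Name and value are assumptions.
 * If a later header update starts defining it, this block becomes a no-op.
 */
#ifndef EXAMPLE_CAPSET_V2
#define EXAMPLE_CAPSET_V2 2
#endif

/* Returns nonzero if the capability set id is one this code knows about. */
static int capset_is_known(unsigned int id)
{
    return id == 1 || id == EXAMPLE_CAPSET_V2;
}
```

Because the fallback lives in the project's own code rather than in the imported header, re-running a header update script cannot remove it.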
Laszlo Ersek [Wed, 9 May 2018 15:26:08 +0000 (17:26 +0200)]
docs/interop: add "firmware.json"
Add a schema that describes the different uses and properties of virtual
machine firmware.
Each firmware executable installed on a host system should come with at
least one JSON file that conforms to this schema. Each file informs the
management applications about
- the firmware's properties and one possible use case / feature set,
- configuration bits that are required to run the firmware binary.
In addition, define rules for management apps for picking the highest
priority firmware JSON file when multiple such files match the search
criteria.
The vmstate for isa_ipmi_kcs was referencing into the kcs structure;
instead, create a separate kcs structure and use that.
There were also some issues in the state transfer. The inlen field
was not being transferred, so a transaction in progress during the
transfer would be corrupted. The use_irq field was transferred, but
that should come from the configuration.
To fix this, the new VMS_VSTRUCT macros are used so the exact
version of the structure can be specified, depending on what
version was being received. So an upgrade should work for KCS.
The VMS_STRUCT has no way to specify which version of a structure
to use. Add a type and a new field to allow the specific version
of a structure to be used.
Paolo Bonzini [Tue, 22 May 2018 19:20:03 +0000 (21:20 +0200)]
tcg: remove softfloat from --disable-tcg builds
Even though the presence of softfloat does not cause --disable-tcg builds to fail,
it is the single largest .o file in them. Remove it, since TCG is the only client.
Lucian Petrut [Tue, 15 May 2018 17:35:22 +0000 (20:35 +0300)]
WHPX: fix some compiler warnings
This patch fixes a few compiler warnings, especially in case of
x86 targets, where the number of registers was not properly handled
and could cause an overflow.
Lucian Petrut [Tue, 15 May 2018 17:35:21 +0000 (20:35 +0300)]
WHPX: dynamically load WHP libraries
We're currently linking against import libraries of the WHP DLLs.
By dynamically loading the libraries, we ensure that QEMU will work
on previous Windows versions, where the WHP DLLs will be missing
(assuming that WHP is not requested).
Also, we're simplifying the build process, as we no longer require
the import libraries.
Peter Maydell [Tue, 15 May 2018 18:27:00 +0000 (19:27 +0100)]
exec.c: Initialize sa_flags passed to sigaction()
Coverity points out that in the user-only version of cpu_abort() we
call sigaction() with a partially initialized struct sigaction
(CID 1005351). Correct the omission.
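A minimal sketch of a fully initialized struct sigaction, assuming a POSIX system. The helper name reset_sigabrt_to_default() is made up for this example and is not taken from exec.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helper: restore the default disposition for SIGABRT using a
 * struct sigaction in which every field has a defined value.
 */
static int reset_sigabrt_to_default(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));   /* no field is left indeterminate */
    sigfillset(&act.sa_mask);       /* block all signals while handling */
    act.sa_handler = SIG_DFL;
    act.sa_flags = 0;               /* set explicitly rather than left unset */

    return sigaction(SIGABRT, &act, NULL);
}
```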
Peter Maydell [Tue, 15 May 2018 17:27:29 +0000 (18:27 +0100)]
memfd: Avoid Coverity warning about integer overflow
Coverity complains about qemu_memfd_create() (CID 1385858) because
we calculate a bit position htsize which could be up to 63, but
then use it in "1 << htsize" which is a 32-bit integer calculation
and could push the 1 off the top of the value.
Silence the complaint by using "1ULL"; this isn't a bug in
practice since a hugetlbsize of 4GB is not very plausible.
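The difference between the two shifts can be illustrated with a small helper. bit_mask64() is a name invented for this sketch; it is not part of qemu_memfd_create().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a 64-bit value with a single bit set at position `bit` (0..63).
 * The 1ULL operand makes the shift a 64-bit operation. A plain `1 << bit`
 * is an int shift, which cannot represent bit positions of 31 or above
 * when int is 32 bits wide.
 */
static uint64_t bit_mask64(unsigned int bit)
{
    assert(bit < 64);
    return 1ULL << bit;
}
```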
hw/isa/superio: Fix inconsistent use of Chardev->be
4c3119a6e3e and cd9526ab7c0 introduced an incorrect and inconsistent
use of Chardev->be. Also, this CharBackend member is private and is
not supposed to be accessible.
Paolo Bonzini [Tue, 15 May 2018 14:35:16 +0000 (16:35 +0200)]
memory: get rid of memory_region_init_reservation
The function has been deprecated for 2.5 years, and there are just a handful
of users. Convert them to memory_region_init_io with NULL callbacks,
and while at it pass the right device as the owner.
qom: Document qom/device-list-properties implementation specific
The recently introduced qom-list-properties QMP command raised the
question of which properties it (and its cousin, device-list-properties)
can print: only those defined by DeviceClass::props or dynamically
created in TypeInfo::instance_init(). Properties created elsewhere
won't show up, and this behaviour might confuse the user.
For example, PIIX4 does that from piix4_pm_realize() via
piix4_pm_add_propeties():
vfio: Include "exec/address-spaces.h" directly in the source file
No declaration in "hw/vfio/vfio-common.h" directly requires including
the "exec/address-spaces.h" header. To simplify dependencies and
ease the upcoming cleanup of "exec/address-spaces.h", directly include
it in the source file where the declarations are used.
Yi Min Zhao [Thu, 31 May 2018 03:29:37 +0000 (11:29 +0800)]
sandbox: disable -sandbox if CONFIG_SECCOMP undefined
If CONFIG_SECCOMP is undefined, the option 'elevateprivileges' remains
compiled. This would make libvirt set the corresponding capability and
then trigger failure during guest startup. This patch moves the code
regarding seccomp command line options to qemu-seccomp.c file and
wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP.
Because parse_sandbox() is moved into qemu-seccomp.c file, change
seccomp_start() to static function.
* remotes/vivier2/tags/linux-user-for-2.13-pull-request:
gdbstub: Clarify what gdb_handlesig() is doing
linux-user: define TARGET_SO_REUSEPORT
linux-user: copy sparc/sockbits.h definitions from linux
linux-user: update ARCH_HAS_SOCKET_TYPES use
linux-user: move ppc socket.h definitions to ppc/sockbits.h
linux-user: move socket.h generic definitions to generic/sockbits.h
linux-user: move sparc/sparc64 socket.h definitions to sparc/sockbits.h
linux-user: move alpha socket.h definitions to alpha/sockbits.h
linux-user: move mips socket.h definitions to mips/sockbits.h
linux-user: Fix payload size logic in host_to_target_cmsg()
linux-user: update comments to point to tcg_exec_init()
linux-user: update netlink emulation
linux-user: Assert on bad type in thunk_type_align() and thunk_type_size()
Peter Maydell [Tue, 15 May 2018 18:19:58 +0000 (19:19 +0100)]
gdbstub: Clarify what gdb_handlesig() is doing
gdb_handlesig()'s behaviour is not entirely obvious at first
glance. Add a doc comment for it, and also add a comment
explaining why it's ok for gdb_do_syscallv() to ignore
gdb_handlesig()'s return value. (Coverity complains about
this: CID 1390850.)
Peter Maydell [Fri, 18 May 2018 18:47:15 +0000 (19:47 +0100)]
linux-user: Fix payload size logic in host_to_target_cmsg()
Coverity points out that there's a missing break in the switch in
host_to_target_cmsg() where we update tgt_len for
cmsg_level/cmsg_type combinations which require a different length
for host and target (CID 1385425). To avoid duplicating the default
case (target length same as host) in both switches, set that before
the switch so that only the cases which want to override it need any
code.
This fixes a bug where we would have used the wrong length
for SOL_SOCKET/SO_TIMESTAMP messages where the target and
host have differently sized 'struct timeval' (ie one is 32
bit and the other is 64 bit).
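The "set the default before the switch" idea can be sketched like this. The enum, the sizes and the function name are all assumptions made for illustration; the real host_to_target_cmsg() switches on cmsg_level/cmsg_type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds, for illustration only. */
enum msg_kind { MSG_PLAIN, MSG_TIMESTAMP };

#define HOST_TIMEVAL_SIZE   16  /* assumed: two 64-bit fields on the host */
#define TARGET_TIMEVAL_SIZE 8   /* assumed: two 32-bit fields on the target */

static size_t target_payload_len(enum msg_kind kind, size_t host_len)
{
    /* Default first: the target length is the same as the host length. */
    size_t tgt_len = host_len;

    /* Only the kinds that need a different length override it. */
    switch (kind) {
    case MSG_TIMESTAMP:
        tgt_len = TARGET_TIMEVAL_SIZE;
        break;
    default:
        break;
    }
    return tgt_len;
}
```

With the default assigned up front, a forgotten break can no longer fall through into code that overwrites the length.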
Igor Mammedov [Thu, 17 May 2018 11:51:17 +0000 (13:51 +0200)]
linux-user: update comments to point to tcg_exec_init()
cpu_init() was replaced by cpu_create() since 2.12 but comments
weren't updated. So update stale comments to point out that page
sizes are actually initialized by tcg_exec_init(). Also move
another qemu_host_page_size related comment before tcg_exec_init()
where it belongs.
Peter Maydell [Mon, 14 May 2018 17:46:16 +0000 (18:46 +0100)]
linux-user: Assert on bad type in thunk_type_align() and thunk_type_size()
In thunk_type_align() and thunk_type_size() we currently return
-1 if the value at the type_ptr isn't one of the TYPE_* values
we understand. However, this should never happen, and if it does
then the calling code will go confusingly wrong because none
of the callsites try to handle an error return. Switch to an
assertion instead, so that if this does somehow happen we'll have
a nice clear backtrace of what happened rather than a weird crash
or misbehaviour.
This also silences various Coverity complaints about not handling
the negative return value (CID 1005735, 1005736, 1005738, 1390582).
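A rough sketch of replacing an error return with an assertion, using a made-up enum rather than the real TYPE_* values from the thunk code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags; the real thunk code uses its own TYPE_* values. */
enum example_type { EX_CHAR, EX_SHORT, EX_INT, EX_LONGLONG };

static int example_type_size(enum example_type t)
{
    switch (t) {
    case EX_CHAR:
        return 1;
    case EX_SHORT:
        return 2;
    case EX_INT:
        return 4;
    case EX_LONGLONG:
        return 8;
    default:
        /* Should never happen: stop here instead of returning -1. */
        assert(!"unknown type");
        abort();    /* still stops if assertions are compiled out */
    }
}
```

Callers can then use the result directly, since there is no error value left for them to check.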
Peter Maydell [Thu, 24 May 2018 13:22:23 +0000 (14:22 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, pci, virtio, vhost: fixes, features
Beginning of merging vDPA, new PCI ID, a new virtio balloon stat, intel
iommu rework fixing a couple of security problems (no CVEs yet), fixes
all over the place.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Wed 23 May 2018 15:41:32 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (28 commits)
intel-iommu: rework the page walk logic
util: implement simple iova tree
intel-iommu: trace domain id during page walk
intel-iommu: pass in address space when page walk
intel-iommu: introduce vtd_page_walk_info
intel-iommu: only do page walk for MAP notifiers
intel-iommu: add iommu lock
intel-iommu: remove IntelIOMMUNotifierNode
intel-iommu: send PSI always even if across PDEs
nvdimm: fix typo in label-size definition
contrib/vhost-user-blk: enable protocol feature for vhost-user-blk
hw/virtio: Fix brace Werror with clang 6.0.0
libvhost-user: Send messages with no data
vhost-user+postcopy: Use qemu_set_nonblock
virtio: support setting memory region based host notifier
vhost-user: support receiving file descriptors in slave_read
vhost-user: add Net prefix to internal state structure
linux-headers: add kvm header for mips
linux-headers: add unistd.h on all arches
update-linux-headers.sh: unistd.h, kvm consistency
...
Peter Maydell [Thu, 24 May 2018 10:30:59 +0000 (11:30 +0100)]
Merge remote-tracking branch 'remotes/sstabellini-http/tags/xen-20180522-tag' into staging
Xen 2018/05/22
# gpg: Signature made Tue 22 May 2018 19:44:06 BST
# gpg: using RSA key 894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg: aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini-http/tags/xen-20180522-tag:
xen_disk: be consistent with use of xendev and blkdev->xendev
xen_disk: use a single entry iovec
xen_backend: make the xen_feature_grant_copy flag private
xen_disk: remove use of grant map/unmap
xen_backend: add an emulation of grant copy
xen: remove other open-coded use of libxengnttab
xen_disk: remove open-coded use of libxengnttab
xen_backend: add grant table helpers
xen: add a meaningful declaration of grant_copy_segment into xen_common.h
checkpatch: generalize xen handle matching in the list of types
xen-hvm: create separate function for ioreq server initialization
xen_pt: Present the size of 64 bit BARs correctly
configure: Add explanation for --enable-xen-pci-passthrough
xen/pt: use address_space_memory object for memory region hooks
xen-pvdevice: Introduce a simplistic xen-pvdevice save state
Peter Maydell [Thu, 24 May 2018 09:25:43 +0000 (10:25 +0100)]
Merge remote-tracking branch 'remotes/mwalle/tags/lm32-queue/20180521' into staging
target/lm32: BQL patch
# gpg: Signature made Tue 22 May 2018 19:25:30 BST
# gpg: using RSA key B458ABB0D8D378E3
# gpg: Good signature from "Michael Walle <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2190 3E48 4537 A7C2 90CE 3EB2 B458 ABB0 D8D3 78E3
* remotes/mwalle/tags/lm32-queue/20180521:
lm32: take BQL before writing IP/IM register
Gerd Hoffmann [Tue, 22 May 2018 16:50:55 +0000 (18:50 +0200)]
hw/display: add new bochs-display device
After writing up the virtual mdev device emulating a display supporting
the bochs vbe dispi interface (mbochs.ko) and seeing how simple it
actually is I've figured that would be useful for qemu too.
So, here it is, -device bochs-display. It is basically -device VGA
without legacy vga emulation. PCI bar 0 is the framebuffer, PCI bar 2
is mmio with the registers. The vga registers are simply not there
though, neither in the legacy ioport location nor in the mmio bar.
Consequently it is PCI class DISPLAY_OTHER not DISPLAY_VGA.
So there is no text mode emulation, no weird video modes (planar,
256color palette), no memory window at 0xa0000. Just a linear
framebuffer in the pci memory bar. And the amount of code to emulate
this (and therefore the attack surface) is an order of magnitude smaller
when compared to vga emulation.
Compatibility wise it works with OVMF (latest git master).
The bochs-drm.ko linux kernel module can handle it just fine too.
So UEFI guests should not see any functional difference to VGA.
Gerd Hoffmann [Mon, 14 May 2018 10:31:17 +0000 (12:31 +0200)]
vga: catch depth 0
depth == 0 is used to indicate 256 color modes. Our region calculation
goes wrong in that case. So detect that and just take the safe code
path we already have for the wraparound case.
While at it, also catch depth == 15 (where our region size
calculation goes wrong too). And make the comment more verbose,
explaining what is going on here.
Without this, a Windows guest install might trigger an assert due to
trying to check the dirty bitmap outside the snapshot region.
Peter Xu [Fri, 18 May 2018 07:25:17 +0000 (15:25 +0800)]
intel-iommu: rework the page walk logic
This patch fixes a small potential window in which the DMA page table
might be incomplete or invalid when the guest sends domain/context
invalidations to a device. This can cause random DMA errors for
assigned devices.
This is a major change to the VT-d shadow page walking logic. It
includes but is not limited to:
- For each VTDAddressSpace, now we maintain what IOVA ranges we have
mapped and what we have not. With that information, now we only send
MAP or UNMAP when necessary. Say, we don't send MAP notifies if we
know we have already mapped the range, meanwhile we don't send UNMAP
notifies if we know we never mapped the range at all.
- Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call
them from anywhere to resync the shadow page table for a device.
- When we receive domain/context invalidation, we should not really run
the replay logic, instead we use the new sync shadow page table API to
resync the whole shadow page table without unmapping the whole
region. After this change, we'll only do the page walk once for each
domain invalidation (before this, it could be multiple, depending on
the number of notifiers per address space).
While at it, the page walking logic is also refactored to be simpler.
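The "only notify when the mapped state actually changes" rule from the first item above can be sketched with a deliberately simplified tracker. This uses a fixed-size per-page flag array instead of the IOVA range tracking the series uses, and every name in it is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EXAMPLE_PAGES 64    /* assumed size of the toy address space */

/* Hypothetical per-address-space record of which pages are mapped. */
typedef struct {
    bool mapped[EXAMPLE_PAGES];
    unsigned int map_events;    /* MAP notifications that were sent */
    unsigned int unmap_events;  /* UNMAP notifications that were sent */
} MappedSet;

static void mapped_set_init(MappedSet *s)
{
    memset(s, 0, sizeof(*s));
}

/* Returns true if a MAP notification is sent, false if it is skipped. */
static bool notify_map(MappedSet *s, unsigned int page)
{
    assert(page < EXAMPLE_PAGES);
    if (s->mapped[page]) {
        return false;           /* already mapped: nothing to send */
    }
    s->mapped[page] = true;
    s->map_events++;
    return true;
}

/* Returns true if an UNMAP notification is sent, false if it is skipped. */
static bool notify_unmap(MappedSet *s, unsigned int page)
{
    assert(page < EXAMPLE_PAGES);
    if (!s->mapped[page]) {
        return false;           /* never mapped: nothing to send */
    }
    s->mapped[page] = false;
    s->unmap_events++;
    return true;
}
```

Repeating a MAP for a page that is already mapped, or an UNMAP for a page that never was, leaves the event counters unchanged.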
Peter Xu [Fri, 18 May 2018 07:25:15 +0000 (15:25 +0800)]
intel-iommu: trace domain id during page walk
This patch only modifies the trace points.
Previously we were tracing page walk levels. They are redundant since
we already have the page mask (size). Now we trace something much more
useful: the domain ID of the page walk. That can be very useful when
we trace more than one device on the same system, so that we can know
which map is for which domain.
Peter Xu [Fri, 18 May 2018 07:25:13 +0000 (15:25 +0800)]
intel-iommu: introduce vtd_page_walk_info
During the recursive page walking of IOVA page tables, some stack
variables are constants that never change during the whole page
walking procedure. Isolate them into a struct so that we don't need to
pass those constants down the stack on every call.
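A toy version of this refactoring, with invented names and fields: the values that stay fixed for the whole walk travel in one struct, and only the values that change are passed as separate arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical walk context; the fields are illustrative only. */
typedef struct {
    uint16_t domain_id;     /* constant for the whole walk */
    int notify_unmap;       /* constant for the whole walk */
} WalkInfo;

/*
 * Toy recursive walk with a fan-out of two per level. Only `start` and
 * `level` change between calls; everything else comes from `info`.
 * Returns the number of leaves for which a notification would be sent.
 */
static unsigned int walk(const WalkInfo *info, uint64_t start,
                         unsigned int level)
{
    uint64_t half;

    if (level == 0) {
        (void)start;        /* a real leaf would use start and domain_id */
        return info->notify_unmap ? 1u : 0u;
    }
    half = (uint64_t)1 << (level - 1);
    return walk(info, start, level - 1) +
           walk(info, start + half, level - 1);
}

/* Convenience wrapper: fill in the constants once, then start the walk. */
static unsigned int walk_all(uint16_t domain_id, int notify_unmap,
                             unsigned int levels)
{
    WalkInfo info = { .domain_id = domain_id, .notify_unmap = notify_unmap };

    return walk(&info, 0, levels);
}
```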
Peter Xu [Fri, 18 May 2018 07:25:12 +0000 (15:25 +0800)]
intel-iommu: only do page walk for MAP notifiers
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables.
Speed up that procedure by skipping the page table walk. That should
boost performance for UNMAP-only notifiers like vhost.
Peter Xu [Fri, 18 May 2018 07:25:11 +0000 (15:25 +0800)]
intel-iommu: add iommu lock
SECURITY IMPLICATION: this patch fixes a potential race when multiple
threads access the IOMMU IOTLB cache.
Add a per-iommu big lock to protect IOMMU status. Currently the only
thing to be protected is the IOTLB/context cache, since that can be
accessed even without BQL, e.g., in IO dataplane.
Note that we don't need to protect device page tables since those are
fully controlled by the guest kernel. However, there is still a
possibility that malicious drivers will program the device to not obey
the rule. In that case QEMU can't really do anything useful; instead
the guest itself will be responsible for all uncertainties.
Peter Xu [Fri, 18 May 2018 07:25:10 +0000 (15:25 +0800)]
intel-iommu: remove IntelIOMMUNotifierNode
That is not really necessary. Remove that node struct and put the
list entry directly into VTDAddressSpace. It simplifies the code a lot.
While at it, rename the old notifiers_list to vtd_as_with_notifiers.
Peter Xu [Fri, 18 May 2018 07:25:09 +0000 (15:25 +0800)]
intel-iommu: send PSI always even if across PDEs
SECURITY IMPLICATION: without this patch, any guest with both assigned
device and a vIOMMU might encounter stale IO page mappings even if guest
has already unmapped the page, which may lead to guest memory
corruption. The stale mappings will only be limited to the guest's own
memory range, so it should not affect the host memory or other guests on
the host.
During IOVA page table walking, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more. In the past, we skipped that entry and did not
notify the IOMMU notifiers. This is not correct. We should send UNMAP
notification to registered UNMAP notifiers in this case.
For UNMAP-only notifiers, this might leave IOTLB entries cached in the
devices even though they are already invalid. For MAP/UNMAP notifiers
like vfio-pci, this will cause stale page mappings.
This special case doesn't trigger often, but it is very easily
triggered by nested device assignments, since in that case we'll
possibly map the whole L2 guest RAM region into the device's IOVA
address space (several GBs at least), which is far bigger than normal
kernel driver usages of the device (tens of MBs normally).
Without this patch applied to L1 QEMU, nested device assignment to L2
guests will dump some errors like:
hw/virtio/vhost-user.c:1319:26: error: suggest braces
around initialization of subobject [-Werror,-Wmissing-braces]
VhostUserMsg msg = { 0 };
^
{}
While the original code is correct, and technically exactly correct
as per ISO C89, both GCC and Clang support a plain empty set of braces
as an extension.
The response to a VHOST_USER_POSTCOPY_ADVISE contains a fd but doesn't
actually contain any data. Fix vu_message_write so that it doesn't
do a 0-byte write() call, since this was ending up with rc=0
that was confusing the error handling code.
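One way to avoid the 0-byte write() is sketched below, assuming a POSIX system. write_payload() is a hypothetical helper written for this example; it is not the actual vu_message_write change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly `len` bytes from `buf` to `fd`.
 * Returns 0 on success and -1 on error. When len is 0 the loop body never
 * runs, so write() is not called and an empty payload cannot be mistaken
 * for a failed or short write.
 */
static int write_payload(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t rc = write(fd, p, len);

        if (rc < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted: retry the same chunk */
            }
            return -1;
        }
        if (rc == 0) {
            return -1;          /* no progress on a nonzero request */
        }
        p += rc;
        len -= (size_t)rc;
    }
    return 0;
}
```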
virtio: support setting memory region based host notifier
This patch introduces support for setting memory region
based host notifiers for virtio devices. This is helpful when
using a hardware accelerator for a virtio device: because the
hardware heavily depends on the notification, this allows
the guest driver in the VM to notify the hardware directly.
Kevin Wolf [Tue, 8 May 2018 16:10:16 +0000 (18:10 +0200)]
iotests: Move qmp_to_opts() to VM
qmp_to_opts() used to be a method of QMPTestCase, but recently we
started to add more Python test cases that don't make use of
QMPTestCase. In order to make the method usable there, move it to VM.
Kevin Wolf [Fri, 4 May 2018 14:25:43 +0000 (16:25 +0200)]
job: Add query-jobs QMP command
This adds a minimal query-jobs implementation that shouldn't pose many
design questions. It can later be extended to expose more information,
and especially job-specific information.
Kevin Wolf [Fri, 4 May 2018 10:17:20 +0000 (12:17 +0200)]
job: Move progress fields to Job
BlockJob has fields .offset and .len, which are actually misnomers today
because they are no longer tied to block device sizes, but just progress
counters. As such they make a lot of sense in generic Jobs.
This patch moves the fields to Job and renames them to .progress_current
and .progress_total to describe their function better.
Kevin Wolf [Wed, 25 Apr 2018 12:56:09 +0000 (14:56 +0200)]
job: Add job_transition_to_ready()
The transition to the READY state was still performed in the BlockJob
layer, in the same function that sent the BLOCK_JOB_READY QMP event.
This patch brings the state transition to the Job layer and implements
the QMP event using a notifier called from the Job layer, like we
already do for other events related to state transitions.
Kevin Wolf [Wed, 25 Apr 2018 13:09:58 +0000 (15:09 +0200)]
job: Add job_is_ready()
Instead of having a 'bool ready' in BlockJob, add a function that
derives its value from the job status.
At the same time, this fixes the behaviour to match what the QAPI
documentation promises for query-block-job: 'true if the job may be
completed'. When the ready flag was introduced in commit ef6dbf1e46e,
the flag never had to be reset to match the description because after
being ready, the jobs would immediately complete and disappear.
Job transactions and manual job finalisation were introduced only later.
With these changes, jobs may stay around even after having completed
(and they are not ready to be completed a second time); however, their
patches forgot to reset the ready flag.
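Deriving the flag from the status, rather than storing it, can be sketched as follows. The enum values and struct here are simplified stand-ins invented for the example, not the real Job definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of job states, for illustration only. */
typedef enum {
    EX_JOB_RUNNING,
    EX_JOB_READY,
    EX_JOB_CONCLUDED
} ExampleJobStatus;

typedef struct {
    ExampleJobStatus status;
} ExampleJob;

/*
 * "Ready" is computed from the current status on every call, so there is
 * no separate flag that could be left stale after the job moves on.
 */
static bool example_job_is_ready(const ExampleJob *job)
{
    return job->status == EX_JOB_READY;
}
```

Once the status moves past the ready state, the function returns false again without any code having to reset a flag.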
Kevin Wolf [Wed, 16 May 2018 11:46:37 +0000 (13:46 +0200)]
block: Cancel job in bdrv_close_all() callers
Now that we cancel all jobs and not only block jobs on shutdown, doing
that in bdrv_close_all() isn't really appropriate any more. Move the
job_cancel_sync_all() call to the callers, and only assert that there
are no jobs running in bdrv_close_all().