Igor Druzhinin [Mon, 10 Jul 2017 22:40:02 +0000 (23:40 +0100)]
xen/mapcache: introduce xen_replace_cache_entry()
This new call updates a requested map cache entry
according to the changes in the physmap. The call searches
for the entry, unmaps it, and maps it again at the same place using
a new guest address. If the mapping is a dummy, this call will
make it real.
This function makes use of a new xenforeignmemory_map2() call
with an extended interface that was recently introduced in
libxenforeignmemory [1].
Igor Druzhinin [Mon, 10 Jul 2017 22:40:01 +0000 (23:40 +0100)]
xen/mapcache: add an ability to create dummy mappings
Dummies are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give a
memory region to QEMU to play with.
Commit 090fa1c8 "add support for unplugging NVMe disks..." extended the
existing disk unplug flag to cover NVMe disks as well as IDE and SCSI.
The recent thread on the xen-devel mailing list [1] has highlighted that
this is not desirable behaviour: PV frontends should be able to distinguish
NVMe disks from other types of disk and should have separate control over
whether they are unplugged.
This patch defines a new bit in the unplug mask for this purpose (see Xen
commit [2]) and also tidies up the definitions of, and improves the
comments regarding, the previously existing bits in the protocol.
Peter Maydell [Sun, 9 Jul 2017 16:37:22 +0000 (17:37 +0100)]
xen_pt_msi.c: Check for xen_host_pci_get_* failures in xen_pt_msix_init()
Check the return status of the xen_host_pci_get_* functions we call in
xen_pt_msix_init(), and fail device init if the reads failed rather than
ploughing ahead. (Spotted by Coverity: CID 777338.)
In an IGD passthrough environment, the guest can only access the opregion on
the first boot. Once the guest shuts down, later guests can no longer access
the opregion.
This is because QEMU writes the emulated guest opregion base address to the host
register. A later guest then reads a wrong host opregion base address and cannot
access it.
This patch sets emu_mask for the igd_opregion register, so the guest won't write
the guest opregion base address to the host.
* remotes/ericb/tags/pull-nbd-2017-07-17:
nbd: Fix server reply to NBD_OPT_EXPORT_NAME of older clients
nbd: Trace client command being sent
nbd: Fix iotests failure due to changed client error message
The migration tests used two VMs, each with -m 1024; this caused
problems when run in some small, pessimistic test VMs (netbsd).
We can just be meaner with the amount of RAM in the test and use -m 384.
John Snow [Tue, 18 Jul 2017 15:47:57 +0000 (11:47 -0400)]
ahci: Isolate public AHCI interface
Begin separating the public/private interface by removing the minimum
set of information used by code outside of hw/ide/ and calling this
a new ahci_public.h file, which will be renamed to ahci.h in a future
patch.
Peter Xu [Tue, 18 Jul 2017 03:39:08 +0000 (11:39 +0800)]
migration: provide migrate_caps_check()
Abstract helper function to check migration capabilities (from the old
qmp_migrate_set_capabilities). Prepare to be used somewhere else.
There is a side effect of this change: when applying the capabilities, we
were skipping the invalid ones, but still applying the valid ones (if
they are provided in the same QMP request). After this refactoring,
we'll ignore all the capabilities if we detect an invalid setup along the
way. However, I don't think it is a problem since general users should
not provide anything invalid after all.
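The all-or-nothing behaviour described above can be sketched roughly as
follows. This is only an illustration, not the code from the patch: the
capability enum, the mutual-exclusion rule and the helper names are all
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability indices; not QEMU's real enum. */
enum { CAP_A, CAP_B, CAP_C, CAP__MAX };

struct cap_request {
    int  index;   /* which capability to change */
    bool state;   /* requested value */
};

/* Hypothetical rule for the sketch: CAP_B and CAP_C are mutually exclusive. */
static bool caps_check(const bool *caps)
{
    return !(caps[CAP_B] && caps[CAP_C]);
}

/*
 * Apply every request to a scratch copy, validate the result, and only
 * then commit. Returns true on success; on failure `current` is untouched.
 */
static bool caps_set_all_or_nothing(bool *current,
                                    const struct cap_request *reqs, size_t n)
{
    bool scratch[CAP__MAX];

    assert(current != NULL);
    memcpy(scratch, current, sizeof(scratch));
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].index < 0 || reqs[i].index >= CAP__MAX) {
            return false;              /* unknown capability: reject all */
        }
        scratch[reqs[i].index] = reqs[i].state;
    }
    if (!caps_check(scratch)) {
        return false;                  /* invalid combination: reject all */
    }
    memcpy(current, scratch, sizeof(scratch));
    return true;
}
```

With this shape, a request that mixes valid and invalid entries changes
nothing, which is the side effect the message above points out.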
Peter Xu [Tue, 18 Jul 2017 03:39:07 +0000 (11:39 +0800)]
migration: remove check against colo support
Since commit a15215f3 ("build: remove --enable-colo/--disable-colo"),
colo is always supported. We don't need any colo_supported() now since
it is always true. Remove any extra code that depends on it.
When we issue a cancel and clean up the RDMA channel,
send a CONTROL_ERROR to get the destination to quit.
The rdma_cleanup code waits for the event to come back
from the rdma_disconnect; but that won't happen until the
destination quits and there's currently nothing to force
it.
Note this makes the case of a cancel work while the destination
is alive, and it already works if the destination is
truly dead. Note it doesn't fix the case where the destination
is hung (we get stuck waiting for the rdma_disconnect event).
control_desc[] is an array of strings that correspond to a
series of message types; they're used only for error messages, but if
the message type is seriously broken then we could go off the end of
the array.
Convert the array to a function control_desc() that bounds-checks.
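A minimal sketch of the array-to-function conversion, assuming a made-up
set of message types and a made-up fallback string (the real table and
text in the RDMA code may differ):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message types for the sketch; not the real control set. */
enum { CTRL_NONE, CTRL_ERROR, CTRL_READY, CTRL__MAX };

/* Return a description for `type`, never indexing past the table. */
static const char *control_desc(unsigned int type)
{
    static const char *const strs[CTRL__MAX] = {
        [CTRL_NONE]  = "NONE",
        [CTRL_ERROR] = "ERROR",
        [CTRL_READY] = "READY",
    };

    if (type >= CTRL__MAX) {
        return "??BAD CONTROL VALUE??";   /* fallback text is an assumption */
    }
    return strs[type];
}
```

Callers that previously wrote control_desc[type] would call
control_desc(type) instead, so a corrupted type only yields the fallback
string in the error message.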
migration/rdma: Allow cancelling while waiting for wrid
When waiting for a WRID, if the other side dies we end up waiting
forever with no way to cancel the migration.
Cure this by poll()ing the fd first with a timeout and checking
error flags and migration state.
Peter Maydell [Tue, 18 Jul 2017 14:24:11 +0000 (15:24 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging
x86 and machine queue, 2017-07-17
# gpg: Signature made Mon 17 Jul 2017 19:46:14 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-and-machine-pull-request:
qmp: Include parent type on 'qom-list-types' output
qmp: Include 'abstract' field on 'qom-list-types' output
tests: Simplify abstract-interfaces check with a helper
i386: add Skylake-Server cpu model
i386: Update comment about XSAVES on Skylake-Client
i386: expose "TCGTCGTCGTCG" in the 0x40000000 CPUID leaf
fw_cfg: move QOM type defines and fw_cfg types into fw_cfg.h
fw_cfg: move qdev_init_nofail() from fw_cfg_init1() to callers
fw_cfg: switch fw_cfg_find() to locate the fw_cfg device by type rather than path
qom: Fix ambiguous path detection when ambiguous=NULL
Revert "machine: Convert abstract typename on compat_props to subclass names"
test-qdev-global-props: Test global property ordering
qdev: fix the order compat and global properties are applied
tests: Test case for object_resolve_path*()
device-crash-test: Fix regexp on whitelist
John Snow [Tue, 18 Jul 2017 00:34:22 +0000 (20:34 -0400)]
qemu-img: Check for backing image if specified during create
Or, rather, force the open of a backing image if one was specified
for creation. Using a similar -unsafe option as rebase, allow qemu-img
to ignore the backing file validation if possible.
It may not always be possible, as in the existing case when a filesize
for the new image was not specified.
This is accomplished by shifting around the conditionals in
bdrv_img_create, such that a backing file is always opened unless we
provide BDRV_O_NO_BACKING. qemu-img is adjusted to pass this new flag
when -u is provided to create.
Sorry for the heinous looking diffstat, but it's mostly whitespace.
John Snow [Tue, 18 Jul 2017 00:34:21 +0000 (20:34 -0400)]
blockdev: move BDRV_O_NO_BACKING option forward
For both external_snapshot_prepare and qmp_drive_mirror, we eventually
append the option BDRV_O_NO_BACKING. However, we generally do so after
we create the image.
To accommodate image creation wanting to verify that a backing file
exists or not, add this option prior to create to override checking
the existence of the backing file. This prevents QEMU from trying to
re-open a backing file that's already in use (thanks to qcow2 locking).
Max Reitz [Mon, 17 Jul 2017 15:12:07 +0000 (17:12 +0200)]
block/vvfat: Fix compiler warning with gcc 7
gcc 7 complains that the sprintf() might write a null byte beyond the
end of the tail buffer. That is wrong, but we can silence it by making
i unsigned (it can never be negative anyway, see the if condition right
before). For some reason, this allows gcc to suddenly accurately
calculate the range of i so we can give the tail[] array the exact size
it needs to have (which is 8 bytes) without gcc complaining.
In addition, let us convert the sprintf() to snprintf(), because that is
always nicer, and add an assertion about the range of the return value
afterwards so we can see that "8 - len" will never be negative and thus
"entry->name + MIN(j, 8 - len)" will never be out of bounds.
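The reasoning about the return value can be illustrated with a small
stand-alone helper. It is a sketch in the spirit of the change, not the
vvfat code: the helper name, the "~<i>" tail format and the upper bound
on i are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Overwrite part of an 8-byte short-name field with "~<i>", starting at
 * offset j or earlier if the tail would not otherwise fit.
 */
static void append_numeric_tail(char name[8], unsigned int i, int j)
{
    char tail[8];                        /* '~' + up to 6 digits + NUL */
    int len, off;

    assert(j >= 0);
    assert(i < 1000000);                 /* keeps the tail within 8 bytes */
    len = snprintf(tail, sizeof(tail), "~%u", i);
    assert(len >= 2 && len <= 7);        /* so 8 - len is in [1, 6] */

    off = j < 8 - len ? j : 8 - len;     /* MIN(j, 8 - len), never negative */
    memcpy(name + off, tail, (size_t)len);
}
```

With i unsigned and bounded, the compiler and the reader can both see
that the formatted tail plus its terminator fits in eight bytes.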
vvfat: correctly parse non-ASCII short and long file names
Write support works again when the image contains non-ASCII names. This is the
case either when the user created a non-ASCII filename, or when the initial
directory contained a non-ASCII filename (since 0c36111f57ec2188f679e7fa810291b7386bdca1).
Kevin Wolf [Wed, 12 Jul 2017 11:53:03 +0000 (13:53 +0200)]
qemu-iotests: Test unplug of -device without drive
This caused an assertion failure until recently because the BlockBackend
would be detached on unplug, but was in fact never attached in the first
place. Add a regression test.
Kevin Wolf [Tue, 11 Jul 2017 12:04:08 +0000 (14:04 +0200)]
scsi-disk: bdrv_attach_dev() for empty CD-ROM
If no drive=... option is passed (for an empty drive), we don't only
lack the BlockBackend normally created by parse_drive(), but we also
need to manually call blk_attach_dev().
This fixes at least a segfault when unplugging such devices, the bug
that they didn't show up in query-block, and probably some more
problems.
Kevin Wolf [Tue, 11 Jul 2017 12:04:08 +0000 (14:04 +0200)]
ide: bdrv_attach_dev() for empty CD-ROM
If no drive=... option is passed (for an empty drive), we don't only
lack the BlockBackend normally created by parse_drive(), but we also
need to manually call blk_attach_dev().
IDE does not support hot unplug, but if it did, qdev would take care to
call the matching blk_detach_dev() on unplug.
This fixes at least the bug that such devices didn't show up in
query-block, and probably some more problems.
Kevin Wolf [Tue, 11 Jul 2017 12:00:57 +0000 (14:00 +0200)]
block: List anonymous device BBs in query-block
Instead of listing only monitor-owned BlockBackends in query-block, also
add those anonymous BlockBackends that are owned by a qdev device and as
such under the control of the user.
This allows using query-block to inspect BlockBackends for the modern
configuration syntax with -blockdev and -device.
Kevin Wolf [Tue, 11 Jul 2017 11:04:28 +0000 (13:04 +0200)]
block/qapi: Use blk_all_next() for query-block
This patch replaces the blk_next() loop in query-block by a
blk_all_next() one so that we also get access to BlockBackends that
aren't owned by the monitor. For now, the next thing we do is check
whether each BB has a name, so there is no semantic difference.
Kevin Wolf [Tue, 11 Jul 2017 11:27:38 +0000 (13:27 +0200)]
block/qapi: Add qdev device name to query-block
With -blockdev/-device, users can indirectly create anonymous
BlockBackends, while the state of such backends is still of interest. As
a preparation for making such BBs visible in query-block, make sure that
they can be identified even without a name by adding the ID/QOM path of
their qdev device to BlockInfo.
Peter Maydell [Sun, 9 Jul 2017 21:07:17 +0000 (22:07 +0100)]
block/vpc.c: Handle write failures in get_image_offset()
Coverity (CID 1355236) points out that get_image_offset() doesn't check that
it actually succeeded in writing the updated block bitmap to the file.
Check the error return from bdrv_pwrite_sync() and propagate an error
response back up to the function which calls get_image_offset() for
a write so that it can return the error to its caller.
get_sector_offset() is only used for reads, but we move it to the
same API for consistency.
Peter Maydell [Sun, 9 Jul 2017 17:06:14 +0000 (18:06 +0100)]
block/vmdk: Report failures in vmdk_read_cid()
The function vmdk_read_cid() can fail if the read on the underlying
block device fails, or if there's a format error in the VMDK file.
However its API doesn't provide a mechanism to report these errors,
and in some cases we were returning a CID of 0 and in some cases a
CID of 0xffffffff, either of which might potentially be valid values.
Change the function to return 0 on success or a negative errno, and
return the CID via a uint32_t* argument. Update the callsites to
handle and propagate the error appropriately.
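The API shape described above (status in the return value, result through
a pointer, sscanf() result checked) might look like the following sketch.
The function name, the "CID=" key lookup and the choice of -EINVAL are
assumptions for illustration; the real function also has to read from the
block layer, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look for "CID=<hex>" in a descriptor string. Returns 0 and stores the
 * value in *pcid on success, or -EINVAL if the field is missing or
 * malformed. *pcid is left untouched on failure.
 */
static int read_cid(const char *desc, uint32_t *pcid)
{
    const char *p = strstr(desc, "CID=");
    uint32_t cid;

    if (!p) {
        return -EINVAL;                        /* field not present */
    }
    if (sscanf(p, "CID=%" SCNx32, &cid) != 1) {
        return -EINVAL;                        /* field present but not hex */
    }
    *pcid = cid;
    return 0;
}
```

Since 0 and 0xffffffff are now ordinary values delivered through pcid, a
caller can tell them apart from a failure by checking the return code.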
This fixes in passing a Coverity-spotted issue (CID 1350038) where
we weren't checking the return value from sscanf().
block: remove timer canceling in throttle_config()
throttle_config() cancels the timers of the calling BlockBackend. This
doesn't make sense because other BlockBackends in the group remain
untouched. There's no need to cancel the timers in the one specific
BlockBackend so let's not do that. Throttled requests will run as
scheduled and future requests will follow the new configuration. This
also allows a throttle group's configuration to be changed even when it
has no members.
Clock type in throttling is currently inferred by the ThrottleTimer's
clock type even though it is a per-ThrottleGroup property; it doesn't
make sense to have different clock types in the same group. Moving this
to a field in ThrottleGroup can simplify some of the throttle functions.
Kevin Wolf [Mon, 10 Jul 2017 11:42:35 +0000 (13:42 +0200)]
commit: Add NULL check for overlay_bs
I can't see how overlay_bs could become NULL with the current code, but
other code in this function already checks it and we can make Coverity
happy with this check, so let's add it.
Peter Maydell [Tue, 18 Jul 2017 13:14:32 +0000 (14:14 +0100)]
Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2017-07-17-v2-tag' into staging
qemu-ga patch queue
* new command: qemu-get-osinfo
* build fix for OpenBSD
* better error-reporting for failure on keyfile dump
* remove redundant initialization of qa_state global
* include libpcre in w32 package
* w32 localization fixes for service installation/registration
v2:
* fix build issue with older GCCs introduced with guest_get_osinfo
* relocated some declarations in guest_get_osinfo
* remotes/mdroth/tags/qga-pull-2017-07-17-v2-tag:
test-qga: add test for guest-get-osinfo
test-qga: pass environemnt to qemu-ga
qemu-ga: add guest-get-osinfo command
qga: report error on keyfile dump error
qga-win32: remove a redundancy code
qemu-ga: check if utmpx.h is available on the system
qemu-ga: add missing libpcre to MSI build
qga-win: fix installation on localized windows
* remotes/stefanha/tags/block-pull-request:
block: fix shadowed variable in bdrv_co_pdiscard
util/aio-win32: Only select on what we are actually waiting for
Qemu-ga was modified to accept the QGA_OS_RELEASE environment variable. If
the variable is defined, it is interpreted as the path to the os-release file
and it is parsed instead of the default paths.
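A rough sketch of the override logic, assuming a single default path of
/etc/os-release (the message above only says "default paths", so both the
default and the helper name are assumptions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed default location; the real code may try more than one path. */
#define DEFAULT_OS_RELEASE "/etc/os-release"

/*
 * Pick the os-release file to parse. `override` is the value of the
 * QGA_OS_RELEASE environment variable, or NULL if it is not defined.
 */
static const char *choose_os_release_path(const char *override)
{
    return override ? override : DEFAULT_OS_RELEASE;
}
```

A caller would typically pass getenv("QGA_OS_RELEASE"), which lets a test
point the agent at a prepared file without touching the system one.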
Add a new 'guest-get-osinfo' command for reporting basic information of
the guest operating system. This includes machine architecture,
version and release of the kernel and several fields from os-release
file if it is present (as defined in [1]).
Signed-off-by: Vinzenz Feenstra <[email protected]>
Signed-off-by: Tomáš Golembiovský <[email protected]>
* moved declarations to beginning of functions
* dropped unnecessary initialization of struct utsname
Signed-off-by: Michael Roth <[email protected]>
Peter Maydell [Tue, 18 Jul 2017 10:41:03 +0000 (11:41 +0100)]
Merge remote-tracking branch 'remotes/aurel/tags/pull-target-mips-20170717' into staging
Queued target/mips patches
# gpg: Signature made Mon 17 Jul 2017 15:50:27 BST
# gpg: using RSA key 0xBA9C78061DDD8C9B
# gpg: Good signature from "Aurelien Jarno <[email protected]>"
# gpg: aka "Aurelien Jarno <[email protected]>"
# gpg: aka "Aurelien Jarno <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 7746 2642 A9EF 94FD 0F77 196D BA9C 7806 1DDD 8C9B
* remotes/aurel/tags/pull-target-mips-20170717:
target/mips: optimize WSBH, DSBH and DSHD
mips: set CP0 Debug DExcCode for SDBBP instruction
Peter Maydell [Tue, 18 Jul 2017 09:35:06 +0000 (10:35 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170717' into staging
target-arm queue:
* new model of the ARM MPS2/MPS2+ FPGA based development board
* clean up DISAS_* exit conditions and fix various regressions
since commits e75449a3468a6b28c7b5 (in particular including
ones which broke OP-TEE guests)
* make Cortex-M3 and M4 correctly default to 8 PMSA regions
* remotes/pmaydell/tags/pull-target-arm-20170717:
MAINTAINERS: Add entries for MPS2 board
hw/arm/mps2: Add ethernet
hw/arm/mps2: Add SCC
hw/misc/mps2_scc: Implement MPS2 Serial Communication Controller
hw/arm/mps2: Add timers
hw/char/cmsdk-apb-timer: Implement CMSDK APB timer device
hw/arm/mps2: Add UARTs
hw/char/cmsdk-apb-uart.c: Implement CMSDK APB UART
hw/arm/mps2: Implement skeleton mps2-an385 and mps2-an511 board models
target/arm: use DISAS_EXIT for eret handling
target/arm: use gen_goto_tb for ISB handling
target/arm/translate: ensure gen_goto_tb sets exit flags
target/arm/translate.h: expand comment on DISAS_EXIT
target/arm/translate: make DISAS_UPDATE match declared semantics
include/exec/exec-all: document common exit conditions
target/arm: Make Cortex-M3 and M4 default to 8 PMSA regions
qdev: support properties which don't set a default value
qdev-properties.h: Explicitly set the default value for arraylen properties
Peter Maydell [Tue, 18 Jul 2017 08:16:43 +0000 (09:16 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 17 Jul 2017 13:17:17 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
virtio-net: fix offload ctrl endian
virtion-net: Prefer is_power_of_2()
docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header
net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
net/colo.c: Add vnet packet parse feature in colo-proxy
net/colo-compare.c: Make colo-compare support vnet_hdr_len
net/colo-compare.c: Introduce parameter for compare_chr_send()
net/colo.c: Make vnet_hdr_len as packet property
net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
net/filter-mirror.c: Make filter mirror support vnet support.
net/filter-mirror.c: Introduce parameter for filter_send()
net/net.c: Add vnet_hdr support in SocketReadState
net: Add vnet_hdr_len arguments in NetClientState
- Use reStructuredText as markup language (with the goal of generating
the HTML output using the Sphinx Documentation Generator). It is
gentler on the eye, and can be trivially converted to different
formats. (Another reason: upstream QEMU is considering switching to
Sphinx, which uses reStructuredText as its markup language.)
- Raw QMP JSON output vs. 'qmp-shell'. I debated with myself whether
to only show raw QMP JSON output (as that is the canonical
representation), or use 'qmp-shell', which takes key-value pairs. I
settled on the approach of: for the first occurrence of a command,
use raw JSON; for subsequent occurrences, use 'qmp-shell', with an
occasional exception.
- Usage of `-blockdev` command-line.
- Usage of 'node-name' vs. file path to refer to disks. While we have
`blockdev-{mirror, backup}` as 'node-name'-alternatives for
`drive-{mirror, backup}`, the `block-commit` command still operates
on file names for parameters 'base' and 'top'. So I added a caveat
at the beginning to that effect.
Refer to this related thread that I started (where I learnt
`block-stream` was recently reworked to accept 'node-name' for 'top'
and 'base' parameters):
https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
"[RFC] Making 'block-stream', and 'block-commit' accept node-name"
All commands shown in this document were tested while documenting.
Thanks: Eric Blake for the section: "A note on points-in-time vs file
names". This useful bit was originally articulated by Eric in his
KVMForum 2015 presentation, so I included that specific bit in this
document.
qemu-ga: check if utmpx.h is available on the system
Commit 161a56a9065 added command guest-get-users and requires the
utmpx.h (defined by POSIX) to work. It is however not always available
(e.g. on OpenBSD); therefore a check for its existence is necessary.
Daniel Rempel [Wed, 5 Jul 2017 09:01:13 +0000 (12:01 +0300)]
qga-win: fix installation on localized windows
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1357789
Replace hardcoded user and group names ("Administrators", "SYSTEM") with the ones acquired from the system. Windows uses localized strings for these names, which may cause the installation to fail.
Windows has Well-known SIDs for the "Administrators" group and the "SYSTEM" user, so they are used to identify the required users and groups.
Well-known SIDs: https://support.microsoft.com/en-us/help/243330/well-known-security-identifiers-in-windows-operating-systems
Eric Blake [Mon, 17 Jul 2017 19:26:35 +0000 (14:26 -0500)]
nbd: Fix server reply to NBD_OPT_EXPORT_NAME of older clients
A typo in commit 23e099c set the size of buf[] used in response
to NBD_OPT_EXPORT_NAME according to the length needed for old-style
negotiation (4 bytes of flag information) instead of the intended
2 bytes used in new style. If the client doesn't enable
NBD_FLAG_C_NO_ZEROES, then the server sends two bytes too many,
and is then out of sync in response to the client's next command
(the bug is masked when modern qemu is the client, since we enable
the no zeroes flag).
While touching this code, add some more defines to nbd_internal.h
rather than having quite so many magic numbers in the .c; also,
use "" initialization rather than memset(), and tweak the oldstyle
negotiation to better match the spec description of the layout
(since the spec is big-endian, skipping two bytes as 0 followed by
writing a 2-byte flag is the same as writing a zero-extended 4-byte
flag), to make it a bit easier to follow compared to the spec.
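The big-endian equivalence mentioned above can be checked with a few
lines of stand-alone C. The helper names are made up for the example and
nothing here is taken from the NBD sources:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v at p in big-endian (network) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/* Returns 1 if both encodings of `flags` produce identical bytes. */
static int layouts_match(uint16_t flags)
{
    uint8_t a[4] = "";      /* zero-filled via "" rather than memset() */
    uint8_t b[4];

    put_be16(a + 2, flags); /* two zero bytes, then the 2-byte flag */
    put_be32(b, flags);     /* the same flag zero-extended to 4 bytes */
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both layouts put the two zero bytes first on the wire, so the change in
how the code writes them does not change what the client receives.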
[checkpatch.pl has some false positives in the comments]
target/s390x: Allow to enable "idtes" feature for TCG
STFL bits 4 and 5 are just indications to the guest of which TLB entries an
IDTE call will clear. These are performance indicators for the guest.
STFL bit 4:
INVALIDATE DAT TABLE ENTRY (IDTE) performs
the invalidation-and-clearing operation by
selectively clearing TLB segment-table entries
when a segment-table entry or entries are
invalidated. IDTE also performs the clearing-by-
ASCE operation. Unless bit 4 is one, IDTE simply
purges all TLBs. Bit 3 is one if bit 4 is one.
We can simply set STFL bit 4 ("idtes") and still purge the complete TLB.
Purging more than advertised is never bad. E.g. Linux doesn't even care
about this bit. We can optimize this later.
This is helpful, as the z9 base model contains this facility.
STFL bit 5 (clearing TLB region-table-entries) was never implemented on
real HW, therefore we can simply ignore it for now.
Since we require all registers saved on input, read R0 from ENV instead
of passing it manually. Recognize the specification exception when R0
contains incorrect data. Keep high bits of result registers unmodified
when in 31 or 24-bit mode.
Eric Blake [Mon, 17 Jul 2017 14:23:10 +0000 (09:23 -0500)]
nbd: Fix iotests failure due to changed client error message
Commit 8ecaeae8 changed the way the client requests an NBD export,
and in the process also changed the resulting error message when
the export is not present, breaking a couple of iotests. The error
message is now directly given by the server (a failed NBD_OPT_GO)
instead of implied by the client (after exhausting NBD_OPT_LIST),
but looking at the testsuite changes, it proves worthwhile to
reword the error message to be slightly less verbose (as this is
one particular error message likely to be hit by a user).
Note that the error message is now sensitive to which binary is
running the server as well as the client (since the expected
output is replaying a message received from the server - for that
matter, it depends on a server new enough to understand NBD_OPT_GO);
in general iotests are run on client and server from the same source
code base so the default setup will pass; but if it proves
problematic for people overriding QEMU_PROG, QEMU_IMG_PROG,
QEMU_IO_PROG, and QEMU_NBD_PROG to point across multiple builds for
cross-version integration testing, we may have to later tweak or
sanitize the output somehow.
qmp: Include parent type on 'qom-list-types' output
Include name of parent type of each type on 'qom-list-types' output.
Without this, there's no way to figure out the parents of a given type
without making additional 'qom-list-types' queries.
In addition to the test case for the new feature, update the
abstract-interface test case to use the new field and avoid the
"qom-list-types implements=object" trick.
qmp: Include 'abstract' field on 'qom-list-types' output
A client may be interested in getting the list of both abstract and
non-abstract types. Instead of requiring them to make multiple queries
with different filter arguments, just return an 'abstract' field in
'qom-list-types'.
In addition to the new test code for validating this field, update the
abstract-interfaces test case to query for all 'interface' subtypes
(including abstract ones), and to look at the 'abstract' field directly.
Introduce Skylake-Server cpu model which inherits the features from
Skylake-Client and supports some additional features that are: AVX512,
CLWB and PDPE1GB.
i386: expose "TCGTCGTCGTCG" in the 0x40000000 CPUID leaf
Currently when running KVM, we expose "KVMKVMKVM\0\0\0" in
the 0x40000000 CPUID leaf. Other hypervisors (VMWare,
HyperV, Xen, BHyve) all do the same thing, which leaves
TCG as the odd one out.
The CPUID signature is used by software to detect which
virtual environment they are running in and (potentially)
change behaviour in certain ways. For example, systemd
supports a ConditionVirtualization= setting in unit files.
The virt-what command can also report the virt type it is
running on.
Currently both these apps have to resort to custom hacks
like looking for 'fw-cfg' entry in the /proc/device-tree
file to identify TCG.
This change thus proposes a signature "TCGTCGTCGTCG" to be
reported when running under TCG.
To hide this, the -cpu option tcg-cpuid=off can be used.
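For reference, the 12-byte signature in this leaf is returned in EBX, ECX
and EDX, four bytes per register with the first character in the low
byte. The following sketch shows that packing; the struct and function
names are made up for the example and this is not the QEMU code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct hv_signature { uint32_t ebx, ecx, edx; };

/* Assemble four bytes into a register value, first byte lowest. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Pack up to 12 characters, zero-padded, into the three registers. */
static struct hv_signature pack_signature(const char *sig)
{
    unsigned char buf[12] = { 0 };
    size_t n = strlen(sig);
    struct hv_signature s;

    assert(n <= sizeof(buf));
    memcpy(buf, sig, n);
    s.ebx = le32(buf);
    s.ecx = le32(buf + 4);
    s.edx = le32(buf + 8);
    return s;
}

/* What a guest does after CPUID: turn the registers back into a string. */
static void unpack_signature(struct hv_signature s, char out[13])
{
    const uint32_t regs[3] = { s.ebx, s.ecx, s.edx };

    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++) {
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
        }
    }
    out[12] = '\0';
}
```

Tools such as virt-what can then compare the unpacked string against the
known signatures instead of relying on indirect hints.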
Mark Cave-Ayland [Fri, 14 Jul 2017 09:40:08 +0000 (10:40 +0100)]
fw_cfg: move QOM type defines and fw_cfg types into fw_cfg.h
By exposing FWCfgIoState and FWCfgMemState internals we allow the possibility
for the internal MemoryRegion fields to be mapped by name for boards that wish
to wire up the fw_cfg device themselves.
Mark Cave-Ayland [Fri, 14 Jul 2017 09:40:07 +0000 (10:40 +0100)]
fw_cfg: move qdev_init_nofail() from fw_cfg_init1() to callers
When looking to instantiate a TYPE_FW_CFG_MEM or TYPE_FW_CFG_IO device to be
able to wire it up differently, it is much more convenient for the caller to
instantiate the device and have the fw_cfg default files already preloaded
during realize.
Move fw_cfg_init1() to the end of both the fw_cfg_mem_realize() and
fw_cfg_io_realize() functions so it no longer needs to be called manually
when instantiating the device, and also rename it to fw_cfg_common_realize()
which better describes its new purpose.
Since it is now the responsibility of the machine to wire up the fw_cfg device
it is necessary to introduce a object_property_add_child() call into
fw_cfg_init_io() and fw_cfg_init_mem() to link the fw_cfg device to the root
machine object as before.
Finally with the previous change to fw_cfg_find() we can now remove the
assert() preventing multiple fw_cfg devices being instantiated and replace
them with a simple call to fw_cfg_find() at realize time instead. This allows
us to remove FW_CFG_NAME and FW_CFG_PATH since they are no longer required.
Mark Cave-Ayland [Fri, 14 Jul 2017 09:40:06 +0000 (10:40 +0100)]
fw_cfg: switch fw_cfg_find() to locate the fw_cfg device by type rather than path
This will enable the fw_cfg device to be placed anywhere within the QOM tree
regardless of its machine location.
Note that we also add a comment to document the behaviour that we return NULL to
indicate failure where either no fw_cfg device or multiple fw_cfg devices are
found.
qom: Fix ambiguous path detection when ambiguous=NULL
object_resolve_path*() ambiguous path detection breaks when
ambiguous==NULL and the object tree has 3 objects of the same type and
only 2 of them are under the same parent. e.g.:
With the above tree, object_resolve_path_type("", TYPE_FOO, NULL) will
incorrectly return /obj2, because the search inside "/container" will
return NULL, and the match at "/obj2" won't be detected as ambiguous.
Fix that by always calling object_resolve_partial_path() with a non-NULL
ambiguous parameter.
Greg Kurz [Tue, 11 Jul 2017 00:43:01 +0000 (21:43 -0300)]
qdev: fix the order compat and global properties are applied
The current code recursively applies global properties from child up to
parent types. This can cause properties passed with the -global option to
be silently overridden by internal compat properties.
This is exactly what happened with virtio-*-pci drivers since commit:
Passing -device virtio-blk-pci.disable-modern=off had no effect on 2.6
machine types because the internal virtio-pci.disable-modern=on compat
property always prevailed.
A workaround for this was included with commit 0bcba41f ("machine:
Convert abstract typename on compat_props to subclass names").
This patch fixes the issue properly by reversing the logic: we now go
through the global property list and, for each property, we check if it
is applicable to the device.
This results in compat properties being applied first, in the order they
appear in the HW_COMPAT_* macros, followed by global properties, in the
order they appear on the command line.
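The reversed logic can be sketched as a single in-order walk over one
property list, with compat entries placed before user-supplied globals so
that the later entry wins. The types and helper names below are invented
for the illustration and are much simpler than the real qdev code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct type_info {
    const char *name;
    const struct type_info *parent;   /* NULL for a root type */
};

struct global_prop {
    const char *driver;               /* type the property targets */
    const char *property;
    const char *value;
};

/* True if `t` is `name` or inherits from it. */
static int type_is_a(const struct type_info *t, const char *name)
{
    for (; t; t = t->parent) {
        if (strcmp(t->name, name) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Walk the list in order and return the last value that applies to `dev`
 * for `property`, or `def` if no entry applies.
 */
static const char *resolve_prop(const struct type_info *dev,
                                const struct global_prop *props, size_t n,
                                const char *property, const char *def)
{
    const char *value = def;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].property, property) == 0 &&
            type_is_a(dev, props[i].driver)) {
            value = props[i].value;   /* later entries override earlier */
        }
    }
    return value;
}
```

With a list holding the virtio-pci.disable-modern=on compat entry
followed by a user's virtio-blk-pci.disable-modern=off, a virtio-blk-pci
device resolves to "off", while other virtio-pci devices still get "on".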
At the moment VFIO PCI device initialization works as follows:
vfio_realize
vfio_get_group
vfio_connect_container
register memory listeners (1)
update QEMU groups lists
vfio_kvm_device_add_group
Then (example for pseries) the machine reset hook triggers region_add()
for all regions where listeners from (1) are listening:
This scheme works fine until we need to handle VFIO PCI device hotplug
and we want to enable PPC64/sPAPR in-kernel TCE acceleration,
i.e. after PCI hotplug we need a place to call
ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and an IOMMU group fd
(from VFIOGroup), vfio_listener_region_add() seems to be the only place
for this ioctl().
However this only works during boot time because the machine reset
happens strictly after all devices are finalized. When hotplug happens,
vfio_listener_region_add() is called when a memory listener is registered
but when this happens:
1. new group is not added to the container->group_list yet;
2. VFIO KVM device is unaware of the new IOMMU group.
This moves bits around to have all necessary VFIO infrastructure
in place for both initial startup and hotplug cases.
[aw: ie, register vfio groups with kvm prior to memory listener
registration such that kvm-vfio pseudo device ioctls are available
during the region_add callback]
* remotes/stefanha/tags/tracing-pull-request:
trace: update old trace events in docs
trace: [trivial] Statically enable all guest events
trace: [tcg, trivial] Re-align generated code
trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
trace: [tcg] Delay changes to dynamic state when translating
trace: Allocate cpu->trace_dstate in place