Peter Maydell [Fri, 15 Mar 2019 11:39:42 +0000 (11:39 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190315' into staging
target-arm queue:
* Add missing SVE-enabled check to ADDVL/ADDPL/RDVL
* virt-acpi-build: use PCIE_MMCFG_BUS to retrieve end_bus_number
* virt-acpi-build: Fix SMMUv3 GSIV values
* Allow EL0 to write to arch timer registers, not just read them
* bcm2836_control: Implement local timer
Some generic arch timer registers are Config-RW in the EL0,
which means the EL0 exception level can have write permission
if it is appropriately configured.
When VM access registers, QEMU firstly checks whether they have RW
permission, then check whether it is appropriately configured.
If they are defined to read only in EL0, even though they have been
appropriately configured, they still do not have write permission.
So need to add the write permission according to ARMV8 spec when
define it.
Eric Auger [Fri, 15 Mar 2019 11:12:28 +0000 (11:12 +0000)]
hw/arm/virt-acpi-build: Fix SMMUv3 GSIV values
The GSIV numbers of the SPI based interrupts is not correct as
ARM_SPI_BASE was not added to the irqmap[VIRT_SMMU] value. So
this may collide with VIRTIO_MMIO irq window.
Zoltán Baldaszti [Fri, 15 Mar 2019 11:12:28 +0000 (11:12 +0000)]
hw/intc/bcm2836_control: Implement local timer
The BCM2836 control logic module includes a simple
"local timer" which is a programmable down-counter that
can generates an interrupt. Implement this functionality.
Signed-off-by: Zoltán Baldaszti <[email protected]>
[PMM: wrote commit message; wrapped long line; tweaked
some comments to match the final version of the code] Reviewed-by: Peter Maydell <[email protected]> Signed-off-by: Peter Maydell <[email protected]>
* remotes/vivier2/tags/trivial-branch-pull-request:
tests/.gitignore: ignore test-qapi-emit-events.[ch] for in-tree builds
.gitignore: ignore docs/built created for in-tree builds
maint: Ignore built elf2dmp
Peter Maydell [Thu, 14 Mar 2019 13:04:46 +0000 (13:04 +0000)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-for-4.0-120319-1' into staging
Final testing fixes for 4.0
- various CI tweaks and fixes
- fixes for some tcg tests
- addition of system tcg tests
# gpg: Signature made Tue 12 Mar 2019 17:07:24 GMT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-for-4.0-120319-1: (26 commits)
.travis.yml: add softmmu check-tcg tests
.travis.yml: separate softfloat from check-tcg
tests/tcg/arm: account for pauth randomness
tests/tcg/i386: add memory test to exercise softmmu
tests/tcg/i386: add system mode Hello World test
tests/tcg: provide a minilib for system tests
tests/tcg: enable cris base user-mode tests
tests/tcg/cris: align mul operations
tests/tcg/cris: comment out the ccs test
tests/tcg: split cris tests into bare and libc directories
tests/tcg/cris: cleanup sys.c
tests/docker: add fedora-cris-cross compilers
tests/tcg/arm: add ARMv6-M UNDEFINED 32-bit instruction test
tests/tcg/xtensa: enable system tests
tests/docker: add debian-xtensa-cross image
tests/tcg/mips: fix hello-mips compilation
tests/tcg: add gdb runner variant
tests/tcg: split run-test into user and system variants
tests/tcg: add QEMU_OPT option for test runner
tests/tcg: enable tcg tests for softmmu
...
Peter Maydell [Thu, 14 Mar 2019 09:34:51 +0000 (09:34 +0000)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
* Add 'drop-cache=on|off' option to file-posix.c. The default is on.
Disabling the option fixes a QEMU 3.0.0 performance regression when live
migrating on the same host with cache.direct=off.
# gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT
# gpg: using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>" [full]
# gpg: aka "Stefan Hajnoczi <[email protected]>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
Peter Maydell [Wed, 13 Mar 2019 22:20:27 +0000 (22:20 +0000)]
Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-4.0-sf4' into staging
target/riscv: Convert to decodetree
Bastian: this patchset converts the RISC-V decoder to decodetree in four major steps:
1) Convert 32-bit instructions to decodetree [Patch 1-15]:
Many of the gen_* functions are called by the decode functions for 16-bit
and 32-bit functions. If we move translation code from the gen_*
functions to the generated trans_* functions of decode-tree, we get a lot of
duplication. Therefore, we mostly generate calls to the old gen_* function
which are properly replaced after step 2).
Each of the trans_ functions are grouped into files corresponding to their
ISA extension, e.g. addi which is in RV32I is translated in the file
'trans_rvi.inc.c'.
2) Convert 16-bit instructions to decodetree [Patch 16-18]:
All 16 bit instructions have a direct mapping to a 32 bit instruction. Thus,
we convert the arguments in the 16 bit trans_ function to the arguments of
the corresponding 32 bit instruction and call the 32 bit trans_ function.
3) Remove old manual decoding in gen_* function [Patch 19-29]:
this move all manual translation code into the trans_* instructions of
decode tree, such that we can remove the old decode_* functions.
Palmer: This, with some additional cleanup patches, passed Alistar's
testing on rv32 and rv64 as well as my testing on rv64, so I think it's
good to go. I've run my standard test against this exact tag.
I still don't have a Mac to try this on, sorry!
# gpg: Signature made Wed 13 Mar 2019 13:44:49 GMT
# gpg: using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
# gpg: issuer "[email protected]"
# gpg: Good signature from "Palmer Dabbelt <[email protected]>" [unknown]
# gpg: aka "Palmer Dabbelt <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88 6DF8 EF4C A150 2CCB AB41
* remotes/palmer/tags/riscv-for-master-4.0-sf4: (29 commits)
target/riscv: Remove decode_RV32_64G()
target/riscv: Remove gen_system()
target/riscv: Rename trans_arith to gen_arith
target/riscv: Remove manual decoding of RV32/64M insn
target/riscv: Remove shift and slt insn manual decoding
target/riscv: make ADD/SUB/OR/XOR/AND insn use arg lists
target/riscv: Move gen_arith_imm() decoding into trans_* functions
target/riscv: Remove manual decoding from gen_store()
target/riscv: Remove manual decoding from gen_load()
target/riscv: Remove manual decoding from gen_branch()
target/riscv: Remove gen_jalr()
target/riscv: Convert quadrant 2 of RVXC insns to decodetree
target/riscv: Convert quadrant 1 of RVXC insns to decodetree
target/riscv: Convert quadrant 0 of RVXC insns to decodetree
target/riscv: Convert RV priv insns to decodetree
target/riscv: Convert RV64D insns to decodetree
target/riscv: Convert RV32D insns to decodetree
target/riscv: Convert RV64F insns to decodetree
target/riscv: Convert RV32F insns to decodetree
target/riscv: Convert RV64A insns to decodetree
...
configure: remove slirp submodule support that doesn't exist yet
The slirp code is not yet split off into a separate repository, so
configuring QEMU to use slirp as a submodule is premature. This
causes the non-existant "slirp" to be requested from git when syncing
submodules. This in turn appears to be cause of non-deterministic
failures some developers are seeing with QEMU's submodule sync process.
Peter Maydell [Wed, 13 Mar 2019 19:10:40 +0000 (19:10 +0000)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc, virtio: features, fixes, cleanups
intel-iommu scalable option
pcie acs emulation
beginning for vhost-user-blk reconnect and of vhost-user backend work
misc fixes and cleanups
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Wed 13 Mar 2019 02:52:02 GMT
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (26 commits)
i386, acpi: check acpi_memory_hotplug capacity in pre_plug
gen_pcie_root_port: Add ACS (Access Control Services) capability
pcie: Add a simple PCIe ACS (Access Control Services) helper function
vhost-user-blk: Add support to get/set inflight buffer
libvhost-user: Support tracking inflight I/O in shared memory
libvhost-user: Introduce vu_queue_map_desc()
libvhost-user: Remove unnecessary FD flag check for event file descriptors
vhost-user: Support transferring inflight buffer between qemu and backend
nvdimm: use NVDIMM_ACPI_IO_LEN for the proper IO size
nvdimm: use *function* directly instead of allocating it again
nvdimm: fix typo in nvdimm_build_nvdimm_devices argument
intel_iommu: add scalable-mode option to make scalable mode work
intel_iommu: add 256 bits qi_desc support
intel_iommu: scalable mode emulation
libvhost-user: add vu_queue_unpop()
libvhost-user-glib: export vug_source_new()
vhost-user: split vhost_user_read()
vhost-user: wrap some read/write with retry handling
libvhost-user: exit by default on VHOST_USER_NONE
vhost-user: simplify vhost_user_init/vhost_user_cleanup
...
Peter Maydell [Wed, 13 Mar 2019 14:44:28 +0000 (14:44 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- file-posix: Make auto-read-only dynamic
- Add x-blockdev-reopen QMP command
- Finalize block-latency-histogram QMP command
- gluster: Build fixes for newer lib version
# gpg: Signature made Tue 12 Mar 2019 19:30:31 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (28 commits)
qemu-iotests: Test the x-blockdev-reopen QMP command
block: Add an 'x-blockdev-reopen' QMP command
block: Remove the AioContext parameter from bdrv_reopen_multiple()
block: Add bdrv_reset_options_allowed()
block: Add a 'mutable_opts' field to BlockDriver
block: Allow changing the backing file on reopen
block: Allow omitting the 'backing' option in certain cases
block: Handle child references in bdrv_reopen_queue()
block: Add 'keep_old_opts' parameter to bdrv_reopen_queue()
block: Freeze the backing chain for the duration of the stream job
block: Freeze the backing chain for the duration of the mirror job
block: Freeze the backing chain for the duration of the commit job
block: Allow freezing BdrvChild links
nvme: fix write zeroes offset and count
file-posix: Make auto-read-only dynamic
file-posix: Prepare permission code for fd switching
file-posix: Lock new fd in raw_reopen_prepare()
file-posix: Store BDRVRawState.reopen_state during reopen
file-posix: Factor out raw_reconfigure_getfd()
file-posix: Fix bdrv_open_flags() for snapshot=on
...
Peter Maydell [Wed, 13 Mar 2019 13:09:38 +0000 (13:09 +0000)]
Merge remote-tracking branch 'remotes/rth/tags/pull-dt-20190312' into staging
Break out documentation to docs/devel/.
Add support for pattern groups.
Other misc cleanups for multiple decode functions.
# gpg: Signature made Tue 12 Mar 2019 16:59:37 GMT
# gpg: using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <[email protected]>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-dt-20190312:
decodetree: Properly diagnose fields overflowing an insn
decodetree: Prefix extract function names with decode_function
decodetree: Allow +- to begin a number initializing a field
decodetree: Produce clean output for an empty input file
decodetree: Add --static-decode option
test/decode: Add tests for PatternGroups
decodetree: Allow grouping of overlapping patterns
decodetree: Do not unconditionaly return from Pattern.output_code
decodetree: Ensure build_tree does not include values outside insnmask
decodetree: Document the usefulness of argument sets
decodetree: Move documentation to docs/devel/decodetree.rst
MAINTAINERS: Add scripts/decodetree.py to the TCG section
Stefan Hajnoczi [Thu, 7 Mar 2019 16:49:41 +0000 (16:49 +0000)]
file-posix: add drop-cache=on|off option
Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix:
implement bdrv_co_invalidate_cache() on Linux") introduced page cache
invalidation so that cache.direct=off live migration is safe on Linux.
The invalidation takes a significant amount of time when the file is
large and present in the page cache. Normally this is not the case for
cross-host live migration but it can happen when migrating between QEMU
processes on the same host.
On same-host migration we don't need to invalidate pages for correctness
anyway, so an option to skip page cache invalidation is useful. I
investigated optimizing invalidation and detecting same-host migration,
but both are hard to achieve so a user-visible option will suffice.
As a bonus this option means that the cache invalidation feature will
now be detectable by libvirt via QMP schema introspection.
target/riscv: make ADD/SUB/OR/XOR/AND insn use arg lists
manual decoding in gen_arith() is not necessary with decodetree. For now
the function is called trans_arith as the original gen_arith still
exists. The former will be renamed to gen_arith as soon as the old
gen_arith can be removed.
target/riscv: Move gen_arith_imm() decoding into trans_* functions
gen_arith_imm() does a lot of decoding manually, which was hard to read
in case of the shift instructions and is not necessary anymore with
decodetree.
target/riscv: Convert RV64I load/store insns to decodetree
this splits the 64-bit only instructions into its own decode file such
that we generate the decoder for these instructions only for the RISC-V
64 bit target.
Samuel Thibault [Mon, 11 Mar 2019 13:51:27 +0000 (14:51 +0100)]
curses: add option to specify VGA font encoding
This uses iconv to convert glyphs from the specified VGA font encoding to
unicode, and makes use of cchar_t instead of chtype when using ncursesw,
which allows to store all wide char as well as the WACS values. The default
charset is made CP437 since that is the charset of the hardware default VGA
font. This also makes the curses backend set the LC_CTYPE locale to "" to
allow curses to emit wide characters.
Before we do device realization and plug, we should allocate necessary
resources and check if memory-hotplug-support property is enabled.
At the piix4 and ich9, the memory-hotplug-support property is checked at
plug stage. This means that device has been realized and mapped into guest
address space 'pc_dimm_plug()' by the time acpi plug handler is called,
where it might fail and crash QEMU due to reaching g_assert_not_reached()
(piix4) or error_abort (ich9).
Fix it by checking if memory hotplug is enabled at pre_plug stage
where we can gracefully abort hotplug request.
Knut Omang [Thu, 21 Feb 2019 18:13:23 +0000 (19:13 +0100)]
gen_pcie_root_port: Add ACS (Access Control Services) capability
Claim ACS support in the generic PCIe root port to allow
passthrough of individual functions of a device to different
guests (in a nested virt.setting) with VFIO.
Without this patch, all functions of a device, such as all VFs of
an SR/IOV device, will end up in the same IOMMU group.
A similar situation occurs on Windows with Hyper-V.
In the single function device case, it also has a small cosmetic
benefit in that the root port itself is not grouped with
the device. VFIO handles that situation in that binding rules
only apply to endpoints, so it does not limit passthrough in
those cases.
Knut Omang [Thu, 21 Feb 2019 18:13:22 +0000 (19:13 +0100)]
pcie: Add a simple PCIe ACS (Access Control Services) helper function
Implementing an ACS capability on downstream ports and multifunction
endpoints indicates isolation and IOMMU visibility to a finer
granularity. This creates smaller IOMMU groups in the guest and thus
more flexibility in assigning endpoints to guest userspace or an L2
guest.
Xie Yongji [Thu, 28 Feb 2019 08:53:52 +0000 (16:53 +0800)]
libvhost-user: Support tracking inflight I/O in shared memory
This patch adds support for VHOST_USER_GET_INFLIGHT_FD and
VHOST_USER_SET_INFLIGHT_FD message to set/get shared buffer
to/from qemu. Then backend can track inflight I/O in this buffer.
Xie Yongji [Thu, 28 Feb 2019 08:53:49 +0000 (16:53 +0800)]
vhost-user: Support transferring inflight buffer between qemu and backend
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.
This shared buffer is used to track inflight I/O by backend.
Qemu should retrieve a new one when vm reset.
Yi Sun [Tue, 5 Mar 2019 02:34:55 +0000 (10:34 +0800)]
intel_iommu: add scalable-mode option to make scalable mode work
This patch adds an option to provide flexibility for user to expose
Scalable Mode to guest. User could expose Scalable Mode to guest by
the config as below:
Liu, Yi L [Tue, 5 Mar 2019 02:34:53 +0000 (10:34 +0800)]
intel_iommu: scalable mode emulation
Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
replace extended context mode. This patch extends current emulator to
support Scalable Mode which includes root table, context table and new
pasid table format change. Now intel_iommu emulates both legacy mode
and scalable mode (with legacy-equivalent capability set).
The key points are below:
1. Extend root table operations to support both legacy mode and scalable
mode.
2. Extend context table operations to support both legacy mode and
scalable mode.
3. Add pasid tabled operations to support scalable mode.
Since commit 2566378d6d13bf4d28c7770bdbda5f7682594bbe, libvhost-user
no longer panics on disconnect (rc == 0), and instead silently ignores
an invalid VHOST_USER_NONE message.
Without extra work from the API user, this will simply busy-loop on
HUP events. The obvious thing to do is to exit(0) instead, while
additional or different work can be done by overriding
iface->process_msg().
vhost-user: define conventions for vhost-user backends
As discussed during "[PATCH v4 00/29] vhost-user for input & GPU"
review, let's define a common set of backend conventions to help with
management layer implementation, and interoperability.
contrib/libvhost-user/libvhost-user.c:953:20: error: implicit conversion from enumeration type 'enum VhostUserSlaveRequest' to different enumeration type 'VhostUserRequest' (aka 'enum VhostUserRequest') [-Werror,-Wenum-conversion]
.request = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David Gibson [Wed, 6 Mar 2019 03:06:01 +0000 (14:06 +1100)]
virtio-balloon: Restore MADV_WILLNEED hint on balloon deflate
Prior to f6deb6d9 "virtio-balloon: Remove unnecessary MADV_WILLNEED on
deflate", the balloon device issued an madvise() MADV_WILLNEED on
pages removed from the balloon. That would hint to the host kernel
that the pages were likely to be needed by the guest in the near
future.
It's unclear if this is actually valuable or not, and so f6deb6d9
removed this, essentially ignoring balloon deflate requests. However,
concerns have been raised that this might cause a performance
regression by causing extra latency for the guest in certain
configurations.
So, until we can get actual benchmark data to see if that's the case,
this restores the old behaviour, issuing a MADV_WILLNEED when a page is
removed from the balloon.
David Gibson [Wed, 6 Mar 2019 03:06:00 +0000 (14:06 +1100)]
virtio-balloon: Fix possible guest memory corruption with inflates & deflates
This fixes a balloon bug with a nasty consequence - potentially
corrupting guest memory - but which is extremely unlikely to be
triggered in practice.
The balloon always works in 4kiB units, but the host could have a
larger page size on certain platforms. Since ed48c59 "virtio-balloon:
Safely handle BALLOON_PAGE_SIZE < host page size" we've handled this
by accumulating requests to balloon 4kiB subpages until they formed a
full host page. Since f6deb6d "virtio-balloon: Remove unnecessary
MADV_WILLNEED on deflate" we essentially ignore deflate requests.
Suppose we have a host with 8kiB pages, and one host page has subpages
A & B. If we get this sequence of events -
inflate A
deflate A
inflate B
- the current logic will discard the whole host page. That's
incorrect because the guest has deflated subpage A, and could have
written important data to it.
This patch fixes the problem by adjusting our state information about
partially ballooned host pages when deflate requests are received.
ed48c59875b6 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host
page size" introduced a new temporary data structure which tracks 4kiB
chunks which have been inserted into the balloon by the guest but
don't yet form a full host page which we can discard.
Unfortunately, I had a thinko and allocated that structure with
g_malloc0() but freed it with a plain free() rather than g_free().
This corrects the problem.
Wei Wang [Tue, 12 Mar 2019 09:34:40 +0000 (17:34 +0800)]
virtio-balloon: fix a use-after-free case
The elem could theorically contain both outbuf and inbufs. We move the
free operation to the end of this function to avoid using elem->in_sg
while elem has been freed.
* remotes/huth-gitlab/tags/pull-request-2019-03-12:
scripts/qemugdb: re-license timers.py to GPLv2 or later
hw/sd/sdhci: Move PCI-related code into a separate file
ahci-test: Drop dependence on global_qtest
tests: test-announce-self: fix memory leak
Alex Bennée [Tue, 26 Feb 2019 14:51:21 +0000 (14:51 +0000)]
contrib: gitdm: add more individual contributors
I know Richard's is right because I asked him in the pub. I'm guessing
Fredrik's based on the fact I vaguely remember an Atari demo. The
others I attributed to academic institutions last time I posted so
have moved them to individuals as requested.
Alberto Garcia [Tue, 12 Mar 2019 16:48:51 +0000 (18:48 +0200)]
block: Add an 'x-blockdev-reopen' QMP command
This command allows reopening an arbitrary BlockDriverState with a
new set of options. Some options (e.g node-name) cannot be changed
and some block drivers don't allow reopening, but otherwise this
command is modelled after 'blockdev-add' and the state of the reopened
BlockDriverState should generally be the same as if it had just been
added by 'blockdev-add' with the same set of options.
One notable exception is the 'backing' option: 'x-blockdev-reopen'
requires that it is always present unless the BlockDriverState in
question doesn't have a current or default backing file.
This command allows reconfiguring the graph by using the appropriate
options to change the children of a node. At the moment it's possible
to change a backing file by setting the 'backing' option to the name
of the new node, but it should also be possible to add a similar
functionality to other block drivers (e.g. Quorum, blkverify).
Although the API is unlikely to change, this command is marked
experimental for the time being so there's room to see if the
semantics need changes.
Alberto Garcia [Tue, 12 Mar 2019 16:48:49 +0000 (18:48 +0200)]
block: Add bdrv_reset_options_allowed()
bdrv_reopen_prepare() receives a BDRVReopenState with (among other
things) a new set of options to be applied to that BlockDriverState.
If an option is missing then it means that we want to reset it to its
default value rather than keep the previous one. This way the state
of the block device after being reopened is comparable to that of a
device added with "blockdev-add" using the same set of options.
Not all options from all drivers can be changed this way, however.
If the user attempts to reset an immutable option to its default value
using this method then we must forbid it.
This new function takes a BlockDriverState and a new set of options
and checks if there's any option that was previously set but is
missing from the new set of options.
If the option is present in both sets we don't need to check that they
have the same value. The loop at the end of bdrv_reopen_prepare()
already takes care of that.
Alberto Garcia [Tue, 12 Mar 2019 16:48:48 +0000 (18:48 +0200)]
block: Add a 'mutable_opts' field to BlockDriver
If we reopen a BlockDriverState and there is an option that is present
in bs->options but missing from the new set of options then we have to
return an error unless the driver is able to reset it to its default
value.
This patch adds a new 'mutable_opts' field to BlockDriver. This is
a list of runtime options that can be modified during reopen. If an
option in this list is unspecified on reopen then it must be reset (or
return an error).
Alberto Garcia [Tue, 12 Mar 2019 16:48:47 +0000 (18:48 +0200)]
block: Allow changing the backing file on reopen
This patch allows the user to change the backing file of an image that
is being reopened. Here's what it does:
- In bdrv_reopen_prepare(): check that the value of 'backing' points
to an existing node or is null. If it points to an existing node it
also needs to make sure that replacing the backing file will not
create a cycle in the node graph (i.e. you cannot reach the parent
from the new backing file).
- In bdrv_reopen_commit(): perform the actual node replacement by
calling bdrv_set_backing_hd().
There may be temporary implicit nodes between a BDS and its backing
file (e.g. a commit filter node). In these cases bdrv_reopen_prepare()
looks for the real (non-implicit) backing file and requires that the
'backing' option points to it. Replacing or detaching a backing file
is forbidden if there are implicit nodes in the middle.
Although x-blockdev-reopen is meant to be used like blockdev-add,
there's an important thing that must be taken into account: the only
way to set a new backing file is by using a reference to an existing
node (previously added with e.g. blockdev-add). If 'backing' contains
a dictionary with a new set of options ({"driver": "qcow2", "file": {
... }}) then it is interpreted that the _existing_ backing file must
be reopened with those options.
Alberto Garcia [Tue, 12 Mar 2019 16:48:46 +0000 (18:48 +0200)]
block: Allow omitting the 'backing' option in certain cases
Of all options of type BlockdevRef used to specify children in
BlockdevOptions, 'backing' is the only one that is optional.
For "x-blockdev-reopen" we want that if an option is omitted then it
must be reset to its default value. The default value of 'backing'
means that QEMU opens the backing file specified in the image
metadata, but this is not something that we want to support for the
reopen operation.
Because of this the 'backing' option has to be specified during
reopen, pointing to the existing backing file if we want to keep it,
or pointing to a different one (or NULL) if we want to replace it (to
be implemented in a subsequent patch).
In order to simplify things a bit and not to require that the user
passes the 'backing' option to every single block device even when
it's clearly not necessary, this patch allows omitting this option if
the block device being reopened doesn't have a backing file attached
_and_ no default backing file is specified in the image metadata.
Alberto Garcia [Tue, 12 Mar 2019 16:48:45 +0000 (18:48 +0200)]
block: Handle child references in bdrv_reopen_queue()
Children in QMP are specified with BlockdevRef / BlockdevRefOrNull,
which can contain a set of child options, a child reference, or
NULL. In optional attributes like "backing" it can also be missing.
Only the first case (set of child options) is being handled properly
by bdrv_reopen_queue(). This patch deals with all the others.
Here's how these cases should be handled when bdrv_reopen_queue() is
deciding what to do with each child of a BlockDriverState:
1) Set of child options: if the child was implicitly created (i.e
inherits_from points to the parent) then the options are removed
from the parent's options QDict and are passed to the child with
a recursive bdrv_reopen_queue() call. This case was already
working fine.
2) Child reference: there's two possibilites here.
2a) Reference to the current child: if the child was implicitly
created then it is put in the reopen queue, keeping its
current set of options (since this was a child reference
there was no way to specify a different set of options).
If the child is not implicit then it keeps its current set
of options but it is not reopened (and therefore does not
inherit any new option from the parent).
2b) Reference to a different BDS: the current child is not put
in the reopen queue at all. Passing a reference to a
different BDS can be used to replace a child, although at
the moment no driver implements this, so it results in an
error. In any case, the current child is not going to be
reopened (and might in fact disappear if it's replaced)
3) NULL: This is similar to (2b). Although no driver allows this
yet it can be used to detach the current child so it should not
be put in the reopen queue.
4) Missing option: at the moment "backing" is the only case where
this can happen. With "blockdev-add", leaving "backing" out
means that the default backing file is opened. We don't want to
open a new image during reopen, so we require that "backing" is
always present. We'll relax this requirement a bit in the next
patch. If keep_old_opts is true and "backing" is missing then
this behaves like 2a (the current child is reopened).
Alberto Garcia [Tue, 12 Mar 2019 16:48:44 +0000 (18:48 +0200)]
block: Add 'keep_old_opts' parameter to bdrv_reopen_queue()
The bdrv_reopen_queue() function is used to create a queue with
the BDSs that are going to be reopened and their new options. Once
the queue is ready bdrv_reopen_multiple() is called to perform the
operation.
The original options from each one of the BDSs are kept, with the new
options passed to bdrv_reopen_queue() applied on top of them.
For "x-blockdev-reopen" we want a function that behaves much like
"blockdev-add". We want to ignore the previous set of options so that
only the ones actually specified by the user are applied, with the
rest having their default values.
One of the things that we need is a way to tell bdrv_reopen_queue()
whether we want to keep the old set of options or not, and that's what
this patch does. All current callers are setting this new parameter to
true and x-blockdev-reopen will set it to false.