Bin Meng [Thu, 8 Sep 2022 13:28:15 +0000 (21:28 +0800)]
block/nfs: Fix 32-bit Windows build
libnfs.h declares nfs_fstat() as the following for win32:
int nfs_fstat(struct nfs_context *nfs, struct nfsfh *nfsfh,
struct __stat64 *st);
The 'st' parameter should be of type 'struct __stat64'. The
code happens to build successfully for 64-bit Windows, but it
does not build for 32-bit Windows.
block: use the new _change_ API instead of _can_set_ and _set_
Replace all direct usage of ->can_set_aio_ctx and ->set_aio_ctx,
and call bdrv_child_try_change_aio_context() in
bdrv_try_set_aio_context(), the main function used throughout
the block layer.
From this point onwards, ->can_set_aio_ctx and ->set_aio_ctx
won't be used anymore.
block-backend: implement .change_aio_ctx in child_root
blk_root_change_aio_ctx() is very similar to blk_root_can_set_aio_ctx(),
but implements a new transaction so that, if all checks pass, the new
transaction's .commit will take care of changing the BlockBackend
AioContext. blk_root_set_aio_ctx_commit() is the same as
blk_root_set_aio_ctx().
Note: bdrv_child_try_change_aio_context() is not called by
anyone at this point.
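For illustration, a minimal sketch of the pattern described above, built on
QEMU's Transaction API (the opaque struct, the omitted check step and the call
to blk_do_set_aio_context() are simplified assumptions, not the exact code):

typedef struct BdrvStateBlkRootContext {
    AioContext *new_ctx;
    BlockBackend *blk;
} BdrvStateBlkRootContext;

static void blk_root_set_aio_ctx_commit(void *opaque)
{
    BdrvStateBlkRootContext *s = opaque;

    /* Same work as the old blk_root_set_aio_ctx(): switch the BlockBackend. */
    blk_do_set_aio_context(s->blk, s->new_ctx, false, &error_abort);
}

static TransactionActionDrv set_blk_root_context = {
    .commit = blk_root_set_aio_ctx_commit,
    .clean  = g_free,
};

static bool blk_root_change_aio_ctx(BdrvChild *child, AioContext *ctx,
                                    GHashTable *visited, Transaction *tran,
                                    Error **errp)
{
    BlockBackend *blk = child->opaque;
    BdrvStateBlkRootContext *s;

    /* ... the same checks as blk_root_can_set_aio_ctx() would go here ... */

    s = g_new(BdrvStateBlkRootContext, 1);
    *s = (BdrvStateBlkRootContext) {
        .new_ctx = ctx,
        .blk = blk,
    };
    tran_add(tran, &set_blk_root_context, s);
    return true;
}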
child_job_change_aio_ctx() is very similar to
child_job_can_set_aio_ctx(), but it implements a new transaction
so that, if all checks pass, the new transaction's .commit()
will take care of changing the BlockJob AioContext.
child_job_set_aio_ctx_commit() is similar to child_job_set_aio_ctx(),
but it doesn't need to invoke the recursion, as this is already
taken care of by child_job_change_aio_ctx().
Note: bdrv_child_try_change_aio_context() is not called by
anyone at this point.
bdrv_change_aio_context: use hash table instead of list of visited nodes
Minor performance improvement, but given that we have hash tables
available, avoid iterating over the visited-nodes list every time just
to check whether a node has already been visited.
The data structure is not actually a proper hash map but a hash set,
as we are just adding nodes, not key/value pairs.
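For reference, GLib hash tables can serve as such a set by storing only keys;
a minimal sketch of the pattern, with illustrative variable names and 'bs' as
the node pointer:

GHashTable *visited = g_hash_table_new(NULL, NULL);  /* pointer hash/equality */

if (!g_hash_table_contains(visited, bs)) {
    g_hash_table_add(visited, bs);   /* set semantics: key only, no value */
    /* ... recurse into the node's parents and children here ... */
}

g_hash_table_destroy(visited);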
block: use transactions as a replacement of ->{can_}set_aio_context()
Simplify the way the aiocontext can be changed in a BDS graph.
There are currently two problems in bdrv_try_set_aio_context:
- There is confusion about which AioContext locks are taken and released,
because we assume that the old AioContext is always held by the caller
and the new one is taken internally.
- It doesn't look very safe to call bdrv_drained_begin while some
nodes have already switched to the new aiocontext and others haven't.
This could be especially dangerous because bdrv_drained_begin polls, so
something else could be executed while the graph is in an inconsistent
state.
Additional minor nitpick: the can_set_ and set_ callbacks both traverse the
graph, each using the ignore list of visited nodes in a different way.
Therefore, get rid of all of this and introduce a new callback,
change_aio_context, which uses transactions to efficiently, cleanly,
and most importantly safely change the AioContext of a graph.
This new callback is a "merge" of the two previous ones:
- Just like can_set_aio_context, it recursively traverses the graph,
marks all visited nodes using a GList, and checks whether
they *could* change the aio_context.
- For each node that passes the above check, drain it and add a new
transaction action whose callback effectively changes the AioContext.
- Once done, the recursive function returns whether *all* nodes can change
the AioContext. If so, commit the above transactions.
Regardless of the outcome, call transaction.clean() to undo all drains
done in the recursion.
- The transaction list is scanned only after all nodes have been drained,
so we are sure that they are all still in the same context, and then
we switch their AioContext, concluding the drain only after all nodes
have switched to the new AioContext. In this way we make sure that
bdrv_drained_begin() is always called under the old AioContext, and
bdrv_drained_end() under the new one.
- Because of the above, we don't need to release and re-acquire the
old AioContext every time, as everything is done once (rather than
draining and changing the AioContext per node).
Note that the "change" API is not yet invoked anywhere.
block: refactor bdrv_remove_file_or_backing_child to bdrv_remove_child
Now the function can remove any child, so give it a more generic name.
Drop the assertions and the bs argument, which becomes unused. The
function will be reused in a further commit.
block/snapshot: drop indirection around bdrv_snapshot_fallback_ptr
Now that the indirection is not actually used, we can safely reduce it to
a simple pointer. For consistency, do a bit of refactoring to get rid of
the _ptr suffixes that have become meaningless.
block: Manipulate bs->file / bs->backing pointers in .attach/.detach
bs->file and bs->backing are a kind of duplication of part of
bs->children. But it is very useful duplication, so let's not drop them
at all :)
We should manage bs->file and bs->backing in the same place where we
manage bs->children, to keep them in sync.
Moreover, generic I/O paths are unprepared for a BdrvChild without a bs,
so it's doubly good to clear bs->file / bs->backing when we detach the
child.
Detach is simple: if we detach the bs->file or bs->backing child, just
set the corresponding field to NULL.
Attach is a bit more complicated. But we can still precisely detect
whether we should set one of bs->file / bs->backing or not (a code sketch
follows this list):
- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must also be
BDRV_CHILD_PRIMARY), it's a filtered child. Use
bs->drv->filtered_child_is_backing to choose the pointer field to
modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
other child and we shouldn't care about it.
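Sketched in code, the .attach side of that decision could look like this (a
simplified illustration of the rules above; the helper name and the use of
child->opaque as the parent node are assumptions):

static void bdrv_child_cb_attach_sketch(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;   /* the parent node */

    if (child->role & BDRV_CHILD_COW) {
        /* A COW child is always the backing file. */
        bs->backing = child;
    } else if (child->role & BDRV_CHILD_FILTERED) {
        /*
         * Filtered children are also PRIMARY; the driver tells us which
         * pointer field it uses for its filtered child.
         */
        if (bs->drv->filtered_child_is_backing) {
            bs->backing = child;
        } else {
            bs->file = child;
        }
    } else if (child->role & BDRV_CHILD_PRIMARY) {
        bs->file = child;
    }
    /* Anything else is neither bs->backing nor bs->file: nothing to do. */
}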
OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:
bdrv_attach_child_common() stores a BdrvChild** in the transaction to
clear it on abort.
bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just passes this feature through, while bdrv_root_attach_child() doesn't
need it.
Look at bdrv_attach_child_noperm() callers:
- bdrv_attach_child() doesn't need the feature
- bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing; we don't want that anymore
- bdrv_append() uses the feature to manage bs->backing; again, we
don't want that anymore
So, we should drop this stuff! Great!
We could probably keep the BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.
Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:
Revert "block: Let replace_child_noperm free children"
We are going to reimplement this behavior (clearing the bs->file /
bs->backing pointers automatically when child->bs is cleared) in a nicer
way; see the later commit
"block: Manipulate bs->file / bs->backing pointers in .attach/.detach".
With this revert we bring back a problem that was fixed by b0a9f6fed3d8.
Still, the problem was mostly theoretical: we don't have concrete bugs
fixed by b0a9f6fed3d8, nor a specific test. Probably some intermittent
iotest failures are related.
Alternatively, we could merge this and the following three reverts into
the final "block: Manipulate ..." commit to avoid any kind of regression.
But it seems that in this case having separate, clear revert commits is
better.
block/snapshot: stress that we fallback to primary child
Actually, what we choose is the primary child. Let's stress that in the
code. We are going to drop the indirect-pointer logic here in the future.
This commit simplifies that future work: we drop the use of indirection
in the assertion now.
bdrv_pass_through is used as a filter; even all the node variables have
corresponding names. We want to append it, so it should be a
backing-child-based filter like mirror_top.
So, in test_update_perm_tree, the first child should be DATA, as we don't
want filters with two filtered children.
bdrv_exclusive_writer is used as a filter once, so it should be a filter
anyway. We want to append it, so it should be a backing-child-based
filter too.
Make all FILTERED children PRIMARY as well. We are going to enforce
this rule with an assertion soon.
test-bdrv-graph-mod: update test_parallel_perm_update test case
test_parallel_perm_update() does a few things that we are going to
restrict in the near future:
1. It updates the bs->file field by hand. bs->file will be managed
automatically by generic code (together with the bs->children list).
Let's instead refactor our "tricky" bds to have its own state where one
of the children is linked as "selected".
This also looks less "tricky", so avoid using this word.
2. It creates FILTERED children that are not PRIMARY. Except for tests,
all FILTERED children in the QEMU block layer are always PRIMARY as
well. We are going to formalize this rule, so let's use DATA
children here instead.
3. It creates more than one FILTERED child, which is already ruled out
by BDRV_CHILD_FILTERED's description.
While here, update the picture to better match the test code.
Almost all drivers call bdrv_open_child() similarly. Let's create a
helper for this.
The only drivers calling bdrv_open_child() to set bs->file that are not
updated are raw-format and snapshot-access:
raw-format sometimes wants to have a filtered child but
doesn't set drv->is_filter to true.
snapshot-access wants only DATA | PRIMARY
Possibly we should implement a drv->is_filter_func() handler, to consider
raw-format a filter when it works as one. But that's another story.
Note also that we decrease the number of assignments to bs->file in the
code: this helps us restrict modification of this field in a further
commit.
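A rough sketch of what such a helper could look like (the name and exact
signature here are guesses based on the commit text, with error handling
reduced to the minimum):

static int bdrv_open_file_child_sketch(const char *filename, QDict *options,
                                       const char *bdref_key,
                                       BlockDriverState *parent, Error **errp)
{
    BdrvChildRole role;

    /* A filter's primary child is also FILTERED; other drivers get an IMAGE. */
    role = parent->drv->is_filter ?
           (BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;

    if (!bdrv_open_child(filename, options, bdref_key, parent,
                         &child_of_bds, role, false, errp)) {
        return -EINVAL;
    }

    return 0;
}

Note that the helper does not assign bs->file itself: with the attach/detach
changes above, generic code takes care of that.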
block: BlockDriver: add .filtered_child_is_backing field
Unfortunately not all filters use the .file child as the filtered child.
Two exceptions are mirror_top and commit_top. Happily, they are both
private filters. The bad thing is that this inconsistency is observable
through the QMP commands query-block / query-named-block-nodes. So
whether we could just change mirror_top and commit_top to use the file
child, as all other filter drivers do, is an open question. Probably we
could do that with some kind of deprecation period, but how would we warn
users during it?
For now, let's just add a field so we can distinguish them in generic
code; it will be used in further commits.
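As an illustration, the new field would simply sit in the driver definition of
the two private filters; the abridged, hypothetical driver below is not the
real mirror/commit definition:

static BlockDriver bdrv_example_filter = {
    .format_name               = "example-filter",
    .is_filter                 = true,
    /*
     * This filter keeps its filtered child in bs->backing, like mirror_top
     * and commit_top; most filters leave this false and use bs->file.
     */
    .filtered_child_is_backing = true,
};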
The commit "Use io_uring_register_ring_fd() to skip fd operations" broke
when booting a guest with iothread and io_uring. That is because the
io_uring_register_ring_fd() call is made from the main thread instead of
IOThread where io_uring_submit() is called. It can not be guaranteed
to register the ring fd in the correct thread or unregister the same ring
fd if the IOThread is disabled. This optimization is not critical so we
will revert previous commit.
Bin Meng [Mon, 10 Oct 2022 04:04:31 +0000 (12:04 +0800)]
block: Refactor get_tmp_filename()
At present there are two callers of get_tmp_filename() and they are
inconsistent.
One does:
/* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
char *tmp_filename = g_malloc0(PATH_MAX + 1);
...
ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);
while the other does:
s->qcow_filename = g_malloc(PATH_MAX);
ret = get_tmp_filename(s->qcow_filename, PATH_MAX);
As we can see, different 'size' arguments are passed. There are also
platform-specific implementations inside the function, and the use
of snprintf is really undesirable.
The function name is also misleading. It creates a temporary file,
not just a filename.
Refactor this routine by changing its name and signature to:
char *create_tmp_file(Error **errp)
and use g_get_tmp_dir() / g_mkstemp() for a consistent implementation.
While we are here, add some comments to mention that /var/tmp is
preferred over /tmp on non-win32 hosts.
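A minimal sketch of what the refactored helper could look like, assuming GLib
and QEMU's Error API (the file name template is illustrative):

char *create_tmp_file(Error **errp)
{
    g_autofree char *filename = NULL;
    const char *tmpdir;
    int fd;

#ifndef _WIN32
    /* Prefer /var/tmp over /tmp: it is less likely to be a small tmpfs. */
    tmpdir = "/var/tmp";
#else
    tmpdir = g_get_tmp_dir();
#endif

    filename = g_strdup_printf("%s/vl.XXXXXX", tmpdir);
    fd = g_mkstemp(filename);
    if (fd < 0) {
        error_setg_errno(errp, errno, "Could not open temporary file '%s'",
                         filename);
        return NULL;
    }
    close(fd);

    return g_steal_pointer(&filename);
}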
Bin Meng [Mon, 10 Oct 2022 04:04:30 +0000 (12:04 +0800)]
block: Ignore close() failure in get_tmp_filename()
The temporary file has been created and is ready for use. Checking the
return value of close() does not seem useful. The file descriptor
is almost certainly closed; see close(2) under "Dealing with error
returns from close()".
MAINTAINERS: Fold "Block QAPI, monitor, ..." into "Block layer core"
Section "Block QAPI, monitor, command line" is about the external
interfaces we provide for block devices. It covers the relevant QAPI
schema parts, monitor and command line code, more or less.
The section's files are also covered by section "Block layer core",
except for the QAPI schema files.
I haven't acted as maintainer in this area for a long time. Make it
official: add the QAPI schema files to section "Block layer core", and
delete section "Block QAPI, monitor, command line".
* tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu:
dump/win_dump: limit number of processed PRCBs
s390x: pv: Add dump support
s390x: Add KVM PV dump interface
include/elf.h: add s390x note types
s390x: Introduce PV query interface
s390x: Add protected dump cap
dump: Add architecture section and section string table support
dump: Reintroduce memory_offset and section_offset
dump: Reorder struct DumpState
dump: Write ELF section headers right after ELF header
dump: Use a buffer for ELF section data and headers
Stefan Hajnoczi [Wed, 26 Oct 2022 14:53:41 +0000 (10:53 -0400)]
Merge tag 'pull-tcg-20221026' of https://gitlab.com/rth7680/qemu into staging
Revert incorrect cflags initialization.
Add direct jumps for tcg/loongarch64.
Speed up breakpoint check.
Improve assertions for atomic.h.
Move restore_state_to_opc to TCGCPUOps.
Cleanups to TranslationBlock maintenance.
* tag 'pull-tcg-20221026' of https://gitlab.com/rth7680/qemu: (47 commits)
accel/tcg: Remove restore_state_to_opc function
target/xtensa: Convert to tcg_ops restore_state_to_opc
target/tricore: Convert to tcg_ops restore_state_to_opc
target/sparc: Convert to tcg_ops restore_state_to_opc
target/sh4: Convert to tcg_ops restore_state_to_opc
target/s390x: Convert to tcg_ops restore_state_to_opc
target/rx: Convert to tcg_ops restore_state_to_opc
target/riscv: Convert to tcg_ops restore_state_to_opc
target/ppc: Convert to tcg_ops restore_state_to_opc
target/openrisc: Convert to tcg_ops restore_state_to_opc
target/nios2: Convert to tcg_ops restore_state_to_opc
target/mips: Convert to tcg_ops restore_state_to_opc
target/microblaze: Convert to tcg_ops restore_state_to_opc
target/m68k: Convert to tcg_ops restore_state_to_opc
target/loongarch: Convert to tcg_ops restore_state_to_opc
target/i386: Convert to tcg_ops restore_state_to_opc
target/hppa: Convert to tcg_ops restore_state_to_opc
target/hexagon: Convert to tcg_ops restore_state_to_opc
target/cris: Convert to tcg_ops restore_state_to_opc
target/avr: Convert to tcg_ops restore_state_to_opc
...
Stefan Hajnoczi [Wed, 26 Oct 2022 14:04:05 +0000 (10:04 -0400)]
Merge tag 'pull-aspeed-20221025' of https://github.com/legoater/qemu into staging
aspeed queue :
* Performance improvement with Object class caching
* Serial Flash Discovery Parameters support for m25p80 device
* Various small adjustments on instructions and models
* tag 'pull-aspeed-20221025' of https://github.com/legoater/qemu:
arm/aspeed: Replace mx25l25635e chip model
m25p80: Add the w25q01jvq SFDP table
m25p80: Add the w25q512jv SFDP table
m25p80: Add the w25q256 SFDP table
m25p80: Add the mx66l1g45g SFDP table
m25p80: Add the mx25l25635f SFDP table
m25p80: Add the mx25l25635e SFDP table
m25p80: Add erase size for mx25l25635e
m25p80: Add the n25q256a SFDP table
m25p80: Add basic support for the SFDP command
hw/arm/aspeed: increase Bletchley memory size
ast2600: Drop NEON from the CPU features
aspeed/smc: Cache AspeedSMCClass
ssi: cache SSIPeripheralClass to avoid GET_CLASS()
tests/avocado/machine_aspeed.py: Fix typos on buildroot
hw/i2c/aspeed: Fix old reg slave receive
Viktor Prutyanov [Wed, 19 Oct 2022 23:59:48 +0000 (02:59 +0300)]
dump/win_dump: limit number of processed PRCBs
When the number of CPUs utilized by guest Windows is less than the number
defined in QEMU (e.g., desktop versions of Windows severely limit the
number of CPU sockets), the patch_and_save_context routine accesses a
non-existent PRCB and fails. So, limit the number of processed PRCBs to
NumberProcessors taken from the guest Windows driver.
Janosch Frank [Mon, 17 Oct 2022 08:38:22 +0000 (08:38 +0000)]
s390x: pv: Add dump support
Sometimes dumping a guest from the outside is the only way to get the
data that is needed. This can be the case if a dumping mechanism like
KDUMP hasn't been configured or data needs to be fetched at a specific
point. Dumping a protected guest from the outside without help from
fw/hw doesn't yield sufficient data to be useful. Hence we now
introduce PV dump support.
The PV dump support works by integrating the firmware into the dump
process. New Ultravisor calls are used to initiate the dump process,
dump cpu data, dump memory state and lastly complete the dump process.
The UV calls are exposed by KVM via the new KVM_PV_DUMP command and
its subcommands. The guest's data is fully encrypted and can only be
decrypted by the entity that owns the customer communication key for
the dumped guest. Also dumping needs to be allowed via a flag in the
SE header.
On the QEMU side of things we store the PV dump data in the newly
introduced architecture ELF sections (storage state and completion
data) and the cpu notes (for cpu dump data).
Users can use the zgetdump tool to convert the encrypted QEMU dump to an
unencrypted one.
Add a tcg_ops hook to replace the restore_state_to_opc
function call. Because these generic hooks cannot depend
on target-specific types, temporarily copy the current
target_ulong data[] into a uint64_t d64[].
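As an illustration of such a hook, in two short fragments (the member layout
and the call site shown are simplified assumptions, not the exact QEMU
declarations):

/* In the per-target TCGCPUOps structure: */
void (*restore_state_to_opc)(CPUState *cpu, const TranslationBlock *tb,
                             const uint64_t *data);

/* Generic code widens the target_ulong values before calling the hook: */
uint64_t d64[TARGET_INSN_START_WORDS];

for (int i = 0; i < TARGET_INSN_START_WORDS; i++) {
    d64[i] = data[i];
}
CPU_GET_CLASS(cpu)->tcg_ops->restore_state_to_opc(cpu, tb, d64);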
Since the only user, Arm MTE, always requires allocation,
merge the get and alloc functions to always produce a
non-null result. Also assume that the user has already
checked page validity.
accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
When PAGE_RESET is set, we are replacing pages with new
content, which means that we need to invalidate existing
cached data, such as TranslationBlocks. Perform the
reset invalidate while we're doing other invalidates,
which allows us to remove the separate invalidates from
the user-only mmap/munmap/mprotect routines.
In addition, restrict invalidation to PAGE_EXEC pages.
Since cdf713085131, we have validated PAGE_EXEC is present
before translation, which means we can assume that if the
bit is not present, there are no translations to invalidate.
accel/tcg: Remove duplicate store to tb->page_addr[]
When we added the fast path, we initialized page_addr[] early.
These stores in and around tb_page_add() are redundant; remove them.
Fixes: 50627f1b7b1 ("accel/tcg: Add fast path for translator_ld*")
Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC
The results of the calls to cpu_get_tb_cpu_state,
current_{pc,cs_base,flags}, are not used.
In tb_invalidate_phys_page, use bool for current_tb_modified.
accel/tcg: Make page_alloc_target_data allocation constant
Use a constant target data allocation size for all pages.
This will be necessary to reduce overhead of page tracking.
Since TARGET_PAGE_DATA_SIZE is now required, we can use this
to omit data tracking for targets that don't require it.
This differs from assert in that, with optimization enabled, it
triggers at build time. It differs from QEMU_BUILD_BUG_ON,
aka _Static_assert, in that it is sensitive to control flow
and is subject to dead-code elimination.
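That behaviour can be obtained with the classic pattern sketched below (a
generic illustration, not necessarily QEMU's exact macro): an undefined extern
function turns any call that survives optimization into a link-time error.

/* Never defined anywhere: referencing it produces a link error. */
extern void build_assert_failed(void);

#define build_assert(cond)          \
    do {                            \
        if (!(cond)) {              \
            build_assert_failed();  \
        }                           \
    } while (0)

/*
 * With optimization, build_assert(sizeof(int) == 4) compiles to nothing;
 * inside an if (0) branch it is removed entirely, unlike _Static_assert,
 * which is evaluated regardless of control flow.
 */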
Leandro Lupori [Tue, 25 Oct 2022 20:24:22 +0000 (17:24 -0300)]
accel/tcg: Add a quicker check for breakpoints
Profiling QEMU during a Fedora 35 for PPC64 boot revealed that a
considerable amount of time was being spent in
check_for_breakpoints() (0.61% of total time on PPC64 and 2.19% on
amd64), even though it was just checking that its queue was empty
and returning when no breakpoints were set. It turns out this
function is not inlined by the compiler and is always called by
helper_lookup_tb_ptr(), one of the most frequently called functions.
By leaving only the check for empty queue in
check_for_breakpoints() and moving the remaining code to
check_for_breakpoints_slow(), called only when the queue is not
empty, it's possible to avoid the call overhead. An improvement of
about 3% in total time was measured on POWER9.
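A rough sketch of the resulting split (function names follow the commit text;
the parameter types and bodies are simplified, and the slow path keeps the old
function's logic):

static bool check_for_breakpoints_slow(CPUState *cpu, target_ulong pc,
                                       uint32_t *cflags);

/* Small enough to be inlined into helper_lookup_tb_ptr() and friends. */
static inline bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
                                         uint32_t *cflags)
{
    if (likely(QTAILQ_EMPTY(&cpu->breakpoints))) {
        /* No breakpoints set: nothing to do on the hot path. */
        return false;
    }
    return check_for_breakpoints_slow(cpu, pc, cflags);
}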
* tag 'linux-user-for-7.2-pull-request' of https://gitlab.com/laurent_vivier/qemu:
linux-user: Add guest memory layout to exception dump
linux-user: Implement faccessat2
linux-user: remove conditionals for many fs.h ioctls
linux-user: add more compat ioctl definitions
linux-user: don't use AT_EXECFD in do_openat()
linux-user: handle /proc/self/exe with execve() syscall
linux-user: fix pidfd_send_signal()
linux-user: Fix more MIPS n32 syscall ABI issues
Qi Hu [Mon, 17 Oct 2022 02:08:26 +0000 (10:08 +0800)]
tcg/aarch64: Remove unused code in tcg_out_op
AArch64 defines TCG_TARGET_HAS_direct_jump, so the "else" block is
useless in the "INDEX_op_goto_tb" case in tcg_out_op(). Add
an assertion and delete this code for clarity.