Daniel Borkmann [Wed, 16 Oct 2024 13:49:12 +0000 (15:49 +0200)]
bpf: Fix print_reg_state's constant scalar dump
print_reg_state() should not consider adding reg->off to reg->var_off.value
when dumping scalars. Scalars can be produced with reg->off != 0 through
BPF_ADD_CONST, and thus as-is this can skew the register log dump.
Daniel Borkmann [Wed, 16 Oct 2024 13:49:11 +0000 (15:49 +0200)]
bpf: Fix incorrect delta propagation between linked registers
Nathaniel reported a bug in the linked scalar delta tracking, which can lead
to accepting a program with OOB access. The specific code is related to the
sync_linked_regs() function and the BPF_ADD_CONST flag, which signifies a
constant offset between two scalar registers tracked by the same register id.
The verifier attempts to track "similar" scalars in order to propagate bounds
information learned about one scalar to others. For instance, if r1 and r2
are known to contain the same value, then upon encountering 'if (r1 != 0x1234)
goto xyz', not only does it know that r1 is equal to 0x1234 on the path where
that conditional jump is not taken, it also knows that r2 is.
Additionally, with env->bpf_capable set, the verifier will track scalars
which should be a constant delta apart (if r1 is known to be one greater than
r2, then if r1 is known to be equal to 0x1234, r2 must be equal to 0x1233.)
The code path for the latter in adjust_reg_min_max_vals() is reached when
processing both 32 and 64-bit addition operations. While adjust_reg_min_max_vals()
knows whether dst_reg was produced by a 32 or a 64-bit addition (based on the
alu32 bool), the only information saved in dst_reg is the id of the source
register (reg->id, or'ed by BPF_ADD_CONST) and the value of the constant
offset (reg->off).
Later, the function sync_linked_regs() will attempt to use this information
to propagate bounds information from one register (known_reg) to others,
meaning, for all R in linked_regs, it copies known_reg range (and possibly
adjusting delta) into R for the case of R->id == known_reg->id.
For the delta adjustment, meaning, matching reg->id with BPF_ADD_CONST, the
verifier adjusts the register as reg = known_reg; reg += delta where delta
is computed as (s32)reg->off - (s32)known_reg->off and placed as a scalar
into a fake_reg to then simulate the addition of reg += fake_reg. This is
only correct, however, if the value in reg was created by a 64-bit addition.
When reg contains the result of a 32-bit addition operation, its upper 32
bits will always be zero. sync_linked_regs() on the other hand, may cause
the verifier to believe that the addition between fake_reg and reg overflows
into those upper bits. For example, if reg was generated by adding the
constant 1 to known_reg using a 32-bit alu operation, then reg->off is 1
and known_reg->off is 0. If known_reg is known to be the constant 0xFFFFFFFF,
sync_linked_regs() will tell the verifier that reg is equal to the constant
0x100000000. This is incorrect as the actual value of reg will be 0, as the
32-bit addition will wrap around.
What can be seen here is that r1 is copied to r2 and r4, such that {r1,r2,r4}.id
are all the same which later lets sync_linked_regs() to be invoked. Then, in
a next step constants are added with alu32 to r2 and r4, setting their ->off,
as well as id |= BPF_ADD_CONST. Next, the conditional will bind r2 and
propagate ranges to its linked registers. The verifier now believes the upper
32 bits of r4 are r4=0xffffffff80000001, while actually r4=r1=0x80000001.
One approach for a simple fix suitable also for stable is to limit the constant
delta tracking to only 64-bit alu addition. If necessary at some later point,
BPF_ADD_CONST could be split into BPF_ADD_CONST64 and BPF_ADD_CONST32 to avoid
mixing the two under the tradeoff to further complicate sync_linked_regs().
However, none of the added tests from dedf56d775c0 ("selftests/bpf: Add tests
for add_const") make this necessary at this point, meaning, BPF CI also passes
with just limiting tracking to 64-bit alu addition.
Jordan Rome [Wed, 16 Oct 2024 21:00:48 +0000 (14:00 -0700)]
bpf: Properly test iter/task tid filtering
Previously test_task_tid was setting `linfo.task.tid`
to `getpid()` which is the same as `gettid()` for the
parent process. Instead create a new child thread
and set `linfo.task.tid` to `gettid()` to make sure
the tid filtering logic is working as expected.
Jordan Rome [Wed, 16 Oct 2024 21:00:47 +0000 (14:00 -0700)]
bpf: Fix iter/task tid filtering
In userspace, you can add a tid filter by setting
the "task.tid" field for "bpf_iter_link_info".
However, `get_pid_task` when called for the
`BPF_TASK_ITER_TID` type should have been using
`PIDTYPE_PID` (tid) instead of `PIDTYPE_TGID` (pid).
Andrea Parri [Thu, 17 Oct 2024 14:36:28 +0000 (17:36 +0300)]
riscv, bpf: Make BPF_CMPXCHG fully ordered
According to the prototype formal BPF memory consistency model
discussed e.g. in [1] and following the ordering properties of
the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation
with the BPF_CMPXCHG modifier is fully ordered. However, the
current RISC-V JIT lowerings fail to meet such memory ordering
property. This is illustrated by the following litmus test:
whose "exists" clause is not satisfiable according to the BPF
memory model. Using the current RISC-V JIT lowerings, the test
can be mapped to the following RISC-V litmus test:
where the two stores in P0 can be reordered. Update the RISC-V
JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with
RELEASE ("rl") annotation in order to meet the expected memory
ordering guarantees. The resulting RISC-V JIT lowerings of
BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg().
Other lowerings were fixed via 20a759df3bba ("riscv, bpf: make
some atomic operations fully ordered").
Michal Luczaj [Sun, 13 Oct 2024 16:26:40 +0000 (18:26 +0200)]
vsock: Update rx_bytes on read_skb()
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt()
calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after
vsock_transport::read_skb().
While here, also inform the peer that we've freed up space and it has more
credit.
Failing to update rx_bytes after packet is dequeued leads to a warning on
SOCK_STREAM recv():
[ 233.396654] rx_queue is empty, but rx_bytes is non-zero
[ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589
Michal Luczaj [Sun, 13 Oct 2024 16:26:39 +0000 (18:26 +0200)]
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.
====================
Fix truncation bug in coerce_reg_to_size_sx and extend selftests.
This patch series addresses a truncation bug in the eBPF verifier function
coerce_reg_to_size_sx(). The issue was caused by the incorrect ordering
of assignments between 32-bit and 64-bit min/max values, leading to
improper truncation when updating the register state. This issue has been
reported previously by Zac Ecob[1] , but was not followed up on.
The first patch fixes the assignment order in coerce_reg_to_size_sx()
to ensure correct truncation. The subsequent patches add selftests for
coerce_{reg,subreg}_to_size_sx.
Changelog:
v1 -> v2:
- Moved selftests inside the conditional check for cpuv4
Dimitar Kanaliev [Mon, 14 Oct 2024 12:11:54 +0000 (15:11 +0300)]
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
Add test that checks whether unsigned ranges deduced by the verifier for
sign extension instruction is correct. Without previous patch that
fixes truncation in coerce_reg_to_size_sx() this test fails.
Dimitar Kanaliev [Mon, 14 Oct 2024 12:11:53 +0000 (15:11 +0300)]
bpf: Fix truncation bug in coerce_reg_to_size_sx()
coerce_reg_to_size_sx() updates the register state after a sign-extension
operation. However, there's a bug in the assignment order of the unsigned
min/max values, leading to incorrect truncation:
In the current implementation, the unsigned 32-bit min/max values
(u32_min_value and u32_max_value) are assigned directly from the 64-bit
signed min/max values (s64_min and s64_max):
Tyrone Wu [Fri, 11 Oct 2024 00:08:02 +0000 (00:08 +0000)]
bpf: Fix unpopulated path_size when uprobe_multi fields unset
Previously when retrieving `bpf_link_info.uprobe_multi` with `path` and
`path_size` fields unset, the `path_size` field is not populated
(remains 0). This behavior was inconsistent with how other input/output
string buffer fields work, as the field should be populated in cases
when:
- both buffer and length are set (currently works as expected)
- both buffer and length are unset (not working as expected)
This patch now fills the `path_size` field when `path` and `path_size`
are unset.
Tony Ambardar [Wed, 9 Oct 2024 04:07:20 +0000 (21:07 -0700)]
selftests/bpf: Fix cross-compiling urandom_read
Linking of urandom_read and liburandom_read.so prefers LLVM's 'ld.lld' but
falls back to using 'ld' if unsupported. However, this fallback discards
any existing makefile macro for LD and can break cross-compilation.
Fix by changing the fallback to use the target linker $(LD), passed via
'-fuse-ld=' using an absolute path rather than a linker "flavour".
====================
Fix caching of BTF for kfuncs in the verifier
When playing around with defining kfuncs in some custom modules, we
noticed that if a BPF program calls two functions with the same
signature in two different modules, the function from the wrong module
may sometimes end up being called. Whether this happens depends on the
order of the calls in the BPF program, which turns out to be due to the
use of sort() inside __find_kfunc_desc_btf() in the verifier code.
This series contains a fix for the issue (first patch), and a selftest
to trigger it (last patch). The middle commit is a small refactor to
expose the module loading helper functions in testing_helpers.c. See the
individual patch descriptions for more details.
Changes in v2:
- Drop patch that refactors module building in selftests (Alexei)
- Get rid of expect_val function argument in selftest (Jiri)
- Collect ACKs
- Link to v1: https://lore.kernel.org/r/20241008-fix-kfunc-btf-caching-for-modules-v1-0-dfefd9aa4318@redhat.com
Simon Sundberg [Thu, 10 Oct 2024 13:27:09 +0000 (15:27 +0200)]
selftests/bpf: Add test for kfunc module order
Add a test case for kfuncs from multiple external modules, checking
that the correct kfuncs are called regardless of which order they're
called in. Specifically, check that calling the kfuncs in an order
different from the one the modules' BTF are loaded in works.
Simon Sundberg [Thu, 10 Oct 2024 13:27:08 +0000 (15:27 +0200)]
selftests/bpf: Provide a generic [un]load_module helper
Generalize the previous [un]load_bpf_testmod() helpers (in
testing_helpers.c) to the more generic [un]load_module(), which can
load an arbitrary kernel module by name. This allows future selftests
to more easily load custom kernel modules other than bpf_testmod.ko.
Refactor [un]load_bpf_testmod() to wrap this new helper.
The verifier contains a cache for looking up module BTF objects when
calling kfuncs defined in modules. This cache uses a 'struct
bpf_kfunc_btf_tab', which contains a sorted list of BTF objects that
were already seen in the current verifier run, and the BTF objects are
looked up by the offset stored in the relocated call instruction using
bsearch().
The first time a given offset is seen, the module BTF is loaded from the
file descriptor passed in by libbpf, and stored into the cache. However,
there's a bug in the code storing the new entry: it stores a pointer to
the new cache entry, then calls sort() to keep the cache sorted for the
next lookup using bsearch(), and then returns the entry that was just
stored through the stored pointer. However, because sort() modifies the
list of entries in place *by value*, the stored pointer may no longer
point to the right entry, in which case the wrong BTF object will be
returned.
The end result of this is an intermittent bug where, if a BPF program
calls two functions with the same signature in two different modules,
the function from the wrong module may sometimes end up being called.
Whether this happens depends on the order of the calls in the BPF
program (as that affects whether sort() reorders the array of BTF
objects), making it especially hard to track down. Simon, credited as
reporter below, spent significant effort analysing and creating a
reproducer for this issue. The reproducer is added as a selftest in a
subsequent patch.
The fix is straight forward: simply don't use the stored pointer after
calling sort(). Since we already have an on-stack pointer to the BTF
object itself at the point where the function return, just use that, and
populate it from the cache entry in the branch where the lookup
succeeds.
Tony Ambardar [Tue, 8 Oct 2024 23:12:32 +0000 (16:12 -0700)]
selftests/bpf: Fix error compiling cgroup_ancestor.c with musl libc
Existing code calls connect() with a 'struct sockaddr_in6 *' argument
where a 'struct sockaddr *' argument is declared, yielding compile errors
when building for mips64el/musl-libc:
In file included from cgroup_ancestor.c:3:
cgroup_ancestor.c: In function 'send_datagram':
cgroup_ancestor.c:38:38: error: passing argument 2 of 'connect' from incompatible pointer type [-Werror=incompatible-pointer-types]
38 | if (!ASSERT_OK(connect(sock, &addr, sizeof(addr)), "connect")) {
| ^~~~~
| |
| struct sockaddr_in6 *
./test_progs.h:343:29: note: in definition of macro 'ASSERT_OK'
343 | long long ___res = (res); \
| ^~~
In file included from .../netinet/in.h:10,
from .../arpa/inet.h:9,
from ./test_progs.h:17:
.../sys/socket.h:386:19: note: expected 'const struct sockaddr *' but argument is of type 'struct sockaddr_in6 *'
386 | int connect (int, const struct sockaddr *, socklen_t);
| ^~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This only compiles because of a glibc extension allowing declaration of the
argument as a "transparent union" which includes both types above.
Explicitly cast the argument to allow compiling for both musl and glibc.
Pu Lehui [Tue, 8 Oct 2024 12:45:44 +0000 (12:45 +0000)]
riscv, bpf: Fix possible infinite tailcall when CONFIG_CFI_CLANG is enabled
When CONFIG_CFI_CLANG is enabled, the number of prologue instructions
skipped by tailcall needs to include the kcfi instruction, otherwise the
TCC will be initialized every tailcall is called, which may result in
infinite tailcalls.
Tyrone Wu [Tue, 8 Oct 2024 16:43:11 +0000 (16:43 +0000)]
bpf: fix unpopulated name_len field in perf_event link info
Previously when retrieving `bpf_link_info.perf_event` for
kprobe/uprobe/tracepoint, the `name_len` field was not populated by the
kernel, leaving it to reflect the value initially set by the user. This
behavior was inconsistent with how other input/output string buffer
fields function (e.g. `raw_tracepoint.tp_name_len`).
This patch fills `name_len` with the actual size of the string name.
Rik van Riel [Tue, 8 Oct 2024 21:07:35 +0000 (17:07 -0400)]
bpf: use kvzmalloc to allocate BPF verifier environment
The kzmalloc call in bpf_check can fail when memory is very fragmented,
which in turn can lead to an OOM kill.
Use kvzmalloc to fall back to vmalloc when memory is too fragmented to
allocate an order 3 sized bpf verifier environment.
Admittedly this is not a very common case, and only happens on systems
where memory has already been squeezed close to the limit, but this does
not seem like much of a hot path, and it's a simple enough fix.
The patch set adds the missed check again info_cnt when flattening the
array of nested struct. The problem was spotted when developing dynptr
key support for hash map. Patch #1 adds the missed check and patch #2
adds three success test cases and one failure test case for the problem.
Comments are always welcome.
Change Log:
v2:
* patch #1: check info_cnt in btf_repeat_fields()
* patch #2: use a hard-coded number instead of BTF_FIELDS_MAX, because
BTF_FIELDS_MAX is not always available in vmlinux.h (e.g.,
for llvm 17/18)
Hou Tao [Tue, 8 Oct 2024 07:11:14 +0000 (15:11 +0800)]
selftests/bpf: Add more test case for field flattening
Add three success test cases to test the flattening of array of nested
struct. For these three tests, the number of special fields in map is
BTF_FIELDS_MAX, but the array is defined in structs with different
nested level.
Add one failure test case for the flattening as well. In the test case,
the number of special fields in map is BTF_FIELDS_MAX + 1. It will make
btf_parse_fields() in map_create() return -E2BIG, the creation of map
will succeed, but the load of program will fail because the btf_record
is invalid for the map.
Hou Tao [Tue, 8 Oct 2024 07:11:13 +0000 (15:11 +0800)]
bpf: Check the remaining info_cnt before repeating btf fields
When trying to repeat the btf fields for array of nested struct, it
doesn't check the remaining info_cnt. The following splat will be
reported when the value of ret * nelems is greater than BTF_FIELDS_MAX:
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in ../kernel/bpf/btf.c:3951:49
index 11 is out of range for type 'btf_field_info [11]'
CPU: 6 UID: 0 PID: 411 Comm: test_progs ...... 6.11.0-rc4+ #1
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ...
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x70
dump_stack+0x10/0x20
ubsan_epilogue+0x9/0x40
__ubsan_handle_out_of_bounds+0x6f/0x80
? kallsyms_lookup_name+0x48/0xb0
btf_parse_fields+0x992/0xce0
map_create+0x591/0x770
__sys_bpf+0x229/0x2410
__x64_sys_bpf+0x1f/0x30
x64_sys_call+0x199/0x9f0
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fea56f2cc5d
......
</TASK>
---[ end trace ]---
Fix it by checking the remaining info_cnt in btf_repeat_fields() before
repeating the btf fields.
Merge branch 'bpf: devmap: provide rxq after redirect'
Florian Kauer says:
====================
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.
Add bugfix and related selftest.
---
Changes in v4:
- return -> goto out_close, thanks Toke
- Link to v3: https://lore.kernel.org/r/20240909-devel-koalo-fix-ingress-ifindex-v3-0-66218191ecca@linutronix.de
Changes in v3:
- initialize skel to NULL, thanks Stanislav
- Link to v2: https://lore.kernel.org/r/20240906-devel-koalo-fix-ingress-ifindex-v2-0-4caa12c644b4@linutronix.de
Changes in v2:
- changed fixes tag
- added selftest
- Link to v1: https://lore.kernel.org/r/20240905-devel-koalo-fix-ingress-ifindex-v1-1-d12a0d74c29c@linutronix.de
====================
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.
This is particularly bad since accessing ingress_ifindex, e.g.
bpf: Make sure internal and UAPI bpf_redirect flags don't overlap
The bpf_redirect_info is shared between the SKB and XDP redirect paths,
and the two paths use the same numeric flag values in the ri->flags
field (specifically, BPF_F_BROADCAST == BPF_F_NEXTHOP). This means that
if skb bpf_redirect_neigh() is used with a non-NULL params argument and,
subsequently, an XDP redirect is performed using the same
bpf_redirect_info struct, the XDP path will get confused and end up
crashing, which syzbot managed to trigger.
With the stack-allocated bpf_redirect_info, the structure is no longer
shared between the SKB and XDP paths, so the crash doesn't happen
anymore. However, different code paths using identically-numbered flag
values in the same struct field still seems like a bit of a mess, so
this patch cleans that up by moving the flag definitions together and
redefining the three flags in BPF_F_REDIRECT_INTERNAL to not overlap
with the flags used for XDP. It also adds a BUILD_BUG_ON() check to make
sure the overlap is not re-introduced by mistake.
Eduard Zingerman [Tue, 24 Sep 2024 21:08:44 +0000 (14:08 -0700)]
selftests/bpf: Verify that sync_linked_regs preserves subreg_def
This test was added because of a bug in verifier.c:sync_linked_regs(),
upon range propagation it destroyed subreg_def marks for registers.
The test is written in a way to return an upper half of a register
that is affected by range propagation and must have it's subreg_def
preserved. This gives a return value of 0 and leads to undefined
return value if subreg_def mark is not preserved.
Eduard Zingerman [Tue, 24 Sep 2024 21:08:43 +0000 (14:08 -0700)]
bpf: sync_linked_regs() must preserve subreg_def
Range propagation must not affect subreg_def marks, otherwise the
following example is rewritten by verifier incorrectly when
BPF_F_TEST_RND_HI32 flag is set:
(or zero extension of w1 at (2) is missing for architectures that
require zero extension for upper register half).
The following happens w/o this patch:
- r0 is marked as not a subreg at (0);
- w1 is marked as subreg at (2);
- w1 subreg_def is overridden at (3) by copy_register_state();
- w1 is read at (5) but mark_insn_zext() does not mark (2)
for zero extension, because w1 subreg_def is not set;
- because of BPF_F_TEST_RND_HI32 flag verifier inserts random
value for hi32 bits of (2) (marked (r));
- this random value is read at (5).
The function __bpf_ringbuf_reserve is invoked from a tracepoint, which
disables preemption. Using spinlock_t in this context can lead to a
"sleep in atomic" warning in the RT variant. This issue is illustrated
in the example below:
Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
- Add support for the LOCALIO protocol extention
Bugfixes:
- Fix memory leak in error path of nfs4_do_reclaim()
- Simplify and guarantee lock owner uniqueness
- Fix -Wformat-truncation warning
- Fix folio refcounts by using folio_attach_private()
- Fix failing the mount system call when the server is down
- Fix detection of "Proxying of Times" server support
Cleanups:
- Annotate struct nfs_cache_array with __counted_by()
- Remove unnecessary NULL checks before kfree()
- Convert RPC_TASK_* constants to an enum
- Remove obsolete or misleading comments and declerations"
* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
nfs: Fix `make htmldocs` warnings in the localio documentation
nfs: add "NFS Client and Server Interlock" section to localio.rst
nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
nfs: add Documentation/filesystems/nfs/localio.rst
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfs/localio: use dedicated workqueues for filesystem read and write
pnfs/flexfiles: enable localio support
nfs: enable localio for non-pNFS IO
nfs: add LOCALIO support
nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfsd: add LOCALIO support
nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
nfs_common: add NFS LOCALIO auxiliary protocol enablement
SUNRPC: replace program list with program array
SUNRPC: add svcauth_map_clnt_to_svc_cred_local
SUNRPC: remove call_allocate() BUG_ONs
nfsd: add nfsd_serv_try_get and nfsd_serv_put
nfsd: add nfsd_file_acquire_local()
nfsd: factor out __fh_verify to allow NULL rqstp to be passed
...
Merge tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add support for idmapped fuse mounts (Alexander Mikhalitsyn)
- Add optimization when checking for writeback (yangyun)
- Add tracepoints (Josef Bacik)
- Clean up writeback code (Joanne Koong)
- Clean up request queuing (me)
- Misc fixes
* tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits)
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
fuse: clear FR_PENDING if abort is detected when sending request
fs/fuse: convert to use invalid_mnt_idmap
fs/mnt_idmapping: introduce an invalid_mnt_idmap
fs/fuse: introduce and use fuse_simple_idmap_request() helper
fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag
fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN
virtio_fs: allow idmapped mounts
fuse: allow idmapped mounts
fuse: warn if fuse_access is called when idmapped mounts are allowed
fuse: handle idmappings properly in ->write_iter()
fuse: support idmapped ->rename op
fuse: support idmapped ->set_acl
fuse: drop idmap argument from __fuse_get_acl
fuse: support idmapped ->setattr op
fuse: support idmapped ->permission inode op
fuse: support idmapped getattr inode op
fuse: support idmap for mkdir/mknod/symlink/create/tmpfile
fuse: support idmapped FUSE_EXT_GROUPS
fuse: add an idmap argument to fuse_simple_request
...
Merge tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Clean-up unnecessary codes as ->valid_size is supported
- buffered-IO fallback is no longer needed when using direct-IO
- Move ->valid_size extension from mmap to ->page_mkwrite. This
improves the overhead caused by unnecessary zero-out during mmap.
- Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table()
- Add sops->shutdown and ioctl
- Add Yuezhang Mo as a reviwer
* tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
MAINTAINERS: exfat: add myself as reviewer
exfat: resolve memory leak from exfat_create_upcase_table()
exfat: move extend valid_size into ->page_mkwrite()
exfat: fix memory leak in exfat_load_bitmap()
exfat: Implement sops->shutdown and ioctl
exfat: do not fallback to buffered write
exfat: drop ->i_size_ondisk
Merge tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The main changes include converting major IO paths to use folio, and
adding various knobs to control GC more flexibly for Zoned devices.
In addition, there are several patches to address corner cases of
atomic file operations and better support for file pinning on zoned
device.
Enhancement:
- add knobs to tune foreground/background GCs for Zoned devices
- convert IO paths to use folio
- reduce expensive checkpoint trigger frequency
- allow F2FS_IPU_NOCACHE for pinned file
- forcibly migrate to secure space for zoned device file pinning
- get rid of buffer_head use
- add write priority option based on zone UFS
- get rid of online repair on corrupted directory
Bug fixes:
- fix to don't panic system for no free segment fault injection
- fix to don't set SB_RDONLY in f2fs_handle_critical_error()
- avoid unused block when dio write in LFS mode
- compress: don't redirty sparse cluster during {,de}compress
- check discard support for conventional zones
- atomic: prevent atomic file from being dirtied before commit
- atomic: fix to check atomic_file in f2fs ioctl interfaces
- atomic: fix to forbid dio in atomic_file
- atomic: fix to truncate pagecache before on-disk metadata truncation
- atomic: create COW inode from parent dentry
- atomic: fix to avoid racing w/ GC
- atomic: require FMODE_WRITE for atomic write ioctls
- fix to wait page writeback before setting gcing flag
- fix to avoid racing in between read and OPU dio write, dio completion
- fix several potential integer overflows in file offsets and dir_block_index
- fix to avoid use-after-free in f2fs_stop_gc_thread()
As usual, there are several code clean-ups and refactorings"
* tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: allow F2FS_IPU_NOCACHE for pinned file
f2fs: forcibly migrate to secure space for zoned device file pinning
f2fs: remove unused parameters
f2fs: fix to don't panic system for no free segment fault injection
f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error()
f2fs: add valid block ratio not to do excessive GC for one time GC
f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent
f2fs: do FG_GC when GC boosting is required for zoned devices
f2fs: increase BG GC migration window granularity when boosted for zoned devices
f2fs: add reserved_segments sysfs node
f2fs: introduce migration_window_granularity
f2fs: make BG GC more aggressive for zoned devices
f2fs: avoid unused block when dio write in LFS mode
f2fs: fix to check atomic_file in f2fs ioctl interfaces
f2fs: get rid of online repaire on corrupted directory
f2fs: prevent atomic file from being dirtied before commit
f2fs: get rid of page->index
f2fs: convert read_node_page() to use folio
f2fs: convert __write_node_page() to use folio
f2fs: convert f2fs_write_data_page() to use folio
...
Merge tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf 'struct fd' updates from Alexei Starovoitov:
"This includes struct_fd BPF changes from Al and Andrii"
* tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf: convert bpf_token_create() to CLASS(fd, ...)
security,bpf: constify struct path in bpf_token_create() LSM hook
bpf: more trivial fdget() conversions
bpf: trivial conversions for fdget()
bpf: switch maps to CLASS(fd, ...)
bpf: factor out fetching bpf_map from FD and adding it to used_maps list
bpf: switch fdget_raw() uses to CLASS(fd_raw, ...)
bpf: convert __bpf_prog_get() to CLASS(fd, ...)
Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support cross-compiling linux-headers Debian package and kernel-devel
RPM package
- Add support for the linux-debug Pacman package
- Improve module rebuilding speed by factoring out the common code to
scripts/module-common.c
- Separate device tree build rules into scripts/Makefile.dtbs
- Add a new script to generate modules.builtin.ranges, which is useful
for tracing tools to find symbols in built-in modules
- Refactor Kconfig and misc tools
- Update Kbuild and Kconfig documentation
* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
kbuild: doc: replace "gcc" in external module description
kbuild: doc: describe the -C option precisely for external module builds
kbuild: doc: remove the description about shipped files
kbuild: doc: drop section numbering, use references in modules.rst
kbuild: doc: throw out the local table of contents in modules.rst
kbuild: doc: remove outdated description of the limitation on -I usage
kbuild: doc: remove description about grepping CONFIG options
kbuild: doc: update the description about Kbuild/Makefile split
kbuild: remove unnecessary export of RUST_LIB_SRC
kbuild: remove append operation on cmd_ld_ko_o
kconfig: cache expression values
kconfig: use hash table to reuse expressions
kconfig: refactor expr_eliminate_dups()
kconfig: add comments to expression transformations
kconfig: change some expr_*() functions to bool
scripts: move hash function from scripts/kconfig/ to scripts/include/
kallsyms: change overflow variable to bool type
kallsyms: squash output_address()
kbuild: add install target for modules.builtin.ranges
scripts: add verifier script for builtin module range data
...
Merge tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower updates from Shuah Khan
"The 'raw_pylibcpupower.i' file was being removed by "make mrproper".
That was because '*.i', '.s' and '*.o' files are generated during
kernel compile and removed when the repo is cleaned by mrproper.
Rename it to use .swg extension instead to avoid the problem.
A second patch removes references to it from .gitignore"
* tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
pm: cpupower: Clean up bindings gitignore
pm: cpupower: rename raw_pylibcpupower.i
Merge tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"This adds support for the I3C HCI controller of the AMD SoC which as
expected requires quirks. Also fixes for the other drivers, including
rate selection fixes for svc.
Core:
- allow adjusting first broadcast address speed
Drivers:
- cdns: few fixes
- mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix
get_i3c_mode
- svc: adjust rates, fix race condition"
* tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition
i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition
i3c: master: svc: adjust SDR according to i3c spec
i3c: master: svc: use slow speed for first broadcast address
i3c: master: support to adjust first broadcast address speed
i3c/master: cmd_v1: Fix the rule for getting i3c mode
i3c: master: cdns: fix module autoloading
i3c: mipi-i3c-hci: Add a quirk to set Response buffer threshold
i3c: mipi-i3c-hci: Add a quirk to set timing parameters
i3c: mipi-i3c-hci: Relocate helper macros to HCI header file
i3c: mipi-i3c-hci: Add a quirk to set PIO mode
i3c: mipi-i3c-hci: Read HC_CONTROL_PIO_MODE only after i3c hci v1.1
i3c: mipi-i3c-hci: Add AMDI5017 ACPI ID to the I3C Support List
Merge tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- support for PixArt PS/2 touchpad
- updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
- support for GPIO-only mode for ADP55888 controller
- support for touch keys in Zinitix driver
- support for querying density of Synaptics sensors
- sysfs interface for Goodex "Berlin" devices to read and write touch
IC registers
- more quirks to i8042 to handle various Tuxedo laptops
- a number of drivers have been converted to using "guard" notation
when acquiring various locks, as well as using other cleanup
functions to simplify releasing of resources (with more drivers to
follow)
- evdev will limit amount of data that can be written into an evdev
instance at a given time to 4096 bytes (170 input events) to avoid
holding evdev->mutex for too long and starving other users
- Spitz has been converted to use software nodes/properties to describe
its matrix keypad and GPIO-connected LEDs
- msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
removed since noone in mainline have been using them
- other assorted cleanups and fixes
* tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
ARM: spitz: fix compile error when matrix keypad driver is enabled
Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
Input: adp5588-keys - fix check on return code
Input: Convert comma to semicolon
Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
Input: ims-pcu - fix calling interruptible mutex
Input: zforce_ts - switch to using asynchronous probing
Input: zforce_ts - remove assert/deassert wrappers
Input: zforce_ts - do not hardcode interrupt level
Input: zforce_ts - switch to using devm_regulator_get_enable()
Input: zforce_ts - stop treating VDD regulator as optional
Input: zforce_ts - make zforce_idtable constant
Input: zforce_ts - use dev_err_probe() where appropriate
Input: zforce_ts - do not ignore errors when acquiring regulator
Input: zforce_ts - make parsing of contacts less confusing
Input: zforce_ts - switch to using get_unaligned_le16
Input: zforce_ts - use guard notation when acquiring mutexes
...
Merge tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock update from Bjorn Andersson:
"This converts the Spreadtrum hardware spinlock DeviceTree binding to
YAML, to allow validation of related DeviceTree source"
* tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
dt-bindings: hwlock: sprd-hwspinlock: convert to YAML
Merge tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
- Minor cleanup/refactor to the Qualcomm GLINK code, in order to add
trace events related to the messages exchange with the remote side,
useful for debugging a range of interoperability issues
- Rewrite the nested structs with flexible array members in order to
avoid the risk of invalid accesses
* tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings
rpmsg: glink: Introduce packet tracepoints
rpmsg: glink: Pass channel to qcom_glink_send_close_ack()
rpmsg: glink: Tidy up RX advance handling
Merge tag 'rproc-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- Add remoteproc support for the Cortex M4F found in AM62x and AM64x of
the TI K3 family, support for the modem remoteproc in the Qualcomm
SDX75, and audio, compute and general-purpose DSPs of the Qualcomm
SA8775P.
- Add support for blocking and non-blocking mailbox transmissions to
the i.MX remoteproc driver, and implement poweroff and reboot
mechanisms using them. Plus a few bug fixes and minor improvements.
- Cleanups and bug fixes for the TI K3 DSP and R5F drivers
- Support mapping SRAM regions into the AMD-Xilinx Zynqmp R5 cores
- Use devres helpers for various allocations in the Ingenic, TI DA8xx,
TI Keystone, TI K3, ST slim drivers
- Replace uses of of_{find,get}_property() with of_property_present()
where possible
* tag 'rproc-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (25 commits)
remoteporc: ingenic: Use devm_platform_ioremap_resource_byname()
remoteproc: da8xx: Use devm_platform_ioremap_resource_byname()
remoteproc: st_slim: Use devm_platform_ioremap_resource_byname()
remoteproc: xlnx: Add sram support
remoteproc: k3-r5: Fix error handling when power-up failed
remoteproc: imx_rproc: Add support for poweroff and reboot
remoteproc: imx_rproc: Allow setting of the mailbox transmit mode
remoteproc: k3-r5: Delay notification of wakeup event
remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem
remoteproc: k3: Factor out TI-SCI processor control OF get function
dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
remoteproc: k3-dsp: Acquire mailbox handle during probe routine
remoteproc: k3-r5: Acquire mailbox handle during probe routine
remoteproc: k3-r5: Use devm_rproc_alloc() helper
remoteproc: qcom: pas: Add support for SA8775p ADSP, CDSP and GPDSP
remoteproc: qcom: pas: Add SDX75 remoteproc support
dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
remoteproc: keystone: Use devm_rproc_alloc() helper
remoteproc: keystone: Use devm_kasprintf() to build name string
dt-bindings: remoteproc: xlnx,zynqmp-r5fss: Add missing "additionalProperties" on child nodes
...
Merge tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
"Just a few cleanups this cycle:
- Remove several unused structure and function declarations, and
unused variables (Dr. David Alan Gilbert, Yue Haibing, Zhang Zekun)
- Constify unmodified structure in mdev (Hongbo Li)
- Convert to unsigned type to catch overflow with less fanfare than
passing a negative value to kcalloc() (Dan Carpenter)"
* tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio:
vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups()
vfio/mdev: Constify struct kobj_type
vfio: mdev: Remove unused function declarations
vfio/fsl-mc: Remove unused variable 'hwirq'
vfio/pci: Remove unused struct 'vfio_pci_mmap_vma'
Merge tag 'dma-mapping-6.12-2024-09-24' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- sort out a few issues with the direct calls to iommu-dma (Christoph
Hellwig, Leon Romanovsky)
* tag 'dma-mapping-6.12-2024-09-24' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: report unlimited DMA addressing in IOMMU DMA path
iommu/dma: remove most stubs in iommu-dma.h
dma-mapping: fix vmap and mmap of noncontiougs allocations
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Collection of small cleanup and one fix:
- Sort headers and struct forward declarations
- Fix random selftest failures in some cases due to dirty tracking
tests
- Have the reserved IOVA regions mechanism work when a HWPT is used
as a nesting parent. This updates the nesting parent's IOAS with
the reserved regions of the device and will also install the ITS
doorbell page on ARM.
- Add missed validation of parent domain ops against the current
iommu
- Fix a syzkaller bug related to integer overflow during ALIGN()
- Tidy two iommu_domain attach paths"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommu: Set iommu_attach_handle->domain in core
iommufd: Avoid duplicated __iommu_group_set_core_domain() call
iommufd: Protect against overflow of ALIGN() during iova allocation
iommufd: Reorder struct forward declarations
iommufd: Check the domain owner of the parent before creating a nesting domain
iommufd/device: Enforce reserved IOVA also when attached to hwpt_nested
iommufd/selftest: Fix buffer read overrrun in the dirty test
iommufd: Reorder include files
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Usual collection of small improvements and fixes, nothing especially
stands out to me here.
The new multipath PCI feature is a sign of things to come, I think we
will see more of this in the next 10 years. Broadcom and HNS continue
to update their drivers for their new HW generations.
Summary:
- Bug fixes and minor improvments in cxgb4, siw, mlx5, rxe, efa, rts,
hfi, erdma, hns, irdma
- Variable size work queue, SRQ changes, and relaxed ordering for new
bnxt HW
- New ODP fault resolution FW protocol in mlx5
- New 'rdma monitor' netlink mechanism"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA/bnxt_re: Remove the unused variable en_dev
RDMA/nldev: Add missing break in rdma_nl_notify_err_msg()
RDMA/irdma: fix error message in irdma_modify_qp_roce()
RDMA/cxgb4: Added NULL check for lookup_atid
RDMA/hns: Fix ah error counter in sw stat not increasing
RDMA/bnxt_re: Recover the device when FW error is detected
RDMA/bnxt_re: Group all operations under add_device and remove_device
RDMA/bnxt_re: Use the aux device for L2 ULP callbacks
RDMA/bnxt_re: Change aux driver data to en_info to hold more information
RDMA/nldev: Expose whether RDMA monitoring is supported
RDMA/nldev: Add support for RDMA monitoring
RDMA/mlx5: Use IB set_netdev and get_netdev functions
RDMA/device: Remove optimization in ib_device_get_netdev()
RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creation
RDMA/mlx5: Obtain upper net device only when needed
RDMA/mlx5: Check RoCE LAG status before getting netdev
RDMA/mlx5: Consider the query_vuid cap for data_direct
net/mlx5: Handle memory scheme ODP capabilities
RDMA/mlx5: Add implicit MR handling to ODP memory scheme
RDMA/mlx5: Add handling for memory scheme page fault events
...
Merge tag 'sched_ext-for-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Three build fixes
- The fix for a stall bug introduced by a recent optimization in sched
core (SM_IDLE)
- Addition of /sys/kernel/sched_ext/enable_seq. While not a fix, it is
a simple addition that distro people want to be able to tell whether
an SCX scheduler has ever been loaded on the system
* tag 'sched_ext-for-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Provide a sysfs enable_seq counter
sched_ext: Fix build when !CONFIG_STACKTRACE
sched, sched_ext: Disable SM_IDLE/rq empty path when scx_enabled()
sched: Put task_group::idle under CONFIG_GROUP_SCHED_WEIGHT
sched: Add dummy version of sched_group_set_idle()
Merge tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Mostly just a set of fixes in here, or little changes that didn't get
included in the initial pull request. This contains:
- Move the SQPOLL napi polling outside the submission lock (Olivier)
- Rename of the "copy buffers" API that got added in the 6.12 merge
window. There's really no copying going on, it's just referencing
the buffers. After a bit of consideration, decided that it was
better to simply rename this to avoid potential confusion (me)
- Shrink struct io_mapped_ubuf from 48 to 32 bytes, by changing it to
start + len tracking rather than having start / end in there, and
by removing the caching of folio_mask when we can just calculate it
from folio_shift when we need it (me)
- Fixes for the SQPOLL affinity checking (me, Felix)
- Fix for how cqring waiting checks for the presence of task_work.
Just check it directly rather than check for a specific
notification mechanism (me)
- Tweak to how request linking is represented in tracing (me)
- Fix a syzbot report that deliberately sets up a huge list of
overflow entries, and then hits rcu stalls when flushing this list.
Just check for the need to preempt, and drop/reacquire locks in the
loop. There's no state maintained over the loop itself, and each
entry is yanked from head-of-list (me)"
* tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux:
io_uring: check if we need to reschedule during overflow flush
io_uring: improve request linking trace
io_uring: check for presence of task_work rather than TIF_NOTIFY_SIGNAL
io_uring/sqpoll: do the napi busy poll outside the submission block
io_uring: clean up a type in io_uring_register_get_file()
io_uring/sqpoll: do not put cpumask on stack
io_uring/sqpoll: retain test for whether the CPU is valid
io_uring/rsrc: change ubuf->ubuf_end to length tracking
io_uring/rsrc: get rid of io_mapped_ubuf->folio_mask
io_uring: rename "copy buffers" to "clone buffers"
Merge tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl update from Joel Granados:
- Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
removing the unlikely (but possible) chance that the permanently
empty ctl_table array shares its address with another ctl_table
- Update Joel Granados' contact info in MAINTAINERS
* tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
MAINTAINERS: update email for Joel Granados
sysctl: avoid spurious permanent empty tables
Merge tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support using Zkr to seed KASLR
- Support IPI-triggered CPU backtracing
- Support for generic CPU vulnerabilities reporting to userspace
- A few cleanups for missing licenses
- The size limit on the XIP kernel has been removed
- Support for tracing userspace stacks
- Support for the Svvptc extension
- Various cleanups and fixes throughout the tree
* tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
crash: Fix riscv64 crash memory reserve dead loop
perf/riscv-sbi: Add platform specific firmware event handling
tools: Optimize ring buffer for riscv
tools: Add riscv barrier implementation
RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: Enable bitops instrumentation
riscv: Omit optimized string routines when using KASAN
ACPI: RISCV: Make acpi_numa_get_nid() to be static
riscv: Randomize lower bits of stack address
selftests: riscv: Allow mmap test to compile on 32-bit
riscv: Make riscv_isa_vendor_ext_andes array static
riscv: Use LIST_HEAD() to simplify code
riscv: defconfig: Disable RZ/Five peripheral support
RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
riscv: avoid Imbalance in RAS
riscv: cacheinfo: Add back init_cache_level() function
riscv: Remove unused _TIF_WORK_MASK
drivers/perf: riscv: Remove redundant macro check
riscv: define ILLEGAL_POINTER_VALUE for 64bit
...
Merge tag 'landlock-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"We can now scope a Landlock domain thanks to a new "scoped" field that
can deny interactions with resources outside of this domain.
The LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET flag denies connections to an
abstract UNIX socket created outside of the current scoped domain, and
the LANDLOCK_SCOPE_SIGNAL flag denies sending a signal to processes
outside of the current scoped domain.
These restrictions also apply to nested domains according to their
scope. The related changes will also be useful to support other kind
of IPC isolations"
* tag 'landlock-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Document LANDLOCK_SCOPE_SIGNAL
samples/landlock: Add support for signal scoping
selftests/landlock: Test signal created by out-of-bound message
selftests/landlock: Test signal scoping for threads
selftests/landlock: Test signal scoping
landlock: Add signal scoping
landlock: Document LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET
samples/landlock: Add support for abstract UNIX socket scoping
selftests/landlock: Test inherited restriction of abstract UNIX socket
selftests/landlock: Test connected and unconnected datagram UNIX socket
selftests/landlock: Test UNIX sockets with any address formats
selftests/landlock: Test abstract UNIX socket scoping
selftests/landlock: Test handling of unknown scope
landlock: Add abstract UNIX socket scoping
Merge tag 'keys-next-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull key updates from Jarkko Sakkinen:
"The bulk of this is OpenSSL 3.0 compatibility fixes for the signing
and certificates"
* tag 'keys-next-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3
sign-file,extract-cert: avoid using deprecated ERR_get_error_line()
sign-file,extract-cert: move common SSL helper functions to a header
KEYS: prevent NULL pointer dereference in find_asymmetric_key()
KEYS: Remove unused declarations
Merge tag 'lsm-pr-20240923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM fixes from Paul Moore:
- Add a missing security_mmap_file() check to the remap_file_pages()
syscall
- Properly reference the SELinux and Smack LSM blobs in the
security_watch_key() LSM hook
- Fix a random IPE selftest crash caused by a missing list terminator
in the test
* tag 'lsm-pr-20240923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ipe: Add missing terminator to list of unit tests
selinux,smack: properly reference the LSM blob in security_watch_key()
mm: call the security_mmap_file() LSM hook in remap_file_pages()
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
This may be a typo. The comment has said shared locks are
not allowed when this bit is set. If using shared lock, the
wait in `fuse_file_cached_io_open` may be forever.
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core clk framework is left largely untouched this time around
except for support for the newly ratified DT property
'assigned-clock-rates-u64'.
I'm much more excited about the support for loading DT overlays from
KUnit tests so that we can test how the clk framework parses DT nodes
during clk registration. The clk framework has some places that are
highly DeviceTree dependent so this charts the path to extend the
KUnit tests to cover even more framework code in the future. I've got
some more tests on the list that use the DT overlay support, but they
uncovered issues with clk unregistration that I'm still working on
fixing.
Outside the core, the clk driver update pile is dominated by Qualcomm
and Renesas SoCs, making it fairly usual. Looking closer, there are
fixes for things all over the place, like adding missing clk
frequencies or moving defines for the number of clks out of DT binding
headers into the drivers. There are even conversions of DT bindings to
YAML and migration away from strings to describe clk topology. Overall
it doesn't look unusual so I expect the new drivers to be where we'll
have fixes in the coming weeks.
Core:
- KUnit tests for clk registration and fixed rate basic clk type
- A couple more devm helpers, one consumer and one provider
- Support for assigned-clock-rates-u64
New Drivers:
- Camera, display and GPU clocks on Qualcomm SM4450
- Camera clocks on Qualcomm SM8150
- Rockchip rk3576 clks
- Microchip SAM9X7 clks
- Renesas RZ/V2H(P) (R9A09G057) clks
Updates:
- Mark a bunch of struct freq_tbl const to reduce .data usage
- Add Qualcomm MSM8226 A7PLL and Regera PLL support
- Fix the Qualcomm Lucid 5LPE PLL configuration sequence to not reuse
Trion, as they do differ
- A number of fixes to the Qualcomm SM8550 display clock driver
- Fold Qualcomm SM8650 display clock driver into SM8550 one
- Add missing clocks and GDSCs needed for audio on Qualcomm MSM8998
- Add missing USB MP resets, GPLL9, and QUPv3 DFS to Qualcomm SC8180X
- Fix sdcc clk frequency tables on Qualcomm SC8180X
- Drop the Qualcomm SM8150 gcc_cpuss_ahb_clk_src
- Mark Qualcomm PCIe GDSCs as RET_ON on sm8250 and sm8540 to avoid
them turning off during suspend
- Use the HW_CTRL mechanism on Qualcomm SM8550 video clock controller
GDSCs
- Get rid of CLK_NR_CLKS defines in Rockchip DT binding headers
- Some fixes for Rockchip rk3228 and rk3588
- Exynos850: Add clock for Thermal Management Unit
- Exynos7885: Fix duplicated ID in the header, add missing TOP PLLs
and add clocks for USB block in the FSYS clock controller
- ExynosAutov9: Add DPUM clock controller
- ExynosAutov920: Add new (first) clock controllers: TOP and PERIC0
(and a bit more complete bindings)
- Use clk_hw pointer instead of fw_name for acm_aud_clk[0-1]_sel
clocks on i.MX8Q as parents in ACM provider
- Add i.MX95 NETCMIX support to the block control provider
- Fix parents for ENETx_REF_SEL clocks on i.MX6UL
- Add USB clocks, resets and power domains on Renesas RZ/G3S
- Add Generic Timer (GTM), I2C Bus Interface (RIIC), SD/MMC Host
Interface (SDHI) and Watchdog Timer (WDT) clocks and resets on
Renesas RZ/V2H
- Add PCIe, PWM, and CAN-FD clocks on Renesas R-Car V4M
- Add LCD controller clocks and resets on Renesas RZ/G2UL
- Add DMA clocks and resets on Renesas RZ/G3S
- Add fractional multiplication PLL support on Renesas R-Car Gen4
- Document support for the Renesas RZ/G2M v3.0 (r8a774a3) SoC
- Support for the Microchip SAM9X7 SoC as follows:
- Updates for the Microchip PLL drivers
- DT binding documentation updates (for the new clock driver and for
the slow clock controller that SAM9X7 is using)
- A fix for the Microchip SAMA7G5 clock driver to avoid allocating
more memory than necessary
- Constify some Amlogic structs
- Add SM1 eARC clocks for Amlogic
- Introduce a symbol namespace for Amlogic clock specific symbols
- Add reset controller support to audiomix block control on i.MX
- Add CLK_SET_RATE_PARENT flag to all audiomix clocks and to i.MX7D
lcdif_pixel_src clock
- Fix parent clocks for earc_phy and audpll on i.MX8MP
- Fix default parents for enet[12]_ref_sel on i.MX6UL
- Add ops in composite 8M and 93 that allow no-op on disable
- Add check for PCC present bit on composite 7ULP register
- Fix fractional part for fracn-gppll on prepare in i.MX
- Fix clock tree update for TF-A managed clocks on i.MX8M
- Drop CLK_SET_PARENT_GATE for DRAM mux on i.MX7D
- Add the SAI7 IPG clock for i.MX8MN
- Mark the 'nand_usdhc_bus' clock as non-critical on i.MX8MM
- Add LVDS bypass clocks on i.MX8QXP
- Add muxes for MIPI and PHY ref clocks on i.MX
- Reorder dc0_bypass0_clk, lcd_pxl and dc1_disp clocks on i.MX8QXP
- Add 1039.5MHz and 800MHz rates to fracn-gppll table on i.MX
- Add CLK_SET_RATE_PARENT for media_disp pixel clocks on i.MX8QXP
- Add some module descriptions to the i.MX generic and the i.MXRT1050
driver
- Fix return value for bypass for composite i.MX7ULP
- Move Mediatek clk bindings to clock/
- Convert some more clk bindings to dt schema"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (180 commits)
clk: Switch back to struct platform_driver::remove()
dt-bindings: clock, reset: fix top-comment indentation rk3576 headers
clk: rockchip: remove unused mclk_pdm0_p/pdm0_p definitions
clk: provide devm_clk_get_optional_enabled_with_rate()
clk: fixed-rate: add devm_clk_hw_register_fixed_rate_parent_data()
clk: imx6ul: fix clock parent for IMX6UL_CLK_ENETx_REF_SEL
clk: renesas: r9a09g057: Add clock and reset entries for GTM/RIIC/SDHI/WDT
clk: renesas: rzv2h: Add support for dynamic switching divider clocks
clk: renesas: r9a08g045: Add clocks, resets and power domains for USB
clk: rockchip: fix error for unknown clocks
clk: rockchip: rk3588: drop unused code
clk: rockchip: Add clock controller for the RK3576
clk: rockchip: Add new pll type pll_rk3588_ddr
dt-bindings: clock, reset: Add support for rk3576
dt-bindings: clock: rockchip,rk3588-cru: drop unneeded assigned-clocks
clk: rockchip: rk3588: Fix 32k clock name for pmu_24m_32k_100m_src_p
clk: imx95: enable the clock of NETCMIX block control
dt-bindings: clock: add RMII clock selection
dt-bindings: clock: add i.MX95 NETCMIX block control
clk: imx: imx8: Use clk_hw pointer for self registered clock in clk_parent_data
...
Merge tag 'i2c-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"I2C core:
- finally remove the I2C_COMPAT symbol after 15 years of deprecation
- lock client addresses during initialization to prevent race
conditions between different kinds of instantiation
- use scoped foreach OF child loops
- testunit cleanups and documentation improvements, as well as two
new tests, one for repeated start and one for triggering SMBusAlert
interrupts
I2C host drivers:
- DesignWare and Renesas I2C driver updates.
The first has has undergone through a series of cleanups that have
been sent to the mailing list a year ago for the first time and
finally get merged in this pull request. They are many, from typos
(e.g. i2/i2c), to cosmetics, to refactoring (e.g. move inline
functions to librarieas) and many others.
- all the DesignWare Kconfig options have been grouped under the
I2C_DESIGNWARE_CORE and this required some adaptation in many of
the kernel configuration files for different arm and mips boards
Cleanups:
- improve the exit path in the runtime resume function for the
Qualcomm Geni platform
- get rid of the unused "target_addr" parameter in the Intel LJCA
driver
- intialize the restart_flag in the MediaTek controller in one single
place
- constify a few global data structures in the virtio driver
- simplify the bus speed handling in the Renesas driver init function
making it more readable
- improved probe function of the Renesas R-Car driver
- switch the iMX/MXC driver to use RUNTIME_PM_OPS() instead of
SET_RUNTIME_PM_OPS()
- iMX/MXC driver cleanups
- use devm_clk_get_enabled() to simplify the Renesas EMEV2, Ingenic
and MPC drivers
Refactoring:
- Fix a potential out of boundary array access in the Nuvoton driver.
This is not a bug fix because the issue could never occur due to
hardware not having the properties listed in the array. The change
makes the driver more future proof and, at the same time, silences
code analyzers.
Improvements:
- several patches improving the runtime power management handling of
the Renesas I2C (riic) driver
- use a more descriptive adapter name in the Intel i801 driver to
show the presence of the IDF feature
- kill pending transactions when irq's can't complete their handling
in the Intel Denverton (ismt) driver, triggering a timeout
New Feature:
- support fast mode plus in the Renesas I2C (riic) driver
New support:
- Added support for:
- Renesas R9A08G045
- Rockchip RK3576
- KEBA I2C
- Theobroma Systems Mule Multiplexer.
- new i2c-keba.c driver
- new driver for The Mule i2c multiplexer
Core I2C framework:
- move runtime PM functions in order to allow them to be accessed
during device add
Devicetree:
- nVidia and Qualcomm binding improvements
- get rid of redundant "multi-master" property in the aspeed binding
- convert i2c-sprd binding to YAML
AT24 updates:
- document a new model from giantec in DT bindings"
* tag 'i2c-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (69 commits)
i2c: designware: Use pci_get_drvdata()
i2c: designware: Propagate firmware node
i2c: designware: Uninline i2c_dw_probe()
i2c: ljca: Remove unused "target_addr" parameter
i2c: keba: Add KEBA I2C controller support
i2c: i801: Use a different adapter-name for IDF adapters
i2c: core: Setup i2c_adapter runtime-pm before calling device_add()
dt-bindings: i2c: i2c-sprd: convert to YAML
i2c: ismt: kill transaction in hardware on timeout
i2c: designware: Group all DesignWare drivers under a single option
net: txgbe: Fix I2C Kconfig dependencies
RISC-V: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM
mips: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM
arm64: defconfig: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM
ARM: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM
ARC: configs: enable I2C_DESIGNWARE_CORE with I2C_DESIGNWARE_PLATFORM
i2c: virtio: Constify struct i2c_algorithm and struct virtio_device_id
i2c: rcar: tidyup priv->devtype handling on rcar_i2c_probe()
i2c: imx: Convert comma to semicolon
i2c: jz4780: Use devm_clk_get_enabled() helpers
...
Merge tag 'libnvdimm-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ira Weiny:
- use Open Firmware helper routines
- fix memory leak when nvdimm labels are incorrect
- remove some dead code
* tag 'libnvdimm-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Remove dead code for ENODEV checking in scan_labels()
nvdimm: Fix devs leaks in scan_labels()
nvdimm: Use of_property_present() and of_property_read_bool()
Merge tag 'leds-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
- Limited LED current based on thermal conditions in the QCOM flash LED
driver
- Fixed device child node usage in the BD2606MVV and PCA995x drivers
- Used device_for_each_child_node_scoped() to access child nodes in the
IS31FL319X driver
- Reset the LED controller during the probe in the LM3601X driver
- Used device_for_each_child_node() to access device child nodes in the
PCA995X driver
- Fixed CONFIG_LEDS_CLASS_MULTICOLOR dependency in the BlinkM driver
- Replaced msleep() with usleep_range() in the SUN50I-A100 driver
- Used scoped device node handling to simplify error paths in the
AAT1290, KTD2692, and MC13783 drivers
- Added missing of_node_get for probe duration in the MAX77693 driver
- Simplified using for_each_available_child_of_node_scoped() loops when
iterating over device nodes
- Used devm_clk_get_enabled() helpers in the LP55XX driver
- Converted DT bindings from TXT to YAML format for various drivers,
including LM3692x and SC2731-BLTC
- Set num_leds after allocation in the GPIO driver
- Removed irrelevant blink configuration error message in the PCA9532
driver
- Fixed module autoloading with MODULE_DEVICE_TABLE() in the Turris
Omnia driver
* tag 'leds-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (38 commits)
leds: turris-omnia: Fix module autoloading with MODULE_DEVICE_TABLE()
leds: pca9532: Remove irrelevant blink configuration error message
leds: gpio: Set num_leds after allocation
dt-bindings: leds: Convert leds-lm3692x to YAML format
leds: lp55xx: Use devm_clk_get_enabled() helpers
leds: as3645a: Use device_* to iterate over device child nodes
leds: qcom-lpg: Simplify with scoped for each OF child loop
leds: turris-omnia: Simplify with scoped for each OF child loop
leds: sc27xx: Simplify with scoped for each OF child loop
leds: pca9532: Simplify with scoped for each OF child loop
leds: netxbig: Simplify with scoped for each OF child loop
leds: mt6323: Simplify with scoped for each OF child loop
leds: mc13783: Use scoped device node handling to simplify error paths
leds: lp55xx: Simplify with scoped for each OF child loop
leds: is31fl32xx: Simplify with scoped for each OF child loop
leds: bcm6358: Simplify with scoped for each OF child loop
leds: bcm6328: Simplify with scoped for each OF child loop
leds: aw2013: Simplify with scoped for each OF child loop
leds: 88pm860x: Simplify with scoped for each OF child loop
leds: max77693: Simplify with scoped for each OF child loop
...
Merge tag 'dmaengine-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"Unusually, more new driver and device support than updates. Couple of
new device support, AMD, Rcar, Intel and New drivers in Freescale,
Loonsoon, AMD and LPC32XX with DT conversion and mode updates etc.
New support:
- Support for AMD Versal Gen 2 DMA IP
- Rcar RZ/G3S SoC dma controller
- Support for Intel Diamond Rapids and Granite Rapids-D dma controllers
- Support for Freescale ls1021a-qdma controller
- New driver for Loongson-1 APB DMA
- New driver for AMD QDMA
- Pl08x in LPC32XX router dma driver
Updates:
- Support for dpdma cyclic dma mode
- XML conversion for marvell xor dma bindings
- Dma clocks documentation for imx dma"
* tag 'dmaengine-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (24 commits)
dmaengine: loongson1-apb-dma: Fix the build warning caused by the size of pdev_irqname
dmaengine: Fix spelling mistakes
dmaengine: Add dma router for pl08x in LPC32XX SoC
dmaengine: fsl-edma: add edma src ID check at request channel
dmaengine: fsl-edma: change to guard(mutex) within fsl_edma3_xlate()
dmaengine: avoid non-constant format string
dmaengine: imx-dma: Remove i.MX21 support
dt-bindings: dma: fsl,imx-dma: Document the DMA clocks
dmaengine: Loongson1: Add Loongson-1 APB DMA driver
dt-bindings: dma: Add Loongson-1 APB DMA
dmaengine: zynqmp_dma: Add support for AMD Versal Gen 2 DMA IP
dt-bindings: dmaengine: zynqmp_dma: Add a new compatible string
dmaengine: idxd: Add new DSA and IAA device IDs for Diamond Rapids platform
dmaengine: idxd: Add a new DSA device ID for Granite Rapids-D platform
dmaengine: ti: k3-udma: Remove unused declarations
dmaengine: amd: qdma: Add AMD QDMA driver
dmaengine: xilinx: dpdma: Add support for cyclic dma mode
dma: ipu: Remove include/linux/dma/ipu-dma.h
dt-bindings: dma: fsl-mxs-dma: Add compatible string "fsl,imx8qxp-dma-apbh"
dt-bindings: fsl-qdma: allow compatible string fallback to fsl,ls1021a-qdma
...
Merge tag 'phy-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"New hw support:
- Rcar usb2 support for RZ/G3S SoC
- Nuvoton MA35 SoC USB 2.0 PHY driver
Removed:
- obsolete qcom,usb-8x16-phy bindings
Updates:
- 4 lane PCIe support for Qualcomm X1E80100
- Constify structure in subsystem update
- Subsystem simplification with scoped for each OF child loop update
- Yaml conversion for Qualcomm sata phy, Hiilicon hi3798cv200-combphy
bindings"
* tag 'phy-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (40 commits)
phy: renesas: rcar-gen3-usb2: Add support for the RZ/G3S SoC
dt-bindings: phy: renesas,usb2-phy: Document RZ/G3S phy bindings
phy: renesas: rcar-gen3-usb2: Add support to initialize the bus
phy: ti: j721e-wiz: Simplify with scoped for each OF child loop
phy: ti: j721e-wiz: Drop OF node reference earlier for simpler code
phy: ti: gmii-sel: Simplify with dev_err_probe()
phy: ti: am654-serdes: Use scoped device node handling to simplify error paths
phy: qcom: qmp-pcie-msm8996: Simplify with scoped for each OF child loop
phy: mediatek: xsphy: Simplify with scoped for each OF child loop
phy: mediatek: tphy: Simplify with scoped for each OF child loop
phy: hisilicon: usb2: Simplify with scoped for each OF child loop
phy: cadence: sierra: Simplify with scoped for each OF child loop
phy: broadcom: brcm-sata: Simplify with scoped for each OF child loop
phy: broadcom: bcm-cygnus-pcie: Simplify with scoped for each OF child loop
phy: nuvoton: add new driver for the Nuvoton MA35 SoC USB 2.0 PHY
dt-bindings: phy: nuvoton,ma35-usb2-phy: add new bindings
phy: qcom: qmp-pcie: Configure all tables on port B PHY
phy: airoha: adjust initialization delay in airoha_pcie_phy_init()
dt-bindings: phy: socionext,uniphier: add top-level constraints
phy: qcom: qmp-pcie: Add Gen4 4-lanes mode for X1E80100
...
Merge tag 'soundwire-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- bus cleanup for warnings and probe deferral errors suppression
- cadence recheck for status with a delayed work
- intel interrupt rework on reset exit
* tag 'soundwire-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel_bus_common: enable interrupts before exiting reset
soundwire: cadence: re-check Peripheral status with delayed_work
soundwire: bus: clean up probe warnings
soundwire: bus: drop unused driver name field
soundwire: bus: suppress probe deferral errors
Merge tag 'ntb-6.12' of https://github.com/jonmason/ntb
Pull PCIe non-transparent bridge updates from Jon Mason:
"Bug fixes for intel ntb driver debugfs, use after free in switchtec
driver, ntb transport rx ring buffers. Also, cleanups in printks,
kernel-docs, and idt driver comment"
* tag 'ntb-6.12' of https://github.com/jonmason/ntb:
ntb: Force physically contiguous allocation of rx ring buffers
ntb: ntb_hw_switchtec: Fix use after free vulnerability in switchtec_ntb_remove due to race condition
ntb: idt: Fix the cacography in ntb_hw_idt.c
NTB: epf: don't misuse kernel-doc marker
NTB: ntb_transport: fix all kernel-doc warnings
ntb: Constify struct bus_type
ntb_perf: Fix printk format
ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
Merge tag 'firewire-updates-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Takashi Sakamoto:
"In the FireWire subsystem, tasklets have been used as the bottom half
of 1394 OHCi hardIRQ. In recent kernel updates, BH workqueues have
become available, and some developers have proposed replacing the
tasklet with a BH workqueue.
As a first step towards dropping tasklet use, the 1394 OHCI
isochronous context can use regular workqueues. In this context, the
batch of packets is processed in the specific queue, thus the timing
jitter caused by task scheduling is not so critical.
Additionally, DMA transmission can be scheduled per-packet basis,
therefore the context can be sleep between the operation of
transmissions. Furthermore, in-kernel protocol implementation involves
some CPU-bound tasks, which can sometimes consumes CPU time so long.
These characteristics suggest that normal workqueues are suitable,
through BH workqueues are not.
The replacement with a workqueue allows unit drivers to process the
content of packets in non-atomic context. It brings some reliefs to
some drivers in sound subsystem that spin-lock is not mandatory
anymore during isochronous packet processing.
Summary:
- Replace tasklet with workqueue for isochronous context
- Replace IDR with XArray
- Utilize guard macro where possible
- Print deprecation warning when enabling debug parameter of
firewire-ohci module
- Switch to nonatomic PCM operation"
* tag 'firewire-updates-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (55 commits)
firewire: core: rename cause flag of tracepoints event
firewire: core: update documentation of kernel APIs for flushing completions
firewire: core: add helper function to retire descriptors
Revert "firewire: core: move workqueue handler from 1394 OHCI driver to core function"
Revert "firewire: core: use mutex to coordinate concurrent calls to flush completions"
firewire: core: use mutex to coordinate concurrent calls to flush completions
firewire: core: move workqueue handler from 1394 OHCI driver to core function
firewire: core: fulfill documentation of fw_iso_context_flush_completions()
firewire: core: expose kernel API to schedule work item to process isochronous context
firewire: core: use WARN_ON_ONCE() to avoid superfluous dumps
ALSA: firewire: use nonatomic PCM operation
firewire: core: non-atomic memory allocation for isochronous event to user client
firewire: ohci: operate IT/IR events in sleepable work process instead of tasklet softIRQ
firewire: core: add local API to queue work item to workqueue specific to isochronous contexts
firewire: core: allocate workqueue to handle isochronous contexts in card
firewire: ohci: obsolete direct usage of printk_ratelimit()
firewire: ohci: deprecate debug parameter
firewire: core: update fw_device outside of device_find_child()
firewire: ohci: fix error path to detect initiated reset in TI TSB41BA3D phy
firewire: core/ohci: minor refactoring for computation of configuration ROM size
...
Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Wait for device readiness after reset by polling Vendor ID and
looking for Configuration RRS instead of polling the Command
register and looking for non-error completions, to avoid hardware
retries done for RRS on non-Vendor ID reads (Bjorn Helgaas)
- Rename CRS Completion Status to RRS ('Request Retry Status') to
match PCIe r6.0 spec usage (Bjorn Helgaas)
- Clear LBMS bit after a manual link retrain so we don't try to
retrain a link when there's no downstream device anymore (Maciej W.
Rozycki)
- Revert to the original link speed after retraining fails instead of
leaving it restricted to 2.5GT/s, so a future device has a chance
to use higher speeds (Maciej W. Rozycki)
- Wait for each level of downstream bus, not just the first, to
become accessible before restoring devices on that bus (Ilpo
Järvinen)
- Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups
without having to stomp on the core's pdev->dev.groups (Lukas
Wunner)
Driver binding:
- Export pcim_request_region(), a managed counterpart of
pci_request_region(), for use by drivers (Philipp Stanner)
- Export pcim_iomap_region() and deprecate pcim_iomap_regions()
(Philipp Stanner)
- Request the PCI BAR used by xboxvideo (Philipp Stanner)
- Request and map drm/ast BARs with pcim_iomap_region() (Philipp
Stanner)
MSI:
- Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a
single IRQ line and cannot set the affinity of each MSI to a
specific CPU core (Marek Vasut)
- Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity()
implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3,
mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl,
xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed'
warnings (Marek Vasut)
Power management:
- Add pwrctl support for ATH11K inside the WCN6855 package (Konrad
Dybcio)
PCI device hotplug:
- Remove unnecessary hpc_ops struct from shpchp (ngn)
- Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp
(weiyufeng)
Virtualization:
- Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson)
- Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS
but does provide ACS-like features (Subramanian Ananthanarayanan)
IOMMU:
- Add function 0 DMA alias quirk for Glenfly Arise audio function,
which uses the function 0 Requester ID (WangYuli)
NPEM:
- Add Native PCIe Enclosure Management (NPEM) support for sysfs
control of NVMe RAID storage indicators (ok/fail/locate/
rebuild/etc) (Mariusz Tkaczyk)
- Add support for the ACPI _DSM PCIe SSD status LED management, which
is functionally similar to NPEM but mediated by platform firmware
(Mariusz Tkaczyk)
Device trees:
- Drop minItems and maxItems from ranges in PCI generic host binding
since host bridges may have several MMIO and I/O port apertures
(Frank Li)
- Add kirin, rcar-gen2, uniphier DT binding top-level constraints for
clocks (Krzysztof Kozlowski)
Altera PCIe controller driver:
- Convert altera DT bindings from text to YAML (Matthew Gerlach)
- Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same
thing and is what other drivers use (Jinjie Ruan)
Broadcom STB PCIe controller driver:
- Add DT binding maxItems for reset controllers (Jim Quinlan)
- Use the 'bridge' reset method if described in the DT (Jim Quinlan)
- Use the 'swinit' reset method if described in the DT (Jim Quinlan)
- Add 'has_phy' so the existence of a 'rescal' reset controller
doesn't imply software control of it (Jim Quinlan)
- Add support for many inbound DMA windows (Jim Quinlan)
- Rename SoC 'type' to 'soc_base' express the fact that SoCs come in
families of multiple similar devices (Jim Quinlan)
- Add Broadcom 7712 DT description and driver support (Jim Quinlan)
- Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints
(Richard Zhu)
- Fix a code restructuring error that caused i.MX8MM and i.MX8MP
Endpoints to fail to establish link (Richard Zhu)
- Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing
outbound alignment requirement (Richard Zhu)
- Call phy_power_off() in the .probe() error path (Frank Li)
- Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also
supported (Frank Li)
- Manage Refclk by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Manage core reset by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Expand comments for erratum ERR010728 workaround (Frank Li)
- Use generic PHY APIs to configure mode, speed, and submode, which
is harmless for devices that implement their own internal PHY
management and don't set the generic imx_pcie->phy (Frank Li)
- Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver
Root Complex support (Richard Zhu)
- Add layerscape-pcie DT binding deprecated 'num-viewport' property
to address a DT checker warning (Frank Li)
- Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array
(Frank Li)
Loongson PCIe controller driver:
- Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets
(Huacai Chen)
Marvell Aardvark PCIe controller driver:
- Fix issue with emulating Configuration RRS for two-byte reads of
Vendor ID; previously it only worked for four-byte reads (Bjorn
Helgaas)
MediaTek PCIe Gen3 controller driver:
- Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC
types (Lorenzo Bianconi)
- Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi)
- Add DT and driver support for Airoha EN7581 PCIe controller
(Lorenzo Bianconi)
Qualcomm PCIe controller driver:
- Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan
Ansari)
- Add back DT 'vddpe-3v3-supply', which was incorrectly removed
earlier (Johan Hovold)
- Drop endpoint redundant masking of global IRQ events (Manivannan
Sadhasivam)
- Clarify unknown global IRQ message and only log it once to avoid a
flood (Manivannan Sadhasivam)
- Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
Sadhasivam)
- Assign PCI domain number for endpoint controllers (Manivannan
Sadhasivam)
- Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for
endpoint controller (Manivannan Sadhasivam)
- Add global SPI interrupt for PCIe link events to DT binding
(Manivannan Sadhasivam)
- Add global RC interrupt handler to handle 'Link up' events and
automatically enumerate hot-added devices (Manivannan Sadhasivam)
- Avoid mirroring of DBI and iATU register space so it doesn't
overlap BAR MMIO space (Prudhvi Yarlagadda)
- Enable controller resources like PHY only after PERST# is
deasserted to partially avoid the problem that the endpoint SoC
crashes when accessing things when Refclk is absent (Manivannan
Sadhasivam)
- Add 16.0 GT/s equalization and RX lane margining settings (Shashank
Babu Chinta Venkata)
- Pass domain number to pci_bus_release_domain_nr() explicitly to
avoid a NULL pointer dereference (Manivannan Sadhasivam)
Renesas R-Car PCIe controller driver:
- Make the read-only const array 'check_addr' static (Colin Ian King)
- Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding
(Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary
handler is NULL (Siddharth Vadapalli)
- Handle IRQ request errors during root port and endpoint probe
(Siddharth Vadapalli)
TI J721E PCIe driver:
- Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable
the ACSPCIE module to drive Refclk for the Endpoint (Siddharth
Vadapalli)
- Extract the cadence link setup from cdns_pcie_host_setup() so link
setup can be done separately during resume (Thomas Richard)
- Add T_PERST_CLK_US definition for the mandatory delay between
Refclk becoming stable and PERST# being deasserted (Thomas Richard)
- Add j721e suspend and resume support (Théo Lebrun)
TI Keystone PCIe controller driver:
- Fix NULL pointer checking when applying MRRS limitation quirk for
AM65x SR 1.0 Errata #i2037 (Dan Carpenter)
Xilinx NWL PCIe controller driver:
- Fix off-by-one error in INTx IRQ handler that caused INTx
interrupts to be lost or delivered as the wrong interrupt (Sean
Anderson)
Mike Snitzer [Thu, 5 Sep 2024 19:10:00 +0000 (15:10 -0400)]
nfs: add "NFS Client and Server Interlock" section to localio.rst
This section answers a new FAQ entry:
9. How does LOCALIO make certain that object lifetimes are managed
properly given NFSD and NFS operate in different contexts?
See the detailed "NFS Client and Server Interlock" section below.
The first half of the section details NeilBrown's elegant design
for LOCALIO's nfs_uuid_t based interlock and is heavily based on
Neil's "net namespace refcounting" description here:
https://marc.info/?l=linux-nfs&m=172498546024767&w=2
The second half of the section details the per-cpu-refcount introduced
to ensure NFSD's nfsd_serv isn't destroyed while in use by a LOCALIO
client.
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server to allow them to reliably
handshake to determine if they are on the same host.
Once an NFS client and server handshake as "local", the client will
bypass the network RPC protocol for read, write and commit operations.
Due to this XDR and RPC bypass, these operations will operate faster.
Mike Snitzer [Thu, 5 Sep 2024 19:09:57 +0000 (15:09 -0400)]
nfs: implement client support for NFS_LOCALIO_PROGRAM
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server. If matched, the NFS server populates members in the
nfs_uuid_t struct. The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct. See: fs/nfs/localio.c:nfs_local_probe()
This protocol isn't part of an IETF standard, nor does it need to be
considering it is Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.
Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by client
rather than reactively by the server is also safer.
nfs/localio: use dedicated workqueues for filesystem read and write
For localio access, don't call filesystem read() and write() routines
directly. This solves two problems:
1) localio writes need to use a normal (non-memreclaim) unbound
workqueue. This avoids imposing new requirements on how underlying
filesystems process frontend IO, which would cause a large amount
of work to update all filesystems. Without this change, when XFS
starts getting low on space, XFS flushes work on a non-memreclaim
work queue, which causes a priority inversion problem:
2) Some filesystem writeback routines can end up taking up a lot of
stack space (particularly XFS). Instead of risking running over
due to the extra overhead from the NFS stack, we should just call
these routines from a workqueue job. Since we need to do this to
address 1) above we're able to avoid possibly blowing the stack
"for free".
Use of dedicated workqueues improves performance over using the
system_unbound_wq.
Also, the creds used to open the file are used to override_creds() in
both nfs_local_call_read() and nfs_local_call_write() -- otherwise the
workqueue could have elevated capabilities (which the caller may not).
Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
nfs_do_local_write() to avoid writeback deadlocks.
The PF_LOCAL_THROTTLE flag prevents deadlocks in balance_dirty_pages()
by causing writes to only be throttled against other writes to the
same bdi (it keeps the throttling local). Normally all writes to
bdi(s) are throttled equally (after throughput factors are allowed
for).
The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from
causing memory reclaim to re-enter filesystems or IO devices and so
prevents deadlocks from occuring where IO that cleans pages is
waiting on IO to complete.
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out, later commits will enable client and
server handshake via a Linux-only LOCALIO auxiliary RPC protocol.
This has dynamic binding with the nfsd module (via nfs_localio module
which is part of nfs_common). LOCALIO will only work if nfsd is
already loaded.
The "localio_enabled" nfs kernel module parameter can be used to
disable and enable the ability to use LOCALIO support.
CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO.
Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations. So rather than immediately drop the nfsd_file
reference in nfs_local_open_fh(), that doesn't happen until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and
associated references are managed through calls to
nfs_to->nfsd_file_put_local().
Mike Snitzer [Thu, 5 Sep 2024 19:09:51 +0000 (15:09 -0400)]
nfsd: implement server support for NFS_LOCALIO_PROGRAM
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common. The server expects this protocol to use
the same transport as NFS and NFSACL for its RPCs. This protocol
isn't part of an IETF standard, nor does it need to be considering it
is Linux-to-Linux auxiliary RPC protocol that amounts to an
implementation detail.
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
Linux Kernel Organization 400122 nfslocalio
Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.
If nfsd_open_local_fh() fails then the NFS client will both retry and
fallback to normal network-based read, write and commit operations if
localio is no longer supported.
Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc) regardless of whether localio or regular NFS
access is used. The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for localio. Store
auth_domain for localio in nfsd_uuid_t and transfer it to the client
if it is local to the server.
Relative to containers, localio gives the client access to the network
namespace the server has. This is required to allow the client to
access the server's per-namespace nfsd_net struct.
This commit also introduces the use of NFSD's percpu_ref to interlock
nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
not destroyed while in use by nfsd_open_local_fh and other LOCALIO
client code.
CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO.
Mike Snitzer [Thu, 5 Sep 2024 19:09:49 +0000 (15:09 -0400)]
nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
The next commit will introduce nfsd_open_local_fh() which returns an
nfsd_file structure. This commit exposes LOCALIO's required NFSD
symbols to the NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols
available to NFS in a global 'nfs_to' nfsd_localio_operations
struct (global access suggested by Trond, nfsd_localio_operations
suggested by NeilBrown). The next commit will also introduce
nfsd_localio_ops_init() that init_nfsd() will call to initialize
'nfs_to'.
- Introduce nfsd_file_file() that provides access to nfsd_file's
backing file. Keeps nfsd_file structure opaque to NFS client (as
suggested by Jeff Layton).
- Introduce nfsd_file_put_local() that will put the reference to the
nfsd_file's associated nn->nfsd_serv and then put the reference to
the nfsd_file (as suggested by NeilBrown).
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and associated nfs_uuid_t
struct, register it with nfs_common for subsequent lookup and
verification by the NFS server and if matched the NFS server populates
members in the nfs_uuid_t struct.
nfs_common's nfs_uuids list is the basis for localio enablement, as
such it has members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn->nfsd_serv).
This commit also provides the base nfs_uuid_t interfaces to allow
proper net namespace refcounting for the LOCALIO use case.
CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client
enablement for LOCALIO. If both NFS_FS=m and NFSD=m then
NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides
nfs_common's LOCALIO support).