- kallsyms: Clean up interaction with LTO suffixes (Song Liu)
- refcount: Report UAF for refcount_sub_and_test(0) when counter==0
(Petr Pavlu)
- kunit/overflow: Avoid misallocation of driver name (Ivan Orlov)
* tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
kunit/overflow: Fix UB in overflow_allocation_test
gcc-plugins: randstruct: Remove GCC 4.7 or newer requirement
refcount: Report UAF for refcount_sub_and_test(0) when counter==0
- netfilter:
- nf_queue: drop packets with cloned unconfirmed conntracks
- nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
- vsock: fix recursive ->recvmsg calls
- dsa: vsc73xx: fix MDIO bus access and PHY opera
- eth: gtp: pull network headers in gtp_dev_xmit()
- eth: igc: fix packet still tx after gate close by reducing i226 MAC
retry buffer
- eth: mana: fix RX buf alloc_size alignment and atomic op panic
- eth: hns3: fix a deadlock problem when config TC during resetting"
* tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: hns3: use correct release function during uninitialization
net: hns3: void array out of bound when loop tnl_num
net: hns3: fix a deadlock problem when config TC during resetting
net: hns3: use the user's cfg after reset
net: hns3: fix wrong use of semaphore up
selftests: net: lib: kill PIDs before del netns
pse-core: Conditionally set current limit during PI regulator registration
net: thunder_bgx: Fix netdev structure allocation
net: ethtool: Allow write mechanism of LPL and both LPL and EPL
vsock: fix recursive ->recvmsg calls
selftest: af_unix: Fix kselftest compilation warnings
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
netfilter: nf_tables: Introduce nf_tables_getobj_single
netfilter: nf_tables: Audit log dump reset after the fact
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
netfilter: flowtable: initialise extack before use
netfilter: nfnetlink: Initialise extack before use in ACKs
netfilter: allow ipv6 fragments to arrive on different devices
tcp: Update window clamping condition
...
* tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Linus Torvalds [Thu, 15 Aug 2024 17:10:59 +0000 (10:10 -0700)]
Merge tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Revert a recent change to sense data generation.
Sense data can be in either fixed format or descriptor format.
The D_SENSE bit in the Control mode page controls which format to
generate. All places but one respected the D_SENSE bit.
The recent change fixed the one place that didn't respect the D_SENSE
bit. However, it turns out that hdparm, hddtemp and udisks
(incorrectly) assumes sense data in descriptor format.
Therefore, even while the change was technically correct, revert it,
since even if these user space programs are fixed to (correctly) look
at the format type before parsing the data, older versions of these
tools will be around roughly forever.
* tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Song Liu [Wed, 7 Aug 2024 22:05:13 +0000 (15:05 -0700)]
kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
With CONFIG_LTO_CLANG=y, the compiler may add .llvm.<hash> suffix to
function names to avoid duplication. APIs like kallsyms_lookup_name()
and kallsyms_on_each_match_symbol() tries to match these symbol names
without the .llvm.<hash> suffix, e.g., match "c_stop" with symbol
c_stop.llvm.17132674095431275852. This turned out to be problematic
for use cases that require exact match, for example, livepatch.
Fix this by making the APIs to match symbols exactly.
Ivan Orlov [Thu, 15 Aug 2024 00:04:31 +0000 (01:04 +0100)]
kunit/overflow: Fix UB in overflow_allocation_test
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used as
a driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN
enabled.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Paolo Abeni [Thu, 15 Aug 2024 11:25:06 +0000 (13:25 +0200)]
Merge tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Ignores ifindex for types other than mcast/linklocal in ipv6 frag
reasm, from Tom Hughes.
2) Initialize extack for begin/end netlink message marker in batch,
from Donald Hunter.
3) Initialize extack for flowtable offload support, also from Donald.
4) Dropped packets with cloned unconfirmed conntracks in nfqueue,
later it should be possible to explore lookup after reinject but
Florian prefers this approach at this stage. From Florian Westphal.
5) Add selftest for cloned unconfirmed conntracks in nfqueue for
previous update.
6) Audit after filling netlink header successfully in object dump,
from Phil Sutter.
7-8) Fix concurrent dump and reset which could result in underflow
counter / quota objects.
netfilter pull request 24-08-15
* tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
netfilter: nf_tables: Introduce nf_tables_getobj_single
netfilter: nf_tables: Audit log dump reset after the fact
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
netfilter: flowtable: initialise extack before use
netfilter: nfnetlink: Initialise extack before use in ACKs
netfilter: allow ipv6 fragments to arrive on different devices
====================
Peiyang Wang [Tue, 13 Aug 2024 14:10:24 +0000 (22:10 +0800)]
net: hns3: use correct release function during uninitialization
pci_request_regions is called to apply for PCI I/O and memory resources
when the driver is initialized, Therefore, when the driver is uninstalled,
pci_release_regions should be used to release PCI I/O and memory resources
instead of pci_release_mem_regions is used to release memory reasouces
only.
Peiyang Wang [Tue, 13 Aug 2024 14:10:23 +0000 (22:10 +0800)]
net: hns3: void array out of bound when loop tnl_num
When query reg inf of SSU, it loops tnl_num times. However, tnl_num comes
from hardware and the length of array is a fixed value. To void array out
of bound, make sure the loop time is not greater than the length of array
Jie Wang [Tue, 13 Aug 2024 14:10:22 +0000 (22:10 +0800)]
net: hns3: fix a deadlock problem when config TC during resetting
When config TC during the reset process, may cause a deadlock, the flow is
as below:
pf reset start
│
▼
......
setup tc │
│ ▼
▼ DOWN: napi_disable()
napi_disable()(skip) │
│ │
▼ ▼
...... ......
│ │
▼ │
napi_enable() │
▼
UINIT: netif_napi_del()
│
▼
......
│
▼
INIT: netif_napi_add()
│
▼
...... global reset start
│ │
▼ ▼
UP: napi_enable()(skip) ......
│ │
▼ ▼
...... napi_disable()
In reset process, the driver will DOWN the port and then UINIT, in this
case, the setup tc process will UP the port before UINIT, so cause the
problem. Adds a DOWN process in UINIT to fix it.
Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Jie Wang <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Peiyang Wang [Tue, 13 Aug 2024 14:10:21 +0000 (22:10 +0800)]
net: hns3: use the user's cfg after reset
Consider the followed case that the user change speed and reset the net
interface. Before the hw change speed successfully, the driver get old
old speed from hw by timer task. After reset, the previous speed is config
to hw. As a result, the new speed is configed successfully but lost after
PF reset. The followed pictured shows more dirrectly.
This patch save new speed if hw change speed successfully, which will be
used after reset successfully.
Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Peiyang Wang <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Jie Wang [Tue, 13 Aug 2024 14:10:20 +0000 (22:10 +0800)]
net: hns3: fix wrong use of semaphore up
Currently, if hns3 PF or VF FLR reset failed after five times retry,
the reset done process will directly release the semaphore
which has already released in hclge_reset_prepare_general.
This will cause down operation fail.
So this patch fixes it by adding reset state judgement. The up operation is
only called after successful PF FLR reset.
Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR") Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR") Signed-off-by: Jie Wang <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
When deleting netns, it is possible to still have some tasks running,
e.g. background tasks like tcpdump running in the background, not
stopped because the test has been interrupted.
Before deleting the netns, it is then safer to kill all attached PIDs,
if any. That should reduce some noises after the end of some tests, and
help with the debugging of some issues. That's why this modification is
seen as a "fix".
Oleksij Rempel [Tue, 13 Aug 2024 07:37:19 +0000 (09:37 +0200)]
pse-core: Conditionally set current limit during PI regulator registration
Fix an issue where `devm_regulator_register()` would fail for PSE
controllers that do not support current limit control, such as simple
GPIO-based controllers like the podl-pse-regulator. The
`REGULATOR_CHANGE_CURRENT` flag and `max_uA` constraint are now
conditionally set only if the `pi_set_current_limit` operation is
supported. This change prevents the regulator registration routine from
attempting to call `pse_pi_set_current_limit()`, which would return
`-EOPNOTSUPP` and cause the registration to fail.
Marc Zyngier [Mon, 12 Aug 2024 14:13:22 +0000 (15:13 +0100)]
net: thunder_bgx: Fix netdev structure allocation
Commit 94833addfaba ("net: thunderx: Unembed netdev structure") had
a go at dynamically allocating the netdev structures for the thunderx_bgx
driver. This change results in my ThunderX box catching fire (to be fair,
it is what it does best).
The issues with this change are that:
- bgx_lmac_enable() is called *after* bgx_acpi_register_phy() and
bgx_init_of_phy(), both expecting netdev to be a valid pointer.
- bgx_init_of_phy() populates the MAC addresses for *all* LMACs
attached to a given BGX instance, and thus needs netdev for each of
them to have been allocated.
There is a few things to be said about how the driver mixes LMAC and
BGX states which leads to this sorry state, but that's beside the point.
To address this, go back to a situation where all netdev structures
are allocated before the driver starts relying on them, and move the
freeing of these structures to driver removal. Someone brave enough
can always go and restructure the driver if they want.
Danielle Ratson [Mon, 12 Aug 2024 14:08:24 +0000 (17:08 +0300)]
net: ethtool: Allow write mechanism of LPL and both LPL and EPL
CMIS 5.2 standard section 9.4.2 defines four types of firmware update
supported mechanism: None, only LPL, only EPL, both LPL and EPL.
Currently, only LPL (Local Payload) type of write firmware block is
supported. However, if the module supports both LPL and EPL the flashing
process wrongly fails for no supporting LPL.
Fix that, by allowing the write mechanism to be LPL or both LPL and
EPL.
Cong Wang [Mon, 12 Aug 2024 02:21:53 +0000 (19:21 -0700)]
vsock: fix recursive ->recvmsg calls
After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
has been replaced with vsock_bpf_recvmsg(). Thus the following
recursiion could happen:
vsock_bpf_recvmsg()
-> __vsock_recvmsg()
-> vsock_connectible_recvmsg()
-> prot->recvmsg()
-> vsock_bpf_recvmsg() again
We need to fix it by calling the original ->recvmsg() without any BPF
sockmap logic in __vsock_recvmsg().
Linus Torvalds [Thu, 15 Aug 2024 00:56:15 +0000 (17:56 -0700)]
Merge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- extend tree-checker verification of directory item type
- fix regression in page/folio and extent state tracking in xarray, the
dirty status can get out of sync and can cause problems e.g. a hang
- in send, detect last extent and allow to clone it instead of sending
it as write, reduces amount of data transferred in the stream
- fix checking extent references when cleaning deleted subvolumes
- fix one more case in the extent map shrinker, let it run only in the
kswapd context so it does not cause latency spikes during other
operations
* tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix invalid mapping of extent xarray state
btrfs: send: allow cloning non-aligned extent if it ends at i_size
btrfs: only run the extent map shrinker from kswapd tasks
btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
btrfs: check delayed refs when we're checking if a ref exists
Phil Sutter [Fri, 9 Aug 2024 13:07:32 +0000 (15:07 +0200)]
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
Objects' dump callbacks are not concurrency-safe per-se with reset bit
set. If two CPUs perform a reset at the same time, at least counter and
quota objects suffer from value underrun.
Prevent this by introducing dedicated locking callbacks for nfnetlink
and the asynchronous dump handling to serialize access.
Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects") Signed-off-by: Phil Sutter <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Phil Sutter [Fri, 9 Aug 2024 13:07:30 +0000 (15:07 +0200)]
netfilter: nf_tables: Audit log dump reset after the fact
In theory, dumpreset may fail and invalidate the preceeding log message.
Fix this and use the occasion to prepare for object reset locking, which
benefits from a few unrelated changes:
* Add an early call to nfnetlink_unicast if not resetting which
effectively skips the audit logging but also unindents it.
* Extract the table's name from the netlink attribute (which is verified
via earlier table lookup) to not rely upon validity of the looked up
table pointer.
* Do not use local variable family, it will vanish.
selftests: netfilter: add test for br_netfilter+conntrack+queue combination
Trigger cloned skbs leaving softirq protection.
This triggers splat without the preceeding change
("netfilter: nf_queue: drop packets with cloned unconfirmed
conntracks"):
WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..
because local delivery and forwarding will race for confirmation.
Linus Torvalds [Wed, 14 Aug 2024 20:46:24 +0000 (13:46 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"s390:
- Fix failure to start guests with kvm.use_gisa=0
- Panic if (un)share fails to maintain security.
ARM:
- Use kvfree() for the kvmalloc'd nested MMUs array
- Set of fixes to address warnings in W=1 builds
- Make KVM depend on assembler support for ARMv8.4
- Fix for vgic-debug interface for VMs without LPIs
- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
- Minor code / comment cleanups for configuring PAuth traps
- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF
x86:
- Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
- Fix smatch issues
- Small cleanups
- Make x2APIC ID 100% readonly
- Fix typo in uapi constant
Generic:
- Use synchronize_srcu_expedited() on irqfd shutdown"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
KVM: x86: Make x2APIC ID 100% readonly
KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
KVM: SVM: Fix an error code in sev_gmem_post_populate()
KVM: SVM: Fix uninitialized variable bug
KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
KVM: arm64: Tidying up PAuth code in KVM
KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
s390/uv: Panic for set and remove shared access UVC errors
KVM: s390: fix validity interception issue when gisa is switched off
docs: KVM: Fix register ID of SPSR_FIQ
KVM: arm64: vgic: fix unexpected unlock sparse warnings
KVM: arm64: fix kdoc warnings in W=1 builds
KVM: arm64: fix override-init warnings in W=1 builds
...
Tom Hughes [Tue, 6 Aug 2024 11:40:52 +0000 (12:40 +0100)]
netfilter: allow ipv6 fragments to arrive on different devices
Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device
for multicast and link-local packets") modified the ipv6 fragment
reassembly logic to distinguish frag queues by device for multicast
and link-local packets but in fact only the main reassembly code
limits the use of the device to those address types and the netfilter
reassembly code uses the device for all packets.
This means that if fragments of a packet arrive on different interfaces
then netfilter will fail to reassemble them and the fragments will be
expired without going any further through the filters.
Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Tom Hughes <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation. Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes. Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC. And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.
Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0. But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.
Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots is really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.
Linus Torvalds [Wed, 14 Aug 2024 16:23:20 +0000 (09:23 -0700)]
Merge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Fix a xperms counting problem where we adding to the xperms count
even if we failed to add the xperm.
- Propogate errors from avc_add_xperms_decision() back to the caller so
that we can trigger the proper cleanup and error handling.
- Revert our use of vma_is_initial_heap() in favor of our older logic
as vma_is_initial_heap() doesn't correctly handle the no-heap case
and it is causing issues with the SELinux process/execheap access
control. While the older SELinux logic may not be perfect, it
restores the expected user visible behavior.
Hopefully we will be able to resolve the problem with the
vma_is_initial_heap() macro with the mm folks, but we need to fix
this in the meantime.
* tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: revert our use of vma_is_initial_heap()
selinux: add the processing of the failure of avc_add_xperms_decision()
selinux: fix potential counting error in avc_add_xperms_decision()
Linus Torvalds [Wed, 14 Aug 2024 16:06:28 +0000 (09:06 -0700)]
Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"VFS:
- Fix the name of file lease slab cache. When file leases were split
out of file locks the name of the file lock slab cache was used for
the file leases slab cache as well.
- Fix a type in take_fd() helper.
- Fix infinite directory iteration for stable offsets in tmpfs.
- When the icache is pruned all reclaimable inodes are marked with
I_FREEING and other processes that try to lookup such inodes will
block.
But some filesystems like ext4 can trigger lookups in their inode
evict callback causing deadlocks. Ext4 does such lookups if the
ea_inode feature is used whereby a separate inode may be used to
store xattrs.
Introduce I_LRU_ISOLATING which pins the inode while its pages are
reclaimed. This avoids inode deletion during inode_lru_isolate()
avoiding the deadlock and evict is made to wait until
I_LRU_ISOLATING is done.
netfs:
- Fault in smaller chunks for non-large folio mappings for
filesystems that haven't been converted to large folios yet.
- Fix the CONFIG_NETFS_DEBUG config option. The config option was
renamed a short while ago and that introduced two minor issues.
First, it depended on CONFIG_NETFS whereas it wants to depend on
CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
does. Second, the documentation for the config option wasn't fixed
up.
- Revert the removal of the PG_private_2 writeback flag as ceph is
using it and fix how that flag is handled in netfs.
- Fix DIO reads on 9p. A program watching a file on a 9p mount
wouldn't see any changes in the size of the file being exported by
the server if the file was changed directly in the source
filesystem. Fix this by attempting to read the full size specified
when a DIO read is requested.
- Fix a NULL pointer dereference bug due to a data race where a
cachefiles cookies was retired even though it was still in use.
Check the cookie's n_accesses counter before discarding it.
nsfs:
- Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
the kernel is writing to userspace.
pidfs:
- Prevent the creation of pidfds for kthreads until we have a
use-case for it and we know the semantics we want. It also confuses
userspace why they can get pidfds for kthreads.
squashfs:
- Fix an unitialized value bug reported by KMSAN caused by a
corrupted symbolic link size read from disk. Check that the
symbolic link size is not larger than expected"
* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Squashfs: sanity check symbolic link size
9p: Fix DIO read through netfs
vfs: Don't evict inode under the inode lru traversing context
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
file: fix typo in take_fd() comment
pidfd: prevent creation of pidfds for kthreads
netfs: clean up after renaming FSCACHE_DEBUG config
libfs: fix infinite directory reads for offset dir
nsfs: fix ioctl declaration
fs/netfs/fscache_cookie: add missing "n_accesses" check
filelock: fix name of file_lease slab cache
netfs: Fault in smaller chunks for non-large folio mappings
Linus Torvalds [Wed, 14 Aug 2024 15:57:24 +0000 (08:57 -0700)]
Merge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix bpftrace regression from Kyle Huey.
Tracing bpf prog was called with perf_event input arguments causing
bpftrace produce garbage output.
- Fix verifier crash in stacksafe() from Yonghong Song.
Daniel Hodges reported verifier crash when playing with sched-ext.
The stack depth in the known verifier state was larger than stack
depth in being explored state causing out-of-bounds access.
- Fix update of freplace prog in prog_array from Leon Hwang.
freplace prog type wasn't recognized correctly.
* tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
perf/bpf: Don't call bpf_overflow_handler() for tracing events
selftests/bpf: Add a test to verify previous stacksafe() fix
bpf: Fix a kernel verifier crash in stacksafe()
bpf: Fix updating attached freplace prog in prog_array map
Sense data can be in either fixed format or descriptor format.
SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."
The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)
Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).
Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.
However, after commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.
These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).
Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.
This patch is based on the discussions between Neal Cardwell and
Eric Dumazet in the link
https://lore.kernel.org/netdev/20240726204105.1466841[email protected]/
It was correctly pointed out that tp->window_clamp would not be
updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
(copied <= tp->rcvq_space.space). While it is expected for most
setups to leave the sysctl enabled, the latter condition may
not end up hitting depending on the TCP receive queue size and
the pattern of arriving data.
The updated check should be hit only on initial MSS update from
TCP_MIN_MSS to measured MSS value and subsequently if there was
an update to a larger value.
Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Hans de Goede [Sun, 21 Jul 2024 15:38:40 +0000 (17:38 +0200)]
media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
Commit a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG)
support") broke BYT support because it removed a seemingly unused field
from struct sh_css_sp_config and a seemingly unused value from enum
ia_css_input_mode.
But these are part of the ABI between the kernel and firmware on ISP2400
and this part of the TPG support removal changes broke ISP2400 support.
ISP2401 support was not affected because on ISP2401 only a part of
struct sh_css_sp_config is used.
Restore the removed field and enum value to fix this.
ssn_offset field is u32 and is placed into the netlink response with
nla_put_u32(), but only 2 bytes are reserved for the attribute payload
in subflow_get_info_size() (even though it makes no difference
in the end, as it is aligned up to 4 bytes). Supply the correct
argument to the relevant nla_total_size() call to make it less
confusing.
Linus Torvalds [Tue, 13 Aug 2024 23:10:32 +0000 (16:10 -0700)]
Merge tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- binfmt_flat: Fix corruption when not offsetting data start
- exec: Fix ToCToU between perm check and set-uid/gid usage
* tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Kees Cook [Thu, 8 Aug 2024 18:39:08 +0000 (11:39 -0700)]
exec: Fix ToCToU between perm check and set-uid/gid usage
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start with being
world-executable but then are adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches brpm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against dead-locks, but hardly optimal.
Kyle Huey [Tue, 13 Aug 2024 15:17:27 +0000 (15:17 +0000)]
perf/bpf: Don't call bpf_overflow_handler() for tracing events
The regressing commit is new in 6.10. It assumed that anytime event->prog
is set bpf_overflow_handler() should be invoked to execute the attached bpf
program. This assumption is false for tracing events, and as a result the
regressing commit broke bpftrace by invoking the bpf handler with garbage
inputs on overflow.
Prior to the regression the overflow handlers formed a chain (of length 0,
1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
(the tracing case) did not. Both set event->prog. The chain of overflow
handlers was replaced by a single overflow handler slot and a fixed call to
bpf_overflow_handler() when appropriate. This modifies the condition there
to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
previous behavior and fixing bpftrace.
Li RongQing [Thu, 11 Jul 2024 12:11:30 +0000 (20:11 +0800)]
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
When hot-unplug a device which has many queues, and guest CPU will has
huge jitter, and unplugging is very slow.
It turns out synchronize_srcu() in irqfd_shutdown() caused the guest
jitter and unplugging latency, so replace synchronize_srcu() with
synchronize_srcu_expedited(), to accelerate the unplugging, and reduce
the guest OS jitter, this accelerates the VM reboot too.
Michal Luczaj [Fri, 2 Aug 2024 20:29:41 +0000 (13:29 -0700)]
KVM: selftests: Add a testcase to verify x2APIC is fully readonly
Add a test to verify that userspace can't change a vCPU's x2APIC ID by
abusing KVM_SET_LAPIC. KVM models the x2APIC ID (and x2APIC LDR) as
readonly, and silently ignores userspace attempts to change the x2APIC ID
for backwards compatibility.
Ignore the userspace provided x2APIC ID when fixing up APIC state for
KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM. Commit a92e2543d6a8 ("KVM: x86: use hardware-compatible format for APIC ID
register"), which added the fixup, didn't intend to allow userspace to
modify the x2APIC ID. In fact, that commit is when KVM first started
treating the x2APIC ID as readonly, apparently to fix some race:
static inline u32 kvm_apic_id(struct kvm_lapic *apic)
{
- return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+ /* To avoid a race between apic_base and following APIC_ID update when
+ * switching to x2apic_mode, the x2apic mode returns initial x2apic id.
+ */
+ if (apic_x2apic_mode(apic))
+ return apic->vcpu->vcpu_id;
+
+ return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
}
Furthermore, KVM doesn't support delivering interrupts to vCPUs with a
modified x2APIC ID, but KVM *does* return the modified value on a guest
RDMSR and for KVM_GET_LAPIC. I.e. no remotely sane setup can actually
work with a modified x2APIC ID.
Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map
calculation, which expects the LDR to align with the x2APIC ID.
WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm]
CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm]
Call Trace:
<TASK>
kvm_apic_set_state+0x1cf/0x5b0 [kvm]
kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm]
kvm_vcpu_ioctl+0x663/0x8a0 [kvm]
__x64_sys_ioctl+0xb8/0xf0
do_syscall_64+0x56/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fade8b9dd6f
Unfortunately, the WARN can still trigger for other CPUs than the current
one by racing against KVM_SET_LAPIC, so remove it completely.
David Thompson [Fri, 9 Aug 2024 16:36:12 +0000 (12:36 -0400)]
mlxbf_gige: disable RX filters until RX path initialized
A recent change to the driver exposed a bug where the MAC RX
filters (unicast MAC, broadcast MAC, and multicast MAC) are
configured and enabled before the RX path is fully initialized.
The result of this bug is that after the PHY is started packets
that match these MAC RX filters start to flow into the RX FIFO.
And then, after rx_init() is completed, these packets will go
into the driver RX ring as well. If enough packets are received
to fill the RX ring (default size is 128 packets) before the call
to request_irq() completes, the driver RX function becomes stuck.
This bug is intermittent but is most likely to be seen where the
oob_net0 interface is connected to a busy network with lots of
broadcast and multicast traffic.
All the MAC RX filters must be disabled until the RX path is ready,
i.e. all initialization is done and all the IRQs are installed.
Naohiro Aota [Fri, 9 Aug 2024 07:54:22 +0000 (16:54 +0900)]
btrfs: fix invalid mapping of extent xarray state
In __extent_writepage_io(), we call btrfs_set_range_writeback() ->
folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the
mapping xarray if the folio is not dirty. This worked fine before commit 97713b1a2ced ("btrfs: do not clear page dirty inside
extent_write_locked_range()").
After the commit, however, the folio is still dirty at this point, so the
mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io()
calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That
results in the page being unlocked with a "strange" state. The page is not
PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY.
This strange state looks like causing a hang with a call trace below when
running fstests generic/091 on a null_blk device. It is waiting for a folio
lock.
While I don't have an exact relation between this hang and the strange
state, fixing the state also fixes the hang. And, that state is worth
fixing anyway.
This commit reorders btrfs_folio_clear_dirty() and
btrfs_set_range_writeback() in __extent_writepage_io(), so that the
PAGECACHE_TAG_DIRTY tag is properly removed from the xarray.
Phillip Lougher [Sun, 11 Aug 2024 23:28:21 +0000 (00:28 +0100)]
Squashfs: sanity check symbolic link size
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value 3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which being a
signed int, overflows producing a negative number.
3. The following loop that fills in the page contents checks that
the copied bytes is less than length, which being negative means
the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.
If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.
Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.
To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.
This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):
while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
then writing something into the empty file. The watcher should immediately
display the file content and break out of the loop. Without this fix, it
remains in the loop indefinitely.
Zhihao Cheng [Fri, 9 Aug 2024 03:16:28 +0000 (11:16 +0800)]
vfs: Don't evict inode under the inode lru traversing context
The inode reclaiming process(See function prune_icache_sb) collects all
reclaimable inodes and mark them with I_FREEING flag at first, at that
time, other processes will be stuck if they try getting these inodes
(See function find_inode_fast), then the reclaiming process destroy the
inodes by function dispose_list(). Some filesystems(eg. ext4 with
ea_inode feature, ubifs with xattr) may do inode lookup in the inode
evicting callback function, if the inode lookup is operated under the
inode lru traversing context, deadlock problems may happen.
Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
if ea_inode feature is enabled, the lookup process will be stuck
under the evicting context like this:
1. File A has inode i_reg and an ea inode i_ea
2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
3. Then, following three processes running like this:
PA PB
echo 2 > /proc/sys/vm/drop_caches
shrink_slab
prune_dcache_sb
// i_reg is added into lru, lru->i_ea->i_reg
prune_icache_sb
list_lru_walk_one
inode_lru_isolate
i_ea->i_state |= I_FREEING // set inode state
inode_lru_isolate
__iget(i_reg)
spin_unlock(&i_reg->i_lock)
spin_unlock(lru_lock)
rm file A
i_reg->nlink = 0
iput(i_reg) // i_reg->nlink is 0, do evict
ext4_evict_inode
ext4_xattr_delete_inode
ext4_xattr_inode_dec_ref_all
ext4_xattr_inode_iget
ext4_iget(i_ea->i_ino)
iget_locked
find_inode_fast
__wait_on_freeing_inode(i_ea) ----→ AA deadlock
dispose_list // cannot be executed by prune_icache_sb
wake_up_bit(&i_ea->i_state)
Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
deleting process holds BASEHD's wbuf->io_mutex while getting the
xattr inode, which could race with inode reclaiming process(The
reclaiming process could try locking BASEHD's wbuf->io_mutex in
inode evicting function), then an ABBA deadlock problem would
happen as following:
1. File A has inode ia and a xattr(with inode ixa), regular file B has
inode ib and a xattr.
2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
3. Then, following three processes running like this:
PA PB PC
echo 2 > /proc/sys/vm/drop_caches
shrink_slab
prune_dcache_sb
// ib and ia are added into lru, lru->ixa->ib->ia
prune_icache_sb
list_lru_walk_one
inode_lru_isolate
ixa->i_state |= I_FREEING // set inode state
inode_lru_isolate
__iget(ib)
spin_unlock(&ib->i_lock)
spin_unlock(lru_lock)
rm file B
ib->nlink = 0
rm file A
iput(ia)
ubifs_evict_inode(ia)
ubifs_jnl_delete_inode(ia)
ubifs_jnl_write_inode(ia)
make_reservation(BASEHD) // Lock wbuf->io_mutex
ubifs_iget(ixa->i_ino)
iget_locked
find_inode_fast
__wait_on_freeing_inode(ixa)
| iput(ib) // ib->nlink is 0, do evict
| ubifs_evict_inode
| ubifs_jnl_delete_inode(ib)
↓ ubifs_jnl_write_inode
ABBA deadlock ←-----make_reservation(BASEHD)
dispose_list // cannot be executed by prune_icache_sb
wake_up_bit(&ixa->i_state)
Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING
to pin the inode in memory while inode_lru_isolate() reclaims its pages
instead of using ordinary inode reference. This way inode deletion
cannot be triggered from inode_lru_isolate() thus avoiding the deadlock.
evict() is made to wait for I_LRU_ISOLATING to be cleared before
proceeding with inode cleanup.
Filipe Manana [Mon, 12 Aug 2024 13:18:06 +0000 (14:18 +0100)]
btrfs: send: allow cloning non-aligned extent if it ends at i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
Filipe Manana [Sun, 11 Aug 2024 10:53:42 +0000 (11:53 +0100)]
btrfs: only run the extent map shrinker from kswapd tasks
Currently the extent map shrinker can be run by any task when attempting
to allocate memory and there's enough memory pressure to trigger it.
To avoid too much latency we stop iterating over extent maps and removing
them once the task needs to reschedule. This logic was introduced in commit b3ebb9b7e92a ("btrfs: stop extent map shrinker if reschedule is needed").
While that solved high latency problems for some use cases, it's still
not enough because with a too high number of tasks entering the extent map
shrinker code, either due to memory allocations or because they are a
kswapd task, we end up having a very high level of contention on some
spin locks, namely:
1) The fs_info->fs_roots_radix_lock spin lock, which we need to find
roots to iterate over their inodes;
2) The spin lock of the xarray used to track open inodes for a root
(struct btrfs_root::inodes) - on 6.10 kernels and below, it used to
be a red black tree and the spin lock was root->inode_lock;
3) The fs_info->delayed_iput_lock spin lock since the shrinker adds
delayed iputs (calls btrfs_add_delayed_iput()).
Instead of allowing the extent map shrinker to be run by any task, make
it run only by kswapd tasks. This still solves the problem of running
into OOM situations due to an unbounded extent map creation, which is
simple to trigger by direct IO writes, as described in the changelog
of commit 956a17d9d050 ("btrfs: add a shrinker for extent maps"), and
by a similar case when doing buffered IO on files with a very large
number of holes (keeping the file open and creating many holes, whose
extent maps are only released when the file is closed).
Qu Wenruo [Sun, 11 Aug 2024 23:22:44 +0000 (08:52 +0930)]
btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Josef Bacik [Thu, 11 Apr 2024 20:41:20 +0000 (16:41 -0400)]
btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Long Li [Fri, 9 Aug 2024 15:58:58 +0000 (08:58 -0700)]
net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Kalle Valo [Tue, 13 Aug 2024 09:51:21 +0000 (12:51 +0300)]
Merge tag 'ath-current-20240812' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
ath.git patch for v6.11
We have a single patch for the next 6.11-rc which introduces a
workaround to ath12k which addresses a WCN7850 hardware issue that
prevents proper operation with unaligned transmit buffers.
Benjamin Berg [Mon, 12 Aug 2024 11:06:40 +0000 (13:06 +0200)]
wifi: iwlwifi: correctly lookup DMA address in SG table
The code to lookup the scatter gather table entry assumed that it was
possible to use sg_virt() in order to lookup the DMA address in a mapped
scatter gather table. However, this assumption is incorrect as the DMA
mapping code may merge multiple entries into one. In that case, the DMA
address space may have e.g. two consecutive pages which is correctly
represented by the scatter gather list entry, however the virtual
addresses for these two pages may differ and the relationship cannot be
resolved anymore.
Avoid this problem entirely by working with the offset into the mapped
area instead of using virtual addresses. With that we only use the DMA
length and DMA address from the scatter gather list entries. The
underlying DMA/IOMMU code is therefore free to merge two entries into
one even if the virtual addresses space for the area is not continuous.
Yonghong Song [Mon, 12 Aug 2024 21:48:52 +0000 (14:48 -0700)]
selftests/bpf: Add a test to verify previous stacksafe() fix
A selftest is added such that without the previous patch,
a crash can happen. With the previous patch, the test can
run successfully. The new test is written in a way which
mimics original crash case:
main_prog
static_prog_1
static_prog_2
where static_prog_1 has different paths to static_prog_2
and some path has stack allocated and some other path
does not. A stacksafe() checking in static_prog_2()
triggered the crash.
Yonghong Song [Mon, 12 Aug 2024 21:48:47 +0000 (14:48 -0700)]
bpf: Fix a kernel verifier crash in stacksafe()
Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
Further investigation shows that the crash is due to invalid memory access
in stacksafe(). More specifically, it is the following code:
The 'i' iterates old->allocated_stack.
If cur->allocated_stack < old->allocated_stack the out-of-bound
access will happen.
To fix the issue add 'i >= cur->allocated_stack' check such that if
the condition is true, stacksafe() should fail. Otherwise,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
Leon Hwang [Sun, 28 Jul 2024 11:46:11 +0000 (19:46 +0800)]
bpf: Fix updating attached freplace prog in prog_array map
The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
fixed a NULL pointer dereference panic, but didn't fix the issue that
fails to update attached freplace prog to prog_array map.
Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
freplace prog and its target prog are able to tail call each other.
And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach")
sets prog->aux->dst_prog as NULL after attaching freplace prog to its
target prog.
After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS.
Then, after attaching freplace its prog->aux->dst_prog is NULL.
Then, while updating freplace in prog_array the bpf_prog_map_compatible()
incorrectly returns false because resolve_prog_type() returns
BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS.
After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS
and update to prog_array can succeed.
David Howells [Wed, 7 Aug 2024 18:38:46 +0000 (19:38 +0100)]
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly. The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct. Further, we secondarily sample the
cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
->init_request() rather than trying to juggle that in
netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
to be done, then PG_private_2 is to be used rather than only setting
it if we decide to cache and then having netfs_rreq_unlock_folios()
set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which
contains the deprecated code for using PG_private_2 to avoid
accidentally doing the writeback path - and always use it if
USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
wait for PG_private_2. This function is deprecated and only used by
ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
fscache_operation_valid() on the cache_resources instead. This has
the advantage of picking up the result of netfs_begin_cache_read() and
fscache_begin_write_operation() - which are called after the object is
initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.
Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
netfs: clean up after renaming FSCACHE_DEBUG config
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to
CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First,
NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas
the actual intended config is called NETFS_SUPPORT. Second, the config
renaming misses to adjust the documentation of the functionality of this
config.
libfs: fix infinite directory reads for offset dir
After we switch tmpfs dir operations from simple_dir_operations to
simple_offset_dir_operations, every rename happened will fill new dentry
to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
key starting with octx->newx_offset, and then set newx_offset equals to
free key + 1. This will lead to infinite readdir combine with rename
happened at the same time, which fail generic/736 in xfstests(detail show
as below).
1. create 5000 files(1 2 3...) under one dir
2. call readdir(man 3 readdir) once, and get one entry
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
4. loop 2~3, until readdir return nothing or we loop too many
times(tmpfs break test with the second condition)
We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
directory reads") to fix it, record the last_index when we open dir, and
do not emit the entry which index >= last_index. The file->private_data
now used in offset dir can use directly to do this, and we also update
the last_index when we llseek the dir file.
This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:
During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()). The cookie must not be
withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
When struct file_lease was split out from struct file_lock, the name of
the file_lock slab cache was copied to the new slab cache for
file_lease. This name conflict causes confusion in /proc/slabinfo and
/sys/kernel/slab. In particular, it caused failures in drgn's test case
for slab cache merging.
netfs: Fault in smaller chunks for non-large folio mappings
As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large
folio mappings"), we can see a performance loss for filesystems
which have not yet been converted to large folios.
Linus Torvalds [Mon, 12 Aug 2024 15:21:52 +0000 (08:21 -0700)]
Merge tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"While the ideapad concurrency fix itself is relatively
straightforward, it required moving code around and adding a bit of
supporting infrastructure to have a clean inter-driver interface. This
shows up in the diffstats.
- ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex
- amd/pmf: Query HPD data also when ALS is disabled"
* tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
platform/x86: ideapad-laptop: introduce a generic notification chain
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Zheng Zhang [Sat, 10 Aug 2024 05:26:51 +0000 (13:26 +0800)]
net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()
When there are multiple ap interfaces on one band and with WED on,
turning the interface down will cause a kernel panic on MT798X.
Previously, cb_priv was freed in mtk_wed_setup_tc_block() without
marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too.
Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL
in mtk_wed_setup_tc_block_cb().
----------
Unable to handle kernel paging request at virtual address 0072460bca32b4f5
Call trace:
mtk_wed_setup_tc_block_cb+0x4/0x38
0xffffffc0794084bc
tcf_block_playback_offloads+0x70/0x1e8
tcf_block_unbind+0x6c/0xc8
...
---------
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Zheng Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Haiyang Zhang [Fri, 9 Aug 2024 21:01:24 +0000 (14:01 -0700)]
net: mana: Fix RX buf alloc_size alignment and atomic op panic
The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to
create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment
is affected by the alloc_size passed into napi_build_skb(). The size needs
to be aligned properly for better performance and atomic operations.
Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic
operations may panic on the skb_shinfo(skb)->dataref due to alignment fault.
To fix this bug, add proper alignment to the alloc_size calculation.
Add missed property phys, which indicate how connect to serdes phy.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-lx2160a-honeycomb.dtb: fsl-mc@80c000000: dpmacs:ethernet@7: Unevaluated properties are not allowed ('phys' was unexpected)
David S. Miller [Mon, 12 Aug 2024 10:46:56 +0000 (11:46 +0100)]
Merge branch 'vsc73xx-fix-mdio-and-phy'
Pawel Dembicki says:
====================
net: dsa: vsc73xx: fix MDIO bus access and PHY opera
This series are extracted patches from net-next series [0].
The VSC73xx driver has issues with PHY configuration. This patch series
fixes most of them.
The first patch synchronizes the register configuration routine with the
datasheet recommendations.
Patches 2-3 restore proper communication on the MDIO bus. Currently,
the write value isn't sent to the MDIO register, and without a busy check,
communication with the PHY can be interrupted. This causes the PHY to
receive improper configuration and autonegotiation could fail.
The fourth patch removes the PHY reset blockade, as it is no longer
required.
After fixing the MDIO operations, autonegotiation became possible.
The last patch removes the blockade, which became unnecessary after
the MDIO operations fix.
Pawel Dembicki [Fri, 9 Aug 2024 19:38:05 +0000 (21:38 +0200)]
net: dsa: vsc73xx: allow phy resetting
Resetting the VSC73xx PHY was problematic because the MDIO bus, without
a busy check, read and wrote incorrect register values.
My investigation indicates that resetting the PHY only triggers changes
in configuration. However, improper register values written earlier
were only exposed after a soft reset.
The reset itself wasn't the issue; rather, the problem stemmed from
incorrect read and write operations.
A 'soft_reset' can now proceed normally. There are no reasons to keep
the VSC73xx from being reset.
This commit removes the reset blockade in the 'vsc73xx_phy_write'
function.
Pawel Dembicki [Fri, 9 Aug 2024 19:38:04 +0000 (21:38 +0200)]
net: dsa: vsc73xx: check busy flag in MDIO operations
The VSC73xx has a busy flag used during MDIO operations. It is raised
when MDIO read/write operations are in progress. Without it, PHYs are
misconfigured and bus operations do not work as expected.
In axiethernet header fix register defines comment description to be
inline with IP documentation. It updates MAC configuration register,
MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Radhey Shyam Pandey <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Sun, 11 Aug 2024 17:20:29 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Fix 32-bit PTI for real.
pti_clone_entry_text() is called twice, once before initcalls so that
initcalls can use the user-mode helper and then again after text is
set read only. Setting read only on 32-bit might break up the PMD
mapping, which makes the second invocation of pti_clone_entry_text()
find the mappings out of sync and failing.
Allow the second call to split the existing PMDs in the user mapping
and synchronize with the kernel mapping.
- Don't make acpi_mp_wake_mailbox read-only after init as the mail box
must be writable in the case that CPU hotplug operations happen after
boot. Otherwise the attempt to start a CPU crashes with a write to
read only memory.
- Add a missing sanity check in mtrr_save_state() to ensure that the
fixed MTRR MSRs are supported.
Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
the WARN_ON() can bring systems down when panic on warn is set.
* tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed MTRRs exist before saving them
x86/paravirt: Fix incorrect virt spinlock setting on bare metal
x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
x86/mm: Fix PTI for i386 some more
Linus Torvalds [Sun, 11 Aug 2024 17:15:34 +0000 (10:15 -0700)]
Merge tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time keeping fixes from Thomas Gleixner:
- Fix a couple of issues in the NTP code where user supplied values are
neither sanity checked nor clamped to the operating range. This
results in integer overflows and eventualy NTP getting out of sync.
According to the history the sanity checks had been removed in favor
of clamping the values, but the clamping never worked correctly under
all circumstances. The NTP people asked to not bring the sanity
checks back as it might break existing applications.
Make the clamping work correctly and add it where it's missing
- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
so it can adjust and if the clock was set into the future expire
timers if needed. The caller should provide a bitmask to tell
hrtimers which clocks have been adjusted.
adjtimex() uses not the proper constant and uses CLOCK_REALTIME
instead, which is 0. So hrtimers adjusts only the clocks, but does
not check for expired timers, which might make them expire really
late. Use the proper bitmask constant instead.
* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
ntp: Safeguard against time_constant overflow
ntp: Clamp maxerror and esterror to operating range
Linus Torvalds [Sun, 11 Aug 2024 17:07:52 +0000 (10:07 -0700)]
Merge tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:
- The interrupt core fails to honor caller supplied affinity hints
for non-managed interrupts and uses the system default affinity on
startup instead. Set the missing flag in the descriptor to tell the
core to use the provided affinity.
- Fix a shift out of bounds error in the Xilinx driver
- Handle switching to level trigger correctly in the RISCV APLIC
driver. It failed to retrigger the interrupt which causes it to
become stale"
* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
irqchip/xilinx: Fix shift out of bounds
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
Linus Torvalds [Sun, 11 Aug 2024 16:55:32 +0000 (09:55 -0700)]
Merge tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported issues for
6.11-rc3. Included in here are:
- usb serial driver MODULE_DESCRIPTION() updates
- usb serial driver fixes
- typec driver fixes
- usb-ip driver fix
- gadget driver fixes
- dt binding update
All of these have been in linux-next with no reported issues"
* tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
usb: gadget: f_fs: restore ffs_func_disable() functionality
USB: serial: debug: do not echo input by default
usb: typec: tipd: Delete extra semi-colon
usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
usb: gadget: u_serial: Set start_delayed during suspend
usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
usb: typec: fsa4480: Check if the chip is really there
usb: gadget: core: Check for unset descriptor
usb: vhci-hcd: Do not drop references before new references are gained
usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
usb: gadget: midi2: Fix the response for FB info with block 0xff
dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
USB: serial: garmin_gps: use struct_size() to allocate pkt
USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
USB: serial: add missing MODULE_DESCRIPTION() macros
USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
Linus Torvalds [Sun, 11 Aug 2024 16:51:29 +0000 (09:51 -0700)]
Merge tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported problems
for 6.11-rc3. Included in here are:
- sc16is7xx serial driver fixes
- uartclk bugfix for a divide by zero issue
- conmakehash userspace build issue fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: conmakehash: cope with abs_srctree no longer in env
serial: sc16is7xx: fix invalid FIFO access with special register set
serial: sc16is7xx: fix TX fifo corruption
serial: core: check uartclk for zero to avoid divide by zero
Linus Torvalds [Sun, 11 Aug 2024 16:38:38 +0000 (09:38 -0700)]
Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / documentation fixes from Greg KH:
"Here are some small fixes, and some documentation updates for
6.11-rc3. Included in here are:
- embargoed hardware documenation updates based on a lot of review by
legal-types in lots of companies to try to make the process a _bit_
easier for us to manage over time.
- rust firmware documentation fix
- driver detach race fix for the fix that went into 6.11-rc1
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix uevent_show() vs driver detach race
Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
rust: firmware: fix invalid rustdoc link