Linus Torvalds [Fri, 15 Nov 2024 17:59:51 +0000 (09:59 -0800)]
Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
"One more fix for v6.12-rc7
ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
the operation to call kfuncs which shouldn't be allowed. Fix it by
using SCX_KF_REST instead, which is trivial and low risk"
* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
Tejun Heo [Thu, 14 Nov 2024 18:50:58 +0000 (08:50 -1000)]
sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
ops.cpu_acquire() is currently called with a kf_mask of 0, which is interpreted
as SCX_KF_UNLOCKED and allows all unlocked kfuncs. However, ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed to call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.
Linus Torvalds [Thu, 14 Nov 2024 18:00:23 +0000 (10:00 -0800)]
Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"This fixes one minor regression from the btree cache fixes (in the
scan_for_btree_nodes repair path) - and the shutdown path fix is the
big one here, in terms of bugs closed:
The shutdown path wasn't flushing the btree write buffer, leading
to shutting down while we still had operations in flight. This
fixes a whole slew of syzbot bugs, and undoubtedly other strange
heisenbugs.
* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix assertion pop in bch2_ptr_swab()
bcachefs: Fix journal_entry_dev_usage_to_text() overrun
bcachefs: Allow for unknown key types in backpointers fsck
bcachefs: Fix assertion pop in topology repair
bcachefs: Fix hidden btree errors when reading roots
bcachefs: Fix validate_bset() repair path
bcachefs: Fix missing validation for bch_backpointer.level
bcachefs: Fix bch_member.btree_bitmap_shift validation
bcachefs: bch2_btree_write_buffer_flush_going_ro()
Josef Bacik [Wed, 13 Nov 2024 16:05:13 +0000 (11:05 -0500)]
btrfs: fix incorrect comparison for delayed refs
When I reworked delayed ref comparison in cf4f04325b2b ("btrfs: move
->parent and ->ref_root into btrfs_delayed_ref_node"), I made a mistake
and returned -1 for the case where ref1->ref_root was greater than
ref2->ref_root. This is a subtle bug that can result in improper
delayed ref running order, which can result in transaction aborts.
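The effect of a comparator that returns -1 for both "less" and "greater" can be seen in a small userspace model. This is a Python sketch, not the btrfs code; cmp_buggy(), cmp_fixed() and the sample values are hypothetical names chosen for illustration.

```python
from functools import cmp_to_key

def cmp_buggy(a, b):
    # Models the mistake described above: "greater than" also returns -1.
    if a < b:
        return -1
    if a > b:
        return -1  # should be 1
    return 0

def cmp_fixed(a, b):
    # Proper three-way comparison.
    if a < b:
        return -1
    if a > b:
        return 1
    return 0

ref_roots = [5, 3, 9, 1]  # made-up ref_root values
sorted_fixed = sorted(ref_roots, key=cmp_to_key(cmp_fixed))

# A valid comparator is antisymmetric: cmp(a, b) == -cmp(b, a).
# The buggy one claims "less" in both directions.
antisymmetric = cmp_buggy(3, 5) == -cmp_buggy(5, 3)
```

Any ordered structure built on the buggy comparator can place or look up entries inconsistently, which matches the "improper running order" symptom in the message.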
net: sched: u32: Add test case for systematic hnode IDR leaks
Add a tdc test case to exercise the just-fixed systematic leak of
IDR entries in u32 hnode disposal. Given the IDR in question is
confined to the range [1..0x7FF], it is sufficient to create/delete
the same filter 2048 times to fill it up and get a nonzero exit
status from "tc filter add".
====================
bonding: fix ns targets not work on hardware NIC
The first patch fixes ns targets not working on hardware NICs when bonding
sets arp_validate.
The second patch adds a related selftest for bonding.
v4: Thanks Nikolay for the comments:
use bond_slave_ns_maddrs_{add/del} with clear name
fix comments typos
remove _slave_set_ns_maddrs underscore directly
update bond_option_arp_validate_set() change logic
v3: use ndisc_mc_map to convert the mcast mac address (Jay Vosburgh)
v2: only add/del mcast group on backup slaves when arp_validate is set (Jay Vosburgh)
arp_validate doesn't support 3ad, tlb, alb. So let's only do it on ab mode.
====================
Hangbin Liu [Mon, 11 Nov 2024 10:16:49 +0000 (10:16 +0000)]
bonding: add ns target multicast address to slave device
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu
Reviewed-by: Nikolay Aleksandrov
Signed-off-by: Paolo Abeni
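For readers unfamiliar with why a NIC must join a group to see NS messages, the address arithmetic can be sketched as follows. This is a userspace Python illustration of the standard mappings (solicited-node multicast per RFC 4291, Ethernet mapping per RFC 2464), not the bonding driver code; the function names and the 2001:db8:: target are hypothetical.

```python
import ipaddress

def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """ff02::1:ff00:0/104 plus the low 24 bits of the target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """33:33 followed by the low 32 bits of the IPv6 multicast group."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)

# Hypothetical NS target from the documentation prefix.
group = solicited_node_multicast("2001:db8::1234:5678")
mac = multicast_mac(group)
```

A NIC that filters multicast in hardware only passes frames for MAC addresses it has been told to listen on, which is why the slave has to be subscribed to the address derived from each NS target.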
Meghana Malladi [Mon, 11 Nov 2024 09:58:42 +0000 (15:28 +0530)]
net: ti: icssg-prueth: Fix 1 PPS sync
The first PPS latch time needs to be calculated by the driver
(in rounded off seconds) and configured as the start time
offset for the cycle. After synchronizing two PTP clocks
running as master/slave, missing this would cause master
and slave to start immediately with some milliseconds
drift which causes the PPS signal to never synchronize with
the PTP master.
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
The mediatek,mac-wol property is being handled backwards to what is
described in the binding: it currently enables PHY WOL when the property
is present and vice versa. Invert the driver logic so it matches the
binding description.
Breno Leitao [Fri, 8 Nov 2024 14:08:36 +0000 (06:08 -0800)]
ipmr: Fix access to mfc_cache_list without lock held
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is enabled. The same problem might
show up in the IPv6 code path.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
This is not a problem per se, since the RTNL lock is held here, so it
is safe to iterate over the list without the RCU read lock, as suggested
by Eric.
To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.
The annotation will raise an error only if neither RTNL nor the RCU read
lock is held during iteration, signaling a legitimate problem; otherwise it
will avoid this false positive.
This also solves the IPv6 case, since ip6mr_rtm_dumproute() calls
this function as well.
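The condition the annotation expresses can be written as a small truth table. This is a Python model of the check only, with a hypothetical function name; in the kernel the extra condition is passed as the optional last argument of list_for_each_entry_rcu().

```python
def rcu_list_check_complains(rcu_read_lock_held: bool, rtnl_held: bool) -> bool:
    """Return True when the modelled lockdep check would report a problem.

    The traversal is considered safe if either the RCU read lock or the
    extra condition (here: RTNL held) is satisfied.
    """
    return not (rcu_read_lock_held or rtnl_held)

# Enumerate all four combinations: (rcu_held, rtnl_held) -> complains?
cases = {
    (rcu, rtnl): rcu_list_check_complains(rcu, rtnl)
    for rcu in (False, True)
    for rtnl in (False, True)
}
```

Only the combination with neither lock held is flagged, so the dump path running under RTNL no longer produces the splat.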
Wei Fang [Tue, 12 Nov 2024 03:03:47 +0000 (11:03 +0800)]
samples: pktgen: correct dev to DEV
In the pktgen_sample01_simple.sh script, the device variable is uppercase
'DEV' instead of lowercase 'dev'. Because of this typo, the script cannot
enable UDP tx checksum.
net: phylink: ensure PHY momentary link-fails are handled
Normally, phylib won't notify changes in quick succession. However, as
a result of commit 3e43b903da04 ("net: phy: Immediately call
adjust_link if only tx_lpi_enabled changes") this is no longer true -
it is now possible that phy_link_down() and phy_link_up() will both
complete before phylink's resolver has run, which means it'll miss that
pl->phy_state.link momentarily became false.
Rename "mac_link_dropped" to be more generic "link_failed" since it will
cover more than the MAC/PCS end of the link failing, and arrange to set
this in phylink_phy_change() if we notice that the PHY reports that the
link is down.
This will ensure that we capture an EEE reconfiguration event.
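The race can be illustrated with a toy state machine. This is a Python sketch under simplifying assumptions, not phylink itself: LinkModel, phy_change() and resolve() are hypothetical names, and the resolver is reduced to "compare against the last state acted on".

```python
class LinkModel:
    def __init__(self, latch_failures: bool):
        self.latch_failures = latch_failures
        self.phy_link = True       # state currently reported by the PHY
        self.link_failed = False   # sticky: link dropped since last resolve
        self.resolved_link = True  # state the resolver last acted on
        self.actions = []          # link transitions the resolver performed

    def phy_change(self, up: bool):
        self.phy_link = up
        if self.latch_failures and not up:
            self.link_failed = True  # remember the momentary failure

    def resolve(self):
        link = self.phy_link
        if self.link_failed:
            link = False             # force a link-down pass first
            self.link_failed = False
        if link != self.resolved_link:
            self.actions.append("up" if link else "down")
            self.resolved_link = link

def bounce(model):
    # Down and up both complete before the resolver gets to run.
    model.phy_change(False)
    model.phy_change(True)
    model.resolve()
    model.resolve()
    return model.actions

without_latch = bounce(LinkModel(latch_failures=False))
with_latch = bounce(LinkModel(latch_failures=True))
```

Without the sticky flag the resolver sees "up" before and after and does nothing; with it, the bounce is replayed as a down followed by an up.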
Jakub Kicinski [Thu, 14 Nov 2024 02:51:09 +0000 (18:51 -0800)]
Merge branch 'mptcp-pm-a-few-more-fixes'
Matthieu Baerts says:
====================
mptcp: pm: a few more fixes
Three small fixes related to the MPTCP path-manager:
- Patch 1: correctly reflect the backup flag to the corresponding local
address entry of the userspace path-manager. A fix for v5.19.
- Patch 2: hold the PM lock when deleting an entry from the local
addresses of the userspace path-manager to avoid corrupting this
list. A fix for v5.19.
- Patch 3: use _rcu variant to iterate the in-kernel path-manager's
local addresses list, when under rcu_read_lock(). A fix for v5.17.
====================
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
used as expected to iterate over the list of local addresses, but
list_for_each_entry() was used instead of list_for_each_entry_rcu() in
__lookup_addr(). It is important to use this variant which adds the
required READ_ONCE() (and diagnostic checks if enabled).
Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
is called under the pernet->lock and not rcu_read_lock(), an extra
condition is then passed to help the diagnostic checks making sure
either the associated spin lock or the RCU lock is held.
Geliang Tang [Tue, 12 Nov 2024 19:18:33 +0000 (20:18 +0100)]
mptcp: update local address flags when setting it
Just like the in-kernel pm, when the userspace pm handles set_flags, it needs
to send out an MP_PRIO signal and also modify the flags of the corresponding
address entry in the local address list. This patch implements the missing logic.
Traverse all address entries on userspace_pm_local_addr_list to find the
local address entry. If bkup is true, set FLAG_BACKUP in the flags of this
entry; otherwise, clear FLAG_BACKUP.
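The traversal described above amounts to a find-and-update over a list. The following Python sketch models it with dictionaries; the bit position of FLAG_BACKUP, the entry layout and the function name are assumptions made for illustration, not the kernel definitions.

```python
FLAG_BACKUP = 1 << 2  # assumed bit position, for illustration only

def set_backup(local_addrs, addr, bkup):
    """Set or clear the backup flag on the matching entry.

    Returns True if an entry for `addr` was found, False otherwise.
    """
    for entry in local_addrs:
        if entry["addr"] == addr:
            if bkup:
                entry["flags"] |= FLAG_BACKUP
            else:
                entry["flags"] &= ~FLAG_BACKUP
            return True
    return False

# Hypothetical local address list with two entries.
addrs = [
    {"addr": "10.0.1.1", "flags": 0},
    {"addr": "10.0.2.1", "flags": 1},
]
found = set_backup(addrs, "10.0.2.1", bkup=True)
missing = set_backup(addrs, "10.0.9.9", bkup=True)
```

In the kernel the list walk additionally has to happen under the appropriate PM lock; the model leaves locking out.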
Linus Torvalds [Wed, 13 Nov 2024 21:32:51 +0000 (13:32 -0800)]
Merge tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a locking issue in the asymmetric CPU capacity setup code in the
intel_pstate driver that may lead to a deadlock if CPU online/offline
runs in parallel with the code in question, which is unlikely but not
impossible (Rafael Wysocki)"
* tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Linus Torvalds [Wed, 13 Nov 2024 21:28:58 +0000 (13:28 -0800)]
Merge tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes for TPM bus encryption (the remaining reported issues in
the feature)"
* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Disable TPM on tpm2_create_primary() failure
tpm: Opt-in in disable PCR integrity protection
Jarkko Sakkinen [Wed, 13 Nov 2024 05:54:12 +0000 (07:54 +0200)]
tpm: Opt-in in disable PCR integrity protection
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.
In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).
Linus Torvalds [Wed, 13 Nov 2024 17:14:19 +0000 (09:14 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
Linus Torvalds [Wed, 13 Nov 2024 17:09:00 +0000 (09:09 -0800)]
Merge tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- set up logical-physical CPU mapping for all possible CPUs, in order to
avoid CPU hotplug issues
- fix some KASAN bugs
- fix AP booting issue in VM mode
- some trivial cleanups
* tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix AP booting issue in VM mode
LoongArch: Add WriteCombine shadow mapping in KASAN
LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
LoongArch: Make KASAN work with 5-level page-tables
LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
LoongArch: Fix early_numa_add_cpu() usage for FDT systems
LoongArch: For all possible CPUs setup logical-physical CPU mapping
Linus Torvalds [Wed, 13 Nov 2024 16:58:11 +0000 (08:58 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
echo $i
tc filter del dev myve $FILTER2
tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that
deserve it.
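The leak is easy to reproduce in a toy model. The sketch below is Python, not cls_u32: TinyIdr is a stand-in for the kernel IDR, and the handle layout (ID in bits 20..30 plus a marker in bit 31) is an assumption made for illustration.

```python
ID_MIN, ID_MAX = 1, 0x7FF  # ID range quoted in the related test-case commit

class TinyIdr:
    """Toy stand-in for the kernel IDR: hands out the lowest free ID."""
    def __init__(self):
        self.used = set()

    def alloc(self):
        for i in range(ID_MIN, ID_MAX + 1):
            if i not in self.used:
                self.used.add(i)
                return i
        return None  # every ID is taken

    def remove(self, key):
        self.used.discard(key)  # unknown keys are silently ignored

def id_to_handle(i):
    # Assumed encoding of the small integer into a 32-bit handle.
    return ((i | 0x800) << 20) & 0xFFFFFFFF

def handle_to_id(h):
    # The decoding step that was missing at disposal time.
    return (h >> 20) & 0x7FF

def churn(decode, rounds=2048):
    """Create and delete one hnode `rounds` times; return rounds completed."""
    idr = TinyIdr()
    for n in range(rounds):
        i = idr.alloc()
        if i is None:
            return n
        h = id_to_handle(i)
        idr.remove(handle_to_id(h) if decode else h)
    return rounds

leaky_rounds = churn(decode=False)
fixed_rounds = churn(decode=True)
```

Passing the encoded handle to remove() never matches a stored ID, so every round leaks one entry until allocation fails; with decoding, the same ID is reused indefinitely.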
Removing full driver sections also removed mailing list entries, causing
submitters of future patches to forget CCing these mailing lists.
Hence re-add the sections for the Renesas Ethernet AVB, R-Car SATA, and
SuperH Ethernet drivers. Add people who volunteered to maintain these
drivers (thanks a lot!), and mark all of them as supported.
Jakub Kicinski [Wed, 13 Nov 2024 01:30:41 +0000 (17:30 -0800)]
Merge tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btintel: Direct exception event to bluetooth stack
- hci_core: Fix calling mgmt_device_connected
* tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
====================
The syzbot console output indicates a virtual environment where swapfile
is on a rotational device. In this case, clusters aren't actually used,
and si->full_clusters is not initialized. Daan's report is from qemu, so
likely rotational too.
Make sure to only schedule the cluster reclaim work when clusters are
actually in use.
Si-Wei Liu [Mon, 21 Oct 2024 13:40:39 +0000 (16:40 +0300)]
vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
When calculating the physical address range based on the iotlb and mr
[start,end) ranges, the offset of mr->start relative to map->start
is not taken into account. This leads to some incorrect and duplicate
mappings.
For the case when mr->start < map->start the code is already correct:
the range in [mr->start, map->start) was handled by a different
iteration.
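The offset calculation can be sketched with plain interval arithmetic. This is a Python illustration under assumptions, not the mlx5 vdpa code: overlap_pa() is a hypothetical helper, ranges are half-open, and map_pa stands for the physical address that corresponds to map_start.

```python
def overlap_pa(map_start, map_end, map_pa, mr_start, mr_end):
    """Return (physical_address, length) of the part of the iotlb map
    [map_start, map_end) covered by the mr range [mr_start, mr_end),
    or None if the two ranges do not overlap."""
    lo = max(map_start, mr_start)
    hi = min(map_end, mr_end)
    if lo >= hi:
        return None
    # Offset of the overlap inside the map; zero when mr_start <= map_start.
    offset = lo - map_start
    return (map_pa + offset, hi - lo)

# mr starts in the middle of the map: the PA must be shifted by 0x2000.
unaligned = overlap_pa(0x1000, 0x5000, 0x80000, 0x3000, 0x6000)
# mr starts before the map: no shift is needed.
aligned = overlap_pa(0x1000, 0x5000, 0x80000, 0x0000, 0x2000)
```

Dropping the offset term in the first case would map the region to 0x80000 instead of 0x82000, which is the kind of incorrect, duplicate mapping the message describes.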
Linus Torvalds [Tue, 12 Nov 2024 21:35:13 +0000 (13:35 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 and selftests fixes.
x86:
- When emulating a guest TLB flush for a nested guest, flush vpid01,
not vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if
L2 and L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0'
to userspace instead of -errno on failure.
- Move the Intel PT virtualization (i.e. outputting host trace to
host buffer and guest trace to guest buffer) behind CONFIG_BROKEN.
- Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START
- Fix a bug where KVM fails to inject an interrupt from the IRR after
KVM_SET_LAPIC.
Selftests:
- Increase the timeout for the memslot performance selftest to avoid
false failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized
a bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug
failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the
uarch is supported by the compiler.
- Fix broken compilation of kvm selftests after a header sync in
tools/"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
KVM: x86: Unconditionally set irr_pending when updating APICv state
kvm: svm: Fix gctx page leak on invalid inputs
KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB
KVM: SVM: Propagate error from snp_guest_req_init() to userspace
KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
KVM: selftests: Don't force -march=x86-64-v2 if it's unsupported
KVM: selftests: Disable strict aliasing
KVM: selftests: fix unintentional noop test in guest_memfd_test.c
KVM: selftests: memslot_perf_test: increase guest sync timeout
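The guest_memfd selftest fix mentioned in the pull message is an instance of a common loop bug: a bit-walking loop that starts at zero never executes. The Python sketch below models a C loop of the assumed shape `for (flag = start; flag; flag <<= 1)` on a 64-bit value; the exact loop in guest_memfd_test.c is not reproduced here.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned variable

def count_iterations(start: int) -> int:
    """Count passes of a C-style `for (flag = start; flag; flag <<= 1)`."""
    flag = start & MASK64
    n = 0
    while flag:
        n += 1
        flag = (flag << 1) & MASK64  # the top bit falls off, as in C
    return n

noop_loop = count_iterations(0)  # initialized to zero: body never runs
real_loop = count_iterations(1)  # initialized to BIT(0): visits every bit
```

A test built around the zero-initialized loop passes trivially because it checks nothing, which is why such bugs tend to go unnoticed.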
Linus Torvalds [Tue, 12 Nov 2024 21:21:07 +0000 (13:21 -0800)]
Merge tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- fix warnings about duplicate slab cache names
* tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-cache: fix warnings about duplicate slab caches
dm-bufio: fix warnings about duplicate slab caches
Linus Torvalds [Tue, 12 Nov 2024 21:06:31 +0000 (13:06 -0800)]
Merge tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"One bug fix, one performance improvement, and the use of
static_assert:
- The bug fix addresses an "only a cosmetic change" commit, which didn't
take into account the original 'ima' template definition.
- The performance improvement limits the atomic_read()"
* tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Use static_assert() to check struct sizes
evm: stop avoidably reading i_writecount in evm_file_release
ima: fix buffer overrun in ima_eventdigest_init_common
Linus Torvalds [Tue, 12 Nov 2024 21:01:09 +0000 (13:01 -0800)]
Merge tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"This fixes issues in the Landlock's sandboxer sample and
documentation, slightly refactors helpers (required for ongoing patch
series), and improves/fixes a feature merged in v6.12 (signal and
abstract UNIX socket scoping)"
* tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Optimize scope enforcement
landlock: Refactor network access mask management
landlock: Refactor filesystem access mask management
samples/landlock: Clarify option parsing behaviour
samples/landlock: Refactor help message
samples/landlock: Fix port parsing in sandboxer
landlock: Fix grammar issues in documentation
landlock: Improve documentation of previous limitations
Donet Tom [Sun, 10 Nov 2024 06:49:03 +0000 (00:49 -0600)]
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
This test verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping. To test this, we read the
count of free hugepages before and after the mmap, DIO, and munmap
operations, then check if the free hugepage count is the same.
Reading free hugepages before the test was removed by commit 0268d4579901
('selftests: hugetlb_dio: check for initial conditions to skip at the
start'), causing the test to always fail.
This patch adds back reading the free hugepages before starting the test.
With this patch, the tests are now passing.
Test results without this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 1 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 2 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 3 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 4 : Huge pages not freed!
# Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0
Test results with this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 1 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 2 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 3 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 4 : Huge pages freed successfully !
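The before/after comparison that the test relies on can be sketched as follows. The real selftest is written in C; this Python version is only an illustration, the helper names are hypothetical, and the sample text mimics the HugePages_Free line of /proc/meminfo rather than reading the live file.

```python
def hugepages_free(meminfo_text: str) -> int:
    """Extract the HugePages_Free counter from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Free:"):
            return int(line.split()[1])
    raise ValueError("HugePages_Free not found")

def pages_were_freed(before_text: str, after_text: str) -> bool:
    # The check is only meaningful if `before_text` was really sampled
    # before the mmap/DIO/munmap sequence.
    return hugepages_free(before_text) == hugepages_free(after_text)

sample_before = "HugePages_Total:     100\nHugePages_Free:      100\n"
sample_after = "HugePages_Total:     100\nHugePages_Free:      100\n"
ok = pages_were_freed(sample_before, sample_after)

# With the "before" value never read (left at 0), the comparison fails
# even though nothing leaked, matching the failing output above.
never_read = "HugePages_Free:        0\n"
broken = pages_were_freed(never_read, sample_after)
```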
Hugh Dickins [Sun, 10 Nov 2024 21:11:21 +0000 (13:11 -0800)]
mm/thp: fix deferred split queue not partially_mapped: fix
Though even more elusive than before, list_del corruption has still been
seen on THP's deferred split queue.
The idea in commit e66f3185fa04 was right, but its implementation wrong.
The context omitted an important comment just before the critical test:
"split_folio() removes folio from list on success." In ignoring that
comment, when a THP split succeeded, the code went on to release the
preceding safe folio, preserving instead an irrelevant (formerly head)
folio: which gives no safety because it's not on the list. Fix the logic.
John Hubbard [Tue, 5 Nov 2024 03:29:44 +0000 (19:29 -0800)]
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.
A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an allocation
of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().
In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation. The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.
Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.
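The 2GB figure follows from simple arithmetic, worked through below in Python. PAGE_SIZE comes from the message; the pointer size and MAX_PAGE_ORDER = 10 are assumptions for a typical x86_64 configuration.

```python
PAGE_SIZE = 4096      # 4KB pages, as stated in the commit message
POINTER_SIZE = 8      # assumed: bytes per pointer on a 64-bit system
MAX_PAGE_ORDER = 10   # assumed: common x86_64 default

def pointer_array_bytes(pin_bytes: int) -> int:
    """Size of an array holding one pointer per pinned page."""
    return (pin_bytes // PAGE_SIZE) * POINTER_SIZE

# Largest physically contiguous allocation under these assumptions: 4 MiB.
kmalloc_limit = PAGE_SIZE << MAX_PAGE_ORDER

array_for_2gb = pointer_array_bytes(2 * 1024**3)
array_for_3gb = pointer_array_bytes(3 * 1024**3)
```

Pinning 2GB needs 524288 pointers, i.e. exactly 4 MiB, so anything larger would need an array beyond the limit. Reusing the caller's existing **pages array removes the allocation and the ceiling with it.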
Kiran K [Tue, 22 Oct 2024 09:11:34 +0000 (14:41 +0530)]
Bluetooth: btintel: Direct exception event to bluetooth stack
Make exception events part of the HCI traces, which helps with debugging.
snoop traces:
> HCI Event: Vendor (0xff) plen 79
Vendor Prefix (0x8780)
Intel Extended Telemetry (0x03)
Unknown extended telemetry event type (0xde)
01 01 de
Unknown extended subevent 0x07
01 01 de 07 01 de 06 1c ef be ad de ef be ad de
ef be ad de ef be ad de ef be ad de ef be ad de
ef be ad de 05 14 ef be ad de ef be ad de ef be
ad de ef be ad de ef be ad de 43 10 ef be ad de
ef be ad de ef be ad de ef be ad de
Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K
Signed-off-by: Luiz Augusto von Dentz
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no longer any need to call
mgmt_device_connected, as ACL data will be queued until the BT_CONNECTED
state is reached.
Michal Luczaj [Thu, 7 Nov 2024 20:46:13 +0000 (21:46 +0100)]
vsock: Fix sk_error_queue memory leak
The kernel queues MSG_ZEROCOPY completion notifications on the error queue,
where they remain until explicitly recv()ed. To prevent memory leaks,
clean up the queue when the socket is destroyed.
Michal Luczaj [Thu, 7 Nov 2024 20:46:12 +0000 (21:46 +0100)]
virtio/vsock: Fix accept_queue memory leak
As the final stages of socket destruction may be delayed, it is possible
that virtio_transport_recv_listen() will be called after the accept_queue
has been flushed, but before the SOCK_DONE flag has been set. As a result,
sockets enqueued after the flush would remain unremoved, leading to a
memory leak.
Bibo Mao [Tue, 12 Nov 2024 08:35:39 +0000 (16:35 +0800)]
LoongArch: Fix AP booting issue in VM mode
Native IPI is used for AP booting, because it is the booting interface
between OS and BIOS firmware. The paravirt IPI is only used inside OS,
and native IPI is necessary to boot AP.
When booting an AP, we write the kernel entry address into the HW mailbox of
the AP and send an IPI interrupt to it. The AP executes the idle instruction and
waits for interrupts or SW events, then clears the IPI interrupt and jumps to the
kernel entry from the HW mailbox.
Between writing the HW mailbox and sending the IPI, the AP can be woken up by SW
events and jump to the kernel entry, so the ACTION_BOOT_CPU IPI interrupt
will stay pending during AP booting. The native IPI interrupt handler
needs to be registered so that it can clear the pending native IPI; otherwise
there will be endless interrupts during the AP booting stage.
Here the native IPI interrupt is initialized even if paravirt IPI is used.
Kanglong Wang [Tue, 12 Nov 2024 08:35:39 +0000 (16:35 +0800)]
LoongArch: Add WriteCombine shadow mapping in KASAN
Currently, the kernel cannot boot when ARCH_IOREMAP, ARCH_WRITECOMBINE
and KASAN are enabled together, because DMW2, which is configured as
0xa000000000000000 for WriteCombine, is now used by the kernel but KASAN has
no segment mapping for it. This patch fixes the issue.
Solution: Add the relevant definitions for WriteCombine (DMW2) in KASAN.
Huacai Chen [Tue, 12 Nov 2024 08:35:39 +0000 (16:35 +0800)]
LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will
overflow UINTPTR_MAX because KASAN_SHADOW_START/KASAN_SHADOW_END are
aligned up by PGDIR_SIZE. And then the overflowed KASAN_SHADOW_END looks
like a user space address.
For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too large
for Loongson-2K series whose cpu_vabits = 39.
Since CONFIG_4KB_4LEVEL is completely legal for CPUs with cpu_vabits <=
39, we just disable KASAN via early return in kasan_init(). Otherwise we
get a boot failure.
Moreover, we change KASAN_SHADOW_END from the first address after the KASAN
shadow area to the last address in the KASAN shadow area, in order to avoid
the end address overflowing to exactly 0 (which is a legal case). We don't
need to worry about alignment because pgd_addr_end() can handle it.
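The wrap-around can be demonstrated with 64-bit arithmetic. The Python sketch below uses a made-up unaligned address near the top of the address space, not the real LoongArch shadow layout; only PGDIR_SIZE = 2^39 is taken from the message, and align_up() is a hypothetical helper.

```python
MASK64 = (1 << 64) - 1   # emulate 64-bit unsigned wrap-around
PGDIR_SIZE = 1 << 39     # CONFIG_4KB_4LEVEL value from the message

def align_up(addr: int, size: int) -> int:
    """Round addr up to a multiple of size, truncated to 64 bits."""
    return ((addr + size - 1) & ~(size - 1)) & MASK64

# Hypothetical exclusive end address in the last PGDIR_SIZE-sized slot.
raw_end = 0xFFFF_FFFF_0000_0000

exclusive_end = align_up(raw_end, PGDIR_SIZE)  # wraps past 2^64
inclusive_end = (exclusive_end - 1) & MASK64   # last address in the area
```

Here the exclusive end wraps to 0 and compares lower than the start, while the inclusive end stays at the top of the address space, which is the motivation for switching to a "last address" definition.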
Huacai Chen [Tue, 12 Nov 2024 08:35:39 +0000 (16:35 +0800)]
LoongArch: Make KASAN work with 5-level page-tables
Make KASAN work with 5-level page-tables, including:
1. Implement and use __pgd_none() and kasan_p4d_offset().
2. As done in kasan_pmd_populate() and kasan_pte_populate(), restrict
the loop conditions of kasan_p4d_populate() and kasan_pud_populate()
to avoid unnecessary population.
Yuli Wang [Tue, 12 Nov 2024 08:35:39 +0000 (16:35 +0800)]
LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
This is a trivial cleanup: commit c62da0c35d58518d ("mm/vma: define a
default value for VM_DATA_DEFAULT_FLAGS") has unified the default values of
VM_DATA_DEFAULT_FLAGS across different platforms.
Huacai Chen [Tue, 12 Nov 2024 08:35:36 +0000 (16:35 +0800)]
LoongArch: For all possible CPUs setup logical-physical CPU mapping
In order to support ACPI-based physical CPU hotplug, cpu_logical_map() is
expected to work for all "possible" CPUs, because some drivers want to
use cpu_logical_map() for all "possible" CPUs, while currently we only
set up the logical-physical mapping for "present" CPUs. This lack of mapping
also prevents cpu_to_node() from working for hot-added CPUs.
All "possible" CPUs are listed in MADT, and the "present" subset is
marked as ACPI_MADT_ENABLED. To set up logical-physical CPU mapping for
all possible CPUs and keep present CPUs continuous in cpu_present_mask,
we parse MADT twice. The first pass handles CPUs with ACPI_MADT_ENABLED
and the second pass handles CPUs without ACPI_MADT_ENABLED.
The global flag (cpu_enumerated) is removed because acpi_map_cpu() calls
cpu_number_map() rather than set_processor_mask() now.
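The two-pass enumeration can be sketched as follows. This is a Python model with hypothetical names; MADT entries are reduced to (physical_id, enabled) pairs and logical IDs are simply assigned in visiting order.

```python
def assign_logical_ids(madt_entries):
    """Map logical CPU numbers to physical IDs in two passes.

    madt_entries: list of (physical_id, enabled) tuples in table order.
    Returns (logical_map, present), where logical_map maps logical CPU ->
    physical ID and present lists the logical CPUs that are enabled.
    """
    logical_map = {}
    present = []
    for want_enabled in (True, False):  # pass 1: enabled, pass 2: the rest
        for phys, enabled in madt_entries:
            if enabled == want_enabled:
                cpu = len(logical_map)  # next free logical number
                logical_map[cpu] = phys
                if enabled:
                    present.append(cpu)
    return logical_map, present

# Hypothetical table in which enabled and disabled entries alternate.
entries = [(0, True), (1, False), (2, True), (3, False)]
logical_map, present = assign_logical_ids(entries)
```

Because all enabled entries are numbered first, the present CPUs occupy logical numbers 0..N-1 without holes, while every possible CPU still gets a mapping.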
Carolina Jubran [Thu, 7 Nov 2024 18:35:27 +0000 (20:35 +0200)]
net/mlx5e: Disable loopback self-test on multi-PF netdev
In Multi-PF (Socket Direct) configurations, when a loopback packet is
sent through one of the secondary devices, it will always be received
on the primary device. This causes the loopback layer to fail in
identifying the loopback packet as the devices are different.
To avoid false test failures, disable the loopback self-test in
Multi-PF configurations.
Moshe Shemesh [Thu, 7 Nov 2024 18:35:26 +0000 (20:35 +0200)]
net/mlx5e: CT: Fix null-ptr-deref in add rule err flow
In the error flow of mlx5_tc_ct_entry_add_rule(), in case the ct_rule_add()
callback returns an error, zone_rule->attr is used uninitialized. Fix it to
use attr which has the needed pointer value.
Dragos Tatulea [Thu, 7 Nov 2024 18:35:24 +0000 (20:35 +0200)]
net/mlx5e: kTLS, Fix incorrect page refcounting
The kTLS tx handling code is using a mix of get_page() and
page_ref_inc() APIs to increment the page reference. But on the release
path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used.
This is an issue when using pages from large folios: the get_page()
references are stored on the folio page while the page_ref_inc()
references are stored directly in the given page. On release the folio
page will be dereferenced too many times.
This was found while doing kTLS testing with sendfile() + ZC when the
served file was read from NFS on a kernel with NFS large folios support
(commit 49b29a573da8 ("nfs: add support for large folios")).
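The imbalance can be reproduced with a toy refcount model. This is a Python sketch, not kernel code: the Page class is hypothetical, and the three helpers only mirror which counter each kernel API touches (the folio head for get_page()/put_page(), the given page for page_ref_inc()).

```python
class Page:
    """Toy page: a tail page points at the head page of its folio."""
    def __init__(self, head=None):
        self.head = head or self
        self.refcount = 0

def get_page(p):
    p.head.refcount += 1   # reference is taken on the folio (head page)

def page_ref_inc(p):
    p.refcount += 1        # reference is taken on the given page itself

def put_page(p):
    p.head.refcount -= 1   # reference is dropped on the folio (head page)

head = Page()
head.refcount = 1          # reference held by the owner of the folio
tail = Page(head=head)

# Acquire two references on a tail page with mixed APIs ...
get_page(tail)
page_ref_inc(tail)
# ... then release both with put_page(), as the completion path does.
put_page(tail)
put_page(tail)
```

After the sequence the head page has lost the owner's reference while the tail page still carries a stray one, which corresponds to the folio being released too many times.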
Mark Bloch [Thu, 7 Nov 2024 18:35:23 +0000 (20:35 +0200)]
net/mlx5: fs, lock FTE when checking if active
The referenced commits introduced a two-step process for deleting FTEs:
- Lock the FTE, delete it from hardware, set the hardware deletion function
to NULL and unlock the FTE.
- Lock the parent flow group, delete the software copy of the FTE, and
remove it from the xarray.
However, this approach encounters a race condition if a rule with the same
match value is added simultaneously. In this scenario, fs_core may set the
hardware deletion function to NULL prematurely, causing a panic during
subsequent rule deletions.
To prevent this, ensure the active flag of the FTE is checked under a lock,
which will prevent the fs_core layer from attaching a new steering rule to
an FTE that is in the process of deletion.
Parav Pandit [Thu, 7 Nov 2024 18:35:22 +0000 (20:35 +0200)]
net/mlx5: Fix msix vectors to respect platform limit
The number of PCI vectors allocated by the platform (which may be fewer
than requested) is currently not honored when creating the SF pool;
only the PCI MSI-X capability is considered.
As a result, when a platform allocates fewer vectors
(in non-dynamic mode) than requested, the PF and SF pools end up
with an invalid vector range.
This causes incorrect SF vector accounting, which leads to a
call trace when an invalid IRQ vector is allocated.
This issue is resolved by ensuring that the platform's vector
limit is respected for both the SF and PF pools.
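A rough illustration of respecting the platform limit when sizing the two pools. This is a sketch only: split_vector_pools() and its parameters are hypothetical, and the real driver's split policy between PF and SF vectors is not described in this log.

```python
def split_vector_pools(msix_capability, platform_allocated, pf_wanted):
    """Return (pf_range, sf_range) of vector indices.

    Assumption: the usable total is bounded by both the PCI MSI-X
    capability and the number of vectors the platform actually granted.
    """
    total = min(msix_capability, platform_allocated)
    pf_count = min(pf_wanted, total)
    sf_count = total - pf_count
    pf_range = range(0, pf_count)
    sf_range = range(pf_count, pf_count + sf_count)
    return pf_range, sf_range


# The capability advertises 64 vectors but the platform granted only 16.
pf, sf = split_vector_pools(msix_capability=64, platform_allocated=16, pf_wanted=8)
print(list(pf), list(sf))

# With a very small grant the SF pool is simply empty rather than invalid.
pf_small, sf_small = split_vector_pools(64, 4, 8)
print(list(pf_small), list(sf_small))
```

Sizing from the capability alone would have produced an SF range starting beyond the vectors the platform really allocated.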
Chiara Meiohas [Thu, 7 Nov 2024 18:35:21 +0000 (20:35 +0200)]
net/mlx5: E-switch, unload IB representors when unloading ETH representors
IB representors depend on ETH representors, so the IB representors
should not exist without the ETH ones. When unloading the ETH
representors, the corresponding IB representors should also be
unloaded.
The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
introduced the use of the ib_device_set_netdev API in IB
representors. ib_device_set_netdev() increments the refcount of
the representor's netdev when loading an IB representor and
decrements it when unloading.
Without the unloading of the IB representor, the refcount of the
representor's netdev remains greater than 0, preventing it from
being unregistered.
The patch uncovered an underlying bug where the ETH representor is
unloaded without unloading the IB representor.
This issue happened when using multiport E-switch and rebooting,
causing the shutdown to hang when unloading the ETH representor
because the refcount of the representor's netdevice was greater than 0.
Call trace:
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
ref_tracker: eth%d@00000000661d60f7 has 1/1 users at
ib_device_set_netdev+0x160/0x2d0 [ib_core]
mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib]
mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core]
mlx5_mpesw_work+0x236/0x330 [mlx5_core]
process_one_work+0x169/0x320
worker_thread+0x288/0x3a0
kthread+0xb8/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20
Jakub Kicinski [Tue, 12 Nov 2024 03:06:36 +0000 (19:06 -0800)]
Merge branch 'mptcp-fix-a-couple-of-races'
Paolo Abeni says:
====================
mptcp: fix a couple of races
The first patch addresses a division by zero issue reported by Eric,
the second one solves a similar issue found by code inspection while
investigating the former.
====================
Paolo Abeni [Fri, 8 Nov 2024 10:58:17 +0000 (11:58 +0100)]
mptcp: cope racing subflow creation in mptcp_rcv_space_adjust
Additional active subflows - i.e. created by the in-kernel path
manager - are added to the subflow list before starting the
three-way handshake (3whs).
A racing recvmsg() spooling data received on an already established
subflow would unconditionally call tcp_cleanup_rbuf() on all the
current subflows, potentially hitting a divide by zero error on
the newly created ones.
Explicitly check that the subflow is in a suitable state before
invoking tcp_cleanup_rbuf().
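The guard can be pictured with a small model. Everything here is an assumption made for illustration: the Subflow class, the state names, treating "established" as the suitable state, and a cleanup helper that divides by rcv_mss as a stand-in for the computation that faults on a subflow that has not completed its handshake.

```python
ESTABLISHED = "established"
SYN_SENT = "syn_sent"


class Subflow:
    """Hypothetical model of a subflow socket."""
    def __init__(self, state, rcv_mss):
        self.state = state
        self.rcv_mss = rcv_mss  # still zero before the handshake completes


def cleanup_rbuf(subflow, copied):
    # Stand-in for the per-subflow cleanup; divides by rcv_mss.
    return copied // subflow.rcv_mss


def rcv_space_adjust(subflows, copied):
    results = []
    for sf in subflows:
        # Skip subflows that are in the list but not yet established.
        if sf.state != ESTABLISHED:
            continue
        results.append(cleanup_rbuf(sf, copied))
    return results


subflows = [Subflow(ESTABLISHED, 1460), Subflow(SYN_SENT, 0)]
segments = rcv_space_adjust(subflows, copied=2920)
print(segments)
```

Without the state check, the second subflow would raise ZeroDivisionError in this model.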
The root cause is the current bad handling of racing disconnect.
After the blamed commit below, sk_wait_data() can return (with
error) with the underlying socket disconnected and a zero rcv_mss.
Catch the error and return without performing any additional
operations on the current socket.
Hajime Tazaki [Fri, 8 Nov 2024 22:28:34 +0000 (07:28 +0900)]
nommu: pass NULL argument to vma_iter_prealloc()
When deleting a vma entry from a maple tree, the caller has to pass NULL to
vma_iter_prealloc() in order to calculate the internal state of the tree, but
the wrong argument was passed. As a result, nommu kernels crashed upon
accessing a vma iterator, such as acct_collect() reading the size of vma
entries after do_munmap().
This commit fixes the issue by passing the right argument to the
preallocation call.
For a really damaged superblock, the value of 'i_super.s_blocksize_bits'
may exceed the maximum possible shift for an underlying 'int'. So add an
extra check whether the aforementioned field represents the valid block
size, which is 512 bytes, 1K, 2K, or 4K.
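The check amounts to validating the shift before using it. A minimal sketch, assuming a hypothetical helper name; the only facts taken from the text above are the four valid block sizes, which correspond to shift values 9 through 12.

```python
VALID_BLOCK_SIZES = (512, 1024, 2048, 4096)


def block_size_from_bits(bits):
    """Return the block size for a shift value, or None if it is invalid."""
    # Reject the value before shifting, so a corrupted superblock field can
    # never produce an out-of-range shift.
    if not 9 <= bits <= 12:
        return None
    return 1 << bits


for bits in (9, 10, 11, 12):
    assert block_size_from_bits(bits) in VALID_BLOCK_SIZES

print(block_size_from_bits(12), block_size_from_bits(40))
```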
Ryusuke Konishi [Wed, 6 Nov 2024 16:07:33 +0000 (01:07 +0900)]
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty()
may cause a NULL pointer dereference, or a general protection fault when
KASAN is enabled.
This happens because, since the tracepoint was added in
mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev
regardless of whether the buffer head has a pointer to a block_device
structure.
In the current implementation, nilfs_grab_buffer(), which grabs a buffer
to read (or create) a block of metadata, including b-tree node blocks,
does not set the block device, but instead does so only if the buffer is
not in the "uptodate" state for each of its caller block reading
functions. However, if the uptodate flag is set on a folio/page, and the
buffer heads are detached from it by try_to_free_buffers(), and new buffer
heads are then attached by create_empty_buffers(), the uptodate flag may
be restored to each buffer without the block device being set to
bh->b_bdev, and mark_buffer_dirty() may be called later in that state,
resulting in the bug mentioned above.
Fix this issue by making nilfs_grab_buffer() always set the block device
of the super block structure to the buffer head, regardless of the state
of the buffer's uptodate flag.
Ryusuke Konishi [Wed, 6 Nov 2024 16:07:32 +0000 (01:07 +0900)]
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".
This series fixes null pointer dereference bugs that occur when using
nilfs2 and two block-related tracepoints.
This patch (of 2):
It has been reported that when using "block:block_touch_buffer"
tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a
NULL pointer dereference, or a general protection fault when KASAN is
enabled.
This happens because since the tracepoint was added in touch_buffer(), it
references the dev_t member bh->b_bdev->bd_dev regardless of whether the
buffer head has a pointer to a block_device structure. In the current
implementation, the block_device structure is set after the function
returns to the caller.
Here, touch_buffer() is used to mark the folio/page that owns the buffer
head as accessed, but the common search helper for folio/page used by the
caller function was optimized to mark the folio/page as accessed when it
was reimplemented a long time ago, eliminating the need to call
touch_buffer() here in the first place.
So this solves the issue by eliminating the touch_buffer() call itself.
The problem was originally introduced by commit b109b87050df ("mm/munlock:
replace clear_page_mlock() by final clearance"): it was focused on
handling pagecache and anonymous memory and wasn't suitable for lower
level get_page()/free_page() APIs used for example by KVM, as with this
reproducer.
Fix it by moving the mlocked flag clearance down to free_page_prepare().
The bug itself is fairly old and harmless, aside from generating these
warnings and a small memory leak: "bad" pages are stopped from
being allocated again.
It's possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add()
concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked,
which triggers a data-race around sk->sk_forward_alloc:
tcp_v6_rcv
tcp_v6_do_rcv
skb_clone_and_charge_r
sk_rmem_schedule
__sk_mem_schedule
sk_forward_alloc_add()
skb_set_owner_r
sk_mem_charge
sk_forward_alloc_add()
__kfree_skb
skb_release_all
skb_release_head_state
sock_rfree
sk_mem_uncharge
sk_forward_alloc_add()
sk_mem_reclaim
// set local var reclaimable
__sk_mem_reclaim
sk_forward_alloc_add()
skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when
sk->sk_state is TCP_LISTEN; that happens later in tcp_v6_syn_recv_sock().
Fix the same issue in dccp_v6_do_rcv().
Linus Torvalds [Mon, 11 Nov 2024 22:09:57 +0000 (14:09 -0800)]
Merge tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- The fair sched class currently has a bug where its balance() returns
true telling the sched core that it has tasks to run but then NULL
from pick_task(). This makes sched core call sched_ext's pick_task()
without preceding balance() which can lead to stalls in partial mode.
For now, work around by detecting the condition and forcing the CPU
to go through another scheduling cycle.
- Add a missing newline to an error message and fix drgn introspection
tool which went out of sync.
* tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
sched_ext: Update scx_show_state.py to match scx_ops_bypass_depth's new type
sched_ext: Add a missing newline at the end of an error message
Linus Torvalds [Mon, 11 Nov 2024 17:06:17 +0000 (09:06 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Several small bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix error path during device add
vp_vdpa: fix id_table array not null terminated error
virtio_pci: Fix admin vq cleanup by using correct info pointer
vDPA/ifcvf: Fix pci_read_config_byte() return code handling
Fix typo in vringh_test.c
vdpa: solidrun: Fix UB bug with devres
vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans
Mikulas Patocka [Mon, 11 Nov 2024 15:48:18 +0000 (16:48 +0100)]
dm-bufio: fix warnings about duplicate slab caches
The commit 4c39529663b9 adds a warning about duplicate cache names if
CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-bufio
code. The dm-bufio code allocates a slab cache with each client. It is
not possible to preallocate the caches in the module init function
because the size of auxiliary per-buffer data is not known at this point.
So, this commit changes dm-bufio so that it appends a unique atomic value
to the cache name, to avoid the warnings.
Signed-off-by: Mikulas Patocka
Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")
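The naming scheme can be sketched in a few lines. This is an illustration only: unique_cache_name(), the base name and the name format are hypothetical, and itertools.count stands in for the kernel's atomic counter.

```python
import itertools

# Process-wide sequence number; each call to next() yields a new value.
_cache_seq = itertools.count(1)


def unique_cache_name(base):
    """Append a unique sequence number so no two caches share a name."""
    return f"{base}-{next(_cache_seq)}"


# Two clients with identical parameters still get distinct cache names.
names = [unique_cache_name("example_bufio_cache") for _ in range(3)]
print(names)
```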
cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former. This allows dropping a local variable and a
label that are not needed any more.
Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.
Interestingly enough, this fixes a locking issue introduced by commit 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.
Barry Song [Thu, 7 Nov 2024 01:12:46 +0000 (14:12 +1300)]
mm: count zeromap read and set for swapout and swapin
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling. However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in. In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.
On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some apps on Android
have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM). With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.
We have three ways to address this:
1. Introduce a dedicated counter specifically for the zeromap.
2. Use pswpin/pswpout accounting, treating the zero map as a standard
backend. This approach aligns with zRAM's current handling of
same-page fills at the device level. However, it would mean losing the
optimized-out page counters previously available in zRAM and would not
align with systems using zswap. Additionally, as noted by Nhat Pham,
pswpin/pswpout counters apply only to I/O done directly to the backend
device.
3. Count zeromap pages under zswap, aligning with system behavior when
zswap is enabled. However, this would not be consistent with zRAM, nor
would it align with systems lacking both zswap and zRAM.
Given the complications with options 2 and 3, this patch selects
option 1.
We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the memcg of interest).
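A user-space agent could pick up the new counters with a small parser like the one below. This is a sketch under assumptions: the counter names swpin_zero and swpout_zero are not stated in this message and are assumed here, and the sample text stands in for the contents of /proc/vmstat or a memcg's memory.stat.

```python
def parse_counters(text, wanted):
    """Extract selected 'name value' pairs from vmstat-style text."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


# Sample input in the 'name value' layout; the values are made up.
SAMPLE = """pswpin 10
pswpout 20
swpin_zero 3
swpout_zero 5
"""

# Assumed counter names for the zeromap events.
counters = parse_counters(SAMPLE, {"swpin_zero", "swpout_zero"})
print(counters)
```

On a live system the same function can be fed the text read from /proc/vmstat; counters that the running kernel does not export are simply absent from the result.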
This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators. Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.
Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.
Kent Overstreet [Mon, 11 Nov 2024 04:28:33 +0000 (23:28 -0500)]
bcachefs: Allow for unknown key types in backpointers fsck
We can't assume that btrees only contain keys of a given type - even if
they only have a single key type listed in the allowed key types for
that btree; this is a forwards compatibility issue.
Linus Torvalds [Sun, 10 Nov 2024 22:16:28 +0000 (14:16 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of Qualcomm clk driver fixes:
- Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks
- Fix alpha PLL post_div mask for the cases where width is not
specified
- Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
trigger feature on the video clocks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
Linus Torvalds [Sun, 10 Nov 2024 22:13:05 +0000 (14:13 -0800)]
Merge tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c-host fixes for v6.12-rc7 (from Andi):
- Fix designware incorrect behavior when concluding a transmission
- Fix Mule multiplexer error value evaluation"
* tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
i2c: muxes: Fix return value check in mule_i2c_mux_probe()
If the caller supplies an iocb->ki_pos value that is close to the
filesystem upper limit, and an iterator with a count that causes us to
overflow that limit, then filemap_read() enters an infinite loop.
This behaviour was discovered when testing xfstests generic/525 with the
"localio" optimisation for loopback NFS mounts.
Linus Torvalds [Sun, 10 Nov 2024 17:37:47 +0000 (09:37 -0800)]
Merge tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Make sure GICv3 controller interrupt activation doesn't race with a
concurrent deactivation due to propagation delays of the register
write
* tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Force propagation of the active state with a read-back
Linus Torvalds [Sun, 10 Nov 2024 17:04:27 +0000 (09:04 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons"
* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Thorsten Blum
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
signal: restore the override_rlimit logic
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
ucounts: fix counter leak in inc_rlimit_get_ucounts()
selftests: hugetlb_dio: check for initial conditions to skip in the start
mm: fix docs for the kernel parameter ``thp_anon=``
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
mm/damon/core: handle zero schemes apply interval
mm/damon/core: handle zero {aggregation,ops_update} intervals
mm/mlock: set the correct prev on failure
objpool: fix to make percpu slot allocation more robust
mm/page_alloc: keep track of free highatomic
mm: resolve faulty mmap_region() error path behaviour
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: refactor map_deny_write_exec()
mm: unconditionally close VMAs on error
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm/thp: fix deferred split unqueue naming and locking
mm/thp: fix deferred split queue not partially_mapped
Linus Torvalds [Sun, 10 Nov 2024 16:56:48 +0000 (08:56 -0800)]
Merge tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small remaining USB and Thunderbolt fixes and device ids
for 6.12-rc7. Included in here are:
- new USB serial driver device ids
- thunderbolt driver fixes for reported problems
- typec bugfixes
- dwc3 driver fix
- musb driver fix
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: qcserial: add support for Sierra Wireless EM86xx
thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
usb: dwc3: fix fault at system suspend if device was already runtime suspended
usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier
usb: musb: sunxi: Fix accessing an released usb phy
USB: serial: io_edgeport: fix use after free in debug printk
USB: serial: option: add Quectel RG650V
USB: serial: option: add Fibocom FG132 0x0112 composition
thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
Linus Torvalds [Sun, 10 Nov 2024 16:53:24 +0000 (08:53 -0800)]
Merge tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small memory leak fixes for the vchiq_arm staging driver
that have been sitting in my tree for weeks and should get merged for
6.12-rc7 so that people don't keep tripping over them.
They both have been in linux-next for a while with no reported
problems"
* tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation
staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
Tejun Heo [Sat, 9 Nov 2024 20:43:55 +0000 (10:43 -1000)]
sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
sched_ext dispatches tasks from the BPF scheduler from balance_scx() and
thus every pick_task_scx() call must be preceded by balance_scx(). While
this usually holds, due to a bug, there are cases where the fair class's
balance() returns true indicating that it has tasks to run on the CPU and
thus terminating balance() calls but fails to actually find the next task to
run when pick_task() is called. In such cases, pick_task_scx() can be called
without preceding balance_scx().
Detect this condition using the SCX_RQ_BAL_PENDING flag. If detected, keep
running the previous task if possible and avoid stalling from entering idle
without balancing.