Git Repo - linux.git/log

net/sched: Fix UAF when resolving a clash

KASAN reports the following UAF:

BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
Read of size 1 at addr ffff888c07603600 by task handler130/6469

Call Trace:
  <IRQ>
  dump_stack_lvl+0x48/0x70
  print_address_description.constprop.0+0x33/0x3d0
  print_report+0xc0/0x2b0
  kasan_report+0xd0/0x120
  __asan_load1+0x6c/0x80
  tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
  tcf_ct_act+0x886/0x1350 [act_ct]
  tcf_action_exec+0xf8/0x1f0
  fl_classify+0x355/0x360 [cls_flower]
  __tcf_classify+0x1fd/0x330
  tcf_classify+0x21c/0x3c0
  sch_handle_ingress.constprop.0+0x2c5/0x500
  __netif_receive_skb_core.constprop.0+0xb25/0x1510
  __netif_receive_skb_list_core+0x220/0x4c0
  netif_receive_skb_list_internal+0x446/0x620
  napi_complete_done+0x157/0x3d0
  gro_cell_poll+0xcf/0x100
  __napi_poll+0x65/0x310
  net_rx_action+0x30c/0x5c0
  __do_softirq+0x14f/0x491
  __irq_exit_rcu+0x82/0xc0
  irq_exit_rcu+0xe/0x20
  common_interrupt+0xa1/0xb0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x27/0x40

Allocated by task 6469:
  kasan_save_stack+0x38/0x70
  kasan_set_track+0x25/0x40
  kasan_save_alloc_info+0x1e/0x40
  __kasan_krealloc+0x133/0x190
  krealloc+0xaa/0x130
  nf_ct_ext_add+0xed/0x230 [nf_conntrack]
  tcf_ct_act+0x1095/0x1350 [act_ct]
  tcf_action_exec+0xf8/0x1f0
  fl_classify+0x355/0x360 [cls_flower]
  __tcf_classify+0x1fd/0x330
  tcf_classify+0x21c/0x3c0
  sch_handle_ingress.constprop.0+0x2c5/0x500
  __netif_receive_skb_core.constprop.0+0xb25/0x1510
  __netif_receive_skb_list_core+0x220/0x4c0
  netif_receive_skb_list_internal+0x446/0x620
  napi_complete_done+0x157/0x3d0
  gro_cell_poll+0xcf/0x100
  __napi_poll+0x65/0x310
  net_rx_action+0x30c/0x5c0
  __do_softirq+0x14f/0x491

Freed by task 6469:
  kasan_save_stack+0x38/0x70
  kasan_set_track+0x25/0x40
  kasan_save_free_info+0x2b/0x60
  ____kasan_slab_free+0x180/0x1f0
  __kasan_slab_free+0x12/0x30
  slab_free_freelist_hook+0xd2/0x1a0
  __kmem_cache_free+0x1a2/0x2f0
  kfree+0x78/0x120
  nf_conntrack_free+0x74/0x130 [nf_conntrack]
  nf_ct_destroy+0xb2/0x140 [nf_conntrack]
  __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
  nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
  __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
  tcf_ct_act+0x12ad/0x1350 [act_ct]
  tcf_action_exec+0xf8/0x1f0
  fl_classify+0x355/0x360 [cls_flower]
  __tcf_classify+0x1fd/0x330
  tcf_classify+0x21c/0x3c0
  sch_handle_ingress.constprop.0+0x2c5/0x500
  __netif_receive_skb_core.constprop.0+0xb25/0x1510
  __netif_receive_skb_list_core+0x220/0x4c0
  netif_receive_skb_list_internal+0x446/0x620
  napi_complete_done+0x157/0x3d0
  gro_cell_poll+0xcf/0x100
  __napi_poll+0x65/0x310
  net_rx_action+0x30c/0x5c0
  __do_softirq+0x14f/0x491

The ct may be dropped if a clash has been resolved but is still passed to
the tcf_ct_flow_table_process_conn function for further usage. This issue
can be fixed by retrieving ct from skb again after confirming conntrack.

Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action")
Co-developed-by: Gerald Yang <[email protected]>
Signed-off-by: Gerald Yang <[email protected]>
Signed-off-by: Chengen Du <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ks8851: Fix potential TX stall after interface reopen

The amount of TX space in the hardware buffer is tracked in the tx_space
variable. The initial value is currently only set during driver probing.

After closing the interface and reopening it the tx_space variable has
the last value it had before close. If it is smaller than the size of
the first send packet after reopeing the interface the queue will be
stopped. The queue is woken up after receiving a TX interrupt but this
will never happen since we did not send anything.

This commit moves the initialization of the tx_space variable to the
ks8851_net_open function right before starting the TX queue. Also query
the value from the hardware instead of using a hard coded value.

Only the SPI chip variant is affected by this issue because only this
driver variant actually depends on the tx_space variable in the xmit
function.

Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected] # 5.10+
Signed-off-by: Ronald Wahl <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().

syzkaller triggered the warning [0] in udp_v4_early_demux().

In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
of the looked-up sk and use sock_pfree() as skb->destructor, so we check
SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
period.

Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
into the hash table.  Moreover, the SOCK_RCU_FREE check is done too early
in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
window:

  CPU1                                 CPU2
  ----                                 ----
  udp_v4_early_demux()                 udp_lib_get_port()
  |                                    |- hlist_add_head_rcu()
  |- sk = __udp4_lib_demux_lookup()    |
  |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
                                       `- sock_set_flag(sk, SOCK_RCU_FREE)

We had the same bug in TCP and fixed it in commit 871019b22d1b ("net:
set SOCK_RCU_FREE before inserting socket into hashtable").

Let's apply the same fix for UDP.

[0]:
WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Modules linked in:
CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
FS:  00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
<TASK>
ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
__netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
netif_receive_skb_internal net/core/dev.c:5824 [inline]
netif_receive_skb+0x271/0x300 net/core/dev.c:5884
tun_rx_batched drivers/net/tun.c:1549 [inline]
tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x76f/0x8d0 fs/read_write.c:590
ksys_write+0xbf/0x190 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x41/0x50 fs/read_write.c:652
x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fc44a68bc1f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
</TASK>

Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ethtool: netlink: do not return SQI value if link is down

Do not attach SQI value if link is down. "SQI values are only valid if
link-up condition is present" per OpenAlliance specification of
100Base-T1 Interoperability Test suite [1]. The same rule would apply
for other link types.

[1] https://opensig.org/automotive-ethernet-specifications/#

Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Woojung Huh <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ppp: reject claimed-as-LCP but actually malformed packets

Since 'ppp_async_encode()' assumes valid LCP packets (with code
from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that
LCP packet has an actual body beyond PPP_LCP header bytes, and
reject claimed-as-LCP but actually malformed data otherwise.

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dmitry Antipov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: mtk-star-emac: set mac_managed_pm when probing

The below commit introduced a warning message when phy state is not in
the states: PHY_HALTED, PHY_READY, and PHY_UP.
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")

mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
message during resume, indicate the phy resume/suspend is managed by the
mac when probing.

Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Jian Hui Lee <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

e1000e: fix force smbus during suspend flow

Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function
to avoid PHY loss issue") resolved a PHY access loss during suspend on
Meteor Lake consumer platforms, but it affected corporate systems
incorrectly.

A better fix, working for both consumer and corporate systems, was
proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end
of enable_ulp function"). However, it introduced a regression on older
devices, such as [8086:15B8], [8086:15F9], [8086:15BE].

This patch aims to fix the secondary regression, by limiting the scope of
the changes to Meteor Lake platforms only.

Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function")
Reported-by: Todd Brandt <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
Reported-by: Dieter Mummenschanz <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
Signed-off-by: Vitaly Lifshits <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: avoid too many retransmit packets

If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.

The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.

Before blamed commit, the socket would not timeout after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.

Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.

Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Reviewed-by: Jon Maxwell <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: lantiq_etop: fix double free in detach

The number of the currently released descriptor is never incremented
which results in the same skb being released multiple times.

Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver")
Reported-by: Joe Perches <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

i40e: Fix XDP program unloading while removing the driver

The commit 6533e558c650 ("i40e: Fix reset path while removing
the driver") introduced a new PF state "__I40E_IN_REMOVE" to block
modifying the XDP program while the driver is being removed.
Unfortunately, such a change is useful only if the ".ndo_bpf()"
callback was called out of the rmmod context because unloading the
existing XDP program is also a part of driver removing procedure.
In other words, from the rmmod context the driver is expected to
unload the XDP program without reporting any errors. Otherwise,
the kernel warning with callstack is printed out to dmesg.

Example failing scenario:
1. Load the i40e driver.
2. Load the XDP program.
3. Unload the i40e driver (using "rmmod" command).

The example kernel warning log:

[  +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870
[...]
[  +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870
[...]
[  +0.002726] Call Trace:
[  +0.002457]  <TASK>
[  +0.002119]  ? __warn+0x80/0x120
[  +0.003245]  ? unregister_netdevice_many_notify+0x7a9/0x870
[  +0.005586]  ? report_bug+0x164/0x190
[  +0.003678]  ? handle_bug+0x3c/0x80
[  +0.003503]  ? exc_invalid_op+0x17/0x70
[  +0.003846]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.004200]  ? unregister_netdevice_many_notify+0x7a9/0x870
[  +0.005579]  ? unregister_netdevice_many_notify+0x3cc/0x870
[  +0.005586]  unregister_netdevice_queue+0xf7/0x140
[  +0.004806]  unregister_netdev+0x1c/0x30
[  +0.003933]  i40e_vsi_release+0x87/0x2f0 [i40e]
[  +0.004604]  i40e_remove+0x1a1/0x420 [i40e]
[  +0.004220]  pci_device_remove+0x3f/0xb0
[  +0.003943]  device_release_driver_internal+0x19f/0x200
[  +0.005243]  driver_detach+0x48/0x90
[  +0.003586]  bus_remove_driver+0x6d/0xf0
[  +0.003939]  pci_unregister_driver+0x2e/0xb0
[  +0.004278]  i40e_exit_module+0x10/0x5f0 [i40e]
[  +0.004570]  __do_sys_delete_module.isra.0+0x197/0x310
[  +0.005153]  do_syscall_64+0x85/0x170
[  +0.003684]  ? syscall_exit_to_user_mode+0x69/0x220
[  +0.004886]  ? do_syscall_64+0x95/0x170
[  +0.003851]  ? exc_page_fault+0x7e/0x180
[  +0.003932]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[  +0.005064] RIP: 0033:0x7f59dc9347cb
[  +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48
[  +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb
[  +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748
[  +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000
[  +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000
[  +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000
[  +0.007151]  </TASK>
[  +0.002204] ---[ end trace 0000000000000000 ]---

Fix this by checking if the XDP program is being loaded or unloaded.
Then, block only loading a new program while "__I40E_IN_REMOVE" is set.
Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP
callback to avoid unnecessary operations and checks.

Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver")
Signed-off-by: Michal Kubiak <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fix rc7's __skb_datagram_iter()

X would not start in my old 32-bit partition (and the "n"-handling looks
just as wrong on 64-bit, but for whatever reason did not show up there):
"n" must be accumulated over all pages before it's added to "offset" and
compared with "copy", immediately after the skb_frag_foreach_page() loop.

Fixes: d2d30a376d9c ("net: allow skb_datagram_iter to be called from any context")
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-07-09

The following pull-request contains BPF updates for your *net* tree.

We've added 3 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 81 insertions(+), 11 deletions(-).

The main changes are:

1) Fix a use-after-free in a corner case where tcx_entry got released too
   early. Also add BPF test coverage along with the fix, from Daniel Borkmann.

2) Fix a kernel panic on Loongarch in sk_msg_recvmsg() which got triggered
   by running BPF sockmap selftests, from Geliang Tang.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  skmsg: Skip zero length skb in sk_msg_recvmsg
  selftests/bpf: Extend tcx tests to cover late tcx_entry release
  bpf: Fix too early release of tcx_entry
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ks8851: Fix deadlock with the SPI chip variant

When SMP is enabled and spinlocks are actually functional then there is
a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:

    watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
    call trace:
      queued_spin_lock_slowpath+0x100/0x284
      do_raw_spin_lock+0x34/0x44
      ks8851_start_xmit_spi+0x30/0xb8
      ks8851_start_xmit+0x14/0x20
      netdev_start_xmit+0x40/0x6c
      dev_hard_start_xmit+0x6c/0xbc
      sch_direct_xmit+0xa4/0x22c
      __qdisc_run+0x138/0x3fc
      qdisc_run+0x24/0x3c
      net_tx_action+0xf8/0x130
      handle_softirqs+0x1ac/0x1f0
      __do_softirq+0x14/0x20
      ____do_softirq+0x10/0x1c
      call_on_irq_stack+0x3c/0x58
      do_softirq_own_stack+0x1c/0x28
      __irq_exit_rcu+0x54/0x9c
      irq_exit_rcu+0x10/0x1c
      el1_interrupt+0x38/0x50
      el1h_64_irq_handler+0x18/0x24
      el1h_64_irq+0x64/0x68
      __netif_schedule+0x6c/0x80
      netif_tx_wake_queue+0x38/0x48
      ks8851_irq+0xb8/0x2c8
      irq_thread_fn+0x2c/0x74
      irq_thread+0x10c/0x1b0
      kthread+0xc8/0xd8
      ret_from_fork+0x10/0x20

This issue has not been identified earlier because tests were done on
a device with SMP disabled and so spinlocks were actually NOPs.

Now use spin_(un)lock_bh for TX queue related locking to avoid execution
of softirq work synchronously that would lead to a deadlock.

Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected] # 5.10+
Signed-off-by: Ronald Wahl <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()

In rvu_check_rsrc_availability() in case of invalid SSOW req, an incorrect
data is printed to error log. 'req->sso' value is printed instead of
'req->ssow'. Looks like "copy-paste" mistake.

Fix this mistake by replacing 'req->sso' with 'req->ssow'.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 746ea74241fa ("octeontx2-af: Add RVU block LF provisioning support")
Signed-off-by: Aleksandr Mishin <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

bnxt: fix crashes when reducing ring count with active RSS contexts

bnxt doesn't check if a ring is used by RSS contexts when reducing
ring count. Core performs a similar check for the drivers for
the main context, but core doesn't know about additional contexts,
so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring
id to index bp->rx_ring[], which without the check may end up
being out of bounds.

  BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
  Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
  Call Trace:
  __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
   bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
   __bnxt_setup_vnic_p5+0x12e/0x270
   __bnxt_open_nic+0x2262/0x2f30
   bnxt_open_nic+0x5d/0xf0
   ethnl_set_channels+0x5d4/0xb30
   ethnl_default_set_doit+0x2f1/0x620

Core does track the additional contexts in net-next, so we can
move this validation out of the driver as a follow up there.

Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

skmsg: Skip zero length skb in sk_msg_recvmsg

When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, the following kernel panic occurs:

  [...]
  Oops[#1]:
  CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
  Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
     ... ...
     ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
    ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 0000000c (PPLV0 +PIE +PWE)
   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
   BADV: 0000000000000040
   PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
  Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
  Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
  Stack : ...
  Call Trace:
  [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
  [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
  [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
  [<90000000049aae34>] inet_recvmsg+0x54/0x100
  [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
  [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
  [<900000000481e27c>] sys_recvfrom+0x1c/0x40
  [<9000000004c076ec>] do_syscall+0x8c/0xc0
  [<9000000003731da4>] handle_syscall+0xc4/0x160
  Code: ...
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel relocated by 0x3510000
   .text @ 0x9000000003710000
   .data @ 0x9000000004d70000
   .bss  @ 0x9000000006469400
  ---[ end Kernel panic - not syncing: Fatal exception ]---
  [...]

This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.

This crash is because a NULL pointer is passed to page_address() in the
sk_msg_recvmsg(). Due to the different implementations depending on the
architecture, page_address(NULL) will trigger a panic on Loongarch
platform but not on x86 platform. So this bug was hidden on x86 platform
for a while, but now it is exposed on Loongarch platform. The root cause
is that a zero length skb (skb->len == 0) was put on the queue.

This zero length skb is a TCP FIN packet, which was sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():

shutdown(p1, SHUT_WR);

In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.

And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
to kmap_local_page() and to page_address(), then kernel panics.

To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
if copy is zero, that means it's a zero length skb, skip invoking
copy_page_to_iter(). We are using the EFAULT return triggered by
copy_page_to_iter to check for is_fin in tcp_bpf.c.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn

net: phy: microchip: lan87xx: reinit PHY after cable test

Reinit PHY after cable test, otherwise link can't be established on
tested port. This issue is reproducible on LAN9372 switches with
integrated 100BaseT1 PHYs.

Fixes: 788050256c411 ("net: phy: microchip_t1: add cable test support for lan87xx phy")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/bpf: Extend tcx tests to cover late tcx_entry release

Add a test case which replaces an active ingress qdisc while keeping the
miniq in-tact during the transition period to the new clsact qdisc.

  # ./vmtest.sh -- ./test_progs -t tc_link
  [...]
  ./test_progs -t tc_link
  [    3.412871] bpf_testmod: loading out-of-tree module taints kernel.
  [    3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #332     tc_links_after:OK
  #333     tc_links_append:OK
  #334     tc_links_basic:OK
  #335     tc_links_before:OK
  #336     tc_links_chain_classic:OK
  #337     tc_links_chain_mixed:OK
  #338     tc_links_dev_chain0:OK
  #339     tc_links_dev_cleanup:OK
  #340     tc_links_dev_mixed:OK
  #341     tc_links_ingress:OK
  #342     tc_links_invalid:OK
  #343     tc_links_prepend:OK
  #344     tc_links_replace:OK
  #345     tc_links_revision:OK
  Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>

bpf: Fix too early release of tcx_entry

Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported
an issue that the tcx_entry can be released too early leading to a use
after free (UAF) when an active old-style ingress or clsact qdisc with a
shared tc block is later replaced by another ingress or clsact instance.

Essentially, the sequence to trigger the UAF (one example) can be as follows:

  1. A network namespace is created
  2. An ingress qdisc is created. This allocates a tcx_entry, and
     &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the
     same time, a tcf block with index 1 is created.
  3. chain0 is attached to the tcf block. chain0 must be connected to
     the block linked to the ingress qdisc to later reach the function
     tcf_chain0_head_change_cb_del() which triggers the UAF.
  4. Create and graft a clsact qdisc. This causes the ingress qdisc
     created in step 1 to be removed, thus freeing the previously linked
     tcx_entry:

     rtnetlink_rcv_msg()
       => tc_modify_qdisc()
         => qdisc_create()
           => clsact_init() [a]
         => qdisc_graft()
           => qdisc_destroy()
             => __qdisc_destroy()
               => ingress_destroy() [b]
                 => tcx_entry_free()
                   => kfree_rcu() // tcx_entry freed

  5. Finally, the network namespace is closed. This registers the
     cleanup_net worker, and during the process of releasing the
     remaining clsact qdisc, it accesses the tcx_entry that was
     already freed in step 4, causing the UAF to occur:

     cleanup_net()
       => ops_exit_list()
         => default_device_exit_batch()
           => unregister_netdevice_many()
             => unregister_netdevice_many_notify()
               => dev_shutdown()
                 => qdisc_put()
                   => clsact_destroy() [c]
                     => tcf_block_put_ext()
                       => tcf_chain0_head_change_cb_del()
                         => tcf_chain_head_change_item()
                           => clsact_chain_head_change()
                             => mini_qdisc_pair_swap() // UAF

There are also other variants, the gist is to add an ingress (or clsact)
qdisc with a specific shared block, then to replace that qdisc, waiting
for the tcx_entry kfree_rcu() to be executed and subsequently accessing
the current active qdisc's miniq one way or another.

The correct fix is to turn the miniq_active boolean into a counter. What
can be observed, at step 2 above, the counter transitions from 0->1, at
step [a] from 1->2 (in order for the miniq object to remain active during
the replacement), then in [b] from 2->1 and finally [c] 1->0 with the
eventual release. The reference counter in general ranges from [0,2] and
it does not need to be atomic since all access to the counter is protected
by the rtnl mutex. With this in place, there is no longer a UAF happening
and the tcx_entry is freed at the correct time.

Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Pedro Pinto <[email protected]>
Co-developed-by: Pedro Pinto <[email protected]>
Signed-off-by: Pedro Pinto <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Hyunwoo Kim <[email protected]>
Cc: Wongi Lee <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>

docs: networking: devlink: capitalise length value

Correct the example to match the help text from the devlink utility.

Signed-off-by: Chris Packham <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix incorrect undo caused by DSACK of TLP retransmit

Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly be interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.

For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:

+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)

The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp->undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
it only afterward.

Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").

However, this patch will only compile and work correctly with kernels
that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.

Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Kevin Yang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'wireguard-fixes-for-6-10-rc7'

Jason A. Donenfeld says:

====================
wireguard fixes for 6.10-rc7

These are four small fixes for WireGuard, which are all marked for
stable:

1) A QEMU command line fix to remove deprecated flags.

2) Use of proper unaligned helpers to avoid unaligned memory access on
some systems, from Helge.

3) Two patches to annotate intentional data races, so KCSAN and syzbot
don't get upset.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: send: annotate intentional data race in checking empty queue

KCSAN reports a race in wg_packet_send_keepalive, which is intentional:

    BUG: KCSAN: data-race in wg_packet_send_keepalive / wg_packet_send_staged_packets

    write to 0xffff88814cd91280 of 8 bytes by task 3194 on cpu 0:
     __skb_queue_head_init include/linux/skbuff.h:2162 [inline]
     skb_queue_splice_init include/linux/skbuff.h:2248 [inline]
     wg_packet_send_staged_packets+0xe5/0xad0 drivers/net/wireguard/send.c:351
     wg_xmit+0x5b8/0x660 drivers/net/wireguard/device.c:218
     __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
     netdev_start_xmit include/linux/netdevice.h:4954 [inline]
     xmit_one net/core/dev.c:3548 [inline]
     dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3564
     __dev_queue_xmit+0xeff/0x1d80 net/core/dev.c:4349
     dev_queue_xmit include/linux/netdevice.h:3134 [inline]
     neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1592
     neigh_output include/net/neighbour.h:542 [inline]
     ip6_finish_output2+0xa66/0xce0 net/ipv6/ip6_output.c:137
     ip6_finish_output+0x1a5/0x490 net/ipv6/ip6_output.c:222
     NF_HOOK_COND include/linux/netfilter.h:303 [inline]
     ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:243
     dst_output include/net/dst.h:451 [inline]
     NF_HOOK include/linux/netfilter.h:314 [inline]
     ndisc_send_skb+0x4a2/0x670 net/ipv6/ndisc.c:509
     ndisc_send_rs+0x3ab/0x3e0 net/ipv6/ndisc.c:719
     addrconf_dad_completed+0x640/0x8e0 net/ipv6/addrconf.c:4295
     addrconf_dad_work+0x891/0xbc0
     process_one_work kernel/workqueue.c:2633 [inline]
     process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
     worker_thread+0x525/0x730 kernel/workqueue.c:2787
     kthread+0x1d7/0x210 kernel/kthread.c:388
     ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242

    read to 0xffff88814cd91280 of 8 bytes by task 3202 on cpu 1:
     skb_queue_empty include/linux/skbuff.h:1798 [inline]
     wg_packet_send_keepalive+0x20/0x100 drivers/net/wireguard/send.c:225
     wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
     wg_packet_handshake_receive_worker+0x445/0x5e0 drivers/net/wireguard/receive.c:213
     process_one_work kernel/workqueue.c:2633 [inline]
     process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
     worker_thread+0x525/0x730 kernel/workqueue.c:2787
     kthread+0x1d7/0x210 kernel/kthread.c:388
     ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242

    value changed: 0xffff888148fef200 -> 0xffff88814cd91280

Mark this race as intentional by using the skb_queue_empty_lockless()
function rather than skb_queue_empty(), which uses READ_ONCE()
internally to annotate the race.

Cc: [email protected]
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: queueing: annotate intentional data race in cpu round robin

KCSAN reports a race in the CPU round robin function, which, as the
comment points out, is intentional:

    BUG: KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets

    read to 0xffff88811254eb28 of 4 bytes by task 3160 on cpu 1:
     wg_cpumask_next_online drivers/net/wireguard/queueing.h:127 [inline]
     wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
     wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
     wg_packet_send_staged_packets+0x60e/0xac0 drivers/net/wireguard/send.c:388
     wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
     wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
     wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
     process_one_work kernel/workqueue.c:3248 [inline]
     process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
     worker_thread+0x526/0x720 kernel/workqueue.c:3409
     kthread+0x1d1/0x210 kernel/kthread.c:389
     ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

    write to 0xffff88811254eb28 of 4 bytes by task 3158 on cpu 0:
     wg_cpumask_next_online drivers/net/wireguard/queueing.h:130 [inline]
     wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
     wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
     wg_packet_send_staged_packets+0x6e5/0xac0 drivers/net/wireguard/send.c:388
     wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
     wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
     wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
     process_one_work kernel/workqueue.c:3248 [inline]
     process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
     worker_thread+0x526/0x720 kernel/workqueue.c:3409
     kthread+0x1d1/0x210 kernel/kthread.c:389
     ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

    value changed: 0xffffffff -> 0x00000000

Mark this race as intentional by using READ/WRITE_ONCE().

Cc: [email protected]
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: allowedips: avoid unaligned 64-bit memory accesses

On the parisc platform, the kernel issues kernel warnings because
swap_endian() tries to load a 128-bit IPv6 address from an unaligned
memory location:

Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df)
Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc)

Avoid such unaligned memory accesses by instead using the
get_unaligned_be64() helper macro.

Signed-off-by: Helge Deller <[email protected]>
[Jason: replace src[8] in original patch with src+8]
Cc: [email protected]
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: selftests: use acpi=off instead of -no-acpi for recent QEMU

QEMU 9.0 removed -no-acpi, in favor of machine properties, so update the
Makefile to use the correct QEMU invocation.

Cc: [email protected]
Fixes: b83fdcd9fb8a ("wireguard: selftests: use microvm on x86")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: bcmasp: Fix error code in probe()

Return an error code if bcmasp_interface_create() fails. Don't return
success.

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Justin Chen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dsa: lan9303: Fix mapping between DSA port number and PHY address

The 'phy' parameter supplied to lan9303_phy_read/_write was sometimes a
DSA port number and sometimes a PHY address. This isn't a problem as
long as they are equal. But if the external phy_addr_sel_strap pin is
wired to 'high', the PHY addresses change from 0-1-2 to 1-2-3 (CPU,
slave0, slave1). In this case, lan9303_phy_read/_write must translate
between DSA port numbers and the corresponding PHY address.

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Christian Eggers <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, wireless and netfilter.

  There's one fix for power management with Intel's e1000e here,
  Thorsten tells us there's another problem that started in v6.9. We're
  trying to wrap that up but I don't think it's blocking.

  Current release - new code bugs:

   - wifi: mac80211: disable softirqs for queued frame handling

   - af_unix: fix uninit-value in __unix_walk_scc(), with the new
     garbage collection algo

  Previous releases - regressions:

   - Bluetooth:
      - qca: fix BT enable failure for QCA6390 after warm reboot
      - add quirk to ignore reserved PHY bits in LE Extended Adv Report,
        abused by some Broadcom controllers found on Apple machines

   - wifi: wilc1000: fix ies_len type in connect path

  Previous releases - always broken:

   - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
     avoid premature timeouts

   - net: make sure skb_datagram_iter maps fragments page by page, in
     case we somehow get compound highmem mixed in

   - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
     queues are used

  Misc:

   - MAINTAINERS: Remembering Larry Finger"

* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  bnxt_en: Fix the resource check condition for RSS contexts
  mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
  inet_diag: Initialize pad field in struct inet_diag_req_v2
  tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
  selftests: make order checking verbose in msg_zerocopy selftest
  selftests: fix OOM in msg_zerocopy selftest
  ice: use proper macro for testing bit
  ice: Reject pin requests with unsupported flags
  ice: Don't process extts if PTP is disabled
  ice: Fix improper extts handling
  selftest: af_unix: Add test case for backtrack after finalising SCC.
  af_unix: Fix uninit-value in __unix_walk_scc()
  bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
  net: rswitch: Avoid use-after-free in rswitch_poll()
  netfilter: nf_tables: unconditionally flush pending work before notifier
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  ...

Merge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

- Fix and add physical to virtual address translations in dasd and
   virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
   More code cleanup will follow.

- Small defconfig updates

* tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
  s390/vfio_ccw: Fix target addresses of TIC CCWs
  s390: Update defconfigs

Merge tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fix from Hans de Goede:

- Fix regression in toshiba_acpi introduced in 6.10-rc1

* tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: toshiba_acpi: Fix quickstart quirk handling

Merge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Kselftest fix from Mickaël Salaün:
"Fix Kselftests timeout.

  We can't use CLONE_VFORK, since that blocks the parent - and thus the
  timeout handling - until the child exits or execve's.

  Go back to using plain fork()"

* tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/harness: Fix tests timeout and race condition

Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from, Andrew Morton:
"6 hotfies, all cc:stable. Some fixes for longstanding nilfs2 issues
  and three unrelated MM fixes"

* tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix incorrect inode allocation from reserved inodes
  nilfs2: add missing check for inode numbers on directory entries
  nilfs2: fix inode number range checks
  mm: avoid overflows in dirty throttling logic
  Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
  mm: optimize the redundant loop of mm_update_owner_next()

bnxt_en: Fix the resource check condition for RSS contexts

While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available.  If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.

There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts.  It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.

Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted.  The resources will be available for
use when new contexts are created later.

Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts.  In the future, we will conditionally
disable this workaround when the firmware fix is available.

Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure")
Reported-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file

In case of invalid INI file mlxsw_linecard_types_init() deallocates memory
but doesn't reset pointer to NULL and returns 0. In case of any error
occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init()
calls mlxsw_linecard_types_fini() which performs memory deallocation again.

Add pointer reset to NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning")
Signed-off-by: Aleksandr Mishin <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.10

Hopefully the last fixes for v6.10. Fix a regression in wilc1000
where bitrate Information Elements longer than 255 bytes were broken.
Few fixes also to mac80211 and iwlwifi.

* tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains a oneliner patch to inconditionally flush
workqueue containing stale objects to be released, syzbot managed to
trigger UaF. Patch from Florian Westphal.

netfilter pull request 24-07-04

* tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: unconditionally flush pending work before notifier
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

inet_diag: Initialize pad field in struct inet_diag_req_v2

KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.

inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.

Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.

[1]
BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_lookup net/ipv4/raw_diag.c:49 [inline]
raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was stored to memory at:
raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable req.i created at:
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282

CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014

Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.

When we process segments with TCP AO, we don't check it in
tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.

Let's avoid the unnecessary BPF invocation.

Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Dmitry Safonov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest'

Zijian Zhang says:

====================
fix OOM and order check in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: make order checking verbose in msg_zerocopy selftest

We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <[email protected]>
Signed-off-by: Xiaochun Lu <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: fix OOM in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <[email protected]>
Signed-off-by: Xiaochun Lu <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'intel-wired-lan-driver-updates-2024-06-25-ice'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-25 (ice)

This series contains updates to ice driver only.

Milena adds disabling of extts events when PTP is disabled.

Jake prevents possible NULL pointer by checking that timestamps are
ready before processing extts events and adds checks for unsupported
PTP pin configuration.

Petr Oros replaces _test_bit() with the correct test_bit() macro.
v1: https://lore.kernel.org/netdev/20240625170248 [email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: use proper macro for testing bit

Do not use _test_bit() macro for testing bit. The proper macro for this
is one without underline.

_test_bit() is what test_bit() was prior to const-optimization. It
directly calls arch_test_bit(), i.e. the arch-specific implementation
(or the generic one). It's strictly _internal_ and shouldn't be used
anywhere outside the actual test_bit() macro.

test_bit() is a wrapper which checks whether the bitmap and the bit
number are compile-time constants and if so, it calls the optimized
function which evaluates this call to a compile-time constant as well.
If either of them is not a compile-time constant, it just calls _test_bit().
test_bit() is the actual function to use anywhere in the kernel.

IOW, calling _test_bit() avoids potential compile-time optimizations.

The sensors is not a compile-time constant, thus most probably there
are no object code changes before and after the patch.
But anyway, we shouldn't call internal wrappers instead of
the actual API.

Fixes: 4da71a77fc3b ("ice: read internal temperature sensor")
Acked-by: Ivan Vecera <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Signed-off-by: Petr Oros <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: Reject pin requests with unsupported flags

The driver receives requests for configuring pins via the .enable
callback of the PTP clock object. These requests come into the driver
with flags which modify the requested behavior from userspace. Current
implementation in ice does not reject flags that it doesn't support.
This causes the driver to incorrectly apply requests with such flags as
PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it
is not yet aware of.

Fix this by properly validating flags in both ice_ptp_cfg_perout and
ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported
flags rather than just checking and rejecting known un-supported flags.
This is preferable, as it ensures better compatibility with future
kernels.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: Don't process extts if PTP is disabled

The ice_ptp_extts_event() function can race with ice_ptp_release() and
result in a NULL pointer dereference which leads to a kernel panic.

Panic occurs because the ice_ptp_extts_event() function calls
ptp_clock_event() with a NULL pointer. The ice driver has already
released the PTP clock by the time the interrupt for the next external
timestamp event occurs.

To fix this, modify the ice_ptp_extts_event() function to check the
PTP state and bail early if PTP is not ready.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: Fix improper extts handling

Extts events are disabled and enabled by the application ts2phc.
However, in case where the driver is removed when the application is
running, a specific extts event remains enabled and can cause a kernel
crash.
As a side effect, when the driver is reloaded and application is started
again, remaining extts event for the channel from a previous run will
keep firing and the message "extts on unexpected channel" might be
printed to the user.

To avoid that, extts events shall be disabled when PTP is released.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <[email protected]>
Co-developed-by: Jacob Keller <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Milena Olech <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftest: af_unix: Add test case for backtrack after finalising SCC.

syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.

Let's add a test case exercising the path.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

af_unix: Fix uninit-value in __unix_walk_scc()

KMSAN reported uninit-value access in __unix_walk_scc() [1].

In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).

However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.

Fix this by introducing a new temporary variable for the loop.

[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
__unix_walk_scc net/unix/garbage.c:478 [inline]
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Local variable entries created at:
ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
netdev_tracker_free include/linux/netdevice.h:4058 [inline]
netdev_put include/linux/netdevice.h:4075 [inline]
dev_put include/linux/netdevice.h:4101 [inline]
update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813

CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc

Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()

In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.

BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc1/0x5e0 mm/kasan/report.c:475
kasan_report+0xbe/0xf0 mm/kasan/report.c:588
strlen+0x7d/0xa0 lib/string.c:418
__fortify_strlen include/linux/fortify-string.h:210 [inline]
in4_pton+0xa3/0x3f0 net/core/utils.c:130
bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
__bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
__bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---

Fix it by adding a check of string length before using it.

Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Reviewed-by: Hangbin Liu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: rswitch: Avoid use-after-free in rswitch_poll()

The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.

Let's move around the instructions to use skb->len before the skb is
freed.

This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.

Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <[email protected]>
Reviewed-by: Yoshihiro Shimoda <[email protected]>
Reviewed-by: Niklas Söderlund <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netfilter: nf_tables: unconditionally flush pending work before notifier

syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

Problem is that the notifier does a conditional flush, but its possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.

We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.

AFAICS this problem is only exposed in nf-next, with
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"),
with this commit applied there is an unconditional fetch of
table->family which is whats triggering the above splat.

Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
"Fix ioctl conflict with memmapped ring buffer ioctl

  It was reported that the ioctl() number used to update the ring buffer
  memory mapping conflicted with the TCGETS ioctl causing strace to
  report:

    $ strace -e ioctl stty
    ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0

  Since this ioctl hasn't been in a full release yet, change it from
  "T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
  commands, as some more are being worked on for the future"

* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

To prevent conflicts with other ioctl numbers to allow strace to have an
idea of what is happening, add the range of ioctls for the trace buffer
mapping from _IO("T", 0x1) to the range of "R" 0x20 - 0x2F.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Cc: Jonathan Corbet <[email protected]>
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Link: https://lore.kernel.org/[email protected]
Reported-by: "Dmitry V. Levin" <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

nilfs2: fix incorrect inode allocation from reserved inodes

If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.

Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.

If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.

Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: add missing check for inode numbers on directory entries

Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().

As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).

The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.

Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.

Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <[email protected]>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix inode number range checks

Patch series "nilfs2: fix potential issues related to reserved inodes".

This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.

This patch (of 3):

In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.

As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.

In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.

Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.

Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: avoid overflows in dirty throttling logic

The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits).  If limits end up being larger, we will hit overflows,
possible divisions by 0 etc.  Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway.  For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits.  For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc.  So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.

This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reported-by: Zach O'Keefe <[email protected]>
Reviewed-By: Zach O'Keefe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"

Patch series "mm: Avoid possible overflows in dirty throttling".

Dirty throttling logic assumes dirty limits in page units fit into
32-bits.  This patch series makes sure this is true (see patch 2/2 for
more details).

This patch (of 2):

This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

The commit is broken in several ways.  Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this).  Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs.  We have
div64_ul() in case we want to be safe & cheap.  Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-By: Zach O'Keefe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: optimize the redundant loop of mm_update_owner_next()

When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.

If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.

Recognize this situation in advance and exit early.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jinliang Zheng <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"A fix for a feature that went into the 6.10 merge window actually
  ended up causing a regression in building bundles for receives.

  Fix that up by ensuring we don't overwrite msg_inq before we use
  it in the loop"

* tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux:
  io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it

Merge tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"Some fixes related to the IPU6 driver"

* tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ivsc: Depend on IPU_BRIDGE or not IPU_BRIDGE
  media: intel/ipu6: Fix a null pointer dereference in ipu6_isys_query_stream_by_source
  media: ipu6: Use the ISYS auxdev device as the V4L2 device's device

s390/dasd: Fix invalid dereferencing of indirect CCW data pointer

Fix invalid dereferencing of indirect CCW data pointer in
dasd_eckd_dump_sense() that leads to a kernel panic in error cases.

When using indirect addressing for DASD CCWs (IDAW) the CCW CDA pointer
does not contain the data address itself but a pointer to the IDAL.
This needs to be translated from physical to virtual as well before
using it.

This dereferencing is also used for dasd_page_cache and also fixed
although it is very unlikely that this code path ever gets used.

Fixes: c0bd39601c13 ("s390/dasd: use new address translation helpers")
Cc: [email protected]
Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference

iwl_mvm_get_bss_vif might return a NULL or ERR_PTR. Some of the callers
check only the NULL case, and some doesn't check at all.

Some of the callers even have a pointer to the mvmvif of the bss vif,
so we don't even need to call this function, and can simply get the vif
from mvmvif. Do it for those cases, and for the others - properly check
if IS_ERR_OR_NULL

Fixes: ec0d43d26f2c ("wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume")
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064027.a661f8c65aac.I45cf09b01af8ee3d55828863958ead741ea43b7f@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: avoid link lookup in statistics

We already iterate the link bss_conf/link_info and have the
pointer, or know that deflink/bss_conf is used, so avoid an
extra lookup and just pass the pointer. This may also avoid
a crash when this is processed during restart, where the FW
to link conf array (link_id_to_link_conf) may be NULLed out.

Fixes: c1e458b987f2 ("wifi: iwlwifi: mvm: Move beacon filtering to be per link")
Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064026.346a6ef67a86.Iba5d65d728ca9f58518c88d029496c1250670544@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL

Since we now want to sync the queues even when we're in RFKILL, we
shouldn't wake up the wait queue since we still expect to get all the
notifications from the firmware.

Fixes: 4d08c0b3357c ("wifi: iwlwifi: mvm: handle BA session teardown in RF-kill")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064027.be7a9dbeacde.I5586cb3ca8d6e44f79d819a48a0c22351ff720c9@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK

The WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK should be set based on the
WOWLAN_KEK_KCK_MATERIAL command version. Currently, the command
version in the firmware has advanced to 4, which prevents the
flag from being set correctly, fix that.

Signed-off-by: Daniel Gabay <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://patch.msgid.link/20240703064026.a0f162108575.If1a9785727d2a1b0197a396680965df1b53d4096@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: wilc1000: fix ies_len type in connect path

Commit 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path")
made sure that the IEs data was manipulated under the relevant RCU section.
Unfortunately, while doing so, the commit brought a faulty implicit cast
from int to u8 on the ies_len variable, making the parsing fail to be
performed correctly if the IEs block is larger than 255 bytes. This failure
can be observed with Access Points appending a lot of IEs TLVs in their
beacon frames (reproduced with a Pixel phone acting as an Access Point,
which brough 273 bytes of IE data in my testing environment).

Fix IEs parsing by removing this undesired implicit cast.

Fixes: 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path")
Signed-off-by: Jozef Hopko <[email protected]>
Signed-off-by: Alexis Lothoré <[email protected]>
Acked-by: Ajay Singh <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://patch.msgid.link/[email protected]

s390/vfio_ccw: Fix target addresses of TIC CCWs

The processing of a Transfer-In-Channel (TIC) CCW requires locating
the target of the CCW in the channel program, and updating the
address to reflect what will actually be sent to hardware.

An error exists where the 64-bit virtual address is truncated to
32-bits (variable "cda") when performing this math. Since s390
addresses of that size are 31-bits, this leaves that additional
bit enabled such that the resulting I/O triggers a channel
program check. This shows up occasionally when booting a KVM
guest from a passthrough DASD device:

  ..snip...
  Interrupt Response Block Data:
  : 0x0000000000003990
      Function Ctrl : [Start]
      Activity Ctrl :
      Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending]
      Device Status :
      Channel Status : [Program-Check]
      cpa=: 0x00000000008d0018
      prev_ccw=: 0x0000000000000000
      this_ccw=: 0x0000000000000000
  ...snip...
  dasd-ipl: Failed to run IPL1 channel program

The channel program address of "0x008d0018" in the IRB doesn't
look wrong, but tracing the CCWs shows the offending bit enabled:

  ccw=0x0000012e808d0000 cda=00a0b030
  ccw=0x0000012e808d0008 cda=00a0b038
  ccw=0x0000012e808d0010 cda=808d0008
  ccw=0x0000012e808d0018 cda=00a0b040

Fix the calculation of the TIC CCW's data address such that it points
to a valid 31-bit address regardless of the input address.

Fixes: bd36cfbbb9e1 ("s390/vfio_ccw_cp: use new address translation helpers")
Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

net: stmmac: enable HW-accelerated VLAN stripping for gmac4 only

Commit 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN
stripping") enables MAC level VLAN tag stripping for all MAC cores, but
leaves set_hw_vlan_mode() and rx_hw_vlan() un-implemented for both gmac
and xgmac.

On gmac and xgmac, ethtool reports rx-vlan-offload is on, both MAC and
driver do nothing about VLAN packets actually, although VLAN works well.

Driver level stripping should be used on gmac and xgmac for now.

Fixes: 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping")
Signed-off-by: Furong Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ntb_netdev: Move ntb_netdev_rx_handler() to call netif_rx() from __netif_rx()

The following is emitted when using idxd (DSA) dmanegine as the data
mover for ntb_transport that ntb_netdev uses.

[74412.546922] BUG: using smp_processor_id() in preemptible [00000000] code: irq/52-idxd-por/14526
[74412.556784] caller is netif_rx_internal+0x42/0x130
[74412.562282] CPU: 6 PID: 14526 Comm: irq/52-idxd-por Not tainted 6.9.5 #5
[74412.569870] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.E9I.1752.P05.2402080856 02/08/2024
[74412.581699] Call Trace:
[74412.584514]  <TASK>
[74412.586933]  dump_stack_lvl+0x55/0x70
[74412.591129]  check_preemption_disabled+0xc8/0xf0
[74412.596374]  netif_rx_internal+0x42/0x130
[74412.600957]  __netif_rx+0x20/0xd0
[74412.604743]  ntb_netdev_rx_handler+0x66/0x150 [ntb_netdev]
[74412.610985]  ntb_complete_rxc+0xed/0x140 [ntb_transport]
[74412.617010]  ntb_rx_copy_callback+0x53/0x80 [ntb_transport]
[74412.623332]  idxd_dma_complete_txd+0xe3/0x160 [idxd]
[74412.628963]  idxd_wq_thread+0x1a6/0x2b0 [idxd]
[74412.634046]  irq_thread_fn+0x21/0x60
[74412.638134]  ? irq_thread+0xa8/0x290
[74412.642218]  irq_thread+0x1a0/0x290
[74412.646212]  ? __pfx_irq_thread_fn+0x10/0x10
[74412.651071]  ? __pfx_irq_thread_dtor+0x10/0x10
[74412.656117]  ? __pfx_irq_thread+0x10/0x10
[74412.660686]  kthread+0x100/0x130
[74412.664384]  ? __pfx_kthread+0x10/0x10
[74412.668639]  ret_from_fork+0x31/0x50
[74412.672716]  ? __pfx_kthread+0x10/0x10
[74412.676978]  ret_from_fork_asm+0x1a/0x30
[74412.681457]  </TASK>

The cause is due to the idxd driver interrupt completion handler uses
threaded interrupt and the threaded handler is not hard or soft interrupt
context. However __netif_rx() can only be called from interrupt context.
Change the call to netif_rx() in order to allow completion via normal
context for dmaengine drivers that utilize threaded irq handling.

While the following commit changed from netif_rx() to __netif_rx(),
baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context."),
the change should've been a noop instead. However, the code precedes this
fix should've been using netif_rx_ni() or netif_rx_any_context().

Fixes: 548c237c0a99 ("net: Add support for NTB virtual ethernet device")
Reported-by: Jerry Dai <[email protected]>
Tested-by: Jerry Dai <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: aquantia: add missing include guards

The header is missing the include guards so add them.

Reviewed-by: Andrew Lunn <[email protected]>
Fixes: fb470f70fea7 ("net: phy: aquantia: add hwmon support")
Signed-off-by: Bartosz Golaszewski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"One single patch to fix the non-contiguous CBM resctrl:

  - AMD supports non-contiguous CBM but does not report it via CPUID.
    This test should not use CPUID on AMD to detect non-contiguous CBM
    support. Fix the problem so the test uses CPUID to discover
    non-contiguous CBM support only on Intel"

* tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/resctrl: Fix non-contiguous CBM for AMD

Merge tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
"VFS:

   - Improve handling of deep ancestor chains in is_subdir()

   - Release locks cleanly when fctnl_setlk() races with close().

     When setting a file lock fails the VFS tries to cleanup the already
     created lock. The helper used for this calls back into the LSM
     layer which may cause it to fail, leaving the stale lock accessible
     via /proc/locks.

  AFS:

   - Fix a comma/semicolon typo"

* tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Convert comma to semicolon
  fs: better handle deep ancestor chains in is_subdir()
  filelock: Remove locks reliably when fcntl/close race is detected

afs: Convert comma to semicolon

Replace a comma between expression statements by a semicolon.

Signed-off-by: Chen Ni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Acked-by: David Howells <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

fs: better handle deep ancestor chains in is_subdir()

Jan reported that 'cd ..' may take a long time in deep directory
hierarchies under a bind-mount. If concurrent renames happen it is
possible to livelock in is_subdir() because it will keep retrying.

Change is_subdir() from simply retrying over and over to retry once and
then acquire the rename lock to handle deep ancestor chains better. The
list of alternatives to this approach were less then pleasant. Change
the scope of rcu lock to cover the whole walk while at it.

A big thanks to Jan and Linus. Both Jan and Linus had proposed
effectively the same thing just that one version ended up being slightly
more elegant.

Reported-by: Jan Kara <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

Merge tag 'erofs-for-6.10-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"The most important one fixes possible infinite loops reported by a
  smartphone vendor OPPO recently due to some unexpected zero-sized
  compressed pcluster out of interrupted I/Os, storage failures, etc.

  Another patch fixes global buffer memory leak on unloading, and the
  remaining one switches to use super_set_uuid() to keep with the other
  filesystems.

  Summary:

   - Fix possible global buffer memory leak when unloading EROFS module

   - Fix FS_IOC_GETFSUUID ioctl by using super_set_uuid()

   - Reset m_llen to 0 so then it can retry if metadata is invalid"

* tag 'erofs-for-6.10-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: ensure m_llen is reset to 0 if metadata is invalid
  erofs: convert to use super_set_uuid to support for FS_IOC_GETFSUUID
  erofs: fix possible memory leak in z_erofs_gbuf_exit()

filelock: Remove locks reliably when fcntl/close race is detected

When fcntl_setlk() races with close(), it removes the created lock with
do_lock_file_wait().
However, LSMs can allow the first do_lock_file_wait() that created the lock
while denying the second do_lock_file_wait() that tries to remove the lock.
In theory (but AFAIK not in practice), posix_lock_file() could also fail to
remove a lock due to GFP_KERNEL allocation failure (when splitting a range
in the middle).

After the bug has been triggered, use-after-free reads will occur in
lock_get_status() when userspace reads /proc/locks. This can likely be used
to read arbitrary kernel memory, but can't corrupt kernel memory.
This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in
enforcing mode and only works from some security contexts.

Fix it by calling locks_remove_posix() instead, which is designed to
reliably get rid of POSIX locks associated with the given file and
files_struct and is also used by filp_flush().

Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling")
Cc: [email protected]
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
Signed-off-by: Jann Horn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it

For bundle receives to function properly, the previous iteration msg_inq
value is needed to make a judgement call on how much data there is to
receive. A previous fix ended up clearing it earlier as an error case
would potentially errantly set IORING_CQE_F_SOCK_NONEMPTY if the request
got failed.

Move the assignment to post assigning buffers for the receive, but
ensure that it's cleared for the buffer selection error case. With that,
buffer selection has the right msg_inq value and can correctly bundle
receives as designed.

Noticed while testing where it was apparent than more than 1 buffer was
never received. After fix was in place, multiple buffers are correctly
picked for receive. This provides a 10x speedup for the test case, as
the buffer size used was 64b.

Fixes: 18414a4a2eab ("io_uring/net: assign kmsg inq/flags before buffer selection")
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'net-txgbe-fix-msi-and-intx-interrupts'

Jiawen Wu says:

====================
net: txgbe: fix MSI and INTx interrupts

Fix MSI and INTx interrupts for txgbe driver.

changes in v3:
- Add flag wx->misc_irq_domain.
- Separate commits.
- Detail null-defer events.

changes in v2: https://lore.kernel.org/all/20240626060703 [email protected]
- Split into two commits.
- Detail commit description.

v1: https://lore.kernel.org/all/20240621080951 [email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: txgbe: free isb resources at the right time

When using MSI/INTx interrupt, the shared interrupts are still being
handled in the device remove routine, before free IRQs. So isb memory
is still read after it is freed. Thus move wx_free_isb_resources()
from txgbe_close() to txgbe_remove(). And fix the improper isb free
action in txgbe_open() error handling path.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Signed-off-by: Jiawen Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: txgbe: add extra handle for MSI/INTx into thread irq handle

Rename original txgbe_misc_irq_handle() to txgbe_misc_irq_thread_fn()
since it is the handle thread to wake up. And add the primary handler
to deal the case of MSI/INTx, because there is a schedule NAPI poll.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Signed-off-by: Jiawen Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: txgbe: remove separate irq request for MSI and INTx

When using MSI or INTx interrupts, request_irq() for pdev->irq will
conflict with request_threaded_irq() for txgbe->misc.irq, to cause
system crash. So remove txgbe_request_irq() for MSI/INTx case, and
rename txgbe_request_msix_irqs() since it only request for queue irqs.

Add wx->misc_irq_domain to determine whether the driver creates an IRQ
domain and threaded request the IRQs.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Signed-off-by: Jiawen Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: txgbe: initialize num_q_vectors for MSI/INTx interrupts

When using MSI/INTx interrupts, wx->num_q_vectors is uninitialized.
Thus there will be kernel panic in wx_alloc_q_vectors() to allocate
queue vectors.

Fixes: 3f703186113f ("net: libwx: Add irq flow functions")
Signed-off-by: Jiawen Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

platform/x86: toshiba_acpi: Fix quickstart quirk handling

The global hci_hotkey_quickstart quirk flag is tested in
toshiba_acpi_enable_hotkeys() before the quirk flag is properly
initialized based on SMBIOS data. This causes the quirk to be
applied to all models, some of which behave erratically as a
result.

Fix this by initializing the global quirk flags during module
initialization before registering the ACPI driver. This also
allows us to mark toshiba_dmi_quirks[] as __initconst.

Fixes: 23f1d8b47d12 ("platform/x86: toshiba_acpi: Add quirk for buttons on Z830")
Reported-by: kemal <[email protected]>
Closes: https://lore.kernel.org/platform-driver-x86/[email protected]/
Tested-by: kemal <[email protected]>
Cc: [email protected]
Signed-off-by: Armin Wolf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

net: stmmac: dwmac-qcom-ethqos: fix error array size

Correct member @num_por with size of right array @emac_v4_0_0_por for
struct ethqos_emac_driver_data @emac_v4_0_0_data.

Cc: [email protected]
Fixes: 8c4d92e82d50 ("net: stmmac: dwmac-qcom-ethqos: add support for emac4 on sa8775p platforms")
Signed-off-by: Yijie Yang <[email protected]>
Reviewed-by: Bartosz Golaszewski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: allow skb_datagram_iter to be called from any context

We only use the mapping in a single context, so kmap_local is sufficient
and cheaper. Make sure to use skb_frag_foreach_page as skb frags may
contain compound pages and we need to map page by page.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
Signed-off-by: Sagi Grimberg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

s390: Update defconfigs

Acked-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"A couple of error leg problems, one affecting scsi_debug and the other
  affecting pure SAS (i.e. not SATA) SCSI expanders"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed
  scsi: scsi_debug: Fix create target debugfs failure

e1000e: Fix S0ix residency on corporate systems

On vPro systems, the configuration of the I219-LM to achieve power
gating and S0ix residency is split between the driver and the CSME FW.
It was discovered that in some scenarios, where the network cable is
connected and then disconnected, S0ix residency is not always reached.
This was root-caused to a subset of I219-LM register writes that are not
performed by the CSME FW. Therefore, the driver should perform these
register writes on corporate setups, regardless of the CSME FW state.

This was discovered on Meteor Lake systems; however it is likely to
appear on other platforms as well.

Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218589
Signed-off-by: Dima Ruinskiy <[email protected]>
Signed-off-by: Vitaly Lifshits <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'linux-can-fixes-for-6.10-20240701' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2024-07-01

Jimmy Assarsson's patch for the kvaser_usb adds a missing explicit
initialization of the struct kvaser_usb_driver_info::family for the
kvaser_usb_driver_info_leafimx.

* tag 'linux-can-fixes-for-6.10-20240701' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dave Jiang:

- Fix no cxl_nvd during pmem region auto-assemble

- Avoid NULLL pointer dereference in region lookup

- Add missing checks to interleave capability

- Add cxl kdoc fix to address document compilation error

* tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: documentation: add missing files to cxl driver-api
  cxl/region: check interleave capability
  cxl/region: Avoid null pointer dereference in region lookup
  cxl/mem: Fix no cxl_nvd during pmem region auto-assembling

Merge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
"A fixup for a recent fix that prevents an infinite loop during block
  group reclaim.

  Unfortunately it introduced an unsafe way of updating block group list
  and could race with relocation. This could be hit on fast devices when
  relocation/balance does not have enough space"

* tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix adding block group to a reclaim list and the unused list during reclaim

Merge tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fix from Arnd Bergmann:
"This fixes up a last minute build regression from the previous set of
bug fixes"

* tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: fix sys_fanotify_mark prototype

Merge tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"A number of devicetree fixes came in for the rockchip platforms,
  correcting some of the address information, and reverting a change to
  the MMC controller configuration that caused regressions.

  Four drivers have one code change each, addressing minor build issues
  for the optee firmware driver, the litex SoC platform driver and two
  reset drivers.

  The riscv fixes as also simple, mainly turning off device nodes in the
  canaan dts files unless they are actually usable on a particular
  board.

  Finally, Drew takes over maintaining the THEAD RISC-V SoC platform"

* tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  drivers/soc/litex: drop obsolete dependency on COMPILE_TEST
  tee: optee: ffa: Fix missing-field-initializers warning
  arm64: dts: rockchip: Add sound-dai-cells for RK3368
  arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi 4B
  reset: hisilicon: hi6220: add missing MODULE_DESCRIPTION() macro
  reset: gpio: Fix missing gpiolib dependency for GPIO reset controller
  MAINTAINERS: thead: update Maintainer
  arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E
  riscv: dts: starfive: Set EMMC vqmmc maximum voltage to 3.3V on JH7110 boards
  arm64: dts: rockchip: make poweroff(8) work on Radxa ROCK 5A
  Revert "arm64: dts: rockchip: remove redundant cd-gpios from rk3588 sdmmc nodes"
  ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node
  arm64: dts: rockchip: Fix the value of `dlg,jack-det-rate` mismatch on rk3399-gru
  arm64: dts: rockchip: set correct pwm0 pinctrl on rk3588-tiger
  riscv: dts: canaan: Disable I/O devices unless used
  riscv: dts: canaan: Clean up serial aliases
  arm64: dts: rockchip: Rename LED related pinctrl nodes on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix SD NAND and eMMC init on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix rk3308 codec@ff560000 reset-names
  arm64: dts: rockchip: Fix the DCDC_REG2 minimum voltage on Quartz64 Model B

Merge tag 'mtd/fixes-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull mtd fixes from Miquel Raynal:

- Rockchip NAND controller driver was not checking the timings properly
   and the introduction of NV-DDR support broke it.

- The core was also misbehaving in some very specific cases: in case of
   (unlikely) bitflips in the parameter page, the fallback might have
   failed as well but for software reasons.

- Finally, the chosen ECC configuration was no longer properly
   propagated to upper layers, mostly failing an info message at probe
   time.

* tag 'mtd/fixes-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: rockchip: ensure NVDDR timings are rejected
  mtd: rawnand: Bypass a couple of sanity checks during NAND identification
  mtd: rawnand: Fix the nand_read_data_op() early check
  mtd: rawnand: Ensure ECC configuration is propagated to upper layers

Merge tag 'vfs-6.10-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
"Misc:

   - Don't misleadingly warn during filesystem thaw operations.

     It's possible that a block device which was frozen before it was
     mounted can cause a failing thaw operation if someone concurrently
     tried to mount it while that thaw operation was issued and the
     device had already been temporarily claimed for the mount (The
     mount will of course be aborted because the device is frozen).

  netfs:

   - Fix io_uring based write-through. Make sure that the total request
     length is correctly set.

   - Fix partial writes to folio tail.

   - Remove some xarray helpers that were intended for bounce buffers
     which got defered to a later patch series.

   - Make netfs_page_mkwrite() whether folio->mapping is vallid after
     acquiring the folio lock.

   - Make netfs_page_mkrite() flush conflicting data instead of waiting.

  fsnotify:

   - Ensure that fsnotify creation events are generated before fsnotify
     open events when a file is created via ->atomic_open(). The
     ordering was broken before.

   - Ensure that no fsnotify events are generated for O_PATH file
     descriptors. While no fsnotify open events were generated, fsnotify
     close events were. Make it consistent and don't produce any"

* tag 'vfs-6.10-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix netfs_page_mkwrite() to flush conflicting data, not wait
  netfs: Fix netfs_page_mkwrite() to check folio->mapping is valid
  netfs: Delete some xarray-wangling functions that aren't used
  netfs: Fix early issue of write op on partial write to folio tail
  netfs: Fix io_uring based write-through
  vfs: generate FS_CREATE before FS_OPEN when ->atomic_open used.
  fsnotify: Do not generate events for O_PATH file descriptors
  fs: don't misleadingly warn during thaw operations

btrfs: fix adding block group to a reclaim list and the unused list during reclaim

There is a potential parallel list adding for retrying in
btrfs_reclaim_bgs_work and adding to the unused list. Since the block
group is removed from the reclaim list and it is on a relocation work,
it can be added into the unused list in parallel. When that happens,
adding it to the reclaim list will corrupt the list head and trigger
list corruption like below.

Fix it by taking fs_info->unused_bgs_lock.

  [177.504][T2585409] BTRFS error (device nullb1): error relocating ch= unk 2415919104
  [177.514][T2585409] list_del corruption. next->prev should be ff1100= 0344b119c0, but was ff11000377e87c70. (next=3Dff110002390cd9c0)
  [177.529][T2585409] ------------[ cut here ]------------
  [177.537][T2585409] kernel BUG at lib/list_debug.c:65!
  [177.545][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
  [177.555][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G        W          6.10.0-rc5-kts #1
  [177.568][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
  [177.579][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
  [177.589][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
  [177.624][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
  [177.633][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000
  [177.644][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40
  [177.655][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08
  [177.665][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0
  [177.676][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000
  [177.687][T2585409] FS:  0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
  [177.700][T2585409] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [177.709][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0
  [177.720][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
  [177.731][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
  [177.742][T2585409] PKRU: 55555554
  [177.748][T2585409] Call Trace:
  [177.753][T2585409]  <TASK>
  [177.759][T2585409]  ? __die_body.cold+0x19/0x27
  [177.766][T2585409]  ? die+0x2e/0x50
  [177.772][T2585409]  ? do_trap+0x1ea/0x2d0
  [177.779][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.788][T2585409]  ? do_error_trap+0xa3/0x160
  [177.795][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.805][T2585409]  ? handle_invalid_op+0x2c/0x40
  [177.812][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.820][T2585409]  ? exc_invalid_op+0x2d/0x40
  [177.827][T2585409]  ? asm_exc_invalid_op+0x1a/0x20
  [177.834][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.843][T2585409]  btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]

There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is
safe, AFAICS. Since the block group was in the unused list, the used bytes
should be 0 when it was added to the unused list. Then, it checks
block_group->{used,reserved,pinned} are still 0 under the
block_group->lock. So, they should be still eligible for the unused list,
not the reclaim list.

The reason it is safe there it's because because we're holding
space_info->groups_sem in write mode.

That means no other task can allocate from the block group, so while we
are at deleted_unused_bgs() it's not possible for other tasks to
allocate and deallocate extents from the block group, so it can't be
added to the unused list or the reclaim list by anyone else.

The bug can be reproduced by btrfs/166 after a few rounds. In practice
this can be hit when relocation cannot find more chunk space and ends
with ENOSPC.

Reported-by: Shinichiro Kawasaki <[email protected]>
Suggested-by: Johannes Thumshirn <[email protected]>
Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
CC: [email protected] # 5.15+
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

syscalls: fix sys_fanotify_mark prototype

My earlier fix missed an incorrect function prototype that shows up on
native 32-bit builds:

In file included from fs/notify/fanotify/fanotify_user.c:14:
include/linux/syscalls.h:248:25: error: conflicting types for 'sys_fanotify_mark'; have 'long int(int,  unsigned int,  u32,  u32,  int,  const char *)' {aka 'long int(int,  unsigned int,  unsigned int,  unsigned int,  int,  const char *)'}
1924 | SYSCALL32_DEFINE6(fanotify_mark,
      | ^~~~~~~~~~~~~~~~~
include/linux/syscalls.h:862:17: note: previous declaration of 'sys_fanotify_mark' with type 'long int(int,  unsigned int,  u64,  int, const char *)' {aka 'long int(int,  unsigned int,  long long unsigned int,  int,  const char *)'}

On x86 and powerpc, the prototype is also wrong but hidden in an #ifdef,
so it never caused problems.

Add another alternative declaration that matches the conditional function
definition.

Fixes: 403f17a33073 ("parisc: use generic sys_fanotify_mark implementation")
Cc: [email protected]
Reported-by: Guenter Roeck <[email protected]>
Reported-by: Geert Uytterhoeven <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'v6.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Apart from the regular dts fixes for wrong addresses, missing
or wrong properties, this reverts the previous move away from
cd-gpios to the mmc-controller's internal card-detect.
With this change applied, it was reported that boards could not
detect card anymore, so this go reverted of course.

* tag 'v6.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Add sound-dai-cells for RK3368
  arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi 4B
  arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E
  arm64: dts: rockchip: make poweroff(8) work on Radxa ROCK 5A
  Revert "arm64: dts: rockchip: remove redundant cd-gpios from rk3588 sdmmc nodes"
  ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node
  arm64: dts: rockchip: Fix the value of `dlg,jack-det-rate` mismatch on rk3399-gru
  arm64: dts: rockchip: set correct pwm0 pinctrl on rk3588-tiger
  arm64: dts: rockchip: Rename LED related pinctrl nodes on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix SD NAND and eMMC init on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix rk3308 codec@ff560000 reset-names
  arm64: dts: rockchip: Fix the DCDC_REG2 minimum voltage on Quartz64 Model B

Link: https://lore.kernel.org/r/10237789.nnTZe4vzsl@diego
Signed-off-by: Arnd Bergmann <[email protected]>