Git Repo - linux.git/log

Merge tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2024-10-22

1) Fix routing behavior that relies on L4 information
   for xfrm encapsulated packets.
   From Eyal Birger.

2) Remove leftovers of pernet policy_inexact lists.
   From Florian Westphal.

3) Validate new SA's prefixlen when the selector family is
   not set from userspace.
   From Sabrina Dubroca.

4) Fix a kernel-infoleak when dumping an auth algorithm.
   From Petr Vaganov.

Please pull or let me know if there are problems.

ipsec-2024-10-22

* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: fix one more kernel-infoleak in algo dumping
  xfrm: validate new SA's prefixlen using SA family when sel.family is unset
  xfrm: policy: remove last remnants of pernet inexact list
  xfrm: respect ip protocols rules criteria when performing dst lookups
  xfrm: extract dst lookup parameters into a struct
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()

If get_clock_desc() succeeds, it calls fget() for the clockid's fd,
and get the clk->rwsem read lock, so the error path should release
the lock to make the lock balance and fput the clockid's fd to make
the refcount balance and release the fd related resource.

However the below commit left the error path locked behind resulting in
unbalanced locking. Check timespec64_valid_strict() before
get_clock_desc() to fix it, because the "ts" is not changed
after that.

Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()")
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Acked-by: Anna-Maria Behnsen <[email protected]>
[[email protected]: fixed commit message typo]
Signed-off-by: Paolo Abeni <[email protected]>

r8169: avoid unsolicited interrupts

It was reported that after resume from suspend a PCI error is logged
and connectivity is broken. Error message is:
PCI error (cmd = 0x0407, status_errs = 0x0000)
The message seems to be a red herring as none of the error bits is set,
and the PCI command register value also is normal. Exception handling
for a PCI error includes a chip reset what apparently brakes connectivity
here. The interrupt status bit triggering the PCI error handling isn't
actually used on PCIe chip versions, so it's not clear why this bit is
set by the chip. Fix this by ignoring this bit on PCIe chip versions.

Fixes: 0e4851502f84 ("r8169: merge with version 8.001.00 of Realtek's r8168 driver")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219388
Tested-by: Atlas Yu <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: sched: use RCU read-side critical section in taprio_dump()

Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: sched: fix use-after-free in taprio_change()

In 'taprio_change()', 'admin' pointer may become dangling due to sched
switch / removal caused by 'advance_sched()', and critical section
protected by 'q->current_entry_lock' is too small to prevent from such
a scenario (which causes use-after-free detected by KASAN). Fix this
by prefer 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update
'admin' immediately before an attempt to schedule freeing.

Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers

tcf_action_init() has logic for checking mismatches between action and
filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run
on the transition between the new tc_act_bind(flags) returning true (aka
now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning
false (aka action was not bound to classifier before). Otherwise, the
check is skipped.

For the case where an action is not standalone, but rather it was
created by a classifier and is bound to it, tcf_action_init() skips the
check entirely, and this means it allows mismatched flags to occur.

Taking the matchall classifier code path as an example (with mirred as
an action), the reason is the following:

1 | mall_change()
2 | -> mall_replace_hw_filter()
3 |   -> tcf_exts_validate_ex()
4 |      -> flags |= TCA_ACT_FLAGS_BIND;
5 |      -> tcf_action_init()
6 |         -> tcf_action_init_1()
7 |            -> a_o->init()
8 |               -> tcf_mirred_init()
9 |                  -> tcf_idr_create_from_flags()
10 |                     -> tcf_idr_create()
11 |                        -> p->tcfa_flags = flags;
12 |         -> tc_act_bind(flags))
13 |         -> tc_act_bind(act->tcfa_flags)

When invoked from tcf_exts_validate_ex() like matchall does (but other
classifiers validate their extensions as well), tcf_action_init() runs
in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by
line 4). So line 12 is always true, and line 13 is always true as well.
No transition ever takes place, and the check is skipped.

The code was added in this form in commit c86e0209dc77 ("flow_offload:
validate flags of filter and actions"), but I'm attributing the blame
even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and
TCA_ACT_FLAGS_SKIP_SW were added to the UAPI.

Following the development process of this change, the check did not
always exist in this form. A change took place between v3 [1] and v4 [2],
AFAIU due to review feedback that it doesn't make sense for action flags
to be different than classifier flags. I think I agree with that
feedback, but it was translated into code that omits enforcing this for
"classic" actions created at the same time with the filters themselves.

There are 3 more important cases to discuss. First there is this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1

which should be allowed, because prior to the concept of dedicated
action flags, it used to work and it used to mean the action inherited
the skip_sw/skip_hw flags from the classifier. It's not a mismatch.

Then we have this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_hw

where there is a mismatch and it should be rejected.

Finally, we have:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_sw

where the offload flags coincide, and this should be treated the same as
the first command based on inheritance, and accepted.

[1]: https://lore.kernel.org/netdev/20211028110646 [email protected]/
[2]: https://lore.kernel.org/netdev/20211118130805 [email protected]/
Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: usb: usbnet: fix name regression

The fix for MAC addresses broke detection of the naming convention
because it gave network devices no random MAC before bind()
was called. This means that the check for the local assignment bit
was always negative as the address was zeroed from allocation,
instead of from overwriting the MAC with a unique hardware address.

The correct check for whether bind() has altered the MAC is
done with is_zero_ether_addr

Signed-off-by: Oliver Neukum <[email protected]>
Reported-by: Greg Thelen <[email protected]>
Diagnosed-by: John Sperbeck <[email protected]>
Fixes: bab8eb0dd4cb9 ("usbnet: modern method to get random MAC")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

mlxsw: spectrum_router: fix xa_store() error checking

It is meant to use xa_err() to extract the error encoded in the return
value of xa_store().

Fixes: 44c2fbebe18a ("mlxsw: spectrum_router: Share nexthop counters in resilient groups")
Signed-off-by: Yuan Can <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Petr Machata <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================

This patchset contains Netfilter fixes for net:

1) syzkaller managed to triger UaF due to missing reference on netns in
   bpf infrastructure, from Florian Westphal.

2) Fix incorrect conversion from NFPROTO_UNSPEC to NFPROTO_{IPV4,IPV6}
   in the following xtables targets: MARK and NFLOG. Moreover, add
   missing

I have my half share in this mistake, I did not take the necessary time
to review this: For several years I have been struggling to keep working
on Netfilter, juggling a myriad of side consulting projects to stop
burning my own savings.

I have extended the iptables-tests.py test infrastructure to improve the
coverage of ip6tables and detect similar problems in the future.

This is a v2 including a extended PR with one more fix.

netfilter pull request 24-10-21

* tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  netfilter: bpf: must hold reference on net namespace
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

virtio_net: fix integer overflow in stats

Static analysis on linux-next has detected the following issue
in function virtnet_stats_ctx_init, in drivers/net/virtio_net.c :

        if (vi->device_stats_cap & VIRTIO_NET_STATS_TYPE_CVQ) {
                queue_type = VIRTNET_Q_TYPE_CQ;
                ctx->bitmap[queue_type]   |= VIRTIO_NET_STATS_TYPE_CVQ;
                ctx->desc_num[queue_type] += ARRAY_SIZE(virtnet_stats_cvq_desc);
                ctx->size[queue_type]     += sizeof(struct virtio_net_stats_cvq);
        }

ctx->bitmap is declared as a u32 however it is being bit-wise or'd with
VIRTIO_NET_STATS_TYPE_CVQ and this is defined as 1 << 32:

include/uapi/linux/virtio_net.h:#define VIRTIO_NET_STATS_TYPE_CVQ (1ULL << 32)

..and hence the bit-wise or operation won't set any bits in ctx->bitmap
because 1ULL < 32 is too wide for a u32.

In fact, the field is read into a u64:

       u64 offset, bitmap;
....
       bitmap = ctx->bitmap[queue_type];

so to fix, it is enough to make bitmap an array of u64.

Fixes: 941168f8b40e5 ("virtio_net: support device stats")
Reported-by: "Colin King (gmail)" <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://patch.msgid.link/53e2bd6728136d5916e384a7840e5dc7eebff832.1729099611.git.mst@redhat.com
Signed-off-by: Paolo Abeni <[email protected]>

net: fix races in netdev_tx_sent_queue()/dev_watchdog()

Some workloads hit the infamous dev_watchdog() message:

"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"

It seems possible to hit this even for perfectly normal
BQL enabled drivers:

1) Assume a TX queue was idle for more than dev->watchdog_timeo
   (5 seconds unless changed by the driver)

2) Assume a big packet is sent, exceeding current BQL limit.

3) Driver ndo_start_xmit() puts the packet in TX ring,
   and netdev_tx_sent_queue() is called.

4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
   before txq->trans_start has been written.

5) txq->trans_start is written later, from netdev_start_xmit()

    if (rc == NETDEV_TX_OK)
          txq_trans_update(txq)

dev_watchdog() running on another cpu could read the old
txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.

To solve the issue, write txq->trans_start right before one XOFF bit
is set :

- _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()

From dev_watchdog(), we have to read txq->state before txq->trans_start.

Add memory barriers to enforce correct ordering.

In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.

Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: wwan: fix global oob in wwan_rtnl_policy

The variable wwan_rtnl_link_ops assign a *bigger* maxtype which leads to
a global out-of-bounds read when parsing the netlink attributes. Exactly
same bug cause as the oob fixed in commit b33fb5b801c6 ("net: qualcomm:
rmnet: fix global oob in rmnet_policy").

==================================================================
BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:388 [inline]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603
Read of size 1 at addr ffffffff8b09cb60 by task syz.1.66276/323862

CPU: 0 PID: 323862 Comm: syz.1.66276 Not tainted 6.1.70 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x14f/0x750 mm/kasan/report.c:395
kasan_report+0x139/0x170 mm/kasan/report.c:495
validate_nla lib/nlattr.c:388 [inline]
__nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603
__nla_parse+0x3c/0x50 lib/nlattr.c:700
nla_parse_nested_deprecated include/net/netlink.h:1269 [inline]
__rtnl_newlink net/core/rtnetlink.c:3514 [inline]
rtnl_newlink+0x7bc/0x1fd0 net/core/rtnetlink.c:3623
rtnetlink_rcv_msg+0x794/0xef0 net/core/rtnetlink.c:6122
netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508
netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline]
netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352
netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874
sock_sendmsg_nosec net/socket.c:716 [inline]
__sock_sendmsg net/socket.c:728 [inline]
____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499
___sys_sendmsg+0x21c/0x290 net/socket.c:2553
__sys_sendmsg net/socket.c:2582 [inline]
__do_sys_sendmsg net/socket.c:2591 [inline]
__se_sys_sendmsg+0x19e/0x270 net/socket.c:2589
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f67b19a24ad
RSP: 002b:00007f67b17febb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f67b1b45f80 RCX: 00007f67b19a24ad
RDX: 0000000000000000 RSI: 0000000020005e40 RDI: 0000000000000004
RBP: 00007f67b1a1e01d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd2513764f R14: 00007ffd251376e0 R15: 00007f67b17fed40
</TASK>

The buggy address belongs to the variable:
wwan_rtnl_policy+0x20/0x40

The buggy address belongs to the physical page:
page:ffffea00002c2700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xb09c
flags: 0xfff00000001000(reserved|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000001000 ffffea00002c2708 ffffea00002c2708 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner info is not present (never set?)

Memory state around the buggy address:
ffffffff8b09ca00: 05 f9 f9 f9 05 f9 f9 f9 00 01 f9 f9 00 01 f9 f9
ffffffff8b09ca80: 00 00 00 05 f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9
>ffffffff8b09cb00: 00 00 00 00 05 f9 f9 f9 00 00 00 00 f9 f9 f9 f9
^
ffffffff8b09cb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

According to the comment of `nla_parse_nested_deprecated`, use correct size
`IFLA_WWAN_MAX` here to fix this issue.

Fixes: 88b710532e53 ("wwan: add interface creation support")
Signed-off-by: Lin Ma <[email protected]>
Reviewed-by: Loic Poulain <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netfilter: xtables: fix typo causing some targets not to load on IPv6

- There is no NFPROTO_IPV6 family for mark and NFLOG.
- TRACE is also missing module autoload with NFPROTO_IPV6.

This results in ip6tables failing to restore a ruleset. This issue has been
reported by several users providing incomplete patches.

Very similar to Ilya Katsnelson's patch including a missing chunk in the
TRACE extension.

Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed")
Reported-by: Ignat Korchagin <[email protected]>
Reported-by: Ilya Katsnelson <[email protected]>
Reported-by: Krzysztof Olędzki <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'fsl-fman-fix-refcount-handling-of-fman-related-devices'

Aleksandr Mishin says:

====================
fsl/fman: Fix refcount handling of fman-related devices

The series is intended to fix refcount handling for fman-related "struct
device" objects - the devices are not released upon driver removal or in
the error paths during probe. This leads to device reference leaks.

The device pointers are now saved to struct mac_device and properly handled
in the driver's probe and removal functions.

Originally reported by Simon Horman <[email protected]>
(https://lore.kernel.org/all/20240702133651 [email protected]/)

Compile tested only.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

fsl/fman: Fix refcount handling of fman-related devices

In mac_probe() there are multiple calls to of_find_device_by_node(),
fman_bind() and fman_port_bind() which takes references to of_dev->dev.
Not all references taken by these calls are released later on error path
in mac_probe() and in mac_remove() which lead to reference leaks.

Add references release.

Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Aleksandr Mishin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

fsl/fman: Save device references taken in mac_probe()

In mac_probe() there are calls to of_find_device_by_node() which takes
references to of_dev->dev. These references are not saved and not released
later on error path in mac_probe() and in mac_remove().

Add new fields into mac_device structure to save references taken for
future use in mac_probe() and mac_remove().

This is a preparation for further reference leaks fix.

Signed-off-by: Aleksandr Mishin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

MAINTAINERS: add samples/pktgen to NETWORKING [GENERAL]

samples/pktgen is missing in the MAINTAINERS file.

Suggested-by: Antoine Tenart <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Message-ID: <20241018005301 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

mailmap: update entry for Jesper Dangaard Brouer

Mapping all my previously used emails to my kernel.org email.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Message-ID: <172909057364.2452383.8019986488234344607.stgit@firesoul>
Signed-off-by: Andrew Lunn <[email protected]>

net: dsa: mv88e6xxx: Fix error when setting port policy on mv88e6393x

mv88e6393x_port_set_policy doesn't correctly shift the ptr value when
converting the policy format between the old and new styles, so the
target register ends up with the ptr being written over the data bits.

Shift the pointer to align with the format expected by
mv88e6393x_port_policy_write().

Fixes: 6584b26020fc ("net: dsa: mv88e6xxx: implement .port_set_policy for Amethyst")
Signed-off-by: Peter Rashleigh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241016040822 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

octeon_ep: Add SKB allocation failures handling in __octep_oq_process_rx()

build_skb() returns NULL in case of a memory allocation failure so handle
it inside __octep_oq_process_rx() to avoid NULL pointer dereference.

__octep_oq_process_rx() is called during NAPI polling by the driver. If
skb allocation fails, keep on pulling packets out of the Rx DMA queue: we
shouldn't break the polling immediately and thus falsely indicate to the
octep_napi_poll() that the Rx pressure is going down. As there is no
associated skb in this case, don't process the packets and don't push them
up the network stack - they are skipped.

Helper function is implemented to unmmap/flush all the fragment buffers
used by the dropped packet. 'alloc_failures' counter is incremented to
mark the skb allocation error in driver statistics.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support")
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Aleksandr Mishin <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

octeon_ep: Implement helper for iterating packets in Rx queue

The common code with some packet and index manipulations is extracted and
moved to newly implemented helper to make the code more readable and avoid
duplication. This is a preparation for skb allocation failure handling.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Suggested-by: Simon Horman <[email protected]>
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Aleksandr Mishin <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

bnxt_en: replace ptp_lock with irqsave variant

In netpoll configuration the completion processing can happen in hard
irq context which will break with spin_lock_bh() for fullfilling RX
timestamp in case of all packets timestamping. Replace it with
spin_lock_irqsave() variant.

Fixes: 7f5515d19cd7 ("bnxt_en: Get the RX packet timestamp")
Reviewed-by: Michael Chan <[email protected]>
Signed-off-by: Vadim Fedorenko <[email protected]>
Message-ID: <20241016195234.2622004 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

net: phy: dp83822: Fix reset pin definitions

This change fixes a rare issue where the PHY fails to detect a link
due to incorrect reset behavior.

The SW_RESET definition was incorrectly assigned to bit 14, which is the
Digital Restart bit according to the datasheet. This commit corrects
SW_RESET to bit 15 and assigns DIG_RESTART to bit 14 as per the
datasheet specifications.

The SW_RESET define is only used in the phy_reset function, which fully
re-initializes the PHY after the reset is performed. The change in the
bit definitions should not have any negative impact on the functionality
of the PHY.

v2:
- added Fixes tag
- improved commit message

Cc: [email protected]
Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Alex Michel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Message-ID: <AS1P250MB0608A798661549BF83C4B43EA9462@AS1P250MB0608.EURP250.PROD.OUTLOOK.COM>
Signed-off-by: Andrew Lunn <[email protected]>

MAINTAINERS: add Simon as an official reviewer

Simon has been diligently and consistently reviewing networking
changes for at least as long as our development statistics
go back. Often if not usually topping the list of reviewers.
Make his role official.

Acked-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241015153005.2854018 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

net: plip: fix break; causing plip to never transmit

Since commit
71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang")

plip was not able to send any packets, this patch replaces one
unintended break; with fallthrough; which was originally missed by
commit 9525d69a3667 ("net: plip: mark expected switch fall-throughs").

I have verified with a real hardware PLIP connection that everything
works once again after applying this patch.

Fixes: 71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang")
Signed-off-by: Jakub Boehm <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241015-net-plip-tx-fix-v1-1-32d8be1c7e0b@gmail.com>
Signed-off-by: Andrew Lunn <[email protected]>

be2net: fix potential memory leak in be_xmit()

The be_xmit() returns NETDEV_TX_OK without freeing skb
in case of be_xmit_enqueue() fails, add dev_kfree_skb_any() to fix it.

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Message-ID: <20241015144802 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()

The sun3_82586_send_packet() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241015144148 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

net: pse-pd: Fix out of bound for loop

Adjust the loop limit to prevent out-of-bounds access when iterating over
PI structures. The loop should not reach the index pcdev->nr_lines since
we allocate exactly pcdev->nr_lines number of PI structures. This fix
ensures proper bounds are maintained during iterations.

Fixes: 9be9567a7c59 ("net: pse-pd: Add support for PSE PIs")
Signed-off-by: Kory Maincent <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Message-ID: <20241015130255 [email protected]>
Signed-off-by: Andrew Lunn <[email protected]>

Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:

   - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated

  Previous releases - regressions:

   - ipv4: give an IPv4 dev to blackhole_netdev

   - udp: compute L4 checksum as usual when not segmenting the skb

   - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().

   - eth: mlx5e: don't call cleanup on profile rollback failure

   - eth: microchip: vcap api: fix memory leaks in
     vcap_api_encode_rule_test()

   - eth: enetc: disable Tx BD rings after they are empty

   - eth: macb: avoid 20s boot delay by skipping MDIO bus registration
     for fixed-link PHY

  Previous releases - always broken:

   - posix-clock: fix missing timespec64 check in pc_clock_settime()

   - genetlink: hold RCU in genlmsg_mcast()

   - mptcp: prevent MPC handshake on port-based signal endpoints

   - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame

   - eth: stmmac: dwmac-tegra: fix link bring-up sequence

   - eth: bcmasp: fix potential memory leak in bcmasp_xmit()

  Misc:

   - add Andrew Lunn as a co-maintainer of all networking drivers"

* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net/mlx5e: Don't call cleanup on profile rollback failure
  net/mlx5: Unregister notifier on eswitch init failure
  net/mlx5: Fix command bitmask initialization
  net/mlx5: Check for invalid vector index on EQ creation
  net/mlx5: HWS, use lock classes for bwc locks
  net/mlx5: HWS, don't destroy more bwc queue locks than allocated
  net/mlx5: HWS, fixed double free in error flow of definer layout
  net/mlx5: HWS, removed wrong access to a number of rules variable
  mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
  net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
  vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
  net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
  net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
  net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
  net: phy: mdio-bcm-unimac: Add BCM6846 support
  dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
  udp: Compute L4 checksum as usual when not segmenting the skb
  genetlink: hold RCU in genlmsg_mcast()
  net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
  tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
  ...

netfilter: bpf: must hold reference on net namespace

BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0
Read of size 8 at addr ffff8880106fe400 by task repro/72=
bpf_nf_link_release+0xda/0x1e0
bpf_link_free+0x139/0x2d0
bpf_link_release+0x68/0x80
__fput+0x414/0xb60

Eric says:
It seems that bpf was able to defer the __nf_unregister_net_hook()
after exit()/close() time.
Perhaps a netns reference is missing, because the netns has been
dismantled/freed already.
bpf_nf_link_attach() does :
link->net = net;
But I do not see a reference being taken on net.

Add such a reference and release it after hook unreg.
Note that I was unable to get syzbot reproducer to work, so I
do not know if this resolves this splat.

Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
Diagnosed-by: Eric Dumazet <[email protected]>
Reported-by: Lai, Yi <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'mlx5-misc-fixes-2024-10-15'

Tariq Toukan says:

====================
mlx5 misc fixes 2024-10-15

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.

Series generated against:
commit 174714f0e505 ("selftests: drivers: net: fix name not defined")
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5e: Don't call cleanup on profile rollback failure

When profile rollback fails in mlx5e_netdev_change_profile, the netdev
profile var is left set to NULL. Avoid a crash when unloading the driver
by not calling profile->cleanup in such a case.

This was encountered while testing, with the original trigger that
the wq rescuer thread creation got interrupted (presumably due to
Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by
mlx5e_priv_init, the profile rollback also fails for the same reason
(signal still active) so the profile is left as NULL, leading to a crash
later in _mlx5e_remove.

[  732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
[  734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[  734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[  734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
[  734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
[  734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
[  734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
[  745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  745.538222] #PF: supervisor read access in kernel mode
<snipped>
[  745.551290] Call Trace:
[  745.551590]  <TASK>
[  745.551866]  ? __die+0x20/0x60
[  745.552218]  ? page_fault_oops+0x150/0x400
[  745.555307]  ? exc_page_fault+0x79/0x240
[  745.555729]  ? asm_exc_page_fault+0x22/0x30
[  745.556166]  ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
[  745.556698]  auxiliary_bus_remove+0x18/0x30
[  745.557134]  device_release_driver_internal+0x1df/0x240
[  745.557654]  bus_remove_device+0xd7/0x140
[  745.558075]  device_del+0x15b/0x3c0
[  745.558456]  mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
[  745.559112]  mlx5_unregister_device+0x34/0x50 [mlx5_core]
[  745.559686]  mlx5_uninit_one+0x46/0xf0 [mlx5_core]
[  745.560203]  remove_one+0x4e/0xd0 [mlx5_core]
[  745.560694]  pci_device_remove+0x39/0xa0
[  745.561112]  device_release_driver_internal+0x1df/0x240
[  745.561631]  driver_detach+0x47/0x90
[  745.562022]  bus_remove_driver+0x84/0x100
[  745.562444]  pci_unregister_driver+0x3b/0x90
[  745.562890]  mlx5_cleanup+0xc/0x1b [mlx5_core]
[  745.563415]  __x64_sys_delete_module+0x14d/0x2f0
[  745.563886]  ? kmem_cache_free+0x1b0/0x460
[  745.564313]  ? lockdep_hardirqs_on_prepare+0xe2/0x190
[  745.564825]  do_syscall_64+0x6d/0x140
[  745.565223]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  745.565725] RIP: 0033:0x7f1579b1288b

Fixes: 3ef14e463f6e ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
Signed-off-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: Unregister notifier on eswitch init failure

It otherwise remains registered and a subsequent attempt at eswitch
enabling might trigger warnings of the sort:

[  682.589148] ------------[ cut here ]------------
[  682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered
[  682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90
[...snipped]
[  682.610052] Call Trace:
[  682.610369]  <TASK>
[  682.610663]  ? __warn+0x7c/0x110
[  682.611050]  ? notifier_chain_register+0x3e/0x90
[  682.611556]  ? report_bug+0x148/0x170
[  682.611977]  ? handle_bug+0x36/0x70
[  682.612384]  ? exc_invalid_op+0x13/0x60
[  682.612817]  ? asm_exc_invalid_op+0x16/0x20
[  682.613284]  ? notifier_chain_register+0x3e/0x90
[  682.613789]  atomic_notifier_chain_register+0x25/0x40
[  682.614322]  mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core]
[  682.614965]  mlx5_eswitch_enable+0xc9/0x100 [mlx5_core]
[  682.615551]  mlx5_device_enable_sriov+0x25/0x340 [mlx5_core]
[  682.616170]  mlx5_core_sriov_configure+0x50/0x170 [mlx5_core]
[  682.616789]  sriov_numvfs_store+0xb0/0x1b0
[  682.617248]  kernfs_fop_write_iter+0x117/0x1a0
[  682.617734]  vfs_write+0x231/0x3f0
[  682.618138]  ksys_write+0x63/0xe0
[  682.618536]  do_syscall_64+0x4c/0x100
[  682.618958]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 7624e58a8b3a ("net/mlx5: E-switch, register event handler before arming the event")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: Fix command bitmask initialization

Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit
isn't Initialize during command bitmask Initialization, only during
MANAGE_PAGES.

In addition, mlx5_cmd_trigger_completions() is trying to trigger
completion for MANAGE_PAGES command as well.

Hence, in case health error occurred before any MANAGE_PAGES command
have been invoke (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for
MANAGE_PAGES command, which will result in null-ptr-deref error.[1]

Fix it by Initialize command bitmask correctly.

While at it, re-write the code for better understanding.

[1]
BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078
CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
kasan_report+0xb9/0xf0
kasan_check_range+0xec/0x190
mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
mlx5_cmd_flush+0x94/0x240 [mlx5_core]
enter_error_state+0x6c/0xd0 [mlx5_core]
mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
process_one_work+0x787/0x1490
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0xda0/0xda0
? assign_work+0x168/0x240
worker_thread+0x586/0xd30
? rescuer_thread+0xae0/0xae0
kthread+0x2df/0x3b0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x70
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: Check for invalid vector index on EQ creation

Currently, mlx5 driver does not enforce vector index to be lower than
the maximum number of supported completion vectors when requesting a
new completion EQ. Thus, mlx5_comp_eqn_get() fails when trying to
acquire an IRQ with an improper vector index.

To prevent the case above, enforce that vector index value is
valid and lower than maximum in mlx5_comp_eqn_get() before handling the
request.

Fixes: f14c1a14e632 ("net/mlx5: Allocate completion EQs dynamically")
Signed-off-by: Maher Sanalla <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: HWS, use lock classes for bwc locks

The HWS BWC API uses one lock per queue and usually acquires one of
them, except when doing changes which require locking all queues in
order. Naturally, lockdep isn't too happy about acquiring the same lock
class multiple times, so inform it that each queue lock is a different
class to avoid false positives.

Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: HWS, don't destroy more bwc queue locks than allocated

hws_send_queues_bwc_locks_destroy destroyed more queue locks than
allocated, leading to memory corruption (occasionally) and warnings such
as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because
sometimes, the 'mutex' being destroyed was random memory.
The severity of this problem is proportional to the number of queues
configured because the code overreaches beyond the end of the
bwc_send_queue_locks array by 2x its length.

Fix that by using the correct number of bwc queues.

Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: HWS, fixed double free in error flow of definer layout

Fix error flow bug that could lead to double free of a buffer
during a failure to calculate a suitable definer layout.

Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Itamar Gozlan <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/mlx5: HWS, removed wrong access to a number of rules variable

Removed wrong access to the num_of_rules field of the matcher.
This is a usual u32 variable, but the access was as if it was atomic.

This fixes the following CI warnings:
mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty;
the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]

Fixes: 510f9f61a112 ("net/mlx5: HWS, added API and enabled HWS support")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Itamar Gozlan <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow

Syzkaller reported this splat:

  ==================================================================
  BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
  Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662

  CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0xc3/0x620 mm/kasan/report.c:488
   kasan_report+0xd9/0x110 mm/kasan/report.c:601
   mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf7fe4579
  Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
  RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
  RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

  Allocated by task 5387:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
   __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
   subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
   __tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
   tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
   mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
   __mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
   mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
   mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
   mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
   mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
   process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
   process_scheduled_works kernel/workqueue.c:3310 [inline]
   worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Freed by task 113:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
   poison_slab_object mm/kasan/common.c:247 [inline]
   __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
   kasan_slab_free include/linux/kasan.h:230 [inline]
   slab_free_hook mm/slub.c:2342 [inline]
   slab_free mm/slub.c:4579 [inline]
   kfree+0x14f/0x4b0 mm/slub.c:4727
   kvfree+0x47/0x50 mm/util.c:701
   kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
   kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
   kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
   kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
   do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
   shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
   shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
   shrink_many mm/vmscan.c:4879 [inline]
   lru_gen_shrink_node mm/vmscan.c:4957 [inline]
   shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
   kswapd_shrink_node mm/vmscan.c:6765 [inline]
   balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
   kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Last potentially related work creation:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
   kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
   subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
   tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
   tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
   inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
   tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
   tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
   tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
   sk_backlog_rcv include/net/sock.h:1115 [inline]
   __release_sock+0x31b/0x400 net/core/sock.c:3072
   __tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
   __mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
   mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
   mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
   mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e

  The buggy address belongs to the object at ffff8880569ac800
   which belongs to the cache kmalloc-512 of size 512
  The buggy address is located 88 bytes inside of
   freed 512-byte region [ffff8880569ac800, ffff8880569aca00)

  The buggy address belongs to the physical page:
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
  head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
  page_type: f5(slab)
  raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
  head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
   alloc_slab_page mm/slub.c:2412 [inline]
   allocate_slab mm/slub.c:2578 [inline]
   new_slab+0x2ba/0x3f0 mm/slub.c:2631
   ___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
   __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
   __slab_alloc_node mm/slub.c:3961 [inline]
   slab_alloc_node mm/slub.c:4122 [inline]
   __kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   mld_add_delrec net/ipv6/mcast.c:743 [inline]
   igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
   igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
   __ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
   addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
   __ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
   addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
   addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
   notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
   call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
   call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
   call_netdevice_notifiers net/core/dev.c:2048 [inline]
   dev_close_many+0x333/0x6a0 net/core/dev.c:1589
  page last free pid 13136 tgid 13136 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
   stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
   kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   unpoison_slab_object mm/kasan/common.c:319 [inline]
   __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4085 [inline]
   slab_alloc_node mm/slub.c:4134 [inline]
   kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
   skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
   do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
   netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
   netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
   uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
   kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
   kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
   device_del+0x623/0x9f0 drivers/base/core.c:3882
   snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
   snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
   snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
   usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
   device_remove drivers/base/dd.c:569 [inline]
   device_remove+0x122/0x170 drivers/base/dd.c:561

That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.

Fixes: 1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: [email protected]
Reported-by: [email protected]
Closes: https://lore.kernel.org/[email protected]
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init

The loop responsible for allocating up to MTK_FQ_DMA_LENGTH buffers must
only touch as many descriptors, otherwise it ends up corrupting unrelated
memory. Fix the loop iteration count accordingly.

Fixes: c57e55819443 ("net: ethernet: mtk_eth_soc: handle dma buffer size soc specific")
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame

Andrew and Nikolay reported connectivity issues with Cilium's service
load-balancing in case of vmxnet3.

If a BPF program for native XDP adds an encapsulation header such as
IPIP and transmits the packet out the same interface, then in case
of vmxnet3 a corrupted packet is being sent and subsequently dropped
on the path.

vmxnet3_xdp_xmit_frame() which is called e.g. via vmxnet3_run_xdp()
through vmxnet3_xdp_xmit_back() calculates an incorrect DMA address:

  page = virt_to_page(xdpf->data);
  tbi->dma_addr = page_pool_get_dma_addr(page) +
                  VMXNET3_XDP_HEADROOM;
  dma_sync_single_for_device(&adapter->pdev->dev,
                             tbi->dma_addr, buf_size,
                             DMA_TO_DEVICE);

The above assumes a fixed offset (VMXNET3_XDP_HEADROOM), but the XDP
BPF program could have moved xdp->data. While the passed buf_size is
correct (xdpf->len), the dma_addr needs to have a dynamic offset which
can be calculated as xdpf->data - (void *)xdpf, that is, xdp->data -
xdp->data_hard_start.

Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.")
Reported-by: Andrew Sauber <[email protected]>
Reported-by: Nikolay Nikolaev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Nikolay Nikolaev <[email protected]>
Acked-by: Anton Protopopov <[email protected]>
Cc: William Tu <[email protected]>
Cc: Ronak Doshi <[email protected]>
Link: https://patch.msgid.link/a0888656d7f09028f9984498cc698bb5364d89fc.1728931137.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Several miscellaneous fixes. A lot of bnxt_re activity, there will be
  more rc patches there coming.

   - Many bnxt_re bug fixes - Memory leaks, kasn, NULL pointer deref,
     soft lockups, error unwinding and some small functional issues

   - Error unwind bug in rdma netlink

   - Two issues with incorrect VLAN detection for iWarp

   - skb_splice_from_iter() splat in siw

   - Give SRP slab caches unique names to resolve the merge window
     WARN_ON regression"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: Fix the GID table length
  RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
  RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
  RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
  RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
  RDMA/bnxt_re: Fix a possible NULL pointer dereference
  RDMA/bnxt_re: Return more meaningful error
  RDMA/bnxt_re: Fix incorrect dereference of srq in async event
  RDMA/bnxt_re: Fix out of bound check
  RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
  RDMA/srpt: Make slab cache names unique
  RDMA/irdma: Fix misspelling of "accept*"
  RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
  RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
  RDMA/core: Fix ENODEV error for iWARP test over vlan
  RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
  RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
  RDMA/bnxt_re: Add a check for memory allocation
  RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
  RDMA/bnxt_re: Fix a possible memory leak

Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- regression fix: dirty extents tracked in xarray for qgroups must be
   adjusted for 32bit platforms

- fix potentially freeing uninitialized name in fscrypt structure

- fix warning about unneeded variable in a send callback

* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix uninitialized pointer free on read_alloc_one_name() error
  btrfs: send: cleanup unneeded return variable in changed_verity()
  btrfs: fix uninitialized pointer free in add_inode_ref()
  btrfs: use sector numbers as keys for the dirty extents xarray

Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- fix race between session setup and session logoff

- add supplementary group support

* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add support for supplementary groups
ksmbd: fix user-after-free from session log off

Merge tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Remove bogus testmgr ENOENT error messages

- Ensure algorithm is still alive before marking it as tested

- Disable buggy hash algorithms in marvell/cesa

* tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell/cesa - Disable hash algorithms
  crypto: testmgr - Hide ENOENT errors better
  crypto: api - Fix liveliness check in crypto_alg_tested

Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- More issues reported in the enable/disable paths on large machines
   with many tasks due to scx_tasks_lock being held too long. Break up
   the task iterations

- Remove ops.select_cpu() dependency in bypass mode so that a
   misbehaving implementation can't live-lock the machine by pushing all
   tasks to few CPUs in bypass mode

- Other misc fixes

* tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Remove unnecessary cpu_relax()
  sched_ext: Don't hold scx_tasks_lock for too long
  sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
  sched_ext: bypass mode shouldn't depend on ops.select_cpu()
  sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
  sched_ext: Start schedulers with consistent p->scx.slice values
  Revert "sched_ext: Use shorter slice while bypassing"
  sched_ext: use correct function name in pick_task_scx() warning message
  selftests: sched_ext: Add sched_ext as proper selftest target

net: dsa: vsc73xx: fix reception from VLAN-unaware bridges

Similar to the situation described for sja1105 in commit 1f9fc48fd302
("net: dsa: sja1105: fix reception from VLAN-unaware bridges"), the
vsc73xx driver uses tag_8021q and doesn't need the ds->untag_bridge_pvid
request. In fact, this option breaks packet reception.

The ds->untag_bridge_pvid option strips VLANs from packets received on
VLAN-unaware bridge ports. But those VLANs should already be stripped
by tag_vsc73xx_8021q.c as part of vsc73xx_rcv() - they are not VLANs in
VLAN-unaware mode, but DSA tags. Thus, dsa_software_vlan_untag() tries
to untag a VLAN that doesn't exist, corrupting the packet.

Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
Tested-by: Pawel Dembicki <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ravb: Only advertise Rx/Tx timestamps if hardware supports it

Recent work moving the reporting of Rx software timestamps to the core
[1] highlighted an issue where hardware time stamping was advertised
for the platforms where it is not supported.

Fix this by covering advertising support for hardware timestamps only if
the hardware supports it. Due to the Tx implementation in RAVB software
Tx timestamping is also only considered if the hardware supports
hardware timestamps. This should be addressed in future, but this fix
only reflects what the driver currently implements.

1. Commit 277901ee3a26 ("ravb: Remove setting of RX software timestamp")

Fixes: 7e09a052dc4e ("ravb: Exclude gPTP feature support for RZ/G2L")
Signed-off-by: Niklas Söderlund <[email protected]>
Reviewed-by: Paul Barker <[email protected]>
Tested-by: Paul Barker <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()

Commit a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in
kunit test") fixed the use-after-free error, but introduced below
memory leaks by removing necessary vcap_free_rule(), add it to fix it.

unreferenced object 0xffffff80ca58b700 (size 192):
  comm "kunit_try_catch", pid 1215, jiffies 4294898264
  hex dump (first 32 bytes):
    00 12 7a 00 05 00 00 00 0a 00 00 00 64 00 00 00  ..z.........d...
    00 00 00 00 00 00 00 00 00 04 0b cc 80 ff ff ff  ................
  backtrace (crc 9c09c3fe):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<0000000040a01b8d>] vcap_alloc_rule+0x3cc/0x9c4
    [<000000003fe86110>] vcap_api_encode_rule_test+0x1ac/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0400 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898265
  hex dump (first 32 bytes):
    80 04 0b cc 80 ff ff ff 18 b7 58 ca 80 ff ff ff  ..........X.....
    39 00 00 00 02 00 00 00 06 05 04 03 02 01 ff ff  9...............
  backtrace (crc daf014e9):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<00000000dfdb1e81>] vcap_api_encode_rule_test+0x224/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0700 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898265
  hex dump (first 32 bytes):
    80 07 0b cc 80 ff ff ff 28 b7 58 ca 80 ff ff ff  ........(.X.....
    3c 00 00 00 00 00 00 00 01 2f 03 b3 ec ff ff ff  <......../......
  backtrace (crc 8d877792):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000006eadfab7>] vcap_rule_add_action+0x2d0/0x52c
    [<00000000323475d1>] vcap_api_encode_rule_test+0x4d4/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0900 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898266
  hex dump (first 32 bytes):
    80 09 0b cc 80 ff ff ff 80 06 0b cc 80 ff ff ff  ................
    7d 00 00 00 01 00 00 00 00 00 00 00 ff 00 00 00  }...............
  backtrace (crc 34181e56):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<00000000991e3564>] vcap_val_rule+0xcf0/0x13e8
    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0980 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898266
  hex dump (first 32 bytes):
    18 b7 58 ca 80 ff ff ff 00 09 0b cc 80 ff ff ff  ..X.............
    67 00 00 00 00 00 00 00 01 01 74 88 c0 ff ff ff  g.........t.....
  backtrace (crc 275fd9be):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<000000001396a1a2>] test_add_def_fields+0xb0/0x100
    [<000000006e7621f0>] vcap_val_rule+0xa98/0x13e8
    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
......

Cc: [email protected]
Fixes: a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in kunit test")
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jens Emil Schulz Østergaard <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-phy-mdio-bcm-unimac-add-bcm6846-variant'

Linus Walleij says:

====================
net: phy: mdio-bcm-unimac: Add BCM6846 variant

As pointed out by Florian:
https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/

The BCM6846 has a few extra registers and cannot reuse the
compatible string from other variants of the Unimac
MDIO block: we need to be able to tell them apart.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: mdio-bcm-unimac: Add BCM6846 support

Add Unimac mdio compatible string for the special BCM6846
variant.

This variant has a few extra registers compared to other
versions.

Suggested-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/linux-devicetree/[email protected]/
Signed-off-by: Linus Walleij <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio

The MDIO block in the BCM6846 is not identical to any of the
previous versions, but has extended registers not present in
the other variants. For this reason we need to use a new
compatible especially for this SoC.

Suggested-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/linux-devicetree/[email protected]/
Signed-off-by: Linus Walleij <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

udp: Compute L4 checksum as usual when not segmenting the skb

If:

  1) the user requested USO, but
  2) there is not enough payload for GSO to kick in, and
  3) the egress device doesn't offer checksum offload, then

we want to compute the L4 checksum in software early on.

In the case when we are not taking the GSO path, but it has been requested,
the software checksum fallback in skb_segment doesn't get a chance to
compute the full checksum, if the egress device can't do it. As a result we
end up sending UDP datagrams with only a partial checksum filled in, which
the peer will discard.

Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <[email protected]>
Signed-off-by: Jakub Sitnicki <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

genetlink: hold RCU in genlmsg_mcast()

While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
one lockdep splat [1].

genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.

Instead of letting all callers guard genlmsg_multicast_allns()
with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().

This also means the @flags parameter is useless, we need to always use
GFP_ATOMIC.

[1]
[10882.424136] =============================
[10882.424166] WARNING: suspicious RCU usage
[10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
[10882.424400] -----------------------------
[10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
[10882.424469]
other info that might help us debug this:

[10882.424500]
rcu_scheduler_active = 2, debug_locks = 1
[10882.424744] 2 locks held by ip/15677:
[10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
[10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
[10882.426465]
stack backtrace:
[10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
[10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[10882.427046] Call Trace:
[10882.427131] <TASK>
[10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
[10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
[10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
[10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
[10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
[10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
[10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
[10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
[10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
[10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
[10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
[10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
[10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))

Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Chapman <[email protected]>
Cc: Tom Parkin <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361

According to the Marvell datasheet the 88E6361 has two VTU pages
(4k VIDs per page) so the max_vid should be 8191, not 4095.

In the current implementation mv88e6xxx_vtu_walk() gives unexpected
results because of this error. I verified that mv88e6xxx_vtu_walk()
works correctly on the MV88E6361 with this patch in place.

Fixes: 12899f299803 ("net: dsa: mv88e6xxx: enable support for 88E6361 switch")
Signed-off-by: Peter Rashleigh <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().

Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().

  """
  We are seeing a use-after-free from a bpf prog attached to
  trace_tcp_retransmit_synack. The program passes the req->sk to the
  bpf_sk_storage_get_tracing kernel helper which does check for null
  before using it.
  """

The commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.

Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.

If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.

The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.

The scenario would be

  1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
     but del_timer_sync() is missed

  2. reqsk timer is executed and scheduled again

  3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
     reqsk timer still has another one, and inet_csk_accept() does not
     clear req->sk for non-TFO sockets

  4. sk is close()d

  5. reqsk timer is executed again, and BPF touches req->sk

Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().

Note that reqsk timer is pinned, so the issue does not happen in most
use cases. [1]

[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0

Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6

allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Martin KaFai Lau <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: bcmasp: fix potential memory leak in bcmasp_xmit()

The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb
in case of mapping fails, add dev_kfree_skb() to fix it.

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Wang Hai <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: systemport: fix potential memory leak in bcm_sysport_xmit()

The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb() to fix it.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Wang Hai <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer fixes from Steven Rostedt:

- Fix ref counter of buffers assigned at boot up

   A tracing instance can be created from the kernel command line. If it
   maps to memory, it is considered permanent and should not be deleted,
   or bad things can happen. If it is not mapped to memory, then the
   user is fine to delete it via rmdir from the instances directory. But
   the ref counts assumed 0 was free to remove and greater than zero was
   not. But this was not the case. When an instance is created, it
   should have the reference of 1, and if it should not be removed, it
   must be greater than 1. The boot up code set normal instances with a
   ref count of 0, which could get removed if something accessed it and
   then released it. And memory mapped instances had a ref count of 1
   which meant it could be deleted, and bad things happen. Keep normal
   instances ref count as 1, and set memory mapped instances ref count
   to 2.

- Protect sub buffer size (order) updates from other modifications

   When a ring buffer is changing the size of its sub-buffers, no other
   operations should be performed on the ring buffer. That includes
   reading it. But the locking only grabbed the buffer->mutex that keeps
   some operations from touching the ring buffer. It also must hold the
   cpu_buffer->reader_lock as well when updates happen as other paths
   use that to do some operations on the ring buffer.

* tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Fix reader locking when changing the sub buffer order
  ring-buffer: Fix refcount setting of boot mapped buffers

Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

- New metadata version inode_has_child_snapshots

   This fixes bugs with handling of unlinked inodes + snapshots, in
   particular when an inode is reattached after taking a snapshot;
   deleted inodes now get correctly cleaned up across snapshots.

- Disk accounting rewrite fixes
     - validation fixes for when a device has been removed
     - fix journal replay failing with "journal_reclaim_would_deadlock"

- Some more small fixes for erasure coding + device removal

- Assorted small syzbot fixes

* tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits)
  bcachefs: Fix sysfs warning in fstests generic/730,731
  bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
  bcachefs: Fix kasan splat in new_stripe_alloc_buckets()
  bcachefs: Add missing validation for bch_stripe.csum_granularity_bits
  bcachefs: Fix missing bounds checks in bch2_alloc_read()
  bcachefs: fix uaf in bch2_dio_write_done()
  bcachefs: Improve check_snapshot_exists()
  bcachefs: Fix bkey_nocow_lock()
  bcachefs: Fix accounting replay flags
  bcachefs: Fix invalid shift in member_to_text()
  bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID
  bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry
  bcachefs: Check if stuck in journal_res_get()
  closures: Add closure_wait_event_timeout()
  bcachefs: Fix state lock involved deadlock
  bcachefs: Fix NULL pointer dereference in bch2_opt_to_text
  bcachefs: Release transaction before wake up
  bcachefs: add check for btree id against max in try read node
  bcachefs: Disk accounting device validation fixes
  bcachefs: bch2_inode_or_descendents_is_open()
  ...

net: ethernet: rtsn: fix potential memory leak in rtsn_start_xmit()

The rtsn_start_xmit() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb_any() to fix it.

Fixes: b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Niklas Söderlund <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: xilinx: axienet: fix potential memory leak in axienet_start_xmit()

The axienet_start_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb_any() to fix it.

Fixes: 71791dc8bdea ("net: axienet: Check for DMA mapping errors")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mptcp-prevent-mpc-handshake-on-port-based-signal-endpoints'

Matthieu Baerts says:

====================
mptcp: prevent MPC handshake on port-based signal endpoints

MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.

- Patch 1: Explicitly reject such requests. A fix for >= v5.12.

- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.

Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
v1: https://lore.kernel.org/20240908180620 [email protected]

Link: https://lore.kernel.org/[email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: test for prohibited MPC to port-based endp

Explicitly verify that MPC connection attempts towards a port-based
signal endpoint fail with a reset.

Note that this new test is a bit different from the other ones, not
using 'run_tests'. It is then needed to add the capture capability, and
the picking the right port which have been extracted into three new
helpers. The info about the capture can also be printed from a single
point, which simplifies the exit paths in do_transfer().

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: [email protected]
Co-developed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: prevent MPC handshake on port-based signal endpoints

Syzkaller reported a lockdep splat:

  ============================================
  WARNING: possible recursive locking detected
  6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
  --------------------------------------------
  syz-executor364/5113 is trying to acquire lock:
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  but task is already holding lock:
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(k-slock-AF_INET);
    lock(k-slock-AF_INET);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  7 locks held by syz-executor364/5113:
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  stack backtrace:
  CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:93 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
   check_deadlock kernel/locking/lockdep.c:3061 [inline]
   validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
   __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   spin_lock include/linux/spinlock.h:351 [inline]
   sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
   mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
   subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
   tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
   tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
   ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
   __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
   process_backlog+0x662/0x15b0 net/core/dev.c:6108
   __napi_poll+0xcb/0x490 net/core/dev.c:6772
   napi_poll net/core/dev.c:6841 [inline]
   net_rx_action+0x89b/0x1240 net/core/dev.c:6963
   handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
   do_softirq+0x11b/0x1e0 kernel/softirq.c:455
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
   __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
   dev_queue_xmit include/linux/netdevice.h:3105 [inline]
   neigh_hh_output include/net/neighbour.h:526 [inline]
   neigh_output include/net/neighbour.h:540 [inline]
   ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
   ip_local_out net/ipv4/ip_output.c:129 [inline]
   __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
   __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
   tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
   tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
   tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
   sk_backlog_rcv include/net/sock.h:1111 [inline]
   __release_sock+0x214/0x350 net/core/sock.c:3004
   release_sock+0x61/0x1f0 net/core/sock.c:3558
   mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
   mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
   ___sys_sendmsg net/socket.c:2651 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
   __do_sys_sendmmsg net/socket.c:2766 [inline]
   __se_sys_sendmmsg net/socket.c:2763 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f04fb13a6b9
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9
  RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004
  RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240
  R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300
   </TASK>

As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.

Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.

Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: [email protected]
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
Cc: Cong Wang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetid

pnetid of pi (not newly allocated pe) should be compared

Fixes: e888a2e8337c ("net/smc: introduce list of pnetids for Ethernet devices")
Reviewed-by: D. Wythe <[email protected]>
Reviewed-by: Wen Gu <[email protected]>
Signed-off-by: Li RongQing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Gerd Bayer <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: macb: Avoid 20s boot delay by skipping MDIO bus registration for fixed-link PHY

A boot delay was introduced by commit 79540d133ed6 ("net: macb: Fix
handling of fixed-link node"). This delay was caused by the call to
`mdiobus_register()` in cases where a fixed-link PHY was present. The
MDIO bus registration triggered unnecessary PHY address scans, leading
to a 20-second delay due to attempts to detect Clause 45 (C45)
compatible PHYs, despite no MDIO bus being attached.

The commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node")
was originally introduced to fix a regression caused by commit
7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver
to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings
like:
mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0
...
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31

This patch reworks the logic to avoid registering and allocation of the
MDIO bus when:
  - The device tree contains a fixed-link node.
  - There is no "mdio" child node in the device tree.

If a child node named "mdio" exists, the MDIO bus will be registered to
support PHYs  attached to the MACB's MDIO bus. Otherwise, with only a
fixed-link, the MDIO bus is skipped.

Tested on a sama5d35 based system with a ksz8863 switch attached to
macb0.

Fixes: 79540d133ed6 ("net: macb: Fix handling of fixed-link node")
Signed-off-by: Oleksij Rempel <[email protected]>
Cc: [email protected]
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()

The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.

Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Gerhard Engleder <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netdevsim: use cond_resched() in nsim_dev_trap_report_work()

I am still seeing many syzbot reports hinting that syzbot
might fool nsim_dev_trap_report_work() with hundreds of ports [1]

Lets use cond_resched(), and system_unbound_wq
instead of implicit system_wq.

[1]
INFO: task syz-executor:20633 blocked for more than 143 seconds.
      Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor    state:D stack:25856 pid:20633 tgid:20633 ppid:1      flags:0x00004006
...
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 16760 Comm: kworker/1:0 Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:210
Code: 89 fb e8 23 00 00 00 48 8b 3d 04 fb 9c 0c 48 89 de 5b e9 c3 c7 5d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0c 25 c0 d7 03 00 65 8b 15 60 f0
RSP: 0018:ffffc90000a187e8 EFLAGS: 00000246
RAX: 0000000000000100 RBX: ffffc90000a188e0 RCX: ffff888027d3bc00
RDX: ffff888027d3bc00 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88804a2e6000 R08: ffffffff8a4bc495 R09: ffffffff89da3577
R10: 0000000000000004 R11: ffffffff8a4bc2b0 R12: dffffc0000000000
R13: ffff88806573b503 R14: dffffc0000000000 R15: ffff8880663cca00
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc90a747f98 CR3: 000000000e734000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 000000000000002b DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
  __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
  spin_unlock_bh include/linux/spinlock.h:396 [inline]
  nsim_dev_trap_report drivers/net/netdevsim/dev.c:820 [inline]
  nsim_dev_trap_report_work+0x75d/0xaa0 drivers/net/netdevsim/dev.c:850
  process_one_work kernel/workqueue.c:3229 [inline]
  process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
  worker_thread+0x870/0xd30 kernel/workqueue.c:3391
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>

Fixes: ba5e1272142d ("netdevsim: avoid potential loop in nsim_dev_trap_report_work()")
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jiri Pirko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

macsec: don't increment counters for an unrelated SA

On RX, we shouldn't be incrementing the stats for an arbitrary SA in
case the actual SA hasn't been set up. Those counters are intended to
track packets for their respective AN when the SA isn't currently
configured. Due to the way MACsec is implemented, we don't keep
counters unless the SA is configured, so we can't track those packets,
and those counters will remain at 0.

The RXSC's stats keeps track of those packets without telling us which
AN they belonged to. We could add counters for non-existent SAs, and
then find a way to integrate them in the dump to userspace, but I
don't think it's worth the effort.

Fixes: 91ec9bd57f35 ("macsec: Fix traffic counters/statistics")
Reported-by: Paolo Abeni <[email protected]>
Signed-off-by: Sabrina Dubroca <[email protected]>
Link: https://patch.msgid.link/f5ac92aaa5b89343232615f4c03f9f95042c6aa0.1728657709.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <[email protected]>

ring-buffer: Fix reader locking when changing the sub buffer order

The function ring_buffer_subbuf_order_set() updates each
ring_buffer_per_cpu and installs new sub buffers that match the requested
page order. This operation may be invoked concurrently with readers that
rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
the ring_buffer_per_cpu.pages and reader_page pointers. However, no
exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
the mentioned data while a reader also operates on them can then result in
incorrect memory access and various crashes.

Fix the problem by taking the reader_lock when updating a specific
ring_buffer_per_cpu in ring_buffer_subbuf_order_set().

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
Signed-off-by: Petr Pavlu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

octeontx2-af: Fix potential integer overflows on integer shifts

The left shift int 32 bit integer constants 1 is evaluated using 32 bit
arithmetic and then assigned to a 64 bit unsigned integer. In the case
where the shift is 32 or more this can lead to an overflow. Avoid this
by shifting using the BIT_ULL macro instead.

Fixes: 019aba04f08c ("octeontx2-af: Modify SMQ flush sequence to drop packets")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: stmmac: dwmac-tegra: Fix link bring-up sequence

The Tegra MGBE driver sometimes fails to initialize, reporting the
following error, and as a result, it is unable to acquire an IP
address with DHCP:

tegra-mgbe 6800000.ethernet: timeout waiting for link to become ready

As per the recommendation from the Tegra hardware design team, fix this
issue by:
- clearing the PHY_RDY bit before setting the CDR_RESET bit and then
setting PHY_RDY bit before clearing CDR_RESET bit. This ensures valid
data is present at UPHY RX inputs before starting the CDR lock.
- adding the required delays when bringing up the UPHY lane. Note we
need to use delays here because there is no alternative, such as
polling, for these cases. Using the usleep_range() instead of ndelay()
as sleeping is preferred over busy wait loop.

Without this change we would see link failures on boot sometimes as
often as 1 in 5 boots. With this fix we have not observed any failures
in over 1000 boots.

Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
Signed-off-by: Paritosh Dixit <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: usb: usbnet: fix race in probe failure

The same bug as in the disconnect code path also exists
in the case of a failure late during the probe process.
The flag must also be set.

Signed-off-by: Oliver Neukum <[email protected]>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/smc: Fix memory leak when using percpu refs

This patch adds missing percpu_ref_exit when releasing percpu refs.
When releasing percpu refs, percpu_ref_exit should be called.
Otherwise, memory leak happens.

Fixes: 79a22238b4f2 ("net/smc: Use percpu ref for wr tx reference")
Signed-off-by: Kai Shen <[email protected]>
Reviewed-by: Dust Li <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'posix-clock-fix-missing-timespec64-check-for-ptp-clock'

Jinjie Ruan says:

====================
posix-clock: Fix missing timespec64 check for PTP clock

Check timespec64 in pc_clock_settime() for PTP clock as
the man manual of clock_settime() said.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan743x: Remove duplicate check

Since timespec64_valid() has been checked in higher layer
pc_clock_settime(), the duplicate check in lan743x_ptpci_settime64()
can be removed.

Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

posix-clock: Fix missing timespec64 check in pc_clock_settime()

As Andrew pointed out, it will make sense that the PTP core
checked timespec64 struct's tv_sec and tv_nsec range before calling
ptp->info->settime64().

As the man manual of clock_settime() said, if tp.tv_sec is negative or
tp.tv_nsec is outside the range [0..999,999,999], it should return EINVAL,
which include dynamic clocks which handles PTP clock, and the condition is
consistent with timespec64_valid(). As Thomas suggested, timespec64_valid()
only check the timespec is valid, but not ensure that the time is
in a valid range, so check it ahead using timespec64_valid_strict()
in pc_clock_settime() and return -EINVAL if not valid.

There are some drivers that use tp->tv_sec and tp->tv_nsec directly to
write registers without validity checks and assume that the higher layer
has checked it, which is dangerous and will benefit from this, such as
hclge_ptp_settime(), igb_ptp_settime_i210(), _rcar_gen4_ptp_settime(),
and some drivers can remove the checks of itself.

Cc: [email protected]
Fixes: 0606f422b453 ("posix clocks: Introduce dynamic clocks")
Acked-by: Richard Cochran <[email protected]>
Suggested-by: Andrew Lunn <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sched_ext: Remove unnecessary cpu_relax()

As described in commit b07996c7abac ("sched_ext: Don't hold
scx_tasks_lock for too long"), we're doing a cond_resched() every 32
calls to scx_task_iter_next() to avoid RCU and other stalls. That commit
also added a cpu_relax() to the codepath where we drop and reacquire the
lock, but as Waiman described in [0], cpu_relax() should only be
necessary in busy loops to avoid pounding on a cacheline (or to allow a
hypertwin to more fully utilize a core).

Let's remove the unnecessary cpu_relax().

[0]: https://lore.kernel.org/all/35b3889b-904a-4d26-981f-c8aa1557a7c7@redhat.com/

Cc: Waiman Long <[email protected]>
Signed-off-by: David Vernet <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

ring-buffer: Fix refcount setting of boot mapped buffers

A ring buffer which has its buffered mapped at boot up to fixed memory
should not be freed. Other buffers can be. The ref counting setup was
wrong for both. It made the not mapped buffers ref count have zero, and the
boot mapped buffer a ref count of 1. But an normally allocated buffer
should be 1, where it can be removed.

Keep the ref count of a normal boot buffer with its setup ref count (do
not decrement it), and increment the fixed memory boot mapped buffer's ref
count.

Cc: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fix from Jaegeuk Kim:
"An urgent fix to resolve DIO read performance regression caused by
'f2fs: fix to avoid racing in between read and OPU dio write'"

* tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: allow parallel DIO reads

Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"The main one fixes a syzbot issue due to the invalid inode type out of
  file-backed mounts. The others are minor cleanups without actual logic
  changes.

  Summary:

   - Make sure only regular inodes can be used for file-backed mounts

   - Two minor codebase cleanups"

* tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: get rid of kaddr in `struct z_erofs_maprecorder`
  erofs: get rid of z_erofs_try_to_claim_pcluster()
  erofs: ensure regular inodes for file-backed mounts

MAINTAINERS: add Andrew Lunn as a co-maintainer of all networking drivers

Andrew has been a pillar of the community for as long as I remember.
Focusing on embedded networking, co-maintaining Ethernet PHYs and
DSA code, but also actively reviewing MAC and integrated NIC drivers.
Elevate Andrew to the status of co-maintainer of all netdev drivers.

Acked-by: Andrew Lunn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bcachefs: Fix sysfs warning in fstests generic/730,731

sysfs warns if we're removing a symlink from a directory that's no
longer in sysfs; this is triggered by fstests generic/730, which
simulates hot removal of a block device.

This patch is however not a correct fix, since checking
kobj->state_in_sysfs on a kobj owned by another subsystem is racy.

A better fix would be to add the appropriate check to
sysfs_remove_link() - and sysfs_create_link() as well.

But kobject_add_internal()/kobject_del() do not as of today have locking
that would support that.

Note that the block/holder.c code appears to be subject to this race as
well.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev

When creating a new stripe, we may reuse an existing stripe that has
some empty and some nonempty blocks.

Generally, the existing stripe won't change underneath us - except for
block sector counts, which we copy to the new key in
ec_stripe_key_update.

But the device removal path can now invalidate stripe pointers to a
device, and that can race with stripe reuse.

Change ec_stripe_key_update() to check for and resolve this
inconsistency.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix kasan splat in new_stripe_alloc_buckets()

Update for BCH_SB_MEMBER_INVALID.

Signed-off-by: Kent Overstreet <[email protected]>

Merge tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- fix for memory corruption regression in amd_sfh driver (Basavaraj
   Natikar)

- fix for mis-reporting of BTN_TOOL_PEN and BTN_TOOL_RUBBER for AES
   sensors tools in Wacom driver (Jason Gerecke)

- fix for unitialized variable use in intel-ish-hid driver
   (SurajSonawane2415)

- a few device-specific quirks / device ID additions

* tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: wacom: Hardcode (non-inverted) AES pens as BTN_TOOL_PEN
  HID: amd_sfh: Switch to device-managed dmam_alloc_coherent()
  HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad
  HID: multitouch: Add support for B2402FVA track point
  HID: plantronics: Workaround for an unexcepted opposite volume key
  hid: intel-ish-hid: Fix uninitialized variable 'rv' in ish_fw_xfer_direct_dma

bcachefs: Add missing validation for bch_stripe.csum_granularity_bits

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix missing bounds checks in bch2_alloc_read()

We were checking that the alloc key was for a valid device, but not a
valid bucket.

This is the upgrade path from versions prior to bcachefs being mainlined.

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: fix uaf in bch2_dio_write_done()

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

Linux 6.12-rc3

Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
"Two fixes for Windows symlink handling"

* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix creating native symlinks pointing to current or parent directory
cifs: Improve creating native symlinks pointing to directory

Merge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported problems for 6.12-rc3.
  Include in here is:

   - fix for yurex driver that was caused in -rc1

   - build error fix for usbg network filesystem code

   - onboard_usb_dev build fix

   - dwc3 driver fixes for reported errors

   - gadget driver fix

   - new USB storage driver quirk

   - xhci resume bugfix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  net/9p/usbg: Fix build error
  USB: yurex: kill needless initialization in yurex_read
  Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
  usb: xhci: Fix problem with xhci resume from suspend
  usb: misc: onboard_usb_dev: introduce new config symbol for usb5744 SMBus support
  usb: dwc3: core: Stop processing of pending events if controller is halted
  usb: dwc3: re-enable runtime PM after failed resume
  usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
  usb: gadget: core: force synchronous registration

Merge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here is a single driver core fix, and a .mailmap update.

  The fix is for the rust driver core bindings, turned out that the
  from_raw binding wasn't a good idea (don't want to pass a pointer to a
  reference counted object without actually incrementing the pointer.)
  So this change fixes it up as the from_raw binding came in in -rc1.

  The other change is a .mailmap update.

  Both have been in linux-next for a while with no reported issues"

* tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  mailmap: update mail for Fiona Behrens
  rust: device: change the from_raw() function

Merge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

- Fix crash in memcpy on 8xx due to dcbz workaround since recent
changes

Thanks to Christophe Leroy.

* tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/8xx: Fix kernel DTLB miss on dcbz

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers and one in the FC transport class
  to add idempotence to state setting"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_fc: Allow setting rport state to current state
  scsi: wd33c93: Don't use stale scsi_pointer value
  scsi: fnic: Move flush_work initialization out of if block
  scsi: ufs: Use pre-calculated offsets in ufshcd_init_lrb()

Merge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- Add missing dependencies on REGMAP_I2C for several drivers

- Fix memory leak in adt7475 driver

- Relabel Columbiaville temperature sensor in intel-m10-bmc-hwmon
   driver to match other sensor labels

* tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (max1668) Add missing dependency on REGMAP_I2C
  hwmon: (ltc2991) Add missing dependency on REGMAP_I2C
  hwmon: (adt7470) Add missing dependency on REGMAP_I2C
  hwmon: (adm9240) Add missing dependency on REGMAP_I2C
  hwmon: (mc34vr500) Add missing dependency on REGMAP_I2C
  hwmon: (tmp513) Add missing dependency on REGMAP_I2C
  hwmon: (adt7475) Fix memory leak in adt7475_fan_pwm_config()
  hwmon: intel-m10-bmc-hwmon: relabel Columbiaville to CVL Die Temperature

bcachefs: Improve check_snapshot_exists()

Check if we have snapshot_trees or subvolumes that refer to the snapshot
node being reconstructed, and use them.

With this, the kill_btree_root test that blows away the snapshots btree
now passes, and we're able to successfully reconstruct.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix bkey_nocow_lock()

This fixes an assertion pop in nocow_locking.c

00243 kernel BUG at fs/bcachefs/nocow_locking.c:41!
00243 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
00243 Modules linked in:
00243 Hardware name: linux,dummy-virt (DT)
00243 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00244 pc : bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00244 lr : bkey_nocow_lock (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:79)
00244 sp : ffffff80c82373b0
00244 x29: ffffff80c82373b0 x28: ffffff80e08958c0 x27: ffffff80e0880000
00244 x26: ffffff80c8237a98 x25: 00000000000000a0 x24: ffffff80c8237ab0
00244 x23: 00000000000000c0 x22: 0000000000000008 x21: 0000000000000000
00244 x20: ffffff80c8237a98 x19: 0000000000000018 x18: 0000000000000000
00244 x17: 0000000000000000 x16: 000000000000003f x15: 0000000000000000
00244 x14: 0000000000000008 x13: 0000000000000018 x12: 0000000000000000
00244 x11: 0000000000000000 x10: ffffff80e0880000 x9 : ffffffc0803ac1a4
00244 x8 : 0000000000000018 x7 : ffffff80c8237a88 x6 : ffffff80c8237ab0
00244 x5 : ffffff80e08988d0 x4 : 00000000ffffffff x3 : 0000000000000000
00244 x2 : 0000000000000004 x1 : 0003000000000d1e x0 : ffffff80e08988c0
00244 Call trace:
00244 bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00245 bch2_data_update_init (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:627 (discriminator 1))
00245 promote_alloc.isra.0 (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:242 /home/testdashboard/linux-7/fs/bcachefs/io_read.c:304)
00245 __bch2_read_extent (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:949)
00246 __bch2_read (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:1215)
00246 bch2_direct_IO_read (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:132)
00246 bch2_read_iter (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:201)
00247 aio_read.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:1602)
00247 io_submit_one.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:2003 /home/testdashboard/linux-7/fs/aio.c:2052)
00248 __arm64_sys_io_submit (/home/testdashboard/linux-7/fs/aio.c:2111 /home/testdashboard/linux-7/fs/aio.c:2081 /home/testdashboard/linux-7/fs/aio.c:2081)
00248 invoke_syscall.constprop.0 (/home/testdashboard/linux-7/arch/arm64/include/asm/syscall.h:61 /home/testdashboard/linux-7/arch/arm64/kernel/syscall.c:54)
00248 ========= FAILED TIMEOUT tiering_variable_buckets_replicas in 1200s

Signed-off-by: Kent Overstreet <[email protected]>