]> Git Repo - linux.git/log
linux.git
3 months agobnxt_en: Add functions to copy host context memory
Sreekanth Reddy [Fri, 15 Nov 2024 15:14:34 +0000 (07:14 -0800)]
bnxt_en: Add functions to copy host context memory

Host context memory is used by the newer chips to store context
information for various L2 and RoCE states and FW logs.  This
information will be useful for debugging.  This patch adds the
functions to copy all pages of a context memory type to a contiguous
buffer.  The next patches will include the context memory dump
during ethtool -w coredump.

Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Hongguang Gao <[email protected]>
Co-developed-by: Shruti Parab <[email protected]>
Signed-off-by: Shruti Parab <[email protected]>
Signed-off-by: Sreekanth Reddy <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Do not free FW log context memory
Hongguang Gao [Fri, 15 Nov 2024 15:14:33 +0000 (07:14 -0800)]
bnxt_en: Do not free FW log context memory

If FW supports appending new FW logs to an offset in the context
memory after FW reset, then do not free this type of context memory
during reset.  The driver will provide the initial offset to the FW
when configuring this type of context memory.  This way, we don't lose
the older FW logs after reset.

Signed-off-by: Hongguang Gao <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Manage the FW trace context memory
Shruti Parab [Fri, 15 Nov 2024 15:14:32 +0000 (07:14 -0800)]
bnxt_en: Manage the FW trace context memory

The FW trace memory pages will be added to the ethtool -w coredump
in later patches.  In addition to the raw data, the driver has to
add a header to provide the head and tail information on each FW
trace log segment when creating the coredump.  The FW sends an async
message to the driver after DMAing a chunk of logs to the context
memory to indicate the last offset containing the tail of the logs.
The driver needs to keep track of that.

Reviewed-by: Hongguang Gao <[email protected]>
Signed-off-by: Shruti Parab <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Allocate backing store memory for FW trace logs
Shruti Parab [Fri, 15 Nov 2024 15:14:31 +0000 (07:14 -0800)]
bnxt_en: Allocate backing store memory for FW trace logs

Allocate the new FW trace log backing store context memory types
if they are supported by the FW.  FW debug logs are DMA'ed to the host
backing store memory when the on-chip buffers are full.  If host
memory cannot be allocated for these memory types, the driver
will not abort.

Reviewed-by: Hongguang Gao <[email protected]>
Signed-off-by: Shruti Parab <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
Hongguang Gao [Fri, 15 Nov 2024 15:14:30 +0000 (07:14 -0800)]
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()

If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.

This patch always passes true for the 'force' parameter so there is
no change in behavior.  Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.

Signed-off-by: Hongguang Gao <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Refactor bnxt_free_ctx_mem()
Hongguang Gao [Fri, 15 Nov 2024 15:14:29 +0000 (07:14 -0800)]
bnxt_en: Refactor bnxt_free_ctx_mem()

Add a new function bnxt_free_one_ctx_mem() to free one context
memory type.  bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type.  There is no change in
behavior.  Later patches will further make use of the new function.

Signed-off-by: Hongguang Gao <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
Shruti Parab [Fri, 15 Nov 2024 15:14:28 +0000 (07:14 -0800)]
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type

Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging.  If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.

Reviewed-by: Hongguang Gao <[email protected]>
Signed-off-by: Shruti Parab <[email protected]>
Signed-of-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agobnxt_en: Update firmware interface spec to 1.10.3.85
Michael Chan [Fri, 15 Nov 2024 15:14:27 +0000 (07:14 -0800)]
bnxt_en: Update firmware interface spec to 1.10.3.85

The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.

Reviewed-by: Hongguang Gao <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoMerge branch 'wireguard-updates-and-fixes-for-6-13'
Jakub Kicinski [Tue, 19 Nov 2024 03:32:33 +0000 (19:32 -0800)]
Merge branch 'wireguard-updates-and-fixes-for-6-13'

Jason A. Donenfeld says:

====================
wireguard updates and fixes for 6.13

This tiny series (+3/-2) fixes one bug and has three small improvements.

1) Fix running the netns.sh test suite on systems that haven't yet
   inserted the nf_conntrack module.

2) Remove a stray useless function call in a selftest.

3) There's no need to zero out the netdev private data in recent
   kernels.

4) Set the TSO max size to be GSO_MAX_SIZE, so that we aggregate larger
   packets. Daniel reports seeing a 15% improvement in a simple load and
   suggested the speedups would be even better in more complex loads.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agowireguard: device: support big tcp GSO
Daniel Borkmann [Sun, 17 Nov 2024 21:20:30 +0000 (22:20 +0100)]
wireguard: device: support big tcp GSO

Advertise GSO_MAX_SIZE as TSO max size in order support BIG TCP for wireguard.
This helps to improve wireguard performance a bit when enabled as it allows
wireguard to aggregate larger skbs in wg_packet_consume_data_done() via
napi_gro_receive(), but also allows the stack to build larger skbs on xmit
where the driver then segments them before encryption inside wg_xmit().
We've seen a 15% improvement in TCP stream performance.

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agowireguard: selftests: load nf_conntrack if not present
Hangbin Liu [Sun, 17 Nov 2024 21:20:29 +0000 (22:20 +0100)]
wireguard: selftests: load nf_conntrack if not present

Some distros may not load nf_conntrack by default, which will cause
subsequent nf_conntrack sets to fail. Load this module if it is not
already loaded.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
[ Jason: add [[ -e ... ]] check so this works in the qemu harness. ]
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agowireguard: allowedips: remove redundant selftest call
Dheeraj Reddy Jonnalagadda [Sun, 17 Nov 2024 21:20:28 +0000 (22:20 +0100)]
wireguard: allowedips: remove redundant selftest call

This commit fixes a useless call issue detected by Coverity (CID
1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as
its return value is never checked.

Signed-off-by: Dheeraj Reddy Jonnalagadda <[email protected]>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agowireguard: device: omit unnecessary memset of netdev private data
Tobias Klauser [Sun, 17 Nov 2024 21:20:27 +0000 (22:20 +0100)]
wireguard: device: omit unnecessary memset of netdev private data

The memory for netdev_priv is allocated using kvzalloc in
alloc_netdev_mqs before rtnl_link_ops->setup is called so there is no
need to zero it again in wg_setup.

Signed-off-by: Tobias Klauser <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet: ip: fix unexpected return in fib_validate_source()
Menglong Dong [Mon, 18 Nov 2024 09:14:27 +0000 (17:14 +0800)]
net: ip: fix unexpected return in fib_validate_source()

The errno should be replaced with drop reasons in fib_validate_source(),
and the "-EINVAL" shouldn't be returned. And this causes a warning, which
is reported by syzkaller:

netlink: 'syz-executor371': attribute type 4 has an invalid length.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 5842 Comm: syz-executor371 Not tainted 6.12.0-rc6-syzkaller-01362-ga58f00ed24b8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Code: 00 00 00 fc ff df 41 8d 9e 00 00 fc ff bf 01 00 fc ff 89 de e8 ea 9f 08 f8 81 fb 00 00 fc ff 77 3a 4c 89 e5 e8 9a 9b 08 f8 90 <0f> 0b 90 eb 5e bf 01 00 00 00 89 ee e8 c8 9f 08 f8 85 ed 0f 8e 49
RSP: 0018:ffffc90003d57078 EFLAGS: 00010293
RAX: ffffffff898c3ec6 RBX: 00000000fffbffea RCX: ffff8880347a5a00
RDX: 0000000000000000 RSI: 00000000fffbffea RDI: 00000000fffc0001
RBP: dffffc0000000000 R08: ffffffff898c3eb6 R09: 1ffff110023eb7d4
R10: dffffc0000000000 R11: ffffed10023eb7d5 R12: dffffc0000000000
R13: ffff888011f5bdc0 R14: 00000000ffffffea R15: 0000000000000000
FS:  000055557d41e380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056519d31d608 CR3: 000000007854e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 kfree_skb_reason include/linux/skbuff.h:1263 [inline]
 ip_rcv_finish_core+0xfde/0x1b50 net/ipv4/ip_input.c:424
 ip_list_rcv_finish net/ipv4/ip_input.c:610 [inline]
 ip_sublist_rcv+0x3b1/0xab0 net/ipv4/ip_input.c:636
 ip_list_rcv+0x42b/0x480 net/ipv4/ip_input.c:670
 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
 __netif_receive_skb_list_core+0x94e/0x980 net/core/dev.c:5762
 __netif_receive_skb_list net/core/dev.c:5814 [inline]
 netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
 netif_receive_skb_list+0x55/0x4b0 net/core/dev.c:5957
 xdp_recv_frames net/bpf/test_run.c:280 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x1b5e/0x21b0 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x805/0x11e0 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x2e4/0x360 kernel/bpf/syscall.c:4266
 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5671
 __do_sys_bpf kernel/bpf/syscall.c:5760 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5758 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5758
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f18af25a8e9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee4090af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18af25a8e9
RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fix it by returning "-SKB_DROP_REASON_IP_LOCAL_SOURCE" instead of
"-EINVAL" in fib_validate_source().

Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Signed-off-by: Menglong Dong <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet/fungible: Remove unused fun_create_queue
Dr. David Alan Gilbert [Sat, 16 Nov 2024 15:26:44 +0000 (15:26 +0000)]
net/fungible: Remove unused fun_create_queue

fun_create_queue was added in 2022 by
commit e1ffcc66818f ("net/fungible: Add service module for Fungible
drivers")
but hasn't been used.

Remove it.

Also remove the static helper functions it was the only user of.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoMerge branch 'uapi-ethtool-avoid-flex-array-in-struct-ethtool_link_settings'
Jakub Kicinski [Tue, 19 Nov 2024 02:52:14 +0000 (18:52 -0800)]
Merge branch 'uapi-ethtool-avoid-flex-array-in-struct-ethtool_link_settings'

Kees Cook says:

====================
UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings

This reverts the tagged struct group in struct ethtool_link_settings and
instead just removes the flexible array member from Linux's view as it
is entirely unused.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoUAPI: ethtool: Avoid flex-array in struct ethtool_link_settings
Kees Cook [Fri, 15 Nov 2024 20:43:05 +0000 (12:43 -0800)]
UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings

struct ethtool_link_settings tends to be used as a header for other
structures that have trailing bytes[1], but has a trailing flexible array
itself. Using this overlapped with other structures leads to ambiguous
object sizing in the compiler, so we want to avoid such situations (which
have caused real bugs in the past). Detecting this can be done with
-Wflex-array-member-not-at-end, which will need to be enabled globally.

Using a tagged struct_group() to create a new ethtool_link_settings_hdr
structure isn't possible as it seems we cannot use the tagged variant of
struct_group() due to syntax issues from C++'s perspective (even within
"extern C")[2]. Instead, we can just leave the offending member defined
in UAPI and remove it from the kernel's view of the structure, as Linux
doesn't actually use this member at all. There is also no change in
size since it was already a flexible array that didn't contribute to
size returned by any use of sizeof().

Reported-by: Jakub Kicinski <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/ [2]
Link: https://lore.kernel.org/lkml/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org/
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoRevert "UAPI: ethtool: Use __struct_group() in struct ethtool_link_settings"
Kees Cook [Fri, 15 Nov 2024 20:43:04 +0000 (12:43 -0800)]
Revert "UAPI: ethtool: Use __struct_group() in struct ethtool_link_settings"

This reverts commit 43d3487035e9a86fad952de4240a518614240d43. We cannot
use tagged struct groups in UAPI because C++ will throw syntax errors
even under "extern C".

Signed-off-by: Kees Cook <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoRevert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings"
Kees Cook [Fri, 15 Nov 2024 20:43:03 +0000 (12:43 -0800)]
Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings"

This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".

Signed-off-by: Kees Cook <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoselftests: net: add more info to error in bpf_offload
Jakub Kicinski [Fri, 15 Nov 2024 20:12:36 +0000 (12:12 -0800)]
selftests: net: add more info to error in bpf_offload

bpf_offload caught a spurious warning in TC recently, but the error
message did not provide enough information to know what the problem
is:

  FAIL: Found 'netdevsim' in command output, leaky extack?

Add the extack to the output:

  FAIL: Unexpected command output, leaky extack? ('netdevsim', 'Warning: Filter with specified priority/protocol not found.')

Acked-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet/smc: Run patches also by RDMA ML
Gerd Bayer [Fri, 15 Nov 2024 17:44:57 +0000 (18:44 +0100)]
net/smc: Run patches also by RDMA ML

Commits for the SMC protocol usually get carried through the netdev
mailing list. Some portions use InfiniBand verbs that are discussed on
the RDMA mailing list. So run patches by that list too to increase the
likelihood that all interested parties can see them.

Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoMerge branch 'mptcp-pm-lockless-list-traversal-and-cleanup'
Jakub Kicinski [Tue, 19 Nov 2024 02:50:14 +0000 (18:50 -0800)]
Merge branch 'mptcp-pm-lockless-list-traversal-and-cleanup'

Matthieu Baerts says:

====================
mptcp: pm: lockless list traversal and cleanup

Here are two patches improving the MPTCP in-kernel path-manager.

- Patch 1: the get and dump endpoints operations are iterating over the
  endpoints list in a lockless way.

- Patch 2: reduce the code duplication to lookup an endpoint.
====================

Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-0-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agomptcp: pm: avoid code duplication to lookup endp
Geliang Tang [Fri, 15 Nov 2024 16:52:35 +0000 (17:52 +0100)]
mptcp: pm: avoid code duplication to lookup endp

The helper __lookup_addr() can be used in mptcp_pm_nl_get_local_id()
and mptcp_pm_nl_is_backup() to simplify the code, and avoid code
duplication.

Co-developed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-2-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agomptcp: pm: lockless list traversal to dump endp
Matthieu Baerts (NGI0) [Fri, 15 Nov 2024 16:52:34 +0000 (17:52 +0100)]
mptcp: pm: lockless list traversal to dump endp

To return an endpoint to the userspace via Netlink, and to dump all of
them, the endpoint list was iterated while holding the pernet->lock, but
only to read the content of the list.

In these cases, the spin locks can be replaced by RCU read ones, and use
the _rcu variants to iterate over the entries list in a lockless way.

Note that the __lookup_addr_by_id() helper has been modified to use the
_rcu variants of list_for_each_entry(), but with an extra conditions, so
it can be called either while the RCU read lock is held, or when the
associated pernet->lock is held.

Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-1-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agostmmac: dwmac-intel-plat: remove redundant dwmac->data check in probe
Vitalii Mordan [Fri, 15 Nov 2024 13:26:32 +0000 (16:26 +0300)]
stmmac: dwmac-intel-plat: remove redundant dwmac->data check in probe

The driver’s compatibility with devices is confirmed earlier in
platform_match(). Since reaching probe means the device is valid,
the extra check can be removed to simplify the code.

Signed-off-by: Vitalii Mordan <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet: txgbe: remove GPIO interrupt controller
Jiawen Wu [Fri, 15 Nov 2024 07:15:27 +0000 (15:15 +0800)]
net: txgbe: remove GPIO interrupt controller

Since the GPIO interrupt controller is always not working properly, we need
to constantly add workaround to cope with hardware deficiencies. So just
remove GPIO interrupt controller, and let the SFP driver poll the GPIO
status.

Fixes: b4a2496c17ed ("net: txgbe: fix GPIO interrupt blocking")
Signed-off-by: Jiawen Wu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoMerge branch 'eth-fbnic-cleanup-and-add-a-few-stats'
Jakub Kicinski [Tue, 19 Nov 2024 02:43:45 +0000 (18:43 -0800)]
Merge branch 'eth-fbnic-cleanup-and-add-a-few-stats'

Jakub Kicinski says:

====================
eth: fbnic: cleanup and add a few stats

Cleanup trival problems with fbnic and add the PCIe and RPC (Rx parser)
stats.

All stats are read under rtnl_lock for now, so the code is pretty
trivial. We'll need to add more locking when we start gathering
drops used by .ndo_get_stats64.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoeth: fbnic: add RPC hardware statistics
Sanman Pradhan [Fri, 15 Nov 2024 01:53:44 +0000 (17:53 -0800)]
eth: fbnic: add RPC hardware statistics

Report Rx parser statistics via ethtool -S.

The parser stats are 32b, so we need to add refresh to the service
task to make sure we don't miss overflows.

Signed-off-by: Sanman Pradhan <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoeth: fbnic: add PCIe hardware statistics
Sanman Pradhan [Fri, 15 Nov 2024 01:53:43 +0000 (17:53 -0800)]
eth: fbnic: add PCIe hardware statistics

Add PCIe hardware statistics support to the fbnic driver. These stats
provide insight into PCIe transaction performance and error conditions.

Which includes, read/write and completion TLP counts and DWORD counts and
debug counters for tag, completion credit and NP credit exhaustion

The stats are exposed via debugfs and can be used to monitor PCIe
performance and debug PCIe issues.

Signed-off-by: Sanman Pradhan <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoeth: fbnic: add basic debugfs structure
Jakub Kicinski [Fri, 15 Nov 2024 01:53:42 +0000 (17:53 -0800)]
eth: fbnic: add basic debugfs structure

Add the usual debugfs structure:

 fbnic/
   $pci-id/
     device-fileA
     device-fileB

This patch only adds the directories, subsequent changes
will add files.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoeth: fbnic: add missing header guards
Jakub Kicinski [Fri, 15 Nov 2024 01:53:41 +0000 (17:53 -0800)]
eth: fbnic: add missing header guards

While adding the SPDX headers I noticed we're also missing
a header guard.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoeth: fbnic: add missing SPDX headers
Jakub Kicinski [Fri, 15 Nov 2024 01:53:40 +0000 (17:53 -0800)]
eth: fbnic: add missing SPDX headers

Paolo noticed that we are missing SPDX headers, add them.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoselftests: net: netlink-dumps: validation checks
Jakub Kicinski [Fri, 15 Nov 2024 00:32:48 +0000 (16:32 -0800)]
selftests: net: netlink-dumps: validation checks

The sanity checks are going to get silently cast to unsigned
and always pass. Cast the sizeof to signed size.

Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet/neighbor: clear error in case strict check is not set
Jakub Kicinski [Fri, 15 Nov 2024 00:32:21 +0000 (16:32 -0800)]
net/neighbor: clear error in case strict check is not set

Commit 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict
data checking") added strict checking. The err variable is not cleared,
so if we find no table to dump we will return the validation error even
if user did not want strict checking.

I think the only way to hit this is to send an buggy request, and ask
for a table which doesn't exist, so there's no point treating this
as a real fix. I only noticed it because a syzbot repro depended on it
to trigger another bug.

Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agorocker: fix link status detection in rocker_carrier_init()
Dmitry Antipov [Thu, 14 Nov 2024 15:19:46 +0000 (18:19 +0300)]
rocker: fix link status detection in rocker_carrier_init()

Since '1 << rocker_port->pport' may be undefined for port >= 32,
cast the left operand to 'unsigned long long' like it's done in
'rocker_port_set_enable()' above. Compile tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agonet: wwan: t7xx: Change PM_AUTOSUSPEND_MS to 5000
Jack Wu [Thu, 14 Nov 2024 10:20:02 +0000 (18:20 +0800)]
net: wwan: t7xx: Change PM_AUTOSUSPEND_MS to 5000

Because optimizing the power consumption of t7XX,
change auto suspend time to 5000.

The Tests uses a script to loop through the power_state
of t7XX.
(for example: /sys/bus/pci/devices/0000\:72\:00.0/power_state)

* If Auto suspend is 20 seconds,
  test script show power_state have 0~5% of the time was in D3 state
  when host don't have data packet transmission.

* Changed auto suspend time to 5 seconds,
  test script show power_state have 50%~80% of the time was in D3 state
  when host don't have data packet transmission.

We tested Fibocom FM350 and our products using the t7xx and they all
benefited from this.

Signed-off-by: Jack Wu <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Sergey Ryazanov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agotools: ynl-gen: allow uapi headers in sub-dirs
Jakub Kicinski [Wed, 13 Nov 2024 19:32:38 +0000 (11:32 -0800)]
tools: ynl-gen: allow uapi headers in sub-dirs

Binder places its headers under include/uapi/linux/android/
Make sure replace / with _ in the uAPI header guard, the c_upper()
is more strict and only converts - to _. This is likely a good
constraint to have, to enforce sane naming in enums etc.
But paths may include /.

Signed-off-by: Li Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agodt-bindings: net: renesas,ether: Drop undocumented "micrel,led-mode"
Rob Herring (Arm) [Wed, 13 Nov 2024 22:57:42 +0000 (16:57 -0600)]
dt-bindings: net: renesas,ether: Drop undocumented "micrel,led-mode"

"micrel,led-mode" is not yet documented by a schema. It's irrelevant to
the example, so just drop it.

Signed-off-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 months agoMerge branch 'am65-cpsw-rx-dscp-prio-map'
David S. Miller [Mon, 18 Nov 2024 12:00:22 +0000 (12:00 +0000)]
Merge branch 'am65-cpsw-rx-dscp-prio-map'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: enable DSCP to priority map for RX

Configure default DSCP to User Priority mapping registers as per:
 https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
and
 https://datatracker.ietf.org/doc/html/rfc8622#section-11

Also update Priority to Thread maping to be compliant with
IEEE802.1Q-2014. Priority Code Point (PCP) 2 is higher priority than
PCP 0 (Best Effort). PCP 1 (Background) is lower priority than
PCP 0 (Best Effort).

---
Changes in v4:
- Updated default DSCP to User Priority mapping as per
  https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
  and
  https://datatracker.ietf.org/doc/html/rfc8622#section-11
- Link to v3: https://lore.kernel.org/r/20241109-am65-cpsw-multi-rx-dscp-v3-0-1cfb76928490@kernel.org

Changes in v3:
- Added Reviewed-by tag to patch 1
- Added macros for DSCP PRI field size and DSCP PRI per register
- Drop unnecessary readl() in am65_cpsw_port_set_dscp_map()
- Link to v2: https://lore.kernel.org/r/20241107-am65-cpsw-multi-rx-dscp-v2-0-9e9cd1920035@kernel.org

Changes in v2:
- Updated references to more recent standard IEEE802.1Q-2014.
- Dropped reference to web link which might change in the future.
- Typo fix in commit log.
- Link to v1: https://lore.kernel.org/r/20241105-am65-cpsw-multi-rx-dscp-v1-0-38db85333c88@kernel.org
====================

Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agonet: ethernet: ti: am65-cpsw: enable DSCP to priority map for RX
Roger Quadros [Thu, 14 Nov 2024 13:36:53 +0000 (15:36 +0200)]
net: ethernet: ti: am65-cpsw: enable DSCP to priority map for RX

AM65 CPSW hardware can map the 6-bit DSCP/TOS field to
appropriate priority queue via DSCP to Priority mapping registers
(CPSW_PN_RX_PRI_MAP_REG).

Use a default DSCP to User Priority (UP) mapping as per
https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
and
https://datatracker.ietf.org/doc/html/rfc8622#section-11

Signed-off-by: Roger Quadros <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agonet: ethernet: ti: am65-cpsw: update pri_thread_map as per IEEE802.1Q-2014
Roger Quadros [Thu, 14 Nov 2024 13:36:52 +0000 (15:36 +0200)]
net: ethernet: ti: am65-cpsw: update pri_thread_map as per IEEE802.1Q-2014

IEEE802.1Q-2014 supersedes IEEE802.1D-2004. Now Priority Code Point (PCP)
2 is no longer at a lower priority than PCP 0. PCP 1 (Background) is still
at a lower priority than PCP 0 (Best Effort).

Reference:
IEEE802.1Q-2014, Standard for Local and metropolitan area networks
  Table I-2 - Traffic type acronyms
  Table I-3 - Defining traffic types

Signed-off-by: Roger Quadros <[email protected]>
Reviewed-by: Siddharth Vadapalli <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agoMerge branch 'udp-4tuple-hash'
David S. Miller [Mon, 18 Nov 2024 11:56:21 +0000 (11:56 +0000)]
Merge branch 'udp-4tuple-hash'

Philo Lu says:

====================
udp: Add 4-tuple hash for connected sockets

This patchset introduces 4-tuple hash for connected udp sockets, to make
connected udp lookup faster.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/  hash4: 1,831750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290860 (only 16% of [a])
    [d] w/  hash4: 1,889658 (+3.1% compared with [b])
With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].

Patch1: Add a new counter for hslot2 named hash4_cnt, to avoid cache line
        miss when lookup.
Patch2: Add hslot/hlist_nulls for 4-tuple hash.
Patch3 and 4: Implement 4-tuple hash for ipv4 and ipv6.

The detailed motivation is described in Patch 3.

The 4-tuple hash increases the size of udp_sock and udp_hslot. Thus add it
with CONFIG_BASE_SMALL, i.e., it's a no op with CONFIG_BASE_SMALL.

Intentionally, the feature is not available for udplite. Though udplite
shares some structs and functions with udp, its connect() keeps unchanged.
So all udplite sockets perform the same as un-connected udp sockets.
Besides, udplite also shares the additional memory consumption in udp_sock
and udptable.

changelogs:
v8 -> v9 (Paolo Abeni):
- Add explanation about udplite in cover letter
- Update tags for co-developers
- Add acked-by tags of Paolo and Willem

v7 -> v8:
- add EXPORT_SYMBOL for ipv6.ko build

v6 -> v7 (Kuniyuki Iwashima):
- export udp_ehashfn to be used by udpv6 rehash

v5 -> v6 (Paolo Abeni):
- move udp_table_hash4_init from patch2 to patch1
- use hlist_nulls for lookup-rehash race
- add test results in commit log
- add more comment, e.g., for rehash4 used in hash4
- add ipv6 support (Patch4), and refactor some functions for better
  sharing, without functionality change

v4 -> v5 (Paolo Abeni):
- add CONFIG_BASE_SMALL with which udp hash4 does nothing

v3 -> v4 (Willem de Bruijn):
- fix mistakes in udp_pernet_table_alloc()

RFCv2 -> v3 (Gur Stavi):
- minor fix in udp_hashslot2() and udp_table_init()
- add rcu sync in rehash4()

RFCv1 -> RFCv2:
- add a new struct for hslot2
- remove the sockopt UDP_HASH4 because it has little side effect for
  unconnected sockets
- add rehash in connect()
- re-organize the patch into 3 smaller ones
- other minor fix

v8:
https://lore.kernel.org/all/20241108054836[email protected]/
v7:
https://lore.kernel.org/all/20241105121225[email protected]/
v6:
https://lore.kernel.org/all/20241031124550[email protected]/
v5:
https://lore.kernel.org/all/20241018114535[email protected]/
v4:
https://lore.kernel.org/all/20241012012918[email protected]/
v3:
https://lore.kernel.org/all/20241010090351[email protected]/
RFCv2:
https://lore.kernel.org/all/20240924110414[email protected]/
RFCv1:
https://lore.kernel.org/all/20240913100941[email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>
3 months agoipv6/udp: Add 4-tuple hash for connected socket
Philo Lu [Thu, 14 Nov 2024 10:52:07 +0000 (18:52 +0800)]
ipv6/udp: Add 4-tuple hash for connected socket

Implement ipv6 udp hash4 like that in ipv4. The major difference is that
the hash value should be calculated with udp6_ehashfn(). Besides,
ipv4-mapped ipv6 address is handled before hash() and rehash(). Export
udp_ehashfn because now we use it in udpv6 rehash.

Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and
udpv6 share the same udptable, so some functions in ipv4 hash4 can also
be shared.

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agoipv4/udp: Add 4-tuple hash for connected socket
Philo Lu [Thu, 14 Nov 2024 10:52:06 +0000 (18:52 +0800)]
ipv4/udp: Add 4-tuple hash for connected socket

Currently, the udp_table has two hash table, the port hash and portaddr
hash. Usually for UDP servers, all sockets have the same local port and
addr, so they are all on the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In
particular, when firstly receiving from an unseen 4 tuple, a new socket
is created and connect()ed to the remote addr:port, and then the fd is
used exclusively by the client.

Once there are connected sks in a reuseport group, udp has to score all
sks in the same hash2 slot to find the best match. This could be
inefficient with a large number of connections, resulting in high
softirq overhead.

To solve the problem, this patch implement 4-tuple hash for connected
udp sockets. During connect(), hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(),
hslot4 will be searched firstly if the counter is non-zero. Otherwise,
hslot2 is used like before. Note that only connected sockets enter this
hash4 path, while un-connected ones are not affected.

hlist_nulls is used for hash4, because we probably move to another hslot
wrongly when lookup with concurrent rehash. Then we check nulls at the
list end to see if we should restart lookup. Because udp does not use
SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/  hash4: 1,831750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290860 (only 16% of [a])
    [d] w/  hash4: 1,889658 (+3.1% compared with [b])

With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agonet/udp: Add 4-tuple hash list basis
Philo Lu [Thu, 14 Nov 2024 10:52:05 +0000 (18:52 +0800)]
net/udp: Add 4-tuple hash list basis

Add a new hash list, hash4, in udp table. It will be used to implement
4-tuple hash for connected udp sockets. This patch adds the hlist to
table, and implements helpers and the initialization. 4-tuple hash is
implemented in the following patch.

hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to
concurrent rehash, because rehash() can happen with lookup().

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agonet/udp: Add a new struct for hash2 slot
Philo Lu [Thu, 14 Nov 2024 10:52:04 +0000 (18:52 +0800)]
net/udp: Add a new struct for hash2 slot

Preparing for udp 4-tuple hash (uhash4 for short).

To implement uhash4 without cache line missing when lookup, hslot2 is
used to record the number of hashed sockets in hslot4. Thus adding a new
struct udp_hslot_main with field hash4_cnt, which is used by hash2. The
new struct is used to avoid doubling the size of udp_hslot.

Before uhash4 lookup, firstly checking hash4_cnt to see if there are
hashed sks in hslot4. Because hslot2 is always used in lookup, there is
no cache line miss.

Related helpers are updated, and use the helpers as possible.

uhash4 is implemented in following patches.

Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 months agoMerge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Mon, 18 Nov 2024 11:52:49 +0000 (11:52 +0000)]
Merge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================

ipsec-next-11-15

1) Add support for RFC 9611 per cpu xfrm state handling.

2) Add inbound and outbound xfrm state caches to speed up
   state lookups.

3) Convert xfrm to dscp_t. From Guillaume Nault.

4) Fix error handling in build_aevent.
   From Everest K.C.

5) Replace strncpy with strscpy_pad in copy_to_user_auth.
   From Daniel Yang.

6) Fix an uninitialized symbol during acquire state insertion.
====================

Signed-off-by: David S. Miller <[email protected]>
4 months agoMerge branch 'virtio-net-support-af_xdp-zero-copy-tx'
Jakub Kicinski [Sat, 16 Nov 2024 02:47:07 +0000 (18:47 -0800)]
Merge branch 'virtio-net-support-af_xdp-zero-copy-tx'

Xuan Zhuo says:

====================
virtio-net: support AF_XDP zero copy (tx)

XDP socket(AF_XDP) is an excellent bypass kernel network framework. The zero
copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and intel ixgbe already support
this feature, This patch set allows virtio-net to support xsk's zerocopy xmit
feature.

At present, we have completed some preparation:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for Virtio-Net to complete the support for the XDP Socket
Zerocopy.

Virtio-net can not increase the queue num at will, so xsk shares the queue with
kernel.

This patch set includes some refactor to the virtio-net to let that to support
AF_XDP.

The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.

We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
indicates that the driver has performed DMA mapping in advance, allowing
the Virtio core to directly utilize sg_dma_address(sg) without
conducting any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.

ENV: Qemu with vhost-user(polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

testpmd> show port stats all

 ######################## NIC statistics for port 0 ########################
 RX-packets: 19531092064 RX-missed: 0     RX-bytes: 1093741155584
 RX-errors: 0
 RX-nombuf: 0
 TX-packets: 5959955552 TX-errors: 0     TX-bytes: 371030645664

 Throughput (since last show)
 Rx-pps:   8861574     Rx-bps:  3969985208
 Tx-pps:   8861493     Tx-bps:  3969962736
 ############################################################################

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 68152727   RX-missed: 0          RX-bytes:  3816552712
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 68114967   TX-errors: 33216      TX-bytes:  3814438152

  Throughput (since last show)
  Rx-pps:      6333196          Rx-bps:   2837272088
  Tx-pps:      6333227          Tx-bps:   2837285936
  ############################################################################

But AF_XDP consumes more CPU for tx and rx napi(100% and 86%).
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: xdp_features add NETDEV_XDP_ACT_XSK_ZEROCOPY
Xuan Zhuo [Tue, 12 Nov 2024 01:29:28 +0000 (09:29 +0800)]
virtio_net: xdp_features add NETDEV_XDP_ACT_XSK_ZEROCOPY

Now, we support AF_XDP(xsk). Add NETDEV_XDP_ACT_XSK_ZEROCOPY to
xdp_features.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: update tx timeout record
Xuan Zhuo [Tue, 12 Nov 2024 01:29:27 +0000 (09:29 +0800)]
virtio_net: update tx timeout record

If send queue sent some packets, we update the tx timeout
record to prevent the tx timeout.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: xsk: tx: support xmit xsk buffer
Xuan Zhuo [Tue, 12 Nov 2024 01:29:26 +0000 (09:29 +0800)]
virtio_net: xsk: tx: support xmit xsk buffer

The driver's tx napi is very important for XSK. It is responsible for
obtaining data from the XSK queue and sending it out.

At the beginning, we need to trigger tx napi.

virtnet_free_old_xmit distinguishes three type ptr(skb, xdp frame, xsk
buffer) by the last bits of the pointer.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: xsk: prevent disable tx napi
Xuan Zhuo [Tue, 12 Nov 2024 01:29:25 +0000 (09:29 +0800)]
virtio_net: xsk: prevent disable tx napi

Since xsk's TX queue is consumed by TX NAPI, if sq is bound to xsk, then
we must stop tx napi from being disabled.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: xsk: bind/unbind xsk for tx
Xuan Zhuo [Tue, 12 Nov 2024 01:29:24 +0000 (09:29 +0800)]
virtio_net: xsk: bind/unbind xsk for tx

This patch implement the logic of bind/unbind xsk pool to sq and rq.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_net: refactor the xmit type
Xuan Zhuo [Tue, 12 Nov 2024 01:29:23 +0000 (09:29 +0800)]
virtio_net: refactor the xmit type

Because the af-xdp will introduce a new xmit type, so I refactor the
xmit type mechanism first.

We know both xdp_frame and sk_buff are at least 4 bytes aligned.
For the xdp tx, we do not pass any pointer to virtio core as data,
we just need to pass the len of the packet. So we will push len
to the void pointer. We can make sure the pointer is 4 bytes aligned.

And the data structure of AF_XDP also is at least 4 bytes aligned.

So the last two bits of the pointers are free, we can't use these to
distinguish them.

    00 for skb
    01 for SKB_ORPHAN
    10 for XDP
    11 for AF-XDP tx

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: remove API virtqueue_set_dma_premapped
Xuan Zhuo [Tue, 12 Nov 2024 01:29:22 +0000 (09:29 +0800)]
virtio_ring: remove API virtqueue_set_dma_premapped

Now, this API is useless. remove it.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio-net: rq submits premapped per-buffer
Xuan Zhuo [Tue, 12 Nov 2024 01:29:21 +0000 (09:29 +0800)]
virtio-net: rq submits premapped per-buffer

virtio-net rq submits premapped per-buffer by setting sg page to NULL;

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: introduce add api for premapped
Xuan Zhuo [Tue, 12 Nov 2024 01:29:20 +0000 (09:29 +0800)]
virtio_ring: introduce add api for premapped

Two APIs are introduced to submit premapped per-buffers.

int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
                                 struct scatterlist *sg, unsigned int num,
                                 void *data,
                                 void *ctx,
                                 gfp_t gfp);

int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
                                  struct scatterlist *sg, unsigned int num,
                                  void *data,
                                  gfp_t gfp);

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: perform premapped operations based on per-buffer
Xuan Zhuo [Tue, 12 Nov 2024 01:29:19 +0000 (09:29 +0800)]
virtio_ring: perform premapped operations based on per-buffer

The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.

And consider that some sgs are not generated by the virtio driver,
that may be passed from the block stack. So we can not change the
sgs, new APIs are the better way.

So we pass the new argument 'premapped' to indicate the buffers
submitted to virtio are premapped in advance. Additionally,
DMA unmap operations for these buffers will be bypassed.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: packed: record extras for indirect buffers
Xuan Zhuo [Tue, 12 Nov 2024 01:29:18 +0000 (09:29 +0800)]
virtio_ring: packed: record extras for indirect buffers

The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: split: record extras for indirect buffers
Xuan Zhuo [Tue, 12 Nov 2024 01:29:17 +0000 (09:29 +0800)]
virtio_ring: split: record extras for indirect buffers

The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agovirtio_ring: introduce vring_need_unmap_buffer
Xuan Zhuo [Tue, 12 Nov 2024 01:29:16 +0000 (09:29 +0800)]
virtio_ring: introduce vring_need_unmap_buffer

To make the code readable, introduce vring_need_unmap_buffer() to
replace do_unmap.

   use_dma_api premapped -> vring_need_unmap_buffer()
1. false       false        false
2. true        false        true
3. true        true         false

Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Sat, 16 Nov 2024 02:21:34 +0000 (18:21 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-11-05 (ice, ixgbe, igc. igb, igbvf, e1000)

For ice:

Mateusz refactors and adds additional SerDes configuration values to be
output.

Przemek refactors processing of DDP and adds support for a flag field in
the DDP's signature segment header.

Joe Damato adds support for persistent NAPI config.

Brett adjusts setting of Tx promiscuous based on unicast/multicast
setting.

Jake moves setting of pf->supported_rxdids to occur directly after DDP
load and changes a small struct to use stack memory.

Frederic Weisbecker adds WQ_UNBOUND flag to the workqueue.

For ixgbe:

Diomidis Spinellis removes a circular dependency.

For igc:

Vitaly removes an unneeded autoneg parameter.

For igb:

Johnny Park fixes a couple of typos.

For igbvf:

Wander Lairson Costa removes an unused spinlock.

For e1000:

Joe Damato adds RTNL lock to some calls where it is expected to be held.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000: Hold RTNL when e1000_down can be called
  igbvf: remove unused spinlock
  igb: Fix 2 typos in comments in igb_main.c
  igc: remove autoneg parameter from igc_mac_info
  ixgbe: Break include dependency cycle
  ice: Unbind the workqueue
  ice: use stack variable for virtchnl_supported_rxdids
  ice: initialize pf->supported_rxdids immediately after loading DDP
  ice: only allow Tx promiscuous for multicast
  ice: Add support for persistent NAPI config
  ice: support optional flags in signature segment header
  ice: refactor "last" segment of DDP pkg
  ice: extend dump serdes equalizer values feature
  ice: rework of dump serdes equalizer values feature
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'net-ndo_fdb_add-del-have-drivers-report-whether-they-notified'
Jakub Kicinski [Sat, 16 Nov 2024 00:39:21 +0000 (16:39 -0800)]
Merge branch 'net-ndo_fdb_add-del-have-drivers-report-whether-they-notified'

Petr Machata says:

====================
net: ndo_fdb_add/del: Have drivers report whether they notified

Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own.

 # ip link add name vx type vxlan id 1000 dstport 4789
 # bridge monitor fdb &
 [1] 1981693
 # bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
 de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
 de:ad:be:ef:13:37 dev vx self permanent

In order to prevent this duplicity, add a parameter, bool *notified, to
ndo_fdb_add and ndo_fdb_del. The flag is primed to false, and if the callee
sends a notification on its own, it sets the flag to true, thus informing
the core that it should not generate another notification.

Patches #1 to #2 are concerned with the above.

In the remaining patches, #3 to #7, add a selftest. This takes place across
several patches. Many of the helpers we would like to use for the test are
in forwarding/lib.sh, whereas net/ is a more suitable place for the test,
so the libraries need to be massaged a bit first.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: fdb_notify: Add a test for FDB notifications
Petr Machata [Thu, 14 Nov 2024 14:09:59 +0000 (15:09 +0100)]
selftests: net: fdb_notify: Add a test for FDB notifications

Check that only one notification is produced for various FDB edit
operations.

Regarding the ip_link_add() and ip_link_master() helpers. This pattern of
action plus corresponding defer is bound to come up often, and a dedicated
vocabulary to capture it will be handy. tunnel_create() and vlan_create()
from forwarding/lib.sh are somewhat opaque and perhaps too kitchen-sinky,
so I tried to go in the opposite direction with these ones, and wrapped
only the bare minimum to schedule a corresponding cleanup.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Link: https://patch.msgid.link/910c5880ae6d3b558d6889cbdba2be690c2615c6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: lib: Add kill_process
Petr Machata [Thu, 14 Nov 2024 14:09:58 +0000 (15:09 +0100)]
selftests: net: lib: Add kill_process

A number of selftests run processes in the background and need to kill them
afterwards. Instead for everyone to open-code the kill / wait / redirect
mantra, add a helper in net/lib.sh. Convert existing open-code sites.

Signed-off-by: Petr Machata <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Link: https://patch.msgid.link/a9db102067d741c118f0bd93b10c75e2a34665ea.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: lib: Move checks from forwarding/lib.sh here
Petr Machata [Thu, 14 Nov 2024 14:09:57 +0000 (15:09 +0100)]
selftests: net: lib: Move checks from forwarding/lib.sh here

For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et.al. Move them to net/lib.sh so
that every net test can use them.

Existing lib.sh users might be using these same names for their functions.
However lib.sh is always sourced near the top of the file (checked), and
whatever new definitions will simply override the ones provided by lib.sh.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Link: https://patch.msgid.link/f488a00dc85b8e0c1f3c71476b32b21b5189a847.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: lib: Move tests_run from forwarding/lib.sh here
Petr Machata [Thu, 14 Nov 2024 14:09:56 +0000 (15:09 +0100)]
selftests: net: lib: Move tests_run from forwarding/lib.sh here

It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move the logging helpers to net/lib.sh so that every net test can use them.

Existing lib.sh users might be using the name themselves. However lib.sh is
always sourced near the top of the file (checked), and whatever new
definition will simply override the one provided by lib.sh.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Link: https://patch.msgid.link/a6fc083486493425b2c61185c327845b6ce3233a.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: lib: Move logging from forwarding/lib.sh here
Petr Machata [Thu, 14 Nov 2024 14:09:55 +0000 (15:09 +0100)]
selftests: net: lib: Move logging from forwarding/lib.sh here

Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.

Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.

Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Link: https://patch.msgid.link/edd3785a3bd72ffbe1409300989e993ee50ae98b.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agondo_fdb_del: Add a parameter to report whether notification was sent
Petr Machata [Thu, 14 Nov 2024 14:09:54 +0000 (15:09 +0100)]
ndo_fdb_del: Add a parameter to report whether notification was sent

In a similar fashion to ndo_fdb_add, which was covered in the previous
patch, add the bool *notified argument to ndo_fdb_del. Callees that send a
notification on their own set the flag to true.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agondo_fdb_add: Add a parameter to report whether notification was sent
Petr Machata [Thu, 14 Nov 2024 14:09:53 +0000 (15:09 +0100)]
ndo_fdb_add: Add a parameter to report whether notification was sent

Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own:

 # ip link add name vx type vxlan id 1000 dstport 4789
 # bridge monitor fdb &
 [1] 1981693
 # bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
 de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
 de:ad:be:ef:13:37 dev vx self permanent

In order to prevent this duplicity, add a paremeter to ndo_fdb_add,
bool *notified. The flag is primed to false, and if the callee sends a
notification on its own, it sets it to true, thus informing the core that
it should not generate another notification.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'modifying-format-and-renaming-goto-labels'
Jakub Kicinski [Sat, 16 Nov 2024 00:26:56 +0000 (16:26 -0800)]
Merge branch 'modifying-format-and-renaming-goto-labels'

Justin Lai says:

====================
Modifying format and renaming goto labels

This patch set primarily involves modifying the enum rtase_registers
format and renaming the goto labels in rtase_init_one.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agortase: Modify the content format of the enum rtase_registers
Justin Lai [Thu, 14 Nov 2024 11:25:49 +0000 (19:25 +0800)]
rtase: Modify the content format of the enum rtase_registers

Remove unnecessary spaces.

Signed-off-by: Justin Lai <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agortase: Modify the name of the goto label
Justin Lai [Thu, 14 Nov 2024 11:25:48 +0000 (19:25 +0800)]
rtase: Modify the name of the goto label

Modify the name of the goto label in rtase_init_one().

Signed-off-by: Justin Lai <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'net-netpoll-improve-skb-pool-management'
Jakub Kicinski [Sat, 16 Nov 2024 00:25:39 +0000 (16:25 -0800)]
Merge branch 'net-netpoll-improve-skb-pool-management'

Breno Leitao says:

====================
net: netpoll: Improve SKB pool management

The netpoll subsystem pre-allocates 32 SKBs in a pool for emergency use
during out-of-memory conditions. However, the current implementation has
several inefficiencies:

 * The SKB pool, once allocated, is never freed:
 * Resources remain allocated even after netpoll users are removed
 * Failed initialization can leave pool populated forever
 * The global pool design makes resource tracking difficult

This series addresses these issues through three patches:

Patch 1 ("net: netpoll: Individualize the skb pool"):
 - Replace global pool with per-user pools in netpoll struct

Patch 2 ("net: netpoll: flush skb pool during cleanup"):
- Properly free pool resources during netconsole cleanup

These changes improve resource management and make the code more
maintainable.  As a side benefit, the improved structure would allow
netpoll to be modularized if desired in the future.

v2: https://lore.kernel.org/20241107-skb_buffers_v2-v2-0-288c6264ba4f@debian.org
v1: https://lore.kernel.org/20241025142025.3558051[email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: netpoll: flush skb pool during cleanup
Breno Leitao [Thu, 14 Nov 2024 11:00:12 +0000 (03:00 -0800)]
net: netpoll: flush skb pool during cleanup

The netpoll subsystem maintains a pool of 32 pre-allocated SKBs per
instance, but these SKBs are not freed when the netpoll user is brought
down. This leads to memory waste as these buffers remain allocated but
unused.

Add skb_pool_flush() to properly clean up these SKBs when netconsole is
terminated, improving memory efficiency.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: netpoll: Individualize the skb pool
Breno Leitao [Thu, 14 Nov 2024 11:00:11 +0000 (03:00 -0800)]
net: netpoll: Individualize the skb pool

The current implementation of the netpoll system uses a global skb
pool, which can lead to inefficient memory usage and
waste when targets are disabled or no longer in use.

This can result in a significant amount of memory being unnecessarily
allocated and retained, potentially causing performance issues and
limiting the availability of resources for other system components.

Modify the netpoll system to assign a skb pool to each target instead of
using a global one.

This approach allows for more fine-grained control over memory
allocation and deallocation, ensuring that resources are only allocated
and retained as needed.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoocteontx2-pf: Fix spelling mistake "reprentator" -> "representor"
Colin Ian King [Thu, 14 Nov 2024 10:20:12 +0000 (10:20 +0000)]
octeontx2-pf: Fix spelling mistake "reprentator" -> "representor"

There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message.
Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/netlink: Correct the comment on netlink message max cap
Dmitry Safonov [Wed, 13 Nov 2024 18:46:44 +0000 (18:46 +0000)]
net/netlink: Correct the comment on netlink message max cap

Since commit d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") the cap is 32KiB.

Signed-off-by: Dmitry Safonov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'enic-use-all-the-resources-configured-on-vic'
Jakub Kicinski [Fri, 15 Nov 2024 23:38:48 +0000 (15:38 -0800)]
Merge branch 'enic-use-all-the-resources-configured-on-vic'

Nelson Escobar says:

====================
enic: Use all the resources configured on VIC

Allow users to configure and use more than 8 rx queues and 8 tx queues
on the Cisco VIC.

This series changes the maximum number of tx and rx queues supported
from 8 to the hardware limit of 256, and allocates memory based on the
number of resources configured on the VIC.

v3: https://lore.kernel.org/20241108-remove_vic_resource_limits-v3-0-3ba8123bcffc@cisco.com
v2: https://lore.kernel.org/20241024-remove_vic_resource_limits-v2-0-039b8cae5fdd@cisco.com
v1: https://lore.kernel.org/20241022041707[email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Move kdump check into enic_adjust_resources()
Nelson Escobar [Wed, 13 Nov 2024 23:56:39 +0000 (23:56 +0000)]
enic: Move kdump check into enic_adjust_resources()

Move the kdump check into enic_adjust_resources() so that everything
that modifies resources is in the same function.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Move enic resource adjustments to separate function
Nelson Escobar [Wed, 13 Nov 2024 23:56:38 +0000 (23:56 +0000)]
enic: Move enic resource adjustments to separate function

Move the enic resource adjustments out of enic_set_intr_mode() and into
its own function, enic_adjust_resources().

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Adjust used MSI-X wq/rq/cq/interrupt resources in a more robust way
Nelson Escobar [Wed, 13 Nov 2024 23:56:37 +0000 (23:56 +0000)]
enic: Adjust used MSI-X wq/rq/cq/interrupt resources in a more robust way

Instead of failing to use MSI-X if resources aren't configured exactly
right, use the resources we do have.  Since we could start using large
numbers of rq resources, we do limit the rq count to what
netif_get_num_default_rss_queues() recommends.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Allocate arrays in enic struct based on VIC config
Nelson Escobar [Wed, 13 Nov 2024 23:56:36 +0000 (23:56 +0000)]
enic: Allocate arrays in enic struct based on VIC config

Allocate wq, rq, cq, intr, and napi arrays based on the number of
resources configured in the VIC.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Save resource counts we read from HW
Nelson Escobar [Wed, 13 Nov 2024 23:56:35 +0000 (23:56 +0000)]
enic: Save resource counts we read from HW

Save the resources counts for wq,rq,cq, and interrupts in *_avail variables
so that we don't lose the information when adjusting the counts we are
actually using.

Report the wq_avail and rq_avail as the channel maximums in 'ethtool -l'
output.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Make MSI-X I/O interrupts come after the other required ones
Nelson Escobar [Wed, 13 Nov 2024 23:56:34 +0000 (23:56 +0000)]
enic: Make MSI-X I/O interrupts come after the other required ones

The VIC hardware has a constraint that the MSIX interrupt used for errors
be specified as a 7 bit number.  Before this patch, it was allocated after
the I/O interrupts, which would cause a problem if 128 or more I/O
interrupts are in use.

So make the required interrupts come before the I/O interrupts to
guarantee the error interrupt offset never exceeds 7 bits.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoenic: Create enic_wq/rq structures to bundle per wq/rq data
Nelson Escobar [Wed, 13 Nov 2024 23:56:33 +0000 (23:56 +0000)]
enic: Create enic_wq/rq structures to bundle per wq/rq data

Bundling the wq/rq specific data into dedicated enic_wq/rq structures
cleans up the enic structure and simplifies future changes related to
wq/rq.

Co-developed-by: John Daley <[email protected]>
Signed-off-by: John Daley <[email protected]>
Co-developed-by: Satish Kharat <[email protected]>
Signed-off-by: Satish Kharat <[email protected]>
Signed-off-by: Nelson Escobar <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: phy: microchip_t1: Clause-45 PHY loopback support for LAN887x
Tarun Alle [Thu, 14 Nov 2024 10:19:51 +0000 (15:49 +0530)]
net: phy: microchip_t1: Clause-45 PHY loopback support for LAN887x

Adds support for clause-45 PHY loopback for the Microchip LAN887x driver.

Signed-off-by: Tarun Alle <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agodt-bindings: net: dsa: microchip,ksz: Drop undocumented "id"
Rob Herring (Arm) [Wed, 13 Nov 2024 22:56:43 +0000 (16:56 -0600)]
dt-bindings: net: dsa: microchip,ksz: Drop undocumented "id"

"id" is not a documented property, so drop it.

Signed-off-by: Rob Herring (Arm) <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agobnxt_en: optimize gettimex64
Vadim Fedorenko [Thu, 14 Nov 2024 11:48:20 +0000 (03:48 -0800)]
bnxt_en: optimize gettimex64

Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.

Signed-off-by: Vadim Fedorenko <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonetdev-genl: Hold rcu_read_lock in napi_set
Joe Damato [Thu, 14 Nov 2024 17:55:59 +0000 (17:55 +0000)]
netdev-genl: Hold rcu_read_lock in napi_set

Hold rcu_read_lock during netdev_nl_napi_set_doit, which calls
napi_by_id and requires rcu_read_lock to be held.

Closes: https://lore.kernel.org/netdev/[email protected]/
Fixes: 1287c1ae0fc2 ("netdev-genl: Support setting per-NAPI config values")
Signed-off-by: Joe Damato <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 15 Nov 2024 22:16:28 +0000 (14:16 -0800)]
Merge tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - btusb: add Foxconn 0xe0fc for Qualcomm WCN785x
 - btmtk: Fix ISO interface handling
 - Add quirk for ATS2851
 - btusb: Add RTL8852BE device 0489:e123
 - ISO: Do not emit LE PA/BIG Create Sync if previous is pending
 - btusb: Add USB HW IDs for MT7920/MT7925
 - btintel_pcie: Add handshake between driver and firmware
 - btintel_pcie: Add recovery mechanism
 - hci_conn: Use disable_delayed_work_sync
 - SCO: Use kref to track lifetime of sco_conn
 - ISO: Use kref to track lifetime of iso_conn
 - btnxpuart: Add GPIO support to power save feature
 - btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x

* tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits)
  Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC
  Bluetooth: fix use-after-free in device_for_each_child()
  Bluetooth: btintel: Direct exception event to bluetooth stack
  Bluetooth: hci_core: Fix calling mgmt_device_connected
  Bluetooth: hci_bcm: Use the devm_clk_get_optional() helper
  Bluetooth: ISO: Send BIG Create Sync via hci_sync
  Bluetooth: hci_conn: Remove alloc from critical section
  Bluetooth: ISO: Use kref to track lifetime of iso_conn
  Bluetooth: SCO: Use kref to track lifetime of sco_conn
  Bluetooth: HCI: Add IPC(11) bus type
  Bluetooth: btusb: Add 3 HWIDs for MT7925
  Bluetooth: btusb: Add new VID/PID 0489/e124 for MT7925
  Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slave
  Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending
  Bluetooth: ISO: Fix matching parent socket for BIS slave
  Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending
  Bluetooth: btrtl: Decrease HCI_OP_RESET timeout from 10 s to 2 s
  Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name()
  Bluetooth: btusb: Add new VID/PID 0489/e111 for MT7925
  Bluetooth: btmtk: adjust the position to init iso data anchor
  ...
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
Jakub Kicinski [Fri, 15 Nov 2024 22:09:20 +0000 (14:09 -0800)]
Merge tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Extended netlink error reporting if nfnetlink attribute parser fails,
   from Donald Hunter.

2) Incorrect request_module() module, from Simon Horman.

3) A series of patches to reduce memory consumption for set element
   transactions.
   Florian Westphal says:

"When doing a flush on a set or mass adding/removing elements from a
set, each element needs to allocate 96 bytes to hold the transactional
state.

In such cases, virtually all the information in struct nft_trans_elem
is the same.

Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem
can hold multiple set element pointers.

The number of elements that can be stored in one nft_trans_elem is limited
by the slab allocator, this series limits the compaction to at most 62
elements as it caps the reallocation to 2048 bytes of memory."

4) A series of patches to prepare the transition to dscp_t in .flowi_tos.
   From Guillaume Nault.

5) Support for bitwise operations with two source registers,
   from Jeremy Sowden.

* tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: bitwise: add support for doing AND, OR and XOR directly
  netfilter: bitwise: rename some boolean operation functions
  netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.
  netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.
  netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.
  netfilter: flow_offload: Convert nft_flow_route() to dscp_t.
  netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.
  netfilter: nf_tables: allocate element update information dynamically
  netfilter: nf_tables: switch trans_elem to real flex array
  netfilter: nf_tables: prepare nft audit for set element compaction
  netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure
  netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
  netfilter: bpf: Pass string literal as format argument of request_module()
  netfilter: nfnetlink: Report extack policy errors for batched ops
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonetfilter: bitwise: add support for doing AND, OR and XOR directly
Jeremy Sowden [Thu, 14 Nov 2024 21:08:13 +0000 (22:08 +0100)]
netfilter: bitwise: add support for doing AND, OR and XOR directly

Hitherto, these operations have been converted in user space to
mask-and-xor operations on one register and two immediate values, and it
is the latter which have been evaluated by the kernel.  We add support
for evaluating these operations directly in kernel space on one register
and either an immediate value or a second register.

Pablo made a few changes to the original patch:

- EINVAL if NFTA_BITWISE_SREG2 is used with fast version.
- Allow _AND,_OR,_XOR with _DATA != sizeof(u32)
- Dump _SREG2 or _DATA with _AND,_OR,_XOR

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: bitwise: rename some boolean operation functions
Jeremy Sowden [Thu, 14 Nov 2024 21:07:51 +0000 (22:07 +0100)]
netfilter: bitwise: rename some boolean operation functions

In the next patch we add support for doing AND, OR and XOR operations
directly in the kernel, so rename some functions and an enum constant
related to mask-and-xor boolean operations.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.
Guillaume Nault [Thu, 14 Nov 2024 16:03:52 +0000 (17:03 +0100)]
netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.

Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.
Guillaume Nault [Thu, 14 Nov 2024 16:03:45 +0000 (17:03 +0100)]
netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.

Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: rpfilter: Convert rpfilter_mt() to dscp_t.
Guillaume Nault [Thu, 14 Nov 2024 16:03:38 +0000 (17:03 +0100)]
netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.

Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: flow_offload: Convert nft_flow_route() to dscp_t.
Guillaume Nault [Thu, 14 Nov 2024 16:03:31 +0000 (17:03 +0100)]
netfilter: flow_offload: Convert nft_flow_route() to dscp_t.

Use ip4h_dscp()instead of reading ip_hdr()->tos directly.

ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.

Also, remove the comment about the net/ip.h include file, since it's
now required for the ip4h_dscp() helper too.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonetfilter: ipv4: Convert ip_route_me_harder() to dscp_t.
Guillaume Nault [Thu, 14 Nov 2024 16:03:21 +0000 (17:03 +0100)]
netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.

Use ip4h_dscp()instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agoxfrm: Fix acquire state insertion.
Steffen Klassert [Thu, 14 Nov 2024 11:06:56 +0000 (12:06 +0100)]
xfrm: Fix acquire state insertion.

A recent commit jumped over the dst hash computation and
left the symbol uninitialized. Fix this by explicitly
computing the dst hash before it is used.

Fixes: 0045e3d80613 ("xfrm: Cache used outbound xfrm states at the policy.")
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
This page took 0.125771 seconds and 4 git commands to generate.