Git Repo - linux.git/log

bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG

When BPF_TRAMP_F_CALL_ORIG is set, the trampoline calls
__bpf_tramp_enter() and __bpf_tramp_exit() functions, passing them
the struct bpf_tramp_image *im pointer as an argument in R0.

The trampoline generation code uses emit_addr_mov_i64() to emit
instructions for moving the bpf_tramp_image address into R0, but
emit_addr_mov_i64() assumes the address to be in the vmalloc() space
and uses only 48 bits. Because bpf_tramp_image is allocated using
kzalloc(), its address can use more than 48-bits, in this case the
trampoline will pass an invalid address to __bpf_tramp_enter/exit()
causing a kernel crash.

Fix this by using emit_a64_mov_i64() in place of emit_addr_mov_i64()
as it can work with addresses that are greater than 48-bits.

Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Signed-off-by: Puranjay Mohan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Closes: https://lore.kernel.org/all/SJ0PR15MB461564D3F7E7A763498CA6A8CBDB2@SJ0PR15MB4615.namprd15.prod.outlook.com/
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'BPF selftests misc fixes'

Geliang Tang says:

====================
v2:
- only check the first "link" (link_nl) in test_mixed_links().
- Drop patch 2 in v1.

This patchset fixes a segfault and a bpf object leak in test_progs.

It is a resend patch 1 out of "skip ENOTSUPP BPF selftests" set as Eduard
suggested. Together with another fix for xdp_adjust_tail.
====================

Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Close obj in error path in xdp_adjust_tail

If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj"
opened before this should be closed. So use "goto out" to close it instead
of using "return" here.

Fixes: 110221081aac ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags")
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Null checks for links in bpf_tcp_ca

Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch
platform, some "Segmentation fault" errors occur:

'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1    bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2    bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4    bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6    bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
actual -1 == expected -1
(network_helpers.c:99: errno: No such file or directory) \
Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
actual 0 <= expected 0
#29/9    bpf_tcp_ca/update_ca:FAIL
#29      bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''

This is because BPF trampoline is not implemented on Loongarch yet,
"link" returned by bpf_map__attach_struct_ops() is NULL. test_progs
crashs when this NULL link passes to bpf_link__update_map(). This
patch adds NULL checks for all links in bpf_tcp_ca to fix these errors.
If "link" is NULL, goto the newly added label "out" to destroy the skel.

v2:
- use "goto out" instead of "return" as Eduard suggested.

Fixes: 06da9f3bd641 ("selftests/bpf: Test switching TCP Congestion Control algorithms.")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

Merge branch 'use network helpers, part 8'

Geliang Tang says:

====================
v11:
- new patches 2, 4, 6.
- drop expect_errno from network_helper_opts as Eduard and Martin
suggested.
- drop sockmap_ktls patches from this set.

v10:
- a new patch 10 is added.
- patches 1-6, 8-9 unchanged, only commit logs updated.
- "err = -errno" is used in patches 7, 11, 12 to get the real error
number before checking value of "err".

v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.

v8:
- only patch 8 updated, to fix errors reported by CI.

v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.

v6:
- update patch 6 as Daniel suggested. (thanks)

v5:
- keep make_server and make_client as Eduard suggested.

v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.

v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.

v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.

This patch set uses network helpers in sk_lookup, and drop the local
helpers inetaddr_len() and make_socket().
====================

Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Use connect_fd_to_fd in sk_lookup

This patch uses public helper connect_fd_to_fd() exported in
network_helpers.h instead of using getsockname() + connect() in
run_lookup_prog() in prog_tests/sk_lookup.c. This can simplify
the code.

Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/7077c277cde5a1864cdc244727162fb75c8bb9c5.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Use start_server_addr in sk_lookup

This patch uses public helper start_server_addr() in udp_recv_send()
in prog_tests/sk_lookup.c to simplify the code.

And use ASSERT_OK_FD() to check fd returned by start_server_addr().

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/f11cabfef4a2170ecb66a1e8e2e72116d8f621b3.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Use start_server_str in sk_lookup

This patch uses public helper start_server_str() to simplify make_server()
in prog_tests/sk_lookup.c.

Add a callback setsockopts() to do all sockopts, set it to post_socket_cb
pointer of struct network_helper_opts. And add a new struct cb_opts to save
the data needed to pass to the callback. Then pass this network_helper_opts
to start_server_str().

Also use ASSERT_OK_FD() to check fd returned by start_server_str().

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/5981539f5591d2c4998c962ef2bf45f34c940548.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Close fd in error path in drop_on_reuseport

In the error path when update_lookup_map() fails in drop_on_reuseport in
prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed.
This patch fixes this by using "goto close_srv1" lable instead of "detach"
to close "server1" in this case.

Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Add ASSERT_OK_FD macro

Add a new dedicated ASSERT macro ASSERT_OK_FD to test whether a socket
FD is valid or not. It can be used to replace macros ASSERT_GT(fd, 0, ""),
ASSERT_NEQ(fd, -1, "") or statements (fd < 0), (fd != -1).

Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/ded75be86ac630a3a5099739431854c1ec33f0ea.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Add backlog for network_helper_opts

Some callers expect __start_server() helper to pass their own "backlog"
value to listen() instead of the default of 1. So this patch adds struct
member "backlog" for network_helper_opts to allow callers to set "backlog"
value via start_server_str() helper.

listen(fd, 0 /* backlog */) can be used to enforce syncookie. Meaning
backlog 0 is a legit value.

Using 0 as a default and changing it to 1 here is fine. It makes the test
program easier to write for the common case. Enforcing syncookie mode by
using backlog 0 is a niche use case but it should at least have a way for
the caller to do that. Thus, -ve backlog value is used here for the
syncookie use case. Please see the comment in network_helpers.h for
the details.

Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/1660229659b66eaad07aa2126e9c9fe217eba0dd.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m

In many cases, kernel netfilter functionality is built as modules.
If CONFIG_NF_FLOW_TABLE=m in particular, progs/xdp_flowtable.c
(and hence selftests) will fail to compile, so add a ___local
version of "struct flow_ports".

Fixes: c77e572d3a8c ("selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc")
Signed-off-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: Remove tst_run from lwt_seg6local_prog_ops.

The syzbot reported that the lwt_seg6 related BPF ops can be invoked
via bpf_test_run() without without entering input_action_end_bpf()
first.

Martin KaFai Lau said that self test for BPF_PROG_TYPE_LWT_SEG6LOCAL
probably didn't work since it was introduced in commit 04d4b274e2a
("ipv6: sr: Add seg6local action End.BPF"). The reason is that the
per-CPU variable seg6_bpf_srh_states::srh is never assigned in the self
test case but each BPF function expects it.

Remove test_run for BPF_PROG_TYPE_LWT_SEG6LOCAL.

Suggested-by: Martin KaFai Lau <[email protected]>
Reported-by: [email protected]
Fixes: d1542d4ae4df ("seg6: Use nested-BH locking for seg6_bpf_srh_states.")
Fixes: 004d4b274e2a ("ipv6: sr: Add seg6local action End.BPF")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>

bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU

Currently, BPF kfuncs which accept trusted pointer arguments
i.e. those flagged as KF_TRUSTED_ARGS, KF_RCU, or KF_RELEASE, all
require an original/unmodified trusted pointer argument to be supplied
to them. By original/unmodified, it means that the backing register
holding the trusted pointer argument that is to be supplied to the BPF
kfunc must have its fixed offset set to zero, or else the BPF verifier
will outright reject the BPF program load. However, this zero fixed
offset constraint that is currently enforced by the BPF verifier onto
BPF kfuncs specifically flagged to accept KF_TRUSTED_ARGS or KF_RCU
trusted pointer arguments is rather unnecessary, and can limit their
usability in practice. Specifically, it completely eliminates the
possibility of constructing a derived trusted pointer from an original
trusted pointer. To put it simply, a derived pointer is a pointer
which points to one of the nested member fields of the object being
pointed to by the original trusted pointer.

This patch relaxes the zero fixed offset constraint that is enforced
upon BPF kfuncs which specifically accept KF_TRUSTED_ARGS, or KF_RCU
arguments. Although, the zero fixed offset constraint technically also
applies to BPF kfuncs accepting KF_RELEASE arguments, relaxing this
constraint for such BPF kfuncs has subtle and unwanted
side-effects. This was discovered by experimenting a little further
with an initial version of this patch series [0]. The primary issue
with relaxing the zero fixed offset constraint on BPF kfuncs accepting
KF_RELEASE arguments is that it'd would open up the opportunity for
BPF programs to supply both trusted pointers and derived trusted
pointers to them. For KF_RELEASE BPF kfuncs specifically, this could
be problematic as resources associated with the backing pointer could
be released by the backing BPF kfunc and cause instabilities for the
rest of the kernel.

With this new fixed offset semantic in-place for BPF kfuncs accepting
KF_TRUSTED_ARGS and KF_RCU arguments, we now have more flexibility
when it comes to the BPF kfuncs that we're able to introduce moving
forward.

Early discussions covering the possibility of relaxing the zero fixed
offset constraint can be found using the link below. This will provide
more context on where all this has stemmed from [1].

Notably, pre-existing tests have been updated such that they provide
coverage for the updated zero fixed offset
functionality. Specifically, the nested offset test was converted from
a negative to positive test as it was already designed to assert zero
fixed offset semantics of a KF_TRUSTED_ARGS BPF kfunc.

[0] https://lore.kernel.org/bpf/[email protected]/
[1] https://lore.kernel.org/bpf/[email protected]/

Signed-off-by: Matt Bobrowski <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge branch 'fix-libbpf-bpf-skeleton-forward-backward-compat'

Andrii Nakryiko says:

====================
Fix libbpf BPF skeleton forward/backward compat

Fix recently identified (but long standing) bug with handling BPF skeleton
forward and backward compatibility. On libbpf side, even though BPF skeleton
was always designed to be forward and backwards compatible through recording
actual size of constrituents of BPF skeleton itself (map/prog/var skeleton
definitions), libbpf implementation did implicitly hard-code those sizes by
virtue of using a trivial array access syntax.

This issue will only affect libbpf used as a shared library. Statically
compiled libbpfs will always be in sync with BPF skeleton, bypassing this
problem altogether.

This patch set fixes libbpf, but also mitigates the problem for old libbpf
versions by teaching bpftool to generate more conservative BPF skeleton,
if possible (i.e., if there are no struct_ops maps defined).

v1->v2:
- fix SOB, add acks, typo fixes (Quentin, Eduard);
- improve reporting of skipped map auto-attachment (Alan, Eduard).
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

libbpf: improve old BPF skeleton handling for map auto-attach

Improve how we handle old BPF skeletons when it comes to BPF map
auto-attachment. Emit one warn-level message per each struct_ops map
that could have been auto-attached, if user provided recent enough BPF
skeleton version. Don't spam log if there are no relevant struct_ops
maps, though.

This should help users realize that they probably need to regenerate BPF
skeleton header with more recent bpftool/libbpf-cargo (or whatever other
means of BPF skeleton generation).

Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

libbpf: fix BPF skeleton forward/backward compat handling

BPF skeleton was designed from day one to be extensible. Generated BPF
skeleton code specifies actual sizes of map/prog/variable skeletons for
that reason and libbpf is supposed to work with newer/older versions
correctly.

Unfortunately, it was missed that we implicitly embed hard-coded most
up-to-date (according to libbpf's version of libbpf.h header used to
compile BPF skeleton header) sizes of those structs, which can differ
from the actual sizes at runtime when libbpf is used as a shared
library.

We have a few places were we just index array of maps/progs/vars, which
implicitly uses these potentially invalid sizes of structs.

This patch aims to fix this problem going forward. Once this lands,
we'll backport these changes in Github repo to create patched releases
for older libbpfs.

Acked-by: Eduard Zingerman <[email protected]>
Reviewed-by: Alan Maguire <[email protected]>
Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support")
Fixes: 430025e5dca5 ("libbpf: Add subskeleton scaffolding")
Fixes: 08ac454e258e ("libbpf: Auto-attach struct_ops BPF maps in BPF skeleton")
Co-developed-by: Mykyta Yatsenko <[email protected]>
Signed-off-by: Mykyta Yatsenko <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpftool: improve skeleton backwards compat with old buggy libbpfs

Old versions of libbpf don't handle varying sizes of bpf_map_skeleton
struct correctly. As such, BPF skeleton generated by newest bpftool
might not be compatible with older libbpf (though only when libbpf is
used as a shared library), even though it, by design, should.

Going forward libbpf will be fixed, plus we'll release bug fixed
versions of relevant old libbpfs, but meanwhile try to mitigate from
bpftool side by conservatively assuming older and smaller definition of
bpf_map_skeleton, if possible. Meaning, if there are no struct_ops maps.

If there are struct_ops, then presumably user would like to have
auto-attaching logic and struct_ops map link placeholders, so use the
full bpf_map_skeleton definition in that case.

Acked-by: Quentin Monnet <[email protected]>
Co-developed-by: Mykyta Yatsenko <[email protected]>
Signed-off-by: Mykyta Yatsenko <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge branch 'selftests-drv-net-rss_ctx-more-tests'

Jakub Kicinski says:

====================
selftests: drv-net: rss_ctx: more tests

Add a few more tests for RSS.

v1: https://lore.kernel.org/all/20240705015725 [email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: drv-net: rss_ctx: test flow rehashing without impacting traffic

Some workloads may want to rehash the flows in response to an imbalance.
Most effective way to do that is changing the RSS key. Check that changing
the key does not cause link flaps or traffic disruption.

Disrupting traffic for key update is not incorrect, but makes the key
update unusable for rehashing under load.

Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: drv-net: rss_ctx: check behavior of indirection table resizing

Some devices dynamically increase and decrease the size of the RSS
indirection table based on the number of enabled queues.
When that happens driver must maintain the balance of entries
(preferably duplicating the smaller table).

Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: drv-net: rss_ctx: test queue changes vs user RSS config

By default main RSS table should change to include all queues.
When user sets a specific RSS config the driver should preserve it,
even when queue count changes. Driver should refuse to deactivate
queues used in the user-set RSS config.

For additional contexts driver should still refuse to deactivate
queues in use. Whether the contexts should get resized like
context 0 when queue count increases is a bit unclear. I anticipate
most drivers today don't do that. Since main use case for additional
contexts is to set the indir table - it doesn't seem worthwhile to
care about behavior of the default table too much. Don't test that.

Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: drv-net: rss_ctx: factor out send traffic and check

Wrap up sending traffic and checking in which queues it landed
in a helper.

The method used for testing is to send a lot of iperf traffic
and check which queues received the most packets. Those should
be the queues where we expect iperf to land - either because we
installed a filter for the port iperf uses, or we didn't and
expect it to use context 0.

Contexts get disjoint queue sets, but the main context (AKA context 0)
may receive some background traffic (noise).

Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: drv-net: rss_ctx: fix cleanup in the basic test

The basic test may fail without resetting the RSS indir table.
Use the .exec() method to run cleanup early since we re-test
with traffic that returning to default state works.
While at it reformat the doc a tiny bit.

Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ti: icssg-prueth: add missing deps

Add missing dependency on NET_SWITCHDEV.

Fixes: abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Guillaume La Roque <[email protected]>
Reviewed-by: MD Danish Anwar <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: fsl,fman: add ptimer-handle property

Add ptimer-handle property to link to ptp-timer node handle.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: fman@1a00000: 'ptimer-handle' do not match any of the regexes: '^ethernet@[a-f0-9]+$', '^mdio@[a-f0-9]+$', '^muram@[a-f0-9]+$', '^phc@[a-f0-9]+$', '^port@[a-f0-9]+$', 'pinctrl-[0-9]+'

Reviewed-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: fsl,fman: allow dma-coherent property

Add dma-coherent property to fix below warning.
arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dtb: fman@1a00000: 'dma-coherent', 'ptimer-handle' do not match any of the regexes: '^ethernet@[a-f0-9]+$', '^mdio@[a-f0-9]+$', '^muram@[a-f0-9]+$', '^phc@[a-f0-9]+$', '^port@[a-f0-9]+$', 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/net/fsl,fman.yaml#

Reviewed-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: tls: Pass union tls_crypto_context pointer to memzero_explicit

Pass union tls_crypto_context pointer, rather than struct
tls_crypto_info pointer, to memzero_explicit().

The address of the pointer is the same before and after.
But the new construct means that the size of the dereferenced pointer type
matches the size being zeroed. Which aids static analysis.

As reported by Smatch:

.../tls_main.c:842 do_tls_setsockopt_conf() error: memzero_explicit() 'crypto_info' too small (4 vs 56)

No functional change intended.
Compile tested only.

Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: forwarding: Make vxlan-bridge-1d pass on debug kernels

The ageing time used by the test is too short for debug kernels and
results in entries being aged out prematurely [1].

Fix by increasing the ageing time.

The same change was done for the VLAN-aware version of the test in
commit dfbab74044be ("selftests: forwarding: Make vxlan-bridge-1q pass
on debug kernels").

[1]
# ./vxlan_bridge_1d.sh
[...]
# TEST: VXLAN: flood before learning                              [ OK ]
# TEST: VXLAN: show learned FDB entry                             [ OK ]
# TEST: VXLAN: learned FDB entry                                  [FAIL]
# veth3: Expected to capture 0 packets, got 4.
# RTNETLINK answers: No such file or directory
# TEST: VXLAN: deletion of learned FDB entry                      [ OK ]
# TEST: VXLAN: Ageing of learned FDB entry                        [FAIL]
# veth3: Expected to capture 0 packets, got 2.
[...]

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   as well as dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal & Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: phy: microchip: lan937x: add support for 100BaseTX PHY

Add support of 100BaseTX PHY build in to LAN9371 and LAN9372 switches.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

udp: Remove duplicate included header file trace/events/udp.h

Remove duplicate included header file trace/events/udp.h and the
following warning reported by make includecheck:

trace/events/udp.h is included more than once

Compile-tested only.

Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: tn40xx: add per queue netdev-genl stats support

Add support for the netdev-genl per queue stats API.

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope":"queue"}'
[{'ifindex': 4,
  'queue-id': 0,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 266613,
  'rx-packets': 3325},
{'ifindex': 4,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 142823367,
  'tx-packets': 2387}]

Signed-off-by: FUJITA Tomonori <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

sctp: Fix typos and improve comments

Fix typos s/steam/stream/ and spell out Schedule/Unschedule in the
comments.

Compile-tested only.

Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

l2tp: fix possible UAF when cleaning up tunnels

syzbot reported a UAF caused by a race when the L2TP work queue closes a
tunnel at the same time as a userspace thread closes a session in that
tunnel.

Tunnel cleanup is handled by a work queue which iterates through the
sessions contained within a tunnel, and closes them in turn.

Meanwhile, a userspace thread may arbitrarily close a session via
either netlink command or by closing the pppox socket in the case of
l2tp_ppp.

The race condition may occur when l2tp_tunnel_closeall walks the list
of sessions in the tunnel and deletes each one.  Currently this is
implemented using list_for_each_safe, but because the list spinlock is
dropped in the loop body it's possible for other threads to manipulate
the list during list_for_each_safe's list walk.  This can lead to the
list iterator being corrupted, leading to list_for_each_safe spinning.
One sequence of events which may lead to this is as follows:

* A tunnel is created, containing two sessions A and B.
* A thread closes the tunnel, triggering tunnel cleanup via the work
   queue.
* l2tp_tunnel_closeall runs in the context of the work queue.  It
   removes session A from the tunnel session list, then drops the list
   lock.  At this point the list_for_each_safe temporary variable is
   pointing to the other session on the list, which is session B, and
   the list can be manipulated by other threads since the list lock has
   been released.
* Userspace closes session B, which removes the session from its parent
   tunnel via l2tp_session_delete.  Since l2tp_tunnel_closeall has
   released the tunnel list lock, l2tp_session_delete is able to call
   list_del_init on the session B list node.
* Back on the work queue, l2tp_tunnel_closeall resumes execution and
   will now spin forever on the same list entry until the underlying
   session structure is freed, at which point UAF occurs.

The solution is to iterate over the tunnel's session list using
list_first_entry_not_null to avoid the possibility of the list
iterator pointing at a list item which may be removed during the walk.

Also, have l2tp_tunnel_closeall ref each session while it processes it
to prevent another thread from freeing it.

cpu1 cpu2
--- ---
pppol2tp_release()

spin_lock_bh(&tunnel->list_lock);
for (;;) {
session = list_first_entry_or_null(&tunnel->session_list,
   struct l2tp_session, list);
if (!session)
break;
list_del_init(&session->list);
spin_unlock_bh(&tunnel->list_lock);

l2tp_session_delete(session);

l2tp_session_delete(session);
spin_lock_bh(&tunnel->list_lock);
}
spin_unlock_bh(&tunnel->list_lock);

Calling l2tp_session_delete on the same session twice isn't a problem
per-se, but if cpu2 manages to destruct the socket and unref the
session to zero before cpu1 progresses then it would lead to UAF.

Reported-by: [email protected]
Reported-by: [email protected]
Fixes: d18d3f0a24fc ("l2tp: replace hlist with simple list for per-tunnel session list")
Signed-off-by: James Chapman <[email protected]>
Signed-off-by: Tom Parkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'net-stmmac-qcom-ethqos-enable-2-5g-ethernet-on-sa8775p-ride'

Bartosz Golaszewski says:

====================
net: stmmac: qcom-ethqos: enable 2.5G ethernet on sa8775p-ride

Here are the changes required to enable 2.5G ethernet on sa8775p-ride.
As advised by Andrew Lunn and Russell King, I am reusing the existing
stmmac infrastructure to enable the SGMII loopback and so I dropped the
patches adding new callbacks to the driver core. I also added more
details to the commit message and made sure the workaround is only
enabled on Rev 3 of the board (with AQR115C PHY). Also: dropped any
mentions of the OCSGMII mode.

v2: https://lore.kernel.org/20240627113948 [email protected]
v1: https://lore.kernel.org/20240619184550 [email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: qcom-ethqos: enable SGMII loopback during DMA reset on sa8775p-ride-r3

On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at
the time of the DMA reset. We can however extract the RX clock from the
internal SERDES block. Once the link is up, we can revert to the
previous state.

The AQR115C PHY doesn't support in-band signalling so we can count on
getting the link up notification and safely reuse existing callbacks
which are already used by another HW quirk workaround which enables the
functional clock to avoid a DMA reset due to timeout.

Only enable loopback on revision 3 of the board - check the phy_mode to
make sure.

Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: qcom-ethqos: add support for 2.5G BASEX mode

Add support for 2.5G speed in 2500BASEX mode to the QCom ethqos driver.

Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: page_pool: fix warning code

WARN_ON_ONCE("string") doesn't really do what appears to
be intended, so fix that.

Signed-off-by: Johannes Berg <[email protected]>
Fixes: 90de47f020db ("page_pool: fragment API support for 32-bit arch with 64-bit DMA")
Link: https://patch.msgid.link/20240705134221.2f4de205caa1.I28496dc0f2ced580282d1fb892048017c4491e21@changeid
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: net: ksft: interrupt cleanly on KeyboardInterrupt

It's very useful to be able to interrupt the tests during development.
Detect KeyboardInterrupt, run the cleanups and exit.

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep

fexit_sleep test runs successfully now on the BPF CI so remove it
from the deny list. ftrace direct calls was blocking tracing programs
on arm64 but it has been resolved by now. For more details see also
discussion in [*].

Signed-off-by: Puranjay Mohan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'small-api-fix-for-bpf_wq'

Benjamin Tissoires says:

====================
Small API fix for bpf_wq

I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.

Which means that the users need to double cast the provided "value" as
this is not a struct bpf_wq *.

This is a change of API, but there doesn't seem to be much users of bpf_wq
right now, so we should be able to go with this right now.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
Changes in v2:
- amended the selftests to retrieve something from the third argument of
the callback
- Link to v1: https://lore.kernel.org/r/20240705-fix-wq-v1-0-91b4d82cd825@kernel.org

---
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature

See the previous patch: the API was wrong, we were provided the pointer
to the value, not the actual struct bpf_wq *.

Signed-off-by: Benjamin Tissoires <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: helpers: fix bpf_wq_set_callback_impl signature

I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.

Which means that the users need to double cast the provided "value" as
this is not a struct bpf_wq *.

This is a change of API, but there doesn't seem to be much users of bpf_wq
right now, so we should be able to go with this right now.

Fixes: 81f1d7a583fa ("bpf: wq: add bpf_wq_set_callback_impl")
Signed-off-by: Benjamin Tissoires <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

libbpf: Add NULL checks to bpf_object__{prev_map,next_map}

In the current state, an erroneous call to
bpf_object__find_map_by_name(NULL, ...) leads to a segmentation
fault through the following call chain:

  bpf_object__find_map_by_name(obj = NULL, ...)
  -> bpf_object__for_each_map(pos, obj = NULL)
  -> bpf_object__next_map((obj = NULL), NULL)
  -> return (obj = NULL)->maps

While calling bpf_object__find_map_by_name with obj = NULL is
obviously incorrect, this should not lead to a segmentation
fault but rather be handled gracefully.

As __bpf_map__iter already handles this situation correctly, we
can delegate the check for the regular case there and only add
a check in case the prev or next parameter is NULL.

Signed-off-by: Andreas Ziegler <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Remove exceptions tests from DENYLIST.s390x

Now that the s390x JIT supports exceptions, remove the respective tests
from the denylist.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

s390/bpf: Implement exceptions

Implement the following three pieces required from the JIT:

- A "top-level" BPF prog (exception_boundary) must save all
  non-volatile registers, and not only the ones that it clobbers.
- A "handler" BPF prog (exception_cb) must switch stack to that of
  exception_boundary, and restore the registers that exception_boundary
  saved.
- arch_bpf_stack_walk() must unwind the stack and provide the results
  in a way that satisfies both bpf_throw() and exception_cb.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

s390/bpf: Change seen_reg to a mask

Using a mask instead of an array saves a small amount of memory and
allows marking multiple registers as seen with a simple "or". Another
positive side-effect is that it speeds up verification with jitterbug.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Remove unnecessary loop in task_file_seq_get_next()

After commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") this
loop always iterates exactly one time. Delete the for statement and pull
the code in a tab.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv, bpf: Optimize stack usage of trampoline

When BPF_TRAMP_F_CALL_ORIG is not set, stack space for passing arguments
on stack doesn't need to be reserved because the original function is
not called.

Only reserve space for stacked arguments when BPF_TRAMP_F_CALL_ORIG is
set.

Signed-off-by: Puranjay Mohan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Pu Lehui <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net: dsa: microchip: lan9371/2: update MAC capabilities for port 4

Set proper MAC capabilities for port 4 on LAN9371 and LAN9372 switches with
integrated 100BaseTX PHY. And introduce the is_lan937x_tx_phy() function to
reuse it where applicable.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Arun Ramadoss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

act_ct: prepare for stolen verdict coming from conntrack and nat engine

At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP verdict
with NF_DROP_REASON() helper,

This helper releases the skb instantly (so drop_monitor can pinpoint
exact location) and returns NF_STOLEN.

Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.

Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-pse-pd-add-new-pse-c33-features'

Kory Maincent says:

====================
net: pse-pd: Add new PSE c33 features

This patch series adds new c33 features to the PSE API.
- Expand the PSE PI informations status with power, class and failure
reason
- Add the possibility to get and set the PSE PIs power limit

v5: https://lore.kernel.org/r/20240628-feature_poe_power_cap-v5-0-5e1375d3817a@bootlin.com
v4: https://lore.kernel.org/r/20240625-feature_poe_power_cap-v4-0-b0813aad57d5@bootlin.com
v3: https://lore.kernel.org/r/20240614-feature_poe_power_cap-v3-0-a26784e78311@bootlin.com
v2: https://lore.kernel.org/r/20240607-feature_poe_power_cap-v2-0-c03c2deb83ab@bootlin.com
v1: https://lore.kernel.org/r/20240529-feature_poe_power_cap-v1-0-0c4b1d5953b8@bootlin.com
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks

This patch expands PSE callbacks with newly introduced
pi_get/set_current_limit() and pi_get_voltage() callback.
It also add the power limit ranges description in the status returned.
The only way to set ps692x0 port power limit is by configure the power
class plus a small power supplement which maximum depends on each class.

Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netlink: specs: Expand the PSE netlink command with C33 pw-limit attributes

Expand the c33 PSE attributes with power limit to be able to set and get
the PSE Power Interface power limit.

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
             --json '{"header":{"dev-name":"eth1"}}'
{'c33-pse-actual-pw': 1700,
'c33-pse-admin-state': 3,
'c33-pse-avail-pw-limit': 97500,
'c33-pse-pw-class': 4,
'c33-pse-pw-d-status': 4,
'c33-pse-pw-limit-ranges': [{'max': 18100, 'min': 15000},
                             {'max': 38000, 'min': 30000},
                             {'max': 65000, 'min': 60000},
                             {'max': 97500, 'min': 90000}],
'header': {'dev-index': 5, 'dev-name': 'eth1'}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-set
             --json '{"header":{"dev-name":"eth1"},
                      "c33-pse-avail-pw-limit":19000}'
None

Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethtool: Add new power limit get and set features

This patch expands the status information provided by ethtool for PSE c33
with available power limit and available power limit ranges. It also adds
a call to pse_ethtool_set_pw_limit() to configure the PSE control power
limit.

Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: pse-pd: Add new power limit get and set c33 features

This patch add a way to get and set the power limit of a PSE PI.
For that it uses regulator API callbacks wrapper like get_voltage() and
get/set_current_limit() as power is simply V * I.
We used mW unit as defined by the IEEE 802.3-2022 standards.

set_current_limit() uses the voltage return by get_voltage() and the
desired power limit to calculate the current limit. get_voltage() callback
is then mandatory to set the power limit.

get_current_limit() callback is by default looking at a driver callback
and fallback to extracting the current limit from _pse_ethtool_get_status()
if the driver does not set its callback. We prefer let the user the choice
because ethtool_get_status return much more information than the current
limit.

expand pse status with c33_pw_limit_ranges to return the ranges available
to configure the power limit.

Reviewed-by: Sai Krishna <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: pse-pd: pd692x0: Expand ethtool status message

This update expands pd692x0_ethtool_get_status() callback with newly
introduced details such as the detected class, current power delivered,
and extended state information.

Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netlink: specs: Expand the PSE netlink command with C33 new features

Expand the c33 PSE attributes with PSE class, extended state information
and power consumption.

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
--json '{"header":{"dev-name":"eth0"}}'
{'c33-pse-actual-pw': 1700,
'c33-pse-admin-state': 3,
'c33-pse-pw-class': 4,
'c33-pse-pw-d-status': 4,
'header': {'dev-index': 4, 'dev-name': 'eth0'}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
--json '{"header":{"dev-name":"eth0"}}'
{'c33-pse-admin-state': 3,
'c33-pse-ext-state': 'mr-mps-valid',
'c33-pse-ext-substate': 2,
'c33-pse-pw-d-status': 2,
'header': {'dev-index': 4, 'dev-name': 'eth0'}}

Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethtool: pse-pd: Expand C33 PSE status with class, power and extended state

This update expands the status information provided by ethtool for PSE c33.
It includes details such as the detected class, current power delivered,
and extended state information.

Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-openvswitch-add-sample-multicasting'

Adrian Moreno says:

====================
net: openvswitch: Add sample multicasting.

** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: openvswitch: add psample test

Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

Reviewed-by: Aaron Conole <[email protected]>
Tested-by: Ilya Maximets <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: openvswitch: parse trunc action

The trunc action was supported decode-able but not parse-able. Add
support for parsing the action string.

Reviewed-by: Aaron Conole <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: openvswitch: add userspace parsing

The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.

This patch makes userspace action work.

Reviewed-by: Aaron Conole <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: openvswitch: add psample action

Add sample and psample action support to ovs-dpctl.py.

Refactor common attribute parsing logic into an external function.

Reviewed-by: Aaron Conole <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: openvswitch: store sampling probability in cb.

When a packet sample is observed, the sampling rate that was used is
important to estimate the real frequency of such event.

Store the probability of the parent sample action in the skb's cb area
and use it in psample action to pass it down to psample module.

Reviewed-by: Aaron Conole <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Ilya Maximets <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: openvswitch: add psample action

Add support for a new action: psample.

This action accepts a u32 group id and a variable-length cookie and uses
the psample multicast group to make the packet available for
observability.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Aaron Conole <[email protected]>
Reviewed-by: Ilya Maximets <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: psample: allow using rate as probability

Although not explicitly documented in the psample module itself, the
definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.

Quoting tc-sample(8):
"RATE of 100 will lead to an average of one sampled packet out of every
100 observed."

With this semantics, the rates that we can express with an unsigned
32-bits number are very unevenly distributed and concentrated towards
"sampling few packets".
For example, we can express a probability of 2.32E-8% but we
cannot express anything between 100% and 50%.

For sampling applications that are capable of sampling a decent
amount of packets, this sampling rate semantics is not very useful.

Add a new flag to the uAPI that indicates that the sampling rate is
expressed in scaled probability, this is:
- 0 is 0% probability, no packets get sampled.
- U32_MAX is 100% probability, all packets get sampled.

Reviewed-by: Aaron Conole <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: psample: skip packet copy if no listeners

If nobody is listening on the multicast group, generating the sample,
which involves copying packet data, seems completely unnecessary.

Return fast in this case.

Reviewed-by: Aaron Conole <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: sched: act_sample: add action cookie to sample

If the action has a user_cookie, pass it along to the sample so it can
be easily identified.

Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Aaron Conole <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: psample: add user cookie

Add a user cookie to the sample metadata so that sample emitters can
provide more contextual information to samples.

If present, send the user cookie in a new attribute:
PSAMPLE_ATTR_USER_COOKIE.

Reviewed-by: Michal Kubiak <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Adrian Moreno <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: mtk_eth_soc: implement .{get,set}_pauseparam ethtool ops

Implement operations to get and set flow-control link parameters.
Both is done by simply calling phylink_ethtool_{get,set}_pauseparam().
Fix whitespace in mtk_ethtool_ops while at it.

Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Tested-by: Rui Salvaterra <[email protected]>
Link: https://patch.msgid.link/e3ece47323444631d6cb479f32af0dfd6d145be0.1720088047.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: mtk_ppe: Change PPE entries number to 16K

MT7981,7986 and 7988 all supports 32768 PPE entries, and MT7621/MT7620
supports 16384 PPE entries, but only set to 8192 entries in driver. So
incrase max entries to 16384 instead.

Signed-off-by: Elad Yifee <[email protected]>
Signed-off-by: Shengyu Qu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/TY3P286MB261103F937DE4EEB0F88437D98DE2@TY3P286MB2611.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-constify-struct-regmap_bus-regmap_config'

Javier Carrasco says:

====================
net: constify struct regmap_bus/regmap_config

This series adds the const modifier to the remaining regmap_bus and
regmap_config structs within the net subsystem that are effectively
used as const (i.e., only read after their declaration), but kept as
writtable data.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: ar9331: constify struct regmap_bus

`ar9331_sw_bus` is not modified and can be declared as const to
move its data to a read-only section.

Signed-off-by: Javier Carrasco <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: encx24j600: constify struct regmap_bus/regmap_config

`regmap_encx24j600`, `phycfg` and `phymap_encx24j600` are not modified
and can be declared as const to move their data to a read-only section.

Signed-off-by: Javier Carrasco <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ti: icss-iep: constify struct regmap_config

`am654_icss_iep_regmap_config` is only assigned to a pointer that passes
the data as read-only.

Add the const modifier to the struct and pointer to move the data to a
read-only section.

Signed-off-by: Javier Carrasco <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Reviewed-by: MD Danish Anwar <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: qca8k: constify struct regmap_config

`qca8k_regmap_config` is not modified and can be declared as const to
move its data to a read-only section.

Signed-off-by: Javier Carrasco <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tun: Assign missing bpf_net_context.

During the introduction of struct bpf_net_context handling for
XDP-redirect, the tun driver has been missed.
Jakub also pointed out that there is another call chain to
do_xdp_generic() originating from netif_receive_skb() and drivers may
use it outside from the NAPI context.

Set the bpf_net_context before invoking BPF XDP program within the TUN
driver. Set the bpf_net_context also in do_xdp_generic() if a xdp
program is available.

Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: mediatek: Allow gaps in MAC allocation

Some devices with MediaTek SoCs don't use the first but only the second
MAC in the chip. Especially with MT7981 which got a built-in 1GE PHY
connected to the second MAC this is quite common.
Make sure to reset and enable PSE also in those cases by skipping gaps
using 'continue' instead of aborting the loop using 'break'.

Fixes: dee4dd10c79a ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs")
Suggested-by: Elad Yifee <[email protected]>
Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/379ae584cea112db60f4ada79c7e5ba4f3364a64.1719862038.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>

openvswitch: prepare for stolen verdict coming from conntrack and nat engine

At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP
verdict with NF_DROP_REASON() helper,

This helper releases the skb instantly (so drop_monitor can pinpoint
precise location) and returns NF_STOLEN.

Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.

Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'pcs-xpcs-mmap' into main

Serge Semin <fancer says:

====================
net: pcs: xpcs: Add memory-mapped device support

The main goal of this series is to extend the DW XPCS device support in
the kernel. Particularly the patchset adds a support of the DW XPCS
device with the MCI/APB3 IO interface registered as a platform device. In
order to have them utilized by the DW XPCS core the fwnode-based DW XPCS
descriptor creation procedure has been introduced. Finally the STMMAC
driver has been altered to support the DW XPCS passed via the 'pcs-handle'
property.

Note the series has been significantly re-developed since v1. So I even
had to change the subject. Anyway I've done my best to take all the noted
into account.

The series traditionally starts with a set of the preparation patches.
First one just moves the generic DW XPCS IDs macros from the internal
header file to the external one where some other IDs also reside. Second
patch splits up the xpcs_create() method to a set of the coherent
sub-functions for the sake of the further easier updates and to have it
looking less complicated. The goal of the next three patches is to extend
the DW XPCS ID management code by defining a new dw_xpcs_info structure
with both PCS and PMA IDs.

The next two patches provide the DW XPCS device DT-bindings and the
respective platform-device driver for the memory-mapped DW XPCS devices.
Besides the later patch makes use of the introduced dw_xpcs_info structure
to pre-define the DW XPCS IDs based on the platform-device compatible
string. Thus if there is no way to auto-identify the XPCS device
capabilities it can be done based on the custom device IDs passed via the
MDIO-device platform data.

Final DW XPCS driver change is about adding a new method of the DW XPCS
descriptor creation. The xpcs_create_fwnode() function has been introduced
with the same semantics as a similar method recently added to the Lynx PCS
driver. It's supposed to be called with the fwnode pointing to the DW XPCS
device node, for which the XPCS descriptor will be created.

The series is terminated with two STMMAC driver patches. The former one
simplifies the DW XPCS descriptor registration procedure by dropping the
MDIO-bus scanning and creating the descriptor for the particular device
address. The later patch alters the STMMAC PCS setup method so one would
support the DW XPCS specified via the "pcs-handle" property.

That's it for now. Thanks for review in advance. Any tests are very
welcome. After this series is merged in, I'll submit another one which
adds the generic 10GBase-R and 10GBase-X interfaces support to the STMMAC
and DW XPCS driver with the proper CSRs re-initialization, PMA
initialization and reference clock selection as it's described in the
Synopsys DW XPCS HW manual.

Link: https://lore.kernel.org/netdev/[email protected]
Changelog v2:
- Drop the patches:
  [PATCH net-next 01/16] net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces list
  [PATCH net-next 02/16] net: pcs: xpcs: Drop redundant workqueue.h include directive
  [PATCH net-next 03/16] net: pcs: xpcs: Return EINVAL in the internal methods
  [PATCH net-next 04/16] net: pcs: xpcs: Explicitly return error on caps validation
  as ones have already been merged into the kernel repo:
Link: https://lore.kernel.org/netdev/[email protected]/
- Drop the patches:
  [PATCH net-next 14/16] net: stmmac: Pass netdev to XPCS setup function
  [PATCH net-next 15/16] net: stmmac: Add dedicated XPCS cleanup method
  as ones have already been merged into the kernel repo:
Link: https://lore.kernel.org/netdev/[email protected]/
- Drop the patch:
  [PATCH net-next 06/16] net: pcs: xpcs: Avoid creating dummy XPCS MDIO device
  [PATCH net-next 09/16] net: mdio: Add Synopsys DW XPCS management interface support
  [PATCH net-next 11/16] net: pcs: xpcs: Change xpcs_create_mdiodev() suffix to "byaddr"
  [PATCH net-next 13/16] net: stmmac: intel: Register generic MDIO device
  as no longer relevant.
- Add new patches:
  [PATCH net-next v2 03/10] net: pcs: xpcs: Convert xpcs_id to dw_xpcs_desc
  [PATCH net-next v2 04/10] net: pcs: xpcs: Convert xpcs_compat to dw_xpcs_compat
  [PATCH net-next v2 05/10] net: pcs: xpcs: Introduce DW XPCS info structure
  [PATCH net-next v2 09/10] net: stmmac: Create DW XPCS device with particular address
- Use the xpcs_create_fwnode() function name and semantics similar to the
  Lynx PCS driver.
- Add kdoc describing the DW XPCS registration functions.
- Convert the memory-mapped DW XPCS device driver to being the
  platform-device driver.
- Convert the DW XPCS DT-bindings to defining both memory-mapped and MDIO
  devices.
- Drop inline'es from the methods statically defined in *.c. (@Maxime)
- Preserve the strict refcount-ing pattern. (@Russell)

Link: https://lore.kernel.org/netdev/[email protected]/
Changelov v3:
- Implement the ordered clocks constraint. (@Rob)
- Convert xpcs_plat_pm_ops to being defined as static. (@Simon)
- Add the "@interface" argument kdoc to the xpcs_create_mdiodev()
  function. (@Simon)
- Fix the "@fwnode" argument name in the xpcs_create_fwnode() method kdoc.
  (@Simon)
- Move the return value descriptions to the "Return:" section of the
  xpcs_create_mdiodev() and xpcs_create_fwnode() kdoc. (@Simon)
- Drop stmmac_mdio_bus_data::has_xpcs flag and define the PCS-address
  mask with particular XPCS address instead.

Link: https://lore.kernel.org/netdev/[email protected]/
Changelog v4:
- Make sure the series is applicable to the net-next tree. (@Vladimir)
- Rename entry to desc in the xpcs_init_id() method. (@Andrew)
- Add a comment to the clock-names property constraint about the
  oneOf-subschemas applicability. (@Conor)
- Convert "pclk" clock name to "csr" to match the DW XPCS IP-core
  input signal name. (@Rob)
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Add DW XPCS specified via "pcs-handle" support

Recently the DW XPCS DT-bindings have been introduced and the DW XPCS
driver has been altered to support the DW XPCS registered as a platform
device. In order to have the DW XPCS DT-device accessed from the STMMAC
driver let's alter the STMMAC PCS-setup procedure to support the
"pcs-handle" property containing the phandle reference to the DW XPCS
device DT-node. The respective fwnode will be then passed to the
xpcs_create_fwnode() function which in its turn will create the DW XPCS
descriptor utilized in the main driver for the PCS-related setups.

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Create DW XPCS device with particular address

Currently the only STMMAC platform driver using the DW XPCS code is the
Intel mGBE device driver. (It can be determined by finding all the drivers
having the stmmac_mdio_bus_data::has_xpcs flag set.) At the same time the
low-level platform driver masks out the DW XPCS MDIO-address from being
auto-detected as PHY by the MDIO subsystem core. Seeing the PCS MDIO ID is
known the procedure of the DW XPCS device creation can be simplified by
dropping the loop over all the MDIO IDs. From now the DW XPCS device
descriptor will be created for the MDIO-bus address pre-defined by the
platform drivers via the stmmac_mdio_bus_data::pcs_mask field.

Note besides this shall speed up a bit the Intel mGBE probing.

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Add fwnode-based descriptor creation method

It's now possible to have the DW XPCS device defined as a standard
platform device for instance in the platform DT-file. Although that
functionality is useless unless there is a way to have the device found by
the client drivers (STMMAC/DW *MAC, NXP SJA1105 Eth Switch, etc). Provide
such ability by means of the xpcs_create_fwnode() method. It needs to be
called with the device DW XPCS fwnode instance passed. That node will be
then used to find the MDIO-device instance in order to create the DW XPCS
descriptor.

Note the method semantics and name is similar to what has been recently
introduced in the Lynx PCS driver.

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Add Synopsys DW xPCS platform device driver

Synopsys DesignWare XPCS IP-core can be synthesized with the device CSRs
being accessible over the MCI or APB3 interface instead of the MDIO bus
(see the CSR_INTERFACE HDL parameter). Thus all the PCS registers can be
just memory mapped and be a subject of the standard MMIO operations of
course taking into account the peculiarities of the Clause C45 CSRs
mapping. From that perspective the DW XPCS devices would look as just
normal platform devices for the kernel.

On the other hand in order to have the DW XPCS devices handled by the
pcs-xpcs.c driver they need to be registered in the framework of the
MDIO-subsystem. So the suggested change is about providing a DW XPCS
platform device driver registering a virtual MDIO-bus with a single
MDIO-device representing the DW XPCS device.

DW XPCS platform device is supposed to be described by the respective
compatible string "snps,dw-xpcs" (or with the PMA-specific compatible
string), CSRs memory space and optional peripheral bus and reference clock
sources. Depending on the INDIRECT_ACCESS IP-core synthesize parameter the
memory-mapped reg-space can be represented as either directly or
indirectly mapped Clause 45 space. In the former case the particular
address is determined based on the MMD device and the registers offset (5
+ 16 bits all together) within the device reg-space. In the later case
there is only 8 lower address bits are utilized for the registers mapping
(255 CSRs). The upper bits are supposed to be written into the respective
viewport CSR in order to select the respective MMD sub-page.

Note, only the peripheral bus clock source is requested in the platform
device probe procedure. The core and pad clocks handling has been
implemented in the framework of the xpcs_create() method intentionally
since the clocks-related setups are supposed to be performed later, during
the DW XPCS main configuration procedures. (For instance they will be
required for the DW Gen5 10G PMA configuration.)

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: Add Synopsys DW xPCS bindings

Synopsys DesignWare XPCS IP-core is a Physical Coding Sublayer (PCS) layer
providing an interface between the Media Access Control (MAC) and Physical
Medium Attachment Sublayer (PMA) through a Media independent interface.
From software point of view it exposes IEEE std. Clause 45 CSR space and
can be accessible either by MDIO or MCI/APB3 bus interfaces. In the former
case the PCS device is supposed to be defined under the respective MDIO
bus DT-node. In the later case the DW xPCS will be just a normal IO
memory-mapped device.

Besides of that DW XPCS DT-nodes can have an interrupt signal and clock
source properties specified. The former one indicates the Clause 73/37
auto-negotiation events like: negotiation page received, AN is completed
or incompatible link partner. The clock DT-properties can describe up to
three clock sources: peripheral bus clock source, internal reference clock
and the externally connected reference clock.

Finally the DW XPCS IP-core can be optionally synthesized with a
vendor-specific interface connected to the Synopsys PMA (also called
DesignWare Consumer/Enterprise PHY). Alas that isn't auto-detectable in a
portable way. So if the DW XPCS device has the respective PMA attached
then it should be reflected in the DT-node compatible string so the driver
would be aware of the PMA-specific device capabilities (mainly connected
with CSRs available for the fine-tunings).

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Introduce DW XPCS info structure

The being introduced structure will preserve the PCS and PMA IDs retrieved
from the respective DW XPCS MMDs or potentially pre-defined by the client
drivers. (The later change will be introduced later in the framework of
the commit adding the memory-mapped DW XPCS devices support.)

The structure fields are filled in in the xpcs_get_id() function, which
used to be responsible for the PCS Device ID getting only. Besides of the
PCS ID the method now fetches the PMA/PMD IDs too from MMD 1, which used
to be done in xpcs_dev_flag(). The retrieved PMA ID will be from now
utilized for the PMA-specific tweaks like it was introduced for the
Wangxun TxGBE PCS in the commit f629acc6f210 ("net: pcs: xpcs: support to
switch mode for Wangxun NICs").

Note 1. The xpcs_get_id() error-handling semantics has been changed. From
now the error number will be returned from the function. There is no point
in the next IOs or saving 0xffs and then looping over the actual device
IDs if device couldn't be reached. -ENODEV will be returned if the very
first IO operation failed thus indicating that no device could be found.

Note 2. The PCS and PMA IDs macros have been converted to enum'es. The
enum'es will be populated later in another commit with the virtual IDs
identifying the DW XPCS devices which have some platform-specifics, but
have been synthesized with the default PCS/PMA ID.

Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Convert xpcs_compat to dw_xpcs_compat

The xpcs_compat structure has been left as the only dw-prefix-less
structure since the previous commit. Let's unify at least the structures
naming in the driver by adding the dw_-prefix to it.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Convert xpcs_id to dw_xpcs_desc

A structure with the PCS/PMA MMD IDs data is being introduced in one of
the next commits. In order to prevent the names ambiguity let's convert
the xpcs_id structure name to dw_xpcs_desc. The later version is more
suitable since the structure content is indeed the device descriptor
containing the data and callbacks required for the driver to correctly set
the device up.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Split up xpcs_create() body to sub-functions

As an initial preparation before adding the fwnode-based DW XPCS device
support let's split the xpcs_create() function code up to a set of the
small sub-functions. Thus the xpcs_create() implementation will get to
look simpler and turn to be more coherent. Further updates will just touch
the new sub-functions a bit: add platform-specific device info, add the
reference clock getting and enabling.

The xpcs_create() method will now contain the next static methods calls:

xpcs_create_data() - create the DW XPCS device descriptor, pre-initialize
it' fields and increase the mdio device refcount-er;

xpcs_init_id() - find XPCS ID instance and save it in the device
descriptor;

xpcs_init_iface() - find MAC/PCS interface descriptor and perform
basic initialization specific to it: soft-reset, disable polling.

The update doesn't imply any semantic change but merely makes the code
looking simpler and more ready for adding new features support.

Note the xpcs_destroy() has been moved to being defined below the
xpcs_create_mdiodev() function as the driver now implies having the
protagonist-then-antagonist functions definition order.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: Move native device ID macro to linux/pcs/pcs-xpcs.h

One of the next commits will alter the DW XPCS driver to support setting a
custom device ID for the particular MDIO-device detected on the platform.
The generic DW XPCS ID can be used as a custom ID as well in case if the
DW XPCS-device was erroneously synthesized with no or some undefined ID.
In addition to that having all supported DW XPCS device IDs defined in a
single place will improve the code maintainability and readability.

Note while at it rename the macros to being shorter and looking alike to
the already defined NXP XPCS ID macro.

Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: Define properties at top-level

Convention is DT schemas should define all properties at the top-level
and not inside of if/then schemas. That minimizes the if/then schemas
and is more future proof.

Signed-off-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ethtool: move firmware flashing flag to struct ethtool_netdev_state

Commit 31e0aa99dc02 ("ethtool: Veto some operations during firmware flashing process")
added a flag module_fw_flash_in_progress to struct net_device. As
this is ethtool related state, move it to the recently created
struct ethtool_netdev_state, accessed via the 'ethtool' member of
struct net_device.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/phy/aquantia/aquantia.h
  219343755eae ("net: phy: aquantia: add missing include guards")
  61578f679378 ("net: phy: aquantia: add support for PHY LEDs")

drivers/net/ethernet/wangxun/libwx/wx_hw.c
  bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx")
  b501d261a5b3 ("net: txgbe: add FDIR ATR support")
https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/

include/linux/mlx5/mlx5_ifc.h
  048a403648fc ("net/mlx5: IFC updates for changing max EQs")
  99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/

Adjacent changes:

drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
  4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference")
  3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard")

include/net/mac80211.h
  816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP")
  5a009b42e041 ("wifi: mac80211: track changes in AP's TPE")

Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'crypto-caam-unembed-net_dev'

Breno Leitao says:

====================
crypto: caam: Unembed net_dev

This will un-embed the net_device struct from inside other struct, so we
can add flexible array into net_device.

This also enable COMPILE test for FSL_CAAM, as any config option that
depends on ARCH_LAYERSCAPE.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

crypto: caam: Unembed net_dev structure in dpaa2

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct dpaa2_caam_priv_per_cpu by
converting them into pointers, and allocating them dynamically. Use the
leverage alloc_netdev_dummy() to allocate the net_device object at
dpaa2_dpseci_setup().

The free of the device occurs at dpaa2_dpseci_disable().

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

crypto: caam: Unembed net_dev structure from qi

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct caam_qi_pcpu_priv by converting them
into pointers, and allocating them dynamically. Use the leverage
alloc_netdev_dummy() to allocate the net_device object at
caam_qi_init().

The free of the device occurs at caam_qi_shutdown().

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

crypto: caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST

As most of the drivers that depend on ARCH_LAYERSCAPE, make
CRYPTO_DEV_FSL_CAAM depend on COMPILE_TEST for compilation and testing.

# grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
29

Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

crypto: caam: Avoid unused imx8m_machine_match variable

If caam module is built without OF support, the compiler returns the
following warning:

drivers/crypto/caam/ctrl.c:83:34: warning: 'imx8m_machine_match' defined but not used [-Wunused-const-variable=]

imx8m_machine_match is only referenced by of_match_node(), which is set
to NULL if CONFIG_OF is not set, as of commit 5762c20593b6b ("dt: Add
empty of_match_node() macro"):

#define of_match_node(_matches, _node) NULL

Do not create imx8m_machine_match if CONFIG_OF is not set.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>