When creating a helper to allocate and align an skb one location where
the skb data size was updated was missed. This can lead to a warning
being printed when the memory is being unmapped as it now always unmaps
the maximum frame size, instead of the size after it has been
aligned.
This was correctly done for RZ/G2L but missed for R-Car.
Breno Leitao [Fri, 8 Mar 2024 16:26:05 +0000 (08:26 -0800)]
net: amt: Remove generic .ndo_get_stats64
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.
Since this driver is now relying on NETDEV_PCPU_STAT_TSTATS, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.
Breno Leitao [Fri, 8 Mar 2024 16:26:04 +0000 (08:26 -0800)]
net: amt: Move stats allocation to core
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation can be done in net core instead
of in this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Jakub Kicinski [Fri, 8 Mar 2024 19:03:19 +0000 (11:03 -0800)]
netlink: specs: support generating code for genl socket priv
The family struct is auto-generated for new families. Support
use of the sock_priv_* mechanism added in commit a731132424ad
("genetlink: introduce per-sock family private storage").
For example if the family wants to use struct sk_buff as its
private struct (unrealistic but just for illustration), it would
add to its spec:
====================
selftests: mptcp: various improvements
In this series from Geliang, there are various improvements in MPTCP
selftests: sharing code, doing actions the same way, colours, etc.
Patch 1 prints all error messages to stdout: what was done in almost all
other MPTCP selftests. This can be now easily changed later if needed.
Patch 2 makes sure the test counter is continuous in mptcp_connect.sh.
Patch 3 aligns the messages that are printed in mptcp_connect.sh.
Patch 4 prints each test result in mptcp_sockopt.sh, similar to what we
have in the TAP output.
Patch 5 moves the different test counters to a single one in
mptcp_lib.sh, to unify how it is used.
Patch 6 moves how titles are printed from mptcp_join.sh to the lib, to
be reused in patch 7 by all other MPTCP selftests.
Patch 8 uses the '+=' operator to append strings instead of repeating
the variable name twice: that's shorter and easier to read.
Patch 9 adds colours for the [ OK ], [SKIP], [FAIL] and INFO keywords in
all MPTCP selftests.
Patches 10 to 12 are preparation patches for patch 13: patch 10
modifies how some 'test_fail' helpers are called, patch 11 moves a helper from
userspace_pm.sh to the lib, and patch 12 changes where titles are
printed in userspace_pm.sh. Patch 13 moves some duplicated helpers from
mptcp_join.sh and userspace_pm.sh to mptcp_lib.sh.
Patch 14 moves duplicated read-only variables from mptcp_join.sh and
userspace_pm.sh to mptcp_lib.sh as well.
Patch 15 uses explicit variables instead of hard-coded numbers for the
exit status.
====================
Geliang Tang [Fri, 8 Mar 2024 22:10:21 +0000 (23:10 +0100)]
selftests: mptcp: declare event macros in mptcp_lib
MPTCP event macros (SUB_ESTABLISHED, LISTENER_CREATED, LISTENER_CLOSED),
and the protocol family macros (AF_INET, AF_INET6) are defined in both
mptcp_join.sh and userspace_pm.sh. In order not to duplicate code, this
patch declares them all in mptcp_lib.sh with MPTCP_LIB_ prefixes.
To avoid duplicated code in different MPTCP selftests, we can add and use
helpers defined in mptcp_lib.sh.
The helper verify_listener_events() is defined both in mptcp_join.sh and
userspace_pm.sh. Export it into mptcp_lib.sh and rename it with the mptcp_lib_
prefix. Use this new helper in both scripts.
Geliang Tang [Fri, 8 Mar 2024 22:10:19 +0000 (23:10 +0100)]
selftests: mptcp: print_test out of verify_listener_events
verify_listener_events() helper will be exported into mptcp_lib.sh as a
public function, but print_test() is invoked in it, which is a private
function in userspace_pm.sh only. So this patch moves print_test() out of
verify_listener_events().
Extract the main part of check_expected() in userspace_pm.sh to a new
function mptcp_lib_check_expected() in mptcp_lib.sh. It will be used
in both mptcp_join.sh and userspace_pm.sh. check_expected_one() is
moved into mptcp_lib.sh too as mptcp_lib_check_expected_one().
Geliang Tang [Fri, 8 Mar 2024 22:10:17 +0000 (23:10 +0100)]
selftests: mptcp: call test_fail without argument
This patch modifies test_fail() in userspace_pm.sh to call mptcp_lib_pr_fail()
only if there are arguments (if [ ${#} -gt 0 ]), and adds the argument
"unexpected type: ${type}" when calling test_fail() from test_remove().
Then mptcp_lib_pr_fail() can be used in check_expected_one() instead of
test_fail().
The same is done in mptcp_join.sh: fail_test() is called without argument, and
this helper is adapted not to call print_fail() in this case.
Geliang Tang [Fri, 8 Mar 2024 22:10:15 +0000 (23:10 +0100)]
selftests: mptcp: use += operator to append strings
This patch uses addition assignment operator (+=) to append strings
instead of duplicating the variable name in mptcp_connect.sh and
mptcp_join.sh.
This can make the statements shorter.
Note: in mptcp_connect.sh, add a local variable extra in do_transfer to
save the various extra warning logs, using += to append it. And add a
new variable tc_info to save various tc info, also using += to append it.
This can make the code more readable and prepare for the next commit.
Geliang Tang [Fri, 8 Mar 2024 22:10:14 +0000 (23:10 +0100)]
selftests: mptcp: print test results with counters
This patch adds a new helper mptcp_lib_print_title(), a wrapper of
mptcp_lib_inc_test_counter() and mptcp_lib_pr_title_counter(), to
print out test counter in each test result and increase the counter.
Use this helper to print out test counters for every test in diag.sh,
mptcp_connect.sh, mptcp_sockopt.sh, pm_netlink.sh, simult_flows.sh,
and userspace_pm.sh.
diag.sh:
01 no msk on netns creation [ ok ]
02 listen match for dport 10000 [ ok ]
03 listen match for sport 10000 [ ok ]
04 listen match for saddr and sport [ ok ]
05 all listen sockets [ ok ]
mptcp_connect.sh:
01 New MPTCP socket can be blocked via sysctl [ OK ]
02 Validating network environment with pings [ OK ]
INFO: Using loss of 0.85% delay 31 ms reorder .. with delay 7ms on ns3eth4
03 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 69ms) [ OK ]
04 ns1 MPTCP -> ns1 (10.0.1.1:10001 ) TCP (duration 20ms) [ OK ]
05 ns1 TCP -> ns1 (10.0.1.1:10002 ) MPTCP (duration 16ms) [ OK ]
mptcp_sockopt.sh:
01 Transfer v4 [ OK ]
02 Mark v4 [ OK ]
03 Transfer v6 [ OK ]
04 Mark v6 [ OK ]
05 SOL_MPTCP sockopt v4 [ OK ]
pm_netlink.sh:
01 defaults addr list [ OK ]
02 simple add/get addr [ OK ]
03 dump addrs [ OK ]
04 simple del addr [ OK ]
05 dump addrs after del [ OK ]
simult_flows.sh:
01 balanced bwidth 7391 max 8456 [ OK ]
02 balanced bwidth - reverse direction 7403 max 8456 [ OK ]
03 balanced bwidth with unbalanced delay 7429 max 8456 [ OK ]
04 balanced bwidth with unbalanced delay - reverse ... 7485 max 8456 [ OK ]
05 unbalanced bwidth 7549 max 8456 [ OK ]
userspace_pm.sh:
01 Created network namespaces ns1, ns2 [ OK ]
INFO: Make connections
02 Established IPv4 MPTCP Connection ns2 => ns1 [ OK ]
03 Established IPv6 MPTCP Connection ns2 => ns1 [ OK ]
INFO: Announce tests
04 ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token [ OK ]
05 ADD_ADDR id:67 10.0.2.2 (ns2) => ns1, reuse port [ OK ]
Having test counters helps to quickly identify issues when looking at a
long list of output logs and results.
Geliang Tang [Fri, 8 Mar 2024 22:10:13 +0000 (23:10 +0100)]
selftests: mptcp: add print_title in mptcp_lib
This patch adds a new variable MPTCP_LIB_TEST_FORMAT as the test title
printing format. Also add a helper mptcp_lib_print_title() to use this
format to print the test title with test counters. They are used in
mptcp_join.sh first.
Each MPTCP selftest has subtests, and it helps to give them a
number to quickly identify them. This can be managed by mptcp_lib.sh,
reusing what has been done here. The following commit will use these
new helpers in the other tests.
Geliang Tang [Fri, 8 Mar 2024 22:10:12 +0000 (23:10 +0100)]
selftests: mptcp: export TEST_COUNTER variable
Variable TEST_COUNT is used in mptcp_connect.sh and mptcp_join.sh as
a test counter, initialized to 0, while variable test_cnt is used
in diag.sh and simult_flows.sh, initialized to 1. To maintain
consistency, this patch renames them all as MPTCP_LIB_TEST_COUNTER,
initializes it to 1, and exports it into mptcp_lib.sh.
Geliang Tang [Fri, 8 Mar 2024 22:10:11 +0000 (23:10 +0100)]
selftests: mptcp: sockopt: print every test result
Only total test results are printed out in mptcp_sockopt.sh:
PASS: all packets had packet mark set
PASS: SOL_MPTCP getsockopt has expected information
PASS: TCP_INQ cmsg/ioctl -t tcp
PASS: TCP_INQ cmsg/ioctl -6 -t tcp
PASS: TCP_INQ cmsg/ioctl -r tcp
PASS: TCP_INQ cmsg/ioctl -6 -r tcp
PASS: TCP_INQ cmsg/ioctl -r tcp -t tcp
They mismatch with the test results:
ok 1 - mptcp_sockopt: mark ipv4
ok 2 - mptcp_sockopt: transfer ipv4
ok 3 - mptcp_sockopt: mark ipv6
ok 4 - mptcp_sockopt: transfer ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp
'mptcp_sockopt.sh' now displays more detailed results + why (what you had
in a former patch from v6, merged here). It no longer displays 'PASS:',
because it is duplicated info now that the details are displayed:
Transfer v4 [ OK ]
Mark v4 [ OK ]
Transfer v6 [ OK ]
Mark v6 [ OK ]
SOL_MPTCP sockopt v4 [ OK ]
SOL_MPTCP sockopt v6 [ OK ]
TCP_INQ cmsg/ioctl -t tcp [ OK ]
TCP_INQ cmsg/ioctl -6 -t tcp [ OK ]
TCP_INQ cmsg/ioctl -r tcp [ OK ]
TCP_INQ cmsg/ioctl -6 -r tcp [ OK ]
TCP_INQ cmsg/ioctl -r tcp -t tcp [ OK ]
Also fix the TAP output:
ok 1 - mptcp_sockopt: transfer ipv4
ok 2 - mptcp_sockopt: mark ipv4
ok 3 - mptcp_sockopt: transfer ipv6
ok 4 - mptcp_sockopt: mark ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp
Geliang Tang [Fri, 8 Mar 2024 22:10:10 +0000 (23:10 +0100)]
selftests: mptcp: connect: fix misaligned output
The first [ OK ] in the output of mptcp_connect.sh misaligns with the
others:
New MPTCP socket can be blocked via sysctl [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.85% delay 16 ms reorder 95% 70% with delay 4ms on
ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 184ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001 ) TCP (duration 50ms) [ OK ]
ns1 TCP -> ns1 (10.0.1.1:10002 ) MPTCP (duration 55ms) [ OK ]
This patch aligns them by using 69 chars to display the first two lines,
and 50 chars for the others, since 19 chars are used to display the duration
time. Also print out a [ OK ] at the end of the 2nd line for consistency.
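The width arithmetic above (50 + 19 = 69) can be checked with a small C sketch. This is only an illustration: the selftest itself is a shell script, the helper names format_plain(), format_timed() and ok_column() are hypothetical, and the exact layout of the 19-column duration field is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { TITLE_WIDE = 69, TITLE_NARROW = 50, DURATION_WIDTH = 19 };

/* The two layouts only line up if the narrow title plus the duration
 * field is as wide as the wide title. */
_Static_assert(TITLE_NARROW + DURATION_WIDTH == TITLE_WIDE,
	       "title and duration widths must add up");

/* Line without a duration: the title is padded to 69 columns. */
int format_plain(char *buf, size_t len, const char *title)
{
	return snprintf(buf, len, "%-*s[ OK ]", TITLE_WIDE, title);
}

/* Line with a duration: 50-column title followed by a 19-column field,
 * assumed here to be "(duration " (10) + 5 digits + "ms) " (4). */
int format_timed(char *buf, size_t len, const char *title, int ms)
{
	return snprintf(buf, len, "%-*s(duration %5dms) [ OK ]",
			TITLE_NARROW, title, ms);
}

/* Column at which "[ OK ]" starts, or -1 if it is absent. */
int ok_column(const char *line)
{
	const char *p = strstr(line, "[ OK ]");

	return p ? (int)(p - line) : -1;
}
```

With both helpers, the status marker starts at the same column whether or not a duration is printed.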
Geliang Tang [Fri, 8 Mar 2024 22:10:08 +0000 (23:10 +0100)]
selftests: mptcp: print all error messages to stdout
Some error messages are printed to stderr while the others are printed
to 'stdout'. As part of the unification, this patch drops "1>&2" so that
all error messages are printed to 'stdout'.
net: wan: framer/pef2256: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
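As a rough user-space model of why the int return value is misleading, the sketch below mimics a core that warns about a non-zero return and then carries on. All names (model_driver, model_core_remove(), leaky_remove(), clean_remove()) are hypothetical and this is not the platform bus code itself.

```c
#include <assert.h>
#include <stdio.h>

struct model_device {
	int resources;		/* resources still held by the driver */
};

struct model_driver {
	int  (*remove)(struct model_device *dev);	/* legacy variant */
	void (*remove_new)(struct model_device *dev);	/* void variant */
};

/* Model of the core: a non-zero return only triggers a warning, the
 * device is unbound either way. */
void model_core_remove(const struct model_driver *drv,
		       struct model_device *dev)
{
	if (drv->remove_new) {
		drv->remove_new(dev);
	} else if (drv->remove) {
		int ret = drv->remove(dev);

		if (ret)
			fprintf(stderr, "remove returned %d, ignored\n", ret);
	}
}

/* Legacy callback that wrongly bails out early: the caller cannot act
 * on the error, so the resources are leaked. */
int leaky_remove(struct model_device *dev)
{
	if (dev->resources > 1)
		return -16;
	dev->resources = 0;
	return 0;
}

/* Converted callback: no error path to get wrong, always cleans up. */
void clean_remove(struct model_device *dev)
{
	dev->resources = 0;
}
```

A driver that already always returns zero, like this one, converts trivially: the body stays the same and only the return type and the hook name change.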
====================
mlxsw: Support for nexthop group statistics
ECMP is a fundamental component in L3 designs. However, it's fragile. Many
factors influence whether an ECMP group will operate as intended: hash
policy (i.e. the set of fields that contribute to ECMP hash calculation),
neighbor validity, hash seed (which might lead to polarization) or the type
of ECMP group used (hash-threshold or resilient).
At the same time, collection of statistics that would help an operator
determine that the group performs as desired, is difficult.
Support for nexthop group statistics and their HW collection has been
introduced recently. In this patch set, add HW stats collection support
to mlxsw.
This patchset progresses as follows:
- Patches #1 and #2 add nexthop IDs to notifiers.
- Patches #3 and #4 are code-shaping.
- Patches #5, #6 and #7 adjust the flow counter code.
- Patches #8 and #9 add HW nexthop counters.
- Patch #10 adjusts the HW counter code to allow sharing the same counter
for several resilient group buckets with the same NH ID.
- Patch #11 adds a selftest.
====================
Petr Machata [Fri, 8 Mar 2024 12:59:55 +0000 (13:59 +0100)]
selftests: forwarding: Add a test for NH group stats
Add to lib.sh support for fetching NH stats, and a new library,
router_mpath_nh_lib.sh, with the common code for testing NH stats.
Use the latter from router_mpath_nh.sh and router_mpath_nh_res.sh.
The test works by sending traffic through a NH group, and checking that the
reported values correspond to what the link that ultimately receives the
traffic reports having seen.
Petr Machata [Fri, 8 Mar 2024 12:59:54 +0000 (13:59 +0100)]
mlxsw: spectrum_router: Share nexthop counters in resilient groups
For resilient groups, we can reuse the same counter for all the buckets
that share the same nexthop. Keep a reference count per counter, and keep
all these counters in a per-nexthop-group xarray, which serves as a
NHID->counter cache. If a counter is already present for a given NHID, just
take a reference and use the same counter.
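The get/put pattern described above can be sketched in user-space C as follows. This is a simplified model under stated assumptions: a small fixed array stands in for the xarray, no hardware counter is allocated, and every name (model_nh_group, model_counter_get(), model_counter_put()) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_CACHE_SLOTS 8	/* hypothetical; the driver uses an xarray */

struct model_nh_counter {
	uint32_t nhid;
	unsigned int refcount;
};

struct model_nh_group {
	struct model_nh_counter *cache[MODEL_CACHE_SLOTS];
};

void model_group_init(struct model_nh_group *grp)
{
	memset(grp, 0, sizeof(*grp));
}

/* Return the counter for @nhid, taking a reference. A cached counter is
 * reused; otherwise a new one is created with a refcount of 1. */
struct model_nh_counter *model_counter_get(struct model_nh_group *grp,
					   uint32_t nhid)
{
	struct model_nh_counter **free_slot = NULL;
	struct model_nh_counter *c;
	size_t i;

	for (i = 0; i < MODEL_CACHE_SLOTS; i++) {
		c = grp->cache[i];
		if (c && c->nhid == nhid) {
			c->refcount++;
			return c;
		}
		if (!c && !free_slot)
			free_slot = &grp->cache[i];
	}
	if (!free_slot)
		return NULL;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->nhid = nhid;
	c->refcount = 1;
	*free_slot = c;
	return c;
}

/* Drop a reference; the last put removes the counter from the cache. */
void model_counter_put(struct model_nh_group *grp,
		       struct model_nh_counter *c)
{
	size_t i;

	if (--c->refcount)
		return;
	for (i = 0; i < MODEL_CACHE_SLOTS; i++)
		if (grp->cache[i] == c)
			grp->cache[i] = NULL;
	free(c);
}
```

Two buckets that resolve to the same NHID end up holding the same counter object, so their traffic is accounted together.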
Petr Machata [Fri, 8 Mar 2024 12:59:53 +0000 (13:59 +0100)]
mlxsw: spectrum_router: Support nexthop group hardware statistics
When hw_stats is set on a group, install nexthop counters on members of a
group.
Counter allocation request is moved from nexthop object initialization to
the update code. The previous placement made sense: when the counters are
enabled by dpipe, the counters are installed to all existing nexthops and
all nexthops created from then on get them. For the finer-grained nexthop
group statistics, this is unsuitable. The existing placement was kept for
the IPv4 and IPv6 nexthops.
Resilient group replacement emits a pre_replace notification, and then any
bucket_replace notifications if there were any replacements at all. If the
group is balanced and the nexthop composition of the replaced group didn't
change, there will be no such notifiers. Therefore hook to the pre_replace
notifier and mark all buckets for update, to un/install the counters.
When reporting deltas for resilient groups, use the nexthop ID that we
stored in a previous patch to look up to which nexthop a bucket
contributes.
Petr Machata [Fri, 8 Mar 2024 12:59:52 +0000 (13:59 +0100)]
mlxsw: spectrum_router: Track NH ID's of group members
The core interfaces for collecting per-NH statistics are built around
nexthops even for resilient groups. Because mlxsw models each bucket as a
nexthop, the core next hop that a given bucket contributes to needs to be
looked up. In order to be able to match the two up, we need to track
nexthop ID for members of group nexthop objects. For simplicity, do it for
all nexthop objects, not just group members.
Petr Machata [Fri, 8 Mar 2024 12:59:51 +0000 (13:59 +0100)]
mlxsw: spectrum_router: Add helpers for nexthop counters
The next patch will add the ability to share nexthop counters among
mlxsw nexthops backed by the same core nexthop. To have a place to store
reference count, the counter should be kept in a dedicated structure. In
this patch, introduce the structure together with the related helpers, sans
the refcount, which comes in the next patch.
mlxsw_sp_nexthop_counter_disable() decays to a nop when called on a
disabled counter, but mlxsw_sp_nexthop_counter_enable() can't similarly
be called on an enabled counter. This would be useful in the following
patches. Add the missing condition.
Petr Machata [Fri, 8 Mar 2024 12:59:49 +0000 (13:59 +0100)]
mlxsw: spectrum: Allow fetch-and-clear of flow counters
For a report_delta-like interface, such as the one a previous patch added for
collection of NH group statistics, it's easiest to read the counter and
have the HW clear it right away. Thus, change mlxsw_sp_flow_counter_get()
to take a bool indicating whether this should be done.
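A minimal user-space sketch of the fetch-and-clear idea follows. The struct and function names are hypothetical, and a plain struct stands in for the hardware counter; only the role of the bool is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a hardware flow counter. */
struct model_flow_counter {
	uint64_t packets;
	uint64_t bytes;
};

/* Read the counter; when @clear is set, reset it in the same operation
 * so that the next read reports only the delta since this one. */
void model_flow_counter_get(struct model_flow_counter *hw, bool clear,
			    uint64_t *packets, uint64_t *bytes)
{
	*packets = hw->packets;
	*bytes = hw->bytes;
	if (clear) {
		hw->packets = 0;
		hw->bytes = 0;
	}
}
```

With clear set, the caller does not need to remember the previous reading to compute a delta.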
Petr Machata [Fri, 8 Mar 2024 12:59:47 +0000 (13:59 +0100)]
mlxsw: spectrum_router: Rename two functions
The function mlxsw_sp_nexthop_counter_alloc() doesn't directly allocate
anything, and mlxsw_sp_nexthop_counter_free() doesn't directly free. For
the following patches, we will need names for functions that actually do
those things. Therefore rename to mlxsw_sp_nexthop_counter_enable() and
mlxsw_sp_nexthop_counter_disable() to free up the namespace.
Petr Machata [Fri, 8 Mar 2024 12:59:46 +0000 (13:59 +0100)]
net: nexthop: Have all NH notifiers carry NH ID
When sending the notifications to collect NH statistics for resilient
groups, the driver will need to know the nexthop IDs in individual buckets
to look up the right counter. To that end, move the nexthop ID from struct
nh_notifier_grp_entry_info to nh_notifier_single_info.
Petr Machata [Fri, 8 Mar 2024 12:59:45 +0000 (13:59 +0100)]
net: nexthop: Initialize NH group ID in resilient NH group notifiers
The NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE notifier currently keeps the group
ID unset. That makes it impossible to look up the group for which the
notifier is intended. This is not an issue at the moment, because the only
client is netdevsim, and that only so that it can veto replacements, which is a
static property not tied to a particular group. But for any practical use,
the ID is necessary. Set it.
Matthew Wood [Fri, 8 Mar 2024 00:25:24 +0000 (16:25 -0800)]
net: netconsole: Add continuation line prefix to userdata messages
Add a space (' ') prefix to every userdata line to match docs for
dev-kmsg. To account for this extra character in each userdata entry,
reduce userdata entry names (directory name) from 54 characters to 53.
According to the dev-kmsg docs, a space is used for subsequent lines to
mark them as continuation lines.
> A line starting with ' ', is a continuation line, adding
> key/value pairs to the log message, which provide the machine
> readable context of the message, for reliable processing in
> userspace.
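To illustrate the resulting entry format, here is a small user-space sketch that appends a key/value pair as a continuation line. append_userdata() is a hypothetical helper, not the netconsole code; the only detail taken from the text above is the leading space, which is also the extra character that costs the entry name one byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one userdata entry to @buf as a continuation line:
 * ' ' + key + '=' + value + '\n'.
 * Returns the number of characters added, or -1 if it does not fit
 * (in which case @buf is left unchanged). */
int append_userdata(char *buf, size_t len, const char *key,
		    const char *value)
{
	size_t used = strlen(buf);
	int n;

	if (used >= len)
		return -1;
	n = snprintf(buf + used, len - used, " %s=%s\n", key, value);
	if (n < 0 || (size_t)n >= len - used) {
		buf[used] = '\0';	/* undo the truncated write */
		return -1;
	}
	return n;
}
```

Every appended line starts with a space, so a parser following the dev-kmsg convention can treat it as machine-readable context of the preceding message.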
Heiner Kallweit [Thu, 7 Mar 2024 21:16:12 +0000 (22:16 +0100)]
net: phy: simplify a check in phy_check_link_status
Handling case err == 0 in the other branch allows simplifying the
code. In addition I assume in "err & phydev->eee_cfg.tx_lpi_enabled"
it should have been a logical and operator. It works as expected also
with the bitwise and, but using a bitwise and with a bool value looks
ugly to me.
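The equivalence can be sketched in user-space C. Both functions are hypothetical models of the check, assuming err is a negative error code, 0 or 1; they are not the phylib code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the old check: only err < 0 is special-cased, the remaining
 * values go through a bitwise AND with a bool. */
bool tx_lpi_original(int err, bool tx_lpi_enabled)
{
	if (err < 0)
		return false;
	return err & tx_lpi_enabled;
}

/* Model of the simplified check: err == 0 is handled together with the
 * error case, so the positive case just returns the configured value. */
bool tx_lpi_simplified(int err, bool tx_lpi_enabled)
{
	if (err <= 0)
		return false;
	return tx_lpi_enabled;
}
```

For err in {negative, 0, 1} both variants agree. They would differ for a positive value with bit 0 clear, such as 2, where the bitwise AND yields false; that is the kind of surprise a logical operator avoids.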
Justin Swartz [Tue, 5 Mar 2024 04:39:51 +0000 (06:39 +0200)]
net: dsa: mt7530: disable LEDs before reset
Disable LEDs just before resetting the MT7530 to avoid
situations where the ESW_P4_LED_0 and ESW_P3_LED_0 pin
states may cause an unintended external crystal frequency
to be selected.
The HT_XTAL_FSEL (External Crystal Frequency Selection)
field of HWTRAP (the Hardware Trap register) stores a
2-bit value that represents the state of the ESW_P4_LED_0
and ESW_P3_LED_0 pins (seemingly) sampled just after the
MT7530 has been reset, as:
The value of HT_XTAL_FSEL is bootstrapped by pulling
ESW_P4_LED_0 and ESW_P3_LED_0 up or down accordingly,
but:
if a 40MHz crystal has been selected and
the ESW_P3_LED_0 pin is high during reset,
or a 20MHz crystal has been selected and
the ESW_P4_LED_0 pin is high during reset,
then the value of HT_XTAL_FSEL will indicate
that a 25MHz crystal is present.
By default, the state of the LED pins is PHY controlled
to reflect the link state.
To illustrate, if a board has:
5 ports with active low LED control,
and HT_XTAL_FSEL bootstrapped for 40MHz.
When the MT7530 is powered up without any external
connection, only the LED associated with Port 3 is
illuminated as ESW_P3_LED_0 is low.
In this state, directly after mt7530_setup()'s reset
is performed, the HWTRAP register (0x7800) reflects
the intended HT_XTAL_FSEL (HWTRAP bits 10:9) of 40MHz:
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the ptp_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Note that outstanding refs (89088) == slow alloc * cache size (1392 * 64)
which means this machine is recycling page pool pages perfectly, not
a single page has been released.
The extra 0.3% is because sample ignores allocations from the ptr_ring.
Treat those the same as alloc_fast, the ring vs cache alloc is
already captured accurately enough by recycling stats.
Eric Dumazet [Thu, 7 Mar 2024 22:00:16 +0000 (22:00 +0000)]
udp: no longer touch sk->sk_refcnt in early demux
After commits ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
and 7ae215d23c12 ("bpf: Don't refcount LISTEN sockets in sk_assign()")
UDP early demux no longer needs to grab a refcount on the UDP socket.
This saves two atomic operations per incoming packet for connected
sockets.
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:50 +0000 (14:23 +0000)]
net/x25: fix incorrect parameter validation in the x25_getsockopt() function
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia
Signed-off-by: David S. Miller
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:50 +0000 (14:23 +0000)]
net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Gavrilov Ilia
Signed-off-by: David S. Miller
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:50 +0000 (14:23 +0000)]
udp: fix incorrect parameter validation in the udp_lib_getsockopt() function
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:50 +0000 (14:23 +0000)]
l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() function
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:50 +0000 (14:23 +0000)]
ipmr: fix incorrect parameter validation in the ip_mroute_getsockopt() function
The 'olr' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'olr' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia
Signed-off-by: David S. Miller
Gavrilov Ilia [Thu, 7 Mar 2024 14:23:49 +0000 (14:23 +0000)]
tcp: fix incorrect parameter validation in the do_tcp_getsockopt() function
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
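The pattern shared by this group of getsockopt() fixes can be reproduced in user space. The sketch below uses a simplified stand-in for min_t() and two hypothetical helpers, validate_old() and validate_new(); it is a model of the check ordering only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro: both operands are cast to
 * @type before the comparison. */
#define min_t(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Old order: clamp first, then test the sign. A negative optlen becomes
 * a huge unsigned value, the minimum is optsize, and the sign test can
 * never fire. */
int validate_old(int optlen, size_t optsize)
{
	int len = min_t(unsigned int, optlen, optsize);

	if (len < 0)
		return -EINVAL;
	return len;
}

/* Fixed order: test the signed value as read from optlen, then clamp. */
int validate_new(int optlen, size_t optsize)
{
	int len = optlen;

	if (len < 0)
		return -EINVAL;
	return min_t(unsigned int, len, optsize);
}
```

For non-negative lengths both orders give the same result; only the negative case changes, from being silently clamped to being rejected.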
David S. Miller [Mon, 11 Mar 2024 09:36:11 +0000 (09:36 +0000)]
Merge branch 'qmc-hdlc'
Herve Codina says:
====================
Add support for QMC HDLC
This series introduces the QMC HDLC support.
Patches were previously sent as part of a full feature series and were
previously reviewed in that context:
"Add support for QMC HDLC, framer infrastructure and PEF2256 framer" [1]
In order to ease the merge, the full feature series has been split and
needed parts were merged in v6.8-rc1:
- "Prepare the PowerQUICC QMC and TSA for the HDLC QMC driver" [2]
- "Add support for framer infrastructure and PEF2256 framer" [3]
This series contains patches related to the QMC HDLC part (QMC HDLC
driver):
- Introduce the QMC HDLC driver (patches 1 and 2)
- Add timeslots change support in QMC HDLC (patch 3)
- Add framer support as a framer consumer in QMC HDLC (patch 4)
Compared to the original full feature series, a modification was done on
patch 3 in order to use a coherent prefix in the commit title.
I kept the patches unsquashed as they were previously sent and reviewed.
Of course, I can squash them if needed.
Compared to the previous iteration:
https://lore.kernel.org/linux-kernel/20240306080726[email protected]/
this v7 series mainly:
- Rename a variable.
- Fix reverse xmas tree declarations.
- Add 'Acked-by' tag.
====================
Herve Codina [Thu, 7 Mar 2024 11:39:08 +0000 (12:39 +0100)]
net: wan: fsl_qmc_hdlc: Add framer support
Add framer support in the fsl_qmc_hdlc driver in order to be able to
signal carrier changes to the network stack based on the framer status.
Also use this framer to provide information related to the E1/T1 line
interface on IF_GET_IFACE and configure the line interface according to
IF_IFACE_{E1,T1} information.
David S. Miller [Mon, 11 Mar 2024 09:33:01 +0000 (09:33 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following: `ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6), gtpc(4|6)t, gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Jakub Kicinski [Sat, 9 Mar 2024 04:45:17 +0000 (20:45 -0800)]
Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.
We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.
The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.
We pick one device to be a primary (leader), and it fills a special
role. The other devices (secondaries) are disconnected from the network
at the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.
Currently, we limit the support to PFs only, and up to two devices
(sockets).
* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
Documentation: networking: Add description for multi-pf netdev
net/mlx5: Enable SD feature
net/mlx5e: Block TLS device offload on combined SD netdev
net/mlx5e: Support per-mdev queue counter
net/mlx5e: Support cross-vhca RSS
net/mlx5e: Let channels be SD-aware
net/mlx5e: Create EN core HW resources for all secondary devices
net/mlx5e: Create single netdev per SD group
net/mlx5: SD, Add debugfs
net/mlx5: SD, Add informative prints in kernel log
net/mlx5: SD, Implement steering for primary and secondaries
net/mlx5: SD, Implement devcom communication and primary election
net/mlx5: SD, Implement basic query and instantiation
net/mlx5: SD, Introduce SD lib
net/mlx5: Add MPIR bit in mcam_access_reg
====================
Jakub Kicinski [Sat, 9 Mar 2024 04:37:32 +0000 (20:37 -0800)]
Merge tag 'for-net-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- hci_conn: Only do ACL connections sequentially
- hci_core: Cancel request on command timeout
- Remove CONFIG_BT_HS
- btrtl: Add the support for RTL8852BT/RTL8852BE-VT
- btusb: Add support Mediatek MT7920
- btusb: Add new VID/PID 13d3/3602 for MT7925
- Add new quirk for broken read key length on ATS2851
* tag 'for-net-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (52 commits)
Bluetooth: hci_sync: Fix UAF in hci_acl_create_conn_sync
Bluetooth: Fix eir name length
Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
Bluetooth: Add new quirk for broken read key length on ATS2851
Bluetooth: mgmt: remove NULL check in add_ext_adv_params_complete()
Bluetooth: mgmt: remove NULL check in mgmt_set_connectable_complete()
Bluetooth: btusb: Add support Mediatek MT7920
Bluetooth: btmtk: Add MODULE_FIRMWARE() for MT7922
Bluetooth: btnxpuart: Fix btnxpuart_close
Bluetooth: ISO: Clean up returns values in iso_connect_ind()
Bluetooth: fix use-after-free in accessing skb after sending it
Bluetooth: af_bluetooth: Fix deadlock
Bluetooth: bnep: Fix out-of-bound access
Bluetooth: btusb: Fix memory leak
Bluetooth: msft: Fix memory leak
Bluetooth: hci_core: Fix possible buffer overflow
Bluetooth: btrtl: fix out of bounds memory access
Bluetooth: hci_h5: Add ability to allocate memory for private data
Bluetooth: hci_sync: Fix overwriting request callback
Bluetooth: hci_sync: Use QoS to determine which PHY to scan
...
====================
Various cross tree patches for ieee802154 drivers and a resource leak
fix for ieee802154 llsec.
Andy Shevchenko changed GPIO header usage for at86rf230 and mcr20a to
only include needed headers.
Bo Liu converted the at86rf230, mcr20a and mrf24j40 driver regmap
support to use the maple tree register cache.
Fedor Pchelkin fixed a resource leak in the llsec key deletion path.
Ricardo B. Marliere made wpan_phy_class const.
Tejun Heo removed WQ_UNBOUND from a workqueue call in ca8210.
* tag 'ieee802154-for-net-next-2024-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
ieee802154: cfg802154: make wpan_phy_class constant
ieee802154: mcr20a: Remove unused of_gpio.h
ieee802154: at86rf230: Replace of_gpio.h by proper one
mac802154: fix llsec key resources release in mac802154_llsec_key_del
ieee802154: ca8210: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call
net: ieee802154: mrf24j40: convert to use maple tree register cache
net: ieee802154: mcr20a: convert to use maple tree register cache
net: ieee802154: at86rf230: convert to use maple tree register cache
====================
Ido Schimmel [Thu, 7 Mar 2024 15:47:27 +0000 (17:47 +0200)]
nexthop: Simplify dump error handling
The only error that can happen during a nexthop dump is insufficient
space in the skb carrying the netlink messages (EMSGSIZE). If this happens
and some messages were already filled in, the nexthop code returns the
skb length to signal the netlink core that more objects need to be
dumped.
After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the
core") there is no need to handle this error in the nexthop code as it
is now handled in the core.
Simplify the code and simply return the error to the core.
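The resulting split of responsibilities can be sketched in user-space C. This is only an illustration under stated assumptions: struct fake_skb, struct dump_ctx, fill_one(), dump_objects(), core_dump_once() and the fixed MSG_SIZE are hypothetical names invented here, not the kernel's nexthop or netlink code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb with limited room for messages. */
struct fake_skb {
	size_t len;	/* bytes used so far */
	size_t cap;	/* total room */
};

/* Hypothetical per-dump state: where to resume on the next call. */
struct dump_ctx {
	int next_idx;	/* first object not yet dumped */
};

#define MSG_SIZE 16	/* assumed fixed message size, for illustration only */

/* Append one message, or fail with -EMSGSIZE when it does not fit. */
static int fill_one(struct fake_skb *skb)
{
	if (skb->cap - skb->len < MSG_SIZE)
		return -EMSGSIZE;
	skb->len += MSG_SIZE;
	return 0;
}

/* Dump code: no special casing, the error is simply propagated. */
static int dump_objects(struct fake_skb *skb, struct dump_ctx *ctx, int total)
{
	int err;

	for (; ctx->next_idx < total; ctx->next_idx++) {
		err = fill_one(skb);
		if (err)
			return err;	/* next_idx still points at the failed object */
	}
	return 0;
}

/* Core policy: -EMSGSIZE with a partially filled skb means "call again". */
static int core_dump_once(struct fake_skb *skb, struct dump_ctx *ctx, int total)
{
	int err = dump_objects(skb, ctx, total);

	if (err == -EMSGSIZE && skb->len > 0)
		return (int)skb->len;
	return err;
}
```

With the conversion kept in one place, every dump implementation can return the raw error and only needs to remember where to resume.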
Eric Dumazet [Thu, 7 Mar 2024 12:34:46 +0000 (12:34 +0000)]
net: add skb_data_unref() helper
Similar to skb_unref(), add skb_data_unref() to save an expensive
atomic operation (and cache line dirtying) when last reference
on shinfo->dataref is released.
I saw this opportunity on hosts with RAW sockets accidentally
bound to UDP protocol, forcing an skb_clone() on all received packets.
These RAW sockets had their receive queue full, so all clone
packets were immediately dropped.
When UDP recvmsg() later consumes the original skb, skb_release_data()
hits atomic_sub_return() quite badly, because skb->cloned
has been set permanently.
Note that this patch also helps TCP TX performance, because
the TCP stack also uses (fast) clones.
This means that at least one of the two packets (the main skb or
its clone) will no longer have to perform this atomic operation
in skb_release_data().
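The idea can be sketched in user-space C11. This is not the kernel helper: struct shared_data and data_unref() are hypothetical names, the kernel's separate accounting for header-less references is left out, and C11 atomics stand in for the kernel's atomic_t API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the shared data area of a buffer. */
struct shared_data {
	atomic_int dataref;	/* number of buffers sharing the data */
};

/*
 * Drop one reference. Returns true when the caller held the last
 * reference and must free the data.
 */
static bool data_unref(struct shared_data *sh, bool cloned)
{
	/* Never cloned: the caller is the sole owner, no atomic access needed. */
	if (!cloned)
		return true;

	/* Sole remaining owner: a plain load replaces the atomic decrement. */
	if (atomic_load_explicit(&sh->dataref, memory_order_acquire) == 1)
		return true;

	/* Otherwise pay for the read-modify-write; fetch_sub returns the old value. */
	return atomic_fetch_sub(&sh->dataref, 1) == 1;
}
```

The load-only path is safe in this sketch because a new reference can only be taken by someone who already holds one, so a count of one cannot grow behind the last owner's back.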
Jakub Kicinski [Fri, 8 Mar 2024 17:05:48 +0000 (09:05 -0800)]
Merge tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.9
The fourth "new features" pull request for v6.9 with changes both in
stack and in drivers. The theme in this pull request is to fix sparse
warnings, but we still have some left in the wireless subsystem. Otherwise
quite normal.
Major changes:
rtw89
* NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
* NL80211_EXT_FEATURE_SET_SCAN_DWELL support
rtw88
* support for more rtw8811cu and rtw8821cu devices
mt76
* mt76x2u: add Netgear WNDA3100v3 USB
* mt7915: newer ADIE version support
* mt7925: radio temperature sensor support
* mt7996: remove GCMP IGTK offload
* tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits)
wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode
wifi: rtw89: wow: set security engine options for 802.11ax chips only
wifi: rtw89: update suspend/resume for different generation
wifi: rtw89: wow: update config mac function with different generation
wifi: rtw89: update DMA function with different generation
wifi: rtw89: wow: update WoWLAN status register for different generation
wifi: rtw89: wow: update WoWLAN reason register for different chips
wifi: brcm80211: handle pmk_op allocation failure
wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR
wifi: rtw89: coex: When Bluetooth not available don't set power/gain
wifi: rtw89: coex: add return value to ensure H2C command is success or not
wifi: rtw89: coex: Reorder H2C command index to align with firmware
wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic
wifi: rtw89: coex: add init_info H2C command format version 7
wifi: rtw89: 8922a: add coexistence helpers of SW grant
wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt
wifi: cw1200: restore endian swapping
wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns
wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte
wifi: rtw88: 8821c: Fix false alarm count
...
====================
Bluetooth: hci_sync: Fix UAF in hci_acl_create_conn_sync
This fixes the following error caused by hci_conn being freed while
hci_acl_create_conn_sync is pending:
==================================================================
BUG: KASAN: slab-use-after-free in hci_acl_create_conn_sync+0xa7/0x2e0
Write of size 2 at addr ffff888002ae0036 by task kworker/u3:0/848
Frédéric Danis [Thu, 7 Mar 2024 16:42:05 +0000 (17:42 +0100)]
Bluetooth: Fix eir name length
According to Section 1.2 of Core Specification Supplement Part A the
complete or short name strings are defined as utf8s, which should not
include the trailing NULL for variable length array as defined in Core
Specification Vol1 Part E Section 2.9.3.
Removing the trailing NULL allows PTS to retrieve the random address based
on device name, e.g. for SM/PER/KDU/BV-02-C, SM/PER/KDU/BV-08-C or
GAP/BROB/BCST/BV-03-C.
Fixes: f61851f64b17 ("Bluetooth: Fix append max 11 bytes of name to scan rsp data")
Signed-off-by: Frédéric Danis <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Jie Wang [Thu, 7 Mar 2024 01:01:14 +0000 (09:01 +0800)]
net: hns3: fix port duplex configure error in IMP reset
Currently, the MAC port is always configured in full duplex mode in
hclge_mac_init() during driver initialization or reset restore. Users may
change the mode to half duplex with ethtool, so this can cause the user
configuration to be lost after a reset.
To fix it, don't change the duplex mode when resetting.
Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary")
Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Peiyang Wang [Thu, 7 Mar 2024 01:01:13 +0000 (09:01 +0800)]
net: hns3: fix reset timeout under full functions and queues
The cmdq reset command times out when all VFs are enabled and the queue is
full. The hardware processing time exceeds the timeout set by the driver.
To cover this extreme case, the driver extends the
reset timeout to 1 second.
Jijie Shao [Thu, 7 Mar 2024 01:01:12 +0000 (09:01 +0800)]
net: hns3: fix delete tc fail issue
When the tc is removed during reset, the hns3 driver returns an error code,
but the kernel ignores it. As a result,
the driver status is inconsistent with the kernel status.
This patch retains the deletion status when the deletion fails
and retries the deletion after the reset, so that
the status of the driver is consistent with that of the kernel.
Yonglong Liu [Thu, 7 Mar 2024 01:01:11 +0000 (09:01 +0800)]
net: hns3: fix kernel crash when 1588 is received on HIP08 devices
HIP08 devices do not register the ptp devices, so
hdev->ptp is NULL. The hardware can still receive 1588 messages
and set the HNS3_RXD_TS_VLD_B bit; in that case, the
access to hdev->ptp->flags causes a kernel crash:
Hao Lan [Thu, 7 Mar 2024 01:01:10 +0000 (09:01 +0800)]
net: hns3: Disable SerDes serial loopback for HiLink H60
When the HiLink version is H60, the SerDes serial loopback test is not
supported. This patch adds HiLink version detection. When the version
is H60, the SerDes serial loopback test is disabled.
Hao Lan [Thu, 7 Mar 2024 01:01:09 +0000 (09:01 +0800)]
net: hns3: add new 200G link modes for hisilicon device
The hisilicon device now supports a new 200G link interface,
which is queried from firmware through a new bit. Therefore,
the HCLGE_SUPPORT_200G_R4_BIT capability bit has been added.
HCLGE_SUPPORT_200G_BIT has been renamed to
HCLGE_SUPPORT_200G_R4_EXT_BIT, and the firmware has
extended support for this mode.
Jijie Shao [Thu, 7 Mar 2024 01:01:08 +0000 (09:01 +0800)]
net: hns3: fix wrong judgment condition issue
hns3_dcbnl_ieee_delapp() should check ieee_delapp, not ieee_setapp.
This patch fixes the wrong condition.
Fixes: 0ba22bcb222d ("net: hns3: add support config dscp map to tc")
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Fri, 8 Mar 2024 11:54:35 +0000 (11:54 +0000)]
Merge branch 'ionic-diet'
Shannon Nelson says:
====================
ionic: putting ionic on a diet
Building on the performance work done in the previous patchset
[Link] https://lore.kernel.org/netdev/20240229193935[email protected]/
this patchset puts the ionic driver on a diet, decreasing the memory
requirements per queue, and simplifies a few more bits of logic.
We trimmed the queue management structs and gained some ground, but
the most savings came from trimming the individual buffer descriptors.
The original design used a single generic buffer descriptor for Tx, Rx and
Adminq needs, but the Rx and Adminq descriptors really don't need all the
info that the Tx descriptors track. By splitting up the descriptor types
we can significantly reduce the descriptor sizes for Rx and Adminq use.
There is a small reduction in the queue management structs, saving about
3 cachelines per queuepair:
Shannon Nelson [Wed, 6 Mar 2024 23:29:58 +0000 (15:29 -0800)]
ionic: better dma-map error handling
Fix up a couple of small dma_addr handling issues
- don't double-count dma-map-err stat in ionic_tx_map_skb()
or ionic_xdp_post_frame()
- return 0 on error from both ionic_tx_map_single() and
ionic_tx_map_frag() and check for !dma_addr in ionic_tx_map_skb()
and ionic_xdp_post_frame()
- be sure to unmap buf_info[0] in ionic_tx_map_skb() error path
- don't assign rx buf->dma_addr until error checked in ionic_rx_page_alloc()
- remove unnecessary dma_addr_t casts
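The error-handling pattern in the list above can be sketched in user-space C. Everything here is an assumption made for illustration: map_single(), unmap_single(), map_all(), struct buf_info and struct queue_stats are hypothetical stand-ins, dma_addr_t is redefined locally, and the -EIO return value is chosen arbitrarily rather than taken from the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* local stand-in for the kernel type */

struct buf_info {
	dma_addr_t dma_addr;
};

struct queue_stats {
	unsigned int dma_map_err;
};

/*
 * Hypothetical mapper: returns 0 on failure (here: a NULL buffer) and
 * counts the error exactly once, in this one place.
 */
static dma_addr_t map_single(struct queue_stats *stats, const void *data)
{
	if (!data) {
		stats->dma_map_err++;
		return 0;
	}
	return (dma_addr_t)(uintptr_t)data;
}

static void unmap_single(struct buf_info *buf)
{
	buf->dma_addr = 0;
}

/* Map every fragment; on failure unwind all earlier mappings, index 0 included. */
static int map_all(struct queue_stats *stats, struct buf_info *bufs,
		   const void *const *frags, size_t nfrags)
{
	size_t i;

	for (i = 0; i < nfrags; i++) {
		dma_addr_t addr = map_single(stats, frags[i]);

		if (!addr)
			goto err_unwind;	/* the mapper already counted the error */
		bufs[i].dma_addr = addr;	/* assign only after the check */
	}
	return 0;

err_unwind:
	while (i > 0)
		unmap_single(&bufs[--i]);
	return -EIO;
}
```

Counting the failure only inside the mapper is what prevents the double count, and storing the address only after the check keeps a failed mapping from ever being recorded in buf_info.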
Shannon Nelson [Wed, 6 Mar 2024 23:29:54 +0000 (15:29 -0800)]
ionic: carry idev in ionic_cq struct
Remove the idev field from ionic_queue, which saves us a
bit of space, and add it into ionic_cq where there's room
within some cacheline padding. Use this pointer rather
than doing a multi level reference from lif->ionic.
Shannon Nelson [Wed, 6 Mar 2024 23:29:53 +0000 (15:29 -0800)]
ionic: refactor skb building
The existing ionic_rx_frags() code is a bit of a mess and can
be cleaned up by unrolling the first frag/header setup from
the loop, then reworking the do-while-loop into a for-loop. We
rename the function to a more descriptive ionic_rx_build_skb().
We also change a couple of related variable names for readability.
Shannon Nelson [Wed, 6 Mar 2024 23:29:51 +0000 (15:29 -0800)]
ionic: use specialized desc info structs
Make desc_info structure specific to the queue type, which
allows us to cut down the Rx and AdminQ descriptor sizes by
not including all the fields needed for the Tx descriptors.
Shannon Nelson [Wed, 6 Mar 2024 23:29:50 +0000 (15:29 -0800)]
ionic: remove the cq_info to save more memory
With a little simple math we don't need another struct array to
find the completion structs, so we can remove the ionic_cq_info
altogether. This doesn't really save anything in the ionic_cq
since it gets padded out to the cacheline, but it does remove
the parallel array allocation of 8 * num_descriptors, or about
8 Kbytes per queue in a default configuration.
Shannon Nelson [Wed, 6 Mar 2024 23:29:49 +0000 (15:29 -0800)]
ionic: remove callback pointer from desc_info
By reworking the queue service routines to have their own
servicing loops we can remove the cb pointer from desc_info
to save another 8 bytes per descriptor.
This simplifies some of the queue handling indirection and makes
the code a little easier to follow, and keeps service code in
one place rather than jumping between code files.
Shannon Nelson [Wed, 6 Mar 2024 23:29:46 +0000 (15:29 -0800)]
ionic: remove desc, sg_desc and cmb_desc from desc_info
Remove the struct pointers from desc_info to use less space.
Instead of pointers in every desc_info to its descriptor,
we can use the queue descriptor index to find the individual
desc, desc_info, and sgl structs in their parallel arrays.
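A minimal sketch of index-based lookup over parallel arrays, in user-space C. The struct layouts and the q_desc(), q_sg_desc(), q_info() and q_next_idx() helpers are hypothetical simplifications, not the ionic definitions, and the ring size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical, simplified layouts; real descriptors are device-defined. */
struct tx_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

struct tx_sg_desc {
	uint64_t addr[8];
};

/* Per-descriptor bookkeeping, now without pointers to its descriptors. */
struct tx_desc_info {
	unsigned int nbufs;
};

struct queue {
	unsigned int num_descs;		/* ring size, assumed power of two */
	unsigned int head_idx;
	struct tx_desc *descs;		/* parallel arrays, each num_descs long */
	struct tx_sg_desc *sg_descs;
	struct tx_desc_info *info;
};

/* One index locates the matching entry in every parallel array. */
static struct tx_desc *q_desc(const struct queue *q, unsigned int idx)
{
	return &q->descs[idx];
}

static struct tx_sg_desc *q_sg_desc(const struct queue *q, unsigned int idx)
{
	return &q->sg_descs[idx];
}

static struct tx_desc_info *q_info(const struct queue *q, unsigned int idx)
{
	return &q->info[idx];
}

/* Advance around the ring; relies on num_descs being a power of two. */
static unsigned int q_next_idx(const struct queue *q, unsigned int idx)
{
	return (idx + 1) & (q->num_descs - 1);
}
```

On a 64-bit build each stored pointer costs 8 bytes, so dropping three of them from every bookkeeping entry in a layout like this saves 24 bytes per descriptor at the price of one array index computation per access.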