]> Git Repo - linux.git/log
linux.git
5 months agomlx4: Add support for persistent NAPI config to RX CQs
Joe Damato [Fri, 11 Oct 2024 18:45:04 +0000 (18:45 +0000)]
mlx4: Add support for persistent NAPI config to RX CQs

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing RX CQ NAPIs.

Presently, struct napi_config only has support for two fields used for
RX, so there is no need to support them with TX CQs, yet.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agomlx5: Add support for persistent NAPI config
Joe Damato [Fri, 11 Oct 2024 18:45:03 +0000 (18:45 +0000)]
mlx5: Add support for persistent NAPI config

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agobnxt: Add support for persistent NAPI config
Joe Damato [Fri, 11 Oct 2024 18:45:02 +0000 (18:45 +0000)]
bnxt: Add support for persistent NAPI config

Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetdev-genl: Support setting per-NAPI config values
Joe Damato [Fri, 11 Oct 2024 18:45:01 +0000 (18:45 +0000)]
netdev-genl: Support setting per-NAPI config values

Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: napi: Add napi_config
Joe Damato [Fri, 11 Oct 2024 18:45:00 +0000 (18:45 +0000)]
net: napi: Add napi_config

Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.

napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).

Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.

Per-NAPI settings are saved in napi_disable and restored in napi_enable.

Co-developed-by: Martin Karsten <[email protected]>
Signed-off-by: Martin Karsten <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetdev-genl: Dump gro_flush_timeout
Joe Damato [Fri, 11 Oct 2024 18:44:59 +0000 (18:44 +0000)]
netdev-genl: Dump gro_flush_timeout

Support dumping gro_flush_timeout for a NAPI ID.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: napi: Make gro_flush_timeout per-NAPI
Joe Damato [Fri, 11 Oct 2024 18:44:58 +0000 (18:44 +0000)]
net: napi: Make gro_flush_timeout per-NAPI

Allow per-NAPI gro_flush_timeout setting.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.

The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetdev-genl: Dump napi_defer_hard_irqs
Joe Damato [Fri, 11 Oct 2024 18:44:57 +0000 (18:44 +0000)]
netdev-genl: Dump napi_defer_hard_irqs

Support dumping defer_hard_irqs for a NAPI ID.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: napi: Make napi_defer_hard_irqs per-NAPI
Joe Damato [Fri, 11 Oct 2024 18:44:56 +0000 (18:44 +0000)]
net: napi: Make napi_defer_hard_irqs per-NAPI

Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.

The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phylink: allow half-duplex modes with RATE_MATCH_PAUSE
Daniel Golle [Fri, 11 Oct 2024 03:40:39 +0000 (04:40 +0100)]
net: phylink: allow half-duplex modes with RATE_MATCH_PAUSE

PHYs performing rate-matching using MAC-side flow-control always
perform duplex-matching as well in case they are supporting
half-duplex modes at all.
No longer remove half-duplex modes from their capabilities.

Suggested-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://patch.msgid.link/b157c0c289cfba024039a96e635d037f9d946745.1728617993.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'tcp-add-skb-sk-to-more-control-packets'
Jakub Kicinski [Tue, 15 Oct 2024 00:39:38 +0000 (17:39 -0700)]
Merge branch 'tcp-add-skb-sk-to-more-control-packets'

Eric Dumazet says:

====================
tcp: add skb->sk to more control packets

Currently, TCP can set skb->sk for a variety of transmit packets.

However, packets sent on behalf of a TIME_WAIT sockets do not
have an attached socket.

Same issue for RST packets.

We want to change this, in order to increase eBPF program
capabilities.

This is slightly risky, because various layers could
be confused by TIME_WAIT sockets showing up in skb->sk.

v2: audited all sk_to_full_sk() users and addressed Martin feedback.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv4: tcp: give socket pointer to control skbs
Eric Dumazet [Thu, 10 Oct 2024 17:48:17 +0000 (17:48 +0000)]
ipv4: tcp: give socket pointer to control skbs

ip_send_unicast_reply() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Brian Vazquez <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv6: tcp: give socket pointer to control skbs
Eric Dumazet [Thu, 10 Oct 2024 17:48:16 +0000 (17:48 +0000)]
ipv6: tcp: give socket pointer to control skbs

tcp_v6_send_response() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Brian Vazquez <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: add skb_set_owner_edemux() helper
Eric Dumazet [Thu, 10 Oct 2024 17:48:15 +0000 (17:48 +0000)]
net: add skb_set_owner_edemux() helper

This can be used to attach a socket to an skb,
taking a reference on sk->sk_refcnt.

This helper might be a NOP if sk->sk_refcnt is zero.

Use it from tcp_make_synack().

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Brian Vazquez <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet_sched: sch_fq: prepare for TIME_WAIT sockets
Eric Dumazet [Thu, 10 Oct 2024 17:48:14 +0000 (17:48 +0000)]
net_sched: sch_fq: prepare for TIME_WAIT sockets

TCP stack is not attaching skb to TIME_WAIT sockets yet,
but we would like to allow this in the future.

Add sk_listener_or_tw() helper to detect the three states
that FQ needs to take care.

Like NEW_SYN_RECV, TIME_WAIT are not full sockets and
do not contain sk->sk_pacing_status, sk->sk_pacing_rate.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Brian Vazquez <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: add TIME_WAIT logic to sk_to_full_sk()
Eric Dumazet [Thu, 10 Oct 2024 17:48:13 +0000 (17:48 +0000)]
net: add TIME_WAIT logic to sk_to_full_sk()

TCP will soon attach TIME_WAIT sockets to some ACK and RST.

Make sure sk_to_full_sk() detects this and does not return
a non full socket.

v3: also changed sk_const_to_full_sk()

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Brian Vazquez <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotg3: Address byte-order miss-matches
Simon Horman [Wed, 9 Oct 2024 09:40:10 +0000 (10:40 +0100)]
tg3: Address byte-order miss-matches

Address byte-order miss-matches flagged by Sparse.

In tg3_load_firmware_cpu() and tg3_get_device_address()
this is done using appropriate types to store big endian values.

In the cases of tg3_test_nvram(), where buf is an array which
contains values of several different types, cast to __le32
before converting values to host byte order.

Reported by Sparse as:
.../tg3.c:3745:34: warning: cast to restricted __be32
.../tg3.c:13096:21: warning: cast to restricted __le32
.../tg3.c:13096:21: warning: cast from restricted __be32
.../tg3.c:13101:21: warning: cast to restricted __le32
.../tg3.c:13101:21: warning: cast from restricted __be32
.../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17070:63:    expected restricted __be32 [usertype] *val
.../tg3.c:17070:63:    got unsigned int *
dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17071:63:    expected restricted __be32 [usertype] *val
.../tg3.c:17071:63:    got unsigned int *

Also, address white-space issues on lines modified for the above.
And, for consistency, lines adjacent to them.

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'net-ti-ethernet-warnings'
David S. Miller [Mon, 14 Oct 2024 12:20:41 +0000 (13:20 +0100)]
Merge branch 'net-ti-ethernet-warnings'

Simon Horman says:

====================
net: ethernet: ti: Address some warnings

This patchset addresses some warnings flagged by Sparse, and clang-18 in
TI Ethernet drivers.

Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
tooling will stand out more readily.

Compile tested only.

---
Changes in v2:
- Dropped patch to directly address __percpu Sparse warnings and, instead
- Add patch to use tstats
- Added tags
- Thanks to all for the review of v1
- Link to v1: https://lore.kernel.org/r/20240910-ti-warn-v1-0-afd1e404abbe@kernel.org
====================

Signed-off-by: David S. Miller <[email protected]>
5 months agonet: ethernet: ti: cpsw_ale: Remove unused accessor functions
Simon Horman [Thu, 10 Oct 2024 11:04:12 +0000 (12:04 +0100)]
net: ethernet: ti: cpsw_ale: Remove unused accessor functions

W=1 builds flag that some accessor functions for ALE fields are unused.

Address this by splitting up the macros used to define these
accessors to allow only those that are used to be declared.

The warnings are verbose, but for example, the mcast_state case is
flagged by clang-18 as:

.../cpsw_ale.c:220:1: warning: unused function 'cpsw_ale_get_mcast_state' [-Wunused-function]
  220 | DEFINE_ALE_FIELD(mcast_state,           62,     2)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../cpsw_ale.c:145:19: note: expanded from macro 'DEFINE_ALE_FIELD'
  145 | static inline int cpsw_ale_get_##name(u32 *ale_entry)                   \
      |                   ^~~~~~~~~~~~~~~~~~~
<scratch space>:196:1: note: expanded from here
  196 | cpsw_ale_get_mcast_state
      | ^~~~~~~~~~~~~~~~~~~~~~~~

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: ethernet: ti: am65-cpsw: Use tstats instead of open coded version
Simon Horman [Thu, 10 Oct 2024 11:04:11 +0000 (12:04 +0100)]
net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version

Make use of struct pcpu_sw_netstats and related helpers to handle
existing per-cpu stats for this driver - the exact same counters
are maintained.

A side effect of this change is to address __percpu warnings
flagged by Sparse:

.../am65-cpsw-nuss.c:2658:55: warning: incorrect type in initializer (different address spaces)
.../am65-cpsw-nuss.c:2658:55:    expected struct am65_cpsw_ndev_stats [noderef] __percpu *stats
.../am65-cpsw-nuss.c:2658:55:    got void *data
.../am65-cpsw-nuss.c:2781:15: warning: incorrect type in argument 3 (different address spaces)
.../am65-cpsw-nuss.c:2781:15:    expected void *data
.../am65-cpsw-nuss.c:2781:15:    got struct am65_cpsw_ndev_stats [noderef] __percpu *stats

Compile tested only.
No functional change intended.

Suggested-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Simon Horman <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: ethernet: ti: am65-cpsw: Use __be64 type for id_temp
Simon Horman [Thu, 10 Oct 2024 11:04:10 +0000 (12:04 +0100)]
net: ethernet: ti: am65-cpsw: Use __be64 type for id_temp

The id_temp local variable in am65_cpsw_nuss_probe() is
used to hold a 64-bit big-endian value as it is assigned using
cpu_to_be64().

It is read using memcpy(), where it is written as an identifier into a
byte-array.  So this can also be treated as big endian.

As it's type is currently host byte order (u64), sparse flags
an endian mismatch when compiling for little-endian systems:

.../am65-cpsw-nuss.c:3454:17: warning: incorrect type in assignment (different base types)
.../am65-cpsw-nuss.c:3454:17:    expected unsigned long long [usertype] id_temp
.../am65-cpsw-nuss.c:3454:17:    got restricted __be64 [usertype]

Address this by using __be64 as the type of id_temp.

No functional change intended.
Compile tested only.

Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agor8169: enable SG/TSO on selected chip versions per default
Heiner Kallweit [Thu, 10 Oct 2024 10:58:02 +0000 (12:58 +0200)]
r8169: enable SG/TSO on selected chip versions per default

Due to problem reports in the past SG and TSO/TSO6 are disabled per
default. It's not fully clear which chip versions are affected, so we
may impact also users of unaffected chip versions, unless they know
how to use ethtool for enabling SG/TSO/TSO6.
Vendor drivers r8168/r8125 enable SG/TSO/TSO6 for selected chip
versions per default, I'd interpret this as confirmation that these
chip versions are unaffected. So let's do the same here.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: hsr: convert to use new timer APIs
Yu Liao [Thu, 10 Oct 2024 09:27:44 +0000 (17:27 +0800)]
net: hsr: convert to use new timer APIs

del_timer() and del_timer_sync() have been renamed to timer_delete()
and timer_delete_sync().

Inconsistent API usage makes the code a bit confusing, so replace with
the new APIs.

No functional changes intended.

Signed-off-by: Yu Liao <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agoMerge branch 'ethtool-write-firmware'
David S. Miller [Sun, 13 Oct 2024 17:02:50 +0000 (18:02 +0100)]
Merge branch 'ethtool-write-firmware'

Danielle Ratson says:

====================
ethtool: Add support for writing firmware

In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.

EPL payloads are used for more complex and extensive management functions
that require a larger amount of data, so writing firmware blocks using EPL
is much more efficient.

Currently, only LPL payload is supported for writing firmware blocks to
the module.

Add support for writing firmware block using EPL payload, both to support
modules that support only EPL write mechanism, and to optimize the flashing
process of modules that support LPL and EPL.

Running the flashing command on the same sample module using EPL vs. LPL
showed an improvement of 84%.

Patchset overview:
Patch #1: preparations
Patch #2: Add EPL support

v5: Resending- no changes.

v4: Resending the right version after wrong v3.
    No changes from v2.

v2:
* Fix the commit meassges to align the cover letter about the
  right meaning of LPL and EPL.
Patch #2:
* Initialize the variable 'bytes_written' before the first
  iteration.
====================

Signed-off-by: David S. Miller <[email protected]>
5 months agonet: ethtool: Add support for writing firmware blocks using EPL payload
Danielle Ratson [Wed, 9 Oct 2024 10:53:47 +0000 (13:53 +0300)]
net: ethtool: Add support for writing firmware blocks using EPL payload

In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.

EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.

Currently, only LPL payload is supported for writing firmware blocks to
the module.

Add support for writing firmware block using EPL payload, both to
support modules that supports only EPL write mechanism, and to optimize
the flashing process of modules that support LPL and EPL.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: ethtool: Add new parameters and a function to support EPL
Danielle Ratson [Wed, 9 Oct 2024 10:53:46 +0000 (13:53 +0300)]
net: ethtool: Add new parameters and a function to support EPL

In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.

EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.

Currently, only LPL payload is supported for writing firmware blocks to
the module.

Add EPL related parameters to the function ethtool_cmis_cdb_compose_args()
and add a specific function for calculating the maximum allowable length
extension for EPL. Both will be used in the next patch to add support for
writing firmware blocks using EPL.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agoMerge branch 'vxlan-skb-drop-reasons'
David S. Miller [Sun, 13 Oct 2024 10:33:10 +0000 (11:33 +0100)]
Merge branch 'vxlan-skb-drop-reasons'

Menglong Dong says:

====================
net: vxlan: add skb drop reasons support

In this series, we add skb drop reasons support to VXLAN, and following
new skb drop reasons are introduced:

  SKB_DROP_REASON_VXLAN_INVALID_HDR
  SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
  SKB_DROP_REASON_VXLAN_ENTRY_EXISTS
  SKB_DROP_REASON_VXLAN_NO_REMOTE
  SKB_DROP_REASON_MAC_INVALID_SOURCE
  SKB_DROP_REASON_IP_TUNNEL_ECN
  SKB_DROP_REASON_TUNNEL_TXINFO
  SKB_DROP_REASON_LOCAL_MAC

We add some helper functions in this series, who will capture the drop
reasons from pskb_may_pull_reason and return them:

  pskb_network_may_pull_reason()
  pskb_inet_may_pull_reason()

And we also make the following functions return skb drop reasons:

  skb_vlan_inet_prepare()
  vxlan_remcsum()
  vxlan_snoop()
  vxlan_set_mac()

Changes since v6:
- fix some typos in the document for SKB_DROP_REASON_TUNNEL_TXINFO

Changes since v5:
- fix some typos in the document for SKB_DROP_REASON_TUNNEL_TXINFO

Changes since v4:
- make skb_vlan_inet_prepare() return drop reasons, instead of introduce
  a wrapper for it in the 3rd patch.
- modify the document for SKB_DROP_REASON_LOCAL_MAC and
  SKB_DROP_REASON_TUNNEL_TXINFO.

Changes since v3:
- rename SKB_DROP_REASON_VXLAN_INVALID_SMAC to
  SKB_DROP_REASON_MAC_INVALID_SOURCE in the 6th patch

Changes since v2:
- move all the drop reasons of VXLAN to the "core", instead of introducing
  the VXLAN drop reason subsystem
- add the 6th patch, which capture the drop reasons from vxlan_snoop()
- move the commits for vxlan_remcsum() and vxlan_set_mac() after
  vxlan_rcv() to update the call of them accordingly
- fix some format problems

Changes since v1:
- document all the drop reasons that we introduce
- rename the drop reasons to make them more descriptive, as Ido advised
- remove the 2nd patch, which introduce the SKB_DR_RESET
- add the 4th patch, which adds skb_vlan_inet_prepare_reason() helper
- introduce the 6th patch, which make vxlan_set_mac return drop reasons
- introduce the 10th patch, which uses VXLAN_DROP_NO_REMOTE as the drop
  reasons, as Ido advised
====================

Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: use kfree_skb_reason() in encap_bypass_if_local()
Menglong Dong [Wed, 9 Oct 2024 02:28:30 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()

Replace kfree_skb() with kfree_skb_reason() in encap_bypass_if_local, and
no new skb drop reason is added in this commit.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()
Menglong Dong [Wed, 9 Oct 2024 02:28:29 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()

Replace kfree_skb with kfree_skb_reason in vxlan_encap_bypass, and no new
skb drop reason is added in this commit.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: use kfree_skb_reason() in vxlan_mdb_xmit()
Menglong Dong [Wed, 9 Oct 2024 02:28:28 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_mdb_xmit()

Replace kfree_skb() with kfree_skb_reason() in vxlan_mdb_xmit. No drop
reasons are introduced in this commit.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: add drop reasons support to vxlan_xmit_one()
Menglong Dong [Wed, 9 Oct 2024 02:28:27 +0000 (10:28 +0800)]
net: vxlan: add drop reasons support to vxlan_xmit_one()

Replace kfree_skb/dev_kfree_skb with kfree_skb_reason in vxlan_xmit_one.
No drop reasons are introduced in this commit.

The only concern of mine is replacing dev_kfree_skb with
kfree_skb_reason. The dev_kfree_skb is equal to consume_skb, and I'm not
sure if we can change it to kfree_skb here. In my option, the skb is
"dropped" here, isn't it?

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: use kfree_skb_reason() in vxlan_xmit()
Menglong Dong [Wed, 9 Oct 2024 02:28:26 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_xmit()

Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following
new skb drop reasons are introduced for vxlan:

/* no remote found for xmit */
SKB_DROP_REASON_VXLAN_NO_REMOTE
/* packet without necessary metadata reached a device which is
 * in "external" mode
 */
SKB_DROP_REASON_TUNNEL_TXINFO

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: make vxlan_set_mac() return drop reasons
Menglong Dong [Wed, 9 Oct 2024 02:28:25 +0000 (10:28 +0800)]
net: vxlan: make vxlan_set_mac() return drop reasons

Change the return type of vxlan_set_mac() from bool to enum
skb_drop_reason. In this commit, the drop reason
"SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source
mac of the packet is a local mac.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: make vxlan_snoop() return drop reasons
Menglong Dong [Wed, 9 Oct 2024 02:28:24 +0000 (10:28 +0800)]
net: vxlan: make vxlan_snoop() return drop reasons

Change the return type of vxlan_snoop() from bool to enum
skb_drop_reason. In this commit, two drop reasons are introduced:

  SKB_DROP_REASON_MAC_INVALID_SOURCE
  SKB_DROP_REASON_VXLAN_ENTRY_EXISTS

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: make vxlan_remcsum() return drop reasons
Menglong Dong [Wed, 9 Oct 2024 02:28:23 +0000 (10:28 +0800)]
net: vxlan: make vxlan_remcsum() return drop reasons

Make vxlan_remcsum() support skb drop reasons by changing the return
value type of it from bool to enum skb_drop_reason.

The only drop reason in vxlan_remcsum() comes from pskb_may_pull_reason(),
so we just return it.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: vxlan: add skb drop reasons to vxlan_rcv()
Menglong Dong [Wed, 9 Oct 2024 02:28:22 +0000 (10:28 +0800)]
net: vxlan: add skb drop reasons to vxlan_rcv()

Introduce skb drop reasons to the function vxlan_rcv(). Following new
drop reasons are added:

  SKB_DROP_REASON_VXLAN_INVALID_HDR
  SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
  SKB_DROP_REASON_IP_TUNNEL_ECN

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: tunnel: make skb_vlan_inet_prepare() return drop reasons
Menglong Dong [Wed, 9 Oct 2024 02:28:21 +0000 (10:28 +0800)]
net: tunnel: make skb_vlan_inet_prepare() return drop reasons

Make skb_vlan_inet_prepare return the skb drop reasons, which is just
what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of
it.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: tunnel: add pskb_inet_may_pull_reason() helper
Menglong Dong [Wed, 9 Oct 2024 02:28:20 +0000 (10:28 +0800)]
net: tunnel: add pskb_inet_may_pull_reason() helper

Introduce the function pskb_inet_may_pull_reason() and make
pskb_inet_may_pull a simple inline call to it. The drop reasons of it just
come from pskb_may_pull_reason().

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: skb: add pskb_network_may_pull_reason() helper
Menglong Dong [Wed, 9 Oct 2024 02:28:19 +0000 (10:28 +0800)]
net: skb: add pskb_network_may_pull_reason() helper

Introduce the function pskb_network_may_pull_reason() and make
pskb_network_may_pull() a simple inline call to it. The drop reasons of
it just come from pskb_may_pull_reason.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agonet: bcmasp: enable SW timestamping
Justin Chen [Thu, 10 Oct 2024 22:15:06 +0000 (15:15 -0700)]
net: bcmasp: enable SW timestamping

Add skb_tx_timestamp() call and enable support for SW
timestamping.

Signed-off-by: Justin Chen <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: broadcom: remove select MII from brcmstb Ethernet drivers
Justin Chen [Thu, 10 Oct 2024 19:13:32 +0000 (12:13 -0700)]
net: broadcom: remove select MII from brcmstb Ethernet drivers

The MII driver isn't used by brcmstb Ethernet drivers. Remove it
from the BCMASP, GENET, and SYSTEMPORT drivers.

Signed-off-by: Justin Chen <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'microchip_t1s-update-on-microchip-10base-t1s-phy-driver'
Jakub Kicinski [Fri, 11 Oct 2024 22:54:42 +0000 (15:54 -0700)]
Merge branch 'microchip_t1s-update-on-microchip-10base-t1s-phy-driver'

Parthiban Veerasooran says:

====================
microchip_t1s: Update on Microchip 10BASE-T1S PHY driver

This patch series contain the below updates:
- Restructured lan865x_write_cfg_params() and lan865x_read_cfg_params()
  functions arguments to more generic.
- Updated new/improved initial settings of LAN865X Rev.B0 from latest
  AN1760.
- Added support for LAN865X Rev.B1 from latest AN1760.
- Moved LAN867X reset handling to a new function for flexibility.
- Added support for LAN867X Rev.C1/C2 from latest AN1699.
- Disabled/enabled collision detection based on PLCA setting.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: configure collision detection based on PLCA mode
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:05 +0000 (13:52 +0530)]
net: phy: microchip_t1s: configure collision detection based on PLCA mode

As per LAN8650/1 Rev.B0/B1 AN1760 (Revision F (DS60001760G - June 2024))
and LAN8670/1/2 Rev.C1/C2 AN1699 (Revision E (DS60001699F - June 2024)),
under normal operation, the device should be operated in PLCA mode.
Disabling collision detection is recommended to allow the device to
operate in noisy environments or when reflections and other inherent
transmission line distortion cause poor signal quality. Collision
detection must be re-enabled if the device is configured to operate in
CSMA/CD mode.

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C2
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:04 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C2

Add support for LAN8670/1/2 Rev.C2 as per the latest configuration note
AN1699 released (Revision E (DS60001699F - June 2024)) for Rev.C1 is also
applicable for Rev.C2. Refer hardware revisions list in the latest AN1699
Revision E (DS60001699F - June 2024).
https://www.microchip.com/en-us/application-notes/an1699

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C1
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:03 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C1

Add support for LAN8670/1/2 Rev.C1 as per the latest configuration note
AN1699 released (Revision E (DS60001699F - June 2024)).
https://www.microchip.com/en-us/application-notes/an1699

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: move LAN867X reset handling to a new function
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:02 +0000 (13:52 +0530)]
net: phy: microchip_t1s: move LAN867X reset handling to a new function

Move LAN867X reset handling code to a new function called
lan867x_check_reset_complete() which will be useful for the next patch
which also uses the same code to handle the reset functionality.

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: add support for Microchip's LAN865X Rev.B1
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:01 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN865X Rev.B1

Add support for LAN8650/1 Rev.B1. As per the latest configuration note
AN1760 released (Revision F (DS60001760G - June 2024)) for Rev.B0 is also
applicable for Rev.B1. Refer hardware revisions list in the latest AN1760
Revision F (DS60001760G - June 2024).
https://www.microchip.com/en-us/application-notes/an1760

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: update new initial settings for LAN865X Rev.B0
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:00 +0000 (13:52 +0530)]
net: phy: microchip_t1s: update new initial settings for LAN865X Rev.B0

Update the new/improved initial settings from the latest configuration
application note AN1760 released for LAN8650/1 Rev.B0 Revision F
(DS60001760G - June 2024).
https://www.microchip.com/en-us/application-notes/an1760

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: microchip_t1s: restructure cfg read/write functions arguments
Parthiban Veerasooran [Thu, 10 Oct 2024 08:21:59 +0000 (13:51 +0530)]
net: phy: microchip_t1s: restructure cfg read/write functions arguments

Restructure lan865x_write_cfg_params() and lan865x_read_cfg_params()
functions arguments to more generic which will be useful for the next
patch which updates the improved initial configuration for LAN8650/1
Rev.B0 published in the Configuration Note.

Signed-off-by: Parthiban Veerasooran <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoselftests: drv-net: add missing trailing backslash
Jakub Kicinski [Thu, 10 Oct 2024 21:18:57 +0000 (14:18 -0700)]
selftests: drv-net: add missing trailing backslash

Commit b3ea416419c8 ("testing: net-drv: add basic shaper test")
removed the trailing backslash from the last entry. We have
a terminating comment here to avoid having to modify the last
line when adding at the end.

Reviewed-by: Joe Damato <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'netdevsim-better-ipsec-output-format'
Jakub Kicinski [Fri, 11 Oct 2024 22:44:29 +0000 (15:44 -0700)]
Merge branch 'netdevsim-better-ipsec-output-format'

Hangbin Liu says:

====================
netdevsim: better ipsec output format

The first 2 patches improve the netdevsim ipsec debug output with better
format. The 3rd patch update the selftests.

v2: update rtnetlink selftest with new output format (Stanislav Fomichev, Jakub Kicinski)
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoselftests: rtnetlink: update netdevsim ipsec output format
Hangbin Liu [Thu, 10 Oct 2024 04:00:27 +0000 (04:00 +0000)]
selftests: rtnetlink: update netdevsim ipsec output format

After the netdevsim update to use human-readable IP address formats for
IPsec, we can now use the source and destination IPs directly in testing.
Here is the result:
  # ./rtnetlink.sh -t kci_test_ipsec_offload
  PASS: ipsec_offload

Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetdevsim: copy addresses for both in and out paths
Hangbin Liu [Thu, 10 Oct 2024 04:00:26 +0000 (04:00 +0000)]
netdevsim: copy addresses for both in and out paths

The current code only copies the address for the in path, leaving the out
path address set to 0. This patch corrects the issue by copying the addresses
for both the in and out paths. Before this patch:

  # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] tx ipaddr=0.0.0.0
  sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] rx ipaddr=192.168.0.1
  sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

After this patch:

  = cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] tx ipaddr=192.168.0.2
  sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] rx ipaddr=192.168.0.1
  sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

Fixes: 7699353da875 ("netdevsim: add ipsec offload testing")
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetdevsim: print human readable IP address
Hangbin Liu [Thu, 10 Oct 2024 04:00:25 +0000 (04:00 +0000)]
netdevsim: print human readable IP address

Currently, IPSec addresses are printed in hexadecimal format, which is
not user-friendly. e.g.

  # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
  SA count=2 tx=20
  sa[0] rx ipaddr=0x00000000 00000000 00000000 0100a8c0
  sa[0]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
  sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
  sa[1] tx ipaddr=0x00000000 00000000 00000000 00000000
  sa[1]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
  sa[1]    key=0x3167608a ca4f1397 43565909 941fa627

This patch updates the code to print the IPSec address in a human-readable
format for easier debug. e.g.

 # cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
 SA count=4 tx=40
 sa[0] tx ipaddr=0.0.0.0
 sa[0]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
 sa[0]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[1] rx ipaddr=192.168.0.1
 sa[1]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
 sa[1]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[2] tx ipaddr=::
 sa[2]    spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
 sa[2]    key=0x3167608a ca4f1397 43565909 941fa627
 sa[3] rx ipaddr=2000::1
 sa[3]    spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
 sa[3]    key=0x3167608a ca4f1397 43565909 941fa627

Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: dsa: mv88e6xxx: Fix uninitialised err value
Aryan Srivastava [Wed, 9 Oct 2024 21:23:19 +0000 (10:23 +1300)]
net: dsa: mv88e6xxx: Fix uninitialised err value

The err value in mv88e6xxx_region_atu_snapshot is now potentially
uninitialised on return. Initialise err as 0.

Fixes: ada5c3229b32 ("net: dsa: mv88e6xxx: Add FID map cache")
Signed-off-by: Aryan Srivastava <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'net-xilinx-emaclite-adopt-clock-support'
Jakub Kicinski [Fri, 11 Oct 2024 22:41:36 +0000 (15:41 -0700)]
Merge branch 'net-xilinx-emaclite-adopt-clock-support'

Radhey Shyam Pandey says:

====================
net: xilinx: emaclite: Adopt clock support

This patchset adds emaclite clock support. AXI Ethernet Lite IP can also
be used on SoC platforms like Zynq UltraScale+ MPSoC which combines
powerful processing system (PS) and user-programmable logic (PL) into
the same device. On these platforms it is mandatory to explicitly enable
IP clocks for proper functionality.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: emaclite: Adopt clock support
Abin Joseph [Wed, 9 Oct 2024 16:28:23 +0000 (21:58 +0530)]
net: emaclite: Adopt clock support

Adapt to use the clock framework. Add s_axi_aclk clock from the processor
bus clock domain and make clk optional to keep DTB backward compatibility.

Signed-off-by: Abin Joseph <[email protected]>
Signed-off-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: emaclite: Replace alloc_etherdev() with devm_alloc_etherdev()
Abin Joseph [Wed, 9 Oct 2024 16:28:22 +0000 (21:58 +0530)]
net: emaclite: Replace alloc_etherdev() with devm_alloc_etherdev()

Use device managed ethernet device allocation to simplify the error
handling logic. No functional change.

Signed-off-by: Abin Joseph <[email protected]>
Signed-off-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agodt-bindings: net: emaclite: Add clock support
Abin Joseph [Wed, 9 Oct 2024 16:28:21 +0000 (21:58 +0530)]
dt-bindings: net: emaclite: Add clock support

Add s_axi_aclk AXI4 clock support. Traditionally this IP was used on
microblaze platforms which had fixed clocks enabled all the time. But
since its a PL IP, it can also be used on SoC platforms like Zynq
UltraScale+ MPSoC which combines processing system (PS) and user
programmable logic (PL) into the same device. On these platforms instead
of fixed enabled clocks it is mandatory to explicitly enable IP clocks
for proper functionality.

So make clock a required property and also define max supported clock
constraints.

Signed-off-by: Abin Joseph <[email protected]>
Signed-off-by: Radhey Shyam Pandey <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'net-remove-rtnl-from-fib_seq_sum'
Jakub Kicinski [Fri, 11 Oct 2024 22:35:07 +0000 (15:35 -0700)]
Merge branch 'net-remove-rtnl-from-fib_seq_sum'

Eric Dumazet says:

====================
net: remove RTNL from fib_seq_sum()

This series is inspired by a syzbot report showing
rtnl contention and one thread blocked in:

7 locks held by syz-executor/10835:
  #0: ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2931 [inline]
  #0: ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
  #1: ffff88806df6bc88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
  #2: ffff888026fcf3c8 (kn->active#50){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
  #3: ffffffff8f56f848 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
  #4: ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1014 [inline]
  #4: ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8e/0x520 drivers/base/dd.c:1005
  #5: ffff88805c5fb250 (&devlink->lock_key#55){+.+.}-{3:3}, at: nsim_drv_probe+0xcb/0xb80 drivers/net/netdevsim/dev.c:1534
  #6: ffffffff8fcd1748 (rtnl_mutex){+.+.}-{3:3}, at: fib_seq_sum+0x31/0x290 net/core/fib_notifier.c:46
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: do not acquire rtnl in fib_seq_sum()
Eric Dumazet [Wed, 9 Oct 2024 18:44:05 +0000 (18:44 +0000)]
net: do not acquire rtnl in fib_seq_sum()

After we made sure no fib_seq_read() handlers needs RTNL anymore,
we can remove RTNL from fib_seq_sum().

Note that after RTNL was dropped, fib_seq_sum() result was possibly
outdated anyway.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipmr: use READ_ONCE() to read net->ipv[46].ipmr_seq
Eric Dumazet [Wed, 9 Oct 2024 18:44:04 +0000 (18:44 +0000)]
ipmr: use READ_ONCE() to read net->ipv[46].ipmr_seq

mr_call_vif_notifiers() and mr_call_mfc_notifiers() already
uses WRITE_ONCE() on the write side.

Using RTNL to protect the reads seems a big hammer.

Constify 'struct net' argument of ip6mr_rules_seq_read()
and ipmr_rules_seq_read().

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq
Eric Dumazet [Wed, 9 Oct 2024 18:44:03 +0000 (18:44 +0000)]
ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq

Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.

Writes are protected by RTNL.
We can use READ_ONCE() when reading it.

Constify 'struct net' argument of fib6_tables_seq_read() and
fib6_rules_seq_read().

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv4: use READ_ONCE()/WRITE_ONCE() on net->ipv4.fib_seq
Eric Dumazet [Wed, 9 Oct 2024 18:44:02 +0000 (18:44 +0000)]
ipv4: use READ_ONCE()/WRITE_ONCE() on net->ipv4.fib_seq

Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.

Writes are protected by RTNL.
We can use READ_ONCE() when reading it.

Constify 'struct net' argument of fib4_rules_seq_read()

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agofib: rules: use READ_ONCE()/WRITE_ONCE() on ops->fib_rules_seq
Eric Dumazet [Wed, 9 Oct 2024 18:44:01 +0000 (18:44 +0000)]
fib: rules: use READ_ONCE()/WRITE_ONCE() on ops->fib_rules_seq

Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.

Writes are protected by RTNL.
We can use READ_ONCE() on readers.

Constify 'struct net' argument of fib_rules_seq_read()
and lookup_rules_ops().

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rx
Eric Dumazet [Thu, 10 Oct 2024 03:41:00 +0000 (03:41 +0000)]
tcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rx

sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
 __inet6_lookup_established,
  inet_request_bound_dev_if().

Move it to netns_ipv4_read_rx.

Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.

Note this adds a hole of three bytes that could be filled later.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Coco Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: aquantia: poll status register
Aryan Srivastava [Thu, 10 Oct 2024 00:49:34 +0000 (13:49 +1300)]
net: phy: aquantia: poll status register

The system interface connection status register is not immediately
correct upon line side link up. This results in the status being read as
OFF and then transitioning to the correct host side link mode with a
short delay. This causes the phylink framework passing the OFF status
down to all MAC config drivers, resulting in the host side link being
misconfigured, which in turn can lead to link flapping or complete
packet loss in some cases.

Mitigate this by periodically polling the register until it not showing
the OFF state. This will be done every 1ms for 10ms, using the same
poll/timeout as the processor intensive operation reads.

If the phy is still expressing the OFF state after the timeout, then set
the link to false and pass the NA interface mode onto the phylink
framework.

Signed-off-by: Aryan Srivastava <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoeth: remove the DLink/Sundance (ST201) driver
Jakub Kicinski [Tue, 8 Oct 2024 15:48:24 +0000 (08:48 -0700)]
eth: remove the DLink/Sundance (ST201) driver

Konstantin reports the maintainer's address bounces.
There is no other maintainer and the driver is quite old.
There is a good chance nobody is using this driver any more.
Let's try to remove it completely, we can revert it back in
if someone complains.

Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Denis Kirjanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 months agoMerge branch 'tg3-link-irqs-napis-and-queues'
Jakub Kicinski [Fri, 11 Oct 2024 01:40:34 +0000 (18:40 -0700)]
Merge branch 'tg3-link-irqs-napis-and-queues'

Joe Damato says:

====================
tg3: Link IRQs, NAPIs, and queues

This follows from a previous RFC (wherein I botched the subject lines of
all the messages) [1].

I've taken Michael Chan's suggestion on modifying patch 2 and I've
updated the commit messages of both patches to test and show the output
for the default 1 TX 4 RX queues and the 4 TX and 4 RX queues cases.

Reviewers: please check the commit messages carefully to ensure the
output is correct (or on your own systems to verify, if you like). I am
not a tg3 expert and it's possible that I got something wrong.

[1]: https://lore.kernel.org/all/20241005145717[email protected]/T/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotg3: Link queues to NAPIs
Joe Damato [Wed, 9 Oct 2024 17:55:09 +0000 (17:55 +0000)]
tg3: Link queues to NAPIs

Link queues to NAPIs using the netdev-genl API so this information is
queryable.

First, test with the default setting on my tg3 NIC at boot with 1 TX
queue:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump queue-get --json='{"ifindex": 2}'

[{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
 {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
 {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
 {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'},
 {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}]

Now, adjust the number of TX queues to be 4 via ethtool:

$ sudo ethtool -L eth0 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a

Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows
the tg3 has renamed the IRQs to be combined:

343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4

Now query this via netlink to ensure the queues are linked properly to
their NAPIs:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'},
 {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'},
 {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'},
 {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'},
 {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'},
 {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'},
 {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'},
 {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}]

As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID
8960, and so on for each queue index up to 3.

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotg3: Link IRQs to NAPI instances
Joe Damato [Wed, 9 Oct 2024 17:55:08 +0000 (17:55 +0000)]
tg3: Link IRQs to NAPI instances

Link IRQs to NAPI instances with netif_napi_set_irq. This information
can be queried with the netdev-genl API.

Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX
queues.

Compare the output of /proc/interrupts for my tg3 device with the output of
netdev-genl after applying this patch:

$ cat /proc/interrupts | grep eth0
343: [...] eth0-tx-0
344: [...] eth0-rx-1
345: [...] eth0-rx-2
346: [...] eth0-rx-3
347: [...] eth0-rx-4

As you can see above, tg3 has named the IRQs such that there is a
dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs.

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
 --dump napi-get --json='{"ifindex": 2}'

[{'id': 8197, 'ifindex': 2, 'irq': 347},
 {'id': 8196, 'ifindex': 2, 'irq': 346},
 {'id': 8195, 'ifindex': 2, 'irq': 345},
 {'id': 8194, 'ifindex': 2, 'irq': 344},
 {'id': 8193, 'ifindex': 2, 'irq': 343}]

Netlink displays the same IRQs as above, noting that each is mapped to a
unique NAPI instance.

Now, reconfigure the NIC to have 4 TX queues and 4 RX queues:

$ sudo ethtool -L eth0 rx 4 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a

Examine /proc/interrupts once again, noting that tg3 will now rename the
IRQs to suggest that they are combined tx and rx without allocating
additional IRQs, so the total IRQ count in /proc/interrupts is
unchanged:

343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4

Check the output from netlink again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 2}'
[{'id': 8973, 'ifindex': 2, 'irq': 347},
 {'id': 8972, 'ifindex': 2, 'irq': 346},
 {'id': 8971, 'ifindex': 2, 'irq': 345},
 {'id': 8970, 'ifindex': 2, 'irq': 344},
 {'id': 8969, 'ifindex': 2, 'irq': 343}]

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agor8169: remove original workaround for RTL8125 broken rx issue
Heiner Kallweit [Wed, 9 Oct 2024 05:48:05 +0000 (07:48 +0200)]
r8169: remove original workaround for RTL8125 broken rx issue

Now that we have b9c7ac4fe22c ("r8169: disable ALDPS per default for
RTL8125"), the first attempt to fix the issue shouldn't be needed
any longer. So let's effectively revert 621735f59064 ("r8169: fix
rare issue with broken rx after link-down on RTL8125") and see
whether anybody complains.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agor8169: don't apply UDP padding quirk on RTL8126A
Heiner Kallweit [Wed, 9 Oct 2024 05:44:23 +0000 (07:44 +0200)]
r8169: don't apply UDP padding quirk on RTL8126A

Vendor drivers r8125/r8126 indicate that this quirk isn't needed
any longer for RTL8126A. Mimic this in r8169.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 3 Oct 2024 17:05:55 +0000 (10:05 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.12-rc3).

No conflicts and no adjacent changes.

Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 10 Oct 2024 19:36:35 +0000 (12:36 -0700)]
Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - dsa: sja1105: fix reception from VLAN-unaware bridges

   - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
     enabled"

   - eth: fec: don't save PTP state if PTP is unsupported

  Current release - new code bugs:

   - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref

   - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop

   - phy: aquantia: AQR115c fix up PMA capabilities

  Previous releases - regressions:

   - tcp: 3 fixes for retrans_stamp and undo logic

  Previous releases - always broken:

   - net: do not delay dst_entries_add() in dst_release()

   - netfilter: restrict xtables extensions to families that are safe,
     syzbot found a way to combine ebtables with extensions that are
     never used by userspace tools

   - sctp: ensure sk_state is set to CLOSED if hashing fails in
     sctp_listen_start

   - mptcp: handle consistently DSS corruption, and prevent corruption
     due to large pmtu xmit"

* tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  MAINTAINERS: Add headers and mailing list to UDP section
  MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
  slip: make slhc_remember() more robust against malicious packets
  net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
  ppp: fix ppp_async_encode() illegal access
  docs: netdev: document guidance on cleanup patches
  phonet: Handle error of rtnl_register_module().
  mpls: Handle error of rtnl_register_module().
  mctp: Handle error of rtnl_register_module().
  bridge: Handle error of rtnl_register_module().
  vxlan: Handle error of rtnl_register_module().
  rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
  net: do not delay dst_entries_add() in dst_release()
  mptcp: pm: do not remove closing subflows
  mptcp: fallback when MPTCP opts are dropped after 1st data
  tcp: fix mptcp DSS corruption due to large pmtu xmit
  mptcp: handle consistently DSS corruption
  net: netconsole: fix wrong warning
  net: dsa: refuse cross-chip mirroring operations
  net: fec: don't save PTP state if PTP is unsupported
  ...

5 months agoMerge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 10 Oct 2024 19:25:32 +0000 (12:25 -0700)]
Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
  callbacks

  When a ring buffer is mapped to memory assigned at boot, it also
  splits it up evenly between the possible CPUs. But the allocation code
  still attached a CPU notifier callback to this ring buffer. When a CPU
  is added, the callback will happen and another per-cpu buffer is
  created for the ring buffer.

  But for boot mapped buffers, there is no room to add another one (as
  they were all created already). The result of calling the CPU hotplug
  notifier on a boot mapped ring buffer is unpredictable and could lead
  to a system crash.

  If the ring buffer is boot mapped simply do not attach the CPU
  notifier to it"

* tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

5 months agoMerge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 10 Oct 2024 17:02:59 +0000 (10:02 -0700)]
Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - update fstrim loop and add more cancellation points, fix reported
   delayed or blocked suspend if there's a huge chunk queued

 - fix error handling in recent qgroup xarray conversion

 - in zoned mode, fix warning printing device path without RCU
   protection

 - again fix invalid extent xarray state (6252690f7e1b), lost due to
   refactoring

* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
  btrfs: zoned: fix missing RCU locking in error message when loading zone info
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: add cancellation points to trim loops
  btrfs: split remaining space to discard in chunks

5 months agoMerge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Thu, 10 Oct 2024 16:52:49 +0000 (09:52 -0700)]
Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix NFSD bring-up / shutdown

 - Fix a UAF when releasing a stateid

* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix possible badness in FREE_STATEID
  nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
  NFSD: Mark filecache "down" if init fails

5 months agoMerge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 10 Oct 2024 16:45:45 +0000 (09:45 -0700)]
Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - A few small typo fixes

 - fstests xfs/538 DEBUG-only fix

 - Performance fix on blockgc on COW'ed files, by skipping trims on
   cowblock inodes currently opened for write

 - Prevent cowblocks to be freed under dirty pagecache during unshare

 - Update MAINTAINERS file to quote the new maintainer

* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a typo
  xfs: don't free cowblocks from under dirty pagecache on unshare
  xfs: skip background cowblock trims on inodes open for write
  xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
  xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
  xfs: don't ifdef around the exact minlen allocations
  xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
  xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
  xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
  xfs: return bool from xfs_attr3_leaf_add
  xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
  xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
  xfs: scrub: convert comma to semicolon
  xfs: Remove empty declartion in header file
  MAINTAINERS: add Carlos Maiolino as XFS release manager

5 months agoMerge branch 'maintainers-networking-file-coverage-updates'
Jakub Kicinski [Thu, 10 Oct 2024 16:35:50 +0000 (09:35 -0700)]
Merge branch 'maintainers-networking-file-coverage-updates'

Simon Horman says:

====================
MAINTAINERS: Networking file coverage updates

The aim of this proposal is to make the handling of some files,
related to Networking and Wireless, more consistently. It does so by:

1. Adding some more headers to the UDP section, making it consistent
   with the TCP section.

2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
   making their handling consistent with other files related to
   Wireless.

The aim of this is to make things more consistent.  And for MAINTAINERS
to better reflect the situation on the ground.  I am more than happy to
be told that the current state of affairs is fine. Or for other ideas to
be discussed.

v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMAINTAINERS: Add headers and mailing list to UDP section
Simon Horman [Wed, 9 Oct 2024 08:47:23 +0000 (09:47 +0100)]
MAINTAINERS: Add headers and mailing list to UDP section

Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.

Acked-by: Willem de Bruijn <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
Simon Horman [Wed, 9 Oct 2024 08:47:22 +0000 (09:47 +0100)]
MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]

We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.

Many of the following wireless-related sections MAINTAINERS
are already not included in the NETWORKING [GENERAL] section.
For consistency, exclude those that are.

* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL

Acked-by: Johannes Berg <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoslip: make slhc_remember() more robust against malicious packets
Eric Dumazet [Wed, 9 Oct 2024 09:11:32 +0000 (09:11 +0000)]
slip: make slhc_remember() more robust against malicious packets

syzbot found that slhc_remember() was missing checks against
malicious packets [1].

slhc_remember() only checked the size of the packet was at least 20,
which is not good enough.

We need to make sure the packet includes the IPv4 and TCP header
that are supposed to be carried.

Add iph and th pointers to make the code more readable.

[1]

BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
  ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
  ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
  ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
  ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4091 [inline]
  slab_alloc_node mm/slub.c:4134 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet/smc: Address spelling errors
Simon Horman [Wed, 9 Oct 2024 10:05:21 +0000 (11:05 +0100)]
net/smc: Address spelling errors

Address spelling errors flagged by codespell.

This patch is intended to cover all files under drivers/smc

Signed-off-by: Simon Horman <[email protected]>
Reviewed-by: D. Wythe <[email protected]>
Reviewed-by: Guangguan Wang <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
D. Wythe [Wed, 9 Oct 2024 06:55:16 +0000 (14:55 +0800)]
net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC

Eric report a panic on IPPROTO_SMC, and give the facts
that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.

Bug: Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x0000000086000005
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
[0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
pud=0000000000000000
Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/06/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
sp : ffff80009b887a90
x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
Call trace:
0x0
netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
security_socket_post_create+0x94/0xd4 security/security.c:4425
__sock_create+0x4c8/0x884 net/socket.c:1587
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x134/0x340 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1718
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: ???????? ???????? ???????? ???????? (????????)
---[ end trace 0000000000000000 ]---

This patch add a toy implementation that performs a simple return to
prevent such panic. This is because MSS can be set in sock_create_kern
or smc_setsockopt, similar to how it's done in AF_SMC. However, for
AF_SMC, there is currently no way to synchronize MSS within
__sys_connect_file. This toy implementation lays the groundwork for us
to support such feature for IPPROTO_SMC in the future.

Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: D. Wythe <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoppp: fix ppp_async_encode() illegal access
Eric Dumazet [Wed, 9 Oct 2024 18:58:02 +0000 (18:58 +0000)]
ppp: fix ppp_async_encode() illegal access

syzbot reported an issue in ppp_async_encode() [1]

In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.

BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
 BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
  ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
  ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
  ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4092 [inline]
  slab_alloc_node mm/slub.c:4135 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agodocs: netdev: document guidance on cleanup patches
Simon Horman [Wed, 9 Oct 2024 09:12:19 +0000 (10:12 +0100)]
docs: netdev: document guidance on cleanup patches

The purpose of this section is to document what is the current practice
regarding clean-up patches which address checkpatch warnings and similar
problems. I feel there is a value in having this documented so others
can easily refer to it.

Clearly this topic is subjective. And to some extent the current
practice discourages a wider range of patches than is described here.
But I feel it is best to start somewhere, with the most well established
part of the current practice.

Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'net-introduce-tx-h-w-shaping-api'
Jakub Kicinski [Thu, 10 Oct 2024 15:31:06 +0000 (08:31 -0700)]
Merge branch 'net-introduce-tx-h-w-shaping-api'

Paolo Abeni says:

====================
net: introduce TX H/W shaping API

We have a plurality of shaping-related drivers API, but none flexible
enough to meet existing demand from vendors[1].

This series introduces new device APIs to configure in a flexible way
TX H/W shaping. The new functionalities are exposed via a newly
defined generic netlink interface and include introspection
capabilities. Some self-tests are included, on top of a dummy
netdevsim implementation. Finally a basic implementation for the iavf
driver is provided.

Some usage examples:

* Configure shaping on a given queue:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do set --json '{"ifindex": '$IFINDEX',
  "shaper": {"handle":
     {"scope": "queue", "id":'$QUEUEID'},
  "bw-max": 2000000}}'

* Container B/W sharing

The orchestration infrastructure wants to group the
container-related queues under a RR scheduling and limit the aggregate
bandwidth:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
  {"handle": {"scope": "queue", "id":'$QID1'},
   "weight": '$W1'},
  {"handle": {"scope": "queue", "id":'$QID2'},
   "weight": '$W2'}],
  {"handle": {"scope": "queue", "id":'$QID3'},
   "weight": '$W3'}],
"handle": {"scope":"node"},
"bw-max": 10000000}'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}}

Q1 \
    \
Q2 -- node 0 -------  netdev
    / (bw-max: 10M)
Q3 /

* Delegation

A containers wants to limit the aggregate B/W bandwidth of 2 of the 3
queues it owns - the starting configuration is the one from the
previous point:

SPEC=Documentation/netlink/specs/net_shaper.yaml
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
  {"handle": {"scope": "queue", "id":'$QID1'},
   "weight": '$W1'},
  {"handle": {"scope": "queue", "id":'$QID2'},
   "weight": '$W2'}],
"handle": {"scope": "node"},
"bw-max": 5000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}}

Q1 -- node 1 --------\
    / (bw-max: 5M)    \
Q2 /                   node 0 -------  netdev
                      /(bw-max: 10M)
Q3 ------------------/

In a group operation, when prior to the op itself, the leaves have
different parents, the user must specify the parent handle for the
group. I.e., starting from the previous config:

./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
  {"handle": {"scope": "queue", "id":'$QID1'},
   "weight": '$W1'},
  {"handle": {"scope": "queue", "id":'$QID3'},
   "weight": '$W3'}],
"handle": {"scope": "node"},
"bw-max": 3000000 }'
Netlink error: Invalid argument
nl_len = 96 (80) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'All the leaves shapers must have the same old parent'}

./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
  {"handle": {"scope": "queue", "id":'$QID1'},
   "weight": '$W1'},
  {"handle": {"scope": "queue", "id":'$QID3'},
   "weight": '$W3'}],
"handle": {"scope": "node"},
"parent": {"scope": "node", "id": 1},
"bw-max": 3000000 }
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}}

Q1 -- node 2 ---
    /(bw-max:3M)\
Q3 /             \
         ---- node 1 \
        / (bw-max: 5M)\
      Q2              node 0 -------  netdev
                      (bw-max: 10M)

* Cleanup:

Still starting from config 1To delete a single queue shaper

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "queue", "id":'$QID3'}}'

Q1 -- node 2 ---
     (bw-max:3M)\
                 \
         ---- node 1 \
        / (bw-max: 5M)\
      Q2              node 0 -------  netdev
                      (bw-max: 10M)

Deleting a node shaper relinks all its leaves to the node's parent:

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "node", "id":2}}'

Q1 ---\
       \
        node 1----- \
       / (bw-max: 5M)\
Q2----/              node 0 -------  netdev
                     (bw-max: 10M)

Deleting the last shaper under a node shaper deletes the node, too:

./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "queue", "id":'$QID1'}}'
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "queue", "id":'$QID2'}}'
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "node", "id": 1}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}

Such delete recurses on parents that are left over with no leaves:

./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
  "handle": {"scope": "node", "id": 0}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}

v8: https://lore.kernel.org/cover.1727704215[email protected]
v7: https://lore.kernel.org/cover.1725919039[email protected]
v6: https://lore.kernel.org/cover.1725457317[email protected]
v5: https://lore.kernel.org/cover.1724944116[email protected]
v4: https://lore.kernel.org/cover.1724165948[email protected]
v3: https://lore.kernel.org/cover.1722357745[email protected]
RFC v2: https://lore.kernel.org/cover.1721851988[email protected]
RFC v1: https://lore.kernel.org/cover.1719518113[email protected]
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoiavf: add support to exchange qos capabilities
Sudheer Mogilappagari [Wed, 9 Oct 2024 08:10:01 +0000 (10:10 +0200)]
iavf: add support to exchange qos capabilities

During driver initialization VF determines QOS capability is allowed
by PF and receives QOS parameters. After which quanta size for queues
is configured which is not configurable and is set to 1KB currently.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Sudheer Mogilappagari <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoiavf: Add net_shaper_ops support
Sudheer Mogilappagari [Wed, 9 Oct 2024 08:10:00 +0000 (10:10 +0200)]
iavf: Add net_shaper_ops support

Implement net_shaper_ops support for IAVF. This enables configuration
of rate limiting on per queue basis. Customer intends to enforce
bandwidth limit on Tx traffic steered to the queue by configuring
rate limits on the queue.

To set rate limiting for a queue, update shaper object of given queues
in driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to PF to update HW
configuration.

Deleting shaper configured for queue is nothing but configuring shaper
with bw_max 0. The PF restores the default rate limiting config
when bw_max is zero.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Sudheer Mogilappagari <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoice: Support VF queue rate limit and quanta size configuration
Wenjun Wu [Wed, 9 Oct 2024 08:09:59 +0000 (10:09 +0200)]
ice: Support VF queue rate limit and quanta size configuration

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Wenjun Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agovirtchnl: support queue rate limit and quanta size configuration
Wenjun Wu [Wed, 9 Oct 2024 08:09:58 +0000 (10:09 +0200)]
virtchnl: support queue rate limit and quanta size configuration

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Wenjun Wu <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/839002f7bd6f63b985a060a51b079f6e6dbbe237.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotesting: net-drv: add basic shaper test
Paolo Abeni [Wed, 9 Oct 2024 08:09:57 +0000 (10:09 +0200)]
testing: net-drv: add basic shaper test

Leverage a basic/dummy netdevsim implementation to do functional
coverage for NL interface.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet-shapers: implement cap validation in the core
Paolo Abeni [Wed, 9 Oct 2024 08:09:56 +0000 (10:09 +0200)]
net-shapers: implement cap validation in the core

Use the device capabilities to reject invalid attribute values before
pushing them to the H/W.

Note that validating the metric explicitly avoids NL_SET_BAD_ATTR()
usage, to provide unambiguous error messages to the user.

Validating the nesting requires the knowledge of the new parent for
the given shaper; as such is a chicken-egg problem: to validate the
leaf nesting we need to know the node scope, to validate the node
nesting we need to know the leafs parent scope.

To break the circular dependency, place the leafs nesting validation
after the parsing.

Suggested-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/54667601813e4c0348f39bf8ad2446ffc9fcd383.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: shaper: implement introspection support
Paolo Abeni [Wed, 9 Oct 2024 08:09:55 +0000 (10:09 +0200)]
net: shaper: implement introspection support

The netlink op is a simple wrapper around the device callback.

Extend the existing fetch_dev() helper adding an attribute argument
for the requested device. Reuse such helper in the newly implemented
operation.

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/66eb62f22b3a5ba06ca91d01ae77515e5f447e15.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonetlink: spec: add shaper introspection support
Paolo Abeni [Wed, 9 Oct 2024 08:09:54 +0000 (10:09 +0200)]
netlink: spec: add shaper introspection support

Allow the user-space to fine-grain query the shaping features
supported by the NIC on each domain.

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/3ddd10e450e3fe7d4b944c0d0b886d4483529ee6.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet-shapers: implement shaper cleanup on queue deletion
Paolo Abeni [Wed, 9 Oct 2024 08:09:53 +0000 (10:09 +0200)]
net-shapers: implement shaper cleanup on queue deletion

hook into netif_set_real_num_tx_queues() to cleanup any shaper
configured on top of the to-be-destroyed TX queues.

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/6da4ee03cae2b2a757d7b59e88baf09cc94c5ef1.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet-shapers: implement delete support for NODE scope shaper
Paolo Abeni [Wed, 9 Oct 2024 08:09:52 +0000 (10:09 +0200)]
net-shapers: implement delete support for NODE scope shaper

Leverage the previously introduced group operation to implement
the removal of NODE scope shaper, re-linking its leaves under the
the parent node before actually deleting the specified NODE scope
shaper.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/763d484b5b69e365acccfd8031b183c647a367a4.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet-shapers: implement NL group operation
Paolo Abeni [Wed, 9 Oct 2024 08:09:51 +0000 (10:09 +0200)]
net-shapers: implement NL group operation

Allow grouping multiple leaves shaper under the given root.
The node and the leaves shapers are created, if needed, otherwise
the existing shapers are re-linked as requested.

Try hard to pre-allocated the needed resources, to avoid non
trivial H/W configuration rollbacks in case of any failure.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/8a721274fde18b872d1e3a61aaa916bb7b7996d3.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet-shapers: implement NL set and delete operations
Paolo Abeni [Wed, 9 Oct 2024 08:09:50 +0000 (10:09 +0200)]
net-shapers: implement NL set and delete operations

Both NL operations directly map on the homonymous device shaper
callbacks, update accordingly the shapers cache and are serialized
via a per device lock.
Implement the cache modification helpers to additionally deal with
NODE scope shaper. That will be needed by the group() operation
implemented in the next patch.
The delete implementation is partial: does not handle NODE scope
shaper yet. Such support will require infrastructure from
the next patch and will be implemented later in the series.

Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/1e6a34a4095b35d773d2b9c476164671bbcf8397.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
This page took 0.135847 seconds and 4 git commands to generate.