Git Repo - linux.git/log

mlxsw: Configure descriptor buffers

Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping metadata. Currently, mlxsw only configures the
bytes part of the resource management.

Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.

However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.

Fix the issue by configuring the egress descriptor pool to be infinite in
size as well. This will maintain the configuration philosophy of the
default configuration, but will unlock all chip resources to be usable.

In the code, include both the configuration of ingress and ingress, mostly
for clarity.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

mlxsw: reg: Add "desc" field to SBPR

SBPR, or Shared Buffer Pools Register, configures and retrieves the shared
buffer pools and configuration. The desc field determines whether the
configuration relates to the byte pool or the descriptor pool.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'use-standard-sysctl-macro'

Tonghao Zhang says:

====================
use standard sysctl macro

From: Tonghao Zhang <[email protected]>

This patchset introduce sysctl macro or replace var
with macro.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftests/sysctl: add sysctl macro test

Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Julian Anastasov <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Jozsef Kadlecsik <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Lorenz Bauer <[email protected]>
Cc: Akhmat Karakotov <[email protected]>
Signed-off-by: Tonghao Zhang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: sysctl: introduce sysctl SYSCTL_THREE

This patch introdues the SYSCTL_THREE.

KUnit:
[00:10:14] ================ sysctl_test (10 subtests) =================
[00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
[00:10:14] =================== [PASSED] sysctl_test ===================

./run_kselftest.sh -c sysctl
...
ok 1 selftests: sysctl: sysctl.sh

Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Julian Anastasov <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Jozsef Kadlecsik <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Lorenz Bauer <[email protected]>
Cc: Akhmat Karakotov <[email protected]>
Signed-off-by: Tonghao Zhang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: sysctl: use shared sysctl macro

This patch replace two, four and long_one to SYSCTL_XXX.

Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Julian Anastasov <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Jozsef Kadlecsik <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Lorenz Bauer <[email protected]>
Cc: Akhmat Karakotov <[email protected]>
Signed-off-by: Tonghao Zhang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'vsock-virtio-add-support-for-device-suspend-resume'

Stefano Garzarella says:

====================
vsock/virtio: add support for device suspend/resume

Vilas reported that virtio-vsock no longer worked properly after
suspend/resume (echo mem >/sys/power/state).
It was impossible to connect to the host and vice versa.

Indeed, the support has never been implemented.

This series implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.

The first patch factors our the code to initialize and delete VQs.
The second patch uses that code to support device suspend/resume.

Signed-off-by: Stefano Garzarella <[email protected]>
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vsock/virtio: add support for device suspend/resume

Implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.

During suspension all connected sockets are reset and VQs deleted.
During resume the VQs are re-initialized.

Reported by: Vilas R K <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

vsock/virtio: factor our the code to initialize and delete VQs

Add virtio_vsock_vqs_init() and virtio_vsock_vqs_del() with the code
that was in virtio_vsock_probe() and virtio_vsock_remove to initialize
and delete VQs.

These new functions will be used in the next commit to support device
suspend/resume

Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot

The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action
which enforced time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.

We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. The host ports must require timestamping non-IP
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.

I tried to keep as much of the logic that isn't specific to the NXP
LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.

The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, various kernel bloatware would be enabled, like
lockdep, KASAN, etc, and we certainly can't run any races with those on.

I tried to look through the kselftest framework for other real time
applications and didn't really find any, so I'm not sure how better to
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest is ensuring is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL

ARPHRD_TUNNEL interface can't process rs packets
and will generate TX errors

ex:
ip tunnel add ethn mode ipip local 192.168.1.1 remote 192.168.1.2
ifconfig ethn x.x.x.x

ethn: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480
inet x.x.x.x  netmask 255.255.255.255  destination x.x.x.x
inet6 fe80::5efe:ac1e:3cdb  prefixlen 64  scopeid 0x20<link>
tunnel   txqueuelen 1000  (IPIP Tunnel)
RX packets 0  bytes 0 (0.0 B)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 3  dropped 0 overruns 0  carrier 0  collisions 0

Signed-off-by: jianghaoran <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: optimise skb_zerocopy_iter_stream()

It's expensive to make a copy of 40B struct iov_iter to the point it
was taking 0.2-0.5% of all cycles in my tests. iov_iter_revert() should
be fine as it's a simple case without nested reverts/truncates.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/a7e1690c00c5dfe700c30eb9a8a81ec59f6545dd.1650884401.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>

octeontx2-af: debugfs: fix error return of allocations

Current memory failure code in the debugfs returns -ENOSPC. This is
normally used for indicating that there is no space left on the
device and is not applicable for memory allocation failures.
Replace this with -ENOMEM.

Signed-off-by: Niels Dossche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'ocelot-stats-improvement'

Colin Foster says:

====================
ocelot stats improvement

A couple of pick-ups after f187bfa6f35 ("net: ethernet: ocelot: remove
the need for num_stats initializer") - one addresses a warning
patchwork flagged about operator precedence when using macro arguments.
The other is a reduction of unnecessary memory allocation.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: add missed parentheses around macro argument

Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a macro that patchwork warned it lacked parentheses
around an argument. Correct this mistake.

Fixes: 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats initializer")
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: remove unnecessary variable

Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a flags field to the ocelot stats structure. The same
behavior can be achieved without this additional field taking up extra
memory.

Remove this structure element to free up RAM

Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Colin Foster <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2022-05-01

Miquel Raynal landed two patch series bundled in this pull request.

The first series re-works the symbol duration handling to better
accommodate the needs of the various phy layers in ieee802154.

In the second series Miquel improves th errors handling from drivers
up mac802154. THis streamlines the error handling throughout the
ieee/mac802154 stack in preparation for sync TX to be introduced for
MLME frames.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-more-heap-allocation-and-split-of-rtnl_newlink'

Jakub Kicinski says:

====================
net: more heap allocation and split of rtnl_newlink()

Small refactoring of rtnl_newlink() to fix a stack usage warning
and make the function shorter.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

rtnl: move rtnl_newlink_create()

Pure code move.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

rtnl: split __rtnl_newlink() into two functions

__rtnl_newlink() is 250LoC, but has a few clear sections.
Move the part which creates a new netdev to a separate
function.

For ease of review code will be moved in the next change.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

rtnl: allocate more attr tables on the heap

Commit a293974590cf ("rtnetlink: avoid frame size warning in rtnl_newlink()")
moved to allocating the largest attribute array of rtnl_newlink()
on the heap. Kalle reports the stack has grown above 1k again:

net/core/rtnetlink.c:3557:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Move more attrs to the heap, wrap them in a struct.
Don't bother with linkinfo, it's referenced a lot and we take
its size so it's awkward to move, plus it's small (6 elements).

Reported-by: Kalle Valo <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Tested-by: Kalle Valo <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'use-mmd-c45-helpers'

Andrew Lunn says:

====================
Use MMD/C45 helpers

MDIO busses can perform two sorts of bus transaction, defined in
clause 22 and clause 45 of 802.3. This results in two register
addresses spaces. The current driver structure for indicating if C22
or C45 should be used is messy, and many C22 only bus drivers will
wrongly interpret a C45 transaction as a C22 transaction.

This patchset is a preparation step to cleanup the situation. It
converts MDIO bus users to make use of existing _mmd and _c45 helpers
to perform accesses to C45 registers. This will later allow C45 and
C22 to be kept separate.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: pcs: pcs-xpcs: Convert to mdiobus_c45_read

Stop using the helpers to construct a special mdio address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: sja1105: Convert to mdiobus_c45_read

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: phy: bcm87xx: Use mmd helpers

Rather than construct special phy device addresses to access C45
registers, use the mmd helpers. These will directly call the C45 API
of the MDIO bus driver.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: phy: Convert to mdiobus_c45_{read|write}

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: phylink: Convert to mdiobus_c45_{read|write}

Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'linux-can-next-for-5.19-20220502' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2022-05-02

this is a pull request of 9 patches for net-next/master.

The first patch is by Biju Das and documents renesas,r9a07g043-canfd
support in the renesas,rcar-canfd bindings document.

Jakub Kicinski's patch removes a copy of the NAPI_POLL_WEIGHT define
from the m_can driver.

The last 7 patches all target the ctucanfd driver. Pavel Pisa provides
2 patch which update the documentation. 2 patches by Jiapeng Chong
remove unneeded includes and error messages. And another 3 patches by
Pavel Pisa to further clean up the driver (remove inline keyword,
remove unneeded debug statements, and remove unneeded module parameters).

linux-can-next-for-5.19-20220502

* tag 'linux-can-next-for-5.19-20220502' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: ctucanfd: remove PCI module debug parameters
  can: ctucanfd: remove debug statements
  can: ctucanfd: remove inline keyword from local static functions
  can: ctucanfd: ctucan_platform_probe(): remove unnecessary print function dev_err()
  can: ctucanfd: remove unused including <linux/version.h>
  docs: networking: device drivers: can: ctucanfd: update author e-mail
  docs: networking: device drivers: can: add ctucanfd to index
  can: m_can: remove a copy of the NAPI_POLL_WEIGHT define
  dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

nfp: support VxLAN inner TSO with GSO_PARTIAL offload

VxLAN belongs to UDP-based encapsulation protocol. Inner TSO for VxLAN
packet with udpcsum requires offloading of outer header csum.

The device doesn't support outer header csum offload. However, inner TSO
for VxLAN with udpcsum can still work with GSO_PARTIAL offload, which
means outer udp csum computed by stack and inner tcp segmentation finished
by hardware. Thus, the patch enable features "NETIF_F_GSO_UDP_TUNNEL_CSUM"
and "NETIF_F_GSO_PARTIAL" and set gso_partial_features.

Signed-off-by: Fei Qin <[email protected]>
Signed-off-by: Yinjun Zhang <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftests: net: vrf_strict_mode_test: add support to select a test to run

Add a boilerplate test loop to run all tests in
vrf_strict_mode_test.sh. Add a -t flag that allows a selected test to
run. Remove the vrf_strict_mode_tests function which is now unused.

Signed-off-by: Jaehee Park <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/20220429164658.GA656707@jaehee-ThinkPad-X1-Extreme
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'devices-always-netif_f_lltx'

Peilin Ye says:

====================
devices always NETIF_F_LLTX

v1: https://lore.kernel.org/netdev/cover.1650580763 [email protected]/

change since v1:
  - deleted "depends on patch..." in [1/2]'s commit message

This patchset depends on these fixes [1], which has been merged into
net-next.  Since o_seqno is now atomic_t, we can always turn on
NETIF_F_LLTX for [IP6]GRE[TAP] devices, since we no longer need the TX
lock (&txq->_xmit_lock).

We could probably do the same thing to [IP6]ERSPAN devices as well, but
I'm not familiar with them yet.  For example, ERSPAN devices are
initialized as |= GRE_FEATURES in erspan_tunnel_init(), but I don't see
IP6ERSPAN devices being initialized as |= GRE6_FEATURES.  Where should we
initialize IP6ERSPAN devices' ->features?  Please suggest if I'm missing
something, thanks!

[1] https://lore.kernel.org/netdev/cover.1650575919 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX

Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark IP6GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).

Signed-off-by: Peilin Ye <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX

Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).

Signed-off-by: Peilin Ye <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

can: ctucanfd: remove PCI module debug parameters

This patch removes the PCI module debug parameters, which are not
needed anymore, to make both checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <[email protected]>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ctucanfd: remove debug statements

This patch removes the debug statements from the driver to make
checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <[email protected]>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ctucanfd: remove inline keyword from local static functions

This patch removes the inline keywords from the local static functions
to make both checkpatch.pl and patchwork happy.

Link: https://lore.kernel.org/all/1fd684bcf5ddb0346aad234072f54e976a5210fb.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <[email protected]>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ctucanfd: ctucan_platform_probe(): remove unnecessary print function dev_err()

The print function dev_err() is redundant because platform_get_irq()
already prints an error.

Eliminate the follow coccicheck warnings:

| drivers/net/can/ctucanfd/ctucanfd_platform.c:67:2-9:
| line 67 is redundant because platform_get_irq() already prints an error.

Link: https://lore.kernel.org/all/[email protected]
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Acked-by: Pave Pisa <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ctucanfd: remove unused including <linux/version.h>

Eliminate the follow versioncheck warning:

| drivers/net/can/ctucanfd/ctucanfd_base.c: 34 linux/version.h not needed.

Link: https://lore.kernel.org/all/[email protected]
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Acked-by: Pave Pisa <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

docs: networking: device drivers: can: ctucanfd: update author e-mail

This patch updates the author's email address.

Link: https://lore.kernel.org/all/e4396244da6b008c671def9f50bb983a10389863.1650816929.git.pisa@cmp.felk.cvut.cz
Cc: Odrej Ille <[email protected]>
Signed-off-by: Pavel Pisa <[email protected]>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>

docs: networking: device drivers: can: add ctucanfd to index

This patch adds the ctucanfd-driver document to the index.

Link: https://lore.kernel.org/all/e4396244da6b008c671def9f50bb983a10389863.1650816929.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <[email protected]>
[mkl: split into separate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: m_can: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same values in
the drivers just makes refactoring harder.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support

Add CANFD binding documentation for Renesas R9A07G043 (RZ/G2UL) SoC.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Biju Das <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

Merge branch 'adin1100-industrial-PHY-support'

Alexandru Tachici says:

====================
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.

The ADIN1100 uses Auto-Negotiation capability in accordance
with IEEE 802.3 Clause 98, providing a mechanism for
exchanging information between PHYs to allow link partners to
agree to a common mode of operation.

The concluded operating mode is the transmit amplitude mode and
master/slave preference common across the two devices.

Both device and LP advertise their ability and request for
increased transmit at:
- BASE-T1 autonegotiation advertisement register [47:32]\
Clause 45.2.7.21 of Standard 802.3
- BIT(13) - 10BASE-T1L High Level Transmit Operating Mode Ability
- BIT(12) - 10BASE-T1L High Level Transmit Operating Mode Request

For 2.4 Vpp (high level transmit) operation, both devices need
to have the High Level Transmit Operating Mode Ability bit set,
and only one of them needs to have the High Level Transmit
Operating Mode Request bit set. Otherwise 1.0 Vpp transmit level
will be used.

Settings for eth1:
Supported ports: [ TP MII ]
Supported link modes:   10baseT1L/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT1L/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT1L/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 10Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 0
Transceiver: external
MDI-X: Unknown
Link detected: yes
SQI: 7/7

1. Add basic support for ADIN1100.

Alexandru Ardelean (1):
  net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

1. Added 10baset-T1L link modes.

2. Added 10-BasetT1L registers.

3. Added Base-T1 auto-negotiation registers. For Base-T1 these
registers decide master/slave status and TX voltage of the
device and link partner.

4. Added 10BASE-T1L support in phy-c45.c. Now genphy functions will call
Base-T1 functions where registers don't match, like the auto-negotiation ones.

5. Convert MSE to SQI using a predefined table and allow user access
through ethtool.

6. DT bindings for the 2.4 Vpp transmit mode.
====================

Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: phy: Add 10-baseT1L 2.4 Vpp

Add a tristate property to advertise desired transmit level.

If the device supports the 2.4 Vpp operating mode for 10BASE-T1L,
as defined in 802.3gc, and the 2.4 Vpp transmit voltage operation
is desired, property should be set to 1. This property is used
to select whether Auto-Negotiation advertises a request to
operate the 10BASE-T1L PHY in increased transmit level mode.

If property is set to 1, the PHY shall advertise a request
to operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to zero, the PHY shall not advertise
a request to operate the 10BASE-T1L PHY in increased transmit level mode.

Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: adin1100: Add SQI support

Determine the SQI from MSE using a predefined table
for the 10BASE-T1L.

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: adin1100: Add initial support for ADIN1100 industrial PHY

The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.

Signed-off-by: Alexandru Ardelean <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Add 10BASE-T1L support in phy-c45

This patch is needed because the BASE-T1 uses different registers
for status, control and advertisement to those already
employed in the existing phy-c45 functions.

Where required, genphy_c45 functions will now check whether
the device supports BASE-T1 and use the specific registers
instead: 45.2.7.19 BASE-T1 AN control register,
45.2.7.20 BASE-T1 AN status, 45.2.7.21 BASE-T1 AN
advertisement register, 45.2.7.22 BASE-T1 AN LP Base
Page ability register, 45.2.1.185 BASE-T1 PMA/PMD control
register.

Tested-by: Oleksij Rempel <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Add BaseT1 auto-negotiation registers

Added BASE-T1 AN advertisement register (Registers 7.514, 7.515, and
7.516) and BASE-T1 AN LP Base Page ability register (Registers 7.517,
7.518, and 7.519).

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Add 10-BaseT1L registers

The 802.3gc specification defines the 10-BaseT1L link
mode for ethernet trafic on twisted wire pair.

PMA status register can be used to detect if the phy supports
2.4 V TX level and PCS control register can be used to
enable/disable PCS level loopback.

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: Add 10base-T1L link mode entry

Add entry for the 10base-T1L full duplex mode.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: Alexandru Tachici <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Cosmetic change spaces to tabs in dsa_switch_ops

All but 5 methods in dsa_swith_ops use tabs for indentation.

Change the 5 methods that break this rule.

Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: enable memcg accounting for veth queues

veth netdevice defines own rx queues and allocates array containing
up to 4095 ~750-bytes-long 'struct veth_rq' elements. Such allocation
is quite huge and should be accounted to memcg.

Signed-off-by: Vasily Averin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'UDP-sock_wfree-opts'

Pavel Begunkov says:

====================
UDP sock_wfree optimisations

The series is not UDP specific but that the main beneficiary. 2/3 saves one
atomic in sock_wfree() and on top 3/3 removes an extra barrier.
Tested with UDP over dummy netdev, 2038491 -> 2099071 req/s (or around +3%).

note: in regards to 1/3, there is a "Should agree with poll..." comment
that I don't completely get, and there is no git history to explain it.
Though I can't see how it could rely on having the second check without
racing with tasks woken by wake_up*().

The series was split from a larger patchset, see
https://lore.kernel.org/netdev/cover.1648981570 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

sock: optimise sock_def_write_space barriers

Now we have a separate path for sock_def_write_space() and can go one
step further. When it's called from sock_wfree() we know that there is a
preceding atomic for putting down ->sk_wmem_alloc. We can use it to
replace to replace smb_mb() with a less expensive
smp_mb__after_atomic(). It also removes an extra RCU read lock/unlock as
a small bonus.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock: optimise UDP sock_wfree() refcounting

For non SOCK_USE_WRITE_QUEUE sockets, sock_wfree() (atomically) puts
->sk_wmem_alloc twice. It's needed to keep the socket alive while
calling ->sk_write_space() after the first put.

However, some sockets, such as UDP, are freed by RCU
(i.e. SOCK_RCU_FREE) and use already RCU-safe sock_def_write_space().
Carve a fast path for such sockets, put down all refs in one go before
calling sock_def_write_space() but guard the socket from being freed
by an RCU read section.

note: because TCP sockets are marked with SOCK_USE_WRITE_QUEUE it
doesn't add extra checks in its path.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock: dedup sock_def_write_space wmem_alloc checks

Except for minor rounding differences the first ->sk_wmem_alloc test in
sock_def_write_space() is a hand coded version of sock_writeable().
Replace it with the helper, and also kill the following if duplicating
the check.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: marvell: update abilities and advertising when switching to SGMII

With some SFP modules, such as Finisar FCLF8522P2BTL, the PHY hardware
strapping defaults to 1000BaseX mode, but the kernel prefers to set them
for SGMII mode. When this happens and the PHY is soft reset, the BMSR
status register is updated, but this happens after the kernel has already
read the PHY abilities during probing. This results in support not being
detected for, and the PHY not advertising support for, 10 and 100 Mbps
modes, preventing the link from working with a non-gigabit link partner.

When the PHY is being configured for SGMII mode, call genphy_read_abilities
again in order to re-read the capabilities, and update the advertising
field accordingly.

Signed-off-by: Robert Hancock <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mac802154: Fix symbol durations

There are two major issues in the logic calculating the symbol durations
based on the page/channel:
- The page number is used in place of the channel value.
- The BIT() macro is missing because we want to check the channel
value against a bitmask.

Fix these two errors and apologize loudly for this mistake.

Signed-off-by: Miquel Raynal <[email protected]>
Acked-by: Alexander Aring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>

net: lan966x: Fix compilation error

Starting from the blamed commit, the lan966x build fails with the
following compilation error:

drivers/net/ethernet/microchip/lan966x/lan966x_ptp.c:342:9: error: implicit declaration of function ‘ptp_find_pin_unlocked’ [-Werror=implicit-function-declaration]
342 | pin = ptp_find_pin_unlocked(phc->clock, PTP_PF_EXTTS, 0);

The issue is that there is no stub function for ptp_find_pin_unlocked
in case CONFIG_PTP_1588_CLOCK is not selected. Therefore add one.

Reported-by: kernel test robot <[email protected]>
Fixes: f3d8e0a9c28ba0 ("net: lan966x: Add support for PTP_PF_EXTTS")
Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: remove unnecessary type castings

remove unnecessary void* type castings.

Signed-off-by: Yu Zhe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: remove remaining copies of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

This patch covers three more drivers which I missed in
commit 5f012b40ef63 ("eth: remove copies of the NAPI_POLL_WEIGHT define").

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: use tcp_skb_sent_after() instead in RACK

This patch doesn't change any functionality.

Signed-off-by: Pengcheng Yang <[email protected]>
Cc: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Tested-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/funeth: simplify the return expression of fun_dl_info_get()

Simplify the return expression.

Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Minghao Chi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qede: Reduce verbosity of ptp tx timestamp

Reduce verbosity of ptp tx timestamp error to reduce excessive log
messages.

Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: Alok Prasad <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Prabhakar Kushwaha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: ocelot: remove the need for num_stats initializer

There is a desire to share the oclot_stats_layout struct outside of the
current vsc7514 driver. In order to do so, the length of the array needs to
be known at compile time, and defined in the struct ocelot and struct
felix_info.

Since the array is defined in a .c file and would be declared in the header
file via:
extern struct ocelot_stat_layout[];
the size of the array will not be known at compile time to outside modules.

To fix this, remove the need for defining the number of stats at compile
time and allow this number to be determined at initialization.

Signed-off-by: Colin Foster <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: drop skb dst in tcp_rcv_established()

In commit f84af32cbca7 ("net: ip_queue_rcv_skb() helper")
I dropped the skb dst in tcp_data_queue().

This only dealt with so-called TCP input slow path.

When fast path is taken, tcp_rcv_established() calls
tcp_queue_rcv() while skb still has a dst.

This was mostly fine, because most dsts at this point
are not refcounted (thanks to early demux)

However, TCP packets sent over loopback have refcounted dst.

Then commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") came and had the effect
of delaying skb freeing for an arbitrary time.

If during this time the involved netns is dismantled, cleanup_net()
frees the struct net with embedded net->ipv6.ip6_dst_ops.

Then when eventually dst_destroy_rcu() is called,
if (dst->ops->destroy) ... triggers an use-after-free.

It is not clear if ip6_route_net_exit() lacks a rcu_barrier()
as syzbot reported similar issues before the blamed commit.

( https://groups.google.com/g/syzkaller-bugs/c/CofzW4eeA9A/m/009WjumTAAAJ )

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'lan966x-phy-reset-remove'

Michael Walle says:

====================
net: lan966x: remove PHY reset support

Remove the unneeded PHY reset node as well as the driver support for it.

This was already discussed [1] and I expect Microchip to Ack on this
removal. Since there is no user, no breakage is expected.

I'm not sure it this should go through net or net-next and if the patches
should have a Fixes: tag or not. In upstream linux there was never any user
of it, so there is no bug to be fixed. But OTOH if the schema fix isn't
backported, then there might be an older schema version still containing
the reset node. Thoughts?

The patches needed for the GPIO part are just waiting to be picked up by
Linus [2,3]. This patch and the GPIO parts are the last pieces of the
puzzle to get ethernet working on the LAN9668 on upstream linux.

[1] https://lore.kernel.org/netdev/20220330110210.3374165 [email protected]/
[2] https://lore.kernel.org/linux-gpio/CACRpkdbxmN+SWt95aGHjA2ZGnN61aWaA7c5S4PaG+WePAj=htg@mail.gmail.com/
[3] https://lore.kernel.org/linux-gpio/20220420191926.3411830 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

net: lan966x: remove PHY reset support

The PHY subsystem as well as the MIIM mdio driver (in case of the
integrated PHYs) will take care of the resets. A separate reset driver
isn't needed. There is no in-tree user of this feature. Remove the
support.

Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: lan966x: remove PHY reset

The PHY reset was intended to be a phandle for a special PHY reset
driver for the integrated PHYs as well as any external PHYs. It turns
out, that the culprit is how the reset of the switch device is done.
In particular, the switch reset also affects other subsystems like
the GPIO and the SGPIO block and it happens to be the case that the
reset lines of the external PHYs are connected to a common GPIO line.
Thus as soon as the switch issues a reset during probe time, all the
external PHYs will go into reset because all the GPIO lines will
switch to input and the pull-down on that signal will take effect.

So even if there was a special PHY reset driver, it (1) won't fix
the root cause of the problem and (2) it won't fix all the other
consumers of GPIO lines which will also be reset.

It turns out, the Ocelot SoC has the same weird behavior (or the
lack of a dedicated switch reset) and there the problem is already
solved and all the bits and pieces are already there and this PHY
reset property isn't not needed at all.

There are no users of this binding. Just remove it.

Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ipv6-net-opts'

Pavel Begunkov says:

====================
generic net and ipv6 minor optimisations

1-3 inline simple functions that only reshuffle arguments possibly adding
extra zero args, and call another function. It was benchmarked before with
a bunch of extra patches, see for details

https://lore.kernel.org/netdev/cover.1648981570 [email protected]/

It may increase the binary size, but it's the right thing to do and at least
without modules it actually sheds some bytes for some standard-ish config.

   text    data     bss     dec     hex filename
9627200       0       0 9627200  92e640 ./arch/x86_64/boot/bzImage
   text    data     bss     dec     hex filename
9627104       0       0 9627104  92e5e0 ./arch/x86_64/boot/bzImage
====================

Signed-off-by: David S. Miller <[email protected]>

ipv6: refactor ip6_finish_output2()

Throw neigh checks in ip6_finish_output2() under a single slow path if,
so we don't have the overhead in the hot path.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: help __ip6_finish_output() inlining

There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().

Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeded, and in this case it return 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: inline dev_queue_xmit()

Inline dev_queue_xmit() and dev_queue_xmit_accel(), they both are small
proxy functions doing nothing but redirecting the control flow to
__dev_queue_xmit().

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: inline skb_zerocopy_iter_dgram

skb_zerocopy_iter_dgram() is a small proxy function, inline it. For
that, move __zerocopy_sg_from_iter into linux/skbuff.h

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: inline sock_alloc_send_skb

sock_alloc_send_skb() is simple and just proxying to another function,
so we can inline it and cut associated overhead.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tcp-pass-back-data-left-in-socket-after-receive' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux

Signed-off-by: Jakub Kicinski <[email protected]>

tcp: pass back data left in socket after receive

This is currently done for CMSG_INQ, add an ability to do so via struct
msghdr as well and have CMSG_INQ use that too. If the caller sets
msghdr->msg_get_inq, then we'll pass back the hint in msghdr->msg_inq.

Rearrange struct msghdr a bit so we can add this member while shrinking
it at the same time. On a 64-bit build, it was 96 bytes before this
change and 88 bytes afterwards.

Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

nfp: flower: utilize the tuple iifidx in offloading ct flows

The device info from which conntrack originates is stored in metadata
field of the ct flow to offload now, driver can utilize it to reduce
the number of offloaded flows.

v2: Drop inline keyword from get_netdev_from_rule() signature.
The compiler can decide.

Signed-off-by: Yinjun Zhang <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sfc: add EF100 VF support via a write to sriov_numvfs

This patch extends the EF100 PF driver by adding .sriov_configure()
which would allow users to enable and disable virtual functions
using the sriov sysfs.

Signed-off-by: Pieter Jansen van Vuuren <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mptcp-path-manager-mode-selection'

Mat Martineau says:

====================
mptcp: Path manager mode selection

MPTCP already has an in-kernel path manager (PM) to add and remove TCP
subflows associated with a given MPTCP connection. This in-kernel PM has
been designed to handle typical server-side use cases, but is not very
flexible or configurable for client devices that may have more
complicated policies to implement.

This patch series from the MPTCP tree is the first step toward adding a
generic-netlink-based API for MPTCP path management, which a privileged
userspace daemon will be able to use to control subflow
establishment. These patches add a per-namespace sysctl to select the
default PM type (in-kernel or userspace) for new MPTCP sockets. New
self-tests confirm expected behavior when userspace PM is selected but
there is no daemon available to handle existing MPTCP PM events.

Subsequent patch series (already staged in the MPTCP tree) will add the
generic netlink path management API.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: Add tests for userspace PM type

These tests ensure that the in-kernel path manager is bypassed when
the userspace path manager is configured. Kernel code is still
responsible for ADD_ADDR echo, so also make sure that's working.

Tested-by: Geliang Tang <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: Add a per-namespace sysctl to set the default path manager type

The new net.mptcp.pm_type sysctl determines which path manager will be
used by each newly-created MPTCP socket.

v2: Handle builds without CONFIG_SYSCTL
v3: Clarify logic for type-specific PM init (Geliang Tang and Paolo Abeni)

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: Make kernel path manager check for userspace-managed sockets

Userspace-managed sockets should not have their subflows or
advertisements changed by the kernel path manager.

v3: Use helper function for PM mode (Paolo Abeni)

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: Bypass kernel PM when userspace PM is enabled

When a MPTCP connection is managed by a userspace PM, bypass the kernel
PM for incoming advertisements and subflow events. Netlink events are
still sent to userspace.

v2: Remove unneeded check in mptcp_pm_rm_addr_received() (Kishen Maloor)
v3: Add and use helper function for PM mode (Paolo Abeni)

Acked-by: Paolo Abeni <[email protected]>
Co-developed-by: Kishen Maloor <[email protected]>
Signed-off-by: Kishen Maloor <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: Add a member to mptcp_pm_data to track kernel vs userspace mode

When adding support for netlink path management commands, the kernel
needs to know whether paths are being controlled by the in-kernel path
manager or a userspace PM.

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: Remove redundant assignments in path manager init

A few members of the mptcp_pm_data struct were assigned to hard-coded
values in mptcp_pm_data_reset(), and then immediately changed in
mptcp_pm_nl_data_init().

Instead, flatten all the assignments in to mptcp_pm_data_reset().

v2: Resolve conflicts due to rename of mptcp_pm_data_reset()
v4: Resolve conflict in mptcp_pm_data_reset()

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-phy-micrel-add-coma-mode-support'

Michael Walle says:

====================
net: phy: micrel: add coma mode support

Add support to disable coma mode by a GPIO line.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: micrel: add coma mode GPIO

The LAN8814 has a coma mode pin which puts the PHY into isolate and
power-dowm mode. Unfortunately, the mode cannot be disabled by a
register. Usually, the input pin has a pull-up and connected to a GPIO
which can then be used to disable the mode. Try to get the GPIO and
deassert it.

Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: micrel: move the PHY timestamping check

Both lan8814_ptp_init() and lan8814_ptp_probe_once() are only used if
PTP and PHY timestamping is enabed. Up until now the probe function just
returns early, if they are not needed. But we need the
phy_package_init_once() functionality for the coma mode GPIO setup. Move
the check into the functions itself.

Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: micrel: add coma-mode-gpios property

The LAN8814 has a coma mode pin which is used to put the PHY into
isolate and power-down mode. Usually strapped to be asserted by default.
A GPIO is then used to take the PHY out of this mode.

Signed-off-by: Michael Walle <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'remove-NAPI_POLL_WEIGHT-copies'

Merge branch 'remove-NAPI_POLL_WEIGHT-copies'

Jakub Kicinski says:

====================
remove copies of the NAPI_POLL_WEIGHT define

netif_napi_add() takes weight as the last argument. The value of
that parameter is hard to come up with and depends on many factors,
so driver authors are encouraged to use NAPI_POLL_WEIGHT.

We should probably move weight to an "advanced" version of the API
(__netif_napi_add()?) and simplify the life of most driver authors.

In preparation for such API changes this series removes local
defines equivalent to NAPI_POLL_WEIGHT from drivers, so that a simple
coccinelle / spatch script does not get thrown off by them.

v2:
- drop staging bits (patch 2)
- fix subject (patch 8)
- add qeth change (patch 15)
====================

Signed-off-by: David S. Miller <[email protected]>

qeth: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Alexandra Winter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: velocity: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: spider: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Acked-by: Geoff Levand <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: vxge: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: gfar: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: benet: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eth: atlantic: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bgmac: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

slic: remove a copy of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Lino Sanfilippo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>