Petr Machata [Mon, 2 May 2022 08:49:24 +0000 (11:49 +0300)]
mlxsw: Configure descriptor buffers
Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping metadata. Currently, mlxsw only configures the
bytes part of the resource management.
Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.
However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.
Fix the issue by configuring the egress descriptor pool to be infinite in
size as well. This will maintain the configuration philosophy of the
default configuration, but will unlock all chip resources to be usable.
In the code, include both the configuration of ingress and ingress, mostly
for clarity.
Petr Machata [Mon, 2 May 2022 08:49:23 +0000 (11:49 +0300)]
mlxsw: reg: Add "desc" field to SBPR
SBPR, or Shared Buffer Pools Register, configures and retrieves the shared
buffer pools and configuration. The desc field determines whether the
configuration relates to the byte pool or the descriptor pool.
====================
vsock/virtio: add support for device suspend/resume
Vilas reported that virtio-vsock no longer worked properly after
suspend/resume (echo mem >/sys/power/state).
It was impossible to connect to the host and vice versa.
Indeed, the support has never been implemented.
This series implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.
The first patch factors our the code to initialize and delete VQs.
The second patch uses that code to support device suspend/resume.
vsock/virtio: factor our the code to initialize and delete VQs
Add virtio_vsock_vqs_init() and virtio_vsock_vqs_del() with the code
that was in virtio_vsock_probe() and virtio_vsock_remove to initialize
and delete VQs.
These new functions will be used in the next commit to support device
suspend/resume
Vladimir Oltean [Sun, 1 May 2022 11:29:53 +0000 (14:29 +0300)]
selftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot
The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action
which enforced time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.
We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. The host ports must require timestamping non-IP
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.
I tried to keep as much of the logic that isn't specific to the NXP
LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.
The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, various kernel bloatware would be enabled, like
lockdep, KASAN, etc, and we certainly can't run any races with those on.
I tried to look through the kselftest framework for other real time
applications and didn't really find any, so I'm not sure how better to
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest is ensuring is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.
Pavel Begunkov [Thu, 28 Apr 2022 10:57:46 +0000 (11:57 +0100)]
tcp: optimise skb_zerocopy_iter_stream()
It's expensive to make a copy of 40B struct iov_iter to the point it
was taking 0.2-0.5% of all cycles in my tests. iov_iter_revert() should
be fine as it's a simple case without nested reverts/truncates.
octeontx2-af: debugfs: fix error return of allocations
Current memory failure code in the debugfs returns -ENOSPC. This is
normally used for indicating that there is no space left on the
device and is not applicable for memory allocation failures.
Replace this with -ENOMEM.
Jakub Kicinski [Mon, 2 May 2022 21:04:19 +0000 (14:04 -0700)]
Merge branch 'ocelot-stats-improvement'
Colin Foster says:
====================
ocelot stats improvement
A couple of pick-ups after f187bfa6f35 ("net: ethernet: ocelot: remove
the need for num_stats initializer") - one addresses a warning
patchwork flagged about operator precedence when using macro arguments.
The other is a reduction of unnecessary memory allocation.
====================
Colin Foster [Sat, 30 Apr 2022 23:23:27 +0000 (16:23 -0700)]
net: mscc: ocelot: add missed parentheses around macro argument
Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a macro that patchwork warned it lacked parentheses
around an argument. Correct this mistake.
Fixes: 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats initializer") Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Tested-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Colin Foster [Sat, 30 Apr 2022 23:23:26 +0000 (16:23 -0700)]
net: mscc: ocelot: remove unnecessary variable
Commit 2f187bfa6f35 ("net: ethernet: ocelot: remove the need for num_stats
initializer") added a flags field to the ocelot stats structure. The same
behavior can be achieved without this additional field taking up extra
memory.
Miquel Raynal landed two patch series bundled in this pull request.
The first series re-works the symbol duration handling to better
accommodate the needs of the various phy layers in ieee802154.
In the second series Miquel improves th errors handling from drivers
up mac802154. THis streamlines the error handling throughout the
ieee/mac802154 stack in preparation for sync TX to be introduced for
MLME frames.
====================
Jakub Kicinski [Fri, 29 Apr 2022 23:55:06 +0000 (16:55 -0700)]
rtnl: allocate more attr tables on the heap
Commit a293974590cf ("rtnetlink: avoid frame size warning in rtnl_newlink()")
moved to allocating the largest attribute array of rtnl_newlink()
on the heap. Kalle reports the stack has grown above 1k again:
net/core/rtnetlink.c:3557:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Move more attrs to the heap, wrap them in a struct.
Don't bother with linkinfo, it's referenced a lot and we take
its size so it's awkward to move, plus it's small (6 elements).
Paolo Abeni [Mon, 2 May 2022 11:21:40 +0000 (13:21 +0200)]
Merge branch 'use-mmd-c45-helpers'
Andrew Lunn says:
====================
Use MMD/C45 helpers
MDIO busses can perform two sorts of bus transaction, defined in
clause 22 and clause 45 of 802.3. This results in two register
addresses spaces. The current driver structure for indicating if C22
or C45 should be used is messy, and many C22 only bus drivers will
wrongly interpret a C45 transaction as a C22 transaction.
This patchset is a preparation step to cleanup the situation. It
converts MDIO bus users to make use of existing _mmd and _c45 helpers
to perform accesses to C45 registers. This will later allow C45 and
C22 to be kept separate.
====================
Andrew Lunn [Sat, 30 Apr 2022 17:30:37 +0000 (19:30 +0200)]
net: pcs: pcs-xpcs: Convert to mdiobus_c45_read
Stop using the helpers to construct a special mdio address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.
Andrew Lunn [Sat, 30 Apr 2022 17:30:36 +0000 (19:30 +0200)]
net: dsa: sja1105: Convert to mdiobus_c45_read
Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.
Andrew Lunn [Sat, 30 Apr 2022 17:30:35 +0000 (19:30 +0200)]
net: phy: bcm87xx: Use mmd helpers
Rather than construct special phy device addresses to access C45
registers, use the mmd helpers. These will directly call the C45 API
of the MDIO bus driver.
Andrew Lunn [Sat, 30 Apr 2022 17:30:34 +0000 (19:30 +0200)]
net: phy: Convert to mdiobus_c45_{read|write}
Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.
Andrew Lunn [Sat, 30 Apr 2022 17:30:33 +0000 (19:30 +0200)]
net: phylink: Convert to mdiobus_c45_{read|write}
Stop using the helpers to construct a special phy address which
indicates C45. Instead use the C45 accessors, which will call the
busses C45 specific read/write API.
this is a pull request of 9 patches for net-next/master.
The first patch is by Biju Das and documents renesas,r9a07g043-canfd
support in the renesas,rcar-canfd bindings document.
Jakub Kicinski's patch removes a copy of the NAPI_POLL_WEIGHT define
from the m_can driver.
The last 7 patches all target the ctucanfd driver. Pavel Pisa provides
2 patch which update the documentation. 2 patches by Jiapeng Chong
remove unneeded includes and error messages. And another 3 patches by
Pavel Pisa to further clean up the driver (remove inline keyword,
remove unneeded debug statements, and remove unneeded module parameters).
nfp: support VxLAN inner TSO with GSO_PARTIAL offload
VxLAN belongs to UDP-based encapsulation protocol. Inner TSO for VxLAN
packet with udpcsum requires offloading of outer header csum.
The device doesn't support outer header csum offload. However, inner TSO
for VxLAN with udpcsum can still work with GSO_PARTIAL offload, which
means outer udp csum computed by stack and inner tcp segmentation finished
by hardware. Thus, the patch enable features "NETIF_F_GSO_UDP_TUNNEL_CSUM"
and "NETIF_F_GSO_PARTIAL" and set gso_partial_features.
Jaehee Park [Fri, 29 Apr 2022 16:46:58 +0000 (12:46 -0400)]
selftests: net: vrf_strict_mode_test: add support to select a test to run
Add a boilerplate test loop to run all tests in
vrf_strict_mode_test.sh. Add a -t flag that allows a selected test to
run. Remove the vrf_strict_mode_tests function which is now unused.
change since v1:
- deleted "depends on patch..." in [1/2]'s commit message
This patchset depends on these fixes [1], which has been merged into
net-next. Since o_seqno is now atomic_t, we can always turn on
NETIF_F_LLTX for [IP6]GRE[TAP] devices, since we no longer need the TX
lock (&txq->_xmit_lock).
We could probably do the same thing to [IP6]ERSPAN devices as well, but
I'm not familiar with them yet. For example, ERSPAN devices are
initialized as |= GRE_FEATURES in erspan_tunnel_init(), but I don't see
IP6ERSPAN devices being initialized as |= GRE6_FEATURES. Where should we
initialize IP6ERSPAN devices' ->features? Please suggest if I'm missing
something, thanks!
Peilin Ye [Fri, 29 Apr 2022 05:25:47 +0000 (22:25 -0700)]
ip6_gre: Make IP6GRE and IP6GRETAP devices always NETIF_F_LLTX
Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark IP6GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).
Peilin Ye [Fri, 29 Apr 2022 05:25:21 +0000 (22:25 -0700)]
ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX
Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and
always mark GRE[TAP] devices as NETIF_F_LLTX, since we no longer need
the TX lock (&txq->_xmit_lock).
David S. Miller [Sun, 1 May 2022 16:45:35 +0000 (17:45 +0100)]
Merge branch 'adin1100-industrial-PHY-support'
Alexandru Tachici says:
====================
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY
The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.
The ADIN1100 uses Auto-Negotiation capability in accordance
with IEEE 802.3 Clause 98, providing a mechanism for
exchanging information between PHYs to allow link partners to
agree to a common mode of operation.
The concluded operating mode is the transmit amplitude mode and
master/slave preference common across the two devices.
Both device and LP advertise their ability and request for
increased transmit at:
- BASE-T1 autonegotiation advertisement register [47:32]\
Clause 45.2.7.21 of Standard 802.3
- BIT(13) - 10BASE-T1L High Level Transmit Operating Mode Ability
- BIT(12) - 10BASE-T1L High Level Transmit Operating Mode Request
For 2.4 Vpp (high level transmit) operation, both devices need
to have the High Level Transmit Operating Mode Ability bit set,
and only one of them needs to have the High Level Transmit
Operating Mode Request bit set. Otherwise 1.0 Vpp transmit level
will be used.
Settings for eth1:
Supported ports: [ TP MII ]
Supported link modes: 10baseT1L/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT1L/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT1L/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 10Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 0
Transceiver: external
MDI-X: Unknown
Link detected: yes
SQI: 7/7
1. Add basic support for ADIN1100.
Alexandru Ardelean (1):
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY
1. Added 10baset-T1L link modes.
2. Added 10-BasetT1L registers.
3. Added Base-T1 auto-negotiation registers. For Base-T1 these
registers decide master/slave status and TX voltage of the
device and link partner.
4. Added 10BASE-T1L support in phy-c45.c. Now genphy functions will call
Base-T1 functions where registers don't match, like the auto-negotiation ones.
5. Convert MSE to SQI using a predefined table and allow user access
through ethtool.
6. DT bindings for the 2.4 Vpp transmit mode.
====================
Add a tristate property to advertise desired transmit level.
If the device supports the 2.4 Vpp operating mode for 10BASE-T1L,
as defined in 802.3gc, and the 2.4 Vpp transmit voltage operation
is desired, property should be set to 1. This property is used
to select whether Auto-Negotiation advertises a request to
operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to 1, the PHY shall advertise a request
to operate the 10BASE-T1L PHY in increased transmit level mode.
If property is set to zero, the PHY shall not advertise
a request to operate the 10BASE-T1L PHY in increased transmit level mode.
net: phy: adin1100: Add initial support for ADIN1100 industrial PHY
The ADIN1100 is a low power single port 10BASE-T1L transceiver designed for
industrial Ethernet applications and is compliant with the IEEE 802.3cg
Ethernet standard for long reach 10 Mb/s Single Pair Ethernet.
This patch is needed because the BASE-T1 uses different registers
for status, control and advertisement to those already
employed in the existing phy-c45 functions.
Where required, genphy_c45 functions will now check whether
the device supports BASE-T1 and use the specific registers
instead: 45.2.7.19 BASE-T1 AN control register,
45.2.7.20 BASE-T1 AN status, 45.2.7.21 BASE-T1 AN
advertisement register, 45.2.7.22 BASE-T1 AN LP Base
Page ability register, 45.2.1.185 BASE-T1 PMA/PMD control
register.
Added BASE-T1 AN advertisement register (Registers 7.514, 7.515, and
7.516) and BASE-T1 AN LP Base Page ability register (Registers 7.517,
7.518, and 7.519).
The 802.3gc specification defines the 10-BaseT1L link
mode for ethernet trafic on twisted wire pair.
PMA status register can be used to detect if the phy supports
2.4 V TX level and PCS control register can be used to
enable/disable PCS level loopback.
veth netdevice defines own rx queues and allocates array containing
up to 4095 ~750-bytes-long 'struct veth_rq' elements. Such allocation
is quite huge and should be accounted to memcg.
David S. Miller [Sun, 1 May 2022 11:19:01 +0000 (12:19 +0100)]
Merge branch 'UDP-sock_wfree-opts'
Pavel Begunkov says:
====================
UDP sock_wfree optimisations
The series is not UDP specific but that the main beneficiary. 2/3 saves one
atomic in sock_wfree() and on top 3/3 removes an extra barrier.
Tested with UDP over dummy netdev, 2038491 -> 2099071 req/s (or around +3%).
note: in regards to 1/3, there is a "Should agree with poll..." comment
that I don't completely get, and there is no git history to explain it.
Though I can't see how it could rely on having the second check without
racing with tasks woken by wake_up*().
The series was split from a larger patchset, see
https://lore.kernel.org/netdev/cover.1648981570[email protected]/
====================
Pavel Begunkov [Thu, 28 Apr 2022 10:58:19 +0000 (11:58 +0100)]
sock: optimise sock_def_write_space barriers
Now we have a separate path for sock_def_write_space() and can go one
step further. When it's called from sock_wfree() we know that there is a
preceding atomic for putting down ->sk_wmem_alloc. We can use it to
replace to replace smb_mb() with a less expensive
smp_mb__after_atomic(). It also removes an extra RCU read lock/unlock as
a small bonus.
Pavel Begunkov [Thu, 28 Apr 2022 10:58:18 +0000 (11:58 +0100)]
sock: optimise UDP sock_wfree() refcounting
For non SOCK_USE_WRITE_QUEUE sockets, sock_wfree() (atomically) puts
->sk_wmem_alloc twice. It's needed to keep the socket alive while
calling ->sk_write_space() after the first put.
However, some sockets, such as UDP, are freed by RCU
(i.e. SOCK_RCU_FREE) and use already RCU-safe sock_def_write_space().
Carve a fast path for such sockets, put down all refs in one go before
calling sock_def_write_space() but guard the socket from being freed
by an RCU read section.
note: because TCP sockets are marked with SOCK_USE_WRITE_QUEUE it
doesn't add extra checks in its path.
Except for minor rounding differences the first ->sk_wmem_alloc test in
sock_def_write_space() is a hand coded version of sock_writeable().
Replace it with the helper, and also kill the following if duplicating
the check.
Robert Hancock [Wed, 27 Apr 2022 19:39:28 +0000 (13:39 -0600)]
net: phy: marvell: update abilities and advertising when switching to SGMII
With some SFP modules, such as Finisar FCLF8522P2BTL, the PHY hardware
strapping defaults to 1000BaseX mode, but the kernel prefers to set them
for SGMII mode. When this happens and the PHY is soft reset, the BMSR
status register is updated, but this happens after the kernel has already
read the PHY abilities during probing. This results in support not being
detected for, and the PHY not advertising support for, 10 and 100 Mbps
modes, preventing the link from working with a non-gigabit link partner.
When the PHY is being configured for SGMII mode, call genphy_read_abilities
again in order to re-read the capabilities, and update the advertising
field accordingly.
There are two major issues in the logic calculating the symbol durations
based on the page/channel:
- The page number is used in place of the channel value.
- The BIT() macro is missing because we want to check the channel
value against a bitmask.
Fix these two errors and apologize loudly for this mistake.
Starting from the blamed commit, the lan966x build fails with the
following compilation error:
drivers/net/ethernet/microchip/lan966x/lan966x_ptp.c:342:9: error: implicit declaration of function ‘ptp_find_pin_unlocked’ [-Werror=implicit-function-declaration]
342 | pin = ptp_find_pin_unlocked(phc->clock, PTP_PF_EXTTS, 0);
The issue is that there is no stub function for ptp_find_pin_unlocked
in case CONFIG_PTP_1588_CLOCK is not selected. Therefore add one.
Reported-by: kernel test robot <[email protected]> Fixes: f3d8e0a9c28ba0 ("net: lan966x: Add support for PTP_PF_EXTTS") Signed-off-by: Horatiu Vultur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Colin Foster [Fri, 29 Apr 2022 21:30:36 +0000 (14:30 -0700)]
net: ethernet: ocelot: remove the need for num_stats initializer
There is a desire to share the oclot_stats_layout struct outside of the
current vsc7514 driver. In order to do so, the length of the array needs to
be known at compile time, and defined in the struct ocelot and struct
felix_info.
Since the array is defined in a .c file and would be declared in the header
file via:
extern struct ocelot_stat_layout[];
the size of the array will not be known at compile time to outside modules.
To fix this, remove the need for defining the number of stats at compile
time and allow this number to be determined at initialization.
Eric Dumazet [Sat, 30 Apr 2022 01:15:23 +0000 (18:15 -0700)]
tcp: drop skb dst in tcp_rcv_established()
In commit f84af32cbca7 ("net: ip_queue_rcv_skb() helper")
I dropped the skb dst in tcp_data_queue().
This only dealt with so-called TCP input slow path.
When fast path is taken, tcp_rcv_established() calls
tcp_queue_rcv() while skb still has a dst.
This was mostly fine, because most dsts at this point
are not refcounted (thanks to early demux)
However, TCP packets sent over loopback have refcounted dst.
Then commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") came and had the effect
of delaying skb freeing for an arbitrary time.
If during this time the involved netns is dismantled, cleanup_net()
frees the struct net with embedded net->ipv6.ip6_dst_ops.
Then when eventually dst_destroy_rcu() is called,
if (dst->ops->destroy) ... triggers an use-after-free.
It is not clear if ip6_route_net_exit() lacks a rcu_barrier()
as syzbot reported similar issues before the blamed commit.
David S. Miller [Sat, 30 Apr 2022 12:09:26 +0000 (13:09 +0100)]
Merge branch 'lan966x-phy-reset-remove'
Michael Walle says:
====================
net: lan966x: remove PHY reset support
Remove the unneeded PHY reset node as well as the driver support for it.
This was already discussed [1] and I expect Microchip to Ack on this
removal. Since there is no user, no breakage is expected.
I'm not sure it this should go through net or net-next and if the patches
should have a Fixes: tag or not. In upstream linux there was never any user
of it, so there is no bug to be fixed. But OTOH if the schema fix isn't
backported, then there might be an older schema version still containing
the reset node. Thoughts?
The patches needed for the GPIO part are just waiting to be picked up by
Linus [2,3]. This patch and the GPIO parts are the last pieces of the
puzzle to get ethernet working on the LAN9668 on upstream linux.
Michael Walle [Thu, 28 Apr 2022 11:40:49 +0000 (13:40 +0200)]
net: lan966x: remove PHY reset support
The PHY subsystem as well as the MIIM mdio driver (in case of the
integrated PHYs) will take care of the resets. A separate reset driver
isn't needed. There is no in-tree user of this feature. Remove the
support.
Michael Walle [Thu, 28 Apr 2022 11:40:48 +0000 (13:40 +0200)]
dt-bindings: net: lan966x: remove PHY reset
The PHY reset was intended to be a phandle for a special PHY reset
driver for the integrated PHYs as well as any external PHYs. It turns
out, that the culprit is how the reset of the switch device is done.
In particular, the switch reset also affects other subsystems like
the GPIO and the SGPIO block and it happens to be the case that the
reset lines of the external PHYs are connected to a common GPIO line.
Thus as soon as the switch issues a reset during probe time, all the
external PHYs will go into reset because all the GPIO lines will
switch to input and the pull-down on that signal will take effect.
So even if there was a special PHY reset driver, it (1) won't fix
the root cause of the problem and (2) it won't fix all the other
consumers of GPIO lines which will also be reset.
It turns out, the Ocelot SoC has the same weird behavior (or the
lack of a dedicated switch reset) and there the problem is already
solved and all the bits and pieces are already there and this PHY
reset property isn't not needed at all.
There are no users of this binding. Just remove it.
David S. Miller [Sat, 30 Apr 2022 11:58:45 +0000 (12:58 +0100)]
Merge branch 'ipv6-net-opts'
Pavel Begunkov says:
====================
generic net and ipv6 minor optimisations
1-3 inline simple functions that only reshuffle arguments possibly adding
extra zero args, and call another function. It was benchmarked before with
a bunch of extra patches, see for details
It may increase the binary size, but it's the right thing to do and at least
without modules it actually sheds some bytes for some standard-ish config.
text data bss dec hex filename 9627200 0 0 9627200 92e640 ./arch/x86_64/boot/bzImage
text data bss dec hex filename 9627104 0 0 9627104 92e5e0 ./arch/x86_64/boot/bzImage
====================
Pavel Begunkov [Thu, 28 Apr 2022 10:58:47 +0000 (11:58 +0100)]
ipv6: help __ip6_finish_output() inlining
There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().
Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeded, and in this case it return 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.
Pavel Begunkov [Thu, 28 Apr 2022 10:58:46 +0000 (11:58 +0100)]
net: inline dev_queue_xmit()
Inline dev_queue_xmit() and dev_queue_xmit_accel(), they both are small
proxy functions doing nothing but redirecting the control flow to
__dev_queue_xmit().
This is currently done for CMSG_INQ, add an ability to do so via struct
msghdr as well and have CMSG_INQ use that too. If the caller sets
msghdr->msg_get_inq, then we'll pass back the hint in msghdr->msg_inq.
Rearrange struct msghdr a bit so we can add this member while shrinking
it at the same time. On a 64-bit build, it was 96 bytes before this
change and 88 bytes afterwards.
nfp: flower: utilize the tuple iifidx in offloading ct flows
The device info from which conntrack originates is stored in metadata
field of the ct flow to offload now, driver can utilize it to reduce
the number of offloaded flows.
v2: Drop inline keyword from get_netdev_from_rule() signature.
The compiler can decide.
sfc: add EF100 VF support via a write to sriov_numvfs
This patch extends the EF100 PF driver by adding .sriov_configure()
which would allow users to enable and disable virtual functions
using the sriov sysfs.
MPTCP already has an in-kernel path manager (PM) to add and remove TCP
subflows associated with a given MPTCP connection. This in-kernel PM has
been designed to handle typical server-side use cases, but is not very
flexible or configurable for client devices that may have more
complicated policies to implement.
This patch series from the MPTCP tree is the first step toward adding a
generic-netlink-based API for MPTCP path management, which a privileged
userspace daemon will be able to use to control subflow
establishment. These patches add a per-namespace sysctl to select the
default PM type (in-kernel or userspace) for new MPTCP sockets. New
self-tests confirm expected behavior when userspace PM is selected but
there is no daemon available to handle existing MPTCP PM events.
Subsequent patch series (already staged in the MPTCP tree) will add the
generic netlink path management API.
====================
Mat Martineau [Wed, 27 Apr 2022 22:50:02 +0000 (15:50 -0700)]
selftests: mptcp: Add tests for userspace PM type
These tests ensure that the in-kernel path manager is bypassed when
the userspace path manager is configured. Kernel code is still
responsible for ADD_ADDR echo, so also make sure that's working.
Mat Martineau [Wed, 27 Apr 2022 22:49:59 +0000 (15:49 -0700)]
mptcp: Bypass kernel PM when userspace PM is enabled
When a MPTCP connection is managed by a userspace PM, bypass the kernel
PM for incoming advertisements and subflow events. Netlink events are
still sent to userspace.
v2: Remove unneeded check in mptcp_pm_rm_addr_received() (Kishen Maloor)
v3: Add and use helper function for PM mode (Paolo Abeni)
Mat Martineau [Wed, 27 Apr 2022 22:49:58 +0000 (15:49 -0700)]
mptcp: Add a member to mptcp_pm_data to track kernel vs userspace mode
When adding support for netlink path management commands, the kernel
needs to know whether paths are being controlled by the in-kernel path
manager or a userspace PM.
Mat Martineau [Wed, 27 Apr 2022 22:49:57 +0000 (15:49 -0700)]
mptcp: Remove redundant assignments in path manager init
A few members of the mptcp_pm_data struct were assigned to hard-coded
values in mptcp_pm_data_reset(), and then immediately changed in
mptcp_pm_nl_data_init().
Instead, flatten all the assignments in to mptcp_pm_data_reset().
v2: Resolve conflicts due to rename of mptcp_pm_data_reset()
v4: Resolve conflict in mptcp_pm_data_reset()
Michael Walle [Wed, 27 Apr 2022 21:44:06 +0000 (23:44 +0200)]
net: phy: micrel: add coma mode GPIO
The LAN8814 has a coma mode pin which puts the PHY into isolate and
power-dowm mode. Unfortunately, the mode cannot be disabled by a
register. Usually, the input pin has a pull-up and connected to a GPIO
which can then be used to disable the mode. Try to get the GPIO and
deassert it.
Michael Walle [Wed, 27 Apr 2022 21:44:05 +0000 (23:44 +0200)]
net: phy: micrel: move the PHY timestamping check
Both lan8814_ptp_init() and lan8814_ptp_probe_once() are only used if
PTP and PHY timestamping is enabed. Up until now the probe function just
returns early, if they are not needed. But we need the
phy_package_init_once() functionality for the coma mode GPIO setup. Move
the check into the functions itself.
The LAN8814 has a coma mode pin which is used to put the PHY into
isolate and power-down mode. Usually strapped to be asserted by default.
A GPIO is then used to take the PHY out of this mode.
David S. Miller [Fri, 29 Apr 2022 10:57:01 +0000 (11:57 +0100)]
Merge branch 'remove-NAPI_POLL_WEIGHT-copies'
Merge branch 'remove-NAPI_POLL_WEIGHT-copies'
Jakub Kicinski says:
====================
remove copies of the NAPI_POLL_WEIGHT define
netif_napi_add() takes weight as the last argument. The value of
that parameter is hard to come up with and depends on many factors,
so driver authors are encouraged to use NAPI_POLL_WEIGHT.
We should probably move weight to an "advanced" version of the API
(__netif_napi_add()?) and simplify the life of most driver authors.
In preparation for such API changes this series removes local
defines equivalent to NAPI_POLL_WEIGHT from drivers, so that a simple
coccinelle / spatch script does not get thrown off by them.