]> Git Repo - linux.git/log
linux.git
9 months agoio_uring/net: move charging socket out of zc io_uring
Pavel Begunkov [Thu, 27 Jun 2024 12:59:44 +0000 (13:59 +0100)]
io_uring/net: move charging socket out of zc io_uring

Currently, io_uring's io_sg_from_iter() duplicates the part of
__zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
to miss while changing it in net/, the chunk is not the most
straightforward for outside users and full of internal implementation
details. io_uring is not a good place to keep it, deduplicate it by
moving out of the callback into __zerocopy_sg_from_iter().

Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
9 months agonet: batch zerocopy_fill_skb_from_iter accounting
Pavel Begunkov [Thu, 27 Jun 2024 12:59:43 +0000 (13:59 +0100)]
net: batch zerocopy_fill_skb_from_iter accounting

Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.

Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
9 months agonet: split __zerocopy_sg_from_iter()
Pavel Begunkov [Thu, 27 Jun 2024 12:59:42 +0000 (13:59 +0100)]
net: split __zerocopy_sg_from_iter()

Split a function out of __zerocopy_sg_from_iter() that only cares about
the traditional path with refcounted pages and doesn't need to know
about ->sg_from_iter. A preparation patch, we'll improve on the function
later.

Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
9 months agonet: always try to set ubuf in skb_zerocopy_iter_stream
Pavel Begunkov [Thu, 27 Jun 2024 12:59:41 +0000 (13:59 +0100)]
net: always try to set ubuf in skb_zerocopy_iter_stream

skb_zcopy_set() does nothing if there is already a ubuf_info associated
with an skb, and since ->link_skb should have set it several lines above
the check here essentially does nothing and can be removed. It's also
safer this way, because even if the callback is faulty we'll
have it set.

Signed-off-by: Pavel Begunkov <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
9 months agonet: phy: fix potential use of NULL pointer in phy_suspend()
Russell King (Oracle) [Fri, 28 Jun 2024 10:32:11 +0000 (11:32 +0100)]
net: phy: fix potential use of NULL pointer in phy_suspend()

phy_suspend() checks the WoL status, and then dereferences
phydrv->flags if (and only if) we decided that WoL has been enabled
on either the PHY or the netdev.

We then check whether phydrv was NULL, but we've potentially already
dereferenced the pointer.

If phydrv is NULL, then phy_ethtool_get_wol() will return an error
and leave wol.wolopts set to zero. However, if netdev->wol_enabled
is true, then we would dereference a NULL pointer.

Checking the PHY drivers, the only place that phydev->wol_enabled is
checked by them is in their suspend/resume callbacks and nowhere else
(which is correct, because phylib only updates this in phy_suspend()).

So, move the NULL pointer check earlier to avoid a NULL pointer
dereference. Leave the check for phydrv->suspend in place as a driver
may populate the .resume method but not the .suspend method.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoMerge tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 2 Jul 2024 03:04:58 +0000 (20:04 -0700)]
Merge tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-06-29

Geert Uytterhoeven contributes 3 patches with small improvements and
cleanups for the rcar_canfd driver.

A patch by Christophe JAILLET constifies the struct m_can_ops in the
m_can driver to reduce the code size.

The last 9 patches are by me an work around erratum DS80000789E 6 of
mcp2518fd.

* tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
  can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
  can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
  can: mcp251xfd: clarify the meaning of timestamp
  can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
  can: mcp251xfd: update errata references
  can: mcp251xfd: properly indent labels
  can: gs_usb: add VID/PID for Xylanta SAINT3 product family
  can: m_can: Constify struct m_can_ops
  can: rcar_canfd: Remove superfluous parentheses in address calculations
  can: rcar_canfd: Improve printing of global operational state
  can: rcar_canfd: Simplify clock handling
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: Fix the panic caused by dev being null when dumping coalesce
Heng Qi [Fri, 28 Jun 2024 04:40:18 +0000 (12:40 +0800)]
net: ethtool: Fix the panic caused by dev being null when dumping coalesce

syzbot reported a general protection fault caused by a null pointer
dereference in coalesce_fill_reply(). The issue occurs when req_base->dev
is null, leading to an invalid memory access.

This panic occurs if dumping coalesce when no device name is specified.

Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3
Signed-off-by: Heng Qi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Mon, 1 Jul 2024 12:11:57 +0000 (13:11 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue into main

Tony nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice)

This series contains updates to MAINTAINERS file and ice driver.

Jesse replaces himself with Przemek in the maintainers file.

Karthik Sundaravel adds support for VF get/set MAC address via devlink.

Eric checks for errors from ice_vsi_rebuild() during queue
reconfiguration.

Paul adjusts FW API version check for E830 devices.

Piotr adds differentiation of unload type when shutting down AdminQ.

Przemek changes ice_adapter initialization to occur once per physical
card.
====================

Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge branch 'bnxt_en-ptp' into main
David S. Miller [Mon, 1 Jul 2024 10:23:22 +0000 (11:23 +0100)]
Merge branch 'bnxt_en-ptp' into main

Michael Chan says:

====================
bnxt_en: PTP updates for net-next

The first 5 patches implement the PTP feature on the new BCM5760X
chips.  The main new hardware feature is the new TX timestamp
completion which enables the driver to retrieve the TX timestamp
in NAPI without deferring to the PTP worker.

The last 5 patches increase the number of TX PTP packets in-flight
from 1 to 4 on the older BCM5750X chips.  On these older chips, we
need to call firmware in the PTP worker to retrieve the timestamp.
We use an arry to keep track of the in-flight TX PTP packets.

v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts.
    Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets.
====================

Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Remove atomic operations on ptp->tx_avail
Pavan Chebbi [Fri, 28 Jun 2024 19:30:05 +0000 (12:30 -0700)]
bnxt_en: Remove atomic operations on ptp->tx_avail

Now that we require the spinlock to protect ptp->txts_prod, change
ptp->tx_avail to non-atomic and protect it under the same spinlock.
Add a new helper function bnxt_ptp_get_txts_prod() to decrement
ptp->tx_avail under spinlock and return the producer.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Increase the max total outstanding PTP TX packets to 4
Pavan Chebbi [Fri, 28 Jun 2024 19:30:04 +0000 (12:30 -0700)]
bnxt_en: Increase the max total outstanding PTP TX packets to 4

Start accepting up to 4 TX TS requests on BCM5750X (P5) chips.
These PTP TX packets will be queued in the ptp->txts_req[] array
waiting for the TX timestamp to complete.  The entries in the
array will be managed by a producer and consumer index.  The
producer index is updated under spinlock since multiple TX rings
can try to send PTP packets at the same time.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Let bnxt_stamp_tx_skb() return error code
Pavan Chebbi [Fri, 28 Jun 2024 19:30:03 +0000 (12:30 -0700)]
bnxt_en: Let bnxt_stamp_tx_skb() return error code

Change the function bnxt_stamp_tx_skb() to return 0 for suceess
or -EAGAIN if the timestamp is still pending in firmware.  The
calling PTP aux worker will reschedule based on the return code.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Remove an impossible condition check for PTP TX pending SKB
Pavan Chebbi [Fri, 28 Jun 2024 19:30:02 +0000 (12:30 -0700)]
bnxt_en: Remove an impossible condition check for PTP TX pending SKB

In the current 5750X PTP code paths, there is always at most one TX
SKB requested for timestamp and we won't accept another one until we
have retrieved the timestamp or it has timed out.  Remove the
unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change
the function to void.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Refactor all PTP TX timestamp fields into a struct
Pavan Chebbi [Fri, 28 Jun 2024 19:30:01 +0000 (12:30 -0700)]
bnxt_en: Refactor all PTP TX timestamp fields into a struct

On the older 5750X (P5) chips, we currently support only 1 TX PTP
packet in-flight waiting for the timestamp.  Refactor the
datastructures to prepare to support up to 4 TX PTP packets.

Combine all fields required for PTP TX timestamp query into one
structure.  An array of this structure will be added in follow-on
patches to support multiple outstanding TX timestamps.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Add BCM5760X specific PHC registers mapping
Pavan Chebbi [Fri, 28 Jun 2024 19:30:00 +0000 (12:30 -0700)]
bnxt_en: Add BCM5760X specific PHC registers mapping

BCM5760X firmware will advertise direct 64-bit PHC registers access
for the driver from BAR0.

Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's
response and PHC register mapping for 5760X chips.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Add TX timestamp completion logic
Michael Chan [Fri, 28 Jun 2024 19:29:59 +0000 (12:29 -0700)]
bnxt_en: Add TX timestamp completion logic

The new BCM5760X chips will return the timestamp of TX packets in a
new completion.  Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp.  This feature eliminates
the limit on the number of in-flight PTP TX packets.

Reviewed-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Allow some TX packets to be unprocessed in NAPI
Michael Chan [Fri, 28 Jun 2024 19:29:58 +0000 (12:29 -0700)]
bnxt_en: Allow some TX packets to be unprocessed in NAPI

The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI.  In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.

Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not.  bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps.  Note that legacy chips will
never hold any SKBs this way.  The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware.  On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.

Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd
Michael Chan [Fri, 28 Jun 2024 19:29:57 +0000 (12:29 -0700)]
bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd

Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd.  This field will mark the TX BD that has requested
HW TX timestamp.  The field needs to be cleared if the timestamp packet
is later aborted.  This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.

Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agobnxt_en: Add new TX timestamp completion definitions
Michael Chan [Fri, 28 Jun 2024 19:29:56 +0000 (12:29 -0700)]
bnxt_en: Add new TX timestamp completion definitions

The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.

Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoocteontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM
Nithin Dabilpuram [Fri, 28 Jun 2024 16:31:26 +0000 (22:01 +0530)]
octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM

Octeontx2 hardware uses Near Data Cache(NDC) block to cache
contexts in it so that access to LLC/DRAM can be avoided.
It is recommended in HRM to sync the NDC contents before
releasing/resetting LF resources. Hence implement NDC_SYNC
mailbox and sync contexts during driver teardown.

Signed-off-by: Nithin Dabilpuram <[email protected]>
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agonet: tn40xx: add initial ethtool_ops support
FUJITA Tomonori [Fri, 28 Jun 2024 13:41:16 +0000 (22:41 +0900)]
net: tn40xx: add initial ethtool_ops support

Call phylink_ethtool_ksettings_get() for get_link_ksettings method and
ethtool_op_get_link() for get_link method.

Signed-off-by: FUJITA Tomonori <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
David S. Miller [Mon, 1 Jul 2024 08:52:35 +0000 (09:52 +0100)]
Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

Patch #1 to #11 to shrink memory consumption for transaction objects:

  struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
  struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
  struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
  struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
  struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */

  struct nft_trans_elem can now be allocated from kmalloc-96 instead of
  kmalloc-128 slab.

  Series from Florian Westphal. For the record, I have mangled patch #1
  to add nft_trans_container_*() and use if for every transaction object.
   I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
  at the beginning of the container transaction object. And few minor
  cleanups, any new bugs are of my own.

Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.

Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.

Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
          default conntrack timeouts via nfnetlink_cttimeout API, from
          Lin Ma.

Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
          larger secctx names than the existing 256 bytes length.

Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
          nfnetlink_queue, from Florian Westphal.

Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================

Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge branch 'tcp_metrics-netlink-specs' into main
David S. Miller [Mon, 1 Jul 2024 08:44:27 +0000 (09:44 +0100)]
Merge branch 'tcp_metrics-netlink-specs' into main

Jakub Kicinski says:

====================
tcp_metrics: add netlink protocol spec in YAML

Add a netlink protocol spec for the tcp_metrics generic netlink family.
First patch adjusts the uAPI header guards to make it easier to build
tools/ with non-system headers.

v1: https://lore.kernel.org/all/20240626201133.2572487[email protected]
====================

Signed-off-by: David S. Miller <[email protected]>
9 months agotcp_metrics: add netlink protocol spec in YAML
Jakub Kicinski [Thu, 27 Jun 2024 21:35:51 +0000 (14:35 -0700)]
tcp_metrics: add netlink protocol spec in YAML

Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
Useful at the very least for testing fixes.

In this episode of "10,000 ways to complicate netlink" the metric
nest has defines which are off by 1. iproute2 does:

        struct rtattr *m[TCP_METRIC_MAX + 1 + 1];

        parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);

        for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
                // ...
                attr = m[i + 1];

This is too weird to support in YNL, add a new set of defines
with _correct_ values to the official kernel header.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agotcp_metrics: add UAPI to the header guard
Jakub Kicinski [Thu, 27 Jun 2024 21:35:50 +0000 (14:35 -0700)]
tcp_metrics: add UAPI to the header guard

tcp_metrics' header lacks the customary _UAPI in the header guard.
This makes YNL build rules work less seamlessly.
We can easily fix that on YNL side, but this could also be
problematic if we ever needed to create a kernel-only tcp_metrics.h.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agonet: phy: realtek: Add support for PHY LEDs on RTL8211F
Marek Vasut [Tue, 25 Jun 2024 20:42:17 +0000 (22:42 +0200)]
net: phy: realtek: Add support for PHY LEDs on RTL8211F

Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to
indicate link status and activity. Add minimal LED controller driver
supporting the most common uses with the 'netdev' trigger.

Signed-off-by: Marek Vasut <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge branch 'ethtool-track-custom-rss-contexts-in-the-core'
Jakub Kicinski [Sat, 29 Jun 2024 01:51:38 +0000 (18:51 -0700)]
Merge branch 'ethtool-track-custom-rss-contexts-in-the-core'

Edward Cree says:

====================
ethtool: track custom RSS contexts in the core

Make the core responsible for tracking the set of custom RSS contexts,
 their IDs, indirection tables, hash keys, and hash functions; this
 lets us get rid of duplicative code in drivers, and will allow us to
 support netlink dumps later.

This series only moves the sfc EF10 & EF100 driver over to the new API;
 other drivers (mvpp2, octeontx2, mlx5, sfc/siena, bnxt_en) can be converted
 afterwards and the legacy API removed.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agosfc: remove get_rxfh_context dead code
Edward Cree [Thu, 27 Jun 2024 15:33:54 +0000 (16:33 +0100)]
sfc: remove get_rxfh_context dead code

The core now always satisfies 'ethtool -x context nonzero' from its own
 tracking, so our lookup code for that case is never called.  Remove it.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/b426fcc416dedc8f203e52eebef6891eccebe4c1.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: use the tracking array for get_rxfh on custom RSS contexts
Edward Cree [Thu, 27 Jun 2024 15:33:53 +0000 (16:33 +0100)]
net: ethtool: use the tracking array for get_rxfh on custom RSS contexts

On 'ethtool -x' with rss_context != 0, instead of calling the driver to
 read the RSS settings for the context, just get the settings from the
 rss_ctx xarray, and return them to the user with no driver involvement.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/2d0190fa29638f307ea720f882ebd41f6f867694.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agosfc: use new rxfh_context API
Edward Cree [Thu, 27 Jun 2024 15:33:52 +0000 (16:33 +0100)]
sfc: use new rxfh_context API

The core is now responsible for allocating IDs and a memory region for
 us to store our state (struct efx_rss_context_priv), so we no longer
 need efx_alloc_rss_context_entry() and friends.
Since the contexts are now maintained by the core, use the core's lock
 (net_dev->ethtool->rss_lock), rather than our own mutex (efx->rss_lock),
 to serialise access against changes; and remove the now-unused
 efx->rss_lock from struct efx_nic.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/150274740ea8cc137fef5502541ce573d32fb319.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: add a mutex protecting RSS contexts
Edward Cree [Thu, 27 Jun 2024 15:33:51 +0000 (16:33 +0100)]
net: ethtool: add a mutex protecting RSS contexts

While this is not needed to serialise the ethtool entry points (which
 are all under RTNL), drivers may have cause to asynchronously access
 dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to
 do this safely without needing to take the RTNL.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/7f9c15eb7525bf87af62c275dde3a8570ee8bf0a.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: add an extack parameter to new rxfh_context APIs
Edward Cree [Thu, 27 Jun 2024 15:33:50 +0000 (16:33 +0100)]
net: ethtool: add an extack parameter to new rxfh_context APIs

Currently passed as NULL, but will allow drivers to report back errors
 when ethnl support for these ops is added.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/6e0012347d175fdd1280363d7bfa76a2f2777e17.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: let the core choose RSS context IDs
Edward Cree [Thu, 27 Jun 2024 15:33:49 +0000 (16:33 +0100)]
net: ethtool: let the core choose RSS context IDs

Add a new API to create/modify/remove RSS contexts, that passes in the
 newly-chosen context ID (not as a pointer) rather than leaving the
 driver to choose it on create.  Also pass in the ctx, allowing drivers
 to easily use its private data area to store their hardware-specific
 state.
Keep the existing .set_rxfh API for now as a fallback, but deprecate it
 for custom contexts (rss_context != 0).

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/45f1fe61df2163c091ec394c9f52000c8b16cc3b.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: record custom RSS contexts in the XArray
Edward Cree [Thu, 27 Jun 2024 15:33:48 +0000 (16:33 +0100)]
net: ethtool: record custom RSS contexts in the XArray

Since drivers are still choosing the context IDs, we have to force the
 XArray to use the ID they've chosen rather than picking one ourselves,
 and handle the case where they give us an ID that's already in use.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/801f5faa4cec87c65b2c6e27fb220c944bce593a.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: ethtool: attach an XArray of custom RSS contexts to a netdevice
Edward Cree [Thu, 27 Jun 2024 15:33:47 +0000 (16:33 +0100)]
net: ethtool: attach an XArray of custom RSS contexts to a netdevice

Each context stores the RXFH settings (indir, key, and hfunc) as well
 as optionally some driver private data.
Delete any still-existing contexts at netdev unregister time.

Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/cbd1c402cec38f2e03124f2ab65b4ae4e08bd90d.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agonet: move ethtool-related netdev state into its own struct
Edward Cree [Thu, 27 Jun 2024 15:33:46 +0000 (16:33 +0100)]
net: move ethtool-related netdev state into its own struct

net_dev->ethtool is a pointer to new struct ethtool_netdev_state, which
 currently contains only the wol_enabled field.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/293a562278371de7534ed1eb17531838ca090633.1719502239.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoMerge branch 'selftests-drv-net-add-ability-to-schedule-cleanup-with-defer'
Jakub Kicinski [Sat, 29 Jun 2024 01:39:43 +0000 (18:39 -0700)]
Merge branch 'selftests-drv-net-add-ability-to-schedule-cleanup-with-defer'

Jakub Kicinski says:

====================
selftests: drv-net: add ability to schedule cleanup with defer()

Introduce a defer / cleanup mechanism for driver selftests.
More detailed info in the second patch.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: drv-net: rss_ctx: convert to defer()
Jakub Kicinski [Thu, 27 Jun 2024 18:55:02 +0000 (11:55 -0700)]
selftests: drv-net: rss_ctx: convert to defer()

Use just added defer().

Reviewed-by: Petr Machata <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: drv-net: add ability to schedule cleanup with defer()
Jakub Kicinski [Thu, 27 Jun 2024 18:55:01 +0000 (11:55 -0700)]
selftests: drv-net: add ability to schedule cleanup with defer()

This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:

  cmd("touch /tmp/file")
  defer(cmd, "rm /tmp/file")

defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).

  undo = defer(cmd, "rm /tmp/file")
  # ... some unsafe code ...
  undo.exec()

As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.

There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..

Link: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: net: ksft: avoid continue when handling results
Jakub Kicinski [Thu, 27 Jun 2024 18:55:00 +0000 (11:55 -0700)]
selftests: net: ksft: avoid continue when handling results

Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.

Reviewed-by: Petr Machata <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoenic: add ethtool get_channel support
Jon Kohler [Thu, 27 Jun 2024 20:20:13 +0000 (13:20 -0700)]
enic: add ethtool get_channel support

Add .get_channel to enic_ethtool_ops to enable basic ethtool -l
support to get the current channel configuration.

Note that the driver does not support dynamically changing queue
configuration, so .set_channel is intentionally unused. Instead, users
should use Cisco's hardware management tools (UCSM/IMC) to modify
virtual interface card configuration out of band.

Signed-off-by: Jon Kohler <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoMerge branch 'lift-udp_segment-restriction-for-egress-via-device-w-o-csum-offload'
Jakub Kicinski [Sat, 29 Jun 2024 01:13:07 +0000 (18:13 -0700)]
Merge branch 'lift-udp_segment-restriction-for-egress-via-device-w-o-csum-offload'

Jakub Sitnicki says:

====================
Lift UDP_SEGMENT restriction for egress via device w/o csum offload

This is a follow-up to an earlier question [1] if we can make UDP GSO work
with any egress device, even those with no checksum offload capability.
That's the default setup for TUN/TAP.

Because there is a change in behavior - sendmsg() does no longer return
EIO error - I'm submitting through net-next tree, rather than net,
as per Willem's advice.

[1] https://lore.kernel.org/netdev/[email protected]/

v1: https://lore.kernel.org/r/20240622-linux-udpgso-v1-0-d2344157ab2a@cloudflare.com
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests/net: Add test coverage for UDP GSO software fallback
Jakub Sitnicki [Wed, 26 Jun 2024 17:51:27 +0000 (19:51 +0200)]
selftests/net: Add test coverage for UDP GSO software fallback

Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.

Test against a dummy device because it is simpler to set up then TUN/TAP.

Signed-off-by: Jakub Sitnicki <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoudp: Allow GSO transmit from devices with no checksum offload
Jakub Sitnicki [Wed, 26 Jun 2024 17:51:26 +0000 (19:51 +0200)]
udp: Allow GSO transmit from devices with no checksum offload

Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.

Signed-off-by: Jakub Sitnicki <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoMerge patch series "can: mcp251xfd: workaround for erratum DS80000789E 6 of mcp2518fd"
Marc Kleine-Budde [Fri, 28 Jun 2024 21:49:37 +0000 (23:49 +0200)]
Merge patch series "can: mcp251xfd: workaround for erratum DS80000789E 6 of mcp2518fd"

Marc Kleine-Budde <[email protected]> says:

This patch series tries to work around erratum DS80000789E 6 of the
mcp2518fd, found by Stefan Althöfer, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value.

For the RX FIDO this caused old, already processed CAN frames or new,
incompletely written CAN frames to be (re-)processed.

To work around this issue, keep a per FIFO timestamp of the last valid
received CAN frame and compare against the timestamp of every received
CAN frame.

Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames are not recognized but then processed. These CAN frames have
the correct data and time stamps, but the DLC has not yet been
updated.

For the TEF FIFO the original driver already detects the error, update
the error handling with the knowledge that it is causes by this erratum.

Link: https://lore.kernel.org/all/20240628-mcp251xfd-workaround-erratum-6-v4-0-53586f168524@pengutronix.de
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
Marc Kleine-Budde [Sun, 22 Jan 2023 21:35:03 +0000 (22:35 +0100)]
can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd

This patch updates the workaround for a problem similar to erratum
DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

In the bad case, the driver reads a too large head index. As the FIFO
is implemented as a ring buffer, this results in re-handling old CAN
transmit complete events.

Every transmit complete event contains with a sequence number that
equals to the sequence number of the corresponding TX request. This
way old TX complete events can be detected.

If the original driver detects a non matching sequence number, it
prints an info message and tries again later. As wrong sequence
numbers can be explained by the erratum DS80000789E 6, demote the info
message to debug level, streamline the code and update the comments.

Keep the behavior: If an old CAN TX complete event is detected, abort
the iteration and mark the number of valid CAN TX complete events as
processed in the chip by incrementing the FIFO's tail index.

Cc: Stefan Althöfer <[email protected]>
Cc: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
Marc Kleine-Budde [Sun, 22 Jan 2023 20:30:41 +0000 (21:30 +0100)]
can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum

This is a preparatory patch to work around a problem similar to
erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

When handling the TEF interrupt, the driver reads the FIFO header
index from the TEF FIFO STA register of the chip.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN transmit complete events that were already processed to be
re-processed.

Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().

The mcp251xfd_handle_tefif() function reads the CAN transmit complete
events from the chip, iterates over them and pushes them into the
network stack. The original driver already contains code to detect old
CAN transmit complete events, that will be updated in the next patch.

Cc: Stefan Althöfer <[email protected]>
Cc: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
Marc Kleine-Budde [Wed, 11 Jan 2023 10:53:50 +0000 (11:53 +0100)]
can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd

This patch tries to works around erratum DS80000789E 6 of the
mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old, already processed CAN frames or new, incompletely written CAN
frames to be (re-)processed.

To work around this issue, keep a per FIFO timestamp [1] of the last
valid received CAN frame and compare against the timestamp of every
received CAN frame. If an old CAN frame is detected, abort the
iteration and mark the number of valid CAN frames as processed in the
chip by incrementing the FIFO's tail index.

Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames [2] are not recognized but then processed. These CAN frames
have the correct data and time stamps, but the DLC has not yet been
updated.

[1] As the raw timestamp overflows every 107 seconds (at the usual
    clock rate of 40 MHz) convert it to nanoseconds with the
    timecounter framework and use this to detect stale CAN frames.

Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com
Reported-by: Stefan Althöfer <[email protected]>
Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM
Tested-by: Stefan Althöfer <[email protected]>
Tested-by: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
Marc Kleine-Budde [Wed, 11 Jan 2023 20:07:03 +0000 (21:07 +0100)]
can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum

This is a preparatory patch to work around erratum DS80000789E 6 of
the mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.

When handling the RX interrupt, the driver iterates over all pending
FIFOs (which are implemented as ring buffers in hardware) and reads
the FIFO header index from the RX FIFO STA register of the chip.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN frames that were already processed, or new, incompletely
written CAN frames to be (re-)processed.

Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_rx_ring_update() with mcp251xfd_get_rx_len().

The mcp251xfd_handle_rxif_ring() function reads the received CAN
frames from the chip, iterates over them and pushes them into the
network stack. Prepare that the iteration can be stopped if an old CAN
frame is detected. The actual code to detect old or incomplete frames
and abort will be added in the next patch.

Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com
Reported-by: Stefan Althöfer <[email protected]>
Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM
Tested-by: Stefan Althöfer <[email protected]>
Tested-by: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
Marc Kleine-Budde [Wed, 11 Jan 2023 19:25:48 +0000 (20:25 +0100)]
can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function

This is a preparation patch.

Sending the UINC messages followed by incrementing the tail pointer
will be called in more than one place in upcoming patches, so factor
this out into a separate function.

Also make mcp251xfd_handle_rxif_ring_uinc() safe to be called with a
"len" of 0.

Tested-by: Stefan Althöfer <[email protected]>
Tested-by: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: clarify the meaning of timestamp
Marc Kleine-Budde [Wed, 11 Jan 2023 10:48:16 +0000 (11:48 +0100)]
can: mcp251xfd: clarify the meaning of timestamp

The mcp251xfd chip is configured to provide a timestamp with each
received and transmitted CAN frame. The timestamp is derived from the
internal free-running timer, which can also be read from the TBC
register via SPI. The timer is 32 bits wide and is clocked by the
external oscillator (typically 20 or 40 MHz).

To avoid confusion, we call this timestamp "timestamp_raw" or "ts_raw"
for short.

Using the timecounter framework, the "ts_raw" is converted to 64 bit
nanoseconds since the epoch. This is what we call "timestamp".

This is a preparation for the next patches which use the "timestamp"
to work around a bug where so far only the "ts_raw" is used.

Tested-by: Stefan Althöfer <[email protected]>
Tested-by: Thomas Kopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start...
Marc Kleine-Budde [Wed, 11 Jan 2023 11:10:04 +0000 (12:10 +0100)]
can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()

The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity
is detected. To avoid this, make sure that the timestamp worker is
stopped before shutting down the chip.

Split the starting of the timestamp worker out of
mcp251xfd_timestamp_init() into the separate function
mcp251xfd_timestamp_start().

Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move
mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way,
mcp251xfd_timestamp_stop() can be called unconditionally by
mcp251xfd_chip_stop().

Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: update errata references
Marc Kleine-Budde [Sun, 5 May 2024 12:40:10 +0000 (14:40 +0200)]
can: mcp251xfd: update errata references

Since the errata references have been added to the driver, new errata
sheets have been published. Update the references for the mcp2517fd
and mcp2518fd. For completeness add references for the mcp251863.

Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: mcp251xfd: properly indent labels
Marc Kleine-Budde [Thu, 25 Apr 2024 08:14:45 +0000 (10:14 +0200)]
can: mcp251xfd: properly indent labels

To fix the coding style, remove the whitespace in front of labels.

Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agoice: do not init struct ice_adapter more times than needed
Przemek Kitszel [Mon, 17 Jun 2024 13:24:07 +0000 (15:24 +0200)]
ice: do not init struct ice_adapter more times than needed

Allocate and initialize struct ice_adapter object only once per physical
card instead of once per port. This is not a big deal by now, but we want
to extend this struct more and more in the near future. Our plans include
PTP stuff and a devlink instance representing whole-device/physical card.

Transactions requiring to be sleep-able (like those doing user (here ice)
memory allocation) must be performed with an additional (on top of xarray)
mutex. Adding it here removes need to xa_lock() manually.

Since this commit is a reimplementation of ice_adapter_get(), a rather new
scoped_guard() wrapper for locking is used to simplify the logic.

It's worth to mention that xa_insert() use gives us both slot reservation
and checks if it is already filled, what simplifies code a tiny bit.

Reviewed-by: Wojciech Drewek <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
9 months agoice: Distinguish driver reset and removal for AQ shutdown
Piotr Gardocki [Fri, 14 Jun 2024 10:38:11 +0000 (12:38 +0200)]
ice: Distinguish driver reset and removal for AQ shutdown

Admin queue command for shutdown AQ contains a flag to indicate driver
unload. However, the flag is always set in the driver, even for resets. It
can cause the firmware to consider driver as unloaded once the PF reset is
triggered on all ports of device, which could lead to unexpected results.

Add an additional function parameter to functions that shutdown AQ,
indicating whether the driver is actually unloading.

Reviewed-by: Ahmed Zaki <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Piotr Gardocki <[email protected]>
Signed-off-by: Marcin Szycik <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
9 months agoice: Allow different FW API versions based on MAC type
Paul Greenwalt [Thu, 6 Jun 2024 13:48:46 +0000 (09:48 -0400)]
ice: Allow different FW API versions based on MAC type

Allow the driver to be compatible with different FW API versions based
on the device's MAC type. Currently, E810 is only compatible with one
FW API version. Now the driver can be compatible with different FW API
versions for both E810 and E830. For example, E810 FW API version is
1.5.0 and E830 is 1.7.0.

Signed-off-by: Paul Greenwalt <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
9 months agoice: Check all ice_vsi_rebuild() errors in function
Eric Joyner [Mon, 17 Jun 2024 12:46:25 +0000 (14:46 +0200)]
ice: Check all ice_vsi_rebuild() errors in function

Check the return value from ice_vsi_rebuild() and prevent the usage of
incorrectly configured VSI.

Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Eric Joyner <[email protected]>
Signed-off-by: Karen Ostrowska <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
9 months agoice: Add get/set hw address for VFs using devlink commands
Karthik Sundaravel [Tue, 28 May 2024 15:02:13 +0000 (20:32 +0530)]
ice: Add get/set hw address for VFs using devlink commands

Changing the MAC address of the VFs is currently unsupported via devlink.
Add the function handlers to set and get the HW address for the VFs.

Signed-off-by: Karthik Sundaravel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
9 months agoMAINTAINERS: update Intel Ethernet maintainers
Jesse Brandeburg [Mon, 17 Jun 2024 19:07:46 +0000 (12:07 -0700)]
MAINTAINERS: update Intel Ethernet maintainers

Since Jesse has moved to a new role, replace him with a new maintainer
to work with Tony on representing Intel networking drivers in the
kernel.

Cc: Przemek Kitszel <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Acked-by: Paul Menzel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
9 months agonetfilter: xt_recent: Lift restrictions on max hitcount value
Phil Sutter [Wed, 26 Jun 2024 22:35:05 +0000 (00:35 +0200)]
netfilter: xt_recent: Lift restrictions on max hitcount value

Support tracking of up to 65535 packets per table entry instead of just
255 to better facilitate longer term tracking or higher throughput
scenarios.

Note how this aligns sizes of struct recent_entry's 'nstamps' and
'index' fields when 'nstamps' was larger before. This is unnecessary as
the value of 'nstamps' grows along with that of 'index' after being
initialized to 1 (see recent_entry_update()). Its value will thus never
exceed that of 'index' and therefore does not need to provide space for
larger values.

Requested-by: Fabio <[email protected]>
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745
Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
9 months agoselftests: netfilter: nft_queue.sh: add test for disappearing listener
Florian Westphal [Tue, 25 Jun 2024 19:07:44 +0000 (21:07 +0200)]
selftests: netfilter: nft_queue.sh: add test for disappearing listener

If userspace program exits while the queue its subscribed to has packets
those need to be discarded.

commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.

Add a test case to cover this.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
9 months agocan: gs_usb: add VID/PID for Xylanta SAINT3 product family
Marc Kleine-Budde [Tue, 25 Jun 2024 14:03:52 +0000 (16:03 +0200)]
can: gs_usb: add VID/PID for Xylanta SAINT3 product family

Add support for the Xylanta SAINT3 product family.

Cc: Andy Jackson <[email protected]>
Cc: Ken Aitchison <[email protected]>
Tested-by: Andy Jackson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agoMerge branch 'net-selftests-mirroring-cleanup' into main
David S. Miller [Fri, 28 Jun 2024 09:55:38 +0000 (10:55 +0100)]
Merge branch 'net-selftests-mirroring-cleanup' into main

Petr Machata says:

====================
selftest: Clean-up and stabilize mirroring tests

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But that only works up to a point, and on
busy systems won't be always enough.

In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of
ICMP ends up being so entangled in the mirroring selftests that the
changes feel overly invasive. Instead, ICMP is kept, but where possible,
we match on ICMP message type, thus filtering out hits by other ICMP
messages.

Where this is not practical (where the counter tap is put on a device
that carries encapsulated packets), switch the counter condition to _at
least_ X observed packets. This is less robust, but barely so --
probably the only scenario that this would not catch is something like
erroneous packet duplication, which would hopefully get caught by the
numerous other tests in this extensive suite.

- Patches #1 to #3 clean up parameters at various helpers.

- Patches #4 to #6 stabilize the mirroring selftests as described above.

- Mirroring tests currently allow testing SW datapath even on HW
  netdevices by trapping traffic to the SW datapath. This complicates
  the tests a bit without a good reason: to test SW datapath, just run
  the selftests on the veth topology. Thus in patch #7, drop support for
  this dual SW/HW testing.

- At this point, some cleanups were either made possible by the previous
  patches, or were always possible. In patches #8 to #11, realize these
  cleanups.

- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
====================

Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mlxsw: mirror_gre: Obey TESTS
Petr Machata [Thu, 27 Jun 2024 14:48:49 +0000 (16:48 +0200)]
selftests: mlxsw: mirror_gre: Obey TESTS

This test is unusual in that overriding TESTS does not change the tests to
be run. Split the individual tests into several functions and invoke them
through tests_run() as appropriate.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: libs: Drop unused functions
Petr Machata [Thu, 27 Jun 2024 14:48:48 +0000 (16:48 +0200)]
selftests: libs: Drop unused functions

Nothing calls these.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: libs: Drop slow_path_trap_install()/_uninstall()
Petr Machata [Thu, 27 Jun 2024 14:48:47 +0000 (16:48 +0200)]
selftests: libs: Drop slow_path_trap_install()/_uninstall()

These functions are not used anymore.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror_gre_lag_lacp: Drop unnecessary code
Petr Machata [Thu, 27 Jun 2024 14:48:46 +0000 (16:48 +0200)]
selftests: mirror_gre_lag_lacp: Drop unnecessary code

The selftest does not use functions from mirror_gre_lib, ditch the import.

It does not use arping either, so drop the require_command as well.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mlxsw: mirror_gre: Simplify
Petr Machata [Thu, 27 Jun 2024 14:48:45 +0000 (16:48 +0200)]
selftests: mlxsw: mirror_gre: Simplify

After the previous patch, the function test_span_failable() is always
called with should_fail=1. Drop the argument and streamline the code.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror: Drop dual SW/HW testing
Petr Machata [Thu, 27 Jun 2024 14:48:44 +0000 (16:48 +0200)]
selftests: mirror: Drop dual SW/HW testing

The mirroring tests are currently run in a skip_hw and optionally a skip_sw
mode. The former tests the SW datapath, the latter the HW datapath, if
available. In order to be able to test SW datapath on HW loopbacks, traps
are installed on ingress to get traffic from the HW datapath to the SW one.
This adds an unnecessary complexity when it would be much simpler to just
use a veth-based topology to test the SW datapath. Thus drop all the code
that supports this dual testing.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror: mirror_test(): Allow exact count of packets
Petr Machata [Thu, 27 Jun 2024 14:48:43 +0000 (16:48 +0200)]
selftests: mirror: mirror_test(): Allow exact count of packets

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But in the previous patch, where possible,
counter taps were changed to match only on an exact ICMP message. At least
in those cases, we can demand an exact number of packets to match.

Where the tap is installed on a connective netdevice, the exact matching is
not practical (though with u32, anything is possible). In those places,
there should still be some leeway -- and probably bigger than before,
because experience shows that these tests are very noisy.

To that end, change mirror_test() so that it can be either called with an
exact number to expect, or with an expression. Where leeway is needed,
adjust callers to pass a ">= 10" instead of mere 10.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror: do_test_span_dir_ips(): Install accurate taps
Petr Machata [Thu, 27 Jun 2024 14:48:42 +0000 (16:48 +0200)]
selftests: mirror: do_test_span_dir_ips(): Install accurate taps

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

However, often the counter tap is installed at the remote end of the gretap
tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
arbitrarily accurate.

Thus in this patch, add parameters forward_type and backward_type to
several mirroring test helpers, as some other helpers already have. Then
change do_test_span_dir_ips() to instead of installing one generic tap and
using it for test in both directions, install the tap for each direction
separately, matching on the ICMP type given by these parameters.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror_gre_lag_lacp: Check counters at tunnel
Petr Machata [Thu, 27 Jun 2024 14:48:41 +0000 (16:48 +0200)]
selftests: mirror_gre_lag_lacp: Check counters at tunnel

The test works by sending packets through a tunnel, whence they are
forwarded to a LAG. One of the LAG children is removed from the LAG prior
to the exercise, and the test then counts how many packets pass through the
other one. The issue with this is that it counts all packets, not just the
encapsulated ones.

So instead add a second gretap endpoint to receive the sent packets, and
check reception counters there.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: lib: tc_rule_stats_get(): Move default to argument definition
Petr Machata [Thu, 27 Jun 2024 14:48:40 +0000 (16:48 +0200)]
selftests: lib: tc_rule_stats_get(): Move default to argument definition

The argument $dir has a fallback value of "ingress". Move the fallback from
the usage site to the argument definition block to make the fact clearer.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: mirror: Drop direction argument from several functions
Petr Machata [Thu, 27 Jun 2024 14:48:39 +0000 (16:48 +0200)]
selftests: mirror: Drop direction argument from several functions

The argument is not used by these functions except to propagate it for
ultimately no purpose.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoselftests: libs: Expand "$@" where possible
Petr Machata [Thu, 27 Jun 2024 14:48:38 +0000 (16:48 +0200)]
selftests: libs: Expand "$@" where possible

In some functions, argument-forwarding through "$@" without listing the
individual arguments explicitly is fundamental to the operation of a
function. E.g. xfail_on_veth() should be able to run various tests in the
fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction
mechanism. For functions such as simple_if_init(), $@ is a handy way to
pass an array.

In other functions, it's merely a mechanism to save some typing, which
however ends up obscuring the real arguments and makes life hard for those
that end up reading the code.

This patch adds some of the implicit function arguments and correspondingly
expands $@'s. In several cases this will come in handy as following patches
adjust the parameter lists.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Danielle Ratson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoMerge branch 'net-flash-modees-firmware' into main
David S. Miller [Fri, 28 Jun 2024 09:49:35 +0000 (10:49 +0100)]
Merge branch 'net-flash-modees-firmware' into main

Danielle Ratson says:

====================
Add ability to flash modules' firmware

CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.2.2 of revision
4.0 of the CMIS standard.

According to the CMIS standard, the firmware update process is done
using a CDB commands sequence.

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.12 of revision 4.0.

Add a pair of new ethtool messages that allow:

* User space to trigger firmware update of transceiver modules

* The kernel to notify user space about the progress of the process

The user interface is designed to be asynchronous in order to avoid RTNL
being held for too long and to allow several modules to be updated
simultaneously. The interface is designed with CMIS compliant modules in
mind, but kept generic enough to accommodate future use cases, if these
arise.

The kernel interface that will implement the firmware update using CDB
command will include 2 layers that will be added under ethtool:

* The upper layer that will be triggered from the module layer, is
 cmis_ fw_update.
* The lower one is cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the cmis_cdb interface clean and
the cmis_fw_update specific to the cdb commands handling it.

The communication between the kernel and the driver will be done using
two ethtool operations that enable reading and writing the transceiver
module EEPROM.
The operation ethtool_ops::get_module_eeprom_by_page, that is already
implemented, will be used for reading from the EEPROM the CDB reply,
e.g. reading module setting, state, etc.
The operation ethtool_ops::set_module_eeprom_by_page, that is added in
the current patchset, will be used for writing to the EEPROM the CDB
command such as start firmware image, run firmware image, etc.

Therefore in order for a driver to implement module flashing, that
driver needs to implement the two functions mentioned above.

Patchset overview:
Patch #1-#2: Implement the EEPROM writing in mlxsw.
Patch #3: Define the interface between the kernel and user space.
Patch #4: Add ability to notify the flashing firmware progress.
Patch #5: Veto operations during flashing.
Patch #6: Add extended compliance codes.
Patch #7: Add the cdb layer.
Patch #8: Add the fw_update layer.
Patch #9: Add ability to flash transceiver modules' firmware.

v8:
Patch #7:
* In the ethtool_cmis_wait_for_cond() evaluate the condition once more
  to decide if the error code should be -ETIMEDOUT or something else.
* s/netdev_err/netdev_err_once.

v7:
Patch #4:
* Return -ENOMEM instead of PTR_ERR(attr) on
  ethnl_module_fw_flash_ntf_put_err().
Patch #9:
* Fix Warning for not unlocking the spin_lock in the error flow
             on module_flash_fw_work_list_add().
* Avoid the fall-through on ethnl_sock_priv_destroy().

v6:
* Squash some of the last patch to patch #5 and patch #9.
Patch #3:
* Add paragraph in .rst file.
Patch #4:
* Reserve '1' more place on SKB for NUL terminator in
  the error message string.
* Add more prints on error flow, re-write the printing
  function and add ethnl_module_fw_flash_ntf_put_err().
* Change the communication method so notification will be
  sent in unicast instead of multicast.
* Add new 'struct ethnl_module_fw_flash_ntf_params' that holds
  the relevant info for unicast communication and use it to
  send notification to the specific socket.
* s/nla_put_u64_64bit/nla_put_uint/
Patch #7:
* In ethtool_cmis_cdb_init(), Use 'const' for the 'params'
  parameter.
Patch #8:
* Add a list field to struct ethtool_module_fw_flash for
  module_fw_flash_work_list that will be presented in the next
  patch.
* Move ethtool_cmis_fw_update() cleaning to a new function that
  will be represented in the next patch.
* Move some of the fields in struct ethtool_module_fw_flash to
  a separate struct, so ethtool_cmis_fw_update() will get only
  the relevant parameters for it.
* Edit the relevant functions to get the relevant params for
  them.
* s/CMIS_MODULE_READY_MAX_DURATION_USEC/CMIS_MODULE_READY_MAX_DURATION_MSEC
Patch #9:
* Add a paragraph in the commit message.
* Rename labels in module_flash_fw_schedule().
* Add info to genl_sk_priv_*() and implement the relevant
  callbacks, in order to handle properly a scenario of closing
  the socket from user space before the work item was ended.
* Add a list the holds all the ethtool_module_fw_flash struct
  that corresponds to the in progress work items.
* Add a new enum for the socket types.
* Use both above to identify a flashing socket, add it to the
  list and when closing socket affect only the flashing type.
* Create a new function that will get the work item instead of
  ethtool_cmis_fw_update().
* Edit the relevant functions to get the relevant params for
  them.
* The new function will call the old ethtool_cmis_fw_update(),
  and do the cleaning, so the existence of the list should be
  completely isolated in module.c.
===================

Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: Add ability to flash transceiver modules' firmware
Danielle Ratson [Thu, 27 Jun 2024 14:08:56 +0000 (17:08 +0300)]
ethtool: Add ability to flash transceiver modules' firmware

Add the ability to flash the modules' firmware by implementing the
interface between the user space and the kernel.

Example from a succeeding implementation:

 # ethtool --flash-module-firmware swp40 file test.bin

 Transceiver module firmware flashing started for device swp40
 Transceiver module firmware flashing in progress for device swp40
 Progress: 99%
 Transceiver module firmware flashing completed for device swp40

In addition, add infrastructure that allows modules to set socket-specific
private data. This ensures that when a socket is closed from user space
during the flashing process, the right socket halts sending notifications
to user space until the work item is completed.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: cmis_fw_update: add a layer for supporting firmware update using CDB
Danielle Ratson [Thu, 27 Jun 2024 14:08:55 +0000 (17:08 +0300)]
ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB

According to the CMIS standard, the firmware update process is done using
a CDB commands sequence.

Implement a work that will be triggered from the module layer in the
next patch the will initiate and execute all the CDB commands in order, to
eventually complete the firmware update process.

This flashing process includes, writing the firmware image, running the new
firmware image and committing it after testing, so that it will run upon
reset.

This work will also notify user space about the progress of the firmware
update process.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: cmis_cdb: Add a layer for supporting CDB commands
Danielle Ratson [Thu, 27 Jun 2024 14:08:54 +0000 (17:08 +0300)]
ethtool: cmis_cdb: Add a layer for supporting CDB commands

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.20 of revision 5.2.
Page 9Fh is used to specify the CDB command to be executed and also
provides an area for a local payload (LPL).

According to the CMIS standard, the firmware update process is done using
a CDB commands sequence that will be implemented in the next patch.

The kernel interface that will implement the firmware update using CDB
command will include 2 layers that will be added under ethtool:

* The upper layer that will be triggered from the module layer, is
  cmis_fw_update.
* The lower one is cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the CDB interface clean and the
cmis_fw_update specific to the CDB commands handling it.

These two layers will communicate using the API the consists of three
functions:

- struct ethtool_cmis_cdb *
  ethtool_cmis_cdb_init(struct net_device *dev,
struct ethtool_module_fw_flash_params *params);
- void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb);
- int ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
   struct ethtool_cmis_cdb_cmd_args *args);

Add the CDB layer to support initializing, finishing and executing CDB
commands:

* The initialization process will include creating of an ethtool_cmis_cdb
  instance, querying the module CDB support, entering and validating the
  password from user space (CMD 0x0000) and querying the module features
  (CMD 0x0040).

* The finishing API will simply free the ethtool_cmis_cdb instance.

* The executing process will write the CDB command to EEPROM using
  set_module_eeprom_by_page() that was presented earlier, and will
  process the reply from EEPROM.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agonet: sfp: Add more extended compliance codes
Danielle Ratson [Thu, 27 Jun 2024 14:08:53 +0000 (17:08 +0300)]
net: sfp: Add more extended compliance codes

SFF-8024 is used to define various constants re-used in several SFF
SFP-related specifications.

Add SFF-8024 extended compliance code definitions for CMIS compliant
modules and use them in the next patch to determine the firmware flashing
work.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: Veto some operations during firmware flashing process
Danielle Ratson [Thu, 27 Jun 2024 14:08:52 +0000 (17:08 +0300)]
ethtool: Veto some operations during firmware flashing process

Some operations cannot be performed during the firmware flashing
process.

For example:

- Port must be down during the whole flashing process to avoid packet loss
  while committing reset for example.

- Writing to EEPROM interrupts the flashing process, so operations like
  ethtool dump, module reset, get and set power mode should be vetoed.

- Split port firmware flashing should be vetoed.

In order to veto those scenarios, add a flag in 'struct net_device' that
indicates when a firmware flash is taking place on the module and use it
to prevent interruptions during the process.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: Add flashing transceiver modules' firmware notifications ability
Danielle Ratson [Thu, 27 Jun 2024 14:08:51 +0000 (17:08 +0300)]
ethtool: Add flashing transceiver modules' firmware notifications ability

Add progress notifications ability to user space while flashing modules'
firmware by implementing the interface between the user space and the
kernel.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: Add an interface for flashing transceiver modules' firmware
Danielle Ratson [Thu, 27 Jun 2024 14:08:50 +0000 (17:08 +0300)]
ethtool: Add an interface for flashing transceiver modules' firmware

CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.3.1 of revision 5.2 of
the CMIS standard.

Add a pair of new ethtool messages that allow:

* User space to trigger firmware update of transceiver modules

* The kernel to notify user space about the progress of the process

The user interface is designed to be asynchronous in order to avoid
RTNL being held for too long and to allow several modules to be
updated simultaneously. The interface is designed with CMIS compliant
modules in mind, but kept generic enough to accommodate future use
cases, if these arise.

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agomlxsw: Implement ethtool operation to write to a transceiver module EEPROM
Ido Schimmel [Thu, 27 Jun 2024 14:08:49 +0000 (17:08 +0300)]
mlxsw: Implement ethtool operation to write to a transceiver module EEPROM

Implement the ethtool_ops::set_module_eeprom_by_page operation to allow
ethtool to write to a transceiver module EEPROM, in a similar fashion to
the ethtool_ops::get_module_eeprom_by_page operation.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agoethtool: Add ethtool operation to write to a transceiver module EEPROM
Ido Schimmel [Thu, 27 Jun 2024 14:08:48 +0000 (17:08 +0300)]
ethtool: Add ethtool operation to write to a transceiver module EEPROM

Ethtool can already retrieve information from a transceiver module
EEPROM by invoking the ethtool_ops::get_module_eeprom_by_page operation.
Add a corresponding operation that allows ethtool to write to a
transceiver module EEPROM.

The new write operation is purely an in-kernel API and is not exposed to
user space.

The purpose of this operation is not to enable arbitrary read / write
access, but to allow the kernel to write to specific addresses as part
of transceiver module firmware flashing. In the future, more
functionality can be implemented on top of these read / write
operations.

Adjust the comments of the 'ethtool_module_eeprom' structure as it is
no longer used only for read access.

Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agodt-bindings: net: realtek,rtl82xx: Document known PHY IDs as compatible strings
Marek Vasut [Tue, 25 Jun 2024 18:42:28 +0000 (20:42 +0200)]
dt-bindings: net: realtek,rtl82xx: Document known PHY IDs as compatible strings

Extract known PHY IDs from Linux kernel realtek PHY driver
and convert them into supported compatible string list for
this DT binding document.

Signed-off-by: Marek Vasut <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
9 months agocan: m_can: Constify struct m_can_ops
Christophe JAILLET [Sun, 23 Jun 2024 20:01:50 +0000 (22:01 +0200)]
can: m_can: Constify struct m_can_ops

'struct m_can_ops' is not modified in these drivers.

Constifying this structure moves some data to a read-only section, so
increase overall security.

On a x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
   4806     520       0    5326    14ce drivers/net/can/m_can/m_can_pci.o

After:
=====
   text    data     bss     dec     hex filename
   4862     464       0    5326    14ce drivers/net/can/m_can/m_can_pci.o

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/all/a17b96d1be5341c11f263e1e45c9de1cb754e416.1719172843.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agoMerge patch series "can: rcar_canfd: Small improvements and cleanups"
Marc Kleine-Budde [Fri, 28 Jun 2024 07:35:07 +0000 (09:35 +0200)]
Merge patch series "can: rcar_canfd: Small improvements and cleanups"

Geert Uytterhoeven <[email protected]> says:

This series contains some improvements and cleanups for the R-Car
CAN-FD driver. It has been tested on R-Car V4H (White Hawk and White
Hawk Single).

Link: https://lore.kernel.org/all/[email protected]
[mkl: fixed typo in cover letter]
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: rcar_canfd: Remove superfluous parentheses in address calculations
Geert Uytterhoeven [Wed, 29 May 2024 09:12:15 +0000 (11:12 +0200)]
can: rcar_canfd: Remove superfluous parentheses in address calculations

There is no need to wrap simple variables or multiplications inside
parentheses.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Reviewed-by: Vincent Mailhol <[email protected]>
Link: https://lore.kernel.org/all/b5aee80895fa029070fd37d1d837cf1c0ecb52dc.1716973640.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: rcar_canfd: Improve printing of global operational state
Geert Uytterhoeven [Wed, 29 May 2024 09:12:14 +0000 (11:12 +0200)]
can: rcar_canfd: Improve printing of global operational state

Replace the printing of internal numerical values by the printing of
strings reflecting their meaning, to make the message self-explanatory.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Reviewed-by: Vincent Mailhol <[email protected]>
Link: https://lore.kernel.org/all/14c8c5ce026e9fec128404706d1c73c8ffa11ced.1716973640.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agocan: rcar_canfd: Simplify clock handling
Geert Uytterhoeven [Wed, 29 May 2024 09:12:13 +0000 (11:12 +0200)]
can: rcar_canfd: Simplify clock handling

The main CAN clock is either the internal CANFD clock, or the external
CAN clock.  Hence replace the two-valued enum by a simple boolean flag.
Consolidate all CANFD clock handling inside a single branch.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Reviewed-by: Vincent Mailhol <[email protected]>
Link: https://lore.kernel.org/all/2cf38c10b83c8e5c04d68b17a930b6d9dbf66f40.1716973640.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <[email protected]>
9 months agonet: thunderx: Unembed netdev structure
Breno Leitao [Wed, 26 Jun 2024 17:35:02 +0000 (10:35 -0700)]
net: thunderx: Unembed netdev structure

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct lmac by converting them
into pointers, and allocating them dynamically. Use the leverage
alloc_netdev() to allocate the net_device object at
bgx_lmac_enable().

The free of the device occurs at bgx_lmac_disable().

 Do not free_netdevice() if bgx_lmac_enable() fails after lmac->netdev
is allocated, since bgx_lmac_disable() is called if bgx_lmac_enable()
fails, and lmac->netdev will be freed there (similarly to lmac->dmacs).

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoRevert "net: micro-optimize skb_datagram_iter"
Sagi Grimberg [Wed, 26 Jun 2024 07:01:53 +0000 (10:01 +0300)]
Revert "net: micro-optimize skb_datagram_iter"

This reverts commit 934c29999b57b835d65442da6f741d5e27f3b584.
This triggered a usercopy BUG() in systems with HIGHMEM, reported
by the test robot in:
 https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20[email protected]

Signed-off-by: Sagi Grimberg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoMerge branch 'selftests-net-switch-pmtu-sh-to-use-the-internal-ovs-script'
Jakub Kicinski [Thu, 27 Jun 2024 22:50:19 +0000 (15:50 -0700)]
Merge branch 'selftests-net-switch-pmtu-sh-to-use-the-internal-ovs-script'

Aaron Conole says:

====================
selftests: net: Switch pmtu.sh to use the internal ovs script.

Currently, if a user wants to run pmtu.sh and cover all the provided test
cases, they need to install the Open vSwitch userspace utilities.  This
dependency is difficult for users as well as CI environments, because the
userspace build and setup may require lots of support and devel packages
to be installed, system setup to be correct, and things like permissions
and selinux policies to be properly configured.

The kernel selftest suite includes an ovs-dpctl.py utility which can
interact with the openvswitch module directly.  This lets developers and
CI environments run without needing too many extra dependencies - just
the pyroute2 python package.

This series enhances the ovs-dpctl utility to provide support for set()
and tunnel() flow specifiers, better ipv6 handling support, and the
ability to add tunnel vports, and LWT interfaces.  Finally, it modifies
the pmtu.sh script to call the ovs-dpctl.py utility rather than the
typical OVS userspace utilities.  The pmtu.sh can still fall back on
the Open vSwitch userspace utilities if the ovs-dpctl.py script can't
be used.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: net: add config for openvswitch
Aaron Conole [Tue, 25 Jun 2024 17:22:45 +0000 (13:22 -0400)]
selftests: net: add config for openvswitch

The pmtu testing will require that the OVS module is installed,
so do that.

Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]>
Signed-off-by: Aaron Conole <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: net: Use the provided dpctl rather than the vswitchd for tests.
Aaron Conole [Tue, 25 Jun 2024 17:22:44 +0000 (13:22 -0400)]
selftests: net: Use the provided dpctl rather than the vswitchd for tests.

The current pmtu test infrastucture requires an installed copy of the
ovs-vswitchd userspace.  This means that any automated or constrained
environments may not have the requisite tools to run the tests.  However,
the pmtu tests don't require any special classifier processing.  Indeed
they are only using the vswitchd in the most basic mode - as a NORMAL
switch.

However, the ovs-dpctl kernel utility can now program all the needed basic
flows to allow traffic to traverse the tunnels and provide support for at
least testing some basic pmtu scenarios.  More complicated flow pipelines
can be added to the internal ovs test infrastructure, but that is work for
the future.  For now, enable the most common cases - wide mega flows with
no other prerequisites.

Enhance the pmtu testing to try testing using the internal utility, first.
As a fallback, if the internal utility isn't running, then try with the
ovs-vswitchd userspace tools.

Additionally, make sure that when the pyroute2 package is not available
the ovs-dpctl utility will error out to properly signal an error has
occurred and skip using the internal utility.

Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Aaron Conole <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: openvswitch: Support implicit ipv6 arguments.
Aaron Conole [Tue, 25 Jun 2024 17:22:43 +0000 (13:22 -0400)]
selftests: openvswitch: Support implicit ipv6 arguments.

The current iteration of IPv6 support requires explicit fields to be set
in addition to not properly support the actual IPv6 addresses properly.
With this change, make it so that the ipv6() bare option is usable to
create wildcarded flows to match broad swaths of ipv6 traffic.

Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]>
Signed-off-by: Aaron Conole <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: openvswitch: Add support for tunnel() key.
Aaron Conole [Tue, 25 Jun 2024 17:22:42 +0000 (13:22 -0400)]
selftests: openvswitch: Add support for tunnel() key.

This will be used when setting details about the tunnel to use as
transport.  There is a difference between the ODP format between tunnel():
the 'key' flag is not actually a flag field, so we don't support it in the
same way that the vswitchd userspace supports displaying it.

Signed-off-by: Aaron Conole <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
9 months agoselftests: openvswitch: Add set() and set_masked() support.
Aaron Conole [Tue, 25 Jun 2024 17:22:41 +0000 (13:22 -0400)]
selftests: openvswitch: Add set() and set_masked() support.

These will be used in upcoming commits to set specific attributes for
interacting with tunnels.  Since set() will use the key parsing routine, we
also make sure to prepend it with an open paren, for the action parsing to
properly understand it.

Reviewed-by: Simon Horman <[email protected]>
Tested-by: Simon Horman <[email protected]>
Signed-off-by: Aaron Conole <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
This page took 0.119178 seconds and 4 git commands to generate.