Git Repo - linux.git/log

selftests: mptcp: add mptcp_lib_kill_wait

To avoid duplicated code in different MPTCP selftests, we can add
and use helpers defined in mptcp_lib.sh.

Export kill_wait() helper in userspace_pm.sh into mptcp_lib.sh and
rename it as mptcp_lib_kill_wait(). It can be used to instead of
kill_wait() in mptcp_join.sh. Use the new helper in both scripts.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: userspace pm send RM_ADDR for ID 0

This patch adds a selftest for userspace PM to remove id 0 address.

Use userspace_pm_add_addr() helper to add an id 10 address, then use
userspace_pm_rm_addr() helper to remove id 0 address.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: userspace pm remove initial subflow

This patch adds a selftest for userspace PM to remove the initial
subflow.

Use userspace_pm_add_sf() to add a subflow, and pass initial IP address
to userspace_pm_rm_sf() to remove the initial subflow.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: userspace pm rename remove_err to out

The value of 'err' will not be only '-EINVAL', but can be '0' in some
cases.

So it's better to rename the label 'remove_err' to 'out' to avoid
confusions.

Suggested-by: Matthieu Baerts <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: userspace pm create id 0 subflow

This patch adds a selftest to create id 0 subflow. Pass id 0 to the
helper userspace_pm_add_sf() to create id 0 subflow. chk_mptcp_info
shows one subflow but chk_subflows_total shows two subflows in each
namespace.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: update userspace pm test helpers

This patch adds a new argument namespace to userspace_pm_add_addr() and
userspace_pm_add_sf() to make these two helper more versatile.

Add two more versatile helpers for userspace pm remove subflow or address:
userspace_pm_rm_addr() and userspace_pm_rm_sf(). The original test helpers
userspace_pm_rm_sf_addr_ns1() and userspace_pm_rm_sf_addr_ns2() can be
replaced by these new helpers.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: add chk_subflows_total helper

This patch adds a new helper chk_subflows_total(), in it use the newly
added counter mptcpi_subflows_total to get the "correct" amount of
subflows, including the initial one.

To be compatible with old 'ss' or kernel versions not supporting this
counter, get the total subflows by listing TCP connections that are
MPTCP subflows:

ss -ti state state established state syn-sent state syn-recv |
grep -c tcp-ulp-mptcp.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: add evts_get_info helper

This patch adds a new helper get_info_value(), using 'sed' command to
parse the value of the given item name in the line with the given keyword,
to make chk_mptcp_info() and pedit_action_pkts() more readable.

Also add another helper evts_get_info() to use get_info_value() to parse
the output of 'pm_nl_ctl events' command, to make all the userspace pm
selftests more readable, both in mptcp_join.sh and userspace_pm.sh.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: add mptcpi_subflows_total counter

If the initial subflow has been removed, we cannot know without checking
other counters, e.g. ss -ti <filter> | grep -c tcp-ulp-mptcp or
getsockopt(SOL_MPTCP, MPTCP_FULL_INFO, ...) (or others except MPTCP_INFO
of course) and then check mptcp_subflow_data->num_subflows to get the
total amount of subflows.

This patch adds a new counter mptcpi_subflows_total in mptcpi_flags to
store the total amount of subflows, including the initial one. A new
helper __mptcp_has_initial_subflow() is added to check whether the
initial subflow has been removed or not. With this helper, we can then
compute the total amount of subflows from mptcp_info by doing something
like:

mptcpi_subflows_total = mptcpi_subflows +
__mptcp_has_initial_subflow(msk).

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/428
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mlxsw-support-cff-flood-mode'

Petr Machata says:

====================
mlxsw: Support CFF flood mode

The registers to configure to initialize a flood table differ between the
controlled and CFF flood modes. In therefore needs to be an op. Add it,
hook up the current init to the existing families, and invoke the op.

PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.

Among other allocations, the PGT currently contains two large blocks for
bridge flooding: one for 802.1q and one for 802.1d. Within each of these
blocks are three tables, for unknown-unicast, multicast and broadcast
flooding:

      . . . |    802.1q    |    802.1d    | . . .
            | UC | MC | BC | UC | MC | BC |
             \______ _____/ \_____ ______/
                    v             v
                   FID flood vectors

Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an
802.1q bridge) uses three flood vectors spread across a fairly large region
of PGT.

This way of organizing the flood table (called "controlled") is not very
flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one
would need to completely rewrite the bridge PGT blocks, or resort to hacks
such as storing individual MC flood vectors into unused part of the bridge
table.

In order to address these shortcomings, Spectrum-2 and above support what
is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode,
each FID has a little table of its own, with three entries adjacent to each
other, one for unknown-UC, one for MC, one for BC. This allows for a much
more fine-grained approach to PGT management, where bits of it are
allocated on demand.

      . . . | FID | FID | FID | FID | FID | . . .
            |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
             \_____________ _____________/
                           v
                   FID flood vectors

Besides the FID table organization, the CFF flood mode also impacts Router
Subport (RSP) table. This table contains flood vectors for rFIDs, which are
FIDs that reference front panel ports or LAGs. The RSP table contains two
entries per front panel port and LAG, one for unknown-UC traffic, and one
for everything else. Currently, the FW allocates and manages the table in
its own part of PGT. rFIDs are marked with flood_rsp bit and managed
specially. In CFF mode, rFIDs are managed as all other FIDs. The driver
therefore has to allocate and maintain the flood vectors. Like with bridge
FIDs, this is more work, but increases flexibility of the system.

The FW currently supports both the controlled and CFF flood modes. To shed
complexity, in the future it should only support CFF flood mode. Hence this
patchset, which adds CFF flood mode support to mlxsw.

Since mlxsw needs to maintain both the controlled mode as well as CFF mode
support, we will keep the layout as compatible as possible. The bridge
tables will stay in the same overall shape, just their inner organization
will change from flood mode -> FID to FID -> flood mode. Likewise will RSP
be kept as a contiguous block of PGT memory, as was the case when the FW
maintained it.

- The way FIDs get configured under the CFF flood mode differs from the
  currently used controlled mode. The simple approach of having several
  globally visible arrays for spectrum.c to statically choose from no
  longer works.

  Patch #1 thus privatizes all FID initialization and finalization logic,
  and exposes it as ops instead.

- Patch #2 renames the ops that are specific to the controlled mode, to
  make room in the namespace for the CFF variants.

  Patch #3 extracts a helper to compute flood table base out of
  mlxsw_sp_fid_flood_table_mid().

- The op fid_setup configured fid_offset, i.e. the number of this FID
  within its family. For rFIDs in CFF mode, to determine this number, the
  driver will need to do fallible queries.

  Thus in patch #4, make the FID setup operation fallible as well.

- Flood mode initialization routine differs between the controlled and CFF
  flood modes. The controlled mode needs to configure flood table layout,
  which the CFF mode does not need to do.

  In patch #5, move mlxsw_sp_fid_flood_table_init() up so that the
  following patch can make use of it.

  In patch #6, add an op to be invoked per table (if defined).

- The current way of determining PGT allocation size depends on the number
  of FIDs and number of flood tables. RFIDs however have PGT footprint
  depending not on number of FIDs, but on number of ports and LAGs, because
  which ports an rFID should flood to does not depend on the FID itself,
  but on the port or LAG that it references.

  Therefore in patch #7, add FID family ops for determining PGT allocation
  size.

- As elaborated above, layout of PGT will differ between controlled and CFF
  flood modes. In CFF mode, it will further differ between rFIDs and other
  FIDs (as described at previous patch). The way to pack the SFMR register
  to configure a FID will likewise differ from controlled to CFF.

  Thus in patches #8 and #9 add FID family ops to determine PGT base
  address for a FID and to pack SFMR.

- Patches #10 and #11 add more bits for RSP support. In patch #10, add a
  new traffic type enumerator, for non-UC traffic. This is a combination of
  BC and MC traffic, but the way that mlxsw maps these mnemonic names to
  actual traffic type configurations requires that we have a new name to
  describe this class of traffic.

  Patch #11 then adds hooks necessary for RSP table maintenance. As ports
  come and go, and join and leave LAGs, it is necessary to update flood
  vectors that the rFIDs use. These new hooks will make that possible.

- Patches #12, #13 and #14 introduce flood profiles. These have been
  implicit so far, but the way that CFF flood mode works with profile IDs
  requires that we make them explicit.

  Thus in patch #12, introduce flood profile objects as a set of flood
  tables that FID families then refer to. The FID code currently only
  uses a single flood profile.

  In patch #13, add a flood profile ID to flood profile objects.

  In patch #14, when in CFF mode, configure SFFP according to the existing
  flood profiles (or the one that exists as of that point).

- Patches #15 and #16 add code to implement, respectively, bridge FIDs and
  RSP FIDs in CFF mode.

- In patch #17, toggle flood_mode_prefer_cff on Spectrum-2 and above, which
  makes the newly-added code live.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum: Use CFF mode where available

Mark all Spectrum>2 systems as preferring CFF flood mode if supported by
the firmware.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/8a3d2ad96b943f7e3f53f998bd333a14e19cd641.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add support for rFID family in CFF flood mode

In this patch, add the artifacts for the rFID family that works in CFF
flood mode.

The same that was said about PGT organization and lookup in bridge FID
families applies for the rFID family as well. The main difference lies in
the fact that in the controlled flood mode, the FW was taking care of
maintaining the PGT tables for rFIDs. In CFF mode, the responsibility
shifts to the driver.

All rFIDs are based off either a front panel port, or a LAG port. For those
based off ports, we need to maintain at worst one PGT block for each port,
for those based off LAGs, one PGT block per LAG. This reflects in the
pgt_size callback, which determines the PGT footprint based on number of
ports and the LAG capacity.

A number of FIDs may end up using the same PGT base. Unlike with bridges,
where membership of a port in a given FID is highly dynamic, an rFID based
of a port will just always need to flood to that port.

Both the port and the LAG subtables need to be actively maintained. To that
end, the CFF rFID family implements fid_port_init and fid_port_fini
callbacks, which toggle the necessary bits.

Both FID-MID translation and SFMR packing then point into either the port
or the LAG subtable, to the block that corresponds to a given port or a
given LAG, depending on what port the RIF bound to the rFID uses.

As in the previous patch, the way CFF flood mode organizes PGT accesses
allows for much more smarts and dynamism. As in the previous patch, we
rather aim to keep things simple and static.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/962deb4367585d38250e80c685a34735c0c7f3ad.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add a family for bridge FIDs in CFF flood mode

In this patch, add the artifacts for 802.1d and 802.1q FID families that
work in CFF flood mode.

In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.

The term "flood table" as used by the spectrum_fid module, borrows from
controlled flood mode way of organizing the PGT table. There flood tables
were actual tables, contiguous in the PGT. In the CFF flood mode, they are
more abstract: a flood table becomes a collection of e.g. all first rows of
the per-FID PGT blocks. Nonetheless we retain the nomenclature.

FIDs are still configured through the SFMR register, but there are
different fields to set under CFF mode: PGT base and profile. Thus register
packing gets a dedicated op overload as well.

The new organization of PGT makes it possible to treat the PGT as a block
of an ordinary memory, allocate and deallocate on demand, and achieve
better flexibility. Here instead, we aim to keep the code as close as
possible to the previous controlled flood mode, support for which we need
to retain for Spectrum-1 and older FW versions anyway. Thus the PGT
footprint of the individual families is the same as before, just the
internal organization of the per-family PGT region differs. Hence the
pgt_size callback is reused between the controlled and CFF flood modes.

Since the dummy family has no flood tables in either the CTL mode or in
CFF mode, the existing one can be reused for the CFF family array.

Users should not notice any changes between the controlled and CFF flood
modes.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/ca40b8163e6d6a21f63ef299619acee953cf9519.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Initialize flood profiles in CFF mode

In CFF flood mode, the way flood vectors are looked up changes: there's a
per-FID PGT base, to which a small offset is added depending on type of
traffic. Thus each FID occupies a small contiguous block of PGT memory,
whereas in the controlled flood mode, flood vectors for a given FID were
spread across the PGT.

Each FID is associated with one of a handful of profiles. The profile and
the traffic type are then used as keys to look up the PGT offset. This
offset is then added to the per-FID PGT base. The profile / type / offset
mapping needs to be configured by the driver, and is only relevant in CFF
flood mode.

In this patch, add the SFFP initialization code. Only initialize the one
profile currently explicitly used. As follow-up patch add more profiles,
this code will pick them up and initialize as well.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/2c4733ed72d439444218969c032acad22cd4ed88.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add profile_id to flood profile

In the CFF mode, flood profiles are identified by a unique numerical
identifier. This is used for configuration of FIDs and for configuration of
traffic-type to PGT offset rules. In both cases, the numerical identifier
serves as a handle for the flood profile. Add the identifier to the flood
profile structure.

There is currently only one flood profile in use explicitly, the one used
for all bridging. Eventually three will be necessary in total: one for
bridges, one for rFIDs, one for NVE underlay. A total of four profiles
are supported by the HW. Start allocating at 1, because 0 is currently
used for underlay NVE flood.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/19ea9c35ba8b522fa5f7eb6fd7bc1b68f0f66b41.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add an object to keep flood profiles

A flood profile is a mapping from traffic type to an offset at which a
flood vector should be looked up. In mlxsw so far, a flood profile was
somewhat implicitly represented by flood table array. When the CFF flood
mode will be introduced, the flood profile will become more explicit: each
will get a number and the profile ID / traffic-type / offset mapping will
actually need to be initialized in the hardware.

Therefore it is going to be handy to have a structure that keeps all the
components that compose a flood profile. Add this structure, currently with
just the flood table array bits. In the FID families that flood at all,
reference the flood profile instead of just the table array.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/15e113de114d3f41ce3fd2a14a2fa6a1b1d7e8f2.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add hooks for RSP table maintenance

In the CFF flood mode, the driver has to allocate a table within PGT, which
holds flood vectors for router subport FIDs. For LAGs, these flood vectors
have to obviously be maintained dynamically as port membership in a LAG
changes. But even for physical ports, the flood vectors have to be kept
valid, and may not contain enabled bits corresponding to non-existent
ports. It is therefore not possible to precompute the port part of the RSP
table, it has to be maintained as ports come and go due to splits.

To support the RSP table maintenance, add to FID ops two new ops:
fid_port_init and fid_port_fini, for when a port comes to existence, or
joins a lag, and vice versa. Invoke these ops from
mlxsw_sp_port_fids_init() and mlxsw_sp_port_fids_fini(), which are called
when port is added and removed, respectively. Also add two new hooks for
LAG maintenance, mlxsw_sp_fid_port_join_lag() / _leave_lag() which
transitively call into the same ops.

Later patches will actually add the op implementations themselves, this
just adds the scaffolding.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/234398a23540317abb25f74f920a5c8121faecf0.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add a not-UC packet type

In CFF flood mode, the rFID family will allocate two tables. One for
unknown UC traffic, one for everything else. Add a traffic type for the
everything else traffic.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/8fb968b2d1cc37137cd0110c98cdeb625b03ca99.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add an op for packing SFMR

The way SFMR is packed differs between the controlled and CFF flood modes.
Add an op to dispatch it dynamically.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/f12fe7879a7086ee86343ee4db02c859f78f0534.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add an op to get PGT address of a FID

In the CFF flood mode, the way to determine a PGT address where a given FID
/ flood table resides is different from the controlled flood mode, which
mlxsw currently uses. Furthermore, this will differ between rFID family and
bridge families. The operation therefore needs to be dynamically
dispatched. To that end, add an op to FID-family ops.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Link: https://lore.kernel.org/r/00e8f6ad79009a9a77a5c95d596ea9574776dc95.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add an op to get PGT allocation size

In the CFF flood mode, the PGT allocation size of RFID family will not
depend on number of FIDs, but rather number of ports and LAGs. Therefore
introduce a FID family operation to calculate the PGT allocation size.

The way that size is calculated in the CFF mode depends on calling fallible
functions. Thus express the op as returning an int, with the size returned
via a pointer argument.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/1174651b7160fcedbef50010ae4b68201112fe6f.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Add an op for flood table initialization

In controlled flood mode, for each bridge FID family (i.e., 802.1Q and
802.1D) and packet type (i.e., UUC/MC/BC), the hardware needs to be told
which PGT address to use as the base address for the flood table and how
to determine the offset from the base for each FID.

The above is not needed in CFF mode where each FID has its own flood
table instead of the FID family itself.

Therefore, create a new FID family operation for the above configuration
and only implement it for the 802.1Q and 802.1D families in controlled
flood mode.

No functional changes intended.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/06f71415eec75811585ec597e1dd101b6dff77e7.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Move mlxsw_sp_fid_flood_table_init() up

Move the function to the point where it will need to be to be visible for
the 802.1d ops.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/aef09e26b0c2dd077531e665d7135b300bdaf0a8.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Make mlxsw_sp_fid_ops.setup return an int

This operation will be fallible for rFIDs in CFF mode, which will be
introduced in follow-up patches. Have it return an int, and handle
the failures in the caller.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/75f1b85c0cb86bea5501fcc8657042f221a78b32.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Split a helper out of mlxsw_sp_fid_flood_table_mid()

In future patches, for CFF flood mode support, we will need a way to
determine a PGT base dynamically, as an op. Therefore, for symmetry,
split out a helper, mlxsw_sp_fid_pgt_base_ctl(), that determines a PGT base
in the controlled mode as well.

Now that the helper is available, use it in mlxsw_sp_fid_flood_table_init()
which currently invokes the FID->MID helper to that end.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/fd41c66a1df4df6499d3da34f40e7b9efa15bc3e.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Rename FID ops, families, arrays

Currently, mlxsw always uses a "controlled" flood mode on all Nvidia
Spectrum generations. The following patches will however introduce a
possibility to run a "CFF" (for Compressed FID Flooding) mode on newer
machines, if the FW supports it.

To reflect that, label all FID ops, FID families and FID family arrays with
a _ctl suffix. This will make it clearer what is what when the CFF families
are introduced in later patches.

Keep the dummy family intact. Since the dummy family has no flood tables
in either CTL or CFF mode, there are no flood-mode-specific callbacks.

Additionally, add a remark at two fields that they are only relevant when
flood mode is not CFF.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/96b6da5439bb662fa86e795bbcec9dc3ccfa59fd.1701183892.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_fid: Privatize FID families

Currently, mlxsw always uses a "controlled" flood mode on all Nvidia
Spectrum generations. The following patches will however introduce a
possibility to run a "CFF" (for Compressed FID Flooding) mode on newer
machines, if the FW supports it.

Several operations will differ between how they need to be done in
controlled mode vs. CFF mode. Thus the per-FID-family ops will differ
between controlled and CFF, thus the FID family array as such will
differ depending on whether the mode negotiated with FW is controlled
or CFF.

The simple approach of having several globally visible arrays for
spectrum.c to statically choose from no longer works. Instead privatize all
FID initialization and finalization logic, and expose it as ops instead.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/d3fa390d97cf3dbd2f7a28741be69b311e2059e4.1701183891.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: aquantia: drop wrong endianness conversion for addr and CRC

On further testing on BE target with kernel test robot, it was notice
that the endianness conversion for addr and CRC in fw_load_memory was
wrong.

Drop the cpu_to_le32 conversion for addr load as it's not needed.

Use get_unaligned_le32 instead of get_unaligned for FW data word load to
correctly convert data in the correct order to follow system endian.

Also drop the cpu_to_be32 for CRC calculation as it's wrong and would
cause different CRC on BE system.
The loaded word is swapped internally and MAILBOX calculates the CRC on
the swapped word. To correctly calculate the CRC to be later matched
with the one from MAILBOX, use an u8 struct and swap the word there to
keep the same order on both LE and BE for crc_ccitt_false function.
Also add additional comments on how the CRC verification for the loaded
section works.

CRC is calculated as we load the section and verified with the MAILBOX
only after the entire section is loaded to skip additional slowdown by
loop the section data again.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: e93984ebc1c8 ("net: phy: aquantia: add firmware load support")
Tested-by: Robert Marko <[email protected]> # ipq8072 LE device
Signed-off-by: Christian Marangi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: adin: allow control of Fast Link Down

Add support to allow Fast Link Down (aka "Enhanced link detection") to
be controlled via the ETHTOOL_PHY_FAST_LINK_DOWN tunable. These PHYs
have this feature enabled by default.

Signed-off-by: Vincent Whitchurch <[email protected]>
Acked-by: Nuno Sa <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8169: improve handling task scheduling

If we know that the task is going to be a no-op, don't even schedule it.
And remove the check for netif_running() in the worker function, the
check for flag RTL_FLAG_TASK_ENABLED is sufficient. Note that we can't
remove the check for flag RTL_FLAG_TASK_ENABLED in the worker function
because we have no guarantee when it will be executed.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'create-a-binding-for-the-marvell-mv88e6xxx-dsa-switches'

Linus Walleij says:

====================
Create a binding for the Marvell MV88E6xxx DSA switches

The Marvell switches are lacking DT bindings.

I need proper schema checking to add LED support to the
Marvell switch. Just how it is, it can't go on like this.

Some Device Tree fixes are included in the series, these
remove the major and most annoying warnings fallout noise:
some warnings remain, and these are of more serious nature,
such as missing phy-mode. They can be applied individually,
or to the networking tree with the rest of the patches.

Thanks to Andrew Lunn, Vladimir Oltean and Russell King
for excellent review and feedback!

This latest version employs special compatibles in the
odd ABI device trees.

v8: https://lore.kernel.org/r/20231114-marvell-88e6152-wan-led-v8-0-50688741691b@linaro.org
v7: https://lore.kernel.org/r/20231024-marvell-88e6152-wan-led-v7-0-2869347697d1@linaro.org
v6: https://lore.kernel.org/r/20231024-marvell-88e6152-wan-led-v6-0-993ab0949344@linaro.org
v5: https://lore.kernel.org/r/20231023-marvell-88e6152-wan-led-v5-0-0e82952015a7@linaro.org
v4: https://lore.kernel.org/r/20231018-marvell-88e6152-wan-led-v4-0-3ee0c67383be@linaro.org
v3: https://lore.kernel.org/r/20231016-marvell-88e6152-wan-led-v3-0-38cd449dfb15@linaro.org
v2: https://lore.kernel.org/r/20231014-marvell-88e6152-wan-led-v2-0-7fca08b68849@linaro.org
v1: https://lore.kernel.org/r/20231013-marvell-88e6152-wan-led-v1-0-0712ba99857c@linaro.org
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: marvell: Add Marvell MV88E6060 DSA schema

The Marvell MV88E6060 is one of the oldest DSA switches from
Marvell, and it has DT bindings used in the wild. Let's define
them properly.

It is different enough from the rest of the MV88E6xxx switches
that it deserves its own binding.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: marvell: Rewrite MV88E6xxx in schema

This is an attempt to rewrite the Marvell MV88E6xxx switch bindings
in YAML schema.

The current text binding says:
  WARNING: This binding is currently unstable. Do not program it into a
  FLASH never to be changed again. Once this binding is stable, this
  warning will be removed.

Well that never happened before we switched to YAML markup,
we can't have it like this, what about fixing the mess?

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: ethernet-switch: Accept special variants

Accept special node naming variants for Marvell switches with
special node names as ABI.

This is maybe not the prettiest but it avoids special-casing
the Marvell MV88E6xxx bindings by copying a lot of generic
binding code down into that one binding just to special-case
these unfixable nodes.

Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: mvusb: Fix up DSA example

When adding a proper schema for the Marvell mx88e6xxx switch,
the scripts start complaining about this embedded example:

  dtschema/dtc warnings/errors:
  net/marvell,mvusb.example.dtb: switch@0: ports: '#address-cells'
  is a required property
  from schema $id: http://devicetree.org/schemas/net/dsa/marvell,mv88e6xxx.yaml#
  net/marvell,mvusb.example.dtb: switch@0: ports: '#size-cells'
  is a required property
  from schema $id: http://devicetree.org/schemas/net/dsa/marvell,mv88e6xxx.yaml#

Fix this up by extending the example with those properties in
the ports node.

While we are at it, rename "ports" to "ethernet-ports" and rename
"switch" to "ethernet-switch" as this is recommended practice.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: dsa: Require ports or ethernet-ports

Bindings using dsa.yaml#/$defs/ethernet-ports specify that
a DSA switch node need to have a ports or ethernet-ports
subnode, and that is actually required, so add requirements
using oneOf.

Suggested-by: Rob Herring <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'tools-ynl-fixes-for-the-page-pool-sample-and-the-generation-process'

Jakub Kicinski says:

====================
tools: ynl: fixes for the page-pool sample and the generation process

Minor fixes to the new sample and the Makefiles.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl: don't skip regeneration from make targets

Commit 2b7ac0c87d98 ("tools: ynl-gen: don't touch the output file if
content is the same") is working too well. It was added so that
ynl-regen -f doesn't make us rebuild half of the kernel, if there
are no actual changes in any generated code.

When ynl-gen-c is called by make, however, we're better off trusting
make's tracking and overwrite the file. Otherwise if output is identical
we won't update file timestamps and make will retry code gen on every
invocation.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl: order building samples after generated code

Parallel builds of ynl:

make -C tools/net/ynl/ -j 4

don't work correctly right now. samples get handled before
generated, so build of samples does not notice that protos.a
has changed. Order samples to be last.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl: make sure we use local headers for page-pool

Building samples generates the following warning:

  In file included from page-pool.c:11:
  generated/netdev-user.h:21:45: warning: ‘enum netdev_xdp_rx_metadata’ declared inside parameter list will not be visible outside of this definition or declaration
   21 | const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value);
      |                                             ^~~~~~~~~~~~~~~~~~~~~~

Our magic way of including uAPI headers assumes the sample
name matches the family name. We need to copy the flags over.

Fixes: 637567e4a3ef ("tools: ynl: add sample for getting page-pool information")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl: fix build of the page-pool sample

The name of the "destroyed" field in the reply was not changed
in the sample after we started calling it "detach_time".

page-pool.c: In function ‘main’:
page-pool.c:84:33: error: ‘struct <anonymous>’ has no member named ‘destroyed’
84 | if (pp->_present.destroyed)
| ^

Fixes: 637567e4a3ef ("tools: ynl: add sample for getting page-pool information")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'fine-tune-flow-control-and-speed-configurations-in-microchip-ksz8xxx-dsa-driver'

Oleksij Rempel says:

====================
Fine-Tune Flow Control and Speed Configurations in Microchip KSZ8xxx DSA Driver

This patch set focuses on enhancing the configurability of flow
control, speed, and duplex settings in the Microchip KSZ8xxx DSA driver.

The first patch allows more granular control over the CPU port's flow
control, speed, and duplex settings. The second patch introduces a
method for port configurations for port with integrated PHYs, primarily
concerning flow control based on duplex mode.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: microchip: make phylink_mac_link_up() not optional

Last part of the driver do now support phylink_mac_link_up(). So, make it
not optional.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: microchip: ksz8: Add function to configure ports with integrated PHYs

This patch introduces the function 'ksz8_phy_port_link_up' to the
Microchip KSZ8xxx driver. This function is responsible for setting up
flow control and duplex settings for the ports that are integrated with
PHYs.

The KSZ8795 switch supports asymmetric pause control, which can't be
fully utilized since a single bit controls both RX and TX pause. Despite
this, the flow control can be adjusted based on the auto-negotiation
process, taking into account the capabilities of both link partners.

On the other hand, the KSZ8873's PORT_FORCE_FLOW_CTRL bit can be set by
the hardware bootstrap, which ignores the auto-negotiation result.
Therefore, even in auto-negotiation mode, we need to ensure that this
bit is correctly set.

When auto-negotiation isn't in use, we enforce symmetric pause control
for the KSZ8795 switch.

Please note, forcing flow control disable on a port while still
advertising pause support isn't possible. While this scenario
might not be practical or desired, it's important to be aware of this
limitation when working with the KSZ8873 and similar devices.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: microchip: ksz8: Make flow control, speed, and duplex on CPU port configurable

Allow flow control, speed, and duplex settings on the CPU port to be
configurable. Previously, the speed and duplex relied on default switch
values, which limited flexibility. Additionally, flow control was
hardcoded and only functional in duplex mode. This update enhances the
configurability of these parameters.

Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'gve-add-support-for-non-4k-page-sizes'

John Fraker says:

====================
gve: Add support for non-4k page sizes.

This patch series adds support for non-4k page sizes to the driver. Prior
to this patch series, the driver assumes a 4k page size in many small
ways, and will crash in a kernel compiled for a different page size.

This changeset aims to be a minimal changeset that unblocks certain arm
platforms with large page sizes.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Remove dependency on 4k page size.

Prior to this change, gve crashes when attempting to run in kernels with
page sizes other than 4k. This change removes unnecessary references to
PAGE_SIZE and replaces them with more meaningful constants.

Signed-off-by: Jordan Kimbrough <[email protected]>
Signed-off-by: John Fraker <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Add page size register to the register_page_list command.

This register is required on platforms with page sizes greater than 4k.
This is because the tx side of the driver vmaps the entire queue page
list of pages into a single flat address space, then uses the entire
space. Without communicating the guest page size to the backend, the
backend will only access the first 4k of each page in the queue page list.

Signed-off-by: Jordan Kimbrough <[email protected]>
Signed-off-by: John Fraker <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Remove obsolete checks that rely on page size.

These checks are safe to remove as they are no longer enforced by the
backend. Retaining them would require updating them to work differently
with page sizes larger than 4k.

Signed-off-by: Jordan Kimbrough <[email protected]>
Signed-off-by: John Fraker <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Deprecate adminq_pfn for pci revision 0x1.

adminq_pfn assumes a page size of 4k, causing this mechanism to break in
kernels compiled with different page sizes. A new PCI device revision was
needed for the device to be able to communicate with the driver how to
set up the admin queue prior to having access to the admin queue.

Signed-off-by: Jordan Kimbrough <[email protected]>
Signed-off-by: John Fraker <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Perform adminq allocations through a dma_pool.

This allows the adminq to be smaller than a page, paving the way for
non 4k page support. This is to support platforms where PAGE_SIZE
is not 4k, such as some ARM platforms.

Signed-off-by: Jordan Kimbrough <[email protected]>
Signed-off-by: John Fraker <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-11-27 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Ivan Vecera performs more cleanups on i40e and iavf drivers; removing
unused fields, defines, and unneeded fields.

Petr Oros utilizes iavf_schedule_aq_request() helper to replace open
coded equivalents.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: use iavf_schedule_aq_request() helper
  iavf: Remove queue tracking fields from iavf_adminq_ring
  i40e: Remove queue tracking fields from i40e_adminq_ring
  i40e: Remove AQ register definitions for VF types
  i40e: Delete unused and useless i40e_pf fields
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8169: remove multicast filter limit

Once upon a time, when r8169 was new, the multicast filter limit code
was copied from RTL8139 driver. There the filter limit is even
user-configurable.
The filtering is hash-based and we don't have perfect filtering.
Actually the mc filtering on RTL8125 still seems to be the same
as used on 8390/NE2000. So it's not clear to me which benefit it
should bring when switching to all-multi mode once a certain number
of filter bits is set. More the opposite: Filtering out at least
some unwanted mc traffic is better than no filtering.
Also the available chip documentation doesn't mention any restriction.
Therefore remove the filter limit.

Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: fix error code in ice_eswitch_attach()

Set the "err" variable on this error path.

Fixes: fff292b47ac1 ("ice: add VF representors one by one")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

nfp: ethtool: support TX/RX pause frame on/off

Add support for ethtool -A tx on/off and rx on/off.

Signed-off-by: Yu Xiao <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-page_pool-add-netlink-based-introspection'

Jakub Kicinski says:

====================
net: page_pool: add netlink-based introspection

We recently started to deploy newer kernels / drivers at Meta,
making significant use of page pools for the first time.
We immediately run into page pool leaks both real and false positive
warnings. As Eric pointed out/predicted there's no guarantee that
applications will read / close their sockets so a page pool page
may be stuck in a socket (but not leaked) forever. This happens
a lot in our fleet. Most of these are obviously due to application
bugs but we should not be printing kernel warnings due to minor
application resource leaks.

Conversely the page pool memory may get leaked at runtime, and
we have no way to detect / track that, unless someone reconfigures
the NIC and destroys the page pools which leaked the pages.

The solution presented here is to expose the memory use of page
pools via netlink. This allows for continuous monitoring of memory
used by page pools, regardless if they were destroyed or not.
Sample in patch 15 can print the memory use and recycling
efficiency:

$ ./page-pool
    eth0[2] page pools: 10 (zombies: 0)
refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

v4:
- use dev_net(netdev)->loopback_dev
- extend inflight doc
v3: https://lore.kernel.org/all/20231122034420.1158898 [email protected]/
- ID is still here, can't decide if it matters
- rename destroyed -> detach-time, good enough?
- fix build for netsec
v2: https://lore.kernel.org/r/20231121000048 [email protected]
- hopefully fix build with PAGE_POOL=n
v1: https://lore.kernel.org/all/20231024160220.3973311 [email protected]/
- The main change compared to the RFC is that the API now exposes
   outstanding references and byte counts even for "live" page pools.
   The warning is no longer printed if page pool is accessible via netlink.
RFC: https://lore.kernel.org/all/20230816234303.3786178 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

tools: ynl: add sample for getting page-pool information

Regenerate the tools/ code after netdev spec changes.

Add sample to query page-pool info in a concise fashion:

$ ./page-pool
eth0[2] page pools: 10 (zombies: 0)
refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: mute the periodic warning for visible page pools

Mute the periodic "stalled pool shutdown" warning if the page pool
is visible to user space. Rolling out a driver using page pools
to just a few hundred hosts at Meta surfaces applications which
fail to reap their broken sockets. Obviously it's best if the
applications are fixed, but we don't generally print warnings
for application resource leaks. Admins can now depend on the
netlink interface for getting page pool info to detect buggy
apps.

While at it throw in the ID of the pool into the message,
in rare cases (pools from destroyed netns) this will make
finding the pool with a debugger easier.

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: expose page pool stats via netlink

Dump the stats into netlink. More clever approaches
like dumping the stats per-CPU for each CPU individually
to see where the packets get consumed can be implemented
in the future.

A trimmed example from a real (but recently booted system):

$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
           --dump page-pool-stats-get
[{'info': {'id': 19, 'ifindex': 2},
  'alloc-empty': 48,
  'alloc-fast': 3024,
  'alloc-refill': 0,
  'alloc-slow': 48,
  'alloc-slow-high-order': 0,
  'alloc-waive': 0,
  'recycle-cache-full': 0,
  'recycle-cached': 0,
  'recycle-released-refcnt': 0,
  'recycle-ring': 0,
  'recycle-ring-full': 0},
{'info': {'id': 18, 'ifindex': 2},
  'alloc-empty': 66,
  'alloc-fast': 11811,
  'alloc-refill': 35,
  'alloc-slow': 66,
  'alloc-slow-high-order': 0,
  'alloc-waive': 0,
  'recycle-cache-full': 1145,
  'recycle-cached': 6541,
  'recycle-released-refcnt': 0,
  'recycle-ring': 1275,
  'recycle-ring-full': 0},
{'info': {'id': 17, 'ifindex': 2},
  'alloc-empty': 73,
  'alloc-fast': 62099,
  'alloc-refill': 413,
...

Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: report when page pool was destroyed

Report when page pool was destroyed. Together with the inflight
/ memory use reporting this can serve as a replacement for the
warning about leaked page pools we currently print to dmesg.

Example output for a fake leaked page pool using some hacks
in netdevsim (one "live" pool, and one "leaked" on the same dev):

$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 2, 'ifindex': 3},
{'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]

Tested-by: Dragos Tatulea <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: report amount of memory held by page pools

Advanced deployments need the ability to check memory use
of various system components. It makes it possible to make informed
decisions about memory allocation and to find regressions and leaks.

Report memory use of page pools. Report both number of references
and bytes held.

Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: add netlink notifications for state changes

Generate netlink notifications about page pool state changes.

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: implement GET in the netlink API

Expose the very basic page pool information via netlink.

Example using ynl-py for a system with 9 queues:

$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 19, 'ifindex': 2, 'napi-id': 147},
{'id': 18, 'ifindex': 2, 'napi-id': 146},
{'id': 17, 'ifindex': 2, 'napi-id': 145},
{'id': 16, 'ifindex': 2, 'napi-id': 144},
{'id': 15, 'ifindex': 2, 'napi-id': 143},
{'id': 14, 'ifindex': 2, 'napi-id': 142},
{'id': 13, 'ifindex': 2, 'napi-id': 141},
{'id': 12, 'ifindex': 2, 'napi-id': 140},
{'id': 11, 'ifindex': 2, 'napi-id': 139},
{'id': 10, 'ifindex': 2, 'napi-id': 138}]

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: add nlspec for basic access to page pools

Add a Netlink spec in YAML for getting very basic information
about page pools.

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

eth: link netdev to page_pools in drivers

Link page pool instances to netdev for the drivers which
already link to NAPI. Unless the driver is doing something
very weird per-NAPI should imply per-netdev.

Add netsec as well, Ilias indicates that it fits the mold.

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: stash the NAPI ID for easier access

To avoid any issues with race conditions on accessing napi
and having to think about the lifetime of NAPI objects
in netlink GET - stash the napi_id to which page pool
was linked at creation time.

Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: record pools per netdev

Link the page pools with netdevs. This needs to be netns compatible
so we have two options. Either we record the pools per netns and
have to worry about moving them as the netdev gets moved.
Or we record them directly on the netdev so they move with the netdev
without any extra work.

Implement the latter option. Since pools may outlast netdev we need
a place to store orphans. In time honored tradition use loopback
for this purpose.

Reviewed-by: Mina Almasry <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: id the page pools

To give ourselves the flexibility of creating netlink commands
and ability to refer to page pool instances in uAPIs create
IDs for page pools.

Reviewed-by: Ilias Apalodimas <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: page_pool: factor out uninit

We'll soon (next change in the series) need a fuller unwind path
in page_pool_create() so create the inverse of page_pool_init().

Reviewed-by: Ilias Apalodimas <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'wireless-next-2023-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.8

The first features pull request for v6.8. Not so big in number of
commits but we removed quite a few ancient drivers: libertas 16-bit
PCMCIA support, atmel, hostap, zd1201, orinoco, ray_cs, wl3501 and
rndis_wlan.

Major changes:

cfg80211/mac80211
- extend support for scanning while Multi-Link Operation (MLO) connected

* tag 'wireless-next-2023-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (68 commits)
  wifi: nl80211: Documentation update for NL80211_CMD_PORT_AUTHORIZED event
  wifi: mac80211: Extend support for scanning while MLO connected
  wifi: cfg80211: Extend support for scanning while MLO connected
  wifi: ieee80211: fix PV1 frame control field name
  rfkill: return ENOTTY on invalid ioctl
  MAINTAINERS: update iwlwifi maintainers
  wifi: rtw89: 8922a: read efuse content from physical map
  wifi: rtw89: 8922a: read efuse content via efuse map struct from logic map
  wifi: rtw89: 8852c: read RX gain offset from efuse for 6GHz channels
  wifi: rtw89: mac: add to access efuse for WiFi 7 chips
  wifi: rtw89: mac: use mac_gen pointer to access about efuse
  wifi: rtw89: 8922a: add 8922A basic chip info
  wifi: rtlwifi: drop unused const_amdpci_aspm
  wifi: mwifiex: mwifiex_process_sleep_confirm_resp(): remove unused priv variable
  wifi: rtw89: regd: update regulatory map to R65-R44
  wifi: rtw89: regd: handle policy of 6 GHz according to BIOS
  wifi: rtw89: acpi: process 6 GHz band policy from DSM
  wifi: rtlwifi: simplify rtl_action_proc() and rtl_tx_agg_start()
  wifi: rtw89: pci: update interrupt mitigation register for 8922AE
  wifi: rtw89: pci: correct interrupt mitigation register for 8852CE
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'selftests-tc-testing-updates-and-cleanups-for-tdc'

Pedro Tammela says:

====================
selftests: tc-testing: updates and cleanups for tdc

Address the recommendations from the previous series and cleanup some
leftovers.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: remove unused import

Remove this leftover from the times we pre-allocated everything

Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: cleanup on Ctrl-C

Cleanup net namespaces and other resources if we get a SIGINT (Ctrl-C).
As user visible resources are allocated on a per test basis, it's only
required to catch this condition when (possibly) running tests.

So far calling post_suite is enough to free up anything that might
linger.

A missing keyword replacement for nsPlugin is also included.

Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: prefix iproute2 functions with "ipr2"

As suggested by Simon, prefix the functions that operate on iproute2
commands in contrast with the "nl" netlink prefix.

Cc: Simon Horman <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: remove unnecessary time.sleep

This operation is redundant and it's not stabilizing nor waiting
for anything.

Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: remove buildebpf plugin

As tdc only tests loading/deleting and anything more complicated is
better left to the ebpf test suite, provide a pre-compiled version of
'action.c' and don't bother compiling it in kselftests or on the fly
at all.

Cc: Davide Caratti <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-phylink-improve-phy-validation'

Russell King says:

====================
net: phylink: improve PHY validation

One of the issues which has concerned me about the rate matching
implenentation that we have is that phy_get_rate_matching() returns
whether rate matching will be used for a particular interface, and we
enquire only for one interface.

Aquantia PHYs can be programmed with the rate matching and interface
mode settings on a per-media speed basis using the per-speed vendor 1
global configuration registers.

Thus, it is possible for the PHY to be configured to use rate matching
for 10G, 5G, 2.5G with 10GBASE-R, and then SGMII for the remaining
speeds. Therefore, it clearly doesn't make sense to enquire about rate
matching for just one interface mode.

Also, PHYs that change their interfaces are handled sub-optimally, in
that we validate all the interface modes that the host supports, rather
than the interface modes that the PHY will use.

This patch series changes the way we validate PHYs, but in order to do
so, we need to know exactly which interface modes will be used by the
PHY. So that phylib can convey this information, we add
"possible_interfaces" to struct phy_device.

possible_interfaces is to be filled in by a phylib driver once the PHY
is configured (in other words in the PHYs .config_init method) with the
interface modes that it will switch between. This then allows users of
phylib to know which interface modes will be used by the PHY.

This allows us to solve both these issues: where possible_interfaces is
provided, we can validate which ethtool link modes can be supported by
looking at which interface modes that both the PHY and host support,
and request rate matching information for each mode.

This should improve the accuracy of the validation.

Sending this out again without RFC as Jie Luo will need it for the
QCA8084 changes. No changes except to add the attributations already
received. Thanks!
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: use the PHY's possible_interfaces if populated

Some PHYs such as Aquantia, Broadcom 84881, and Marvell 88X33x0 can
switch between a set of interface types depending on the negotiated
media speed, or can use rate adaption for some or all of these
interface types.

We currently assume that these are Clause 45 PHYs that are configured
not to use a specific set of interface modes, which has worked so far,
but is just a work-around. In this workaround, we validate using all
interfaces that the MAC supports, which can lead to extra modes being
advertised that can not be supported.

To properly address this, switch to using the newly introduced PHY
possible_interfaces bitmap which indicates which interface modes will
be used by the PHY as configured. We calculate the union of the PHY's
possible interfaces and MACs supported interfaces, checking that is
non-empty. If the PHY is on a SFP, we further reduce the set by those
which can be used on a SFP module, again checking that is non-empty.
Finally, we validate the subset of interfaces, taking account of
whether rate matching will be used for each individual interface mode.

This becomes independent of whether the PHY is clause 22 or clause 45.

It is encouraged that all PHYs that switch interface modes or use
rate matching should populate phydev->possible_interfaces.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: split out PHY validation from phylink_bringup_phy()

When bringing up a PHY, we need to work out which ethtool link modes it
should support and advertise. Clause 22 PHYs operate in a single
interface mode, which can be easily dealt with. However, clause 45 PHYs
tend to switch interface mode depending on the media. We need more
flexible validation at this point, so this patch splits out that code
in preparation to changing it.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: pass PHY into phylink_validate_mask()

Pass the phy (if any) into phylink_validate_mask() so that we can
validate each interface with its rate matching setting.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: pass PHY into phylink_validate_one()

Pass the phy (if any) into phylink_validate_one() so that we can
validate each interface with its rate matching setting.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: split out per-interface validation

Split out the internals of phylink_validate_mask() to make the code
easier to read.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: aquantia: fill in possible_interfaces for AQR113C

Fill in the possible_interfaces bitmap for AQR113C so phylink knows
which interface modes will be used by the PHY.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: bcm84881: fill in possible_interfaces

Fill in the possible_interfaces member. This PHY driver only supports
a single configuration found on SFPs.

Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: marvell10g: fill in possible_interfaces

Fill in the possible_interfaces member according to the selected
mactype mode.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: marvell10g: table driven mactype decode

Replace the code-based mactype decode with a table driven approach.
This will allow us to fill in the possible_interfaces cleanly.

Signed-off-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: add possible interfaces

Add a possible_interfaces member to struct phy_device to indicate which
interfaces a clause 45 PHY may switch between depending on the media.
This must be populated by the PHY driver by the time the .config_init()
method completes according to the PHYs host-side configuration.

For example, the Marvell 88x3310 PHY can switch between 10GBASE-R,
5GBASE-R, 2500BASE-X, and SGMII on the host side depending on the media
side speed, so all these interface modes are set in the
possible_interfaces member.

This allows phylib users (such as phylink) to know in advance which
interface modes to expect, which allows them to appropriately restrict
the advertised link modes according to the capabilities of other parts
of the link.

Tested-by: Luo Jie <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

iavf: use iavf_schedule_aq_request() helper

Use the iavf_schedule_aq_request() helper when we need to
schedule a watchdog task immediately. No functional change.

Signed-off-by: Petr Oros <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

iavf: Remove queue tracking fields from iavf_adminq_ring

Fields 'head', 'tail', 'len', 'bah' and 'bal' in iavf_adminq_ring
are used to store register offsets. These offsets are initialized
and remains constant so there is no need to store them in the
iavf_adminq_ring structure.

Remove these fields from iavf_adminq_ring and use register offset
constants instead. Remove iavf_adminq_init_regs() that originally
stores these constants into these fields.

Finally improve iavf_check_asq_alive() that assumes that
non-zero value of hw->aq.asq.len indicates fully initialized
AdminQ send queue. Replace it by check for non-zero value
of field hw->aq.asq.count that is non-zero when the sending
queue is initialized and is zeroed during shutdown of
the queue.

Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Remove queue tracking fields from i40e_adminq_ring

Fields 'head', 'tail', 'len', 'bah' and 'bal' in i40e_adminq_ring
are used to store register offsets. These offsets are initialized
and remains constant so there is no need to store them in the
i40e_adminq_ring structure.

Remove these fields from i40e_adminq_ring and use register offset
constants instead. Remove i40e_adminq_init_regs() that originally
stores these constants into these fields.

Finally improve i40e_check_asq_alive() that assumes that
non-zero value of hw->aq.asq.len indicates fully initialized
AdminQ send queue. Replace it by check for non-zero value
of field hw->aq.asq.count that is non-zero when the sending
queue is initialized and is zeroed during shutdown of
the queue.

Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Remove AQ register definitions for VF types

The i40e driver does not handle its VF device types so there
is no need to keep AdminQ register definitions for such
device types. Remove them.

Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Delete unused and useless i40e_pf fields

Removed fields:
.fc_autoneg_status
    Since commit c56999f94876 ("i40e/i40evf: Add set_fc and init of
    FC settings") write-only and otherwise unused
.eeprom_version
    Write-only and otherwise unused
.atr_sample_rate
    Has only one possible value (I40E_DEFAULT_ATR_SAMPLE_RATE). Remove
    it and replace its occurrences by I40E_DEFAULT_ATR_SAMPLE_RATE
.adminq_work_limit
    Has only one possible value (I40E_AQ_WORK_LIMIT). Remove it and
    replace its occurrences by I40E_AQ_WORK_LIMIT
.tx_sluggish_count
    Unused, never written
.pf_seid
    Used to store VSI downlink seid and it is referenced only once
    in the same codepath. There is no need to save it into i40e_pf.
    Remove it and use downlink_seid directly in the mentioned log
    message.
.instance
    Write only. Remove it as well as ugly static local variable
    'pfs_found' in i40e_probe.
.int_policy
.switch_kobj
.ptp_pps_work
.ptp_extts1_work
.ptp_pps_start
.pps_delay
.ptp_pin
.override_q_count
    All these unused at all

Prior the patch:
pahole -Ci40e_pf drivers/net/ethernet/intel/i40e/i40e.ko | tail -5
        /* size: 5368, cachelines: 84, members: 127 */
        /* sum members: 5297, holes: 20, sum holes: 71 */
        /* paddings: 6, sum paddings: 19 */
        /* last cacheline: 56 bytes */
};

After the patch:
pahole -Ci40e_pf drivers/net/ethernet/intel/i40e/i40e.ko | tail -5
        /* size: 4976, cachelines: 78, members: 112 */
        /* sum members: 4905, holes: 17, sum holes: 71 */
        /* paddings: 6, sum paddings: 19 */
        /* last cacheline: 48 bytes */
};

Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

net :mana :Add remaining GDMA stats for MANA to ethtool

Extend performance counter stats in 'ethtool -S <interface>'
for MANA VF to include all GDMA stat counter.

Tested-on: Ubuntu22
Testcases:
1. LISA testcase:
PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
2. LISA testcase:
PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV

Signed-off-by: Shradha Gupta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

mm/page_pool: catch page_pool memory leaks

Pages belonging to a page_pool (PP) instance must be freed through the
PP APIs in-order to correctly release any DMA mappings and release
refcnt on the DMA device when freeing PP instance. When PP release a
page (page_pool_release_page) the page->pp_magic value is cleared.

This patch detect a leaked PP page in free_page_is_bad() via
unexpected state of page->pp_magic value being PP_SIGNATURE.

We choose to report and treat it as a bad page. It would be possible
to release the page via returning it to the PP instance as the
page->pp pointer is likely still valid.

Notice this code is only activated when either compiled with
CONFIG_DEBUG_VM or boot cmdline debug_pagealloc=on, and
CONFIG_PAGE_POOL.

Reduced example output of leak with PP_SIGNATURE = dead000000000040:

BUG: Bad page state in process swapper/4  pfn:141fa6
page:000000006dbf8062 refcount:0 mapcount:0 mapping:0000000000000000 index:0x141fa6000 pfn:0x141fa6
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 002fffff80000000 dead000000000040 ffff88814888a000 0000000000000000
raw: 0000000141fa6000 0000000000000001 00000000ffffffff 0000000000000000
page dumped because: page_pool leak
[...]
Call Trace:
  <IRQ>
  dump_stack_lvl+0x32/0x50
  bad_page+0x70/0xf0
  free_unref_page_prepare+0x263/0x430
  free_unref_page+0x34/0x130
  mlx5e_free_rx_mpwqe+0x190/0x1c0 [mlx5_core]
  mlx5e_post_rx_mpwqes+0x1ac/0x280 [mlx5_core]
  mlx5e_napi_poll+0x12b/0x710 [mlx5_core]
  ? skb_free_head+0x4f/0x90
  __napi_poll+0x2b/0x1c0
  net_rx_action+0x27b/0x360

The advantage is the Call Trace directly points to the function
leaking the PP page, which in this case is an on purpose bug
introduced into the mlx5 driver to test this code change.

Currently PP will periodically in page_pool_release_retry()
printk warning "stalled pool shutdown" which cannot be directly
corrolated to leaking and might as well be a false positive
due to SKBs being stuck on a socket for an extended period.
After this patch we should be able to remove this printk.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: pci: Fix missing error checking

I accidentally removed the error checking after issuing the reset.
Restore it.

Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow")
Reported-by: Coverity Scan <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sfp: rework the RollBall PHY waiting code

RollBall SFP modules allow the access to PHY registers only after a
certain time has passed. Until then, the registers read 0xffff.

Currently we have quirks for modules where we need to wait either 25
seconds or 4 seconds, but recently I got hands on another module where
the wait is even shorter.

Instead of hardcoding different wait times, lets rework the code:
- increase the PHY retry count to 25
- when RollBall module is detected, increase the PHY retry time from
50ms to 1s

Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'firmware_loader'

Kory says:

====================
This patch was initially submitted as part of a net patch series.
Conor expressed interest in using it in a different subsystem.
https://lore.kernel.org/netdev/20231116-feature_poe-v1-7-be48044bf249@bootlin.com/

Consequently, I extracted it from the series and submitted it separately.
I first tried to send it to driver-core but it seems also not the best
choice:
https://lore.kernel.org/lkml/2023111720-slicer-exes-7d9f@gregkh/
====================

Signed-off-by: Jakub Kicinski <[email protected]>

firmware_loader: Expand Firmware upload error codes with firmware invalid error

No error code are available to signal an invalid firmware content.
Drivers that can check the firmware content validity can not return this
specific failure to the user-space

Expand the firmware error code with an additional code:
- "firmware invalid" code which can be used when the provided firmware
is invalid

Sync lib/test_firmware.c file accordingly.

Acked-by: Luis Chamberlain <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/20231122-feature_firmware_error_code-v3-1-04ec753afb71@bootlin.com
Signed-off-by: Jakub Kicinski <[email protected]>

wifi: nl80211: Documentation update for NL80211_CMD_PORT_AUTHORIZED event

Drivers supporting 4-way handshake offload for AP/P2p-GO and
STA/P2P-client should use this event to indicate that port has
been authorized and open for regular data traffic, sending
this event on completion of successful 4-way handshake.

Signed-off-by: Vinayak Yadawad <[email protected]>
Link: https://lore.kernel.org/r/f746b59f41436e9df29c24688035fbc6eb91ab06.1699510229.git.vinayak.yadawad@broadcom.com
[rewrite it all to not use the term 'GC' that we don't use
in place of P2P-client]
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: Extend support for scanning while MLO connected

- If the scan request includes a link ID, validate that it is
  one of the active links. Otherwise, if the scan request doesn't
  include a valid link ID, select one of the active links.

- When reporting the TSF for a BSS entry, use the link ID information
  from the Rx status or the scan request to set the parent BSSID.

Signed-off-by: Ilan Peer <[email protected]>
Signed-off-by: Gregory Greenman <[email protected]>
Link: https://lore.kernel.org/r/20231113112844.68564692c404.Iae9605cbb7f9d52e00ce98260b3559a34cf18341@changeid
Signed-off-by: Johannes Berg <[email protected]>