Git Repo - linux.git/log

net: phy: nxp-c45-tja11xx: prepare the ground for TJA1120

Between TJA1120 and TJA1103 the hardware was improved, but some register
addresses were changed and some bit fields were moved from one register
to another.

Introduce the nxp_c45_reg_field structure and its associated functions to
abstract the differences between the PHYs.

Remove the defined bits and register addresses that are not common
between TJA1103 and TJA1120 and replace them with reg_fields and
register addresses from phydev->drv->driver_data.

Signed-off-by: Radu Pirea (NXP OSS) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: nxp-c45-tja11xx: remove RX BIST frame counters

Remove RX BIST frame counters from the PHY statistics.
In production mode, these counters are always read as 0.

Signed-off-by: Radu Pirea (NXP OSS) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: nxp-c45-tja11xx: use phylib master/slave implementation

Remove the custom implementation of master/save setup and read status
and use genphy_c45_config_aneg and genphy_c45_read_status since phylib
has support for master/slave setup and master/slave status.

Signed-off-by: Radu Pirea (NXP OSS) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'virtio_net-add-per-queue-interrupt-coalescing-support'

Gavin Li says:

====================
virtio_net: add per queue interrupt coalescing support

Currently, coalescing parameters are grouped for all transmit and receive
virtqueues. This patch series add support to set or get the parameters for
a specified virtqueue.

When the traffic between virtqueues is unbalanced, for example, one virtqueue
is busy and another virtqueue is idle, then it will be very useful to
control coalescing parameters at the virtqueue granularity.

Example command:
$ ethtool -Q eth5 queue_mask 0x1 --coalesce tx-packets 10
Would set max_packets=10 to VQ 1.
$ ethtool -Q eth5 queue_mask 0x1 --coalesce rx-packets 10
Would set max_packets=10 to VQ 0.
$ ethtool -Q eth5 queue_mask 0x1 --show-coalesce
Queue: 0
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 222
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 256

tx-usecs: 222
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 256

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

virtio_net: enable per queue interrupt coalesce feature

Enable per queue interrupt coalesce feature bit in driver and validate its
dependency with control queue.

Signed-off-by: Gavin Li <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Heng Qi <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

virtio_net: support per queue interrupt coalesce command

Add interrupt_coalesce config in send_queue and receive_queue to cache user
config.

Send per virtqueue interrupt moderation config to underlying device in
order to have more efficient interrupt moderation and cpu utilization of
guest VM.

Additionally, address all the VQs when updating the global configuration,
as now the individual VQs configuration can diverge from the global
configuration.

Signed-off-by: Gavin Li <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Heng Qi <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

virtio_net: extract interrupt coalescing settings to a structure

Extract interrupt coalescing settings to a structure so that it could be
reused in other data structures.

Signed-off-by: Gavin Li <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Heng Qi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

inet6: Remove unused function declaration udpv6_connect()

This is never implemented since the beginning of git history.

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: make sure we never create ifindex = 0

Instead of allocating from 1 use proper xa_init flag,
to protect ourselves from IDs wrapping back to 0.

Fixes: 759ab1edb56c ("net: store netdevs in an xarray")
Reported-by: Stephen Hemminger <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/macmace: Replace zero-length array with DECLARE_FLEX_ARRAY() helper

Since zero-length arrays are deprecated, we are replacing
them with C99 flexible-array members. As a result, instead
of declaring a zero-length array, use the new
DECLARE_FLEX_ARRAY() helper macro.

This fixes warnings such as:
./drivers/net/ethernet/apple/macmace.c:80:4-8: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)

Signed-off-by: Atul Raut <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: qca8k: use dsa_for_each macro instead of for loop

Convert for loop to dsa_for_each macro to save some redundant write on
unconnected/unused port and tidy things up.

Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: qca8k: move qca8xxx hol fixup to separate function

Move qca8xxx hol fixup to separate function to tidy things up and to
permit using a more efficent loop in future patch.

Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: qca8k: limit user ports access to the first CPU port on setup

In preparation for multi-CPU support, set CPU port LOOKUP MEMBER outside
the port loop and setup the LOOKUP MEMBER mask for user ports only to
the first CPU port.

This is to handle flooding condition where every CPU port is set as
target and prevent packet duplication for unknown frames from user ports.

Secondary CPU port LOOKUP MEMBER mask will be setup later when
port_change_master will be implemented.

Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: qca8k: make learning configurable and keep off if standalone

Address learning should initially be turned off by the driver for port
operation in standalone mode, then the DSA core handles changes to it
via ds->ops->port_bridge_flags().

Currently this is not the case for qca8k where learning is enabled
unconditionally in qca8k_setup for every user port.

Handle ports configured in standalone mode by making the learning
configurable and not enabling it by default.

Implement .port_pre_bridge_flags and .port_bridge_flags dsa ops to
enable learning for bridge that request it and tweak
.port_stp_state_set to correctly disable learning when port is
configured in standalone mode.

Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: tag_qca: return early if dev is not found

Currently checksum is recalculated and dsa tag stripped even if we later
don't find the dev.

To improve code, exit early if we don't find the dev and skip additional
operation on the skb since it will be freed anyway.

Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'net-sched-improve-class-lifetime-handling'

Pedro Tammela says:

====================
net/sched: improve class lifetime handling

Valis says[0]:
============
Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.
============

Turns out these could have been spotted easily with proper warnings.
Improve the current class lifetime with wrappers that check for
overflow/underflow.

While at it add an extack for when a class in use is deleted.

[0] https://lore.kernel.org/all/20230721174856 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: sch_qfq: warn about class in use while deleting

Add extack to warn that delete was rejected because
the class is still in use

Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: sch_htb: warn about class in use while deleting

Add extack to warn that delete was rejected because
the class is still in use

Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: sch_hfsc: warn about class in use while deleting

Add extack to warn that delete was rejected because
the class is still in use

Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: sch_drr: warn about class in use while deleting

Add extack to warn that delete was rejected because
the class is still in use

Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: wrap open coded Qdics class filter counter

The 'filter_cnt' counter is used to control a Qdisc class lifetime.
Each filter referecing this class by its id will eventually
increment/decrement this counter in their respective
'add/update/delete' routines.
As these operations are always serialized under rtnl lock, we don't
need an atomic type like 'refcount_t'.

It also means that we lose the overflow/underflow checks already
present in refcount_t, which are valuable to hunt down bugs
where the unsigned counter wraps around as it aids automated tools
like syzkaller to scream in such situations.

Wrap the open coded increment/decrement into helper functions and
add overflow checks to the operations.

Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'mptcp-cleanup-and-improvements-in-the-selftests'

Matthieu Baerts says:

====================
mptcp: cleanup and improvements in the selftests

This small series of 4 patches adds some improvements in MPTCP
selftests:

- Patch 1 reworks the detailed report of mptcp_join.sh selftest to
  better display what went well or wrong per test.

- Patch 2 adds colours (if supported, forced and/or not disabled) in
  mptcp_join.sh selftest output to help spotting issues.

- Patch 3 modifies an MPTCP selftest tool to interact with the
  path-manager via Netlink to always look for errors if any. This makes
  sure odd behaviours can be seen in the logs and errors can be caught
  later if needed.

- Patch 4 removes stdout and stderr redirections to /dev/null when using
  pm_nl_ctl if no errors are expected in order to log odd behaviours.
====================

Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-0-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: userspace_pm: unmute unexpected errors

All pm_nl_ctl commands were muted. If there was an unexpected error with
one of them, this was simply not visible in the logs, making the
analysis very hard. It could also hide misuse of commands by mistake.

Now the output is only muted when we do expect to have an error, e.g.
when giving invalid arguments on purpose.

Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-4-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: pm_nl_ctl: always look for errors

If a Netlink command for the MPTCP path-managers is not valid, it is
important to check if there are errors. If yes, they need to be reported
instead of being ignored and exiting without errors.

Now if no replies are expected, an ACK from the kernelspace is asked by
the userspace in order to always expect a reply. We can use the same
buffer that is currently always >1024 bytes. Then we can check if there
is an error (err->error), print it if any and report the error.

After this modification, it is required to mute expected errors in
mptcp_join.sh and pm_netlink.sh selftests:

- when trying to add a bad endpoint, e.g. duplicated
- when trying to set the two limits above the hard limit

Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-3-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: colored results

Thanks to the parent commit, it is easy to change the output and add
some colours to help spotting issues.

The colours are not used if stdout is redirected or if NO_COLOR env var
is set to 1 as specified in https://no-color.org.

It is possible to force displaying the colours even if stdout is
redirected by setting this env var:

SELFTESTS_MPTCP_LIB_COLOR_FORCE=1

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-2-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: rework detailed report

This patch modifies how the detailed results are printed, mainly to
improve what is displayed in case of issue:

- Now the test name (title) is printed earlier, when starting the test
  if it is not intentionally skipped: by doing that, errors linked to
  a test will be printed after having written the test name and then
  avoid confusions.

- Due to the previous item, it is required to add a new line after
  having printed the test name because in case of error with a command,
  it is better not to have the output in the middle of the screen.

- Each check is printed on a dedicated line with aligned status (ok,
  skip, fail): it is easier to spot which one has failed, simpler to
  manage in the code not having to deal with alignment case by case and
  helpers can be used to uniform what is done. These helpers can also be
  useful later to do more actions depending on the results or change in
  one place what is printed.

- Info messages have been reduced and aligned as well. And info messages
  about the creation of the default test files of 1 KB are no longer
  printed.

Example:

  001 no JOIN
        syn                                 [ ok ]
        synack                              [ ok ]
        ack                                 [ ok ]

Or with a skip and a failure:

  001 no JOIN
        syn                                 [ ok ]
        synack                              [fail] got 42 JOIN[s] synack expected 0
  Server ns stats
  (...)
  Client ns stats
  (...)
        ack                                 [skip]

Or with info:

  104 Infinite map
        Test file (size 128 KB) for client
        Test file (size 128 KB) for server
        file received by server has inverted byte at 169
        5 corrupted pkts
        syn                                 [ ok ]
        synack                              [ ok ]

While at it, verify_listener_events() now also print more info in case
of failure and in pm_nl_check_endpoint(), the test is marked as failed
instead of skipped if no ID has been given (internal selftest issue).

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/20230730-upstream-net-next-20230728-mptcp-selftests-misc-v1-1-7e9cc530a9cd@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

net/hsr: Remove unused function declarations

commit f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
introducted these but never implemented.

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: connector: Fix input argument error paths to skip

Fix input argument parsing paths to skip from their error legs.
This fix helps to avoid false test failure reports without running
the test.

Signed-off-by: Shuah Khan <[email protected]>
Reviewed-by: Anjali Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcx: Fix splat during dev unregister

During unregister_netdevice_many_notify(), the ordering of our concerned
function calls is like this:

  unregister_netdevice_many_notify
    dev_shutdown
qdisc_put
            clsact_destroy
    tcx_uninstall

The syzbot reproducer triggered a case that the qdisc refcnt is not
zero during dev_shutdown().

tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active)
because the miniq is still active and the entry should not be freed.
The latter assumed that qdisc destruction happens before tcx teardown.

This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the
miniq is still alive and let the clsact_destroy() do the free later, so
that we do not assume any specific ordering for either of them.

If still active, tcx_uninstall() does clear the entry when flushing out
the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()"
and then does the tcx_entry_free() eventually.

Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: [email protected]
Reported-by: Leon Romanovsky <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Co-developed-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: [email protected]
Tested-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <[email protected]>

net: bcmgenet: Remove TX ring full logging

There is no need to spam the kernel log with such an indication, remove
this message.

Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vsock: Remove unused function declarations

These are never implemented since introduction in
commit d021c344051a ("VSOCK: Introduce VM Sockets")

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/smc: Remove unused function declarations

commit f9aab6f2ce57 ("net/smc: immediate freeing in smc_lgr_cleanup_early()")
left behind smc_lgr_schedule_free_work_fast() declaration.
And since commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock")
smc_ib_modify_qp_reset() is not used anymore.

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'connector-proc_filter-test-fixes'

Shuah Khan says:

====================
Connector/proc_filter test fixes

The first patch fixes the LKFT reported compile error, second
one adds .gitignore.
====================

Applying the first 2 patches, third one resent separately.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: connector: Add .gitignore and poupulate it with test

Add gitignore and poupulate it with test name - proc_filter

Signed-off-by: Shuah Khan <[email protected]>
Link: https://lore.kernel.org/r/e3d04cc34e9af07909dc882b50fb1b6f1ce7705b.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: connector: Fix Makefile to include KHDR_INCLUDES

The test compile fails with following errors. Fix the Makefile
CFLAGS to include KHDR_INCLUDES to pull in uapi defines.

gcc -Wall     proc_filter.c  -o ../tools/testing/selftests/connector/proc_filter
proc_filter.c: In function ‘send_message’:
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
   22 |                          sizeof(struct proc_input))
      |                                 ^~~~~~
proc_filter.c:42:19: note: in expansion of macro ‘NL_MESSAGE_SIZE’
   42 |         char buff[NL_MESSAGE_SIZE];
      |                   ^~~~~~~~~~~~~~~
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
   22 |                          sizeof(struct proc_input))
      |                                 ^~~~~~
proc_filter.c:48:34: note: in expansion of macro ‘NL_MESSAGE_SIZE’
   48 |                 hdr->nlmsg_len = NL_MESSAGE_SIZE;
      |                                  ^~~~~~~~~~~~~~~
`

Reported-by: Naresh Kamboju <[email protected]>
Link: https://lore.kernel.org/all/CA+G9fYt=6ysz636XcQ=-KJp7vJcMZ=NjbQBrn77v7vnTcfP2cA@mail.gmail.com/
Signed-off-by: Shuah Khan <[email protected]>
Link: https://lore.kernel.org/r/d0055c8cdf18516db8ba9edec99cfc5c08f32a7c.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <[email protected]>

i40e: remove i40e_status

Replace uses of i40e_status to as equivalent as possible error codes.
Remove enum i40e_status as it is no longer needed

Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Remove unused function declarations

commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind tcp_bpf_get_proto() declaration. And tcp_v4_tw_remember_stamp()
function is remove in ccb7c410ddc0 ("timewait_sock: Create and use getpeer op.").
Since commit 686989700cab ("tcp: simplify tcp_mark_skb_lost")
tcp_skb_mark_lost_uncond_verify() declaration is not used anymore.

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

devlink: Remove unused extern declaration devlink_port_region_destroy()

devlink_port_region_destroy() is never implemented since
commit 544e7c33ec2f ("net: devlink: Add support for port regions").

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: Use sockaddr_storage for getsockopt(SO_PEERNAME).

Commit df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3") started
applying strict rules to standard string functions.

It does not work well with conventional socket code around each protocol-
specific sockaddr_XXX struct, which is cast from sockaddr_storage and has
a bigger size than fortified functions expect. See these commits:

commit 06d4c8a80836 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
commit ecb4534b6a1c ("af_unix: Terminate sun_path when bind()ing pathname socket.")
commit a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().")

We must cast the protocol-specific address back to sockaddr_storage
to call such functions.

However, in the case of getsockaddr(SO_PEERNAME), the rationale is a bit
unclear as the buffer is defined by char[128] which is the same size as
sockaddr_storage.

Let's use sockaddr_storage explicitly.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: flow_dissector: Use 64bits for used_keys

As 32bits of dissector->used_keys are exhausted,
increase the size to 64bits.

This is base change for ESP/AH flow dissector patch.
Please find patch and discussions at
https://lore.kernel.org/netdev/[email protected]/T/#t

Signed-off-by: Ratheesh Kannoth <[email protected]>
Reviewed-by: Petr Machata <[email protected]> # for mlxsw
Tested-by: Petr Machata <[email protected]>
Reviewed-by: Martin Habets <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

team: Remove NULL check before dev_{put, hold}

The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warning:

./drivers/net/team/team.c:2325:3-10: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5991
Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC

Enable hw Packet Process Engine (PPE) for MT7988 SoC.

Tested-by: Daniel Golle <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/5e86341b0220a49620dadc02d77970de5ded9efc.1690441576.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: mtk_eth_soc: enable page_pool support for MT7988 SoC

In order to recycle pages, enable page_pool allocator for MT7988 SoC.

Tested-by: Daniel Golle <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/fd4e8693980e47385a543e7b002eec0b88bd09df.1690440675.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

net: bcmasp: Clean up redundant dev_err_probe()

Referring to platform_get_irq()'s definition, the return value has
already been checked, error message also been printed via
dev_err_probe() if ret < 0. Calling dev_err_probe() one more time
outside platform_get_irq() is obviously redundant.

Removing dev_err_probe() outside platform_get_irq() to clean up
above problem.

Signed-off-by: Chen Jiahao <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Justin Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bonding: 3ad: Remove unused declaration bond_3ad_update_lacp_active()

This is not used since commit 3a755cd8b7c6 ("bonding: add new option lacp_active")

Signed-off-by: YueHaibing <[email protected]>
Acked-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'r8152-reduce-control-transfer'

Hayes Wang says:

====================
r8152: reduce control transfer

The two patches are used to reduce the number of control transfer when
access the registers in bulk.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8152: set bp in bulk

PLA_BP_0 ~ PLA_BP_15 (0xfc28 ~ 0xfc46) are continuous registers, so we
could combine the control transfers into one control transfer.

Signed-off-by: Hayes Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

r8152: adjust generic_ocp_write function

Reduce the control transfer if all bytes of first or the last DWORD are
written.

The original method is to split the control transfer into three parts
(the first DWORD, middle continuous data, and the last DWORD). However,
they could be combined if whole bytes of the first DWORD or last DWORD
are written. That is, the first DWORD or the last DWORD could be combined
with the middle continuous data, if the byte_en is 0xff.

Signed-off-by: Hayes Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: slicoss: remove redundant increment of pointer data

The pointer data is being incremented but this change to the pointer
is not used afterwards. The increment is redundant and can be removed.

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Lino Sanfilippo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'in-kernel-support-for-the-tls-alert-protocol'

Chuck Lever says:

====================
In-kernel support for the TLS Alert protocol

IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================

Merged on top of a tag in case it's needed in the NFS tree.

Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/handshake: Trace events for TLS Alert helpers

Add observability for the new TLS Alert infrastructure.

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

SUNRPC: Use new helpers to handle TLS Alerts

Use the helpers to parse the level and description fields in
incoming alerts. "Warning" alerts are discarded, and "fatal"
alerts mean the session is no longer valid.

Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047944747.5241.1974889594004407123.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/handshake: Add helpers for parsing incoming TLS Alerts

Kernel TLS consumers can replace common TLS Alert parsing code with
these helpers.

Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

SUNRPC: Send TLS Closure alerts before closing a TCP socket

Before closing a TCP connection, the TLS protocol wants peers to
send session close Alert notifications. Add those in both the RPC
client and server.

Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047939404.5241.14392506226409865832.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/handshake: Add API for sending TLS Closure alerts

This helper sends an alert only if a TLS session was established.

Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/tls: Add TLS Alert definitions

I'm about to add support for kernel handshake API consumers to send
TLS Alerts, so introduce the needed protocol definitions in the new
header tls_prot.h.

This presages support for Closure alerts. Also, support for alerts
is a pre-requite for handling session re-keying, where one peer will
signal the need for a re-key by sending a TLS Alert.

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/tls: Move TLS protocol elements to a separate header

Kernel TLS consumers will need definitions of various parts of the
TLS protocol, but often do not need the function declarations and
other infrastructure provided in <net/tls.h>.

Break out existing standardized protocol elements into a separate
header, and make room for a few more elements in subsequent patches.

Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

octeontx2-af: Initialize 'cntr_val' to fix uninitialized symbol error

drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c:860
otx2_tc_update_mcam_table_del_req()
error: uninitialized symbol 'cntr_val'.

Fixes: ec87f05402f5 ("octeontx2-af: Install TC filter rules in hardware based on priority")
Signed-off-by: Suman Ghosh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'eth-bnxt-fix-a-couple-of-w-1-c-1-warnings'

Jakub Kicinski says:

====================
eth: bnxt: fix a couple of W=1 C=1 warnings

Fix a couple of build warnings.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

eth: bnxt: fix warning for define in struct_group

Fix C=1 warning with sparse 0.6.4:

drivers/net/ethernet/broadcom/bnxt/bnxt.c: note: in included file:
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h:30:1: warning: directive in macro's argument list

Don't put defines in a struct_group().

Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

eth: bnxt: fix one of the W=1 warnings about fortified memcpy()

Fix a W=1 warning with gcc 13.1:

In function ‘fortify_memcpy_chk’,
    inlined from ‘bnxt_hwrm_queue_cos2bw_cfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:133:3:
include/linux/fortify-string.h:592:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  592 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The field group is already defined and starts at queue_id:

struct bnxt_cos2bw_cfg {
u8 pad[3];
struct_group_attr(cfg, __packed,
u8 queue_id;
__le32 min_bw;

Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-07-24

1) Generalize devcom implementation to be independent of number of ports
   or device's GUID.

2) Save memory on command interface statistics.

3) General code cleanups

* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
  net/mlx5: Make mlx5_eswitch_load/unload_vport() static
  net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
  net/mlx5: Remove pointless devlink_rate checks
  net/mlx5: Don't check vport->enabled in port ops
  net/mlx5e: Make flow classification filters static
  net/mlx5e: Remove duplicate code for user flow
  net/mlx5: Allocate command stats with xarray
  net/mlx5: split mlx5_cmd_init() to probe and reload routines
  net/mlx5: Remove redundant cmdif revision check
  net/mlx5: Re-organize mlx5_cmd struct
  net/mlx5e: E-Switch, Allow devcom initialization on more vports
  net/mlx5e: E-Switch, Register devcom device with switch id key
  net/mlx5: Devcom, Infrastructure changes
  net/mlx5: Use shared code for checking lag is supported
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mlxsw-avoid-non-tracker-helpers-when-holding-and-putting-netdevices'

Petr Machata says:

====================
mlxsw: Avoid non-tracker helpers when holding and putting netdevices

Using the tracking helpers, netdev_hold() and netdev_put(), makes it easier
to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER
is enabled. For example, the following traceback shows the callpath to the
point of an outstanding hold that was never put:

    unregister_netdevice: waiting for swp3 to become free. Usage count = 6
    ref_tracker: eth%d@ffff888123c9a580 has 1/5 users at
mlxsw_sp_switchdev_event+0x6bd/0xcc0 [mlxsw_spectrum]
notifier_call_chain+0xbf/0x3b0
atomic_notifier_call_chain+0x78/0x200
br_switchdev_fdb_notify+0x25f/0x2c0 [bridge]
fdb_notify+0x16a/0x1a0 [bridge]
[...]

In this patchset, get rid of all non-ref-tracking helpers in mlxsw.

- Patch #1 drops two functions that are not used anymore, but contain
  dev_hold() / dev_put() calls.

- Patch #2 avoids taking a reference in one function which is called
  under RTNL.

- The remaining patches convert individual hold/put sites one by one
  from trackerless to tracker-enabled.

Suggested-by: Paolo Abeni <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: IPv6 events: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with IPv6 address events.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/f0af6ad4722b4ca6e598fd4fda8311a3041651ec.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: RIF: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with RIF allocation.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/8b7701a7b439ac268e4be4040eff99d01e27ae47.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: hw_stats: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with hw_stats events.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/b972314cfef4f4c24e66e60d13cffa5d606d1bf3.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: FIB: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with FIB events.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/5221a92e751c40447c55959f622267ccc999ed04.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_switchdev: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
switchdev module.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/774c3d7b5b0231f1435df2ec9dd660192e382756.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_nve: Do not take reference when looking up netdevice

mlxsw_sp_nve_fid_disable() is always called under RTNL. It is therefore
safe to call __dev_get_by_index() to get the netdevice pointer without
bumping the reference count, because we can be sure the netdevice is not
going away. That then obviates the need to put the netdevice later in the
function.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/341d1046f89d8d839d9d00e4a3d58cdc351e9397.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum: Drop unused functions mlxsw_sp_port_lower_dev_hold/_put()

As of commit 151b89f6025a ("mlxsw: spectrum_router: Reuse work neighbor
initialization in work scheduler"), the functions
mlxsw_sp_port_lower_dev_hold() and mlxsw_sp_port_dev_put() have no users.
Drop them.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/d0adcd7cb4ea19416294a0f861100edba84c9f36.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

net: change accept_ra_min_rtr_lft to affect all RA lifetimes

accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.

This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes are lower than the configured value, the specific RA section
is ignored.

In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).

The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.

Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <[email protected]>
Signed-off-by: Patrick Rohr <[email protected]>
Reviewed-by: Maciej Żenczykowski <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-store-netdevs-in-an-xarray'

Jakub Kicinski says:

====================
net: store netdevs in an xarray

One of more annoying developer experience gaps we have in netlink
is iterating over netdevs. It's painful. Add an xarray to make
it trivial.

v1: https://lore.kernel.org/all/20230722014237.4078962 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: convert some netlink netdev iterators to depend on the xarray

Reap the benefits of easier iteration thanks to the xarray.
Convert just the genetlink ones, those are easier to test.

Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: store netdevs in an xarray

Iterating over the netdev hash table for netlink dumps is hard.
Dumps are done in "chunks" so we need to save the position
after each chunk, so we know where to restart from. Because
netdevs are stored in a hash table we remember which bucket
we were in and how many devices we dumped.

Since we don't hold any locks across the "chunks" - devices may
come and go while we're dumping. If that happens we may miss
a device (if device is deleted from the bucket we were in).
We indicate to user space that this may have happened by setting
NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
if it sees that. Somehow I doubt most user space gets this right..

To illustrate let's look at an example:

               System state:
  start:       # [A, B, C]
  del:  B      # [A, C]

with the hash table we may dump [A, B], missing C completely even
tho it existed both before and after the "del B".

Add an xarray and use it to allocate ifindexes. This way we
can iterate ifindexes in order, without the worry that we'll
skip one. We may still generate a dump of a state which "never
existed", for example for a set of values and sequence of ops:

               System state:
  start:       # [A, B]
  add:  C      # [A, C, B]
  del:  B      # [A, C]

we may generate a dump of [A], if C got an index between A and B.
System has never been in such state. But I'm 90% sure that's perfectly
fine, important part is that we can't _miss_ devices which exist before
and after. User space which wants to mirror kernel's state subscribes
to notifications and does periodic dumps so it will know that C exists
from the notification about its creation or from the next dump
(next dump is _guaranteed_ to include C, if it doesn't get removed).

To avoid any perf regressions keep the hash table for now. Most
net namespaces have very few devices and microbenchmarking 1M lookups
on Skylake I get the following results (not counting loopback
to number of devs):

#devs | hash |  xa  | delta
    2  | 18.3 | 20.1 | + 9.8%
   16  | 18.3 | 20.1 | + 9.5%
   64  | 18.3 | 26.3 | +43.8%
  128  | 20.4 | 26.3 | +28.6%
  256  | 20.0 | 26.4 | +32.1%
1024  | 26.6 | 26.7 | + 0.2%
8192  |541.3 | 33.5 | -93.8%

No surprises since the hash table has 256 entries.
The microbenchmark scans indexes in order, if the pattern is more
random xa starts to win at 512 devices already. But that's a lot
of devices, in practice.

Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'ynl-couple-of-unrelated-fixes'

Stanislav Fomichev says:

====================
ynl: couple of unrelated fixes

- spelling of xdp-features
- s/xdp_zc_max_segs/xdp-zc-max-segs/
- expose xdp-zc-max-segs
- add /* private: */
- regenerate headers
- print xdp_zc_max_segs from sample
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ynl: print xdp-zc-max-segs in the sample

Technically we don't have to keep extending the sample, but it
feels useful to run these tools locally to confirm everything
is working.

Signed-off-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ynl: regenerate all headers

Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.

Signed-off-by: Stanislav Fomichev <[email protected]>
Co-developed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ynl: mark max/mask as private for kdoc

Simon mentioned in another thread that it makes kdoc happy
and Jakub confirms that commit e27cb89a22ad ("scripts: kernel-doc: support
private / public marking for enums") actually added the needed
support.

Signed-off-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ynl: expose xdp-zc-max-segs

Also rename it to dashes, to match the rest. And fix unrelated
spelling error while we're at it.

Signed-off-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
ice: Implement support for SRIOV + LAG

Dave Ertman says:

Implement support for SRIOV VF's on interfaces that are in an
aggregate interface.

The first interface added into the aggregate will be flagged as
the primary interface, and this primary interface will be
responsible for managing the VF's resources. VF's created on the
primary are the only VFs that will be supported on the aggregate.
Only Active-Backup mode will be supported and only aggregates whose
primary interface is in switchdev mode will be supported.

The ice-lag DDP must be loaded to support this feature.

Additional restrictions on what interfaces can be added to the aggregate
and still support SRIOV VFs are:
- interfaces have to all be on the same physical NIC
- all interfaces have to have the same QoS settings
- interfaces have to have the FW LLDP agent disabled
- only the primary interface is to be put into switchdev mode
- no more than two interfaces in the aggregate
---
v2:
- Move NULL check for q_ctx in ice_lag_qbuf_recfg() earlier (patch 6)

v1: https://lore.kernel.org/netdev/20230726182141.3797928 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

IPv6: add extack info for IPv6 address add/delete

Add extack info for IPv6 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.

Suggested-by: Beniamino Galvani <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'selftest-ptp'

Alex Maftei says:

====================
selftests/ptp: Add support for new timestamp IOCTLs

PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 (" ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")

The PTP selftest code is lacking support for these two IOCTLS.
This short series of patches adds support for them.

Changes in v2:
- Fixed rebase issues (v1 somehow ended up with patch 1 being from the
first manual split of my changes and patch 2 being from rebase 2 out
of 3)
- Rebased on top of net-next
====================

Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE

The -X option was chosen because X looks like a cross, and the underlying
callback is 'get cross timestamp'.

Signed-off-by: Alex Maftei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED

The -x option (where 'x' stands for eXtended) takes an argument which
represents the number of samples to request from the PTP device.
The help message will display the maximum number of samples allowed.
Providing an invalid argument will also display the maximum number of
samples allowed.

Signed-off-by: Alex Maftei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-next-for-6.6-20230728' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

linux-can-next-for-6.6-20230728

Marc Kleine-Budde says:

====================
Hello netdev-team,

this is a pull request of 21 patches for net-next/master.

The 1st patch is by Gerhard Uttenthaler, which adds Gerhard as the
maintainer ems_pci driver.

Peter Seiderer's patch removes a unused function from the peak_usb
driver.

The next 4 patches are by John Watts and add support for the sun4i_can
driver on the Allwinner D1.

Rob Herring's patch corrects the DT includes in various CAN drivers.

Followed by 14 patches from me concerning the gs_usb driver. The first
11 are various cleanups consisting of coding style improvements, error
path printout cleanups, and removal of unneeded
usb_kill_anchored_urbs(). The last 3 convert the driver to use NAPI to
avoid out-of-order reception of CAN frames.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sfc-siena-next'

Martin Habets says:

====================
sfc: Remove Siena bits from sfc.ko

Last year we split off Siena into it's own driver under directory siena.
This patch series removes the now unused Falcon and Siena code from sfc.ko.
No functional changes are intended.
====================

Signed-off-by: David S. Miller <[email protected]>

sfc: Remove vfdi.h

It was only used for Siena SRIOV, so nothing includes it any more
in this directory.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Cleanups in io.h

Most of the Falcon locking description does not apply to EF10.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Miscellaneous comment removals

Remove comments that only apply to Falcon and Siena.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove struct efx_special_buffer

The attributes index and entries are no longer needed, so use
struct efx_buffer instead.
next_buffer_table was also Siena specific.
Removed some checkpatch warnings.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Filter cleanups for Falcon and Siena

unicast_filter and multicast_hash are no longer needed.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove some NIC type indirections that are no longer needed

The special handling for SRIOV reset and FLR is not needed on EF10.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove PTP code for Siena

rx_tx_inline is now always true.
The special event list is no longer needed, event handling is always
inline.
Event MCDI_EVENT_CODE_PTP_RX is no longer needed.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove EFX_REV_SIENA_A0

The workarounds for bug 8568 and 17213 are no longer needed.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove support for siena high priority queue

This also removes TC support code, since that was never supported for EF10.
TC support for EF100 is not handled from efx.c.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove siena_nic_data and stats

These are no longer used, and the two Siena specific functions are
no longer present in sfc.ko.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Remove falcon references

Siena was using functions from the Falcon architecture.
These start with the efx_farch prefix.
Now that both of these are in separate modules the references
are no longer used in the sfc.ko module.

Signed-off-by: Martin Habets <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'rxfh-custom-rss'

Joe Damato says:

====================
rxfh with custom RSS fixes

Greetings:

Welcome to v2, now via net-next. No functional changes; only style
changes (see the summary below).

While attempting to get the RX flow hash key for a custom RSS context on
my mlx5 NIC, I got an error:

$ sudo ethtool -u eth1 rx-flow-hash tcp4 context 1
Cannot get RX network flow hashing options: Invalid argument

I dug into this a bit and noticed two things:

1. ETHTOOL_GRXFH supports custom RSS contexts, but ETHTOOL_SRXFH does
not. I moved the copy logic out of ETHTOOL_GRXFH and into a helper so
that both ETHTOOL_{G,S}RXFH now call it, which fixes ETHTOOL_SRXFH. This
is patch 1/2.

2. mlx5 defaulted to RSS context 0 for both ETHTOOL_{G,S}RXFH paths. I
have modified the driver to support custom contexts for both paths. It
is now possible to get and set the flow hash key for custom RSS contexts
with mlx5. This is patch 2/2.

See commit messages for more details.

Thanks.

v2:
- Rebased on net-next
- Adjusted arguments of mlx5e_rx_res_rss_get_hash_fields and
  mlx5e_rx_res_rss_set_hash_fields to move rss_idx next to the rss
  argument
- Changed return value of both mlx5e_rx_res_rss_get_hash_fields and
  mlx5e_rx_res_rss_set_hash_fields to -ENOENT when the rss entry is
  NULL
- Changed order of local variables in mlx5e_get_rss_hash_opt and
  mlx5e_set_rss_hash_opt
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx5: Fix flowhash key set/get for custom RSS

mlx5 flow hash field retrieval and set only worked on the default
RSS context as of commit f01cc58c18d6 ("net/mlx5e: Support multiple RSS
contexts"), not custom RSS contexts.

For example, before this patch attempting to retrieve the flow hash fields
for RSS context 1 fails:

$ sudo ethtool -u eth1 rx-flow-hash tcp4 context 1
Cannot get RX network flow hashing options: Invalid argument

This patch fixes getting and setting the flow hash fields for contexts
other than the default context.

Getting the flow hash fields for RSS context 1:

$ sudo ethtool -u eth1 rx-flow-hash tcp4 context 1
For RSS context 1:
TCP over IPV4 flows use these fields for computing Hash flow key:
IP DA

Now, setting the flash hash fields to a custom value:

$ sudo ethtool -U eth1 rx-flow-hash tcp4 sdfn context 1

And retrieving them again:

$ sudo ethtool -u eth1 rx-flow-hash tcp4 context 1
For RSS context 1:
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

Signed-off-by: Joe Damato <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethtool: Unify ETHTOOL_{G,S}RXFH rxnfc copy

ETHTOOL_GRXFH correctly copies in the full struct ethtool_rxnfc when
FLOW_RSS is set; ETHTOOL_SRXFH needs a similar code path to handle the
FLOW_RSS case so that ethtool can set the flow hash for custom RSS
contexts (if supported by the driver).

The copy code from ETHTOOL_GRXFH has been pulled out in to a helper so
that it can be called in both ETHTOOL_{G,S}RXFH code paths.

Acked-by: Edward Cree <[email protected]>
Signed-off-by: Joe Damato <[email protected]>
Signed-off-by: David S. Miller <[email protected]>