Git Repo - linux.git/log

firewire: net: Make use of get_unaligned_be48(), put_unaligned_be48()

Since we have a proper endianness converters for BE 48-bit data use
them.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220726144906.5217-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amt: fix typo in comment

Correct spelling on 'non-existent' in comment

Signed-off-by: Ruffalo Lavoisier <RuffaloLavoisier@gmail.com>
Link: https://lore.kernel.org/r/20220728032854.151180-1-RuffaloLavoisier@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: core_linecards: Remove duplicated include in core_linecard_dev.c

Fix following includecheck warning:
./drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c: linux/err.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220727233801.23781-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: dsa: Add a Makefile which installs the selftests

Add a Makefile which takes care of installing the selftests in
tools/testing/selftests/drivers/net/dsa. This can be used to install all
DSA specific selftests and forwarding.config using the same approach as
for the selftests in tools/testing/selftests/net/forwarding.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220727191642.480279-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'take-devlink-lock-on-mlx4-and-mlx5-callbacks'

Moshe Shemesh says:

====================
Take devlink lock on mlx4 and mlx5 callbacks

Prepare mlx4 and mlx5 drivers to have all devlink callbacks called with
devlink instance locked. Change mlx4 driver to use devl_ API where
needed to have devlink reload callbacks locked. Change mlx5 driver to
use devl_ API where needed to have devlink reload and devlink health
callbacks locked.

As mlx5 is the only driver which needed changes to enable calling health
callbacks with devlink instance locked, this patchset also removes
DEVLINK_NL_FLAG_NO_LOCK flag from devlink health callbacks.

This patchset will be followed by a patchset that will remove
DEVLINK_NL_FLAG_NO_LOCK flag from devlink and will remove devlink_mutex.
====================

Link: https://lore.kernel.org/r/1659023630-32006-1-git-send-email-moshe@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Hold the instance lock in health callbacks

Let the core take the devlink instance lock around health callbacks and
remove the now redundant locking in the drivers.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lock mlx5 devlink health recovery callback

Change devlink instance locks in mlx5 driver to have devlink health
recovery callback locked, while keeping all driver paths which lead to
devl_ API functions called by the driver locked.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: Lock mlx4 devlink reload callback

Change devlink instance locks in mlx4 driver to have devlink reload
callback locked, while keeping all driver paths which leads to devl_ API
functions called by the mlx4 driver locked.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: Use devl_ API for devlink port register / unregister

Use devl_ API to call devl_port_register() and devl_port_unregister()
instead of devlink_port_register() and devlink_port_unregister(). Add
devlink instance lock in mlx4 driver paths to these functions.

This will be used by the downstream patch to invoke mlx4 devlink reload
callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: Use devl_ API for devlink region create / destroy

Use devl_ API to call devl_region_create() and devl_region_destroy()
instead of devlink_region_create() and devlink_region_destroy().
Add devlink instance lock in mlx4 driver paths to these functions.

This will be used by the downstream patch to invoke mlx4 devlink reload
callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Lock mlx5 devlink reload callbacks

Change devlink instance locks in mlx5 driver to have devlink reload
callbacks locked, while keeping all driver paths which lead to devl_ API
functions called by the driver locked.

Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked() which
are used by the paths which are already locked such as devlink reload
callbacks.

This patch makes the driver use devl_ API also for traps register as
these functions are called from the driver paths parallel to reload that
requires locking now.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move fw reset unload to mlx5_fw_reset_complete_reload

Refactor fw reset code to have the unload driver part done on
mlx5_fw_reset_complete_reload(), so if it was called by the PF which
initiated the reload fw activate flow, the unload part will be handled
by the mlx5_devlink_reload_fw_activate() callback itself and not by the
reset event work.

This will be used by the downstream patch to invoke devlink reload
callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove region snapshots list dependency on devlink->lock

After mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked context.

Note that in mlx4 region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.

So resolve this by removing dependency on devlink->lock for region
snapshots list consistency and introduce new mutex to ensure it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove region snapshot ID tracking dependency on devlink->lock

After mlx4 driver is converted to do locked reload, functions to get/put
regions snapshot ID may be called from both locked and unlocked context.

So resolve this by removing dependency on devlink->lock for region
snapshot ID tracking by using internal xa_lock() to maintain
shapshot_ids xa_array consistency.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-framework-for-selftests-in-devlink'

Vikas Gupta says:

====================
add framework for selftests in devlink

Add support for selftests in the devlink framework.
Adds a callback .selftests_check and .selftests_run in devlink_ops.
User can add test(s) suite which is subsequently passed to the driver
and driver can opt for running particular tests based on its capabilities.

Patchset adds a flash based test for the bnxt_en driver.
====================

Link: https://lore.kernel.org/r/20220727165721.37959-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: implement callbacks for devlink selftests

Add callbacks
=============
.selftest_check: returns true for flash selftest.
.selftest_run: runs a flash selftest.

Also, refactor NVM APIs so that they can be
used with devlink and ethtool both.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: introduce framework for selftests

Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.

Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.

Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5e-use-tls-tx-pool-to-improve-connection-rate'

Tariq Toukan says:

====================
mlx5e use TLS TX pool to improve connection rate

To offload encryption operations, the mlx5 device maintains state and
keeps track of every kTLS device-offloaded connection.  Two HW objects
are used per TX context of a kTLS offloaded connection: a. Transport
interface send (TIS) object, to reach the HW context.  b. Data Encryption
Key (DEK) to perform the crypto operations.

These two objects are created and destroyed per TLS TX context, via FW
commands.  In total, 4 FW commands are issued per TLS TX context, which
seriously limits the connection rate.

In this series, we aim to save creation and destroy of TIS objects by
recycling them.  Upon recycling of a TIS, the HW still needs to be
notified for the re-mapping between a TIS and a context. This is done by
posting WQEs via an SQ, significantly faster API than the FW command
interface.

A pool is used for recycling. The pool dynamically interacts to the load
and connection rate, growing and shrinking accordingly.

Saving the TIS FW commands per context increases connection rate by ~42%,
from 11.6K to 16.5K connections per sec.

Connection rate is still limited by FW bottleneck due to the remaining
per context FW commands (DEK create/destroy). This will soon be addressed
in a followup series.  By combining the two series, the FW bottleneck
will be released, and a significantly higher (about 100K connections per
sec) kTLS TX device-offloaded connection rate is reached.
====================

Link: https://lore.kernel.org/r/20220727094346.10540-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: kTLS, Dynamically re-size TX recycling pool

Let the TLS TX recycle pool be more flexible in size, by continuously
and dynamically allocating and releasing HW resources in response to
changes in the connections rate and load.

Allocate and release pool entries in bulks (16). Use a workqueue to
release/allocate in the background. Allocate a new bulk when the pool
size goes lower than the low threshold (1K). Symmetric operation is done
when the pool size gets greater than the upper threshold (4K).

Every idle pool entry holds: 1 TIS, 1 DEK (HW resources), in addition to
~100 bytes in host memory.

Start with an empty pool to minimize memory and HW resources waste for
non-TLS users that have the device-offload TLS enabled.

Upon a new request, in case the pool is empty, do not wait for a whole bulk
allocation to complete. Instead, trigger an instant allocation of a single
resource to reduce latency.

Performance tests:
Before: 11,684 CPS
After: 16,556 CPS

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections

The transport interface send (TIS) object is responsible for performing
all transport related operations of the transmit side. The ConnectX HW
uses a TIS object to save and access the TLS crypto information and state
of an offloaded TX kTLS connection.

Before this patch, we used to create a new TIS per connection and destroy
it once it’s closed. Every create and destroy of a TIS is a FW command.

Same applies for the private TLS context, where we used to dynamically
allocate and free it per connection.

Resources recycling reduce the impact of the allocation/free operations
and helps speeding up the connection rate.

In this feature we maintain a pool of TX objects and use it to recycle
the resources instead of re-creating them per connection.

A cached TIS popped from the pool is updated to serve the new connection
via the fast-path HW interface, updating the tls static and progress
params. This is a very fast operation, significantly faster than FW
commands.

On recycling, a WQE fence is required after the context params change.
This guarantees that the data is sent after the context has been
successfully updated in hardware, and that the context modification
doesn't interfere with existing traffic.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: kTLS, Take stats out of OOO handler

Let the caller of mlx5e_ktls_tx_handle_ooo() take care of updating the
stats, according to the returned value. As the switch/case blocks are
already there, this change saves unnecessary branches in the handler.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: kTLS, Introduce TLS-specific create TIS

TLS TIS objects have a defined role in mapping and reaching the HW TLS
contexts. Some standard TIS attributes (like LAG port affinity) are
not relevant for them.

Use a dedicated TLS TIS create function instead of the generic
mlx5e_create_tis.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: Multi-threaded calls to TX tls_dev_del

Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate).  When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: Perform immediate device ctx cleanup when possible

TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.

For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: rx: Fix unsigned comparison with less than zero

The return from the call to tls_rx_msg_size() is int, it can be
a negative error code, however this is being assigned to an
unsigned long variable 'sz', so making 'sz' an int.

Eliminate the following coccicheck warning:
./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tls-rx-follow-ups-to-rx-work'

Jakub Kicinski says:

====================
tls: rx: follow ups to rx work

A selection of unrelated changes. First some selftest polishing.
Next a change to rcvtimeo handling for locking based on an exchange
with Eric. Follow up to Paolo's comments from yesterday. Last but
not least a fix to a false positive warning, turns out I've been
testing with DEBUG_NET=n this whole time.
====================

Link: https://lore.kernel.org/r/20220727031524.358216-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: rx: fix the false positive warning

I went too far in the accessor conversion, we can't use tls_strp_msg()
after decryption because the message may not be ready. What we care
about on this path is that the output skb is detached, i.e. we didn't
somehow just turn around and used the input skb with its TCP data
still attached. So look at the anchor directly.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: strp: rename and multithread the workqueue

Paolo points out that there seems to be no strong reason strparser
users a single threaded workqueue. Perhaps there were some performance
or pinning considerations? Since we don't know (and it's the slow path)
let's default to the most natural, multi-threaded choice.

Also rename the workqueue to "tls-".

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: rx: don't consider sock_rcvtimeo() cumulative

Eric indicates that restarting rcvtimeo on every wait may be fine.
I thought that we should consider it cumulative, and made
tls_rx_reader_lock() return the remaining timeo after acquiring
the reader lock.

tls_rx_rec_wait() gets its timeout passed in by value so it
does not keep track of time previously spent.

Make the lock waiting consistent with tls_rx_rec_wait() - don't
keep track of time spent.

Read the timeo fresh in tls_rx_rec_wait().
It's unclear to me why callers are supposed to cache the value.

Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tls: handful of memrnd() and length checks

Add a handful of memory randomizations and precise length checks.
Nothing is really broken here, I did this to increase confidence
when debugging. It does fix a GCC warning, tho. Apparently GCC
recognizes that memory needs to be initialized for send() but
does not recognize that for write().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: delete extra space and tab in blank line

delete extra space and tab in blank line, there is no functional change.

Signed-off-by: Xie Shaowen <studentxswpy@163.com>
Link: https://lore.kernel.org/r/20220727081253.3043941-1-studentxswpy@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

libbpf: Support PPC in arch_specific_syscall_pfx

Commit 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support
for syscall kprobes") added the arch_specific_syscall_pfx() function,
which returns a string representing the architecture in use. As it turns
out this function is currently not aware of Power PC, where NULL is
returned. That's being flagged by the libbpf CI system, which builds for
ppc64le and the compiler sees a NULL pointer being passed in to a %s
format string.
With this change we add representations for two more architectures, for
Power PC and Power PC 64, and also adjust the string format logic to
handle NULL pointers gracefully, in an attempt to prevent similar issues
with other architectures in the future.

Fixes: 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net

net/mlx5e: Move mlx5e_init_l2_addr to en_main

Move the function declaration of mlx5e_init_l2_addr to en/fs.h, rename
to mlx5e_fs_init_l2_addr to align with the fs API functions naming
convention and let it take mlx5e_flow_steering as arguments while keeping
implementation at en_fs.c file. This helps maintain a clean driver code
and avoids unnecessary dependencies.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Split en_fs ndo's and move to en_main

Add inner callee for ndo mlx5e_vlan_rx_add_vid and
mlx5e_vlan_rx_kill_vid, to separate the priv usage from other
flow steering flows.

Move wrapper ndo's into en_main, and split the rest of the functionality
into a separate part inside en_fs.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Separate mlx5e_set_rx_mode_work and move caller to en_main

Separate mlx5e_set_rx_mode into two, and move caller to en_main while
keeping implementation in en_fs in the newly declared function
mlx5e_fs_set_rx_mode. This; to minimize the coupling of flow_steering
to priv.

Add a parallel boolean member vlan_strip_disable to
mlx5e_flow_steering that's updated similarly as its identical in priv,
thus making it possible to adjust the rx_mode work handler to current
changes.

Also, add state_destroy boolean to mlx5e_flow_steering struct which
replaces the old check : !test_bit(MLX5E_STATE_DESTROYING, &priv->state).
This state member is updated accordingly prior to
INIT_WORK(mlx5e_set_rx_mode_work), This is done for similar purposes as
mentioned earlier and to minimize argument passings.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add mdev to flow_steering struct

Make flow_steering struct contain mlx5_core_dev such that
it becomes self contained and easier to decouple later on this series.
Let its values be initialized in mlx5e_fs_init().

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Report flow steering errors with mdev err report API

Let en_fs report errors via mdev error report API, aka mlx5_core_*
macros, thus replace the netdev API reports.
This to minimize netdev coupling to the flow steering struct.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer

Make mlx5e_flow_steering member of mlx5e_priv a pointer.
Add dynamic allocation respectively.

Allocate fs for all profiles when initializing profile,
symmetrically deallocate at profile cleanup.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Allocate VLAN and TC for featured profiles only

Introduce allocation and de-allocation functions for both flow steering
VLAN and TC as part of fs API.
Add allocations of VLAN and TC as nic profile feature, such that
fs_init() will allocate both VLAN and TC only if they're featured in
the profile. VLAN and TC are relevant for nic_profile only.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Make mlx5e_tc_table private

Move mlx5e_tc_table struct to en_tc.c thus make it private.
Introduce allocation and deallocation functions as part of the tc API
to allow this switch smoothly.

Convert mlx5e_nic_chain() macro to a function of en_tc.c.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Convert mlx5e_tc_table member of mlx5e_flow_steering to pointer

Make fs.tc be a pointer and allocate it dynamically.
Add mlx5e_priv pointer to mlx5e_tc_table, and thus get a work-around to
accessing priv via tc when handling tc events inside mlx5e_tc_netdev_event.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Support tc action api for police

Add support for tc action api for police.
Offloading standalone police action without
a tc rule and reporting stats.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Separate get/update/replace meter functions

mlx5e_tc_meter_get() to get an existing meter.
mlx5e_tc_meter_update() to update an existing meter without refcount.
mlx5e_tc_meter_replace() to get/create a meter and update if needed.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add red and green counters for metering

Add red and green counters per meter instance.
TC police action is implemented as a meter instance.
The meter counters represent the police action
notexceed/exceed counters.
TC rules using the same meter instance will use
the same counters.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, Allocate post meter ft per rule

To support a TC police action notexceed counter and supporting
actions other than drop/pipe there is a need to create separate ft
and rules per rule and not to use a common one created on eswitch init.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for flow metering ASO

Add support for ASO action of type flow metering
on device that supports STEv1.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Hamdan Igbaria <hamdani@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix wrong use of skb_tcp_all_headers() with encapsulation

Use skb_inner_tcp_all_headers() instead of skb_tcp_all_headers() when
transmitting an encapsulated packet in mlx5e_tx_get_gso_ihs().

Fixes: 504148fedb85 ("net: add skb_[inner_]tcp_all_headers helpers")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter, no known blockers for
  the release.

  Current release - regressions:

   - wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix
     taking the lock before its initialized

   - Bluetooth: mgmt: fix double free on error path

  Current release - new code bugs:

   - eth: ice: fix tunnel checksum offload with fragmented traffic

  Previous releases - regressions:

   - tcp: md5: fix IPv4-mapped support after refactoring, don't take the
     pure v6 path

   - Revert "tcp: change pingpong threshold to 3", improving detection
     of interactive sessions

   - mld: fix netdev refcount leak in mld_{query | report}_work() due to
     a race

   - Bluetooth:
      - always set event mask on suspend, avoid early wake ups
      - L2CAP: fix use-after-free caused by l2cap_chan_put

   - bridge: do not send empty IFLA_AF_SPEC attribute

  Previous releases - always broken:

   - ping6: fix memleak in ipv6_renew_options()

   - sctp: prevent null-deref caused by over-eager error paths

   - virtio-net: fix the race between refill work and close, resulting
     in NAPI scheduled after close and a BUG()

   - macsec:
      - fix three netlink parsing bugs
      - avoid breaking the device state on invalid change requests
      - fix a memleak in another error path

  Misc:

   - dt-bindings: net: ethernet-controller: rework 'fixed-link' schema

   - two more batches of sysctl data race adornment"

* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
  stmmac: dwmac-mediatek: fix resource leak in probe
  ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
  net: ping6: Fix memleak in ipv6_renew_options().
  net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
  sctp: leave the err path free in sctp_stream_init to sctp_stream_free
  sfc: disable softirqs for ptp TX
  ptp: ocp: Select CRC16 in the Kconfig.
  tcp: md5: fix IPv4-mapped support
  virtio-net: fix the race between refill work and close
  mptcp: Do not return EINPROGRESS when subflow creation succeeds
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
  Bluetooth: Always set event mask on suspend
  Bluetooth: mgmt: Fix double free on error path
  wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
  ice: do not setup vlan for loopback VSI
  ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
  ice: Fix VSIs unable to share unicast MAC
  ice: Fix tunnel checksum offload with fragmented traffic
  ice: Fix max VLANs available for VF
  netfilter: nft_queue: only allow supported familes and hooks
  ...

ice: allow toggling loopback mode via ndo_set_features callback

Add support for NETIF_F_LOOPBACK. This feature can be set via:
$ ethtool -K eth0 loopback <on|off>

Feature can be useful for local data path tests.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: compress branches in ice_set_features()

Instead of rather verbose comparison of current netdev->features bits vs
the incoming ones from user, let us compress them by a helper features
set that will be the result of netdev->features XOR features. This way,
current, extensive branches:

if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
set_feature(true);
else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
set_feature(false);

can become:

netdev_features_t changed = netdev->features ^ features;

if (changed & NETIF_F_BIT)
set_feature(!!(features & NETIF_F_BIT));

This is nothing new as currently several other drivers use this
approach, which I find much more convenient.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Fix promiscuous mode not turning off

When trust is turned off for the VF, the expectation is that promiscuous
and allmulticast filters are removed. Currently default VSI filter is not
getting cleared in this flow.

Example:

ip link set enp236s0f0 vf 0 trust on
ip link set enp236s0f0v0 promisc on
ip link set enp236s0f0 vf 0 trust off
/* promiscuous mode is still enabled on VF0 */

Remove switch filters for both cases.
This commit fixes above behavior by removing default VSI filters and
allmulticast filters when vf-true-promisc-support is OFF.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Introduce enabling promiscuous mode on multiple VF's

In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.

Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.

Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.

Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().

Refactor unnecessarily passed function arguments.

To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
traffic is seen on both of them.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: convert .adjfreq to .adjfine

The 82576 PTP implementation still uses .adjfreq instead of using the newer
.adjfine.

This implementation uses a pre-simplified calculation since the base
increment value for the 82576 is just 16 * 2^19. Converting this into
scaled_ppm is tricky, and makes the intent a bit less clear.

Simply convert to the normal flow of multiplying the base increment value
by the scaled_ppm and then dividing by 1000000ULL << 16. This can be
implemented using mul_u64_u64_div_u64 which can avoid the possible overflow
that might occur for large adjustments.

Use of .adjfine can improve the precision of small adjustments and gets us
one driver closer to removing the old implementation from the kernel
entirely.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: convert .adjfreq to .adjfine

Convert the ixgbe PTP frequency adjustment implementations from .adjfreq to
.adjfine. This allows using the scaled parts per million adjustment from
the PTP core and results in a more precise adjustment for small
corrections.

To avoid overflow, use mul_u64_u64_div_u64 to perform the calculation. On
X86 platforms, this will use instructions that perform the operations with
128bit intermediate values. For other architectures, the implementation
will limit the loss of precision as much as possible.

This change slightly improves the precision of frequency adjustments for
all ixgbe based devices, and gets us one driver closer to being able to
remove the older .adjfreq implementation from the kernel.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: convert .adjfreq to .adjfine

The i40e driver currently implements the .adjfreq handler for frequency
adjustments. This takes the adjustment parameter in parts per billion. The
PTP core supports .adjfine which provides an adjustment in scaled parts per
million. This has a higher resolution and can result in more precise
adjustments for small corrections.

Convert the existing .adjfreq implementation to the newer .adjfine
implementation. This is trivial since it just requires changing the divisor
from 1000000000ULL to (1000000ULL << 16) in the mul_u64_u64_div_u64 call.

This improves the precision of the adjustments and gets us one driver
closer to removing the old .adjfreq support from the kernel.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: use mul_u64_u64_div_u64 for PTP frequency calculation

The i40e device has a different clock rate depending on the current link
speed. This requires using a different increment rate for the PTP clock
registers. For slower link speeds, the base increment value is larger.
Directly multiplying the larger increment value by the parts per billion
adjustment might overflow.

To avoid this, the i40e implementation defaults to using the lower
increment value and then multiplying the adjustment afterwards. This causes
a loss of precision for lower link speeds.

We can fix this by using mul_u64_u64_div_u64 instead of performing the
multiplications using standard C operations. On X86, this will use special
instructions that perform the multiplication and division with 128bit
intermediate values. For other architectures, the fallback implementation
will limit the loss of precision for large values. Small adjustments don't
overflow anyways and won't lose precision at all.

This allows first multiplying the base increment value and then performing
the adjustment calculation, since we no longer fear overflowing. It also
makes it easier to convert to the even more precise .adjfine implementation
in a following change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: convert .adjfreq to .adjfine

The PTP implementation for the e1000e driver uses the older .adjfreq
method. This method takes an adjustment in parts per billion. The newer
.adjfine implementation uses scaled_ppm. The use of scaled_ppm allows for
finer grained adjustments and is preferred over using the older
implementation.

Make use of mul_u64_u64_div_u64 in order to handle possible overflow of the
multiplication used to calculate the desired adjustment to the hardware
increment value.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: remove unnecessary range check in e1000e_phc_adjfreq

The e1000e_phc_adjfreq function validates that the input delta is within
the maximum range. This is already handled by the core PTP code and this is
a duplicate and thus unnecessary check. It also complicates refactoring to
use the newer .adjfine implementation, where the input is no longer
specified in parts per billion. Remove the range validation check.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: implement adjfine with mul_u64_u64_div_u64

The PTP frequency adjustment code needs to determine an appropriate
adjustment given an input scaled_ppm adjustment.

We calculate the adjustment to the register by multiplying the base
(nominal) increment value by the scaled_ppm and then dividing by the
scaled one million value.

For very large adjustments, this might overflow. To avoid this, both the
scaled_ppm and divisor values are downshifted.

We can avoid that on X86 architectures by using mul_u64_u64_div_u64. This
helper function will perform the multiplication and division with 128bit
intermediate values. We know that scaled_ppm is never larger than the
divisor so this operation will never result in an overflow.

This improves the accuracy of the calculations for large adjustment values
on X86. It is likely an improvement on other architectures as well because
the default implementation of mul_u64_u64_div_u64 is smarter than the
original approach taken in the ice code.

Additionally, this implementation is easier to read, using fewer local
variables and lines of code to implement.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

stmmac: dwmac-mediatek: fix resource leak in probe

If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.

Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr

Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.

=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263

CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94

Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1

Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done

$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done

It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.

cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown

So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.

Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ping6: Fix memleak in ipv6_renew_options().

When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy().  As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.

    struct ipv6_sr_hdr *hdr;
    char data[24] = {0};
    int fd;

    hdr = (struct ipv6_sr_hdr *)data;
    hdr->hdrlen = 2;
    hdr->type = IPV6_SRCRT_TYPE_4;

    fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
    setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
    close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range.  The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
not succeed until we change the default value.  However, at least
Ubuntu/Fedora/RHEL loosen it.

    $ cat /usr/lib/sysctl.d/50-default.conf
    ...
    -net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.

  setsockopt
      IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
      IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
      IPV6_HOPOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDR (inet6_sk(sk)->opt)
      IPV6_DSTOPTS (inet6_sk(sk)->opt)
      IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

  getsockopt
      IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)

For the record, I left a different splat with syzbot's one.

  unreferenced object 0xffff888006270c60 (size 96):
    comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
    hex dump (first 32 bytes):
      01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
      [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
      [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
      [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
      [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
      [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

watch_queue: Fix missing locking in add_watch_to_object()

If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.

KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.

While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.

Fix this by locking the watcher list in add_watch_to_object().

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

watch_queue: Fix missing rcu annotation

Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).

Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: cdns,macb: use correct xlnx prefix for Xilinx

Use correct vendor for Xilinx versions of Cadence MACB/GEM Ethernet
controller. The Versal compatible was not released, so it can be
changed. Zynq-7xxx and Ultrascale+ has to be kept in new and deprecated
form.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Harini Katakam <harini.katakam@amd.com>
Link: https://lore.kernel.org/r/20220726070802.26579-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: cdns,macb: use correct xlnx prefix for Xilinx

Use correct vendor for Xilinx versions of Cadence MACB/GEM Ethernet
controller. The Versal compatible was not released, so it can be
changed. Zynq-7xxx and Ultrascale+ has to be kept in new and deprecated
form.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220726070802.26579-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/funeth: Fix fun_xdp_tx() and XDP packet reclaim

The current implementation of fun_xdp_tx(), used for XPD_TX, is
incorrect in that it takes an address/length pair and later releases it
with page_frag_free(). It is OK for XDP_TX but the same code is used by
ndo_xdp_xmit. In that case it loses the XDP memory type and releases the
packet incorrectly for some of the types. Assorted breakage follows.

Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in
reclaim.

Fixes: db37bc177dae ("net/funeth: add the data path")
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: brcmfmac: prevent double-free on hardware-reset

In case of buggy firmware, brcmfmac may perform a hardware reset. If during
reset and subsequent probing an early failure occurs, a memory region is
accidentally double-freed. With hardened memory allocation enabled, this error
will be detected.

- return early where appropriate to skip unnecessary clean-up.
- set '.freezer' pointer to NULL to prevent double-freeing under possible
  other circumstances and to re-align result under various different
  behaviors of memory allocation freeing.
- correctly claim host on func1 for disabling func2.
- after reset, do not initiate probing immediately, but rely on events.

Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
early, which includes calling 'brcmf_sdiod_remove'. In both cases
'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
has not yet been re-allocated the second time.

Stacktrace of (failing) hardware reset after firmware-crash:

Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
ret_from_fork+0x10/0x20
kthread+0x154/0x160
worker_thread+0x188/0x504
process_one_work+0x1f4/0x490
brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
kfree+0x210/0x220
__slab_free+0x58/0x40c
Call trace:
x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
sp : ffff80000a22bbf0
lr : kfree+0x210/0x220
pc : __slab_free+0x58/0x40c
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Workqueue: events brcmf_core_bus_reset [brcmfmac]
Hardware name: Pine64 Pinebook Pro (DT)
CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G         C        5.16.0-0.bpo.4-arm64 #1  Debian 5.16.12-1~bpo11+1
nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
Internal error: Oops - BUG: 0 [#1] SMP
kernel BUG at mm/slub.c:379!

Signed-off-by: Danny van Heumen <danny@dannyvanheumen.nl>
Reviewed-by: Arend van Spriel <aspriel.gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/id1HN6qCMAirApBzTA6fT7ZFWBBGCJhULpflxQ7NT6cgCboVnn3RHpiOFjA9SbRqzBRFLk9ES0C4FNvO6fUQsNg7pqF6ZSNAYUo99nHy8PY=@dannyvanheumen.nl

wifi: brcmfmac: support brcm,ccode-map-trivial DT property

Commit a21bf90e927f ("brcmfmac: use ISO3166 country code and 0 rev as
fallback on some devices") introduced a fallback mechanism whereby a
trivial mapping from ISO3166 country codes to firmware country code and
revision is used on some devices. This fallback operates on the device
level, so it is enabled only for certain supported chipsets.

In general though, the firmware country codes are determined by the CLM
blob, which is board-specific and may vary despite the underlying
chipset being the same.

The aforementioned commit is actually a refinement of a previous commit
that was reverted in commit 151a7c12c4fc ("Revert "brcmfmac: use ISO3166
country code and 0 rev as fallback"") due to regressions with a BCM4359
device. The refinement restricted the fallback mechanism to specific
chipsets such as the BCM4345.

We use a chipset - CYW88359 - that the driver identifies as a BCM4359
too. But in our case, the CLM blob uses ISO3166 country codes
internally, and all with revision 0. So the trivial mapping is exactly
what is needed in order for the driver to sync the kernel regulatory
domain to the firmware. This is just a matter of how the CLM blob was
prepared by the hardware vendor. The same could hold for other boards
too.

Although the brcm,ccode-map device tree property is useful for cases
where the mapping is more complex, the trivial case invites a much
simpler specification. This patch adds support for parsing the
brcm,ccode-map-trivial device tree property. Subordinate to the more
specific brcm,ccode-map property, this new proprety simply informs the
driver that the fallback method should be used in every case.

In the absence of the new property in the device tree, expect no
functional change.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220711123005.3055300-3-alvin@pqrs.dk

dt-bindings: bcm4329-fmac: add optional brcm,ccode-map-trivial

The bindings already offer a brcm,ccode-map property to describe the
mapping between the kernel's ISO3166 alpha 2 country code string and the
firmware's country code string and revision number. This is a
board-specific property and determined by the CLM blob firmware provided
by the hardware vendor.

However, in some cases the firmware will also use ISO3166 country codes
internally, and the revision will always be zero. This implies a trivial
mapping: cc -> { cc, 0 }.

For such cases, add an optional property brcm,ccode-map-trivial which
obviates the need to describe every trivial country code mapping in the
device tree with the existing brcm,ccode-map property. The new property
is subordinate to the more explicit brcm,ccode-map property.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220711123005.3055300-2-alvin@pqrs.dk

wifi: brcmfmac: Replace default (not configured) MAC with a random MAC

On some boards there is no eeprom to hold the nvram, in this case instead
a board specific nvram is loaded from /lib/firmware. On most boards the
macaddr=... setting in the /lib/firmware nvram file is ignored because
the wifi/bt chip has a unique MAC programmed into the chip itself.

But in some cases the actual MAC from the /lib/firmware nvram file gets
used, leading to MAC conflicts.

The MAC addresses in the troublesome nvram files seem to all come from
the same nvram file template, so we can detect this by checking for
the template nvram file MAC.

Detect that the default MAC address is being used and replace it
with a random MAC address to avoid MAC address conflicts.

Note that udev will detect this is a random MAC based on
/sys/class/net/wlan0/addr_assign_type and then replace this with
a MAC based on hashing the netdev-name + the machine-id. So that
the MAC address is both guaranteed to be unique per machine while
it is still the same/persistent at each boot (assuming the
default Link.MACAddressPolicy=persistent udev setting).

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220708133712.102179-2-hdegoede@redhat.com

wifi: brcmfmac: Add brcmf_c_set_cur_etheraddr() helper

Add a little helper to send "cur_etheraddr" commands to the interface
and to handle the error reporting of it in a single place.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Arend van Spriel <aspriel@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220708133712.102179-1-hdegoede@redhat.com

wifi: brcmfmac: Remove #ifdef guards for PM related functions

Use the new DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr() macros to
handle the .suspend/.resume callbacks.

These macros allow the suspend and resume functions to be automatically
dropped by the compiler when CONFIG_SUSPEND is disabled, without having
to use #ifdef guards.

Some other functions not directly called by the .suspend/.resume
callbacks, but still related to PM were also taken outside #ifdef
guards.

The advantage is then that these functions are now always compiled
independently of any Kconfig option, and thanks to that bugs and
regressions are easier to catch.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220627193701.31074-1-paul@crapouillou.net

wifi: brcmfmac: use strreplace() in brcmf_of_probe()

The for loop in brcmf_of_probe() would ideally end with something like
"i <= strlen(board_type)" instead of "i < board_type[i]". But
fortunately, the two are equivalent.

Anyway, it's simpler to use strreplace() instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/YqrhsKcjEA7B2pC4@kili

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: PPPoE offload support

Marcin Szycik says:

Add support for dissecting PPPoE and PPP-specific fields in flow dissector:
PPPoE session id and PPP protocol type. Add support for those fields in
tc-flower and support offloading PPPoE. Finally, add support for hardware
offload of PPPoE packets in switchdev mode in ice driver.

Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
    1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to use the new fields (will be submitted
soon).

ICE COMMS DDP package is required to create a filter in ice.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add support for PPPoE hardware offload
  flow_offload: Introduce flow_match_pppoe
  net/sched: flower: Add PPPoE filter
  flow_dissector: Add PPPoE dissectors
====================

Link: https://lore.kernel.org/r/20220726203133.2171332-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge patch series "can: add ethtool support and reporting of timestamping capabilities"

Vincent Mailhol <mailhol.vincent@wanadoo.fr> says:

====================

This series revolves around ethtool and timestamping. Its ultimate
goal is that the timestamping implementation within socketCAN meets
the specification of other network drivers in the kernel. This way,
tcpdump or other tools derived from libpcap can be used to do
timestamping on CAN devices.

* Example on a device with hardware timestamp support *

Before this series:
| # tcpdump -j adapter_unsynced -i can0
| tcpdump: WARNING: When trying to set timestamp type
| 'adapter_unsynced' on can0: That type of time stamp is not supported
| by that device

After applying this series, the warning disappears and tcpdump can be
used to get RX hardware timestamps.

This series is articulated in three major parts.

* Part 1: Add TX software timestamps and report the software
  timestamping capabilities through ethtool.

All the drivers using can_put_echo_skb() already support TX software
timestamps. However, the five drivers not using this function (namely
can327, janz-ican3, slcan, vcan and vxcan) lack such support. Patch 1
to 4 adds this support.  Finally, patch 5 advertises the timesamping
capabilities of all drivers which do not support hardware timestamps.

* Part 2: add TX hardware timestapms

This part is a single patch. In SocketCAN TX hardware is equal to the
RX hardware timestamps of the corresponding loopback frame. Reuse the
TX hardware timestamp to populate the RX hardware timestamp. While the
need of this feature can be debatable, we implement it here so that
generic timestamping tools which are agnostic of the specificity of
SocketCAN can still obtain the value. For example, tcpdump expects for
both TX and RX hardware timestamps to be supported in order to do:
| # tcpdump -j adapter_unsynced -i canX

* Part 3: report the hardware timestamping capabilities and implement
  the hardware timestamps ioctls.

The kernel documentation specifies in [1] that, for the drivers which
support hardware timestamping, SIOCSHWTSTAMP ioctl must be supported
and that SIOCGHWTSTAMP ioctl should be supported. Currently, none of
the CAN drivers do so. This is a gap.

Furthermore, even if not specified, the tools based on libpcap
(e.g. tcpdump) also expect ethtool_ops::get_ts_info to be implemented.

This last part first adds some generic implementation of
net_device_ops::ndo_eth_ioctl and ethtool_ops::get_ts_info which can
be used by the drivers with hardware timestamping capabilities.

It then uses those generic functions to add ioctl and reporting
functionalities to the drivers with hardware timestamping support
(namely: mcp251xfd, etas_es58x, kvaser_{pciefd,usb}, peak_{canfd,usb})

[1] Kernel doc: Timestamping, section 3.1 "Hardware Timestamping
    Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
* Testing *

I also developed a tool to test all the different timestamps. For
those who would also like to test it, please have a look at:
https://lore.kernel.org/linux-can/20220725134345.432367-1-mailhol.vincent@wanadoo.fr/T/

* Changelog *

changes since v3: https://lore.kernel.org/all/20220726102454.95096-1-mailhol.vincent@wanadoo.fr

  * The peak drivers (both PCI and USB) do not support hardware TX
    timestamps (only RX). Implement specific ioctl and ethtool
    callback functions for this device.

changes since v2: https://lore.kernel.org/all/20220725155354.482986-1-mailhol.vincent@wanadoo.fr

  * The c_can, flexcan, mcp251xfd and the slcan drivers already
    declared a struct ethtool_ops. Do not declare again the same
    structure and instead populate the .get_ts_info() field of the
    existing structures.

changes since v1: https://lore.kernel.org/all/20220725133208.432176-1-mailhol.vincent@wanadoo.fr

  * First series had a patch to implement
    ethtool_ops::get_drvinfo. This proved to be useless. This patch
    was removed and all the clean-up patches made in preparation of
    that one were moved to a separate series:

    https://lore.kernel.org/linux-can/20220725153124.467061-1-mailhol.vincent@wanadoo.fr/T/#u

====================

Link: https://lore.kernel.org/all/20220727101641.198847-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: advertise timestamping capabilities and add ioctl support

Currently, userland has no method to query which timestamping features
are supported by the peak_usb driver (aside maybe of getting RX
messages and observe whether or not hardware timestamps stay at zero).

The canonical way to add hardware timestamp support is to implement
ethtool_ops::get_ts_info() in order to advertise the timestamping
capabilities and to implement net_device_ops::ndo_eth_ioctl() as
requested in [1]. Currently, the driver only supports hardware RX
timestamps [2] but not hardware TX. For this reason, the generic
function can_ethtool_op_get_ts_info_hwts() and can_eth_ioctl_hwts()
can not be reused and instead this patch adds pcan_get_ts_info() and
peak_eth_ioctl().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
[2] https://lore.kernel.org/linux-can/20220727080634.l6uttnbrmwbabh3o@pengutronix.de/

CC: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-15-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_canfd: advertise timestamping capabilities and add ioctl support

Currently, userland has no method to query which timestamping features
are supported by the peak_canfd driver (aside maybe of getting RX
messages and observe whether or not hardware timestamps stay at zero).

The canonical way to add hardware timestamp support is to implement
ethtool_ops::get_ts_info() in order to advertise the timestamping
capabilities and to implement net_device_ops::ndo_eth_ioctl() as
requested in [1]. Currently, the driver only supports hardware RX
timestamps [2] but not hardware TX. For this reason, the generic
function can_ethtool_op_get_ts_info_hwts() and can_eth_ioctl_hwts()
can not be reused and instead this patch adds peak_get_ts_info() and
peak_eth_ioctl().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
[2] https://lore.kernel.org/linux-can/20220727084257.brcbbf7lksoeekbr@pengutronix.de/

CC: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-14-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: advertise timestamping capabilities and add ioctl support

Currently, userland has no method to query which timestamping features
are supported by the kvaser_usb driver (aside maybe of getting RX
messages and observe whether or not hardware timestamps stay at zero).

The canonical way for a network driver to advertise what kind of
timestamping it supports is to implement
ethtool_ops::get_ts_info(). Here, we use the CAN specific
can_ethtool_op_get_ts_info_hwts() function to achieve this.

In addition, the driver currently does not support the hardware
timestamps ioctls. According to [1], SIOCSHWTSTAMP is "must" and
SIOCGHWTSTAMP is "should". This patch fills up that gap by
implementing net_device_ops::ndo_eth_ioctl() using the CAN specific
function can_eth_ioctl_hwts().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-13-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: advertise timestamping capabilities and add ioctl support

Currently, userland has no method to query which timestamping features
are supported by the kvaser_pciefd driver (aside maybe of getting RX
messages and observe whether or not hardware timestamps stay at zero).

The canonical way for a network driver to advertise what kind of
timestamping it supports is to implement
ethtool_ops::get_ts_info(). Here, we use the CAN specific
can_ethtool_op_get_ts_info_hwts() function to achieve this.

In addition, the driver currently does not support the hardware
timestamps ioctls. According to [1], SIOCSHWTSTAMP is "must" and
SIOCGHWTSTAMP is "should". This patch fills up that gap by
implementing net_device_ops::ndo_eth_ioctl() using the CAN specific
function can_eth_ioctl_hwts().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
CC: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-12-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: advertise timestamping capabilities and add ioctl support

Currently, userland has no method to query which timestamping features
are supported by the etas_es58x driver (aside maybe of getting RX
messages and observe whether or not hardware timestamps stay at zero).

The canonical way for a network driver to advertise what kind of
timestamping is supports is to implement
ethtool_ops::get_ts_info(). Here, we use the CAN specific
can_ethtool_op_get_ts_info_hwts() function to achieve this.

In addition, the driver currently does not support the hardware
timestamps ioctls. According to [1], SIOCSHWTSTAMP is "must" and
SIOCGHWTSTAMP is "should". This patch fills up that gap by
implementing net_device_ops::ndo_eth_ioctl() using the CAN specific
function can_eth_ioctl_hwts().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-11-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: advertise timestamping capabilities and add ioctl support

Currently, userland has no methods to query which timestamping
features are supported by the mcp251xfd driver (aside maybe of getting
RX messages and observe whether or not hardware timestamps stay at
zero).

The canonical way for a network driver to advertise what kind of
timestamping it supports is to implement
ethtool_ops::get_ts_info(). Here, we use the CAN specific
can_ethtool_op_get_ts_info_hwts() function to achieve this.

In addition, the driver currently does not support the hardware
timestamps ioctls. According to [1], SIOCSHWTSTAMP is "must" and
SIOCGHWTSTAMP is "should". This patch fills up that gap by
implementing net_device_ops::ndo_eth_ioctl() using the CAN specific
function can_eth_ioctl_hwts().

[1] kernel doc Timestamping, section 3.1: "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-10-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: add generic function can_eth_ioctl_hwts()

Tools based on libpcap (such as tcpdump) expect the SIOCSHWTSTAMP
ioctl call to be supported. This is also specified in the kernel doc
[1]. The purpose of this ioctl is to toggle the hardware timestamps.

Currently, CAN devices which support hardware timestamping have those
always activated. can_eth_ioctl_hwts() is a dumb function that will
always succeed when requested to set tx_type to HWTSTAMP_TX_ON or
rx_filter to HWTSTAMP_FILTER_ALL.

[1] Kernel doc: Timestamping, section 3.1 "Hardware Timestamping
Implementation: Device Drivers"
Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-9-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: add generic function can_ethtool_op_get_ts_info_hwts()

Add function can_ethtool_op_get_ts_info_hwts(). This function will be
used by CAN devices with hardware TX/RX timestamping support to
implement ethtool_ops::get_ts_info. This function does not offer
support to activate/deactivate hardware timestamps at device level nor
support the filter options (which is currently the case for all CAN
devices with hardware timestamping support).

The fact that hardware timestamp can not be deactivated at hardware
level does not impact the userland. As long as the user do not set
SO_TIMESTAMPING using a setsockopt() or ioctl(), the kernel will not
emit TX timestamps (RX timestamps will still be reproted as it is the
case currently).

Drivers which need more fine grained control remains free to implement
their own function, but we foresee that the generic function
introduced here will be sufficient for the majority.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-8-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: dev: add hardware TX timestamp

Because of the loopback feature of socket CAN, hardware TX timestamps
are nothing else than the hardware RX timespamp of the corresponding
loopback packet. This patch simply reuses the hardware RX timestamp.

The rationale to clone this timestamp value is that existing tools
which rely of libpcap (such as tcpdump) expect support for both TX and
RX hardware timestamps in order to activate the feature (i.e. no
granular control to activate either of TX or RX hardware timestamps).

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-7-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: tree-wide: advertise software timestamping capabilities

Currently, some CAN drivers support hardware timestamping, some do
not. But userland has no method to query which features are supported
(aside maybe of getting RX messages and observe whether or not
hardware timestamps stay at zero).

The canonical way for a network driver to advertised what kind of
timestamping it supports is to implement ethtool_ops::get_ts_info().

This patch only targets the CAN drivers which *do not* support
hardware timestamping. For each of those CAN drivers, implement the
get_ts_info() using the generic ethtool_op_get_ts_info().

This way, userland can do:

| $ ethtool --show-time-stamping canX

to confirm the device timestamping capacities.

N.B. the drivers which support hardware timestamping will be migrated
in separate patches.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-6-mailhol.vincent@wanadoo.fr
[mkl: mscan: add missing mscan_ethtool_ops]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

add missing includes and forward declarations to networking includes under linux/

Similarly to a recent include/net/ cleanup, this patch adds
missing includes to networking headers under include/linux.
All these problems are currently masked by the existing users
including the missing dependency before the broken header.

Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "Merge branch 'octeontx2-minor-tc-fixes'"

This reverts commit 35d099da41967f114c6472b838e12014706c26e7, reversing
changes made to 58d8bcd47ecc55f1ab92320fe36c31ff4d83cc0c.

I wrongly applied that to the net-next tree instead of the intended
target tree (net). Reverting it on net-next.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

can: v(x)can: add software tx timestamps

TX timestamps were added to the can_put_echo_skb() function of can_dev
modules in [1]. However, vcan and vxcan do not rely on that function
and as such do not offer TX timestamping.

While it could be arguable whether TX timestamps are really needed for
virtual interfaces, we prefer to still add it so that all CAN drivers,
without exception, support the software TX timestamps.

Add a call to skb_tx_timestamp() in the vcan_tx() and vxcan_xmit()
functions so that the modules now support TX software timestamps.

[1] commit 741b91f1b0ea ("can: dev: can_put_echo_skb(): add software
tx timestamps")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=741b91f1b0ea34f00f6a7d4539b767c409291fcf
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-5-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: add software tx timestamps

TX timestamps were added to the can_put_echo_skb() function of can_dev
modules in [1]. However, slcan does not rely on that function and as
such does not offer TX timestamping.

Add a call to skb_tx_timestamp() in the slc_xmit() function so that
the module now supports TX software timestamps.

[1] commit 741b91f1b0ea ("can: dev: can_put_echo_skb(): add software
tx timestamps")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=741b91f1b0ea34f00f6a7d4539b767c409291fcf
CC: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-4-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: janz-ican3: add software tx timestamp

TX timestamps were added to the can_put_echo_skb() function of can_dev
modules in [1]. However, janz-ican3 does not rely on that function but
instead implements its own echo_skb logic. As such it does not offer
TX timestamping.

Add a call to skb_tx_timestamp() in the ican3_put_echo_skb() function
so that the module now supports TX software timestamps.

[1] commit 741b91f1b0ea ("can: dev: can_put_echo_skb(): add software
tx timestamps")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=741b91f1b0ea34f00f6a7d4539b767c409291fcf
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: can327: add software tx timestamps

TX timestamps were added to the can_put_echo_skb() function of can_dev
modules in [1]. However, can327 does not rely on that function and as
such does not offer TX timestamping.

Add a call to skb_tx_timestamp() in the can327_netdev_start_xmit()
function so that the module now supports TX software timestamps.

[1] commit 741b91f1b0ea ("can: dev: can_put_echo_skb(): add software
tx timestamps")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=741b91f1b0ea34f00f6a7d4539b767c409291fcf
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20220727101641.198847-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: slcan: extend supported features (step 2)"

Dario Binacchi <dario.binacchi@amarulasolutions.com> says:

====================

With this series I try to finish the task, started with the series [1],
of completely removing the dependency of the slcan driver from the
userspace slcand/slcan_attach applications.

The series also contains patches that remove the legacy stuff (slcan_devs,
SLCAN_MAGIC, ...) and do some module cleanup.

The series has been created on top of the patches:

can: slcan: convert comments to network style comments
can: slcan: slcan_init() convert printk(LEVEL ...) to pr_level()
can: slcan: fix whitespace issues
can: slcan: convert comparison to NULL into !val
can: slcan: clean up if/else
can: slcan: use scnprintf() as a hardening measure
can: slcan: do not report txerr and rxerr during bus-off
can: slcan: do not sleep with a spin lock held

applied to linux-next.

[1] https://lore.kernel.org/all/20220628163137.413025-1-dario.binacchi@amarulasolutions.com

Changes since v3: https://lore.kernel.org/all/20220726210217.3368497-1-dario.binacchi@amarulasolutions.com
- Add Max Staudt's `Reviewed-by' tag.
- Drop the patch "ethtool: add support to get/set CAN bit time register".
- Drop the patch "can: slcan: add support to set bit time register (btr)".
- Remove the RFC prefix from the series.

Changes since v2: https://lore.kernel.org/all/20220725065419.3005015-1-dario.binacchi@amarulasolutions.com
- Update the commit message.
- Use 1 space in front of the =.
- Put the series as RFC again.
- Pick up the patch "can: slcan: use KBUILD_MODNAME and define pr_fmt to replace hardcoded names".
- Add the patch "ethtool: add support to get/set CAN bit time register"
  to the series.
- Add the patch "can: slcan: add support to set bit time register (btr)"
  to the series.
- Replace the link https://marc.info/?l=linux-can&m=165806705927851&w=2 with
  https://lore.kernel.org/all/507b5973-d673-4755-3b64-b41cb9a13b6f@hartkopp.net.
- Add the `Suggested-by' tag.

Changes since RFC: https://lore.kernel.org/all/20220716170007.2020037-1-dario.binacchi@amarulasolutions.com
- Re-add headers that export at least one symbol used by the module.
- Update the commit description.
- Drop the old "slcan" name to use the standard canX interface naming.
- Remove comment on listen-only command.
- Update the commit subject and description.
- Add the patch "MAINTAINERS: Add myself as maintainer of the SLCAN driver"
  to the series.

====================

mkl: rebased to can-next/master

Link: https://lore.kernel.org/all/20220728070254.267974-1-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

MAINTAINERS: Add maintainer for the slcan driver

At the suggestion of its author Oliver Hartkopp ([1]), I take over the
maintainer-ship and add myself to the authors of the driver.

[1] https://lore.kernel.org/all/507b5973-d673-4755-3b64-b41cb9a13b6f@hartkopp.net

Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/all/20220728070254.267974-8-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: add support for listen-only mode

For non-legacy, i.e. ip based configuration, add support for listen-only
mode. If listen-only is requested send a listen-only ("L\r") command
instead of an open ("O\r") command to the adapter.

Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/all/20220728070254.267974-7-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: use the generic can_change_mtu()

It is useless to define a custom function that does nothing but always
return the same error code. Better to use the generic can_change_mtu()
function.

Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/all/20220728070254.267974-6-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: change every `slc' occurrence in `slcan'

In the driver there are parts of code where the prefix `slc' is used and
others where the prefix `slcan' is used instead. The patch replaces
every occurrence of `slc' with `slcan', except for the netdev functions
where, to avoid compilation conflicts, it was necessary to replace `slc'
with `slcan_netdev'.

The patch does not make any functional changes.

Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/all/20220728070254.267974-5-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: remove legacy infrastructure

Taking inspiration from the drivers/net/can/can327.c driver and at the
suggestion of its author Max Staudt, I removed legacy stuff like
`SLCAN_MAGIC' and `slcan_devs' resulting in simplification of the code
and its maintainability.

The use of slcan_devs is derived from a very old kernel, since slip.c
is about 30 years old, so today's kernel allows us to remove it.

The .hangup() ldisc function, which only called the ldisc .close(), has
been removed since the ldisc layer calls .close() in a good place
anyway.

The old slcanX name has been dropped in order to use the standard canX
interface naming. The ioctl SIOCGIFNAME can be used to query the name of
the created interface. Furthermore, there are several ways to get stable
interfaces names in user space, e.g. udev or systemd-networkd.

The `maxdev' module parameter has also been removed.

CC: Max Staudt <max@enpas.org>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Reviewed-by: Max Staudt <max@enpas.org>
Link: https://lore.kernel.org/all/20220728070254.267974-4-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: slcan: remove useless header inclusions

Include only the necessary headers.

Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/all/20220728070254.267974-3-dario.binacchi@amarulasolutions.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>