Git Repo - linux.git/log

net/mlx5: Add devlink subfunction port documentation

Add documentation for subfunction management using devlink
port.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Extend devlink port documentation for subfunctions

Add devlink port documentation for subfunction management.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Add devlink port documentation

Added documentation for devlink port and port function related commands.

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: SF, Port function state change support

Support changing the state of the SF port's function through devlink.
When activating the SF port's function, enable the hca in the device
followed by adding its auxiliary device.
When deactivating the SF port's function, delete its auxiliary device
followed by disabling the vHCA.

Port function attributes get/set callbacks are invoked with devlink
instance lock held. Such callbacks need to synchronize with sf port
table getting disabled either via sriov sysfs callback. Such callbacks
synchronize with table disable context holding table refcount.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

On port function activation, an auxiliary device is created in below
example.

$ devlink dev show
devlink dev show auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: SF, Add port add delete functionality

To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.

Initialize sf port table on such eswitch event.

Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: E-switch, Add eswitch helpers for SF vport

Add helpers to enable/disable eswitch port, register its devlink port and
load its representor.

Signed-off-by: Vu Pham <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: E-switch, Prepare eswitch to handle SF vport

Prepare eswitch to handle SF vport during
(a) querying eswitch functions
(b) egress ACL creation
(c) account for SF vports in total vports calculation

Assign a dedicated placeholder for SFs vports and their representors.
They are placed after VFs vports and before ECPF vports as below:
[PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK].

Change functions to map SF's vport numbers to indices when
accessing the vports or representors arrays, and vice versa.

Signed-off-by: Vu Pham <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: SF, Add auxiliary device driver

Add auxiliary device driver for mlx5 subfunction auxiliary device.

A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.

As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: SF, Add auxiliary device support

Introduce API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in the PCI BAR 2.

SF device is similar to PCI PF and VF that supports multiple class of
devices such as net, rdma and vdpa.

SF device will be added or removed in subsequent patch during SF
devlink port function state change command.

A subfunction device exposes user supplied subfunction number which will
be further used by systemd/udev to have deterministic name for its
netdevice and rdma device.

An mlx5 subfunction auxiliary device example:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active

On activation,

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Introduce vhca state event notifier

vhca state events indicates change in the state of the vhca that may
occur due to a SF allocation, deallocation or enabling/disabling the
SF HCA.

Introduce vhca state event handler which will be used by SF devlink
port manager and SF hardware id allocator in subsequent patches
to act on the event.

This enables single entity to subscribe, query and rearm the event
for a function.

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Support get and set state of port function

devlink port function can be in active or inactive state.
Allow users to get and set port function's state.

When the port function it activated, its operational state may change
after a while when the device is created and driver binds to it.
Similarly on deactivation flow.

To clearly describe the state of the port function and its device's
operational state in the host system, define state and opstate
attributes.

Example of a PCI SF port which supports a port function:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Support add and delete devlink port

Extended devlink interface for the user to add and delete a port.
Extend devlink to connect user requests to driver to add/delete
a port in the device.

Driver routines are invoked without holding devlink instance lock.
This enables driver to perform several devlink objects registration,
unregistration such as (port, health reporter, resource etc) by using
existing devlink APIs.
This also helps to uniformly use the code for port unregistration
during driver unload and during port deletion initiated by user.

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp6s0f0npf0sf88
ID_NET_NAME_SLOT=ens2f0npf0sf88
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Introduce PCI SF port flavour and port attribute

A PCI sub-function (SF) represents a portion of the device similar
to PCI VF.

In an eswitch, PCI SF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with SF,
and its representor netdevice, introduce a PCI SF port flavour.

When devlink port flavour is PCI SF, fill up PCI SF attributes of the
port.

Extend port name creation using PCI PF and SF number scheme on best
effort basis, so that vendor drivers can skip defining their own
scheme.
This is done as cApfNSfM, where A, N and M are controller, PCI PF and
PCI SF number respectively.
This is similar to existing naming for PCI PF and PCI VF ports.

An example view of a PCI SF port:

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state active opstate attached

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

devlink: Prepare code to fill multiple port function attributes

Prepare code to fill zero or more port function optional attributes.
Subsequent patch makes use of this to fill more port function
attributes.

Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

bonding: add a vlan+srcmac tx hashing option

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interest switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has it's own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimetarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: Thomas Davis <[email protected]>
Signed-off-by: Jarod Wilson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fix GSO for SG-enabled devices

The commit dbd50f238dec ("net: move the hsize check to the else
block in skb_segment") introduced a data corruption for devices
supporting scatter-gather.

The problem boils down to signed/unsigned comparison given
unexpected results: if signed 'hsize' is negative, it will be
considered greater than a positive 'len', which is unsigned.

This commit addresses resorting to the old checks order, so that
'hsize' never has a negative value when compared with 'len'.

v1 -> v2:
- reorder hsize checks instead of explicit cast (Alex)

Bisected-by: Matthieu Baerts <[email protected]>
Fixes: dbd50f238dec ("net: move the hsize check to the else block in skb_segment")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/861947c2d2d087db82af93c21920ce8147d15490.1611074818.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

octeontx2-af: Remove unneeded semicolons

fix semicolon.cocci warnings:
./drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c:272:2-3: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1788:3-4: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c:1809:3-4: Unneeded semicolon
drivers/net/ethernet/marvell/octeontx2/af/rvu.c:1326:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: smsc911x: Make Runtime PM handling more fine-grained

Currently the smsc911x driver has mininal power management: during
driver probe, the device is powered up, and during driver remove, it is
powered down.

Improve power management by making it more fine-grained:
  1. Power the device down when driver probe is finished,
  2. Power the device (down) when it is opened (closed),
  3. Make sure the device is powered during PHY access.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: forwarding: Fix spelling mistake "succeded" -> "succeeded"

There are two spelling mistakes in check_fail messages. Fix them.

Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: tun: fix misspellings using codespell tool

Some typos are found out by codespell tool:

$ codespell -w -i 3 ./drivers/net/tun.c
aovid ==> avoid

Fix typos found by codespell.

Signed-off-by: Menglong Dong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

taprio: boolean values to a bool variable

Fix the following coccicheck warnings:

./net/sched/sch_taprio.c:393:3-16: WARNING: Assignment of 0/1 to bool
variable.

./net/sched/sch_taprio.c:375:2-15: WARNING: Assignment of 0/1 to bool
variable.

./net/sched/sch_taprio.c:244:4-19: WARNING: Assignment of 0/1 to bool
variable.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Zhong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-ethernet-ti-am65-cpsw-nuss-introduce-support-for-am64x-cpsw3g'

Grygorii Strashko says:

====================
net: ethernet: ti: am65-cpsw-nuss: introduce support for am64x cpsw3g

This series introduces basic support for recently introduced TI K3 AM642x SoC [1]
which contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated
in MAIN domain and can be configured in multi port or switch modes.
In this series only multi port mode is enabled. The initial version of switchdev
support was introduced by Vignesh Raghavendra [2] and work is in progress.

The overall functionality and DT bindings are similar to other K3 CPSWxg
versions, so DT binding changes are minimal and driver is mostly re-used for
TI K3 AM642x CPSW3g.

The main difference is that TI K3 AM642x SoC is not fully DMA coherent and
instead DMA coherency is supported per DMA channel.

Patches 1-2 - DT bindings update
Patches 3-4 - Update driver to support changed DMA coherency model
Patches 5-6 - adds TI K3 AM642x SoC platform data and so enable CPSW3g

[1] https://www.ti.com/lit/pdf/spruim2
[2] https://patchwork.ozlabs.org/project/netdev/cover/20201130082046 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g

The TI AM64x SoCs Gigabit Ethernet Switch subsystem (CPSW3g NUSS) has three
ports (2 ext. ports) and provides Ethernet packet communication for the
device and can be configured in multi port mode or as an Ethernet switch.

This patch adds support for the corresponding CPSW3g version.

Signed-off-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ti: cpsw_ale: add driver data for AM64 CPSW3g

The AM642x CPSW3g is similar to j721e-cpswxg except its ALE table size is
512 entries. Add entry for the same.

Signed-off-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpsw-nuss: Support for transparent ASEL handling

Use the glue layer's functions to convert the dma_addr_t to and from CPPI5
address (with the ASEL bits), which should be used within the descriptors
and data buffers.

- Per channel coherency support
The DMAs use the 'ASEL' bits to select data and configuration fetch path.
The ASEL bits are placed at the unused parts of any address field used by
the DMAs (pointers to descriptors, addresses in descriptors, ring base
addresses). The ASEL is not part of the address (the DMAs can address
48bits). Individual channels can be configured to be coherent (via ACP
port) or non coherent individually by configuring the ASEL to appropriate
value.

[1] https://lore.kernel.org/patchwork/cover/1350756/
Signed-off-by: Peter Ujfalusi <[email protected]>
Co-developed-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpsw-nuss: Use DMA device for DMA API

For DMA API the DMA device should be used as cpsw does not accesses to
descriptors or data buffers in any ways. The DMA does.

Also, drop dma_coerce_mask_and_coherent() setting on CPSW device, as it
should be done by DMA driver which does data movement.

This is required for adding AM64x CPSW3g support where DMA coherency
supported per DMA channel.

Signed-off-by: Peter Ujfalusi <[email protected]>
Co-developed-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Vignesh Raghavendra <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-binding: net: ti: k3-am654-cpsw-nuss: update bindings for am64x cpsw3g

Update DT binding for recently introduced TI K3 AM642x SoC [1] which
contains 3 port (2 external ports) CPSW3g module. The CPSW3g integrated
in MAIN domain and can be configured in multi port or switch modes.

The overall functionality and DT bindings are similar to other K3 CPSWxg
versions, so DT binding changes are minimal:
- reword description
- add new compatible 'ti,am642-cpsw-nuss'
- allow 2 external ports child nodes
- add missed 'assigned-clock' props

[1] https://www.ti.com/lit/pdf/spruim2
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-binding: ti: am65x-cpts: add assigned-clock and power-domains props

The CPTS clock is usually a clk-mux which allows to select CPTS reference
clock by using 'assigned-clock-parents', 'assigned-clocks' DT properties.
Also depending on integration the power-domains has to be specified to
enable CPTS IP.

Hence add 'assigned-clock-parents', 'assigned-clocks' and 'power-domains'
properties to the CPTS DT bindings to avoid dtbs_check warnings:
cpts@310d0000: 'assigned-clock-parents', 'assigned-clocks' do not match any of the regexes: 'pinctrl-[0-9]+'
cpts@310d0000: 'power-domains' does not match any of the regexes: 'pinctrl-[0-9]+'

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers'

Xin Long says:

====================
net: support SCTP CRC csum offload for tunneling packets in some drivers

This patchset introduces inline function skb_csum_is_sctp(), and uses it
to validate it's a sctp CRC csum offload packet, to make SCTP CRC csum
offload for tunneling packets supported in some HW drivers.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ixgbevf: use skb_csum_is_sctp instead of protocol check

Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbevf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ixgbe: use skb_csum_is_sctp instead of protocol check

Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbe support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: igc: use skb_csum_is_sctp instead of protocol check

Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igc support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: igbvf: use skb_csum_is_sctp instead of protocol check

Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igbvf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: igb: use skb_csum_is_sctp instead of protocol check

Using skb_csum_is_sctp is a easier way to validate it's a SCTP
CRC checksum offload packet, and there is no need to parse the
packet to check its proto field, especially when it's a UDP or
GRE encapped packet.

So this patch also makes igb support SCTP CRC checksum offload
for UDP and GRE encapped packets.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: add inline function skb_csum_is_sctp

This patch is to define a inline function skb_csum_is_sctp(), and
also replace all places where it checks if it's a SCTP CSUM skb.
This function would be used later in many networking drivers in
the following patches.

Suggested-by: Alexander Duyck <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mdio, phy: fix -Wshadow warnings triggered by nested container_of()

container_of() macro hides a local variable '__mptr' inside. This
becomes a problem when several container_of() are nested in each
other within single line or plain macros.
As C preprocessor doesn't support generating random variable names,
the sole solution is to avoid defining macros that consist only of
container_of() calls, or they will self-shadow '__mptr' each time:

In file included from ./include/linux/bitmap.h:10,
                 from drivers/net/phy/phy_device.c:12:
drivers/net/phy/phy_device.c: In function ‘phy_device_release’:
./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow]
  693 |  void *__mptr = (void *)(ptr);     \
      |        ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                          ^~~~~~~~~~~~
./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’
   52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
      |                           ^~~~~~~~~~~~
./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                                       ^~~~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
  217 |  kfree(to_phy_device(dev));
      |        ^~~~~~~~~~~~~
./include/linux/kernel.h:693:8: note: shadowed declaration is here
  693 |  void *__mptr = (void *)(ptr);     \
      |        ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
  647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
      |                          ^~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
  217 |  kfree(to_phy_device(dev));
      |        ^~~~~~~~~~~~~

As they are declared in header files, these warnings are highly
repetitive and very annoying (along with the one from linux/pci.h).

Convert the related macros from linux/{mdio,phy}.h to static inlines
to avoid self-shadowing and potentially improve bug-catching.
No functional changes implied.

Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vhost_net: avoid tx queue stuck when sendmsg fails

Currently the driver doesn't drop a packet which can't be sent by tun
(e.g bad packet). In this case, the driver will always process the
same packet lead to the tx queue stuck.

To fix this issue:
1. in the case of persistent failure (e.g bad packet), the driver
can skip this descriptor by ignoring the error.
2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
the driver schedules the worker to try again.

Signed-off-by: Yunjian Wang <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: hns: fix variable used when DEBUG is defined

When DEBUG is defined this error occurs

drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error:
  ‘struct net_device’ has no member named ‘ae_handle’;
  did you mean ‘rx_handler’?
  assert(skb->queue_mapping < ndev->ae_handle->q_num);
                                    ^~~~~~~~~

ae_handle is an element of struct hns_nic_priv, so change
ndev to priv.

Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

arcnet: fix macro name when DEBUG is defined

When DEBUG is defined this error occurs

drivers/net/arcnet/com20020_cs.c:70:15: error: ‘com20020_REG_W_ADDR_HI’
  undeclared (first use in this function);
  did you mean ‘COM20020_REG_W_ADDR_HI’?
       ioaddr, com20020_REG_W_ADDR_HI);
               ^~~~~~~~~~~~~~~~~~~~~~

From reviewing the context, the suggestion is what is meant.

Signed-off-by: Tom Rix <[email protected]>
Acked-by: Joe Perches <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'tls-device-offload-for-bond'

Tariq Toukan says:

====================
TLS device offload for Bond

This series opens TX and RX TLS device offload for bond interfaces.
This allows bond interfaces to benefit from capable lower devices.

We add a new ndo_sk_get_lower_dev() to be used to get the lower dev that
corresponds to a given socket.
The TLS module uses it to interact directly with the lowest device in
chain, and invoke the control operations in tlsdev_ops. This means that the
bond interface doesn't have his own struct tlsdev_ops instance and
derived logic/callbacks.

To keep simple track of the HW and SW TLS contexts, we bind each socket to
a specific lower device for the socket's whole lifetime. This is logically
valid (and similar to the SW kTLS behavior) in the following bond configuration,
so we restrict the offload support to it:

((mode == balance-xor) or (mode == 802.3ad))
and xmit_hash_policy == layer3+4.

In this design, TLS TX/RX offload feature flags of the bond device are
independent from the lower devices. They reflect the current features state,
but are not directly controllable.
This is because the bond driver is bypassed by the call to
ndo_sk_get_lower_dev(), without him knowing who the caller is.
The bond TLS feature flags are set/cleared only according to the configuration
of the mode and xmit_hash_policy.

Bypass is true only for the control flow. Packets in fast path still go through
the bond logic.

The design here differs from the xfrm/ipsec offload, where the bond driver
has his own copy of struct xfrmdev_ops and callbacks.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/tls: Except bond interface from some TLS checks

In the tls_dev_event handler, ignore tlsdev_ops requirement for bond
interfaces, they do not exist as the interaction is done directly with
the lower device.

Also, make the validate function pass when it's called with the upper
bond interface.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/tls: Device offload to use lowest netdevice in chain

Do not call the tls_dev_ops of upper devices. Instead, ask them
for the proper lowest device and communicate with it directly.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/bonding: Declare TLS RX device offload support

Following the description in previous patch (for TX):
As the bond interface is being bypassed by the TLS module, interacting
directly against the lower devs, there is no way for the bond interface
to disable its device offload capabilities, as long as the mode/policy
config allows it.
Hence, the feature flag is not directly controllable, but just reflects
the offload status based on the logic under bond_sk_check().

Here we just declare RX device offload support, and expose it via the
NETIF_F_HW_TLS_RX flag.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/bonding: Implement TLS TX device offload

Implement TLS TX device offload for bonding interfaces.
This allows kTLS sockets running on a bond to benefit from the
device offload on capable lower devices.

To allow a simple and fast maintenance of the TLS context in SW and
lower devices, we bind the TLS socket to a specific lower dev.
To achieve a behavior similar to SW kTLS, we support only balance-xor
and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced
in bond_sk_check(), done in a previous patch.

For the above configuration, the SW implementation keeps picking the
same exact lower dev for all the socket's SKBs. The device offload
behaves similarly, making the decision once at the connection creation.

Per socket, the TLS module should work directly with the lowest netdev
in chain, to call the tls_dev_ops operations.

As the bond interface is being bypassed by the TLS module, interacting
directly against the lower devs, there is no way for the bond interface
to disable its device offload capabilities, as long as the mode/policy
config allows it.
Hence, the feature flag is not directly controllable, but just reflects
the current offload status based on the logic under bond_sk_check().

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/bonding: Take update_features call out of XFRM funciton

In preparation for more cases that call netdev_update_features().

While here, move the features logic to the stage where struct bond
is already updated, and pass it as the only parameter to function
bond_set_xfrm_features().

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/bonding: Implement ndo_sk_get_lower_dev

Add ndo_sk_get_lower_dev() implementation for bond interfaces.

Support only for the cases where the socket's and SKBs' hash
yields identical value for the whole connection lifetime.

Here we restrict it to L3+4 sockets only, with
xmit_hash_policy==LAYER34 and bond modes xor/802.3ad.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/bonding: Take IP hash logic into a helper

Hash logic on L3 will be used in a downstream patch for one more use
case.
Take it to a function for a better code reuse.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: netdevice: Add operation ndo_sk_get_lower_dev

ndo_sk_get_lower_dev returns the lower netdev that corresponds to
a given socket.
Additionally, we implement a helper netdev_sk_get_lowest_dev() to get
the lowest one in chain.

Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net/qla3xxx: switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'ql_alloc_net_req_rsp_queues()' GFP_KERNEL can
be used because it is only called from 'ql_alloc_mem_resources()' which
already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see below)

When memory is allocated in 'ql_alloc_buffer_queues()' GFP_KERNEL can be
used because this flag is already used just a few line above.

When memory is allocated in 'ql_alloc_small_buffers()' GFP_KERNEL can
be used because it is only called from 'ql_alloc_mem_resources()' which
already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above)

When memory is allocated in 'ql_alloc_mem_resources()' GFP_KERNEL can be
used because this function already calls 'ql_alloc_buffer_queues()' which
uses GFP_KERNEL. (see above)

While at it, use 'dma_set_mask_and_coherent()' instead of 'dma_set_mask()/
dma_set_coherent_mask()' in order to slightly simplify code.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net_sched: fix RTNL deadlock again caused by request_module()

tcf_action_init_1() loads tc action modules automatically with
request_module() after parsing the tc action names, and it drops RTNL
lock and re-holds it before and after request_module(). This causes a
lot of troubles, as discovered by syzbot, because we can be in the
middle of batch initializations when we create an array of tc actions.

One of the problem is deadlock:

CPU 0 CPU 1
rtnl_lock();
for (...) {
  tcf_action_init_1();
    -> rtnl_unlock();
    -> request_module();
rtnl_lock();
for (...) {
  tcf_action_init_1();
    -> tcf_idr_check_alloc();
   // Insert one action into idr,
   // but it is not committed until
   // tcf_idr_insert_many(), then drop
   // the RTNL lock in the _next_
   // iteration
   -> rtnl_unlock();
    -> rtnl_lock();
    -> a_o->init();
      -> tcf_idr_check_alloc();
      // Now waiting for the same index
      // to be committed
    -> request_module();
    -> rtnl_lock()
    // Now waiting for RTNL lock
}
rtnl_unlock();
}
rtnl_unlock();

This is not easy to solve, we can move the request_module() before
this loop and pre-load all the modules we need for this netlink
message and then do the rest initializations. So the loop breaks down
to two now:

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                struct tc_action_ops *a_o;

                a_o = tc_action_load_ops(name, tb[i]...);
                ops[i - 1] = a_o;
        }

        for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
                act = tcf_action_init_1(ops[i - 1]...);
        }

Although this looks serious, it only has been reported by syzbot, so it
seems hard to trigger this by humans. And given the size of this patch,
I'd suggest to make it to net-next and not to backport to stable.

This patch has been tested by syzbot and tested with tdc.py by me.

Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Reported-and-tested-by: [email protected]
Reported-and-tested-by: [email protected]
Reported-by: [email protected]
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Tested-by: Jamal Hadi Salim <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: national: remove definition of DEBUG

Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-make-udp-tunnel-devices-support-fraglist'

Xin Long says:

====================
net: make udp tunnel devices support fraglist

Like GRE device, UDP tunnel devices should also support fraglist, so
that some protocol (like SCTP) HW GSO that requires NETIF_F_FRAGLIST
in the dev can work. Especially when the lower device support both
NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_SCTP.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bareudp: add NETIF_F_FRAGLIST flag for dev features

Like vxlan and geneve, bareudp also needs this dev feature
to support some protocol's HW GSO.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

geneve: add NETIF_F_FRAGLIST flag for dev features

Some protocol HW GSO requires fraglist supported by the device, like
SCTP. Without NETIF_F_FRAGLIST set in the dev features of geneve, it
would have to do SW GSO before the packets enter the driver, even
when the geneve dev and lower dev (like veth) both have the feature
of NETIF_F_GSO_SCTP.

So this patch is to add it for geneve.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

vxlan: add NETIF_F_FRAGLIST flag for dev features

Some protocol HW GSO requires fraglist supported by the device, like
SCTP. Without NETIF_F_FRAGLIST set in the dev features of vxlan, it
would have to do SW GSO before the packets enter the driver, even
when the vxlan dev and lower dev (like veth) both have the feature
of NETIF_F_GSO_SCTP.

So this patch is to add it for vxlan.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

hv_netvsc: Add (more) validation for untrusted Hyper-V values

For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest. Ensure that invalid values cannot cause indexing
off the end of an array, or subvert an existing validation via integer
overflow. Ensure that outgoing packets do not have any leftover guest
memory that has not been zeroed out.

Reported-by: Juan Vazquez <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: bridge: check vlan with eth_type_vlan() method

Replace some checks for ETH_P_8021Q and ETH_P_8021AD with
eth_type_vlan().

Signed-off-by: Menglong Dong <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-ipa-interconnect-improvements'

Alex Elder says:

====================
net: ipa: interconnect improvements

The main outcome of this series is to allow the number of
interconnects used by the IPA to differ from the three that
are implemented now. With this series in place, any number
of interconnects can now be used, all specified in the
configuration data for a specific platform.

A few minor interconnect-related cleanups are implemented as well.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: allow arbitrary number of interconnects

Currently we assume that the IPA hardware has exactly three
interconnects. But that won't be guaranteed for all platforms,
so allow any number of interconnects to be specified in the
configuration data.

For each platform, define an array of interconnect data entries
(still associated with the IPA clock structure), and record the
number of entries initialized in that array.

Loop over all entries in this array when initializing, enabling,
disabling, or tearing down the set of interconnects.

With this change we no longer need the ipa_interconnect_id
enumerated type, so get rid of it.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: clean up interconnect initialization

Pass an the address of an IPA interconnect structure and its
configuration data to ipa_interconnect_init_one() and have that
function initialize all the structure's fields. Change the function
to simply return an error code.

Introduce ipa_interconnect_exit_one() to encapsulate the cleanup of
an IPA interconnect structure.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: add interconnect name to configuration data

Add the name to the configuration data for each interconnect. Use
this information rather than a constant string during initialization.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: store average and peak interconnect bandwidth

Add fields in the ipa_interconnect structure to hold the average and
peak bandwidth values for the interconnect. Pass the configuring
data for interconnects to ipa_interconnect_init() so these values
can be recorded, and use them when enabling the interconnects.

There's no longer any need to keep a copy of the interconnect data
after initialization.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: introduce an IPA interconnect structure

Rather than having separate pointers for the memory, imem, and
config interconnect paths, maintain an array of ipa_interconnect
structures each of which contains a pointer to a path.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: don't return an error from ipa_interconnect_disable()

If disabling interconnects fails there's not a lot we can do. The
only two callers of ipa_interconnect_disable() ignore the return
value, so just give the function a void return type.

Print an error message if disabling any of the interconnects is not
successful. Return (and print) only the first error seen.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: rename interconnect settings

Use "bandwidth" rather than "rate" in describing the average and
peak values to use for IPA interconnects. They should have been
named that way to begin with.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-fix-the-features-flag-in-sctp_gso_segment'

Xin Long says:

====================
net: fix the features flag in sctp_gso_segment

Patch 1/2 is to improve the code in skb_segment(), and it is needed
by Patch 2/2.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sctp: remove the NETIF_F_SG flag before calling skb_segment

It makes more sense to clear NETIF_F_SG instead of set it when
calling skb_segment() in sctp_gso_segment(), since SCTP GSO is
using head_skb's fraglist, of which all frags are linear skbs.

This will make SCTP GSO code more understandable.

Suggested-by: Alexander Duyck <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: move the hsize check to the else block in skb_segment

After commit 89319d3801d1 ("net: Add frag_list support to skb_segment"),
it goes to process frag_list when !hsize in skb_segment(). However, when
using skb frag_list, sg normally should not be set. In this case, hsize
will be set with len right before !hsize check, then it won't go to
frag_list processing code.

So the right thing to do is move the hsize check to the else block, so
that it won't affect the !hsize check for frag_list processing.

v1->v2:
- change to do "hsize <= 0" check instead of "!hsize", and also move
"hsize < 0" into else block, to save some cycles, as Alex suggested.

Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: Remove unneeded semicolon

fix semicolon.cocci warnings:
drivers/net/ethernet/mscc/ocelot_net.c:460:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

cxgb4: enable interrupt based Tx completions for T5

Enable interrupt based Tx completions to improve latency for T5.
The consumer index (CIDX) will now come via interrupts so that Tx
SKBs can be freed up sooner in Rx path. Also, enforce CIDX flush
threshold override (CIDXFTHRESHO) to improve latency for slow
traffic. This ensures that the interrupt is generated immediately
whenever hardware catches up with driver (i.e. CIDX == PIDX is
reached), which is often the case for slow traffic.

Signed-off-by: Raju Rangoju <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'rid-w-1-warnings-in-ethernet'

Lee Jones says:

====================
Rid W=1 warnings in Ethernet

This set is part of a larger effort attempting to clean-up W=1
kernel builds, which are currently overwhelmingly riddled with
niggly little warnings.

v2:
- Squashed IBM patches
- Fixed real issue in SMSC
- Added Andrew's Reviewed-by tags on remainder
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: toshiba: spider_net: Document a whole bunch of function parameters

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/toshiba/spider_net.c:263: warning: Function parameter or member 'hwdescr' not described in 'spider_net_get_descr_status'
drivers/net/ethernet/toshiba/spider_net.c:263: warning: Excess function parameter 'descr' description in 'spider_net_get_descr_status'
drivers/net/ethernet/toshiba/spider_net.c:554: warning: Function parameter or member 'netdev' not described in 'spider_net_get_multicast_hash'
drivers/net/ethernet/toshiba/spider_net.c:902: warning: Function parameter or member 't' not described in 'spider_net_cleanup_tx_ring'
drivers/net/ethernet/toshiba/spider_net.c:902: warning: Excess function parameter 'card' description in 'spider_net_cleanup_tx_ring'
drivers/net/ethernet/toshiba/spider_net.c:1074: warning: Function parameter or member 'card' not described in 'spider_net_resync_head_ptr'
drivers/net/ethernet/toshiba/spider_net.c:1234: warning: Function parameter or member 'napi' not described in 'spider_net_poll'
drivers/net/ethernet/toshiba/spider_net.c:1234: warning: Excess function parameter 'netdev' description in 'spider_net_poll'
drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Function parameter or member 'p' not described in 'spider_net_set_mac'
drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Excess function parameter 'ptr' description in 'spider_net_set_mac'
drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg1' not described in 'spider_net_handle_error_irq'
drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg2' not described in 'spider_net_handle_error_irq'
drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Function parameter or member 't' not described in 'spider_net_link_phy'
drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Excess function parameter 'data' description in 'spider_net_link_phy'
drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Function parameter or member 'work' not described in 'spider_net_tx_timeout_task'
drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Excess function parameter 'data' description in 'spider_net_tx_timeout_task'
drivers/net/ethernet/toshiba/spider_net.c:2182: warning: Function parameter or member 'txqueue' not described in 'spider_net_tx_timeout'

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc misdemeanours

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'irq' not described in 'gelic_card_interrupt'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'ptr' not described in 'gelic_card_interrupt'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1407: warning: Function parameter or member 'txqueue' not described in 'gelic_net_tx_timeout'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1439: warning: Function parameter or member 'napi' not described in 'gelic_ether_setup_netdev_ops'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1639: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_probe'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1795: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_remove'

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours

Fixes the following W=1 kernel build warning(s):

from drivers/net/ethernet/ibm/ibmvnic.c:35:
inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'

Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpts: Document am65_cpts_rx_enable()'s 'en' parameter

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/ti/am65-cpts.c:736: warning: Function parameter or member 'en' not described in 'am65_cpts_rx_enable'
drivers/net/ethernet/ti/am65-cpts.c:736: warning: Excess function parameter 'skb' description in 'am65_cpts_rx_enable'

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpsw-qos: Demote non-conformant function header

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'ndev' not described in 'am65_cpsw_timer_set'
drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'est_new' not described in 'am65_cpsw_timer_set'

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: xen-netback: xenbus: Demote nonconformant kernel-doc headers

Fixes the following W=1 kernel build warning(s):

drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'dev' not described in 'frontend_changed'
drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'frontend_state' not described in 'frontend_changed'
drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'dev' not described in 'netback_probe'
drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'id' not described in 'netback_probe'

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: smsc: smc91x: Fix function name in kernel-doc header

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'dev' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'desc' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'name' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'index' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'value' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'nsdelay' not described in 'try_toggle_control_gpio'

Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

GTP: add support for flow based tunneling API

Following patch add support for flow based tunneling API
to send and recv GTP tunnel packet over tunnel metadata API.
This would allow this device integration with OVS or eBPF using
flow based tunneling APIs.

Signed-off-by: Pravin B Shelar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp_cubic: use memset and offsetof init

In bictcp_reset(), use memset and offsetof instead of = 0.

Signed-off-by: Yejune Deng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: tap: check vlan with eth_type_vlan() method

Replace some checks for ETH_P_8021Q and ETH_P_8021AD in
drivers/net/tap.c with eth_type_vlan.

Signed-off-by: Menglong Dong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

nfc: netlink: use &w->w in nfc_genl_rcv_nl_event

Use the struct member w of the struct urelease_work directly instead of
casting it.

Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/f0ed86d6d54ac0834bd2e161d172bf7bb5647cf7.1610683862.git.geliangtang@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'configuring-congestion-watermarks-on-ocelot-switch-using-devlink-sb'

Vladimir Oltean says:

====================
Configuring congestion watermarks on ocelot switch using devlink-sb

In some applications, it is important to create resource reservations in
the Ethernet switches, to prevent background traffic, or deliberate
attacks, from inducing denial of service into the high-priority traffic.

These patches give the user some knobs to turn. The ocelot switches
support per-port and per-port-tc reservations, on ingress and on egress.
The resources that are monitored are packet buffers (in cells of 60
bytes each) and frame references.

The frames that exceed the reservations can optionally consume from
sharing watermarks which are not per-port but global across the switch.
There are 10 sharing watermarks, 8 of them are per traffic class and 2
are per drop priority.

I am configuring the hardware using the best of my knowledge, and mostly
through trial and error. Same goes for devlink-sb integration. Feedback
is welcome.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: configure watermarks using devlink-sb

Using devlink-sb, we can configure 12/16 (the important 75%) of the
switch's controlling watermarks for congestion drops, and we can monitor
50% of the watermark occupancies (we can monitor the reservation
watermarks, but not the sharing watermarks, which are exposed as pool
sizes).

The following definitions can be made:

SB_BUF=0 # The devlink-sb for frame buffers
SB_REF=1 # The devlink-sb for frame references
POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
# have one of these.
POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
# have one of these.

Editing the hardware watermarks is done in the following way:
BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR

Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
by modifying the corresponding pool size. By default, the pool size has
maximum size, so this can be skipped.

devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
size 129840 thtype static

Since by default there is no buffer reservation, the above command has
maxed out BUF_COL_SHR_I(dp=0).

Configuring the per-port reservation watermark (P_RSRV) is done in the
following way:

devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
pool $POOL_ING th 1000

The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
command, the sharing watermarks are internally reconfigured with 1000
bytes less, i.e. from 129840 bytes to 128840 bytes.

Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
the following way:

for tc in {0..7}; do
devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
type ingress pool $POOL_ING \
th 3000
done

The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
The sharing watermarks are again reconfigured with 24000 bytes less.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: initialize watermarks to sane defaults

This is meant to be a gentle introduction into the world of watermarks
on ocelot. The code is placed in ocelot_devlink.c because it will be
integrated with devlink, even if it isn't right now.

My first step was intended to be to replicate the default configuration
of the congestion watermarks programatically, since they are now going
to be tuned by the user.

But after studying and understanding through trial and error how they
work, I now believe that the configuration used out of reset does not do
justice to the word "reservation", since the sum of all reservations
exceeds the total amount of resources (otherwise said, all reservations
cannot be fulfilled at the same time, which means that, contrary to the
reference manual, they don't guarantee anything).

As an example, here's a dump of the reservation watermarks for frame
buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
have the same configuration):

BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes

Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
Concentrating only on the ingress reservations, there are, in total,
8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
reserved on ingress.
But, surprise, Felix only has 128 KB of packet buffer in total...
A similar thing happens with Seville, which has a larger packet buffer,
but also more ports, and the default configuration is also overcommitted.

This patch disables the (apparently) bogus reservations and moves all
resources to the shared area. This way, real reservations can be set up
by the user, using devlink-sb.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: register devlink ports

Add devlink integration into the mscc_ocelot switchdev driver. All
physical ports (i.e. the unused ones as well) except the CPU port module
at ocelot->num_phys_ports are registered with devlink, and that requires
keeping the devlink_port structure outside struct ocelot_port_private,
since the latter has a 1:1 mapping with a struct net_device (which does
not exist for unused ports).

Since we use devlink_port_type_eth_set to link the devlink port to the
net_device, we can as well remove the .ndo_get_phys_port_name and
.ndo_get_port_parent_id implementations, since devlink takes care of
retrieving the port name and number automatically, once
.ndo_get_devlink_port is implemented.

Note that the felix DSA driver is already integrated with devlink by
default, since that is a thing that the DSA core takes care of. This is
the reason why these devlink stubs were put in ocelot_net.c and not in
the common library. It is also the reason why ocelot::devlink is a
pointer and not a full structure embedded inside struct ocelot: because
the mscc_ocelot driver allocates that by itself (as the container of
struct ocelot, in fact), but in the case of felix, it is DSA who
allocates the devlink, and felix just propagates the pointer towards
struct ocelot.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: delete unused ocelot_set_cpu_port prototype

This is a leftover of commit 69df578c5f4b ("net: mscc: ocelot: eliminate
confusion between CPU and NPI port") which renamed that function.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: export NUM_TC constant from felix to common switch lib

We should be moving anything that isn't DSA-specific or SoC-specific out
of the felix DSA driver, and into the common mscc_ocelot switch library.

The number of traffic classes is one of the aspects that is common
between all ocelot switches, so it belongs in the library.

This patch also makes seville use 8 TX queues, and therefore enables
prioritization via the QOS_CLASS field in the NPI injection header.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: perform teardown in reverse order of setup

In general it is desirable that cleanup is the reverse process of setup.
In this case I am not seeing any particular issue, but with the
introduction of devlink-sb for felix, a non-obvious decision had to be
made as to where to put its cleanup method. When there's a convention in
place, that decision becomes obvious.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: reindent struct dsa_switch_ops

The devlink function pointer names are super long, and they would break
the alignment. So reindent the existing ops now by adding one tab.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: add ops for devlink-sb

Switches that care about QoS might have hardware support for reserving
buffer pools for individual ports or traffic classes, and configuring
their sizes and thresholds. Through devlink-sb (shared buffers), this is
all configurable, as well as their occupancy being viewable.

Add the plumbing in DSA for these operations.

Individual drivers still need to call devlink_sb_register() with the
shared buffers they want to expose. A helper was not created in DSA for
this purpose (unlike, say, dsa_devlink_params_register), since in my
opinion it does not bring any benefit over plainly calling
devlink_sb_register() directly.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: add ops for decoding watermark threshold and occupancy

We'll need to read back the watermark thresholds and occupancy from
hardware (for devlink-sb integration), not only to write them as we did
so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
ocelot_ops, similar to wm_enc, and implement them for the 3 supported
mscc_ocelot switches.

Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
register, because that doesn't scale with the number of switches that
mscc_ocelot supports now. They have different bit widths for the
watermarks, and we need function pointers to abstract that difference
away.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: auto-detect packet buffer size and number of frame references

Instead of reading these values from the reference manual and writing
them down into the driver, it appears that the hardware gives us the
option of detecting them dynamically.

The number of frame references corresponds to what the reference manual
notes, however it seems that the frame buffers are reported as slightly
less than the books would indicate. On VSC9959 (Felix), the books say it
should have 128KB of packet buffer, but the registers indicate only
129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from
the documentation of all these devices is incorrect (taken from an older
generation). This was confirmed by Younes Leroul from Microchip support.

Not having anything better to do with these values at the moment* (this
will change soon), let's just print them.

*The frame buffer size is, in fact, used to calculate the tail dropping
watermarks.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2021-01-16

1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
   that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.

2) Add support for using kernel module global variables (__ksym externs in BPF
   programs) retrieved via module's BTF, from Andrii Nakryiko.

3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
   stored in mmap2 event for perf, from Jiri Olsa.

4) Various fixes for cross-building BPF sefltests out-of-tree which then will
   unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.

5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.

6) Clean up driver's XDP buffer init and split into two helpers to init per-
   descriptor and non-changing fields during processing, from Lorenzo Bianconi.

7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
  perf: Add build id data in mmap2 event
  bpf: Add size arg to build_id_parse function
  bpf: Move stack_map_get_build_id into lib
  bpf: Document new atomic instructions
  bpf: Add tests for new BPF atomic operations
  bpf: Add bitwise atomic instructions
  bpf: Pull out a macro for interpreting atomic ALU operations
  bpf: Add instructions for atomic_[cmp]xchg
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Move BPF_STX reserved field check into BPF_STX verifier code
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: x86: Factor out a lookup table for some ALU opcodes
  bpf: x86: Factor out emission of REX byte
  bpf: x86: Factor out emission of ModR/M for *(reg + off)
  tools/bpftool: Add -Wall when building BPF programs
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
  selftests/bpf: Install btf_dump test cases
  selftests/bpf: Fix installation of urandom_read
  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
  selftests/bpf: Fix out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ks8851: remove definition of DEBUG

Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

neighbor: remove definition of DEBUG

Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gianfar: remove definition of DEBUG

Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: set configure_vlan_while_not_filtering to true by default

As explained in commit 54a0ed0df496 ("net: dsa: provide an option for
drivers to always receive bridge VLANs"), DSA has historically been
skipping VLAN switchdev operations when the bridge wasn't in
vlan_filtering mode, but the reason why it was doing that has never been
clear. So the configure_vlan_while_not_filtering option is there merely
to preserve functionality for existing drivers. It isn't some behavior
that drivers should opt into. Ideally, when all drivers leave this flag
set, we can delete the dsa_port_skip_vlan_configuration() function.

New drivers always seem to omit setting this flag, for some reason. So
let's reverse the logic: the DSA core sets it by default to true before
the .setup() callback, and legacy drivers can turn it off. This way, new
drivers get the new behavior by default, unless they explicitly set the
flag to false, which is more obvious during review.

Remove the assignment from drivers which were setting it to true, and
add the assignment to false for the drivers that didn't previously have
it. This way, it should be easier to see how many we have left.

The following drivers: lan9303, mv88e6060 were skipped from setting this
flag to false, because they didn't have any VLAN offload ops in the
first place.

The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and
therefore also inherits the configure_vlan_while_not_filtering=true
behavior.

Also, print a message through netlink extack every time a VLAN has been
skipped. This is mildly annoying on purpose, so that (a) it is at least
clear that VLANs are being skipped - the legacy behavior in itself is
confusing, and the extack should be much more difficult to miss, unlike
kernel logs - and (b) people have one more incentive to convert to the
new behavior.

No behavior change except for the added prints is intended at this time.

$ ip link add br0 type bridge vlan_filtering 0
$ ip link set sw0p2 master br0
[   60.315148] br0: port 1(sw0p2) entered blocking state
[   60.320350] br0: port 1(sw0p2) entered disabled state
[   60.327839] device sw0p2 entered promiscuous mode
[   60.334905] br0: port 1(sw0p2) entered blocking state
[   60.340142] br0: port 1(sw0p2) entered forwarding state
Warning: dsa_core: skipping configuration of VLAN. # This was the pvid
$ bridge vlan add dev sw0p2 vid 100
Warning: dsa_core: skipping configuration of VLAN.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netxen_nic: switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'netxen_get_minidump_template()' GFP_KERNEL can
be used because its only caller, ' netxen_setup_minidump(()' already uses
it and no lock is acquired in the between.

When memory is allocated in other function in 'netxen_nic_ctx.c' GFP_KERNEL
can be used because the call chain already uses GFP_KERNEL and no lock is
taken in the between.
The call chain is:
  netxen_nic_attach()
    --> netxen_alloc_sw_resources()          : already uses GFP_KERNEL
    --> netxen_alloc_hw_resources()
      --> nx_fw_cmd_create_rx_ctx()
      --> nx_fw_cmd_create_tx_ctx()

When memory is allocated in 'netxen_init_dummy_dma()' GFP_KERNEL can be
used because its only call chain already uses it and no lock is acquired in
the between.
The call chain is:
  --> netxen_start_firmware
    --> netxen_request_firmware()
      --> request_firmware()
        --> _request_firmware(()
          --> fw_get_filesystem_firmware()
            --> __getname()                  : already uses GFP_KERNEL
    --> netxen_init_dummy_dma()

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-dsa-mv88e6xxx-lag-fixes'

Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: LAG fixes

The kernel test robot kindly pointed out that Global 2 support in
mv88e6xxx is optional.

This also made me realize that we should verify that the hardware
actually supports LAG offloading before trying to configure it.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>