Git Repo - linux.git/log

ice: devlink: add shadow-ram region to snapshot Shadow RAM

We have a region for reading the contents of the NVM flash as
a snapshot. This region does not allow reading the Shadow RAM, as it
always passes the FLASH_ONLY bit to the low level firmware interface.

Add a separate shadow-ram region which will allow snapshot of the
current contents of the Shadow RAM. This data is built from the NVM
contents but is distinct as the device builds up the Shadow RAM during
initialization, so being able to snapshot its contents can be useful
when attempting to debug flash related issues.

Fix the comment description of the nvm-flash region which incorrectly
stated that it filled the shadow-ram region, and add a comment
explaining that the nvm-flash region does not actually read the Shadow
RAM.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ethtool: always write dev in ethnl_parse_header_dev_get

Commit 0976b888a150 ("ethtool: fix null-ptr-deref on ref tracker")
made the write to req_info.dev conditional, but as Eric points out
in a different follow up the structure is often allocated on the
stack and not kzalloc()'d so seems safer to always write the dev,
in case it's garbage on input.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: add net device refcount tracker to struct packet_type

Most notable changes are in af_packet, tipc ones are trivial.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-ipv6-underlay'

Ido Schimmel says:

====================
mlxsw: Add support for VxLAN with IPv6 underlay

So far, mlxsw only supported VxLAN with IPv4 underlay. This patchset
extends mlxsw to also support VxLAN with IPv6 underlay. The main
difference is related to the way IPv6 addresses are handled by the
device. See patch #1 for a detailed explanation.

Patch #1 creates a common hash table to store the mapping from IPv6
addresses to KVDL indexes. This table is useful for both IP-in-IP and
VxLAN tunnels with an IPv6 underlay.

Patch #2 converts the IP-in-IP code to use the new hash table.

Patches #3-#6 are preparations.

Patch #7 finally adds support for VxLAN with IPv6 underlay.

Patch #8 removes a test case that checked that VxLAN configurations with
IPv6 underlay are vetoed by the driver.

A follow-up patchset will add forwarding selftests.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: vxlan: Remove IPv6 test case

Currently, there is a test case to verify that VxLAN with IPv6 underlay
is forbidden.

Remove this test case as support for VxLAN with IPv6 underlay was added
by the previous patch.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: Add support for VxLAN with IPv6 underlay

Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
Add support for IPv6 underlay.

The main differences are:

* Learning is not supported for IPv6 FDB entries, use static entries and
  do not allow 'learning' flag for IPv6 VxLAN.

* IPv6 addresses for FDB entries should be saved as part of KVDL.
  Use the new API to allocate and release entries for IPv6 addresses.

* Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP
  packets with checksum zero are dropped.
  Force the relevant flags which allow the VxLAN device to generate UDP
  packets with zero checksum and also receive them.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entries

FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold
a reference on a resource. Namely, the KVDL entry where the IPv6
underlay destination IP is stored. When such an FDB entry is deleted, it
needs to drop the reference from the corresponding KVDL entry.

To that end, maintain a hash table that maps an FDB entry (i.e., {MAC,
FID}) to the IPv6 address used by it.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: reg: Add a function to fill IPv6 unicast FDB entries

Add a function to fill IPv6 unicast FDB entries. Use the common function
for common fields.

Unlike IPv4 entries, the underlay IP address is not filled in the
register payload, but instead a pointer to KVDL is used.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: Split handling of FDB tunnel entries between address families

Currently, the function which adds/removes unicast tunnel FDB entries is
shared between IPv4 and IPv6, while for IPv6 it warns because there is
no support for it.

The code for IPv6 will be more complicated because it needs to
allocate/release a KVDL pointer for the underlay IPv6 address.

As a preparation for IPv6 underlay support, split the code according to
address family.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address family

As part of 'can_offload' checks, there is a check of VxLAN flags.

The supported flags for IPv6 VxLAN will be different from the existing
flags because of some limitations.

As preparation for IPv6 underlay support, make this check per address
family.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping

Use the common hash table introduced by the previous patch instead of
the IP-in-IP specific implementation.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: Add hash table for IPv6 address mapping

The device supports forwarding entries such as routes and FDBs that
perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation.
When the underlay is IPv6, these entries do not encode the 128 bit IPv6
address used for encapsulation / decapsulation. Instead, these entries
encode a 24 bit pointer to an array called KVDL where the IPv6 address
is stored.

Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent
patches will add support for VxLAN with IPv6 underlay. To avoid
duplicating the logic required to store and retrieve these IPv6
addresses, introduce a hash table that will store the mapping between
IPv6 addresses and their KVDL index.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-updates-2021-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saed Mahameed says:

====================
mlx5-updates-2021-12-14

Parsing Infrastructure for TC actions:

The series introduce a TC action infrastructure to help
parsing TC actions in a generic way for both FDB and NIC rules.

To help maintain the parsing code of TC actions, we the parsing code to
action parser per action TC type in separate files, instead of having one
big switch case loop, duplicated between FDB and NIC parsers as before this
patchset.

Each TC flow_action->id is represented by a dedicated mlx5e_tc_act handler
which has callbacks to check if the specific action is offload supported and
to parse the specific action.

We move each case (TC action) handling into the specific handler, which is
responsible for parsing and determining if the action is supported.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-12-14

This series contains updates to ice driver only.

Haiyue adds support to query hardware for supported PTYPEs.

Jeff changes PTYPE validation to utilize the capabilities queried from
the hardware instead of maintaining a per DDP support list.

Brett refactors promiscuous functions to provide common and clear
interfaces to call for configuration.

Wojciech modifies DDP package load to simplify determining the final
state of the load.

Tony removes the use of ice_status from the driver. This involves
removing string conversion functions, converting variables and values to
standard errors, and clean up. He also removes an unused define.

Dan Carpenter removes unneeded casts.
====================

Signed-off-by: David S. Miller <[email protected]>

net: fec: fix system hang during suspend/resume

1. During normal suspend (WoL not enabled) process, system has posibility
to hang. The root cause is TXF interrupt coming after clocks disabled,
system hang when accessing registers from interrupt handler. To fix this
issue, disable all interrupts when system suspend.

2. System also has posibility to hang with WoL enabled during suspend,
after entering stop mode, then magic pattern coming after clocks
disabled, system will be waked up, and interrupt handler will be called,
system hang when access registers. To fix this issue, disable wakeup
irq in .suspend(), and enable it in .resume().

Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ocelot: add support to get port mac from device-tree

Add support to get mac from device-tree using of_get_ethdev_address.

Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Clément Léger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sun4i-emac.c: remove unnecessary branch

According to the current implementation of emac_rx, every arrived packet
will be processed in the while loop. So, there is no remain packet last
time. The skb_last field and this branch for dealing with it is
unnecessary.

Signed-off-by: Conley Lee <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: use ethnl_parse_header_dev_put()

It seems I missed that most ethnl_parse_header_dev_get() callers
declare an on-stack struct ethnl_req_info, and that they simply call
dev_put(req_info.dev) when about to return.

Add ethnl_parse_header_dev_put() helper to properly untrack
reference taken by ethnl_parse_header_dev_get().

Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx5e: Move goto action checks into tc_action goto post parse op

Move goto action checks from parse nic/fdb funcs into the tc action
infra goto post parse op.
While moving this part also use NL_SET_ERR_MSG_MOD() instead of
NL_SET_ERR_MSG().

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Move vlan action chunk into tc action vlan post parse op

Move vlan prio tag rewrite handling into tc action infra vlan post parse op.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add post_parse() op to tc action infrastructure

The post_parse() op should be called after the parse op was called
for all actions. It could be an action state is dependent on other
actions. In the new op an action can fail the parse if the state
is not valid anymore.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Move sample attr allocation to tc_action sample parse op

There is no reason to wait with the kmalloc to after parsing all
other actions. There could still be a failure later and before
offloading the rule. So alloc the mem when parsing.
The memory is being released on mlx5e_flow_put() which is called
also on error flow.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: TC action parsing loop

Introduce a common function to implement the generic parsing loop.
The same function can be used for parsing NIC and FDB (Switchdev mode) flows.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add redirect ingress to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add sample and ptype to tc_action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add ct to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add mirred/redirect to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add mpls push/pop to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add vlan push/pop/mangle to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add pedit to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add csum to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add tunnel encap/decap to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add goto to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Add tc action infrastructure

Add an infrastructure to help parsing tc actions in a generic way.

Supporting an action parser means implementing struct mlx5e_tc_act
for that action.

The infrastructure will give the possibility to be generic when parsing tc
actions, i.e. parse_tc_nic_actions() and parse_tc_fdb_actions().
To parse tc actions a user needs to allocate a parse_state instance
and pass it when iterating over the tc actions parsers.
If a parser doesn't exists then a user can treat it as unsupported.

To add an action parser a user needs to implement two callbacks.
The can_offload() callback to quickly check if an action can be offloaded.
The parse_action() callback to do actual parsing and prepare for offload.

Add implementation for drop, trap, mark and accept action parsers with this
commit to act as examples and implement usage of the new infrastructure for
those actions.

Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Merge branch 'net-dsa-hellcreek-fix-handling-of-mgmt-protocols'

Kurt Kanzenbach says:

====================
net: dsa: hellcreek: Fix handling of MGMT protocols

this series fixes some minor issues with regards to management protocols
such as PTP and STP in the hellcreek DSA driver. Configure static FDB
for these protocols. The end result is:

|root@tsn:~# mv88e6xxx_dump --atu
|Using device <platform/ff240000.switch>
|ATU:
|FID  MAC               0123 Age OBT Pass Static Reprio Prio
|   0 01:1b:19:00:00:00 1100   1               X       X    6
|   1 01:00:5e:00:01:81 1100   1               X       X    6
|   2 33:33:00:00:01:81 1100   1               X       X    6
|   3 01:80:c2:00:00:0e 1100   1        X      X       X    6
|   4 01:00:5e:00:00:6b 1100   1        X      X       X    6
|   5 33:33:00:00:00:6b 1100   1        X      X       X    6
|   6 01:80:c2:00:00:00 1100   1        X      X       X    6

Previous version:
* https://lore.kernel.org/r/20211213101810 [email protected]/

Changes since v1:
* Target net-next, as this never worked correctly and is not critical
* Add STP and PTP over UDP rules
* Use pass_blocked for PDelay messages only (Richard Cochran)
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: hellcreek: Add missing PTP via UDP rules

The switch supports PTP for UDP transport too. Therefore, add the missing static
FDB entries to ensure correct forwarding of these packets.

Fixes: ddd56dfe52c9 ("net: dsa: hellcreek: Add PTP clock support")
Signed-off-by: Kurt Kanzenbach <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: hellcreek: Allow PTP P2P measurements on blocked ports

Allow PTP peer delay measurements on blocked ports by STP. In case of topology
changes the PTP stack can directly start with the correct delays.

Fixes: ddd56dfe52c9 ("net: dsa: hellcreek: Add PTP clock support")
Signed-off-by: Kurt Kanzenbach <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: hellcreek: Add STP forwarding rule

Treat STP as management traffic. STP traffic is designated for the CPU port
only. In addition, STP traffic has to pass blocked ports.

Fixes: e4b27ebc780f ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
Signed-off-by: Kurt Kanzenbach <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: hellcreek: Fix insertion of static FDB entries

The insertion of static FDB entries ignores the pass_blocked bit. That bit is
evaluated with regards to STP. Add the missing functionality.

Fixes: e4b27ebc780f ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
Signed-off-by: Kurt Kanzenbach <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dev_replace_track() cleanup

Use existing helpers (netdev_tracker_free()
and netdev_tracker_alloc()) to remove ifdefery.

Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: linkwatch: be more careful about dev->linkwatch_dev_tracker

Apparently a concurrent linkwatch_add_event() could
run while we are in __linkwatch_run_queue().

We need to free dev->linkwatch_dev_tracker tracker
under lweventlist_lock protection to avoid this race.

syzbot report:
[   77.935949][ T3661] reference already released.
[   77.941015][ T3661] allocated in:
[   77.944482][ T3661]  linkwatch_fire_event+0x202/0x260
[   77.950318][ T3661]  netif_carrier_on+0x9c/0x100
[   77.955120][ T3661]  __ieee80211_sta_join_ibss+0xc52/0x1590
[   77.960888][ T3661]  ieee80211_sta_create_ibss.cold+0xd2/0x11f
[   77.966908][ T3661]  ieee80211_ibss_work.cold+0x30e/0x60f
[   77.972483][ T3661]  ieee80211_iface_work+0xb70/0xd00
[   77.977715][ T3661]  process_one_work+0x9ac/0x1680
[   77.982671][ T3661]  worker_thread+0x652/0x11c0
[   77.987371][ T3661]  kthread+0x405/0x4f0
[   77.991465][ T3661]  ret_from_fork+0x1f/0x30
[   77.995895][ T3661] freed in:
[   77.999006][ T3661]  linkwatch_do_dev+0x96/0x160
[   78.004014][ T3661]  __linkwatch_run_queue+0x233/0x6a0
[   78.009496][ T3661]  linkwatch_event+0x4a/0x60
[   78.014099][ T3661]  process_one_work+0x9ac/0x1680
[   78.019034][ T3661]  worker_thread+0x652/0x11c0
[   78.023719][ T3661]  kthread+0x405/0x4f0
[   78.027810][ T3661]  ret_from_fork+0x1f/0x30
[   78.042541][ T3661] ------------[ cut here ]------------
[   78.048253][ T3661] WARNING: CPU: 0 PID: 3661 at lib/ref_tracker.c:120 ref_tracker_free.cold+0x110/0x14e
[   78.062364][ T3661] Modules linked in:
[   78.066424][ T3661] CPU: 0 PID: 3661 Comm: kworker/0:5 Not tainted 5.16.0-rc4-next-20211210-syzkaller #0
[   78.076075][ T3661] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   78.090648][ T3661] Workqueue: events linkwatch_event
[   78.095890][ T3661] RIP: 0010:ref_tracker_free.cold+0x110/0x14e
[   78.102191][ T3661] Code: ea 03 48 c1 e0 2a 0f b6 04 02 84 c0 74 04 3c 03 7e 4c 8b 7b 18 e8 6b 54 e9 fa e8 26 4d 57 f8 4c 89 ee 48 89 ef e8 fb 33 36 00 <0f> 0b 41 bd ea ff ff ff e9 bd 60 e9 fa 4c 89 f7 e8 16 45 a2 f8 e9
[   78.127211][ T3661] RSP: 0018:ffffc90002b5fb18 EFLAGS: 00010246
[   78.133684][ T3661] RAX: 0000000000000000 RBX: ffff88807467f700 RCX: 0000000000000000
[   78.141928][ T3661] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
[   78.150087][ T3661] RBP: ffff888057e105b8 R08: 0000000000000001 R09: ffffffff8ffa1967
[   78.158211][ T3661] R10: 0000000000000001 R11: 0000000000000000 R12: 1ffff9200056bf65
[   78.166204][ T3661] R13: 0000000000000292 R14: ffff88807467f718 R15: 00000000c0e0008c
[   78.174321][ T3661] FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
[   78.183310][ T3661] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.190156][ T3661] CR2: 000000c000208800 CR3: 000000007f7b5000 CR4: 00000000003506f0
[   78.198235][ T3661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   78.206214][ T3661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   78.214328][ T3661] Call Trace:
[   78.217679][ T3661]  <TASK>
[   78.220621][ T3661]  ? __sanitizer_cov_trace_const_cmp4+0x1c/0x70
[   78.226981][ T3661]  ? nlmsg_notify+0xbe/0x280
[   78.231607][ T3661]  ? ref_tracker_dir_exit+0x330/0x330
[   78.237654][ T3661]  ? linkwatch_do_dev+0x96/0x160
[   78.242628][ T3661]  ? __linkwatch_run_queue+0x233/0x6a0
[   78.248170][ T3661]  ? linkwatch_event+0x4a/0x60
[   78.252946][ T3661]  ? process_one_work+0x9ac/0x1680
[   78.258136][ T3661]  ? worker_thread+0x853/0x11c0
[   78.263020][ T3661]  ? kthread+0x405/0x4f0
[   78.267905][ T3661]  ? ret_from_fork+0x1f/0x30
[   78.272670][ T3661]  ? netdev_state_change+0xa1/0x130
[   78.278019][ T3661]  ? netdev_exit+0xd0/0xd0
[   78.282466][ T3661]  ? dev_activate+0x420/0xa60
[   78.287261][ T3661]  linkwatch_do_dev+0x96/0x160
[   78.292043][ T3661]  __linkwatch_run_queue+0x233/0x6a0
[   78.297505][ T3661]  ? linkwatch_do_dev+0x160/0x160
[   78.302561][ T3661]  linkwatch_event+0x4a/0x60
[   78.307225][ T3661]  process_one_work+0x9ac/0x1680
[   78.312292][ T3661]  ? pwq_dec_nr_in_flight+0x2a0/0x2a0
[   78.317757][ T3661]  ? rwlock_bug.part.0+0x90/0x90
[   78.322726][ T3661]  ? _raw_spin_lock_irq+0x41/0x50
[   78.327844][ T3661]  worker_thread+0x853/0x11c0
[   78.332543][ T3661]  ? process_one_work+0x1680/0x1680
[   78.338500][ T3661]  kthread+0x405/0x4f0
[   78.342610][ T3661]  ? set_kthread_struct+0x130/0x130

Fixes: 63f13937cbe9 ("net: linkwatch: add net device refcount tracker")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: adjust to use netns refcount tracker

MPTCP can change sk_net_refcnt after sock_create_kern() call.

We need to change its corresponding get_net() to avoid
a splat at release time, as in :

refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 3599 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 1 PID: 3599 Comm: syz-fuzzer Not tainted 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d b1 99 a1 09 31 ff 89 de e8 5d 3a 9c fd 84 db 75 e0 e8 74 36 9c fd 48 c7 c7 60 00 05 8a c6 05 91 99 a1 09 01 e8 cc 4b 27 05 <0f> 0b eb c4 e8 58 36 9c fd 0f b6 1d 80 99 a1 09 31 ff 89 de e8 28
RSP: 0018:ffffc90001f5fab0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888021873a00 RSI: ffffffff815f1e28 RDI: fffff520003ebf48
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ebbce R11: 0000000000000000 R12: 1ffff920003ebf5b
R13: 00000000ffffffef R14: ffffffff8d2fcd94 R15: ffffc90001f5fd10
FS: 000000c00008a090(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0a5b59e300 CR3: 000000001cbe6000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
ref_tracker_free+0x4fe/0x610 lib/ref_tracker.c:101
netns_tracker_free include/net/net_namespace.h:327 [inline]
put_net_track include/net/net_namespace.h:341 [inline]
__sk_destruct+0x4a6/0x920 net/core/sock.c:2042
sk_destruct+0xbd/0xe0 net/core/sock.c:2058
__sk_free+0xef/0x3d0 net/core/sock.c:2069
sk_free+0x78/0xa0 net/core/sock.c:2080
sock_put include/net/sock.h:1911 [inline]
__mptcp_close_ssk+0x435/0x590 net/mptcp/protocol.c:2276
__mptcp_destroy_sock+0x35f/0x830 net/mptcp/protocol.c:2702
mptcp_close+0x5f8/0x7f0 net/mptcp/protocol.c:2750
inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:476
__sock_release+0xcd/0x280 net/socket.c:649
sock_close+0x18/0x20 net/socket.c:1314
__fput+0x286/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: ffa84b5ffb37 ("net: add netns refcount tracker to struct sock")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Mat Martineau <[email protected]>
Cc: Florian Westphal <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ipv6: use GFP_ATOMIC in rt6_probe()

syzbot reminded me that rt6_probe() can run from
atomic contexts.

stack backtrace:

CPU: 1 PID: 7461 Comm: syz-executor.2 Not tainted 5.16.0-rc4-next-20211210-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_usage_bug kernel/locking/lockdep.c:203 [inline]
valid_state kernel/locking/lockdep.c:3945 [inline]
mark_lock_irq kernel/locking/lockdep.c:4148 [inline]
mark_lock.cold+0x61/0x8e kernel/locking/lockdep.c:4605
mark_usage kernel/locking/lockdep.c:4500 [inline]
__lock_acquire+0x11d5/0x54a0 kernel/locking/lockdep.c:4981
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__fs_reclaim_acquire mm/page_alloc.c:4550 [inline]
fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4564
might_alloc include/linux/sched/mm.h:253 [inline]
slab_pre_alloc_hook mm/slab.h:739 [inline]
slab_alloc_node mm/slub.c:3145 [inline]
slab_alloc mm/slub.c:3239 [inline]
kmem_cache_alloc_trace+0x3b/0x2c0 mm/slub.c:3256
kmalloc include/linux/slab.h:581 [inline]
kzalloc include/linux/slab.h:715 [inline]
ref_tracker_alloc+0xe1/0x430 lib/ref_tracker.c:74
netdev_tracker_alloc include/linux/netdevice.h:3860 [inline]
dev_hold_track include/linux/netdevice.h:3877 [inline]
rt6_probe net/ipv6/route.c:661 [inline]
find_match.part.0+0xac9/0xd00 net/ipv6/route.c:752
find_match net/ipv6/route.c:825 [inline]
__find_rr_leaf+0x17f/0xd20 net/ipv6/route.c:826
find_rr_leaf net/ipv6/route.c:847 [inline]
rt6_select net/ipv6/route.c:891 [inline]
fib6_table_lookup+0x649/0xa20 net/ipv6/route.c:2185
ip6_pol_route+0x1c5/0x11e0 net/ipv6/route.c:2221
pol_lookup_func include/net/ip6_fib.h:580 [inline]
fib6_rule_lookup+0x52a/0x6f0 net/ipv6/fib6_rules.c:120
ip6_route_output_flags_noref+0x2e2/0x380 net/ipv6/route.c:2629
ip6_route_output_flags+0x72/0x320 net/ipv6/route.c:2642
ip6_route_output include/net/ip6_route.h:98 [inline]
ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1070
ip6_dst_lookup_flow+0x8c/0x1d0 net/ipv6/ip6_output.c:1200
geneve_get_v6_dst+0x46f/0x9a0 drivers/net/geneve.c:858
geneve6_xmit_skb drivers/net/geneve.c:991 [inline]
geneve_xmit+0x520/0x3530 drivers/net/geneve.c:1074
__netdev_start_xmit include/linux/netdevice.h:4685 [inline]
netdev_start_xmit include/linux/netdevice.h:4699 [inline]
xmit_one net/core/dev.c:3473 [inline]
dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
__dev_queue_xmit+0x2983/0x3640 net/core/dev.c:4112
neigh_resolve_output net/core/neighbour.c:1522 [inline]
neigh_resolve_output+0x50e/0x820 net/core/neighbour.c:1502
neigh_output include/net/neighbour.h:541 [inline]
ip6_finish_output2+0x56e/0x14f0 net/ipv6/ip6_output.c:126
__ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
__ip6_finish_output+0x61e/0xe80 net/ipv6/ip6_output.c:170
ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508
ndisc_send_rs+0x12e/0x6f0 net/ipv6/ndisc.c:702
addrconf_rs_timer+0x3f2/0x820 net/ipv6/addrconf.c:3898
call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
expire_timers kernel/time/timer.c:1466 [inline]
__run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
__run_timers kernel/time/timer.c:1715 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>

Fixes: fb67510ba9bd ("ipv6: add net device refcount tracker to rt6_probe_deferred()")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: Remove unused ICE_FLOW_SEG_HDRS_L2_MASK

Remove the unused define ICE_FLOW_SEG_HDRS_L2_MASK.

Reported-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Acked-by: Paul Menzel <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Remove unnecessary casts

The "bitmap" variable is already an unsigned long so there is no need
for this cast.

Signed-off-by: Dan Carpenter <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Propagate error codes

As all functions now return standard error codes, propagate the values
being returned instead of converting them to generic values.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Remove excess error variables

ice_status previously had a variable to contain these values where other
error codes had a variable as well. With ice_status now being an int,
there is no need for two variables to hold error values. In cases where
this occurs, remove one of the excess variables and use a single one.
Some initialization of variables are no longer needed and have been
removed.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Cleanup after ice_status removal

Clean up code after changing ice_status to int. Rearrange to fix reverse
Christmas tree and pull lines up where applicable.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Remove enum ice_status

Replace uses of ice_status to, as equivalent as possible, error codes.
Remove enum ice_status and its helper conversion function as they are no
longer needed.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Use int for ice_status

To prepare for removal of ice_status, change the variables from
ice_status to int. This eases the transition when values are changed to
return standard int error codes over enum ice_status.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Remove string printing for ice_status

Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Gurucharan G <[email protected]>

ice: Refactor status flow for DDP load

Before this change, final state of the DDP pkg load process was
dependent on many variables such as: ice_status, pkg version,
ice_aq_err. The last one had be stored in hw->pkg_dwnld_status.
It was impossible to conclude this state just from ice_status, that's
why logging process of DDP pkg load in the caller was a little bit
complicated.

With this patch new status enum is introduced - ice_ddp_state.
It covers all the possible final states of the loading process.
What's tricky for ice_ddp_state is that not only
ICE_DDP_PKG_SUCCESS(=0) means that load was successful. Actually
three states mean that:
- ICE_DDP_PKG_SUCCESS
- ICE_DDP_PKG_SAME_VERSION_ALREADY_LOADED
- ICE_DDP_PKG_COMPATIBLE_ALREADY_LOADED
ice_is_init_pkg_successful can tell that information.

One ddp_state should not be used outside of ice_init_pkg which is
ICE_DDP_PKG_ALREADY_LOADED. It is more generic, it is used in
ice_dwnld_cfg_bufs to see if pkg is already loaded. At this point
we can't use one of the specific one (SAME_VERSION, COMPATIBLE,
NOT_SUPPORTED) because we don't have information on the package
currently loaded in HW (we are before calling ice_get_pkg_info).

We can get rid of hw->pkg_dwnld_status because we are immediately
mapping aq errors to ice_ddp_state in ice_dwnld_cfg_bufs.

Other errors like ICE_ERR_NO_MEMORY, ICE_ERR_PARAM are mapped the
generic ICE_DDP_PKG_ERR.

Suggested-by: Jacob Keller <[email protected]>
Signed-off-by: Wojciech Drewek <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Refactor promiscuous functions

Some of the promiscuous mode functions take a boolean to indicate
set/clear, which affects readability. Refactor and provide an
interface for the promiscuous mode code with explicit set and clear
promiscuous mode operations.

Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: refactor PTYPE validating

Since the capability of a PTYPE within a specific package could be
negotiated by checking the HW bit map, it means that there's no need
to maintain a different PTYPE list for each type of the package when
parsing PTYPE. So refactor the PTYPE validating mechanism.

Signed-off-by: Jeff Guo <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Add package PTYPE enable information

Scan the 'Marker Ptype TCAM' section to retrieve the Rx parser PTYPE
enable information from the current package.

Signed-off-by: Haiyue Wang <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ibmvnic: remove unused defines

IBMVNIC_STATS_TIMEOUT and IBMVNIC_INIT_FAILED are not used in the driver.
Remove them.

Suggested-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Update driver return codes

Update return codes to be more informative.

Signed-off-by: Jacob Root <[email protected]>
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "pktgen: use min() to make code cleaner"

This reverts commit 13510fef48a3803d9ee8f044b015dacfb06fe0f5.

Causes build warnings.

Signed-off-by: David S. Miller <[email protected]>

pktgen: use min() to make code cleaner

Use min() in order to make code cleaner. Issue found by coccinelle.

Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Changcheng Deng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-fixups'

Vladimir Oltean says:

====================
DSA tagger-owned storage fixups

It seems that the DSA tagger-owned storage changes were insufficiently
tested and do not work in all cases. Specifically, the NXP Bluebox 3
(arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts) got broken by
these changes, because
(a) I forgot that DSA_TAG_PROTO_SJA1110 exists and differs from
    DSA_TAG_PROTO_SJA1105
(b) the Bluebox 3 uses a DSA switch tree with 2 switches, and the
    tagger-owned storage patches don't cover that use case well, it
    seems

Therefore, I'm sorry to say that there needs to be an API fixup: tagging
protocol drivers will from now on connect to individual switches from a
tree, rather than to the tree as a whole. This is more robust against
various ordering constraints in the DSA probe and teardown paths, and is
also symmetrical with the connection API exposed to the switch drivers
themselves, which is also per switch.

With these changes, the Bluebox 3 also works fine.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: make tagging protocols connect to individual switches from a tree

On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105,
the mechanism through which the tagger connects to the switch tree is
broken, due to improper DSA code design. At the time when tag_ops->connect()
is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all
the ports, so it doesn't know how large the tree is and how many ports
it has. It has just seen the first CPU port by this time. As a result,
this function will call the tagger's ->connect method too early, and the
tagger will connect only to the first switch from the tree.

This could be perhaps addressed a bit more simply by just moving the
tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup),
but there is already a design inconsistency at present: on the switch
side, the notification is on a per-switch basis, but on the tagger side,
it is on a per-tree basis. Furthermore, the persistent storage itself is
per switch (ds->tagger_data). And the tagger connect and disconnect
procedures (at least the ones that exist currently) could see a fair bit
of simplification if they didn't have to iterate through the switches of
a tree.

To fix the issue, this change transforms tag_ops->connect(dst) into
tag_ops->connect(ds) and moves it somewhere where we already iterate
over all switches of a tree. That is in dsa_switch_setup_tag_protocol(),
which is a good placement because we already have there the connection
call to the switch side of things.

As for the dsa_tree_bind_tag_proto() method (called from the code path
that changes the tag protocol), things are a bit more complicated
because we receive the tree as argument, yet when we unwind on errors,
it would be nice to not call tag_ops->disconnect(ds) where we didn't
previously call tag_ops->connect(ds). We didn't have this problem before
because the tag_ops connection operations passed the entire dst before,
and this is more fine grained now. To solve the error rewind case using
the new API, we have to create yet one more cross-chip notifier for
disconnection, and stay connected with the old tag protocol to all the
switches in the tree until we've succeeded to connect with the new one
as well. So if something fails half way, the whole tree is still
connected to the old tagger. But there may still be leaks if the tagger
fails to connect to the 2nd out of 3 switches in a tree: somebody needs
to tell the tagger to disconnect from the first switch. Nothing comes
for free, and this was previously handled privately by the tagging
protocol driver before, but now we need to emit a disconnect cross-chip
notifier for that, because DSA has to take care of the unwind path. We
assume that the tagging protocol has connected to a switch if it has set
ds->tagger_data to something, otherwise we avoid calling its
disconnection method in the error rewind path.

The rest of the changes are in the tagging protocol drivers, and have to
do with the replacement of dst with ds. The iteration is removed and the
error unwind path is simplified, as mentioned above.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: fix broken connection with the sja1110 tagger

The driver was incorrectly converted assuming that "sja1105" is the only
tagger supported by this driver. This results in SJA1110 switches
failing to probe:

sja1105 spi1.0: Unable to connect to tag protocol "sja1110": -EPROTONOSUPPORT
sja1105: probe of spi1.2 failed with error -93

Add DSA_TAG_PROTO_SJA1110 to the list of supported taggers by the
sja1105 driver. The sja1105_tagger_data structure format is common for
the two tagging protocols.

Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_sja1105: fix zeroization of ds->priv on tag proto disconnect

The method was meant to zeroize ds->tagger_data but got the wrong
pointer. Fix this.

Fixes: c79e84866d2a ("net: dsa: tag_sja1105: convert to tagger-owned data")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bareudp: Add extack support to bareudp_configure()

Add missing extacks for common configuration errors.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: fix null-ptr-deref on ref tracker

dev can be a NULL here, not all requests set require_dev.

Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dev: Change the order of the arguments for the contended condition.

Change the order of arguments and make qdisc_is_running() appear first.
This is more readable for the general case.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'hwtstamp_bonding'

Hangbin Liu says:

====================
net: add new hwtstamp flag HWTSTAMP_FLAG_BONDED_PHC_INDEX

This patchset add a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When user want to get bond active interface's PHC, they need to add this flag
and aware the PHC index may changed.

v3: Use bitwise test to check the flags validation
v2: rename the flag to HWTSTAMP_FLAG_BONDED_PHC_INDEX
====================

Signed-off-by: David S. Miller <[email protected]>

Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP

When there is a failover, the PHC index of bond active interface will be
changed. This may break the user space program if the author didn't aware.

By setting this flag, the user should aware that the PHC index get/set
by syscall is not stable. And the user space is able to deal with it.
Without this flag, the kernel will reject the request forwarding to
bonding.

Reported-by: Jakub Kicinski <[email protected]>
Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX

Since commit 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
ioctl to active device") the user could get bond active interface's
PHC index directly. But when there is a failover, the bond active
interface will change, thus the PHC index is also changed. This may
break the user's program if they did not update the PHC timely.

This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When the user wants to get the bond active interface's PHC, they need to
add this flag and be aware the PHC index may be changed.

With the new flag. All flag checks in current drivers are removed. Only
the checking in net_hwtstamp_validate() is kept.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mtk_eth: add COMPILE_TEST support

Improve the build testing of mtk_eth drivers by enabling them when
COMPILE_TEST is selected. Moreover COMPILE_TEST will be useful
for the driver development.

Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dev: Always serialize on Qdisc::busylock in __dev_xmit_skb() on PREEMPT_RT.

The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted
by another sender/task with a higher priority then this new sender won't
be able to submit packets to the NIC directly instead they will be
enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner
is scheduled again and finishes the job.

By serializing every task on the ->busylock then the task will be
preempted by a sender only after the Qdisc has no owner.

Always serialize on the busylock on PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mt76: remove variable set but not used

The code that uses variable queued has been removed,
and "mt76_is_usb(dev) ? q->ndesc - q->queued : q->queued"
didn't do anything, so all they should be removed as well.

Eliminate the following clang warnings:
drivers/net/wireless/mediatek/mt76/debugfs.c:77:9: warning: variable
‘queued’ set but not used.

Reported-by: Abaci Robot <[email protected]>
Fixes: 2d8be76c1674 ("mt76: debugfs: improve queue node readability")
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bonding: debug: avoid printing debug logs when bond is not notifying peers

Currently "bond_should_notify_peers: slave ..." messages are printed whenever
"bond_should_notify_peers" function is called.

+++
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Received LACPDU on port 1
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Rx Machine: Port=1, Last State=6, Curr State=6
Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): partner sync=1
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
...
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Received LACPDU on port 2
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Rx Machine: Port=2, Last State=6, Curr State=6
Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): partner sync=1
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25
+++

This is confusing and can also clutter up debug logs.
Print logs only when the peer notification happens.

Signed-off-by: Suresh Kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ocelot: use dma_unmap_addr to get tx buffer dma_addr

dma_addr was declared using DEFINE_DMA_UNMAP_ADDR() which requires to
use dma_unmap_addr() to access it.

Reported-by: kernel test robot <[email protected]>
Fixes: 753a026cfec1 ("net: ocelot: add FDMA support")
Signed-off-by: Clément Léger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Fix the configuration of the pcs

When inserting a SFP that runs at 2.5G, then the Serdes was still
configured to run at 1G. Because the config->speed was 0, and then the
speed of the serdes was not configured at all, it was using the default
value which is 1G. This patch stop calling the serdes function set_speed
and allow the serdes to figure out the speed based on the interface
type.

Fixes: d28d6d2e37d10d ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests/net: expand gro with two machine test

The test is currently run on a single host with private addresses,
either over veth or by setting a nic in loopback mode with macvlan.

Support running between two real devices. Allow overriding addresses.

Also cut timeout to fail faster on error and explicitly log success.

Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mse102x-support'

Stefan Wahren says:

====================
add Vertexcom MSE102x support

This patch series adds support for the Vertexcom MSE102x Homeplug GreenPHY
chips [1]. They can be connected either via RGMII, RMII or SPI to a host CPU.
These patches handles only the last one, with an Ethernet over SPI protocol
driver.

The code has been tested only on Raspberry Pi boards, but should work
on other platforms.

Changes in V3:
- drop IF_PORT_HOMEPLUG again, since it's actually not used

Changes in V2:
- improve lock handling for RX & TX path
- add new patch to introduce IF_PORT_HOMEPLUG as suggested by Andrew Lunn
- address all the comments by Jakub Kicinski, Andrew Lunn, Kernel test robot

[1] - http://www.vertexcom.com/p_homeplug_plc_en.html
====================

Signed-off-by: David S. Miller <[email protected]>

net: vertexcom: Add MSE102x SPI support

This implements an SPI protocol driver for Vertexcom MSE102x
Homeplug GreenPHY chip.

Signed-off-by: Stefan Wahren <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: add Vertexcom MSE102x support

Add devicetree binding for the Vertexcom MSE102x Homeplug GreenPHY chip
as SPI device.

Signed-off-by: Stefan Wahren <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: add vendor Vertexcom

Add vendor prefix for Vertexcom Technologies, Inc [1].

[1] - http://www.vertexcom.com/

Signed-off-by: Stefan Wahren <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvneta: mark as a legacy_pre_march2020 driver

mvneta provides mac_an_restart and mac_pcs_get_state methods, so needs
to be marked as a legacy driver. Marek spotted that mvneta had stopped
working in 2500base-X mode - thanks for reporting.

Reported-by: Marek Behún <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: axienet: mark as a legacy_pre_march2020 driver

axienet has a PCS, but does not make use of the phylink PCS support.
Mark it was a pre-March 2020 driver.

Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

isdn: cpai: no need to initialise statics to 0

Static variables do not need to be initialised to 0, because compiler
will initialise all uninitialised statics to 0. Thus, remove the
unneeded initializations.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: fix IPA v4.5 interconnect data

Update the definition of the IPA interconnects for IPA v4.5 so
the path between IPA and system memory is represented by a single
"memory" interconnect.

Tested-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ARM: dts: qcom: sdx55: fix IPA interconnect definitions

The first two interconnects defined for IPA on the SDX55 SoC are
really two parts of what should be represented as a single path
between IPA and system memory.

Fix this by combining the "memory-a" and "memory-b" interconnects
into a single "memory" interconnect.

Reported-by: David Heidelberg <[email protected]>
Tested-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Enable max_dgram_qlen unix sysctl to be configurable by non-init user namespaces

This patch enables the "/proc/sys/net/unix/max_dgram_qlen" sysctl to be
exposed to non-init user namespaces. max_dgram_qlen is used as the default
"sk_max_ack_backlog" value for when a unix socket is created.

Currently, when a networking namespace is initialized, its unix sysctls
are exposed only if the user namespace that "owns" it is the init user
namespace. If there is an non-init user namespace that "owns" a networking
namespace (for example, in the case after we call clone() with both
CLONE_NEWUSER and CLONE_NEWNET set), the sysctls are hidden from view
and not configurable.

Exposing the unix sysctl is safe because any changes made to it will be
limited in scope to the networking namespace the non-init user namespace
"owns" and has privileges over (changes won't affect any other net
namespace). There is also no possibility of a non-privileged user namespace
messing up the net namespace sysctls it shares with its parent user namespace.
When a new user namespace is created without unsharing the network namespace
(eg calling clone() with CLONE_NEWUSER), the new user namespace shares its
parent's network namespace. Write access is protected by the mode set
in the sysctl's ctl_table (and enforced by procfs). Here in the case of
"max_dgram_qlen", 0644 is set; only the user owner has write access.

v1 -> v2:
* Add more detail to commit message, specify the
"/proc/sys/net/unix/max_dgram_qlen" sysctl in commit message.

Signed-off-by: Joanne Koong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

u64_stats: Disable preemption on 32bit UP+SMP PREEMPT_RT during updates.

On PREEMPT_RT the seqcount_t for synchronisation is required on 32bit
architectures even on UP because the softirq (and the threaded IRQ handler) can
be preempted.

With the seqcount_t for synchronisation, a reader with higher priority can
preempt the writer and then spin endlessly in read_seqcount_begin() while the
writer can't make progress.

To avoid such a lock up on PREEMPT_RT the writer must disable preemption during
the update. There is no need to disable interrupts because no writer is using
this API in hard-IRQ context on PREEMPT_RT.

Disable preemption on 32bit-RT within the u64_stats write section.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bareudp-remove-unused'

Guillaume Nault says:

====================
bareudp: Remove unused code from header file

Stop exporting unused functions and structures in bareudp.h. The only
piece of bareudp.h that is actually used is netif_is_bareudp(). The
rest can be moved to bareudp.c or even dropped entirely.
====================

Signed-off-by: David S. Miller <[email protected]>

bareudp: Move definition of struct bareudp_conf to bareudp.c

This structure is used only in bareudp.c.

While there, adjust include files: we need netdevice.h, not skbuff.h.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bareudp: Remove bareudp_dev_create()

There's no user for this function.

Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: discard MSG_CRYPTO msgs when key_exchange_enabled is not set

When key_exchange is disabled, there is no reason to accept MSG_CRYPTO
msgs if it doesn't send MSG_CRYPTO msgs.

Signed-off-by: Xin Long <[email protected]>
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: bump tc when get underflow error from DMA descriptor

In DMA threshold mode, frame underflow errors may sometimes occur when
the TC(threshold control) value is not enough. The TC value need to be
bumped up in this case.

There is no underflow interrupt bit on DMA_CH(#i)_Status of dwmac4, so
the DMA threshold cannot be bumped up in stmmac_dma_interrupt(). The
i.mx8mp board observed an underflow error while running NFS boot, the
NFS rootfs could not be mounted.

The underflow error can be got from the DMA descriptor TDES3 on dwmac4.
This patch bump up tc value once underflow error is got from TDES3.

Signed-off-by: Xiaoliang Yang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-tagger-storage'

Vladimir Oltean says:

====================
Replace DSA dp->priv with tagger-owned storage

Ansuel's recent work on qca8k register access over Ethernet:
https://patchwork.kernel.org/project/netdevbpf/cover/20211207145942 [email protected]/
has triggered me to do something which I should've done for a longer
time:
https://patchwork.kernel.org/project/netdevbpf/patch/20211109095013 [email protected]/#24585521
which is to replace dp->priv with something that has less caveats.

The dp->priv was introduced when sja1105 needed to hold stateful
information in the tagging protocol driver. In that design, dp->priv
held memory allocated by the switch driver, because the tagging protocol
driver design was 100% stateless.

Some years have passed and others have started to feel the need for
stateful information kept by the tagger, as well as passing data back
and forth between the tagging protocol driver and the switch driver.
This isn't possible cleanly in DSA due to a circular dependency which
leads to broken module autoloading:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/

This patchset introduces a framework that resembles something normal,
which allows data to be passed from the tagging protocol driver (things
like switch management packets, which aren't intended for the network
stack) to the switch driver, while the tagging protocol still remains
more or less stateless. The overall design of the framework was
discussed with Ansuel too and it appears to be flexible enough to cover
the "register access over Ethernet" use case. Additionally, the existing
uses of dp->priv, which have mainly to do with PTP timestamping, have
also been migrated.

Changes in v2:
Fix transient build breakage in patch 5/11 due to a missing parenthesis,
https://patchwork.hopto.org/static/nipa/592567/12665213/build_clang/
and another transient build warning in patch 4/11 that for some reason
doesn't appear in my W=1 C=1 build.
https://patchwork.hopto.org/static/nipa/592567/12665209/build_clang/stderr
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: remove dp->priv

All current in-tree uses of dp->priv have been replaced with
ds->tagger_data, which provides for a safer API especially when the
connection isn't the regular 1:1 link between one switch driver and one
tagging protocol driver, but could be either one switch to many taggers,
or many switches to one tagger.

Therefore, we can remove this unused pointer.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_sja1105: split sja1105_tagger_data into private and public sections

The sja1105 driver messes with the tagging protocol's state when PTP RX
timestamping is enabled/disabled. This is fundamentally necessary
because the tagger needs to know what to do when it receives a PTP
packet. If RX timestamping is enabled, then a metadata follow-up frame
is expected, and this holds the (partial) timestamp. So the tagger plays
hide-and-seek with the network stack until it also gets the metadata
frame, and then presents a single packet, the timestamped PTP packet.
But when RX timestamping isn't enabled, there is no metadata frame
expected, so the hide-and-seek game must be turned off and the packet
must be delivered right away to the network stack.

Considering this, we create a pseudo isolation by devising two tagger
methods callable by the switch: one to get the RX timestamping state,
and one to set it. Since we can't export symbols between the tagger and
the switch driver, these methods are exposed through function pointers.

After this change, the public portion of the sja1105_tagger_data
contains only function pointers.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver"

This reverts commit 6d709cadfde68dbd12bef12fcced6222226dcb06.

The above change was done to avoid calling symbols exported by the
switch driver from the tagging protocol driver.

With the tagger-owned storage model, we have a new option on our hands,
and that is for the switch driver to provide a data consumer handler in
the form of a function pointer inside the ->connect_tag_protocol()
method. Having a function pointer avoids the problems of the exported
symbols approach.

By creating a handler for metadata frames holding TX timestamps on
SJA1110, we are able to eliminate an skb queue from the tagger data, and
replace it with a simple, and stateless, function pointer. This skb
queue is now handled exclusively by sja1105_ptp.c, which makes the code
easier to follow, as it used to be before the reverted patch.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_sja1105: convert to tagger-owned data

Currently, struct sja1105_tagger_data is a part of struct
sja1105_private, and is used by the sja1105 driver to populate dp->priv.

With the movement towards tagger-owned storage, the sja1105 driver
should not be the owner of this memory.

This change implements the connection between the sja1105 switch driver
and its tagging protocol, which means that sja1105_tagger_data no longer
stays in dp->priv but in ds->tagger_data, and that the sja1105 driver
now only populates the sja1105_port_deferred_xmit callback pointer.
The kthread worker is now the responsibility of the tagger.

The sja1105 driver also alters the tagger's state some more, especially
with regard to the PTP RX timestamping state. This will be fixed up a
bit in further changes.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: move ts_id from sja1105_tagger_data

The TX timestamp ID is incremented by the SJA1110 PTP timestamping
callback (->port_tx_timestamp) for every packet, when cloning it.
It isn't used by the tagger at all, even though it sits inside the
struct sja1105_tagger_data.

Also, serialization to this structure is currently done through
tagger_data->meta_lock, which is a cheap hack because the meta_lock
isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine
isn't called).

This change moves ts_id from sja1105_tagger_data to sja1105_private and
introduces a dedicated spinlock for it, also in sja1105_private.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data

The design of the sja1105 tagger dp->priv is that each port has a
separate struct sja1105_port, and the sp->data pointer points to a
common struct sja1105_tagger_data.

We have removed all per-port members accessible by the tagger, and now
only struct sja1105_tagger_data remains. Make dp->priv point directly to
this.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: remove hwts_tx_en from tagger data

This tagger property is in fact not used at all by the tagger, only by
the switch driver. Therefore it makes sense to be moved to
sja1105_private.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>