Git Repo - linux.git/log

net: dsa: microchip: ksz8795: Reject unsupported VLAN configuration

The switches supported by ksz8795 only have a per-port flag for Tag
Removal.  This means it is not possible to support both tagged and
untagged VLANs on the same port.  Reject attempts to add a VLAN that
requires the flag to be changed, unless there are no VLANs currently
configured.

VID 0 is excluded from this check since it is untagged regardless of
the state of the flag.

On the CPU port we could support tagged and untagged VLANs at the same
time.  This will be enabled by a later patch.

Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: ksz8795: Fix PVID tag insertion

ksz8795 has never actually enabled PVID tag insertion, and it also
programmed the PVID incorrectly.  To fix this:

* Allow tag insertion to be controlled per ingress port.  On most
  chips, set bit 2 in Global Control 19.  On KSZ88x3 this control
  flag doesn't exist.

* When adding a PVID:
  - Set the appropriate register bits to enable tag insertion on
    egress at every other port if this was the packet's ingress port.
  - Mask *out* the VID from the default tag, before or-ing in the new
    PVID.

* When removing a PVID:
  - Clear the same control bits to disable tag insertion.
  - Don't update the default tag.  This wasn't doing anything useful.

Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: Fix ksz_read64()

ksz_read64() currently does some dubious byte-swapping on the two
halves of a 64-bit register, and then only returns the high bits.
Replace this with a straightforward expression.

Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-fixes-for-5.14-20210810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

linux-can-fixes-for-5.14-20210810

Marc Kleine-Budde says:

====================
pull-request: can 2021-08-10

this is a pull request of 2 patches for net/master.

Baruch Siach's patch fixes a typo for the Microchip CAN BUS Analyzer
Tool entry in the MAINTAINERS file.

Hussein Alasadi fixes the setting of the M_CAN_DBTP register in the
m_can driver. The regression git mainline in v5.14-rc1, so no backport
to stable is needed.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-fixes-2021-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-08-09

This series introduces fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-08-09

This series contains updates to ice and iavf drivers.

Ani prevents the ice driver from accidentally being probed to a virtual
function and stops processing of VF messages when VFs are being torn
down.

Brett prevents the ice driver from deleting is own MAC address.

Fahad ensures the RSS LUT and key are always set following reset for
iavf.
====================

Signed-off-by: David S. Miller <[email protected]>

bpf: Fix potentially incorrect results with bpf_get_local_storage()

Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed a bug for bpf_get_local_storage() helper so different tasks
won't mess up with each other's percpu local storage.

The percpu data contains 8 slots so it can hold up to 8 contexts (same or
different tasks), for 8 different program runs, at the same time. This in
general is sufficient. But our internal testing showed the following warning
multiple times:

  [...]
  warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
     __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  <IRQ>
   tcp_call_bpf.constprop.99+0x93/0xc0
   tcp_conn_request+0x41e/0xa50
   ? tcp_rcv_state_process+0x203/0xe00
   tcp_rcv_state_process+0x203/0xe00
   ? sk_filter_trim_cap+0xbc/0x210
   ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
   tcp_v6_do_rcv+0x181/0x3e0
   tcp_v6_rcv+0xc65/0xcb0
   ip6_protocol_deliver_rcu+0xbd/0x450
   ip6_input_finish+0x11/0x20
   ip6_input+0xb5/0xc0
   ip6_sublist_rcv_finish+0x37/0x50
   ip6_sublist_rcv+0x1dc/0x270
   ipv6_list_rcv+0x113/0x140
   __netif_receive_skb_list_core+0x1a0/0x210
   netif_receive_skb_list_internal+0x186/0x2a0
   gro_normal_list.part.170+0x19/0x40
   napi_complete_done+0x65/0x150
   mlx5e_napi_poll+0x1ae/0x680
   __napi_poll+0x25/0x120
   net_rx_action+0x11e/0x280
   __do_softirq+0xbb/0x271
   irq_exit_rcu+0x97/0xa0
   common_interrupt+0x7f/0xa0
   </IRQ>
   asm_common_interrupt+0x1e/0x40
  RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
   ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
   ? do_softirq+0x34/0x70
   ? ip6_finish_output2+0x266/0x590
   ? ip6_finish_output+0x66/0xa0
   ? ip6_output+0x6c/0x130
   ? ip6_xmit+0x279/0x550
   ? ip6_dst_check+0x61/0xd0
  [...]

Using drgn [0] to dump the percpu buffer contents showed that on this CPU
slot 0 is still available, but slots 1-7 are occupied and those tasks in
slots 1-7 mostly don't exist any more. So we might have issues in
bpf_cgroup_storage_unset().

Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
Currently, it tries to unset "current" slot with searching from the start.
So the following sequence is possible:

  1. A task is running and claims slot 0
  2. Running BPF program is done, and it checked slot 0 has the "task"
     and ready to reset it to NULL (not yet).
  3. An interrupt happens, another BPF program runs and it claims slot 1
     with the *same* task.
  4. The unset() in interrupt context releases slot 0 since it matches "task".
  5. Interrupt is done, the task in process context reset slot 0.

At the end, slot 1 is not reset and the same process can continue to occupy
slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF
program won't be able to claim an empty slot and a warning will be issued.

To fix the issue, for unset() function, we should traverse from the last slot
to the first. This way, the above issue can be avoided.

The same reverse traversal should also be done in bpf_get_local_storage() helper
itself. Otherwise, incorrect local storage may be returned to BPF program.

  [0] https://github.com/osandov/drgn

Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ovl: prevent private clone if bind mount is not allowed

Add the following checks from __do_loopback() to clone_private_mount() as
well:

- verify that the mount is in the current namespace

- verify that there are no locked children

Reported-by: Alois Wohlschlager <[email protected]>
Fixes: c771d683a62e ("vfs: introduce clone_private_mount()")
Cc: <[email protected]> # v3.18
Signed-off-by: Miklos Szeredi <[email protected]>

ovl: fix uninitialized pointer read in ovl_lookup_real_one()

One error path can result in release_dentry_name_snapshot() being called
before "name" was initialized by take_dentry_name_snapshot().

Fix by moving the release_dentry_name_snapshot() to immediately after the
only use.

Reported-by: Colin Ian King <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

ovl: fix deadlock in splice write

There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.

The call chain for splice to an overlay file:

-> do_splice                     [takes sb_writers on overlay file]
   -> do_splice_from
     -> iter_file_splice_write    [takes pipe->mutex]
       -> vfs_iter_write
         ...
         -> ovl_write_iter        [takes sb_writers on real file]

And the call chain for splice to a real file:

-> do_splice                     [takes sb_writers on real file]
   -> do_splice_from
     -> iter_file_splice_write    [takes pipe->mutex]

Syzbot successfully bisected this to commit 82a763e61e2b ("ovl: simplify
file splice").

Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().

Fixes: 82a763e61e2b ("ovl: simplify file splice")
Reported-and-tested-by: [email protected]
Signed-off-by: Miklos Szeredi <[email protected]>

ovl: skip stale entries in merge dir cache iteration

On the first getdents call, ovl_iterate() populates the readdir cache
with a list of entries, but for upper entries with origin lower inode,
p->ino remains zero.

Following getdents calls traverse the readdir cache list and call
ovl_cache_update_ino() for entries with zero p->ino to lookup the entry
in the overlay and return d_ino that is consistent with st_ino.

If the upper file was unlinked between the first getdents call and the
getdents call that lists the file entry, ovl_cache_update_ino() will not
find the entry and fall back to setting d_ino to the upper real st_ino,
which is inconsistent with how this object was presented to users.

Instead of listing a stale entry with inconsistent d_ino, simply skip
the stale entry, which is better for users.

xfstest overlay/077 is failing without this patch.

Signed-off-by: Amir Goldstein <[email protected]>
Link: https://lore.kernel.org/fstests/CAOQ4uxgR_cLnC_vdU5=seP3fwqVkuZM_-WfD6maFTMbMYq=a9w@mail.gmail.com/
Signed-off-by: Miklos Szeredi <[email protected]>

bpf: Add missing bpf_read_[un]lock_trace() for syscall program

Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
added support for syscall program, which is a sleepable program.

But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(),
which is needed to ensure proper rcu callback invocations. This patch adds
bpf_read_[un]lock_trace() properly.

Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Add lockdown check for probe_write_user helper

Back then, commit 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper
to be called in tracers") added the bpf_probe_write_user() helper in order
to allow to override user space memory. Its original goal was to have a
facility to "debug, divert, and manipulate execution of semi-cooperative
processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed
since it would otherwise tamper with its integrity.

One use case was shown in cf9b1199de27 ("samples/bpf: Add test/example of
using bpf_probe_write_user bpf helper") where the program DNATs traffic
at the time of connect(2) syscall, meaning, it rewrites the arguments to
a syscall while they're still in userspace, and before the syscall has a
chance to copy the argument into kernel space. These days we have better
mechanisms in BPF for achieving the same (e.g. for load-balancers), but
without having to write to userspace memory.

Of course the bpf_probe_write_user() helper can also be used to abuse
many other things for both good or bad purpose. Outside of BPF, there is
a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and
PTRACE_POKE{TEXT,DATA}, but would likely require some more effort.
Commit 96ae52279594 explicitly dedicated the helper for experimentation
purpose only. Thus, move the helper's availability behind a newly added
LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under
the "integrity" mode. More fine-grained control can be implemented also
from LSM side with this change.

Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>

drm/meson: fix colour distortion from HDR set during vendor u-boot

Add support for the OSD1 HDR registers so meson DRM can handle the HDR
properties set by Amlogic u-boot on G12A and newer devices which result
in blue/green/pink colour distortion to display output.

This takes the original patch submissions from Mathias [0] and [1] with
corrections for formatting and the missing description and attribution
needed for merge.

[0] https://lore.kernel.org/linux-amlogic/59dfd7e6-fc91-3d61-04c4-94e078a3188c@baylibre.com/T/
[1] https://lore.kernel.org/linux-amlogic/CAOKfEHBx_fboUqkENEMd-OC-NSrf46nto+vDLgvgttzPe99kXg@mail.gmail.com/T/#u

Fixes: 728883948b0d ("drm/meson: Add G12A Support for VIU setup")
Suggested-by: Mathias Steiger <[email protected]>
Signed-off-by: Christian Hewitt <[email protected]>
Tested-by: Neil Armstrong <[email protected]>
Tested-by: Philip Milev <[email protected]>
[narmsrong: adding missing space on second tested-by tag]
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

can: m_can: m_can_set_bittiming(): fix setting M_CAN_DBTP register

This patch fixes the setting of the M_CAN_DBTP register contents:
- use DBTP_ (the data bitrate macros) instead of NBTP_ which area used
for the nominal bitrate
- do not overwrite possibly-existing DBTP_TDC flag by ORing reg_btp
instead of overwriting

Link: https://lore.kernel.org/r/FRYP281MB06140984ABD9994C0AAF7433D1F69@FRYP281MB0614.DEUP281.PROD.OUTLOOK.COM
Fixes: 20779943a080 ("can: m_can: use bits.h macros for all regmasks")
Cc: Torin Cooper-Bennun <[email protected]>
Cc: Chandrasekar Ramakrishnan <[email protected]>
Signed-off-by: Hussein Alasadi <[email protected]>
[mkl: update patch description, update indention]
Signed-off-by: Marc Kleine-Budde <[email protected]>

MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo

This patch fixes the abbreviated name of the Microchip CAN BUS
Analyzer Tool.

Fixes: 8a7b46fa7902 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
Link: https://lore.kernel.org/r/cc4831cb1c8759c15fb32c21fd326e831183733d.1627876781.git.baruch@tkos.co.il
Signed-off-by: Baruch Siach <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

net/mlx5: Fix return value from tracer initialization

Check return value of mlx5_fw_tracer_start(), set error path and fix
return value of mlx5_fw_tracer_init() accordingly.

Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Synchronize correct IRQ when destroying CQ

The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.

This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.

As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: TC, Fix error handling memory leak

Free the offload sample action on error.

Fixes: f94d6389f6a8 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Destroy pool->mutex

Destroy pool->mutex when we destroy the pool.

Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Set all field of mlx5_irq before inserting it to the xarray

Currently irq->pool is set after the irq is insert to the xarray.
Set irq->pool before the irq is inserted to the xarray.

Fixes: 71e084e26414 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Fix order of functions in mlx5_irq_detach_nb()

Change order of functions in mlx5_irq_detach_nb() so it will be
a mirror of mlx5_irq_attach_nb.

Fixes: 71e084e26414 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Block switchdev mode while devlink traps are active

Since switchdev mode can't support devlink traps, verify there are
no active devlink traps before moving eswitch to switchdev mode. If
there are active traps, prevent the switchdev mode configuration.

Fixes: eb3862a0525d ("net/mlx5e: Enable traps according to link state")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free

mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
free the outstanding descriptors, which relies on
mlx5e_page_release_dynamic and page_pool_release_page. However,
page_pool_destroy is already called by this point, because
mlx5e_close_rq runs before mlx5e_close_xdpsq.

This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
mlx5e_close_rq.

The commit cited below started calling page_pool_destroy directly from
the driver. Previously, the page pool was destroyed under a call_rcu
from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
until after the XDPSQ is cleaned up.

Fixes: 1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Bridge, fix ageing time

Ageing time is not converted from clock_t to jiffies which results
incorrect ageing timeout calculation in workqueue update task. Fix it by
applying clock_t_to_jiffies() to provided value.

Fixes: c636a0f0f3f0 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Avoid creating tunnel headers for local route

It could be local and remote are on the same machine and the route
result will be a local route which will result in creating encap id
with src/dst mac address of 0.

Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: DR, Add fail on error check on decap

While processing encapsulated packet on RX, one of the fields that is
checked is the inner packet length. If the length as specified in the header
doesn't match the actual inner packet length, the packet is invalid
and should be dropped. However, such packet caused the NIC to hang.

This patch turns on a 'fail_on_error' HW bit which allows HW to drop
such an invalid packet while processing RX packet and trying to decap it.

Fixes: ad17dc8cf910 ("net/mlx5: DR, Move STEv0 action apply logic")
Signed-off-by: Alex Vesker <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Don't skip subfunction cleanup in case of error in module init

Clean SF resources if mlx5 eth failed to initialize.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Merge branch 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"One commit to fix a possible A-A deadlock around u64_stats_sync on
  32bit machines caused by updating it without disabling IRQ when it may
  be read from IRQ context"

* 'for-5.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync

Merge branch 'add-frag-page-support-in-page-pool'

Yunsheng Lin says:

====================
add frag page support in page pool

This patchset adds frag page support in page pool and
enable skb's page frag recycling based on page pool in
hns3 drvier.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: hns3: support skb's frag page recycling based on page pool

This patch adds skb's frag page recycling support based on
the frag page support in page pool.

The performance improves above 10~20% for single thread iperf
TCP flow with IOMMU disabled when iperf server and irq/NAPI
have a different CPU.

The performance improves about 135%(14Gbit to 33Gbit) for single
thread iperf TCP flow when IOMMU is in strict mode and iperf
server shares the same cpu with irq/NAPI.

Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

page_pool: add frag page recycling support in page pool

Currently page pool only support page recycling when there
is only one user of the page, and the split page reusing
implemented in the most driver can not use the page pool as
bing-pong way of reusing requires the multi user support in
page pool.

Those reusing or recycling has below limitations:
1. page from page pool can only be used be one user in order
   for the page recycling to happen.
2. Bing-pong way of reusing in most driver does not support
   multi desc using different part of the same page in order
   to save memory.

So add multi-users support and frag page recycling in page
pool to overcome the above limitation.

Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

page_pool: add interface to manipulate frag count in page pool

For 32 bit systems with 64 bit dma, dma_addr[1] is used to
store the upper 32 bit dma addr, those system should be rare
those days.

For normal system, the dma_addr[1] in 'struct page' is not
used, so we can reuse dma_addr[1] for storing frag count,
which means how many frags this page might be splited to.

In order to simplify the page frag support in the page pool,
the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate
the 32 bit systems with 64 bit dma, and the page frag support
in page pool is disabled for such system.

The newly added page_pool_set_frag_count() is called to reserve
the maximum frag count before any page frag is passed to the
user. The page_pool_atomic_sub_frag_count_return() is called
when user is done with the page frag.

Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

page_pool: keep pp info as long as page pool owns the page

Currently, page->pp is cleared and set everytime the page
is recycled, which is unnecessary.

So only set the page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.

This is also a preparation to support allocating frag page
in page pool.

Reviewed-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bareudp: Fix invalid read beyond skb's linear data

Data beyond the UDP header might not be part of the skb's linear data.
Use skb_copy_bits() instead of direct access to skb->data+X, so that
we read the correct bytes even on a fragmented skb.

Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.")
Signed-off-by: Guillaume Nault <[email protected]>
Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.1628264975.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

net: openvswitch: fix kernel-doc warnings in flow.c

Repair kernel-doc notation in a few places to make it conform to
the expected format.

Fixes the following kernel-doc warnings:

flow.c:296: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Parse vlan tag from vlan header.
flow.c:296: warning: missing initial short description on line:
* Parse vlan tag from vlan header.
flow.c:537: warning: No description found for return value of 'key_extract_l3l4'
flow.c:769: warning: No description found for return value of 'key_extract'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Pravin B Shelar <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

psample: Add a fwd declaration for skbuff

Without this there is a warning if source files include psample.h
before skbuff.h or doesn't include it at all.

Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling")
Signed-off-by: Roi Dayan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: update Vineet's email address

I'll be leaving Synopsys shortly, but will continue to handle maintenance
for the transition period.

Signed-off-by: Vineet Gupta <[email protected]>

selftests/bpf: Add tests for XDP bonding

Add a test suite to test XDP bonding implementation over a pair of
veth devices.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Fix xdp_tx.c prog section name

The program type cannot be deduced from 'tx' which causes an invalid
argument error when trying to load xdp_tx.o using the skeleton.
Rename the section name to "xdp" so that libbpf can deduce the type.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net, core: Allow netdev_lower_get_next_private_rcu in bh context

For the XDP bonding slave lookup to work in the NAPI poll context in which
the redudant rcu_read_lock() has been removed we have to follow the same
approach as in 694cea395fde ("bpf: Allow RCU-protected lookups to happen
from bh context") and modify the WARN_ON to also check rcu_read_lock_bh_held().

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, devmap: Exclude XDP broadcast to master device

If the ingress device is bond slave, do not broadcast back through it or
the bond master.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net, bonding: Add XDP support to the bonding driver

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when a XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring. The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters. An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10". On top
of that, for ice, further work is in progress on improving the XDP_TX
numbers [4].

-----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
-----------------------
with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
-----------------------
with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
--------------------------------------------------------------

RSS: Receive Side Scaling, e.g. the packets were sent to a range of
destination IPs.

  [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
  [2]: https://lore.kernel.org/bpf/20210601113236 [email protected]/T/#t
  [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
  [4]: https://lore.kernel.org/bpf/20210805230046 [email protected]/T/#t

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: Maciej Fijalkowski <[email protected]>
Cc: Magnus Karlsson <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net, core: Add support for XDP redirection to slave device

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after BPF program run when the ingress device
is a bond slave.

The dev_xdp_prog_count is exposed so that slave devices can be checked
for loaded XDP programs in order to avoid the situation where both
bond master and slave have programs loaded according to xdp_state.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net, bonding: Refactor bond_xmit_hash for use with xdp_buff

In preparation for adding XDP support to the bonding driver
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ucounts: add missing data type changes

commit f9c82a4ea89c3 ("Increase size of ucounts to atomic_long_t")
changed the data type of ucounts/ucounts_max to long, but missed to
adjust a few other places. This is noticeable on big endian platforms
from user space because the /proc/sys/user/max_*_names files all
contain 0.

v4 - Made the min and max constants long so the sysctl values
are actually settable on little endian machines.
-- EWB

Fixes: f9c82a4ea89c ("Increase size of ucounts to atomic_long_t")
Signed-off-by: Sven Schnelle <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Linux Kernel Functional Testing <[email protected]>
Acked-by: Alexey Gladkov <[email protected]>
v1: https://lkml.kernel.org/r/20210721115800 [email protected]
v2: https://lkml.kernel.org/r/20210721125233.1041429 [email protected]
v3: https://lkml.kernel.org/r/20210730062854.3601635 [email protected]
Link: https://lkml.kernel.org/r/8735rijqlv.fsf_-_@disp2133
Signed-off-by: Eric W. Biederman <[email protected]>

bpf: Add _kernel suffix to internal lockdown_bpf_read

Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming
more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>

iavf: Set RSS LUT and key in reset handle path

iavf driver should set RSS LUT and key unconditionally in reset
path. Currently, the driver does not do that. This patch fixes
this issue.

Fixes: 2c86ac3c7079 ("i40evf: create a generic config RSS function")
Signed-off-by: Md Fahad Iqbal Polash <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: don't remove netdev->dev_addr from uc sync list

In some circumstances, such as with bridging, it's possible that the
stack will add the device's own MAC address to its unicast address list.

If, later, the stack deletes this address, the driver will receive a
request to remove this address.

The driver stores its current MAC address as part of the VSI MAC filter
list instead of separately. So, this causes a problem when the device's
MAC address is deleted unexpectedly, which results in traffic failure in
some cases.

The following configuration steps will reproduce the previously
mentioned problem:

> ip link set eth0 up
> ip link add dev br0 type bridge
> ip link set br0 up
> ip addr flush dev eth0
> ip link set eth0 master br0
> echo 1 > /sys/class/net/br0/bridge/vlan_filtering
> modprobe -r veth
> modprobe -r bridge
> ip addr add 192.168.1.100/24 dev eth0

The following ping command fails due to the netdev->dev_addr being
deleted when removing the bridge module.
> ping <link partner>

Fix this by making sure to not delete the netdev->dev_addr during MAC
address sync. After fixing this issue it was noticed that the
netdev_warn() in .set_mac was overly verbose, so make it at
netdev_dbg().

Also, there is a possibility of a race condition between .set_mac and
.set_rx_mode. Fix this by calling netif_addr_lock_bh() and
netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr
is going to be updated in .set_mac.

Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Liang Li <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Stop processing VF messages during teardown

When VFs are setup and torn down in quick succession, it is possible
that a VF is torn down by the PF while the VF's virtchnl requests are
still in the PF's mailbox ring. Processing the VF's virtchnl request
when the VF itself doesn't exist results in undefined behavior. Fix
this by adding a check to stop processing virtchnl requests when VF
teardown is in progress.

Fixes: ddf30f7ff840 ("ice: Add handler to configure SR-IOV")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Prevent probing virtual functions

The userspace utility "driverctl" can be used to change/override the
system's default driver choices. This is useful in some situations
(buggy driver, old driver missing a device ID, trying a workaround,
etc.) where the user needs to load a different driver.

However, this is also prone to user error, where a driver is mapped
to a device it's not designed to drive. For example, if the ice driver
is mapped to driver iavf devices, the ice driver crashes.

Add a check to return an error if the ice driver is being used to
probe a virtual function.

Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

devlink: Fix port_type_set function pointer check

Fix a typo when checking existence of port_type_set function pointer.

Fixes: 82564f6c706a ("devlink: Simplify devlink port API calls")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fec: fix build error for ARCH m68k

reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
make.cross ARCH=m68k  m5272c3_defconfig
make.cross ARCH=m68k

   drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_eee_mode_set':
>> drivers/net/ethernet/freescale/fec_main.c:2758:33: error: 'FEC_LPI_SLEEP' undeclared (first use in this function); did you mean 'FEC_ECR_SLEEP'?
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |                                 ^~~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:2: note: in expansion of macro 'writel'
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |  ^~~~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:33: note: each undeclared identifier is reported only once for each function it appears in
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |                                 ^~~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2758:2: note: in expansion of macro 'writel'
    2758 |  writel(sleep_cycle, fep->hwp + FEC_LPI_SLEEP);
         |  ^~~~~~
>> drivers/net/ethernet/freescale/fec_main.c:2759:32: error: 'FEC_LPI_WAKE' undeclared (first use in this function)
    2759 |  writel(wake_cycle, fep->hwp + FEC_LPI_WAKE);
         |                                ^~~~~~~~~~~~
   arch/m68k/include/asm/io_no.h:25:66: note: in definition of macro '__raw_writel'
      25 | #define __raw_writel(b, addr) (void)((*(__force volatile u32 *) (addr)) = (b))
         |                                                                  ^~~~
   drivers/net/ethernet/freescale/fec_main.c:2759:2: note: in expansion of macro 'writel'
    2759 |  writel(wake_cycle, fep->hwp + FEC_LPI_WAKE);
         |  ^~~~~~

This patch adds register definition for M5272 platform to pass build.

Fixes: b82f8c3f1409 ("net: fec: add eee mode tx lpi support")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: act_mirred: Reset ct info when mirror/redirect skb

When mirror/redirect a skb to a different port, the ct info should be reset
for reclassification. Or the pkts will match unexpected rules. For example,
with following topology and commands:

    -----------
              |
       veth0 -+-------
              |
       veth1 -+-------
              |
   ------------

tc qdisc add dev veth0 clsact
# The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop

ping <remove ip via veth0> &
tc -s filter show dev veth1 ingress

With command 'tc -s filter show', we can find the pkts were dropped on
veth1.

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Roi Dayan <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: Allow SMC-D 1MB DMB allocations

Commit a3fe3d01bd0d7 ("net/smc: introduce sg-logic for RMBs") introduced
a restriction for RMB allocations as used by SMC-R. However, SMC-D does
not use scatter-gather lists to back its DMBs, yet it was limited by
this restriction, still.
This patch exempts SMC, but limits allocations to the maximum RMB/DMB
size respectively.

Signed-off-by: Stefan Raspl <[email protected]>
Signed-off-by: Guvenc Gulce <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'smc-fixes'

Guvenc Gulce says:

====================
net/smc: fixes 2021-08-09

please apply the following patch series for smc to netdev's net tree.
One patch fixes invalid connection counting for links and the other
one fixes an access to an already cleared link.
====================

Signed-off-by: David S. Miller <[email protected]>

net/smc: Correct smc link connection counter in case of smc client

SMC clients may be assigned to a different link after the initial
connection between two peers was established. In such a case,
the connection counter was not correctly set.

Update the connection counter correctly when a smc client connection
is assigned to a different smc link.

Fixes: 07d51580ff65 ("net/smc: Add connection counters for links")
Signed-off-by: Guvenc Gulce <[email protected]>
Tested-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: fix wait on already cleared link

There can be a race between the waiters for a tx work request buffer
and the link down processing that finally clears the link. Although
all waiters are woken up before the link is cleared there might be
waiters which did not yet get back control and are still waiting.
This results in an access to a cleared wait queue head.

Fix this by introducing atomic reference counting around the wait calls,
and wait with the link clear processing until all waiters have finished.
Move the work request layer related calls into smc_wr.c and set the
link state to INACTIVE before calling smcr_link_clear() in
smc_llc_srv_add_link().

Fixes: 15e1b99aadfb ("net/smc: no WR buffer wait for terminating link group")
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: Guvenc Gulce <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

devlink: Set device as early as possible

All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[    8.758862]  devlink_nl_param_fill+0x2e8/0xe50
[    8.760305]  devlink_param_notify+0x6d/0x180
[    8.760435]  __devlink_params_register+0x2f1/0x670
[    8.760558]  devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

wwan: mhi: Fix missing spin_lock_init() in mhi_mbim_probe()

The driver allocates the spinlock but not initialize it.
Use spin_lock_init() on it to initialize it correctly.

Fixes: aa730a9905b7 ("net: wwan: Add MHI MBIM network driver")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Sergey Ryazanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: ti: cpsw: fix min eth packet size for non-switch use-cases

The CPSW switchdev driver inherited fix from commit 9421c9015047 ("net:
ethernet: ti: cpsw: fix min eth packet size") which changes min TX packet
size to 64bytes (VLAN_ETH_ZLEN, excluding ETH_FCS). It was done to fix HW
packed drop issue when packets are sent from Host to the port with PVID and
un-tagging enabled. Unfortunately this breaks some other non-switch
specific use-cases, like:
- [1] CPSW port as DSA CPU port with DSA-tag applied at the end of the
packet
- [2] Some industrial protocols, which expects min TX packet size 60Bytes
(excluding FCS).

Fix it by configuring min TX packet size depending on driver mode
- 60Bytes (ETH_ZLEN) for multi mac (dual-mac) mode
- 64Bytes (VLAN_ETH_ZLEN) for switch mode
and update it during driver mode change and annotate with
READ_ONCE()/WRITE_ONCE() as it can be read by napi while writing.

[1] https://lore.kernel.org/netdev/20210531124051.GA15218@cephalopod/
[2] https://e2e.ti.com/support/arm/sitara_arm/f/791/t/701669

Cc: [email protected]
Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Reported-by: Ben Hutchings <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'iucv-next'

Karsten Graul says:

====================
net/iucv: updates 2021-08-09

Please apply the following iucv patches to netdev's net-next tree.

Remove the usage of register asm statements and replace deprecated
CPU-hotplug functions with the current version.
Use use consume_skb() instead of kfree_skb() to avoid flooding
dropwatch with false-positives, and 2 patches with cleanups.
====================

Signed-off-by: David S. Miller <[email protected]>

net/iucv: Replace deprecated CPU-hotplug functions.

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: Julian Wiedmann <[email protected]>
Cc: Karsten Graul <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/iucv: get rid of register asm usage

Using register asm statements has been proven to be very error prone,
especially when using code instrumentation where gcc may add function
calls, which clobbers register contents in an unexpected way.

Therefore get rid of register asm statements in iucv code, even though
there is currently nothing wrong with it. This way we know for sure
that the above mentioned bug class won't be introduced here.

Acked-by: Karsten Graul <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/af_iucv: remove wrappers around iucv (de-)registration

These wrappers are just unnecessary obfuscation.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/af_iucv: clean up a try_then_request_module()

Use IS_ENABLED(CONFIG_IUCV) to determine whether the iucv_if symbol
is available, and let depmod deal with the module dependency.

This was introduced back with commit 6fcd61f7bf5d ("af_iucv: use
loadable iucv interface"). And to avoid sprinkling IS_ENABLED() over
all the code, we're keeping the indirection through pr_iucv->...().

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/af_iucv: support drop monitoring

Change the good paths to use consume_skb() instead of kfree_skb(). This
avoids flooding dropwatch with false-positives.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

page_pool: mask the page->signature before the checking

As mentioned in commit c07aea3ef4d4 ("mm: add a signature in
struct page"):
"The page->signature field is aliased to page->lru.next and
page->compound_head."

And as the comment in page_is_pfmemalloc():
"lru.next has bit 1 set if the page is allocated from the
pfmemalloc reserves. Callers may simply overwrite it if they
do not need to preserve that information."

The page->signature is OR’ed with PP_SIGNATURE when a page is
allocated in page pool, see __page_pool_alloc_pages_slow(),
and page->signature is checked directly with PP_SIGNATURE in
page_pool_return_skb_page(), which might cause resoure leaking
problem for a page from page pool if bit 1 of lru.next is set
for a pfmemalloc page. What happens here is that the original
pp->signature is OR'ed with PP_SIGNATURE after the allocation
in order to preserve any existing bits(such as the bit 1, used
to indicate a pfmemalloc page), so when those bits are present,
those page is not considered to be from page pool and the DMA
mapping of those pages will be left stale.

As bit 0 is for page->compound_head, So mask both bit 0/1 before
the checking in page_pool_return_skb_page(). And we will return
those pfmemalloc pages back to the page allocator after cleaning
up the DMA mapping.

Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")
Reviewed-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dccp: add do-while-0 stubs for dccp_pr_debug macros

GCC complains about empty macros in an 'if' statement, so convert
them to 'do {} while (0)' macros.

Fixes these build warnings:

net/dccp/output.c: In function 'dccp_xmit_packet':
../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
283 | dccp_pr_debug("transmit_skb() returned err=%d\n", err);
net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
163 | (unsigned long long)seqno, state);

Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Fixes: 380240864451 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Gerrit Renker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-fast-ageing'

Vladimir Oltean says:

====================
DSA fast ageing fixes/improvements

These are 2 small improvements brought to the DSA fast ageing changes
merged earlier today.

Patch 1 restores the behavior for DSA drivers that don't implement the
.port_bridge_flags function (I don't think there is any breakage due
to the new behavior, but just to be sure). This came as a result of
Andrew's review.

Patch 2 reduces the number of fast ages of a port from 2 to 1 when it
leaves a bridge.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: avoid fast ageing twice when port leaves a bridge

Drivers that support both the toggling of address learning and dynamic
FDB flushing (mv88e6xxx, b53, sja1105) currently need to fast-age a port
twice when it leaves a bridge:

- once, when del_nbp() calls br_stp_disable_port() which puts the port
in the BLOCKING state
- twice, when dsa_port_switchdev_unsync_attrs() calls
dsa_port_clear_brport_flags() which disables address learning

The knee-jerk reaction might be to say "dsa_port_clear_brport_flags does
not need to fast-age the port at all", but the thing is, we still need
both code paths to flush the dynamic FDB entries in different situations.
When a DSA switch port leaves a bonding/team interface that is (still) a
bridge port, no del_nbp() will be called, so we rely on
dsa_port_clear_brport_flags() function to restore proper standalone port
functionality with address learning disabled.

So the solution is just to avoid double the work when both code paths
are called in series. Luckily, DSA already caches the STP port state, so
we can skip flushing the dynamic FDB when we disable address learning
and the STP state is one where no address learning takes place at all.
Under that condition, not flushing the FDB is safe because there is
supposed to not be any dynamic FDB entry at all (they were flushed
during the transition towards that state, and none were learned in the
meanwhile).

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: still fast-age ports joining a bridge if they can't configure learning

Commit 39f32101543b ("net: dsa: don't fast age standalone ports")
assumed that all standalone ports disable address learning, but if the
switch driver implements .port_fast_age but not .port_bridge_flags (like
ksz9477, ksz8795, lantiq_gswip, lan9303), then that might not actually
be true.

So whereas before, the bridge temporarily walking us through the
BLOCKING STP state meant that the standalone ports had a checkpoint to
flush their baggage and start fresh when they join a bridge, after that
commit they no longer do.

Restore the old behavior for these drivers by checking if the switch can
toggle address learning. If it can't, disregard the "do_fast_age"
argument and unconditionally perform fast ageing on STP state changes.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netfilter: x_tables: never register tables by default

For historical reasons x_tables still register tables by default in the
initial namespace.
Only newly created net namespaces add the hook on demand.

This means that the init_net always pays hook cost, even if no filtering
rules are added (e.g. only used inside a single netns).

Note that the hooks are added even when 'iptables -L' is called.
This is because there is no way to tell 'iptables -A' and 'iptables -L'
apart at kernel level.

The only solution would be to register the table, but delay hook
registration until the first rule gets added (or policy gets changed).

That however means that counters are not hooked either, so 'iptables -L'
would always show 0-counters even when traffic is flowing which might be
unexpected.

This keeps table and hook registration consistent with what is already done
in non-init netns: first iptables(-save) invocation registers both table
and hooks.

This applies the same solution adopted for ebtables.
All tables register a template that contains the l3 family, the name
and a constructor function that is called when the initial table has to
be added.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

drm/i915/gvt: Fix cached atomics setting for Windows VM

We've seen recent regression with host and windows VM running
simultaneously that cause gpu hang or even crash. Finally bisect to
commit 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"),
which seems cached atomics behavior difference caused regression
issue.

This tries to add new scratch register handler and add those in mmio
save/restore list for context switch. No gpu hang produced with this one.

Cc: [email protected] # 5.12+
Cc: "Xu, Terrence" <[email protected]>
Cc: "Bloomfield, Jon" <[email protected]>
Cc: "Ekstrand, Jason" <[email protected]>
Reviewed-by: Colin Xu <[email protected]>
Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
Signed-off-by: Zhenyu Wang <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]

ALSA: pcm: Fix mmap breakage without explicit buffer setup

The recent fix c4824ae7db41 ("ALSA: pcm: Fix mmap capability check")
restricts the mmap capability only to the drivers that properly set up
the buffers, but it caused a regression for a few drivers that manage
the buffer on its own way.

For those with UNKNOWN buffer type (i.e. the uninitialized / unused
substream->dma_buffer), just assume that the driver handles the mmap
properly and blindly trust the hardware info bit.

Fixes: c4824ae7db41 ("ALSA: pcm: Fix mmap capability check")
Reported-and-tested-by: Jeff Woods <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Linux 5.14-rc5

Merge branch 'sja1105-fast-ageing'

Vladimir Oltean says:

====================
Fast ageing support for SJA1105 DSA driver

While adding support for flushing dynamically learned FDB entries in the
sja1105 driver, I noticed a few things that could be improved in DSA.
Most notably, drivers could omit a fast age when address learning is
turned off, which might mean that ports leaving a bridge and becoming
standalone could still have FDB entries pointing towards them. Secondly,
when DSA fast ages a port after the 'learning' flag has been turned off,
the software bridge still has the dynamically learned 'master' FDB
entries installed, and those should be deleted too.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: add FDB fast ageing support

Delete the dynamically learned FDB entries when the STP state changes
and when address learning is disabled.

On sja1105 there is no shorthand SPI command for this, so we need to
walk through the entire FDB to delete.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: rely on DSA core tracking of port learning state

Now that DSA keeps track of the port learning state, it becomes
superfluous to keep an additional variable with this information in the
sja1105 driver. Remove it.

The DSA core's learning state is present in struct dsa_port *dp.
To avoid the antipattern where we iterate through a DSA switch's
ports and then call dsa_to_port to obtain the "dp" reference (which is
bad because dsa_to_port iterates through the DSA switch tree once
again), just iterate through the dst->ports and operate on those
directly.

The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA
does not touch the learning state of those ports - drivers are free to
do what they wish on them. Mark that information with a comment in
struct dsa_port and let sja1105 set dp->learning for cascade ports.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: flush the dynamic FDB of the software bridge when fast ageing a port

Currently, when DSA performs fast ageing on a port, 'bridge fdb' shows
us that the 'self' entries (corresponding to the hardware bridge, as
printed by dsa_slave_fdb_dump) are deleted, but the 'master' entries
(corresponding to the software bridge) aren't.

Indeed, searching through the bridge driver, neither the
brport_attr_learning handler nor the IFLA_BRPORT_LEARNING handler call
br_fdb_delete_by_port. However, br_stp_disable_port does, which is one
of the paths which DSA uses to trigger a fast ageing process anyway.

There is, however, one other very promising caller of
br_fdb_delete_by_port, and that is the bridge driver's handler of the
SWITCHDEV_FDB_FLUSH_TO_BRIDGE atomic notifier. Currently the s390/qeth
HiperSockets card driver is the only user of this.

I can't say I understand that driver's architecture or interaction with
the bridge, but it appears to not be a switchdev driver in the traditional
sense of the word. Nonetheless, the mechanism it provides is a useful
way for DSA to express the fact that it performs fast ageing too, in a
way that does not change the existing behavior for other drivers.

Cc: Alexandra Winter <[email protected]>
Cc: Julian Wiedmann <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: don't fast age bridge ports with learning turned off

On topology changes, stations that were dynamically learned on ports
that are no longer part of the active topology must be flushed - this is
described by clause "17.11 Updating learned station location information"
of IEEE 802.1D-2004.

However, when address learning on the bridge port is turned off in the
first place, there is nothing to flush, so skip a potentially expensive
operation.

We can finally do this now since DSA is aware of the learning state of
its bridged ports.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: centralize fast ageing when address learning is turned off

Currently DSA leaves it down to device drivers to fast age the FDB on a
port when address learning is disabled on it. There are 2 reasons for
doing that in the first place:

- when address learning is disabled by user space, through
  IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
  space typically wants to achieve is to operate in a mode with no
  dynamic FDB entry on that port. But if the port is already up, some
  addresses might have been already learned on it, and it seems silly to
  wait for 5 minutes for them to expire until something useful can be
  done.

- when a port leaves a bridge and becomes standalone, DSA turns off
  address learning on it. This also has the nice side effect of flushing
  the dynamically learned bridge FDB entries on it, which is a good idea
  because standalone ports should not have bridge FDB entries on them.

We let drivers manage fast ageing under this condition because if DSA
were to do it, it would need to track each port's learning state, and
act upon the transition, which it currently doesn't.

But there are 2 reasons why doing it is better after all:

- drivers might get it wrong and not do it (see b53_port_set_learning)

- we would like to flush the dynamic entries from the software bridge
  too, and letting drivers do that would be another pain point

So track the port learning state and trigger a fast age process
automatically within DSA.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'timers-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A single timer fix:

   - Prevent a memory ordering issue in the timer expiry code which
     makes it possible to observe falsely that the callback has been
     executed already while that's not the case, which violates the
     guarantee of del_timer_sync()"

* tag 'timers-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Move clearing of base::timer_running under base:: Lock

Merge tag 'sched-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix:

   - Prevent a double enqueue caused by rt_effective_prio() being
     invoked twice in __sched_setscheduler()"

* tag 'sched-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix double enqueue caused by rt_effective_prio

Merge tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
"A set of perf fixes:

   - Correct the permission checks for perf event which send SIGTRAP to
     a different process and clean up that code to be more readable.

   - Prevent an out of bound MSR access in the x86 perf code which
     happened due to an incomplete limiting to the actually available
     hardware counters.

   - Prevent access to the AMD64_EVENTSEL_HOSTONLY bit when running
     inside a guest.

   - Handle small core counter re-enabling correctly by issuing an ACK
     right before reenabling it to prevent a stale PEBS record being
     kept around"

* tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Apply mid ACK for small core
  perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
  perf/x86: Fix out of bound MSR access
  perf: Refactor permissions check into perf_check_permission()
  perf: Fix required permissions if sigtrap is requested

Merge tag 'char-misc-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.14-rc5.

  They resolve a few regressions that people reported:

   - acrn driver fix

   - fpga driver fix

   - interconnect tiny driver fixes

  All have been in linux-next for a while with no reported issues"

* tag 'char-misc-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  interconnect: Fix undersized devress_alloc allocation
  interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate
  interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes
  fpga: dfl: fme: Fix cpu hotplug issue in performance reporting
  virt: acrn: Do hcall_destroy_vm() before resource release
  interconnect: Always call pre_aggregate before aggregate
  interconnect: Zero initial BW after sync-state

Merge tag 'driver-core-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are three tiny driver core and firmware loader fixes for
  5.14-rc5. They are:

   - driver core fix for when probing fails

   - firmware loader fixes for reported problems.

  All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware_loader: fix use-after-free in firmware_fallback_sysfs
  firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs_fallback
  drivers core: Fix oops when driver probe fails

Merge tag 'staging-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are a few small staging driver fixes for 5.14-rc5 to resolve some
  reported problems. They include:

   - mt7621 driver fix

   - rtl8723bs driver fixes

   - rtl8712 driver fixes.

  Nothing major, just small problems resolved.

  All have been in linux-next for a while with no reported issues"

* tag 'staging-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: mt7621-pci: avoid to re-disable clock for those pcies not in use
  staging: rtl8712: error handling refactoring
  staging: rtl8712: get rid of flush_scheduled_work
  staging: rtl8723bs: select CONFIG_CRYPTO_LIB_ARC4
  staging: rtl8723bs: Fix a resource leak in sd_int_dpc

Merge tag 'tty-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 5.14-rc5 to resolve a
  number of reported problems.

  They include:

   - mips serial driver fixes

   - 8250 driver fixes for reported problems

   - fsl_lpuart driver fixes

   - other tiny driver fixes

  All have been in linux-next for a while with no reported problems"

* tag 'tty-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts.
  serial: 8250_mtk: fix uart corruption issue when rx power off
  tty: serial: fsl_lpuart: fix the wrong return value in lpuart32_get_mctrl
  serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver
  serial: 8250: fix handle_irq locking
  serial: tegra: Only print FIFO error message when an error occurs
  MIPS: Malta: Do not byte-swap accesses to the CBUS UART
  serial: 8250: Mask out floating 16/32-bit bus bits
  serial: max310x: Unprepare and disable clock in error path

Merge tag 'usb-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for 5.14-rc5. They resolve a
  number of small reported issues, including:

   - cdnsp driver fixes

   - usb serial driver fixes and device id updates

   - usb gadget hid fixes

   - usb host driver fixes

   - usb dwc3 driver fixes

   - other usb gadget driver fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
  usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events
  usb: dwc3: gadget: Avoid runtime resume if disabling pullup
  usb: dwc3: gadget: Use list_replace_init() before traversing lists
  USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2
  USB: serial: pl2303: fix GT type detection
  USB: serial: option: add Telit FD980 composition 0x1056
  USB: serial: pl2303: fix HX type detection
  USB: serial: ch341: fix character loss at high transfer rates
  usb: cdnsp: Fix the IMAN_IE_SET and IMAN_IE_CLEAR macro
  usb: cdnsp: Fixed issue with ZLP
  usb: cdnsp: Fix incorrect supported maximum speed
  usb: cdns3: Fixed incorrect gadget state
  usb: gadget: f_hid: idle uses the highest byte for duration
  Revert "thunderbolt: Hide authorized attribute if router does not support PCIe tunnels"
  usb: otg-fsm: Fix hrtimer list corruption
  usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses
  usb: musb: Fix suspend and resume issues for PHYs on I2C and SPI
  usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers
  usb: gadget: f_hid: fixed NULL pointer dereference
  usb: gadget: remove leaked entry from udc driver list
  ...

ppp: Fix generating ppp unit id when ifname is not specified

When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has
to choose interface name as this ioctl API does not support specifying it.

Kernel in this case register new interface with name "ppp<id>" where <id>
is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This
applies also in the case when registering new ppp interface via rtnl
without supplying IFLA_IFNAME.

PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel
assign to ppp interface, in case this ppp id is not already used by other
ppp interface.

In case user does not specify ppp unit id then kernel choose the first free
ppp unit id. This applies also for case when creating ppp interface via
rtnl method as it does not provide a way for specifying own ppp unit id.

If some network interface (does not have to be ppp) has name "ppp<id>"
with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails.

And registering new ppp interface is not possible anymore, until interface
which holds conflicting name is renamed. Or when using rtnl method with
custom interface name in IFLA_IFNAME.

As list of allocated / used ppp unit ids is not possible to retrieve from
kernel to userspace, userspace has no idea what happens nor which interface
is doing this conflict.

So change the algorithm how ppp unit id is generated. And choose the first
number which is not neither used as ppp unit id nor in some network
interface with pattern "ppp<id>".

This issue can be simply reproduced by following pppd call when there is no
ppp interface registered and also no interface with name pattern "ppp<id>":

pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty"

Or by creating the one ppp interface (which gets assigned ppp unit id 0),
renaming it to "ppp1" and then trying to create a new ppp interface (which
will always fails as next free ppp unit id is 1, but network interface with
name "ppp1" exists).

This patch fixes above described issue by generating new and new ppp unit
id until some non-conflicting id with network interfaces is generated.

Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>

ppp: Fix generating ifname when empty IFLA_IFNAME is specified

IFLA_IFNAME is nul-term string which means that IFLA_IFNAME buffer can be
larger than length of string which contains.

Function __rtnl_newlink() generates new own ifname if either IFLA_IFNAME
was not specified at all or userspace passed empty nul-term string.

It is expected that if userspace does not specify ifname for new ppp netdev
then kernel generates one in format "ppp<id>" where id matches to the ppp
unit id which can be later obtained by PPPIOCGUNIT ioctl.

And it works in this way if IFLA_IFNAME is not specified at all. But it
does not work when IFLA_IFNAME is specified with empty string.

So fix this logic also for empty IFLA_IFNAME in ppp_nl_newlink() function
and correctly generates ifname based on ppp unit identifier if userspace
did not provided preferred ifname.

Without this patch when IFLA_IFNAME was specified with empty string then
kernel created a new ppp interface in format "ppp<id>" but id did not
match ppp unit id returned by PPPIOCGUNIT ioctl. In this case id was some
number generated by __rtnl_newlink() function.

Signed-off-by: Pali Rohár <[email protected]>
Fixes: bb8082f69138 ("ppp: build ifname using unit identifier for rtnl based devices")
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bnxt_en-ptp-fixes'

Michael Chan says:

====================
bnxt_en: PTP fixes

This series includes 2 fixes for the PTP feature.  Update to the new
firmware interface so that the driver can pass the PTP sequence number
header offset of TX packets to the firmware.  This is needed for all
PTP packet types (v1, v2, with or without VLAN) to work.  The 2nd
fix is to use a different register window to read the PHC to avoid
conflict with an older Broadcom tool.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Use register window 6 instead of 5 to read the PHC

Some older Broadcom debug tools use window 5 and may conflict, so switch
to use window 6 instead.

Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Update firmware call to retrieve TX PTP timestamp

New firmware interface requires the PTP sequence ID header offset to
be passed to the firmware to properly find the matching timestamp
for all protocols.

Fixes: 83bb623c968e ("bnxt_en: Transmit and retrieve packet timestamps")
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Update firmware interface to 1.10.2.52

The key change is the firmware call to retrieve the PTP TX timestamp.
The header offset for the PTP sequence number field is now added.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

once: Fix panic when module unload

DO_ONCE
DEFINE_STATIC_KEY_TRUE(___once_key);
__do_once_done
  once_disable_jump(once_key);
    INIT_WORK(&w->work, once_deferred);
    struct once_work *w;
    w->key = key;
    schedule_work(&w->work);                     module unload
                                                   //*the key is
destroy*
process_one_work
  once_deferred
    BUG_ON(!static_key_enabled(work->key));
       static_key_count((struct static_key *)x)    //*access key, crash*

When module uses DO_ONCE mechanism, it could crash due to the above
concurrency problem, we could reproduce it with link[1].

Fix it by add/put module refcount in the once work process.

[1] https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/

Cc: Hannes Frederic Sowa <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Reported-by: Minmin chen <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: horizon: Fix spelling mistakes in TX comment

It's "must not", not "musn't", meaning "shall not".
Let's fix that.

Suggested-by: Joe Perches <[email protected]>
Signed-off-by: Jun Miao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ptp: Fix possible memory leak caused by invalid cast

Fixes possible leak of PTP virtual clocks.

The number of PTP virtual clocks to be unregistered is passed as
'u32', but the function that unregister the devices handles that as
'u8'.

Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion")
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

devlink: Simplify devlink port API calls

Devlink port already has pointer to the devlink instance and all API
calls that forward these devlink ports to the drivers perform same
"devlink_port->devlink" assignment before actual call.

This patch removes useless parameter and allows us in the future
to create specific devlink_port_ops to manage user space access with
reliable ops assignment.

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>