Git Repo - linux.git/log
Jiri Olsa [Fri, 15 Dec 2023 23:05:02 +0000 (00:05 +0100)]
bpf: Add missing BPF_LINK_TYPE invocations

Pengfei Xu reported [1] a Syzkaller/KASAN issue in bpf_link_show_fdinfo().

The cause is a missing BPF_LINK_TYPE invocation for the uprobe multi
link and for several other link types; add the missing invocations.
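Why a missing entry matters can be shown with a stand-alone model of the X-macro pattern. In this sketch all names are hypothetical, loosely modeled on building a string table from a list of BPF_LINK_TYPE(id, name) entries: a type left out of the list either leaves a NULL hole or, if it is the highest enum value, makes the designated-initializer array too short, so an unchecked lookup reads out of bounds. The bounds-checked lookup is an extra defensive measure in the sketch, not part of the patch, which simply adds the missing entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical link types; the last one plays the role of a newly added type. */
enum demo_link_type {
	DEMO_LINK_UNSPEC,
	DEMO_LINK_RAW_TP,
	DEMO_LINK_TRACING,
	DEMO_LINK_UPROBE_MULTI,
	DEMO_LINK_MAX, /* one past the last valid type */
};

/* X-macro list: every type is expected to appear exactly once.
 * DEMO_LINK_UPROBE_MULTI is deliberately missing to mimic the bug. */
#define DEMO_LINK_TYPES(X)                  \
	X(DEMO_LINK_RAW_TP, raw_tracepoint) \
	X(DEMO_LINK_TRACING, tracing)

/* With designated initializers the array is sized by the largest index
 * used, so a missing trailing entry makes the array too short. */
static const char *demo_link_type_strs[] = {
	[DEMO_LINK_UNSPEC] = "<invalid>",
#define DEMO_LINK_STR(_id, _name) [_id] = #_name,
	DEMO_LINK_TYPES(DEMO_LINK_STR)
#undef DEMO_LINK_STR
};

#define DEMO_NR_STRS (sizeof(demo_link_type_strs) / sizeof(demo_link_type_strs[0]))

/* Defensive lookup: bounds-check and NULL-check instead of indexing blindly. */
static const char *demo_link_type_name(enum demo_link_type type)
{
	if ((size_t)type >= DEMO_NR_STRS || !demo_link_type_strs[type])
		return "<unknown>";
	return demo_link_type_strs[type];
}
```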

[1] https://lore.kernel.org/bpf/[email protected]/

Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device")
Reported-by: Pengfei Xu <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Acked-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Alexei Starovoitov [Sat, 16 Dec 2023 00:28:25 +0000 (16:28 -0800)]
selftests/bpf: Temporarily disable dummy_struct_ops test on s390

Temporarily disable dummy_struct_ops test on s390.
The breakage is likely due to
commit 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI").

Signed-off-by: Alexei Starovoitov <[email protected]>
Alexei Starovoitov [Fri, 15 Dec 2023 19:24:51 +0000 (11:24 -0800)]
Merge branch 'x86-cfi-bpf-fix-cfi-vs-ebpf'

Peter Zijlstra says:

====================
x86/cfi,bpf: Fix CFI vs eBPF

Hi!

What started with the simple observation that bpf_dispatcher_*_func() was
broken for calling CFI functions with a __nocfi calling context for FineIBT
ended up with a complete BPF wide CFI fixup.

With these changes, the BPF selftest suite passes without crashing; there
are still a few failures, but Alexei has graciously offered to look into those.

(Alexei, I have presumed your SoB on the very last patch, please update
as you see fit)

Changes since v2 are numerous but include:
 - cfi_get_offset() -- as a means to communicate the offset (ast)
 - 5 new patches fixing various BPF internals to be CFI clean

Note: it *might* be possible to merge the
bpf_tcp_ca.c:unsupported_ops[] thing into the CFI stubs; as it stands,
get_info will have a NULL stub, unlike the others.
---
 arch/riscv/include/asm/cfi.h   |   3 +-
 arch/riscv/kernel/cfi.c        |   2 +-
 arch/x86/include/asm/cfi.h     | 126 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/alternative.c  |  87 +++++++++++++++++++++++---
 arch/x86/kernel/cfi.c          |   4 +-
 arch/x86/net/bpf_jit_comp.c    | 134 +++++++++++++++++++++++++++++++++++------
 include/asm-generic/Kbuild     |   1 +
 include/linux/bpf.h            |  27 ++++++++-
 include/linux/cfi.h            |  12 ++++
 kernel/bpf/bpf_struct_ops.c    |  16 ++---
 kernel/bpf/core.c              |  25 ++++++++
 kernel/bpf/cpumask.c           |   8 ++-
 kernel/bpf/helpers.c           |  18 +++++-
 net/bpf/bpf_dummy_struct_ops.c |  31 +++++++++-
 net/bpf/test_run.c             |  15 ++++-
 net/ipv4/bpf_tcp_ca.c          |  69 +++++++++++++++++++++
 16 files changed, 528 insertions(+), 50 deletions(-)
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Alexei Starovoitov [Fri, 15 Dec 2023 09:12:23 +0000 (10:12 +0100)]
x86/cfi,bpf: Fix bpf_exception_cb() signature

As per the earlier patches, BPF sub-programs have the bpf_callback_t
signature, and CFI expects callers to use a matching signature. This is
violated by bpf_prog_aux::bpf_exception_cb().
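As a rough plain-C illustration (hypothetical names, uint64_t standing in for u64; not the kernel's code): CFI checks that the function-pointer type used at an indirect call matches the callee's type, so a three-parameter callback cannot be called through a five-parameter type even if the extra arguments would be ignored. One way to make the types agree is to give the callback the full five-argument signature and leave the trailing parameters unused.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a five-argument sub-program callback type. */
typedef uint64_t (*demo_callback_t)(uint64_t, uint64_t, uint64_t,
				    uint64_t, uint64_t);

/* Exception callback with the full signature: only the first three
 * arguments carry meaning; the last two exist so the types match. */
static uint64_t demo_exception_cb(uint64_t cookie, uint64_t sp, uint64_t bp,
				  uint64_t unused1, uint64_t unused2)
{
	(void)sp;
	(void)bp;
	(void)unused1;
	(void)unused2;
	return cookie;
}

/* Callers always go through demo_callback_t, so pointer type and callee
 * type agree (which is what a CFI type check compares). */
static uint64_t demo_call_exception_cb(demo_callback_t cb, uint64_t cookie,
				       uint64_t sp, uint64_t bp)
{
	return cb(cookie, sp, bp, 0, 0);
}
```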

[peterz: Changelog]
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/CAADnVQ+Z7UcXXBBhMubhcMM=R-dExk-uHtfOLtoLxQ1XxEpqEA@mail.gmail.com
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:22 +0000 (10:12 +0100)]
bpf: Fix dtor CFI

Ensure the various dtor functions match their prototype and retain
their CFI signatures. Since their address is never taken, they are
prone to not getting CFI, which makes them impossible to call
indirectly.
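A user-space model of why a matching dtor helps (hypothetical names, not the kernel's cpumask code): a typed release function has a different prototype from a generic void (*)(void *) destructor type, so calling it through that type is undefined behaviour in C and would fail a CFI type check. A thin wrapper with exactly the destructor prototype, whose address is taken when it is stored in a table, is a valid indirect-call target.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct demo_cpumask {
	int refs;
};

/* Generic destructor type used for indirect calls. */
typedef void (*demo_dtor_t)(void *);

/* Typed release function; its prototype does not match demo_dtor_t. */
static void demo_cpumask_release(struct demo_cpumask *mask)
{
	if (--mask->refs == 0)
		free(mask);
}

/* Wrapper with the exact destructor prototype: safe to call via demo_dtor_t. */
static void demo_cpumask_release_dtor(void *mask)
{
	demo_cpumask_release(mask);
}

/* Taking the wrapper's address here makes it an indirect-call target. */
static const demo_dtor_t demo_dtors[] = { demo_cpumask_release_dtor };
```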

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:21 +0000 (10:12 +0100)]
cfi: Add CFI_NOSEAL()

Add a CFI_NOSEAL() helper to mark functions that need to retain their
CFI information, despite not otherwise leaking their address.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:20 +0000 (10:12 +0100)]
x86/cfi,bpf: Fix bpf_struct_ops CFI

BPF struct_ops uses __arch_prepare_bpf_trampoline() to write
trampolines for indirect function calls. These trampolines must have
matching CFI.

In order to obtain the correct CFI hash for the various methods, add a
matching structure that contains stub functions; the compiler will
generate correct CFI for them, which we can pilfer for the trampolines.
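The stub-structure idea can be pictured with a plain-C model (hypothetical names; not the kernel's bpf_struct_ops code): a static instance of the ops structure is filled with do-nothing functions of exactly the member types, and a trampoline generator looks up the stub for a member by its offset. In a CFI-enabled build each stub carries the compiler-generated type hash for its prototype. The lookup copies the pointer bytes with memcpy, assuming all function pointers share one representation, as on mainstream ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ops table standing in for a struct_ops type. */
struct demo_ca_ops {
	uint32_t (*ssthresh)(void *sk);
	void (*cong_avoid)(void *sk, uint32_t ack, uint32_t acked);
};

/* Stubs with exactly the member prototypes; their bodies are irrelevant,
 * they only exist to carry compiler-generated CFI type information. */
static uint32_t demo_ssthresh_stub(void *sk)
{
	(void)sk;
	return 0;
}

static void demo_cong_avoid_stub(void *sk, uint32_t ack, uint32_t acked)
{
	(void)sk;
	(void)ack;
	(void)acked;
}

static const struct demo_ca_ops demo_ca_ops_cfi_stubs = {
	.ssthresh = demo_ssthresh_stub,
	.cong_avoid = demo_cong_avoid_stub,
};

typedef void (*demo_generic_fn)(void);

/* Return the stub stored at member offset @moff, e.g. so a trampoline
 * generator could reuse the CFI information of the right prototype. */
static demo_generic_fn demo_stub_for(size_t moff)
{
	demo_generic_fn fn;

	assert(moff + sizeof(fn) <= sizeof(demo_ca_ops_cfi_stubs));
	memcpy(&fn, (const char *)&demo_ca_ops_cfi_stubs + moff, sizeof(fn));
	return fn;
}
```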

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:19 +0000 (10:12 +0100)]
x86/cfi,bpf: Fix bpf_callback_t CFI

Where the main BPF program is expected to match bpf_func_t,
sub-programs are expected to match bpf_callback_t.

This fixes things like:

tools/testing/selftests/bpf/progs/bloom_filter_bench.c:

           bpf_for_each_map_elem(&array_map, bloom_callback, &data, 0);
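A hedged user-space sketch of the calling side (hypothetical names, uint64_t standing in for u64): an iterator in the spirit of bpf_for_each_map_elem() passes map, key, value and context as five uint64_t arguments and invokes every callback through one fixed callback type, so each indirect call site has a single, consistent signature for CFI to check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical five-argument callback type for sub-programs. */
typedef uint64_t (*demo_callback_t)(uint64_t, uint64_t, uint64_t,
				    uint64_t, uint64_t);

struct demo_sum_ctx {
	uint64_t total;
};

/* Callback: all arguments arrive as uint64_t and are cast back to the
 * pointer types it actually uses. Returning 0 means "keep iterating". */
static uint64_t demo_sum_cb(uint64_t map, uint64_t key, uint64_t value,
			    uint64_t ctx, uint64_t unused)
{
	struct demo_sum_ctx *sum = (struct demo_sum_ctx *)(uintptr_t)ctx;

	(void)map;
	(void)key;
	(void)unused;
	sum->total += *(const uint32_t *)(uintptr_t)value;
	return 0;
}

/* Iterate over an array "map", calling @cb for each element; returns the
 * number of elements visited. */
static int demo_for_each_elem(uint32_t *vals, uint32_t n,
			      demo_callback_t cb, void *ctx)
{
	uint32_t i;

	for (i = 0; i < n; i++) {
		uint32_t key = i;

		if (cb((uint64_t)(uintptr_t)vals, (uint64_t)(uintptr_t)&key,
		       (uint64_t)(uintptr_t)&vals[i], (uint64_t)(uintptr_t)ctx, 0))
			return (int)(i + 1);
	}
	return (int)n;
}
```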

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:18 +0000 (10:12 +0100)]
x86/cfi,bpf: Fix BPF JIT call

The current BPF calling convention is __nocfi, except when it calls !JIT
code, in which case it calls regular C functions.

It so happens that with FineIBT the __nocfi and C calling conventions are
incompatible. Specifically, __nocfi will call at func+0, while FineIBT will
have endbr-poison there, which is not a valid indirect target, causing a
#CP (control protection) fault.

Notably this only triggers on IBT enabled hardware, which is probably why this
hasn't been reported (also, most people will have JIT on anyway).

Implement proper CFI prologues for the BPF JIT codegen and drop __nocfi for
x86.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Peter Zijlstra [Fri, 15 Dec 2023 09:12:17 +0000 (10:12 +0100)]
cfi: Flip headers

By convention, linux/foo.h includes asm/foo.h; CFI has it the wrong way
around.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Josef Bacik [Fri, 15 Dec 2023 15:01:44 +0000 (10:01 -0500)]
btrfs: do not allow non subvolume root targets for snapshot

Our btrfs subvolume snapshot <source> <destination> utility enforces
that <source> is the root of the subvolume; however, this isn't enforced
in the kernel. Update the kernel to enforce the same limitation, to
avoid problems with other users of this ioctl that don't have the
appropriate checks in place.
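From user space, a comparable check can be approximated with stat(2). This sketch assumes, as btrfs-progs does for its own check, that a subvolume's root directory has inode number 256 (BTRFS_FIRST_FREE_OBJECTID); the helper names are hypothetical, and the code does not verify that the path is on btrfs at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed value of BTRFS_FIRST_FREE_OBJECTID: the inode number of a
 * subvolume's root directory. */
#define DEMO_BTRFS_SUBVOL_ROOT_INO 256

/* Pure check on stat results, testable without a btrfs mount. */
static bool demo_is_subvol_root(ino_t ino, bool is_dir)
{
	return is_dir && ino == DEMO_BTRFS_SUBVOL_ROOT_INO;
}

/* Returns 0 if @path looks like a subvolume root, -1 otherwise (including
 * stat() errors). The caller must already know the path is on btrfs. */
static int demo_check_snapshot_source(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return demo_is_subvol_root(st.st_ino, S_ISDIR(st.st_mode)) ? 0 : -1;
}
```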

Reported-by: Martin Michaelis <[email protected]>
CC: [email protected] # 4.14+
Reviewed-by: Neal Gompa <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Jens Axboe [Fri, 15 Dec 2023 20:40:57 +0000 (13:40 -0700)]
cred: get rid of CONFIG_DEBUG_CREDENTIALS

This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Jens Axboe [Fri, 15 Dec 2023 20:24:10 +0000 (13:24 -0700)]
cred: switch to using atomic_long_t

There are multiple ways to grab references to credentials, and the only
protection we have against overflowing it is the memory required to do
so.

With memory sizes only moving in one direction, let's bump the reference
count to 64-bit and move it outside the realm of feasibly overflowing.
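A user-space model of the change (hypothetical names; C11 <stdatomic.h> in place of the kernel's atomic API): the usage count becomes an atomic_long, which is 64 bits wide on LP64 targets such as x86-64; the sketch assumes such a target, since on 32-bit targets long stays 32 bits. Even at one byte of memory per reference, wrapping a 32-bit counter takes about 4 GiB of references, while a 64-bit counter would take about 16 EiB.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical credential-like object with a long-sized refcount. */
struct demo_cred {
	atomic_long usage;
};

/* Take a reference; relaxed ordering suffices for a pure increment. */
static struct demo_cred *demo_get_cred(struct demo_cred *cred)
{
	atomic_fetch_add_explicit(&cred->usage, 1, memory_order_relaxed);
	return cred;
}

/* Drop a reference; returns true when the last one was dropped. */
static bool demo_put_cred(struct demo_cred *cred)
{
	return atomic_fetch_sub_explicit(&cred->usage, 1,
					 memory_order_acq_rel) == 1;
}
```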

Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hou Tao [Fri, 15 Dec 2023 10:07:08 +0000 (18:07 +0800)]
selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment

If an abnormally huge cnt is used for multi-kprobes attachment, the
following warning will be reported:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 392 at mm/util.c:632 kvmalloc_node+0xd9/0xe0
  Modules linked in: bpf_testmod(O)
  CPU: 1 PID: 392 Comm: test_progs Tainted: G ...... 6.7.0-rc3+ #32
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  ......
  RIP: 0010:kvmalloc_node+0xd9/0xe0
   ? __warn+0x89/0x150
   ? kvmalloc_node+0xd9/0xe0
   bpf_kprobe_multi_link_attach+0x87/0x670
   __sys_bpf+0x2a28/0x2bc0
   __x64_sys_bpf+0x1a/0x30
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
  RIP: 0033:0x7fbe067f0e0d
  ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

So add a test to ensure the warning is fixed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Hou Tao [Fri, 15 Dec 2023 10:07:07 +0000 (18:07 +0800)]
selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test

Since libbpf v1.0, libbpf no longer embeds error codes in returned
pointers, so libbpf_get_error() is deprecated and is essentially the
same as using -errno directly.

So replace the invocations of libbpf_get_error() with -errno in
kprobe_multi_test. For the libbpf_get_error() calls in
test_attach_api_fails(), save -errno before invoking the ASSERT_xx()
macros, in case these macros overwrite errno. However, the invocation of
libbpf_get_error() in get_syms() is kept intact, because hashmap__new()
still returns a pointer with an embedded error code.
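The errno-saving pattern, sketched without libbpf (the attach function and logging helper are hypothetical stand-ins): capture -errno immediately after the failing call, because any later library call, including those made inside an assertion or logging macro, may change errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

static int demo_link_obj;

/* Hypothetical API: returns NULL and sets errno on failure. */
static void *demo_attach(int cnt)
{
	if (cnt <= 0) {
		errno = EINVAL;
		return NULL;
	}
	return &demo_link_obj;
}

/* Hypothetical logging helper; like many library calls, it may clobber errno. */
static void demo_log_failure(const char *what)
{
	fprintf(stderr, "%s failed\n", what);
	errno = 0; /* simulate the clobber explicitly */
}

static int demo_attach_err(int cnt)
{
	void *link = demo_attach(cnt);
	int err;

	if (link)
		return 0;
	err = -errno; /* save before anything else can run */
	demo_log_failure("attach");
	return err;   /* still -EINVAL despite the clobber */
}
```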

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Hou Tao [Fri, 15 Dec 2023 10:07:06 +0000 (18:07 +0800)]
selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment

If an abnormally huge cnt is used for multi-uprobes attachment, the
following warning will be reported:

  ------------[ cut here ]------------
  WARNING: CPU: 7 PID: 406 at mm/util.c:632 kvmalloc_node+0xd9/0xe0
  Modules linked in: bpf_testmod(O)
  CPU: 7 PID: 406 Comm: test_progs Tainted: G ...... 6.7.0-rc3+ #32
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:kvmalloc_node+0xd9/0xe0
  ......
  Call Trace:
   <TASK>
   ? __warn+0x89/0x150
   ? kvmalloc_node+0xd9/0xe0
   bpf_uprobe_multi_link_attach+0x14a/0x480
   __sys_bpf+0x14a9/0x2bc0
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

So add a test to ensure the warning is fixed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Hou Tao [Fri, 15 Dec 2023 10:07:05 +0000 (18:07 +0800)]
bpf: Limit the number of kprobes when attaching program to multiple kprobes

An abnormally big cnt may also be assigned to kprobe_multi.cnt when
attaching multiple kprobes. It will trigger the following warning in
kvmalloc_node():

if (unlikely(size > INT_MAX)) {
    WARN_ON_ONCE(!(flags & __GFP_NOWARN));
    return NULL;
}

Fix the warning by limiting the maximum number of kprobes in
bpf_kprobe_multi_link_attach(). If the number of kprobes is greater than
MAX_KPROBE_MULTI_CNT, the attachment will fail and return -E2BIG.
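A minimal sketch of the validation described above, under stated assumptions: the limit is modeled as 1 << 20, believed to match the kernel's MAX_KPROBE_MULTI_CNT but treated here as an assumption; the function name is hypothetical; and a zero count is rejected with -EINVAL as a companion sanity check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed limit, mirroring the idea of MAX_KPROBE_MULTI_CNT. */
#define DEMO_MAX_KPROBE_MULTI_CNT (1U << 20)

/* Validate a user-supplied count before it sizes any allocation. */
static int demo_check_kprobe_multi_cnt(uint32_t cnt)
{
	if (!cnt)
		return -EINVAL;
	if (cnt > DEMO_MAX_KPROBE_MULTI_CNT)
		return -E2BIG;
	return 0;
}
```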

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Hou Tao [Fri, 15 Dec 2023 10:07:04 +0000 (18:07 +0800)]
bpf: Limit the number of uprobes when attaching program to multiple uprobes

An abnormally big cnt may be passed to link_create.uprobe_multi.cnt,
and it will trigger the following warning in kvmalloc_node():

if (unlikely(size > INT_MAX)) {
    WARN_ON_ONCE(!(flags & __GFP_NOWARN));
    return NULL;
}

Fix the warning by limiting the maximum number of uprobes in
bpf_uprobe_multi_link_attach(). If the number of uprobes is greater than
MAX_UPROBE_MULTI_CNT, the attachment will fail and return -E2BIG.
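The arithmetic behind the warning can be checked directly. In this sketch (hypothetical helper; the per-entry size is a parameter because the exact structures allocated per uprobe are not spelled out here), the allocation size is computed in 64 bits and compared against INT_MAX, the threshold in the kvmalloc_node() excerpt above. With 8-byte entries, any cnt of 2^28 or more already exceeds it, while a cap of 1 << 20 entries stays far below.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if cnt * elem_size would trip a "size > INT_MAX" check like the
 * one quoted from kvmalloc_node(). */
static bool demo_exceeds_kvmalloc_limit(uint32_t cnt, uint64_t elem_size)
{
	/* Computed in 64 bits; cannot overflow while elem_size < 2^32. */
	return (uint64_t)cnt * elem_size > (uint64_t)INT_MAX;
}
```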

Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: Xingwei Lee <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Closes: https://lore.kernel.org/bpf/CABOYnLwwJY=yFAGie59LFsUsBAgHfroVqbzZ5edAXbFE3YiNVA@mail.gmail.com
Link: https://lore.kernel.org/bpf/[email protected]
Bjorn Helgaas [Thu, 14 Dec 2023 15:08:56 +0000 (09:08 -0600)]
Revert "PCI: acpiphp: Reassign resources on bridge if necessary"

This reverts commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 and the
subsequent fix to it:

  cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")

40613da52b13 fixed a problem where hot-adding a device with large BARs
failed if the bridge windows programmed by firmware were not large enough.

cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources()
only for non-root bus") fixed a problem with 40613da52b13: an ACPI hot-add
of a device on a PCI root bus (common in the virt world) or firmware
sending ACPI Bus Check to non-existent Root Ports (e.g., on Dell Inspiron
7352/0W6WV0) caused a NULL pointer dereference and suspend/resume hangs.

Unfortunately the combination of 40613da52b13 and cc22522fd55e caused other
problems:

  - Fiona reported that hot-add of SCSI disks in QEMU virtual machine fails
    sometimes.

  - Dongli reported a similar problem with hot-add of SCSI disks.

  - Jonathan reported a console freeze during boot on bare metal due to an
    error in radeon GPU initialization.

Revert both patches to avoid adding these problems.  This means we will
again see the problems with hot-adding devices with large BARs and the NULL
pointer dereferences and suspend/resume issues that 40613da52b13 and
cc22522fd55e were intended to fix.

Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Fixes: cc22522fd55e ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
Reported-by: Fiona Ebner <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Reported-by: Dongli Zhang <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Reported-by: Jonathan Woithe <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Igor Mammedov <[email protected]>
Cc: <[email protected]>
Linus Torvalds [Fri, 15 Dec 2023 20:20:14 +0000 (12:20 -0800)]
Merge tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
 "Just two minor fixes:

   - Fix for the io_uring socket option commands using the wrong value
     on some archs (Al)

   - Tweak to the poll lazy wake enable (me)"

* tag 'io_uring-6.7-2023-12-15' of git://git.kernel.dk/linux:
  io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation
  io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE

Linus Torvalds [Fri, 15 Dec 2023 20:00:54 +0000 (12:00 -0800)]
Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6
  issues"

* tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mglru: reclaim offlined memcgs harder
  mm/mglru: respect min_ttl_ms with memcgs
  mm/mglru: try to stop at high watermarks
  mm/mglru: fix underprotected page cache
  mm/shmem: fix race in shmem_undo_range w/THP
  Revert "selftests: error out if kernel header files are not yet built"
  crash_core: fix the check for whether crashkernel is from high memory
  x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
  sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC
  loongarch, kexec: change dependency of object files
  mm/damon/core: make damon_start() waits until kdamond_fn() starts
  selftests/mm: cow: print ksft header before printing anything else
  mm: fix VMA heap bounds checking
  riscv: fix VMALLOC_START definition
  kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP

Linus Torvalds [Fri, 15 Dec 2023 19:35:55 +0000 (11:35 -0800)]
Merge tag 'sound-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of HD-audio quirks for TAS2781 codec and device-specific
  workarounds"

* tag 'sound-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/tas2781: reset the amp before component_add
  ALSA: hda/tas2781: call cleanup functions only once
  ALSA: hda/tas2781: handle missing EFI calibration data
  ALSA: hda/tas2781: leave hda_component in usable state
  ALSA: hda/realtek: Apply mute LED quirk for HP15-db
  ALSA: hda/hdmi: add force-connect quirks for ASUSTeK Z170 variants
  ALSA: hda/hdmi: add force-connect quirk for NUC5CPYB

Linus Torvalds [Fri, 15 Dec 2023 19:07:13 +0000 (11:07 -0800)]
Merge tag 'drm-fixes-2023-12-15' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "More regular fixes, amdgpu, i915, mediatek and nouveau are most of
  them this week. Nothing too major, then a few misc bits and pieces in
  core, panel and ivpu.

  drm:
   - fix uninit problems in crtc
   - fix fd ownership check
   - edid: add modes in fallback paths

  panel:
   - move LG panel into DSI yaml
   - ltk050h3146w: set burst mode

  mediatek:
   - mtk_disp_gamma: Fix breakage due to merge issue
   - fix kernel oops if no crtc is found
   - Add spinlock for setting vblank event in atomic_begin
   - Fix access violation in mtk_drm_crtc_dma_dev_get

  i915:
   - Fix selftest engine reset count storage for multi-tile
   - Fix out-of-bounds reads for engine reset counts
   - Fix ADL+ remapped stride with CCS
   - Fix intel_atomic_setup_scalers() plane_state handling
   - Fix ADL+ tiled plane stride when the POT stride is smaller than the original
   - Fix eDP 1.4 rate select method link configuration

  amdgpu:
   - Fix suspend fix that got accidentally mangled last week
   - Fix OD regression
   - PSR fixes
   - OLED Backlight regression fix
   - JPEG 4.0.5 fix
   - Misc display fixes
   - SDMA 5.2 fix
   - SDMA 2.4 regression fix
   - GPUVM race fix

  nouveau:
   - fix gk20a instobj hierarchy
   - fix headless iors inheritance regression

  ivpu:
   - fix WA initialisation"

* tag 'drm-fixes-2023-12-15' of git://anongit.freedesktop.org/drm/drm: (31 commits)
  drm/nouveau/kms/nv50-: Don't allow inheritance of headless iors
  drm/nouveau: Fixup gk20a instobj hierarchy
  drm/amdgpu: warn when there are still mappings when a BO is destroyed v2
  drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
  drm/amd: Fix a probing order problem on SDMA 2.4
  drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
  drm/panel: ltk050h3146w: Set burst mode for ltk050h3148w
  dt-bindings: panel-simple-dsi: move LG 5" HD TFT LCD panel into DSI yaml
  drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
  drm/amd/display: Populate dtbclk from bounding box
  drm/amd/display: Revert "Fix conversions between bytes and KB"
  drm/amdgpu/jpeg: configure doorbell for each playback
  drm/amd/display: Restore guard against default backlight value < 1 nit
  drm/amd/display: fix hw rotated modes when PSR-SU is enabled
  drm/amd/pm: fix pp_*clk_od typo
  drm/amdgpu: fix buffer funcs setting order on suspend harder
  drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
  drm/edid: also call add modes in EDID connector update fallback
  drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select
  drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original
  ...

Andy Gospodarek [Thu, 14 Dec 2023 21:31:38 +0000 (13:31 -0800)]
bnxt_en: do not map packet buffers twice

Remove double-mapping of DMA buffers, as it can prevent page pool entries
from being freed. The mapping is managed by the page pool infrastructure;
it was previously managed by the driver in __bnxt_alloc_rx_page before
the page pool infrastructure took it over.

Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
Reviewed-by: Somnath Kotur <[email protected]>
Signed-off-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: David Wei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Hyunwoo Kim [Sat, 9 Dec 2023 10:55:18 +0000 (05:55 -0500)]
Bluetooth: af_bluetooth: Fix Use-After-Free in bt_sock_recvmsg

bt_sock_recvmsg() can race with bt_sock_ioctl() because it takes the skb
from sk->sk_receive_queue and then frees it without holding lock_sock.
A use-after-free of the skb occurs with the following flow:
```
bt_sock_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
bt_sock_ioctl() -> skb_peek()
```
Add lock_sock to bt_sock_recvmsg() to fix this issue.
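The locking rule can be modeled in user space (hypothetical names; a pthread mutex stands in for lock_sock() and a small linked list for sk_receive_queue): the receive path dequeues and frees under the same lock that the ioctl-style peek holds, so the peek can never observe a buffer that is being freed. The model frees right after dequeueing for brevity; the real receive path copies data out first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_buf {
	struct demo_buf *next;
	int len;
};

struct demo_sock {
	pthread_mutex_t lock;  /* plays the role of lock_sock() */
	struct demo_buf *head; /* plays the role of sk_receive_queue */
};

/* recvmsg-like path: dequeue, consume and free while holding the lock. */
static int demo_recv(struct demo_sock *sk)
{
	struct demo_buf *b;
	int len = -1;

	pthread_mutex_lock(&sk->lock);
	b = sk->head;
	if (b) {
		sk->head = b->next;
		len = b->len;
		free(b);
	}
	pthread_mutex_unlock(&sk->lock);
	return len;
}

/* ioctl-like path: peek at the head buffer under the same lock. */
static int demo_peek_len(struct demo_sock *sk)
{
	int len;

	pthread_mutex_lock(&sk->lock);
	len = sk->head ? sk->head->len : 0;
	pthread_mutex_unlock(&sk->lock);
	return len;
}

/* Append a buffer of @len bytes to the tail of the queue. */
static void demo_enqueue(struct demo_sock *sk, int len)
{
	struct demo_buf *b = malloc(sizeof(*b));
	struct demo_buf **pp;

	if (!b)
		return;
	b->next = NULL;
	b->len = len;
	pthread_mutex_lock(&sk->lock);
	for (pp = &sk->head; *pp; pp = &(*pp)->next)
		;
	*pp = b;
	pthread_mutex_unlock(&sk->lock);
}
```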

Cc: [email protected]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Alex Lu [Tue, 12 Dec 2023 02:30:34 +0000 (10:30 +0800)]
Bluetooth: Add more enc key size check

When we are in the peripheral (slave) role and receive an L2CAP
connection request after encryption has started, we should check the
encryption key size to prevent KNOB or BLUFFS attacks.
Per the SIG recommendation, implementations are advised to reject
service-level connections on an encrypted baseband link with key
strengths below 7 octets.
A simple and clear way to achieve this is to place the encryption key
size check in hci_cc_read_enc_key_size().
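A minimal sketch of such a check with hypothetical names. The 7-octet minimum follows the SIG recommendation quoted above; clearing the "encrypted" state and reporting failure is one plausible way to make a later L2CAP connection request see the link as insufficiently secure, not necessarily the exact kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_ENC_KEY_SIZE 7 /* octets, per the SIG recommendation */

/* Hypothetical per-connection state. */
struct demo_conn {
	uint8_t enc_key_size;
	bool encrypted;
};

/* Called when the controller reports the negotiated key size.
 * Returns 0 if acceptable, -1 if the key is too short. */
static int demo_on_read_enc_key_size(struct demo_conn *conn, uint8_t key_size)
{
	conn->enc_key_size = key_size;
	if (key_size < DEMO_MIN_ENC_KEY_SIZE) {
		/* Treat the link as not encrypted so later service-level
		 * connection requests are rejected. */
		conn->encrypted = false;
		return -1;
	}
	return 0;
}
```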

The btmon log below shows the case that lacks enc key size check.

> HCI Event: Connect Request (0x04) plen 10
        Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Class: 0x480104
          Major class: Computer (desktop, notebook, PDA, organizers)
          Minor class: Desktop workstation
          Capturing (Scanner, Microphone)
          Telephony (Cordless telephony, Modem, Headset)
        Link type: ACL (0x01)
< HCI Command: Accept Connection Request (0x01|0x0009) plen 7
        Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Role: Peripheral (0x01)
> HCI Event: Command Status (0x0f) plen 4
      Accept Connection Request (0x01|0x0009) ncmd 2
        Status: Success (0x00)
> HCI Event: Connect Complete (0x03) plen 11
        Status: Success (0x00)
        Handle: 1
        Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Link type: ACL (0x01)
        Encryption: Disabled (0x00)
...

> HCI Event: Encryption Change (0x08) plen 4
        Status: Success (0x00)
        Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Encryption: Enabled with E0 (0x01)
< HCI Command: Read Encryption Key Size (0x05|0x0008) plen 2
        Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33)
> HCI Event: Command Complete (0x0e) plen 7
      Read Encryption Key Size (0x05|0x0008) ncmd 2
        Status: Success (0x00)
        Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Key size: 6
// We should check the enc key size
...

> ACL Data RX: Handle 1 flags 0x02 dlen 12
      L2CAP: Connection Request (0x02) ident 3 len 4
        PSM: 25 (0x0019)
        Source CID: 64
< ACL Data TX: Handle 1 flags 0x00 dlen 16
      L2CAP: Connection Response (0x03) ident 3 len 8
        Destination CID: 64
        Source CID: 64
        Result: Connection pending (0x0001)
        Status: Authorization pending (0x0002)
> HCI Event: Number of Completed Packets (0x13) plen 5
        Num handles: 1
        Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33)
        Count: 1
        #35: len 16 (25 Kb/s)
        Latency: 5 msec (2-7 msec ~4 msec)
< ACL Data TX: Handle 1 flags 0x00 dlen 16
      L2CAP: Connection Response (0x03) ident 3 len 8
        Destination CID: 64
        Source CID: 64
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)

Cc: [email protected]
Signed-off-by: Alex Lu <[email protected]>
Signed-off-by: Max Chou <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Xiao Yao [Mon, 11 Dec 2023 16:27:18 +0000 (00:27 +0800)]
Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE

If two Bluetooth devices both support BR/EDR and BLE, and also
support Secure Connections, then they only need to pair once.
The LTK generated during the LE pairing process may be converted
into a BR/EDR link key for BR/EDR transport, and conversely, a
link key generated during the BR/EDR SSP pairing process can be
converted into an LTK for LE transport. Hence, the link type of
the link key and LTK is not fixed: they can be either an LE link
or an ACL link.

Currently, the mgmt_new_irk/ltk/csrk/link_key functions use a fixed
link type, which could lead to incorrect address types being
reported to the application layer. Therefore, add link_type/addr_type
to smp_irk/ltk/csrk and link_key, so that the correct address type
is reported.
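The core idea, deriving the reported address type from the link type on which a key was created instead of hard-coding it, can be sketched as below. The constant values are believed to mirror the mgmt and HCI definitions (BR/EDR 0x00, LE public 0x01, LE random 0x02; ACL link 0x01, LE link 0x80), but treat them and the function name as assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mgmt address types. */
#define DEMO_BDADDR_BREDR       0x00
#define DEMO_BDADDR_LE_PUBLIC   0x01
#define DEMO_BDADDR_LE_RANDOM   0x02

/* Assumed HCI link types and LE device address types. */
#define DEMO_ACL_LINK           0x01
#define DEMO_LE_LINK            0x80
#define DEMO_ADDR_LE_DEV_PUBLIC 0x00
#define DEMO_ADDR_LE_DEV_RANDOM 0x01

/* Hypothetical helper: map the link a key belongs to onto the address
 * type reported to user space. */
static uint8_t demo_link_to_bdaddr(uint8_t link_type, uint8_t le_addr_type)
{
	if (link_type == DEMO_LE_LINK)
		return le_addr_type == DEMO_ADDR_LE_DEV_PUBLIC ?
		       DEMO_BDADDR_LE_PUBLIC : DEMO_BDADDR_LE_RANDOM;
	return DEMO_BDADDR_BREDR; /* ACL and anything else: BR/EDR */
}
```

With the link type stored alongside each key, an IRK or LTK distributed over BR/EDR would be reported with a BR/EDR address, as in the "After Fix" output for SMP over BR/EDR.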

SMP over BREDR:
Before Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
        BR/EDR SMP: Identity Address Information (0x09) len 7
        Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
        Random address: 00:00:00:00:00:00 (Non-Resolvable)
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated key from P-256 (0x03)

After Fix:
> ACL Data RX: Handle 11 flags 0x02 dlen 12
      BR/EDR SMP: Identity Address Information (0x09) len 7
        Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
        Random address: 00:00:00:00:00:00 (Non-Resolvable)
        BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
        BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated key from P-256 (0x03)

SMP over LE:
Before Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
        Random address: 5F:5C:07:37:47:D5 (Resolvable)
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
        BR/EDR Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated Combination key from P-256 (0x08)

After Fix:
@ MGMT Event: New Identity Resolving Key (0x0018) plen 30
        Random address: 5E:03:1C:00:38:21 (Resolvable)
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
@ MGMT Event: New Long Term Key (0x000a) plen 37
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated key from P-256 (0x03)
@ MGMT Event: New Link Key (0x0009) plen 26
        Store hint: Yes (0x01)
        LE Address: F8:7D:76:F2:12:F3 (OUI F8-7D-76)
        Key type: Authenticated Combination key from P-256 (0x08)

Cc: [email protected]
Signed-off-by: Xiao Yao <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Frédéric Danis [Fri, 8 Dec 2023 17:41:50 +0000 (18:41 +0100)]
Bluetooth: L2CAP: Send reject on command corrupted request

The L2CAP/COS/CED/BI-02-C PTS test sends a malformed L2CAP signaling
packet with 2 commands in it (a connection request and an unknown
command) and expects to get a connection response packet and a command
reject packet. The command reject is currently not sent.
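The expected behaviour can be sketched as a parser that walks every command in a signaling packet and produces one response per command, including a Command Reject for a command whose header is corrupt (here: a length running past the end of the packet, or an identifier of 0). The framing follows the L2CAP signaling command header (1-byte code, 1-byte identifier, 2-byte little-endian length) with codes Command Reject 0x01, Connection Request 0x02 and Connection Response 0x03; the function itself is hypothetical and only records response codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_L2CAP_COMMAND_REJ 0x01
#define DEMO_L2CAP_CONN_REQ    0x02
#define DEMO_L2CAP_CONN_RSP    0x03
#define DEMO_CMD_HDR_SIZE      4

/* Walk all commands in @buf, store the code of each response in @rsp,
 * and return how many responses were produced. */
static size_t demo_sig_channel(const uint8_t *buf, size_t len,
			       uint8_t *rsp, size_t rsp_max)
{
	size_t n = 0;

	while (len >= DEMO_CMD_HDR_SIZE && n < rsp_max) {
		uint8_t code = buf[0];
		uint8_t ident = buf[1];
		uint16_t cmd_len = (uint16_t)(buf[2] | (buf[3] << 8));

		buf += DEMO_CMD_HDR_SIZE;
		len -= DEMO_CMD_HDR_SIZE;

		if (cmd_len > len || !ident) {
			/* Corrupted command: reject it instead of dropping it. */
			rsp[n++] = DEMO_L2CAP_COMMAND_REJ;
			break;
		}

		/* Known command gets its response; unknown ones are rejected. */
		rsp[n++] = (code == DEMO_L2CAP_CONN_REQ) ?
			   DEMO_L2CAP_CONN_RSP : DEMO_L2CAP_COMMAND_REJ;
		buf += cmd_len;
		len -= cmd_len;
	}
	return n;
}
```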

Cc: [email protected]
Signed-off-by: Frédéric Danis <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Luiz Augusto von Dentz [Fri, 8 Dec 2023 22:22:29 +0000 (17:22 -0500)]
Bluetooth: hci_core: Fix hci_conn_hash_lookup_cis

hci_conn_hash_lookup_cis() shall always match the requested CIG and CIS
IDs, even when they are unset. Otherwise, different sockets cannot be
bound/connected to the same address, because multiple sockets would end
up mapping to the same hci_conn, which doesn't really work; this
prevents BAP audio configurations such as AC 6(i) when CIG and CIS are
left unset.
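A sketch of the exact-match lookup rule with hypothetical types; 0xff is used as the "unset" marker and is assumed here to correspond to the kernel's BT_ISO_QOS_CIG_UNSET/BT_ISO_QOS_CIS_UNSET value. The CIG and CIS IDs are always compared, so a request with unset IDs never matches a connection that already has concrete IDs; the destination address is compared only when one is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ISO_UNSET 0xff /* assumed "unset" marker for CIG/CIS IDs */

/* Hypothetical connection record. */
struct demo_cis_conn {
	uint8_t cig;
	uint8_t cis;
	uint8_t dst[6];
};

/* Return the first connection whose CIG and CIS IDs match exactly
 * (unset only matches unset) and, if @dst is given, whose address matches. */
static struct demo_cis_conn *demo_lookup_cis(struct demo_cis_conn *conns,
					     size_t n, const uint8_t *dst,
					     uint8_t cig, uint8_t cis)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct demo_cis_conn *c = &conns[i];

		if (c->cig != cig || c->cis != cis)
			continue;
		if (dst && memcmp(c->dst, dst, sizeof(c->dst)) != 0)
			continue;
		return c;
	}
	return NULL;
}
```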

Fixes: c14516faede3 ("Bluetooth: hci_conn: Fix not matching by CIS ID")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Arnd Bergmann [Wed, 22 Nov 2023 22:17:44 +0000 (23:17 +0100)]
Bluetooth: hci_event: shut up a false-positive warning

Turning on -Wstringop-overflow globally exposed a misleading compiler
warning in bluetooth:

net/bluetooth/hci_event.c: In function 'hci_cc_read_class_of_dev':
net/bluetooth/hci_event.c:524:9: error: 'memcpy' writing 3 bytes into a
region of size 0 overflows the destination [-Werror=stringop-overflow=]
  524 |         memcpy(hdev->dev_class, rp->dev_class, 3);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The problem here is that the check for hdev being NULL in bt_dev_dbg()
leads the compiler to conclude that hdev->dev_class might be an invalid
pointer access.

Add another explicit check for the same condition to make sure gcc sees
this cannot happen.
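The shape of the workaround is shown below with hypothetical stand-ins for struct hci_dev and bt_dev_dbg(). It does not claim to reproduce the gcc warning itself. The debug helper performs its own NULL test, and an explicit early return follows it, so the memcpy() can only be reached with a valid pointer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct hci_dev. */
struct demo_dev {
	const char *name;
	unsigned char dev_class[3];
};

/*
 * Debug helper that tolerates a NULL device, in the spirit of bt_dev_dbg().
 * Its NULL test is what tells the compiler @d might be NULL later on.
 */
#define demo_dbg(d, msg) \
	fprintf(stderr, "%s: %s\n", (d) ? (d)->name : "(null)", (msg))

static void read_class_of_dev(struct demo_dev *d, const unsigned char *cls)
{
	demo_dbg(d, "read class of device");

	/* Explicit check so the copy below never goes through a NULL pointer. */
	if (!d)
		return;

	memcpy(d->dev_class, cls, 3);
}
```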

Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes")
Fixes: 1b56c90018f0 ("Makefile: Enable -Wstringop-overflow globally")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
15 months agoBluetooth: hci_event: Fix not checking if HCI_OP_INQUIRY has been sent
Luiz Augusto von Dentz [Mon, 20 Nov 2023 15:04:39 +0000 (10:04 -0500)]
Bluetooth: hci_event: Fix not checking if HCI_OP_INQUIRY has been sent

Before setting the HCI_INQUIRY bit, check whether HCI_OP_INQUIRY was
really sent. Otherwise the controller may be generating invalid events
or, more likely, fuzzing tools are attempting to test the behavior of
the stack when unexpected events are generated.
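A minimal model of the check follows. It assumes a simplified device state that remembers the opcode of the last command sent. The kernel tracks sent commands differently, and struct hdev_state and cmd_status_inquiry() are hypothetical. 0x0401 is the Inquiry opcode (OGF 0x01, OCF 0x0001).

```c
#include <stdbool.h>
#include <stdint.h>

#define HCI_OP_INQUIRY 0x0401 /* OGF 0x01 (Link Control), OCF 0x0001 */

/* Hypothetical, heavily simplified controller state. */
struct hdev_state {
	uint16_t sent_opcode; /* opcode of the last command sent, 0 if none */
	bool inquiry;         /* models the HCI_INQUIRY flag */
};

/*
 * Handle a Command Status event for @opcode. The inquiry flag is set only
 * when HCI_OP_INQUIRY was really sent. Otherwise the event is unsolicited
 * (a misbehaving controller or a fuzzer) and is ignored.
 */
static void cmd_status_inquiry(struct hdev_state *h, uint16_t opcode,
			       uint8_t status)
{
	if (status)
		return;
	if (opcode != HCI_OP_INQUIRY || h->sent_opcode != HCI_OP_INQUIRY)
		return;
	h->inquiry = true;
}
```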

Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218151
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
15 months agoBluetooth: Fix deadlock in vhci_send_frame
Ying Hsu [Fri, 10 Nov 2023 01:46:05 +0000 (01:46 +0000)]
Bluetooth: Fix deadlock in vhci_send_frame

syzbot found a potential circular dependency leading to a deadlock:
    -> #3 (&hdev->req_lock){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    hci_dev_do_close+0x3f/0x9f net/bluetooth/hci_core.c:551
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #2 (rfkill_global_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    rfkill_register+0x30/0x7e3 net/rfkill/core.c:1045
    hci_register_dev+0x48f/0x96d net/bluetooth/hci_core.c:2622
    __vhci_create_device drivers/bluetooth/hci_vhci.c:341 [inline]
    vhci_create_device+0x3ad/0x68f drivers/bluetooth/hci_vhci.c:374
    vhci_get_user drivers/bluetooth/hci_vhci.c:431 [inline]
    vhci_write+0x37b/0x429 drivers/bluetooth/hci_vhci.c:511
    call_write_iter include/linux/fs.h:2109 [inline]
    new_sync_write fs/read_write.c:509 [inline]
    vfs_write+0xaa8/0xcf5 fs/read_write.c:596
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #1 (&data->open_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    vhci_send_frame+0x68/0x9c drivers/bluetooth/hci_vhci.c:75
    hci_send_frame+0x1cc/0x2ff net/bluetooth/hci_core.c:2989
    hci_sched_acl_pkt net/bluetooth/hci_core.c:3498 [inline]
    hci_sched_acl net/bluetooth/hci_core.c:3583 [inline]
    hci_tx_work+0xb94/0x1a60 net/bluetooth/hci_core.c:3654
    process_one_work+0x901/0xfb8 kernel/workqueue.c:2310
    worker_thread+0xa67/0x1003 kernel/workqueue.c:2457
    kthread+0x36a/0x430 kernel/kthread.c:319
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298

    -> #0 ((work_completion)(&hdev->tx_work)){+.+.}-{0:0}:
    check_prev_add kernel/locking/lockdep.c:3053 [inline]
    check_prevs_add kernel/locking/lockdep.c:3172 [inline]
    validate_chain kernel/locking/lockdep.c:3787 [inline]
    __lock_acquire+0x2d32/0x77fa kernel/locking/lockdep.c:5011
    lock_acquire+0x273/0x4d5 kernel/locking/lockdep.c:5622
    __flush_work+0xee/0x19f kernel/workqueue.c:3090
    hci_dev_close_sync+0x32f/0x1113 net/bluetooth/hci_sync.c:4352
    hci_dev_do_close+0x47/0x9f net/bluetooth/hci_core.c:553
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.

Fixes: 92d4abd66f70 ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
15 months agoBluetooth: Fix not notifying when connection encryption changes
Luiz Augusto von Dentz [Mon, 23 Oct 2023 23:26:23 +0000 (16:26 -0700)]
Bluetooth: Fix not notifying when connection encryption changes

Some layers, such as SMP, depend on being notified about encryption
changes immediately because they only allow certain PDUs to be
transmitted over an encrypted link. A missed notification may cause the
SMP implementation to reject valid PDUs it receives, so pairing fails
when it shouldn't.

Fixes: 7aca0ac4792e ("Bluetooth: Wait for HCI_OP_WRITE_AUTH_PAYLOAD_TO to complete")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
15 months agowifi: ath11k: workaround too long expansion sparse warnings
Kalle Valo [Thu, 14 Dec 2023 16:17:40 +0000 (18:17 +0200)]
wifi: ath11k: workaround too long expansion sparse warnings

In v6.7-rc1 sparse warns:

drivers/net/wireless/ath/ath11k/mac.c:4702:15: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:4702:15: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:8393:23: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:8393:23: error: too long token expansion

Work around the warnings by refactoring the code into a new function, which
also reduces code duplication. In the new function, use max3() to make the
code more readable.

No functional changes, compile tested only.

Acked-by: Jeff Johnson <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agoRevert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ"
Karthikeyan Periyasamy [Thu, 14 Dec 2023 05:32:15 +0000 (11:02 +0530)]
Revert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ"

This reverts commit 1f1f7d548a00ebe50808cb1f580df9693e194a7c. The commit
caused a bootup failure on the QCN9274 hw2.0 platform. The incorrectly
hardcoded DP IRQ offset overwrote the CE IRQ, which caused the driver to
miss the mandatory bootup message from the firmware through the CE
interrupt. This occurs because the CE count differs between platforms. The
revert has no impact since the original change was based on an incorrect
assumption.

Log:

ath12k_pci 0000:06:00.0: fw_version 0x1011001d fw_build_timestamp 2022-12-02 01:16 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
ath12k_pci 0000:06:00.0: failed to receive control response completion, polling..
ath12k_pci 0000:06:00.0: Service connect timeout
ath12k_pci 0000:06:00.0: failed to connect to HTT: -110
ath12k_pci 0000:06:00.0: failed to start core: -110

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4

Signed-off-by: Karthikeyan Periyasamy <[email protected]>
Acked-by: Jeff Johnson <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agonfsd: hold nfsd_mutex across entire netlink operation
NeilBrown [Fri, 15 Dec 2023 00:56:33 +0000 (11:56 +1100)]
nfsd: hold nfsd_mutex across entire netlink operation

Rather than using svc_get() and svc_put() to hold a stable reference to
the nfsd_svc for netlink lookups, simply hold the mutex for the entire
time.

The "entire" time isn't very long, and the mutex is not often contended.

This paves the way for removing the svc refcount, which is more
confusing than useful.
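The locking pattern is sketched below in user space with pthreads. serv_mutex and current_serv are hypothetical stand-ins for nfsd_mutex and nn->nfsd_serv. Because the lookup and the use both happen under the mutex, no get/put pair is needed to keep the object alive.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for nfsd_mutex and nn->nfsd_serv. */
struct serv {
	int nrthreads;
};

static pthread_mutex_t serv_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct serv *current_serv;

/*
 * Report the thread count, or -1 if no service exists. The mutex is held
 * for the whole operation, so current_serv cannot be freed or replaced
 * underneath us, and no separate reference count is needed.
 */
static int serv_nrthreads(void)
{
	int n = -1;

	pthread_mutex_lock(&serv_mutex);
	if (current_serv)
		n = current_serv->nrthreads;
	pthread_mutex_unlock(&serv_mutex);
	return n;
}
```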

Reported-by: Jeff Layton <[email protected]>
Closes: https://lore.kernel.org/linux-nfs/[email protected]/T/#u
Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
15 months agonfsd: call nfsd_last_thread() before final nfsd_put()
NeilBrown [Fri, 15 Dec 2023 00:56:31 +0000 (11:56 +1100)]
nfsd: call nfsd_last_thread() before final nfsd_put()

If write_ports_addfd or write_ports_addxprt fails, it calls nfsd_put()
without calling nfsd_last_thread().  This leaves nn->nfsd_serv pointing
to a structure that has been freed.

So remove 'static' from nfsd_last_thread() and call it when the
nfsd_serv is about to be destroyed.
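The ordering is sketched below with hypothetical types standing in for struct svc_serv and struct nfsd_net. The last put detaches the service from the namespace before freeing it, so no stale pointer survives.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct svc_serv and struct nfsd_net. */
struct serv {
	int users;
};

struct net_ns {
	struct serv *serv;
};

/* Counterpart of nfsd_last_thread(): detach the service from the namespace. */
static void last_thread(struct net_ns *nn)
{
	nn->serv = NULL;
}

/*
 * Drop a reference. On the final one, detach first and then free, so
 * nn never holds a pointer to freed memory.
 */
static void serv_put(struct net_ns *nn)
{
	struct serv *s = nn->serv;

	if (--s->users == 0) {
		last_thread(nn);
		free(s);
	}
}
```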

Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount")
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
15 months agoring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
Steven Rostedt (Google) [Wed, 13 Dec 2023 22:54:03 +0000 (17:54 -0500)]
ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI

As the ring buffer recording requires cmpxchg() to work, if the
architecture does not support cmpxchg in NMI, then do not do any recording
within an NMI.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agoring-buffer: Have rb_time_cmpxchg() set the msb counter too
Steven Rostedt (Google) [Fri, 15 Dec 2023 13:41:14 +0000 (08:41 -0500)]
ring-buffer: Have rb_time_cmpxchg() set the msb counter too

On 32-bit architectures, rb_time_cmpxchg() requires setting three 32-bit
words to represent the 64-bit timestamp, with some salt for
synchronization. Those are: msb, top, and bottom.

The issue is that rb_time_cmpxchg() did not properly salt the msb portion,
and the msb that was written was stale.
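The following is a simplified model of the 32-bit layout described above. It assumes 30 value bits per word and a 2-bit counter in bits 30-31. The shift values are assumed to mirror RB_TIME_SHIFT and RB_TIME_MSB_SHIFT in kernel/trace/ring_buffer.c, and the types and helpers are illustrative. The fix amounts to making all three words, msb included, carry the same counter.

```c
#include <stdint.h>

/*
 * Simplified model of the 32-bit rb_time_t layout: a 64-bit timestamp is
 * kept in three words (bottom, top, msb). Each word carries up to 30 bits
 * of the value plus a 2-bit counter ("salt") in bits 30-31.
 */
#define TIME_SHIFT     30
#define TIME_VAL_MASK  ((1UL << TIME_SHIFT) - 1)
#define TIME_MSB_SHIFT 60

struct rbt {
	uint32_t msb, top, bottom;
};

/* Pack the low 30 bits of @val together with a 2-bit counter. */
static uint32_t word_with_cnt(uint64_t val, unsigned int cnt)
{
	return (uint32_t)((val & TIME_VAL_MASK) |
			  ((uint64_t)(cnt & 3) << TIME_SHIFT));
}

static unsigned int word_cnt(uint32_t w)
{
	return (w >> TIME_SHIFT) & 3;
}

/* Store @val, salting all three words (msb included) with the same counter. */
static void rbt_set(struct rbt *t, uint64_t val, unsigned int cnt)
{
	t->bottom = word_with_cnt(val, cnt);
	t->top = word_with_cnt(val >> TIME_SHIFT, cnt);
	t->msb = word_with_cnt(val >> TIME_MSB_SHIFT, cnt);
}

/* Reassemble the 64-bit value, ignoring the counters. */
static uint64_t rbt_val(const struct rbt *t)
{
	return ((uint64_t)(t->msb & TIME_VAL_MASK) << TIME_MSB_SHIFT) |
	       ((uint64_t)(t->top & TIME_VAL_MASK) << TIME_SHIFT) |
	       (t->bottom & TIME_VAL_MASK);
}
```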

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agowifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor()
Dmitry Antipov [Wed, 13 Dec 2023 05:14:43 +0000 (08:14 +0300)]
wifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor()

In 'rt2x00queue_create_tx_descriptor()', there is no need to call
'ieee80211_get_rts_cts_rate()' while checking for an RTS/CTS frame,
since this function returns NULL or a pointer to an internal bitrate
table entry, and the return value is not actually used. Compile
tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <[email protected]>
Acked-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agoring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
Mathieu Desnoyers [Tue, 12 Dec 2023 19:30:49 +0000 (14:30 -0500)]
ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()

The following race can cause rb_time_read() to observe a corrupted time
stamp:

rb_time_cmpxchg()
[...]
        if (!rb_time_read_cmpxchg(&t->msb, msb, msb2))
                return false;
        if (!rb_time_read_cmpxchg(&t->top, top, top2))
                return false;
<interrupted before updating bottom>
__rb_time_read()
[...]
        do {
                c = local_read(&t->cnt);
                top = local_read(&t->top);
                bottom = local_read(&t->bottom);
                msb = local_read(&t->msb);
        } while (c != local_read(&t->cnt));

        *cnt = rb_time_cnt(top);

        /* If top and msb counts don't match, this interrupted a write */
        if (*cnt != rb_time_cnt(msb))
                return false;
          ^ this check fails to catch that "bottom" is still not updated.

So the old "bottom" value is returned, which is wrong.

Fix this by checking that all three of msb, top, and bottom 2-bit cnt
values match.

The reason to favor checking all three fields over requiring a specific
update order for both rb_time_set() and rb_time_cmpxchg() is that
checking all three fields is more robust in handling partial failures of
rb_time_cmpxchg() when interrupted by a nested rb_time_set().
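The fixed check is sketched below in user space. It uses a simplified model of the three salted words, with 30 value bits plus a 2-bit counter in bits 30-31, and the names are illustrative rather than the kernel's. A read succeeds only when msb, top, and bottom all carry the same counter, so a writer interrupted before it rewrote bottom is detected.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIME_SHIFT    30
#define TIME_VAL_MASK ((1UL << TIME_SHIFT) - 1)

/* Simplified 32-bit rb_time_t model: three words, each with a 2-bit counter. */
struct rbt {
	uint32_t msb, top, bottom;
};

static unsigned int word_cnt(uint32_t w)
{
	return (w >> TIME_SHIFT) & 3;
}

/*
 * Reassemble the timestamp only if all three counters agree. Comparing
 * top against msb alone would miss a writer that was interrupted before
 * it rewrote bottom.
 */
static bool rbt_read(const struct rbt *t, uint64_t *ret)
{
	unsigned int cnt = word_cnt(t->top);

	if (cnt != word_cnt(t->msb) || cnt != word_cnt(t->bottom))
		return false;

	*ret = ((uint64_t)(t->msb & TIME_VAL_MASK) << 60) |
	       ((uint64_t)(t->top & TIME_VAL_MASK) << TIME_SHIFT) |
	       (t->bottom & TIME_VAL_MASK);
	return true;
}
```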

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Fixes: f458a1453424e ("ring-buffer: Test last update in 32bit version of __rb_time_read()")
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agoring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
Steven Rostedt (Google) [Tue, 12 Dec 2023 16:53:01 +0000 (11:53 -0500)]
ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs

Mathieu Desnoyers pointed out an issue in the rb_time_cmpxchg() for 32 bit
architectures. That is:

 static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
 {
        unsigned long cnt, top, bottom, msb;
        unsigned long cnt2, top2, bottom2, msb2;
        u64 val;

        /* The cmpxchg always fails if it interrupted an update */
        if (!__rb_time_read(t, &val, &cnt2))
                return false;

        if (val != expect)
                return false;

<<<< interrupted here!

        cnt = local_read(&t->cnt);

The problem is that the synchronization counter in the rb_time_t is read
*after* the value of the timestamp is read. That means if an interrupt
were to come in between the value being read and the counter being read,
it can change the value and the counter and the interrupted process would
be clueless about it!

The counter needs to be read first and then the value. That way it is easy
to tell if the value is stale or not. If the counter hasn't been updated,
then the value is still good.
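The ordering problem can be reproduced deterministically in a single-threaded model in which a callback invoked between the reads simulates the interrupt. This is a hypothetical simplification with one counter and one 64-bit value. It is not the kernel's rb_time_cmpxchg(), which operates on three salted words with local_cmpxchg().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified timestamp guarded by an update counter (hypothetical model). */
struct stamp {
	unsigned long cnt; /* bumped by every writer */
	uint64_t val;
};

/* Called between the reads to model an interrupt that updates @t. */
typedef void (*irq_hook)(struct stamp *t);

/* Commit @set only if nobody bumped the counter since @cnt was sampled. */
static bool commit(struct stamp *t, unsigned long cnt, uint64_t set)
{
	if (t->cnt != cnt)
		return false;
	t->cnt++;
	t->val = set;
	return true;
}

/* Buggy order: value first, counter second. */
static bool cmpxchg_val_first(struct stamp *t, uint64_t expect, uint64_t set,
			      irq_hook irq)
{
	if (t->val != expect)
		return false;
	if (irq)
		irq(t);                 /* the interrupt lands here */
	unsigned long cnt = t->cnt;     /* already includes the interrupt's bump */
	return commit(t, cnt, set);     /* so the stale @expect goes unnoticed */
}

/* Fixed order: counter first, then value. */
static bool cmpxchg_cnt_first(struct stamp *t, uint64_t expect, uint64_t set,
			      irq_hook irq)
{
	unsigned long cnt = t->cnt;
	if (t->val != expect)
		return false;
	if (irq)
		irq(t);
	return commit(t, cnt, set);     /* fails: the counter moved */
}

/* A nested writer, as an interrupt would be. */
static void nested_write(struct stamp *t)
{
	t->cnt++;
	t->val = 500;
}
```

With the buggy order, the nested write of 500 is silently overwritten. With the counter read first, the update is refused and the nested value survives.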

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Fixes: 10464b4aa605e ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit")
Reported-by: Mathieu Desnoyers <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agowifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:41 +0000 (16:33 +0800)]
wifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up

The new WiFi 7 chips change the design, so there is no need to
disable/enable BB/RF in core_start(). Keep the same logic for existing chips.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: add DBCC H2C to notify firmware the status
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:40 +0000 (16:33 +0800)]
wifi: rtw89: add DBCC H2C to notify firmware the status

To support MLO on WiFi 7, we should configure the hardware in DBCC mode
and notify the firmware of this status.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: mac: add suffix _ax to MAC functions
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:39 +0000 (16:33 +0800)]
wifi: rtw89: mac: add suffix _ax to MAC functions

Many existing MAC access functions are used by WiFi 6 chips only, so add
the suffix _ax to make this clearer. Some are common and can be used by
WiFi 7, so export those functions. This patch doesn't change logic at all.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:38 +0000 (16:33 +0800)]
wifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled

Before accessing CMAC and DMAC registers, we should ensure they have been
powered on, so add flags to track their state. For old chips, we read
registers and check the corresponding bit, but each read has an extra cost.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: 8922a: add power on/off functions
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:37 +0000 (16:33 +0800)]
wifi: rtw89: 8922a: add power on/off functions

The power on/off functions turn on hardware function blocks, and turn
them off when we are going to stay in the idle state.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: add XTAL SI for WiFi 7 chips
Ping-Ke Shih [Mon, 11 Dec 2023 08:33:36 +0000 (16:33 +0800)]
wifi: rtw89: add XTAL SI for WiFi 7 chips

The XTAL SI is a serial interface used to indirectly access registers of
the analog hardware circuit. Since WiFi 7 chips use different registers,
add an op to access them via common functions. This patch doesn't change
logic for existing chips.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: phy: print out RFK log with formatted string
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:54 +0000 (08:50 +0800)]
wifi: rtw89: phy: print out RFK log with formatted string

With formatted strings loaded from the firmware file, we can use the
formatted string ID to get the corresponding string, and then use regular
rtw89_debug() to show the message if the RFK debug mask is enabled.

If the string ID is not present, fall back to printing plain hexadecimal.
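The lookup-with-fallback is sketched below. It assumes the format strings have already been loaded into an array indexed by string ID and that each log entry carries four 32-bit arguments. rfk_log_render() is hypothetical and writes into a buffer instead of calling rtw89_debug(). The format strings come from the firmware file, which is treated as trusted input here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render one RFK RUN log entry into @buf. @fmts is a hypothetical table
 * of @nr format strings indexed by string ID. Returns the length that
 * snprintf() reports.
 */
static int rfk_log_render(char *buf, size_t size, const char *const *fmts,
			  size_t nr, uint32_t id, const uint32_t arg[4])
{
	/* Known ID: format with the firmware-provided string. */
	if (id < nr && fmts[id])
		return snprintf(buf, size, fmts[id],
				arg[0], arg[1], arg[2], arg[3]);

	/* Unknown ID: fall back to a plain hexadecimal dump. */
	return snprintf(buf, size, "id=%u %08x %08x %08x %08x",
			(unsigned int)id, (unsigned int)arg[0],
			(unsigned int)arg[1], (unsigned int)arg[2],
			(unsigned int)arg[3]);
}
```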

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: parse and print out RFK log from C2H events
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:53 +0000 (08:50 +0800)]
wifi: rtw89: parse and print out RFK log from C2H events

RFK log events come in two types. One, called the RUN log, reflects the
state while the RFK is running and relies on formatted strings loaded from
the firmware file; this patch prints it as plain hexadecimal only.
The other is the REPORT log, which reflects the final result of an RFK;
each calibration has its own struct to carry its specific information.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: add C2H event handlers of RFK log and report
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:52 +0000 (08:50 +0800)]
wifi: rtw89: add C2H event handlers of RFK log and report

An RFK (RF calibration) is triggered in firmware by an H2C command. While
in progress, it reports logs, and finally a result, by C2H events. First,
add prototypes of the C2H event handlers to give a simple picture of the
framework.

The callers who trigger the H2C wait until a C2H event is received,
so we must process these C2H events in the receiving path. Thus, mark this
kind of C2H event as atomic. Since timestamps are also useful for
debugging, mark C2H events carrying RFK logs as atomic as well.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: load RFK log format string from firmware file
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:51 +0000 (08:50 +0800)]
wifi: rtw89: load RFK log format string from firmware file

To debug RFK (RF calibration) in firmware, the firmware sends logs to the
driver via C2H events, each with a format string ID and four arguments.
Load the formatted strings from the firmware file so that a string ID can
be mapped back to its string. Then use a regular print format to show the
message.

The layout of this firmware element looks like this:

    +============================================+
    |  elm ID  | elm size | version  |           |
    +----------+----------+----------+-----------+
    |                     | nr |rsvd |rfk_id|rsvd|
    +--------------------------------------------+
    | offset[] (__le16 * nr)                     |
    | ...                                        |
    +--------------------------------------------+
    | formatted string with NUL terminator (*nr) |
    | ...                                        |
    +============================================+

 * a firmware file can contain more than one element with this element ID,
   named RTW89_FW_ELEMENT_ID_RFKLOG_FMT (19), because each RFK needs its
   own formatted strings, so add 'rfk_id' to tell which RFK it belongs to.
 * the 'formatted string' immediately follows 'offset[]', without padding
   to 32-bit alignment.
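A bounds-checked lookup in the spirit of the layout above is shown below. The table does not say what each offset is relative to, so this sketch assumes that offsets are relative to the start of the string area. It also assumes the function receives the element body starting at offset[0]. rfklog_fmt_get() is hypothetical. The string area begins immediately after offset[nr - 1], with no 32-bit padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Look up format string @idx in an RFK log format element body.
 * Assumed layout (hypothetical): @body starts at offset[0] (__le16 each),
 * the string area follows offset[nr - 1] directly with no padding, and
 * each offset is relative to the start of the string area.
 */
static const char *rfklog_fmt_get(const uint8_t *body, size_t body_len,
				  uint8_t nr, uint8_t idx)
{
	size_t tbl_len = (size_t)nr * 2;
	const char *strs;
	size_t strs_len;
	uint16_t off;

	if (idx >= nr || body_len < tbl_len)
		return NULL;

	strs = (const char *)body + tbl_len; /* no 32-bit alignment padding */
	strs_len = body_len - tbl_len;

	off = (uint16_t)(body[idx * 2] | (body[idx * 2 + 1] << 8));
	if (off >= strs_len)
		return NULL;

	/* The string must be NUL-terminated inside the element. */
	if (!memchr(strs + off, '\0', strs_len - off))
		return NULL;

	return strs + off;
}
```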

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: fw: add version field to BB MCU firmware element
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:50 +0000 (08:50 +0800)]
wifi: rtw89: fw: add version field to BB MCU firmware element

8922AE has more than one hardware version, and they use different BB MCU
firmware, so take a byte from element priv[] to annotate the version.
Since there is more than one firmware and only the matching version is
adopted, return 1 to ignore non-matching firmware.

     +===========================================+
     |  elm ID  | elm size | version  |          |
     +----------+----------+----------+----------+
     |                     |  element_priv[]     |
     +-------------------------------------------+

                change to  |
                           v

     +===========================================+
     |  elm ID  | elm size | version  |          |
     +----------+----------+----------+----------+
     |                     | cv | element_rsvd[] |
     +-------------------------------------------+

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: rtw89: fw: load TX power track tables from fw_element
Ping-Ke Shih [Wed, 13 Dec 2023 00:50:49 +0000 (08:50 +0800)]
wifi: rtw89: fw: load TX power track tables from fw_element

The TX power track tables define the compensation power as a function of
the thermal value. Currently, we have 16 (2 * 4 * 2) tables made by
combinations of
  {negative/positive thermal value, 2GHz/2GHz-CCK/5GHz/6GHz, path A/B}
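The three axes can be flattened into a single table index as sketched below. The enum and function names are hypothetical and are not the rtw89 definitions.

```c
#include <stddef.h>

/* Hypothetical axes for the 2 * 4 * 2 = 16 TX power track tables. */
enum trk_sign { TRK_NEG, TRK_POS, TRK_SIGN_NUM };
enum trk_band { TRK_2G, TRK_2G_CCK, TRK_5G, TRK_6G, TRK_BAND_NUM };
enum trk_path { TRK_PATH_A, TRK_PATH_B, TRK_PATH_NUM };

#define TRK_TABLE_NUM (TRK_SIGN_NUM * TRK_BAND_NUM * TRK_PATH_NUM)

/* Map (sign, band, path) to a unique index in [0, TRK_TABLE_NUM). */
static size_t trk_table_idx(enum trk_sign s, enum trk_band b, enum trk_path p)
{
	return ((size_t)s * TRK_BAND_NUM + b) * TRK_PATH_NUM + p;
}
```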

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agoring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
Steven Rostedt (Google) [Fri, 15 Dec 2023 13:18:10 +0000 (08:18 -0500)]
ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()

When filtering is enabled, a temporary buffer is created to place the
content of the trace event output so that the filter logic can decide
from the trace event output if the trace event should be filtered out or
not. If it is to be filtered out, the content in the temporary buffer is
simply discarded, otherwise it is written into the trace buffer.

But if an interrupt were to come in while a previous event was using that
temporary buffer, the event written by the interrupt would actually go
into the ring buffer itself to prevent corrupting the data on the
temporary buffer. If the event is to be filtered out, the event in the
ring buffer is discarded or, if it fails to discard because another event
has already come in, it is turned into padding.

The update to the write_stamp in the rb_try_to_discard() happens after a
fix was made to force the next event after the discard to use an absolute
timestamp by setting the before_stamp to zero so it does not match the
write_stamp (which causes an event to use the absolute timestamp).

However, rb_try_to_discard() still tries to put the write_stamp back to
what it was before the event was added. This is useless and wasteful
because nothing is going to use that write_stamp for calculations, as it
still will not match the before_stamp.
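The argument can be checked with a minimal model that assumes only the two stamps matter. The struct and helpers are hypothetical. Once the discard zeroes before_stamp, the next event needs an absolute timestamp no matter what write_stamp holds, so restoring write_stamp gains nothing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, single-CPU model of the two stamps. */
struct rb_stamps {
	uint64_t before_stamp;
	uint64_t write_stamp;
};

/*
 * An event may store a delta only when both stamps agree. Any mismatch
 * forces a full absolute timestamp.
 */
static bool needs_absolute(const struct rb_stamps *s)
{
	return s->before_stamp != s->write_stamp;
}

/* Discard path: zeroing before_stamp is enough; write_stamp is left alone. */
static void discard_event(struct rb_stamps *s)
{
	s->before_stamp = 0;
}

/* Writing an event with timestamp @ts makes the stamps agree again. */
static void write_event(struct rb_stamps *s, uint64_t ts)
{
	s->before_stamp = ts;
	s->write_stamp = ts;
}
```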

Remove this useless update, and in doing so, we remove another
cmpxchg64()!

Also update the comments to reflect this change as well as remove some
extra white space in another comment.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Fixes: b2dd797543cf ("ring-buffer: Force absolute timestamp on discard of event")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agoring-buffer: Do not try to put back write_stamp
Steven Rostedt (Google) [Fri, 15 Dec 2023 03:29:21 +0000 (22:29 -0500)]
ring-buffer: Do not try to put back write_stamp

If an update to an event is interrupted by another event between the time
the initial event allocated its buffer and where it wrote to the
write_stamp, the code tries to reset the write_stamp back to what it had
just overwritten. It knows that it was overwritten by checking the
before_stamp: if it doesn't match what it wrote to the before_stamp
before it allocated its space, it knows it was overwritten.

To put back the write_stamp, it uses the before_stamp it read. The problem
here is that writing the before_stamp to the write_stamp makes the two
equal again, which means that the write_stamp can be considered valid
as the last timestamp written to the ring buffer. But this is not
necessarily true. The interrupting event could itself have been
interrupted, and could end up leaving an invalid write_stamp. If this
happens and control returns to this context, which uses the before_stamp
to update the write_stamp again, it can incorrectly make the write_stamp
valid, causing later events to have incorrect timestamps.

As it is OK to leave this function with an invalid write_stamp (one that
doesn't match the before_stamp), there's no reason to try to make it valid
again in this case. If this race happens, just leave with the invalid
write_stamp, and the next event to come along will add an absolute
timestamp and validate everything again.

Bonus points: This gets rid of another cmpxchg64!

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
15 months agowifi: mwifiex: configure BSSID consistently when starting AP
David Lin [Fri, 15 Dec 2023 00:51:18 +0000 (08:51 +0800)]
wifi: mwifiex: configure BSSID consistently when starting AP

AP BSSID configuration is missing at AP start.  Without this fix, the FW
returns the STA interface MAC address after the first init.  When hostapd
restarts, it gets the MAC address from the netdev before the driver sets
the STA MAC on the netdev again. The MAC addresses seen by hostapd and the
net interface are now different, so a STA cannot connect to the AP.  After
that, the MAC addresses of uap0 and mlan0 become the same, and the issue
disappears after the following hostapd restart (another issue is that the
AP and STA MAC addresses become the same).

This patch fixes the issue cleanly.

Signed-off-by: David Lin <[email protected]>
Fixes: 12190c5d80bd ("mwifiex: add cfg80211 start_ap and stop_ap handlers")
Cc: [email protected]
Reviewed-by: Francesco Dolcini <[email protected]>
Tested-by: Rafael Beims <[email protected]> # Verdin iMX8MP/SD8997 SD
Acked-by: Brian Norris <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agowifi: mwifiex: add extra delay for firmware ready
David Lin [Fri, 8 Dec 2023 23:40:29 +0000 (07:40 +0800)]
wifi: mwifiex: add extra delay for firmware ready

For SDIO IW416, due to a bug, the FW may report ready before it has
completed full initialization, so a command timeout may occur at driver
load after a reboot. Work around this by adding a 100ms delay when
checking the FW status.

Signed-off-by: David Lin <[email protected]>
Cc: [email protected]
Reviewed-by: Francesco Dolcini <[email protected]>
Acked-by: Brian Norris <[email protected]>
Tested-by: Marcel Ziswiler <[email protected]> # Verdin AM62 (IW416)
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
15 months agohv_netvsc: remove duplicated including of slab.h
Wang Jinchao [Fri, 15 Dec 2023 10:06:59 +0000 (18:06 +0800)]
hv_netvsc: remove duplicated including of slab.h

Remove the second #include <linux/slab.h>.

Signed-off-by: Wang Jinchao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoMerge branch 'netlink-specs-legacy'
David S. Miller [Fri, 15 Dec 2023 12:17:17 +0000 (12:17 +0000)]
Merge branch 'netlink-specs-legacy'

Jakub Kicinski says:

====================
netlink: specs: prep legacy specs for C code gen

Minor adjustments to some specs to make them ready for C code gen.

v2:
 - fix MAINTAINERS and subject of patch 3
====================

Signed-off-by: David S. Miller <[email protected]>
15 months agonetlink: specs: mptcp: rename the MPTCP path management spec
Jakub Kicinski [Fri, 15 Dec 2023 01:57:35 +0000 (17:57 -0800)]
netlink: specs: mptcp: rename the MPTCP path management spec

We assume in a handful of places that the name of the spec is
the same as the name of the family. We could fix that, but
it seems like a fair assumption to make. Rename the MPTCP
spec instead.

Reviewed-by: Mat Martineau <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonetlink: specs: ovs: correct enum names in specs
Jakub Kicinski [Fri, 15 Dec 2023 01:57:34 +0000 (17:57 -0800)]
netlink: specs: ovs: correct enum names in specs

Align the enum-names of OVS with what's actually in the uAPI.
Either correct the names, or mark the enum as empty because
the values are in fact #defines.

Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonetlink: specs: ovs: remove fixed header fields from attrs
Jakub Kicinski [Fri, 15 Dec 2023 01:57:33 +0000 (17:57 -0800)]
netlink: specs: ovs: remove fixed header fields from attrs

An op's "attributes" list is a workaround for families with a single
attr set. We don't want to render a single huge request structure,
the same for each op, since we know that most ops accept only a small
set of attributes. The "attributes" list lets us narrow down the
attributes to what the op actually pays attention to.

It doesn't make sense to put names of fixed headers in there.
They are not "attributes" and we can't really narrow down the struct
members.

Remove the fixed header fields from attrs for ovs families
in preparation for C codegen support.
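To make the distinction concrete, here is a minimal C sketch of a message laid out as a fixed header followed by type-length-value attributes. The struct names, attribute types and the simplified TLV encoding are hypothetical (in the spirit of netlink's struct nlattr, not its exact rules). Fixed-header fields sit at a known offset and are read as struct members, while attributes have to be searched for and may be absent, which is why only the latter belong in a per-op "attributes" list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header: every message of the family carries it. */
struct demo_fixed_hdr {
	int32_t dp_ifindex;
};

/* Simplified TLV header, similar in spirit to netlink's struct nlattr. */
struct demo_attr {
	uint16_t len;  /* header + payload, in bytes */
	uint16_t type;
};

#define DEMO_ALIGN(n) (((n) + 3u) & ~3u)

/* Append an attribute at @off; returns the next (4-byte aligned) offset. */
static size_t demo_put_attr(uint8_t *msg, size_t off, uint16_t type,
			    const void *data, uint16_t data_len)
{
	struct demo_attr a = { .len = (uint16_t)(sizeof(a) + data_len), .type = type };

	memcpy(msg + off, &a, sizeof(a));
	memcpy(msg + off + sizeof(a), data, data_len);
	return off + DEMO_ALIGN(a.len);
}

/*
 * Find the first attribute of @type after the fixed header. The fixed
 * header is never looked up this way: it is always present and is read
 * directly as a struct. Returns NULL if the op did not include @type.
 */
static const void *demo_find_attr(const uint8_t *msg, size_t msg_len,
				  uint16_t type, uint16_t *payload_len)
{
	size_t off = DEMO_ALIGN(sizeof(struct demo_fixed_hdr));

	while (off + sizeof(struct demo_attr) <= msg_len) {
		struct demo_attr a;

		memcpy(&a, msg + off, sizeof(a));
		if (a.len < sizeof(a) || off + a.len > msg_len)
			return NULL; /* malformed attribute */
		if (a.type == type) {
			*payload_len = (uint16_t)(a.len - sizeof(a));
			return msg + off + sizeof(a);
		}
		off += DEMO_ALIGN(a.len);
	}
	return NULL;
}
```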

Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Fri, 15 Dec 2023 12:03:20 +0000 (12:03 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
add v2 FW logging for ice driver

Paul Stillwell says:

Firmware (FW) log support was added to the ice driver, but that version is
no longer supported. There is a newer version of FW logging (v2) that
adds more control knobs to get the exact data out of the FW
for debugging.

The interface for FW logging is debugfs. This was chosen based on
discussions here:
https://lore.kernel.org/netdev/20230214180712.53fc8ba2@kernel.org/ and
https://lore.kernel.org/netdev/20231012164033.1069fb4b@kernel.org/

We talked about using devlink in a variety of ways, but none of those
options made any sense for the way the FW reports data. We briefly talked
about using ethtool, but that seemed to go by the wayside. Ultimately it
seems like using debugfs is the way to go, so re-implement the code to use
that.

FW logging spans all the PFs on the device, so restrict the commands to
PF0 only.

If the device supports FW logging, a directory named 'fwlog' will be
created under '/sys/kernel/debug/ice/<pci_dev>'. It contains the following
files, which manage the behavior of logging:
- modules/<module>
- nr_messages
- enable
- log_size
- data

where

modules/<module> is used to read/write the log level for a specific module.

nr_messages is used to determine how many events should be in each message
sent to the driver.

enable is used to start/stop FW logging. This is a boolean value, so only 1
and 0 are permissible values.

log_size is used to configure the amount of memory the driver uses for log
data.

data is used to read/clear the log data.
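A minimal userspace sketch of driving these files from C, assuming debugfs is mounted at /sys/kernel/debug, the caller has root privileges, and the layout is exactly as listed above. The helper names are hypothetical, and the exact input format each file accepts (beyond 1 or 0 for 'enable') is not specified here.

```c
#include <stdio.h>
#include <string.h>

/* Build "<root>/ice/<pci_dev>/fwlog/<file>". Returns 0 on success, -1 if the
 * device name contains '/' (keeps the path inside fwlog) or the buffer is too small. */
static int fwlog_path(char *buf, size_t len, const char *debugfs_root,
		      const char *pci_dev, const char *file)
{
	int n;

	if (strchr(pci_dev, '/'))
		return -1;
	n = snprintf(buf, len, "%s/ice/%s/fwlog/%s", debugfs_root, pci_dev, file);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string value (e.g. "1" to 'enable') to a debugfs file. */
static int fwlog_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Copy the binary log from 'data' into @out; returns bytes copied or -1. */
static long fwlog_dump(const char *data_path, FILE *out)
{
	char chunk[4096];
	long total = 0;
	size_t n;
	FILE *f = fopen(data_path, "rb");

	if (!f)
		return -1;
	while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0) {
		if (fwrite(chunk, 1, n, out) != n) {
			total = -1;
			break;
		}
		total += (long)n;
	}
	fclose(f);
	return total;
}
```

For example, building the path for 'enable' with a placeholder PCI address such as 0000:01:00.0, writing "1" to it, and later calling fwlog_dump() on the 'data' path would capture the binary blob into a file that can be sent for decoding.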

Generally there is a lot of data, and dumping that data to syslog
results in data loss. This causes problems when decoding the data, and
the user doesn't know that data is missing until later. Instead of dumping
the FW log output to syslog, use debugfs. This ensures that all the data the
driver has is retrieved correctly.

The FW log data is binary data that the FW team decodes to determine what
happened in firmware. The binary blob is sent to Intel for decoding.
---
v6:
- use seq_printf() for outputting module info when reading from 'module' file
- replace code that created argc and argv for handling command line input
- removed checks in all the _read() and _write() functions to see if FW logging
  is supported because the files will not exist if it is not supported
- removed warnings on allocation failures on debugfs file creation failures
- removed a newline between memory allocation and checking if the memory was
  allocated
- fixed cases where we could just return the value from a function call
  instead of saving the value in a variable
- moved the check for PF0 in ice_fwlog_init() to an earlier patch
- reworked all of the argument scanning in the _write() functions in ice_debugfs.c
  to avoid writing characters past the end of the buffer

v5: https://lore.kernel.org/netdev/20231205211251.2122874[email protected]/
- changed the log level configuration from a single file for all modules to a
  file per module.
- changed 'nr_buffs' to 'log_size' because users understand memory sizes
  better than a number of buffers
- changed 'resolution' to 'nr_messages' to better reflect what it represents
- updated documentation to reflect these changes
- updated documentation to indicate that FW logging must be disabled to
  clear the data. Also clarified that any value written to the 'data' file will
  clear the data

v4: https://lore.kernel.org/netdev/20231005170110.3221306[email protected]/
- removed CONFIG_DEBUG_FS wrapper around code because the debugfs calls handle
  this case already
- moved ice_debugfs_exit() call to remove unreachable code issue
- minor changes to documentation based on feedback

v3: https://lore.kernel.org/netdev/20230815165750.2789609[email protected]/
- Adjust error path cleanup in ice_module_init() for unreachable code.

v2: https://lore.kernel.org/netdev/20230810170109.1963832[email protected]/
- Rewrote code to use debugfs instead of devlink

v1: https://lore.kernel.org/netdev/20230209190702.3638688[email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>
15 months agoEDAC/versal: Read num_csrows and num_chans using the correct bitfield macro
Shubhrajyoti Datta [Fri, 15 Dec 2023 05:33:52 +0000 (11:03 +0530)]
EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro

Fix the extraction of num_csrows and num_chans. The extraction is wrong:
instead of extracting the fields with FIELD_GET(), the code calls
FIELD_PREP().

The issue was masked because the default design has the rows as 0.
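The difference between the two macros can be shown with a userspace stand-in. The helpers below only mimic the semantics of FIELD_GET() and FIELD_PREP() from the kernel's <linux/bitfield.h> for a 32-bit mask (the real macros also perform compile-time checks), they use the GCC/Clang builtin __builtin_ctz(), and the mask is a made-up example rather than the Versal register layout.

```c
#include <stdint.h>

/* Bit position of the lowest set bit of a non-zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
	return (unsigned int)__builtin_ctz(mask);
}

/* Like FIELD_GET(): extract the field described by @mask from @reg. */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* Like FIELD_PREP(): place @val at the position described by @mask. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* Hypothetical 2-bit "rows" field at bits 5:4 of a config register. */
#define DEMO_ROWS_MASK 0x30u
```

With reg = 0x20 (rows = 2), field_get() yields 2, while field_prep() shifts the register value out of the mask and yields 0. With reg = 0, both yield 0, which is how a design whose rows field is 0 hides the bug.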

Fixes: 6f15b178cd63 ("EDAC/versal: Add a Xilinx Versal memory controller driver")
Closes: https://lore.kernel.org/all/[email protected]/
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Shubhrajyoti Datta <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
15 months agonet/rose: fix races in rose_kill_by_device()
Eric Dumazet [Thu, 14 Dec 2023 15:27:47 +0000 (15:27 +0000)]
net/rose: fix races in rose_kill_by_device()

syzbot found an interesting netdev refcounting issue in
net/rose/af_rose.c, thanks to CONFIG_NET_DEV_REFCNT_TRACKER=y [1]

The problem is that rose_kill_by_device() can change rose->device
while other threads do not expect the pointer to be changed.

We have to first collect sockets in a temporary array,
then perform the changes while holding the socket
lock and the rose_list_lock spinlock (in this order).

Change rose_release() to also acquire rose_list_lock
before releasing the netdev refcount.
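The two-phase locking pattern described above can be sketched in userspace C with pthread mutexes. Everything here is hypothetical and heavily simplified: demo_list_lock stands in for rose_list_lock, a per-socket mutex stands in for lock_sock(), and clearing the device pointer stands in for releasing the netdev reference. The collection array has a fixed capacity and this sketch simply stops there; how the actual fix handles more sockets is not reproduced.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_SOCKS 16 /* hypothetical fixed capacity for the sketch */

struct demo_sock {
	pthread_mutex_t lock;   /* stands in for lock_sock() */
	int refcnt;             /* protected by demo_list_lock */
	const void *device;     /* bound device, NULL once released */
	struct demo_sock *next;
};

/* Stands in for rose_list_lock: protects the list, refcnt and device. */
static pthread_mutex_t demo_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_sock *demo_sock_list;

static void demo_sock_add(struct demo_sock *s, const void *dev)
{
	pthread_mutex_init(&s->lock, NULL);
	s->refcnt = 1;
	s->device = dev;
	pthread_mutex_lock(&demo_list_lock);
	s->next = demo_sock_list;
	demo_sock_list = s;
	pthread_mutex_unlock(&demo_list_lock);
}

/* Drop the device reference at most once. @dev == NULL means "any device"
 * (the release path). Returns 1 if this call dropped it. */
static int demo_drop_device(struct demo_sock *s, const void *dev)
{
	int dropped = 0;

	pthread_mutex_lock(&s->lock);        /* socket lock first ... */
	pthread_mutex_lock(&demo_list_lock); /* ... then the list lock */
	if (s->device && (!dev || s->device == dev)) {
		s->device = NULL; /* real code would release the netdev ref here */
		dropped = 1;
	}
	pthread_mutex_unlock(&demo_list_lock);
	pthread_mutex_unlock(&s->lock);
	return dropped;
}

static size_t demo_kill_by_device(const void *dev)
{
	struct demo_sock *batch[DEMO_MAX_SOCKS];
	size_t n = 0, cleared = 0;

	/* Phase 1: collect matching sockets and pin them with a reference. */
	pthread_mutex_lock(&demo_list_lock);
	for (struct demo_sock *s = demo_sock_list; s && n < DEMO_MAX_SOCKS; s = s->next) {
		if (s->device == dev) {
			s->refcnt++;
			batch[n++] = s;
		}
	}
	pthread_mutex_unlock(&demo_list_lock);

	/* Phase 2: modify each socket under its lock, then unpin it. */
	for (size_t i = 0; i < n; i++) {
		cleared += (size_t)demo_drop_device(batch[i], dev);
		pthread_mutex_lock(&demo_list_lock);
		batch[i]->refcnt--;
		pthread_mutex_unlock(&demo_list_lock);
	}
	return cleared;
}
```

Both the device-event path and the release path go through demo_drop_device(), which takes the socket lock and then the list lock and re-checks the pointer, so the reference is dropped exactly once.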

[1]

[ 1185.055088][ T7889] ref_tracker: reference already released.
[ 1185.061476][ T7889] ref_tracker: allocated in:
[ 1185.066081][ T7889]  rose_bind+0x4ab/0xd10
[ 1185.070446][ T7889]  __sys_bind+0x1ec/0x220
[ 1185.074818][ T7889]  __x64_sys_bind+0x72/0xb0
[ 1185.079356][ T7889]  do_syscall_64+0x40/0x110
[ 1185.083897][ T7889]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 1185.089835][ T7889] ref_tracker: freed in:
[ 1185.094088][ T7889]  rose_release+0x2f5/0x570
[ 1185.098629][ T7889]  __sock_release+0xae/0x260
[ 1185.103262][ T7889]  sock_close+0x1c/0x20
[ 1185.107453][ T7889]  __fput+0x270/0xbb0
[ 1185.111467][ T7889]  task_work_run+0x14d/0x240
[ 1185.116085][ T7889]  get_signal+0x106f/0x2790
[ 1185.120622][ T7889]  arch_do_signal_or_restart+0x90/0x7f0
[ 1185.126205][ T7889]  exit_to_user_mode_prepare+0x121/0x240
[ 1185.131846][ T7889]  syscall_exit_to_user_mode+0x1e/0x60
[ 1185.137293][ T7889]  do_syscall_64+0x4d/0x110
[ 1185.141783][ T7889]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 1185.148085][ T7889] ------------[ cut here ]------------

WARNING: CPU: 1 PID: 7889 at lib/ref_tracker.c:255 ref_tracker_free+0x61a/0x810 lib/ref_tracker.c:255
Modules linked in:
CPU: 1 PID: 7889 Comm: syz-executor.2 Not tainted 6.7.0-rc4-syzkaller-00162-g65c95f78917e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:ref_tracker_free+0x61a/0x810 lib/ref_tracker.c:255
Code: 00 44 8b 6b 18 31 ff 44 89 ee e8 21 62 f5 fc 45 85 ed 0f 85 a6 00 00 00 e8 a3 66 f5 fc 48 8b 34 24 48 89 ef e8 27 5f f1 05 90 <0f> 0b 90 bb ea ff ff ff e9 52 fd ff ff e8 84 66 f5 fc 4c 8d 6d 44
RSP: 0018:ffffc90004917850 EFLAGS: 00010202
RAX: 0000000000000201 RBX: ffff88802618f4c0 RCX: 0000000000000000
RDX: 0000000000000202 RSI: ffffffff8accb920 RDI: 0000000000000001
RBP: ffff8880269ea5b8 R08: 0000000000000001 R09: fffffbfff23e35f6
R10: ffffffff91f1afb7 R11: 0000000000000001 R12: 1ffff92000922f0c
R13: 0000000005a2039b R14: ffff88802618f4d8 R15: 00000000ffffffff
FS: 00007f0a720ef6c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f43a819d988 CR3: 0000000076c64000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
netdev_tracker_free include/linux/netdevice.h:4127 [inline]
netdev_put include/linux/netdevice.h:4144 [inline]
netdev_put include/linux/netdevice.h:4140 [inline]
rose_kill_by_device net/rose/af_rose.c:195 [inline]
rose_device_event+0x25d/0x330 net/rose/af_rose.c:218
notifier_call_chain+0xb6/0x3b0 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x130 net/core/dev.c:1967
call_netdevice_notifiers_extack net/core/dev.c:2005 [inline]
call_netdevice_notifiers net/core/dev.c:2019 [inline]
__dev_notify_flags+0x1f5/0x2e0 net/core/dev.c:8646
dev_change_flags+0x122/0x170 net/core/dev.c:8682
dev_ifsioc+0x9ad/0x1090 net/core/dev_ioctl.c:529
dev_ioctl+0x224/0x1090 net/core/dev_ioctl.c:786
sock_do_ioctl+0x198/0x270 net/socket.c:1234
sock_ioctl+0x22e/0x6b0 net/socket.c:1339
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f0a7147cba9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0a720ef0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f0a7159bf80 RCX: 00007f0a7147cba9
RDX: 0000000020000040 RSI: 0000000000008914 RDI: 0000000000000004
RBP: 00007f0a714c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f0a7159bf80 R15: 00007ffc8bb3a5f8
</TASK>

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Bernard Pidoux <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoperf: Fix perf_event_validate_size() lockdep splat
Mark Rutland [Fri, 15 Dec 2023 11:24:50 +0000 (11:24 +0000)]
perf: Fix perf_event_validate_size() lockdep splat

When lockdep is enabled, the for_each_sibling_event(sibling, event)
macro checks that event->ctx->mutex is held. When creating a new group
leader event, we call perf_event_validate_size() on a partially
initialized event where event->ctx is NULL, and so when
for_each_sibling_event() attempts to check event->ctx->mutex, we get a
splat, as reported by Lucas De Marchi:

  WARNING: CPU: 8 PID: 1471 at kernel/events/core.c:1950 __do_sys_perf_event_open+0xf37/0x1080

This only happens for a new event which is its own group_leader, and in
this case there cannot be any sibling events. Thus it's safe to skip the
check for siblings, which avoids having to make invasive and ugly
changes to for_each_sibling_event().

Avoid the splat by bailing out early when the new event is its own
group_leader.
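A simplified C model of the early bail-out. The structures and the size arithmetic are hypothetical; only the ordering is the point: the function returns before walking siblings when the event is its own group leader, so the check that needs a valid ctx is never reached for a brand-new leader.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for perf_event structures. */
struct demo_ctx {
	bool mutex_held;
};

struct demo_event {
	struct demo_ctx *ctx;             /* NULL until the event is installed */
	struct demo_event *group_leader;
	struct demo_event *siblings[4];
	size_t nr_siblings;
	size_t read_size;
};

static int demo_splats; /* counts would-be lockdep warnings */

/* Mimics the check in for_each_sibling_event(): needs leader->ctx->mutex. */
static bool demo_siblings_iterable(const struct demo_event *leader)
{
	return leader->ctx != NULL && leader->ctx->mutex_held;
}

static bool demo_validate_size(const struct demo_event *event, size_t limit)
{
	const struct demo_event *leader = event->group_leader;
	size_t total = event->read_size;

	/* A new event that leads its own group cannot have siblings yet,
	 * so skip the sibling walk (and its ctx->mutex check) entirely. */
	if (event == leader)
		return total <= limit;

	if (!demo_siblings_iterable(leader))
		demo_splats++;
	total += leader->read_size;
	for (size_t i = 0; i < leader->nr_siblings; i++)
		total += leader->siblings[i]->read_size;
	return total <= limit;
}
```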

Fixes: 382c27f4ed28f803 ("perf: Fix perf_event_validate_size()")
Closes: https://lore.kernel.org/lkml/[email protected]/
Closes: https://lore.kernel.org/lkml/ZXpm6gQ%[email protected]/
Reported-by: Lucas De Marchi <[email protected]>
Reported-by: Pengfei Xu <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
15 months agoMerge branch 'mv88e6xxx-counters'
David S. Miller [Fri, 15 Dec 2023 11:05:03 +0000 (11:05 +0000)]
Merge branch 'mv88e6xxx-counters'

Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: Add "eth-mac" and "rmon" counter group support

The majority of the changes (2/8) are about refactoring the existing
ethtool statistics support to make it possible to read individual
counters, rather than the whole set.

4/8 tries to collect all information about a stat in a single place
using a mapper macro, which is then used to generate the original list
of stats, along with a matching enum. checkpatch is less than amused
with this construct, but prior art exists (__BPF_FUNC_MAPPER in
include/uapi/linux/bpf.h, for example).
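The mapper-macro technique (often called an X-macro) can be sketched as follows. The counter names, sizes and register offsets are made up for illustration and are not the mv88e6xxx table; the point is that one list drives both the enum of IDs and the stats table, so the two cannot drift apart.

```c
#include <string.h>

/* Hypothetical subset of counters: X(id, name, size_in_words, reg). */
#define DEMO_HW_STAT_MAPPER(X)                          \
	X(in_good_octets, "in_good_octets", 2, 0x00)    \
	X(in_bad_octets,  "in_bad_octets",  1, 0x02)    \
	X(in_unicast,     "in_unicast",     1, 0x04)

struct demo_hw_stat {
	const char *name;
	int size;
	int reg;
};

/* Expansion 1: one enum ID per stat, plus a count. */
#define DEMO_STAT_ENUM(id, nm, sz, rg) DEMO_STAT_##id,
enum demo_stat_id {
	DEMO_HW_STAT_MAPPER(DEMO_STAT_ENUM)
	DEMO_STAT_COUNT
};

/* Expansion 2: the table, subscripted by the same IDs. */
#define DEMO_STAT_ENTRY(id, nm, sz, rg) [DEMO_STAT_##id] = { .name = nm, .size = sz, .reg = rg },
static const struct demo_hw_stat demo_hw_stats[DEMO_STAT_COUNT] = {
	DEMO_HW_STAT_MAPPER(DEMO_STAT_ENTRY)
};

/* Look up a stat ID by its ethtool-style name; -1 if unknown. */
static int demo_stat_by_name(const char *name)
{
	for (int i = 0; i < DEMO_STAT_COUNT; i++)
		if (strcmp(demo_hw_stats[i].name, name) == 0)
			return i;
	return -1;
}
```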

To support the histogram counters from the "rmon" group, we have to
change mv88e6xxx's configuration of them. Instead of counting rx and
tx, we restrict them to rx-only. 6/8 has the details.

With that in place, adding the actual counter groups is pretty
straightforward (5,7/8).

Tie it all together with a selftest (8/8).

v3 -> v4:
- Return size_t from mv88e6xxx_stats_get_stats
- Spelling errors in commit message of 6/8
- Improve selftest:
  - Report progress per-bucket
  - Test both ports in the pair
  - Increase MTU, if required

v2 -> v3:
- Added 6/8
- Added 8/8

v1 -> v2:
- Added 1/6
- Added 3/6
- Changed prototype of stats operation to reflect the fact that the
  number of stats read is returned, never an error
- Moved comma into MV88E6XXX_HW_STAT_MAPPER definition
- Avoid the construction of mapping table iteration which relied on
  struct layouts outside of mv88e6xxx's control
====================

Signed-off-by: David S. Miller <[email protected]>
15 months agoselftests: forwarding: ethtool_rmon: Add histogram counter test
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:29 +0000 (14:50 +0100)]
selftests: forwarding: ethtool_rmon: Add histogram counter test

Validate the operation of rx and tx histogram counters, if supported
by the interface, by sending batches of packets targeted for each
bucket.
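One way to target each bucket is to pick a frame length inside its range and confirm that it lands in the expected bucket. The sketch below assumes RFC 2819-style bounds (64, 65-127, 128-255, 256-511, 512-1023, 1024-1518 octets); an interface reports its own ranges, which may differ, and the actual selftest is a shell script that may choose its sizes differently.

```c
#include <stddef.h>

/* Inclusive bucket bounds in bytes, modelled on RFC 2819 etherStatsPkts*Octets. */
struct demo_hist_range {
	unsigned int low;
	unsigned int high;
};

static const struct demo_hist_range demo_rmon_ranges[] = {
	{ 64, 64 },   { 65, 127 },  { 128, 255 },
	{ 256, 511 }, { 512, 1023 }, { 1024, 1518 },
};

#define DEMO_NR_BUCKETS (sizeof(demo_rmon_ranges) / sizeof(demo_rmon_ranges[0]))

/* Return the bucket index a frame of @len bytes falls into, or -1. */
static int demo_bucket_of(unsigned int len)
{
	for (size_t i = 0; i < DEMO_NR_BUCKETS; i++)
		if (len >= demo_rmon_ranges[i].low && len <= demo_rmon_ranges[i].high)
			return (int)i;
	return -1;
}

/* Pick a frame length that targets bucket @i: here, its upper bound. */
static unsigned int demo_len_for_bucket(size_t i)
{
	return demo_rmon_ranges[i].high;
}
```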

Signed-off-by: Tobias Waldekranz <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Add "rmon" counter group support
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:28 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Add "rmon" counter group support

Report the applicable subset of an mv88e6xxx port's counters using
ethtool's standardized "rmon" counter group.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Limit histogram counters to ingress traffic
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:27 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Limit histogram counters to ingress traffic

Chips in this family only have one set of histogram counters, which
can be used to count ingressing and/or egressing traffic. mv88e6xxx
has, up until this point, kept the hardware default of counting both
directions.

In the meantime, standard counter group support has been added to
ethtool. Via that interface, drivers may report ingress-only and
egress-only histograms separately - but not combined.

In order for mv88e6xxx to maximize the amount of diagnostic information
that can be exported via standard interfaces, we opt to limit the
histogram counters to ingress traffic only. This will allow us to
export them via the standard "rmon" group in an upcoming commit.

The reason for choosing ingress-only over egress-only is to be
compatible with RFC2819 (RMON MIB).

Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Add "eth-mac" counter group support
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:26 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Add "eth-mac" counter group support

Report the applicable subset of an mv88e6xxx port's counters using
ethtool's standardized "eth-mac" counter group.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Give each hw stat an ID
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:25 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Give each hw stat an ID

With the upcoming standard counter group support, we are no longer
reading out the whole set of counters, but rather mapping a subset to
the requested group.

Therefore, create an enum with an ID for each stat, such that
mv88e6xxx_hw_stats[] can be subscripted with a human-readable ID
corresponding to the counter's name.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error path
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:24 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error path

mv88e6xxx_get_stats, which collects stats from various sources,
expects all callees to return the number of stats read. If an error
occurs, 0 should be returned.

Prevent future mishaps of this kind by updating the return type to
reflect this contract.
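A small C sketch of that contract, with hypothetical reader functions: each source returns how many stats it wrote (0 on error), and the aggregator advances its output position by that count. Had a source returned a negative errno instead, the offset arithmetic would move the write position backwards. The unsigned return type documents that only counts are expected; on its own it does not turn a stray negative return into a compile error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source reader: returns the number of stats written, 0 on error. */
typedef size_t (*demo_stats_fn)(int port, uint64_t *data);

static size_t demo_port_stats(int port, uint64_t *data)
{
	data[0] = 100u + (uint64_t)port;
	data[1] = 200u + (uint64_t)port;
	return 2;
}

static size_t demo_serdes_stats(int port, uint64_t *data)
{
	if (port < 0)
		return 0; /* error: report nothing rather than a negative errno */
	data[0] = 7;
	return 1;
}

/* Loosely modelled on the aggregation described above: concatenate every source. */
static size_t demo_get_stats(int port, uint64_t *data)
{
	static const demo_stats_fn sources[] = { demo_port_stats, demo_serdes_stats };
	size_t count = 0;

	for (size_t i = 0; i < sizeof(sources) / sizeof(sources[0]); i++)
		count += sources[i](port, data + count);
	return count;
}
```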

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Create API to read a single stat counter
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:23 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Create API to read a single stat counter

This change contains no functional change. We simply push the hardware
specific stats logic to a function reading a single counter, rather
than the whole set.

This is a preparatory change for the upcoming standard ethtool
statistics support (i.e. "eth-mac", "eth-ctrl" etc.).

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: dsa: mv88e6xxx: Push locking into stats snapshotting
Tobias Waldekranz [Thu, 14 Dec 2023 13:50:22 +0000 (14:50 +0100)]
net: dsa: mv88e6xxx: Push locking into stats snapshotting

This is more consistent with the driver's general structure.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoMerge branch 'net-optmem_max-changes'
David S. Miller [Fri, 15 Dec 2023 11:01:27 +0000 (11:01 +0000)]
Merge branch 'net-optmem_max-changes'

Eric Dumazet says:

====================
net: optmem_max changes

optmem_max default value is too small for tx zerocopy workloads.

First patch increases default from 20KB to 128 KB,
which is the value we have used for seven years.

Second patch makes optmem_max sysctl per netns.

Last patch tweaks two tests accordingly.
====================

Signed-off-by: David S. Miller <[email protected]>
15 months agoselftests/net: optmem_max became per netns
Eric Dumazet [Thu, 14 Dec 2023 10:49:01 +0000 (10:49 +0000)]
selftests/net: optmem_max became per netns

/proc/sys/net/core/optmem_max is now per netns, change two tests
that were saving/changing/restoring its value on the parent netns.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: Namespace-ify sysctl_optmem_max
Eric Dumazet [Thu, 14 Dec 2023 10:49:00 +0000 (10:49 +0000)]
net: Namespace-ify sysctl_optmem_max

Since optmem_max is used by tx zerocopy,
we want to be able to control it on a netns basis.

The following patch changes two tests.

Tested:

oqq130:~# cat /proc/sys/net/core/optmem_max
131072
oqq130:~# echo 1000000 >/proc/sys/net/core/optmem_max
oqq130:~# cat /proc/sys/net/core/optmem_max
1000000
oqq130:~# unshare -n
oqq130:~# cat /proc/sys/net/core/optmem_max
131072
oqq130:~# exit
logout
oqq130:~# cat /proc/sys/net/core/optmem_max
1000000
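Because each netns now sees its own copy, as the transcript shows after unshare -n, code that saves and restores the value has to do so from inside the namespace it cares about. A minimal C sketch of such helpers follows; the helper names are hypothetical, and the path is the one shown in the transcript.

```c
#include <stdio.h>

/* Read an integer sysctl such as /proc/sys/net/core/optmem_max.
 * The value is that of the caller's network namespace. 0 on success, -1 on failure. */
static int demo_read_sysctl_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write an integer sysctl; affects only the caller's network namespace. */
static int demo_write_sysctl_long(const char *path, long val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", val) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```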

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: increase optmem_max default value
Eric Dumazet [Thu, 14 Dec 2023 10:48:59 +0000 (10:48 +0000)]
net: increase optmem_max default value

For many years, /proc/sys/net/core/optmem_max default value
on a 64bit kernel has been 20 KB.

Regular usage of TCP tx zerocopy needs a bit more.

Google has used 128KB as the default value for 7 years without
any problem.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoMerge branch 'mlxsw-CFF-flood-mode'
David S. Miller [Fri, 15 Dec 2023 10:58:00 +0000 (10:58 +0000)]
Merge branch 'mlxsw-CFF-flood-mode'

Petr Machata says:

====================
mlxsw: CFF flood mode: NVE underlay configuration

Recently, support for CFF flood mode (for Compressed FID Flooding) was
added to the mlxsw driver. The most recent patchset has detailed coverage
of what CFF is and what has changed and how:

    https://lore.kernel.org/netdev/cover.1701183891[email protected]/

In CFF flood mode, each FID allocates a handful of (in our implementation,
two or three) consecutive PGT entries. One entry holds the flood vector for
unknown-UC traffic, one for MC, one for BC.

To determine how to look up flood vectors, the CFF flood mode uses a
concept of flood profiles, which are IDs that reference mappings from
traffic types to offsets. In the case of CFF flood mode, the offset in
question is applied to the PGT address configured at a FID. The same
mechanism is used by NVE underlay for flooding. Again the profile ID and
the traffic type determine the offset to apply, this time to the KVD address
used to look up flooding entries. Since mlxsw configures NVE underlay flood
the same regardless of traffic type, only one offset was ever needed: the
zero, which is the default, and thus no explicit configuration was needed.
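The lookup described above can be modelled as a small table: a profile ID selects a traffic-type-to-offset mapping, and the offset is added to a base address (the FID's PGT base for CFF flooding, or a KVD address for NVE underlay). The names, the FID profile ID and the offsets are hypothetical; only the NVE profile ID of 3 and the unused profile 0 come from this cover letter. On the device the mapping is programmed through registers, not a C array.

```c
#include <stdint.h>

enum demo_traffic_type {
	DEMO_TT_UNKNOWN_UC,
	DEMO_TT_MC,
	DEMO_TT_BC,
	DEMO_TT_COUNT,
};

enum demo_profile_id {
	DEMO_PROFILE_NONE = 0, /* left unconfigured, usable as a sentinel */
	DEMO_PROFILE_FID = 1,  /* hypothetical CFF profile for bridge FIDs */
	DEMO_PROFILE_NVE = 3,  /* NVE profile ID, as described above */
	DEMO_PROFILE_MAX = 4,
};

/* Hypothetical profile table: offset to add to the base, per traffic type. */
static const int8_t demo_profile_offsets[DEMO_PROFILE_MAX][DEMO_TT_COUNT] = {
	[DEMO_PROFILE_FID] = { [DEMO_TT_UNKNOWN_UC] = 0, [DEMO_TT_MC] = 1, [DEMO_TT_BC] = 2 },
	/* "Any traffic": NVE uses a single KVD entry, so every type maps to 0. */
	[DEMO_PROFILE_NVE] = { 0, 0, 0 },
};

/* Address of the flood entry for @tt, or UINT32_MAX for an invalid profile. */
static uint32_t demo_flood_addr(enum demo_profile_id prf, uint32_t base,
				enum demo_traffic_type tt)
{
	if (prf == DEMO_PROFILE_NONE || prf >= DEMO_PROFILE_MAX)
		return UINT32_MAX;
	return base + (uint32_t)demo_profile_offsets[prf][tt];
}
```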

Now that CFF uses profiles as well, it would be better to configure the
profile used by NVE explicitly, to make the configuration visible in the
source code.

In this patchset, add the register support (in patch #1), add a new traffic
type to refer to "any traffic at all" (in patch #2) and finally configure
the NVE profile explicitly for FIDs (in patch #3).

So far, the implicitly configured flood profile was the ID 0. With this
patchset, it changes to 3, leaving the 0 free to allow us to spot missed
configuration.
====================

Signed-off-by: David S. Miller <[email protected]>
15 months agomlxsw: spectrum_fid: Set NVE flood profile as part of FID configuration
Petr Machata [Thu, 14 Dec 2023 13:19:07 +0000 (14:19 +0100)]
mlxsw: spectrum_fid: Set NVE flood profile as part of FID configuration

The NVE flood profile is used to determine the offset applied to the KVD
address for NVE flood. We currently do not set it, leaving it at the
default value of 0. That is not an issue: all the traffic-type-to-offset
mappings (as configured by SFFP) default to an offset of 0. This is what we
need anyway, as mlxsw only allocates a single KVD entry for NVE underlay.

The field is only relevant on Spectrum-2 and above. So to be fully
consistent, we should split the existing controlled ops to Spectrum-1 and
Spectrum>1 variants, with only the latter setting the field. But that seems
like a lot of overhead for a single field whose meaning is "everything is
the default". So instead pretend that the NVE flood profile does not exist
in the controlled flood mode, like we have so far, and only set it when
flood mode is CFF.

Setting this at all serves a dual purpose. First, it is now clear which
profile belongs to NVE, because in the CFF mode, we have multiple users.
This should prevent bugs in flood profile management. Second, using
specifically non-zero value means there will be no valid uses of the
profile 0, which we can therefore use as a sentinel.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agomlxsw: spectrum_fid: Add an "any" packet type
Petr Machata [Thu, 14 Dec 2023 13:19:06 +0000 (14:19 +0100)]
mlxsw: spectrum_fid: Add an "any" packet type

Flood profiles have been used prior to CFF support for NVE underlay. As
is the case with FID flooding, an NVE profile describes at which offset a
datum is located for a given traffic type. mlxsw currently only ever uses one KVD
entry for NVE lookup, i.e. regardless of traffic type, the offset is always
zero. To be able to describe this, add a traffic type enumerator describing
"any traffic type".

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agomlxsw: reg: Add nve_flood_prf_id field to SFMR
Petr Machata [Thu, 14 Dec 2023 13:19:05 +0000 (14:19 +0100)]
mlxsw: reg: Add nve_flood_prf_id field to SFMR

The field is used for setting a flood profile for lookup of KVD entry for
NVE underlay. As with the other uses of flood profiles, this references a traffic
type-to-offset mapping, except here it is not applied to PGT offsets, but
KVD offsets.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agoethernet: atheros: fix a memleak in atl1e_setup_ring_resources
Zhipeng Lu [Thu, 14 Dec 2023 13:04:04 +0000 (21:04 +0800)]
ethernet: atheros: fix a memleak in atl1e_setup_ring_resources

In the error handling of 'offset > adapter->ring_size', the
tx_ring->tx_buffer allocated by kzalloc should be freed before
jumping to 'failed', instead of jumping there immediately.
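The fix follows the usual error-unwinding rule: every resource acquired before a failure point must be released on that failure path. Below is a simplified userspace sketch with hypothetical structures, where calloc() stands in for kzalloc() and the bounds check mirrors 'offset > adapter->ring_size'; the real function's allocation order and labels are not reproduced.

```c
#include <stdlib.h>

/* Hypothetical, simplified ring state; not the driver's real structures. */
struct demo_ring {
	void *tx_buffer;
	void *desc;
};

/* Returns 0 on success, -1 on failure, without leaking tx_buffer. */
static int demo_setup_ring(struct demo_ring *ring, size_t offset, size_t ring_size)
{
	ring->tx_buffer = calloc(16, 64);
	if (!ring->tx_buffer)
		return -1;

	/* Jumping straight to a generic failure path here would leak tx_buffer. */
	if (offset > ring_size)
		goto err_free_tx;

	ring->desc = calloc(1, ring_size);
	if (!ring->desc)
		goto err_free_tx;
	return 0;

err_free_tx:
	free(ring->tx_buffer);
	ring->tx_buffer = NULL;
	return -1;
}
```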

Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Signed-off-by: Zhipeng Lu <[email protected]>
Reviewed-by: Suman Ghosh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months agonet: sched: ife: fix potential use-after-free
Eric Dumazet [Thu, 14 Dec 2023 11:30:38 +0000 (11:30 +0000)]
net: sched: ife: fix potential use-after-free

ife_decode() calls pskb_may_pull() twice; we need to reload
ifehdr after the second call, or risk use-after-free as reported
by syzbot:

BUG: KASAN: slab-use-after-free in __ife_tlv_meta_valid net/ife/ife.c:108 [inline]
BUG: KASAN: slab-use-after-free in ife_tlv_meta_decode+0x1d1/0x210 net/ife/ife.c:131
Read of size 2 at addr ffff88802d7300a4 by task syz-executor.5/22323

CPU: 0 PID: 22323 Comm: syz-executor.5 Not tainted 6.7.0-rc3-syzkaller-00804-g074ac38d5b95 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:475
kasan_report+0xda/0x110 mm/kasan/report.c:588
__ife_tlv_meta_valid net/ife/ife.c:108 [inline]
ife_tlv_meta_decode+0x1d1/0x210 net/ife/ife.c:131
tcf_ife_decode net/sched/act_ife.c:739 [inline]
tcf_ife_act+0x4e3/0x1cd0 net/sched/act_ife.c:879
tc_act include/net/tc_wrapper.h:221 [inline]
tcf_action_exec+0x1ac/0x620 net/sched/act_api.c:1079
tcf_exts_exec include/net/pkt_cls.h:344 [inline]
mall_classify+0x201/0x310 net/sched/cls_matchall.c:42
tc_classify include/net/tc_wrapper.h:227 [inline]
__tcf_classify net/sched/cls_api.c:1703 [inline]
tcf_classify+0x82f/0x1260 net/sched/cls_api.c:1800
hfsc_classify net/sched/sch_hfsc.c:1147 [inline]
hfsc_enqueue+0x315/0x1060 net/sched/sch_hfsc.c:1546
dev_qdisc_enqueue+0x3f/0x230 net/core/dev.c:3739
__dev_xmit_skb net/core/dev.c:3828 [inline]
__dev_queue_xmit+0x1de1/0x3d30 net/core/dev.c:4311
dev_queue_xmit include/linux/netdevice.h:3165 [inline]
packet_xmit+0x237/0x350 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x24aa/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fe9acc7cae9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe9ada450c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fe9acd9bf80 RCX: 00007fe9acc7cae9
RDX: 000000000000fce0 RSI: 00000000200002c0 RDI: 0000000000000003
RBP: 00007fe9accc847a R08: 0000000020000140 R09: 0000000000000014
R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fe9acd9bf80 R15: 00007ffd5427ae78
</TASK>

Allocated by task 22323:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:198 [inline]
__do_kmalloc_node mm/slab_common.c:1007 [inline]
__kmalloc_node_track_caller+0x5a/0x90 mm/slab_common.c:1027
kmalloc_reserve+0xef/0x260 net/core/skbuff.c:582
__alloc_skb+0x12b/0x330 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1298 [inline]
alloc_skb_with_frags+0xe4/0x710 net/core/skbuff.c:6331
sock_alloc_send_pskb+0x7e4/0x970 net/core/sock.c:2780
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x1e2a/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Freed by task 22323:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:164 [inline]
slab_free_hook mm/slub.c:1800 [inline]
slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826
slab_free mm/slub.c:3809 [inline]
__kmem_cache_free+0xc0/0x180 mm/slub.c:3822
skb_kfree_head net/core/skbuff.c:950 [inline]
skb_free_head+0x110/0x1b0 net/core/skbuff.c:962
pskb_expand_head+0x3c5/0x1170 net/core/skbuff.c:2130
__pskb_pull_tail+0xe1/0x1830 net/core/skbuff.c:2655
pskb_may_pull_reason include/linux/skbuff.h:2685 [inline]
pskb_may_pull include/linux/skbuff.h:2693 [inline]
ife_decode+0x394/0x4f0 net/ife/ife.c:82
tcf_ife_decode net/sched/act_ife.c:727 [inline]
tcf_ife_act+0x43b/0x1cd0 net/sched/act_ife.c:879
tc_act include/net/tc_wrapper.h:221 [inline]
tcf_action_exec+0x1ac/0x620 net/sched/act_api.c:1079
tcf_exts_exec include/net/pkt_cls.h:344 [inline]
mall_classify+0x201/0x310 net/sched/cls_matchall.c:42
tc_classify include/net/tc_wrapper.h:227 [inline]
__tcf_classify net/sched/cls_api.c:1703 [inline]
tcf_classify+0x82f/0x1260 net/sched/cls_api.c:1800
hfsc_classify net/sched/sch_hfsc.c:1147 [inline]
hfsc_enqueue+0x315/0x1060 net/sched/sch_hfsc.c:1546
dev_qdisc_enqueue+0x3f/0x230 net/core/dev.c:3739
__dev_xmit_skb net/core/dev.c:3828 [inline]
__dev_queue_xmit+0x1de1/0x3d30 net/core/dev.c:4311
dev_queue_xmit include/linux/netdevice.h:3165 [inline]
packet_xmit+0x237/0x350 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x24aa/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

The buggy address belongs to the object at ffff88802d730000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 164 bytes inside of
freed 8192-byte region [ffff88802d730000, ffff88802d732000)

The buggy address belongs to the physical page:
page:ffffea0000b5cc00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2d730
head:ffffea0000b5cc00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888013042280 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 22323, tgid 22320 (syz-executor.5), ts 950317230369, free_ts 950233467461
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1544
prep_new_page mm/page_alloc.c:1551 [inline]
get_page_from_freelist+0xa28/0x3730 mm/page_alloc.c:3319
__alloc_pages+0x22e/0x2420 mm/page_alloc.c:4575
alloc_pages_mpol+0x258/0x5f0 mm/mempolicy.c:2133
alloc_slab_page mm/slub.c:1870 [inline]
allocate_slab mm/slub.c:2017 [inline]
new_slab+0x283/0x3c0 mm/slub.c:2070
___slab_alloc+0x979/0x1500 mm/slub.c:3223
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3322
__slab_alloc_node mm/slub.c:3375 [inline]
slab_alloc_node mm/slub.c:3468 [inline]
__kmem_cache_alloc_node+0x131/0x310 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc_node_track_caller+0x4a/0x90 mm/slab_common.c:1027
kmalloc_reserve+0xef/0x260 net/core/skbuff.c:582
__alloc_skb+0x12b/0x330 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1298 [inline]
alloc_skb_with_frags+0xe4/0x710 net/core/skbuff.c:6331
sock_alloc_send_pskb+0x7e4/0x970 net/core/sock.c:2780
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x1e2a/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1144 [inline]
free_unref_page_prepare+0x53c/0xb80 mm/page_alloc.c:2354
free_unref_page+0x33/0x3b0 mm/page_alloc.c:2494
__unfreeze_partials+0x226/0x240 mm/slub.c:2655
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x18e/0x1d0 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x65/0x90 mm/kasan/common.c:305
kasan_slab_alloc include/linux/kasan.h:188 [inline]
slab_post_alloc_hook mm/slab.h:763 [inline]
slab_alloc_node mm/slub.c:3478 [inline]
slab_alloc mm/slub.c:3486 [inline]
__kmem_cache_alloc_lru mm/slub.c:3493 [inline]
kmem_cache_alloc_lru+0x219/0x6f0 mm/slub.c:3509
alloc_inode_sb include/linux/fs.h:2937 [inline]
ext4_alloc_inode+0x28/0x650 fs/ext4/super.c:1408
alloc_inode+0x5d/0x220 fs/inode.c:261
new_inode_pseudo fs/inode.c:1006 [inline]
new_inode+0x22/0x260 fs/inode.c:1032
__ext4_new_inode+0x333/0x5200 fs/ext4/ialloc.c:958
ext4_symlink+0x5d7/0xa20 fs/ext4/namei.c:3398
vfs_symlink fs/namei.c:4464 [inline]
vfs_symlink+0x3e5/0x620 fs/namei.c:4448
do_symlinkat+0x25f/0x310 fs/namei.c:4490
__do_sys_symlinkat fs/namei.c:4506 [inline]
__se_sys_symlinkat fs/namei.c:4503 [inline]
__x64_sys_symlinkat+0x97/0xc0 fs/namei.c:4503
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82

Fixes: d57493d6d1be ("net: sched: ife: check on metadata length")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Alexander Aring <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago net: Return error from sk_stream_wait_connect() if sk_wait_event() fails
Shigeru Yoshida [Thu, 14 Dec 2023 05:09:22 +0000 (14:09 +0900)]
net: Return error from sk_stream_wait_connect() if sk_wait_event() fails

The following NULL pointer dereference issue occurred:

BUG: kernel NULL pointer dereference, address: 0000000000000000
<...>
RIP: 0010:ccid_hc_tx_send_packet net/dccp/ccid.h:166 [inline]
RIP: 0010:dccp_write_xmit+0x49/0x140 net/dccp/output.c:356
<...>
Call Trace:
 <TASK>
 dccp_sendmsg+0x642/0x7e0 net/dccp/proto.c:801
 inet_sendmsg+0x63/0x90 net/ipv4/af_inet.c:846
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x83/0xe0 net/socket.c:745
 ____sys_sendmsg+0x443/0x510 net/socket.c:2558
 ___sys_sendmsg+0xe5/0x150 net/socket.c:2612
 __sys_sendmsg+0xa6/0x120 net/socket.c:2641
 __do_sys_sendmsg net/socket.c:2650 [inline]
 __se_sys_sendmsg net/socket.c:2648 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2648
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x43/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

sk_wait_event() returns an error (-EPIPE) if disconnect() is called on the
socket waiting for the event. However, sk_stream_wait_connect() returns
success, i.e. zero, even if sk_wait_event() returns -EPIPE, so a function
that waits for a connection with sk_stream_wait_connect() may misbehave.

In the case of the above DCCP issue, dccp_sendmsg() is waiting for the
connection. If disconnect() is called concurrently, the above issue
occurs.

This patch fixes the issue by returning an error from sk_stream_wait_connect()
if sk_wait_event() fails.
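
The failure mode can be modelled in userspace C. This is a minimal sketch, not the kernel code: wait_fn, wait_connect_old(), wait_connect_new() and aborted_by_disconnect() are hypothetical names, and the real sk_stream_wait_connect() also loops, handles timeouts and checks socket state. It only shows why dropping the wait result lets the caller proceed as if the socket were connected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the wait: returns a negative errno when the
 * wait is aborted (e.g. -EPIPE after a concurrent disconnect()), else 0. */
typedef int (*wait_fn)(void *ctx);

/* Before the fix (simplified): the wait result is dropped, so the caller
 * is told the connection is ready even though it is not. */
static int wait_connect_old(wait_fn wait, void *ctx)
{
	(void)wait(ctx);
	return 0;
}

/* After the fix (simplified): a failing wait is returned to the caller. */
static int wait_connect_new(wait_fn wait, void *ctx)
{
	int err = wait(ctx);

	if (err < 0)
		return err;
	return 0;
}

/* Simulates a wait that was interrupted by disconnect(). */
static int aborted_by_disconnect(void *ctx)
{
	(void)ctx;
	return -EPIPE;
}
```

With aborted_by_disconnect() as the wait, wait_connect_old() returns 0, so a caller such as dccp_sendmsg() would carry on transmitting; wait_connect_new() returns -EPIPE and the caller can bail out.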

Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reported-by: [email protected]
Reviewed-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Reported-by: syzkaller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago Merge branch 'net-at803x-cleanups'
David S. Miller [Fri, 15 Dec 2023 10:45:57 +0000 (10:45 +0000)]
Merge branch 'net-at803x-cleanups'

Christian Marangi says:

====================
net: phy: at803x: additional cleanup for qca808x

This small series is a preparation for the big code split. While the
qca808x code is waiting to be reviewed and merged, we can further clean up
and generalize shared functions between at803x and qca808x.

With these last 2 patches everything is ready to move the driver to a
dedicated directory and split the code by creating a library module
for the few functions shared between the 2 drivers.

Eventually at803x can be further cleaned up and generalized, but everything
will already be self-contained and related only to the at803x family of PHYs.
====================

Signed-off-by: David S. Miller <[email protected]>
15 months ago net: phy: at803x: make read specific status function more generic
Christian Marangi [Thu, 14 Dec 2023 00:44:32 +0000 (01:44 +0100)]
net: phy: at803x: make read specific status function more generic

Rework the read specific status function to be more generic. The function
applies a different speed mask based on the PHY ID. Make it more generic by
adding an additional arg to pass the specific speed (ss) mask and use
the provided mask to parse the speed value.

This is needed to permit an easier detach of the qca808x code from the
at803x driver.
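
A rough sketch of parsing the speed with a caller-supplied mask, in plain C. The two masks and the 0/1/2 to 10/100/1000 Mbps encoding below are assumptions for illustration, not values taken from this commit; ss_field_get() and ss_speed_mbps() are hypothetical helpers that act like a run-time FIELD_GET() and require a non-zero, contiguous mask.

```c
#include <stdint.h>

/* Hypothetical masks for illustration only; check the driver for the real ones. */
#define EXAMPLE_SS_SPEED_MASK_HI  0xC000u  /* bits 15:14 */
#define EXAMPLE_SS_SPEED_MASK_MID 0x0380u  /* bits 9:7 */

/* Extract the field selected by a non-zero, contiguous mask (run-time FIELD_GET). */
static unsigned int ss_field_get(uint16_t ss, uint16_t ss_mask)
{
	return (unsigned int)(ss & ss_mask) >> __builtin_ctz(ss_mask);
}

/* Decode the speed field; the 0/1/2 encoding is an assumption of this sketch. */
static int ss_speed_mbps(uint16_t ss, uint16_t ss_mask)
{
	switch (ss_field_get(ss, ss_mask)) {
	case 0:
		return 10;
	case 1:
		return 100;
	case 2:
		return 1000;
	default:
		return -1;  /* unknown encoding */
	}
}
```

Passing the mask as an argument keeps a single parser for both PHY families, which is the kind of generalization the commit describes.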

Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago net: phy: at803x: move specific qca808x config_aneg to dedicated function
Christian Marangi [Thu, 14 Dec 2023 00:44:31 +0000 (01:44 +0100)]
net: phy: at803x: move specific qca808x config_aneg to dedicated function

Move the qca808x-specific config_aneg logic to a dedicated function to permit
an easier split of the qca808x portion from the at803x driver.

Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago Merge branch 'vsock-credit-update'
David S. Miller [Fri, 15 Dec 2023 10:37:36 +0000 (10:37 +0000)]
Merge branch 'vsock-credit-update'

Arseniy Krasnov says:

====================
send credit update during setting SO_RCVLOWAT

                               DESCRIPTION

This patchset fixes an old problem where both rx and tx sides hang, and adds
a test for it. This happens due to a non-default SO_RCVLOWAT value and
deferred credit update in virtio/vsock. Link to the previous old patchset:
https://lore.kernel.org/netdev/39b2e9fd-601b-189d-39a9-914e5574524c@sberdevices.ru/

Here is what happens step by step:

                                  TEST

                            INITIAL CONDITIONS

1) Vsock buffer size is 128KB.
2) Maximum packet size is 64KB as defined in header (yes it is
   hardcoded, just a reminder of that value).
3) SO_RCVLOWAT is default, i.e. 1 byte.

                                 STEPS

            SENDER                              RECEIVER
1) sends 128KB + 1 byte in a
   single buffer. 128KB will
   be sent, but for the last
   byte the sender will wait
   for free space at peer.
   Sender goes to sleep.

2)                                     reads 64KB, credit update not sent
3)                                     sets SO_RCVLOWAT to 64KB + 1
4)                                     poll() -> wait forever, there is
                                       only 64KB available to read.

So in step 4) the receiver also goes to sleep, waiting for enough data or a
connection shutdown message from the sender. The idea of the fix is that rx
kicks the tx side to continue transmission (and maybe close the connection)
when rx changes the number of bytes needed to be woken up (i.e. SO_RCVLOWAT)
and this value is bigger than the number of bytes available to read.
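
The four steps can be replayed with a small userspace model of the credit counters. This is a simplified sketch, not the virtio-vsock code: the struct and function names are hypothetical, the field names only loosely follow the transport's counters, and the 64KB threshold is the value quoted above. rcvlowat_scenario_wakes(false) models the current behaviour and returns false (both sides asleep); passing true adds the proposed kick when SO_RCVLOWAT is raised above the queued bytes, and the receiver wakes.

```c
#include <stdbool.h>

#define KB 1024u
/* Assumption: the 64KB maximum packet size quoted in the cover letter. */
#define MODEL_MAX_PKT_BUF_SIZE (64u * KB)

/* Hypothetical model of the credit counters on one connection. */
struct credit_model {
	unsigned int buf_alloc;     /* receive buffer size advertised to the sender */
	unsigned int tx_cnt;        /* bytes the sender has transmitted */
	unsigned int fwd_cnt;       /* bytes the receiver has consumed */
	unsigned int last_fwd_cnt;  /* fwd_cnt as of the last credit update */
	unsigned int rx_bytes;      /* bytes queued at the receiver */
	unsigned int rcvlowat;      /* receiver's SO_RCVLOWAT */
};

/* Space the sender believes is free, based on the last credit update it saw. */
static unsigned int sender_credit(const struct credit_model *m)
{
	return m->buf_alloc - (m->tx_cnt - m->last_fwd_cnt);
}

/* Sender transmits as much of len as its credit allows; returns bytes sent. */
static unsigned int sender_send(struct credit_model *m, unsigned int len)
{
	unsigned int credit = sender_credit(m);
	unsigned int n = len < credit ? len : credit;

	m->tx_cnt += n;
	m->rx_bytes += n;
	return n;
}

/* Replays steps 1-4; returns true if poll() would report the socket readable. */
static bool rcvlowat_scenario_wakes(bool kick_on_rcvlowat)
{
	struct credit_model m = { .buf_alloc = 128 * KB, .rcvlowat = 1 };
	unsigned int pending = 128 * KB + 1;

	/* 1) 128KB fits the credit; the last byte waits and the sender sleeps. */
	pending -= sender_send(&m, pending);

	/* 2) Receiver reads 64KB; free space is exactly 64KB, so no update. */
	m.rx_bytes -= 64 * KB;
	m.fwd_cnt += 64 * KB;
	if (m.buf_alloc - (m.fwd_cnt - m.last_fwd_cnt) < MODEL_MAX_PKT_BUF_SIZE)
		m.last_fwd_cnt = m.fwd_cnt;

	/* 3) Receiver raises SO_RCVLOWAT; with the fix it also kicks the sender. */
	m.rcvlowat = 64 * KB + 1;
	if (kick_on_rcvlowat && m.rx_bytes < m.rcvlowat)
		m.last_fwd_cnt = m.fwd_cnt;  /* credit update reaches the sender */

	/* The sleeping sender resumes only if it now sees credit. */
	if (pending && sender_credit(&m) > 0)
		pending -= sender_send(&m, pending);

	/* 4) poll() wakes the receiver only once rx_bytes reaches SO_RCVLOWAT. */
	return m.rx_bytes >= m.rcvlowat;
}
```

Without the kick the sender still sees zero credit after step 3, so neither side makes progress; with it, the sender gets 64KB of credit, sends the last byte, and rx_bytes reaches 64KB + 1.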

I've added a small test for this, but I'm not sure about it, as it uses a
hardcoded value for the maximum packet length; this value is defined in a
kernel header and used to control deferred credit update. As this is not
available to userspace, I can't control the test parameters correctly (if
one day this define is changed, the test may become useless).

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9bab51bd662be4c3ebb18a28879981d69f3ef15a

Link to v1:
https://lore.kernel.org/netdev/20231108072004.1045669[email protected]/
Link to v2:
https://lore.kernel.org/netdev/20231119204922.2251912[email protected]/
Link to v3:
https://lore.kernel.org/netdev/20231122180510.2297075[email protected]/
Link to v4:
https://lore.kernel.org/netdev/20231129212519.2938875[email protected]/
Link to v5:
https://lore.kernel.org/netdev/20231130130840[email protected]/
Link to v6:
https://lore.kernel.org/netdev/20231205064806.2851305[email protected]/
Link to v7:
https://lore.kernel.org/netdev/20231206211849.2707151[email protected]/
Link to v8:
https://lore.kernel.org/netdev/20231211211658.2904268[email protected]/
Link to v9:
https://lore.kernel.org/netdev/20231214091947[email protected]/

Changelog:
v1 -> v2:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * New patch is added as 0001: it removes the return from the SO_RCVLOWAT set
   callback in 'af_vsock.c' when the transport callback is set; with that
   we can set 'sk_rcvlowat' only once in 'af_vsock.c' and in the future
   not copy-paste it to every transport. It was discussed in v1.
 * See per-patch changelog after ---.
v2 -> v3:
 * See changelog after --- in 0003 only (0001 and 0002 still same).
v3 -> v4:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
v4 -> v5:
 * Change patchset tag 'RFC' -> 'net-next'.
 * See per-patch changelog after ---.
v5 -> v6:
 * New patch 0003 which sends credit update during reading bytes from
   socket.
 * See per-patch changelog after ---.
v6 -> v7:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
v7 -> v8:
 * See per-patch changelog after ---.
v8 -> v9:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * Add 'Fixes' tag for the current 0002.
 * Reorder patches by moving two fixes first.
v9 -> v10:
 * Squash 0002 and 0003 and update commit message in result.
====================

Signed-off-by: David S. Miller <[email protected]>
15 months ago vsock/test: two tests to check credit update logic
Arseniy Krasnov [Thu, 14 Dec 2023 12:52:30 +0000 (15:52 +0300)]
vsock/test: two tests to check credit update logic

Both tests are almost the same and differ only in two 'if' conditions, so
they are implemented in a single function. The tests check that a credit
update message is sent:

1) When setting the SO_RCVLOWAT value of the socket.
2) When the number of 'rx_bytes' becomes smaller than the SO_RCVLOWAT value.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago virtio/vsock: send credit update during setting SO_RCVLOWAT
Arseniy Krasnov [Thu, 14 Dec 2023 12:52:29 +0000 (15:52 +0300)]
virtio/vsock: send credit update during setting SO_RCVLOWAT

Send a credit update message when SO_RCVLOWAT is updated and it is bigger
than the number of bytes in the rx queue. It is needed because 'poll()' will
wait until the number of bytes in the rx queue is not smaller than
SO_RCVLOWAT, so kick the sender to send more data. Otherwise a mutual hang
of tx and rx is possible: the sender waits for free space and the receiver
waits for data in 'poll()'.

Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set
'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the
transport doesn't need to do it.
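
The split between the generic layer and the transport hook can be sketched in userspace C. All names here (model_sock, model_transport, model_set_rcvlowat(), model_virtio_notify()) are hypothetical; the sketch assumes the generic code calls the optional hook before storing the value, and it counts credit updates instead of sending them.

```c
struct model_sock;

/* Hypothetical transport ops: only the optional notification hook is modelled. */
struct model_transport {
	int (*notify_set_rcvlowat)(struct model_sock *sk, int val);
};

struct model_sock {
	const struct model_transport *transport;
	int rcvlowat;             /* stands in for sk->sk_rcvlowat */
	int rx_bytes;             /* bytes queued for reading */
	int credit_updates_sent;  /* counts updates instead of sending packets */
};

/* Generic layer: the only place that stores the new threshold. */
static int model_set_rcvlowat(struct model_sock *sk, int val)
{
	const struct model_transport *t = sk->transport;

	if (t && t->notify_set_rcvlowat) {
		int err = t->notify_set_rcvlowat(sk, val);

		if (err)
			return err;
	}
	sk->rcvlowat = val;
	return 0;
}

/* Transport hook: kick the peer when fewer bytes are queued than requested. */
static int model_virtio_notify(struct model_sock *sk, int val)
{
	if (sk->rx_bytes < val)
		sk->credit_updates_sent++;
	return 0;
}

static const struct model_transport model_virtio = {
	.notify_set_rcvlowat = model_virtio_notify,
};
```

With 64KB queued, raising the threshold to 64KB + 1 bumps credit_updates_sent and stores the new value; lowering it afterwards stores the value without another update. Keeping the store in the generic layer means a transport only decides whether to notify its peer.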

Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago virtio/vsock: fix logic which reduces credit update messages
Arseniy Krasnov [Thu, 14 Dec 2023 12:52:28 +0000 (15:52 +0300)]
virtio/vsock: fix logic which reduces credit update messages

Add one more condition for sending a credit update during dequeue from a
stream socket: when the number of bytes in the rx queue is smaller than the
SO_RCVLOWAT value of the socket. This is relevant for a non-default value
of SO_RCVLOWAT (i.e. not 1): the idea is to "kick" the peer to continue data
transmission, because we need at least SO_RCVLOWAT bytes in our rx
queue to wake up the user for reading data (in a corner case both tx and rx
sides can also get stuck, which is why 'Fixes' is used).
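
The two dequeue-time conditions can be written as one predicate. This is a minimal sketch, assuming the 64KB threshold from the cover letter; the function name and parameter list are hypothetical, and the real transport reads these counters from its socket state. free_space shrinks as bytes consumed since the last credit update (fwd_cnt - last_fwd_cnt) accumulate.

```c
#include <stdbool.h>

/* Assumption: the 64KB threshold quoted in the cover letter. */
#define MODEL_MAX_PKT_BUF_SIZE (64u * 1024u)

/* Hypothetical model of "send a credit update after dequeue?". */
static bool credit_update_after_dequeue(unsigned int buf_alloc, unsigned int fwd_cnt,
					unsigned int last_fwd_cnt, unsigned int rx_bytes,
					unsigned int rcvlowat, bool with_fix)
{
	unsigned int free_space = buf_alloc - (fwd_cnt - last_fwd_cnt);

	/* Original rule: report once the unreported window grows large. */
	if (free_space < MODEL_MAX_PKT_BUF_SIZE)
		return true;

	/* Added rule: also report when fewer bytes are queued than SO_RCVLOWAT,
	 * since poll() will not wake the reader until that many arrive. */
	return with_fix && rx_bytes < rcvlowat;
}
```

With a 128KB buffer, 64KB consumed but unreported, 64KB queued and SO_RCVLOWAT at 64KB + 1, the original rule stays silent while the added rule requests an update.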

Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago octeontx2-pf: Fix graceful exit during PFC configuration failure
Suman Ghosh [Wed, 13 Dec 2023 18:10:44 +0000 (23:40 +0530)]
octeontx2-pf: Fix graceful exit during PFC configuration failure

On PFC configuration failure, the code did not exit gracefully. This
patch fixes that and adds proper code for a graceful exit.
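
The commit does not list the individual configuration steps, so the following is only a generic sketch of the usual kernel goto-unwind pattern for a graceful exit. pfc_model, apply_hw_cfg(), update_queues() and revert_hw_cfg() are hypothetical stand-ins, not octeontx2 functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state touched by a two-step configuration. */
struct pfc_model {
	bool hw_cfg_applied;
	bool queues_updated;
	bool fail_at_queues;  /* knob to force the second step to fail */
};

static int apply_hw_cfg(struct pfc_model *m)
{
	m->hw_cfg_applied = true;
	return 0;
}

static void revert_hw_cfg(struct pfc_model *m)
{
	m->hw_cfg_applied = false;
}

static int update_queues(struct pfc_model *m)
{
	if (m->fail_at_queues)
		return -EIO;
	m->queues_updated = true;
	return 0;
}

/* On failure, undo the steps that already succeeded, in reverse order,
 * so no half-applied configuration is left behind. */
static int configure_pfc(struct pfc_model *m)
{
	int err;

	err = apply_hw_cfg(m);
	if (err)
		return err;

	err = update_queues(m);
	if (err)
		goto err_revert_hw;

	return 0;

err_revert_hw:
	revert_hw_cfg(m);
	return err;
}
```

Each label undoes exactly the work done before the failing step, which keeps the success path linear and the cleanup in one place.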

Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Suman Ghosh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago ipmr: support IP_PKTINFO on cache report IGMP msg
Leone Fernando [Wed, 13 Dec 2023 16:19:35 +0000 (17:19 +0100)]
ipmr: support IP_PKTINFO on cache report IGMP msg

In order to support IP_PKTINFO on those packets, we need to call
ipv4_pktinfo_prepare.

When sending a cache report IGMP msg to the mrouted/pimd daemons, it is
unnecessary to set dst on the newly created skb.
It used to be necessary on older versions until
commit d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference"), which
changed the way the IP_PKTINFO struct is retrieved.

Changes from v1:
1. Undo changes in the ipv4_pktinfo_prepare function. Use it directly
   and copy the control block.

Fixes: d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Leone Fernando <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago net: mana: add msix index sharing between EQs
Konstantin Taranov [Wed, 13 Dec 2023 10:01:47 +0000 (02:01 -0800)]
net: mana: add msix index sharing between EQs

This patch allows assigning and polling more than one EQ on the same
msix index.
It is achieved by introducing a list of attached EQs in each IRQ context.
It also removes the existing msix_index map that tried to ensure that there
is only one EQ at each msix_index.
This patch exports symbols for creating EQs from other MANA kernel modules.
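
A userspace sketch of one interrupt vector servicing a list of attached EQs. The types and functions (model_eq, model_irq_ctx, irq_ctx_attach(), irq_ctx_detach(), irq_ctx_handle()) are hypothetical; a plain singly linked list stands in for whatever list primitive and locking the MANA driver actually uses, and no real interrupts are involved.

```c
#include <stddef.h>

/* Hypothetical event queue: counts events instead of processing them. */
struct model_eq {
	int id;
	int events_pending;
	int events_handled;
	struct model_eq *next;  /* link in the owning IRQ context's list */
};

/* Hypothetical per-vector context holding every EQ attached to it. */
struct model_irq_ctx {
	struct model_eq *eq_list;
};

static void irq_ctx_attach(struct model_irq_ctx *ctx, struct model_eq *eq)
{
	eq->next = ctx->eq_list;
	ctx->eq_list = eq;
}

static void irq_ctx_detach(struct model_irq_ctx *ctx, struct model_eq *eq)
{
	for (struct model_eq **pp = &ctx->eq_list; *pp; pp = &(*pp)->next) {
		if (*pp == eq) {
			*pp = eq->next;
			eq->next = NULL;
			return;
		}
	}
}

/* Handler model: one vector drains every attached EQ; returns events handled. */
static int irq_ctx_handle(struct model_irq_ctx *ctx)
{
	int total = 0;

	for (struct model_eq *eq = ctx->eq_list; eq; eq = eq->next) {
		eq->events_handled += eq->events_pending;
		total += eq->events_pending;
		eq->events_pending = 0;
	}
	return total;
}
```

Because the handler walks a list rather than looking up a single EQ per index, there is no longer a need for a map enforcing one EQ per msix_index.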

Signed-off-by: Konstantin Taranov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago octeontx2-af: Fix multicast/mirror group lock/unlock issue
Suman Ghosh [Wed, 13 Dec 2023 09:53:49 +0000 (15:23 +0530)]
octeontx2-af: Fix multicast/mirror group lock/unlock issue

In the existing implementation there is a race between finding a
multicast/mirror group entry and deleting that entry. The group lock
was taken and released independently by the rvu_nix_mcast_find_grp_elem()
function, which is incorrect: the group lock should be held during the
entire group update/deletion operation. This patch fixes that.
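
A minimal userspace sketch of the fixed locking shape, using a pthread mutex as the group lock. The list and function names are hypothetical, not the rvu_nix code. The lookup helper expects the caller to hold the lock, so finding and unlinking an entry happen in one critical section; in the racy shape the lookup took and dropped the lock itself, leaving a window in which another thread could delete the entry the caller was about to use.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical multicast/mirror group entry. */
struct mcast_grp {
	int id;
	struct mcast_grp *next;
};

struct mcast_table {
	pthread_mutex_t lock;  /* protects the list starting at head */
	struct mcast_grp *head;
};

/* Lookup helper: the caller must already hold t->lock. Returns the link
 * pointing at the matching entry so the caller can unlink it, or NULL. */
static struct mcast_grp **find_grp_locked(struct mcast_table *t, int id)
{
	for (struct mcast_grp **pp = &t->head; *pp; pp = &(*pp)->next)
		if ((*pp)->id == id)
			return pp;
	return NULL;
}

static bool add_grp(struct mcast_table *t, int id)
{
	struct mcast_grp *grp = malloc(sizeof(*grp));

	if (!grp)
		return false;
	grp->id = id;
	pthread_mutex_lock(&t->lock);
	grp->next = t->head;
	t->head = grp;
	pthread_mutex_unlock(&t->lock);
	return true;
}

/* Find and unlink under a single critical section, then free outside it. */
static bool delete_grp(struct mcast_table *t, int id)
{
	struct mcast_grp **pp, *grp = NULL;

	pthread_mutex_lock(&t->lock);
	pp = find_grp_locked(t, id);
	if (pp) {
		grp = *pp;
		*pp = grp->next;
	}
	pthread_mutex_unlock(&t->lock);

	if (!grp)
		return false;
	free(grp);  /* no longer reachable from the table */
	return true;
}
```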

Fixes: 51b2804c19cd ("octeontx2-af: Add new mbox to support multicast/mirror offload")
Signed-off-by: Suman Ghosh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago net: libwx: fix memory leak on free page
duanqiangwen [Thu, 14 Dec 2023 02:33:37 +0000 (10:33 +0800)]
net: libwx: fix memory leak on free page

'ifconfig ethx up' sets page->refcount larger than 1, and then
'ifconfig ethx down' calls __page_frag_cache_drain() to free the pages,
which is not compatible with page pool. So delete the code that
changes page->refcount.

Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI")
Signed-off-by: duanqiangwen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
15 months ago Merge tag 'mlx5-updates-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 15 Dec 2023 10:00:02 +0000 (10:00 +0000)]
Merge tag 'mlx5-updates-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-12-13

Preparation for mlx5e socket direct feature.

Socket direct will allow multiple PF devices attached to different
NUMA nodes but sharing the same physical port.

The following series is a small refactoring series in preparation
for supporting socket direct in the following submission.

Highlights:
 - Define required device registers and bits related to socket direct
 - Flow steering re-arrangements
 - Generalize TX objects (TISs) and store them in a common object, will
   be useful in the next series for per function object management.
 - Decouple raw CQ objects from their parent netdev priv
 - Prepare devcom for Socket Direct device group discovery.

Please see the individual patches for more information.
====================

Signed-off-by: David S. Miller <[email protected]>