Git Repo - linux.git/log

bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup

The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv, bpf: Add bpf trampoline support for RV64

BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF program for zero overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure ceses.

Signed-off-by: Pu Lehui <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv, bpf: Add bpf_arch_text_poke support for RV64

Implement bpf_arch_text_poke for RV64. For call scenario, to make BPF
trampoline compatible with the kernel and BPF context, we follow the
framework of RV64 ftrace to reserve 4 nops for BPF programs as function
entry, and use auipc+jalr instructions for function call. However, since
auipc+jalr call instruction is non-atomic operation, we need to use
stop-machine to make sure instructions patching in atomic context. Also,
we use auipc+jalr pair and need to patch in stop-machine context for
jump scenario.

Signed-off-by: Pu Lehui <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv, bpf: Factor out emit_call for kernel and bpf context

The current emit_call function is not suitable for kernel function call as
it store return value to bpf R0 register. We can separate it out for common
use. Meanwhile, simplify judgment logic, that is, fixed function address
can use jal or auipc+jalr, while the unfixed can use only auipc+jalr.

Signed-off-by: Pu Lehui <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv: Extend patch_text for multiple instructions

Extend patch_text for multiple instructions. This is the preparaiton for
multiple instructions text patching in riscv BPF trampoline, and may be
useful for other scenario.

Signed-off-by: Pu Lehui <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"

This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9.

build bot failed on arch with different cache line size:
https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/

Signed-off-by: Martin KaFai Lau <[email protected]>

selftests/bpf: Add global subprog context passing tests

Add tests validating that it's possible to pass context arguments into
global subprogs for various types of programs, including a particularly
tricky KPROBE programs (which cover kprobes, uprobes, USDTs, a vast and
important class of programs).

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Convert test_global_funcs test to test_loader framework

Convert 17 test_global_funcs subtests into test_loader framework for
easier maintenance and more declarative way to define expected
failures/successes.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Fix global subprog context argument resolution logic

KPROBE program's user-facing context type is defined as typedef
bpf_user_pt_regs_t. This leads to a problem when trying to passing
kprobe/uprobe/usdt context argument into global subprog, as kernel
always strip away mods and typedefs of user-supplied type, but takes
expected type from bpf_ctx_convert as is, which causes mismatch.

Current way to work around this is to define a fake struct with the same
name as expected typedef:

struct bpf_user_pt_regs_t {};

__noinline my_global_subprog(struct bpf_user_pt_regs_t *ctx) { ... }

This patch fixes the issue by resolving expected type, if it's not
a struct. It still leaves the above work-around working for backwards
compatibility.

Fixes: 91cc1a99740e ("bpf: Annotate context types")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

LoongArch, bpf: Use 4 instructions for function address in JIT

This patch fixes the following issue of function calls in JIT, like:

[ 29.346981] multi-func JIT bug 105 != 103

The issus can be reproduced by running the "inline simple bpf_loop call"
verifier test.

This is because we are emiting 2-4 instructions for 64-bit immediate moves.
During the first pass of JIT, the placeholder address is zero, emiting two
instructions for it. In the extra pass, the function address is in XKVRANGE,
emiting four instructions for it. This change the instruction index in
JIT context. Let's always use 4 instructions for function address in JIT.
So that the instruction sequences don't change between the first pass and
the extra pass for function calls.

Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Signed-off-by: Hengqi Chen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Tiezhu Yang <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

wifi: rtl8xxxu: add LEDS_CLASS dependency

rtl8xxxu now unconditionally uses LEDS_CLASS, so a Kconfig dependency
is required to avoid link errors:

aarch64-linux-ld: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.o: in function `rtl8xxxu_disconnect':
rtl8xxxu_core.c:(.text+0x730): undefined reference to `led_classdev_unregister'

ERROR: modpost: "led_classdev_unregister" [drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.ko] undefined!
ERROR: modpost: "led_classdev_register_ext" [drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.ko] undefined!

Fixes: 3be01622995b ("wifi: rtl8xxxu: Register the LED and make it blink")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'nvme-6.2-2022-02-17' of git://git.infradead.org/nvme into block-6.2

Pull NVMe fix from Christoph:

"nvme fix for Linux 6.2

- fix visibility of the CMB sysfs attributes (Keith Busch)"

* tag 'nvme-6.2-2022-02-17' of git://git.infradead.org/nvme:
nvme-pci: refresh visible attrs for cmb attributes

bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state

The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.

This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Disable bh in bpf_test_run for xdp and tc prog

Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup
helper that will be used in a latter selftest. In particular, it
calls ___neigh_lookup_noref that expects the bh disabled.

This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

xsk: check IFF_UP earlier in Tx path

Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These
two paths share a call to common function xsk_xmit() which has two
sanity checks within. A pseudo code example to show the two paths:

__xsk_sendmsg() :                       xsk_poll():
if (unlikely(!xsk_is_bound(xs)))        if (unlikely(!xsk_is_bound(xs)))
    return -ENXIO;                          return mask;
if (unlikely(need_wait))                (...)
    return -EOPNOTSUPP;                 xsk_xmit()
mark napi id
(...)
xsk_xmit()

xsk_xmit():
if (unlikely(!(xs->dev->flags & IFF_UP)))
return -ENETDOWN;
if (unlikely(!xs->tx))
return -ENOBUFS;

As it can be observed above, in sendmsg() napi id can be marked on
interface that was not brought up and this causes a NULL ptr
dereference:

[31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
[31757.512710] #PF: supervisor read access in kernel mode
[31757.517936] #PF: error_code(0x0000) - not-present page
[31757.523149] PGD 0 P4D 0
[31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
[31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
[31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
[31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
[31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
[31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
[31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
[31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
[31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
[31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
[31757.612364] FS:  00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
[31757.620571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
[31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[31757.648139] PKRU: 55555554
[31757.650894] Call Trace:
[31757.653385]  <TASK>
[31757.655524]  sock_sendmsg+0x8f/0xa0
[31757.659077]  ? sockfd_lookup_light+0x12/0x70
[31757.663416]  __sys_sendto+0xfc/0x170
[31757.667051]  ? do_sched_setscheduler+0xdb/0x1b0
[31757.671658]  __x64_sys_sendto+0x20/0x30
[31757.675557]  do_syscall_64+0x38/0x90
[31757.679197]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
[31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
[31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
[31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[31757.754940]  </TASK>

To fix this, let's make xsk_xmit a function that will be responsible for
generic Tx, where RCU is handled accordingly and pull out sanity checks
and xs->zc handling. Populate sanity checks to __xsk_sendmsg() and
xsk_poll().

Fixes: ca2e1a627035 ("xsk: Mark napi_id on sendmsg()")
Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Revert "NFSv4.2: Change the default KConfig value for READ_PLUS"

This reverts commit 7fd461c47c6cfab4ca4d003790ec276209e52978.

Unfortunately, it has come to our attention that there is still a bug
somewhere in the READ_PLUS code that can result in nfsroot systems on
ARM to crash during boot.

Let's do the right thing and revert this change so we don't break
people's nfsroot setups.

Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

brd: use radix_tree_maybe_preload instead of radix_tree_preload

Unconditionally calling radix_tree_preload_end() results in a OOPS
message as the preload is only conditionally called for
gfpflags_allow_blocking().

[   20.267323] BUG: using smp_processor_id() in preemptible [00000000] code: fio/416
[   20.267837] caller is brd_insert_page.part.0+0xbe/0x190 [brd]
[   20.269436] Call Trace:
[   20.269598]  <TASK>
[   20.269742]  dump_stack_lvl+0x32/0x50
[   20.269982]  check_preemption_disabled+0xd1/0xe0
[   20.270289]  brd_insert_page.part.0+0xbe/0x190 [brd]
[   20.270664]  brd_submit_bio+0x33f/0xf40 [brd]

Use radix_tree_maybe_preload() which does preload only if
gfpflags_allow_blocking() is true but also takes the lock. Therefore,
unconditionally calling radix_tree_preload_end() should not create any
issues and the message disappears.

Fixes: 6ded703c56c2 ("brd: check for REQ_NOWAIT and set correct page allocation mask")
Signed-off-by: Pankaj Raghav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

netfilter: let reset rules clean out conntrack entries

iptables/nftables support responding to tcp packets with tcp resets.

The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.

If the reset rule is used for established connections, this
may result in the conntrack entry to be around for a very long
time (default timeout is 5 days).

One way to avoid this would be to not copy the nf_conn pointer
so that the rest packet passes through conntrack too.

Problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so its possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.

Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.

If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.

Reported-by: Russel King <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net

Some of the devlink bits were tricky, but I think I got it right.

Signed-off-by: David S. Miller <[email protected]>

gpio: sim: fix a memory leak

Fix an inverted logic bug in gpio_sim_remove_hogs() that leads to GPIO
hog structures never being freed.

Fixes: cb8c474e79be ("gpio: sim: new testing module")
Reported-by: Mirsad Goran Todorovac <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>

wifi: iwlegacy: avoid fortify warning

There are two different alive messages, the "init" one is
bigger than the other one, so we have a fortify read warn
here. Avoid it by copying from the variable-sized 'raw'
instead.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

wifi: iwlwifi: mvm: remove unused iwl_dbgfs_is_match()

This inline function is unused, remove it.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/20230216205754.d500dcc2e90c.Id87df297263f86b5bba002f7cbb387abc13adf53@changeid

wifi: rtw89: fix AP mode authentication transmission failed

For some ICs, packets can't be sent correctly without initializing
CMAC table first. Previous flow do this initialization after
associated, results in authentication response fails to transmit.
Move the initialization up front to a proper place to solve this.

Signed-off-by: Po-Hao Huang <[email protected]>
Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

wifi: rtw88: use RTW_FLAG_POWERON flag to prevent to power on/off twice

Use power state to decide whether we can enter or leave IPS accurately,
and then prevent to power on/off twice.

The commit 6bf3a083407b ("wifi: rtw88: add flag check before enter or leave IPS")
would like to prevent this as well, but it still can't entirely handle all
cases. The exception is that WiFi gets connected and does suspend/resume,
it will power on twice and cause it failed to power on after resuming,
like:

  rtw_8723de 0000:03:00.0: failed to poll offset=0x6 mask=0x2 value=0x2
  rtw_8723de 0000:03:00.0: mac power on failed
  rtw_8723de 0000:03:00.0: failed to power on mac
  rtw_8723de 0000:03:00.0: leave idle state failed
  rtw_8723de 0000:03:00.0: failed to leave ips state
  rtw_8723de 0000:03:00.0: failed to leave idle state
  rtw_8723de 0000:03:00.0: failed to send h2c command

To fix this, introduce new flag RTW_FLAG_POWERON to reflect power state,
and call rtw_mac_pre_system_cfg() to configure registers properly between
power-off/-on.

Reported-by: Paul Gover <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217016
Fixes: 6bf3a083407b ("wifi: rtw88: add flag check before enter or leave IPS")
Cc: <[email protected]>
Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'asoc-fix-v6.2-rc8-2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: One more fix for v6.2

One more fix from Peter which he'd very much like to get into
v6.2.

nvme-pci: refresh visible attrs for cmb attributes

The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb9ec65 ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

Merge tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Just a final collection of misc fixes, the biggest disables the
  recently added dynamic debugging support, it has a regression that
  needs some bigger fixes.

  Otherwise a bunch of fixes across the board, vc4, amdgpu and vmwgfx
  mostly, with some smaller i915 and ast fixes.

  drm:
   - dynamic debug disable for now

  fbdev:
   - deferred i/o device close fix

  amdgpu:
   - Fix GC11.x suspend warning
   - Fix display warning

  vc4:
   - YUV planes fix
   - hdmi display fix
   - crtc reduced blanking fix

  ast:
   - fix start address computation

  vmwgfx:
   - fix bo/handle races

  i915:
   - gen11 WA fix"

* tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Fail atomic_check early on normalize_zpos error
  drm/amd/amdgpu: fix warning during suspend
  drm/vmwgfx: Do not drop the reference to the handle too soon
  drm/vmwgfx: Stop accessing buffer objects which failed init
  drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
  drm: Disable dynamic debug as broken
  drm/ast: Fix start address computation
  fbdev: Fix invalid page access after closing deferred I/O devices
  drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking
  drm/vc4: hdmi: Always enable GCP with AVMUTE cleared
  drm/vc4: Fix YUV plane handling when planes are in different buffers

block: use proper return value from bio_failfast()

kernel test robot complains about a type mismatch:

   block/blk-merge.c:984:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:984:42: sparse:     got unsigned int
   block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted blk_opf_t const [usertype] ff @@     got unsigned int @@
   block/blk-merge.c:1010:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:1010:42: sparse:     got unsigned int

because bio_failfast() is return an unsigned int rather than the
appropriate blk_opt_f type. Fix it up.

Fixes: 3ce6a115980c ("block: sync mixed merged request's failfast with 1st bio's")
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Jens Axboe <[email protected]>

MAINTAINERS: update FPU EMULATOR web page

The web page entry for the FPU EMULATOR no longer works. I notified Bill
of this and he asked me to update it to this new entry.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Bill Metzenthen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount

During collapse, in a few places we check to see if a given small page has
any unaccounted references. If the refcount on the page doesn't match our
expectations, it must be there is an unknown user concurrently interested
in the page, and so it's not safe to move the contents elsewhere.
However, the unaccounted pins are likely an ephemeral state.

In this situation, MADV_COLLAPSE returns -EINVAL when it should return
-EAGAIN. This could cause userspace to conclude that the syscall
failed, when it in fact could succeed by retrying.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/filemap: fix page end in filemap_get_read_batch

I was running traces of the read code against an RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips.  I found that the page end offset calculation in
filemap_get_read_batch() was off by one.

When a read is submitted with end offset 1048575, then it calculates the
end page for read of 256 when it should be 255.  "last_index" is the index
of the page beyond the end of the read and it should be skipped when get a
batch of pages for read in @filemap_get_read_batch().

The below simple patch fixes the problem.  This code was introduced in
kernel 5.12.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

powerpc/64s: Prevent fallthrough to hash TLB flush when using radix

In the fix reconnecting hash__tlb_flush() to tlb_flush() the
void return on radix__tlb_flush() was not restored and subsequently
falls through to the restored hash__tlb_flush().

Guard hash__tlb_flush() under an else to prevent this.

Fixes: 1665c027afb2 ("powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()")
Reported-by: "Erhard F." <[email protected]>
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Benjamin Gray <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Fix typos in selftest/bpf files

Run spell checker on files in selftest/bpf and fixed typos.

Signed-off-by: Taichi Nishimura <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge tag 'drm-intel-fixes-2023-02-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Moving gen11 hw wa to the right place. (Matt)

Signed-off-by: Dave Airlie <[email protected]>
From: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/Y+47eUvwbafER35/@intel.com

selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Use the new type-safe wrappers around bpf_obj_get_info_by_fd().
Fix a prog/map mixup in prog_holds_map().

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Split the bpf_obj_get_info_by_fd() call in build_btf_type_table() in
two, since knowing the type helps with the Memory Sanitizer.

Improve map_parse_fd_and_info() type safety by using
struct bpf_map_info * instead of void * for info.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()

Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()

These are type-safe wrappers around bpf_obj_get_info_by_fd(). They
found one problem in selftests, and are also useful for adding
Memory Sanitizer annotations.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge tag 'drm-misc-fixes-2023-02-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Multiple fixes in vc4 to address issues with YUV planes, HDMI and CRTC;
an invalid page access fix for fbdev, mark dynamic debug as broken, a
double free and refcounting fix for vmwgfx.

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20230216091905.i5wswy4dd74x4br5@houat

Merge tag 'amd-drm-fixes-6.2-2023-02-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.2-2023-02-15:

amdgpu:
- Fix GC11.x suspend warning
- Fix display warning

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

arm64: perf: reject CHAIN events at creation time

Currently it's possible for a user to open CHAIN events arbitrarily,
which we previously tried to rule out in commit:

  ca2b497253ad01c8 ("arm64: perf: Reject stand-alone CHAIN events for PMUv3")

Which allowed the events to be opened, but prevented them from being
scheduled by by using an arm_pmu::filter_match hook to reject the
relevant events.

The CHAIN event filtering in the arm_pmu::filter_match hook was silently
removed in commit:

  bd27568117664b8b ("perf: Rewrite core context handling")

As a result, it's now possible for users to open CHAIN events, and for
these to be installed arbitrarily.

Fix this by rejecting CHAIN events at creation time. This avoids the
creation of events which will never count, and doesn't require using the
dynamic filtering.

Attempting to open a CHAIN event (0x1e) will now be rejected:

| # ./perf stat -e armv8_pmuv3/config=0x1e/ ls
| perf
|
|  Performance counter stats for 'ls':
|
|    <not supported>      armv8_pmuv3/config=0x1e/
|
|        0.002197470 seconds time elapsed
|
|        0.000000000 seconds user
|        0.002294000 seconds sys

Other events (e.g. CPU_CYCLES / 0x11) will open as usual:

| # ./perf stat -e armv8_pmuv3/config=0x11/ ls
| perf
|
|  Performance counter stats for 'ls':
|
|            2538761      armv8_pmuv3/config=0x11/
|
|        0.002227330 seconds time elapsed
|
|        0.002369000 seconds user
|        0.000000000 seconds sys

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm_pmu: fix event CPU filtering

Janne reports that perf has been broken on Apple M1 as of commit:

  bd27568117664b8b ("perf: Rewrite core context handling")

That commit replaced the pmu::filter_match() callback with
pmu::filter(), whose return value has the opposite polarity, with true
implying events should be ignored rather than scheduled. While an
attempt was made to update the logic in armv8pmu_filter() and
armpmu_filter() accordingly, the return value remains inverted in a
couple of cases:

* If the arm_pmu does not have an arm_pmu::filter() callback,
  armpmu_filter() will always return whether the CPU is supported rather
  than whether the CPU is not supported.

  As a result, the perf core will not schedule events on supported CPUs,
  resulting in a loss of events. Additionally, the perf core will
  attempt to schedule events on unsupported CPUs, but this will be
  rejected by armpmu_add(), which may result in a loss of events from
  other PMUs on those unsupported CPUs.

* If the arm_pmu does have an arm_pmu::filter() callback, and
  armpmu_filter() is called on a CPU which is not supported by the
  arm_pmu, armpmu_filter() will return false rather than true.

  As a result, the perf core will attempt to schedule events on
  unsupported CPUs, but this will be rejected by armpmu_add(), which may
  result in a loss of events from other PMUs on those unsupported CPUs.

This means a loss of events can be seen with any arm_pmu driver, but
with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an
arm_pmu::filter() callback) the event loss will be more limited and may
go unnoticed, which is how this issue evaded testing so far.

Fix the CPU filtering by performing this consistently in
armpmu_filter(), and remove the redundant arm_pmu::filter() callback and
armv8pmu_filter() implementation.

Commit bd2756811766 also silently removed the CHAIN event filtering from
armv8pmu_filter(), which will be addressed by a separate patch without
using the filter callback.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Janne Grunau <[email protected]>
Link: https://lore.kernel.org/asahi/[email protected]/
Signed-off-by: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Asahi Lina <[email protected]>
Cc: Eric Curtin <[email protected]>
Tested-by: Janne Grunau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

dt-bindings: riscv: correct starfive visionfive 2 compatibles

Using "va" and "vb" doesn't match what's written on the board, or the
communications from StarFive.
Switching to using the silkscreened version number will ease confusion &
the risk of another spin of the board containing a "conflicting" version
identifier.
As the binding has not made it into mainline yet, take the opportunity
to "correct" things.

Suggested-by: Emil Renner Berthing <[email protected]>
Link: https://lore.kernel.org/linux-riscv/Y+4AxDSDLyL1WAqh@wendy/
Fixes: 97b7ed072784 ("dt-bindings: riscv: Add StarFive JH7110 SoC and VisionFive 2 board")
Signed-off-by: Conor Dooley <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'socfpga_dts_updates_for_v6.3_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/dt

SoCFPGA dts updates for v6.3, part 2
- Add support for the enclustra PE1 board that is based on Arria10

* tag 'socfpga_dts_updates_for_v6.3_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
ARM: dts: socfpga: Add enclustra PE1 devicetree
dt-bindings: altera: Add enclustra mercury PE1

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Fixes from the main networking tree only, probably because all
  sub-trees have backed off and haven't submitted their changes.

  None of the fixes here are particularly scary and no outstanding
  regressions. In an ideal world the "current release" sections would be
  empty at this stage but that never happens.

  Current release - regressions:

   - fix unwanted sign extension in netdev_stats_to_stats64()

  Current release - new code bugs:

   - initialize net->notrefcnt_tracker earlier

   - devlink: fix netdev notifier chain corruption

   - nfp: make sure mbox accesses in IPsec code are atomic

   - ice: fix check for weight and priority of a scheduling node

  Previous releases - regressions:

   - ice: xsk: fix cleaning of XDP_TX frame, prevent inf loop

   - igb: fix I2C bit banging config with external thermal sensor

  Previous releases - always broken:

   - sched: tcindex: update imperfect hash filters respecting rcu

   - mpls: fix stale pointer if allocation fails during device rename

   - dccp/tcp: avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions

   - remove WARN_ON_ONCE(sk->sk_forward_alloc) from
     sk_stream_kill_queues()

   - af_key: fix heap information leak

   - ipv6: fix socket connection with DSCP (correct interpretation of
     the tclass field vs fib rule matching)

   - tipc: fix kernel warning when sending SYN message

   - vmxnet3: read RSS information from the correct descriptor (eop)"

* tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  devlink: Fix netdev notifier chain corruption
  igb: conditionalize I2C bit banging on external thermal sensor support
  net: mpls: fix stale pointer if allocation fails during device rename
  net/sched: tcindex: search key must be 16 bits
  tipc: fix kernel warning when sending SYN message
  igb: Fix PPS input and output using 3rd and 4th SDP
  net: use a bounce buffer for copying skb->mark
  ixgbe: add double of VLAN header when computing the max MTU
  i40e: add double of VLAN header when computing the max MTU
  ixgbe: allow to increase MTU to 3K with XDP enabled
  net: stmmac: Restrict warning on disabling DMA store and fwd mode
  net/sched: act_ctinfo: use percpu stats
  net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
  ice: fix lost multicast packets in promisc mode
  ice: Fix check for weight and priority of a scheduling node
  bnxt_en: Fix mqprio and XDP ring checking logic
  net: Fix unwanted sign extension in netdev_stats_to_stats64()
  net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
  net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
  af_key: Fix heap information leak
  ...

Merge tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"Just a few NVMe fixes that should go into the 6.2 release, adding a
  quirk and fixing two issues introduced in this release:

   - NVMe fixes via Christoph:
       - Always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
       - Add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
       - Set the DMA mask earlier (Christoph Hellwig)"

* tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux:
  nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
  nvme-pci: set the DMA mask earlier
  nvme-pci: add bogus ID quirk for ADATA SX6000PNP

Merge tag 'spi-v6.2-rc8-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
"One more last minute patch for v6.2 updating the parsing of the newly
  added spi-cs-setup-delay-ns.

  It's been pointed out that due to the way DT parsing works the change
  in property size is ABI visible so let's not let a release go out
  without it being fixed. The change got split from some earlier ABI
  related fixes to the property since the first version sent had a build
  error"

* tag 'spi-v6.2-rc8-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Use a 32-bit DT property for spi-cs-setup-delay-ns

Merge patch series "can: esd_usb: Some more preparation for supporting esd CAN-USB/3"

Frank Jungclaus <[email protected]> says:

Another small batch of patches to be seen as preparation for adding
support of the newly available esd CAN-USB/3 to esd_usb.c.

Due to some unresolved questions adding support for
CAN_CTRLMODE_BERR_REPORTING has been postponed to one of the future
patches.

v2 -> v3:
* More specific subjects
* Try to use imperative instead of past tense

v1 -> v2: https://lore.kernel.org/all/20230214160223.1199464 [email protected]
* [Patch v2 1/3]: No changes.
* [Patch v2 2/3]: Make use of can_change_state() and relocate testing
   alloc_can_err_skb() for NULL to the end of esd_usb_rx_event(), to
   have things like can_bus_off(), can_change_state() working even in
   out of memory conditions.
* [Patch v2 3/3]: No changes. I will 'declare esd_usb_msg as an union
   instead of a struct' in a separate follow-up patch.

v1: https://lore.kernel.org/all/20221219212013.1294820 [email protected]
    https://lore.kernel.org/all/20221219212717.1298282 [email protected]

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: esd_usb: Improve readability on decoding ESD_EV_CAN_ERROR_EXT messages

As suggested by Marc introduce a union plus a struct ev_can_err_ext
for easier decoding of an ESD_EV_CAN_ERROR_EXT event message (which
simply is a rx_msg with some dedicated data).

Suggested-by: Marc Kleine-Budde <[email protected]>
Link: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Frank Jungclaus <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: esd_usb: Make use of can_change_state() and relocate checking skb for NULL

Start a rework initiated by Vincents remarks "You should not report
the greatest of txerr and rxerr but the one which actually increased."
[1] and "As far as I understand, those flags should be set only when
the threshold is reached" [2] .

Therefore make use of can_change_state() to (among others) set the
flags CAN_ERR_CRTL_[RT]X_WARNING and CAN_ERR_CRTL_[RT]X_PASSIVE,
maintain CAN statistic counters for error_warning, error_passive and
bus_off.

Relocate testing alloc_can_err_skb() for NULL to the end of
esd_usb_rx_event(), to have things like can_bus_off(),
can_change_state() working even in out of memory conditions.

Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device")
Signed-off-by: Frank Jungclaus <[email protected]>
Link: [1] https://lore.kernel.org/all/CAMZ6RqKGBWe15aMkf8-QLf-cOQg99GQBebSm+1wEzTqHgvmNuw@mail.gmail.com/
Link: [2] https://lore.kernel.org/all/CAMZ6Rq+QBO1yTX_o6GV0yhdBj-RzZSRGWDZBS0fs7zbSTy4hmA@mail.gmail.com/
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: esd_usb: Move mislocated storage of SJA1000_ECC_SEG bits in case of a bus error

Move the supply for cf->data[3] (bit stream position of CAN error), in
case of a bus- or protocol-error, outside of the "switch (ecc &
SJA1000_ECC_MASK){}"-statement, because this bit stream position is
independent of the error type.

Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device")
Signed-off-by: Frank Jungclaus <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

Merge tag 'gpio-fixes-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix a potential Kconfig issue with gpio-mlxbf2 not selecting
   GPIOLIB_IRQCHIP

- another immutable irqchip conversion, this time for gpio-vf610

- fix a wakeup issue on Clevo NH5xAx

* tag 'gpio-fixes-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mlxbf2: select GPIOLIB_IRQCHIP
  gpiolib: acpi: Add a ignore wakeup quirk for Clevo NH5xAx
  gpio: vf610: make irq_chip immutable
  gpiolib: acpi: remove redundant declaration

can: ctucanfd: ctucan_platform_probe(): use devm_platform_ioremap_resource()

Convert platform_get_resource(), devm_ioremap_resource() to a single
call to Use devm_platform_ioremap_resource(), as this is exactly what
this function does.

Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Acked-by: Pavel Pisa <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

stop mainaining UUID

The uuid code is very low maintainance now that the major overhaul
has completed, and doesn't need it's own tree. All the recent work
has been done by Andy who'd like to stay on as a reviewer without an
explicit tree.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

orphan sysvfs

This code has been stale for years and I have no way to test it.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Leon Romanovsky says:

====================
mlx5-next changes

Following previous conversations [1] and our clear commitment to do
the TC work [2], please pull mlx5-next shared branch, which includes
low-level steering logic to allow RoCEv2 traffic to be encrypted/
decrypted through IPsec.

[1] https://lore.kernel.org/all/20230126230815 [email protected]/
[2] https://lore.kernel.org/all/[email protected]/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
  net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
  net/mlx5: Add IPSec priorities in RDMA namespaces
  net/mlx5: Implement new destination type TABLE_TYPE
  net/mlx5: Introduce new destination type TABLE_TYPE
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Major stack changes:
* EHT channel puncturing support (client & AP)
* some support for AP MLD without mac80211
* fixes for A-MSDU on mesh connections

Major driver changes:

iwlwifi
* EHT rate reporting
* Bump FW API to 74 for AX devices
* STEP equalizer support: transfer some STEP (connection to radio
   on platforms with integrated wifi) related parameters from the
   BIOS to the firmware

mt76
* switch to using page pool allocator
* mt7996 EHT (Wi-Fi 7) support
* Wireless Ethernet Dispatch (WED) reset support

libertas
* WPS enrollee support

brcmfmac
* Rename Cypress 89459 to BCM4355
* BCM4355 and BCM4377 support

mwifiex
* SD8978 chipset support

rtl8xxxu
* LED support

ath12k
* new driver for Qualcomm Wi-Fi 7 devices

ath11k
* IPQ5018 support
* Fine Timing Measurement (FTM) responder role support
* channel 177 support

ath10k
* store WLAN firmware version in SMEM image table

* tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits)
  wifi: brcmfmac: p2p: Introduce generic flexible array frame member
  wifi: mac80211: add documentation for amsdu_mesh_control
  wifi: cfg80211: remove gfp parameter from cfg80211_obss_color_collision_notify description
  wifi: mac80211: always initialize link_sta with sta
  wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
  wifi: cfg80211: Set SSID if it is not already set
  wifi: rtw89: move H2C of del_pkt_offload before polling FW status ready
  wifi: rtw89: use readable return 0 in rtw89_mac_cfg_ppdu_status()
  wifi: rtw88: usb: drop now unnecessary URB size check
  wifi: rtw88: usb: send Zero length packets if necessary
  wifi: rtw88: usb: Set qsel correctly
  wifi: mac80211: fix off-by-one link setting
  wifi: mac80211: Fix for Rx fragmented action frames
  wifi: mac80211: avoid u32_encode_bits() warning
  wifi: mac80211: Don't translate MLD addresses for multicast
  wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint
  wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify
  wifi: nl80211: Allow authentication frames and set keys on NAN interface
  wifi: mac80211: fix non-MLO station association
  wifi: mac80211: Allow NSS change only up to capability
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

block: bio-integrity: Copy flags when bio_integrity_payload is cloned

Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.

Fixes: aae7df50190a ("block: Integrity checksum flag")
Fixes: b1f01388574c ("block: Relocate bio integrity flags")
Reported-by: Saurav Kashyap <[email protected]>
Tested-by: Saurav Kashyap <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: Fix io statistics for cgroup in throttle path

In the current code, io statistics are missing for cgroup when bio
was throttled by blk-throttle. Fix it by moving the unreaching code
to submit_bio_noacct_nocheck.

Fixes: 3f98c753717c ("block: don't check bio in blk_throtl_dispatch_work_fn")
Signed-off-by: Jinke Han <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Acked-by: Muchun Song <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

kvm: initialize all of the kvm_debugregs structure before sending it to userspace

When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some unitialized portions of the kvm_debugregs structure that
could be copied to userspace. Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.

Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.

Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: stable <[email protected]>
Reported-by: Xingyuan Mo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Message-Id: <20230214103304.3689213 [email protected]>
Tested-by: Xingyuan Mo <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

brd: mark as nowait compatible

By default, non-mq drivers do not support nowait. This causes io_uring
to use a slower path as the driver cannot be trust not to block. brd
can safely set the nowait flag, as worst case all it does is a NOIO
allocation.

For io_uring, this makes a substantial difference. Before:

submitter=0, tid=453, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31
IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32
IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31
IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32
IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31

and after:

submitter=0, tid=354, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31
IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32
IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31

or about an 8x in difference. Now that brd is prepared to deal with
REQ_NOWAIT reads/writes, mark it as supporting that.

Cc: [email protected] # 5.10+
Link: https://lore.kernel.org/linux-block/[email protected]/
Signed-off-by: Jens Axboe <[email protected]>

brd: check for REQ_NOWAIT and set correct page allocation mask

If REQ_NOWAIT is set, then do a non-blocking allocation if the operation
is a write and we need to insert a new page. Currently REQ_NOWAIT cannot
be set as the queue isn't marked as supporting nowait, this change is in
preparation for allowing that.

radix_tree_preload() warns on attempting to call it with an allocation
mask that doesn't allow blocking. While that warning could arguably
be removed, we need to handle radix insertion failures anyway as they
are more likely if we cannot block to get memory.

Remove legacy BUG_ON()'s and turn them into proper errors instead, one
for the allocation failure and one for finding a page that doesn't
match the correct index.

Cc: [email protected] # 5.10+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

brd: return 0/-error from brd_insert_page()

It currently returns a page, but callers just check for NULL/page to
gauge success. Clean this up and return the appropriate error directly
instead.

Cc: [email protected] # 5.10+
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak

The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.

This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.

We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.

Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Rander Wang <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ARM: dts: socfpga: Add enclustra PE1 devicetree

The enclustra PE1 is a baseboard from enclustra GmbH for the enclustra
Mercury AA1+ SOM.

Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Steffen Trumtrar <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>

dt-bindings: altera: Add enclustra mercury PE1

Add binding for the enclustra PE1 baseboard from enclustra GmbH.

Acked-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Steffen Trumtrar <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>

erofs: fix an error code in z_erofs_init_zip_subsystem()

Return -ENOMEM if alloc_workqueue() fails. Don't return success.

Fixes: d8a650adf429 ("erofs: add per-cpu threads for decompression as an option")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/Y+4d0FRsUq8jPoOu@kili
Signed-off-by: Gao Xiang <[email protected]>

block: sync mixed merged request's failfast with 1st bio's

We support mixed merge for requests/bios with different fastfail
settings. When request fails, each time we only handle the portion
with same failfast setting, then bios with failfast can be failed
immediately, and bios without failfast can be retried.

The idea is pretty good, but the current implementation has several
defects:

1) initially RA bio doesn't set failfast, however bio merge code
doesn't consider this point, and just check its failfast setting for
deciding if mixed merge is required. Fix this issue by adding helper
of bio_failfast().

2) when merging bio to request front, if this request is mixed
merged, we have to sync request's faifast setting with 1st bio's
failfast. Fix it by calling blk_update_mixed_merge().

3) when merging bio to request back, if this request is mixed
merged, we have to mark the bio as failfast, because blk_update_request
simply updates request failfast with 1st bio's failfast. Fix
it by calling blk_update_mixed_merge().

Fixes one normal EXT4 READ IO failure issue, because it is observed
that the normal READ IO is merged with RA IO, and the mixed merged
request has different failfast setting with 1st bio's, so finally
the normal READ IO doesn't get retried.

Cc: Tejun Heo <[email protected]>
Fixes: 80a761fd33cf ("block: implement mixed merge of different failfast requests")
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

x86/hyperv: Fix hv_get/set_register for nested bringup

hv_get_nested_reg only translates SINT0, resulting in the wrong sint
being registered by nested vmbus.

Fix the issue with new utility function hv_is_sint_reg.

While at it, improve clarity of hv_set_non_nested_register and hv_is_synic_reg.

Signed-off-by: Nuno Das Neves <[email protected]>
Reviewed-by: Jinank Jain <[email protected]>
Link: https://lore.kernel.org/r/1675980172-6851-1-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <[email protected]>

Merge tag 'asoc-fix-v6.2-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fix for v6.2

One non-urgent fix for v6.2, this could possibly wait till the
merge window.

io_uring: Support calling io_uring_register with a registered ring fd

Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
of the opcode) to treat the fd as a registered index rather than a file
descriptor.

This makes it possible for a library to open an io_uring, register the
ring fd, close the ring fd, and subsequently use the ring entirely via
registered index.

Signed-off-by: Josh Triplett <[email protected]>
Link: https://lore.kernel.org/r/f2396369e638284586b069dbddffb8c992afba95.1676419314.git.josh@joshtriplett.org
[axboe: remove extra high bit clear]
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'intel-gpio-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current

intel-gpio for v6.2-2

* Ignore spurious wakeup by touchpad on Clevo NH5xAx
* Miscellaneous fix(es)

Merge branch 'seg6-add-psp-flavor-support-for-srv6-end-behavior'

Andrea Mayer says:

====================
seg6: add PSP flavor support for SRv6 End behavior

Segment Routing for IPv6 (SRv6 in short) [1] is the instantiation of the
Segment Routing (SR) [2] architecture on the IPv6 dataplane.
In SRv6, the segment identifiers (SID) are IPv6 addresses and the segment list
(SID List) is carried in the Segment Routing Header (SRH). A segment may be
bound to a specific packet processing operation called "behavior". The RFC8986
[3] defines and standardizes the most common/relevant behaviors for network
operators, e.g., End, End.X and End.T and so on.

The RFC8986 also introduces the "flavors" framework aiming to modify or extend
the capabilities of SRv6 End, End.X and End.T behaviors. Specifically, these
behaviors support the following flavors (either individually or in
combinations):
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).

Such flavors enable an End/End.X/End.T behavior to pop the SRH on the
penultimate/ultimate SR endpoint node listed in the SID List or to perform a
full decapsulation.

Currently, the Linux kernel supports a large subset of behaviors described in
RFC8986, including the End, End.X and End.T. However, PSP, USP and USD flavors
have not yet been implemented.

In this patchset, we extend the SRv6 subsystem to implement the PSP flavor in
the SRv6 End behavior. To accomplish this task, we leverage the flavor
framework previously introduced by another patchset required for supporting the
efficient representation of the SID List through the NEXT-C-SID mechanism [4].

In details, the patchset is made of:
- patch 1/3: seg6: factor out End lookup nexthop processing to a dedicated
function
- patch 2/3: seg6: add PSP flavor support for SRv6 End behavior
- patch 3/3: selftests: seg6: add selftest for PSP flavor in SRv6 End
behavior

From the user space perspective, we do not need to change the iproute2 code to
support the PSP flavor. However, we provide the man page for the PSP flavor in
a separate patch.

Comments, improvements and suggestions are always appreciated.

[1] - RFC8754: https://datatracker.ietf.org/doc/html/rfc8754
[2] - RFC8402: https://datatracker.ietf.org/doc/html/rfc8402
[3] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986
[4] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftests: seg6: add selftest for PSP flavor in SRv6 End behavior

This selftest is designed for testing the PSP flavor in SRv6 End behavior.
It instantiates a virtual network composed of several nodes: hosts and
SRv6 routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test makes use of the SRv6 End behavior and of the PSP flavor needed
for removing the SRH from the IPv6 header at the penultimate node.

The correct execution of the behavior is verified through reachability
tests carried out between hosts.

Signed-off-by: Andrea Mayer <[email protected]>
Signed-off-by: Paolo Lungaroni <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

seg6: add PSP flavor support for SRv6 End behavior

The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).

Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.

A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.

Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
   i) decreasing the Segment Left (SL) from 1 to 0;
  ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
      Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
      following the IPv6 header.

It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.

                                                 SL=2 SL=1 SL=0
                                                   |    |    |
For example, given the SRv6 policy (SID List := <  X,   Y,   Z  >):
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
   operation as Segment Left (SL) is 1, corresponding to the Penultimate
   Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
   PSP operation as the Segment Left is 2. This behavior instance will
   apply the "standard" End packet processing, ignoring the configured PSP
   flavor at all.

[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Andrea Mayer <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

seg6: factor out End lookup nexthop processing to a dedicated function

The End nexthop lookup/input operations are moved into a new helper
function named input_action_end_finish(). This avoids duplicating the
code needed to compute the nexthop in the different flavors of the End
behavior.

Signed-off-by: Andrea Mayer <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: ocelot: fix selecting MFD_OCELOT

Commit 3d7316ac81ac ("net: dsa: ocelot: add external ocelot switch
control") adds config NET_DSA_MSCC_OCELOT_EXT, which selects the
non-existing config MFD_OCELOT_CORE.

Replace this select with the intended and existing MFD_OCELOT.

Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Colin Foster <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'sfc-devlink-support-for-ef100'

Alejandro Lucero says:

====================
sfc: devlink support for ef100

This patchset adds devlink port support for ef100 allowing setting VFs
mac addresses through the VF representor devlink ports.

Basic devlink infrastructure is first introduced, then support for info
command. Next changes for enumerating MAE ports which will be used for
devlink port creation when netdevs are registered.

Adding support for devlink port_function_hw_addr_get requires changes in
the ef100 driver for getting the mac address based on a client handle.
This allows to obtain VFs mac addresses during netdev initialization as
well what is included in patch 6.

Such client handle is used in patches 7 and 8 for getting and setting
devlink port addresses.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add support for devlink port_function_hw_addr_set in ef100

Using the builtin client handle id infrastructure, add support for
setting the mac address linked to mports in ef100. This implies to
execute an MCDI command for giving the address to the firmware for
the specific devlink port.

Signed-off-by: Alejandro Lucero <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add support for devlink port_function_hw_addr_get in ef100

Using the builtin client handle id infrastructure, add support for
obtaining the mac address linked to mports in ef100. This implies
to execute an MCDI command for getting the data from the firmware
for each devlink port.

Signed-off-by: Alejandro Lucero <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: obtain device mac address based on firmware handle for ef100

Getting device mac address is currently based on a specific MCDI command
only available for the PF. This patch changes the MCDI command to a
generic one for PFs and VFs based on a client handle. This allows both
PFs and VFs to ask for their mac address during initialization using the
CLIENT_HANDLE_SELF.

Moreover, the patch allows other client handles which will be used by
the PF to ask for mac addresses linked to VFs. This is necessary for
suporting the port_function_hw_addr_get devlink function in further
patches.

Signed-off-by: Alejandro Lucero <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add devlink port support for ef100

Using the data when enumerating mports, create devlink ports just before
netdevs are registered and remove those devlink ports after netdev has
been unregistered.

Signed-off-by: Alejandro Lucero <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add mport lookup based on driver's mport data

Obtaining mport id is based on asking the firmware about it. This is
still needed for mport initialization itself, but once the mport data is
now kept by the driver, further mport id request can be satisfied
internally without firmware interaction.

Previous function is just modified in name making clear the firmware
interaction. The new function uses the old name and looks for the data
in the mport data structure.

Signed-off-by: Alejandro Lucero <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: enumerate mports in ef100

MAE ports (mports) are the ports on the EF100 embedded switch such
as networking PCIe functions, the physical port, and potentially
others.

Signed-off-by: Alejandro Lucero <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add devlink info support for ef100

Add devlink info support for ef100. The information reported is obtained
through the MCDI interface with the specific meaning defined in new
documentation file.

Signed-off-by: Alejandro Lucero <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

sfc: add devlink support for ef100

Add devlink infrastructure support. Further patches add devlink
info and devlink port support.

Signed-off-by: Alejandro Lucero <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

devlink: Fix netdev notifier chain corruption

Cited commit changed devlink to register its netdev notifier block on
the global netdev notifier chain instead of on the per network namespace
one.

However, when changing the network namespace of the devlink instance,
devlink still tries to unregister its notifier block from the chain of
the old namespace and register it on the chain of the new namespace.
This results in corruption of the notifier chains, as the same notifier
block is registered on two different chains: The global one and the per
network namespace one. In turn, this causes other problems such as the
inability to dismantle namespaces due to netdev reference count issues.

Fix by preventing devlink from moving its notifier block between
namespaces.

Reproducer:

# echo "10 1" > /sys/bus/netdevsim/new_device
# ip netns add test123
# devlink dev reload netdevsim/netdevsim10 netns test123
# ip netns del test123
[ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 71.938348] leaked reference.

Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'net-sched-transition-actions-to-pcpu-stats-and-rcu'

Pedro Tammela says:

====================
net/sched: transition actions to pcpu stats and rcu

Following the work done for act_pedit[0], transition the remaining tc
actions to percpu stats and rcu, whenever possible.
Percpu stats make updating the action stats very cheap, while combining
it with rcu action parameters makes it possible to get rid of the per
action lock in the datapath.

For act_connmark and act_nat we run the following tests:
- tc filter add dev ens2f0 ingress matchall action connmark
- tc filter add dev ens2f0 ingress matchall action nat ingress any 10.10.10.10

Our setup consists of a 26 cores Intel CPU and a 25G NIC.
We use TRex to shoot 10mpps TCP packets and take perf measurements.
Both actions improved performance as expected since the datapath lock disappeared.

For act_pedit we move the drop counter to percpu, when available.
For act_gate we move the counters to percpu, when available.

[0] https://lore.kernel.org/all/20230131145149.3776656 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_pedit: use percpu overlimit counter when available

Since act_pedit now has access to percpu counters, use the
tcf_action_inc_overlimit_qstats wrapper that will use the percpu
counter whenever they are available.

Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_gate: use percpu stats

The tc action act_gate was using shared stats, move it to percpu stats.

tdc results:
1..12
ok 1 5153 - Add gate action with priority and sched-entry
ok 2 7189 - Add gate action with base-time
ok 3 a721 - Add gate action with cycle-time
ok 4 c029 - Add gate action with cycle-time-ext
ok 5 3719 - Replace gate base-time action
ok 6 d821 - Delete gate action with valid index
ok 7 3128 - Delete gate action with invalid index
ok 8 7837 - List gate actions
ok 9 9273 - Flush gate actions
ok 10 c829 - Add gate action with duplicate index
ok 11 3043 - Add gate action with invalid index
ok 12 2930 - Add gate action with cookie

Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_connmark: transition to percpu stats and rcu

The tc action act_connmark was using shared stats and taking the per
action lock in the datapath. Improve it by using percpu stats and rcu.

perf before:
- 13.55% tcf_connmark_act
- 81.18% _raw_spin_lock
80.46% native_queued_spin_lock_slowpath

perf after:
- 2.85% tcf_connmark_act

tdc results:
1..15
ok 1 2002 - Add valid connmark action with defaults
ok 2 56a5 - Add valid connmark action with control pass
ok 3 7c66 - Add valid connmark action with control drop
ok 4 a913 - Add valid connmark action with control pipe
ok 5 bdd8 - Add valid connmark action with control reclassify
ok 6 b8be - Add valid connmark action with control continue
ok 7 d8a6 - Add valid connmark action with control jump
ok 8 aae8 - Add valid connmark action with zone argument
ok 9 2f0b - Add valid connmark action with invalid zone argument
ok 10 9305 - Add connmark action with unsupported argument
ok 11 71ca - Add valid connmark action and replace it
ok 12 5f8f - Add valid connmark action with cookie
ok 13 c506 - Replace connmark with invalid goto chain control
ok 14 6571 - Delete connmark action with valid index
ok 15 3426 - Delete connmark action with invalid index

Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_nat: transition to percpu stats and rcu

The tc action act_nat was using shared stats and taking the per action
lock in the datapath. Improve it by using percpu stats and rcu.

perf before:
- 10.48% tcf_nat_act
- 81.83% _raw_spin_lock
81.08% native_queued_spin_lock_slowpath

perf after:
- 0.48% tcf_nat_act

tdc results:
1..27
ok 1 7565 - Add nat action on ingress with default control action
ok 2 fd79 - Add nat action on ingress with pipe control action
ok 3 eab9 - Add nat action on ingress with continue control action
ok 4 c53a - Add nat action on ingress with reclassify control action
ok 5 76c9 - Add nat action on ingress with jump control action
ok 6 24c6 - Add nat action on ingress with drop control action
ok 7 2120 - Add nat action on ingress with maximum index value
ok 8 3e9d - Add nat action on ingress with invalid index value
ok 9 f6c9 - Add nat action on ingress with invalid IP address
ok 10 be25 - Add nat action on ingress with invalid argument
ok 11 a7bd - Add nat action on ingress with DEFAULT IP address
ok 12 ee1e - Add nat action on ingress with ANY IP address
ok 13 1de8 - Add nat action on ingress with ALL IP address
ok 14 8dba - Add nat action on egress with default control action
ok 15 19a7 - Add nat action on egress with pipe control action
ok 16 f1d9 - Add nat action on egress with continue control action
ok 17 6d4a - Add nat action on egress with reclassify control action
ok 18 b313 - Add nat action on egress with jump control action
ok 19 d9fc - Add nat action on egress with drop control action
ok 20 a895 - Add nat action on egress with DEFAULT IP address
ok 21 2572 - Add nat action on egress with ANY IP address
ok 22 37f3 - Add nat action on egress with ALL IP address
ok 23 6054 - Add nat action on egress with cookie
ok 24 79d6 - Add nat action on ingress with cookie
ok 25 4b12 - Replace nat action with invalid goto chain control
ok 26 b811 - Delete nat action with valid index
ok 27 a521 - Delete nat action with invalid index

Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'net-core-commmon-prints-for-promisc'

Jesse Brandeburg says:

====================
net/core: commmon prints for promisc

Add a print to the kernel log for allmulticast entry and exit, and
standardize the print for entry and exit of promiscuous mode.

These prints are useful to both user and developer and should have the
triggering driver/bus/device info that netdev_info (optionally) gives.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/core: refactor promiscuous mode message

The kernel stack can be more consistent by printing the IFF_PROMISC
aka promiscuous enable/disable messages with the standard netdev_info
message which can include bus and driver info as well as the device.

typical command usage from user space looks like:
ip link set eth0 promisc <on|off>

But lots of utilities such as bridge, tcpdump, etc put the interface into
promiscuous mode.

old message:
[  406.034418] device eth0 entered promiscuous mode
[  408.424703] device eth0 left promiscuous mode

new message:
[  406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
[  408.424715] ice 0000:17:00.0 eth0: left promiscuous mode

Signed-off-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/core: print message for allmulticast

When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are
no log messages printed to the kernel log to indicate anything happened.
This is inexplicably different from most other dev->flags changes, and
could suprise the user.

Typically this occurs from user-space when a user:
ip link set eth0 allmulticast <on|off>

However, other devices like bridge set allmulticast as well, and many
other flows might trigger entry into allmulticast as well.

The new message uses the standard netdev_info print and looks like:
[ 413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
[ 415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode

Signed-off-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

wifi: brcmfmac: p2p: Introduce generic flexible array frame member

Silence run-time memcpy() false positive warning when processing
management frames:

memcpy: detected field-spanning write (size 27) of single field "&mgmt_frame->u" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c:1469 (size 26)

Due to this (soon to be fixed) GCC bug[1], FORTIFY_SOURCE (via
__builtin_dynamic_object_size) doesn't recognize that the union may end
with a flexible array, and returns "26" (the fixed size of the union),
rather than the remaining size of the allocation. Add an explicit
flexible array member and set it as the destination here, so that we
get the correct coverage for the memcpy().

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832

Reported-by: Ard Biesheuvel <[email protected]>
Cc: Arend van Spriel <[email protected]>
Cc: Franky Lin <[email protected]>
Cc: Hante Meuleman <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Brian Henriquez <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[rename 'frame' to 'body']
Signed-off-by: Johannes Berg <[email protected]>

Merge branch 'net-sched-retire-some-tc-qdiscs-and-classifiers'

Jamal Hadi Salim says:

====================
net/sched: Retire some tc qdiscs and classifiers

The CBQ + dsmark qdiscs and the tcindex + rsvp classifiers have served us for
over 2 decades. Unfortunately, they have not been getting much attention due
to reduced usage. While we dont have a good metric for tabulating how much use
a specific kernel feature gets, for these specific features we observed that
some of the functionality has been broken for some time and no users complained.
In addition, syzkaller has been going to town on most of these and finding
issues; and while we have been fixing those issues, at times it becomes obvious
that we would need to perform bigger surgeries to resolve things found while
getting a syzkaller fix in place. After some discussion we feel that in order
to reduce the maintenance burden it is best to retire them.

This patchset leaves the UAPI alone. I could send another version which deletes
the UAPI as well. AFAIK, this has not been done before - so it wasnt clear what
how to handle UAPI. It seems legit to just delete it but we would need to
coordinate with iproute2 (given they sync up with kernel uapi headers). There
are probably other users we don't know of that copy kernel headers.
If folks feel differently I will resend the patches deleting UAPI for these
qdiscs and classifiers.

I will start another thread on iproute2 before sending any patches to iproute2.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: Retire rsvp classifier

The rsvp classifier has served us well for about a quarter of a century but has
has not been getting much maintenance attention due to lack of known users.

Signed-off-by: Jamal Hadi Salim <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: Retire tcindex classifier

The tcindex classifier has served us well for about a quarter of a century
but has not been getting much TLC due to lack of known users. Most recently
it has become easy prey to syzkaller. For this reason, we are retiring it.

Signed-off-by: Jamal Hadi Salim <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>