Martin KaFai Lau [Fri, 17 Feb 2023 20:55:14 +0000 (12:55 -0800)]
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.
In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.
This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.
Pu Lehui [Wed, 15 Feb 2023 13:52:05 +0000 (21:52 +0800)]
riscv, bpf: Add bpf trampoline support for RV64
BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF program for zero overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure ceses.
Pu Lehui [Wed, 15 Feb 2023 13:52:04 +0000 (21:52 +0800)]
riscv, bpf: Add bpf_arch_text_poke support for RV64
Implement bpf_arch_text_poke for RV64. For call scenario, to make BPF
trampoline compatible with the kernel and BPF context, we follow the
framework of RV64 ftrace to reserve 4 nops for BPF programs as function
entry, and use auipc+jalr instructions for function call. However, since
auipc+jalr call instruction is non-atomic operation, we need to use
stop-machine to make sure instructions patching in atomic context. Also,
we use auipc+jalr pair and need to patch in stop-machine context for
jump scenario.
Pu Lehui [Wed, 15 Feb 2023 13:52:03 +0000 (21:52 +0800)]
riscv, bpf: Factor out emit_call for kernel and bpf context
The current emit_call function is not suitable for kernel function call as
it store return value to bpf R0 register. We can separate it out for common
use. Meanwhile, simplify judgment logic, that is, fixed function address
can use jal or auipc+jalr, while the unfixed can use only auipc+jalr.
Pu Lehui [Wed, 15 Feb 2023 13:52:02 +0000 (21:52 +0800)]
riscv: Extend patch_text for multiple instructions
Extend patch_text for multiple instructions. This is the preparaiton for
multiple instructions text patching in riscv BPF trampoline, and may be
useful for other scenario.
Andrii Nakryiko [Thu, 16 Feb 2023 04:59:54 +0000 (20:59 -0800)]
selftests/bpf: Add global subprog context passing tests
Add tests validating that it's possible to pass context arguments into
global subprogs for various types of programs, including a particularly
tricky KPROBE programs (which cover kprobes, uprobes, USDTs, a vast and
important class of programs).
Andrii Nakryiko [Thu, 16 Feb 2023 04:59:53 +0000 (20:59 -0800)]
selftests/bpf: Convert test_global_funcs test to test_loader framework
Convert 17 test_global_funcs subtests into test_loader framework for
easier maintenance and more declarative way to define expected
failures/successes.
Andrii Nakryiko [Thu, 16 Feb 2023 04:59:52 +0000 (20:59 -0800)]
bpf: Fix global subprog context argument resolution logic
KPROBE program's user-facing context type is defined as typedef
bpf_user_pt_regs_t. This leads to a problem when trying to passing
kprobe/uprobe/usdt context argument into global subprog, as kernel
always strip away mods and typedefs of user-supplied type, but takes
expected type from bpf_ctx_convert as is, which causes mismatch.
Current way to work around this is to define a fake struct with the same
name as expected typedef:
This patch fixes the issue by resolving expected type, if it's not
a struct. It still leaves the above work-around working for backwards
compatibility.
Hengqi Chen [Tue, 14 Feb 2023 15:26:33 +0000 (15:26 +0000)]
LoongArch, bpf: Use 4 instructions for function address in JIT
This patch fixes the following issue of function calls in JIT, like:
[ 29.346981] multi-func JIT bug 105 != 103
The issus can be reproduced by running the "inline simple bpf_loop call"
verifier test.
This is because we are emiting 2-4 instructions for 64-bit immediate moves.
During the first pass of JIT, the placeholder address is zero, emiting two
instructions for it. In the extra pass, the function address is in XKVRANGE,
emiting four instructions for it. This change the instruction index in
JIT context. Let's always use 4 instructions for function address in JIT.
So that the instruction sequences don't change between the first pass and
the extra pass for function calls.
Arnd Bergmann [Fri, 17 Feb 2023 09:59:04 +0000 (10:59 +0100)]
wifi: rtl8xxxu: add LEDS_CLASS dependency
rtl8xxxu now unconditionally uses LEDS_CLASS, so a Kconfig dependency
is required to avoid link errors:
aarch64-linux-ld: drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.o: in function `rtl8xxxu_disconnect':
rtl8xxxu_core.c:(.text+0x730): undefined reference to `led_classdev_unregister'
Martin KaFai Lau [Fri, 17 Feb 2023 00:41:48 +0000 (16:41 -0800)]
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.
This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
Martin KaFai Lau [Fri, 17 Feb 2023 00:41:47 +0000 (16:41 -0800)]
bpf: Disable bh in bpf_test_run for xdp and tc prog
Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup
helper that will be used in a latter selftest. In particular, it
calls ___neigh_lookup_noref that expects the bh disabled.
This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.
Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These
two paths share a call to common function xsk_xmit() which has two
sanity checks within. A pseudo code example to show the two paths:
__xsk_sendmsg() : xsk_poll():
if (unlikely(!xsk_is_bound(xs))) if (unlikely(!xsk_is_bound(xs)))
return -ENXIO; return mask;
if (unlikely(need_wait)) (...)
return -EOPNOTSUPP; xsk_xmit()
mark napi id
(...)
xsk_xmit()
xsk_xmit():
if (unlikely(!(xs->dev->flags & IFF_UP)))
return -ENETDOWN;
if (unlikely(!xs->tx))
return -ENOBUFS;
As it can be observed above, in sendmsg() napi id can be marked on
interface that was not brought up and this causes a NULL ptr
dereference:
To fix this, let's make xsk_xmit a function that will be responsible for
generic Tx, where RCU is handled accordingly and pull out sanity checks
and xs->zc handling. Populate sanity checks to __xsk_sendmsg() and
xsk_poll().
Unfortunately, it has come to our attention that there is still a bug
somewhere in the READ_PLUS code that can result in nfsroot systems on
ARM to crash during boot.
Let's do the right thing and revert this change so we don't break
people's nfsroot setups.
Use radix_tree_maybe_preload() which does preload only if
gfpflags_allow_blocking() is true but also takes the lock. Therefore,
unconditionally calling radix_tree_preload_end() should not create any
issues and the message disappears.
netfilter: let reset rules clean out conntrack entries
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.
If the reset rule is used for established connections, this
may result in the conntrack entry to be around for a very long
time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer
so that the rest packet passes through conntrack too.
Problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so its possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.
Johannes Berg [Thu, 16 Feb 2023 20:34:44 +0000 (21:34 +0100)]
wifi: iwlegacy: avoid fortify warning
There are two different alive messages, the "init" one is
bigger than the other one, so we have a fortify read warn
here. Avoid it by copying from the variable-sized 'raw'
instead.
Po-Hao Huang [Thu, 16 Feb 2023 08:28:07 +0000 (16:28 +0800)]
wifi: rtw89: fix AP mode authentication transmission failed
For some ICs, packets can't be sent correctly without initializing
CMAC table first. Previous flow do this initialization after
associated, results in authentication response fails to transmit.
Move the initialization up front to a proper place to solve this.
Ping-Ke Shih [Thu, 16 Feb 2023 05:36:33 +0000 (13:36 +0800)]
wifi: rtw88: use RTW_FLAG_POWERON flag to prevent to power on/off twice
Use power state to decide whether we can enter or leave IPS accurately,
and then prevent to power on/off twice.
The commit 6bf3a083407b ("wifi: rtw88: add flag check before enter or leave IPS")
would like to prevent this as well, but it still can't entirely handle all
cases. The exception is that WiFi gets connected and does suspend/resume,
it will power on twice and cause it failed to power on after resuming,
like:
rtw_8723de 0000:03:00.0: failed to poll offset=0x6 mask=0x2 value=0x2
rtw_8723de 0000:03:00.0: mac power on failed
rtw_8723de 0000:03:00.0: failed to power on mac
rtw_8723de 0000:03:00.0: leave idle state failed
rtw_8723de 0000:03:00.0: failed to leave ips state
rtw_8723de 0000:03:00.0: failed to leave idle state
rtw_8723de 0000:03:00.0: failed to send h2c command
To fix this, introduce new flag RTW_FLAG_POWERON to reflect power state,
and call rtw_mac_pre_system_cfg() to configure registers properly between
power-off/-on.
Keith Busch [Thu, 16 Feb 2023 16:44:03 +0000 (08:44 -0800)]
nvme-pci: refresh visible attrs for cmb attributes
The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.
Linus Torvalds [Fri, 17 Feb 2023 04:23:32 +0000 (20:23 -0800)]
Merge tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just a final collection of misc fixes, the biggest disables the
recently added dynamic debugging support, it has a regression that
needs some bigger fixes.
Otherwise a bunch of fixes across the board, vc4, amdgpu and vmwgfx
mostly, with some smaller i915 and ast fixes.
* tag 'drm-fixes-2023-02-17' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fail atomic_check early on normalize_zpos error
drm/amd/amdgpu: fix warning during suspend
drm/vmwgfx: Do not drop the reference to the handle too soon
drm/vmwgfx: Stop accessing buffer objects which failed init
drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
drm: Disable dynamic debug as broken
drm/ast: Fix start address computation
fbdev: Fix invalid page access after closing deferred I/O devices
drm/vc4: crtc: Increase setup cost in core clock calculation to handle extreme reduced blanking
drm/vc4: hdmi: Always enable GCP with AVMUTE cleared
drm/vc4: Fix YUV plane handling when planes are in different buffers
Zach O'Keefe [Wed, 25 Jan 2023 01:57:37 +0000 (17:57 -0800)]
mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
During collapse, in a few places we check to see if a given small page has
any unaccounted references. If the refcount on the page doesn't match our
expectations, it must be there is an unknown user concurrently interested
in the page, and so it's not safe to move the contents elsewhere.
However, the unaccounted pins are likely an ephemeral state.
In this situation, MADV_COLLAPSE returns -EINVAL when it should return
-EAGAIN. This could cause userspace to conclude that the syscall
failed, when it in fact could succeed by retrying.
Qian Yingjin [Wed, 8 Feb 2023 02:24:00 +0000 (10:24 +0800)]
mm/filemap: fix page end in filemap_get_read_batch
I was running traces of the read code against an RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips. I found that the page end offset calculation in
filemap_get_read_batch() was off by one.
When a read is submitted with end offset 1048575, then it calculates the
end page for read of 256 when it should be 255. "last_index" is the index
of the page beyond the end of the read and it should be skipped when get a
batch of pages for read in @filemap_get_read_batch().
The below simple patch fixes the problem. This code was introduced in
kernel 5.12.
Benjamin Gray [Fri, 17 Feb 2023 01:14:34 +0000 (12:14 +1100)]
powerpc/64s: Prevent fallthrough to hash TLB flush when using radix
In the fix reconnecting hash__tlb_flush() to tlb_flush() the
void return on radix__tlb_flush() was not restored and subsequently
falls through to the restored hash__tlb_flush().
Guard hash__tlb_flush() under an else to prevent this.
These are type-safe wrappers around bpf_obj_get_info_by_fd(). They
found one problem in selftests, and are also useful for adding
Memory Sanitizer annotations.
Dave Airlie [Thu, 16 Feb 2023 23:23:43 +0000 (09:23 +1000)]
Merge tag 'drm-misc-fixes-2023-02-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Multiple fixes in vc4 to address issues with YUV planes, HDMI and CRTC;
an invalid page access fix for fbdev, mark dynamic debug as broken, a
double free and refcounting fix for vmwgfx.
Mark Rutland [Thu, 16 Feb 2023 14:12:39 +0000 (14:12 +0000)]
arm64: perf: reject CHAIN events at creation time
Currently it's possible for a user to open CHAIN events arbitrarily,
which we previously tried to rule out in commit:
ca2b497253ad01c8 ("arm64: perf: Reject stand-alone CHAIN events for PMUv3")
Which allowed the events to be opened, but prevented them from being
scheduled by by using an arm_pmu::filter_match hook to reject the
relevant events.
The CHAIN event filtering in the arm_pmu::filter_match hook was silently
removed in commit:
As a result, it's now possible for users to open CHAIN events, and for
these to be installed arbitrarily.
Fix this by rejecting CHAIN events at creation time. This avoids the
creation of events which will never count, and doesn't require using the
dynamic filtering.
Attempting to open a CHAIN event (0x1e) will now be rejected:
| # ./perf stat -e armv8_pmuv3/config=0x1e/ ls
| perf
|
| Performance counter stats for 'ls':
|
| <not supported> armv8_pmuv3/config=0x1e/
|
| 0.002197470 seconds time elapsed
|
| 0.000000000 seconds user
| 0.002294000 seconds sys
Other events (e.g. CPU_CYCLES / 0x11) will open as usual:
| # ./perf stat -e armv8_pmuv3/config=0x11/ ls
| perf
|
| Performance counter stats for 'ls':
|
| 2538761 armv8_pmuv3/config=0x11/
|
| 0.002227330 seconds time elapsed
|
| 0.002369000 seconds user
| 0.000000000 seconds sys
That commit replaced the pmu::filter_match() callback with
pmu::filter(), whose return value has the opposite polarity, with true
implying events should be ignored rather than scheduled. While an
attempt was made to update the logic in armv8pmu_filter() and
armpmu_filter() accordingly, the return value remains inverted in a
couple of cases:
* If the arm_pmu does not have an arm_pmu::filter() callback,
armpmu_filter() will always return whether the CPU is supported rather
than whether the CPU is not supported.
As a result, the perf core will not schedule events on supported CPUs,
resulting in a loss of events. Additionally, the perf core will
attempt to schedule events on unsupported CPUs, but this will be
rejected by armpmu_add(), which may result in a loss of events from
other PMUs on those unsupported CPUs.
* If the arm_pmu does have an arm_pmu::filter() callback, and
armpmu_filter() is called on a CPU which is not supported by the
arm_pmu, armpmu_filter() will return false rather than true.
As a result, the perf core will attempt to schedule events on
unsupported CPUs, but this will be rejected by armpmu_add(), which may
result in a loss of events from other PMUs on those unsupported CPUs.
This means a loss of events can be seen with any arm_pmu driver, but
with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an
arm_pmu::filter() callback) the event loss will be more limited and may
go unnoticed, which is how this issue evaded testing so far.
Fix the CPU filtering by performing this consistently in
armpmu_filter(), and remove the redundant arm_pmu::filter() callback and
armv8pmu_filter() implementation.
Commit bd2756811766 also silently removed the CHAIN event filtering from
armv8pmu_filter(), which will be addressed by a separate patch without
using the filter callback.
Using "va" and "vb" doesn't match what's written on the board, or the
communications from StarFive.
Switching to using the silkscreened version number will ease confusion &
the risk of another spin of the board containing a "conflicting" version
identifier.
As the binding has not made it into mainline yet, take the opportunity
to "correct" things.
Linus Torvalds [Thu, 16 Feb 2023 20:13:58 +0000 (12:13 -0800)]
Merge tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Fixes from the main networking tree only, probably because all
sub-trees have backed off and haven't submitted their changes.
None of the fixes here are particularly scary and no outstanding
regressions. In an ideal world the "current release" sections would be
empty at this stage but that never happens.
Current release - regressions:
- fix unwanted sign extension in netdev_stats_to_stats64()
Current release - new code bugs:
- initialize net->notrefcnt_tracker earlier
- devlink: fix netdev notifier chain corruption
- nfp: make sure mbox accesses in IPsec code are atomic
- ice: fix check for weight and priority of a scheduling node
- mpls: fix stale pointer if allocation fails during device rename
- dccp/tcp: avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions
- remove WARN_ON_ONCE(sk->sk_forward_alloc) from
sk_stream_kill_queues()
- af_key: fix heap information leak
- ipv6: fix socket connection with DSCP (correct interpretation of
the tclass field vs fib rule matching)
- tipc: fix kernel warning when sending SYN message
- vmxnet3: read RSS information from the correct descriptor (eop)"
* tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
devlink: Fix netdev notifier chain corruption
igb: conditionalize I2C bit banging on external thermal sensor support
net: mpls: fix stale pointer if allocation fails during device rename
net/sched: tcindex: search key must be 16 bits
tipc: fix kernel warning when sending SYN message
igb: Fix PPS input and output using 3rd and 4th SDP
net: use a bounce buffer for copying skb->mark
ixgbe: add double of VLAN header when computing the max MTU
i40e: add double of VLAN header when computing the max MTU
ixgbe: allow to increase MTU to 3K with XDP enabled
net: stmmac: Restrict warning on disabling DMA store and fwd mode
net/sched: act_ctinfo: use percpu stats
net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
ice: fix lost multicast packets in promisc mode
ice: Fix check for weight and priority of a scheduling node
bnxt_en: Fix mqprio and XDP ring checking logic
net: Fix unwanted sign extension in netdev_stats_to_stats64()
net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
af_key: Fix heap information leak
...
Linus Torvalds [Thu, 16 Feb 2023 20:05:33 +0000 (12:05 -0800)]
Merge tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just a few NVMe fixes that should go into the 6.2 release, adding a
quirk and fixing two issues introduced in this release:
- NVMe fixes via Christoph:
- Always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- Add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- Set the DMA mask earlier (Christoph Hellwig)"
* tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux:
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
nvme-pci: set the DMA mask earlier
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
Linus Torvalds [Thu, 16 Feb 2023 20:01:46 +0000 (12:01 -0800)]
Merge tag 'spi-v6.2-rc8-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One more last minute patch for v6.2 updating the parsing of the newly
added spi-cs-setup-delay-ns.
It's been pointed out that due to the way DT parsing works the change
in property size is ABI visible so let's not let a release go out
without it being fixed. The change got split from some earlier ABI
related fixes to the property since the first version sent had a build
error"
* tag 'spi-v6.2-rc8-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Use a 32-bit DT property for spi-cs-setup-delay-ns
Another small batch of patches to be seen as preparation for adding
support of the newly available esd CAN-USB/3 to esd_usb.c.
Due to some unresolved questions adding support for
CAN_CTRLMODE_BERR_REPORTING has been postponed to one of the future
patches.
v2 -> v3:
* More specific subjects
* Try to use imperative instead of past tense
v1 -> v2: https://lore.kernel.org/all/20230214160223.1199464[email protected]
* [Patch v2 1/3]: No changes.
* [Patch v2 2/3]: Make use of can_change_state() and relocate testing
alloc_can_err_skb() for NULL to the end of esd_usb_rx_event(), to
have things like can_bus_off(), can_change_state() working even in
out of memory conditions.
* [Patch v2 3/3]: No changes. I will 'declare esd_usb_msg as an union
instead of a struct' in a separate follow-up patch.
Frank Jungclaus [Thu, 16 Feb 2023 19:04:50 +0000 (20:04 +0100)]
can: esd_usb: Improve readability on decoding ESD_EV_CAN_ERROR_EXT messages
As suggested by Marc introduce a union plus a struct ev_can_err_ext
for easier decoding of an ESD_EV_CAN_ERROR_EXT event message (which
simply is a rx_msg with some dedicated data).
Frank Jungclaus [Thu, 16 Feb 2023 19:04:49 +0000 (20:04 +0100)]
can: esd_usb: Make use of can_change_state() and relocate checking skb for NULL
Start a rework initiated by Vincents remarks "You should not report
the greatest of txerr and rxerr but the one which actually increased."
[1] and "As far as I understand, those flags should be set only when
the threshold is reached" [2] .
Therefore make use of can_change_state() to (among others) set the
flags CAN_ERR_CRTL_[RT]X_WARNING and CAN_ERR_CRTL_[RT]X_PASSIVE,
maintain CAN statistic counters for error_warning, error_passive and
bus_off.
Relocate testing alloc_can_err_skb() for NULL to the end of
esd_usb_rx_event(), to have things like can_bus_off(),
can_change_state() working even in out of memory conditions.
Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Signed-off-by: Frank Jungclaus <[email protected]>
Link: [1] https://lore.kernel.org/all/CAMZ6RqKGBWe15aMkf8-QLf-cOQg99GQBebSm+1wEzTqHgvmNuw@mail.gmail.com/
Link: [2] https://lore.kernel.org/all/CAMZ6Rq+QBO1yTX_o6GV0yhdBj-RzZSRGWDZBS0fs7zbSTy4hmA@mail.gmail.com/ Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
Frank Jungclaus [Thu, 16 Feb 2023 19:04:48 +0000 (20:04 +0100)]
can: esd_usb: Move mislocated storage of SJA1000_ECC_SEG bits in case of a bus error
Move the supply for cf->data[3] (bit stream position of CAN error), in
case of a bus- or protocol-error, outside of the "switch (ecc &
SJA1000_ECC_MASK){}"-statement, because this bit stream position is
independent of the error type.
Yang Li [Thu, 16 Feb 2023 09:06:10 +0000 (17:06 +0800)]
can: ctucanfd: ctucan_platform_probe(): use devm_platform_ioremap_resource()
Convert platform_get_resource(), devm_ioremap_resource() to a single
call to Use devm_platform_ioremap_resource(), as this is exactly what
this function does.
The uuid code is very low maintainance now that the major overhaul
has completed, and doesn't need it's own tree. All the recent work
has been done by Andy who'd like to stay on as a reviewer without an
explicit tree.
Jakub Kicinski [Thu, 16 Feb 2023 19:36:14 +0000 (11:36 -0800)]
Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
mlx5-next changes
Following previous conversations [1] and our clear commitment to do
the TC work [2], please pull mlx5-next shared branch, which includes
low-level steering logic to allow RoCEv2 traffic to be encrypted/
decrypted through IPsec.
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
net/mlx5: Add IPSec priorities in RDMA namespaces
net/mlx5: Implement new destination type TABLE_TYPE
net/mlx5: Introduce new destination type TABLE_TYPE
====================
Jakub Kicinski [Thu, 16 Feb 2023 19:33:10 +0000 (11:33 -0800)]
Merge tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Major stack changes:
* EHT channel puncturing support (client & AP)
* some support for AP MLD without mac80211
* fixes for A-MSDU on mesh connections
Major driver changes:
iwlwifi
* EHT rate reporting
* Bump FW API to 74 for AX devices
* STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware
mt76
* switch to using page pool allocator
* mt7996 EHT (Wi-Fi 7) support
* Wireless Ethernet Dispatch (WED) reset support
libertas
* WPS enrollee support
brcmfmac
* Rename Cypress 89459 to BCM4355
* BCM4355 and BCM4377 support
mwifiex
* SD8978 chipset support
rtl8xxxu
* LED support
ath12k
* new driver for Qualcomm Wi-Fi 7 devices
ath11k
* IPQ5018 support
* Fine Timing Measurement (FTM) responder role support
* channel 177 support
ath10k
* store WLAN firmware version in SMEM image table
* tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits)
wifi: brcmfmac: p2p: Introduce generic flexible array frame member
wifi: mac80211: add documentation for amsdu_mesh_control
wifi: cfg80211: remove gfp parameter from cfg80211_obss_color_collision_notify description
wifi: mac80211: always initialize link_sta with sta
wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
wifi: cfg80211: Set SSID if it is not already set
wifi: rtw89: move H2C of del_pkt_offload before polling FW status ready
wifi: rtw89: use readable return 0 in rtw89_mac_cfg_ppdu_status()
wifi: rtw88: usb: drop now unnecessary URB size check
wifi: rtw88: usb: send Zero length packets if necessary
wifi: rtw88: usb: Set qsel correctly
wifi: mac80211: fix off-by-one link setting
wifi: mac80211: Fix for Rx fragmented action frames
wifi: mac80211: avoid u32_encode_bits() warning
wifi: mac80211: Don't translate MLD addresses for multicast
wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint
wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify
wifi: nl80211: Allow authentication frames and set keys on NAN interface
wifi: mac80211: fix non-MLO station association
wifi: mac80211: Allow NSS change only up to capability
...
====================
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.
Jinke Han [Thu, 16 Feb 2023 03:22:50 +0000 (11:22 +0800)]
block: Fix io statistics for cgroup in throttle path
In the current code, io statistics are missing for cgroup when bio
was throttled by blk-throttle. Fix it by moving the unreaching code
to submit_bio_noacct_nocheck.
kvm: initialize all of the kvm_debugregs structure before sending it to userspace
When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some unitialized portions of the kvm_debugregs structure that
could be copied to userspace. Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.
Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.
Jens Axboe [Wed, 15 Feb 2023 23:43:47 +0000 (16:43 -0700)]
brd: mark as nowait compatible
By default, non-mq drivers do not support nowait. This causes io_uring
to use a slower path as the driver cannot be trust not to block. brd
can safely set the nowait flag, as worst case all it does is a NOIO
allocation.
For io_uring, this makes a substantial difference. Before:
Jens Axboe [Thu, 16 Feb 2023 15:01:08 +0000 (08:01 -0700)]
brd: check for REQ_NOWAIT and set correct page allocation mask
If REQ_NOWAIT is set, then do a non-blocking allocation if the operation
is a write and we need to insert a new page. Currently REQ_NOWAIT cannot
be set as the queue isn't marked as supporting nowait, this change is in
preparation for allowing that.
radix_tree_preload() warns on attempting to call it with an allocation
mask that doesn't allow blocking. While that warning could arguably
be removed, we need to handle radix insertion failures anyway as they
are more likely if we cannot block to get memory.
Remove legacy BUG_ON()'s and turn them into proper errors instead, one
for the allocation failure and one for finding a page that doesn't
match the correct index.
ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Ming Lei [Thu, 9 Feb 2023 12:55:27 +0000 (20:55 +0800)]
block: sync mixed merged request's failfast with 1st bio's
We support mixed merge for requests/bios with different fastfail
settings. When request fails, each time we only handle the portion
with same failfast setting, then bios with failfast can be failed
immediately, and bios without failfast can be retried.
The idea is pretty good, but the current implementation has several
defects:
1) initially RA bio doesn't set failfast, however bio merge code
doesn't consider this point, and just check its failfast setting for
deciding if mixed merge is required. Fix this issue by adding helper
of bio_failfast().
2) when merging bio to request front, if this request is mixed
merged, we have to sync request's faifast setting with 1st bio's
failfast. Fix it by calling blk_update_mixed_merge().
3) when merging bio to request back, if this request is mixed
merged, we have to mark the bio as failfast, because blk_update_request
simply updates request failfast with 1st bio's failfast. Fix
it by calling blk_update_mixed_merge().
Fixes one normal EXT4 READ IO failure issue, because it is observed
that the normal READ IO is merged with RA IO, and the mixed merged
request has different failfast setting with 1st bio's, so finally
the normal READ IO doesn't get retried.
Josh Triplett [Wed, 15 Feb 2023 00:42:22 +0000 (16:42 -0800)]
io_uring: Support calling io_uring_register with a registered ring fd
Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
of the opcode) to treat the fd as a registered index rather than a file
descriptor.
This makes it possible for a library to open an io_uring, register the
ring fd, close the ring fd, and subsequently use the ring entirely via
registered index.
====================
seg6: add PSP flavor support for SRv6 End behavior
Segment Routing for IPv6 (SRv6 in short) [1] is the instantiation of the
Segment Routing (SR) [2] architecture on the IPv6 dataplane.
In SRv6, the segment identifiers (SID) are IPv6 addresses and the segment list
(SID List) is carried in the Segment Routing Header (SRH). A segment may be
bound to a specific packet processing operation called "behavior". The RFC8986
[3] defines and standardizes the most common/relevant behaviors for network
operators, e.g., End, End.X and End.T and so on.
The RFC8986 also introduces the "flavors" framework aiming to modify or extend
the capabilities of SRv6 End, End.X and End.T behaviors. Specifically, these
behaviors support the following flavors (either individually or in
combinations):
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Such flavors enable an End/End.X/End.T behavior to pop the SRH on the
penultimate/ultimate SR endpoint node listed in the SID List or to perform a
full decapsulation.
Currently, the Linux kernel supports a large subset of behaviors described in
RFC8986, including the End, End.X and End.T. However, PSP, USP and USD flavors
have not yet been implemented.
In this patchset, we extend the SRv6 subsystem to implement the PSP flavor in
the SRv6 End behavior. To accomplish this task, we leverage the flavor
framework previously introduced by another patchset required for supporting the
efficient representation of the SID List through the NEXT-C-SID mechanism [4].
In details, the patchset is made of:
- patch 1/3: seg6: factor out End lookup nexthop processing to a dedicated
function
- patch 2/3: seg6: add PSP flavor support for SRv6 End behavior
- patch 3/3: selftests: seg6: add selftest for PSP flavor in SRv6 End
behavior
From the user space perspective, we do not need to change the iproute2 code to
support the PSP flavor. However, we provide the man page for the PSP flavor in
a separate patch.
Comments, improvements and suggestions are always appreciated.
Andrea Mayer [Wed, 15 Feb 2023 13:46:59 +0000 (14:46 +0100)]
selftests: seg6: add selftest for PSP flavor in SRv6 End behavior
This selftest is designed for testing the PSP flavor in SRv6 End behavior.
It instantiates a virtual network composed of several nodes: hosts and
SRv6 routers. Each node is realized using a network namespace that is
properly interconnected to others through veth pairs.
The test makes use of the SRv6 End behavior and of the PSP flavor needed
for removing the SRH from the IPv6 header at the penultimate node.
The correct execution of the behavior is verified through reachability
tests carried out between hosts.
Andrea Mayer [Wed, 15 Feb 2023 13:46:58 +0000 (14:46 +0100)]
seg6: add PSP flavor support for SRv6 End behavior
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.
A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.
Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
i) decreasing the Segment Left (SL) from 1 to 0;
ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
following the IPv6 header.
It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.
SL=2 SL=1 SL=0
| | |
For example, given the SRv6 policy (SID List := < X, Y, Z >):
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
operation as Segment Left (SL) is 1, corresponding to the Penultimate
Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
PSP operation as the Segment Left is 2. This behavior instance will
apply the "standard" End packet processing, ignoring the configured PSP
flavor at all.
Andrea Mayer [Wed, 15 Feb 2023 13:46:57 +0000 (14:46 +0100)]
seg6: factor out End lookup nexthop processing to a dedicated function
The End nexthop lookup/input operations are moved into a new helper
function named input_action_end_finish(). This avoids duplicating the
code needed to compute the nexthop in the different flavors of the End
behavior.
Paolo Abeni [Thu, 16 Feb 2023 11:03:15 +0000 (12:03 +0100)]
Merge branch 'sfc-devlink-support-for-ef100'
Alejandro Lucero says:
====================
sfc: devlink support for ef100
This patchset adds devlink port support for ef100 allowing setting VFs
mac addresses through the VF representor devlink ports.
Basic devlink infrastructure is first introduced, then support for info
command. Next changes for enumerating MAE ports which will be used for
devlink port creation when netdevs are registered.
Adding support for devlink port_function_hw_addr_get requires changes in
the ef100 driver for getting the mac address based on a client handle.
This allows to obtain VFs mac addresses during netdev initialization as
well what is included in patch 6.
Such client handle is used in patches 7 and 8 for getting and setting
devlink port addresses.
====================
Alejandro Lucero [Wed, 15 Feb 2023 09:08:28 +0000 (09:08 +0000)]
sfc: add support for devlink port_function_hw_addr_set in ef100
Using the builtin client handle id infrastructure, add support for
setting the mac address linked to mports in ef100. This implies to
execute an MCDI command for giving the address to the firmware for
the specific devlink port.
Alejandro Lucero [Wed, 15 Feb 2023 09:08:27 +0000 (09:08 +0000)]
sfc: add support for devlink port_function_hw_addr_get in ef100
Using the builtin client handle id infrastructure, add support for
obtaining the mac address linked to mports in ef100. This implies
to execute an MCDI command for getting the data from the firmware
for each devlink port.
Alejandro Lucero [Wed, 15 Feb 2023 09:08:26 +0000 (09:08 +0000)]
sfc: obtain device mac address based on firmware handle for ef100
Getting device mac address is currently based on a specific MCDI command
only available for the PF. This patch changes the MCDI command to a
generic one for PFs and VFs based on a client handle. This allows both
PFs and VFs to ask for their mac address during initialization using the
CLIENT_HANDLE_SELF.
Moreover, the patch allows other client handles which will be used by
the PF to ask for mac addresses linked to VFs. This is necessary for
suporting the port_function_hw_addr_get devlink function in further
patches.
Alejandro Lucero [Wed, 15 Feb 2023 09:08:25 +0000 (09:08 +0000)]
sfc: add devlink port support for ef100
Using the data when enumerating mports, create devlink ports just before
netdevs are registered and remove those devlink ports after netdev has
been unregistered.
Alejandro Lucero [Wed, 15 Feb 2023 09:08:24 +0000 (09:08 +0000)]
sfc: add mport lookup based on driver's mport data
Obtaining mport id is based on asking the firmware about it. This is
still needed for mport initialization itself, but once the mport data is
now kept by the driver, further mport id request can be satisfied
internally without firmware interaction.
Previous function is just modified in name making clear the firmware
interaction. The new function uses the old name and looks for the data
in the mport data structure.
Alejandro Lucero [Wed, 15 Feb 2023 09:08:22 +0000 (09:08 +0000)]
sfc: add devlink info support for ef100
Add devlink info support for ef100. The information reported is obtained
through the MCDI interface with the specific meaning defined in new
documentation file.
Ido Schimmel [Wed, 15 Feb 2023 07:31:39 +0000 (09:31 +0200)]
devlink: Fix netdev notifier chain corruption
Cited commit changed devlink to register its netdev notifier block on
the global netdev notifier chain instead of on the per network namespace
one.
However, when changing the network namespace of the devlink instance,
devlink still tries to unregister its notifier block from the chain of
the old namespace and register it on the chain of the new namespace.
This results in corruption of the notifier chains, as the same notifier
block is registered on two different chains: The global one and the per
network namespace one. In turn, this causes other problems such as the
inability to dismantle namespaces due to netdev reference count issues.
Fix by preventing devlink from moving its notifier block between
namespaces.
Reproducer:
# echo "10 1" > /sys/bus/netdevsim/new_device
# ip netns add test123
# devlink dev reload netdevsim/netdevsim10 netns test123
# ip netns del test123
[ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 71.938348] leaked reference.
====================
net/sched: transition actions to pcpu stats and rcu
Following the work done for act_pedit[0], transition the remaining tc
actions to percpu stats and rcu, whenever possible.
Percpu stats make updating the action stats very cheap, while combining
it with rcu action parameters makes it possible to get rid of the per
action lock in the datapath.
For act_connmark and act_nat we run the following tests:
- tc filter add dev ens2f0 ingress matchall action connmark
- tc filter add dev ens2f0 ingress matchall action nat ingress any 10.10.10.10
Our setup consists of a 26 cores Intel CPU and a 25G NIC.
We use TRex to shoot 10mpps TCP packets and take perf measurements.
Both actions improved performance as expected since the datapath lock disappeared.
For act_pedit we move the drop counter to percpu, when available.
For act_gate we move the counters to percpu, when available.
Pedro Tammela [Tue, 14 Feb 2023 21:15:34 +0000 (18:15 -0300)]
net/sched: act_pedit: use percpu overlimit counter when available
Since act_pedit now has access to percpu counters, use the
tcf_action_inc_overlimit_qstats wrapper that will use the percpu
counter whenever they are available.
Pedro Tammela [Tue, 14 Feb 2023 21:15:33 +0000 (18:15 -0300)]
net/sched: act_gate: use percpu stats
The tc action act_gate was using shared stats, move it to percpu stats.
tdc results:
1..12
ok 1 5153 - Add gate action with priority and sched-entry
ok 2 7189 - Add gate action with base-time
ok 3 a721 - Add gate action with cycle-time
ok 4 c029 - Add gate action with cycle-time-ext
ok 5 3719 - Replace gate base-time action
ok 6 d821 - Delete gate action with valid index
ok 7 3128 - Delete gate action with invalid index
ok 8 7837 - List gate actions
ok 9 9273 - Flush gate actions
ok 10 c829 - Add gate action with duplicate index
ok 11 3043 - Add gate action with invalid index
ok 12 2930 - Add gate action with cookie
tdc results:
1..15
ok 1 2002 - Add valid connmark action with defaults
ok 2 56a5 - Add valid connmark action with control pass
ok 3 7c66 - Add valid connmark action with control drop
ok 4 a913 - Add valid connmark action with control pipe
ok 5 bdd8 - Add valid connmark action with control reclassify
ok 6 b8be - Add valid connmark action with control continue
ok 7 d8a6 - Add valid connmark action with control jump
ok 8 aae8 - Add valid connmark action with zone argument
ok 9 2f0b - Add valid connmark action with invalid zone argument
ok 10 9305 - Add connmark action with unsupported argument
ok 11 71ca - Add valid connmark action and replace it
ok 12 5f8f - Add valid connmark action with cookie
ok 13 c506 - Replace connmark with invalid goto chain control
ok 14 6571 - Delete connmark action with valid index
ok 15 3426 - Delete connmark action with invalid index
tdc results:
1..27
ok 1 7565 - Add nat action on ingress with default control action
ok 2 fd79 - Add nat action on ingress with pipe control action
ok 3 eab9 - Add nat action on ingress with continue control action
ok 4 c53a - Add nat action on ingress with reclassify control action
ok 5 76c9 - Add nat action on ingress with jump control action
ok 6 24c6 - Add nat action on ingress with drop control action
ok 7 2120 - Add nat action on ingress with maximum index value
ok 8 3e9d - Add nat action on ingress with invalid index value
ok 9 f6c9 - Add nat action on ingress with invalid IP address
ok 10 be25 - Add nat action on ingress with invalid argument
ok 11 a7bd - Add nat action on ingress with DEFAULT IP address
ok 12 ee1e - Add nat action on ingress with ANY IP address
ok 13 1de8 - Add nat action on ingress with ALL IP address
ok 14 8dba - Add nat action on egress with default control action
ok 15 19a7 - Add nat action on egress with pipe control action
ok 16 f1d9 - Add nat action on egress with continue control action
ok 17 6d4a - Add nat action on egress with reclassify control action
ok 18 b313 - Add nat action on egress with jump control action
ok 19 d9fc - Add nat action on egress with drop control action
ok 20 a895 - Add nat action on egress with DEFAULT IP address
ok 21 2572 - Add nat action on egress with ANY IP address
ok 22 37f3 - Add nat action on egress with ALL IP address
ok 23 6054 - Add nat action on egress with cookie
ok 24 79d6 - Add nat action on ingress with cookie
ok 25 4b12 - Replace nat action with invalid goto chain control
ok 26 b811 - Delete nat action with valid index
ok 27 a521 - Delete nat action with invalid index
====================
net/core: commmon prints for promisc
Add a print to the kernel log for allmulticast entry and exit, and
standardize the print for entry and exit of promiscuous mode.
These prints are useful to both user and developer and should have the
triggering driver/bus/device info that netdev_info (optionally) gives.
====================
Jesse Brandeburg [Tue, 14 Feb 2023 21:01:17 +0000 (13:01 -0800)]
net/core: refactor promiscuous mode message
The kernel stack can be more consistent by printing the IFF_PROMISC
aka promiscuous enable/disable messages with the standard netdev_info
message which can include bus and driver info as well as the device.
typical command usage from user space looks like:
ip link set eth0 promisc <on|off>
But lots of utilities such as bridge, tcpdump, etc put the interface into
promiscuous mode.
old message:
[ 406.034418] device eth0 entered promiscuous mode
[ 408.424703] device eth0 left promiscuous mode
new message:
[ 406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
[ 408.424715] ice 0000:17:00.0 eth0: left promiscuous mode
Jesse Brandeburg [Tue, 14 Feb 2023 21:01:16 +0000 (13:01 -0800)]
net/core: print message for allmulticast
When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are
no log messages printed to the kernel log to indicate anything happened.
This is inexplicably different from most other dev->flags changes, and
could suprise the user.
Typically this occurs from user-space when a user:
ip link set eth0 allmulticast <on|off>
However, other devices like bridge set allmulticast as well, and many
other flows might trigger entry into allmulticast as well.
The new message uses the standard netdev_info print and looks like:
[ 413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
[ 415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode
Kees Cook [Wed, 15 Feb 2023 22:41:14 +0000 (14:41 -0800)]
wifi: brcmfmac: p2p: Introduce generic flexible array frame member
Silence run-time memcpy() false positive warning when processing
management frames:
memcpy: detected field-spanning write (size 27) of single field "&mgmt_frame->u" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c:1469 (size 26)
Due to this (soon to be fixed) GCC bug[1], FORTIFY_SOURCE (via
__builtin_dynamic_object_size) doesn't recognize that the union may end
with a flexible array, and returns "26" (the fixed size of the union),
rather than the remaining size of the allocation. Add an explicit
flexible array member and set it as the destination here, so that we
get the correct coverage for the memcpy().
====================
net/sched: Retire some tc qdiscs and classifiers
The CBQ + dsmark qdiscs and the tcindex + rsvp classifiers have served us for
over 2 decades. Unfortunately, they have not been getting much attention due
to reduced usage. While we dont have a good metric for tabulating how much use
a specific kernel feature gets, for these specific features we observed that
some of the functionality has been broken for some time and no users complained.
In addition, syzkaller has been going to town on most of these and finding
issues; and while we have been fixing those issues, at times it becomes obvious
that we would need to perform bigger surgeries to resolve things found while
getting a syzkaller fix in place. After some discussion we feel that in order
to reduce the maintenance burden it is best to retire them.
This patchset leaves the UAPI alone. I could send another version which deletes
the UAPI as well. AFAIK, this has not been done before - so it wasnt clear what
how to handle UAPI. It seems legit to just delete it but we would need to
coordinate with iproute2 (given they sync up with kernel uapi headers). There
are probably other users we don't know of that copy kernel headers.
If folks feel differently I will resend the patches deleting UAPI for these
qdiscs and classifiers.
I will start another thread on iproute2 before sending any patches to iproute2.
====================
Jamal Hadi Salim [Tue, 14 Feb 2023 13:49:15 +0000 (08:49 -0500)]
net/sched: Retire rsvp classifier
The rsvp classifier has served us well for about a quarter of a century but has
has not been getting much maintenance attention due to lack of known users.
Jamal Hadi Salim [Tue, 14 Feb 2023 13:49:14 +0000 (08:49 -0500)]
net/sched: Retire tcindex classifier
The tcindex classifier has served us well for about a quarter of a century
but has not been getting much TLC due to lack of known users. Most recently
it has become easy prey to syzkaller. For this reason, we are retiring it.