mptcp_recvmsg is allowed to release the msk socket lock when
blocking, and before re-acquiring it another thread could have
switched the sock to TCP_LISTEN status - with a prior
connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
Address the issue preventing the disconnect if some other process is
concurrently performing a blocking syscall on the same socket, alike
commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
Paolo Abeni [Tue, 20 Jun 2023 16:24:18 +0000 (18:24 +0200)]
mptcp: handle correctly disconnect() failures
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Jakub Kicinski [Wed, 21 Jun 2023 20:59:46 +0000 (13:59 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Jiri Olsa [Sun, 18 Jun 2023 13:14:14 +0000 (15:14 +0200)]
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Florent Revest [Thu, 15 Jun 2023 14:56:07 +0000 (16:56 +0200)]
bpf/btf: Accept function names that contain dots
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
This fixes a regression in which the link would come up, but no
communication was possible.
The reverted commit was also removing a comment about
DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits
since it seems that this is unrelated to the original code change.
Ross Lagerwall [Fri, 16 Jun 2023 16:45:49 +0000 (17:45 +0100)]
be2net: Extend xmit workaround to BE3 chip
We have seen a bug where the NIC incorrectly changes the length in the
IP header of a padded packet to include the padding bytes. The driver
already has a workaround for this so do the workaround for this NIC too.
This resolves the issue.
The NIC in question identifies itself as follows:
[ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31
[ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1
David S. Miller [Tue, 20 Jun 2023 08:40:26 +0000 (09:40 +0100)]
Merge branch 'dsa-mt7530-fixes'
Arınç ÜNAL says:
====================
net: dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
This patch series fixes all non-theoretical issues regarding multiple CPU
ports and the handling of LLDP frames and BPDUs.
I am adding me as a maintainer, I've got some code improvements on the way.
I will keep an eye on this driver and the patches submitted for it in the
future.
Arınç
v6:
- Change a small portion of the comment in the diff on "net: dsa: mt7530:
set all CPU ports in MT7531_CPU_PMAP" with Russell's suggestion.
- Change the patch log of "net: dsa: mt7530: fix trapping frames on
non-MT7621 SoC MT7530 switch" with Vladimir's suggestion.
- Group the code for trapping frames into a common function and call that.
- Add Vladimir and Russell's reviewed-by tags to where they're given.
v5:
- Change the comment in the diff on the first patch with Russell's words.
- Change the patch log of the first patch to state that the patch is just
preparatory work for change "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" and not a fix to an
existing problem on the code base.
- Remove the "net: dsa: mt7530: fix trapping frames with multiple CPU ports
on MT7530" patch. It fixes a theoretical issue, therefore it is net-next
material.
- Remove unnecessary information from the patch logs. Remove the enum
renaming change.
- Strengthen the point of the "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" patch.
v4: Make the patch logs and my comments in the code easier to understand.
v3: Fix the from header on the patches. Write a cover letter.
v2: Add patches to fix the handling of LLDP frames and BPDUs.
====================
Vladimir Oltean [Sat, 17 Jun 2023 06:26:48 +0000 (09:26 +0300)]
net: dsa: introduce preferred_default_local_cpu_port and use on MT7530
Since the introduction of the OF bindings, DSA has always had a policy that
in case multiple CPU ports are present in the device tree, the numerically
smallest one is always chosen.
The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU
ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because
it has higher bandwidth.
The MT7530 driver developers had 3 options:
- to modify DSA when the MT7531 switch support was introduced, such as to
prefer the better port
- to declare both CPU ports in device trees as CPU ports, and live with the
sub-optimal performance resulting from not preferring the better port
- to declare just port 6 in the device tree as a CPU port
Of course they chose the path of least resistance (3rd option), kicking the
can down the road. The hardware description in the device tree is supposed
to be stable - developers are not supposed to adopt the strategy of
piecemeal hardware description, where the device tree is updated in
lockstep with the features that the kernel currently supports.
Now, as a result of the fact that they did that, any attempts to modify the
device tree and describe both CPU ports as CPU ports would make DSA change
its default selection from port 6 to 5, effectively resulting in a
performance degradation visible to users with the MT7531BE switch as can be
seen below.
Using one port for WAN and the other ports for LAN is a very popular use
case which is what this test emulates.
As such, this change proposes that we retroactively modify stable kernels
(which don't support the modification of the CPU port assignments, so as to
let user space fix the problem and restore the throughput) to keep the
mt7530 driver preferring port 6 even with device trees where the hardware
is more fully described.
Arınç ÜNAL [Sat, 17 Jun 2023 06:26:47 +0000 (09:26 +0300)]
net: dsa: mt7530: fix handling of LLDP frames
LLDP frames are link-local frames, therefore they must be trapped to the
CPU port. Currently, the MT753X switches treat LLDP frames as regular
multicast frames, therefore flooding them to user ports. To fix this, set
LLDP frames to be trapped to the CPU port(s).
Arınç ÜNAL [Sat, 17 Jun 2023 06:26:46 +0000 (09:26 +0300)]
net: dsa: mt7530: fix handling of BPDUs on MT7530 switch
BPDUs are link-local frames, therefore they must be trapped to the CPU
port. Currently, the MT7530 switch treats BPDUs as regular multicast
frames, therefore flooding them to user ports. To fix this, set BPDUs to be
trapped to the CPU port. Group this on mt7530_setup() and
mt7531_setup_common() into mt753x_trap_frames() and call that.
All MT7530 switch IP variants share the MT7530_MFC register, but the
current driver only writes it for the switch variant that is integrated in
the MT7621 SoC. Modify the code to include all MT7530 derivatives.
Arınç ÜNAL [Sat, 17 Jun 2023 06:26:44 +0000 (09:26 +0300)]
net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP
MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU
frames (further restricted by PCR_MATRIX).
Currently the driver sets the first CPU port as the single port in this bit
mask, which works fine regardless of whether the device tree defines port
5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's
logic of picking the first CPU port as the CPU port that all user ports are
affine to, by default.
An upcoming change would like to influence DSA's selection of the default
CPU port to no longer be the first one, and in that case, this logic needs
adaptation.
Since there is no observed leakage or duplication of frames if all CPU
ports are defined in this bit mask, simply include them all.
Josua Mayer [Fri, 16 Jun 2023 11:14:14 +0000 (14:14 +0300)]
net: dpaa2-mac: add 25gbase-r support
Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode.
Add the mappings between DPMAC_ETH_IF_* and HY_INTERFACE_MODE_*, as well
as the 25000 mac capability.
Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18.
Stefan Wahren [Wed, 14 Jun 2023 21:06:56 +0000 (23:06 +0200)]
net: qca_spi: Avoid high load if QCA7000 is not available
In case the QCA7000 is not available via SPI (e.g. in reset),
the driver will cause a high load. The reason for this is
that the synchronization is never finished and schedule()
is never called. Since the synchronization is not timing
critical, it's safe to drop this from the scheduling condition.
Signed-off-by: Stefan Wahren <[email protected]> Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller <[email protected]>
Andrew Lunn [Sat, 17 Jun 2023 15:55:00 +0000 (17:55 +0200)]
net: phy: Manual remove LEDs to ensure correct ordering
If the core is left to remove the LEDs via devm_, it is performed too
late, after the PHY driver is removed from the PHY. This results in
dereferencing a NULL pointer when the LED core tries to turn the LED
off before destroying the LED.
Manually unregister the LEDs at a safe point in phy_remove.
Íñigo Huguet [Thu, 15 Jun 2023 08:49:29 +0000 (10:49 +0200)]
sfc: use budget for TX completions
When running workloads heavy unbalanced towards TX (high TX, low RX
traffic), sfc driver can retain the CPU during too long times. Although
in many cases this is not enough to be visible, it can affect
performance and system responsiveness.
A way to reproduce it is to use a debug kernel and run some parallel
netperf TX tests. In some systems, this will lead to this message being
logged:
kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
The reason is that sfc driver doesn't account any NAPI budget for the TX
completion events work. With high-TX/low-RX traffic, this makes that the
CPU is held for long time for NAPI poll.
Documentations says "drivers can process completions for any number of Tx
packets but should only process up to budget number of Rx packets".
However, many drivers do limit the amount of TX completions that they
process in a single NAPI poll.
In the same way, this patch adds a limit for the TX work in sfc. With
the patch applied, the watchdog warning never appears.
Tested with netperf in different combinations: single process / parallel
processes, TCP / UDP and different sizes of UDP messages. Repeated the
tests before and after the patch, without any noticeable difference in
network or CPU performance.
Test hardware:
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core)
Solarflare Communications XtremeScale X2522-25G Network Adapter
Azeem Shaikh [Tue, 13 Jun 2023 00:33:25 +0000 (00:33 +0000)]
ieee802154: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since the return values
from the helper macros are ignored by the callers.
Previously during mlx5e_ipsec_handle_event the driver tried to execute
an operation that could sleep, while holding a spinlock, which caused
the kernel panic mentioned below.
Move the function call that can sleep outside of the spinlock context.
Leon Romanovsky [Mon, 5 Jun 2023 08:09:49 +0000 (11:09 +0300)]
net/mlx5e: Don't delay release of hardware objects
XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete()
and another is .xdo_dev_policy_free(). This separation allows delayed release so
"ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface
can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context.
net/mlx5: DR, Fix wrong action data allocation in decap action
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
variable was passed as its HW action data, resulting in attempt to
free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
net/mlx5: DR, Support SW created encap actions for FW table
In some cases, steering might need to use SW-created action in
FW table, which results in wrong packet reformat being used:
mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
status bad resource(0×5), syndrome (0xf2ff71)
This patch adds support for usage of SW-created packet reformat (encap)
actions in FW tables, and adds clear error flow for attempt to use
SW-created modify header on FW tables.
Chris Mi [Thu, 6 Apr 2023 06:38:09 +0000 (09:38 +0300)]
net/mlx5e: TC, Cleanup ct resources for nic flow
The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.
Fix it by adding it back.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Chris Mi [Tue, 11 Apr 2023 08:17:35 +0000 (11:17 +0300)]
net/mlx5e: TC, Add null pointer check for hardware miss support
The cited commits add hardware miss support to tc action. But if
the rules can't be offloaded, the pointers are null and system
will panic when accessing them.
Fix it by checking null pointer.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Eli Cohen [Wed, 3 May 2023 12:10:05 +0000 (15:10 +0300)]
net/mlx5: Fix driver load with single msix vector
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
net/mlx5e: XDP, Allow growing tail for XDP multi buffer
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP") Signed-off-by: Maxim Mikityanskiy <[email protected]> Cc: Tariq Toukan <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
xfrm: Linearize the skb after offloading if needed.
With offloading enabled, esp_xmit() gets invoked very late, from within
validate_xmit_xfrm() which is after validate_xmit_skb() validates and
linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth
tag/ IV. Devices without the proper support will then send skb->data
points to with the correct length so the packet will have garbage at the
end. A pcap sniffer will claim that the proper data has been sent since
it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it.
It was tested on v4, v6 has been adopted.
Fixes: 7785bba299a8d ("esp: Add a software GRO codepath") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
====================
Check if FIPS mode is enabled when running selftests
Some test cases from net/tls, net/fcnal-test and net/vrf-xfrm-tests
that rely on cryptographic functions to work and use non-compliant FIPS
algorithms fail in FIPS mode.
In order to allow these tests to pass in a wider set of kernels,
- for net/tls, skip the test variants that use the ChaCha20-Poly1305
and SM4 algorithms, when FIPS mode is enabled;
- for net/fcnal-test, skip the MD5 tests, when FIPS mode is enabled;
- for net/vrf-xfrm-tests, replace the algorithms that are not
FIPS-compliant with compliant ones.
Magali Lemes [Tue, 13 Jun 2023 12:32:22 +0000 (09:32 -0300)]
selftests: net: fcnal-test: check if FIPS mode is enabled
There are some MD5 tests which fail when the kernel is in FIPS mode,
since MD5 is not FIPS compliant. Add a check and only run those tests
if FIPS mode is not enabled.
Magali Lemes [Tue, 13 Jun 2023 12:32:21 +0000 (09:32 -0300)]
selftests: net: vrf-xfrm-tests: change authentication and encryption algos
The vrf-xfrm-tests tests use the hmac(md5) and cbc(des3_ede)
algorithms for performing authentication and encryption, respectively.
This causes the tests to fail when fips=1 is set, since these algorithms
are not allowed in FIPS mode. Therefore, switch from hmac(md5) and
cbc(des3_ede) to hmac(sha1) and cbc(aes), which are FIPS compliant.
Fixes: 3f251d741150 ("selftests: Add tests for vrf and xfrms") Reviewed-by: David Ahern <[email protected]> Signed-off-by: Magali Lemes <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Magali Lemes [Tue, 13 Jun 2023 12:32:20 +0000 (09:32 -0300)]
selftests: net: tls: check if FIPS mode is enabled
TLS selftests use the ChaCha20-Poly1305 and SM4 algorithms, which are not
FIPS compliant. When fips=1, this set of tests fails. Add a check and only
run these tests if not in FIPS mode.
Fixes: 4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests") Fixes: e506342a03c7 ("selftests/tls: add SM4 GCM/CCM to tls selftests") Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Magali Lemes <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Magali Lemes [Tue, 13 Jun 2023 12:32:19 +0000 (09:32 -0300)]
selftests/harness: allow tests to be skipped during setup
Before executing each test from a fixture, FIXTURE_SETUP is run once.
When SKIP is used in FIXTURE_SETUP, the setup function returns early
but the test still proceeds to run, unless another SKIP macro is used
within the test definition, leading to some code repetition. Therefore,
allow tests to be skipped directly from the setup function.
Linus Torvalds [Fri, 16 Jun 2023 04:11:17 +0000 (21:11 -0700)]
Merge tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199, which
isn't really small but perhaps with the exception of the WiFi locking
change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able to
run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices.
We are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may not
have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there are
lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()"
* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
dccp: Print deprecation notice.
udplite: Print deprecation notice.
octeon_ep: Add missing check for ioremap
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
net: tipc: resize nlattr array to correct size
sfc: fix XDP queues mode with legacy IRQ
net: macsec: fix double free of percpu stats
net: lapbether: only support ethernet devices
MAINTAINERS: add reviewers for SMC Sockets
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
net/sched: cls_api: Fix lockup on flushing explicitly created chain
ice: Fix ice module unload
net/handshake: remove fput() that causes use-after-free
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
...
Linus Torvalds [Fri, 16 Jun 2023 03:19:21 +0000 (20:19 -0700)]
Merge tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM thinp discard performance regression introduced during this
merge window where DM core was splitting large discards every 128K
(max_sectors_kb) rather than every 64M (discard_max_bytes).
- Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
between do_mount and dm's do_suspend (in addition to the earlier
fix's do_mount race with dm's do_resume).
- Fix DM thin metadata operations to first check if the thin-pool is in
"fail_io" mode; otherwise UAF can occur.
- Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
from bio_alloc).
* tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: use op specific max_sectors when splitting abnormal io
dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
dm thin metadata: check fail_io before using data_sm
dm: don't lock fs when the map is NULL during suspend or resume
Linus Torvalds [Fri, 16 Jun 2023 03:13:56 +0000 (20:13 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always an error when reading from a disassociated device's event
queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
Linus Torvalds [Fri, 16 Jun 2023 03:03:15 +0000 (20:03 -0700)]
Merge tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more driver specific fixes.
The DesignWare fix is for an issue introduced by conversion to the
chip select accessor functions and is pretty important but the other
two are less severe"
* tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Replace incorrect spi_get_chipselect with set
spi: fsl-dspi: avoid SCK glitches with continuous transfers
spi: cadence-quadspi: Add missing check for dma_set_mask
Linus Torvalds [Fri, 16 Jun 2023 02:54:58 +0000 (19:54 -0700)]
Merge tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"The set of regulators described for the Qualcomm PM8550 just seems to
have been completely wrong and would likely not have worked at all if
anything tried to actually configure anything except for enabling and
disabling at runtime"
* tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix regulators for PM8550
Linus Torvalds [Fri, 16 Jun 2023 02:50:57 +0000 (19:50 -0700)]
Merge tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"Another fix for the maple tree cache, Takashi noticed that unlike
other caches the maple tree cache didn't check for read only registers
before trying to sync which would result in spurious syncs for read
only registers where we don't have a default.
This was due to the check being open coded in the caches, we now check
in the shared 'does this register need sync' function so that is fixed
for this and future caches"
* tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regcache: Don't sync read-only registers
Linus Torvalds [Fri, 16 Jun 2023 02:13:45 +0000 (19:13 -0700)]
Merge tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A fix for dvb-core to avoid a race condition during DVB board
registration"
* tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: drop the call to ext4_error() from ext4_get_group_info()
Revert "ext4: remove unnecessary check in ext4_bg_num_gdb_nometa"
Linus Torvalds [Thu, 15 Jun 2023 22:24:33 +0000 (15:24 -0700)]
Merge tag '6.4-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Eight, mostly small, smb3 client fixes:
- important fix for deferred close oops (race with unmount) found
with xfstest generic/098 to some servers
- important reconnect fix
- fix problem with max_credits mount option
- two multichannel (interface related) fixes
- one trivial removal of confusing comment
- two small debugging improvements (to better spot crediting
problems)"
* tag '6.4-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: add a warning when the in-flight count goes negative
cifs: fix lease break oops in xfstest generic/098
cifs: fix max_credits implementation
cifs: fix sockaddr comparison in iface_cmp
smb/client: print "Unknown" instead of bogus link speed value
cifs: print all credit counters in DebugData
cifs: fix status checks in cifs_tree_connect
smb: remove obsolete comment
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"). It says
we haven't heard from the maintainer for five years, so DCCP is not well
maintained for 7 years now.
Recently DCCP only receives updates for bugs, and major distros disable it
by default.
Removing DCCP would allow for better organisation of TCP fields to reduce
the number of cache lines hit in the fast path.
Let's add a deprecation notice when DCCP socket is created and schedule its
removal to 2025.
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs
when a UDP-Lite socket tries to allocate a buffer under memory pressure.
Someone should have stumbled on the bug much earlier if UDP-Lite had been
used in a real app. Also, we do not always need a large UDP-Lite workload
to hit the bug since UDP and UDP-Lite share the same memory accounting
limit.
Removing UDP-Lite would simplify UDP code removing a bunch of conditionals
in fast path.
Let's add a deprecation notice when UDP-Lite socket is created and schedule
its removal to 2025.
Alex Maftei [Thu, 15 Jun 2023 08:34:04 +0000 (09:34 +0100)]
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
Previously, timestamps were printed using "%lld.%u" which is incorrect
for nanosecond values lower than 100,000,000 as they're fractional
digits, therefore leading zeros are meaningful.
This patch changes the format strings to "%lld.%09u" in order to add
leading zeros to the nanosecond value.
Fixes: 568ebc5985f5 ("ptp: add the PTP_SYS_OFFSET ioctl to the testptp program") Fixes: 4ec54f95736f ("ptp: Fix compiler warnings in the testptp utility") Fixes: 6ab0e475f1f3 ("Documentation: fix misc. warnings") Signed-off-by: Alex Maftei <[email protected]> Acked-by: Richard Cochran <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
Fix a possible memory leak in __stmmac_open when stmmac_init_phy fails.
It's also needed to free everything allocated by stmmac_setup_dma_desc
and not just the dma_conf struct.
Drop free_dma_desc_resources from __stmmac_open and correctly call
free_dma_desc_resources on each user of __stmmac_open on error.
Lin Ma [Wed, 14 Jun 2023 12:06:04 +0000 (20:06 +0800)]
net: tipc: resize nlattr array to correct size
According to nla_parse_nested_deprecated(), the tb[] is supposed to the
destination array with maxtype+1 elements. In current
tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used
which is unnecessary. This patch resize them to a proper size.
Mike Snitzer [Thu, 15 Jun 2023 01:47:46 +0000 (21:47 -0400)]
dm: use op specific max_sectors when splitting abnormal io
Split abnormal IO in terms of the corresponding operation specific
max_sectors (max_discard_sectors, max_secure_erase_sectors or
max_write_zeroes_sectors).
This fixes a significant dm-thinp discard performance regression that
was introduced with commit e2dd8aca2d76 ("dm bio prison v1: improve
concurrent IO performance"). Relative to discard: max_discard_sectors
is used instead of max_sectors; which fixes excessive discard splitting
(e.g. max_sectors=128K vs max_discard_sectors=64M).
Tested by discarding an 1 Petabyte dm-thin device:
lvcreate -V 1125899906842624B -T test/pool -n thin
time blkdiscard /dev/test/thin
Before this fix (splitting discards every 128K): ~116m
After this fix (splitting discards every 64M) : 0m33.460s
Reported-by: Zorro Lang <[email protected]> Fixes: 06961c487a33 ("dm: split discards further if target sets max_discard_granularity")
Requires: 13f6facf3fae ("dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE") Fixes: e2dd8aca2d76 ("dm bio prison v1: improve concurrent IO performance") Signed-off-by: Mike Snitzer <[email protected]>
Mike Snitzer [Wed, 14 Jun 2023 00:05:34 +0000 (20:05 -0400)]
dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.
Commit 3dba53a958a75 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:
Before commit 3dba53a958a75, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.
As shown above, if dm_pool_commit_metadata() and
dm_pool_abort_metadata() fail in pool_message process, kworker may
trigger UAF.
Fixes: be500ed721a6 ("dm space maps: improve performance with inc/dec on ranges of blocks") Cc: [email protected] Signed-off-by: Li Lingfeng <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
Li Lingfeng [Thu, 1 Jun 2023 06:14:23 +0000 (14:14 +0800)]
dm: don't lock fs when the map is NULL during suspend or resume
As described in commit 38d11da522aa ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522aa but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount(). It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
Fixes: 38d11da522aa ("dm: don't lock fs when the map is NULL in process of resume") Signed-off-by: Li Lingfeng <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
Íñigo Huguet [Tue, 13 Jun 2023 13:38:54 +0000 (15:38 +0200)]
sfc: fix XDP queues mode with legacy IRQ
In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated
in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not
changed to a proper default value. This was leading to the driver
thinking that it has dedicated XDP queues, when it didn't.
Fix it by setting xdp_txq_queues_mode to the correct value if the driver
fallbacks to MSI or legacy IRQ mode. The correct value is
EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues.
The issue can be easily visible if the kernel is started with pci=nomsi,
then a call trace is shown. It is not shown only with sfc's modparam
interrupt_mode=2. Call trace example:
WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc]
[...skip...]
Call Trace:
<TASK>
efx_set_channels+0x5c/0xc0 [sfc]
efx_probe_nic+0x9b/0x15a [sfc]
efx_probe_all+0x10/0x1a2 [sfc]
efx_pci_probe_main+0x12/0x156 [sfc]
efx_pci_probe_post_io+0x18/0x103 [sfc]
efx_pci_probe.cold+0x154/0x257 [sfc]
local_pci_probe+0x42/0x80
Fedor Pchelkin [Tue, 13 Jun 2023 19:22:20 +0000 (22:22 +0300)]
net: macsec: fix double free of percpu stats
Inside macsec_add_dev() we free percpu macsec->secy.tx_sc.stats and
macsec->stats on some of the memory allocation failure paths. However, the
net_device is already registered to that moment: in macsec_newlink(), just
before calling macsec_add_dev(). This means that during unregister process
its priv_destructor - macsec_free_netdev() - will be called and will free
the stats again.
Remove freeing percpu stats inside macsec_add_dev() because
macsec_free_netdev() will correctly free the already allocated ones. The
pointers to unallocated stats stay NULL, and free_percpu() treats that
correctly.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Fedor Pchelkin <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jan Karcher [Wed, 14 Jun 2023 06:54:56 +0000 (08:54 +0200)]
MAINTAINERS: add reviewers for SMC Sockets
adding three people from Alibaba as reviewers for SMC.
They are currently working on improving SMC on other architectures than
s390 and help with reviewing patches on top.
Thank you D. Wythe, Tony Lu and Wen Gu for your contributions and
collaboration and welcome on board as reviewers!
Julian Ruess [Tue, 13 Jun 2023 12:25:37 +0000 (14:25 +0200)]
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
This patch prevents the system from crashing when unloading the ISM module.
How to reproduce: Attach an ISM device and execute 'rmmod ism'.
Error-Log:
- Trying to free already-free IRQ 0
- WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
After calling ism_dev_exit() for each ISM device in the exit routine,
pci_unregister_driver() will execute ism_remove() for each ISM device.
Because ism_remove() also calls ism_dev_exit(),
free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
device. This results in a crash with the error
'Trying to free already-free IRQ'.
In the exit routine, it is enough to call pci_unregister_driver()
because it ensures that ism_dev_exit() is called once per
ISM device.
The debugfs_create_dir() returns ERR_PTR in case of an error and the
correct way of checking it is using the IS_ERR_OR_NULL inline function
rather than the simple null comparision. This patch fixes the issue.
Hongchen Zhang [Thu, 15 Jun 2023 06:35:52 +0000 (14:35 +0800)]
LoongArch: Let pmd_present() return true when splitting pmd
When we split a pmd into ptes, pmd_present() and pmd_trans_huge() should
return true, otherwise it would be treated as a swap pmd.
This is the same as arm64 does in commit b65399f6111b ("arm64/mm: Change
THP helpers to comply with generic MM semantics"), we also add a new bit
named _PAGE_PRESENT_INVALID for LoongArch.
Vladimir Oltean [Tue, 13 Jun 2023 17:09:07 +0000 (20:09 +0300)]
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
The DEV_MAC_MAXLEN_CFG register contains a 16-bit value - up to 65535.
Plus 2 * VLAN_HLEN (4), that is up to 65543.
The picos_per_byte variable is the largest when "speed" is lowest -
SPEED_10 = 10. In that case it is (1000000L * 8) / 10 = 800000.
Their product - 52434400000 - exceeds 32 bits, which is a problem,
because apparently, a multiplication between two 32-bit factors is
evaluated as 32-bit before being assigned to a 64-bit variable.
In fact it's a problem for any MTU value larger than 5368.
Cast one of the factors of the multiplication to u64 to force the
multiplication to take place on 64 bits.
Vlad Buslov [Mon, 12 Jun 2023 09:34:26 +0000 (11:34 +0200)]
net/sched: cls_api: Fix lockup on flushing explicitly created chain
Mingshuai Ren reports:
When a new chain is added by using tc, one soft lockup alarm will be
generated after delete the prio 0 filter of the chain. To reproduce
the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:
Fix the issue by accounting for additional reference to chains that are
explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
RTM_NEWTFILTER message.
Jakub Buchocki [Mon, 12 Jun 2023 17:14:21 +0000 (10:14 -0700)]
ice: Fix ice module unload
Clearing the interrupt scheme before PFR reset,
during the removal routine, could cause the hardware
errors and possibly lead to system reboot, as the PF
reset can cause the interrupt to be generated.
Place the call for PFR reset inside ice_deinit_dev(),
wait until reset and all pending transactions are done,
then call ice_clear_interrupt_scheme().
This introduces a PFR reset to multiple error paths.
Additionally, remove the call for the reset from
ice_load() - it will be a part of ice_unload() now.
Jakub Kicinski [Thu, 15 Jun 2023 05:36:53 +0000 (22:36 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)
This series contains updates to igc and igb drivers.
Husaini clears Tx rings when interface is brought down for igc.
Vinicius disables PTM and PCI busmaster when removing igc driver.
Alex adds error check and path for NVM read error on igb.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igb: fix nvm.ops.read() error handling
igc: Fix possible system crash when loading module
igc: Clean the TX buffer and TX descriptor ring
====================
One can enable CONFIG_NET_HANDSHAKE_KUNIT_TEST config to reproduce above
crash.
The root cause of this bug is that the commit 1ce77c998f04
("net/handshake: Unpin sock->file if a handshake is cancelled") adds one
additional fput() function. That patch claims that the fput() is used to
enable sock->file to be freed even when user space never calls DONE.
However, it seems that the intended DONE routine will never give an
additional fput() of ths sock->file. The existing two of them are just
used to balance the reference added in sockfd_lookup().
This patch revert the mentioned commit to avoid the use-after-free. The
patched kernel could successfully pass the KUNIT test and boot to shell.
[ 0.733613] # Subtest: Handshake API tests
[ 0.734029] 1..11
[ 0.734255] KTAP version 1
[ 0.734542] # Subtest: req_alloc API fuzzing
[ 0.736104] ok 1 handshake_req_alloc NULL proto
[ 0.736114] ok 2 handshake_req_alloc CLASS_NONE
[ 0.736559] ok 3 handshake_req_alloc CLASS_MAX
[ 0.737020] ok 4 handshake_req_alloc no callbacks
[ 0.737488] ok 5 handshake_req_alloc no done callback
[ 0.737988] ok 6 handshake_req_alloc excessive privsize
[ 0.738529] ok 7 handshake_req_alloc all good
[ 0.739036] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[ 0.739444] ok 1 req_alloc API fuzzing
[ 0.740065] ok 2 req_submit NULL req arg
[ 0.740436] ok 3 req_submit NULL sock arg
[ 0.740834] ok 4 req_submit NULL sock->file
[ 0.741236] ok 5 req_lookup works
[ 0.741621] ok 6 req_submit max pending
[ 0.741974] ok 7 req_submit multiple
[ 0.742382] ok 8 req_cancel before accept
[ 0.742764] ok 9 req_cancel after accept
[ 0.743151] ok 10 req_cancel after done
[ 0.743510] ok 11 req_destroy works
[ 0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
[ 0.744205] # Totals: pass:17 fail:0 skip:0 total:17
Jakub Kicinski [Thu, 15 Jun 2023 04:28:59 +0000 (21:28 -0700)]
Merge tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A couple of straggler fixes, mostly in the stack:
- fix fragmentation for multi-link related elements
- fix callback copy/paste error
- fix multi-link locking
- remove double-locking of wiphy mutex
- transmit only on active links, not all
- activate links in the correct order
- don't remove links that weren't added
- disable soft-IRQs for LQ lock in iwlwifi
* tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
wifi: mac80211: fragment per STA profile correctly
wifi: mac80211: Use active_links instead of valid_links in Tx
wifi: cfg80211: remove links only on AP
wifi: mac80211: take lock before setting vif links
wifi: cfg80211: fix link del callback to call correct handler
wifi: mac80211: fix link activation settings order
wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
====================
ext4: drop the call to ext4_error() from ext4_get_group_info()
A recent patch added a call to ext4_error() which is problematic since
some callers of the ext4_get_group_info() function may be holding a
spinlock, whereas ext4_error() must never be called in atomic context.
This triggered a report from Syzbot: "BUG: sleeping function called from
invalid context in ext4_update_super" (see the link below).
Therefore, drop the call to ext4_error() from ext4_get_group_info(). In
the meantime use eight characters tabs instead of nine characters ones.
The reverted commit was intended to simpfy the code to get group
descriptor block number in non-meta block group by assuming
s_gdb_count is block number used for all non-meta block group descriptors.
However s_gdb_count is block number used for all meta *and* non-meta
group descriptors. So s_gdb_group will be > actual group descriptor block
number used for all non-meta block group which should be "total non-meta
block group" / "group descriptors per block", e.g. s_first_meta_bg.
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
As reported by Thomas Voegtle <[email protected]>, sometimes a DVB card does
not initialize properly booting Linux 6.4-rc4. This is not always, maybe
in 3 out of 4 attempts.
After double-checking, the root cause seems to be related to the
UAF fix, which is causing a race issue:
Bob Pearson [Mon, 12 Jun 2023 15:50:33 +0000 (10:50 -0500)]
RDMA/rxe: Fix rxe_cq_post
A recent patch replaced a tasklet execution of cq->comp_handler by a
direct call. While this made sense it let changes to cq->notify state be
unprotected and assumed that the cq completion machinery and the ulp done
callbacks were reentrant. The result is that in some cases completion
events can be lost. This patch moves the cq->comp_handler call inside of
the spinlock in rxe_cq_post which solves both issues. This is compatible
with the matching code in the request notify verb.
Steve French [Sun, 11 Jun 2023 16:23:32 +0000 (11:23 -0500)]
cifs: fix lease break oops in xfstest generic/098
umount can race with lease break so need to check if
tcon->ses->server is still valid to send the lease
break response.
Reviewed-by: Bharath SM <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> Fixes: 59a556aebc43 ("SMB3: drop reference to cfile before sending oplock break") Signed-off-by: Steve French <[email protected]>
Danielle Ratson [Mon, 12 Jun 2023 14:34:58 +0000 (16:34 +0200)]
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
Setting the IPv6 address generation mode of a net device during its
creation never worked, but after commit b0ad3c179059 ("rtnetlink: call
validate_linkmsg in rtnl_create_link") it explicitly fails [1]. The
failure is caused by the fact that validate_linkmsg() is called before
the net device is registered, when it still does not have an 'inet6_dev'.
Likewise, raising the net device before setting the address generation
mode is meaningless, because by the time the mode is set, the address
has already been generated.
Therefore, fix the test to first create the net device, then set its
IPv6 address generation mode and finally bring it up.
[1]
# ip link add name mydev addrgenmode eui64 type dummy
RTNETLINK answers: Address family not supported by protocol
====================
net/sched: Fix race conditions in mini_qdisc_pair_swap()
These 2 patches fix race conditions for ingress and clsact Qdiscs as
reported [1] by syzbot, split out from another [2] series (last 2 patches
of it). Per-patch changelog omitted.
Patch 1 hasn't been touched since last version; I just included
everybody's tag.
Patch 2 bases on patch 6 v1 of [2], with comments and commit log slightly
changed. We also need rtnl_dereference() to load ->qdisc_sleeping since
commit d636fc5dd692 ("net: sched: add rcu annotations around
qdisc->qdisc_sleeping"), so I changed that; please take yet another look,
thanks!
Patch 2 has been tested with the new reproducer Pedro posted [3].
Peilin Ye [Sun, 11 Jun 2023 03:30:25 +0000 (20:30 -0700)]
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap(). Similar
for clsact Qdiscs and miniq_egress.
Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:
@old and @new should not affect each other. In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.
Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions). Instead, make sure @new's first call
always happen after @old's last call (in {ingress,clsact}_destroy()) has
finished:
In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.
Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests. Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.
Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".
[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:
Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
then adds a flower filter X to A.
Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
b2) to replace A, then adds a flower filter Y to B.
Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().
This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.
Paul Blakey [Fri, 9 Jun 2023 12:22:59 +0000 (15:22 +0300)]
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table which then skips conntrack_in(), and thus such the
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Even still, the unoffloaded reply packets are the ones
triggering hardware update from new to established state, and if
there aren't any to triger an update and/or previous update was
missed, hardware can get out of sync with sw and still mark
packets as new.
Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
refresh for hardware, as before the cited patch.
2) Try and force a refresh by reply-direction packets that update
the hardware rules from new to established state.
3) Remove any bidirectional flows that didn't failed to update in
hardware for re-insertion as bidrectional once any new packet
arrives.
Hugh Dickins [Fri, 9 Jun 2023 21:29:39 +0000 (14:29 -0700)]
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.4.0-rc5 #1 Not tainted
-----------------------------------------------------
kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7
and this task is already holding: ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
which would create a new lock dependency:
(&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&sta->rate_ctrl_lock){+.-.}-{2:2}
etc. etc. etc.
Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
enough to pacify lockdep, but changing them all on pers.lock has worked.
Vlad Buslov [Mon, 12 Jun 2023 07:57:12 +0000 (09:57 +0200)]
selftests/tc-testing: Remove configs that no longer exist
Some qdiscs and classifiers have recently been retired from kernel.
However, tc-testing config is still cluttered with them which causes noise
when using merge_config.sh script to update existing config for tc-testing
compatibility. Remove the config settings for affected qdiscs and
classifiers.
Vlad Buslov [Mon, 12 Jun 2023 07:57:11 +0000 (09:57 +0200)]
selftests/tc-testing: Fix SFB db test
Setting very small value of db like 10ms introduces rounding errors when
converting to/from jiffies on some kernel configs. For example, on 250hz
the actual value will be set to 12ms which causes the test to fail:
# $ sudo ./tdc.py -d eth2 -e 3410
# -- ns/SubPlugin.__init__
# Test 3410: Create SFB with db setting
#
# All test results:
#
# 1..1
# not ok 1 3410 - Create SFB with db setting
# Could not match regex pattern. Verify command output:
# qdisc sfb 1: root refcnt 2 rehash 600s db 12ms limit 1000p max 25p target 20p increment 0.000503548 decrement 4.57771e-05 penalty_rate 10pps penalty_burst 20p
Set the value to 100ms instead which currently seem to work on 100hz,
250hz, 300hz and 1000hz kernel configs.
Fixes: 6ad92dc56fca ("selftests/tc-testing: add selftests for sfb qdisc") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Pedro Tammela <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>