Paolo Abeni [Thu, 22 Feb 2024 09:04:46 +0000 (10:04 +0100)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-02-22
The following pull-request contains BPF updates for your *net* tree.
We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).
The main changes are:
1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
page through bpf_probe_read_kernel and friends, from Hou Tao.
2) Fix a kernel panic due to uninitialized iter position pointer in
bpf_iter_task, from Yafang Shao.
3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
from Martin KaFai Lau.
4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
due to incorrect truesize accounting, from Sebastian Andrzej Siewior.
5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
from Shigeru Yoshida.
6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
resolved, from Hari Bathini.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
selftests/bpf: Add negtive test cases for task iter
bpf: Fix an issue due to uninitialized bpf_iter_task
selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
selftest/bpf: Test the read of vsyscall page under x86-64
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
bpf, scripts: Correct GPL license name
xsk: Add truesize to skb_add_rx_frag().
bpf: Fix warning for bpf_cpumask in verifier
====================
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
Commit bb726b753f75 ("net: phy: realtek: add support for
RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing
support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY.
While that commit indicated that the RTL8211F_PHYCR2 register is not
supported by the "VD-CG" PHY model and therefore updated the corresponding
section in rtl8211f_config_init() to be invoked conditionally, the call to
"genphy_soft_reset()" was left as-is, when it should have also been invoked
conditionally. This is because the call to "genphy_soft_reset()" was first
introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property
to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY
reset should be issued after setting bits in the PHYCR2 register.
As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the
rtl8211f_config_init() function by invoking "genphy_soft_reset()"
conditionally based on the presence of the "PHYCR2" register.
Justin Iurman [Mon, 19 Feb 2024 13:52:55 +0000 (14:52 +0100)]
selftests: ioam: refactoring to align with the fix
ioam6_parser uses a packet socket. After the fix to prevent writing to
cloned skb's, the receiver does not see its IOAM data anymore, which
makes input/forward ioam-selftests to fail. As a workaround,
ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to
get hop-by-hop options. As a consequence, the hook is "after" the IOAM
data insertion by the receiver and all tests are working again.
Justin Iurman [Mon, 19 Feb 2024 13:52:54 +0000 (14:52 +0100)]
Fix write to cloned skb in ipv6_hop_ioam()
ioam6_fill_trace_data() writes inside the skb payload without ensuring
it's writeable (e.g., not cloned). This function is called both from the
input and output path. The output path (ioam6_iptunnel) already does the
check. This commit provides a fix for the input path, inside
ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network
header pointer ("nh") when returning from ipv6_hop_ioam().
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Paolo Abeni <[email protected]> Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.
Horatiu Vultur [Mon, 19 Feb 2024 08:00:43 +0000 (09:00 +0100)]
net: sparx5: Add spinlock for frame transmission from CPU
Both registers used when doing manual injection or fdma injection are
shared between all the net devices of the switch. It was noticed that
when having two process which each of them trying to inject frames on
different ethernet ports, that the HW started to behave strange, by
sending out more frames then expected. When doing fdma injection it is
required to set the frame in the DCB and then make sure that the next
pointer of the last DCB is invalid. But because there is no locks for
this, then easily this pointer between the DCB can be broken and then it
would create a loop of DCBs. And that means that the HW will
continuously transmit these frames in a loop. Until the SW will break
this loop.
Therefore to fix this issue, add a spin lock for when accessing the
registers for manual or fdma injection.
Jiri Pirko [Tue, 20 Feb 2024 07:52:45 +0000 (08:52 +0100)]
devlink: fix port dump cmd type
Unlike other commands, due to a c&p error, port dump fills-up cmd with
wrong value, different from port-get request cmd, port-get doit reply
and port notification.
Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.
Skimmed through devlink userspace implementations, none of them cares
about this cmd value. Only ynl, for which, this is actually a fix, as it
expects doit and dumpit ops rsp_value to be the same.
Omit the fixes tag, even thought this is fix, better to target this for
next release.
Jakub Kicinski [Tue, 20 Feb 2024 16:11:12 +0000 (08:11 -0800)]
tools: ynl: don't leak mcast_groups on init error
Make sure to free the already-parsed mcast_groups if
we don't get an ack from the kernel when reading family info.
This is part of the ynl_sock_create() error path, so we won't
get a call to ynl_sock_destroy() to free them later.
Jakub Kicinski [Tue, 20 Feb 2024 16:11:11 +0000 (08:11 -0800)]
tools: ynl: make sure we always pass yarg to mnl_cb_run
There is one common error handler in ynl - ynl_cb_error().
It expects priv to be a pointer to struct ynl_parse_arg AKA yarg.
To avoid potential crashes if we encounter a stray NLMSG_ERROR
always pass yarg as priv (or a struct which has it as the first
member).
ynl_cb_null() has a similar problem directly - it expects yarg
but priv passed by the caller is ys.
Jeremy Kerr [Thu, 15 Feb 2024 07:53:08 +0000 (15:53 +0800)]
net: mctp: put sock on tag allocation failure
We may hold an extra reference on a socket if a tag allocation fails: we
optimistically allocate the sk_key, and take a ref there, but do not
drop if we end up not using the allocated key.
Ensure we're dropping the sock on this failure by doing a proper unref
rather than directly kfree()ing.
====================
tls: fixes for record type handling with PEEK
There are multiple bugs in tls_sw_recvmsg's handling of record types
when MSG_PEEK flag is used, which can lead to incorrectly merging two
records:
- consecutive non-DATA records shouldn't be merged, even if they're
the same type (partly handled by the test at the end of the main
loop)
- records of the same type (even DATA) shouldn't be merged if one
record of a different type comes in between
====================
Sabrina Dubroca [Thu, 15 Feb 2024 16:17:33 +0000 (17:17 +0100)]
selftests: tls: add test for peeking past a record of a different type
If we queue 3 records:
- record 1, type DATA
- record 2, some other type
- record 3, type DATA
the current code can look past the 2nd record and merge the 2 data
records.
Sabrina Dubroca [Thu, 15 Feb 2024 16:17:31 +0000 (17:17 +0100)]
tls: don't skip over different type records from the rx_list
If we queue 3 records:
- record 1, type DATA
- record 2, some other type
- record 3, type DATA
and do a recv(PEEK), the rx_list will contain the first two records.
The next large recv will walk through the rx_list and copy data from
record 1, then stop because record 2 is a different type. Since we
haven't filled up our buffer, we will process the next available
record. It's also DATA, so we can merge it with the current read.
We shouldn't do that, since there was a record in between that we
ignored.
Add a flag to let process_rx_list inform tls_sw_recvmsg that it had
more data available.
Sabrina Dubroca [Thu, 15 Feb 2024 16:17:30 +0000 (17:17 +0100)]
tls: stop recv() if initial process_rx_list gave us non-DATA
If we have a non-DATA record on the rx_list and another record of the
same type still on the queue, we will end up merging them:
- process_rx_list copies the non-DATA record
- we start the loop and process the first available record since it's
of the same type
- we break out of the loop since the record was not DATA
Just check the record type and jump to the end in case process_rx_list
did some work.
Sabrina Dubroca [Thu, 15 Feb 2024 16:17:29 +0000 (17:17 +0100)]
tls: break out of main loop when PEEK gets a non-data record
PEEK needs to leave decrypted records on the rx_list so that we can
receive them later on, so it jumps back into the async code that
queues the skb. Unfortunately that makes us skip the
TLS_RECORD_TYPE_DATA check at the bottom of the main loop, so if two
records of the same (non-DATA) type are queued, we end up merging
them.
Add the same record type check, and make it unlikely to not penalize
the async fastpath. Async decrypt only applies to data record, so this
check is only needed for PEEK.
If sk_psock_verdict_data_ready() and sk_psock_stop_verdict() are called
concurrently, psock->saved_data_ready can be NULL, causing the above issue.
This patch fixes this issue by calling the appropriate data ready function
using the sk_psock_data_ready() helper and protecting it from concurrency
with sk->sk_callback_lock.
Simon Horman [Mon, 19 Feb 2024 17:55:31 +0000 (17:55 +0000)]
MAINTAINERS: Add framer headers to NETWORKING [GENERAL]
The cited commit [1] added framer support under drivers/net/wan,
which is covered by NETWORKING [GENERAL]. And it is implied
that framer-provider.h and framer.h, which were also added
buy the same patch, are also maintained as part of NETWORKING [GENERAL].
Make this explicit by adding these files to the corresponding
section in MAINTAINERS.
af_unix: Drop oob_skb ref before purging queue in GC.
syzbot reported another task hung in __unix_gc(). [0]
The current while loop assumes that all of the left candidates
have oob_skb and calling kfree_skb(oob_skb) releases the remaining
candidates.
However, I missed a case that oob_skb has self-referencing fd and
another fd and the latter sk is placed before the former in the
candidate list. Then, the while loop never proceeds, resulting
the task hung.
__unix_gc() has the same loop just before purging the collected skb,
so we can call kfree_skb(oob_skb) there and let __skb_queue_purge()
release all inflight sockets.
In newer hardware, IPA supports more than 32 endpoints. Some
registers--such as IPA interrupt registers--represent endpoints
as bits in a 4-byte register, and such registers are repeated as
needed to represent endpoints beyond the first 32.
In ipa_interrupt_suspend_clear_all(), we clear all pending IPA
suspend interrupts by reading all status register(s) and writing
corresponding registers to clear interrupt conditions.
Unfortunately the number of registers to read/write is calculated
incorrectly, and as a result we access *many* more registers than
intended. This bug occurs only when the IPA hardware signals a
SUSPEND interrupt, which happens when a packet is received for an
endpoint (or its underlying GSI channel) that is suspended. This
situation is difficult to reproduce, but possible.
Fix this by correctly computing the number of interrupt registers to
read and write. This is the only place in the code where registers
that map endpoints or channels this way perform this calculation.
Fixes: f298ba785e2d ("net: ipa: add a parameter to suspend registers") Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Mon, 19 Feb 2024 14:12:20 +0000 (14:12 +0000)]
net: implement lockless setsockopt(SO_PEEK_OFF)
syzbot reported a lockdep violation [1] involving af_unix
support of SO_PEEK_OFF.
Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
sk_peek_off field), there is really no point to enforce a pointless
thread safety in the kernel.
After this patch :
- setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.
- skb_consume_udp() no longer has to acquire the socket lock.
- af_unix no longer needs a special version of sk_set_peek_off(),
because it does not lock u->iolock anymore.
As a followup, we could replace prot->set_peek_off to be a boolean
and avoid an indirect call, since we always use sk_set_peek_off().
AF reserves MCAM entries for each PF, VF present in the
system and populates the entry with DMAC and action with
default RSS so that basic packet I/O works. Since PF/VF is
not aware of the RSS action installed by AF, AF only fixup
the actions of the rules installed by PF/VF with corresponding
default RSS action. This worked well for rules installed by
PF/VF for features like RX VLAN offload and DMAC filters but
rules involving action like drop/forward to queue are also
getting modified by AF. Hence fix it by setting the default
RSS action only if requested by PF/VF.
Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Signed-off-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
syzkaller reported an overflown write in arp_req_get(). [0]
When ioctl(SIOCGARP) is issued, arp_req_get() looks up an neighbour
entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.
The arp_ha here is struct sockaddr, not struct sockaddr_storage, so
the sa_data buffer is just 14 bytes.
In the splat below, 2 bytes are overflown to the next int field,
arp_flags. We initialise the field just after the memcpy(), so it's
not a problem.
However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN),
arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL)
in arp_ioctl() before calling arp_req_get().
To avoid the overflow, let's limit the max length of memcpy().
Note that commit b5f0de6df6dc ("net: dev: Convert sa_data to flexible
array in struct sockaddr") just silenced syzkaller.
Yafang Shao [Sat, 17 Feb 2024 11:41:52 +0000 (19:41 +0800)]
selftests/bpf: Add negtive test cases for task iter
Incorporate a test case to assess the handling of invalid flags or
task__nullable parameters passed to bpf_iter_task_new(). Prior to the
preceding commit, this scenario could potentially trigger a kernel panic.
However, with the previous commit, this test case is expected to function
correctly.
Yafang Shao [Sat, 17 Feb 2024 11:41:51 +0000 (19:41 +0800)]
bpf: Fix an issue due to uninitialized bpf_iter_task
Failure to initialize it->pos, coupled with the presence of an invalid
value in the flags variable, can lead to it->pos referencing an invalid
task, potentially resulting in a kernel panic. To mitigate this risk, it's
crucial to ensure proper initialization of it->pos to NULL.
Martin KaFai Lau [Thu, 15 Feb 2024 21:12:18 +0000 (13:12 -0800)]
selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
This selftest is based on a Alexei's test adopted from an internal
user to troubleshoot another bug. During this exercise, a separate
racing bug was discovered between bpf_timer_cancel_and_free
and bpf_timer_cancel. The details can be found in the previous
patch.
This patch is to add a selftest that can trigger the bug.
I can trigger the UAF everytime in my qemu setup with KASAN. The idea
is to have multiple user space threads running in a tight loop to exercise
both bpf_map_update_elem (which calls into bpf_timer_cancel_and_free)
and bpf_timer_cancel.
In bpf_timer_cancel_and_free, this patch frees the timer->timer
after a rcu grace period. This requires a rcu_head addition
to the "struct bpf_hrtimer". Another kfree(t) happens in bpf_timer_init,
this does not need a kfree_rcu because it is still under the
spin_lock and timer->timer has not been visible by others yet.
In bpf_timer_cancel, rcu_read_lock() is added because this helper
can be used in a non rcu critical section context (e.g. from
a sleepable bpf prog). Other timer->timer usages in helpers.c
have been audited, bpf_timer_cancel() is the only place where
timer->timer is used outside of the spin_lock.
Another solution considered is to mark a t->flag in bpf_timer_cancel
and clear it after hrtimer_cancel() is done. In bpf_timer_cancel_and_free,
it busy waits for the flag to be cleared before kfree(t). This patch
goes with a straight forward solution and frees timer->timer after
a rcu grace period.
Kees Cook [Fri, 16 Feb 2024 23:30:05 +0000 (15:30 -0800)]
enic: Avoid false positive under FORTIFY_SOURCE
FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
code base has been converted to flexible arrays. In order to enforce
the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
destinations need to be handled. Unfortunately, struct vic_provinfo
resists full conversion, as it contains a flexible array of flexible
arrays, which is only possible with the 0-sized fake flexible array.
Use unsafe_memcpy() to avoid future false positives under
CONFIG_FORTIFY_SOURCE.
Hangbin Liu [Thu, 15 Feb 2024 02:33:25 +0000 (10:33 +0800)]
selftests: bonding: set active slave to primary eth1 specifically
In bond priority testing, we set the primary interface to eth1 and add
eth0,1,2 to bond in serial. This is OK in normal times. But when in
debug kernel, the bridge port that eth0,1,2 connected would start
slowly (enter blocking, forwarding state), which caused the primary
interface down for a while after enslaving and active slave changed.
Here is a test log from Jakub's debug test[1].
[ 400.399070][ T50] br0: port 1(s0) entered disabled state
[ 400.400168][ T50] br0: port 4(s2) entered disabled state
[ 400.941504][ T2791] bond0: (slave eth0): making interface the new active one
[ 400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
[ 400.943633][ T2766] br0: port 1(s0) entered blocking state
[ 400.944119][ T2766] br0: port 1(s0) entered forwarding state
[ 401.128792][ T2792] bond0: (slave eth1): making interface the new active one
[ 401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
[ 401.131643][ T69] br0: port 2(s1) entered blocking state
[ 401.132067][ T69] br0: port 2(s1) entered forwarding state
[ 401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
[ 401.348414][ T50] br0: port 4(s2) entered blocking state
[ 401.348857][ T50] br0: port 4(s2) entered forwarding state
[ 401.519669][ T250] bond0: (slave eth0): link status definitely down, disabling slave
[ 401.526522][ T250] bond0: (slave eth1): link status definitely down, disabling slave
[ 401.526986][ T250] bond0: (slave eth2): making interface the new active one
[ 401.629470][ T250] bond0: (slave eth0): link status definitely up
[ 401.630089][ T250] bond0: (slave eth1): link status definitely up
[...]
# TEST: prio (active-backup ns_ip6_target primary_reselect 1) [FAIL]
# Current active slave is eth2 but not eth1
Fix it by setting active slave to primary slave specifically before
testing.
Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Sun, 18 Feb 2024 11:32:10 +0000 (11:32 +0000)]
Merge branch 'bcmasp-fixes'
Justin Chen says:
====================
net: bcmasp: bug fixes for bcmasp
Fix two bugs.
- Indicate that PM is managed by mac to prevent double pm calls. This
doesn't lead to a crash, but waste a noticable amount of time
suspending/resuming.
- Sanity check for OOB write was off by one. Leading to a false error
when using the full array.
====================
Florian Fainelli [Thu, 15 Feb 2024 18:27:31 +0000 (10:27 -0800)]
net: bcmasp: Indicate MAC is in charge of PHY PM
Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The ASP driver
essentially does exactly what mdio_bus_phy_resume() does.
Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Justin Chen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Sun, 18 Feb 2024 10:25:01 +0000 (10:25 +0000)]
Merge branch 'mptcp-fixes'
Matthieu Baerts says:
====================
mptcp: misc. fixes for v6.8
This series includes 4 types of fixes:
Patches 1 and 2 force the path-managers not to allocate a new address
entry when dealing with the "special" ID 0, reserved to the address of
the initial subflow. These patches can be backported up to v5.19 and
v5.12 respectively.
Patch 3 to 6 fix the in-kernel path-manager not to create duplicated
subflows. Patch 6 is the main fix, but patches 3 to 5 are some kind of
pre-requisities: they fix some data races that could also lead to the
creation of unexpected subflows. These patches can be backported up to
v5.7, v5.10, v6.0, and v5.15 respectively.
Note that patch 3 modifies the existing ULP API. No better solutions
have been found for -net, and there is some similar prior art, see
commit 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info"). Please
also note that TLS ULP Diag has likely the same issue.
Patches 7 to 9 fix issues in the selftests, when executing them on older
kernels, e.g. when testing the last version of these kselftests on the
v5.15.148 kernel as it is done by LKFT when validating stable kernels.
These patches only avoid printing expected errors the console and
marking some tests as "OK" while they have been skipped. Patches 7 and 8
can be backported up to v6.6.
Patches 10 to 13 make sure all MPTCP selftests subtests have a unique
name. It is important to have a unique (sub)test name in TAP, because
that's the test identifier. Some CI environments might drop tests with
duplicated names. Patches 10 to 12 can be backported up to v6.6.
====================
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.
Some 'cestab' subtests from the diag selftest had the same names, e.g.:
....chk 0 cestab
Now the previous value is taken, to have different names, e.g.:
....chk 2->0 cestab after flush
While at it, the 'after flush' info is added, similar to what is done
with the 'in use' subtests. Also inspired by these 'in use' subtests,
'many' is displayed instead of a large number:
many msk socket present [ ok ]
....chk many msk in use [ ok ]
....chk many cestab [ ok ]
....chk many->0 msk in use after flush [ ok ]
....chk many->0 cestab after flush [ ok ]
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated names.
Some subtests from the userspace_pm selftest had the same names. That's
because different subflows are created (and deleted) between the same
pair of IP addresses.
Simply adding the destination port in the name is then enough to have
different names, because the destination port is always different.
Note that adding such info takes a bit more space, so we need to
increase a bit the width to print the name, simply to keep all the
'[ OK ]' aligned as before.
selftests: mptcp: diag: fix bash warnings on older kernels
Since the 'Fixes' commit mentioned below, the command that is executed
in __chk_nr() helper can return nothing if the feature is not supported.
This is the case when the MPTCP CURRESTAB counter is not supported.
To avoid this warning ...
./diag.sh: line 65: [: !=: unary operator expected
... we just need to surround '$nr' with double quotes, to support an
empty string when the feature is not supported.
selftests: mptcp: pm nl: avoid error msg on older kernels
Since the 'Fixes' commit mentioned below, and if the kernel being tested
doesn't support the 'fullmesh' flag, this error will be printed:
netlink error -22 (Invalid argument)
./pm_nl_ctl: bailing out due to netlink error[s]
But that can be normal if the kernel doesn't support the feature, no
need to print this worrying error message while everything else looks
OK. So we can mute stderr. Failures will still be detected if any.
Paolo Abeni [Thu, 15 Feb 2024 18:25:33 +0000 (19:25 +0100)]
mptcp: fix duplicate subflow creation
Fullmesh endpoints could end-up unexpectedly generating duplicate
subflows - same local and remote addresses - when multiple incoming
ADD_ADDR are processed before the PM creates the subflow for the local
endpoints.
Address the issue explicitly checking for duplicates at subflow
creation time.
To avoid a quadratic computational complexity, track the unavailable
remote address ids in a temporary bitmap and initialize such bitmap
with the remote ids of all the existing subflows matching the local
address currently processed.
The above allows additionally replacing the existing code checking
for duplicate entry in the current set with a simple bit test
operation.
Fixes: 2843ff6f36db ("mptcp: remote addresses fullmesh") Cc: [email protected] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/435 Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Thu, 15 Feb 2024 18:25:31 +0000 (19:25 +0100)]
mptcp: fix data races on local_id
The local address id is accessed lockless by the NL PM, add
all the required ONCE annotation. There is a caveat: the local
id can be initialized late in the subflow life-cycle, and its
validity is controlled by the local_id_valid flag.
Remove such flag and encode the validity in the local_id field
itself with negative value before initialization. That allows
accessing the field consistently with a single read operation.
Paolo Abeni [Thu, 15 Feb 2024 18:25:30 +0000 (19:25 +0100)]
mptcp: fix lockless access in subflow ULP diag
Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.
Geliang Tang [Thu, 15 Feb 2024 18:25:28 +0000 (19:25 +0100)]
mptcp: add needs_id for userspace appending addr
When userspace PM requires to create an ID 0 subflow in "userspace pm
create id 0 subflow" test like this:
userspace_pm_add_sf $ns2 10.0.3.2 0
An ID 1 subflow, in fact, is created.
Since in mptcp_pm_nl_append_new_local_addr(), 'id 0' will be treated as
no ID is set by userspace, and will allocate a new ID immediately:
if (!e->addr.id)
e->addr.id = find_next_zero_bit(pernet->id_bitmap,
MPTCP_PM_MAX_ADDR_ID + 1,
1);
To solve this issue, a new parameter needs_id is added for
mptcp_userspace_pm_append_new_local_addr() to distinguish between
whether userspace PM has set an ID 0 or whether userspace PM has
not set any address.
needs_id is true in mptcp_userspace_pm_get_local_id(), but false in
mptcp_pm_nl_announce_doit() and mptcp_pm_nl_subflow_create_doit().
Pavel Sakharov [Wed, 14 Feb 2024 09:27:17 +0000 (12:27 +0300)]
net: stmmac: Fix incorrect dereference in interrupt handlers
If 'dev' or 'data' is NULL, the 'priv' variable has an incorrect address
when dereferencing calling netdev_err().
Since we get as 'dev_id' or 'data' what was passed as the 'dev' argument
to request_irq() during interrupt initialization (that is, the net_device
and rx/tx queue pointers initialized at the time of the call) and since
there are usually no checks for the 'dev_id' argument in such handlers
in other drivers, remove these checks from the handlers in stmmac driver.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX") Signed-off-by: Pavel Sakharov <[email protected]> Reviewed-by: Serge Semin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Thu, 15 Feb 2024 14:33:46 +0000 (06:33 -0800)]
net/sched: act_mirred: don't override retval if we already lost the skb
If we're redirecting the skb, and haven't called tcf_mirred_forward(),
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to UaF.
Move the retval override to the error path which actually need it.
Jakub Kicinski [Thu, 15 Feb 2024 14:33:45 +0000 (06:33 -0800)]
net/sched: act_mirred: use the backlog for mirred ingress
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.
In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
Randy Dunlap [Thu, 15 Feb 2024 07:00:50 +0000 (23:00 -0800)]
net: ethernet: adi: requires PHYLIB support
This driver uses functions that are supplied by the Kconfig symbol
PHYLIB, so select it to ensure that they are built as needed.
When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
(linker) errors that are resolved by this Kconfig change:
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
However, the syzkaller's log hinted that connect() failed just before
the warning due to FAULT_INJECTION. [1]
When connect() is called for an unbound socket, we search for an
available ephemeral port. If a bhash bucket exists for the port, we
call __inet_check_established() or __inet6_check_established() to check
if the bucket is reusable.
If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.
Later, we look up the corresponding bhash2 bucket and try to allocate
it if it does not exist.
Although it rarely occurs in real use, if the allocation fails, we must
revert the changes by check_established(). Otherwise, an unconnected
socket could illegally occupy an ehash entry.
Note that we do not put tw back into ehash because sk might have
already responded to a packet for tw and it would be better to free
tw earlier under such memory presure.
David S. Miller [Fri, 16 Feb 2024 09:36:38 +0000 (09:36 +0000)]
Merge branch 'bridge-mdb-events'
Tobias Waldekranz says:
====================
net: bridge: switchdev: Ensure MDB events are delivered exactly once
When a device is attached to a bridge, drivers will request a replay
of objects that were created before the device joined the bridge, that
are still of interest to the joining port. Typical examples include
FDB entries and MDB memberships on other ports ("foreign interfaces")
or on the bridge itself.
Conversely when a device is detached, the bridge will synthesize
deletion events for all those objects that are still live, but no
longer applicable to the device in question.
This series eliminates two races related to the synching and
unsynching phases of a bridge's MDB with a joining or leaving device,
that would cause notifications of such objects to be either delivered
twice (1/2), or not at all (2/2).
A similar race to the one solved by 1/2 still remains for the
FDB. This is much harder to solve, due to the lockless operation of
the FDB's rhashtable, and is therefore knowingly left out of this
series.
v1 -> v2:
- Squash the previously separate addition of
switchdev_port_obj_act_is_deferred into first consumer.
- Use ether_addr_equal to compare MAC addresses.
- Document switchdev_port_obj_act_is_deferred (renamed from
switchdev_port_obj_is_deferred in v1, to indicate that we also match
on the action).
- Delay allocations of MDB objects until we know they're needed.
- Use non-RCU version of the hash list iterator, now that the MDB is
not scanned while holding the RCU read lock.
- Add Fixes tag to commit message
v2 -> v3:
- Fix unlocking in error paths
- Access RCU protected port list via mlock_dereference, since MDB is
guaranteed to remain constant for the duration of the scan.
v3 -> v4:
- Limit the search for exiting deferred events in 1/2 to only apply to
additions, since the problem does not exist in the deletion case.
- Add 2/2, to plug a related race when unoffloading an indirectly
associated device.
v4 -> v5:
- Fix grammatical errors in kerneldoc of
switchdev_port_obj_act_is_deferred
====================
net: bridge: switchdev: Ensure deferred event delivery on unoffload
When unoffloading a device, it is important to ensure that all
relevant deferred events are delivered to it before it disassociates
itself from the bridge.
Before this change, this was true for the normal case when a device
maps 1:1 to a net_bridge_port, i.e.
br0
/
swp0
When swp0 leaves br0, the call to switchdev_deferred_process() in
del_nbp() makes sure to process any outstanding events while the
device is still associated with the bridge.
In the case when the association is indirect though, i.e. when the
device is attached to the bridge via an intermediate device, like a
LAG...
br0
/
lag0
/
swp0
...then detaching swp0 from lag0 does not cause any net_bridge_port to
be deleted, so there was no guarantee that all events had been
processed before the device disassociated itself from the bridge.
Fix this by always synchronously processing all deferred events before
signaling completion of unoffloading back to the driver.
Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.
Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay
list.
This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:
2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions.
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/iucv: fix the allocation size of iucv_path_table array
iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.
As reported by syzboot [1] and [2], when trying to read vsyscall page
by using bpf_probe_read_kernel() or bpf_probe_read(), oops may happen.
Thomas Gleixner had proposed a test patch [3], but it seems that no
formal patch is posted after about one month [4], so I post it instead
and add an Originally-by tag in patch #2.
Patch #1 makes is_vsyscall_vaddr() being a common helper. Patch #2 fixes
the problem by disallowing vsyscall page read for
copy_from_kernel_nofault(). Patch #3 adds one test case to ensure the
read of vsyscall page through bpf is rejected. Please see individual
patches for more details.
Change Log:
v3:
* rephrase commit message for patch #1 & #2 (Sohil)
* reword comments in copy_from_kernel_nofault_allowed() (Sohil)
* add Rvb tag for patch #1 and Acked-by tag for patch #3 (Sohil, Yonghong)
v2: https://lore.kernel.org/bpf/20240126115423.3943360[email protected]/
* move is_vsyscall_vaddr to asm/vsyscall.h instead (Sohil)
* elaborate on the reason for disallowing of vsyscall page read in
copy_from_kernel_nofault_allowed() (Sohil)
* update the commit message of patch #2 to more clearly explain how
the oops occurs. (Sohil)
* update the commit message of patch #3 to explain the expected return
values of various bpf helpers (Yonghong)
Hou Tao [Fri, 2 Feb 2024 10:39:35 +0000 (18:39 +0800)]
selftest/bpf: Test the read of vsyscall page under x86-64
Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
so add one test case to ensure that the problem is fixed. Beside those
four bpf helpers mentioned above, testing the read of vsyscall page by
using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.
The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:
1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:
The occurrence of oops depends on the availability of CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total
of six possible combinations. Under all these combinations, the test
case runs successfully.
1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().
2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.
3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up setup by bpf not being applied and a page_fault_oops() is invoked
due to SMAP.
Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().
Linus Torvalds [Thu, 15 Feb 2024 19:39:27 +0000 (11:39 -0800)]
Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, wireless and netfilter.
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being
deleted
Previous releases - regressions:
- netfilter:
- nat: restore default DNAT behavior
- nf_tables: fix bidirectional offload, broken when unidirectional
offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively set
MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields, fix
races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag when some
of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- a handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION()
warnings in networking"
* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
net: fill in MODULE_DESCRIPTION()s for missing arcnet
net: fill in MODULE_DESCRIPTION()s for mdio_devres
net: fill in MODULE_DESCRIPTION()s for ppp
net: fill in MODULE_DESCRIPTION()s for fddik/skfp
net: fill in MODULE_DESCRIPTION()s for plip
net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
net: fill in MODULE_DESCRIPTION()s for xen-netback
net: ravb: Count packets instead of descriptors in GbEth RX path
pppoe: Fix memory leak in pppoe_sendmsg()
net: sctp: fix skb leak in sctp_inq_free()
net: bcmasp: Handle RX buffer allocation failure
net-timestamp: make sk_tskey more predictable in error path
selftests: tls: increase the wait in poll_partial_rec_async
ice: Add check for lport extraction to LAG init
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
igc: Remove temporary workaround
igb: Fix string truncation warnings in igb_set_fw_version
can: netlink: Fix TDCO calculation using the old data bittiming
...
Linus Torvalds [Thu, 15 Feb 2024 19:33:35 +0000 (11:33 -0800)]
Merge tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Fixes and simple cleanups:
- use a proper flexible array instead of a one-element array in order
to avoid array-bounds sanitizer errors
- add NULL pointer checks after allocating memory
- use memdup_array_user() instead of open-coding it
- fix a rare race condition in Xen event channel allocation code
- make struct bus_type instances const
- make kerneldoc inline comments match reality"
* tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: close evtchn after mapping cleanup
xen/gntalloc: Replace UAPI 1-element array
xen: balloon: make balloon_subsys const
xen: pcpu: make xen_pcpu_subsys const
xen/privcmd: Use memdup_array_user() in alloc_ioreq()
x86/xen: Add some null pointer checking to smp.c
xen/xenbus: document will_handle argument for xenbus_watch_path()
Linus Torvalds [Thu, 15 Feb 2024 19:14:33 +0000 (11:14 -0800)]
update workarounds for gcc "asm goto" issue
In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with
outputs") I did the gcc workaround unconditionally, because the cause of
the bad code generation wasn't entirely clear.
In the meantime, Jakub Jelinek debugged the issue, and has come up with
a fix in gcc [2], which also got backported to the still maintained
branches of gcc-11, gcc-12 and gcc-13.
Note that while the fix technically wasn't in the original gcc-14
branch, Jakub says:
"while it is true that no GCC 14 snapshots until today (or whenever the
fix will be committed) have the fix, for GCC trunk it is up to the
distros to use the latest snapshot if they use it at all and would
allow better testing of the kernel code without the workaround, so
that if there are other issues they won't be discovered years later.
Most userland code doesn't actually use asm goto with outputs..."
so we will consider gcc-14 to be fixed - if somebody is using gcc
snapshots of the gcc-14 before the fix, they should upgrade.
Note that while the bug goes back to gcc-11, in practice other gcc
changes seem to have effectively hidden it since gcc-12.1 as per a
bisect by Jakub. So even a gcc-14 snapshot without the fix likely
doesn't show actual problems.
Also, make the default 'asm_goto_output()' macro mark the asm as
volatile by hand, because of an unrelated gcc issue [1] where it doesn't
match the documented behavior ("asm goto is always volatile").
Linus Torvalds [Thu, 15 Feb 2024 18:19:55 +0000 (10:19 -0800)]
Merge tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Improve devlink dependency parsing for DT graphs
- Fix devlink handling of io-channels dependencies
- Fix PCI addressing in marvell,prestera example
- A few schema fixes for property constraints
- Improve performance of DT unprobed devices kselftest
- Fix regression in DT_SCHEMA_FILES handling
- Fix compile error in unittest for !OF_DYNAMIC
* tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
of: property: Improve finding the supplier of a remote-endpoint property
of: property: Improve finding the consumer of a remote-endpoint property
net: marvell,prestera: Fix example PCI bus addressing
of: unittest: Fix compile in the non-dynamic case
of: property: fix typo in io-channels
dt-bindings: tpm: Drop type from "resets"
dt-bindings: display: nxp,tda998x: Fix 'audio-ports' constraints
dt-bindings: xilinx: replace Piyush Mehta maintainership
kselftest: dt: Stop relying on dirname to improve performance
dt-bindings: don't anchor DT_SCHEMA_FILES to bindings directory
Linus Torvalds [Thu, 15 Feb 2024 17:13:12 +0000 (09:13 -0800)]
Merge tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A smallish collection of fixes for SPI, all driver specific, plus one
device ID addition for a new Intel part.
The ppc4xx isn't routinely covered by most of the automated testing so
there were some errors that were missed in some of the recent API
conversions, otherwise there's nothing super remarkable here"
* tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi-mxs: Fix chipselect glitch
spi: intel-pci: Add support for Lunar Lake-M SPI serial flash
spi: omap2-mcspi: Revert FIFO support without DMA
spi: ppc4xx: Drop write-only variable
spi: ppc4xx: Fix fallout from rename in struct spi_bitbang
spi: ppc4xx: Fix fallout from include cleanup
spi: spi-ppc4xx: include missing platform_device.h
spi: imx: fix the burst length at DMA mode and CPU mode
Linus Torvalds [Thu, 15 Feb 2024 17:11:06 +0000 (09:11 -0800)]
Merge tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap test fixes from Mark Brown:
"Guenter runs a lot of KUnit tests so noticed that there were a couple
of the regmap tests, including the newly added noinc test, which could
show spurious failures due to the use of randomly generated test
values. These changes handle the randomly generated data properly"
* tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: kunit: Ensure that changed bytes are actually different
regmap: kunit: fix raw noinc write test wrapping
Linus Torvalds [Thu, 15 Feb 2024 17:08:19 +0000 (09:08 -0800)]
Merge tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fix for 'MSC_SERIAL = 0' corner case handling in wacom driver (Jason
Gerecke)
- ACPI S3 suspend/resume fix for intel-ish-hid (Even Xu)
- race condition fix preventing Wacom driver from losing events shortly
after initialization (Jason Gerecke)
- fix preventing certain Logitech HID++ devices from spamming kernel
log (Oleksandr Natalenko)
* tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
HID: multitouch: Add required quirk for Synaptics 0xcddc device
HID: wacom: Do not register input devices until after hid_hw_start
HID: logitech-hidpp: Do not flood kernel log
Jakub Kicinski [Thu, 15 Feb 2024 16:03:49 +0000 (08:03 -0800)]
Merge branch 'fix-module_description-for-net-p6'
Breno Leitao says:
====================
Fix MODULE_DESCRIPTION() for net (p6)
There are a few network modules left that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/....
This last patchset solves the problem for all the missing driver. It is
not expect to see any warning for the driver/net and net/ directory once
all these patches have landed.
Gavrilov Ilia [Wed, 14 Feb 2024 09:01:50 +0000 (09:01 +0000)]
pppoe: Fix memory leak in pppoe_sendmsg()
syzbot reports a memory leak in pppoe_sendmsg [1].
The problem is in the pppoe_recvmsg() function that handles errors
in the wrong order. For the skb_recv_datagram() function, check
the pointer to skb for NULL first, and then check the 'error' variable,
because the skb_recv_datagram() function can set 'error'
to -EAGAIN in a loop but return a correct pointer to socket buffer
after a number of attempts, though 'error' remains set to -EAGAIN.
skb_recv_datagram
__skb_recv_datagram // Loop. if (err == -EAGAIN) then
// go to the next loop iteration
__skb_try_recv_datagram // if (skb != NULL) then return 'skb'
// else if a signal is received then
// return -EAGAIN
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
Dmitry Antipov [Wed, 14 Feb 2024 08:22:24 +0000 (11:22 +0300)]
net: sctp: fix skb leak in sctp_inq_free()
In case of GSO, 'chunk->skb' pointer may point to an entry from
fraglist created in 'sctp_packet_gso_append()'. To avoid freeing
random fraglist entry (and so undefined behavior and/or memory
leak), introduce 'sctp_inq_chunk_free()' helper to ensure that
'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head)
before calling 'sctp_chunk_free()', and use the aforementioned
helper in 'sctp_inq_pop()' as well.
Florian Fainelli [Tue, 13 Feb 2024 17:33:39 +0000 (09:33 -0800)]
net: bcmasp: Handle RX buffer allocation failure
The buffer_pg variable needs to hold an order-5 allocation (32 x
PAGE_SIZE) which, under memory pressure may fail to be allocated. Deal
with that error condition properly to avoid doing a NULL pointer
de-reference in the subsequent call to dma_map_page().
In addition, the err_reclaim_tx error label in bcmasp_netif_init() needs
to ensure that the TX NAPI object is properly deleted, otherwise
unregister_netdev() will spin forever attempting to test and clear
the NAPI_STATE_HASHED bit.
Paolo Abeni [Thu, 15 Feb 2024 11:31:22 +0000 (12:31 +0100)]
Merge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-02-14
this is a pull request of 3 patches for net/master.
the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it
fixes a potential deadlock by replacing the spinlock by an rwlock.
Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).
Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).
* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: netlink: Fix TDCO calculation using the old data bittiming
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================
Vadim Fedorenko [Tue, 13 Feb 2024 11:04:28 +0000 (03:04 -0800)]
net-timestamp: make sk_tskey more predictable in error path
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams,
the sk_tskey can become unpredictable in case of any error happened
during sendmsg(). Move increment later in the code and make decrement of
sk_tskey in error path. This solution is still racy in case of multiple
threads doing snedmsg() over the very same socket in parallel, but still
makes error path much more predictable.
Jakub Kicinski [Tue, 13 Feb 2024 14:20:55 +0000 (06:20 -0800)]
selftests: tls: increase the wait in poll_partial_rec_async
Test runners on debug kernels occasionally fail with:
# # RUN tls_err.13_aes_gcm.poll_partial_rec_async ...
# # tls.c:1883:poll_partial_rec_async:Expected poll(&pfd, 1, 5) (0) == 1 (1)
# # tls.c:1870:poll_partial_rec_async:Expected status (256) == 0 (0)
# # poll_partial_rec_async: Test failed at step #17
# # FAIL tls_err.13_aes_gcm.poll_partial_rec_async
# not ok 699 tls_err.13_aes_gcm.poll_partial_rec_async
# # FAILED: 698 / 699 tests passed.
This points to the second poll() in the test which is expected
to wait for the sender to send the rest of the data.
Apparently under some conditions that doesn't happen within 5ms,
bump the timeout to 20ms.
Dave Ertman [Tue, 13 Feb 2024 18:39:55 +0000 (10:39 -0800)]
ice: Add check for lport extraction to LAG init
To fully support initializing the LAG support code, a DDP package that
extracts the logical port from the metadata is required. If such a
package is not present, there could be difficulties in supporting some
bond types.
Add a check into the initialization flow that will bypass the new paths
if any of the support pieces are missing.
Jakub Kicinski [Thu, 15 Feb 2024 01:32:37 +0000 (17:32 -0800)]
Merge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Valentine's day edition, with just few fixes because
that's how we love it ;-)
iwlwifi:
- correct A3 in A-MSDUs
- fix crash when operating as AP and running out of station
slots to use
- clear link ID to correct some later checks against it
- fix error codes in SAR table loading
- fix error path in PPAG table read
mac80211:
- reload a pointer after SKB may have changed
(only in certain monitor inject mode scenarios)
* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
wifi: iwlwifi: Fix some error codes
wifi: iwlwifi: clear link_id in time_event
wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================
Linus Torvalds [Thu, 15 Feb 2024 00:06:31 +0000 (16:06 -0800)]
Merge tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix for broken ipv6 checksums
- Fix handling of exceptions in delay slots
* tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mm/memory: Use exception ip to search exception tables
MIPS: Clear Cause.BD in instruction_pointer_set
ptrace: Introduce exception_ip arch hook
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Linus Torvalds [Thu, 15 Feb 2024 00:02:36 +0000 (16:02 -0800)]
Merge tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock test fixes from Mickaël Salaün:
"Fix build issues for tests, and improve test compatibility"
* tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Fix capability for net_test
selftests/landlock: Fix fs_test build with old libc
selftests/landlock: Fix net_test build with old libc