Florian Fainelli [Thu, 15 Feb 2024 18:27:31 +0000 (10:27 -0800)]
net: bcmasp: Indicate MAC is in charge of PHY PM
Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The ASP driver
essentially does exactly what mdio_bus_phy_resume() does.
Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Justin Chen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Sun, 18 Feb 2024 10:25:01 +0000 (10:25 +0000)]
Merge branch 'mptcp-fixes'
Matthieu Baerts says:
====================
mptcp: misc. fixes for v6.8
This series includes 4 types of fixes:
Patches 1 and 2 force the path-managers not to allocate a new address
entry when dealing with the "special" ID 0, reserved to the address of
the initial subflow. These patches can be backported up to v5.19 and
v5.12 respectively.
Patch 3 to 6 fix the in-kernel path-manager not to create duplicated
subflows. Patch 6 is the main fix, but patches 3 to 5 are some kind of
pre-requisities: they fix some data races that could also lead to the
creation of unexpected subflows. These patches can be backported up to
v5.7, v5.10, v6.0, and v5.15 respectively.
Note that patch 3 modifies the existing ULP API. No better solutions
have been found for -net, and there is some similar prior art, see
commit 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info"). Please
also note that TLS ULP Diag has likely the same issue.
Patches 7 to 9 fix issues in the selftests, when executing them on older
kernels, e.g. when testing the last version of these kselftests on the
v5.15.148 kernel as it is done by LKFT when validating stable kernels.
These patches only avoid printing expected errors the console and
marking some tests as "OK" while they have been skipped. Patches 7 and 8
can be backported up to v6.6.
Patches 10 to 13 make sure all MPTCP selftests subtests have a unique
name. It is important to have a unique (sub)test name in TAP, because
that's the test identifier. Some CI environments might drop tests with
duplicated names. Patches 10 to 12 can be backported up to v6.6.
====================
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.
Some 'cestab' subtests from the diag selftest had the same names, e.g.:
....chk 0 cestab
Now the previous value is taken, to have different names, e.g.:
....chk 2->0 cestab after flush
While at it, the 'after flush' info is added, similar to what is done
with the 'in use' subtests. Also inspired by these 'in use' subtests,
'many' is displayed instead of a large number:
many msk socket present [ ok ]
....chk many msk in use [ ok ]
....chk many cestab [ ok ]
....chk many->0 msk in use after flush [ ok ]
....chk many->0 cestab after flush [ ok ]
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated names.
Some subtests from the userspace_pm selftest had the same names. That's
because different subflows are created (and deleted) between the same
pair of IP addresses.
Simply adding the destination port in the name is then enough to have
different names, because the destination port is always different.
Note that adding such info takes a bit more space, so we need to
increase a bit the width to print the name, simply to keep all the
'[ OK ]' aligned as before.
selftests: mptcp: diag: fix bash warnings on older kernels
Since the 'Fixes' commit mentioned below, the command that is executed
in __chk_nr() helper can return nothing if the feature is not supported.
This is the case when the MPTCP CURRESTAB counter is not supported.
To avoid this warning ...
./diag.sh: line 65: [: !=: unary operator expected
... we just need to surround '$nr' with double quotes, to support an
empty string when the feature is not supported.
selftests: mptcp: pm nl: avoid error msg on older kernels
Since the 'Fixes' commit mentioned below, and if the kernel being tested
doesn't support the 'fullmesh' flag, this error will be printed:
netlink error -22 (Invalid argument)
./pm_nl_ctl: bailing out due to netlink error[s]
But that can be normal if the kernel doesn't support the feature, no
need to print this worrying error message while everything else looks
OK. So we can mute stderr. Failures will still be detected if any.
Paolo Abeni [Thu, 15 Feb 2024 18:25:33 +0000 (19:25 +0100)]
mptcp: fix duplicate subflow creation
Fullmesh endpoints could end-up unexpectedly generating duplicate
subflows - same local and remote addresses - when multiple incoming
ADD_ADDR are processed before the PM creates the subflow for the local
endpoints.
Address the issue explicitly checking for duplicates at subflow
creation time.
To avoid a quadratic computational complexity, track the unavailable
remote address ids in a temporary bitmap and initialize such bitmap
with the remote ids of all the existing subflows matching the local
address currently processed.
The above allows additionally replacing the existing code checking
for duplicate entry in the current set with a simple bit test
operation.
Fixes: 2843ff6f36db ("mptcp: remote addresses fullmesh") Cc: [email protected] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/435 Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Thu, 15 Feb 2024 18:25:31 +0000 (19:25 +0100)]
mptcp: fix data races on local_id
The local address id is accessed lockless by the NL PM, add
all the required ONCE annotation. There is a caveat: the local
id can be initialized late in the subflow life-cycle, and its
validity is controlled by the local_id_valid flag.
Remove such flag and encode the validity in the local_id field
itself with negative value before initialization. That allows
accessing the field consistently with a single read operation.
Paolo Abeni [Thu, 15 Feb 2024 18:25:30 +0000 (19:25 +0100)]
mptcp: fix lockless access in subflow ULP diag
Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.
Geliang Tang [Thu, 15 Feb 2024 18:25:28 +0000 (19:25 +0100)]
mptcp: add needs_id for userspace appending addr
When userspace PM requires to create an ID 0 subflow in "userspace pm
create id 0 subflow" test like this:
userspace_pm_add_sf $ns2 10.0.3.2 0
An ID 1 subflow, in fact, is created.
Since in mptcp_pm_nl_append_new_local_addr(), 'id 0' will be treated as
no ID is set by userspace, and will allocate a new ID immediately:
if (!e->addr.id)
e->addr.id = find_next_zero_bit(pernet->id_bitmap,
MPTCP_PM_MAX_ADDR_ID + 1,
1);
To solve this issue, a new parameter needs_id is added for
mptcp_userspace_pm_append_new_local_addr() to distinguish between
whether userspace PM has set an ID 0 or whether userspace PM has
not set any address.
needs_id is true in mptcp_userspace_pm_get_local_id(), but false in
mptcp_pm_nl_announce_doit() and mptcp_pm_nl_subflow_create_doit().
Pavel Sakharov [Wed, 14 Feb 2024 09:27:17 +0000 (12:27 +0300)]
net: stmmac: Fix incorrect dereference in interrupt handlers
If 'dev' or 'data' is NULL, the 'priv' variable has an incorrect address
when dereferencing calling netdev_err().
Since we get as 'dev_id' or 'data' what was passed as the 'dev' argument
to request_irq() during interrupt initialization (that is, the net_device
and rx/tx queue pointers initialized at the time of the call) and since
there are usually no checks for the 'dev_id' argument in such handlers
in other drivers, remove these checks from the handlers in stmmac driver.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX") Signed-off-by: Pavel Sakharov <[email protected]> Reviewed-by: Serge Semin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Thu, 15 Feb 2024 14:33:46 +0000 (06:33 -0800)]
net/sched: act_mirred: don't override retval if we already lost the skb
If we're redirecting the skb, and haven't called tcf_mirred_forward(),
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to UaF.
Move the retval override to the error path which actually need it.
Jakub Kicinski [Thu, 15 Feb 2024 14:33:45 +0000 (06:33 -0800)]
net/sched: act_mirred: use the backlog for mirred ingress
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.
In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
Randy Dunlap [Thu, 15 Feb 2024 07:00:50 +0000 (23:00 -0800)]
net: ethernet: adi: requires PHYLIB support
This driver uses functions that are supplied by the Kconfig symbol
PHYLIB, so select it to ensure that they are built as needed.
When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
(linker) errors that are resolved by this Kconfig change:
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
However, the syzkaller's log hinted that connect() failed just before
the warning due to FAULT_INJECTION. [1]
When connect() is called for an unbound socket, we search for an
available ephemeral port. If a bhash bucket exists for the port, we
call __inet_check_established() or __inet6_check_established() to check
if the bucket is reusable.
If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.
Later, we look up the corresponding bhash2 bucket and try to allocate
it if it does not exist.
Although it rarely occurs in real use, if the allocation fails, we must
revert the changes by check_established(). Otherwise, an unconnected
socket could illegally occupy an ehash entry.
Note that we do not put tw back into ehash because sk might have
already responded to a packet for tw and it would be better to free
tw earlier under such memory presure.
David S. Miller [Fri, 16 Feb 2024 09:36:38 +0000 (09:36 +0000)]
Merge branch 'bridge-mdb-events'
Tobias Waldekranz says:
====================
net: bridge: switchdev: Ensure MDB events are delivered exactly once
When a device is attached to a bridge, drivers will request a replay
of objects that were created before the device joined the bridge, that
are still of interest to the joining port. Typical examples include
FDB entries and MDB memberships on other ports ("foreign interfaces")
or on the bridge itself.
Conversely when a device is detached, the bridge will synthesize
deletion events for all those objects that are still live, but no
longer applicable to the device in question.
This series eliminates two races related to the synching and
unsynching phases of a bridge's MDB with a joining or leaving device,
that would cause notifications of such objects to be either delivered
twice (1/2), or not at all (2/2).
A similar race to the one solved by 1/2 still remains for the
FDB. This is much harder to solve, due to the lockless operation of
the FDB's rhashtable, and is therefore knowingly left out of this
series.
v1 -> v2:
- Squash the previously separate addition of
switchdev_port_obj_act_is_deferred into first consumer.
- Use ether_addr_equal to compare MAC addresses.
- Document switchdev_port_obj_act_is_deferred (renamed from
switchdev_port_obj_is_deferred in v1, to indicate that we also match
on the action).
- Delay allocations of MDB objects until we know they're needed.
- Use non-RCU version of the hash list iterator, now that the MDB is
not scanned while holding the RCU read lock.
- Add Fixes tag to commit message
v2 -> v3:
- Fix unlocking in error paths
- Access RCU protected port list via mlock_dereference, since MDB is
guaranteed to remain constant for the duration of the scan.
v3 -> v4:
- Limit the search for exiting deferred events in 1/2 to only apply to
additions, since the problem does not exist in the deletion case.
- Add 2/2, to plug a related race when unoffloading an indirectly
associated device.
v4 -> v5:
- Fix grammatical errors in kerneldoc of
switchdev_port_obj_act_is_deferred
====================
net: bridge: switchdev: Ensure deferred event delivery on unoffload
When unoffloading a device, it is important to ensure that all
relevant deferred events are delivered to it before it disassociates
itself from the bridge.
Before this change, this was true for the normal case when a device
maps 1:1 to a net_bridge_port, i.e.
br0
/
swp0
When swp0 leaves br0, the call to switchdev_deferred_process() in
del_nbp() makes sure to process any outstanding events while the
device is still associated with the bridge.
In the case when the association is indirect though, i.e. when the
device is attached to the bridge via an intermediate device, like a
LAG...
br0
/
lag0
/
swp0
...then detaching swp0 from lag0 does not cause any net_bridge_port to
be deleted, so there was no guarantee that all events had been
processed before the device disassociated itself from the bridge.
Fix this by always synchronously processing all deferred events before
signaling completion of unoffloading back to the driver.
Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.
Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay
list.
This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:
2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions.
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/iucv: fix the allocation size of iucv_path_table array
iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.
Linus Torvalds [Thu, 15 Feb 2024 19:39:27 +0000 (11:39 -0800)]
Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, wireless and netfilter.
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being
deleted
Previous releases - regressions:
- netfilter:
- nat: restore default DNAT behavior
- nf_tables: fix bidirectional offload, broken when unidirectional
offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively set
MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields, fix
races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag when some
of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- a handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION()
warnings in networking"
* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
net: fill in MODULE_DESCRIPTION()s for missing arcnet
net: fill in MODULE_DESCRIPTION()s for mdio_devres
net: fill in MODULE_DESCRIPTION()s for ppp
net: fill in MODULE_DESCRIPTION()s for fddik/skfp
net: fill in MODULE_DESCRIPTION()s for plip
net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
net: fill in MODULE_DESCRIPTION()s for xen-netback
net: ravb: Count packets instead of descriptors in GbEth RX path
pppoe: Fix memory leak in pppoe_sendmsg()
net: sctp: fix skb leak in sctp_inq_free()
net: bcmasp: Handle RX buffer allocation failure
net-timestamp: make sk_tskey more predictable in error path
selftests: tls: increase the wait in poll_partial_rec_async
ice: Add check for lport extraction to LAG init
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
igc: Remove temporary workaround
igb: Fix string truncation warnings in igb_set_fw_version
can: netlink: Fix TDCO calculation using the old data bittiming
...
Linus Torvalds [Thu, 15 Feb 2024 19:33:35 +0000 (11:33 -0800)]
Merge tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Fixes and simple cleanups:
- use a proper flexible array instead of a one-element array in order
to avoid array-bounds sanitizer errors
- add NULL pointer checks after allocating memory
- use memdup_array_user() instead of open-coding it
- fix a rare race condition in Xen event channel allocation code
- make struct bus_type instances const
- make kerneldoc inline comments match reality"
* tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: close evtchn after mapping cleanup
xen/gntalloc: Replace UAPI 1-element array
xen: balloon: make balloon_subsys const
xen: pcpu: make xen_pcpu_subsys const
xen/privcmd: Use memdup_array_user() in alloc_ioreq()
x86/xen: Add some null pointer checking to smp.c
xen/xenbus: document will_handle argument for xenbus_watch_path()
Linus Torvalds [Thu, 15 Feb 2024 19:14:33 +0000 (11:14 -0800)]
update workarounds for gcc "asm goto" issue
In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with
outputs") I did the gcc workaround unconditionally, because the cause of
the bad code generation wasn't entirely clear.
In the meantime, Jakub Jelinek debugged the issue, and has come up with
a fix in gcc [2], which also got backported to the still maintained
branches of gcc-11, gcc-12 and gcc-13.
Note that while the fix technically wasn't in the original gcc-14
branch, Jakub says:
"while it is true that no GCC 14 snapshots until today (or whenever the
fix will be committed) have the fix, for GCC trunk it is up to the
distros to use the latest snapshot if they use it at all and would
allow better testing of the kernel code without the workaround, so
that if there are other issues they won't be discovered years later.
Most userland code doesn't actually use asm goto with outputs..."
so we will consider gcc-14 to be fixed - if somebody is using gcc
snapshots of the gcc-14 before the fix, they should upgrade.
Note that while the bug goes back to gcc-11, in practice other gcc
changes seem to have effectively hidden it since gcc-12.1 as per a
bisect by Jakub. So even a gcc-14 snapshot without the fix likely
doesn't show actual problems.
Also, make the default 'asm_goto_output()' macro mark the asm as
volatile by hand, because of an unrelated gcc issue [1] where it doesn't
match the documented behavior ("asm goto is always volatile").
Linus Torvalds [Thu, 15 Feb 2024 18:19:55 +0000 (10:19 -0800)]
Merge tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Improve devlink dependency parsing for DT graphs
- Fix devlink handling of io-channels dependencies
- Fix PCI addressing in marvell,prestera example
- A few schema fixes for property constraints
- Improve performance of DT unprobed devices kselftest
- Fix regression in DT_SCHEMA_FILES handling
- Fix compile error in unittest for !OF_DYNAMIC
* tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
of: property: Improve finding the supplier of a remote-endpoint property
of: property: Improve finding the consumer of a remote-endpoint property
net: marvell,prestera: Fix example PCI bus addressing
of: unittest: Fix compile in the non-dynamic case
of: property: fix typo in io-channels
dt-bindings: tpm: Drop type from "resets"
dt-bindings: display: nxp,tda998x: Fix 'audio-ports' constraints
dt-bindings: xilinx: replace Piyush Mehta maintainership
kselftest: dt: Stop relying on dirname to improve performance
dt-bindings: don't anchor DT_SCHEMA_FILES to bindings directory
Linus Torvalds [Thu, 15 Feb 2024 17:13:12 +0000 (09:13 -0800)]
Merge tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A smallish collection of fixes for SPI, all driver specific, plus one
device ID addition for a new Intel part.
The ppc4xx isn't routinely covered by most of the automated testing so
there were some errors that were missed in some of the recent API
conversions, otherwise there's nothing super remarkable here"
* tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi-mxs: Fix chipselect glitch
spi: intel-pci: Add support for Lunar Lake-M SPI serial flash
spi: omap2-mcspi: Revert FIFO support without DMA
spi: ppc4xx: Drop write-only variable
spi: ppc4xx: Fix fallout from rename in struct spi_bitbang
spi: ppc4xx: Fix fallout from include cleanup
spi: spi-ppc4xx: include missing platform_device.h
spi: imx: fix the burst length at DMA mode and CPU mode
Linus Torvalds [Thu, 15 Feb 2024 17:11:06 +0000 (09:11 -0800)]
Merge tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap test fixes from Mark Brown:
"Guenter runs a lot of KUnit tests so noticed that there were a couple
of the regmap tests, including the newly added noinc test, which could
show spurious failures due to the use of randomly generated test
values. These changes handle the randomly generated data properly"
* tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: kunit: Ensure that changed bytes are actually different
regmap: kunit: fix raw noinc write test wrapping
Linus Torvalds [Thu, 15 Feb 2024 17:08:19 +0000 (09:08 -0800)]
Merge tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fix for 'MSC_SERIAL = 0' corner case handling in wacom driver (Jason
Gerecke)
- ACPI S3 suspend/resume fix for intel-ish-hid (Even Xu)
- race condition fix preventing Wacom driver from losing events shortly
after initialization (Jason Gerecke)
- fix preventing certain Logitech HID++ devices from spamming kernel
log (Oleksandr Natalenko)
* tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
HID: multitouch: Add required quirk for Synaptics 0xcddc device
HID: wacom: Do not register input devices until after hid_hw_start
HID: logitech-hidpp: Do not flood kernel log
Jakub Kicinski [Thu, 15 Feb 2024 16:03:49 +0000 (08:03 -0800)]
Merge branch 'fix-module_description-for-net-p6'
Breno Leitao says:
====================
Fix MODULE_DESCRIPTION() for net (p6)
There are a few network modules left that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/....
This last patchset solves the problem for all the missing driver. It is
not expect to see any warning for the driver/net and net/ directory once
all these patches have landed.
Gavrilov Ilia [Wed, 14 Feb 2024 09:01:50 +0000 (09:01 +0000)]
pppoe: Fix memory leak in pppoe_sendmsg()
syzbot reports a memory leak in pppoe_sendmsg [1].
The problem is in the pppoe_recvmsg() function that handles errors
in the wrong order. For the skb_recv_datagram() function, check
the pointer to skb for NULL first, and then check the 'error' variable,
because the skb_recv_datagram() function can set 'error'
to -EAGAIN in a loop but return a correct pointer to socket buffer
after a number of attempts, though 'error' remains set to -EAGAIN.
skb_recv_datagram
__skb_recv_datagram // Loop. if (err == -EAGAIN) then
// go to the next loop iteration
__skb_try_recv_datagram // if (skb != NULL) then return 'skb'
// else if a signal is received then
// return -EAGAIN
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
Dmitry Antipov [Wed, 14 Feb 2024 08:22:24 +0000 (11:22 +0300)]
net: sctp: fix skb leak in sctp_inq_free()
In case of GSO, 'chunk->skb' pointer may point to an entry from
fraglist created in 'sctp_packet_gso_append()'. To avoid freeing
random fraglist entry (and so undefined behavior and/or memory
leak), introduce 'sctp_inq_chunk_free()' helper to ensure that
'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head)
before calling 'sctp_chunk_free()', and use the aforementioned
helper in 'sctp_inq_pop()' as well.
Florian Fainelli [Tue, 13 Feb 2024 17:33:39 +0000 (09:33 -0800)]
net: bcmasp: Handle RX buffer allocation failure
The buffer_pg variable needs to hold an order-5 allocation (32 x
PAGE_SIZE) which, under memory pressure may fail to be allocated. Deal
with that error condition properly to avoid doing a NULL pointer
de-reference in the subsequent call to dma_map_page().
In addition, the err_reclaim_tx error label in bcmasp_netif_init() needs
to ensure that the TX NAPI object is properly deleted, otherwise
unregister_netdev() will spin forever attempting to test and clear
the NAPI_STATE_HASHED bit.
Paolo Abeni [Thu, 15 Feb 2024 11:31:22 +0000 (12:31 +0100)]
Merge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-02-14
this is a pull request of 3 patches for net/master.
the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it
fixes a potential deadlock by replacing the spinlock by an rwlock.
Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).
Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).
* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: netlink: Fix TDCO calculation using the old data bittiming
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================
Vadim Fedorenko [Tue, 13 Feb 2024 11:04:28 +0000 (03:04 -0800)]
net-timestamp: make sk_tskey more predictable in error path
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams,
the sk_tskey can become unpredictable in case of any error happened
during sendmsg(). Move increment later in the code and make decrement of
sk_tskey in error path. This solution is still racy in case of multiple
threads doing snedmsg() over the very same socket in parallel, but still
makes error path much more predictable.
Jakub Kicinski [Tue, 13 Feb 2024 14:20:55 +0000 (06:20 -0800)]
selftests: tls: increase the wait in poll_partial_rec_async
Test runners on debug kernels occasionally fail with:
# # RUN tls_err.13_aes_gcm.poll_partial_rec_async ...
# # tls.c:1883:poll_partial_rec_async:Expected poll(&pfd, 1, 5) (0) == 1 (1)
# # tls.c:1870:poll_partial_rec_async:Expected status (256) == 0 (0)
# # poll_partial_rec_async: Test failed at step #17
# # FAIL tls_err.13_aes_gcm.poll_partial_rec_async
# not ok 699 tls_err.13_aes_gcm.poll_partial_rec_async
# # FAILED: 698 / 699 tests passed.
This points to the second poll() in the test which is expected
to wait for the sender to send the rest of the data.
Apparently under some conditions that doesn't happen within 5ms,
bump the timeout to 20ms.
Dave Ertman [Tue, 13 Feb 2024 18:39:55 +0000 (10:39 -0800)]
ice: Add check for lport extraction to LAG init
To fully support initializing the LAG support code, a DDP package that
extracts the logical port from the metadata is required. If such a
package is not present, there could be difficulties in supporting some
bond types.
Add a check into the initialization flow that will bypass the new paths
if any of the support pieces are missing.
Jakub Kicinski [Thu, 15 Feb 2024 01:32:37 +0000 (17:32 -0800)]
Merge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Valentine's day edition, with just few fixes because
that's how we love it ;-)
iwlwifi:
- correct A3 in A-MSDUs
- fix crash when operating as AP and running out of station
slots to use
- clear link ID to correct some later checks against it
- fix error codes in SAR table loading
- fix error path in PPAG table read
mac80211:
- reload a pointer after SKB may have changed
(only in certain monitor inject mode scenarios)
* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
wifi: iwlwifi: Fix some error codes
wifi: iwlwifi: clear link_id in time_event
wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================
Linus Torvalds [Thu, 15 Feb 2024 00:06:31 +0000 (16:06 -0800)]
Merge tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix for broken ipv6 checksums
- Fix handling of exceptions in delay slots
* tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mm/memory: Use exception ip to search exception tables
MIPS: Clear Cause.BD in instruction_pointer_set
ptrace: Introduce exception_ip arch hook
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Linus Torvalds [Thu, 15 Feb 2024 00:02:36 +0000 (16:02 -0800)]
Merge tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock test fixes from Mickaël Salaün:
"Fix build issues for tests, and improve test compatibility"
* tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Fix capability for net_test
selftests/landlock: Fix fs_test build with old libc
selftests/landlock: Fix net_test build with old libc
Linus Torvalds [Wed, 14 Feb 2024 23:47:02 +0000 (15:47 -0800)]
Merge tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few regular fixes and one fix for space reservation regression since
6.7 that users have been reporting:
- fix over-reservation of metadata chunks due to not keeping proper
balance between global block reserve and delayed refs reserve; in
practice this leaves behind empty metadata block groups, the
workaround is to reclaim them by using the '-musage=1' balance
filter
- other space reservation fixes:
- do not delete unused block group if it may be used soon
- do not reserve space for checksums for NOCOW files
- fix extent map assertion failure when writing out free space inode
- reject encoded write if inode has nodatasum flag set
- fix chunk map leak when loading block group zone info"
* tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't refill whole delayed refs block reserve when starting transaction
btrfs: zoned: fix chunk map leak when loading block group zone info
btrfs: reject encoded write if inode has nodatasum flag set
btrfs: don't reserve space for checksums when writing to nocow files
btrfs: add new unused block groups to the list of unused block groups
btrfs: do not delete unused block group if it may be used soon
btrfs: add and use helper to check if block group is used
btrfs: don't drop extent_map for free space inode on write error
Linus Torvalds [Wed, 14 Feb 2024 23:34:03 +0000 (15:34 -0800)]
Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fix from Shuah Khan:
"One important fix to unregister kunit_bus when KUnit module is
unloaded.
Not doing so causes an error when KUnit module tries to re-register
the bus when it gets reloaded"
* tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: device: Unregister the kunit_bus on shutdown
Commit 8f84780b84d6 ("netfilter: flowtable: allow unidirectional rules")
made unidirectional flow offload possible, while completely ignoring (and
breaking) bidirectional flow offload for nftables.
Add the missing flag that was left out as an exercise for the reader :)
we seem to be DNATing to some random port on the LAN side. While this is
expected if --random is passed to the iptables command, it is not
expected without passing --random. The expected behavior (and the
observed behavior prior to the commit in the "Fixes" tag) is the traffic
will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
with that destination. In that case, we expect the traffic to be
instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of
the range.
This patch intends to restore the behavior observed prior to the "Fixes"
tag.
Fixes: 6ed5943f8735 ("netfilter: nat: remove l4 protocol port rovers") Signed-off-by: Kyle Swenson <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Kunwu Chan [Mon, 15 Jan 2024 08:28:25 +0000 (16:28 +0800)]
igb: Fix string truncation warnings in igb_set_fw_version
Commit 1978d3ead82c ("intel: fix string truncation warnings")
fixes '-Wformat-truncation=' warnings in igb_main.c by using kasprintf.
drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning:‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=]
3092 | "%d.%d, 0x%08x, %d.%d.%d",
| ^~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
3092 | "%d.%d, 0x%08x, %d.%d.%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note:‘snprintf’ output between 23 and 43 bytes into a destination of size 32
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Fix this warning by using a larger space for adapter->fw_version,
and then fall back and continue to use snprintf.
Maxime Jayat [Mon, 6 Nov 2023 18:01:58 +0000 (19:01 +0100)]
can: netlink: Fix TDCO calculation using the old data bittiming
The TDCO calculation was done using the currently applied data bittiming,
instead of the newly computed data bittiming, which means that the TDCO
had an invalid value unless setting the same data bittiming twice.
Oleksij Rempel [Fri, 20 Oct 2023 13:38:14 +0000 (15:38 +0200)]
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
modifies jsk->filters while receiving packets.
Following trace was seen on affected system:
==================================================================
BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
Read of size 4 at addr ffff888012144014 by task j1939/350
A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.
NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.
Arnd Bergmann [Tue, 13 Feb 2024 10:16:34 +0000 (11:16 +0100)]
ethernet: cpts: fix function pointer cast warnings
clang-16 warns about the mismatched prototypes for the devm_* callbacks:
drivers/net/ethernet/ti/cpts.c:691:12: error: cast from 'void (*)(struct clk_hw *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
691 | (void(*)(void *))clk_hw_unregister_mux,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
406 | __devm_add_action_or_reset(dev, action, data, #action)
| ^~~~~~
drivers/net/ethernet/ti/cpts.c:703:12: error: cast from 'void (*)(struct device_node *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
703 | (void(*)(void *))of_clk_del_provider,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
406 | __devm_add_action_or_reset(dev, action, data, #action)
Use separate helper functions for this instead, using the expected prototypes
with a void* argument.
Fixes: a3047a81ba13 ("net: ethernet: ti: cpts: add support for ext rftclk selection") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arnd Bergmann [Tue, 13 Feb 2024 10:07:50 +0000 (11:07 +0100)]
bnad: fix work_queue type mismatch
clang-16 warns about a function pointer cast:
drivers/net/ethernet/brocade/bna/bnad.c:1995:4: error: cast from 'void (*)(struct delayed_work *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
1995 | (work_func_t)bnad_tx_cleanup);
drivers/net/ethernet/brocade/bna/bnad.c:2252:4: error: cast from 'void (*)(void *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
2252 | (work_func_t)(bnad_rx_cleanup));
The problem here is mixing up work_struct and delayed_work, which relies
the former being the first member of the latter.
Change the code to use consistent types here to address the warning and
make it more robust against workqueue interface changes.
Side note: the use of a delayed workqueue for cleaning up TX descriptors
is probably a bad idea since this introduces a noticeable delay. The
driver currently does not appear to use BQL, but if one wanted to add
that, this would have to be changed as well.
Fixes: 01b54b145185 ("bna: tx rx cleanup fix") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dmitry Antipov [Mon, 12 Feb 2024 14:34:02 +0000 (17:34 +0300)]
net: smc: fix spurious error message from __sock_release()
Commit 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to previously freed fasync list
of an underlying TCP socket. Fix this spurious warning by nullifying
fasync list of a container socket.
Fixes: 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback") Signed-off-by: Dmitry Antipov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Wed, 14 Feb 2024 10:37:33 +0000 (10:37 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-12 (i40e)
This series contains updates to i40e driver only.
Ivan Vecera corrects the looping value used while waiting for queues to
be disabled as well as an incorrect mask being used for DCB
configuration.
Maciej resolves an issue related to XDP traffic; removing a double call to
i40e_pf_rxq_wait() and accounting for XDP rings when stopping rings.
====================
octeontx2-af: Remove the PF_FUNC validation for NPC transmit rules
NPC transmit side mcam rules can use the pcifunc (in packet metadata
added by hardware) of transmitting device for mcam lookup similar to
the channel of receiving device at receive side.
The commit 18603683d766 ("octeontx2-af: Remove channel verification
while installing MCAM rules") removed the receive side channel
verification to save hardware MCAM filters while switching packets
across interfaces but missed removing transmit side checks.
This patch removes transmit side rules validation.
Fixes: 18603683d766 ("octeontx2-af: Remove channel verification while installing MCAM rules") Signed-off-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Tue, 13 Feb 2024 18:19:07 +0000 (10:19 -0800)]
Merge branch 'selftests-net-more-pmtu-sh-fixes'
Paolo Abeni says:
====================
selftests: net: more pmtu.sh fixes
The mentioned test is still flaky, unusally enough in 'fast'
environments.
Patch 2/2 [try to] address the existing issues, while patch 1/2
introduces more strict tests for the existing net helpers, to hopefully
prevent future pain.
====================
Paolo Abeni [Mon, 12 Feb 2024 10:19:23 +0000 (11:19 +0100)]
selftests: net: more strict check in net_helper
The helper waiting for a listener port can match any socket whose
hexadecimal representation of source or destination addresses
matches that of the given port.
Additionally, any socket state is accepted.
All the above can let the helper return successfully before the
relevant listener is actually ready, with unexpected results.
So far I could not find any related failure in the netdev CI, but
the next patch is going to make the critical event more easily
reproducible.
Address the issue matching the port hex only vs the relevant socket
field and additionally checking the socket state for TCP sockets.
Since commit 28270e25c69a ("btrfs: always reserve space for delayed refs
when starting transaction") we started not only to reserve metadata space
for the delayed refs a caller of btrfs_start_transaction() might generate
but also to try to fully refill the delayed refs block reserve, because
there are several case where we generate delayed refs and haven't reserved
space for them, relying on the global block reserve. Relying too much on
the global block reserve is not always safe, and can result in hitting
-ENOSPC during transaction commits or worst, in rare cases, being unable
to mount a filesystem that needs to do orphan cleanup or anything that
requires modifying the filesystem during mount, and has no more
unallocated space and the metadata space is nearly full. This was
explained in detail in that commit's change log.
However the gap between the reserved amount and the size of the delayed
refs block reserve can be huge, so attempting to reserve space for such
a gap can result in allocating many metadata block groups that end up
not being used. After a recent patch, with the subject:
"btrfs: add new unused block groups to the list of unused block groups"
We started to add new block groups that are unused to the list of unused
block groups, to avoid having them around for a very long time in case
they are never used, because a block group is only added to the list of
unused block groups when we deallocate the last extent or when mounting
the filesystem and the block group has 0 bytes used. This is not a problem
introduced by the commit mentioned earlier, it always existed as our
metadata space reservations are, most of the time, pessimistic and end up
not using all the space they reserved, so we can occasionally end up with
one or two unused metadata block groups for a long period. However after
that commit mentioned earlier, we are just more pessimistic in the
metadata space reservations when starting a transaction and therefore the
issue is more likely to happen.
This however is not always enough because we might create unused metadata
block groups when reserving metadata space at a high rate if there's
always a gap in the delayed refs block reserve and the cleaner kthread
isn't triggered often enough or is busy with other work (running delayed
iputs, cleaning deleted roots, etc), not to mention the block group's
allocated space is only usable for a new block group after the transaction
used to remove it is committed.
A user reported that he's getting a lot of allocated metadata block groups
but the usage percentage of metadata space was very low compared to the
total allocated space, specially after running a series of block group
relocations.
So for now stop trying to refill the gap in the delayed refs block reserve
and reserve space only for the delayed refs we are expected to generate
when starting a transaction.
Filipe Manana [Mon, 12 Feb 2024 21:50:53 +0000 (21:50 +0000)]
btrfs: zoned: fix chunk map leak when loading block group zone info
At btrfs_load_block_group_zone_info() we never drop a reference on the
chunk map we have looked up, therefore leaking a reference on it. So
add the missing btrfs_free_chunk_map() at the end of the function.
Filipe Manana [Fri, 2 Feb 2024 12:09:22 +0000 (12:09 +0000)]
btrfs: reject encoded write if inode has nodatasum flag set
Currently we allow an encoded write against inodes that have the NODATASUM
flag set, either because they are NOCOW files or they were created while
the filesystem was mounted with "-o nodatasum". This results in having
compressed extents without corresponding checksums, which is a filesystem
inconsistency reported by 'btrfs check'.
For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers
this and 'btrfs check' errors out with:
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
root 256 inode 257 errors 1040, bad file extent, some csum missing
root 256 inode 258 errors 1040, bad file extent, some csum missing
ERROR: errors found in fs roots
(...)
So reject encoded writes if the target inode has NODATASUM set.
Filipe Manana [Wed, 31 Jan 2024 17:18:04 +0000 (17:18 +0000)]
btrfs: don't reserve space for checksums when writing to nocow files
Currently when doing a write to a file we always reserve metadata space
for inserting data checksums. However we don't need to do it if we have
a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums
are disabled (-o nodatasum mount option), as in that case we are only
adding unnecessary pressure to metadata reservations.
For example on x86_64, with the default node size of 16K, a 4K buffered
write into a nodatacow file is reserving 655360 bytes of metadata space,
as it's accounting for checksums. After this change, which stops reserving
space for checksums if we have a nodatacow file or checksums are disabled,
we only need to reserve 393216 bytes of metadata.
Linus Torvalds [Tue, 13 Feb 2024 16:38:57 +0000 (08:38 -0800)]
Merge tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tooling fixes from Steven Rostedt:
"RTLA:
- rtla tools are exiting with a positive value when usage() is
called. Make them return 0 if the usage was called via -h/--help
- the -P priority sets the sched priority for rtla workload. When the
SCHED_OTHER scheduler is selected, it sets the rt_priority instead
of the nice parameter. Setting the nice value is the correct thing,
so fix it
- rtla is failing to compile with clang due to unsupported options
from gcc. Adjusting the compiler/linker options makes clang work
properly
- Remove the sched_getattr() unused function on utils.c
- Fixes for variable initialization and size, reported by clang
Verification:
- rv is failing to compile with clang due to unsupported options from
gcc. Adjusting the compiler/linker options makes clang work
properly
- Fix an uninitialized variable on in_kernel.c reported by clang"
* tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rtla: Exit with EXIT_SUCCESS when help is invoked
tools/rtla: Replace setting prio with nice for SCHED_OTHER
tools/rv: Fix curr_reactor uninitialized variable
tools/rv: Fix Makefile compiler options for clang
tools/rtla: Remove unused sched_getattr() function
tools/rtla: Fix clang warning about mount_point var size
tools/rtla: Fix uninitialized bucket/data->bucket_size warning
tools/rtla: Fix Makefile compiler options for clang
Rob Herring [Wed, 24 Jan 2024 19:07:33 +0000 (13:07 -0600)]
dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
The 'phandle-array' type is a bit ambiguous. It can be either just an
array of phandles or an array of phandles plus args. "samsung,sysreg" is
the latter and needs to be constrained to a single entry with a phandle and
offset.
There was a change in the mxs-dma engine that uses a new custom flag.
The change was not applied to the mxs spi driver.
This results in chipselect being deasserted too early.
This fixes the chipselect problem by using the new flag in the mxs-spi
driver.
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_get_ptp_clock_idx':
icss_iep.c:(.text+0x1d4): undefined reference to `ptp_clock_index'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_exit':
icss_iep.c:(.text+0xde8): undefined reference to `ptp_clock_unregister'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_init':
icss_iep.c:(.text+0x176c): undefined reference to `ptp_clock_register'
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
The xf86-input-wacom driver does not treat '0' as a valid serial
number and will drop any input report which contains an
MSC_SERIAL = 0 event. The kernel driver already takes care to
avoid sending any MSC_SERIAL event if the value of serial[0] == 0
(which is the case for devices that don't actually report a
serial number), but this is not quite sufficient.
Only the lower 32 bits of the serial get reported to userspace,
so if this portion of the serial is zero then there can still
be problems.
This commit allows the driver to report either the lower 32 bits
if they are non-zero or the upper 32 bits otherwise.
af_unix: Fix task hung while purging oob_skb in GC.
syzbot reported a task hung; at the same time, GC was looping infinitely
in list_for_each_entry_safe() for OOB skb. [0]
syzbot demonstrated that the list_for_each_entry_safe() was not actually
safe in this case.
A single skb could have references for multiple sockets. If we free such
a skb in the list_for_each_entry_safe(), the current and next sockets could
be unlinked in a single iteration.
unix_notinflight() uses list_del_init() to unlink the socket, so the
prefetched next socket forms a loop itself and list_for_each_entry_safe()
never stops.
Here, we must use while() and make sure we always fetch the first socket.
Even Xu [Fri, 9 Feb 2024 06:52:32 +0000 (14:52 +0800)]
HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
After legacy suspend/resume via ACPI S3, sensor read operation fails
with timeout. Also, it will cause delay in resume operation as there
will be retries on failure.
This is caused by commit f645a90e8ff7 ("HID: intel-ish-hid:
ishtp-hid-client: use helper functions for connection"), which used
helper functions to simplify connect, reset and disconnect process.
Also avoid freeing and allocating client buffers again during reconnect
process.
But there is a case, when ISH firmware resets after ACPI S3 suspend,
ishtp bus driver frees client buffers. Since there is no realloc again
during reconnect, there are no client buffers available to send connection
requests to the firmware. Without successful connection to the firmware,
subsequent sensor reads will timeout.
To address this issue, ishtp bus driver does not free client buffers on
warm reset after S3 resume. Simply add the buffers from the read list
to free list of buffers.
Fixes: f645a90e8ff7 ("HID: intel-ish-hid: ishtp-hid-client: use helper functions for connection") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218442 Signed-off-by: Even Xu <[email protected]> Acked-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
Keqi Wang [Fri, 9 Feb 2024 09:16:59 +0000 (17:16 +0800)]
connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"
This reverts commit c46bfba1337d ("connector: Fix proc_event_num_listeners
count not cleared").
It is not accurate to reset proc_event_num_listeners according to
cn_netlink_send_mult() return value -ESRCH.
In the case of stress-ng netlink-proc, -ESRCH will always be returned,
because netlink_broadcast_filtered will return -ESRCH,
which may cause stress-ng netlink-proc performance degradation.
Functions rds_still_queued and rds_clear_recv_queue lock a given socket
in order to safely iterate over the incoming rds messages. However
calling rds_inc_put while under this lock creates a potential deadlock.
rds_inc_put may eventually call rds_message_purge, which will lock
m_rs_lock. This is the incorrect locking order since m_rs_lock is
meant to be locked before the socket. To fix this, we move the message
item to a local list or variable that wont need rs_recv_lock protection.
Then we can safely call rds_inc_put on any item stored locally after
rs_recv_lock is released.
Maximilian Heyne [Wed, 24 Jan 2024 16:31:28 +0000 (16:31 +0000)]
xen/events: close evtchn after mapping cleanup
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
channel:
Assume here event channel e refers here to the same event channel
number.
After this race the evtchn_to_irq mapping for e is invalid (-1).
- __startup_pirq races with __unbind_from_irq in a similar way. Because
__startup_pirq doesn't take irq_mapping_update_lock it can grab the
evtchn that __unbind_from_irq is currently freeing and cleaning up. In
this case even though the event channel is allocated, its mapping can
be unset in evtchn_to_irq.
The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated it's
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.
On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq are running potentially in parallel.
Kees Cook [Tue, 6 Feb 2024 17:03:24 +0000 (09:03 -0800)]
xen/gntalloc: Replace UAPI 1-element array
Without changing the structure size (since it is UAPI), add a proper
flexible array member, and reference it in the kernel so that it will
not be trip the array-bounds sanitizer[1].
Now that the driver core can properly handle constant struct bus_type,
move the balloon_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Now that the driver core can properly handle constant struct bus_type,
move the xen_pcpu_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Markus Elfring [Sun, 28 Jan 2024 16:50:43 +0000 (17:50 +0100)]
xen/privcmd: Use memdup_array_user() in alloc_ioreq()
* The function “memdup_array_user” was added with the
commit 313ebe47d75558511aa1237b6e35c663b5c0ec6f ("string.h: add
array-wrappers for (v)memdup_user()").
Thus use it accordingly.
This issue was detected by using the Coccinelle software.
* Delete a label which became unnecessary with this refactoring.
Shannon Nelson [Sat, 10 Feb 2024 00:13:07 +0000 (16:13 -0800)]
ionic: minimal work with 0 budget
We should be doing as little as possible besides freeing Tx
space when our napi routines are called with budget of 0, so
jump out before doing anything besides Tx cleaning.
See commit afbed3f74830 ("net/mlx5e: do as little as possible in napi poll when budget is 0")
for more info.
Simon Horman [Thu, 8 Feb 2024 09:48:27 +0000 (09:48 +0000)]
net: stmmac: xgmac: use #define for string constants
The cited commit introduces and uses the string constants dpp_tx_err and
dpp_rx_err. These are assigned to constant fields of the array
dwxgmac3_error_desc.
It has been reported that on GCC 6 and 7.5.0 this results in warnings
such as:
.../dwxgmac2_core.c:836:20: error: initialiser element is not constant
{ true, "TDPES0", dpp_tx_err },
I have been able to reproduce this using: GCC 7.5.0, 8.4.0, 9.4.0 and 10.5.0.
But not GCC 13.2.0.
So it seems this effects older compilers but not newer ones.
As Jon points out in his report, the minimum compiler supported by
the kernel is GCC 5.1, so it does seem that this ought to be fixed.
It is not clear to me what combination of 'const', if any, would address
this problem. So this patch takes of using #defines for the string
constants
i40e: take into account XDP Tx queues when stopping rings
Seth reported that on his side XDP traffic can not survive a round of
down/up against i40e interface. Dmesg output was telling us that we were
not able to disable the very first XDP ring. That was due to the fact
that in i40e_vsi_stop_rings() in a pre-work that is done before calling
i40e_vsi_wait_queues_disabled(), XDP Tx queues were not taken into the
account.
To fix this, let us distinguish between Rx and Tx queue boundaries and
take into the account XDP queues for Tx side.
Reported-by: Seth Forshee <[email protected]> Closes: https://lore.kernel.org/netdev/ZbkE7Ep1N1Ou17sA@do-x1extreme/ Fixes: 65662a8dcdd0 ("i40e: Fix logic of disabling queues") Tested-by: Seth Forshee <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Ivan Vecera <[email protected]> Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>