Linus Torvalds [Thu, 1 Feb 2024 20:39:54 +0000 (12:39 -0800)]
Merge tag 'net-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
As Paolo promised we continue to hammer out issues in our selftests.
This is not the end but probably the peak.
Current release - regressions:
- smc: fix incorrect SMC-D link group matching logic
Current release - new code bugs:
- eth: bnxt: silence WARN() when device skips a timestamp, it happens
Previous releases - regressions:
- ipmr: fix null-deref when forwarding mcast packets
- conntrack: evaluate window negotiation only for packets in the
REPLY direction, otherwise SYN retransmissions trigger incorrect
window scale negotiation
- ipset: fix performance regression in swap operation
Previous releases - always broken:
- tcp: add sanity checks to types of pages getting into the rx
zerocopy path, we only support basic NIC -> user, no page cache
pages etc.
- ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()
- nt_tables: more input sanitization changes
- dsa: mt7530: fix 10M/100M speed on MediaTek MT7988 switch
- bridge: mcast: fix loss of snooping after long uptime, jiffies do
wrap on 32bit
- xen-netback: properly sync TX responses, protect with locking
Linus Torvalds [Thu, 1 Feb 2024 20:32:43 +0000 (12:32 -0800)]
Merge tag 'parisc-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"The current exception handler, which helps on kernel accesses to
userspace, may exhibit data corruption. The problem is that it is not
guaranteed that the compiler will use the processor register we
specified in the source code, but may choose another register which
then will lead to silent register- and data corruption. To fix this
issue we now use another strategy to help the exception handler to
always find and set the error code into the correct CPU register.
The other fixes are small: fixing CPU hotplug bringup, fix the page
alignment of the RO_DATA section, added a check for the calculated
cache stride and fix possible hangups when printing longer output at
bootup when running on serial console.
Most of the patches are tagged for stable series.
- Fix random data corruption triggered by exception handler
- Fix crash when setting up BTLB at CPU bringup
- Prevent hung tasks when printing inventory on serial console
- Make RO_DATA page aligned in vmlinux.lds.S
- Add check for valid cache stride size"
* tag 'parisc-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
parisc: Fix random data corruption from exception handler
parisc: Drop unneeded semicolon in parse_tree_node()
parisc: Prevent hung tasks when printing inventory on serial console
parisc: Check for valid stride size for cache flushes
parisc: Make RO_DATA page aligned in vmlinux.lds.S
Linus Torvalds [Thu, 1 Feb 2024 19:57:42 +0000 (11:57 -0800)]
Merge tag 'kbuild-fixes-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix UML build with clang-18 and newer
- Avoid using the alias attribute in host programs
- Replace tabs with spaces when followed by conditionals for future GNU
Make versions
- Fix rpm-pkg for the systemd-provided kernel-install tool
- Fix the undefined behavior in Kconfig for a 'int' symbol used in a
conditional
* tag 'kbuild-fixes-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: initialize sym->curr.tri to 'no' for all symbol types again
kbuild: rpm-pkg: simplify installkernel %post
kbuild: Replace tabs with spaces when followed by conditionals
modpost: avoid using the alias attribute
kbuild: fix W= flags in the help message
modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS
um: Fix adding '-no-pie' for clang
kbuild: defconf: use SRCARCH to find merged configs
Linus Torvalds [Thu, 1 Feb 2024 18:19:34 +0000 (10:19 -0800)]
Merge tag 'hid-for-linus-2024020101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- cleanups in the error path in hid-steam (Dan Carpenter)
- fixes for Wacom tablets selftests that sneaked in while the CI was
taking a break during the year end holidays (Benjamin Tissoires)
- null pointer check in nvidia-shield (Kunwu Chan)
- memory leak fix in hidraw (Su Hui)
- another null pointer fix in i2c-hid-of (Johan Hovold)
- another memory leak fix in HID-BPF this time, as well as a double
fdget() fix reported by Dan Carpenter (Benjamin Tissoires)
- fix for Cirque touchpad when they go on suspend (Kai-Heng Feng)
- new device ID in hid-logitech-hidpp: "Logitech G Pro X SuperLight 2"
(Jiri Kosina)
* tag 'hid-for-linus-2024020101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: bpf: use __bpf_kfunc instead of noinline
HID: bpf: actually free hdev memory after attaching a HID-BPF program
HID: bpf: remove double fdget()
HID: i2c-hid-of: fix NULL-deref on failed power up
HID: hidraw: fix a problem of memory leak in hidraw_release()
HID: i2c-hid: Skip SET_POWER SLEEP for Cirque touchpad on system suspend
HID: nvidia-shield: Add missing null pointer checks to LED initialization
HID: logitech-hidpp: add support for Logitech G Pro X Superlight 2
selftests/hid: wacom: fix confidence tests
HID: hid-steam: Fix cleanup in probe()
HID: hid-steam: remove pointless error message
Linus Torvalds [Thu, 1 Feb 2024 18:12:53 +0000 (10:12 -0800)]
Merge tag 'firewire-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
"FireWire subsystem now supports the legacy layout of configuration
ROM, while it appears that some of DV devices in the early 2000's have
the legacy layout with a quirk. This includes some changes to handle
the quirk"
* tag 'firewire-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: search descriptor leaf just after vendor directory entry in root directory
firewire: core: correct documentation of fw_csr_string() kernel API
Linus Torvalds [Thu, 1 Feb 2024 18:06:55 +0000 (10:06 -0800)]
Merge tag 'regulator-fix-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"The main set of fixes here are for the PWM regulator, fixing
bootstrapping issues on some platforms where the hardware setup looked
like it was out of spec for the constraints we have for the regulator
causing us to make spurious and unhelpful changes to try to bring
things in line with the constraints.
There's also a couple of other driver specific fixes"
* tag 'regulator-fix-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator (max5970): Fix IRQ handler
regulator: ti-abb: don't use devm_platform_ioremap_resource_byname for shared interrupt register
regulator: pwm-regulator: Manage boot-on with disabled PWM channels
regulator: pwm-regulator: Calculate the output voltage for disabled PWMs
regulator: pwm-regulator: Add validity checks in continuous .get_voltage
Linus Torvalds [Thu, 1 Feb 2024 18:00:28 +0000 (10:00 -0800)]
Merge tag 'lsm-pr-20240131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fixes from Paul Moore:
"Two small patches to fix some problems relating to LSM hook return
values and how the individual LSMs interact"
* tag 'lsm-pr-20240131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: fix default return value of the socket_getpeersec_*() hooks
lsm: fix the logic in security_inode_getsecctx()
Jakub Kicinski [Thu, 1 Feb 2024 17:25:53 +0000 (09:25 -0800)]
Merge tag 'batadv-net-pullrequest-20240201' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix a timeout issue and a memory leak in batman-adv multicast,
by Linus Lüssing (2 patches)
* tag 'batadv-net-pullrequest-20240201' of git://git.open-mesh.org/linux-merge:
batman-adv: mcast: fix memory leak on deleting a batman-adv interface
batman-adv: mcast: fix mcast packet type counter on timeouted nodes
====================
Jakub Kicinski [Thu, 1 Feb 2024 17:14:13 +0000 (09:14 -0800)]
Merge tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) TCP conntrack now only evaluates window negotiation for packets in
the REPLY direction, from Ryan Schaefer. Otherwise SYN retransmissions
trigger incorrect window scale negotiation. From Ryan Schaefer.
2) Restrict tunnel objects to NFPROTO_NETDEV which is where it makes sense
to use this object type.
3) Fix conntrack pick up from the middle of SCTP_CID_SHUTDOWN_ACK packets.
From Xin Long.
4) Another attempt from Jozsef Kadlecsik to address the slow down of the
swap command in ipset.
5) Replace a BUG_ON by WARN_ON_ONCE in nf_log, and consolidate check for
the case that the logger is NULL from the read side lock section.
6) Address lack of sanitization for custom expectations. Restrict layer 3
and 4 families to what it is supported by userspace.
* tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
netfilter: ipset: fix performance regression in swap operation
netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
netfilter: conntrack: correct window scaling with retransmitted SYN
====================
idpf: avoid compiler padding in virtchnl2_ptype struct
In the arm random config file, kconfig option 'CONFIG_AEABI' is
disabled which results in adding the compiler flag '-mabi=apcs-gnu'.
This causes the compiler to add padding in virtchnl2_ptype
structure to align it to 8 bytes, resulting in the following
size check failure:
Avoid the compiler padding by using "__packed" structure
attribute for the virtchnl2_ptype struct. Also align the
structure by using "__aligned(2)" for better code optimization.
====================
mptcp: fixes for recent issues reported by CI's
This series of 9 patches fixes issues mostly identified by CI's not
managed by the MPTCP maintainers. Thank you Linero (LKFT) and Netdev
maintainers (NIPA) for running our kunit and selftests tests!
For the first patch, it took a bit of time to identify the root cause.
Some MPTCP Join selftest subtests have been "flaky", mostly in slow
environments. It appears to be due to the use of a TCP-specific helper
on an MPTCP socket. A fix for kernels >= v5.15.
Patches 2 to 4 add missing kernel config to support NetFilter tables
needed for IPTables commands. These kconfigs are usually enabled in
default configurations, but apparently not for all architectures.
Patches 2 and 3 can be backported up to v5.11 and the 4th one up to
v5.19.
Patch 5 increases the time limit for MPTCP selftests. It appears that
many CI's execute tests in a VM without acceleration supports, e.g. QEmu
without KVM. As a result, the tests take longer. Plus, there are more
and more tests. This patch modifies the timeout added in v5.18.
Patch 6 reduces the maximum rate and delay of the different links in
some Simult Flows selftest subtests. The goal is to let slow VMs reach
the maximum speed. The original rate was introduced in v5.11.
Patch 7 lets CI changing the prefix of the subtests titles, to be able
to run the same selftest multiple times with different parameters. With
different titles, tests will be considered as different and not override
previous results as it is the case with some CI envs. Subtests have been
introduced in v6.6.
Patch 8 and 9 make some MPTCP Join selftest subtests quicker by stopping
the transfer when the expected events have been seen. Patch 8 can be
backported up to v6.5.
selftests: mptcp: join: stop transfer when check is done (part 2)
Since the "Fixes" commits mentioned below, the newly added "userspace
pm" subtests of mptcp_join selftests are launching the whole transfer in
the background, do the required checks, then wait for the end of
transfer.
There is no need to wait longer, especially because the checks at the
end of the transfer are ignored (which is fine). This saves quite a few
seconds on slow environments.
While at it, use 'mptcp_lib_kill_wait()' helper everywhere, instead of
on a specific one with 'kill_tests_wait()'.
selftests: mptcp: join: stop transfer when check is done (part 1)
Since the "Fixes" commit mentioned below, "userspace pm" subtests of
mptcp_join selftests introduced in v6.5 are launching the whole transfer
in the background, do the required checks, then wait for the end of
transfer.
There is no need to wait longer, especially because the checks at the
end of the transfer are ignored (which is fine). This saves quite a few
seconds in slow environments.
Note that old versions will need commit bdbef0a6ff10 ("selftests: mptcp:
add mptcp_lib_kill_wait") as well to get 'mptcp_lib_kill_wait()' helper.
If a CI executes the same selftest multiple times with different
options, all results from the same subtests will have the same title,
which confuse the CI. With the same title printed in TAP, the tests are
considered as the same ones.
Now, it is possible to override this prefix by using MPTCP_LIB_KSFT_TEST
env var, and have a different title.
While at it, use 'basename' to remove the suffix as well instead of
using an extra 'sed'.
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improving the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
On very slow environments -- e.g. when QEmu is used without KVM --,
mptcp_join.sh selftest can take a bit more than 20 minutes. Bump the
default timeout by 50% as it seems normal to take that long on some
environments.
When a debug kernel config is used, this selftest will take even longer,
but that's certainly not a common test env to consider for the timeout.
The Fixes tag that has been picked here is there simply to help having
this patch backported to older stable versions. It is difficult to point
to the exact commit that made some env reaching the timeout from time to
time.
Paolo Abeni [Wed, 31 Jan 2024 21:49:46 +0000 (22:49 +0100)]
mptcp: fix data re-injection from stale subflow
When the MPTCP PM detects that a subflow is stale, all the packet
scheduler must re-inject all the mptcp-level unacked data. To avoid
acquiring unneeded locks, it first try to check if any unacked data
is present at all in the RTX queue, but such check is currently
broken, as it uses TCP-specific helper on an MPTCP socket.
Funnily enough fuzzers and static checkers are happy, as the accessed
memory still belongs to the mptcp_sock struct, and even from a
functional perspective the recovery completed successfully, as
the short-cut test always failed.
A recent unrelated TCP change - commit d5fed5addb2b ("tcp: reorganize
tcp_sock fast path variables") - exposed the issue, as the tcp field
reorganization makes the mptcp code always skip the re-inection.
Fix the issue dropping the bogus call: we are on a slow path, the early
optimization proved once again to be evil.
Jakub Kicinski [Thu, 1 Feb 2024 16:36:39 +0000 (08:36 -0800)]
Merge branch 'selftests-net-more-small-fixes'
Benjamin Poirier says:
====================
selftests: net: More small fixes
Some small fixes for net selftests which follow from these recent commits: dd2d40acdbb2 ("selftests: bonding: Add more missing config options") 49078c1b80b6 ("selftests: forwarding: Remove executable bits from lib.sh")
====================
Benjamin Poirier [Wed, 31 Jan 2024 14:08:48 +0000 (09:08 -0500)]
selftests: forwarding: List helper scripts in TEST_FILES Makefile variable
Some scripts are not tests themselves; they contain utility functions used
by other tests. According to Documentation/dev-tools/kselftest.rst, such
files should be listed in TEST_FILES. Currently they are incorrectly listed
in TEST_PROGS_EXTENDED so rename the variable.
Benjamin Poirier [Wed, 31 Jan 2024 14:08:47 +0000 (09:08 -0500)]
selftests: net: List helper scripts in TEST_FILES Makefile variable
Some scripts are not tests themselves; they contain utility functions used
by other tests. According to Documentation/dev-tools/kselftest.rst, such
files should be listed in TEST_FILES. Move those utility scripts to
TEST_FILES.
Fixes: 1751eb42ddb5 ("selftests: net: use TEST_PROGS_EXTENDED") Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Fixes: b99ac1841147 ("kselftests/net: add missed setup_loopback.sh/setup_veth.sh to Makefile") Fixes: f5173fe3e13b ("selftests: net: included needed helper in the install targets") Suggested-by: Petr Machata <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Benjamin Poirier [Wed, 31 Jan 2024 14:08:46 +0000 (09:08 -0500)]
selftests: net: Remove executable bits from library scripts
setup_loopback.sh and net_helper.sh are meant to be sourced from other
scripts, not executed directly. Therefore, remove the executable bits from
those files' permissions.
This change is similar to commit 49078c1b80b6 ("selftests: forwarding:
Remove executable bits from lib.sh")
Benjamin Poirier [Wed, 31 Jan 2024 14:08:45 +0000 (09:08 -0500)]
selftests: bonding: Check initial state
The purpose of the test_LAG_cleanup() function is to check that some
hardware addresses are removed from underlying devices after they have been
unenslaved. The test function simply checks that those addresses are not
present at the end. However, if the addresses were never added to begin
with due to some error in device setup, the test function currently passes.
This is a false positive since in that situation the test did not actually
exercise the intended functionality.
Add a check that the expected addresses are indeed present after device
setup. This makes the test function more robust.
I noticed this problem when running the team/dev_addr_lists.sh test on a
system without support for dummy and ipv6:
tools/testing/selftests/drivers/net/team# ./dev_addr_lists.sh
Error: Unknown device type.
Error: Unknown device type.
This program is not intended to be run as root.
RTNETLINK answers: Operation not supported
TEST: team cleanup mode lacp [ OK ]
Benjamin Poirier [Wed, 31 Jan 2024 14:08:44 +0000 (09:08 -0500)]
selftests: team: Add missing config options
Similar to commit dd2d40acdbb2 ("selftests: bonding: Add more missing
config options"), add more networking-specific config options which are
needed for team device tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. Afterwards, the team device test passed.
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the
VMBus channel"), napi_disable was getting called for all channels,
including all subchannels without confirming if they are enabled or not.
This caused hv_netvsc getting hung at napi_disable, when netvsc_probe()
has finished running but nvdev->subchan_work has not started yet.
netvsc_subchan_work() -> rndis_set_subchannel() has not created the
sub-channels and because of that netvsc_sc_open() is not running.
netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which
netvsc_subchan_work did not run.
netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI
cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the
NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the
opposite.
Now during netvsc_device_remove(), when napi_disable is called for those
subchannels, napi_disable gets stuck on infinite msleep.
This fix addresses this problem by ensuring that napi_disable() is not
getting called for non-enabled NAPI struct.
But netif_napi_del() is still necessary for these non-enabled NAPI struct
for cleanup purpose.
Jan Beulich [Mon, 29 Jan 2024 13:03:08 +0000 (14:03 +0100)]
xen-netback: properly sync TX responses
Invoking the make_tx_response() / push_tx_responses() pair with no lock
held would be acceptable only if all such invocations happened from the
same context (NAPI instance or dealloc thread). Since this isn't the
case, and since the interface "spec" also doesn't demand that multicast
operations may only be performed with no in-flight transmits,
MCAST_{ADD,DEL} processing also needs to acquire the response lock
around the invocations.
To prevent similar mistakes going forward, "downgrade" the present
functions to private helpers of just the two remaining ones using them
directly, with no forward declarations anymore. This involves renaming
what so far was make_tx_response(), for the new function of that name
to serve the new (wrapper) purpose.
While there,
- constify the txp parameters,
- correct xenvif_idx_release()'s status parameter's type,
- rename {,_}make_tx_response()'s status parameters for consistency with
xenvif_idx_release()'s.
Geetha sowjanya [Tue, 30 Jan 2024 12:06:10 +0000 (17:36 +0530)]
octeontx2-pf: Remove xdp queues on program detach
XDP queues are created/destroyed when a XDP program
is attached/detached. In current driver xdp_queues are not
getting destroyed on program exit due to incorrect xdp_queue
and tot_tx_queue count values.
This patch fixes the issue by setting tot_tx_queue and xdp_queue
count to correct values. It also fixes xdp.data_hard_start address.
firewire: core: search descriptor leaf just after vendor directory entry in root directory
It appears that Sony DVMC-DA1 has a quirk that the descriptor leaf entry
locates just after the vendor directory entry in root directory. This is
not conformant to the legacy layout of configuration ROM described in
Configuration ROM for AV/C Devices 1.0 (1394 Trading Association, Dec 2000,
TA Document 1999027).
This commit changes current implementation to parse configuration ROM for
device attributes so that the descriptor leaf entry can be detected for
the vendor name.
root directory
-----------------------------------------------------------------
1044 0006b681 directory_length 6, crc 46721
1048 03080046 vendor
1052 0c0083c0 node capabilities: per IEEE 1394
1056 8d00000a --> eui-64 leaf at 1096
1060 d1000003 --> unit directory at 1072
1064 c3000005 --> vendor directory at 1084
1068 8100000a --> descriptor leaf at 1108
unit directory at 1072
-----------------------------------------------------------------
1072 0002cdbf directory_length 2, crc 52671
1076 1200a02d specifier id
1080 13010000 version
vendor directory at 1084
-----------------------------------------------------------------
1084 00020cfe directory_length 2, crc 3326
1088 17fa0000 model
1092 81000008 --> descriptor leaf at 1124
Jakub Kicinski [Thu, 1 Feb 2024 05:08:11 +0000 (21:08 -0800)]
Merge branch 'selftests-net-a-few-pmtu-sh-fixes'
Paolo Abeni says:
====================
selftests: net: a few pmtu.sh fixes
This series try to address CI failures for the pmtu.sh tests. It
does _not_ attempt to enable all the currently skipped cases, to
avoid adding more entropy.
Paolo Abeni [Tue, 30 Jan 2024 17:47:18 +0000 (18:47 +0100)]
selftests: net: don't access /dev/stdout in pmtu.sh
When running the pmtu.sh via the kselftest infra, accessing
/dev/stdout gives unexpected results:
# dd: failed to open '/dev/stdout': Device or resource busy
# TEST: IPv4, bridged vxlan4: PMTU exceptions [FAIL]
Let dd use directly the standard output to fix the above:
# TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ]
Paolo Abeni [Tue, 30 Jan 2024 17:47:17 +0000 (18:47 +0100)]
selftests: net: fix available tunnels detection
The pmtu.sh test tries to detect the tunnel protocols available
in the running kernel and properly skip the unsupported cases.
In a few more complex setup, such detection is unsuccessful, as
the script currently ignores some intermediate error code at
setup time.
Before:
# which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin)
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [FAIL]
# PMTU exception wasn't created after creating tunnel exceeding link layer MTU
# ./pmtu.sh: line 931: kill: (7543) - No such process
# ./pmtu.sh: line 931: kill: (7544) - No such process
Paolo Abeni [Tue, 30 Jan 2024 17:47:16 +0000 (18:47 +0100)]
selftests: net: add missing config for pmtu.sh tests
The mentioned test uses a few Kconfig still missing the
net config, add them.
Before:
# Error: Specified qdisc kind is unknown.
# Error: Specified qdisc kind is unknown.
# Error: Qdisc not classful.
# We have an error talking to the kernel
# Error: Qdisc not classful.
# We have an error talking to the kernel
# policy_routing not supported
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP]
After:
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ]
Jakub Kicinski [Thu, 1 Feb 2024 02:27:01 +0000 (18:27 -0800)]
Merge branch 'pds_core-various-fixes'
Brett Creeley says:
====================
pds_core: Various fixes
This series includes the following changes:
There can be many users of the pds_core's adminq. This includes
pds_core's uses and any clients that depend on it. When the pds_core
device goes through a reset for any reason the adminq is freed
and reconfigured. There are some gaps in the current implementation
that will cause crashes during reset if any of the previously mentioned
users of the adminq attempt to use it after it's been freed.
Issues around how resets are handled, specifically regarding the driver's
error handlers.
Originally these patches were aimed at net-next, but it was requested to
push the fixes patches to net. The original patches can be found here:
Brett Creeley [Mon, 29 Jan 2024 23:40:35 +0000 (15:40 -0800)]
pds_core: Rework teardown/setup flow to be more common
Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls
pci_alloc_irq_vectors() and get rid of the now unused
pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown()
since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
since pci_alloc_irq_vectors() will always be called in
pdsc_setup()->pdsc_dev_init() for both the probe/remove and
reset flows.
4. Make sure to only create the debugfs "identity" entry when it
doesn't already exist, which it will in the reset case because
it's already been created in the initial call to pdsc_dev_init().
Brett Creeley [Mon, 29 Jan 2024 23:40:34 +0000 (15:40 -0800)]
pds_core: Clear BARs on reset
During reset the BARs might be accessed when they are
unmapped. This can cause unexpected issues, so fix it by
clearing the cached BAR values so they are not accessed
until they are re-mapped.
Also, make sure any places that can access the BARs
when they are NULL are prevented.
Brett Creeley [Mon, 29 Jan 2024 23:40:33 +0000 (15:40 -0800)]
pds_core: Prevent race issues involving the adminq
There are multiple paths that can result in using the pdsc's
adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.
In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.
Brett Creeley [Mon, 29 Jan 2024 23:40:32 +0000 (15:40 -0800)]
pds_core: Use struct pdsc for the pdsc_adminq_isr private data
The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.
Brett Creeley [Mon, 29 Jan 2024 23:40:31 +0000 (15:40 -0800)]
pds_core: Cancel AQ work on teardown
There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.
Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.
Brett Creeley [Mon, 29 Jan 2024 23:40:30 +0000 (15:40 -0800)]
pds_core: Prevent health thread from running during reset/remove
The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.
As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.
Dmitry Safonov [Tue, 30 Jan 2024 03:51:54 +0000 (03:51 +0000)]
selftests/net: Repair RST passive reset selftest
Currently, the test is racy and seems to not pass anymore.
In order to rectify it, aim on TCP_TW_RST.
Doesn't seem way too good with this sleep() part, but it seems as
a reasonable compromise for the test. There is a plan in-line comment on
how-to improve it, going to do it on the top, at this moment I want it
to run on netdev/patchwork selftests dashboard.
It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated
to test_client_verify() return value.
Dmitry Safonov [Tue, 30 Jan 2024 03:51:53 +0000 (03:51 +0000)]
selftests/net: Rectify key counters checks
As the names of (struct test_key) members didn't reflect whether the key
was used for TX or RX, the verification for the counters was done
incorrectly for asymmetrical selftests.
Rename these with _tx appendix and fix checks in verify_counters().
While at it, as the checks are now correct, introduce skip_counters_checks,
which is intended for tests where it's expected that a key that was set
with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might had no chance
of getting used on the wire.
Fixes the following failures, exposed by the previous commit:
> not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10
>
> not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10
Mohammad Nassiri [Tue, 30 Jan 2024 03:51:52 +0000 (03:51 +0000)]
selftests/net: Argument value mismatch when calling verify_counters()
The end_server() function only operates in the server thread
and always takes an accept socket instead of a listen socket as
its input argument. To align with this, invert the boolean values
used when calling verify_counters() within the end_server() function.
As a result of this typo, the test didn't correctly check for
the non-symmetrical scenario, where i.e. peer-A uses a key <100:200>
to send data, but peer-B uses another key <105:205> to send its data.
So, in simple words, different keys for TX and RX.
Andrew Lunn [Mon, 29 Jan 2024 22:49:48 +0000 (23:49 +0100)]
net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads
Not all mv88e6xxx device support C45 read/write operations. Those
which do not return -EOPNOTSUPP. However, when phylib scans the bus,
it considers this fatal, and the probe of the MDIO bus fails, which in
term causes the mv88e6xxx probe as a whole to fail.
When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus. C45 allows multiple devices to be supported at one address, so
phylib will perform a few reads at each address, so although thought
not the most efficient solution, it is a way to avoid fatal
errors. Make use of this as a minimal fix for stable to fix the
probing problems.
Follow up patches will rework how C45 operates to make it similar to
C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to
using this.
Andrew Halaney [Mon, 29 Jan 2024 17:12:11 +0000 (11:12 -0600)]
MAINTAINERS: Drop unreachable reviewer for Qualcomm ETHQOS ethernet driver
Bhupesh's email responds indicating they've changed employers and with
no new contact information. Let's drop the line from MAINTAINERS to
avoid getting the same response over and over.
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
- Disallow families other than NFPROTO_{IPV4,IPV6,INET}.
- Disallow layer 4 protocol with no ports, since destination port is a
mandatory attribute for this object.
Jozsef Kadlecsik [Mon, 29 Jan 2024 09:57:01 +0000 (10:57 +0100)]
netfilter: ipset: fix performance regression in swap operation
The patch "netfilter: ipset: fix race condition between swap/destroy
and kernel side add/del/test", commit 28628fa9 fixes a race condition.
But the synchronize_rcu() added to the swap function unnecessarily slows
it down: it can safely be moved to destroy and use call_rcu() instead.
Eric Dumazet pointed out that simply calling the destroy functions as
rcu callback does not work: sets with timeout use garbage collectors
which need cancelling at destroy which can wait. Therefore the destroy
functions are split into two: cancelling garbage collectors safely at
executing the command received by netlink and moving the remaining
part only into the rcu callback.
Xin Long [Thu, 25 Jan 2024 22:29:46 +0000 (17:29 -0500)]
netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we
expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)".
However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY]
in the conntrack entry(ct).
Because of that, if the ct in Router disappears for some reason in [1]
with the packet sequence like below:
when processing HB ACK packet in Router it calls sctp_new() to initialize
the new ct with vtag[REPLY] set to HB_ACK packet's vtag.
Later when sending DATA from Client, all the SACKs from Server will get
dropped in Router, as the SACK packet's vtag does not match vtag[REPLY]
in the ct. The worst thing is the vtag in this ct will never get fixed
by the upcoming packets from Server.
This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting
vtag[REPLY] in the ct in sctp_new() as the annotation says. With this
fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the
next HB REQ/ACK from Server is able to fix the vtag as its value is 0
in nf_conntrack_sctp_packet().
Ryan Schaefer [Sun, 21 Jan 2024 21:51:44 +0000 (21:51 +0000)]
netfilter: conntrack: correct window scaling with retransmitted SYN
commit c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets
only") introduces a bug where SYNs in ORIGINAL direction on reused 5-tuple
result in incorrect window scale negotiation. This commit merged the SYN
re-initialization and simultaneous open or SYN retransmits cases. Merging
this block added the logic in tcp_init_sender() that performed window scale
negotiation to the retransmitted syn case. Previously. this would only
result in updating the sender's scale and flags. After the merge the
additional logic results in improperly clearing the scale in ORIGINAL
direction before any packets in the REPLY direction are received. This
results in packets incorrectly being marked invalid for being
out-of-window.
In order to fix the issue, only evaluate window negotiation for packets
in the REPLY direction. This was tested with simultaneous open, fast
open, and the above reproduction.
Fixes: c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only") Signed-off-by: Ryan Schaefer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Linus Torvalds [Wed, 31 Jan 2024 18:12:03 +0000 (10:12 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six small fixes. Five are obvious and in drivers. The last one is a
core fix to remove the host lock acquisition and release, caused by a
dynamic check of host_busy, in the error handling loop which has been
reported to cause lockups"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix ring buffer size calculation
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
scsi: MAINTAINERS: Update ibmvscsi_tgt maintainer
scsi: initio: Remove redundant variable 'rb'
scsi: virtio_scsi: Remove duplicate check if queue is broken
scsi: isci: Fix an error code problem in isci_io_request_build()
Matthias May [Tue, 30 Jan 2024 10:12:18 +0000 (10:12 +0000)]
selftests: net: add missing config for GENEVE
l2_tos_ttl_inherit.sh verifies the inheritance of tos and ttl
for GRETAP, VXLAN and GENEVE.
Before testing it checks if the required module is available
and if not skips the tests accordingly.
Currently only GRETAP and VXLAN are tested because the GENEVE
module is missing.
Masahiro Yamada [Fri, 26 Jan 2024 13:30:10 +0000 (22:30 +0900)]
kconfig: initialize sym->curr.tri to 'no' for all symbol types again
Geert Uytterhoeven reported that commit 4e244c10eab3 ("kconfig: remove
unneeded symbol_empty variable") changed the default value of
CONFIG_LOG_CPU_MAX_BUF_SHIFT from 12 to 0.
As it turned out, this is an undefined behavior because sym_calc_value()
stopped setting the sym->curr.tri field for 'int', 'hex', and 'string'
symbols.
This commit restores the original behavior, where 'int', 'hex', 'string'
symbols are interpreted as false if used in boolean contexts.
CONFIG_LOG_CPU_MAX_BUF_SHIFT will default to 12 again, irrespective
of CONFIG_BASE_SMALL. Presumably, this is not the intended behavior,
as already reported [1], but this is another issue that should be
addressed by a separate patch.
The new installkernel application that is now included in systemd-udev
package allows installation although destination files are already present
in the boot directory of the kernel package, but is failing with the
implemented workaround for the old installkernel application from grubby
package.
For the new installkernel application, as Davide says:
<<The %post currently does a shuffling dance before calling installkernel.
This isn't actually necessary afaict, and the current implementation
ends up triggering downstream issues such as
https://github.com/systemd/systemd/issues/29568
This commit simplifies the logic to remove the shuffling. For reference,
the original logic was added in commit 3c9c7a14b627("rpm-pkg: add %post
section to create initramfs and grub hooks").>>
But we need to keep the old behavior as well, because the old installkernel
application from grubby package, does not allow this simplification and
we need to be backward compatible to avoid issues with the different
packages.
Mimic Fedora shipping process and store vmlinuz, config amd System.map
in the module directory instead of the boot directory. In this way, we will
avoid the commented problem for all the cases, because the new destination
files are not going to exist in the boot directory of the kernel package.
Replace installkernel tool with kernel-install tool, because the latter is
more complete.
Besides, after installkernel tool execution, check to complete if the
correct package files vmlinuz, System.map and config files are present
in /boot directory, and if necessary, copy manually for install operation.
In this way, take into account if files were not previously copied from
/usr/lib/kernel/install.d/* scripts and if the suitable files for the
requested package are present (it could be others if the rpm files were
replace with a new pacakge with the same release and a different build).
Tested with Fedora 38, Fedora 39, RHEL 9, Oracle Linux 9.3,
openSUSE Tumbleweed and openMandrive ROME, using dnf/zypper and rpm tools.
Dmitry Goncharov [Sun, 28 Jan 2024 15:10:39 +0000 (10:10 -0500)]
kbuild: Replace tabs with spaces when followed by conditionals
This is needed for the future (post make-4.4.1) versions of gnu make.
Starting from https://git.savannah.gnu.org/cgit/make.git/commit/?id=07fcee35f058a876447c8a021f9eb1943f902534
gnu make won't allow conditionals to follow recipe prefix.
For example there is a tab followed by ifeq on line 324 in the root Makefile.
With the new make this conditional causes the following
Masahiro Yamada [Sat, 27 Jan 2024 13:28:11 +0000 (22:28 +0900)]
modpost: avoid using the alias attribute
Aiden Leong reported modpost fails to build on macOS since commit 16a473f60edc ("modpost: inform compilers that fatal() never returns"):
scripts/mod/modpost.c:93:21: error: aliases are not supported on darwin
Nathan's research indicates that Darwin seems to support weak aliases
at least [1]. Although the situation might be improved in future Clang
versions, we can achieve a similar outcome without relying on it.
This commit makes fatal() a macro of error() + exit(1) in modpost.h, as
compilers recognize that exit() never returns.
Helge Deller [Wed, 31 Jan 2024 12:37:25 +0000 (13:37 +0100)]
parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
When using hotplug and bringing up a 32-bit CPU, ask the firmware about the
BTLB information to set up the static (block) TLB entries.
For that write access to the static btlb_info struct is needed, but
since it is marked __ro_after_init the kernel segfaults with missing
write permissions.
Fix the crash by dropping the __ro_after_init annotation.
Fixes: e5ef93d02d6c ("parisc: BTLB: Initialize BTLB tables at CPU startup") Signed-off-by: Helge Deller <[email protected]> Cc: <[email protected]> # v6.6+
Follow the docs at Documentation/bpf/kfuncs.rst:
- declare the function with `__bpf_kfunc`
- disables missing prototype warnings, which allows to remove them from
include/linux/hid-bpf.h
Removing the prototypes is not an issue because we currently have to
redeclare them when writing the BPF program. They will eventually be
generated by bpftool directly AFAIU.
HID: bpf: actually free hdev memory after attaching a HID-BPF program
Turns out that I got my reference counts wrong and each successful
bus_find_device() actually calls get_device(), and we need to manually
call put_device().
Ensure each bus_find_device() gets a matching put_device() when releasing
the bpf programs and fix all the error paths.
When the kfunc hid_bpf_attach_prog() is called, we called twice fdget():
one for fetching the type of the bpf program, and one for actually
attaching the program to the device.
The problem is that between those two calls, we have no guarantees that
the prog_fd is still the same file descriptor for the given program.
Solve this by calling bpf_prog_get() earlier, and use this to fetch the
program type.
Linus Torvalds [Wed, 31 Jan 2024 05:27:50 +0000 (21:27 -0800)]
Merge tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- fix an infinite loop issue of sub-page compressed data support found
with lengthy stress tests on a 64k-page arm64 VM
- optimize the temporary buffer allocation for low-memory scenarios,
which can reduce 20.21% on average under a heavy multi-app launch
benchmark workload
- get rid of unnecessary GFP_NOFS
* tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: relaxed temporary buffers allocation on readahead
erofs: fix infinite loop due to a race of filling compressed_bvecs
erofs: get rid of unneeded GFP_NOFS
Jakub Kicinski [Wed, 31 Jan 2024 02:35:27 +0000 (18:35 -0800)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-01-29 (e1000e, ixgbe)
This series contains updates to e1000e and ixgbe drivers.
Jake corrects values used for maximum frequency adjustment for e1000e.
Christophe Jaillet adjusts error handling path so that semaphore is
released on ixgbe.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
e1000e: correct maximum frequency adjustment values
====================
Parav Pandit [Mon, 29 Jan 2024 19:10:59 +0000 (21:10 +0200)]
devlink: Fix referring to hw_addr attribute during state validation
When port function state change is requested, and when the driver
does not support it, it refers to the hw address attribute instead
of state attribute. Seems like a copy paste error.
Fix it by referring to the port function state attribute.
Linus Lüssing [Sat, 27 Jan 2024 17:50:32 +0000 (18:50 +0100)]
bridge: mcast: fix disabled snooping after long uptime
The original idea of the delay_time check was to not apply multicast
snooping too early when an MLD querier appears. And to instead wait at
least for MLD reports to arrive before switching from flooding to group
based, MLD snooped forwarding, to avoid temporary packet loss.
However in a batman-adv mesh network it was noticed that after 248 days of
uptime 32bit MIPS based devices would start to signal that they had
stopped applying multicast snooping due to missing queriers - even though
they were the elected querier and still sending MLD queries themselves.
While time_is_before_jiffies() generally is safe against jiffies
wrap-arounds, like the code comments in jiffies.h explain, it won't
be able to track a difference larger than ULONG_MAX/2. With a 32bit
large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS
devices running OpenWrt this would result in a difference larger than
ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and
time_is_before_jiffies() would then start to return false instead of
true. Leading to multicast snooping not being applied to multicast
packets anymore.
Fix this issue by using a proper timer_list object which won't have this
ULONG_MAX/2 difference limitation.
Ido Schimmel [Mon, 29 Jan 2024 12:37:03 +0000 (14:37 +0200)]
selftests: net: Add missing matchall classifier
One of the test cases in the test_bridge_backup_port.sh selftest relies
on a matchall classifier to drop unrelated traffic so that the Tx drop
counter on the VXLAN device will only be incremented as a result of
traffic generated by the test.
However, the configuration option for the matchall classifier is
missing from the configuration file which might explain the failures we
see in the netdev CI [1].
Fix by adding CONFIG_NET_CLS_MATCHALL to the configuration file.
[1]
# Backup nexthop ID - invalid IDs
# -------------------------------
[...]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
# TEST: IPv6 address family nexthop as backup nexthop [ OK ]
# TEST: No forwarding out of swp1 [ OK ]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
[...]
Linus Torvalds [Tue, 30 Jan 2024 23:12:58 +0000 (15:12 -0800)]
Merge tag 'linux_kselftest-kunit-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"NULL vs IS_ERR() bug fixes, documentation update, MAINTAINERS file
update to add Rae Moar as a reviewer, and a fix to run test suites
only after module initialization completes"
* tag 'linux_kselftest-kunit-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: KUnit: Update the instructions on how to test static functions
kunit: run test suites only after module initialization completes
MAINTAINERS: kunit: Add Rae Moar as a reviewer
kunit: device: Fix a NULL vs IS_ERR() check in init()
kunit: Fix a NULL vs IS_ERR() bug
Linus Torvalds [Tue, 30 Jan 2024 23:10:17 +0000 (15:10 -0800)]
Merge tag 'linux_kselftest-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Three fixes to livepatch, rseq, and seccomp tests"
* tag 'linux_kselftest-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest/seccomp: Report each expectation we assert as a KTAP test
kselftest/seccomp: Use kselftest output functions for benchmark
selftests/livepatch: fix and refactor new dmesg message code
selftests/rseq: Do not skip !allowed_cpus for mm_cid
Ondrej Mosnacek [Fri, 26 Jan 2024 18:45:31 +0000 (19:45 +0100)]
lsm: fix default return value of the socket_getpeersec_*() hooks
For these hooks the true "neutral" value is -EOPNOTSUPP, which is
currently what is returned when no LSM provides this hook and what LSMs
return when there is no security context set on the socket. Correct the
value in <linux/lsm_hooks.h> and adjust the dispatch functions in
security/security.c to avoid issues when the BPF LSM is enabled.
Cc: [email protected] Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <[email protected]>
[PM: subject line tweak] Signed-off-by: Paul Moore <[email protected]>
Linus Torvalds [Tue, 30 Jan 2024 19:34:49 +0000 (11:34 -0800)]
soc: apple: mailbox: error pointers are negative integers
In an entirely unrelated discussion where I pointed out a stupid thinko
of mine, Rasmus piped up and noted that that obvious mistake already
existed elsewhere in the kernel tree.
An "error pointer" is the negative error value encoded as a pointer,
making the whole "return error or valid pointer" use-case simple and
straightforward. We use it all over the kernel.
But the key here is that errors are _negative_ error numbers, not the
horrid UNIX user-level model of "-1 and the value of 'errno'".
The Apple mailbox driver used the positive error values, and thus just
returned invalid normal pointers instead of actual errors.
Of course, the reason nobody ever noticed is that the errors presumably
never actually happen, so this is fixing a conceptual bug rather than an
actual one.
Helge Deller [Sat, 20 Jan 2024 14:29:27 +0000 (15:29 +0100)]
parisc: Fix random data corruption from exception handler
The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the compiler decides
to use a different register than the specified register %r29 (defined in
ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another
register, the fault handler will nevertheless store -EFAULT into %r29 and thus
trash whatever this register is used for.
Looking at the assembly I found that this happens sometimes in emulate_ldd().
To solve the issue, the easiest solution would be if it somehow is
possible to tell the fault handler which register is used to hold the error
code. Using %0 or %1 in the inline assembly is not posssible as it will show
up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not
convert to an integer.
This patch takes another, better and more flexible approach:
We extend the __ex_table (which is out of the execution path) by one 32-word.
In this word we tell the compiler to insert the assembler instruction
"or %r0,%r0,%reg", where %reg references the register which the compiler
choosed for the error return code.
In case of an access failure, the fault handler finds the __ex_table entry and
can examine the opcode. The used register is encoded in the lowest 5 bits, and
the fault handler can then store -EFAULT into this register.
Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT
config option any longer.
Mark Brown [Wed, 24 Jan 2024 13:00:19 +0000 (13:00 +0000)]
kselftest/seccomp: Report each expectation we assert as a KTAP test
The seccomp benchmark test makes a number of checks on the performance it
measures and logs them to the output but does so in a custom format which
none of the automated test runners understand meaning that the chances that
anyone is paying attention are slim. Let's additionally log each result in
KTAP format so that automated systems parsing the test output will see each
comparison as a test case. The original logs are left in place since they
provide the actual numbers for analysis.
As part of this rework the flow for the main program so that when we skip
tests we still log all the tests we skip, this is because the standard KTAP
headers and footers include counts of the number of expected and run tests.
Mark Brown [Wed, 24 Jan 2024 13:00:18 +0000 (13:00 +0000)]
kselftest/seccomp: Use kselftest output functions for benchmark
In preparation for trying to output the test results themselves in TAP
format rework all the prints in the benchmark to use the kselftest output
functions. The uses of system() all produce single line output so we can
avoid having to deal with fully managing the child process and continue to
use system() by simply printing an empty message before we invoke system().
We also leave one printf() used to complete a line of output in place.
Joe Lawrence [Wed, 20 Dec 2023 15:11:51 +0000 (10:11 -0500)]
selftests/livepatch: fix and refactor new dmesg message code
The livepatching kselftests rely on comparing expected vs. observed
dmesg output. After each test, new dmesg entries are determined by the
'comm' utility comparing a saved, pre-test copy of dmesg to post-test
dmesg output.
Alexander reports that the 'comm --nocheck-order -13' invocation used by
the tests can be confused when dmesg entry timestamps vary in magnitude
(ie, "[ 98.820331]" vs. "[ 100.031067]"), in which case, additional
messages are reported as new. The unexpected entries then spoil the
test results.
Instead of relying on 'comm' or 'diff' to determine new testing dmesg
entries, refactor the code:
- pre-test : log a unique canary dmesg entry
- test : run tests, log messages
- post-test : filter dmesg starting from pre-test message
Patrick Rudolph [Tue, 30 Jan 2024 15:02:56 +0000 (20:32 +0530)]
regulator (max5970): Fix IRQ handler
The max5970 datasheet gives the impression that IRQ status bits must
be cleared by writing a one to set bits, as those are marked with 'R/C',
however tests showed that a zero must be written.
Fixes an IRQ storm as the interrupt handler actually clears the IRQ
status bits.
Eric Dumazet [Fri, 26 Jan 2024 16:55:32 +0000 (16:55 +0000)]
llc: call sock_orphan() at release time
syzbot reported an interesting trace [1] caused by a stale sk->sk_wq
pointer in a closed llc socket.
In commit ff7b11aa481f ("net: socket: set sock->sk to NULL after
calling proto_ops::release()") Eric Biggers hinted that some protocols
are missing a sock_orphan(), we need to perform a full audit.
In net-next, I plan to clear sock->sk from sock_orphan() and
amend Eric patch to add a warning.
[1]
BUG: KASAN: slab-use-after-free in list_empty include/linux/list.h:373 [inline]
BUG: KASAN: slab-use-after-free in waitqueue_active include/linux/wait.h:127 [inline]
BUG: KASAN: slab-use-after-free in sock_def_write_space_wfree net/core/sock.c:3384 [inline]
BUG: KASAN: slab-use-after-free in sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
Read of size 8 at addr ffff88802f4fc880 by task ksoftirqd/1/27
The buggy address belongs to the object at ffff88802f4fc800
which belongs to the cache sock_inode_cache of size 1408
The buggy address is located 128 bytes inside of
freed 1408-byte region [ffff88802f4fc800, ffff88802f4fcd80)
Memory state around the buggy address: ffff88802f4fc780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88802f4fc800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802f4fc880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88802f4fc900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88802f4fc980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
====================
net: stmmac: dwmac-imx: Time Based Scheduling support
This small patch series allows using TBS support of the i.MX Ethernet QOS
controller for etf qdisc offload.
It achieves this in a similar manner that it is done in dwmac-intel.c,
dwmac-mediatek.c and stmmac_pci.c.
Changes since v1:
- Simplified for loop by starting at index 1.
- Fixed problem with indentation.
====================
Helge Deller [Fri, 26 Jan 2024 08:32:20 +0000 (09:32 +0100)]
ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
On a parisc64 kernel I sometimes notice this kernel warning:
Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8
The address 0x40ff8814 points to the in6addr_linklocal_allrouters
variable and the warning simply means that some ipv6 function tries to
read a 64-bit word directly from the not-64-bit aligned
in6addr_linklocal_allrouters variable.
Unaligned accesses are non-critical as the architecture or exception
handlers usually will fix it up at runtime. Nevertheless it may trigger
a performance penality for some architectures. For details read the
"unaligned-memory-access" kernel documentation.
The patch below ensures that the ipv6 loopback and router addresses will
always be naturally aligned. This prevents the unaligned accesses for
all architectures.
Linus Torvalds [Tue, 30 Jan 2024 02:42:34 +0000 (18:42 -0800)]
Merge tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two small fixes for tracefs and eventfs:
- Fix register_snapshot_trigger() on allocation error
If the snapshot fails to allocate, the register_snapshot_trigger()
can still return success. If the call to
tracing_alloc_snapshot_instance() returned anything but 0, it
returned 0, but it should have been returning the error code from
that allocation function.
- Remove leftover code from tracefs doing a dentry walk on remount.
The update_gid() function was called by the tracefs code on remount
to update the gid of eventfs, but that is no longer the case, but
that code wasn't deleted. Nothing calls it. Remove it"
* tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: remove stale 'update_gid' code
tracing/trigger: Fix to return error if failed to alloc snapshot
Michal Vokáč [Fri, 26 Jan 2024 10:49:35 +0000 (11:49 +0100)]
net: dsa: qca8k: fix illegal usage of GPIO
When working with GPIO, its direction must be set either when the GPIO is
requested by gpiod_get*() or later on by one of the gpiod_direction_*()
functions. Neither of this is done here which results in undefined
behavior on some systems.
As the reset GPIO is used right after it is requested here, it makes sense
to configure it as GPIOD_OUT_HIGH right away. With that, the following
gpiod_set_value_cansleep(1) becomes redundant and can be safely
removed.
Linus Torvalds [Tue, 30 Jan 2024 01:12:16 +0000 (17:12 -0800)]
Merge tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"22 hotfixes. 11 are cc:stable and the remainder address post-6.7
issues or aren't considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: thp_get_unmapped_area must honour topdown preference
mm: huge_memory: don't force huge page alignment on 32 bit
userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
scs: add CONFIG_MMU dependency for vfree_atomic()
mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range()
mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty()
selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
selftests: mm: fix map_hugetlb failure on 64K page size systems
MAINTAINERS: supplement of zswap maintainers update
stackdepot: make fast paths lock-less again
stackdepot: add stats counters exported via debugfs
mm, kmsan: fix infinite recursion due to RCU critical section
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
selftests/mm: switch to bash from sh
MAINTAINERS: add man-pages git trees
mm: memcontrol: don't throttle dying tasks on memory.high
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
uprobes: use pagesize-aligned virtual address when replacing pages
selftests/mm: mremap_test: fix build warning
...