Merge branch 'bpf: fix NULL dereference during extable search'
Krister Johansen says:
====================
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running. One of the bpf_prog_aux entries
for the subprograms are missing an extable. This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.
These changes were tested via the verifier and progs selftests and no
regressions were observed.
Changes from v4:
- Ensure that num_exentries is copied to prog->aux from func[0] (Feedback from
Ilya Leoshkevich)
Changes from v3:
- Selftest style fixups (Feedback from Yonghong Song)
- Selftest needs to assert that test bpf program executed (Feedback from
Yonghong Song)
- Selftest should combine open and load using open_and_load (Feedback from
Yonghong Song)
Changes from v2:
- Insert only the main program's kallsyms (Feedback from Yonghong Song and
Alexei Starovoitov)
- Selftest should use ASSERT instead of CHECK (Feedback from Yonghong Song)
- Selftest needs some cleanup (Feedback from Yonghong Song)
- Switch patch order (Feedback from Alexei Starovoitov)
Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
====================
Krister Johansen [Tue, 13 Jun 2023 00:44:54 +0000 (17:44 -0700)]
selftests/bpf: add a test for subprogram extables
In certain situations a program with subprograms may have a NULL
extable entry. This should not happen, and when it does, it turns a
single trap into multiple. Add a test case for further debugging and to
prevent regressions.
The test-case contains three essentially identical versions of the same
test because just one program may not be sufficient to trigger the oops.
This is due to the fact that the items are stored in a binary tree and
have identical values so it's possible to sometimes find the ksym with
the extable. With 3 copies, this has been reliable on this author's
test systems.
When triggered out of this test case, the oops looks like this:
Krister Johansen [Tue, 13 Jun 2023 00:44:40 +0000 (17:44 -0700)]
bpf: ensure main program has an extable
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Janne Grunau [Sun, 12 Feb 2023 12:16:32 +0000 (13:16 +0100)]
nios2: dts: Fix tse_mac "max-frame-size" property
The given value of 1518 seems to refer to the layer 2 ethernet frame
size without 802.1Q tag. Actual use of the "max-frame-size" including in
the consumer of the "altr,tse-1.0" compatible is the MTU.
Abe Kohandel [Tue, 13 Jun 2023 16:21:03 +0000 (09:21 -0700)]
spi: dw: Replace incorrect spi_get_chipselect with set
Commit 445164e8c136 ("spi: dw: Replace spi->chip_select references with
function calls") replaced direct access to spi.chip_select with
spi_*_chipselect calls but incorrectly replaced a set instance with a
get instance, replace the incorrect instance.
Rob Herring [Tue, 18 Apr 2023 15:06:06 +0000 (10:06 -0500)]
dt-bindings: pinctrl: qcom,pmic-mpp: Fix schema for "qcom,paired"
The "qcom,paired" schema is all wrong. First, it's a list rather than an
object(dictionary). Second, it is missing a required type. The meta-schema
normally catches this, but schemas under "$defs" was not getting checked.
A fix for that is pending.
Takashi Iwai [Tue, 13 Jun 2023 11:22:40 +0000 (13:22 +0200)]
regmap: regcache: Don't sync read-only registers
regcache_maple_sync() tries to sync all cached values no matter
whether it's writable or not. OTOH, regache_sync_val() does care the
wrtability and returns -EIO for a read-only register. This results in
an error message like:
snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2f0009. -5
and the sync loop is aborted incompletely.
This patch adds the writable register check to regcache_sync_val() for
addressing the bug above.
Note that, although we may add the check in the caller side
(regcache_maple_sync()), here we put in regcache_sync_val(), so that a
similar case like this can be avoided in future.
Jakub Kicinski [Mon, 12 Jun 2023 23:55:45 +0000 (16:55 -0700)]
Merge branch 'mptcp-fixes'
Matthieu Baerts says:
====================
selftests: mptcp: skip tests not supported by old kernels (part 3)
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests don't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftest or just some sub-tests before starting them. This cannot be
applied in all cases.
Similar to the second part, this third one focuses on marking different
sub-tests as skipped if some MPTCP features are not supported. This
time, only in "mptcp_join.sh" selftest, the remaining one, is modified.
Several techniques are used here to achieve this task:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 12/17 is
doing for the userspace PM feature.
- Check if a required kernel symbol is present in /proc/kallsyms:
patches 9, 10, 14 and 15/17 are using this technique.
- Check if it is possible to setup a particular network environment
requiring Netfilter or TC: if the preparation step fail, the linked
sub-test is marked as skipped. Patch 5/17 is doing that.
- Check if a MIB counter is available: patches 7 and 13/17 do that.
- Check if the kernel version is newer than a specific one: patch 1/17
adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
and it is only used as last resort but as mentioned above, it is
important to skip tests if they are not supported not to have the
whole selftest always being marked as failed on old kernels. Patches
11 and 17/17 are checking the kernel version. An alternative would
be to ignore the results for some sub-tests but that's not ideal
too. Note that SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be
set to 1 not to skip these tests if the running kernel doesn't have
a supported version.
- After having launched the tests:
- Adapt the expectations depending on the presence of a kernel symbol
(patch 6/17) or a kernel version (patch 8/17).
- Check is a MIB counter is available and skip the verification if
not. Patch 4/17 is using this technique.
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
Patch 2/17 uses 'iptables-legacy' if available because it might be
needed when using an older kernel not supporting iptables-nft.
Patch 3/17 adds some helpers used in the other patches mentioned to
easily mark sub-tests as skipped.
Patch 16/17 uniforms MPTCP Join "listener" tests: it was imported code
from userspace_pm.sh but without using the "code style" and ways of
using tools and printing messages from MPTCP Join selftest.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:52 +0000 (18:11 +0200)]
selftests: mptcp: join: skip mixed tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of a mix of subflows in v4 and v6 by the
in-kernel PM introduced by commit b9d69db87fb7 ("mptcp: let the
in-kernel PM use mixed IPv4 and IPv6 addresses").
It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:51 +0000 (18:11 +0200)]
selftests: mptcp: join: uniform listener tests
The alignment was different from the other tests because tabs were used
instead of spaces.
While at it, also use 'echo' instead of 'printf' to print the result to
keep the same style as done in the other sub-tests. And, even if it
should be better with, also remove 'stdbuf' and sed's '--unbuffered'
option because they are not used in the other subtests and they are not
available when using a minimal environment with busybox.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:49 +0000 (18:11 +0200)]
selftests: mptcp: join: skip MPC backups tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of sending an MP_PRIO signal for the initial
subflow, introduced by commit c157bbe776b7 ("mptcp: allow the in kernel
PM to set MPC subflow priority").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:48 +0000 (18:11 +0200)]
selftests: mptcp: join: skip fail tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the MP_FAIL / infinite mapping introduced
by commit 1e39e5a32ad7 ("mptcp: infinite mapping sending") and the
following ones.
It is possible to look for one of the infinite mapping counters to know
in advance if the this feature is available.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:47 +0000 (18:11 +0200)]
selftests: mptcp: join: skip userspace PM tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the userspace PM introduced by commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs")
and the following ones.
It is possible to look for the MPTCP pm_type's sysctl knob to know in
advance if the userspace PM is available.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:46 +0000 (18:11 +0200)]
selftests: mptcp: join: skip fullmesh flag tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the fullmesh flag for the in-kernel PM
introduced by commit 2843ff6f36db ("mptcp: remote addresses fullmesh")
and commit 1a0d6136c5f0 ("mptcp: local addresses fullmesh").
It looks like there is no easy external sign we can use to predict the
expected behaviour. We could add the flag and then check if it has been
added but for that, and for each fullmesh test, we would need to setup a
new environment, do the checks, clean it and then only start the test
from yet another clean environment. To keep it simple and avoid
introducing new issues, we look for a specific kernel version. That's
not ideal but an acceptable solution for this case.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:45 +0000 (18:11 +0200)]
selftests: mptcp: join: skip backup if set flag on ID not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Commit bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint")
has simplified the way the backup flag is set on an endpoint. Instead of
doing:
./pm_nl_ctl set 10.0.2.1 flags backup
Now we do:
./pm_nl_ctl set id 1 flags backup
The new way is easier to maintain but it is also incompatible with older
kernels not supporting the implicit endpoints putting in place the
infrastructure to set flags per ID, hence the second Fixes tag.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:44 +0000 (18:11 +0200)]
selftests: mptcp: join: skip implicit tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the implicit endpoints introduced by
commit d045b9eb95a9 ("mptcp: introduce implicit endpoints").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.
Note that here and in the following commits, we re-do the same check for
each sub-test of the same function for a few reasons. The main one is
not to break the ID assign to each test in order to be able to easily
compare results between different kernel versions. Also, we can still
run a specific test even if it is skipped. Another reason is that it
makes it clear during the review that a specific subtest will be skipped
or not under certain conditions. At the end, it looks OK to call the
exact same helper multiple times: it is not a critical path and it is
the same code that is executed, not really more cases to maintain.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:43 +0000 (18:11 +0200)]
selftests: mptcp: join: support RM_ADDR for used endpoints or not
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a UAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.
It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:42 +0000 (18:11 +0200)]
selftests: mptcp: join: skip Fastclose tests if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of MP_FASTCLOSE introduced in commit f284c0c77321 ("mptcp: implement fastclose xmit path").
If the MIB counter is not available, the test cannot be verified and the
behaviour will not be the expected one. So we can skip the test if the
counter is missing.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:41 +0000 (18:11 +0200)]
selftests: mptcp: join: support local endpoint being tracked or not
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a uAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.
It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms
because it was needed to introduce the mentioned feature. So we can know
in advance what the behaviour we are expecting here instead of
supporting the two behaviours.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:40 +0000 (18:11 +0200)]
selftests: mptcp: join: skip test if iptables/tc cmds fail
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:39 +0000 (18:11 +0200)]
selftests: mptcp: join: skip check if MIB counter not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the MPTCP MIB counters introduced in commit fc518953bc9c
("mptcp: add and use MIB counter infrastructure") and more later. The
MPTCP Join selftest heavily relies on these counters.
If a counter is not supported by the kernel, it is not displayed when
using 'nstat -z'. We can then detect that and skip the verification. A
new helper (get_counter()) has been added to do the required checks and
return an error if the counter is not available.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This new helper also makes sure we get the exact counter we want to
avoid issues we had in the past, e.g. with MPTcpExtRmAddr and
MPTcpExtRmAddrDrop sharing the same prefix. While at it, we uniform the
way we fetch a MIB counter.
Note for the backports: we rarely change these modified blocks so if
there is are conflicts, it is very likely because a counter is not used
in the older kernels and we don't need that chunk.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:38 +0000 (18:11 +0200)]
selftests: mptcp: join: helpers to skip tests
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Here are some helpers that will be used to mark subtests as skipped if a
feature is not supported. Marking as a fix for the commit introducing
this selftest to help with the backports.
While at it, also check if kallsyms feature is available as it will also
be used in the following commits to check if MPTCP features are
available before starting a test.
Matthieu Baerts [Sat, 10 Jun 2023 16:11:36 +0000 (18:11 +0200)]
selftests: mptcp: lib: skip if not below kernel version
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
A new function is now available to easily detect if a feature is
missing by looking at the kernel version. That's clearly not ideal and
this kind of check should be avoided as soon as possible. But sometimes,
there are no external sign that a "feature" is available or not:
internal behaviours can change without modifying the uAPI and these
selftests are verifying the internal behaviours. Sometimes, the only
(easy) way to verify if the feature is present is to run the test but
then the validation cannot determine if there is a failure with the
feature or if the feature is missing. Then it looks better to check the
kernel version instead of having tests that can never fail. In any case,
we need a solution not to have a whole selftest being marked as failed
just because one sub-test has failed.
Note that this env var car be set to 1 not to do such check and run the
linked sub-test: SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK.
This new helper is going to be used in the following commits. In order
to ease the backport of such future patches, it would be good if this
patch is backported up to the introduction of MPTCP selftests, hence the
Fixes tag below: this type of check was supposed to be done from the
beginning.
====================
fixes for Q-USGMII speeds and autoneg
This is the second version of a small changeset for QUSGMII support,
fixing inconsistencies in reported max speed and control word parsing.
As reported here [1], there are some inconsistencies for the Q-USGMII
mode speeds and configuration. The first patch in this fixup series
makes so that we correctly report the max speed of 1Gbps for this mode.
The second patch uses a dedicated helper to decode the control word.
This is necessary as although USGMII control words are close to USXGMII,
they don't support the same speeds.
net: phylink: use a dedicated helper to parse usgmii control word
Q-USGMII is a derivative of USGMII, that uses a specific formatting for
the control word. The layout is close to the USXGMII control word, but
doesn't support speeds over 1Gbps. Use a dedicated decoding logic for
the USGMII control word, re-using USXGMII definitions but only considering
10/100/1000Mbps speeds
Fixes: 5e61fe157a27 ("net: phy: Introduce QUSGMII PHY mode") Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: phylink: report correct max speed for QUSGMII
Q-USGMII is the quad port version of USGMII, and supports a max speed of
1Gbps on each line. Make so that phylink_interface_max_speed() reports
this information correctly.
Fixes: ae0e4bb2a0e0 ("net: phylink: Adjust link settings based on rate matching") Signed-off-by: Maxime Chevallier <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Linus Torvalds [Mon, 12 Jun 2023 23:14:34 +0000 (16:14 -0700)]
Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"19 hotfixes. 14 are cc:stable and the remainder address issues which
were introduced during this development cycle or which were considered
inappropriate for a backport"
* tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
zswap: do not shrink if cgroup may not zswap
page cache: fix page_cache_next/prev_miss off by one
ocfs2: check new file size on fallocate call
mailmap: add entry for John Keeping
mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()
epoll: ep_autoremove_wake_function should use list_del_init_careful
mm/gup_test: fix ioctl fail for compat task
nilfs2: reject devices with insufficient block count
ocfs2: fix use-after-free when unmounting read-only filesystem
lib/test_vmalloc.c: avoid garbage in page array
nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
riscv/purgatory: remove PGO flags
powerpc/purgatory: remove PGO flags
x86/purgatory: remove PGO flags
kexec: support purgatories with .text.hot sections
mm/uffd: allow vma to merge as much as possible
mm/uffd: fix vma operation where start addr cuts part of vma
radix-tree: move declarations to header
nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
igc: Fix possible system crash when loading module
Guarantee that when probe() is run again, PTM and PCI busmaster will be
in the same state as it was if the driver was never loaded.
Avoid an i225/i226 hardware issue that PTM requests can be made even
though PCI bus mastering is not enabled. These unexpected PTM requests
can crash some systems.
So, "force" disable PTM and busmastering before removing the driver,
so they can be re-enabled in the right order during probe(). This is
more like a workaround and should be applicable for i225 and i226, in
any platform.
There could be a race condition during link down where interrupt
being generated and igc_clean_tx_irq() been called to perform the
TX completion. Properly clear the TX buffer/descriptor ring and
disable the TX Queue ring in igc_free_tx_resources() to avoid that.
Nhat Pham [Tue, 30 May 2023 22:24:40 +0000 (15:24 -0700)]
zswap: do not shrink if cgroup may not zswap
Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy. If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.
However, since the zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is 0 will
evict pages from other cgroups continually, without lowering its own zswap
usage. This means the shrinking will continue until the need for swap
ceases or the pool becomes empty.
As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.
More generally, a cgroup might unnecessarily evict pages from other
cgroups before we drive the memcg back below its limit.
This patch fixes the issue by rejecting zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.
Mike Kravetz [Fri, 2 Jun 2023 22:57:47 +0000 (15:57 -0700)]
page cache: fix page_cache_next/prev_miss off by one
Ackerley Tng reported an issue with hugetlbfs fallocate here[1]. The
issue showed up after the conversion of hugetlb page cache lookup code to
use page_cache_next_miss. Code in hugetlb fallocate, userfaultfd and GUP
is now using page_cache_next_miss to determine if a page is present the
page cache. The following statement is used.
There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
the same index will always be returned. This wrap-around value is 0,
so 0 will be returned even if page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
will be returned. When passed a range of 1 as above, the passed
index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.
As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file. In
addition, hugetlb pages will not be included in core dumps if they need to
be brought in via GUP. userfaultfd UFFDIO_COPY also uses this code and
will not notice pages already present in the cache. It may try to
allocate a new page and potentially return ENOMEM as opposed to EEXIST.
Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
returned if passed as index.
Luís Henriques [Mon, 29 May 2023 15:26:45 +0000 (16:26 +0100)]
ocfs2: check new file size on fallocate call
When changing a file size with fallocate() the new size isn't being
checked. In particular, the FSIZE ulimit isn't being checked, which makes
fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes
this issue.
Kefeng Wang [Sat, 27 May 2023 03:21:01 +0000 (11:21 +0800)]
mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()
If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in
damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide
error, let's validate the values of them in damon_set_attrs() to fix it,
which similar to others attrs check.
Benjamin Segall [Tue, 30 May 2023 18:32:28 +0000 (11:32 -0700)]
epoll: ep_autoremove_wake_function should use list_del_init_careful
autoremove_wake_function uses list_del_init_careful, so should epoll's
more aggressive variant. It only doesn't because it was copied from an
older wait.c rather than the most recent.
Ryusuke Konishi [Fri, 26 May 2023 02:13:32 +0000 (11:13 +0900)]
nilfs2: reject devices with insufficient block count
The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.
This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.
Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.
Luís Henriques [Mon, 22 May 2023 10:21:12 +0000 (11:21 +0100)]
ocfs2: fix use-after-free when unmounting read-only filesystem
It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using
fstest generic/452. After a read-only remount, quotas are suspended and
ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting
the filesystem, an UAF access to the oinfo will eventually cause a crash.
BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0
Read of size 8 at addr ffff8880389a8208 by task umount/669
...
Call Trace:
<TASK>
...
timer_delete+0x54/0xc0
try_to_grab_pending+0x31/0x230
__cancel_work_timer+0x6c/0x270
ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2]
ocfs2_dismount_volume+0xdd/0x450 [ocfs2]
generic_shutdown_super+0xaa/0x280
kill_block_super+0x46/0x70
deactivate_locked_super+0x4d/0xb0
cleanup_mnt+0x135/0x1f0
...
</TASK>
Lorenzo Stoakes [Wed, 24 May 2023 08:24:24 +0000 (09:24 +0100)]
lib/test_vmalloc.c: avoid garbage in page array
It turns out that alloc_pages_bulk_array() does not treat the page_array
parameter as an output parameter, but rather reads the array and skips any
entries that have already been allocated.
This is somewhat unexpected and breaks this test, as we allocate the pages
array uninitialised on the assumption it will be overwritten.
As a result, the test was referencing uninitialised data and causing the
PFN to not be valid and thus a WARN_ON() followed by a null pointer deref
and panic.
In addition, this is an array of pointers not of struct page objects, so we
need only allocate an array with elements of pointer size.
We solve both problems by simply using kcalloc() and referencing
sizeof(struct page *) rather than sizeof(struct page).
Ryusuke Konishi [Wed, 24 May 2023 09:43:48 +0000 (18:43 +0900)]
nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
Syzbot reports that in its stress test for resize ioctl, the log writing
function nilfs_segctor_do_construct hits a WARN_ON in
nilfs_segctor_truncate_segments().
It turned out that there is a problem with the current implementation of
the resize ioctl, which changes the writable range on the device (the
range of allocatable segments) at the end of the resize process.
This order is necessary for file system expansion to avoid corrupting the
superblock at trailing edge. However, in the case of a file system
shrink, if log writes occur after truncating out-of-bounds trailing
segments and before the resize is complete, segments may be allocated from
the truncated space.
The userspace resize tool was fine as it limits the range of allocatable
segments before performing the resize, but it can run into this issue if
the resize ioctl is called alone.
Fix this issue by changing nilfs_sufile_resize() to update the range of
allocatable segments immediately after successful truncation of segment
space in case of file system shrink.
And both of them have their range [sh_addr ... sh_addr+sh_size] on the
area pointed by `e_entry`.
This causes that image->start is calculated twice, once for .text and
another time for .text.hot. The second calculation leaves image->start
in a random location.
Because of this, the system crashes immediately after:
Peter Xu [Wed, 17 May 2023 19:09:16 +0000 (15:09 -0400)]
mm/uffd: allow vma to merge as much as possible
We used to not pass in the pgoff correctly when register/unregister uffd
regions, it caused incorrect behavior on vma merging and can cause
mergeable vmas being separate after ioctls return.
For example, when we have:
vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
Then someone unregisters uffd on range (5-9), it should logically become:
vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
But with current code we'll have:
vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
This patch allows such merge to happen correctly before ioctl returns.
This behavior seems to have existed since the 1st day of uffd. Since
pgoff for vma_merge() is only used to identify the possibility of vma
merging, meanwhile here what we did was always passing in a pgoff smaller
than what we should, so there should have no other side effect besides not
merging it. Let's still tentatively copy stable for this, even though I
don't see anything will go wrong besides vma being split (which is mostly
not user visible).
Peter Xu [Wed, 17 May 2023 19:09:15 +0000 (15:09 -0400)]
mm/uffd: fix vma operation where start addr cuts part of vma
Patch series "mm/uffd: Fix vma merge/split", v2.
This series contains two patches that fix vma merge/split for userfaultfd
on two separate issues.
Patch 1 fixes a regression since 6.1+ due to something we overlooked when
converting to maple tree apis. The plan is we use patch 1 to replace the
commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to
vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring
uffd vma operations back aligned with the rest code again.
Patch 2 fixes a long standing issue that vma can be left unmerged even if
we can for either uffd register or unregister.
Many thanks to Lorenzo on either noticing this issue from the assert
movement patch, looking at this problem, and also provided a reproducer on
the unmerged vma issue [1].
It seems vma merging with uffd paths is broken with either
register/unregister, where right now we can feed wrong parameters to
vma_merge() and it's found by recent patch which moved asserts upwards in
vma_merge() by Lorenzo Stoakes:
It's possible that "start" is contained within vma but not clamped to its
start. We need to convert this into either "cannot merge" case or "can
merge" case 4 which permits subdivision of prev by assigning vma to prev.
As we loop, each subsequent VMA will be clamped to the start.
This patch will eliminate the report and make sure vma_merge() calls will
become legal again.
One thing to mention is that the "Fixes: 29417d292bd0" below is there only
to help explain where the warning can start to trigger, the real commit to
fix should be 69dbe6daf104. Commit 29417d292bd0 helps us to identify the
issue, but unfortunately we may want to keep it in Fixes too just to ease
kernel backporters for easier tracking.
Arnd Bergmann [Tue, 16 May 2023 19:41:54 +0000 (21:41 +0200)]
radix-tree: move declarations to header
The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:
lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]
Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.
Ryusuke Konishi [Sat, 13 May 2023 10:24:28 +0000 (19:24 +0900)]
nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
A syzbot fault injection test reported that nilfs_btnode_create_block, a
helper function that allocates a new node block for b-trees, causes a
kernel BUG for disk images where the file system block size is smaller
than the page size.
This was due to unexpected flags on the newly allocated buffer head, and
it turned out to be because the buffer flags were not cleared by
nilfs_btnode_abort_change_key() after an error occurred during a b-tree
update operation and the buffer was later reused in that state.
Fix this issue by using nilfs_btnode_delete() to abandon the unused
preallocated buffer in nilfs_btnode_abort_change_key().
Jens Axboe [Mon, 12 Jun 2023 18:11:57 +0000 (12:11 -0600)]
io_uring/io-wq: don't clear PF_IO_WORKER on exit
A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().
The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.
Linus Torvalds [Mon, 12 Jun 2023 17:53:35 +0000 (10:53 -0700)]
Merge tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A more fixes and regression fixes:
- in subpage mode, fix crash when repairing metadata at the end of
a stripe
- properly enable async discard when remounting from read-only to
read-write
- scrub regression fixes:
- respect read-only scrub when attempting to do a repair
- fix reporting of found errors, the stats don't get properly
accounted after a stripe repair"
* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: also report errors hit during the initial read
btrfs: scrub: respect the read-only flag during repair
btrfs: properly enable async discard when switching from RO->RW
btrfs: subpage: fix a crash in metadata repair path
Yonghong Song [Fri, 9 Jun 2023 00:54:39 +0000 (17:54 -0700)]
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
The sysctl net/core/bpf_jit_enable does not work now due to commit 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The
commit saved the jitted insns into 'rw_image' instead of 'image'
which caused bpf_jit_dump not dumping proper content.
With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run
'./test_progs -t fentry_test'. Without this patch, one of jitted
image for one particular prog is:
flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807 00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000050: cc cc cc cc cc cc cc cc cc cc cc cc
With this patch, the jitte image for the same prog is:
Herbert Xu [Wed, 7 Jun 2023 08:38:47 +0000 (16:38 +0800)]
xfrm: Use xfrm_state selector for BEET input
For BEET the inner address and therefore family is stored in the
xfrm_state selector. Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.
Dan Carpenter [Fri, 9 Jun 2023 11:05:19 +0000 (14:05 +0300)]
sctp: fix an error code in sctp_sf_eat_auth()
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition
values and returning a kernel error code will cause issues in the
caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM.
Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dan Carpenter [Fri, 9 Jun 2023 11:04:43 +0000 (14:04 +0300)]
sctp: handle invalid error codes without calling BUG()
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values but if the call to sctp_ulpevent_make_authkey() fails, it returns
-ENOMEM.
This results in calling BUG() inside the sctp_side_effects() function.
Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE()
instead.
Hangbin Liu [Fri, 9 Jun 2023 09:15:02 +0000 (17:15 +0800)]
ipvlan: fix bound dev checking for IPv6 l3s mode
The commit 59a0b022aa24 ("ipvlan: Make skb->skb_iif track skb->dev for l3s
mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix
works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which
is skb->skb_iif when there is no route.
But for IPv6, the fix is not enough, because in ipv6_raw_deliver() ->
raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif
instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part
issue. Let's set IP6CB(skb)->iif to correct ifindex.
BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not
reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6.
Jakub Kicinski [Thu, 8 Jun 2023 16:23:44 +0000 (09:23 -0700)]
net: ethtool: correct MAX attribute value for stats
When compiling YNL generated code compiler complains about
array-initializer-out-of-bounds. Turns out the MAX value
for STATS_GRP uses the value for STATS.
This may lead to random corruptions in user space (kernel
itself doesn't use this value as it never parses stats).
Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Shyam Prasad N [Fri, 9 Jun 2023 17:46:58 +0000 (17:46 +0000)]
cifs: fix max_credits implementation
The current implementation of max_credits on the client does
not work because the CreditRequest logic for several commands
does not take max_credits into account.
Still, we can end up asking the server for more credits, depending
on the number of credits in flight. For this, we need to
limit the credits while parsing the responses too.
Shyam Prasad N [Fri, 9 Jun 2023 17:46:59 +0000 (17:46 +0000)]
cifs: fix sockaddr comparison in iface_cmp
iface_cmp used to simply do a memcmp of the two
provided struct sockaddrs. The comparison needs to do more
based on the address family. Similar logic was already
present in cifs_match_ipaddr. Doing something similar now.
Enzo Matsumiya [Fri, 9 Jun 2023 21:29:59 +0000 (18:29 -0300)]
smb/client: print "Unknown" instead of bogus link speed value
The virtio driver for Linux guests will not set a link speed to its
paravirtualized NICs. This will be seen as -1 in the ethernet layer, and
when some servers (e.g. samba) fetches it, it's converted to an unsigned
value (and multiplied by 1000 * 1000), so in client side we end up with:
This patch introduces a helper that returns a speed string (in Mbps or
Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or
"Unknown" otherwise.
The reason to not change the value in iface->speed is because we don't
know the real speed of the HW backing the server NIC, so let's keep
considering these as the fastest NICs available.
Also print "Capabilities: None" when the interface doesn't support any.
Shyam Prasad N [Fri, 9 Jun 2023 17:46:55 +0000 (17:46 +0000)]
cifs: print all credit counters in DebugData
Output of /proc/fs/cifs/DebugData shows only the per-connection
counter for the number of credits of regular type. i.e. the
credits reserved for echo and oplocks are not displayed.
There have been situations recently where having this info
would have been useful. This change prints the credit counters
of all three types: regular, echo, oplocks.
Shyam Prasad N [Fri, 9 Jun 2023 17:46:54 +0000 (17:46 +0000)]
cifs: fix status checks in cifs_tree_connect
The ordering of status checks at the beginning of
cifs_tree_connect is wrong. As a result, a tcon
which is good may stay marked as needing reconnect
infinitely.
Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions") Cc: [email protected] # 6.3 Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
IB/isert: Fix incorrect release of isert connection
The ib_isert module is releasing the isert connection both in
isert_wait_conn() handler as well as isert_free_conn() handler.
In isert_wait_conn() handler, it is expected to wait for iSCSI
session logout operation to complete. It should free the isert
connection only in isert_free_conn() handler.
When a bunch of iSER target is cleared, this issue can lead to
use-after-free memory issue as isert conn is twice released
IB/isert: Fix possible list corruption in CMA handler
When ib_isert module receives connection error event, it is
releasing the isert session and removes corresponding list
node but it doesn't take appropriate mutex lock to remove
the list node. This can lead to linked list corruption
- When a iSER session is released, ib_isert module is taking a mutex
lock and releasing all pending connections. As part of this, ib_isert
is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending
CM events to CMA handler of ib_isert. This handler is taking same
mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm
modules.
- For fix, created local list of pending connections and release the
connection outside of mutex lock.
Linus Torvalds [Sun, 11 Jun 2023 17:14:02 +0000 (10:14 -0700)]
Merge tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
- Set up the kernel CS earlier in the boot process in case EFI boots
the kernel after bypassing the decompressor and the CS descriptor
used ends up being the EFI one which is not mapped in the identity
page table, leading to early SEV/SNP guest communication exceptions
resulting in the guest crashing
* tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
Linus Torvalds [Sun, 11 Jun 2023 17:07:35 +0000 (10:07 -0700)]
Merge tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Five smb3 server fixes, all also for stable:
- Fix four slab out of bounds warnings: improve checks for protocol
id, and for small packet length, and for create context parsing,
and for negotiate context parsing
- Fix for incorrect dereferencing POSIX ACLs"
* tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate smb request protocol id
ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
ksmbd: fix out-of-bound read in parse_lease_state()
ksmbd: fix out-of-bound read in deassemble_neg_contexts()
Mark Bloch [Mon, 5 Jun 2023 10:33:26 +0000 (13:33 +0300)]
RDMA/mlx5: Fix affinity assignment
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.
However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.
The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.
To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.
Edward Srouji [Mon, 5 Jun 2023 10:33:24 +0000 (13:33 +0300)]
RDMA/uverbs: Restrict usage of privileged QKEYs
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
Thus, block non-privileged users from setting such a QKEY.
Mark Zhang [Mon, 5 Jun 2023 10:33:23 +0000 (13:33 +0300)]
RDMA/cma: Always set static rate to 0 for RoCE
Set static rate to 0 as it should be discovered by path query and
has no meaning for RoCE.
This also avoid of using the rtnl lock and ethtool API, which is
a bottleneck when try to setup many rdma-cm connections at the same
time, especially with multiple processes.
Previously we used the core device associated to the IB device in order
to do the Q-counters query to the FW, but in LAG mode it is possible
that the core device isn't the one that created this VF.
Hence instead of using the core device to query the Q-counters
we use the ESW core device which is guaranteed to be that of the VF.
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
Previously the Q-counters initialization assumed that the vport Q-counters
structures and the normal Q-counters structures are identical in size,
and hence when a Q-counter was added to normal Q-counters structure but
not to the vport Q-counters struct it would lead to that counter name
being NULL in switchdev mode, which could cause the kernel crash below.
Currently break the dependency between those two structure and always
use the appropriate struct size, in order to remove the assumption
that both structure sizes are equal.
Previously Q-counters data was being allocated over the PF for all of
the available vports, however that isn't necessary.
Since each VF or SF has a Q-counter allocated for itself.
So we only need to allocate two counters data structures, one for the
device counters, and one for all the other vports to expose the
representors, since they only need to read from it in order to
determine mainly counters numbers and names, so they can all share.
This in turn also solves a bug we previously had where we couldn't
switch the device to switchdev mode when there were more than 128 SF/VFs
configured, since that is the maximum amount of Q-counters available for
a single port
Mark Bloch [Mon, 5 Jun 2023 10:33:18 +0000 (13:33 +0300)]
RDMA/mlx5: Create an indirect flow table for steering anchor
A misbehaved user can create a steering anchor that points to a kernel
flow table and then destroy the anchor without freeing the associated
STC. This creates a problem as the kernel can't destroy the flow
table since there is still a reference to it. As a result, this can
exhaust all available flow table resources, preventing other users from
using the RDMA device.
To prevent this problem, a solution is implemented where a special flow
table with two steering rules is created when a user creates a steering
anchor for the first time. The rules include one that drops all traffic
and another that points to the kernel flow table. If the steering anchor
is destroyed, only the rule pointing to the kernel's flow table is removed.
Any traffic reaching the special flow table after that is dropped.
Since the special flow table is not destroyed when the steering anchor is
destroyed, any issues are prevented from occurring. The remaining resources
are only destroyed when the RDMA device is destroyed, which happens after
all DEVX objects are freed, including the STCs, thus mitigating the issue.
Maher Sanalla [Mon, 5 Jun 2023 10:33:17 +0000 (13:33 +0300)]
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
Delay drop data is initiated for PFs that have the capability of
rq_delay_drop and are in roce profile.
However, PFs with RAW ethernet profile do not initiate delay drop data
on function load, causing kernel panic if delay drop struct members are
accessed later on in case a dropless RQ is created.
Thus, stage the delay drop initialization as part of RAW ethernet
PF loading process.
Linus Torvalds [Sat, 10 Jun 2023 20:21:04 +0000 (13:21 -0700)]
Merge tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fixes from Vinod Koul:
"Core fix for missing flag clear, error patch handling in qcom driver
and BIOS quirk for HP Spectre x360:
- HP Spectre x360 soundwire DMI quirk
- Error path handling for qcom driver
- Core fix for missing clear of alloc_slave_rt"
* tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: Add missing clear of alloc_slave_rt
soundwire: qcom: add proper error paths in qcom_swrm_startup()
soundwire: dmi-quirks: add new mapping for HP Spectre x360
Linus Torvalds [Sat, 10 Jun 2023 20:01:09 +0000 (13:01 -0700)]
Merge tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Most of the changes this time are for the Qualcomm Snapdragon
platforms.
There are bug fixes for error handling in Qualcomm icc-bwmon,
rpmh-rsc, ramp_controller and rmtfs driver as well as the AMD tee
firmware driver and a missing initialization in the Arm ff-a firmware
driver. The Qualcomm RPMh and EDAC drivers need some rework to work
correctly on all supported chips.
The DT fixes include:
- i.MX8 fixes for gpio, pinmux and clock settings
- ADS touchscreen gpio polarity settings in several machines
- Address dtb warnings for caches, panel and input-enable properties
on Qualcomm platforms
- Incorrect data on qualcomm platforms fir SA8155P power domains,
SM8550 LLCC, SC7180-lite SDRAM frequencies and SM8550 soundwire
- Remoteproc firmware paths are corrected for Sony Xperia 10 IV"
* tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits)
firmware: arm_ffa: Set handle field to zero in memory descriptor
ARM: dts: Fix erroneous ADS touchscreen polarities
arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
EDAC/qcom: Get rid of hardcoded register offsets
EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
dt-bindings: cache: qcom,llcc: Fix SM8550 description
arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
soc: qcom: rpmhpd: Add SA8155P power domains
arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
dt-bindings: power: qcom,rpmpd: Add SA8155P
soc: qcom: Rename ice to qcom_ice to avoid module name conflict
soc: qcom: rmtfs: Fix error code in probe()
soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
ARM: at91: pm: fix imbalanced reference counter for ethernet devices
arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
...
In the last step of the EEH recovery process, the EEH driver calls into
bnx2x_io_resume() to re-initialize the NIC hardware via the function
bnx2x_nic_load(). If an error occurs during bnx2x_nic_load(), OS and
hardware resources are released and an error code is returned to the
caller. When called from bnx2x_io_resume(), the return code is ignored
and the network interface is brought up unconditionally. Later attempts
to send a packet via this interface result in a page fault due to a null
pointer reference.
This patch checks the return code of bnx2x_nic_load(), prints an error
message if necessary, and does not enable the interface.
Dmitry Mastykin [Thu, 8 Jun 2023 13:57:54 +0000 (16:57 +0300)]
netlabel: fix shift wrapping bug in netlbl_catmap_setlong()
There is a shift wrapping bug in this code on 32-bit architectures.
NETLBL_CATMAP_MAPTYPE is u64, bitmap is unsigned long.
Every second 32-bit word of catmap becomes corrupted.
Fix LBK link credits on CN10K to be same as CN9K i.e
16 * MAX_LBK_DATA_RATE instead of current scheme of
calculation based on LBK buf length / FIFO size.
Fixes: 6e54e1c5399a ("octeontx2-af: cn10K: Add MTU configuration") Signed-off-by: Nithin Dabilpuram <[email protected]> Signed-off-by: Naveen Mamindlapalli <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Satha Rao [Thu, 8 Jun 2023 11:42:00 +0000 (17:12 +0530)]
octeontx2-af: fixed resource availability check
txschq_alloc response have two different arrays to store continuous
and non-continuous schedulers of each level. Requested count should
be checked for each array separately.
Jakub Kicinski [Sat, 10 Jun 2023 07:11:18 +0000 (00:11 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-08 (ice)
This series contains updates to ice driver only.
Simon Horman stops null pointer dereference for GNSS error path.
Kamil fixes memory leak when downing interface when XDP is enabled.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix XDP memory leak when NIC is brought up and down
ice: Don't dereference NULL in ice_gnss_read error path
====================
Ahmed Zaki [Thu, 8 Jun 2023 20:02:26 +0000 (13:02 -0700)]
iavf: remove mask from iavf_irq_enable_queues()
Enable more than 32 IRQs by removing the u32 bit mask in
iavf_irq_enable_queues(). There is no need for the mask as there are no
callers that select individual IRQs through the bitmask. Also, if the PF
allocates more than 32 IRQs, this mask will prevent us from using all of
them.
Modify the comment in iavf_register.h to show that the maximum number
allowed for the IRQ index is 63 as per the iAVF standard 1.0 [1].
====================
selftests: mptcp: skip tests not supported by old kernels (part 2)
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests don't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftests or just some sub-tests before starting them. This cannot be
applied in all cases.
This second part focuses on marking different sub-tests as skipped if
some MPTCP features are not supported. A few techniques are used here:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 13/14 is
doing for the userspace PM feature.
- Check if a symbol is present in /proc/kallsyms: patch 1/14 adds some
helpers in mptcp_lib.sh to ease its use. Then these helpers are used
in patches 2, 3, 4, 10, 11 and 14/14.
- Set a flag and get the status to check if a feature is supported:
patch 8/14 is doing that with the 'fullmesh' flag.
- After having launched the tests:
- Retrieve the counters after a test and check if they are different
than 0. Similar to the check with the flag, that's not ideal but in
this case, the counters were already present before the introduction
of MPTCP but they have been supported by MPTCP sockets only later.
Patches 5 and 6/14 are using this technique.
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
Patches 7/14 and 9/14 are a bit different because they don't skip tests:
- Patch 7/14 retrieves the default values instead of using hardcoded
ones because these default values have been modified at some points.
Then the comparisons are done with the default values.
- patch 9/14 relaxes the expected returned size from MPTCP's getsockopt
because the different structures gathering various info can get new
fields and get bigger over time. We cannot expect that the userspace
is using the same structure as the kernel.
Patch 12/14 marks the test as "skipped" instead of "failed" if the "ip"
tool is not available.
In this second part, the "mptcp_join" selftest is not modified yet. This
will come soon after in the third part with quite a few patches.
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the new listener events linked to the path-manager
introduced by commit f8c9dfbd875b ("mptcp: add pm listener events").
It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
in advance if the kernel supports this feature and skip these sub-tests
if the feature is not supported.
Matthieu Baerts [Thu, 8 Jun 2023 16:38:53 +0000 (18:38 +0200)]
selftests: mptcp: sockopt: skip TCP_INQ checks if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is TCP_INQ cmsg support introduced in commit 2c9e77659a0c
("mptcp: add TCP_INQ cmsg support").
It is possible to look for "mptcp_ioctl" in kallsyms because it was
needed to introduce the mentioned feature. We can skip these tests and
not set TCPINQ option if the feature is not supported.
Matthieu Baerts [Thu, 8 Jun 2023 16:38:52 +0000 (18:38 +0200)]
selftests: mptcp: sockopt: skip getsockopt checks if not supported
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP
connections introduced by commit 55c42fa7fa33 ("mptcp: add MPTCP_INFO
getsockopt") and the following ones.
It is possible to look for "mptcp_diag_fill_info" in kallsyms because
it is introduced by the mentioned feature. So we can know in advance if
the feature is supported and skip the sub-test if not.
Matthieu Baerts [Thu, 8 Jun 2023 16:38:51 +0000 (18:38 +0200)]
selftests: mptcp: sockopt: relax expected returned size
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP
connections introduced by commit 55c42fa7fa33 ("mptcp: add MPTCP_INFO
getsockopt") and the following ones.
We cannot guess in advance which sizes the kernel will returned: older
kernel can returned smaller sizes, e.g. recently the tcp_info structure
has been modified in commit 71fc704768f6 ("tcp: add rcv_wnd and
plb_rehash to TCP_INFO") where a new field has been added.
The userspace can also expect a smaller size if it is compiled with old
uAPI kernel headers.
So for these sizes, we can only check if they are above a certain
threshold, 0 for the moment. We can also only compared sizes with the
ones set by the kernel.