kbuild: doc: update the description about Kbuild/Makefile split
The phrase "In newer versions of the kernel" was added 14 years ago, by
commit efdf02cf0651 ("Documentation/kbuild: major edit of modules.txt
sections 1-4"). This feature is no longer new, so remove it and update
the paragraph.
Example 3 was written 20 years ago [1]. There is no need to note about
backward compatibility with such an old build system. Remove Example 3
entirely.
If RUST_LIB_SRC is defined in the top-level Makefile (via an environment
variable or command line), it is already exported.
The only situation where it is defined but not exported is when the
top-level Makefile is wrapped by another Makefile (e.g., GNUmakefile).
I cannot think of any other use cases.
I know some people use this tip to define custom variables. However,
even in that case, you can export it directly in the wrapper Makefile.
Example GNUmakefile:
export RUST_LIB_SRC = /path/to/your/sysroot/lib/rustlib/src/rust/library
include Makefile
The append operation was introduced in
commit b1a1a1a09b46 ("kbuild: lto: postpone objtool")
when the command was created from two parts.
In commit 850ded46c642 ("kbuild: Fix TRIM_UNUSED_KSYMS with LTO_CLANG")
however the first part was removed again, making the append operation
unnecessary.
To keep this command definition aligned with all other command
definitions, remove the append again.
Currently, every expression in Kconfig files produces a new abstract
syntax tree (AST), even if it is identical to a previously encountered
one.
Consider the following code:
config FOO
bool "FOO"
depends on (A || B) && C
config BAR
bool "BAR"
depends on (A || B) && C
config BAZ
bool "BAZ"
depends on A || B
The "depends on" lines are similar, but currently a separate AST is
allocated for each one.
The current data structure looks like this:
FOO->dep ==> AND BAR->dep ==> AND BAZ->dep ==> OR
/ \ / \ / \
OR C OR C A B
/ \ / \
A B A B
This is redundant; FOO->dep and BAR->dep have identical ASTs but
different memory instances.
We can optimize this; FOO->dep and BAR->dep can share the same AST, and
BAZ->dep can reference its sub tree.
The optimized data structure looks like this:
FOO->dep, BAR->dep ==> AND
/ \
BAZ->dep ==> OR C
/ \
A B
This commit introduces a hash table to keep track of allocated
expressions. If an identical expression is found, it is reused.
This does not necessarily result in memory savings, as menu_finalize()
transforms expressions without freeing up stale ones. This will be
addressed later.
One optimization that can be easily implemented is caching the
expression's value. Once FOO's dependency, (A || B) && C, is calculated,
it can be cached, eliminating the need to recalculate it for BAR.
This commit also reverts commit e983b7b17ad1 ("kconfig/menu.c: fix
multiple references to expressions in menu_add_prop()").
Currently, expr_eliminate_dups() passes two identical pointers down to
expr_eliminate_dups1(), which later skips processing identical leaves.
This approach is somewhat tricky and, more importantly, it will not work
with the refactoring made in the next commit.
This commit slightly changes the recursion logic; it deduplicates both
the left and right arms, and then passes them to expr_eliminate_dups1().
expr_eliminate_dups() should produce the same result.
scripts: add verifier script for builtin module range data
The modules.builtin.ranges offset range data for builtin modules is
generated at compile time based on the list of built-in modules and
the vmlinux.map and vmlinux.o.map linker maps. This data can be used
to determine whether a symbol at a particular address belongs to
module code that was configured to be compiled into the kernel proper
as a built-in module (rather than as a standalone module).
This patch adds a script that uses the generated modules.builtin.ranges
data to annotate the symbols in the System.map with module names if
their address falls within a range that belongs to one or more built-in
modules.
It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
verify the annotation:
- For each top-level section:
- For each object in the section:
- Determine whether the object is part of a built-in module
(using modules.builtin and the .*.cmd file used to compile
the object as suggested in [0])
- For each symbol in that object, verify that the built-in
module association (or lack thereof) matches the annotation
given to the symbol.
kbuild: generate offset range data for builtin modules
Create file module.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to find what functions are for various built-in modules.
The offset range data for builtin modules is generated using:
- modules.builtin: associates object files with module names
- vmlinux.map: provides load order of sections and offset of first member
per section
- vmlinux.o.map: provides offset of object file content per section
- .*.cmd: build cmd file with KBUILD_MODFILE
For each ELF section, it lists the offset of the first symbol. This can
be used to determine the base address of the section at runtime.
Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules. Multiple ranges
can apply to a single module, and ranges can be shared between modules.
The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.
How it works:
1. The modules.builtin file is parsed to obtain a list of built-in
module names and their associated object names (the .ko file that
the module would be in if it were a loadable module, hereafter
referred to as <kmodfile>). This object name can be used to
identify objects in the kernel compile because any C or assembler
code that ends up into a built-in module will have the option
-DKBUILD_MODFILE=<kmodfile> present in its build command, and those
can be found in the .<obj>.cmd file in the kernel build tree.
If an object is part of multiple modules, they will all be listed
in the KBUILD_MODFILE option argument.
This allows us to conclusively determine whether an object in the
kernel build belong to any modules, and which.
2. The vmlinux.map is parsed next to determine the base address of each
top level section so that all addresses into the section can be
turned into offsets. This makes it possible to handle sections
getting loaded at different addresses at system boot.
We also determine an 'anchor' symbol at the beginning of each
section to make it possible to calculate the true base address of
a section at runtime (i.e. symbol address - symbol offset).
We collect start addresses of sections that are included in the top
level section. This is used when vmlinux is linked using vmlinux.o,
because in that case, we need to look at the vmlinux.o linker map to
know what object a symbol is found in.
And finally, we process each symbol that is listed in vmlinux.map
(or vmlinux.o.map) based on the following structure:
vmlinux linked from vmlinux.a:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section)
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
vmlinux linked from vmlinux.o:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section)
vmlinux.o -- need to use vmlinux.o.map
<symbol> -- ignored
...
vmlinux.o.map:
<section>
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
3. As sections, objects, and symbols are processed, offset ranges are
constructed in a straight-forward way:
- If the symbol belongs to one or more built-in modules:
- If we were working on the same module(s), extend the range
to include this object
- If we were working on another module(s), close that range,
and start the new one
- If the symbol does not belong to any built-in modules:
- If we were working on a module(s) range, close that range
kbuild: add mod(name,file)_flags to assembler flags for module objects
In order to create the file at build time, modules.builtin.ranges, that
contains the range of addresses for all built-in modules, there needs to
be a way to identify what code is compiled into modules.
To identify what code is compiled into modules during a kernel build,
one can look for the presence of the -DKBUILD_MODFILE and -DKBUILD_MODNAME
options in the compile command lines. A simple grep in .*.cmd files for
those options is sufficient for this.
Unfortunately, these options are only passed when compiling C source files.
Various modules also include objects built from assembler source, and these
options are not passed in that case.
Adding $(modfile_flags) to modkern_aflags (similar to modkern_cflags), and
adding $(modname_flags) to a_flags (similar to c_flags) makes it possible
to identify which objects are compiled into modules for both C and
assembler source files. While KBUILD_MODFILE is sufficient to generate
the modules ranges data, KBUILD_MODNAME is passed as well for consistency
with the C source code case.
scripts: subarch.include: fix SUBARCH on macOS hosts
When building the Linux kernel on an aarch64 macOS based host, if we don't
specify a value for ARCH when invoking make, we default to arm and thus
multi_v7_defconfig rather than the expected arm64 and arm64's defconfig.
This is because subarch.include invokes `uname -m` which on MacOS hosts
evaluates to `arm64` but on Linux hosts evaluates to `aarch64`,
This allows us to build ARCH=arm64 natively on macOS (as in ARCH need
not be specified on an aarch64-based system).
Avoid matching arm64 by excluding it from the arm.* sed expression.
kbuild: split device tree build rules into scripts/Makefile.dtbs
scripts/Makefile.lib is included not only from scripts/Makefile.build
but also from scripts/Makefile.{modfinal,package,vmlinux,vmlinux_o},
where DT build rules are not required.
Split the DT build rules out to scripts/Makefile.dtbs, and include it
only when necessary.
While I was here, I added $(DT_TMP_SCHEMA) as a prerequisite of
$(multi-dtb-y).
kbuild: compile constant module information only once
Various information about modules is compiled into the info sections.
For that a dedicated .mod.c file is generated by modpost for each module
and then linked into the module.
However most of the information in the .mod.c is the same for all
modules, internal and external.
Split the shared information into a dedicated source file that is
compiled once and then linked into all modules.
This avoids frequent rebuilds for all .mod.c files when using
CONFIG_LOCALVERSION_AUTO because the local version ends up in .mod.c
through UTS_RELEASE and VERMAGIC_STRING.
The modules are still relinked in this case.
The code is also easier to maintain as it's now in a proper source file
instead of an inline string literal.
Tony Battersby [Thu, 29 Aug 2024 13:51:25 +0000 (09:51 -0400)]
kbuild: remove recent dependency on "truncate" program
Remove the recently-added dependency on the truncate program for
building the kernel. truncate is not available when building the kernel
under Yocto. It could be added, but it would be better just to avoid
the unnecessary dependency.
Jose Fernandez [Sat, 24 Aug 2024 22:07:56 +0000 (16:07 -0600)]
kbuild: add debug package to pacman PKGBUILD
Add a new debug package to the PKGBUILD for the pacman-pkg target. The
debug package includes the non-stripped vmlinux file with debug symbols
for kernel debugging and profiling. The file is installed at
/usr/src/debug/${pkgbase}, with a symbolic link at
/usr/lib/modules/$(uname -r)/build/vmlinux. The debug package is built
by default.
There are a few lines in the kbuild-language.rst document which
obliquely reference the behavior of config options without prompts.
But there is nothing in the obvious location that explicitly calls
out that users cannot edit config options unless they have a prompt.
Masahiro Yamada [Fri, 16 Aug 2024 13:44:29 +0000 (22:44 +0900)]
modpost: simplify modpost_log()
With commit cda5f94e88b4 ("modpost: avoid using the alias attribute"),
only two log levels remain: LOG_WARN and LOG_ERROR. Simplify this by
making it a boolean variable.
Jose Fernandez [Tue, 13 Aug 2024 01:16:19 +0000 (19:16 -0600)]
kbuild: control extra pacman packages with PACMAN_EXTRAPACKAGES
Introduce the PACMAN_EXTRAPACKAGES variable in PKGBUILD to allow users
to specify which additional packages are built by the pacman-pkg target.
Previously, the api-headers package was always included, and the headers
package was included only if CONFIG_MODULES=y. With this change, both
headers and api-headers packages are included by default. Users can now
control this behavior by setting PACMAN_EXTRAPACKAGES to a
space-separated list of desired extra packages or leaving it empty to
exclude all.
For example, to build only the base package without extras:
Masahiro Yamada [Mon, 12 Aug 2024 16:54:51 +0000 (01:54 +0900)]
modpost: improve the section mismatch warning format
This commit improves the section mismatch warning format when there is
no suitable symbol name to print.
The section mismatch warning prints the reference source in the form
of <symbol_name>+<offset> and the reference destination in the form
of <symbol_name>.
However, there are some corner cases where <symbol_name> becomes
"(unknown)", as reported in commit 23dfd914d2bf ("modpost: fix null
pointer dereference").
In such cases, it is better to print the symbol address.
Masahiro Yamada [Mon, 12 Aug 2024 12:48:52 +0000 (21:48 +0900)]
kallsyms: use xmalloc() and xrealloc()
When malloc() or realloc() fails, there is not much userspace programs
can do. xmalloc() and xrealloc() are useful to bail out on a memory
allocation failure.
Masahiro Yamada [Mon, 12 Aug 2024 11:49:46 +0000 (20:49 +0900)]
kconfig: stop adding P_SYMBOL property to symbols
I believe its last usage was in the following code:
if (prop == NULL)
prop = stack->sym->prop;
This code was previously used to print the file name and line number of
associated symbols in sym_check_print_recursive(), which was removed by
commit 9d0d26604657 ("kconfig: recursive checks drop file/lineno").
Masahiro Yamada [Mon, 12 Aug 2024 11:49:45 +0000 (20:49 +0900)]
kconfig: remove dummy assignments to cur_{filename,lineno}
Since commit ca4c74ba306e ("kconfig: remove P_CHOICE property"),
menu_finalize() no longer calls menu_add_symbol(). No function
references cur_filename or cur_lineno after yyparse().
Commit 3f1b0e1f2875 (".gitignore update") added *.orig and *.rej
patterns to .gitignore in v2.6.23. The commit message didn't give a
rationale. Later on, commit 1f5d3a6b6532 ("Remove *.rej pattern from
.gitignore") removed the *.rej pattern in v2.6.26, on the rationale that
*.rej files indicated something went really wrong and should not be
ignored.
The *.rej files are now shown by `git status`, which helps located
conflicts when applying patches and lowers the probability that they
will go unnoticed. It is however still easy to overlook the *.orig files
which slowly polute the source tree. That's not as big of a deal as not
noticing a conflict, but it's still not nice.
Drop the *.orig pattern from .gitignore to avoid this and help keep the
source tree clean.
Signed-off-by: Laurent Pinchart <[email protected]>
[[email protected]:
I do not have a strong opinion about this. Perhaps some people may have
a different opinion.
If you are someone who wants to ignore *.orig, it is likely you would
want to do so across all projects. Then, $XDG_CONFIG_HOME/git/ignore
would be more suitable for your needs. gitignore(5) suggests, "Patterns
which a user wants Git to ignore in all situations generally go into a
file specified by core.excludesFile in the user's ~/.gitconfig".
Please note that you cannot do the opposite; if *.orig is ignored by
the project's .gitignore, you cannot override the decision because
$XDG_CONFIG_HOME/git/ignore has a lower priority.
If *.orig is sitting on the fence, I'd leave it to the users. ] Signed-off-by: Masahiro Yamada <[email protected]>
kbuild: cross-compile linux-headers package when possible
A long standing issue in the upstream kernel packaging is that the
linux-headers package is not cross-compiled.
For example, you can cross-build Debian packages for arm64 by running
the following command:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bindeb-pkg
However, the generated linux-headers-*_arm64.deb is useless because the
host programs in it were built for your build machine architecture
(likely x86), not arm64.
The Debian kernel maintains its own Makefiles to cross-compile host
tools without relying on Kbuild. [1]
Instead of adding such full custom Makefiles, this commit adds a small
piece of code to cross-compile host programs located under the scripts/
directory.
A straightforward solution is to pass HOSTCC=${CROSS_COMPILE}gcc, but it
would also cross-compile scripts/basic/fixdep, which needs to be native
to process the if_changed_dep macro. (This approach may work under some
circumstances; you can execute foreign architecture programs with the
help of binfmt_misc because Debian systems enable CONFIG_BINFMT_MISC,
but it would require installing QEMU and libc for that architecture.)
A trick is to use the external module build (KBUILD_EXTMOD=), which
does not rebuild scripts/basic/fixdep. ${CC} needs to be able to link
userspace programs (CONFIG_CC_CAN_LINK=y).
There are known limitations:
- GCC plugins
It would possible to rebuild GCC plugins for the target architecture
by passing HOSTCXX=${CROSS_COMPILE}g++ with necessary packages
installed, but gcc on the installed system emits
"cc1: error: incompatible gcc/plugin versions".
- objtool and resolve_btfids
These are built by the tools build system. They are not covered by
the current solution. The resulting linux-headers package is broken
if CONFIG_OBJTOOL or CONFIG_DEBUG_INFO_BTF is enabled.
I only tested this with Debian, but it should work for other package
systems as well.
Endianness is currently detected on compile-time, but we can defer this
until run-time. This change avoids re-executing scripts/mod/mk_elfconfig
even if modpost in the linux-headers package needs to be rebuilt for a
foreign architecture.
Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs
Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
lock should not have been able to cause that...
- Fix a rare data corruption in the rebalance path, caught as a nonce
inconsistency on encrypted filesystems
- Revert lockless buffered write path
- Mark more errors as autofix"
* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
bcachefs: Mark more errors as autofix
bcachefs: Revert lockless buffered IO path
bcachefs: Fix bch2_extents_match() false positive
bcachefs: Fix failure to return error in data_update_index_update()
Kent Overstreet [Thu, 22 Aug 2024 15:47:32 +0000 (11:47 -0400)]
bcachefs: Mark more errors as autofix
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.
note that errors are still logged in the superblock, so we'll still know
that they happened.
It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.
Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.
It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.
Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.
My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.
Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <[email protected]>
Linus Torvalds [Sat, 31 Aug 2024 21:18:48 +0000 (09:18 +1200)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull misc fixes from Guenter Roeck.
These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
apparmor: fix policy_unpack_test on big endian systems
Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
microblaze: don't treat zero reserved memory regions as error
Linus Torvalds [Sat, 31 Aug 2024 21:07:44 +0000 (09:07 +1200)]
Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
previous fix for this driver was incomplete and broke the WLAN support
on some platforms. This addresses the issue.
- set the direction of the wlan-enable GPIO to output after
requesting it as-is"
* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.
Linus Torvalds [Sat, 31 Aug 2024 19:00:38 +0000 (07:00 +1200)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Minor fixes only.
The sd.c one ignores a sync cache request if format is in progress
which can happen if formatting a drive across suspend/resume"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
scsi: aacraid: Fix double-free on probe failure
scsi: lpfc: Fix overflow build issue
Linus Torvalds [Sat, 31 Aug 2024 18:55:47 +0000 (06:55 +1200)]
Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
Linus Torvalds [Sat, 31 Aug 2024 18:48:37 +0000 (06:48 +1200)]
Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Do not call out v1 inodes with non-zero di_nlink field as being
corrupt
- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
rather than "inode btree" blocks
- Don't report the number of trimmed bytes via FITRIM because the
underlying storage isn't required to do anything and failed discard
IOs aren't reported to the caller anyway
- Fix incorrect setting of rm_owner field in an rmap query
- Report missing disk offset range in an fsmap query
- Obtain m_growlock when extending realtime section of the filesystem
- Reset rootdir extent size hint after extending realtime section of
the filesystem
* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reset rootdir extent size hint after growfsrt
xfs: take m_growlock when running growfsrt
xfs: Fix missing interval for missing_owner in xfs fsmap
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: Fix the owner setting issue for rmap query in xfs fsmap
xfs: don't bother reporting blocks trimmed via FITRIM
xfs: xfs_finobt_count_blocks() walks the wrong btree
xfs: fix folio dirtying for XFILE_ALLOC callers
xfs: fix di_onlink checking for V1/V2 inodes
Linus Torvalds [Sat, 31 Aug 2024 18:42:13 +0000 (06:42 +1200)]
Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There is a fairly large number of bug fixes for Qualcomm platforms,
most of them addressing issues with the devicetree files for the newly
added Snapdragon X1 based laptops to make them more reliable.
The Qualcomm driver changes address a few build-time issues as well as
runtime problems in the tzmem and scm firmware, the USB Type-C driver,
and the cmd-db and pmic_glink soc drivers.
The NXP i.MX usually gets a bunch of devicetree fixes that is
proportional to the number of supported machines. This includes both
warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
board.
The other changes are the usual minor changes, including an update to
the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
(risc-v)"
* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
firmware: microchip: fix incorrect error report of programming:timeout on success
soc: qcom: pd-mapper: Fix singleton refcount
firmware: qcom: tzmem: disable sdm670 platform
soc: qcom: pmic_glink: Actually communicate when remote goes down
usb: typec: ucsi: Move unregister out of atomic section
soc: qcom: pmic_glink: Fix race during initialization
firmware: qcom: qseecom: remove unused functions
firmware: qcom: tzmem: fix virtual-to-physical address conversion
firmware: qcom: scm: Mark get_wq_ctx() as atomic call
arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
arm64: dts: qcom: disable GPU on x1e80100 by default
arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
arm64: dts: imx95: correct L3Cache cache-sets
arm64: dts: imx95: correct a55 power-domains
arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
arm64: dts: imx93: update default value for snps,clk-csr
arm64: dts: freescale: tqma9352: Fix watchdog reset
arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
...
Linus Torvalds [Sat, 31 Aug 2024 03:32:38 +0000 (15:32 +1200)]
Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
- a fix for Cypress PS/2 touchpad for regression introduced in 6.11
merge window where a timeout condition is incorrectly reported for
all extended Cypress commands
* tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cypress_ps2 - fix waiting for command response
Linus Torvalds [Sat, 31 Aug 2024 02:54:11 +0000 (14:54 +1200)]
Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Add Manivannan Sadhasivam as PCI native host bridge and endpoint
driver reviewer (Manivannan Sadhasivam)
- Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
work around hardware erratum that causes a constant stream of
interrupts (Manivannan Sadhasivam)
- Don't try to fall back to qcom Operating Performance Points (OPP)
support unless the platform actually supports OPP (Manivannan
Sadhasivam)
- Add [email protected] mailing list to MAINTAINERS for NXP
layerscape and imx6 PCI controller drivers (Frank Li)
* tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: PCI: Add NXP PCI controller mailing list [email protected]
PCI: qcom: Use OPP only if the platform supports it
PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers
Linus Torvalds [Sat, 31 Aug 2024 01:51:27 +0000 (13:51 +1200)]
Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- A fix for a regression that happened in 6.11 merge window, where the
copying of iovecs for compat mode applications got broken for certain
cases.
- Fix for a bug introduced in 6.10, where if using recv/send bundles
with classic provided buffers, the recv/send would fail to set the
right iovec count. This caused 0 byte send/recv results. Found via
code coverage testing and writing a test case to exercise it.
* tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
io_uring/kbuf: return correct iovec count from classic buffer peek
io_uring/rsrc: ensure compat iovecs are copied correctly
Linus Torvalds [Fri, 30 Aug 2024 18:33:59 +0000 (06:33 +1200)]
Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"One small patch to correct a NFS permissions problem with SELinux and
Smack"
* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: don't bypass permissions check in inode_setsecctx hook
Linus Torvalds [Fri, 30 Aug 2024 18:25:34 +0000 (06:25 +1200)]
Merge tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three issues in the amd-pstate cpufreq driver.
Specifics:
- Remove checks for highest performance match on preferred cores when
updating preferred core ranking in amd-pstate (Mario Limonciello)
- Make amd-pstate call topology_logical_package_id() instead of
logical_die_id() to get a socked ID for a CPU (Gautham Shenoy)
- Fix uninitialized variable in amd_pstate_cpu_boost_update() (Dan
Carpenter)"
* tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore
cpufreq/amd-pstate: Use topology_logical_package_id() instead of logical_die_id()
cpufreq: amd-pstate: Fix uninitialized variable in amd_pstate_cpu_boost_update()
Linus Torvalds [Fri, 30 Aug 2024 18:15:02 +0000 (06:15 +1200)]
Merge tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- Single fix for non-continous port map programming
* tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: fix programming slave ports for non-continous port maps
Linus Torvalds [Fri, 30 Aug 2024 18:11:34 +0000 (06:11 +1200)]
Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Fix a device-stall problem in bad io-page-fault setups (faults
received from devices with no supporting domain attached).
- Context flush fix for Intel VT-d.
- Do not allow non-read+non-write mapping through iommufd as most
implementations can not handle that.
- Fix a possible infinite-loop issue in map_pages() path.
- Add Jean-Philippe as reviewer for SMMUv3 SVA support
* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
iommu: Do not return 0 from map_pages if it doesn't do anything
iommufd: Do not allow creating areas without READ or WRITE
iommu/vt-d: Fix incorrect domain ID in context flush helper
iommu: Handle iommu faults for a bad iopf setup
Jens Axboe [Fri, 30 Aug 2024 16:45:54 +0000 (10:45 -0600)]
io_uring/kbuf: return correct iovec count from classic buffer peek
io_provided_buffers_select() returns 0 to indicate success, but it should
be returning 1 to indicate that 1 vec was mapped. This causes peeking
to fail with classic provided buffers, and while that's not a use case
that anyone should use, it should still work correctly.
The end result is that no buffer will be selected, and hence a completion
with '0' as the result will be posted, without a buffer attached.
NeilBrown [Wed, 28 Aug 2024 23:06:28 +0000 (09:06 +1000)]
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway. This is wrong.
With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below. This is the same as the
current code, but without any reference to a possible delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
Jens Axboe [Wed, 28 Aug 2024 15:42:33 +0000 (09:42 -0600)]
io_uring/rsrc: ensure compat iovecs are copied correctly
For buffer registration (or updates), a userspace iovec is copied in
and updated. If the application is within a compat syscall, then the
iovec type is compat_iovec rather than iovec. However, the type used
in __io_sqe_buffers_update() and io_sqe_buffers_register() is always
struct iovec, and hence the source is incremented by the size of a
non-compat iovec in the loop. This misses every other iovec in the
source, and will run into garbage half way through the copies and
return -EFAULT to the application.
Maintain the source address separately and assign to our user vec
pointer, so that copies always happen from the right source address.
While in there, correct a bad placement of __user which triggered
the following sparse warning prior to this fix:
io_uring/rsrc.c:981:33: warning: cast removes address space '__user' of expression
io_uring/rsrc.c:981:30: warning: incorrect type in assignment (different address spaces)
io_uring/rsrc.c:981:30: expected struct iovec const [noderef] __user *uvec
io_uring/rsrc.c:981:30: got struct iovec *[noderef] __user
Fixes: f4eaf8eda89e ("io_uring/rsrc: Drop io_copy_iov in favor of iovec API") Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Linus Torvalds [Fri, 30 Aug 2024 02:17:30 +0000 (14:17 +1200)]
Merge tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Another week, another set of GPU fixes. amdgpu and vmwgfx leading the
charge, then i915 and xe changes along with v3d and some other bits.
The TTM revert is due to some stuttering graphical apps probably due
to longer stalls while prefaulting.
i915:
- Fix #11195: The external display connect via USB type-C dock stays
blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
- Move ARL GuC firmware to correct version
vmwgfx:
- prevent unmapping active read buffers
- fix prime with external buffers
- disable coherent dumb buffers without 3d
v3d:
- disable preemption while updating GPU stats"
* tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
drm/v3d: Disable preemption while updating GPU stats
drm/amd/pm: Drop unsupported features on smu v14_0_2
drm/amd/pm: Add support for new P2S table revision
drm/amdgpu: support for gc_info table v1.3
drm/amd/display: avoid using null object of framebuffer
drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
drm/amd/pm: update message interface for smu v14.0.2/3
drm/amdgpu/swsmu: always force a state reprogram on init
drm/amdgpu/smu13.0.7: print index for profiles
drm/amdgpu: align pp_power_profile_mode with kernel docs
drm/i915/dp_mst: Fix MST state after a sink reset
drm/xe: Invalidate media_gt TLBs
drm/i915: ARL requires a newer GSC firmware
drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict
video/aperture: optionally match the device in sysfb_disable()
drm/vmwgfx: Disable coherent dumb buffers without 3d
drm/vmwgfx: Fix prime with external buffers
drm/vmwgfx: Prevent unmapping active read buffers
Revert "drm/ttm: increase ttm pre-fault value to PMD size"
Dave Airlie [Fri, 30 Aug 2024 01:28:00 +0000 (11:28 +1000)]
Merge tag 'drm-misc-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A revert for a previous TTM commit causing stuttering, 3 fixes for
vmwgfx related to buffer operations, a fix for video/aperture with
non-VGA primary devices, and a preemption status fix for v3d
Linus Torvalds [Fri, 30 Aug 2024 00:32:53 +0000 (12:32 +1200)]
Merge tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
- binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov)
* tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined
Stephen Brennan [Thu, 29 Aug 2024 18:20:49 +0000 (11:20 -0700)]
dcache: keep dentry_hashtable or d_hash_shift even when not used
The runtime constant feature removes all the users of these variables,
allowing the compiler to optimize them away. It's quite difficult to
extract their values from the kernel text, and the memory saved by
removing them is tiny, and it was never the point of this optimization.
Since the dentry_hashtable is a core data structure, it's valuable for
debugging tools to be able to read it easily. For instance, scripts
built on drgn, like the dentrycache script[1], rely on it to be able to
perform diagnostics on the contents of the dcache. Annotate it as used,
so the compiler doesn't discard it.
Dave Airlie [Thu, 29 Aug 2024 23:02:27 +0000 (09:02 +1000)]
Merge tag 'drm-intel-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix #11195: The external display connect via USB type-C dock stays blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
. Move ARL GuC firmware to correct version
-
Linus Torvalds [Thu, 29 Aug 2024 18:14:39 +0000 (06:14 +1200)]
Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.
No known outstanding regressions.
Current release - regressions:
- wifi: iwlwifi: fix hibernation
- eth: ionic: prevent tx_timeout due to frequent doorbell ringing
Previous releases - regressions:
- sched: fix sch_fq incorrect behavior for small weights
- wifi:
- iwlwifi: take the mutex before running link selection
- wfx: repair open network AP mode
- netfilter: restore IP sanity checks for netdev/egress
- tcp: fix forever orphan socket caused by tcp_abort
- mptcp: close subflow when receiving TCP+FIN
- bluetooth: fix random crash seen while removing btnxpuart driver
Previous releases - always broken:
- mptcp: more fixes for the in-kernel PM
- eth: bonding: change ipsec_lock from spin lock to mutex
- eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
Misc:
- documentation: drop special comment style for net code"
* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
nfc: pn533: Add poll mod list filling check
mailmap: update entry for Sriram Yagnaraman
selftests: mptcp: join: check re-re-adding ID 0 signal
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: validate event numbers
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: pm: fix ID 0 endp usage after multiple re-creations
mptcp: pm: do not remove already closed subflows
selftests: mptcp: join: no extra msg if no counter
selftests: mptcp: join: check re-adding init endp with != id
mptcp: pm: reset MPC endp ID when re-added
mptcp: pm: skip connecting to already established sf
mptcp: pm: send ACK on an active subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: fix RM_ADDR ID for the initial subflow
mptcp: pm: reuse ID 0 after delete and re-add
net: busy-poll: use ktime_get_ns() instead of local_clock()
sctp: fix association labeling in the duplicate COOKIE-ECHO case
mptcp: pr_debug: add missing \n at the end
...
Dmitry Torokhov [Thu, 29 Aug 2024 15:38:54 +0000 (08:38 -0700)]
Input: cypress_ps2 - fix waiting for command response
Commit 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading
command status") uncovered an existing problem with cypress_ps2 driver:
it tries waiting on a PS/2 device waitqueue without using the rest of
libps2. Unfortunately without it nobody signals wakeup for the
waiting process, and each "extended" command was timing out. But the
rest of the code simply did not notice it.
Fix this by switching from homegrown way of sending request to get
command response and reading it to standard ps2_command() which does
the right thing.
Karthik Poosa [Tue, 27 Aug 2024 15:53:01 +0000 (21:23 +0530)]
drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16
parameter instead of u32. This change prevents potential illegal
sub-command errors.
v2: Mask uval instead of changing the prototype. (Badal)
Aleksandr Mishin [Tue, 27 Aug 2024 08:48:22 +0000 (11:48 +0300)]
nfc: pn533: Add poll mod list filling check
In case of im_protocols value is 1 and tm_protocols value is 0 this
combination successfully passes the check
'if (!im_protocols && !tm_protocols)' in the nfc_start_poll().
But then after pn533_poll_create_mod_list() call in pn533_start_poll()
poll mod list will remain empty and dev->poll_mod_count will remain 0
which lead to division by zero.
Normally no im protocol has value 1 in the mask, so this combination is
not expected by driver. But these protocol values actually come from
userspace via Netlink interface (NFC_CMD_START_POLL operation). So a
broken or malicious program may pass a message containing a "bad"
combination of protocol parameter values so that dev->poll_mod_count
is not incremented inside pn533_poll_create_mod_list(), thus leading
to division by zero.
Call trace looks like:
nfc_genl_start_poll()
nfc_start_poll()
->start_poll()
pn533_start_poll()
Add poll mod list filling check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Paolo Abeni [Thu, 29 Aug 2024 09:35:54 +0000 (11:35 +0200)]
Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes
payload from netdev/egress by subtracting skb_network_offset() when
validating IPv4 packet length, otherwise 'meta l4proto udp' never
matches.
Patch #2 subtracts skb_network_offset() when validating IPv6 packet
length for netdev/egress.
netfilter pull request 24-08-28
* tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
netfilter: nf_tables: restore IP sanity checks for netdev/egress
====================
====================
mptcp: more fixes for the in-kernel PM
Here is a new batch of fixes for the MPTCP in-kernel path-manager:
Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.
Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.
Patch 3 validates the two previous patches.
Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.
Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.
Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.
Patch 7 validates the three previous patches.
Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.
Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.
Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.
Patch 11 validates the previous patch.
Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.
Patch 13 validates the previous patch.
Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.
Patch 15 validates the previous patch.
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: reuse ID 0 after delete and re-add
mptcp: pm: fix RM_ADDR ID for the initial subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: send ACK on an active subflow
mptcp: pm: skip connecting to already established sf
mptcp: pm: reset MPC endp ID when re-added
selftests: mptcp: join: check re-adding init endp with != id
selftests: mptcp: join: no extra msg if no counter
mptcp: pm: do not remove already closed subflows
mptcp: pm: fix ID 0 endp usage after multiple re-creations
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: validate event numbers
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: check re-re-adding ID 0 signal
selftests: mptcp: join: check re-re-adding ID 0 signal
This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
The ADD_ADDR 0 with the address from the initial subflow should not be
considered as a new address: this is not something new. If the host
receives it, it simply means that the address is available again.
When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
it as new by not incrementing the 'add_addr_accepted' counter. But the
'accept_addr' might not be set if the limit has already been reached:
this can be bypassed in this case. But before, it is important to check
that this ADD_ADDR for the ID 0 is for the same address as the initial
subflow. If not, it is not something that should happen, and the
ADD_ADDR can be ignored.
Note that if an ADD_ADDR is received while there is already a subflow
opened using the same address, this ADD_ADDR is ignored as well. It
means that if multiple ADD_ADDR for ID 0 are received, there will not be
any duplicated subflows created by the client.
This test extends "delete and re-add" and "delete re-add signal" to
validate the previous commit: the number of MPTCP events are checked to
make sure there are no duplicated or unexpected ones.
A new helper has been introduced to easily check these events. The
missing events have been added to the lib.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
The initial subflow might have already been closed, but still in the
connection list. When the worker is instructed to close the subflows
that have been marked as closed, it might then try to close the initial
subflow again.
A consequence of that is that the SUB_CLOSED event can be seen twice:
# ip mptcp endpoint
1.1.1.1 id 1 subflow dev eth0
2.2.2.2 id 2 subflow dev eth1
# ip mptcp endpoint delete id 1
[ SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
[ SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
second one from __mptcp_close_subflow().
To avoid doing the post-closed processing twice, the subflow is now
marked as closed the first time.
Note that it is not enough to check if we are dealing with the first
subflow and check its sk_state: the subflow might have been reset or
closed before calling mptcp_close_ssk().
selftests: mptcp: join: check re-re-adding ID 0 endp
This test extends "delete and re-add" to validate the previous commit:
when the endpoint linked to the initial subflow (ID 0) is re-added
multiple times, it was no longer being used, because the internal linked
counters are not decremented for this special endpoint: it is not an
additional endpoint.
Here, the "del/add id 0" steps are done 3 times to unsure this case is
validated.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
mptcp: pm: fix ID 0 endp usage after multiple re-creations
'local_addr_used' and 'add_addr_accepted' are decremented for addresses
not related to the initial subflow (ID0), because the source and
destination addresses of the initial subflows are known from the
beginning: they don't count as "additional local address being used" or
"ADD_ADDR being accepted".
It is then required not to increment them when the entrypoint used by
the initial subflow is removed and re-added during a connection. Without
this modification, this entrypoint cannot be removed and re-added more
than once.
It is possible to have in the list already closed subflows, e.g. the
initial subflow has been already closed, but still in the list. No need
to try to close it again, and increments the related counters again.
selftests: mptcp: join: check re-adding init endp with != id
The initial subflow has a special local ID: 0. It is specific per
connection.
When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.
This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.
Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
The initial subflow has a special local ID: 0. It is specific per
connection.
When a global endpoint is deleted and re-added later, it can have a
different ID -- most services managing the endpoints automatically don't
force the ID to be the same as before. It is then important to track
these modifications to be consistent with the ID being used for the
address used by the initial subflow, not to confuse the other peer or to
send the ID 0 for the wrong address.
Now when removing an endpoint, msk->mpc_endpoint_id is reset if it
corresponds to this endpoint. When adding a new endpoint, the same
variable is updated if the address match the one of the initial subflow.
mptcp: pm: skip connecting to already established sf
The lookup_subflow_by_daddr() helper checks if there is already a
subflow connected to this address. But there could be a subflow that is
closing, but taking time due to some reasons: latency, losses, data to
process, etc.
If an ADD_ADDR is received while the endpoint is being closed, it is
better to try connecting to it, instead of rejecting it: the peer which
has sent the ADD_ADDR will not be notified that the ADD_ADDR has been
rejected for this reason, and the expected subflow will not be created
at the end.
This helper should then only look for subflows that are established, or
going to be, but not the ones being closed.
Taking the first one on the list doesn't work in some cases, e.g. if the
initial subflow is being removed. Pick another one instead of not
sending anything.
selftests: mptcp: join: check removing ID 0 endpoint
Removing the endpoint linked to the initial subflow should trigger a
RM_ADDR for the right ID, and the removal of the subflow. That's what is
now being verified in the "delete and re-add" test.
Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.