Chuck Lever [Fri, 13 Sep 2024 18:08:13 +0000 (14:08 -0400)]
tools: Add xdrgen
Add a Python-based tool for translating XDR specifications into XDR
encoder and decoder functions written in the Linux kernel's C coding
style. The generator attempts to match the usual C coding style of
the Linux kernel's SunRPC consumers.
This approach is similar to the netlink code generator in
tools/net/ynl .
The maintainability benefits of machine-generated XDR code include:
- Stronger type checking
- Reduces the number of bugs introduced by human error
- Makes the XDR code easier to audit and analyze
- Enables rapid prototyping of new RPC-based protocols
- Hardens the layering between protocol logic and marshaling
- Makes it easier to add observability on demand
- Unit tests might be built for both the tool and (automatically)
for the generated code
In addition, converting the XDR layer to use memory-safe languages
such as Rust will be easier if much of the code can be converted
automatically.
nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds. A
new filehandle would be recorded in the "new" bit set. That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.
Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set. This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.
This patch updates bd->new before clearing the set with that index,
instead of afterwards.
Jeff Layton [Mon, 9 Sep 2024 14:40:53 +0000 (10:40 -0400)]
nfsd: fix initial getattr on write delegation
At this point in compound processing, currentfh refers to the parent of
the file, not the file itself. Get the correct dentry from the delegation
stateid instead.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
NeilBrown [Thu, 29 Aug 2024 13:26:40 +0000 (09:26 -0400)]
nfsd: untangle code in nfsd4_deleg_getattr_conflict()
The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.
With this patch we:
- properly handle non-nfsd leases. We must not assume flc_owner is a
delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid()
(and other exits which don't need to call that)
[ jlayton: refactored on top of Neil's other patch: nfsd: fix
nfsd4_deleg_getattr_conflict in presence of third party lease ]
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
Scott Mayhew [Mon, 9 Sep 2024 20:28:54 +0000 (16:28 -0400)]
nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
This patch is intended to go on top of "nfsd: return -EINVAL when
namelen is 0" from Li Lingfeng. Li's patch checks for 0, but we should
be enforcing an upper bound as well.
Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its
database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the
downcall anyway.
Li Lingfeng [Tue, 3 Sep 2024 11:14:46 +0000 (19:14 +0800)]
nfsd: return -EINVAL when namelen is 0
When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may
result in namelen being 0, which will cause memdup_user() to return
ZERO_SIZE_PTR.
When we access the name.data that has been assigned the value of
ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is
triggered.
Chuck Lever [Wed, 28 Aug 2024 17:40:04 +0000 (13:40 -0400)]
NFSD: Limit the number of concurrent async COPY operations
Nothing appears to limit the number of concurrent async COPY
operations that clients can start. In addition, AFAICT each async
COPY can copy an unlimited number of 4MB chunks, so can run for a
long time. Thus IMO async COPY can become a DoS vector.
Add a restriction mechanism that bounds the number of concurrent
background COPY operations. Start simple and try to be fair -- this
patch implements a per-namespace limit.
An async COPY request that occurs while this limit is exceeded gets
NFS4ERR_DELAY. The requesting client can choose to send the request
again after a delay or fall back to a traditional read/write style
copy.
If there is need to make the mechanism more sophisticated, we can
visit that in future patches.
Chuck Lever [Wed, 28 Aug 2024 17:40:03 +0000 (13:40 -0400)]
NFSD: Async COPY result needs to return a write verifier
Currently, when NFSD handles an asynchronous COPY, it returns a
zero write verifier, relying on the subsequent CB_OFFLOAD callback
to pass the write verifier and a stable_how4 value to the client.
However, if the CB_OFFLOAD never arrives at the client (for example,
if a network partition occurs just as the server sends the
CB_OFFLOAD operation), the client will never receive this verifier.
Thus, if the client sends a follow-up COMMIT, there is no way for
the client to assess the COMMIT result.
The usual recovery for a missing CB_OFFLOAD is for the client to
send an OFFLOAD_STATUS operation, but that operation does not carry
a write verifier in its result. Neither does it carry a stable_how4
value, so the client /must/ send a COMMIT in this case -- which will
always fail because currently there's still no write verifier in the
COPY result.
Thus the server needs to return a normal write verifier in its COPY
result even if the COPY operation is to be performed asynchronously.
If the server recognizes the callback stateid in subsequent
OFFLOAD_STATUS operations, then obviously it has not restarted, and
the write verifier the client received in the COPY result is still
valid and can be used to assess a COMMIT of the copied data, if one
is needed.
NeilBrown [Fri, 30 Aug 2024 07:03:17 +0000 (17:03 +1000)]
nfsd: avoid races with wake_up_var()
wake_up_var() needs a barrier after the important change is made in the
var and before wake_up_var() is called, else it is possible that a wake
up won't be sent when it should.
In each case here the var is changed in an "atomic" manner, so
smb_mb__after_atomic() is sufficient.
In one case the important change (removing the lease) is performed
*after* the wake_up, which is backwards. The code survives in part
because the wait_var_event is given a timeout.
This patch adds the required barriers and calls destroy_delegation()
*before* waking any threads waiting for the delegation to be destroyed.
Thorsten Blum [Wed, 28 Aug 2024 21:42:55 +0000 (23:42 +0200)]
NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Use struct_size() instead of manually calculating the number of bytes to
allocate for a pnfs_block_deviceaddr with a single volume.
Guoqing Jiang [Wed, 21 Aug 2024 14:03:18 +0000 (22:03 +0800)]
nfsd: call cache_put if xdr_reserve_space returns NULL
If not enough buffer space available, but idmap_lookup has triggered
lookup_fn which calls cache_get and returns successfully. Then we
missed to call cache_put here which pairs with cache_get.
Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding") Signed-off-by: Guoqing Jiang <[email protected]> Reviwed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
Jeff Layton [Mon, 26 Aug 2024 12:50:12 +0000 (08:50 -0400)]
nfsd: track the main opcode for callbacks
Keep track of the "main" opcode for the callback, and display it in the
tracepoint. This makes it simpler to discern what's happening when there
is more than one callback in flight.
The one special case is the CB_NULL RPC. That's not a CB_COMPOUND
opcode, so designate the value 0 for that.
Li Lingfeng [Fri, 23 Aug 2024 07:00:49 +0000 (15:00 +0800)]
nfsd: remove unused parameter of nfsd_file_mark_find_or_create
Commit 427f5f83a319 ("NFSD: Ensure nf_inode is never dereferenced") passes
inode directly to nfsd_file_mark_find_or_create instead of getting it from
nf, so there is no need to pass nf.
Li Lingfeng [Wed, 14 Aug 2024 11:29:07 +0000 (19:29 +0800)]
NFSD: remove redundant assignment operation
Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and
write delegations") added a new assignment statement to add
RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the
old one should be removed.
Chuck Lever [Sun, 11 Aug 2024 17:11:07 +0000 (13:11 -0400)]
NFSD: Fix NFSv4's PUTPUBFH operation
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.
Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).
Mark Grimes [Wed, 7 Aug 2024 01:58:34 +0000 (18:58 -0700)]
nfsd: Add quotes to client info 'callback address'
The 'callback address' in client_info_show is output without quotes
causing yaml parsers to fail on processing IPv6 addresses.
Adding quotes to 'callback address' also matches that used by
the 'address' field.
Chuck Lever [Mon, 29 Jul 2024 20:52:32 +0000 (16:52 -0400)]
svcrdma: Handle device removal outside of the CM event handler
Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.
NeilBrown [Wed, 14 Aug 2024 13:21:01 +0000 (09:21 -0400)]
nfsd: move error choice for incorrect object types to version-specific code.
If an NFS operation expects a particular sort of object (file, dir, link,
etc) but gets a file handle for a different sort of object, it must
return an error. The actual error varies among NFS versions in non-trivial
ways.
For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
INVAL is suitable.
For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
was found when not expected. This take precedence over NOTDIR.
For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
preference to EINVAL when none of the specific error codes apply.
When nfsd_mode_check() finds a symlink where it expected a directory it
needs to return an error code that can be converted to NOTDIR for v2 or
v3 but will be SYMLINK for v4. It must be different from the error
code returns when it finds a symlink but expects a regular file - that
must be converted to EINVAL or SYMLINK.
So we introduce an internal error code nfserr_symlink_not_dir which each
version converts as appropriate.
nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable
error if the object is the wrong time. For v4.0 we use nfserr_symlink
for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type.
We handle this difference in-place in nfsd_check_obj_isreg() as there is
nothing to be gained by delaying the choice to nfsd4_map_status().
As a result of these changes, nfsd_mode_check() doesn't need an rqstp
arg any more.
Note that NFSv4 operations are actually performed in the xdr code(!!!)
so to the only place that we can map the status code successfully is in
nfsd4_encode_operation().
nfsd: be more systematic about selecting error codes for internal use.
Rather than using ad hoc values for internal errors (30000, 11000, ...)
use 'enum' to sequentially allocate numbers starting from the first
known available number - now visible as NFS4ERR_FIRST_FREE.
The goal is values that are distinct from all be32 error codes. To get
those we must first select integers that are not already used, then
convert them with cpu_to_be32().
nfsd: Move error code mapping to per-version proc code.
There is code scattered around nfsd which chooses an error status based
on the particular version of nfs being used. It is cleaner to have the
version specific choices in version specific code.
With this patch common code returns the most specific error code
possible and the version specific code maps that if necessary.
Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
function which is called to map the resp->status before each non-trivial
nfsd_proc_* or nfsd3_proc_* function returns.
NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
are left for a later patch.
With this patch the only places that test ->rq_vers against a specific
version are nfsd_v4client() and nfsd_set_fh_dentry().
The latter sets some flags in the svc_fh, which now includes:
fh_64bit_cookies
fh_use_wgather
nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()
nfsd_breaker_owns_lease() currently open-codes the same test that
nfsd_v4client() performs.
With this patch we use nfsd_v4client() instead.
Also as i_am_nfsd() is only used in combination with kthread_data(),
replace it with nfsd_current_rqst() which combines the two and returns a
valid svc_rqst, or NULL.
The test for NULL is moved into nfsd_v4client() for code clarity.
nfsd: Pass 'cred' instead of 'rqstp' to some functions.
nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags()
only ever need the cred out of rqstp, so pass it explicitly instead of
the whole rqstp.
sunrpc: allow svc threads to fail initialisation cleanly
If an svc thread needs to perform some initialisation that might fail,
it has no good way to handle the failure.
Before the thread can exit it must call svc_exit_thread(), but that
requires the service mutex to be held. The thread cannot simply take
the mutex as that could deadlock if there is a concurrent attempt to
shut down all threads (which is unlikely, but not impossible).
nfsd currently call svc_exit_thread() unprotected in the unlikely event
that unshare_fs_struct() fails.
We can clean this up by introducing svc_thread_init_status() by which an
svc thread can report whether initialisation has succeeded. If it has,
it continues normally into the action loop. If it has not,
svc_thread_init_status() immediately aborts the thread.
svc_start_kthread() waits for either of these to happen, and calls
svc_exit_thread() (under the mutex) if the thread aborted.
sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.
As documented in svc_xprt.c, sv_nrthreads is protected by the service
mutex, and it does not need ->sv_lock.
(->sv_lock is needed only for sv_permsocks, sv_tempsocks, and
sv_tmpcnt).
SUNRPC: make various functions static, or not exported.
Various functions are only used within the sunrpc module, and several
are only use in the one file. So clean up:
These are marked static, and any EXPORT is removed.
svc_rcpb_setup()
svc_rqst_alloc()
svc_rqst_free() - also moved before first use
svc_rpcbind_set_version()
svc_drop() - also moved to svc.c
These are now not EXPORTed, but are not static.
svc_authenticate()
svc_sock_update_bufs()
nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use
that in the one place that nlmsvc_timeout is used.
In truth it *might* not always be the same as nlmsvc_timeout is only set
when lockd is started while nlm_timeout can be set at anytime via
sysctl. I think this difference it not helpful so removing it is good.
Also remove the test for nlm_timout being 0. This is not possible -
unless a module parameter is used to set the minimum timeout to 0, and
if that happens then it probably should be honoured.
Youzhong Yang [Thu, 11 Jul 2024 15:51:33 +0000 (11:51 -0400)]
nfsd: use system_unbound_wq for nfsd_file_gc_worker()
After many rounds of changes in filecache.c, the fix by commit ce7df055(NFSD: Make the file_delayed_close workqueue UNBOUND)
is gone, now we are getting syslog messages like these:
[ 1618.186688] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
[ 1638.661616] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
[ 1665.284542] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 16 times, consider switching to WQ_UNBOUND
[ 1759.491342] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 32 times, consider switching to WQ_UNBOUND
[ 3013.012308] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 64 times, consider switching to WQ_UNBOUND
[ 3154.172827] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
[ 3422.461924] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 256 times, consider switching to WQ_UNBOUND
[ 3963.152054] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 512 times, consider switching to WQ_UNBOUND
Consider use system_unbound_wq instead of system_wq for
nfsd_file_gc_worker().
Jeff Layton [Wed, 10 Jul 2024 13:05:33 +0000 (09:05 -0400)]
nfsd: count nfsd_file allocations
We already count the frees (via nfsd_file_releases). Count the
allocations as well. Also switch the direct call to nfsd_file_slab_free
in nfsd_file_do_acquire to nfsd_file_free, so that the allocs and
releases match up.
Jeff Layton [Wed, 10 Jul 2024 13:05:32 +0000 (09:05 -0400)]
nfsd: fix refcount leak when file is unhashed after being found
If we wait_for_construction and find that the file is no longer hashed,
and we're going to retry the open, the old nfsd_file reference is
currently leaked. Put the reference before retrying.
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error") Signed-off-by: Jeff Layton <[email protected]> Tested-by: Youzhong Yang <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
Youzhong Yang [Wed, 10 Jul 2024 14:40:35 +0000 (10:40 -0400)]
nfsd: add list_head nf_gc to struct nfsd_file
nfsd_file_put() in one thread can race with another thread doing
garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
nfsd_file_lru_cb()):
* In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
* nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set)
* garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and
returns LRU_ROTATE.
* garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref
to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose
list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
* nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added
to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply
treats it as part of the LRU and removes it, which leads to its removal from
the 'dispose' list.
* At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU.
nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))],
as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf'
gets no chance of being freed.
nfsd_file_put() can also race with nfsd_file_cond_queue():
* In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
* nfsd_file_lru_add() sets REFERENCED bit and returns true.
* Some userland application runs 'exportfs -f' or something like that, which triggers
__nfsd_file_cache_purge() -> nfsd_file_cond_queue().
* In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done
successfully.
* nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2.
* nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds.
* nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))]
(with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the
dispose list [list_add(&nf->nf_lru, dispose)]
* nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))], although the 'nf' is not
in the LRU, but it is linked in the 'dispose' list, nfsd_file_lru_remove() simply
treats it as part of the LRU and removes it. This leads to its removal from
the 'dispose' list!
* Now nf->ref is 0, unhashed. nfsd_file_put() continues its execution and set
nf->nf_ref to REFCOUNT_SATURATED.
As shown in the above analysis, using nf_lru for both the LRU list and dispose list
can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses
it for the dispose list. This does not fix the nfsd_file leaking issue completely.
Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs
Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
lock should not have been able to cause that...
- Fix a rare data corruption in the rebalance path, caught as a nonce
inconsistency on encrypted filesystems
- Revert lockless buffered write path
- Mark more errors as autofix"
* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
bcachefs: Mark more errors as autofix
bcachefs: Revert lockless buffered IO path
bcachefs: Fix bch2_extents_match() false positive
bcachefs: Fix failure to return error in data_update_index_update()
Kent Overstreet [Thu, 22 Aug 2024 15:47:32 +0000 (11:47 -0400)]
bcachefs: Mark more errors as autofix
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.
note that errors are still logged in the superblock, so we'll still know
that they happened.
It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.
Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.
It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.
Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.
My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.
Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock") Signed-off-by: Kent Overstreet <[email protected]>
Linus Torvalds [Sat, 31 Aug 2024 21:18:48 +0000 (09:18 +1200)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull misc fixes from Guenter Roeck.
These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
apparmor: fix policy_unpack_test on big endian systems
Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
microblaze: don't treat zero reserved memory regions as error
Linus Torvalds [Sat, 31 Aug 2024 21:07:44 +0000 (09:07 +1200)]
Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
previous fix for this driver was incomplete and broke the WLAN support
on some platforms. This addresses the issue.
- set the direction of the wlan-enable GPIO to output after
requesting it as-is"
* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.
Linus Torvalds [Sat, 31 Aug 2024 19:00:38 +0000 (07:00 +1200)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Minor fixes only.
The sd.c one ignores a sync cache request if format is in progress
which can happen if formatting a drive across suspend/resume"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
scsi: aacraid: Fix double-free on probe failure
scsi: lpfc: Fix overflow build issue
Linus Torvalds [Sat, 31 Aug 2024 18:55:47 +0000 (06:55 +1200)]
Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
Linus Torvalds [Sat, 31 Aug 2024 18:48:37 +0000 (06:48 +1200)]
Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Do not call out v1 inodes with non-zero di_nlink field as being
corrupt
- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
rather than "inode btree" blocks
- Don't report the number of trimmed bytes via FITRIM because the
underlying storage isn't required to do anything and failed discard
IOs aren't reported to the caller anyway
- Fix incorrect setting of rm_owner field in an rmap query
- Report missing disk offset range in an fsmap query
- Obtain m_growlock when extending realtime section of the filesystem
- Reset rootdir extent size hint after extending realtime section of
the filesystem
* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reset rootdir extent size hint after growfsrt
xfs: take m_growlock when running growfsrt
xfs: Fix missing interval for missing_owner in xfs fsmap
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: Fix the owner setting issue for rmap query in xfs fsmap
xfs: don't bother reporting blocks trimmed via FITRIM
xfs: xfs_finobt_count_blocks() walks the wrong btree
xfs: fix folio dirtying for XFILE_ALLOC callers
xfs: fix di_onlink checking for V1/V2 inodes
Linus Torvalds [Sat, 31 Aug 2024 18:42:13 +0000 (06:42 +1200)]
Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There is a fairly large number of bug fixes for Qualcomm platforms,
most of them addressing issues with the devicetree files for the newly
added Snapdragon X1 based laptops to make them more reliable.
The Qualcomm driver changes address a few build-time issues as well as
runtime problems in the tzmem and scm firmware, the USB Type-C driver,
and the cmd-db and pmic_glink soc drivers.
The NXP i.MX usually gets a bunch of devicetree fixes that is
proportional to the number of supported machines. This includes both
warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
board.
The other changes are the usual minor changes, including an update to
the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
(risc-v)"
* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
firmware: microchip: fix incorrect error report of programming:timeout on success
soc: qcom: pd-mapper: Fix singleton refcount
firmware: qcom: tzmem: disable sdm670 platform
soc: qcom: pmic_glink: Actually communicate when remote goes down
usb: typec: ucsi: Move unregister out of atomic section
soc: qcom: pmic_glink: Fix race during initialization
firmware: qcom: qseecom: remove unused functions
firmware: qcom: tzmem: fix virtual-to-physical address conversion
firmware: qcom: scm: Mark get_wq_ctx() as atomic call
arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
arm64: dts: qcom: disable GPU on x1e80100 by default
arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
arm64: dts: imx95: correct L3Cache cache-sets
arm64: dts: imx95: correct a55 power-domains
arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
arm64: dts: imx93: update default value for snps,clk-csr
arm64: dts: freescale: tqma9352: Fix watchdog reset
arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
...
Linus Torvalds [Sat, 31 Aug 2024 03:32:38 +0000 (15:32 +1200)]
Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
- a fix for Cypress PS/2 touchpad for regression introduced in 6.11
merge window where a timeout condition is incorrectly reported for
all extended Cypress commands
* tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cypress_ps2 - fix waiting for command response
Linus Torvalds [Sat, 31 Aug 2024 02:54:11 +0000 (14:54 +1200)]
Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Add Manivannan Sadhasivam as PCI native host bridge and endpoint
driver reviewer (Manivannan Sadhasivam)
- Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
work around hardware erratum that causes a constant stream of
interrupts (Manivannan Sadhasivam)
- Don't try to fall back to qcom Operating Performance Points (OPP)
support unless the platform actually supports OPP (Manivannan
Sadhasivam)
- Add [email protected] mailing list to MAINTAINERS for NXP
layerscape and imx6 PCI controller drivers (Frank Li)
* tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: PCI: Add NXP PCI controller mailing list [email protected]
PCI: qcom: Use OPP only if the platform supports it
PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers
Linus Torvalds [Sat, 31 Aug 2024 01:51:27 +0000 (13:51 +1200)]
Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- A fix for a regression that happened in 6.11 merge window, where the
copying of iovecs for compat mode applications got broken for certain
cases.
- Fix for a bug introduced in 6.10, where if using recv/send bundles
with classic provided buffers, the recv/send would fail to set the
right iovec count. This caused 0 byte send/recv results. Found via
code coverage testing and writing a test case to exercise it.
* tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
io_uring/kbuf: return correct iovec count from classic buffer peek
io_uring/rsrc: ensure compat iovecs are copied correctly
Linus Torvalds [Fri, 30 Aug 2024 18:33:59 +0000 (06:33 +1200)]
Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"One small patch to correct a NFS permissions problem with SELinux and
Smack"
* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: don't bypass permissions check in inode_setsecctx hook
Linus Torvalds [Fri, 30 Aug 2024 18:25:34 +0000 (06:25 +1200)]
Merge tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three issues in the amd-pstate cpufreq driver.
Specifics:
- Remove checks for highest performance match on preferred cores when
updating preferred core ranking in amd-pstate (Mario Limonciello)
- Make amd-pstate call topology_logical_package_id() instead of
logical_die_id() to get a socked ID for a CPU (Gautham Shenoy)
- Fix uninitialized variable in amd_pstate_cpu_boost_update() (Dan
Carpenter)"
* tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore
cpufreq/amd-pstate: Use topology_logical_package_id() instead of logical_die_id()
cpufreq: amd-pstate: Fix uninitialized variable in amd_pstate_cpu_boost_update()
Linus Torvalds [Fri, 30 Aug 2024 18:15:02 +0000 (06:15 +1200)]
Merge tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- Single fix for non-continous port map programming
* tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: fix programming slave ports for non-continous port maps
Linus Torvalds [Fri, 30 Aug 2024 18:11:34 +0000 (06:11 +1200)]
Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Fix a device-stall problem in bad io-page-fault setups (faults
received from devices with no supporting domain attached).
- Context flush fix for Intel VT-d.
- Do not allow non-read+non-write mapping through iommufd as most
implementations can not handle that.
- Fix a possible infinite-loop issue in map_pages() path.
- Add Jean-Philippe as reviewer for SMMUv3 SVA support
* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
iommu: Do not return 0 from map_pages if it doesn't do anything
iommufd: Do not allow creating areas without READ or WRITE
iommu/vt-d: Fix incorrect domain ID in context flush helper
iommu: Handle iommu faults for a bad iopf setup
Jens Axboe [Fri, 30 Aug 2024 16:45:54 +0000 (10:45 -0600)]
io_uring/kbuf: return correct iovec count from classic buffer peek
io_provided_buffers_select() returns 0 to indicate success, but it should
be returning 1 to indicate that 1 vec was mapped. This causes peeking
to fail with classic provided buffers, and while that's not a use case
that anyone should use, it should still work correctly.
The end result is that no buffer will be selected, and hence a completion
with '0' as the result will be posted, without a buffer attached.
NeilBrown [Wed, 28 Aug 2024 23:06:28 +0000 (09:06 +1000)]
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway. This is wrong.
With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below. This is the same as the
current code, but without any reference to a possible delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
Jens Axboe [Wed, 28 Aug 2024 15:42:33 +0000 (09:42 -0600)]
io_uring/rsrc: ensure compat iovecs are copied correctly
For buffer registration (or updates), a userspace iovec is copied in
and updated. If the application is within a compat syscall, then the
iovec type is compat_iovec rather than iovec. However, the type used
in __io_sqe_buffers_update() and io_sqe_buffers_register() is always
struct iovec, and hence the source is incremented by the size of a
non-compat iovec in the loop. This misses every other iovec in the
source, and will run into garbage half way through the copies and
return -EFAULT to the application.
Maintain the source address separately and assign to our user vec
pointer, so that copies always happen from the right source address.
While in there, correct a bad placement of __user which triggered
the following sparse warning prior to this fix:
io_uring/rsrc.c:981:33: warning: cast removes address space '__user' of expression
io_uring/rsrc.c:981:30: warning: incorrect type in assignment (different address spaces)
io_uring/rsrc.c:981:30: expected struct iovec const [noderef] __user *uvec
io_uring/rsrc.c:981:30: got struct iovec *[noderef] __user
Fixes: f4eaf8eda89e ("io_uring/rsrc: Drop io_copy_iov in favor of iovec API") Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Linus Torvalds [Fri, 30 Aug 2024 02:17:30 +0000 (14:17 +1200)]
Merge tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Another week, another set of GPU fixes. amdgpu and vmwgfx leading the
charge, then i915 and xe changes along with v3d and some other bits.
The TTM revert is due to some stuttering graphical apps probably due
to longer stalls while prefaulting.
i915:
- Fix #11195: The external display connect via USB type-C dock stays
blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
- Move ARL GuC firmware to correct version
vmwgfx:
- prevent unmapping active read buffers
- fix prime with external buffers
- disable coherent dumb buffers without 3d
v3d:
- disable preemption while updating GPU stats"
* tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
drm/v3d: Disable preemption while updating GPU stats
drm/amd/pm: Drop unsupported features on smu v14_0_2
drm/amd/pm: Add support for new P2S table revision
drm/amdgpu: support for gc_info table v1.3
drm/amd/display: avoid using null object of framebuffer
drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
drm/amd/pm: update message interface for smu v14.0.2/3
drm/amdgpu/swsmu: always force a state reprogram on init
drm/amdgpu/smu13.0.7: print index for profiles
drm/amdgpu: align pp_power_profile_mode with kernel docs
drm/i915/dp_mst: Fix MST state after a sink reset
drm/xe: Invalidate media_gt TLBs
drm/i915: ARL requires a newer GSC firmware
drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict
video/aperture: optionally match the device in sysfb_disable()
drm/vmwgfx: Disable coherent dumb buffers without 3d
drm/vmwgfx: Fix prime with external buffers
drm/vmwgfx: Prevent unmapping active read buffers
Revert "drm/ttm: increase ttm pre-fault value to PMD size"
Dave Airlie [Fri, 30 Aug 2024 01:28:00 +0000 (11:28 +1000)]
Merge tag 'drm-misc-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A revert for a previous TTM commit causing stuttering, 3 fixes for
vmwgfx related to buffer operations, a fix for video/aperture with
non-VGA primary devices, and a preemption status fix for v3d
Linus Torvalds [Fri, 30 Aug 2024 00:32:53 +0000 (12:32 +1200)]
Merge tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
- binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov)
* tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined
Stephen Brennan [Thu, 29 Aug 2024 18:20:49 +0000 (11:20 -0700)]
dcache: keep dentry_hashtable or d_hash_shift even when not used
The runtime constant feature removes all the users of these variables,
allowing the compiler to optimize them away. It's quite difficult to
extract their values from the kernel text, and the memory saved by
removing them is tiny, and it was never the point of this optimization.
Since the dentry_hashtable is a core data structure, it's valuable for
debugging tools to be able to read it easily. For instance, scripts
built on drgn, like the dentrycache script[1], rely on it to be able to
perform diagnostics on the contents of the dcache. Annotate it as used,
so the compiler doesn't discard it.
Dave Airlie [Thu, 29 Aug 2024 23:02:27 +0000 (09:02 +1000)]
Merge tag 'drm-intel-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix #11195: The external display connect via USB type-C dock stays blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
. Move ARL GuC firmware to correct version
-
Linus Torvalds [Thu, 29 Aug 2024 18:14:39 +0000 (06:14 +1200)]
Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.
No known outstanding regressions.
Current release - regressions:
- wifi: iwlwifi: fix hibernation
- eth: ionic: prevent tx_timeout due to frequent doorbell ringing
Previous releases - regressions:
- sched: fix sch_fq incorrect behavior for small weights
- wifi:
- iwlwifi: take the mutex before running link selection
- wfx: repair open network AP mode
- netfilter: restore IP sanity checks for netdev/egress
- tcp: fix forever orphan socket caused by tcp_abort
- mptcp: close subflow when receiving TCP+FIN
- bluetooth: fix random crash seen while removing btnxpuart driver
Previous releases - always broken:
- mptcp: more fixes for the in-kernel PM
- eth: bonding: change ipsec_lock from spin lock to mutex
- eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
Misc:
- documentation: drop special comment style for net code"
* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
nfc: pn533: Add poll mod list filling check
mailmap: update entry for Sriram Yagnaraman
selftests: mptcp: join: check re-re-adding ID 0 signal
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: validate event numbers
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: pm: fix ID 0 endp usage after multiple re-creations
mptcp: pm: do not remove already closed subflows
selftests: mptcp: join: no extra msg if no counter
selftests: mptcp: join: check re-adding init endp with != id
mptcp: pm: reset MPC endp ID when re-added
mptcp: pm: skip connecting to already established sf
mptcp: pm: send ACK on an active subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: fix RM_ADDR ID for the initial subflow
mptcp: pm: reuse ID 0 after delete and re-add
net: busy-poll: use ktime_get_ns() instead of local_clock()
sctp: fix association labeling in the duplicate COOKIE-ECHO case
mptcp: pr_debug: add missing \n at the end
...
Dmitry Torokhov [Thu, 29 Aug 2024 15:38:54 +0000 (08:38 -0700)]
Input: cypress_ps2 - fix waiting for command response
Commit 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading
command status") uncovered an existing problem with cypress_ps2 driver:
it tries waiting on a PS/2 device waitqueue without using the rest of
libps2. Unfortunately without it nobody signals wakeup for the
waiting process, and each "extended" command was timing out. But the
rest of the code simply did not notice it.
Fix this by switching from homegrown way of sending request to get
command response and reading it to standard ps2_command() which does
the right thing.
Karthik Poosa [Tue, 27 Aug 2024 15:53:01 +0000 (21:23 +0530)]
drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16
parameter instead of u32. This change prevents potential illegal
sub-command errors.
v2: Mask uval instead of changing the prototype. (Badal)
Aleksandr Mishin [Tue, 27 Aug 2024 08:48:22 +0000 (11:48 +0300)]
nfc: pn533: Add poll mod list filling check
In case of im_protocols value is 1 and tm_protocols value is 0 this
combination successfully passes the check
'if (!im_protocols && !tm_protocols)' in the nfc_start_poll().
But then after pn533_poll_create_mod_list() call in pn533_start_poll()
poll mod list will remain empty and dev->poll_mod_count will remain 0
which lead to division by zero.
Normally no im protocol has value 1 in the mask, so this combination is
not expected by driver. But these protocol values actually come from
userspace via Netlink interface (NFC_CMD_START_POLL operation). So a
broken or malicious program may pass a message containing a "bad"
combination of protocol parameter values so that dev->poll_mod_count
is not incremented inside pn533_poll_create_mod_list(), thus leading
to division by zero.
Call trace looks like:
nfc_genl_start_poll()
nfc_start_poll()
->start_poll()
pn533_start_poll()
Add poll mod list filling check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Paolo Abeni [Thu, 29 Aug 2024 09:35:54 +0000 (11:35 +0200)]
Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes
payload from netdev/egress by subtracting skb_network_offset() when
validating IPv4 packet length, otherwise 'meta l4proto udp' never
matches.
Patch #2 subtracts skb_network_offset() when validating IPv6 packet
length for netdev/egress.
netfilter pull request 24-08-28
* tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
netfilter: nf_tables: restore IP sanity checks for netdev/egress
====================
====================
mptcp: more fixes for the in-kernel PM
Here is a new batch of fixes for the MPTCP in-kernel path-manager:
Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.
Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.
Patch 3 validates the two previous patches.
Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.
Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.
Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.
Patch 7 validates the three previous patches.
Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.
Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.
Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.
Patch 11 validates the previous patch.
Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.
Patch 13 validates the previous patch.
Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.
Patch 15 validates the previous patch.
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: reuse ID 0 after delete and re-add
mptcp: pm: fix RM_ADDR ID for the initial subflow
selftests: mptcp: join: check removing ID 0 endpoint
mptcp: pm: send ACK on an active subflow
mptcp: pm: skip connecting to already established sf
mptcp: pm: reset MPC endp ID when re-added
selftests: mptcp: join: check re-adding init endp with != id
selftests: mptcp: join: no extra msg if no counter
mptcp: pm: do not remove already closed subflows
mptcp: pm: fix ID 0 endp usage after multiple re-creations
selftests: mptcp: join: check re-re-adding ID 0 endp
mptcp: avoid duplicated SUB_CLOSED events
selftests: mptcp: join: validate event numbers
mptcp: pm: ADD_ADDR 0 is not a new address
selftests: mptcp: join: check re-re-adding ID 0 signal
selftests: mptcp: join: check re-re-adding ID 0 signal
This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
The ADD_ADDR 0 with the address from the initial subflow should not be
considered as a new address: this is not something new. If the host
receives it, it simply means that the address is available again.
When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
it as new by not incrementing the 'add_addr_accepted' counter. But the
'accept_addr' might not be set if the limit has already been reached:
this can be bypassed in this case. But before, it is important to check
that this ADD_ADDR for the ID 0 is for the same address as the initial
subflow. If not, it is not something that should happen, and the
ADD_ADDR can be ignored.
Note that if an ADD_ADDR is received while there is already a subflow
opened using the same address, this ADD_ADDR is ignored as well. It
means that if multiple ADD_ADDR for ID 0 are received, there will not be
any duplicated subflows created by the client.