Git Repo - linux.git/log

xdrgen: Fix return code checking in built-in XDR decoders

xdr_stream_encode_u32() returns XDR_UNIT on success.
xdr_stream_decode_u32() returns zero or -EMSGSIZE, but never
XDR_UNIT.

Signed-off-by: Chuck Lever <[email protected]>

tools: Add xdrgen

Add a Python-based tool for translating XDR specifications into XDR
encoder and decoder functions written in the Linux kernel's C coding
style. The generator attempts to match the usual C coding style of
the Linux kernel's SunRPC consumers.

This approach is similar to the netlink code generator in
tools/net/ynl .

The maintainability benefits of machine-generated XDR code include:

- Stronger type checking
- Reduces the number of bugs introduced by human error
- Makes the XDR code easier to audit and analyze
- Enables rapid prototyping of new RPC-based protocols
- Hardens the layering between protocol logic and marshaling
- Makes it easier to add observability on demand
- Unit tests might be built for both the tool and (automatically)
for the generated code

In addition, converting the XDR layer to use memory-safe languages
such as Rust will be easier if much of the code can be converted
automatically.

Tested-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds.  A
new filehandle would be recorded in the "new" bit set.  That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.

Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set.  This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.

This patch updates bd->new before clearing the set with that index,
instead of afterwards.

Reported-by: Olga Kornievskaia <[email protected]>
Cc: [email protected]
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix initial getattr on write delegation

At this point in compound processing, currentfh refers to the parent of
the file, not the file itself. Get the correct dentry from the delegation
stateid instead.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: untangle code in nfsd4_deleg_getattr_conflict()

The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.

With this patch we:
- properly handle non-nfsd leases.  We must not assume flc_owner is a
    delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid()
   (and other exits which don't need to call that)

[ jlayton: refactored on top of Neil's other patch: nfsd: fix
   nfsd4_deleg_getattr_conflict in presence of third party lease ]

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()

This patch is intended to go on top of "nfsd: return -EINVAL when
namelen is 0" from Li Lingfeng. Li's patch checks for 0, but we should
be enforcing an upper bound as well.

Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its
database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the
downcall anyway.

Signed-off-by: Scott Mayhew <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: return -EINVAL when namelen is 0

When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may
result in namelen being 0, which will cause memdup_user() to return
ZERO_SIZE_PTR.
When we access the name.data that has been assigned the value of
ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is
triggered.

[ T1205] ==================================================================
[ T1205] BUG: KASAN: null-ptr-deref in nfs4_client_to_reclaim+0xe9/0x260
[ T1205] Read of size 1 at addr 0000000000000010 by task nfsdcld/1205
[ T1205]
[ T1205] CPU: 11 PID: 1205 Comm: nfsdcld Not tainted 5.10.0-00003-g2c1423731b8d #406
[ T1205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ T1205] Call Trace:
[ T1205]  dump_stack+0x9a/0xd0
[ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
[ T1205]  __kasan_report.cold+0x34/0x84
[ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
[ T1205]  kasan_report+0x3a/0x50
[ T1205]  nfs4_client_to_reclaim+0xe9/0x260
[ T1205]  ? nfsd4_release_lockowner+0x410/0x410
[ T1205]  cld_pipe_downcall+0x5ca/0x760
[ T1205]  ? nfsd4_cld_tracking_exit+0x1d0/0x1d0
[ T1205]  ? down_write_killable_nested+0x170/0x170
[ T1205]  ? avc_policy_seqno+0x28/0x40
[ T1205]  ? selinux_file_permission+0x1b4/0x1e0
[ T1205]  rpc_pipe_write+0x84/0xb0
[ T1205]  vfs_write+0x143/0x520
[ T1205]  ksys_write+0xc9/0x170
[ T1205]  ? __ia32_sys_read+0x50/0x50
[ T1205]  ? ktime_get_coarse_real_ts64+0xfe/0x110
[ T1205]  ? ktime_get_coarse_real_ts64+0xa2/0x110
[ T1205]  do_syscall_64+0x33/0x40
[ T1205]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ T1205] RIP: 0033:0x7fdbdb761bc7
[ T1205] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 514
[ T1205] RSP: 002b:00007fff8c4b7248 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ T1205] RAX: ffffffffffffffda RBX: 000000000000042b RCX: 00007fdbdb761bc7
[ T1205] RDX: 000000000000042b RSI: 00007fff8c4b75f0 RDI: 0000000000000008
[ T1205] RBP: 00007fdbdb761bb0 R08: 0000000000000000 R09: 0000000000000001
[ T1205] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000042b
[ T1205] R13: 0000000000000008 R14: 00007fff8c4b75f0 R15: 0000000000000000
[ T1205] ==================================================================

Fix it by checking namelen.

Signed-off-by: Li Lingfeng <[email protected]>
Fixes: 74725959c33c ("nfsd: un-deprecate nfsdcld")
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Scott Mayhew <[email protected]>
Tested-by: Scott Mayhew <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Wrap async copy operations with trace points

Add an nfsd_copy_async_done to record the timestamp, the final
status code, and the callback stateid of an async copy.

Rename the nfsd_copy_do_async tracepoint to match that naming
convention to make it easier to enable both of these with a
single glob.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Clean up extra whitespace in trace_nfsd_copy_done

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Record the callback stateid in copy tracepoints

Match COPY operations up with CB_OFFLOAD operations.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Display copy stateids with conventional print formatting

Make it easier to grep for s2s COPY stateids in trace logs: Use the
same display format in nfsd_copy_class as is used to display other
stateids.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Limit the number of concurrent async COPY operations

Nothing appears to limit the number of concurrent async COPY
operations that clients can start. In addition, AFAICT each async
COPY can copy an unlimited number of 4MB chunks, so can run for a
long time. Thus IMO async COPY can become a DoS vector.

Add a restriction mechanism that bounds the number of concurrent
background COPY operations. Start simple and try to be fair -- this
patch implements a per-namespace limit.

An async COPY request that occurs while this limit is exceeded gets
NFS4ERR_DELAY. The requesting client can choose to send the request
again after a delay or fall back to a traditional read/write style
copy.

If there is need to make the mechanism more sophisticated, we can
visit that in future patches.

Cc: [email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Async COPY result needs to return a write verifier

Currently, when NFSD handles an asynchronous COPY, it returns a
zero write verifier, relying on the subsequent CB_OFFLOAD callback
to pass the write verifier and a stable_how4 value to the client.

However, if the CB_OFFLOAD never arrives at the client (for example,
if a network partition occurs just as the server sends the
CB_OFFLOAD operation), the client will never receive this verifier.
Thus, if the client sends a follow-up COMMIT, there is no way for
the client to assess the COMMIT result.

The usual recovery for a missing CB_OFFLOAD is for the client to
send an OFFLOAD_STATUS operation, but that operation does not carry
a write verifier in its result. Neither does it carry a stable_how4
value, so the client /must/ send a COMMIT in this case -- which will
always fail because currently there's still no write verifier in the
COPY result.

Thus the server needs to return a normal write verifier in its COPY
result even if the COPY operation is to be performed asynchronously.

If the server recognizes the callback stateid in subsequent
OFFLOAD_STATUS operations, then obviously it has not restarted, and
the write verifier the client received in the COPY result is still
valid and can be used to assess a COMMIT of the copied data, if one
is needed.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: avoid races with wake_up_var()

wake_up_var() needs a barrier after the important change is made in the
var and before wake_up_var() is called, else it is possible that a wake
up won't be sent when it should.

In each case here the var is changed in an "atomic" manner, so
smb_mb__after_atomic() is sufficient.

In one case the important change (removing the lease) is performed
*after* the wake_up, which is backwards. The code survives in part
because the wait_var_event is given a timeout.

This patch adds the required barriers and calls destroy_delegation()
*before* waking any threads waiting for the delegation to be destroyed.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: use clear_and_wake_up_bit()

nfsd has two places that open-code clear_and_wake_up_bit(). One has
the required memory barriers. The other does not.

Change both to use clear_and_wake_up_bit() so we have the barriers
without the noise.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: xprtrdma: Use ERR_CAST() to return

Using ERR_CAST() is more reasonable and safer, When it is necessary
to convert the type of an error pointer and return it.

Signed-off-by: Yan Zhen <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()

Add the __counted_by compiler attribute to the flexible array member
volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Use struct_size() instead of manually calculating the number of bytes to
allocate for a pnfs_block_deviceaddr with a single volume.

Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: call cache_put if xdr_reserve_space returns NULL

If not enough buffer space available, but idmap_lookup has triggered
lookup_fn which calls cache_get and returns successfully. Then we
missed to call cache_put here which pairs with cache_get.

Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding")
Signed-off-by: Guoqing Jiang <[email protected]>
Reviwed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: add more nfsd_cb tracepoints

Add some tracepoints in the callback client RPC operations. Also
add a tracepoint to nfsd4_cb_getattr_done.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: track the main opcode for callbacks

Keep track of the "main" opcode for the callback, and display it in the
tracepoint. This makes it simpler to discern what's happening when there
is more than one callback in flight.

The one special case is the CB_NULL RPC. That's not a CB_COMPOUND
opcode, so designate the value 0 for that.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: add more info to WARN_ON_ONCE on failed callbacks

Currently, you get the warning and stack trace, but nothing is printed
about the relevant error codes. Add that in.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix some spelling errors in comments

Fix spelling errors in comments of nfsd4_release_lockowner and
nfs4_set_delegation.

Signed-off-by: Li Lingfeng <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove unused parameter of nfsd_file_mark_find_or_create

Commit 427f5f83a319 ("NFSD: Ensure nf_inode is never dereferenced") passes
inode directly to nfsd_file_mark_find_or_create instead of getting it from
nf, so there is no need to pass nf.

Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: use LIST_HEAD() to simplify code

list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().

Signed-off-by: Hongbo Li <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: map the EBADMSG to nfserr_io to avoid warning

Ext4 will throw -EBADMSG through ext4_readdir when a checksum error
occurs, resulting in the following WARNING.

Fix it by mapping EBADMSG to nfserr_io.

nfsd_buffered_readdir
iterate_dir // -EBADMSG -74
  ext4_readdir // .iterate_shared
   ext4_dx_readdir
    ext4_htree_fill_tree
     htree_dirblock_to_tree
      ext4_read_dirblock
       __ext4_read_dirblock
        ext4_dirblock_csum_verify
         warn_no_space_for_csum
          __warn_no_space_for_csum
        return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
nfserrno // WARNING

[  161.115610] ------------[ cut here ]------------
[  161.116465] nfsd: non-standard errno: -74
[  161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
[  161.118596] Modules linked in:
[  161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
[  161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe
mu.org 04/01/2014
[  161.123601] RIP: 0010:nfserrno+0x9d/0xd0
[  161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6
05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
[  161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
[  161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
[  161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
[  161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
[  161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
[  161.135244] FS:  0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
[  161.136695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
[  161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  161.141519] PKRU: 55555554
[  161.142076] Call Trace:
[  161.142575]  ? __warn+0x9b/0x140
[  161.143229]  ? nfserrno+0x9d/0xd0
[  161.143872]  ? report_bug+0x125/0x150
[  161.144595]  ? handle_bug+0x41/0x90
[  161.145284]  ? exc_invalid_op+0x14/0x70
[  161.146009]  ? asm_exc_invalid_op+0x12/0x20
[  161.146816]  ? nfserrno+0x9d/0xd0
[  161.147487]  nfsd_buffered_readdir+0x28b/0x2b0
[  161.148333]  ? nfsd4_encode_dirent_fattr+0x380/0x380
[  161.149258]  ? nfsd_buffered_filldir+0xf0/0xf0
[  161.150093]  ? wait_for_concurrent_writes+0x170/0x170
[  161.151004]  ? generic_file_llseek_size+0x48/0x160
[  161.151895]  nfsd_readdir+0x132/0x190
[  161.152606]  ? nfsd4_encode_dirent_fattr+0x380/0x380
[  161.153516]  ? nfsd_unlink+0x380/0x380
[  161.154256]  ? override_creds+0x45/0x60
[  161.155006]  nfsd4_encode_readdir+0x21a/0x3d0
[  161.155850]  ? nfsd4_encode_readlink+0x210/0x210
[  161.156731]  ? write_bytes_to_xdr_buf+0x97/0xe0
[  161.157598]  ? __write_bytes_to_xdr_buf+0xd0/0xd0
[  161.158494]  ? lock_downgrade+0x90/0x90
[  161.159232]  ? nfs4svc_decode_voidarg+0x10/0x10
[  161.160092]  nfsd4_encode_operation+0x15a/0x440
[  161.160959]  nfsd4_proc_compound+0x718/0xe90
[  161.161818]  nfsd_dispatch+0x18e/0x2c0
[  161.162586]  svc_process_common+0x786/0xc50
[  161.163403]  ? nfsd_svc+0x380/0x380
[  161.164137]  ? svc_printk+0x160/0x160
[  161.164846]  ? svc_xprt_do_enqueue.part.0+0x365/0x380
[  161.165808]  ? nfsd_svc+0x380/0x380
[  161.166523]  ? rcu_is_watching+0x23/0x40
[  161.167309]  svc_process+0x1a5/0x200
[  161.168019]  nfsd+0x1f5/0x380
[  161.168663]  ? nfsd_shutdown_threads+0x260/0x260
[  161.169554]  kthread+0x1c4/0x210
[  161.170224]  ? kthread_insert_work_sanity_check+0x80/0x80
[  161.171246]  ret_from_fork+0x1f/0x30

Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Cc: [email protected]
Signed-off-by: Chuck Lever <[email protected]>

NFSD: remove redundant assignment operation

Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and
write delegations") added a new assignment statement to add
RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the
old one should be removed.

Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

.mailmap: Add an entry for my work email address

Collect a few very old previous employers as well.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: Fix NFSv4's PUTPUBFH operation

According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.

Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).

PUTPUBFH should now behave just like PUTROOTFH.

Reported-by: Cedric Blancher <[email protected]>
Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1")
Cc: Dan Shelton <[email protected]>
Cc: Roland Mainz <[email protected]>
Cc: [email protected]
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Add quotes to client info 'callback address'

The 'callback address' in client_info_show is output without quotes
causing yaml parsers to fail on processing IPv6 addresses.
Adding quotes to 'callback address' also matches that used by
the 'address' field.

Signed-off-by: Mark Grimes <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Handle device removal outside of the CM event handler

Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.

Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: move error choice for incorrect object types to version-specific code.

If an NFS operation expects a particular sort of object (file, dir, link,
etc) but gets a file handle for a different sort of object, it must
return an error.  The actual error varies among NFS versions in non-trivial
ways.

For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
INVAL is suitable.

For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
was found when not expected.  This take precedence over NOTDIR.

For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
preference to EINVAL when none of the specific error codes apply.

When nfsd_mode_check() finds a symlink where it expected a directory it
needs to return an error code that can be converted to NOTDIR for v2 or
v3 but will be SYMLINK for v4.  It must be different from the error
code returns when it finds a symlink but expects a regular file - that
must be converted to EINVAL or SYMLINK.

So we introduce an internal error code nfserr_symlink_not_dir which each
version converts as appropriate.

nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
only used by NFSv4 and only for OPEN.  NFSERR_INVAL is never a suitable
error if the object is the wrong time.  For v4.0 we use nfserr_symlink
for non-dirs even if not a symlink.  For v4.1 we have nfserr_wrong_type.
We handle this difference in-place in nfsd_check_obj_isreg() as there is
nothing to be gained by delaying the choice to nfsd4_map_status().

As a result of these changes, nfsd_mode_check() doesn't need an rqstp
arg any more.

Note that NFSv4 operations are actually performed in the xdr code(!!!)
so to the only place that we can map the status code successfully is in
nfsd4_encode_operation().

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: be more systematic about selecting error codes for internal use.

Rather than using ad hoc values for internal errors (30000, 11000, ...)
use 'enum' to sequentially allocate numbers starting from the first
known available number - now visible as NFS4ERR_FIRST_FREE.

The goal is values that are distinct from all be32 error codes. To get
those we must first select integers that are not already used, then
convert them with cpu_to_be32().

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Move error code mapping to per-version proc code.

There is code scattered around nfsd which chooses an error status based
on the particular version of nfs being used. It is cleaner to have the
version specific choices in version specific code.

With this patch common code returns the most specific error code
possible and the version specific code maps that if necessary.

Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
function which is called to map the resp->status before each non-trivial
nfsd_proc_* or nfsd3_proc_* function returns.

NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
are left for a later patch.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: move V4ROOT version check to nfsd_set_fh_dentry()

This further centralizes version number checks.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: further centralize protocol version checks.

With this patch the only places that test ->rq_vers against a specific
version are nfsd_v4client() and nfsd_set_fh_dentry().
The latter sets some flags in the svc_fh, which now includes:
fh_64bit_cookies
fh_use_wgather

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()

nfsd_breaker_owns_lease() currently open-codes the same test that
nfsd_v4client() performs.

With this patch we use nfsd_v4client() instead.

Also as i_am_nfsd() is only used in combination with kthread_data(),
replace it with nfsd_current_rqst() which combines the two and returns a
valid svc_rqst, or NULL.

The test for NULL is moved into nfsd_v4client() for code clarity.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Pass 'cred' instead of 'rqstp' to some functions.

nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags()
only ever need the cred out of rqstp, so pass it explicitly instead of
the whole rqstp.

This makes the interfaces cleaner.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: Don't pass all of rqst into rqst_exp_find()

Rather than passing the whole rqst, pass the pieces that are actually
needed. This makes the inputs to rqst_exp_find() more obvious.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't assume copy notify when preprocessing the stateid

Move the stateid handling to nfsd4_copy_notify.
If nfs4_preprocess_stateid_op did not produce an output stateid, error out.

Copy notify specifically does not permit the use of special stateids,
so enforce that outside generic stateid pre-processing.

Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: allow svc threads to fail initialisation cleanly

If an svc thread needs to perform some initialisation that might fail,
it has no good way to handle the failure.

Before the thread can exit it must call svc_exit_thread(), but that
requires the service mutex to be held.  The thread cannot simply take
the mutex as that could deadlock if there is a concurrent attempt to
shut down all threads (which is unlikely, but not impossible).

nfsd currently call svc_exit_thread() unprotected in the unlikely event
that unshare_fs_struct() fails.

We can clean this up by introducing svc_thread_init_status() by which an
svc thread can report whether initialisation has succeeded.  If it has,
it continues normally into the action loop.  If it has not,
svc_thread_init_status() immediately aborts the thread.
svc_start_kthread() waits for either of these to happen, and calls
svc_exit_thread() (under the mutex) if the thread aborted.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: merge svc_rqst_alloc() into svc_prepare_thread()

The only caller of svc_rqst_alloc() is svc_prepare_thread(). So merge
the one into the other and simplify.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.

As documented in svc_xprt.c, sv_nrthreads is protected by the service
mutex, and it does not need ->sv_lock.
(->sv_lock is needed only for sv_permsocks, sv_tempsocks, and
sv_tmpcnt).

So remove the unnecessary locking.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: change sp_nrthreads from atomic_t to unsigned int.

sp_nrthreads is only ever accessed under the service mutex
nlmsvc_mutex nfs_callback_mutex nfsd_mutex
so these is no need for it to be an atomic_t.

The fact that all code using it is single-threaded means that we can
simplify svc_pool_victim and remove the temporary elevation of
sp_nrthreads.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

sunrpc: document locking rules for svc_exit_thread()

The locking required for svc_exit_thread() is not obvious, so document
it in a kdoc comment.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't allocate the versions array.

Instead of using kmalloc to allocate an array for storing active version
info, just declare an array to the max size - it is only 5 or so.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: move nfsd_pool_stats_open into nfsctl.c

nfsd_pool_stats_open() is used in nfsctl.c, so move it there.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: make various functions static, or not exported.

Various functions are only used within the sunrpc module, and several
are only use in the one file.  So clean up:

These are marked static, and any EXPORT is removed.
  svc_rcpb_setup()
  svc_rqst_alloc()
  svc_rqst_free()  - also moved before first use
  svc_rpcbind_set_version()
  svc_drop() - also moved to svc.c

These are now not EXPORTed, but are not static.
  svc_authenticate()
  svc_sock_update_bufs()

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

lockd: discard nlmsvc_timeout

nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use
that in the one place that nlmsvc_timeout is used.

In truth it *might* not always be the same as nlmsvc_timeout is only set
when lockd is started while nlm_timeout can be set at anytime via
sysctl. I think this difference it not helpful so removing it is good.

Also remove the test for nlm_timout being 0. This is not possible -
unless a module parameter is used to set the minimum timeout to 0, and
if that happens then it probably should be honoured.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: don't EXPORT_SYMBOL nfsd4_ssc_init_umount_work()

nfsd4_ssc_init_umount_work() is only used in the nfsd module, so there
is no need to EXPORT it.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFS: trace: show TIMEDOUT instead of 0x6e

__nfs_revalidate_inode may return ETIMEDOUT.

print symbol of ETIMEDOUT in nfs trace:

before:
cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e)

after:
cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT)

Signed-off-by: Chen Hanxiao <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: use system_unbound_wq for nfsd_file_gc_worker()

After many rounds of changes in filecache.c, the fix by commit
ce7df055(NFSD: Make the file_delayed_close workqueue UNBOUND)
is gone, now we are getting syslog messages like these:

[ 1618.186688] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
[ 1638.661616] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
[ 1665.284542] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 16 times, consider switching to WQ_UNBOUND
[ 1759.491342] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 32 times, consider switching to WQ_UNBOUND
[ 3013.012308] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 64 times, consider switching to WQ_UNBOUND
[ 3154.172827] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
[ 3422.461924] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 256 times, consider switching to WQ_UNBOUND
[ 3963.152054] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 512 times, consider switching to WQ_UNBOUND

Consider use system_unbound_wq instead of system_wq for
nfsd_file_gc_worker().

Signed-off-by: Youzhong Yang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: count nfsd_file allocations

We already count the frees (via nfsd_file_releases). Count the
allocations as well. Also switch the direct call to nfsd_file_slab_free
in nfsd_file_do_acquire to nfsd_file_free, so that the allocs and
releases match up.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: fix refcount leak when file is unhashed after being found

If we wait_for_construction and find that the file is no longer hashed,
and we're going to retry the open, the old nfsd_file reference is
currently leaked. Put the reference before retrying.

Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <[email protected]>
Tested-by: Youzhong Yang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire

Given that we do the search and insertion while holding the i_lock, I
don't think it's possible for us to get EEXIST here. Remove this case.

Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <[email protected]>
Tested-by: Youzhong Yang <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: add list_head nf_gc to struct nfsd_file

nfsd_file_put() in one thread can race with another thread doing
garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
nfsd_file_lru_cb()):

  * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
  * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set)
  * garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and
    returns LRU_ROTATE.
  * garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref
    to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose
    list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
  * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
    the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added
    to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply
    treats it as part of the LRU and removes it, which leads to its removal from
    the 'dispose' list.
  * At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU.
    nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))],
    as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf'
    gets no chance of being freed.

nfsd_file_put() can also race with nfsd_file_cond_queue():
  * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
  * nfsd_file_lru_add() sets REFERENCED bit and returns true.
  * Some userland application runs 'exportfs -f' or something like that, which triggers
    __nfsd_file_cache_purge() -> nfsd_file_cond_queue().
  * In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done
    successfully.
  * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2.
  * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds.
  * nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))]
    (with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the
    dispose list [list_add(&nf->nf_lru, dispose)]
  * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
    the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))], although the 'nf' is not
    in the LRU, but it is linked in the 'dispose' list, nfsd_file_lru_remove() simply
    treats it as part of the LRU and removes it. This leads to its removal from
    the 'dispose' list!
  * Now nf->ref is 0, unhashed. nfsd_file_put() continues its execution and set
    nf->nf_ref to REFCOUNT_SATURATED.

As shown in the above analysis, using nf_lru for both the LRU list and dispose list
can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses
it for the dispose list. This does not fix the nfsd_file leaking issue completely.

Signed-off-by: Youzhong Yang <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

Linux 6.11-rc6

Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- copy_file_range fix

- two read fixes including read past end of file rc fix and read retry
   crediting fix

- falloc zero range fix

* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
  cifs: Fix copy offload to flush destination region
  netfs, cifs: Fix handling of short DIO read
  cifs: Fix lack of credit renegotiation on read retry

Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs

Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
  lock should not have been able to cause that...

   - Fix a rare data corruption in the rebalance path, caught as a nonce
     inconsistency on encrypted filesystems

   - Revert lockless buffered write path

   - Mark more errors as autofix"

* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
  bcachefs: Mark more errors as autofix
  bcachefs: Revert lockless buffered IO path
  bcachefs: Fix bch2_extents_match() false positive
  bcachefs: Fix failure to return error in data_update_index_update()

bcachefs: Mark more errors as autofix

errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.

note that errors are still logged in the superblock, so we'll still know
that they happened.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Revert lockless buffered IO path

We had a report of data corruption on nixos when building installer
images.

https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334

It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.

Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.

It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.

Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.

My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.

Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock")
Signed-off-by: Kent Overstreet <[email protected]>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull misc fixes from Guenter Roeck.

These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  apparmor: fix policy_unpack_test on big endian systems
  Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
  microblaze: don't treat zero reserved memory regions as error

Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
  previous fix for this driver was incomplete and broke the WLAN support
  on some platforms. This addresses the issue.

   - set the direction of the wlan-enable GPIO to output after
     requesting it as-is"

* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  power: sequencing: qcom-wcn: set the wlan-enable GPIO to output

power: sequencing: qcom-wcn: set the wlan-enable GPIO to output

Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.

Fixes: a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO as-is")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>

Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.11-rc6.  Included in here are:

   - dwc3 driver fixes for reported issues

   - MAINTAINER file update, marking a driver as unsupported :(

   - cdnsp driver fixes

   - USB gadget driver fix

   - USB sysfs fix

   - other tiny fixes

   - new device ids for usb serial driver

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add MeiG Smart SRM825L
  usb: cdnsp: fix for Link TRB with TC
  usb: dwc3: st: add missing depopulate in probe error path
  usb: dwc3: st: fix probed platform device ref count on probe error path
  usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
  usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
  usb: typec: fsa4480: Relax CHIP_ID check
  usb: dwc3: xilinx: add missing depopulate in probe error path
  usb: dwc3: omap: add missing depopulate in probe error path
  dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
  usb: gadget: uvc: queue pump work in uvcg_video_enable()
  cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
  usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
  usb: dwc3: core: Prevent USB core invalid event buffer address access
  MAINTAINERS: Mark UVC gadget driver as orphan

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Minor fixes only.

  The sd.c one ignores a sync cache request if format is in progress
  which can happen if formatting a drive across suspend/resume"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
  scsi: aacraid: Fix double-free on probe failure
  scsi: lpfc: Fix overflow build issue

Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

- One more write delegation fix

* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease

Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

- Do not call out v1 inodes with non-zero di_nlink field as being
   corrupt

- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
   rather than "inode btree" blocks

- Don't report the number of trimmed bytes via FITRIM because the
   underlying storage isn't required to do anything and failed discard
   IOs aren't reported to the caller anyway

- Fix incorrect setting of rm_owner field in an rmap query

- Report missing disk offset range in an fsmap query

- Obtain m_growlock when extending realtime section of the filesystem

- Reset rootdir extent size hint after extending realtime section of
   the filesystem

* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reset rootdir extent size hint after growfsrt
  xfs: take m_growlock when running growfsrt
  xfs: Fix missing interval for missing_owner in xfs fsmap
  xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
  xfs: Fix the owner setting issue for rmap query in xfs fsmap
  xfs: don't bother reporting blocks trimmed via FITRIM
  xfs: xfs_finobt_count_blocks() walks the wrong btree
  xfs: fix folio dirtying for XFILE_ALLOC callers
  xfs: fix di_onlink checking for V1/V2 inodes

Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"There is a fairly large number of bug fixes for Qualcomm platforms,
  most of them addressing issues with the devicetree files for the newly
  added Snapdragon X1 based laptops to make them more reliable.

  The Qualcomm driver changes address a few build-time issues as well as
  runtime problems in the tzmem and scm firmware, the USB Type-C driver,
  and the cmd-db and pmic_glink soc drivers.

  The NXP i.MX usually gets a bunch of devicetree fixes that is
  proportional to the number of supported machines. This includes both
  warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
  layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
  board.

  The other changes are the usual minor changes, including an update to
  the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
  (risc-v)"

* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
  firmware: microchip: fix incorrect error report of programming:timeout on success
  soc: qcom: pd-mapper: Fix singleton refcount
  firmware: qcom: tzmem: disable sdm670 platform
  soc: qcom: pmic_glink: Actually communicate when remote goes down
  usb: typec: ucsi: Move unregister out of atomic section
  soc: qcom: pmic_glink: Fix race during initialization
  firmware: qcom: qseecom: remove unused functions
  firmware: qcom: tzmem: fix virtual-to-physical address conversion
  firmware: qcom: scm: Mark get_wq_ctx() as atomic call
  arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
  arm64: dts: qcom: disable GPU on x1e80100 by default
  arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
  arm64: dts: imx95: correct L3Cache cache-sets
  arm64: dts: imx95: correct a55 power-domains
  arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
  arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
  ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
  arm64: dts: imx93: update default value for snps,clk-csr
  arm64: dts: freescale: tqma9352: Fix watchdog reset
  arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
  ...

Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:

- a fix for Cypress PS/2 touchpad for regression introduced in 6.11
   merge window where a timeout condition is incorrectly reported for
   all extended Cypress commands

* tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cypress_ps2 - fix waiting for command response

Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Add Manivannan Sadhasivam as PCI native host bridge and endpoint
   driver reviewer (Manivannan Sadhasivam)

- Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
   work around hardware erratum that causes a constant stream of
   interrupts (Manivannan Sadhasivam)

- Don't try to fall back to qcom Operating Performance Points (OPP)
   support unless the platform actually supports OPP (Manivannan
   Sadhasivam)

- Add [email protected] mailing list to MAINTAINERS for NXP
   layerscape and imx6 PCI controller drivers (Frank Li)

* tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: PCI: Add NXP PCI controller mailing list [email protected]
  PCI: qcom: Use OPP only if the platform supports it
  PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
  MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers

Merge tag 'block-6.11-20240830' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
"Fix for a single regression for WRITE_SAME introduced in the 6.11
merge window"

* tag 'block-6.11-20240830' of git://git.kernel.dk/linux:
block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes

Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- A fix for a regression that happened in 6.11 merge window, where the
   copying of iovecs for compat mode applications got broken for certain
   cases.

- Fix for a bug introduced in 6.10, where if using recv/send bundles
   with classic provided buffers, the recv/send would fail to set the
   right iovec count. This caused 0 byte send/recv results. Found via
   code coverage testing and writing a test case to exercise it.

* tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
  io_uring/kbuf: return correct iovec count from classic buffer peek
  io_uring/rsrc: ensure compat iovecs are copied correctly

Merge tag 'at91-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes

Microchip AT91 fixes for v6.11

It contains:
- DTS directory update to match all entries not only those starting with
at91 or sama

Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm fix from Paul Moore:
"One small patch to correct a NFS permissions problem with SELinux and
Smack"

* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: don't bypass permissions check in inode_setsecctx hook

Merge tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix three issues in the amd-pstate cpufreq driver.

  Specifics:

   - Remove checks for highest performance match on preferred cores when
     updating preferred core ranking in amd-pstate (Mario Limonciello)

   - Make amd-pstate call topology_logical_package_id() instead of
     logical_die_id() to get a socked ID for a CPU (Gautham Shenoy)

   - Fix uninitialized variable in amd_pstate_cpu_boost_update() (Dan
     Carpenter)"

* tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore
  cpufreq/amd-pstate: Use topology_logical_package_id() instead of logical_die_id()
  cpufreq: amd-pstate: Fix uninitialized variable in amd_pstate_cpu_boost_update()

Merge tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:

- A bunch of dw driver changes to fix the src/dst addr width config

- Omap driver fix for sglen initialization

- stm32-dma3 driver lli_size init fix

- dw edma driver fixes for watermark interrupts and unmasking STOP and
   ABORT interrupts

* tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: dw-edma: Do not enable watermark interrupts for HDMA
  dmaengine: dw-edma: Fix unmasking STOP and ABORT interrupts for HDMA
  dmaengine: stm32-dma3: Set lli_size after allocation
  dmaengine: ti: omap-dma: Initialize sglen after allocation
  dmaengine: dw: Unify ret-val local variables naming
  dmaengine: dw: Simplify max-burst calculation procedure
  dmaengine: dw: Define encode_maxburst() above prepare_ctllo() callbacks
  dmaengine: dw: Simplify prepare CTL_LO methods
  dmaengine: dw: Add memory bus width verification
  dmaengine: dw: Add peripheral bus width verification

Merge tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

- Qualcomm QMP X1E80100 PCIe Gen4 PHY initialisation fix

- Freescale imx8mq tuning parameter name fix

- Samsung exynos5 fir for error code in probe()

- Xilinx Zynqmp SGMII linkup failure fix

* tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: xilinx: phy-zynqmp: Fix SGMII linkup failure on resume
  phy: exynos5-usbdrd: fix error code in probe()
  phy: fsl-imx8mq-usb: fix tuning parameter name
  phy: qcom: qmp-pcie: Fix X1E80100 PCIe Gen4 PHY initialisation

Merge tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire fix from Vinod Koul:

- Single fix for non-continous port map programming

* tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: fix programming slave ports for non-continous port maps

Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- Fix a device-stall problem in bad io-page-fault setups (faults
   received from devices with no supporting domain attached).

- Context flush fix for Intel VT-d.

- Do not allow non-read+non-write mapping through iommufd as most
   implementations can not handle that.

- Fix a possible infinite-loop issue in map_pages() path.

- Add Jean-Philippe as reviewer for SMMUv3 SVA support

* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
  iommu: Do not return 0 from map_pages if it doesn't do anything
  iommufd: Do not allow creating areas without READ or WRITE
  iommu/vt-d: Fix incorrect domain ID in context flush helper
  iommu: Handle iommu faults for a bad iopf setup

MAINTAINERS: PCI: Add NXP PCI controller mailing list [email protected]

Add imx mailing list [email protected] for PCI controller of NXP chips
(Layerscape and iMX).

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Richard Zhu <[email protected]>

io_uring/kbuf: return correct iovec count from classic buffer peek

io_provided_buffers_select() returns 0 to indicate success, but it should
be returning 1 to indicate that 1 vec was mapped. This causes peeking
to fail with classic provided buffers, and while that's not a use case
that anyone should use, it should still work correctly.

The end result is that no buffer will be selected, and hence a completion
with '0' as the result will be posted, without a buffer attached.

Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe <[email protected]>

nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease

It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager.  nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway.  This is wrong.

With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below.  This is the same as the
current code, but without any reference to a possible delegation.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

io_uring/rsrc: ensure compat iovecs are copied correctly

For buffer registration (or updates), a userspace iovec is copied in
and updated. If the application is within a compat syscall, then the
iovec type is compat_iovec rather than iovec. However, the type used
in __io_sqe_buffers_update() and io_sqe_buffers_register() is always
struct iovec, and hence the source is incremented by the size of a
non-compat iovec in the loop. This misses every other iovec in the
source, and will run into garbage half way through the copies and
return -EFAULT to the application.

Maintain the source address separately and assign to our user vec
pointer, so that copies always happen from the right source address.

While in there, correct a bad placement of __user which triggered
the following sparse warning prior to this fix:

io_uring/rsrc.c:981:33: warning: cast removes address space '__user' of expression
io_uring/rsrc.c:981:30: warning: incorrect type in assignment (different address spaces)
io_uring/rsrc.c:981:30: expected struct iovec const [noderef] __user *uvec
io_uring/rsrc.c:981:30: got struct iovec *[noderef] __user

Fixes: f4eaf8eda89e ("io_uring/rsrc: Drop io_copy_iov in favor of iovec API")
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'usb-serial-6.11-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial device id for 6.11-rc6

Here's a new modem device id.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-6.11-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add MeiG Smart SRM825L

Merge tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Another week, another set of GPU fixes. amdgpu and vmwgfx leading the
  charge, then i915 and xe changes along with v3d and some other bits.
  The TTM revert is due to some stuttering graphical apps probably due
  to longer stalls while prefaulting.

  Seems pretty much where I'd expect things,

  ttm:
   - revert prefault change, caused stutters

  aperture:
   - handle non-VGA devices bettter

  amdgpu:
   - SWSMU gaming stability fix
   - SMU 13.0.7 fix
   - SWSMU documentation alignment fix
   - SMU 14.0.x fixes
   - GC 12.x fix
   - Display fix
   - IP discovery fix
   - SMU 13.0.6 fix

  i915:
   - Fix #11195: The external display connect via USB type-C dock stays
     blank after re-connect the dock
   - Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
   - Move ARL GuC firmware to correct version

  xe:
   - Invalidate media_gt TLBs
   - Fix HWMON i1 power setup write command

  vmwgfx:
   - prevent unmapping active read buffers
   - fix prime with external buffers
   - disable coherent dumb buffers without 3d

  v3d:
   - disable preemption while updating GPU stats"

* tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
  drm/v3d: Disable preemption while updating GPU stats
  drm/amd/pm: Drop unsupported features on smu v14_0_2
  drm/amd/pm: Add support for new P2S table revision
  drm/amdgpu: support for gc_info table v1.3
  drm/amd/display: avoid using null object of framebuffer
  drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
  drm/amd/pm: update message interface for smu v14.0.2/3
  drm/amdgpu/swsmu: always force a state reprogram on init
  drm/amdgpu/smu13.0.7: print index for profiles
  drm/amdgpu: align pp_power_profile_mode with kernel docs
  drm/i915/dp_mst: Fix MST state after a sink reset
  drm/xe: Invalidate media_gt TLBs
  drm/i915: ARL requires a newer GSC firmware
  drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict
  video/aperture: optionally match the device in sysfb_disable()
  drm/vmwgfx: Disable coherent dumb buffers without 3d
  drm/vmwgfx: Fix prime with external buffers
  drm/vmwgfx: Prevent unmapping active read buffers
  Revert "drm/ttm: increase ttm pre-fault value to PMD size"

Merge tag 'drm-misc-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A revert for a previous TTM commit causing stuttering, 3 fixes for
vmwgfx related to buffer operations, a fix for video/aperture with
non-VGA primary devices, and a preemption status fix for v3d

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240829-efficient-swift-from-lemuria-f60c05@houat

Merge tag 'drm-xe-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Invalidate media_gt TLBs (Brost)
- Fix HWMON i1 power setup write command (Karthik)

Signed-off-by: Dave Airlie <[email protected]>
From: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fix from Kees Cook:

- binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov)

* tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined

dcache: keep dentry_hashtable or d_hash_shift even when not used

The runtime constant feature removes all the users of these variables,
allowing the compiler to optimize them away.  It's quite difficult to
extract their values from the kernel text, and the memory saved by
removing them is tiny, and it was never the point of this optimization.

Since the dentry_hashtable is a core data structure, it's valuable for
debugging tools to be able to read it easily.  For instance, scripts
built on drgn, like the dentrycache script[1], rely on it to be able to
perform diagnostics on the contents of the dcache.  Annotate it as used,
so the compiler doesn't discard it.

Link: https://github.com/oracle-samples/drgn-tools/blob/3afc56146f54d09dfd1f6d3c1b7436eda7e638be/drgn_tools/dentry.py#L325-L355
Fixes: e3c92e81711d ("runtime constants: add x86 architecture support")
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'drm-intel-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix #11195: The external display connect via USB type-C dock stays blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
. Move ARL GuC firmware to correct version
-

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'hwmon-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- pt5161l: Fix invalid temperature reading of bad ADC values

- asus-ec-sensors: Remove unsupported VRM temperature from X570-E
   GAMING

* tag 'hwmon-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pt5161l) Fix invalid temperature reading
  hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING

Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.

  No known outstanding regressions.

  Current release - regressions:

   - wifi: iwlwifi: fix hibernation

   - eth: ionic: prevent tx_timeout due to frequent doorbell ringing

  Previous releases - regressions:

   - sched: fix sch_fq incorrect behavior for small weights

   - wifi:
      - iwlwifi: take the mutex before running link selection
      - wfx: repair open network AP mode

   - netfilter: restore IP sanity checks for netdev/egress

   - tcp: fix forever orphan socket caused by tcp_abort

   - mptcp: close subflow when receiving TCP+FIN

   - bluetooth: fix random crash seen while removing btnxpuart driver

  Previous releases - always broken:

   - mptcp: more fixes for the in-kernel PM

   - eth: bonding: change ipsec_lock from spin lock to mutex

   - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response

  Misc:

   - documentation: drop special comment style for net code"

* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  nfc: pn533: Add poll mod list filling check
  mailmap: update entry for Sriram Yagnaraman
  selftests: mptcp: join: check re-re-adding ID 0 signal
  mptcp: pm: ADD_ADDR 0 is not a new address
  selftests: mptcp: join: validate event numbers
  mptcp: avoid duplicated SUB_CLOSED events
  selftests: mptcp: join: check re-re-adding ID 0 endp
  mptcp: pm: fix ID 0 endp usage after multiple re-creations
  mptcp: pm: do not remove already closed subflows
  selftests: mptcp: join: no extra msg if no counter
  selftests: mptcp: join: check re-adding init endp with != id
  mptcp: pm: reset MPC endp ID when re-added
  mptcp: pm: skip connecting to already established sf
  mptcp: pm: send ACK on an active subflow
  selftests: mptcp: join: check removing ID 0 endpoint
  mptcp: pm: fix RM_ADDR ID for the initial subflow
  mptcp: pm: reuse ID 0 after delete and re-add
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  sctp: fix association labeling in the duplicate COOKIE-ECHO case
  mptcp: pr_debug: add missing \n at the end
  ...

Input: cypress_ps2 - fix waiting for command response

Commit 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading
command status") uncovered an existing problem with cypress_ps2 driver:
it tries waiting on a PS/2 device waitqueue without using the rest of
libps2. Unfortunately without it nobody signals wakeup for the
waiting process, and each "extended" command was timing out. But the
rest of the code simply did not notice it.

Fix this by switching from homegrown way of sending request to get
command response and reading it to standard ps2_command() which does
the right thing.

Reported-by: Woody Suwalski <[email protected]>
Tested-by: Woody Suwalski <[email protected]>
Fixes: 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading command status")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16

WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16
parameter instead of u32. This change prevents potential illegal
sub-command errors.

v2: Mask uval instead of changing the prototype. (Badal)

v3: Rephrase commit message. (Badal)

Signed-off-by: Karthik Poosa <[email protected]>
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit a7f657097e96d8fa745c74bb1a239ebd5a8c971c)
Signed-off-by: Rodrigo Vivi <[email protected]>

nfc: pn533: Add poll mod list filling check

In case of im_protocols value is 1 and tm_protocols value is 0 this
combination successfully passes the check
'if (!im_protocols && !tm_protocols)' in the nfc_start_poll().
But then after pn533_poll_create_mod_list() call in pn533_start_poll()
poll mod list will remain empty and dev->poll_mod_count will remain 0
which lead to division by zero.

Normally no im protocol has value 1 in the mask, so this combination is
not expected by driver. But these protocol values actually come from
userspace via Netlink interface (NFC_CMD_START_POLL operation). So a
broken or malicious program may pass a message containing a "bad"
combination of protocol parameter values so that dev->poll_mod_count
is not incremented inside pn533_poll_create_mod_list(), thus leading
to division by zero.
Call trace looks like:
nfc_genl_start_poll()
  nfc_start_poll()
    ->start_poll()
    pn533_start_poll()

Add poll mod list filling check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: dfccd0f58044 ("NFC: pn533: Add some polling entropy")
Signed-off-by: Aleksandr Mishin <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes
payload from netdev/egress by subtracting skb_network_offset() when
validating IPv4 packet length, otherwise 'meta l4proto udp' never
matches.

Patch #2 subtracts skb_network_offset() when validating IPv6 packet
length for netdev/egress.

netfilter pull request 24-08-28

* tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
netfilter: nf_tables: restore IP sanity checks for netdev/egress
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

mailmap: update entry for Sriram Yagnaraman

Link my old est.tech address to my active mail address

Signed-off-by: Sriram Yagnaraman <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'mptcp-more-fixes-for-the-in-kernel-pm'

Matthieu Baerts says:

====================
mptcp: more fixes for the in-kernel PM

Here is a new batch of fixes for the MPTCP in-kernel path-manager:

Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.

Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.

Patch 3 validates the two previous patches.

Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.

Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.

Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.

Patch 7 validates the three previous patches.

Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.

Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.

Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.

Patch 11 validates the previous patch.

Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.

Patch 13 validates the previous patch.

Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.

Patch 15 validates the previous patch.

Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
---
Matthieu Baerts (NGI0) (15):
      mptcp: pm: reuse ID 0 after delete and re-add
      mptcp: pm: fix RM_ADDR ID for the initial subflow
      selftests: mptcp: join: check removing ID 0 endpoint
      mptcp: pm: send ACK on an active subflow
      mptcp: pm: skip connecting to already established sf
      mptcp: pm: reset MPC endp ID when re-added
      selftests: mptcp: join: check re-adding init endp with != id
      selftests: mptcp: join: no extra msg if no counter
      mptcp: pm: do not remove already closed subflows
      mptcp: pm: fix ID 0 endp usage after multiple re-creations
      selftests: mptcp: join: check re-re-adding ID 0 endp
      mptcp: avoid duplicated SUB_CLOSED events
      selftests: mptcp: join: validate event numbers
      mptcp: pm: ADD_ADDR 0 is not a new address
      selftests: mptcp: join: check re-re-adding ID 0 signal

net/mptcp/pm.c                                  |   4 +-
net/mptcp/pm_netlink.c                          |  87 ++++++++++----
net/mptcp/protocol.c                            |   6 +
net/mptcp/protocol.h                            |   5 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
tools/testing/selftests/net/mptcp/mptcp_lib.sh  |   4 +
6 files changed, 209 insertions(+), 50 deletions(-)
---
base-commit: 3a0504d54b3b57f0d7bf3d9184a00c9f8887f6d7
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817

Best regards,
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftests: mptcp: join: check re-re-adding ID 0 signal

This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: [email protected]
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

mptcp: pm: ADD_ADDR 0 is not a new address

The ADD_ADDR 0 with the address from the initial subflow should not be
considered as a new address: this is not something new. If the host
receives it, it simply means that the address is available again.

When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
it as new by not incrementing the 'add_addr_accepted' counter. But the
'accept_addr' might not be set if the limit has already been reached:
this can be bypassed in this case. But before, it is important to check
that this ADD_ADDR for the ID 0 is for the same address as the initial
subflow. If not, it is not something that should happen, and the
ADD_ADDR can be ignored.

Note that if an ADD_ADDR is received while there is already a subflow
opened using the same address, this ADD_ADDR is ignored as well. It
means that if multiple ADD_ADDR for ID 0 are received, there will not be
any duplicated subflows created by the client.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: [email protected]
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>