Git Repo - linux.git/log

Merge tag 'landlock-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Landlock updates from Mickaël Salaün:
"New tests, a slight optimization, and some cosmetic changes"

* tag 'landlock-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Optimize the number of calls to get_access_mask slightly
  selftests/landlock: Rename "permitted" to "allowed" in ftruncate tests
  landlock: Remove remaining "inline" modifiers in .c files [v6.6]
  landlock: Remove remaining "inline" modifiers in .c files [v6.1]
  landlock: Remove remaining "inline" modifiers in .c files [v5.15]
  selftests/landlock: Add tests to check unhandled rule's access rights
  selftests/landlock: Add tests to check unknown rule's access rights

Merge tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull security module updates from Paul Moore:

- Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and
   lsm_set_self_attr().

   The first syscall simply lists the LSMs enabled, while the second and
   third get and set the current process' LSM attributes. Yes, these
   syscalls may provide similar functionality to what can be found under
   /proc or /sys, but they were designed to support multiple,
   simultaneaous (stacked) LSMs from the start as opposed to the current
   /proc based solutions which were created at a time when only one LSM
   was allowed to be active at a given time.

   We have spent considerable time discussing ways to extend the
   existing /proc interfaces to support multiple, simultaneaous LSMs and
   even our best ideas have been far too ugly to support as a kernel
   API; after +20 years in the kernel, I felt the LSM layer had
   established itself enough to justify a handful of syscalls.

   Support amongst the individual LSM developers has been nearly
   unanimous, with a single objection coming from Tetsuo (TOMOYO) as he
   is worried that the LSM_ID_XXX token concept will make it more
   difficult for out-of-tree LSMs to survive. Several members of the LSM
   community have demonstrated the ability for out-of-tree LSMs to
   continue to exist by picking high/unused LSM_ID values as well as
   pointing out that many kernel APIs rely on integer identifiers, e.g.
   syscalls (!), but unfortunately Tetsuo's objections remain.

   My personal opinion is that while I have no interest in penalizing
   out-of-tree LSMs, I'm not going to penalize in-tree development to
   support out-of-tree development, and I view this as a necessary step
   forward to support the push for expanded LSM stacking and reduce our
   reliance on /proc and /sys which has occassionally been problematic
   for some container users. Finally, we have included the linux-api
   folks on (all?) recent revisions of the patchset and addressed all of
   their concerns.

- Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit
   ioctls on 64-bit systems problem.

   This patch includes support for all of the existing LSMs which
   provide ioctl hooks, although it turns out only SELinux actually
   cares about the individual ioctls. It is worth noting that while
   Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this
   patch, they did both indicate they are okay with the changes.

- Fix a potential memory leak in the CALIPSO code when IPv6 is disabled
   at boot.

   While it's good that we are fixing this, I doubt this is something
   users are seeing in the wild as you need to both disable IPv6 and
   then attempt to configure IPv6 labeled networking via
   NetLabel/CALIPSO; that just doesn't make much sense.

   Normally this would go through netdev, but Jakub asked me to take
   this patch and of all the trees I maintain, the LSM tree seemed like
   the best fit.

- Update the LSM MAINTAINERS entry with additional information about
   our process docs, patchwork, bug reporting, etc.

   I also noticed that the Lockdown LSM is missing a dedicated
   MAINTAINERS entry so I've added that to the pull request. I've been
   working with one of the major Lockdown authors/contributors to see if
   they are willing to step up and assume a Lockdown maintainer role;
   hopefully that will happen soon, but in the meantime I'll continue to
   look after it.

- Add a handful of mailmap entries for Serge Hallyn and myself.

* tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits)
  lsm: new security_file_ioctl_compat() hook
  lsm: Add a __counted_by() annotation to lsm_ctx.ctx
  calipso: fix memory leak in netlbl_calipso_add_pass()
  selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test
  MAINTAINERS: add an entry for the lockdown LSM
  MAINTAINERS: update the LSM entry
  mailmap: add entries for Serge Hallyn's dead accounts
  mailmap: update/replace my old email addresses
  lsm: mark the lsm_id variables are marked as static
  lsm: convert security_setselfattr() to use memdup_user()
  lsm: align based on pointer length in lsm_fill_user_ctx()
  lsm: consolidate buffer size handling into lsm_fill_user_ctx()
  lsm: correct error codes in security_getselfattr()
  lsm: cleanup the size counters in security_getselfattr()
  lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation
  lsm: drop LSM_ID_IMA
  LSM: selftests for Linux Security Module syscalls
  SELinux: Add selfattr hooks
  AppArmor: Add selfattr hooks
  Smack: implement setselfattr and getselfattr hooks
  ...

Merge tag 'selinux-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

- Add a new SELinux initial SID, SECINITSID_INIT, to represent
   userspace processes started before the SELinux policy is loaded in
   early boot.

   Prior to this patch all processes were marked as SECINITSID_KERNEL
   before the SELinux policy was loaded, making it difficult to
   distinquish early boot userspace processes from the kernel in the
   SELinux policy.

   For most users this will be a non-issue as the policy is loaded early
   enough during boot, but for users who load their SELinux policy
   relatively late, this should make it easier to construct meaningful
   security policies.

- Cleanups to the selinuxfs code by Al, mostly on VFS related issues
   during a policy reload.

   The commit description has more detail, but the quick summary is that
   we are replacing a disconnected directory approach with a temporary
   directory that we swapover at the end of the reload.

- Fix an issue where the input sanity checking on socket bind()
   operations was slightly different depending on the presence of
   SELinux.

   This is caused by the placement of the LSM hooks in the generic
   socket layer as opposed to the protocol specific bind() handler where
   the protocol specific sanity checks are performed. Mickaël has
   mentioned that he is working to fix this, but in the meantime we just
   ensure that we are replicating the checks properly.

   We need to balance the placement of the LSM hooks with the number of
   LSM hooks; pushing the hooks down into the protocol layers is likely
   not the right answer.

- Update the avc_has_perm_noaudit() prototype to better match the
   function definition.

- Migrate from using partial_name_hash() to full_name_hash() the
   filename transition hash table.

   This improves the quality of the code and has the potential for a
   minor performance bump.

- Consolidate some open coded SELinux access vector comparisions into a
   single new function, avtab_node_cmp(), and use that instead.

   A small, but nice win for code quality and maintainability.

- Updated the SELinux MAINTAINERS entry with additional information
   around process, bug reporting, etc.

   We're also updating some of our "official" roles: dropping Eric Paris
   and adding Ondrej as a reviewer.

- Cleanup the coding style crimes in security/selinux/include.

   While I'm not a fan of code churn, I am pushing for more automated
   code checks that can be done at the developer level and one of the
   obvious things to check for is coding style.

   In an effort to start from a "good" base I'm slowly working through
   our source files cleaning them up with the help of clang-format and
   good ol' fashioned human eyeballs; this has the first batch of these
   changes.

   I've been splitting the changes up per-file to help reduce the impact
   if backports are required (either for LTS or distro kernels), and I
   expect the some of the larger files, e.g. hooks.c and ss/services.c,
   will likely need to be split even further.

- Cleanup old, outdated comments.

* tag 'selinux-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (24 commits)
  selinux: Fix error priority for bind with AF_UNSPEC on PF_INET6 socket
  selinux: fix style issues in security/selinux/include/initial_sid_to_string.h
  selinux: fix style issues in security/selinux/include/xfrm.h
  selinux: fix style issues in security/selinux/include/security.h
  selinux: fix style issues with security/selinux/include/policycap_names.h
  selinux: fix style issues in security/selinux/include/policycap.h
  selinux: fix style issues in security/selinux/include/objsec.h
  selinux: fix style issues with security/selinux/include/netlabel.h
  selinux: fix style issues in security/selinux/include/netif.h
  selinux: fix style issues in security/selinux/include/ima.h
  selinux: fix style issues in security/selinux/include/conditional.h
  selinux: fix style issues in security/selinux/include/classmap.h
  selinux: fix style issues in security/selinux/include/avc_ss.h
  selinux: align avc_has_perm_noaudit() prototype with definition
  selinux: fix style issues in security/selinux/include/avc.h
  selinux: fix style issues in security/selinux/include/audit.h
  MAINTAINERS: drop Eric Paris from his SELinux role
  MAINTAINERS: add Ondrej Mosnacek as a SELinux reviewer
  selinux: remove the wrong comment about multithreaded process handling
  selinux: introduce an initial SID for early boot processes
  ...

Merge tag 'audit-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"The audit updates are fairly minor with only two patches:

   - Send an audit ACK to userspace immediately upon receiving an auditd
     registration event as opposed to waiting until the registration has
     been fully processed and the audit backlog starts filling the
     netlink buffers.

     Sending the ACK earlier, as done here, is still safe as the
     operation should not fail at the point when the ACK is done, and
     doing so helps avoid the ACK being dropped in extreme situations.

   - Update the audit MAINTAINERS entry with additional information.

     There isn't anything in this update that should be new to regular
     contributors or list subscribers, but I'm pushing to start
     documenting our processes, conventions, etc. and this seems like an
     important part of that"

* tag 'audit-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  MAINTAINERS: update the audit entry
  audit: Send netlink ACK before setting connection in auditd_set

Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
"Quite a lot of kexec work this time around. Many singleton patches in
  many places. The notable patch series are:

   - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
     conversions for file paths'.

   - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
     Folio conversions for directory paths'.

   - IA64 remnant removal in Heiko Carstens's 'Remove unused code after
     IA-64 removal'.

   - Arnd Bergmann has enabled the -Wmissing-prototypes warning
     everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
     some followup fixes:

      - Nathan Chancellor has cleaned up the hexagon build in the series
        'hexagon: Fix up instances of -Wmissing-prototypes'.

      - Nathan also addressed some s390 warnings in 's390: A couple of
        fixes for -Wmissing-prototypes'.

      - Arnd Bergmann addresses the same warnings for MIPS in his series
        'mips: address -Wmissing-prototypes warnings'.

   - Baoquan He has made kexec_file operate in a top-down-fitting manner
     similar to kexec_load in the series 'kexec_file: Load kernel at top
     of system RAM if required'

   - Baoquan He has also added the self-explanatory 'kexec_file: print
     out debugging message if required'.

   - Some checkstack maintenance work from Tiezhu Yang in the series
     'Modify some code about checkstack'.

   - Douglas Anderson has disentangled the watchdog code's logging when
     multiple reports are occurring simultaneously. The series is
     'watchdog: Better handling of concurrent lockups'.

   - Yuntao Wang has contributed some maintenance work on the crash code
     in 'crash: Some cleanups and fixes'"

* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
  crash_core: fix and simplify the logic of crash_exclude_mem_range()
  x86/crash: use SZ_1M macro instead of hardcoded value
  x86/crash: remove the unused image parameter from prepare_elf_headers()
  kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
  scripts/decode_stacktrace.sh: strip unexpected CR from lines
  watchdog: if panicking and we dumped everything, don't re-enable dumping
  watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
  kexec_core: fix the assignment to kimage->control_page
  x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
  lib/trace_readwrite.c:: replace asm-generic/io with linux/io
  nilfs2: cpfile: fix some kernel-doc warnings
  stacktrace: fix kernel-doc typo
  scripts/checkstack.pl: fix no space expression between sp and offset
  x86/kexec: fix incorrect argument passed to kexec_dprintk()
  x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
  nilfs2: add missing set_freezable() for freezable kthread
  kernel: relay: remove relay_file_splice_read dead code, doesn't work
  docs: submit-checklist: remove all of "make namespacecheck"
  ...

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintainance work in the series

'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov to permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ dorectory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintanance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against eh buffer_head code int he series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY'a alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...

Merge tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

- SLUB: delayed freezing of CPU partial slabs (Chengming Zhou)

   Freezing is an operation involving double_cmpxchg() that makes a slab
   exclusive for a particular CPU. Chengming noticed that we use it also
   in situations where we are not yet installing the slab as the CPU
   slab, because freezing also indicates that the slab is not on the
   shared list. This results in redundant freeze/unfreeze operation and
   can be avoided by marking separately the shared list presence by
   reusing the PG_workingset flag.

   This approach neatly avoids the issues described in 9b1ea29bc0d7
   ("Revert "mm, slub: consider rest of partial list if acquire_slab()
   fails"") as we can now grab a slab from the shared list in a quick
   and guaranteed way without the cmpxchg_double() operation that
   amplifies the lock contention and can fail.

   As a result, lkp has reported 34.2% improvement of
   stress-ng.rawudp.ops_per_sec

- SLAB removal and SLUB cleanups (Vlastimil Babka)

   The SLAB allocator has been deprecated since 6.5 and nobody has
   objected so far. We agreed at LSF/MM to wait until the next LTS,
   which is 6.6, so we should be good to go now.

   This doesn't yet erase all traces of SLAB outside of mm/ so some dead
   code, comments or documentation remain, and will be cleaned up
   gradually (some series are already in the works).

   Removing the choice of allocators has already allowed to simplify and
   optimize the code wiring up the kmalloc APIs to the SLUB
   implementation.

* tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits)
  mm/slub: free KFENCE objects in slab_free_hook()
  mm/slub: handle bulk and single object freeing separately
  mm/slub: introduce __kmem_cache_free_bulk() without free hooks
  mm/slub: fix bulk alloc and free stats
  mm/slub: optimize free fast path code layout
  mm/slub: optimize alloc fastpath code layout
  mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers
  mm/slab: move kmalloc() functions from slab_common.c to slub.c
  mm/slab: move kmalloc_slab() to mm/slab.h
  mm/slab: move kfree() from slab_common.c to slub.c
  mm/slab: move struct kmem_cache_node from slab.h to slub.c
  mm/slab: move memcg related functions from slab.h to slub.c
  mm/slab: move pre/post-alloc hooks from slab.h to slub.c
  mm/slab: consolidate includes in the internal mm/slab.h
  mm/slab: move the rest of slub_def.h to mm/slab.h
  mm/slab: move struct kmem_cache_cpu declaration to slub.c
  mm/slab: remove mm/slab.c and slab_def.h
  mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs
  mm/slab: remove CONFIG_SLAB code from slab common code
  cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Merge in late fixes to prepare for the 6.8 net-next PR

No conflicts.

Signed-off-by: Paolo Abeni <[email protected]>

tpm: cr50: fix kernel-doc warning and spelling

Fix kernel-doc notation to prevent a warning:
tpm_tis_i2c_cr50.c:681: warning: Excess function parameter 'id' description in 'tpm_cr50_i2c_probe'

and fix a spelling error reported by codespell.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Peter Huewe <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: [email protected]
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

tpm: nuvoton: Use i2c_get_match_data()

Use preferred i2c_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.

Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

cifs: remove unneeded return statement

Return statement was not needed at end of cifs_chan_update_iface

Suggested-by: Christophe Jaillet <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: make cifs_chan_update_iface() a void function

The return values for cifs_chan_update_iface() didn't match what the
documentation said and nothing was checking them anyway. Just make it
a void function.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'cgroup-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

- Yafang Shao added task_get_cgroup1() helper to enable a similar BPF
   helper so that BPF progs can be more useful on cgroup1 hierarchies.
   While cgroup1 is mostly in maintenance mode, this addition is very
   small while having an outsized usefulness for users who are still on
   cgroup1. Yafang also optimized root cgroup list access by making it
   RCU protected in the process.

- Waiman Long optimized rstat operation leading to substantially lower
   and more consistent lock hold time while flushing the hierarchical
   statistics. As the lock can be acquired briefly in various hot paths,
   this reduction has cascading benefits.

- Waiman also improved the quality of isolation for cpuset's isolated
   partitions. CPUs which are allocated to isolated partitions are now
   excluded from running unbound work items and cpu_is_isolated() test
   which is used by vmstat and memcg to reduce interference now includes
   cpuset isolated CPUs. While it isn't there yet, the hope is
   eventually reaching parity with the isolation level provided by the
   `isolcpus` boot param but in a dynamic manner.

* tag 'cgroup-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Move rcu_head up near the top of cgroup_root
  cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check
  cgroup: Avoid false cacheline sharing of read mostly rstat_cpu
  cgroup/rstat: Optimize cgroup_rstat_updated_list()
  cgroup: Fix documentation for cpu.idle
  cgroup/cpuset: Expose cpuset.cpus.isolated
  workqueue: Move workqueue_set_unbound_cpumask() and its helpers inside CONFIG_SYSFS
  cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked()
  cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
  cgroup/cpuset: Keep track of CPUs in isolated partitions
  selftests/cgroup: Minor code cleanup and reorganization of test_cpuset_prs.sh
  workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask
  selftests: cgroup: Fixes a typo in a comment
  cgroup: Add a new helper for cgroup1 hierarchy
  cgroup: Add annotation for holding namespace_sem in current_cgns_cgroup_from_root()
  cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show()
  cgroup: Make operations on the cgroup root_list RCU safe
  cgroup: Remove unnecessary list_empty()

Merge tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
"Energy scheduling:

   - Consolidate how the max compute capacity is used in the scheduler
     and how we calculate the frequency for a level of utilization.

   - Rework interface between the scheduler and the schedutil governor

   - Simplify the util_est logic

  Deadline scheduler:

   - Work more towards reducing SCHED_DEADLINE starvation of low
     priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks
     monopolize CPU cycles, via the introduction of 'deadline servers'
     (nested/2-level scheduling).

     "Fair servers" to make use of this facility are not introduced yet.

  EEVDF:

   - Introduce O(1) fastpath for EEVDF task selection

  NUMA balancing:

   - Tune the NUMA-balancing vma scanning logic some more, to better
     distribute the probability of a particular vma getting scanned.

  Plus misc fixes, cleanups and updates"

* tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  sched/fair: Fix tg->load when offlining a CPU
  sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair()
  sched/fair: Use all little CPUs for CPU-bound workloads
  sched/fair: Simplify util_est
  sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)
  arm64/amu: Use capacity_ref_freq() to set AMU ratio
  cpufreq/cppc: Set the frequency used for computing the capacity
  cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}()
  energy_model: Use a fixed reference frequency
  cpufreq/schedutil: Use a fixed reference frequency
  cpufreq: Use the fixed and coherent frequency for scaling capacity
  sched/topology: Add a new arch_scale_freq_ref() method
  freezer,sched: Clean saved_state when restoring it during thaw
  sched/fair: Update min_vruntime for reweight_entity() correctly
  sched/doc: Update documentation after renames and synchronize Chinese version
  sched/cpufreq: Rework iowait boost
  sched/cpufreq: Rework schedutil governor performance estimation
  sched/pelt: Avoid underestimation of task utilization
  sched/timers: Explain why idle task schedules out on remote timer enqueue
  sched/cpuidle: Comment about timers requirements VS idle handler
  ...

Merge tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:

- Add branch stack counters ABI extension to better capture the growing
   amount of information the PMU exposes via branch stack sampling.
   There's matching tooling support.

- Fix race when creating the nr_addr_filters sysfs file

- Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support

- Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
   support

- Misc cleanups & fixes

* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Factor out topology_gidnid_map()
  perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
  perf/x86/amd: Reject branch stack for IBS events
  perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
  perf/x86/intel/uncore: Support IIO free-running counters on GNR
  perf/x86/intel/uncore: Support Granite Rapids
  perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
  perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
  perf: Fix the nr_addr_filters fix
  perf/x86/intel/cstate: Add Grand Ridge support
  perf/x86/intel/cstate: Add Sierra Forest support
  x86/smp: Export symbol cpu_clustergroup_mask()
  perf/x86/intel/cstate: Cleanup duplicate attr_groups
  perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
  perf/x86/intel: Support branch counters logging
  perf/x86/intel: Reorganize attrs and is_visible
  perf: Add branch_sample_call_stack
  perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
  perf: Add branch stack counters

Merge tag 'irq-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq subsystem updates from Ingo Molnar:

- Add support for the IA55 interrupt controller on RZ/G3S SoC's

- Update/fix the Qualcom MPM Interrupt Controller driver's register
   enumeration within the somewhat exotic "RPM Message RAM" MMIO-mapped
   shared memory region that is used for other purposes as well

- Clean up the Xtensa built-in Programmable Interrupt Controller driver
   (xtensa-pic) a bit

* tag 'irq-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-xtensa-pic: Clean up
  irqchip/qcom-mpm: Support passing a slice of SRAM as reg space
  dt-bindings: interrupt-controller: mpm: Pass MSG RAM slice through phandle
  dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/G3S
  irqchip/renesas-rzg2l: Add support for suspend to RAM
  irqchip/renesas-rzg2l: Add macro to retrieve TITSR register offset based on register's index
  irqchip/renesas-rzg2l: Implement restriction when writing ISCR register
  irqchip/renesas-rzg2l: Document structure members
  irqchip/renesas-rzg2l: Align struct member names to tabs
  irqchip/renesas-rzg2l: Use tabs instead of spaces

lan78xx: remove redundant statement in lan78xx_get_eee

eee_active is set by phy_ethtool_get_eee() already, using the same
logic plus an additional check against link speed/duplex values.
See genphy_c45_eee_is_active() for details.
So we can remove this line.

Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

lan743x: remove redundant statement in lan743x_ethtool_get_eee

eee_active is set by phy_ethtool_get_eee() already, using the same
logic plus an additional check against link speed/duplex values.
See genphy_c45_eee_is_active() for details.
So we can remove this line.

Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'bnxt_en-ntuple-filter-fixes'

Michael Chan says:

====================
bnxt_en: ntuple filter fixes

The first patch is to remove an unneeded variable. The next 2 patches
are to release RCU lock correctly after accesing the RCU protected
filter structure. Patch 2 also re-arranges the code to look cleaner.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()

Similar to the previous patch, RCU locking was released too early
in bnxt_rx_flow_steer(). Fix it to unlock after reading fltr->base.sw_id
to guarantee that fltr won't be freed while we are still reading it.

Fixes: cb5bdd292dc0 ("bnxt_en: Add bnxt_lookup_ntp_filter_from_idx() function")
Reported-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()

After looking up an ntuple filter from a RCU hash list, the
rcu_read_unlock() call should be made after reading the structure,
or after determining that the filter cannot age out (by aRFS).
The existing code was calling rcu_read_unlock() too early in
bnxt_srxclsrldel().

As suggested by Simon Horman, change the code to handle the error
case of fltr_base not found in the if condition. The code looks
cleaner this way.

Fixes: 8d7ba028aa9a ("bnxt_en: Add support for ntuple filter deletion by ethtool.")
Suggested-by: Simon Horman <[email protected]>
Reported-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()

After recent refactoring, this function doesn't return error any
more. Remove the unneeded rc variable and change the function to
void. The caller is not checking for the return value.

Fixes: 96c9bedc755e ("bnxt_en: Refactor L2 filter alloc/free firmware commands.")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Michael Chan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Revert no longer abort SYN_SENT when receiving some ICMP

This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.

Shachar reported that Vagrant (https://www.vagrantup.com/), which is
very popular tool to manage fleet of VMs stopped to work after commit
citied in Fixes line.

The issue appears while using Vagrant to manage nested VMs.
The steps are:
* create vagrant file
* vagrant up
* vagrant halt (VM is created but shut down)
* vagrant up - fail

Vagrant up stdout:
Bringing machine 'player1' up with 'libvirt' provider...
==> player1: Creating shared folders metadata...
==> player1: Starting domain.
==> player1: Domain launching with graphics connection settings...
==> player1:  -- Graphics Port:      5900
==> player1:  -- Graphics IP:        127.0.0.1
==> player1:  -- Graphics Password:  Not defined
==> player1:  -- Graphics Websocket: 5700
==> player1: Waiting for domain to get an IP address...
==> player1: Waiting for machine to boot. This may take a few minutes...
    player1: SSH address: 192.168.123.61:22
    player1: SSH username: vagrant
    player1: SSH auth method: private key
==> player1: Attempting graceful shutdown of VM...
==> player1: Attempting graceful shutdown of VM...
==> player1: Attempting graceful shutdown of VM...
    player1: Guest communication could not be established! This is usually because
    player1: SSH is not running, the authentication information was changed,
    player1: or some other networking issue. Vagrant will force halt, if
    player1: capable.
==> player1: Attempting direct shutdown of domain...

Fixes: 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving some ICMP")
Closes: https://lore.kernel.org/all/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com
Signed-off-by: Shachar Kagan <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/14459261ea9f9c7d7dfb28eb004ce8734fa83ade.1704185904.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'timers-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer subsystem updates from Ingo Molnar:

- Various preparatory cleanups & enhancements of the timer-wheel code,
   in preparation for the WIP 'pull timers at expiry' timer migration
   model series (which will replace the current 'push timers at enqueue'
   migration model), by Anna-Maria Behnsen:

      - Update comments and clean up confusing variable names

      - Add debug check to warn about time travel

      - Improve/expand timer-wheel tracepoints

      - Optimize away unnecessary IPIs for deferrable timers

      - Restructure & clean up next_expiry_recalc()

      - Clean up forward_timer_base()

      - Introduce __forward_timer_base() and use it to simplify and
        micro-optimize get_next_timer_interrupt()

- Restructure the get_next_timer_interrupt()'s idle logic for better
   readability and to enable a minor optimization.

- Fix the nextevt calculation when no timers are pending

- Fix the sysfs_get_uname() prototype declaration

* tag 'timers-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix nextevt calculation when no timers are pending
  timers: Rework idle logic
  timers: Use already existing function for forwarding timer base
  timers: Split out forward timer base functionality
  timers: Clarify check in forward_timer_base()
  timers: Move store of next event into __next_timer_interrupt()
  timers: Do not IPI for deferrable timers
  tracing/timers: Add tracepoint for tracking timer base is_idle flag
  tracing/timers: Enhance timer_start tracepoint
  tick-sched: Warn when next tick seems to be in the past
  tick/sched: Cleanup confusing variables
  tick-sched: Fix function names in comments
  time: Make sysfs_get_uname() function visible in header

Merge tag 'smp-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CPU hotplug updates from Ingo Molnar:

- Remove unused CPU hotplug states

- Increase the number of dynamic CPU hotplug states
   from 30 to 40, because existing drivers can exhaust
   the allocation space

* tag 'smp-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Increase the number of dynamic states
  cpu/hotplug: Remove unused CPU hotplug states

Merge tag 'core-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull generic syscall updates from Ingo Molnar:
"Move various entry functions from kernel/entry/common.c to a header
  file, and always-inline them, to improve syscall entry performance
  on s390 by ~11%"

* tag 'core-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Move syscall_enter_from_user_mode() to header file
  entry: Move enter_from_user_mode() to header file
  entry: Move exit to usermode functions to header file

Merge tag 'core-debugobjects-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobject update from Ingo Molnar:

- Make tracking object use more robust: it's not safe to access a
   tracking object after releasing the hashbucket lock. Create a
   persistent copy for debug printouts instead.

* tag 'core-debugobjects-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Stop accessing objects after releasing hash bucket lock

Merge tag 'objtool-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixlet from Ingo Molnar:
"Address a GCC-14 warning: there's no real bug, but indeed the calloc
  order doesn't match the prototype.

  (Side note: we should really add zalloc() for such cases)"

* tag 'objtool-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix calloc call for new -Walloc-size

Merge tag 'locking-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molar:
"Lock guards:

   - Use lock guards in the ptrace code

   - Introduce conditional guards to extend to conditional lock
     primitives like mutex_trylock()/mutex_lock_interruptible()/etc.

  lockdep:

   - Optimize 'struct lock_class' to be smaller

   - Update file patterns in MAINTAINERS

  mutexes:

   - Document mutex lifetime rules a bit more"

* tag 'locking-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked
  locking/mutex: Document that mutex_unlock() is non-atomic
  ptrace: Convert ptrace_attach() to use lock guards
  locking/lockdep: Slightly reorder 'struct lock_class' to save some memory
  MAINTAINERS: Add include/linux/lockdep*.h
  cleanup: Add conditional guard support

arm64: Update __NR_compat_syscalls for statmount/listmount

Commit d8b0f5465012 ("wire up syscalls for statmount/listmount") added
two new system calls to arch/arm64/include/asm/unistd32.h but forgot to
update the __NR_compat_syscalls number, thus causing the following build
failures:

  arch/arm64/include/asm/unistd32.h:922:24: error: array index in initializer exceeds array bounds
    922 | #define __NR_statmount 457
        |                        ^~~
  arch/arm64/kernel/sys32.c:130:34: note: in definition of macro '__SYSCALL'
    130 | #define __SYSCALL(nr, sym)      [nr] = __arm64_##sym,
        |                                  ^~

Bump up the number by two to accomodate for the new system calls added.

Fixes: d8b0f5465012 ("wire up syscalls for statmount/listmount")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'x86-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Ingo Molnar:

- Optimize common_interrupt_return()

- Harden the return-to-user code by making a CONFIG_DEBUG_ENTRY=y check
   unconditional & moving it closer to the IRET.

* tag 'x86-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Harden return-to-user
  x86/entry: Optimize common_interrupt_return()

Merge tag 'x86-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Ingo Molnar:

- Add comments about the magic behind the shadow STI
   before MWAIT in __sti_mwait().

- Fix possible unintended timer delays caused by a race
   in mwait_idle_with_hints().

* tag 'x86-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  x86: Add a comment about the "magic" behind shadow sti before mwait

Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:

- Change global variables to local

- Add missing kernel-doc function parameter descriptions

- Remove unused parameter from a macro

- Remove obsolete Kconfig entry

- Fix comments

- Fix typos, mostly scripted, manually reviewed

and a micro-optimization got misplaced as a cleanup:

- Micro-optimize the asm code in secondary_startup_64_no_verify()

* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86: Fix typos
  x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
  x86/docs: Remove reference to syscall trampoline in PTI
  x86/Kconfig: Remove obsolete config X86_32_SMP
  x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
  x86/mtrr: Document missing function parameters in kernel-doc
  x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()

Merge tag 'x86-build-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 build updates from Ingo Molnar:

- Update the objdump & instruction decoder self-test code for better
   LLVM toolchain compatibility

- Rework CONFIG_X86_PAE dependencies, for better readability and higher
   robustness.

- Misc cleanups

* tag 'x86-build-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tools: objdump_reformat.awk: Skip bad instructions from llvm-objdump
  x86/Kconfig: Rework CONFIG_X86_PAE dependency
  x86/tools: Remove chkobjdump.awk
  x86/tools: objdump_reformat.awk: Allow for spaces
  x86/tools: objdump_reformat.awk: Ensure regex matches fwait

Merge tag 'x86-boot-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

- Ignore NMIs during very early boot, to address kexec crashes

- Remove redundant initialization in boot/string.c's strcmp()

* tag 'x86-boot-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Remove redundant initialization of the 'delta' variable in strcmp()
x86/boot: Ignore NMIs during very early boot

Merge tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm updates from Ingo Molnar:
"Replace magic numbers in GDT descriptor definitions & handling:

   - Introduce symbolic names via macros for descriptor
     types/fields/flags, and then use these symbolic names.

   - Clean up definitions a bit, such as GDT_ENTRY_INIT()

   - Fix/clean up details that became visibly inconsistent after the
     symbol-based code was introduced:

      - Unify accessed flag handling

      - Set the D/B size flag consistently & according to the HW
        specification"

* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Add DB flag to 32-bit percpu GDT entry
  x86/asm: Always set A (accessed) flag in GDT descriptors
  x86/asm: Replace magic numbers in GDT descriptors, script-generated change
  x86/asm: Replace magic numbers in GDT descriptors, preparations
  x86/asm: Provide new infrastructure for GDT descriptors

Merge tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 apic updates from Ingo Molnar:

- Clean up 'struct apic':
    - Drop ::delivery_mode
    - Drop 'enum apic_delivery_modes'
    - Drop 'struct local_apic'

- Fix comments

* tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Remove unfinished sentence from comment
  x86/apic: Drop struct local_apic
  x86/apic: Drop enum apic_delivery_modes
  x86/apic: Drop apic::delivery_mode

cifs: delete unnecessary NULL checks in cifs_chan_update_iface()

We return early if "iface" is NULL so there is no need to check here.
Delete those checks.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
"CPU features:

   - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye
     olde Thunder-X machines

   - Avoid mapping KPTI trampoline when it is not required

   - Make CPU capability API more robust during early initialisation

  Early idreg overrides:

   - Remove dependencies on core kernel helpers from the early
     command-line parsing logic in preparation for moving this code
     before the kernel is mapped

  FPsimd:

   - Restore kernel-mode fpsimd context lazily, allowing us to run
     fpsimd code sequences in the kernel with pre-emption enabled

  KBuild:

   - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y

   - Makefile cleanups

  LPA2 prep:

   - Preparatory work for enabling the 'LPA2' extension, which will
     introduce 52-bit virtual and physical addressing even with 4KiB
     pages (including for KVM guests).

  Misc:

   - Remove dead code and fix a typo

  MM:

   - Pass NUMA node information for IRQ stack allocations

  Perf:

   - Add perf support for the Synopsys DesignWare PCIe PMU

   - Add support for event counting thresholds (FEAT_PMUv3_TH)
     introduced in Armv8.8

   - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver.

   - Minor PMU driver fixes and optimisations

  RIP VPIPT:

   - Remove what support we had for the obsolete VPIPT I-cache policy

  Selftests:

   - Improvements to the SVE and SME selftests

  Stacktrace:

   - Refactor kernel unwind logic so that it can used by BPF unwinding
     and, eventually, reliable backtracing

  Sysregs:

   - Update a bunch of register definitions based on the latest XML drop
     from Arm"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  kselftest/arm64: Don't probe the current VL for unsupported vector types
  efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad
  arm64: properly install vmlinuz.efi
  arm64/sysreg: Add missing system instruction definitions for FGT
  arm64/sysreg: Add missing system register definitions for FGT
  arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1
  arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1
  arm64: memory: remove duplicated include
  arm: perf: Fix ARCH=arm build with GCC
  arm64: Align boot cpucap handling with system cpucap handling
  arm64: Cleanup system cpucap handling
  MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
  drivers/perf: add DesignWare PCIe PMU driver
  PCI: Move pci_clear_and_set_dword() helper to PCI header
  PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  arm64: irq: set the correct node for shadow call stack
  Revert "perf/arm_dmc620: Remove duplicate format attribute #defines"
  arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
  arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
  ...

Merge tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

  - make the NuBus bus type static and constant

  - defconfig updates

* tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: defconfig: Update defconfigs for v6.7-rc1
  nubus: Make nubus_bus_type static and constant

Merge tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

- Add initial support to recognise the HeXin C2000 processor.

- Add papr-vpd and papr-sysparm character device drivers for VPD &
   sysparm retrieval, so userspace tools can be adapted to avoid doing
   raw firmware calls from userspace.

- Sched domains optimisations for shared processor partitions on
   P9/P10.

- A series of optimisations for KVM running as a nested HV under
   PowerVM.

- Other small features and fixes.

Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe
Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand,
Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao,
Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika
Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and
Zhao Ke.

* tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits)
  powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
  powerpc/86xx: Drop unused CONFIG_MPC8610
  powerpc/powernv: Add error handling to opal_prd_range_is_valid
  selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES"
  powerpc/hvcall: Reorder Nestedv2 hcall opcodes
  powerpc/ps3: Add missing set_freezable() for ps3_probe_thread()
  powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread
  powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn()
  powerpc/fsl: Fix fsl,tmu-calibration to match the schema
  powerpc/smp: Dynamically build Powerpc topology
  powerpc/smp: Avoid asym packing within thread_group of a core
  powerpc/smp: Add __ro_after_init attribute
  powerpc/smp: Disable MC domain for shared processor
  powerpc/smp: Enable Asym packing for cores on shared processor
  powerpc/sched: Cleanup vcpu_is_preempted()
  powerpc: add cpu_spec.cpu_features to vmcoreinfo
  powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
  powerpc/powernv: Add a null pointer check in opal_powercap_init()
  powerpc/powernv: Add a null pointer check in opal_event_init()
  powerpc/powernv: Add a null pointer check to scom_debug_init_one()
  ...

Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

- Convert the hw error storm handling into a finer-grained, per-bank
   solution which allows for more timely detection and reporting of
   errors

- Start a documentation section which will hold down relevant RAS
   features description and how they should be used

- Add new AMD error bank types

- Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD

- Mark pages containing uncorrectable errors as poison so that kdump
   can avoid them and thus not cause another panic

- The usual cleanups and fixlets

* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle Intel threshold interrupt storms
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Remove old CMCI storm mitigation code
  Documentation: Begin a RAS section
  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
  EDAC/mce_amd: Remove SMCA Extended Error code descriptions
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/inject: Clear test status value
  x86/mce: Remove redundant check from mce_device_create()
  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel

Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu feature updates from Borislav Petkov:

- Add synthetic X86_FEATURE flags for the different AMD Zen generations
   and use them everywhere instead of ad-hoc family/model checks. Drop
   an ancient AMD errata checking facility as a result

- Fix a fragile initcall ordering in intel_epb

- Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
   X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD

Merge tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

- Convert the sev-guest plaform ->remove callback to return void

- Move the SEV C-bit verification to the BSP as it needs to happen only
   once and not on every AP

* tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sev-guest: Convert to platform remove callback returning void
  x86/sev: Do the C-bit verification only on the BSP

mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER

commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm, treewide: introduce NR_PAGE_ORDERS

NR_PAGE_ORDERS defines the number of page orders supported by the page
allocator, ranging from 0 to MAX_ORDER, MAX_ORDER + 1 in total.

NR_PAGE_ORDERS assists in defining arrays of page orders and allows for
more natural iteration over them.

[[email protected]: fixup for kerneldoc warning]
Link: https://lkml.kernel.org/r/20240101111512.7empzyifq7kxtzk3@box
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

- Replace the paravirt patching functionality using the alternatives
   infrastructure and remove the former

- Misc other improvements

* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: Correct feature bit debug output
  x86/paravirt: Remove no longer needed paravirt patching code
  x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
  x86/alternative: Add indirect call patching
  x86/paravirt: Move some functions and defines to alternative.c
  x86/paravirt: Introduce ALT_NOT_XEN
  x86/paravirt: Make the struct paravirt_patch_site packed
  x86/paravirt: Use relative reference for the original instruction offset

Merge tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

- Add an informational message which gets issued when IA32 emulation
   has been disabled on the cmdline

- Clarify in detail how /proc/cpuinfo is used on x86

- Fix a theoretical overflow in num_digits()

* tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ia32: State that IA32 emulation is disabled
  Documentation/x86: Document what /proc/cpuinfo is for
  x86/lib: Fix overflow when counting digits

Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode updates from Borislav Petkov:

- Correct minor issues after the microcode revision reporting
   sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message

Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

- The EDAC drivers part of the effort to make the ->remove() platform
   driver callback return void

- Add support for AMD AI accelerators

- Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}

- Random fixes and cleanups all over the place

* tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits)
  EDAC/skx_common: Filter out the invalid address
  EDAC, pnd2: Sort headers alphabetically
  EDAC, pnd2: Correct misleading error message in mk_region_mask()
  EDAC, pnd2: Apply bit macros and helpers where it makes sense
  EDAC, pnd2: Replace custom definition by one from sizes.h
  EDAC/igen6: Add Intel Meteor Lake-P SoCs support
  EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
  EDAC/igen6: Add Intel Raptor Lake-P SoCs support
  EDAC/igen6: Add Intel Alder Lake-N SoCs support
  EDAC/igen6: Make get_mchbar() helper function
  EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
  EDAC/mc: Add support for HBM3 memory type
  EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
  EDAC/armada_xp: Explicitly include correct DT includes
  EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
  EDAC/thunderx: Fix possible out-of-bounds string access
  EDAC/fsl_ddr: Convert to platform remove callback returning void
  EDAC/zynqmp: Convert to platform remove callback returning void
  EDAC/xgene: Convert to platform remove callback returning void
  EDAC/ti: Convert to platform remove callback returning void
  ...

MAINTAINERS: update unicode maintainer e-mail address

I no longer have access to this mailbox. Use kernel.org to avoid
future updates.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iov_iter cleanups from Christian Brauner:
"This contains a minor cleanup. The patches drop an unused argument
  from import_single_range() allowing to replace import_single_range()
  with import_ubuf() and dropping import_single_range() completely"

* tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: replace import_single_range() with import_ubuf()
  iov_iter: remove unused 'iov' argument from import_single_range()

ecryptfs: Reject casefold directory inodes

Even though it seems to be able to resolve some names of
case-insensitive directories, the lack of d_hash and d_compare means we
end up with a broken state in the d_cache. Considering it was never a
goal to support these two together, and we are preparing to use
d_revalidate in case-insensitive filesystems, which would make the
combination even more broken, reject any attempt to get a casefolded
inode from ecryptfs.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>

Merge tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs cachefiles updates from Christian Brauner:
"This contains improvements for on-demand cachefiles.

  If the daemon crashes and the on-demand cachefiles fd is unexpectedly
  closed in-flight requests and subsequent read operations associated
  with the fd will fail with EIO. This causes issues in various
  scenarios as this failure is currently unrecoverable.

  The work contained in this pull request introduces a failover mode and
  enables the daemon to recover in-flight requested-related objects. A
  restarted daemon will be able to process requests as usual.

  This requires that in-flight requests are stored during daemon crash
  or while the daemon is offline. In addition, a handle to
  /dev/cachefiles needs to be stored.

  This can be done by e.g., systemd's fdstore (cf. [1]) which enables
  the restarted daemon to recover state.

  Three new states are introduced in this patchset:

   (1) CLOSE
       Object is closed by the daemon.

   (2) OPEN
       Object is open and ready for processing. IOW, the open request
       has been handled successfully.

   (3) REOPENING
       Object has been previously closed and is now reopened due to a
       read request.

  A restarted daemon can recover the /dev/cachefiles fd from systemd's
  fdstore and writes "restore" to the device. This causes the object
  state to be reset from CLOSE to REOPENING and reinitializes the
  object.

  The daemon may now handle the open request. Any in-flight operations
  are restored and handled avoiding interruptions for users"

Link: https://systemd.io/FILE_DESCRIPTOR_STORE
* tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  cachefiles: add restore command to recover inflight ondemand read requests
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  cachefiles: resend an open request if the read request's object is closed
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: introduce object ondemand state

Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rw updates from Christian Brauner:
"This contains updates from Amir for read-write backing file helpers
  for stacking filesystems such as overlayfs:

   - Fanotify is currently in the process of introducing pre content
     events. Roughly, a new permission event will be added indicating
     that it is safe to write to the file being accessed. These events
     are used by hierarchical storage managers to e.g., fill the content
     of files on first access.

     During that work we noticed that our current permission checking is
     inconsistent in rw_verify_area() and remap_verify_area().
     Especially in the splice code permission checking is done multiple
     times. For example, one time for the whole range and then again for
     partial ranges inside the iterator.

     In addition, we mostly do permission checking before we call
     file_start_write() except for a few places where we call it after.
     For pre-content events we need such permission checking to be done
     before file_start_write(). So this is a nice reason to clean this
     all up.

     After this series, all permission checking is done before
     file_start_write().

     As part of this cleanup we also massaged the splice code a bit. We
     got rid of a few helpers because we are alredy drowning in special
     read-write helpers. We also cleaned up the return types for splice
     helpers.

   - Introduce generic read-write helpers for backing files. This lifts
     some overlayfs code to common code so it can be used by the FUSE
     passthrough work coming in over the next cycles. Make Amir and
     Miklos the maintainers for this new subsystem of the vfs"

* tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: fix __sb_write_started() kerneldoc formatting
  fs: factor out backing_file_mmap() helper
  fs: factor out backing_file_splice_{read,write}() helpers
  fs: factor out backing_file_{read,write}_iter() helpers
  fs: prepare for stackable filesystems backing file helpers
  fsnotify: optionally pass access range in file permission hooks
  fsnotify: assert that file_start_write() is not held in permission hooks
  fsnotify: split fsnotify_perm() into two hooks
  fs: use splice_copy_file_range() inline helper
  splice: return type ssize_t from all helpers
  fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
  fs: move file_start_write() into direct_splice_actor()
  fs: fork splice_file_range() from do_splice_direct()
  fs: create {sb,file}_write_not_started() helpers
  fs: create file_write_started() helper
  fs: create __sb_write_started() helper
  fs: move kiocb_start_write() into vfs_iocb_iter_write()
  fs: move permission hook out of do_iter_read()
  fs: move permission hook out of do_iter_write()
  fs: move file_start_write() into vfs_iter_write()
  ...

Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
"This contains the work to retrieve detailed information about mounts
  via two new system calls. This is hopefully the beginning of the end
  of the saga that started with fsinfo() years ago.

  The LWN articles in [1] and [2] can serve as a summary so we can avoid
  rehashing everything here.

  At LSFMM in May 2022 we got into a room and agreed on what we want to
  do about fsinfo(). Basically, split it into pieces. This is the first
  part of that agreement. Specifically, it is concerned with retrieving
  information about mounts. So this only concerns the mount information
  retrieval, not the mount table change notification, or the extended
  filesystem specific mount option work. That is separate work.

  Currently mounts have a 32bit id. Mount ids are already in heavy use
  by libmount and other low-level userspace but they can't be relied
  upon because they're recycled very quickly. We agreed that mounts
  should carry a unique 64bit id by which they can be referenced
  directly. This is now implemented as part of this work.

  The new 64bit mount id is exposed in statx() through the new
  STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
  returned. If it is raised and the kernel supports the new 64bit mount
  id the flag is raised in the result mask and the new 64bit mount id is
  returned. New and old mount ids do not overlap so they cannot be
  conflated.

  Two new system calls are introduced that operate on the 64bit mount
  id: statmount() and listmount(). A summary of the api and usage can be
  found on LWN as well (cf. [3]) but of course, I'll provide a summary
  here as well.

  Both system calls rely on struct mnt_id_req. Which is the request
  struct used to pass the 64bit mount id identifying the mount to
  operate on. It is extensible to allow for the addition of new
  parameters and for future use in other apis that make use of mount
  ids.

  statmount() mimicks the semantics of statx() and exposes a set flags
  that userspace may raise in mnt_id_req to request specific information
  to be retrieved. A statmount() call returns a struct statmount filled
  in with information about the requested mount. Supported requests are
  indicated by raising the request flag passed in struct mnt_id_req in
  the @mask argument in struct statmount.

  Currently we do support:

   - STATMOUNT_SB_BASIC:
     Basic filesystem info

   - STATMOUNT_MNT_BASIC
     Mount information (mount id, parent mount id, mount attributes etc)

   - STATMOUNT_PROPAGATE_FROM
     Propagation from what mount in current namespace

   - STATMOUNT_MNT_ROOT
     Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

   - STATMOUNT_MNT_POINT
     Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

   - STATMOUNT_FS_TYPE
     Name of the filesystem type as the magic number isn't enough due to submounts

  The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
  are appended to the end of the struct. Userspace can use the offsets
  in @fs_type, @mnt_root, and @mnt_point to reference those strings
  easily.

  The struct statmount reserves quite a bit of space currently for
  future extensibility. This isn't really a problem and if this bothers
  us we can just send a follow-up pull request during this cycle.

  listmount() is given a 64bit mount id via mnt_id_req just as
  statmount(). It takes a buffer and a size to return an array of the
  64bit ids of the child mounts of the requested mount. Userspace can
  thus choose to either retrieve child mounts for a mount in batches or
  iterate through the child mounts. For most use-cases it will be
  sufficient to just leave space for a few child mounts. But for big
  mount tables having an iterator is really helpful. Iterating through a
  mount table works by setting @param in mnt_id_req to the mount id of
  the last child mount retrieved in the previous listmount() call"

Link: https://lwn.net/Articles/934469
Link: https://lwn.net/Articles/829212
Link: https://lwn.net/Articles/950569
* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  add selftest for statmount/listmount
  fs: keep struct mnt_id_req extensible
  wire up syscalls for statmount/listmount
  add listmount(2) syscall
  statmount: simplify string option retrieval
  statmount: simplify numeric option retrieval
  add statmount(2) syscall
  namespace: extract show_path() helper
  mounts: keep list of mounts in an rbtree
  add unique mount ID

Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs super updates from Christian Brauner:
"This contains the super work for this cycle including the long-awaited
  series by Jan to make it possible to prevent writing to mounted block
  devices:

   - Writing to mounted devices is dangerous and can lead to filesystem
     corruption as well as crashes. Furthermore syzbot comes with more
     and more involved examples how to corrupt block device under a
     mounted filesystem leading to kernel crashes and reports we can do
     nothing about. Add tracking of writers to each block device and a
     kernel cmdline argument which controls whether other writeable
     opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
     allowed.

     Note that this effectively only prevents modification of the
     particular block device's page cache by other writers. The actual
     device content can still be modified by other means - e.g. by
     issuing direct scsi commands, by doing writes through devices lower
     in the storage stack (e.g. in case loop devices, DM, or MD are
     involved) etc. But blocking direct modifications of the block
     device page cache is enough to give filesystems a chance to perform
     data validation when loading data from the underlying storage and
     thus prevent kernel crashes.

     Syzbot can use this cmdline argument option to avoid uninteresting
     crashes. Also users whose userspace setup does not need writing to
     mounted block devices can set this option for hardening. We expect
     that this will be interesting to quite a few workloads.

     Btrfs is currently opted out of this because they still haven't
     merged patches we require for this to work from three kernel
     releases ago.

   - Reimplement block device freezing and thawing as holder operations
     on the block device.

     This allows us to extend block device freezing to all devices
     associated with a superblock and not just the main device. It also
     allows us to remove get_active_super() and thus another function
     that scans the global list of superblocks.

     Freezing via additional block devices only works if the filesystem
     chooses to use @fs_holder_ops for these additional devices as well.
     That currently only includes ext4 and xfs.

     Earlier releases switched get_tree_bdev() and mount_bdev() to use
     @fs_holder_ops. The remaining nilfs2 open-coded version of
     mount_bdev() has been converted to rely on @fs_holder_ops as well.
     So block device freezing for the main block device will continue to
     work as before.

     There should be no regressions in functionality. The only special
     case is btrfs where block device freezing for the main block device
     never worked because sb->s_bdev isn't set. Block device freezing
     for btrfs can be fixed once they can switch to @fs_holder_ops but
     that can happen whenever they're ready"

* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  block: Fix a memory leak in bdev_open_by_dev()
  super: don't bother with WARN_ON_ONCE()
  super: massage wait event mechanism
  ext4: Block writes to journal device
  xfs: Block writes to log device
  fs: Block writes to mounted block devices
  btrfs: Do not restrict writes to btrfs devices
  block: Add config option to not allow writing to mounted devices
  block: Remove blkdev_get_by_*() functions
  bcachefs: Convert to bdev_open_by_path()
  fs: handle freezing from multiple devices
  fs: remove dead check
  nilfs2: simplify device handling
  fs: streamline thaw_super_locked
  ext4: simplify device handling
  xfs: simplify device handling
  fs: simplify setup_bdev_super() calls
  blkdev: comment fs_holder_ops
  porting: document block device freeze and thaw changes
  fs: remove unused helper
  ...

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...

asm-generic: make sparse happy with odd-sized put_unaligned_*()

__put_unaligned_be24() and friends use implicit casts to convert
larger-sized data to bytes, which trips sparse truncation warnings when
the argument is a constant:

    CC [M]  drivers/input/touchscreen/hynitron_cstxxx.o
    CHECK   drivers/input/touchscreen/hynitron_cstxxx.c
  drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h):
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1)
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0)

To avoid this let's mask off upper bits explicitly, the resulting code
should be exactly the same, but it will keep sparse happy.

Reported-by: kernel test robot <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'pm-sleep'

Merge system-wide power management updates for 6.8-rc1:

- Fix possible deadlocks in the core system-wide PM code that occur if
   device-handling functions cannot be executed asynchronously during
   resune from system-wide suspend (Rafael J. Wysocki).

- Clean up unnecessary local variable initializations in multiple
   places in the hibernation code (Wang chaodong, Li zeming).

- Adjust core hibernation code to avoid missing wakeup events that
   occur after saving an image to persistent storage (Chris Feng).

- Update hibernation code to enforce correct ordering during image
   compression and decompression (Hongchen Zhang).

- Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
   during hibernation and restore (Chen Haonan).

- Adjust documentation and code comments to reflect recent task freezer
   changes (Kevin Hao).

- Repair excess function parameter description warning in the
   hibernation image-saving code (Randy Dunlap).

* pm-sleep:
  PM: sleep: Fix possible deadlocks in core system-wide PM code
  async: Introduce async_schedule_dev_nocall()
  async: Split async_schedule_node_domain()
  PM: hibernate: Repair excess function parameter description warning
  PM: sleep: Remove obsolete comment from unlock_system_sleep()
  Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes
  PM: hibernate: Use kmap_local_page() in copy_data_page()
  PM: hibernate: Enforce ordering during image compression/decompression
  PM: hibernate: Avoid missing wakeup events during hibernation
  PM: hibernate: Do not initialize error in snapshot_write_next()
  PM: hibernate: Do not initialize error in swap_write_page()
  PM: hibernate: Drop unnecessary local variable initialization

Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-devfreq'

Merge cpuidle, cpufreq and devfreq updates for 6.8-rc1:

- Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs to
   the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui).

- Do not enable interrupts when entering idle in the haltpoll cpuidle
   driver (Borislav Petkov).

- Add Emerald Rapids support in no-HWP mode to the intel_pstate cpufreq
   driver (Zhenguo Yao).

- Use EPP values programmed by the platform firmware as balance
   performance ones by default in intel_pstate (Srinivas Pandruvada).

- Add a missing function return value check to the SCMI cpufreq driver
   to avoid unexpected behavior (Alexandra Diupina).

- Fix parameter type warning in the armada-8k cpufreq driver (Gregory
   CLEMENT).

- Rework trans_stat_show() in the devfreq core code to avoid buffer
   overflows (Christian Marangi).

- Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
   list corruption from occurring when devfreq governors are switched
   frequently (Mukesh Ojha).

* pm-cpuidle:
  cpuidle: haltpoll: Do not enable interrupts when entering idle
  intel_idle: add Sierra Forest SoC support
  intel_idle: add Grand Ridge SoC support
  intel_idle: Add Meteorlake support

* pm-cpufreq:
  cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
  cpufreq: armada-8k: Fix parameter type warning
  cpufreq: scmi: process the result of devm_of_clk_add_hw_provider()
  cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP

* pm-devfreq:
  PM / devfreq: Synchronize devfreq_monitor_[start/stop]
  PM / devfreq: Convert to use sysfs_emit_at() API
  PM / devfreq: Fix buffer overflow in trans_stat_show

Merge tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp

Merge OPP (Operating Performance Points) updates for 6.8 from Viresh
Kumar:

"- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue).
- ti: Use device_get_match_data() (Rob Herring).
- Minor cleanups around OPP level and other parts and call
   dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar)."

* tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Rename 'rate_clk_single'
  OPP: Pass rounded rate to _set_opp()
  OPP: Relocate dev_pm_opp_sync_regulators()
  OPP: Move dev_pm_opp_icc_bw to internal opp.h
  OPP: Fix _set_required_opps when opp is NULL
  OPP: The level field is always of unsigned int type
  OPP: Check for invalid OPP in dev_pm_opp_find_level_ceil()
  OPP: Don't set OPP recursively for a parent genpd
  OPP: Call dev_pm_opp_set_opp() for required OPPs
  OPP: Use _set_opp_level() for single genpd case
  OPP: Level zero is valid
  opp: ti: Use device_get_match_data()

Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window

This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <[email protected]>

locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked

Clarify the mutex lock lifetime rules a bit more.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

cifs: get rid of dup length check in parse_reparse_point()

smb2_compound_op(SMB2_OP_GET_REPARSE) already checks if ioctl response
has a valid reparse data buffer's length, so there's no need to check
it again in parse_reparse_point().

In order to get rid of duplicate check, validate reparse data buffer's
length also in cifs_query_reparse_point().

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

Revert "mlx5 updates 2023-12-20"

Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698a1d47606b5a4264a584e8046641784.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>

Revert "net: stmmac: Enable Per DMA Channel interrupt"

Revert "net: stmmac: Use interrupt mode INTM=1 for per channel irq"
This reverts commit 36af9f25ddfd311da82628f194c794786467cb12.
Revert "net: stmmac: Add support for TX/RX channel interrupt"
This reverts commit 9072e03d32088137a435ddf3aa95fd6e038d69d8.
Revert "net: stmmac: Make MSI interrupt routine generic"
This reverts commit 477bd4beb93bf9ace9bda71f1437b191befa9cf4.
Revert "dt-bindings: net: snps,dwmac: per channel irq"
This reverts commit 67d47c8ada0f8795bfcdb85cc8f2ad3ce556674b.

Device tree bindings need to be reviewed.

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>

nfsd: rename nfsd_last_thread() to nfsd_destroy_serv()

As this function now destroys the svc_serv, this is a better name.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: discard sv_refcnt, and svc_get/svc_put

sv_refcnt is no longer useful.
lockd and nfs-cb only ever have the svc active when there are a non-zero
number of threads, so sv_refcnt mirrors sv_nrthreads.

nfsd also keeps the svc active between when a socket is added and when
the first thread is started, but we don't really need a refcount for
that. We can simply not destroy the svc while there are any permanent
sockets attached.

So remove sv_refcnt and the get/put functions.
Instead of a final call to svc_put(), call svc_destroy() instead.
This is changed to also store NULL in the passed-in pointer to make it
easier to avoid use-after-free situations.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svc: don't hold reference for poolstats, only mutex.

A future patch will remove refcounting on svc_serv as it is of little
use.
It is currently used to keep the svc around while the pool_stats file is
open.
Change this to get the pointer, protected by the mutex, only in
seq_start, and the release the mutex in seq_stop.
This means that if the nfsd server is stopped and restarted while the
pool_stats file it open, then some pool stats info could be from the
first instance and some from the second. This might appear odd, but is
unlikely to be a problem in practice.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: remove printk when back channel request not found

If the client interface is down, or there is a network partition between
the client and server that prevents the callback request to reach the
client, TCP on the server will keep re-transmitting the callback for about
~9 minutes before giving up and closing the connection.

If the connection between the client and the server is re-established
before the connection is closed and after the callback timed out (9 secs)
then the re-transmitted callback request will arrive at the client. When
the server receives the reply of the callback, receive_cb_reply prints the
"Got unrecognized reply..." message in the system log since the callback
request was already removed from the server xprt's recv_queue.

Even though this scenario has no effect on the server operation, a
malfunctioning or malicious client can fill up the server's system log.

Signed-off-by: Dai Ngo <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Implement multi-stage Read completion again

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by getting the svc scheduler
to call svc_rdma_recvfrom() a second time to finish building an RPC
message after a Read completion.

This is the final patch, and makes several changes that have to
happen concurrently:

1. svc_rdma_process_read_list no longer waits for a completion, but
   simply builds and posts the Read WRs.

2. svc_rdma_read_done() now queues a completed Read on
   sc_read_complete_q for later processing rather than calling
   complete().

3. The completed RPC message is no longer built in the
   svc_rdma_process_read_list() path. Finishing the message is now
   done in svc_rdma_recvfrom() when it notices work on the
   sc_read_complete_q. The "finish building this RPC message" code
   is removed from the svc_rdma_process_read_list() path.

This arrangement avoids the need for an nfsd thread to wait for an
RDMA Read non-interruptibly without a timeout. It's basically the
same code structure that Tom Tucker used for Read chunks along with
some clean-up and modernization.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete()

Once a set of RDMA Reads are complete, the Read completion handler
will poke the transport to trigger a second call to
svc_rdma_recvfrom(). recvfrom() will then merge the RDMA Read
payloads with the previously received RPC header to form a completed
RPC Call message.

The new code is copied from the svc_rdma_process_read_list() path.
A subsequent patch will make use of this code and remove the code
that this was copied from (svc_rdma_rw.c).

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add back svcxprt_rdma::sc_read_complete_q

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by allowing the nfsd
thread to return to the svc scheduler, then waking a second thread
finish the RPC message once the Read completion fires.

As a next step, add a list_head upon which completed Reads are queued.
A subsequent patch will make use of this queue.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add back svc_rdma_recv_ctxt::rc_pages

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (the client) stops responding. We
need to go back to handling RDMA Reads by allowing the nfsd thread
to return to the svc scheduler, then waking a second thread finish
the RPC message once the Read completion fires.

To start with, restore the rc_pages field so that RDMA Read pages
can be managed across calls to svc_rdma_recvfrom().

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Clean up comment in svc_rdma_accept()

The comment that starts "Qualify ..." applies to only some of the
following code paragraph. Re-arrange the lines so the comment makes
more sense.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove queue-shortening warnings

These won't have much diagnostic value for site administrators.
Since they can't be disabled, they become noise.

What's more, the subsequent rdma_create_qp() call adjusts the Send
Queue size (possibly downward) without warning, making the size
reported by these pr_warns inaccurate.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove pointer addresses shown in dprintk()

There are a couple of dprintk() call sites in svc_rdma_accept()
that show pointer addresses. These days, displayed pointer addresses
are hashed and thus have little or no diagnostic value, especially
for site administrators.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Optimize svc_rdma_cc_init()

The atomic_inc_return() in svc_rdma_send_cid_init() is expensive.

Some svc_rdma_chunk_ctxt's now reside in long-lived container
structures. They don't need a fresh completion ID for every I/O
operation.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: De-duplicate completion ID initialization helpers

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move the svc_rdma_cc_init() call

Now that the chunk_ctxt for Reads is no longer dynamically allocated
it can be initialized once for the life of the object that contains
it (struct svc_rdma_recv_ctxt).

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove struct svc_rdma_read_info

The remaining fields of struct svc_rdma_read_info are no longer
referenced.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_special()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_special() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_call_chunk()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_call_chunk() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_copy_inline_range()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_data_item()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_read_chunk_range()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_build_read_chunk()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_build_read_segment()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt

Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt

Further clean up: move the page index field into svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Start moving fields out of struct svc_rdma_read_info

Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h

Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field

In every instance, the pointer address in that field is now
available by other means.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport into Read chunk I/O paths

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport into Write chunk I/O paths

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Acquire the svcxprt_rdma pointer from the CQ context

Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>