Git Repo - linux.git/log

Merge tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

- Add an informational message which gets issued when IA32 emulation
   has been disabled on the cmdline

- Clarify in detail how /proc/cpuinfo is used on x86

- Fix a theoretical overflow in num_digits()

* tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ia32: State that IA32 emulation is disabled
  Documentation/x86: Document what /proc/cpuinfo is for
  x86/lib: Fix overflow when counting digits

Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode updates from Borislav Petkov:

- Correct minor issues after the microcode revision reporting
   sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message

Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

- The EDAC drivers part of the effort to make the ->remove() platform
   driver callback return void

- Add support for AMD AI accelerators

- Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}

- Random fixes and cleanups all over the place

* tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits)
  EDAC/skx_common: Filter out the invalid address
  EDAC, pnd2: Sort headers alphabetically
  EDAC, pnd2: Correct misleading error message in mk_region_mask()
  EDAC, pnd2: Apply bit macros and helpers where it makes sense
  EDAC, pnd2: Replace custom definition by one from sizes.h
  EDAC/igen6: Add Intel Meteor Lake-P SoCs support
  EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
  EDAC/igen6: Add Intel Raptor Lake-P SoCs support
  EDAC/igen6: Add Intel Alder Lake-N SoCs support
  EDAC/igen6: Make get_mchbar() helper function
  EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
  EDAC/mc: Add support for HBM3 memory type
  EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
  EDAC/armada_xp: Explicitly include correct DT includes
  EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
  EDAC/thunderx: Fix possible out-of-bounds string access
  EDAC/fsl_ddr: Convert to platform remove callback returning void
  EDAC/zynqmp: Convert to platform remove callback returning void
  EDAC/xgene: Convert to platform remove callback returning void
  EDAC/ti: Convert to platform remove callback returning void
  ...

MAINTAINERS: update unicode maintainer e-mail address

I no longer have access to this mailbox. Use kernel.org to avoid
future updates.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iov_iter cleanups from Christian Brauner:
"This contains a minor cleanup. The patches drop an unused argument
  from import_single_range() allowing to replace import_single_range()
  with import_ubuf() and dropping import_single_range() completely"

* tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: replace import_single_range() with import_ubuf()
  iov_iter: remove unused 'iov' argument from import_single_range()

ecryptfs: Reject casefold directory inodes

Even though it seems to be able to resolve some names of
case-insensitive directories, the lack of d_hash and d_compare means we
end up with a broken state in the d_cache. Considering it was never a
goal to support these two together, and we are preparing to use
d_revalidate in case-insensitive filesystems, which would make the
combination even more broken, reject any attempt to get a casefolded
inode from ecryptfs.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>

Merge tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs cachefiles updates from Christian Brauner:
"This contains improvements for on-demand cachefiles.

  If the daemon crashes and the on-demand cachefiles fd is unexpectedly
  closed in-flight requests and subsequent read operations associated
  with the fd will fail with EIO. This causes issues in various
  scenarios as this failure is currently unrecoverable.

  The work contained in this pull request introduces a failover mode and
  enables the daemon to recover in-flight requested-related objects. A
  restarted daemon will be able to process requests as usual.

  This requires that in-flight requests are stored during daemon crash
  or while the daemon is offline. In addition, a handle to
  /dev/cachefiles needs to be stored.

  This can be done by e.g., systemd's fdstore (cf. [1]) which enables
  the restarted daemon to recover state.

  Three new states are introduced in this patchset:

   (1) CLOSE
       Object is closed by the daemon.

   (2) OPEN
       Object is open and ready for processing. IOW, the open request
       has been handled successfully.

   (3) REOPENING
       Object has been previously closed and is now reopened due to a
       read request.

  A restarted daemon can recover the /dev/cachefiles fd from systemd's
  fdstore and writes "restore" to the device. This causes the object
  state to be reset from CLOSE to REOPENING and reinitializes the
  object.

  The daemon may now handle the open request. Any in-flight operations
  are restored and handled avoiding interruptions for users"

Link: https://systemd.io/FILE_DESCRIPTOR_STORE
* tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  cachefiles: add restore command to recover inflight ondemand read requests
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  cachefiles: resend an open request if the read request's object is closed
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: introduce object ondemand state

Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rw updates from Christian Brauner:
"This contains updates from Amir for read-write backing file helpers
  for stacking filesystems such as overlayfs:

   - Fanotify is currently in the process of introducing pre content
     events. Roughly, a new permission event will be added indicating
     that it is safe to write to the file being accessed. These events
     are used by hierarchical storage managers to e.g., fill the content
     of files on first access.

     During that work we noticed that our current permission checking is
     inconsistent in rw_verify_area() and remap_verify_area().
     Especially in the splice code permission checking is done multiple
     times. For example, one time for the whole range and then again for
     partial ranges inside the iterator.

     In addition, we mostly do permission checking before we call
     file_start_write() except for a few places where we call it after.
     For pre-content events we need such permission checking to be done
     before file_start_write(). So this is a nice reason to clean this
     all up.

     After this series, all permission checking is done before
     file_start_write().

     As part of this cleanup we also massaged the splice code a bit. We
     got rid of a few helpers because we are alredy drowning in special
     read-write helpers. We also cleaned up the return types for splice
     helpers.

   - Introduce generic read-write helpers for backing files. This lifts
     some overlayfs code to common code so it can be used by the FUSE
     passthrough work coming in over the next cycles. Make Amir and
     Miklos the maintainers for this new subsystem of the vfs"

* tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: fix __sb_write_started() kerneldoc formatting
  fs: factor out backing_file_mmap() helper
  fs: factor out backing_file_splice_{read,write}() helpers
  fs: factor out backing_file_{read,write}_iter() helpers
  fs: prepare for stackable filesystems backing file helpers
  fsnotify: optionally pass access range in file permission hooks
  fsnotify: assert that file_start_write() is not held in permission hooks
  fsnotify: split fsnotify_perm() into two hooks
  fs: use splice_copy_file_range() inline helper
  splice: return type ssize_t from all helpers
  fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
  fs: move file_start_write() into direct_splice_actor()
  fs: fork splice_file_range() from do_splice_direct()
  fs: create {sb,file}_write_not_started() helpers
  fs: create file_write_started() helper
  fs: create __sb_write_started() helper
  fs: move kiocb_start_write() into vfs_iocb_iter_write()
  fs: move permission hook out of do_iter_read()
  fs: move permission hook out of do_iter_write()
  fs: move file_start_write() into vfs_iter_write()
  ...

Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
"This contains the work to retrieve detailed information about mounts
  via two new system calls. This is hopefully the beginning of the end
  of the saga that started with fsinfo() years ago.

  The LWN articles in [1] and [2] can serve as a summary so we can avoid
  rehashing everything here.

  At LSFMM in May 2022 we got into a room and agreed on what we want to
  do about fsinfo(). Basically, split it into pieces. This is the first
  part of that agreement. Specifically, it is concerned with retrieving
  information about mounts. So this only concerns the mount information
  retrieval, not the mount table change notification, or the extended
  filesystem specific mount option work. That is separate work.

  Currently mounts have a 32bit id. Mount ids are already in heavy use
  by libmount and other low-level userspace but they can't be relied
  upon because they're recycled very quickly. We agreed that mounts
  should carry a unique 64bit id by which they can be referenced
  directly. This is now implemented as part of this work.

  The new 64bit mount id is exposed in statx() through the new
  STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
  returned. If it is raised and the kernel supports the new 64bit mount
  id the flag is raised in the result mask and the new 64bit mount id is
  returned. New and old mount ids do not overlap so they cannot be
  conflated.

  Two new system calls are introduced that operate on the 64bit mount
  id: statmount() and listmount(). A summary of the api and usage can be
  found on LWN as well (cf. [3]) but of course, I'll provide a summary
  here as well.

  Both system calls rely on struct mnt_id_req. Which is the request
  struct used to pass the 64bit mount id identifying the mount to
  operate on. It is extensible to allow for the addition of new
  parameters and for future use in other apis that make use of mount
  ids.

  statmount() mimicks the semantics of statx() and exposes a set flags
  that userspace may raise in mnt_id_req to request specific information
  to be retrieved. A statmount() call returns a struct statmount filled
  in with information about the requested mount. Supported requests are
  indicated by raising the request flag passed in struct mnt_id_req in
  the @mask argument in struct statmount.

  Currently we do support:

   - STATMOUNT_SB_BASIC:
     Basic filesystem info

   - STATMOUNT_MNT_BASIC
     Mount information (mount id, parent mount id, mount attributes etc)

   - STATMOUNT_PROPAGATE_FROM
     Propagation from what mount in current namespace

   - STATMOUNT_MNT_ROOT
     Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

   - STATMOUNT_MNT_POINT
     Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

   - STATMOUNT_FS_TYPE
     Name of the filesystem type as the magic number isn't enough due to submounts

  The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
  are appended to the end of the struct. Userspace can use the offsets
  in @fs_type, @mnt_root, and @mnt_point to reference those strings
  easily.

  The struct statmount reserves quite a bit of space currently for
  future extensibility. This isn't really a problem and if this bothers
  us we can just send a follow-up pull request during this cycle.

  listmount() is given a 64bit mount id via mnt_id_req just as
  statmount(). It takes a buffer and a size to return an array of the
  64bit ids of the child mounts of the requested mount. Userspace can
  thus choose to either retrieve child mounts for a mount in batches or
  iterate through the child mounts. For most use-cases it will be
  sufficient to just leave space for a few child mounts. But for big
  mount tables having an iterator is really helpful. Iterating through a
  mount table works by setting @param in mnt_id_req to the mount id of
  the last child mount retrieved in the previous listmount() call"

Link: https://lwn.net/Articles/934469
Link: https://lwn.net/Articles/829212
Link: https://lwn.net/Articles/950569
* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  add selftest for statmount/listmount
  fs: keep struct mnt_id_req extensible
  wire up syscalls for statmount/listmount
  add listmount(2) syscall
  statmount: simplify string option retrieval
  statmount: simplify numeric option retrieval
  add statmount(2) syscall
  namespace: extract show_path() helper
  mounts: keep list of mounts in an rbtree
  add unique mount ID

Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs super updates from Christian Brauner:
"This contains the super work for this cycle including the long-awaited
  series by Jan to make it possible to prevent writing to mounted block
  devices:

   - Writing to mounted devices is dangerous and can lead to filesystem
     corruption as well as crashes. Furthermore syzbot comes with more
     and more involved examples how to corrupt block device under a
     mounted filesystem leading to kernel crashes and reports we can do
     nothing about. Add tracking of writers to each block device and a
     kernel cmdline argument which controls whether other writeable
     opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
     allowed.

     Note that this effectively only prevents modification of the
     particular block device's page cache by other writers. The actual
     device content can still be modified by other means - e.g. by
     issuing direct scsi commands, by doing writes through devices lower
     in the storage stack (e.g. in case loop devices, DM, or MD are
     involved) etc. But blocking direct modifications of the block
     device page cache is enough to give filesystems a chance to perform
     data validation when loading data from the underlying storage and
     thus prevent kernel crashes.

     Syzbot can use this cmdline argument option to avoid uninteresting
     crashes. Also users whose userspace setup does not need writing to
     mounted block devices can set this option for hardening. We expect
     that this will be interesting to quite a few workloads.

     Btrfs is currently opted out of this because they still haven't
     merged patches we require for this to work from three kernel
     releases ago.

   - Reimplement block device freezing and thawing as holder operations
     on the block device.

     This allows us to extend block device freezing to all devices
     associated with a superblock and not just the main device. It also
     allows us to remove get_active_super() and thus another function
     that scans the global list of superblocks.

     Freezing via additional block devices only works if the filesystem
     chooses to use @fs_holder_ops for these additional devices as well.
     That currently only includes ext4 and xfs.

     Earlier releases switched get_tree_bdev() and mount_bdev() to use
     @fs_holder_ops. The remaining nilfs2 open-coded version of
     mount_bdev() has been converted to rely on @fs_holder_ops as well.
     So block device freezing for the main block device will continue to
     work as before.

     There should be no regressions in functionality. The only special
     case is btrfs where block device freezing for the main block device
     never worked because sb->s_bdev isn't set. Block device freezing
     for btrfs can be fixed once they can switch to @fs_holder_ops but
     that can happen whenever they're ready"

* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  block: Fix a memory leak in bdev_open_by_dev()
  super: don't bother with WARN_ON_ONCE()
  super: massage wait event mechanism
  ext4: Block writes to journal device
  xfs: Block writes to log device
  fs: Block writes to mounted block devices
  btrfs: Do not restrict writes to btrfs devices
  block: Add config option to not allow writing to mounted devices
  block: Remove blkdev_get_by_*() functions
  bcachefs: Convert to bdev_open_by_path()
  fs: handle freezing from multiple devices
  fs: remove dead check
  nilfs2: simplify device handling
  fs: streamline thaw_super_locked
  ext4: simplify device handling
  xfs: simplify device handling
  fs: simplify setup_bdev_super() calls
  blkdev: comment fs_holder_ops
  porting: document block device freeze and thaw changes
  fs: remove unused helper
  ...

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...

asm-generic: make sparse happy with odd-sized put_unaligned_*()

__put_unaligned_be24() and friends use implicit casts to convert
larger-sized data to bytes, which trips sparse truncation warnings when
the argument is a constant:

    CC [M]  drivers/input/touchscreen/hynitron_cstxxx.o
    CHECK   drivers/input/touchscreen/hynitron_cstxxx.c
  drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h):
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1)
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0)

To avoid this let's mask off upper bits explicitly, the resulting code
should be exactly the same, but it will keep sparse happy.

Reported-by: kernel test robot <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'pm-sleep'

Merge system-wide power management updates for 6.8-rc1:

- Fix possible deadlocks in the core system-wide PM code that occur if
   device-handling functions cannot be executed asynchronously during
   resune from system-wide suspend (Rafael J. Wysocki).

- Clean up unnecessary local variable initializations in multiple
   places in the hibernation code (Wang chaodong, Li zeming).

- Adjust core hibernation code to avoid missing wakeup events that
   occur after saving an image to persistent storage (Chris Feng).

- Update hibernation code to enforce correct ordering during image
   compression and decompression (Hongchen Zhang).

- Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
   during hibernation and restore (Chen Haonan).

- Adjust documentation and code comments to reflect recent task freezer
   changes (Kevin Hao).

- Repair excess function parameter description warning in the
   hibernation image-saving code (Randy Dunlap).

* pm-sleep:
  PM: sleep: Fix possible deadlocks in core system-wide PM code
  async: Introduce async_schedule_dev_nocall()
  async: Split async_schedule_node_domain()
  PM: hibernate: Repair excess function parameter description warning
  PM: sleep: Remove obsolete comment from unlock_system_sleep()
  Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes
  PM: hibernate: Use kmap_local_page() in copy_data_page()
  PM: hibernate: Enforce ordering during image compression/decompression
  PM: hibernate: Avoid missing wakeup events during hibernation
  PM: hibernate: Do not initialize error in snapshot_write_next()
  PM: hibernate: Do not initialize error in swap_write_page()
  PM: hibernate: Drop unnecessary local variable initialization

Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-devfreq'

Merge cpuidle, cpufreq and devfreq updates for 6.8-rc1:

- Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs to
   the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui).

- Do not enable interrupts when entering idle in the haltpoll cpuidle
   driver (Borislav Petkov).

- Add Emerald Rapids support in no-HWP mode to the intel_pstate cpufreq
   driver (Zhenguo Yao).

- Use EPP values programmed by the platform firmware as balance
   performance ones by default in intel_pstate (Srinivas Pandruvada).

- Add a missing function return value check to the SCMI cpufreq driver
   to avoid unexpected behavior (Alexandra Diupina).

- Fix parameter type warning in the armada-8k cpufreq driver (Gregory
   CLEMENT).

- Rework trans_stat_show() in the devfreq core code to avoid buffer
   overflows (Christian Marangi).

- Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
   list corruption from occurring when devfreq governors are switched
   frequently (Mukesh Ojha).

* pm-cpuidle:
  cpuidle: haltpoll: Do not enable interrupts when entering idle
  intel_idle: add Sierra Forest SoC support
  intel_idle: add Grand Ridge SoC support
  intel_idle: Add Meteorlake support

* pm-cpufreq:
  cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
  cpufreq: armada-8k: Fix parameter type warning
  cpufreq: scmi: process the result of devm_of_clk_add_hw_provider()
  cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP

* pm-devfreq:
  PM / devfreq: Synchronize devfreq_monitor_[start/stop]
  PM / devfreq: Convert to use sysfs_emit_at() API
  PM / devfreq: Fix buffer overflow in trans_stat_show

Merge tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp

Merge OPP (Operating Performance Points) updates for 6.8 from Viresh
Kumar:

"- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue).
- ti: Use device_get_match_data() (Rob Herring).
- Minor cleanups around OPP level and other parts and call
   dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar)."

* tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Rename 'rate_clk_single'
  OPP: Pass rounded rate to _set_opp()
  OPP: Relocate dev_pm_opp_sync_regulators()
  OPP: Move dev_pm_opp_icc_bw to internal opp.h
  OPP: Fix _set_required_opps when opp is NULL
  OPP: The level field is always of unsigned int type
  OPP: Check for invalid OPP in dev_pm_opp_find_level_ceil()
  OPP: Don't set OPP recursively for a parent genpd
  OPP: Call dev_pm_opp_set_opp() for required OPPs
  OPP: Use _set_opp_level() for single genpd case
  OPP: Level zero is valid
  opp: ti: Use device_get_match_data()

Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window

This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <[email protected]>

locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked

Clarify the mutex lock lifetime rules a bit more.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

cifs: get rid of dup length check in parse_reparse_point()

smb2_compound_op(SMB2_OP_GET_REPARSE) already checks if ioctl response
has a valid reparse data buffer's length, so there's no need to check
it again in parse_reparse_point().

In order to get rid of duplicate check, validate reparse data buffer's
length also in cifs_query_reparse_point().

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

nfsd: rename nfsd_last_thread() to nfsd_destroy_serv()

As this function now destroys the svc_serv, this is a better name.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: discard sv_refcnt, and svc_get/svc_put

sv_refcnt is no longer useful.
lockd and nfs-cb only ever have the svc active when there are a non-zero
number of threads, so sv_refcnt mirrors sv_nrthreads.

nfsd also keeps the svc active between when a socket is added and when
the first thread is started, but we don't really need a refcount for
that. We can simply not destroy the svc while there are any permanent
sockets attached.

So remove sv_refcnt and the get/put functions.
Instead of a final call to svc_put(), call svc_destroy() instead.
This is changed to also store NULL in the passed-in pointer to make it
easier to avoid use-after-free situations.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svc: don't hold reference for poolstats, only mutex.

A future patch will remove refcounting on svc_serv as it is of little
use.
It is currently used to keep the svc around while the pool_stats file is
open.
Change this to get the pointer, protected by the mutex, only in
seq_start, and the release the mutex in seq_stop.
This means that if the nfsd server is stopped and restarted while the
pool_stats file it open, then some pool stats info could be from the
first instance and some from the second. This might appear odd, but is
unlikely to be a problem in practice.

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: remove printk when back channel request not found

If the client interface is down, or there is a network partition between
the client and server that prevents the callback request to reach the
client, TCP on the server will keep re-transmitting the callback for about
~9 minutes before giving up and closing the connection.

If the connection between the client and the server is re-established
before the connection is closed and after the callback timed out (9 secs)
then the re-transmitted callback request will arrive at the client. When
the server receives the reply of the callback, receive_cb_reply prints the
"Got unrecognized reply..." message in the system log since the callback
request was already removed from the server xprt's recv_queue.

Even though this scenario has no effect on the server operation, a
malfunctioning or malicious client can fill up the server's system log.

Signed-off-by: Dai Ngo <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Implement multi-stage Read completion again

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by getting the svc scheduler
to call svc_rdma_recvfrom() a second time to finish building an RPC
message after a Read completion.

This is the final patch, and makes several changes that have to
happen concurrently:

1. svc_rdma_process_read_list no longer waits for a completion, but
   simply builds and posts the Read WRs.

2. svc_rdma_read_done() now queues a completed Read on
   sc_read_complete_q for later processing rather than calling
   complete().

3. The completed RPC message is no longer built in the
   svc_rdma_process_read_list() path. Finishing the message is now
   done in svc_rdma_recvfrom() when it notices work on the
   sc_read_complete_q. The "finish building this RPC message" code
   is removed from the svc_rdma_process_read_list() path.

This arrangement avoids the need for an nfsd thread to wait for an
RDMA Read non-interruptibly without a timeout. It's basically the
same code structure that Tom Tucker used for Read chunks along with
some clean-up and modernization.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete()

Once a set of RDMA Reads are complete, the Read completion handler
will poke the transport to trigger a second call to
svc_rdma_recvfrom(). recvfrom() will then merge the RDMA Read
payloads with the previously received RPC header to form a completed
RPC Call message.

The new code is copied from the svc_rdma_process_read_list() path.
A subsequent patch will make use of this code and remove the code
that this was copied from (svc_rdma_rw.c).

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add back svcxprt_rdma::sc_read_complete_q

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by allowing the nfsd
thread to return to the svc scheduler, then waking a second thread
finish the RPC message once the Read completion fires.

As a next step, add a list_head upon which completed Reads are queued.
A subsequent patch will make use of this queue.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add back svc_rdma_recv_ctxt::rc_pages

Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (the client) stops responding. We
need to go back to handling RDMA Reads by allowing the nfsd thread
to return to the svc scheduler, then waking a second thread finish
the RPC message once the Read completion fires.

To start with, restore the rc_pages field so that RDMA Read pages
can be managed across calls to svc_rdma_recvfrom().

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Clean up comment in svc_rdma_accept()

The comment that starts "Qualify ..." applies to only some of the
following code paragraph. Re-arrange the lines so the comment makes
more sense.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove queue-shortening warnings

These won't have much diagnostic value for site administrators.
Since they can't be disabled, they become noise.

What's more, the subsequent rdma_create_qp() call adjusts the Send
Queue size (possibly downward) without warning, making the size
reported by these pr_warns inaccurate.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove pointer addresses shown in dprintk()

There are a couple of dprintk() call sites in svc_rdma_accept()
that show pointer addresses. These days, displayed pointer addresses
are hashed and thus have little or no diagnostic value, especially
for site administrators.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Optimize svc_rdma_cc_init()

The atomic_inc_return() in svc_rdma_send_cid_init() is expensive.

Some svc_rdma_chunk_ctxt's now reside in long-lived container
structures. They don't need a fresh completion ID for every I/O
operation.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: De-duplicate completion ID initialization helpers

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move the svc_rdma_cc_init() call

Now that the chunk_ctxt for Reads is no longer dynamically allocated
it can be initialized once for the life of the object that contains
it (struct svc_rdma_recv_ctxt).

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove struct svc_rdma_read_info

The remaining fields of struct svc_rdma_read_info are no longer
referenced.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_special()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_special() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_call_chunk()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_call_chunk() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_copy_inline_range()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update the synopsis of svc_rdma_read_data_item()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_read_chunk_range()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_build_read_chunk()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update synopsis of svc_rdma_build_read_segment()

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt

Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt

Further clean up: move the page index field into svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Start moving fields out of struct svc_rdma_read_info

Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h

Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field

In every instance, the pointer address in that field is now
available by other means.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport into Read chunk I/O paths

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Explicitly pass the transport into Write chunk I/O paths

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Acquire the svcxprt_rdma pointer from the CQ context

Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Reduce size of struct svc_rdma_rw_ctxt

SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first
SGL array more than 4200 bytes in length, pushing the memory
allocation well into order 1.

Even so, the RDMA rw core doesn't seem to use more than max_send_sge
entries in that array (typically 32 or less), so that is all wasted
space.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Update some svcrdma DMA-related tracepoints

A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: DMA error tracepoints should report completion IDs

Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: SQ error tracepoints should report completion IDs

Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.

Signed-off-by: Chuck Lever <[email protected]>

rpcrdma: Introduce a simple cid tracepoint class

De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add lockdep class keys for transport locks

Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Clean up locking

There's no need to protect llist_entry() with a spin lock.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add an async version of svc_rdma_write_info_free()

DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add an async version of svc_rdma_send_ctxt_put()

DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Add a utility workqueue to svcrdma

To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Pre-allocate svc_rdma_recv_ctxt objects

The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25bd6b ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be
costly, and it can sleep.

A limited number of objects is now allocated at "accept" time.

Signed-off-by: Chuck Lever <[email protected]>

svcrdma: Eliminate allocation of recv_ctxt objects in backchannel

The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list producers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.

Signed-off-by: Chuck Lever <[email protected]>

NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h

Callback operations enum is defined in client and server, move it to
common header file.

Signed-off-by: ChenXiaoSong <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: remove unnecessary NULL check

We check "state" for NULL on the previous line so it can't be NULL here.
No need to check again.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Remove RQ_SPLICE_OK

This flag is no longer used.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Modify NFSv4 to use nfsd_read_splice_ok()

Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Replace RQ_SPLICE_OK in nfsd_read()

RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent
patch is going to provide a mechanism for always disabling splice
reads.

Splicing is an issue only for NFS READs, so refactor nfsd_read() to
check the auth type directly instead of relying on an rq_flag
setting.

The new helper will be added into the NFSv4 read path in a
subsequent patch.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor

NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to add a dependency to NFSD for
CONFIG_SUNRPC_GSS.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Document lack of f_pos_lock in nfsd_readdir()

Al Viro notes that normal system calls hold f_pos_lock when calling
->iterate_shared and ->llseek; however nfsd_readdir() does not take
that mutex when calling these methods.

It should be safe however because the struct file acquired by
nfsd_readdir() is not visible to other threads.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Remove nfsd_drc_gc() tracepoint

This trace point was for debugging the DRC's garbage collection. In
the field it's just noise.

Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

NFSD: Make the file_delayed_close workqueue UNBOUND

workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8
times, consider switching to WQ_UNBOUND

There's no harm in closing a cached file descriptor on another core.

Signed-off-by: Chuck Lever <[email protected]>

NFSD: use read_seqbegin() rather than read_seqbegin_or_lock()

The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect,
this code can never take ->writeverf_lock for writing.

I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
and nfsd_reset_write_verifier() is supposed to be very rare operation
so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
without changing the behaviour.

[ cel: Note also that it eliminates this Sparse warning:

fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' -
different lock contexts for basic block

]

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

nfsd: new Kconfig option for legacy client tracking

We've had a number of attempts at different NFSv4 client tracking
methods over the years, but now nfsdcld has emerged as the clear winner
since the others (recoverydir and the usermodehelper upcall) are
problematic.

As a case in point, the recoverydir backend uses MD5 hashes to encode
long form clientid strings, which means that nfsd repeatedly gets dinged
on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not
cryptographically significant, so there is no danger there, but allowing
us to compile that out allows us to sidestep the issue entirely.

As a prelude to eventually removing support for these client tracking
methods, add a new Kconfig option that enables them. Mark it deprecated
and make it default to N.

Acked-by: NeilBrown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

smb: client: stop revalidating reparse points unnecessarily

Query dir responses don't provide enough information on reparse points
such as major/minor numbers and symlink targets other than reparse
tags, however we don't need to unconditionally revalidate them only
because they are reparse points. Instead, revalidate them only when
their ctime or reparse tag has changed.

For instance, Windows Server updates ctime of reparse points when
their data have changed.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: Pass unbyteswapped eof value into SMB2_set_eof()

Change SMB2_set_eof() to take eof as CPU order rather than __le64 and pass
it directly rather than by pointer. This moves the conversion down into
SMB_set_eof() rather than all of its callers and means we don't need to
undo it for the traceline.

Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
Signed-off-by: Steve French <[email protected]>

smb3: Improve exception handling in allocate_mr_list()

The kfree() function was called in one case by
the allocate_mr_list() function during error handling
even if the passed variable contained a null pointer.
This issue was detected by using the Coccinelle software.

Thus use another label.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix in logging in cifs_chan_update_iface

Recently, cifs_chan_update_iface was modified to not
remove an iface if a suitable replacement was not found.
With that, there were two conditionals that were exactly
the same. This change removes that extra condition check.

Also, fixed a logging in the same function to indicate
the correct message.

Signed-off-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: handle special files and symlinks in SMB3 POSIX

Parse reparse points in SMB3 posix query info as they will be
supported and required by the new specification.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: cleanup smb2_query_reparse_point()

Use smb2_compound_op() with SMB2_OP_GET_REPARSE to get reparse point.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: allow creating symlinks via reparse points

Add support for creating symlinks via IO_REPARSE_TAG_SYMLINK reparse
points in SMB2+.

These are fully supported by most SMB servers and documented in
MS-FSCC. Also have the advantage of requiring fewer roundtrips as
their symlink targets can be parsed directly from CREATE responses on
STATUS_STOPPED_ON_SYMLINK errors.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: fix hardlinking of reparse points

The client was sending an SMB2_CREATE request without setting
OPEN_REPARSE_POINT flag thus failing the entire hardlink operation.

Fix this by setting OPEN_REPARSE_POINT in create options for
SMB2_CREATE request when the source inode is a repase point.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: fix renaming of reparse points

The client was sending an SMB2_CREATE request without setting
OPEN_REPARSE_POINT flag thus failing the entire rename operation.

Fix this by setting OPEN_REPARSE_POINT in create options for
SMB2_CREATE request when the source inode is a repase point.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: optimise reparse point querying

Reduce number of roundtrips to server when querying reparse points in
->query_path_info() by sending a single compound request of
create+get_reparse+get_info+close.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: allow creating special files via reparse points

Add support for creating special files (e.g. char/block devices,
sockets, fifos) via NFS reparse points on SMB2+, which are fully
supported by most SMB servers and documented in MS-FSCC.

smb2_get_reparse_inode() creates the file with a corresponding reparse
point buffer set in @iov through a single roundtrip to the server.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: extend smb2_compound_op() to accept more commands

Make smb2_compound_op() accept up to MAX_COMPOUND(5) commands to be
sent over a single compounded request.

This will allow next commits to read and write reparse files through a
single roundtrip to the server.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb: client: Fix minor whitespace errors and warnings

Fixes no-op checkpatch errors and warnings.

Signed-off-by: Pierre Mariani <[email protected]>
Signed-off-by: Steve French <[email protected]>

Linux 6.7

Merge tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Improve the detection when to run atomic transfer handlers for kernels
  with preemption disabled. This removes some false positive splats a
  number of users were seeing if their driver didn't have support for
  atomic transfers.

  Also, fix a typo in the docs while we are here"

* tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Fix atomic xfer check for non-preempt config
  Documentation/i2c: fix spelling error in i2c-address-translators

i2c: core: Fix atomic xfer check for non-preempt config

Since commit aa49c90894d0 ("i2c: core: Run atomic i2c xfer when
!preemptible"), the whole reboot/power off sequence on non-preempt kernels
is using atomic i2c xfer, as !preemptible() always results to 1.

During device_shutdown(), the i2c might be used a lot and not all busses
have implemented an atomic xfer handler. This results in a lot of
avoidable noise, like:

[ 12.687169] No atomic I2C transfer handler for 'i2c-0'
[ 12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
...

Fix this by allowing non-atomic xfer when the interrupts are enabled, as
it was before.

Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/linux-i2c/[email protected]/
Fixes: aa49c90894d0 ("i2c: core: Run atomic i2c xfer when !preemptible")
Cc: [email protected] # v5.2+
Signed-off-by: Benjamin Bara <[email protected]>
Tested-by: Michael Walle <[email protected]>
Tested-by: Tor Vic <[email protected]>
[wsa: removed a comment which needs more work, code is ok]
Signed-off-by: Wolfram Sang <[email protected]>

bcachefs: eytzinger0_find() search should be const

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()

This is useful for btree ptrs as well, when we're just updating
sectors_written.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: fix simulateously upgrading & downgrading

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Restart recovery passes more reliably

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: bch2_dump_bset() doesn't choke on u64s == 0

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: improve checksum error messages

new helpers:
- bch2_csum_to_text()
- bch2_csum_err_msg()

standardize our checksum error messages a bit, and print out the
checksums a bit more nicely.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: improve validate_bset_keys()

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: print sb magic when relevant

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: __bch2_sb_field_to_text()

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: %pg is banished

not portable to userspace

Signed-off-by: Kent Overstreet <[email protected]>