Git Repo - linux.git/log

btrfs: fix invalid mapping of extent xarray state

In __extent_writepage_io(), we call btrfs_set_range_writeback() ->
folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the
mapping xarray if the folio is not dirty. This worked fine before commit
97713b1a2ced ("btrfs: do not clear page dirty inside
extent_write_locked_range()").

After the commit, however, the folio is still dirty at this point, so the
mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io()
calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That
results in the page being unlocked with a "strange" state. The page is not
PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY.

This strange state looks like causing a hang with a call trace below when
running fstests generic/091 on a null_blk device. It is waiting for a folio
lock.

While I don't have an exact relation between this hang and the strange
state, fixing the state also fixes the hang. And, that state is worth
fixing anyway.

This commit reorders btrfs_folio_clear_dirty() and
btrfs_set_range_writeback() in __extent_writepage_io(), so that the
PAGECACHE_TAG_DIRTY tag is properly removed from the xarray.

  [464.274] task:fsx             state:D stack:0     pid:3034  tgid:3034  ppid:2853   flags:0x00004002
  [464.286] Call Trace:
  [464.291]  <TASK>
  [464.295]  __schedule+0x10ed/0x6260
  [464.301]  ? __pfx___blk_flush_plug+0x10/0x10
  [464.308]  ? __submit_bio+0x37c/0x450
  [464.314]  ? __pfx___schedule+0x10/0x10
  [464.321]  ? lock_release+0x567/0x790
  [464.327]  ? __pfx_lock_acquire+0x10/0x10
  [464.334]  ? __pfx_lock_release+0x10/0x10
  [464.340]  ? __pfx_lock_acquire+0x10/0x10
  [464.347]  ? __pfx_lock_release+0x10/0x10
  [464.353]  ? do_raw_spin_lock+0x12e/0x270
  [464.360]  schedule+0xdf/0x3b0
  [464.365]  io_schedule+0x8f/0xf0
  [464.371]  folio_wait_bit_common+0x2ca/0x6d0
  [464.378]  ? folio_wait_bit_common+0x1cc/0x6d0
  [464.385]  ? __pfx_folio_wait_bit_common+0x10/0x10
  [464.392]  ? __pfx_filemap_get_folios_tag+0x10/0x10
  [464.400]  ? __pfx_wake_page_function+0x10/0x10
  [464.407]  ? __pfx___might_resched+0x10/0x10
  [464.414]  ? do_raw_spin_unlock+0x58/0x1f0
  [464.420]  extent_write_cache_pages+0xe49/0x1620 [btrfs]
  [464.428]  ? lock_acquire+0x435/0x500
  [464.435]  ? __pfx_extent_write_cache_pages+0x10/0x10 [btrfs]
  [464.443]  ? btrfs_do_write_iter+0x493/0x640 [btrfs]
  [464.451]  ? orc_find.part.0+0x1d4/0x380
  [464.457]  ? __pfx_lock_release+0x10/0x10
  [464.464]  ? __pfx_lock_release+0x10/0x10
  [464.471]  ? btrfs_do_write_iter+0x493/0x640 [btrfs]
  [464.478]  btrfs_writepages+0x1cc/0x460 [btrfs]
  [464.485]  ? __pfx_btrfs_writepages+0x10/0x10 [btrfs]
  [464.493]  ? is_bpf_text_address+0x6e/0x100
  [464.500]  ? kernel_text_address+0x145/0x160
  [464.507]  ? unwind_get_return_address+0x5e/0xa0
  [464.514]  ? arch_stack_walk+0xac/0x100
  [464.521]  do_writepages+0x176/0x780
  [464.527]  ? lock_release+0x567/0x790
  [464.533]  ? __pfx_do_writepages+0x10/0x10
  [464.540]  ? __pfx_lock_acquire+0x10/0x10
  [464.546]  ? __pfx_stack_trace_save+0x10/0x10
  [464.553]  ? do_raw_spin_lock+0x12e/0x270
  [464.560]  ? do_raw_spin_unlock+0x58/0x1f0
  [464.566]  ? _raw_spin_unlock+0x23/0x40
  [464.573]  ? wbc_attach_and_unlock_inode+0x3da/0x7d0
  [464.580]  filemap_fdatawrite_wbc+0x113/0x180
  [464.587]  ? prepare_pages.constprop.0+0x13c/0x5c0 [btrfs]
  [464.596]  __filemap_fdatawrite_range+0xaf/0xf0
  [464.603]  ? __pfx___filemap_fdatawrite_range+0x10/0x10
  [464.611]  ? trace_irq_enable.constprop.0+0xce/0x110
  [464.618]  ? kasan_quarantine_put+0xd7/0x1e0
  [464.625]  btrfs_start_ordered_extent+0x46f/0x570 [btrfs]
  [464.633]  ? __pfx_btrfs_start_ordered_extent+0x10/0x10 [btrfs]
  [464.642]  ? __clear_extent_bit+0x2c0/0x9d0 [btrfs]
  [464.650]  btrfs_lock_and_flush_ordered_range+0xc6/0x180 [btrfs]
  [464.659]  ? __pfx_btrfs_lock_and_flush_ordered_range+0x10/0x10 [btrfs]
  [464.669]  btrfs_read_folio+0x12a/0x1d0 [btrfs]
  [464.676]  ? __pfx_btrfs_read_folio+0x10/0x10 [btrfs]
  [464.684]  ? __pfx_filemap_add_folio+0x10/0x10
  [464.691]  ? __pfx___might_resched+0x10/0x10
  [464.698]  ? __filemap_get_folio+0x1c5/0x450
  [464.705]  prepare_uptodate_page+0x12e/0x4d0 [btrfs]
  [464.713]  prepare_pages.constprop.0+0x13c/0x5c0 [btrfs]
  [464.721]  ? fault_in_iov_iter_readable+0xd2/0x240
  [464.729]  btrfs_buffered_write+0x5bd/0x12f0 [btrfs]
  [464.737]  ? __pfx_btrfs_buffered_write+0x10/0x10 [btrfs]
  [464.745]  ? __pfx_lock_release+0x10/0x10
  [464.752]  ? generic_write_checks+0x275/0x400
  [464.759]  ? down_write+0x118/0x1f0
  [464.765]  ? up_write+0x19b/0x500
  [464.770]  btrfs_direct_write+0x731/0xba0 [btrfs]
  [464.778]  ? __pfx_btrfs_direct_write+0x10/0x10 [btrfs]
  [464.785]  ? __pfx___might_resched+0x10/0x10
  [464.792]  ? lock_acquire+0x435/0x500
  [464.798]  ? lock_acquire+0x435/0x500
  [464.804]  btrfs_do_write_iter+0x494/0x640 [btrfs]
  [464.811]  ? __pfx_btrfs_do_write_iter+0x10/0x10 [btrfs]
  [464.819]  ? __pfx___might_resched+0x10/0x10
  [464.825]  ? rw_verify_area+0x6d/0x590
  [464.831]  vfs_write+0x5d7/0xf50
  [464.837]  ? __might_fault+0x9d/0x120
  [464.843]  ? __pfx_vfs_write+0x10/0x10
  [464.849]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
  [464.856]  ? lock_release+0x567/0x790
  [464.862]  ksys_write+0xfb/0x1d0
  [464.867]  ? __pfx_ksys_write+0x10/0x10
  [464.873]  ? _raw_spin_unlock+0x23/0x40
  [464.879]  ? btrfs_getattr+0x4af/0x670 [btrfs]
  [464.886]  ? vfs_getattr_nosec+0x79/0x340
  [464.892]  do_syscall_64+0x95/0x180
  [464.898]  ? __do_sys_newfstat+0xde/0xf0
  [464.904]  ? __pfx___do_sys_newfstat+0x10/0x10
  [464.911]  ? trace_irq_enable.constprop.0+0xce/0x110
  [464.918]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [464.925]  ? do_syscall_64+0xa1/0x180
  [464.931]  ? trace_irq_enable.constprop.0+0xce/0x110
  [464.939]  ? trace_irq_enable.constprop.0+0xce/0x110
  [464.946]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [464.953]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
  [464.960]  ? do_syscall_64+0xa1/0x180
  [464.966]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
  [464.973]  ? trace_irq_enable.constprop.0+0xce/0x110
  [464.980]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [464.987]  ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs]
  [464.995]  ? trace_irq_enable.constprop.0+0xce/0x110
  [465.002]  ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs]
  [465.010]  ? do_syscall_64+0xa1/0x180
  [465.016]  ? lock_release+0x567/0x790
  [465.022]  ? __pfx_lock_acquire+0x10/0x10
  [465.028]  ? __pfx_lock_release+0x10/0x10
  [465.034]  ? trace_irq_enable.constprop.0+0xce/0x110
  [465.042]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [465.049]  ? do_syscall_64+0xa1/0x180
  [465.055]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [465.062]  ? do_syscall_64+0xa1/0x180
  [465.068]  ? syscall_exit_to_user_mode+0xac/0x2a0
  [465.075]  ? do_syscall_64+0xa1/0x180
  [465.081]  ? clear_bhb_loop+0x25/0x80
  [465.087]  ? clear_bhb_loop+0x25/0x80
  [465.093]  ? clear_bhb_loop+0x25/0x80
  [465.099]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [465.106] RIP: 0033:0x7f093b8ee784
  [465.111] RSP: 002b:00007ffc29d31b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
  [465.122] RAX: ffffffffffffffda RBX: 0000000000006000 RCX: 00007f093b8ee784
  [465.131] RDX: 000000000001de00 RSI: 00007f093b6ed200 RDI: 0000000000000003
  [465.141] RBP: 000000000001de00 R08: 0000000000006000 R09: 0000000000000000
  [465.150] R10: 0000000000023e00 R11: 0000000000000202 R12: 0000000000006000
  [465.160] R13: 0000000000023e00 R14: 0000000000023e00 R15: 0000000000000001
  [465.170]  </TASK>
  [465.174] INFO: lockdep is turned off.

Reported-by: Shinichiro Kawasaki <[email protected]>
Fixes: 97713b1a2ced ("btrfs: do not clear page dirty inside extent_write_locked_range()")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>

KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()

There is no caller in tree since introduction in commit b4f69df0f65e ("KVM:
x86: Make Hyper-V emulation optional")

Signed-off-by: Yue Haibing <[email protected]>
Message-ID: <20240803113233 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

io_uring/sqpoll: annotate debug task == current with data_race()

There's a debug check in io_sq_thread_park() checking if it's the SQPOLL
thread itself calling park. KCSAN warns about this, as we should not be
reading sqd->thread outside of sqd->lock.

Just silence this with data_race(). The pointer isn't used for anything
but this debug check.

Reported-by: [email protected]
Signed-off-by: Jens Axboe <[email protected]>

Squashfs: sanity check symbolic link size

Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.

This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.

The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:

1. squashfs_read_inode() is called to read the symbolic
   link from disk.  This assigns the corrupted value
   3875536935 to inode->i_size.

2. Later squashfs_symlink_read_folio() is called, which assigns
   this corrupted value to the length variable, which being a
   signed int, overflows producing a negative number.

3. The following loop that fills in the page contents checks that
   the copied bytes is less than length, which being negative means
   the loop is skipped, producing an uninitialised page.

This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.

--

Signed-off-by: Phillip Lougher <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Lizhi Xu <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
V2: fix spelling mistake.
Signed-off-by: Christian Brauner <[email protected]>

9p: Fix DIO read through netfs

If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.

Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Marc Dionne <[email protected]>
cc: Ilya Dryomov <[email protected]>
cc: Steve French <[email protected]>
cc: Paulo Alcantara <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Dominique Martinet <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

vfs: Don't evict inode under the inode lru traversing context

The inode reclaiming process(See function prune_icache_sb) collects all
reclaimable inodes and mark them with I_FREEING flag at first, at that
time, other processes will be stuck if they try getting these inodes
(See function find_inode_fast), then the reclaiming process destroy the
inodes by function dispose_list(). Some filesystems(eg. ext4 with
ea_inode feature, ubifs with xattr) may do inode lookup in the inode
evicting callback function, if the inode lookup is operated under the
inode lru traversing context, deadlock problems may happen.

Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
        if ea_inode feature is enabled, the lookup process will be stuck
under the evicting context like this:

1. File A has inode i_reg and an ea inode i_ea
2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
3. Then, following three processes running like this:

    PA                              PB
echo 2 > /proc/sys/vm/drop_caches
  shrink_slab
   prune_dcache_sb
   // i_reg is added into lru, lru->i_ea->i_reg
   prune_icache_sb
    list_lru_walk_one
     inode_lru_isolate
      i_ea->i_state |= I_FREEING // set inode state
     inode_lru_isolate
      __iget(i_reg)
      spin_unlock(&i_reg->i_lock)
      spin_unlock(lru_lock)
                                     rm file A
                                      i_reg->nlink = 0
      iput(i_reg) // i_reg->nlink is 0, do evict
       ext4_evict_inode
        ext4_xattr_delete_inode
         ext4_xattr_inode_dec_ref_all
          ext4_xattr_inode_iget
           ext4_iget(i_ea->i_ino)
            iget_locked
             find_inode_fast
              __wait_on_freeing_inode(i_ea) ----→ AA deadlock
    dispose_list // cannot be executed by prune_icache_sb
     wake_up_bit(&i_ea->i_state)

Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
        deleting process holds BASEHD's wbuf->io_mutex while getting the
xattr inode, which could race with inode reclaiming process(The
        reclaiming process could try locking BASEHD's wbuf->io_mutex in
inode evicting function), then an ABBA deadlock problem would
happen as following:

1. File A has inode ia and a xattr(with inode ixa), regular file B has
    inode ib and a xattr.
2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
3. Then, following three processes running like this:

        PA                PB                        PC
                echo 2 > /proc/sys/vm/drop_caches
                 shrink_slab
                  prune_dcache_sb
                  // ib and ia are added into lru, lru->ixa->ib->ia
                  prune_icache_sb
                   list_lru_walk_one
                    inode_lru_isolate
                     ixa->i_state |= I_FREEING // set inode state
                    inode_lru_isolate
                     __iget(ib)
                     spin_unlock(&ib->i_lock)
                     spin_unlock(lru_lock)
                                                   rm file B
                                                    ib->nlink = 0
rm file A
  iput(ia)
   ubifs_evict_inode(ia)
    ubifs_jnl_delete_inode(ia)
     ubifs_jnl_write_inode(ia)
      make_reservation(BASEHD) // Lock wbuf->io_mutex
      ubifs_iget(ixa->i_ino)
       iget_locked
        find_inode_fast
         __wait_on_freeing_inode(ixa)
          |          iput(ib) // ib->nlink is 0, do evict
          |           ubifs_evict_inode
          |            ubifs_jnl_delete_inode(ib)
          ↓             ubifs_jnl_write_inode
     ABBA deadlock ←-----make_reservation(BASEHD)
                   dispose_list // cannot be executed by prune_icache_sb
                    wake_up_bit(&ixa->i_state)

Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING
to pin the inode in memory while inode_lru_isolate() reclaims its pages
instead of using ordinary inode reference. This way inode deletion
cannot be triggered from inode_lru_isolate() thus avoiding the deadlock.
evict() is made to wait for I_LRU_ISOLATING to be cleared before
proceeding with inode cleanup.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022
Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files")
Cc: [email protected]
Signed-off-by: Zhihao Cheng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Suggested-by: Mateusz Guzik <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

dm resume: don't return EINVAL when signalled

If the dm_resume method is called on a device that is not suspended, the
method will suspend the device briefly, before resuming it (so that the
table will be swapped).

However, there was a bug that the return value of dm_suspended_md was not
checked. dm_suspended_md may return an error when it is interrupted by a
signal. In this case, do_resume would call dm_swap_table, which would
return -EINVAL.

This commit fixes the logic, so that error returned by dm_suspend is
checked and the resume operation is undone.

Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Khazhismel Kumykov <[email protected]>
Cc: [email protected]

dm suspend: return -ERESTARTSYS instead of -EINTR

This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).

The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]

btrfs: send: allow cloning non-aligned extent if it ends at i_size

If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).

While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.

For example, running this test:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Use a file size not aligned to any possible sector size.
  file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
  dd if=/dev/random of=$MNT/foo bs=$file_size count=1
  cp --reflink=always $MNT/foo $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap
  rm -f /tmp/send-test
  btrfs send -f /tmp/send-test $MNT/snap

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive -vv -f /tmp/send-test $MNT

  xfs_io -r -c "fiemap -v" $MNT/snap/bar

  umount $MNT

Gives the following result:

  (...)
  mkfile o258-7-0
  rename o258-7-0 -> bar
  write bar - offset=0 length=49152
  write bar - offset=49152 length=49152
  write bar - offset=98304 length=49152
  write bar - offset=147456 length=49152
  write bar - offset=196608 length=49152
  write bar - offset=245760 length=49152
  write bar - offset=294912 length=49152
  write bar - offset=344064 length=49152
  write bar - offset=393216 length=49152
  write bar - offset=442368 length=49152
  write bar - offset=491520 length=49152
  write bar - offset=540672 length=49152
  write bar - offset=589824 length=49152
  write bar - offset=638976 length=49152
  write bar - offset=688128 length=49152
  write bar - offset=737280 length=49152
  write bar - offset=786432 length=49152
  write bar - offset=835584 length=49152
  write bar - offset=884736 length=49152
  write bar - offset=933888 length=49152
  write bar - offset=983040 length=49152
  write bar - offset=1032192 length=16389
  chown bar - uid=0, gid=0
  chmod bar - mode=0644
  utimes bar
  utimes
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
  /mnt/sdi/snap/bar:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..2055]:       26624..28679      2056   0x1

There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).

So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.

After this changes the result of the test is:

  (...)
  mkfile o258-7-0
  rename o258-7-0 -> bar
  clone bar - source=foo source offset=0 offset=0 length=1048581
  chown bar - uid=0, gid=0
  chmod bar - mode=0644
  utimes bar
  utimes
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
  /mnt/sdi/snap/bar:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..2055]:       26624..28679      2056 0x2001

A test case for fstests will also follow up soon.

Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

ACPI: EC: Evaluate _REG outside the EC scope more carefully

Commit 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the
namespace root") caused _REG methods for EC operation regions outside
the EC device scope to be evaluated which on some systems leads to the
evaluation of _REG methods in the scopes of device objects representing
devices that are not present and not functional according to the _STA
return values. Some of those device objects represent EC "alternatives"
and if _REG is evaluated for their operation regions, the platform
firmware may be confused and the platform may start to behave
incorrectly.

To avoid this problem, only evaluate _REG for EC operation regions
located in the scopes of device objects representing known-to-be-present
devices.

For this purpose, partially revert commit 60fa6ae6e6d0 and trigger the
evaluation of _REG for EC operation regions from acpi_bus_attach() for
the known-valid devices.

Fixes: 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the namespace root")
Link: https://lore.kernel.org/linux-acpi/[email protected]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2298938
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2302253
Reported-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Cc: All applicable <[email protected]>
Link: https://patch.msgid.link/[email protected]

ACPICA: Add a depth argument to acpi_execute_reg_methods()

A subsequent change will need to pass a depth argument to
acpi_execute_reg_methods(), so prepare that function for it.

No intentional functional changes.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Cc: All applicable <[email protected]>
Link: https://patch.msgid.link/[email protected]

Revert "ACPI: EC: Evaluate orphan _REG under EC device"

This reverts commit 0e6b6dedf168 ("Revert "ACPI: EC: Evaluate orphan
_REG under EC device") because the problem addressed by it will be
addressed differently in what follows.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Cc: All applicable <[email protected]>
Link: https://patch.msgid.link/[email protected]

btrfs: only run the extent map shrinker from kswapd tasks

Currently the extent map shrinker can be run by any task when attempting
to allocate memory and there's enough memory pressure to trigger it.

To avoid too much latency we stop iterating over extent maps and removing
them once the task needs to reschedule. This logic was introduced in commit
b3ebb9b7e92a ("btrfs: stop extent map shrinker if reschedule is needed").

While that solved high latency problems for some use cases, it's still
not enough because with a too high number of tasks entering the extent map
shrinker code, either due to memory allocations or because they are a
kswapd task, we end up having a very high level of contention on some
spin locks, namely:

1) The fs_info->fs_roots_radix_lock spin lock, which we need to find
   roots to iterate over their inodes;

2) The spin lock of the xarray used to track open inodes for a root
   (struct btrfs_root::inodes) - on 6.10 kernels and below, it used to
   be a red black tree and the spin lock was root->inode_lock;

3) The fs_info->delayed_iput_lock spin lock since the shrinker adds
   delayed iputs (calls btrfs_add_delayed_iput()).

Instead of allowing the extent map shrinker to be run by any task, make
it run only by kswapd tasks. This still solves the problem of running
into OOM situations due to an unbounded extent map creation, which is
simple to trigger by direct IO writes, as described in the changelog
of commit 956a17d9d050 ("btrfs: add a shrinker for extent maps"), and
by a similar case when doing buffered IO on files with a very large
number of holes (keeping the file open and creating many holes, whose
extent maps are only released when the file is closed).

Reported-by: kzd <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219121
Reported-by: Octavia Togami <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMaNSpR556DdLA@mail.gmail.com/
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
CC: [email protected] # 6.10+
Tested-by: kzd <[email protected]>
Tested-by: Octavia Togami <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

ALSA: hda: cs35l41: Remove redundant call to hda_cs_dsp_control_remove()

The driver doesn't create any ALSA controls for firmware controls, so it
shouldn't be calling hda_cs_dsp_control_remove().

commit 312c04cee408 ("ALSA: hda: cs35l41: Stop creating ALSA Controls for
firmware coefficients") removed the call to hda_cs_dsp_add_controls() but
didn't remove the call for destroying those controls.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 312c04cee408 ("ALSA: hda: cs35l41: Stop creating ALSA Controls for firmware coefficients")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type

[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:

[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0

[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.

This may be caused by a memory bit flip.

[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.

So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.

Reported-by: Kota <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc3BG+xae4vTTw@mail.gmail.com/
CC: [email protected] # 5.4+
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: check delayed refs when we're checking if a ref exists

In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot.  This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.

Unfortunately there is a bug, as the check would only check the on-disk
references.  I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.

If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block.  Take the following example

subvolume a exists, and subvolume b is a snapshot of subvolume a.  They
share references to block 1.  Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).

When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.

Then we will start the snapshot deletion of subvolume b.  We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1.  However
because this is a resumed snapshot deletion, we call into
check_ref_exists().  Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue.  This orphans the child of block 1.

The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info().  However we only care about whether the
reference exists or not.  If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head.  If that exists then we know we
can delete the reference safely and carry on.  If it doesn't exist we
know we have to skip over this block.

This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount.  We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue.  Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.

Chris Mason wrote a reproducer which does the following

mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done

On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.

I also used dm-log-writes to capture the state of the failure so I could
debug the problem.  Using the existing failure case to test my patch
validated that it fixes the problem.

Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: [email protected] # 5.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

ALSA: hda: cs35l56: Remove redundant call to hda_cs_dsp_control_remove()

The driver doesn't create any ALSA controls for firmware controls, so it
shouldn't be calling hda_cs_dsp_control_remove().

commit 34e1b1bb7324 ("ALSA: hda: cs35l56: Stop creating ALSA controls for
firmware coefficients") removed the call to hda_cs_dsp_add_controls() but
didn't remove the call for destroying those controls.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 34e1b1bb7324 ("ALSA: hda: cs35l56: Stop creating ALSA controls for firmware coefficients")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings

After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.

This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.

To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.

Cc: [email protected]
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <[email protected]>
Signed-off-by: Long Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

KVM: SVM: Fix an error code in sev_gmem_post_populate()

The copy_from_user() function returns the number of bytes which it
was not able to copy. Return -EFAULT instead.

Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command")
Signed-off-by: Dan Carpenter <[email protected]>
Message-ID: <20240612115040.2423290 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvm-s390-master-6.11-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

Fix invalid gisa designation value when gisa is not in use.
Panic if (un)share fails to maintain security.

Merge tag 'kvmarm-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.11, round #1

- Use kvfree() for the kvmalloc'd nested MMUs array

- Set of fixes to address warnings in W=1 builds

- Make KVM depend on assembler support for ARMv8.4

- Fix for vgic-debug interface for VMs without LPIs

- Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest

- Minor code / comment cleanups for configuring PAuth traps

- Take kvm->arch.config_lock to prevent destruction / initialization
race for a vCPU's CPUIF which may lead to a UAF

KVM: SVM: Fix uninitialized variable bug

If snp_lookup_rmpentry() fails then "assigned" is printed in the error
message but it was never initialized. Initialize it to false.

Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command")
Signed-off-by: Dan Carpenter <[email protected]>
Message-ID: <20240612115040.2423290 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'ath-current-20240812' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

ath.git patch for v6.11

We have a single patch for the next 6.11-rc which introduces a
workaround to ath12k which addresses a WCN7850 hardware issue that
prevents proper operation with unaligned transmit buffers.

wifi: iwlwifi: correctly lookup DMA address in SG table

The code to lookup the scatter gather table entry assumed that it was
possible to use sg_virt() in order to lookup the DMA address in a mapped
scatter gather table. However, this assumption is incorrect as the DMA
mapping code may merge multiple entries into one. In that case, the DMA
address space may have e.g. two consecutive pages which is correctly
represented by the scatter gather list entry, however the virtual
addresses for these two pages may differ and the relationship cannot be
resolved anymore.

Avoid this problem entirely by working with the offset into the mapped
area instead of using virtual addresses. With that we only use the DMA
length and DMA address from the scatter gather list entries. The
underlying DMA/IOMMU code is therefore free to merge two entries into
one even if the virtual addresses space for the area is not continuous.

Fixes: 90db50755228 ("wifi: iwlwifi: use already mapped data when TXing an AMSDU")
Reported-by: Chris Bainbridge <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Benjamin Berg <[email protected]>
Tested-by: Chris Bainbridge <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://patch.msgid.link/[email protected]

wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change

When disabling wifi mt7921_ipv6_addr_change() is called as a notifier.
At this point mvif->phy is already NULL so we cannot use it here.

Signed-off-by: Bert Karwatzki <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://patch.msgid.link/[email protected]

mips: sgi-ip22: Fix the build

Fix a recently introduced build failure.

Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *")
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: riscpc: ecard: Fix the build

Fix a recently introduced build failure.

Cc: Russell King <[email protected]>
Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *")
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: atmel_serial: use the correct RTS flag.

In RS485 mode, the RTS pin is driven high by hardware when the transmitter
is operating. This behaviour cannot be changed. This means that the driver
should claim that it supports SER_RS485_RTS_ON_SEND and not
SER_RS485_RTS_AFTER_SEND.

Otherwise, when configuring the port with the SER_RS485_RTS_ON_SEND, one
get the following warning:

kern.warning kernel: atmel_usart_serial atmel_usart_serial.2.auto:
ttyS1 (1): invalid RTS setting, using RTS_AFTER_SEND instead

which is contradictory with what's really happening.

Signed-off-by: Mathieu Othacehe <[email protected]>
Cc: stable <[email protected]>
Tested-by: Alexander Dahl <[email protected]>
Fixes: af47c491e3c7 ("serial: atmel: Fill in rs485_supported")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: vt: conmakehash: remove non-portable code printing comment header

Commit 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no
longer in env") included <linux/limits.h>, which invoked another
(wrong) patch that tried to address a build error on macOS.

According to the specification [1], the correct header to use PATH_MAX
is <limits.h>.

The minimal fix would be to replace <linux/limits.h> with <limits.h>.

However, the following commits seem questionable to me:

- 3bd85c6c97b2 ("tty: vt: conmakehash: Don't mention the full path of the input in output")
- 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env")

These commits made too many efforts to cope with a comment header in
drivers/tty/vt/consolemap_deftbl.c:

  /*
   * Do not edit this file; it was automatically generated by
   *
   * conmakehash drivers/tty/vt/cp437.uni > [this file]
   *
   */

With this commit, the header part of the generate C file will be
simplified as follows:

  /*
   * Automatically generated file; Do not edit.
   */

BTW, another series of excessive efforts for a comment header can be
seen in the following:

- 5ef6dc08cfde ("lib/build_OID_registry: don't mention the full path of the script in output")
- 2fe29fe94563 ("lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat")

[1]: https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html

Fixes: 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env")
Cc: stable <[email protected]>
Reported-by: Daniel Gomez <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: serial: fsl_lpuart: mark last busy before uart_add_one_port

With "earlycon initcall_debug=1 loglevel=8" in bootargs, kernel
sometimes boot hang. It is because normal console still is not ready,
but runtime suspend is called, so early console putchar will hang
in waiting TRDE set in UARTSTAT.

The lpuart driver has auto suspend delay set to 3000ms, but during
uart_add_one_port, a child device serial ctrl will added and probed with
its pm runtime enabled(see serial_ctrl.c).
The runtime suspend call path is:
device_add
     |-> bus_probe_device
           |->device_initial_probe
           |->__device_attach
                         |-> pm_runtime_get_sync(dev->parent);
|-> pm_request_idle(dev);
|-> pm_runtime_put(dev->parent);

So in the end, before normal console ready, the lpuart get runtime
suspended. And earlycon putchar will hang.

To address the issue, mark last busy just after pm_runtime_enable,
three seconds is long enough to switch from bootconsole to normal
console.

Fixes: 43543e6f539b ("tty: serial: fsl_lpuart: Add runtime pm support")
Cc: stable <[email protected]>
Signed-off-by: Peng Fan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

iommu: Remove unused declaration iommu_sva_unbind_gpasid()

Commit 0c9f17877891 ("iommu: Remove guest pasid related interfaces and definitions")
removed the implementation but leave declaration.

Signed-off-by: Yue Haibing <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>

usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]

Add LJCA GPIO support for the Lunar Lake platform.

New HID taken from out of tree ivsc-driver git repo.

Link: https://github.com/intel/ivsc-driver/commit/47e7c4a446c8ea8c741ff5a32fa7b19f9e6fd47e
Cc: stable <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"

This reverts commit bf20c69cf3cf9c6445c4925dd9a8a6ca1b78bfdf.

During tcpm_init() stage, if the VBUS is still present after
tcpm_reset_port(), then we assume that VBUS will off and goto safe0v
after a specific discharge time. Following a TCPM_VBUS_EVENT event if
VBUS reach to off state. TCPM_VBUS_EVENT event may be set during
PORT_RESET handling stage. If pd_events reset to 0 after TCPM_VBUS_EVENT
set, we will lost this VBUS event. Then the port state machine may stuck
at one state.

Before:

[    2.570172] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS]
[    2.570179] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms]
[    2.570182] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS]
[    3.490213] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [delayed 920 ms]
[    3.490220] Start toggling
[    3.546050] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected]
[    3.546057] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]

After revert this patch, we can see VBUS off event and the port will goto
expected state.

[    2.441992] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS]
[    2.441999] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms]
[    2.442002] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS]
[    2.442122] VBUS off
[    2.442125] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [rev1 NONE_AMS]
[    2.442127] VBUS VSAFE0V
[    2.442351] CC1: 0 -> 0, CC2: 0 -> 0 [state SNK_UNATTACHED, polarity 0, disconnected]
[    2.442357] Start toggling
[    2.491850] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected]
[    2.491858] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
[    2.491863] pending state change SRC_ATTACH_WAIT -> SNK_TRY @ 200 ms [rev1 NONE_AMS]
[    2.691905] state change SRC_ATTACH_WAIT -> SNK_TRY [delayed 200 ms]

Fixes: bf20c69cf3cf ("usb: typec: tcpm: clear pd_event queue in PORT_RESET")
Cc: [email protected]
Signed-off-by: Xu Yang <[email protected]>
Acked-by: Heikki Krogerus <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: typec: ucsi: Fix the return value of ucsi_run_command()

The command execution routines need to return the amount of
data that was transferred when succesful.

This fixes an issue where the alternate modes and the power
delivery capabilities are not getting registered.

Fixes: 5e9c1662a89b ("usb: typec: ucsi: rework command execution functions")
Signed-off-by: Heikki Krogerus <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: xhci: fix duplicate stall handling in handle_tx_event()

Stall handling is managed in the 'process_*' functions, which are called
right before the 'goto' stall handling code snippet. Thus, there should
be a return after the 'process_*' functions. Otherwise, the stall code may
run twice.

Fixes: 1b349f214ac7 ("usb: xhci: add 'goto' for halted endpoint check in handle_tx_event()")
Reported-by: Michal Pecio <[email protected]>
Signed-off-by: Niklas Neronin <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()

If xhci_mem_init() fails, it calls into xhci_mem_cleanup() to mop
up the damage. If it fails early enough, before xhci->interrupters
is allocated but after xhci->max_interrupters has been set, which
happens in most (all?) cases, things get uglier, as xhci_mem_cleanup()
unconditionally derefences xhci->interrupters. With prejudice.

Gate the interrupt freeing loop with a check on xhci->interrupters
being non-NULL.

Found while debugging a DMA allocation issue that led the XHCI driver
on this exact path.

Fixes: c99b38c41234 ("xhci: add support to allocate several interrupters")
Cc: Mathias Nyman <[email protected]>
Cc: Wesley Cheng <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Cc: [email protected] # 6.8+
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'thunderbolt-for-v6.11-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus

thunderbolt: Fixes for v6.11-rc3

This includes following USB4/Thunderbolt fixes for v6.11-rc3:

  - Fix memory leak in debugfs sideband register access
  - Fix hang when host router NVM is upgraded and there is another host
    connected.

Both have been in linux-next with no reported issues.

* tag 'thunderbolt-for-v6.11-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
  thunderbolt: Mark XDomain as unplugged when router is removed
  thunderbolt: Fix memory leaks in {port|retimer}_sb_regs_write()

char: xillybus: Don't destroy workqueue from work item running on it

Triggered by a kref decrement, destroy_workqueue() may be called from
within a work item for destroying its own workqueue. This illegal
situation is averted by adding a module-global workqueue for exclusive
use of the offending work item. Other work items continue to be queued
on per-device workqueues to ensure performance.

Reported-by: [email protected]
Cc: stable <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Eli Billauer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: hda/tas2781: fix wrong calibrated data order

Wrong calibration data order cause sound too low in some device.
Fix wrong calibrated data order, add calibration data converssion
by get_unaligned_be32() after reading from UEFI.

Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver")
Cc: <[email protected]>
Signed-off-by: Baojun Xu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

memcg_write_event_control(): fix a user-triggerable oops

we are *not* guaranteed that anything past the terminating NUL
is mapped (let alone initialized with anything sane).

Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Al Viro <[email protected]>

net: macb: Use rcu_dereference() for idev->ifa_list in macb_suspend().

In macb_suspend(), idev->ifa_list is fetched with rcu_access_pointer()
and later the pointer is dereferenced as ifa->ifa_local.

So, idev->ifa_list must be fetched with rcu_dereference().

Fixes: 0cb8de39a776 ("net: macb: Add ARP support to WOL")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/bpf: Add a test to verify previous stacksafe() fix

A selftest is added such that without the previous patch,
a crash can happen. With the previous patch, the test can
run successfully. The new test is written in a way which
mimics original crash case:
  main_prog
    static_prog_1
      static_prog_2
where static_prog_1 has different paths to static_prog_2
and some path has stack allocated and some other path
does not. A stacksafe() checking in static_prog_2()
triggered the crash.

Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: Fix a kernel verifier crash in stacksafe()

Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
Further investigation shows that the crash is due to invalid memory access
in stacksafe(). More specifically, it is the following code:

    if (exact != NOT_EXACT &&
        old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
        cur->stack[spi].slot_type[i % BPF_REG_SIZE])
            return false;

The 'i' iterates old->allocated_stack.
If cur->allocated_stack < old->allocated_stack the out-of-bound
access will happen.

To fix the issue add 'i >= cur->allocated_stack' check such that if
the condition is true, stacksafe() should fail. Otherwise,
cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.

Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Cc: Eduard Zingerman <[email protected]>
Reported-by: Daniel Hodges <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocations

Commit fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware
and firmware buffers") added mpi3mr_alloc_diag_bufs() which calls
dma_alloc_coherent() to allocate the trace buffer and the firmware
buffer. mpi3mr_alloc_diag_bufs() decides the buffer sizes from the driver
configuration. In my environment, the sizes are 8MB. With the sizes,
dma_alloc_coherent() fails and report this WARNING:

WARNING: CPU: 4 PID: 438 at mm/page_alloc.c:4676 __alloc_pages_noprof+0x52f/0x640

The WARNING indicates that the order of the allocation size is larger than
MAX_PAGE_ORDER. After this failure, mpi3mr_alloc_diag_bufs() reduces the
buffer sizes and retries dma_alloc_coherent(). In the end, the buffer
allocations succeed with 4MB size in my environment, which corresponds to
MAX_PAGE_ORDER=10. Though the allocations succeed, the WARNING message is
misleading and should be avoided.

To avoid the WARNING, check the orders of the buffer allocation sizes
before calling dma_alloc_coherent(). If the orders are larger than
MAX_PAGE_ORDER, fall back to the retry path.

Fixes: fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Sathya Prakash Veerichetty <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

powerpc/topology: Check if a core is online

topology_is_core_online() checks if the core a CPU belongs to
is online. The core is online if at least one of the sibling
CPUs is online. The first CPU of an online core is also online
in the common case, so this should be fairly quick.

Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support")
Signed-off-by: Nysal Jan K.A <[email protected]>
Reviewed-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]

cpu/SMT: Enable SMT only if a core is online

If a core is offline then enabling SMT should not online CPUs of
this core. By enabling SMT, what is intended is either changing the SMT
value from "off" to "on" or setting the SMT level (threads per core) from a
lower to higher value.

On PowerPC the ppc64_cpu utility can be used, among other things, to
perform the following functions:

ppc64_cpu --cores-on                # Get the number of online cores
ppc64_cpu --cores-on=X              # Put exactly X cores online
ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline
ppc64_cpu --smt={on|off|value}      # Enable, disable or change SMT level

If the user has decided to offline certain cores, enabling SMT should
not online CPUs in those cores. This patch fixes the issue and changes
the behaviour as described, by introducing an arch specific function
topology_is_core_online(). It is currently implemented only for PowerPC.

Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support")
Reported-by: Tyrel Datwyler <[email protected]>
Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
Signed-off-by: Nysal Jan K.A <[email protected]>
Reviewed-by: Shrikanth Hegde <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]

bpf: Fix updating attached freplace prog in prog_array map

The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
fixed a NULL pointer dereference panic, but didn't fix the issue that
fails to update attached freplace prog to prog_array map.

Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
freplace prog and its target prog are able to tail call each other.

And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach")
sets prog->aux->dst_prog as NULL after attaching freplace prog to its
target prog.

After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS.
Then, after attaching freplace its prog->aux->dst_prog is NULL.
Then, while updating freplace in prog_array the bpf_prog_map_compatible()
incorrectly returns false because resolve_prog_type() returns
BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS.
After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS
and update to prog_array can succeed.

Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
Cc: Toke Høiland-Jørgensen <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lock

Commit fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware
and firmware buffers") added the spinlock trigger_lock to the struct
mpi3mr_ioc. However, spin_lock_init() call was not added for it, then the
lock does not work as expected. Also, the kernel reports the message below
when lockdep is enabled.

    INFO: trying to register non-static key.
    The code is fine but needs lockdep annotation, or maybe
    you didn't initialize this object before use?

To fix the issue and to avoid the INFO message, add the missing
spin_lock_init() call.

Fixes: fc4444941140 ("scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffers")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Sathya Prakash Veerichetty <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags

The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly.  The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct.  Further, we secondarily sample the
cache state and make contradictory decisions later.

The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.

Fix this in the following way:

(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
     ->init_request() rather than trying to juggle that in
     netfs_alloc_request().

(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
     to be done, then PG_private_2 is to be used rather than only setting
     it if we decide to cache and then having netfs_rreq_unlock_folios()
     set the non-PG_private_2 writeback-to-cache if it wasn't set.

(3) Split netfs_rreq_unlock_folios() into two functions, one of which
     contains the deprecated code for using PG_private_2 to avoid
     accidentally doing the writeback path - and always use it if
     USE_PGPRIV2 is set.

(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
     wait for PG_private_2.  This function is deprecated and only used by
     ceph anyway, and so label it so.

(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
     fscache_operation_valid() on the cache_resources instead.  This has
     the advantage of picking up the result of netfs_begin_cache_read() and
     fscache_begin_write_operation() - which are called after the object is
     initialised and will wait for the cache to come to a usable state.

Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that.  Without this as well, things like:

rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {

and:

WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386

may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.

Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Ilya Dryomov <[email protected]>
cc: Xiubo Li <[email protected]>
cc: Hristo Venev <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"

This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.

Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.

Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Reported-by: Max Kellermann <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Ilya Dryomov <[email protected]>
cc: Xiubo Li <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <[email protected]>

file: fix typo in take_fd() comment

The explanatory comment above take_fd() contains a typo, fix that to not
confuse readers.

Signed-off-by: Mathias Krause <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

pidfd: prevent creation of pidfds for kthreads

It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.

Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

netfs: clean up after renaming FSCACHE_DEBUG config

Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to
CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First,
NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas
the actual intended config is called NETFS_SUPPORT. Second, the config
renaming misses to adjust the documentation of the functionality of this
config.

Clean up those two points.

Signed-off-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

libfs: fix infinite directory reads for offset dir

After we switch tmpfs dir operations from simple_dir_operations to
simple_offset_dir_operations, every rename happened will fill new dentry
to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
key starting with octx->newx_offset, and then set newx_offset equals to
free key + 1. This will lead to infinite readdir combine with rename
happened at the same time, which fail generic/736 in xfstests(detail show
as below).

1. create 5000 files(1 2 3...) under one dir
2. call readdir(man 3 readdir) once, and get one entry
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
4. loop 2~3, until readdir return nothing or we loop too many
times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
directory reads") to fix it, record the last_index when we open dir, and
do not emit the entry which index >= last_index. The file->private_data
now used in offset dir can use directly to do this, and we also update
the last_index when we llseek the dir file.

Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: yangerkun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Chuck Lever <[email protected]>
[brauner: only update last_index after seek when offset is zero like Jan suggested]
Signed-off-by: Christian Brauner <[email protected]>

nsfs: fix ioctl declaration

The kernel is writing an object of type __u64, so the ioctl has to be
defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5).

Reported-by: Dmitry V. Levin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

fs/netfs/fscache_cookie: add missing "n_accesses" check

This fixes a NULL pointer dereference bug due to a data race which
looks like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
  Workqueue: events_unbound netfs_rreq_write_to_cache_work
  RIP: 0010:cachefiles_prepare_write+0x30/0xa0
  Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
  RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
  RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
  RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
  RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
  R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
  R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
  FS:  0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
  Call Trace:
   <TASK>
   ? __die+0x1f/0x70
   ? page_fault_oops+0x15d/0x440
   ? search_module_extables+0xe/0x40
   ? fixup_exception+0x22/0x2f0
   ? exc_page_fault+0x5f/0x100
   ? asm_exc_page_fault+0x22/0x30
   ? cachefiles_prepare_write+0x30/0xa0
   netfs_rreq_write_to_cache_work+0x135/0x2e0
   process_one_work+0x137/0x2c0
   worker_thread+0x2e9/0x400
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xcc/0x100
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
   </TASK>
  Modules linked in:
  CR2: 0000000000000008
  ---[ end trace 0000000000000000 ]---

This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.

At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:

  struct cachefiles_object *object = cachefiles_cres_object(cres);

The next line crashes, obviously:

  struct cachefiles_cache *cache = object->volume->cache;

During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()).  The cookie must not be
withdrawn until it drops to zero.

The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").

This patch adds the missing check.  With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.

Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Signed-off-by: Max Kellermann <[email protected]>
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

filelock: fix name of file_lease slab cache

When struct file_lease was split out from struct file_lock, the name of
the file_lock slab cache was copied to the new slab cache for
file_lease. This name conflict causes confusion in /proc/slabinfo and
/sys/kernel/slab. In particular, it caused failures in drgn's test case
for slab cache merging.

Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81
Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock")
Signed-off-by: Omar Sandoval <[email protected]>
Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com
Reviewed-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

netfs: Fault in smaller chunks for non-large folio mappings

As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large
folio mappings"), we can see a performance loss for filesystems
which have not yet been converted to large folios.

Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

io_uring/napi: remove duplicate io_napi_entry timeout assignation

io_napi_entry() has 2 calling sites. One of them is unlikely to find an
entry and if it does, the timeout should arguable not be updated.

The other io_napi_entry() calling site is overwriting the update made
by io_napi_entry() so the io_napi_entry() timeout value update has no or
little value and therefore is removed.

Signed-off-by: Olivier Langlois <[email protected]>
Link: https://lore.kernel.org/r/145b54ff179f87609e20dffaf5563c07cdbcad1a.1723423275.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring/napi: check napi_enabled in io_napi_add() before proceeding

doing so avoids the overhead of adding napi ids to all the rings that do
not enable napi.

if no id is added to napi_list because napi is disabled,
__io_napi_busy_loop() will not be called.

Signed-off-by: Olivier Langlois <[email protected]>
Fixes: b4ccc4dd1330 ("io_uring/napi: enable even with a timeout of 0")
Link: https://lore.kernel.org/r/bd989ccef5fda14f5fd9888faf4fefcf66bd0369.1723400131.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd: fix error recovery leading to data corruption on ESE devices

Extent Space Efficient (ESE) or thin provisioned volumes need to be
formatted on demand during usual IO processing.

The dasd_ese_needs_format function checks for error codes that signal
the non existence of a proper track format.

The check for incorrect length is to imprecise since other error cases
leading to transport of insufficient data also have this flag set.
This might lead to data corruption in certain error cases for example
during a storage server warmstart.

Fix by removing the check for incorrect length and replacing by
explicitly checking for invalid track format in transport mode.

Also remove the check for file protected since this is not a valid
ESE handling case.

Cc: [email protected] # 5.3+
Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd: Remove DMA alignment

This reverts commit bc792884b76f ("s390/dasd: Establish DMA alignment").

Quoting the original commit:
    linux-next commit bf8d08532bc1 ("iomap: add support for dma aligned
    direct-io") changes the alignment requirement to come from the block
    device rather than the block size, and the default alignment
    requirement is 512-byte boundaries. Since DASD I/O has page
    alignments for IDAW/TIDAW requests, let's override this value to
    restore the expected behavior.

I mentioned TIDAW, but that was wrong. TIDAWs have no distinct alignment
requirement (per p. 15-70 of POPS SA22-7832-13):

   Unless otherwise specified, TIDAWs may designate
   a block of main storage on any boundary and length
   up to 4K bytes, provided the specified block does not
   cross a 4 K-byte boundary.

IDAWs do, but the original commit neglected that while ECKD DASD are
typically formatted in 4096-byte blocks, they don't HAVE to be. Formatting
an ECKD volume with smaller blocks is permitted (dasdfmt -b xxx), and the
problematic commit enforces alignment properties to such a device that
will result in errors, such as:

   [test@host ~]# lsdasd -l a367 | grep blksz
     blksz: 512
   [test@host ~]# mkfs.xfs -f /dev/disk/by-path/ccw-0.0.a367-part1
   meta-data=/dev/dasdc1            isize=512    agcount=4, agsize=230075 blks
            =                       sectsz=512   attr=2, projid32bit=1
            =                       crc=1        finobt=1, sparse=1, rmapbt=1
            =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
   data     =                       bsize=4096   blocks=920299, imaxpct=25
            =                       sunit=0      swidth=0 blks
   naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
   log      =internal log           bsize=4096   blocks=16384, version=2
            =                       sectsz=512   sunit=0 blks, lazy-count=1
   realtime =none                   extsz=4096   blocks=0, rtextents=0
   error reading existing superblock: Invalid argument
   mkfs.xfs: pwrite failed: Invalid argument
   libxfs_bwrite: write failed on (unknown) bno 0x70565c/0x100, err=22
   mkfs.xfs: Releasing dirty buffer to free list!
   found dirty buffer (bulk) on free list!
   mkfs.xfs: pwrite failed: Invalid argument
   ...snipped...

The original commit omitted the FBA discipline for just this reason,
but the formatted block size of the other disciplines was overlooked.
The solution to all of this is to revert to the original behavior,
such that the block size can be respected. There were two commits [1]
that moved this code in the interim, so a straight git-revert is not
possible, but the change is straightforward.

But what of the original problem? That was manifested with a direct-io
QEMU guest, where QEMU itself was changed a month or two later with
commit 25474d90aa ("block: use the request length for iov alignment")
such that the blamed kernel commit is unnecessary.

[1] commit 0127a47f58c6 ("dasd: move queue setup to common code")
    commit fde07a4d74e3 ("dasd: use the atomic queue limits API")

Fixes: bc792884b76f ("s390/dasd: Establish DMA alignment")
Reviewed-by: Stefan Haberland <[email protected]>
Signed-off-by: Eric Farman <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"While the ideapad concurrency fix itself is relatively
  straightforward, it required moving code around and adding a bit of
  supporting infrastructure to have a clean inter-driver interface. This
  shows up in the diffstats.

   - ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex

   - amd/pmf: Query HPD data also when ALS is disabled"

* tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
  platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
  platform/x86: ideapad-laptop: introduce a generic notification chain
  platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull fd bitmap fix from Al Viro:
"Fix bitmap corruption on close_range() by cleaning up
copy_fd_bitmaps()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE

drm/v3d: Fix out-of-bounds read in `v3d_csd_job_run()`

When enabling UBSAN on Raspberry Pi 5, we get the following warning:

[  387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
[  387.903868] index 7 is out of range for type '__u32 [7]'
[  387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G        WC         6.10.3-v8-16k-numa #151
[  387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
[  387.932525] Call trace:
[  387.935296]  dump_backtrace+0x170/0x1b8
[  387.939403]  show_stack+0x20/0x38
[  387.942907]  dump_stack_lvl+0x90/0xd0
[  387.946785]  dump_stack+0x18/0x28
[  387.950301]  __ubsan_handle_out_of_bounds+0x98/0xd0
[  387.955383]  v3d_csd_job_run+0x3a8/0x438 [v3d]
[  387.960707]  drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[  387.966862]  process_one_work+0x62c/0xb48
[  387.971296]  worker_thread+0x468/0x5b0
[  387.975317]  kthread+0x1c4/0x1e0
[  387.978818]  ret_from_fork+0x10/0x20
[  387.983014] ---[ end trace ]---

This happens because the UAPI provides only seven configuration
registers and we are reading the eighth position of this u32 array.

Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
accessing only seven positions on the '__u32 [7]' array. The eighth
register exists indeed on V3D 7.1, but it isn't currently used. That
being so, let's guarantee that it remains unused and add a note that it
could be set in a future patch.

Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x")
Reported-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()

When there are multiple ap interfaces on one band and with WED on,
turning the interface down will cause a kernel panic on MT798X.

Previously, cb_priv was freed in mtk_wed_setup_tc_block() without
marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too.

Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL
in mtk_wed_setup_tc_block_cb().

----------
Unable to handle kernel paging request at virtual address 0072460bca32b4f5
Call trace:
mtk_wed_setup_tc_block_cb+0x4/0x38
0xffffffc0794084bc
tcf_block_playback_offloads+0x70/0x1e8
tcf_block_unbind+0x6c/0xc8
...
---------

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Zheng Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mana: Fix RX buf alloc_size alignment and atomic op panic

The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to
create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment
is affected by the alloc_size passed into napi_build_skb(). The size needs
to be aligned properly for better performance and atomic operations.
Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic
operations may panic on the skb_shinfo(skb)->dataref due to alignment fault.

To fix this bug, add proper alignment to the alloc_size calculation.

Sample panic info:
[  253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce
[  253.300900] Mem abort info:
[  253.301760]   ESR = 0x0000000096000021
[  253.302825]   EC = 0x25: DABT (current EL), IL = 32 bits
[  253.304268]   SET = 0, FnV = 0
[  253.305172]   EA = 0, S1PTW = 0
[  253.306103]   FSC = 0x21: alignment fault
Call trace:
__skb_clone+0xfc/0x198
skb_clone+0x78/0xe0
raw6_local_deliver+0xfc/0x228
ip6_protocol_deliver_rcu+0x80/0x500
ip6_input_finish+0x48/0x80
ip6_input+0x48/0xc0
ip6_sublist_rcv_finish+0x50/0x78
ip6_sublist_rcv+0x1cc/0x2b8
ipv6_list_rcv+0x100/0x150
__netif_receive_skb_list_core+0x180/0x220
netif_receive_skb_list_internal+0x198/0x2a8
__napi_poll+0x138/0x250
net_rx_action+0x148/0x330
handle_softirqs+0x12c/0x3a0

Cc: [email protected]
Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame")
Signed-off-by: Haiyang Zhang <[email protected]>
Reviewed-by: Long Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys

Add missed property phys, which indicate how connect to serdes phy.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-lx2160a-honeycomb.dtb: fsl-mc@80c000000: dpmacs:ethernet@7: Unevaluated properties are not allowed ('phys' was unexpected)

Signed-off-by: Frank Li <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL

Booting with CONFIG_DEBUG_VIRTUAL leads to following warning when
passing hugepage reservation on command line:

  Kernel command line: hugepagesz=1g hugepages=1 hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1 noreboot
  HugeTLB: allocating 1 of page size 1.00 GiB failed.  Only allocated 0 hugepages.
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/io.h:948 __alloc_bootmem_huge_page+0xd4/0x284
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc6-00396-g6b0e82791bd0-dirty #936
  Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS
  NIP:  c1020240 LR: c10201d0 CTR: 00000000
  REGS: c13fdd30 TRAP: 0700   Not tainted  (6.10.0-rc6-00396-g6b0e82791bd0-dirty)
  MSR:  00021000 <CE,ME>  CR: 44084288  XER: 20000000

  GPR00: c10201d0 c13fde20 c130b560 e8000000 e8001000 00000000 00000000 c1420000
  GPR08: 00000000 00028001 00000000 00000004 44084282 01066ac0 c0eb7c9c efffe149
  GPR16: c0fc4228 0000005f ffffffff c0eb7d0c c0eb7cc0 c0eb7ce0 ffffffff 00000000
  GPR24: c1441cec efffe153 e8001000 c14240c0 00000000 c1441d64 00000000 e8000000
  NIP [c1020240] __alloc_bootmem_huge_page+0xd4/0x284
  LR [c10201d0] __alloc_bootmem_huge_page+0x64/0x284
  Call Trace:
  [c13fde20] [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 (unreliable)
  [c13fde50] [c10207b8] hugetlb_hstate_alloc_pages+0x8c/0x3e8
  [c13fdeb0] [c1021384] hugepages_setup+0x240/0x2cc
  [c13fdef0] [c1000574] unknown_bootoption+0xfc/0x280
  [c13fdf30] [c0078904] parse_args+0x200/0x4c4
  [c13fdfa0] [c1000d9c] start_kernel+0x238/0x7d0
  [c13fdff0] [c0000434] set_ivor+0x12c/0x168
  Code: 554aa33e 7c042840 3ce0c142 80a7427c 5109a016 50caa016 7c9a2378 7fdcf378 4180000c 7c052040 41810160 7c095040 <0fe00000> 38c00000 40800108 3c60c0eb
  ---[ end trace 0000000000000000 ]---

This is due to virt_addr_valid() using high_memory before it is set.

high_memory is set in mem_init() using max_low_pfn, but max_low_pfn
is available long before, it is set in mem_topology_setup(). So just
like commit daa9ada2093e ("powerpc/mm: Fix boot crash with FLATMEM")
moved the setting of max_mapnr immediately after the call to
mem_topology_setup(), the same can be done for high_memory.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/62b69c4baad067093f39e7e60df0fe27a86b8d2a.1723100702.git.christophe.leroy@csgroup.eu

powerpc/mm: Fix size of allocated PGDIR

Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx
(32 bits)") increased the size of PGD entries but failed to increase
the PGD directory.

Use the size of pgd_t instead of the size of pointers to calculate
the allocated size.

Reported-by: Guenter Roeck <[email protected]>
Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)")
Signed-off-by: Christophe Leroy <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/1cdaacb391cbd3e0240f0e0faf691202874e9422.1723109462.git.christophe.leroy@csgroup.eu

Merge branch 'vsc73xx-fix-mdio-and-phy'

Pawel Dembicki says:

====================
net: dsa: vsc73xx: fix MDIO bus access and PHY opera

This series are extracted patches from net-next series [0].

The VSC73xx driver has issues with PHY configuration. This patch series
fixes most of them.

The first patch synchronizes the register configuration routine with the
datasheet recommendations.

Patches 2-3 restore proper communication on the MDIO bus. Currently,
the write value isn't sent to the MDIO register, and without a busy check,
communication with the PHY can be interrupted. This causes the PHY to
receive improper configuration and autonegotiation could fail.

The fourth patch removes the PHY reset blockade, as it is no longer
required.

After fixing the MDIO operations, autonegotiation became possible.
The last patch removes the blockade, which became unnecessary after
the MDIO operations fix.

[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=874739&state=%2A&archive=both
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: vitesse: repair vsc73xx autonegotiation

When the vsc73xx mdio bus work properly, the generic autonegotiation
configuration works well.

Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Pawel Dembicki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: vsc73xx: allow phy resetting

Resetting the VSC73xx PHY was problematic because the MDIO bus, without
a busy check, read and wrote incorrect register values.

My investigation indicates that resetting the PHY only triggers changes
in configuration. However, improper register values written earlier
were only exposed after a soft reset.

The reset itself wasn't the issue; rather, the problem stemmed from
incorrect read and write operations.

A 'soft_reset' can now proceed normally. There are no reasons to keep
the VSC73xx from being reset.

This commit removes the reset blockade in the 'vsc73xx_phy_write'
function.

Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Pawel Dembicki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: vsc73xx: check busy flag in MDIO operations

The VSC73xx has a busy flag used during MDIO operations. It is raised
when MDIO read/write operations are in progress. Without it, PHYs are
misconfigured and bus operations do not work as expected.

Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Pawel Dembicki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: vsc73xx: pass value in phy_write operation

In the 'vsc73xx_phy_write' function, the register value is missing,
and the phy write operation always sends zeros.

This commit passes the value variable into the proper register.

Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Pawel Dembicki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: vsc73xx: fix port MAC configuration in full duplex mode

According to the datasheet description ("Port Mode Procedure" in 5.6.2),
the VSC73XX_MAC_CFG_WEXC_DIS bit is configured only for half duplex mode.

The WEXC_DIS bit is responsible for MAC behavior after an excessive
collision. Let's set it as described in the datasheet.

Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Pawel Dembicki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: axienet: Fix register defines comment description

In axiethernet header fix register defines comment description to be
inline with IP documentation. It updates MAC configuration register,
MDIO configuration register and frame filter control description.

Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: idt77252: prevent use after free in dequeue_rx()

We can't dereference "skb" after calling vcc->push() because the skb
is released.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm: panel-orientation-quirks: Add quirk for Ayn Loki Max

Add quirk orientation for Ayn Loki Max model.

This has been tested by JELOS team that uses their
own patched kernel for a while now and confirmed by
users in the ChimeraOS discord servers.

Signed-off-by: Bouke Sybren Haarsma <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm: panel-orientation-quirks: Add quirk for Ayn Loki Zero

Add quirk orientation for the Ayn Loki Zero.

This also has been tested/used by the JELOS team.

Signed-off-by: Bouke Sybren Haarsma <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET

Audio control requests that sets sampling frequency sometimes fail on
this card. Adding delay between control messages eliminates that problem.

Signed-off-by: Lianqin Hu <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/TYUPR06MB6217FF67076AF3E49E12C877D2842@TYUPR06MB6217.apcprd06.prod.outlook.com
Signed-off-by: Takashi Iwai <[email protected]>

Merge branch 'topic/cirrus-hp-g12' into for-linus

Pull Cirrus HD-audio quirks for HP G12 laptops.

Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek: Add support for new HP G12 laptops

Some of these laptop models have quirk IDs that are identical but have
different amplifier parts fitted, this difference is described in the
ACPI information.

The solution introduced for this product family can derive the required
component binding information from ACPI instead of hardcoding it,
supports the new variants of the CS35L56 being used and has generalized
naming that makes it applicable to other ALC+amp combinations.

Signed-off-by: Simon Trimmer <[email protected]>
Signed-off-by: Richard Fitzgerald <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'spi-acpi-lookup-dummy' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into topic/cirrus-hp-g12

spi: Add empty versions of ACPI lookup functions

A patch from Richard Fitzgerald adding dummy versions of the ACPI lookup
functions for SPI:

    Provide empty versions of acpi_spi_count_resources(),
    acpi_spi_device_alloc() and acpi_spi_find_controller_by_adev()
    if the real functions are not being built.

    This commit fixes two problems with the original definitions:

    1) There wasn't an empty version of these functions
    2) The #if only depended on CONFIG_ACPI. But the functions are implemented
       in the core spi.c so CONFIG_SPI_MASTER must also be enabled for the real
       functions to exist.

Linux 6.11-rc3

Merge tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

- Fix 32-bit PTI for real.

   pti_clone_entry_text() is called twice, once before initcalls so that
   initcalls can use the user-mode helper and then again after text is
   set read only. Setting read only on 32-bit might break up the PMD
   mapping, which makes the second invocation of pti_clone_entry_text()
   find the mappings out of sync and failing.

   Allow the second call to split the existing PMDs in the user mapping
   and synchronize with the kernel mapping.

- Don't make acpi_mp_wake_mailbox read-only after init as the mail box
   must be writable in the case that CPU hotplug operations happen after
   boot. Otherwise the attempt to start a CPU crashes with a write to
   read only memory.

- Add a missing sanity check in mtrr_save_state() to ensure that the
   fixed MTRR MSRs are supported.

   Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
   the WARN_ON() can bring systems down when panic on warn is set.

* tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mtrr: Check if fixed MTRRs exist before saving them
  x86/paravirt: Fix incorrect virt spinlock setting on bare metal
  x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
  x86/mm: Fix PTI for i386 some more

Merge tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull time keeping fixes from Thomas Gleixner:

- Fix a couple of issues in the NTP code where user supplied values are
   neither sanity checked nor clamped to the operating range. This
   results in integer overflows and eventualy NTP getting out of sync.

   According to the history the sanity checks had been removed in favor
   of clamping the values, but the clamping never worked correctly under
   all circumstances. The NTP people asked to not bring the sanity
   checks back as it might break existing applications.

   Make the clamping work correctly and add it where it's missing

- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
   so it can adjust and if the clock was set into the future expire
   timers if needed. The caller should provide a bitmask to tell
   hrtimers which clocks have been adjusted.

   adjtimex() uses not the proper constant and uses CLOCK_REALTIME
   instead, which is 0. So hrtimers adjusts only the clocks, but does
   not check for expired timers, which might make them expire really
   late. Use the proper bitmask constant instead.

* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
  ntp: Safeguard against time_constant overflow
  ntp: Clamp maxerror and esterror to operating range

Merge tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:

   - The interrupt core fails to honor caller supplied affinity hints
     for non-managed interrupts and uses the system default affinity on
     startup instead. Set the missing flag in the descriptor to tell the
     core to use the provided affinity.

   - Fix a shift out of bounds error in the Xilinx driver

   - Handle switching to level trigger correctly in the RISCV APLIC
     driver. It failed to retrigger the interrupt which causes it to
     become stale"

* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
  irqchip/xilinx: Fix shift out of bounds
  genirq/irqdesc: Honor caller provided affinity in alloc_desc()

Merge tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported issues for
  6.11-rc3. Included in here are:

   - usb serial driver MODULE_DESCRIPTION() updates

   - usb serial driver fixes

   - typec driver fixes

   - usb-ip driver fix

   - gadget driver fixes

   - dt binding update

  All of these have been in linux-next with no reported issues"

* tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
  usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
  usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
  usb: gadget: f_fs: restore ffs_func_disable() functionality
  USB: serial: debug: do not echo input by default
  usb: typec: tipd: Delete extra semi-colon
  usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
  usb: gadget: u_serial: Set start_delayed during suspend
  usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
  usb: typec: fsa4480: Check if the chip is really there
  usb: gadget: core: Check for unset descriptor
  usb: vhci-hcd: Do not drop references before new references are gained
  usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
  usb: gadget: midi2: Fix the response for FB info with block 0xff
  dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
  USB: serial: garmin_gps: use struct_size() to allocate pkt
  USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
  USB: serial: add missing MODULE_DESCRIPTION() macros
  USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'

Merge tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported problems
  for 6.11-rc3. Included in here are:

   - sc16is7xx serial driver fixes

   - uartclk bugfix for a divide by zero issue

   - conmakehash userspace build issue fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: vt: conmakehash: cope with abs_srctree no longer in env
  serial: sc16is7xx: fix invalid FIFO access with special register set
  serial: sc16is7xx: fix TX fifo corruption
  serial: core: check uartclk for zero to avoid divide by zero

Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / documentation fixes from Greg KH:
"Here are some small fixes, and some documentation updates for
  6.11-rc3. Included in here are:

   - embargoed hardware documenation updates based on a lot of review by
     legal-types in lots of companies to try to make the process a _bit_
     easier for us to manage over time.

   - rust firmware documentation fix

   - driver detach race fix for the fix that went into 6.11-rc1

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Fix uevent_show() vs driver detach race
  Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
  Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
  rust: firmware: fix invalid rustdoc link

Merge tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
"Here are some small char/misc/other driver fixes for 6.11-rc3 for
  reported issues. Included in here are:

   - binder driver fixes

   - fsi MODULE_DESCRIPTION() additions (people seem to love them...)

   - eeprom driver fix

   - Kconfig dependency fix to resolve build issues

   - spmi driver fixes

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  spmi: pmic-arb: add missing newline in dev_err format strings
  spmi: pmic-arb: Pass the correct of_node to irq_domain_add_tree
  binder_alloc: Fix sleeping function called from invalid context
  binder: fix descriptor lookup for context manager
  char: add missing NetWinder MODULE_DESCRIPTION() macros
  misc: mrvl-cn10k-dpi: add PCI_IOV dependency
  eeprom: ee1004: Fix locking issues in ee1004_probe()
  fsi: add missing MODULE_DESCRIPTION() macros

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two core fixes: one to prevent discard type changes (seen on iSCSI)
  during intermittent errors and the other is fixing a lockdep problem
  caused by the queue limits change.

  And one driver fix in ufs"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Keep the discard mode stable
  scsi: sd: Move sd_read_cpr() out of the q->limits_lock region
  scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic

ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7

Fix noise from speakers connected to AUX port when no sound is playing.
The problem occurs because the `alc_shutup_pins` function includes
a 0x10ec0257 vendor ID, which causes noise on Lenovo IdeaPad 3 15IAU7 with
Realtek ALC257 codec when no sound is playing.
Removing this vendor ID from the function fixes the bug.

Fixes: 70794b9563fe ("ALSA: hda/realtek: Add more codec ID to no shutup pins list")
Signed-off-by: Parsa Poorshikhian <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-q
ueue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-08-07 (igc)

This series contains updates to igc driver only.

Faizal adjusts the size of the MAC internal buffer on i226 devices to
resolve an errata for leaking packet transmits. He also corrects a
condition in which qbv_config_change_errors are incorrectly counted.
Lastly, he adjusts the conditions for resetting the adapter when
changing TSN Tx mode and corrects the conditions in which gtxoffset
register is set.
====================

Signed-off-by: David S. Miller <[email protected]>

net: ethernet: use ip_hdrlen() instead of bit shift

`ip_hdr(skb)->ihl << 2` is the same as `ip_hdrlen(skb)`
Therefore, we should use a well-defined function not a bit shift
to find the header length.

It also compresses two lines to a single line.

Signed-off-by: Moon Yeounsu <[email protected]>
Reviewed-by: Christophe JAILLET <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

gpio: mlxbf3: Support shutdown() function

During Linux graceful reboot, the GPIO interrupts are not disabled.
Since the drivers are not removed during graceful reboot,
the logic to call mlxbf3_gpio_irq_disable() is not triggered.
Interrupts that remain enabled can cause issues on subsequent boots.

For example, the mlxbf-gige driver contains PHY logic to bring up the link.
If the gpio-mlxbf3 driver loads first, the mlxbf-gige driver
will use a GPIO interrupt to bring up the link.
Otherwise, it will use polling.
The next time Linux boots and loads the drivers in this order, we encounter the issue:
- mlxbf-gige loads first and uses polling while the GPIO10
  interrupt is still enabled from the previous boot. So if
  the interrupt triggers, there is nothing to clear it.
- gpio-mlxbf3 loads.
- i2c-mlxbf loads. The interrupt doesn't trigger for I2C
  because it is shared with the GPIO interrupt line which
  was not cleared.

The solution is to add a shutdown function to the GPIO driver to clear and disable
all interrupts. Also clear the interrupt after disabling it in mlxbf3_gpio_irq_disable().

Fixes: 38a700efc510 ("gpio: mlxbf3: Add gpio driver support")
Signed-off-by: Asmaa Mnebhi <[email protected]>
Reviewed-by: David Thompson <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>

Merge tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Two minor fixes for recent changes

* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning

Merge tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- Two fixes for SMBusAlert handling in the I2C core: one to avoid an
   endless loop when scanning for handlers and one to make sure handlers
   are always called even if HW has broken behaviour

- I2C header build fix for when ACPI is enabled but I2C isn't

- The testunit gets a rename in the code to match the documentation

- Two fixes for the Qualcomm GENI I2C controller are cleaning up the
   error exit patch in the runtime_resume() function. The first is
   disabling the clock, the second disables the icc on the way out

* tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: testunit: match HostNotify test name with docs
  i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
  i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume
  i2c: Fix conditional for substituting empty ACPI functions
  i2c: smbus: Send alert notifications to all devices if source not found
  i2c: smbus: Improve handling of stuck alerts

Merge tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

- avoid a deadlock with dma-debug and netconsole (Rik van Riel)

* tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: avoid deadlock between dma debug vs printk and netconsole