Git Repo - linux.git/log

nilfs2: fix kernel bug due to missing clearing of checked flag

Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.

This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.

This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id

The acquired memory blocks for reserved may include blocks outside of
memory management. In this case, the nid variable is set to NUMA_NO_NODE
(-1), so an error occurs in node_set(). This adds a check using
numa_valid_node() to numa_clear_kernel_node_hotplug() that skips
node_set() when nid is set to NUMA_NO_NODE.

Link: https://lkml.kernel.org/r/1729070461-13576-1-git-send-email-nobuhiro1.iwamatsu@toshiba.co.jp
Fixes: 87482708210f ("mm: introduce numa_memblks")
Signed-off-by: Nobuhiro Iwamatsu <[email protected]>
Reviewed-by: Mike Rapoport (Microsoft) <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Suggested-by: Yuji Ishikawa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow

Syzbot reported a kernel BUG in ocfs2_truncate_inline. There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr, second, the start and end parameters of
ocfs2_truncate_inline are "unsigned int".

So, we need to add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater
than ocfs2_max_inline_data_with_xattr return -EINVAL.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1afc32b95233 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: shmem: fix data-race in shmem_getattr()

I got the following KCSAN report during syzbot testing:

==================================================================
BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current

write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
shmem_mknod+0x117/0x180 mm/shmem.c:3443
shmem_create+0x34/0x40 mm/shmem.c:3497
lookup_open fs/namei.c:3578 [inline]
open_last_lookups fs/namei.c:3647 [inline]
path_openat+0xdbc/0x1f00 fs/namei.c:3883
do_filp_open+0xf7/0x200 fs/namei.c:3913
do_sys_openat2+0xab/0x120 fs/open.c:1416
do_sys_open fs/open.c:1431 [inline]
__do_sys_openat fs/open.c:1447 [inline]
__se_sys_openat fs/open.c:1442 [inline]
__x64_sys_openat+0xf3/0x120 fs/open.c:1442
x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e

read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
inode_get_ctime include/linux/fs.h:1629 [inline]
generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
shmem_getattr+0x17b/0x200 mm/shmem.c:1157
vfs_getattr_nosec fs/stat.c:166 [inline]
vfs_getattr+0x19b/0x1e0 fs/stat.c:207
vfs_statx_path fs/stat.c:251 [inline]
vfs_statx+0x134/0x2f0 fs/stat.c:315
vfs_fstatat+0xec/0x110 fs/stat.c:341
__do_sys_newfstatat fs/stat.c:505 [inline]
__se_sys_newfstatat+0x58/0x260 fs/stat.c:499
__x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e

value changed: 0x2755ae53 -> 0x27ee44d3

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
==================================================================

When calling generic_fillattr(), if you don't hold read lock, data-race
will occur in inode member variables, which can cause unexpected
behavior.

Since there is no special protection when shmem_getattr() calls
generic_fillattr(), data-race occurs by functions such as shmem_unlink()
or shmem_mknod(). This can cause unexpected results, so commenting it out
is not enough.

Therefore, when calling generic_fillattr() from shmem_getattr(), it is
appropriate to protect the inode using inode_lock_shared() and
inode_unlock_shared() to prevent data-race.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
Signed-off-by: Jeongjun Park <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: mark mas allocation in vms_abort_munmap_vmas as __GFP_NOFAIL

vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs have
already been torn down halfway (in a way we can't undo) but are still
present in the maple tree.

At this point, we *must* remove the VMAs from the VMA tree, otherwise we
get UAF.

Because removing VMA tree nodes can require memory allocation, the
existing code has an error path which tries to handle this by reattaching
the VMAs; but that can't be done safely.

A nicer way to fix it would probably be to preallocate enough maple tree
nodes for the removal before the point of no return, or something like
that; but for now, fix it the easy and kinda ugly way, by marking this
allocation __GFP_NOFAIL.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the gap on failure")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

x86/traps: move kmsan check after instrumentation_begin

During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:

  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
    kmsan_unpoison_entry_regs() leaves .noinstr.text section
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  MODPOST Module.symvers
  CC      .vmlinux.export.o

Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.

There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

resource: remove dependency on SPARSEMEM from GET_FREE_REGION

We want to use the functions (get_free_mem_region()) configured via
GET_FREE_REGION in resource kunit tests.  However, GET_FREE_REGION
depends on SPARSEMEM now.  This makes resource kunit tests cannot be
built on some architectures lacking SPARSEMEM, or causes config warning
as follows,

  WARNING: unmet direct dependencies detected for GET_FREE_REGION
  Depends on [n]: SPARSEMEM [=n]
  Selected by [y]:
  - RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]

When get_free_mem_region() was introduced the only consumers were those
looking to pass the address range to memremap_pages().  That address
range needed to be mindful of the maximum addressable platform physical
address which at the time only SPARSMEM defined via MAX_PHYSMEM_BITS.

Given that memremap_pages() also depended on SPARSEMEM via ZONE_DEVICE,
it was easier to just depend on that definition than invent a general
MAX_PHYSMEM_BITS concept outside of SPARSEMEM.

Turns out that decision was buggy and did not account for KASAN
consumption of physical address space.  That problem was resolved
recently with commit ea72ce5da228 ("x86/kaslr: Expose and use the end
of the physical memory address space"), and GET_FREE_REGION dropped its
MAX_PHYSMEM_BITS dependency.

Then commit 99185c10d5d9 ("resource, kunit: add test case for
region_intersects()"), went ahead and fixed up the only remaining
dependency on SPARSEMEM which was usage of the PA_SECTION_SHIFT macro
for setting the default alignment.  A PAGE_SIZE fallback is fine in the
SPARSEMEM=n case.

With those build dependencies gone GET_FREE_REGION no longer depends on
SPARSEMEM.  So, the patch removes dependency on SPARSEMEM from
GET_FREE_REGION to fix the build issues.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 99185c10d5d9 ("resource, kunit: add test case for region_intersects()")
Signed-off-by: "Huang, Ying" <[email protected]>
Co-developed-by: Dan Williams <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Tested-by: Nathan Chancellor <[email protected]> # build
Cc: Arnd Bergmann <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mmap: fix race in mmap_region() with ftruncate()

Avoiding the zeroing of the vma tree in mmap_region() introduced a race
with truncate in the page table walk. To avoid any races, create a hole
in the rmap during the operation by clearing the pagetable entries earlier
under the mmap write lock and (critically) before the new vma is installed
into the vma tree. The result is that the old vma(s) are left in the vma
tree, but free_pgtables() removes them from the rmap and clears the ptes
while holding the necessary locks.

This change extends the fix required for hugetblfs and the call_mmap()
function by moving the cleanup higher in the function and running it
unconditionally.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: Jann Horn <[email protected]>
Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Reviewed-by: Jann Horn <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/page_alloc: let GFP_ATOMIC order-0 allocs access highatomic reserves

Under memory pressure it's possible for GFP_ATOMIC order-0 allocations to
fail even though free pages are available in the highatomic reserves.
GFP_ATOMIC allocations cannot trigger unreserve_highatomic_pageblock()
since it's only run from reclaim.

Given that such allocations will pass the watermarks in
__zone_watermark_unusable_free(), it makes sense to fallback to highatomic
reserves the same way that ALLOC_OOM can.

This fixes order-0 page allocation failures observed on Cloudflare's fleet
when handling network packets:

  kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
  nodemask=(null),cpuset=/,mems_allowed=0-7
  CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G           O 6.6.43-CUSTOM #1
  Hardware name: MACHINE
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x3c/0x50
   warn_alloc+0x13a/0x1c0
   __alloc_pages_slowpath.constprop.0+0xc9d/0xd10
   __alloc_pages+0x327/0x340
   __napi_alloc_skb+0x16d/0x1f0
   bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
   bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
   __bnxt_poll_work+0x156/0x2b0 [bnxt_en]
   bnxt_poll+0xd9/0x1c0 [bnxt_en]
   __napi_poll+0x2b/0x1b0
   bpf_trampoline_6442524138+0x7d/0x1000
   __napi_poll+0x5/0x1b0
   net_rx_action+0x342/0x740
   handle_softirqs+0xcf/0x2b0
   irq_exit_rcu+0x6c/0x90
   sysvec_apic_timer_interrupt+0x72/0x90
   </IRQ>

[[email protected]: update comment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAGis_TWzSu=P7QJmjD58WWiu3zjMTVKSzdOwWE8ORaGytzWJwQ@mail.gmail.com/
Fixes: 1d91df85f399 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs")
Signed-off-by: Matt Fleming <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fork: only invoke khugepaged, ksm hooks if no error

There is no reason to invoke these hooks early against an mm that is in an
incomplete state.

The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in
any case.

Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.

As a result, we follow the example of commit d2081b2bf819 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.

We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.

Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fork: do not invoke uffd on fork if error occurs

Patch series "fork: do not expose incomplete mm on fork".

During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.

In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.

As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.

We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.

This patch (of 2):

Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.

This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().

This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.

The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().

We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.

Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check

pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
some architectures.  We really should check for
pmd_present()/pud_present() first.

This should explain the report we got on ppc64 (which has
CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));

Likely we had a PMD migration entry for which pmd_leaf() did not trigger.
We raced with restoring the PMD migration entry, and suddenly saw a
pmd_leaf().  In this case, pte_offset_map_lock() saved us from more
trouble, because it rechecks the PMD value, but we would not have
processed the migration entry -- which is not too bad because the only
user of FW_MIGRATION is KSM for unsharing, and KSM only applies to small
folios.

Further, we shouldn't re-read the PMD/PUD value for our warning, the
primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.

As a side note, we are currently not implementing FW_MIGRATION support for
PUD migration entries, which likely should exist due to hugetlb.  Add a
TODO so this won't fall through the cracks if more FW_MIGRATION users get
added.

Was able to write a quick reproducer and verify that the issue no longer triggers with this fix.

https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c

Without this fix after a couple of seconds in a VM with 2 NUMA nodes:

[   54.333753] ------------[ cut here ]------------
[   54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815 folio_walk_start+0x48f/0x6e0
[   54.336455] Modules linked in: ...
[   54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not tainted 6.12.0-rc2+ #81
[   54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[   54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
[   54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74 da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
[   54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
[   54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX: ffff994380000000
[   54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI: 0000000000000000
[   54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09: 0000000000000001
[   54.357820] R10: 0000000000000200 R11: 0000000000000001 R12: ffffb7e4c430bd10
[   54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15: ffff994604a89b00
[   54.360564] FS:  00007f004ae006c0(0000) GS:ffff9947f7400000(0000) knlGS:0000000000000000
[   54.362111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4: 0000000000770ef0
[   54.364615] PKRU: 55555554
[   54.365153] Call Trace:
[   54.365646]  <TASK>
[   54.366073]  ? __warn.cold+0xb7/0x14d
[   54.366796]  ? folio_walk_start+0x48f/0x6e0
[   54.367628]  ? report_bug+0xff/0x140
[   54.368324]  ? handle_bug+0x58/0x90
[   54.369019]  ? exc_invalid_op+0x17/0x70
[   54.369771]  ? asm_exc_invalid_op+0x1a/0x20
[   54.370606]  ? folio_walk_start+0x48f/0x6e0
[   54.371415]  ? folio_walk_start+0x9e/0x6e0
[   54.372227]  do_pages_move+0x1c5/0x680
[   54.372972]  kernel_move_pages+0x1a1/0x2b0
[   54.373804]  __x64_sys_move_pages+0x25/0x30

Link: https://lkml.kernel.org/r/[email protected]
Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Qi Zheng <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Linux 6.12-rc5

Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Prevent a certain range of pages which get marked as hypervisor-only,
   to get allocated to a CoCo (SNP) guest which cannot use them and thus
   fail booting

- Fix the microcode loader on AMD to pay attention to the stepping of a
   patch and to handle the case where a BIOS config option splits the
   machine into logical NUMA nodes per L3 cache slice

- Disable LAM from being built by default due to security concerns

* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Ensure that RMP table fixups are reserved
  x86/microcode/AMD: Split load_microcode_amd()
  x86/microcode/AMD: Pay attention to the stepping dynamically
  x86/lam: Disable ADDRESS_MASKING in most cases

Merge tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace fixes from Steven Rostedt:

- Fix missing mutex unlock in error path of register_ftrace_graph()

   A previous fix added a return on an error path and forgot to unlock
   the mutex. Instead of dealing with error paths, use guard(mutex) as
   the mutex is just released at the exit of the function anyway. Other
   functions in this file should be updated with this, but that's a
   cleanup and not a fix.

- Change cpuhp setup name to be consistent with other cpuhp states

   The same fix that the above patch fixes added a cpuhp_setup_state()
   call with the name of "fgraph_idle_init". I was informed that it
   should instead be something like: "fgraph:online". Update that too.

* tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fgraph: Change the name of cpuhp state to "fgraph:online"
  fgraph: Fix missing unlock in register_ftrace_graph()

Merge tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

- Asus thermal profile fix, fixing performance issues on Lunar Lake

- Intel PMC: one revert for a lockdep issue and one bugfix

- Dell WMI: Ignore some WMI events on suspend/resume to silence warnings

* tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Fix thermal profile initialization
  platform/x86: dell-wmi: Ignore suspend notifications
  platform/x86/intel/pmc: Fix pmc_core_iounmap to call iounmap for valid addresses
  platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"

Merge tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"A single commit to resolve a regression existing in v6.11 or later.

  The change in 1394 OHCI driver in v6.11 kernel could cause general
  protection faults when rediscovering nodes in IEEE 1394 bus while
  holding a spin lock. Consequently, watchdog checks can report a hard
  lockup.

  Currently, this issue is observed primarily during the system resume
  phase when using an extra node with three ports or more is used.
  However, it could potentially occur in the other cases as well"

* tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix invalid port index for parent device

Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Pull request for MD via Song fixing a few issues

- Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on
   passthrough IO (Xinyu)

* tag 'block-6.12-20241026' of git://git.kernel.dk/linux:
  block: fix sanity checks in blk_rq_map_user_bvec
  md/raid10: fix null ptr dereference in raid10_size()
  md: ensure child flush IO does not affect origin bio->bi_status

Merge tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- Fix recovery of allocator ops after a growfs

- Do not fail repairs on metadata files with no attr fork

* tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: update the pag for the last AG at recovery time
  xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
  xfs: error out when a superblock buffer update reduces the agcount
  xfs: update the file system geometry after recoverying superblock buffers
  xfs: merge the perag freeing helpers
  xfs: pass the exact range to initialize to xfs_initialize_perag
  xfs: don't fail repairs on metadata files with no attr fork

firewire: core: fix invalid port index for parent device

In a commit 24b7f8e5cd65 ("firewire: core: use helper functions for self
ID sequence"), the enumeration over self ID sequence was refactored with
some helper functions with KUnit tests. These helper functions are
guaranteed to work expectedly by the KUnit tests, however their application
includes a mistake to assign invalid value to the index of port connected
to parent device.

This bug affects the case that any extra node devices which has three or
more ports are connected to 1394 OHCI controller. In the case, the path
to update the tree cache could hits WARN_ON(), and gets general protection
fault due to the access to invalid address computed by the invalid value.

This commit fixes the bug to assign correct port index.

Cc: [email protected]
Reported-by: Edmund Raile <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Sakamoto <[email protected]>

platform/x86: asus-wmi: Fix thermal profile initialization

When support for vivobook fan profiles was added, the initial
call to throttle_thermal_policy_set_default() was removed, which
however is necessary for full initialization.

Fix this by calling throttle_thermal_policy_set_default() again
when setting up the platform profile.

Fixes: bcbfcebda2cb ("platform/x86: asus-wmi: add support for vivobook fan profiles")
Reported-by: Michael Larabel <[email protected]>
Closes: https://www.phoronix.com/review/lunar-lake-xe2/5
Signed-off-by: Armin Wolf <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

Merge tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux

Pull more 9p reverts from Dominique Martinet:
"Revert patches causing inode collision problems.

  The code simplification introduced significant regressions on servers
  that do not remap inode numbers when exporting multiple underlying
  filesystems with colliding inodes. See the top-most revert (commit
  be2ca3825372) for details.

  This problem had been ignored for too long and the reverts will also
  head to stable (6.9+).

  I'm confident this set of patches gets us back to previous behaviour
  (another related patch had already been reverted back in April and
  we're almost back to square 1, and the rest didn't touch inode
  lifecycle)"

* tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux:
  Revert "fs/9p: simplify iget to remove unnecessary paths"
  Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"
  Revert "fs/9p: remove redundant pointer v9ses"
  Revert " fs/9p: mitigate inode collisions"

Merge tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix init module error caseb

- Fix memory allocation error path (for passwords) in mount

* tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix warning when destroy 'cifs_io_request_pool'
smb: client: Handle kstrdup failures for passwords

Merge tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

- Fix cached size after passthrough writes

   This fix needed a trivial change in the backing-file API, which
   resulted in some non-fuse files being touched.

- Revert a commit meant as a cleanup but which triggered a WARNING

- Remove a stray debug line left-over

* tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove stray debug line
  Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback"
  fuse: update inode size after extending passthrough write
  fs: pass offset and result to backing_file end_write() callback

Merge tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a couple of use-after-free bugs

* tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: cancel nfsd_shrinker_work using sync mode in nfs4_state_shutdown_net
nfsd: fix race between laundromat and free_stateid

Merge tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI PRM (Platform Runtime Mechanism) issue and add two
  new DMI quirks, one for an ACPI IRQ override and one for lid switch
  detection:

   - Make acpi_parse_prmt() look for EFI_MEMORY_RUNTIME memory regions
     only to comply with the UEFI specification and make PRM use
     efi_guid_t instead of guid_t to avoid a compiler warning triggered
     by that change (Koba Ko, Dan Carpenter)

   - Add an ACPI IRQ override quirk for LG 16T90SP (Christian Heusel)

   - Add a lid switch detection quirk for Samsung Galaxy Book2 (Shubham
     Panwar)"

* tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PRM: Clean up guid type in struct prm_handler_info
  ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue
  ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[]
  ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context

Merge tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"Update cpufreq documentation to match the code after recent changes
  (Christian Loehle), fix a units conversion issue in the CPPC cpufreq
  driver (liwei), and fix an error check in the dtpm_devfreq power
  capping driver (Yuan Can)"

* tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: CPPC: fix perf_to_khz/khz_to_perf conversion exception
  powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request()
  cpufreq: docs: Reflect latency changes in docs

Merge tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Hold the rescan lock while adding devices to avoid race with
   concurrent pwrctl rescan that can lead to a crash (Bartosz
   Golaszewski)

- Avoid binding pwrctl driver to QCom WCN wifi if the DT lacks the
   necessary PMU regulator descriptions (Bartosz Golaszewski)

* tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/pwrctl: Abandon QCom WCN probe on pre-pwrseq device-trees
  PCI: Hold rescan lock while adding devices during host probe

Merge tag 'fbdev-for-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

- Fix some build warnings and failures with CONFIG_FB_IOMEM_FOPS and
   CONFIG_FB_DEVICE

- Remove the da8xx fbdev driver

- Constify struct sbus_mmap_map and fix indentation warning

* tag 'fbdev-for-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: wm8505fb: select CONFIG_FB_IOMEM_FOPS
  fbdev: da8xx: remove the driver
  fbdev: Constify struct sbus_mmap_map
  fbdev: nvidiafb: fix inconsistent indentation warning
  fbdev: sstfb: Make CONFIG_FB_DEVICE optional

Merge tag 'gpio-fixes-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:
"Update MAINTAINERS with a keyword pattern for legacy GPIO API

  The goal is to alert us to anyone trying to use the deprecated, legacy
  API (this happens almost every release)"

* tag 'gpio-fixes-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: add a keyword entry for the GPIO subsystem

Merge tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Niklas Cassel:

- Fix the handling of ATA commands that timeout (command that did not
   receive a completion interrupt within the configured timeout time).

   Commands that timeout, while also having either the FAILFAST flag
   set, or the command being a passthrough command, should never be
   retried. Restore this behavior (as it was before v6.12-rc1).

* tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Set DID_TIME_OUT for commands that actually timed out

Merge tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The majority of changes here are about ASoC.

  There are two core changes in ASoC (the bump of minimal topology ABI
  version and the fix for references of components in DAPM code), and
  others are mostly various device-specific fixes for SoundWire, AMD,
  Intel, SOF, Qualcomm and FSL, in addition to a few usual HD-audio
  quirks and fixes"

* tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
  ALSA: hda/realtek: Update default depop procedure
  ASoC: qcom: sc7280: Fix missing Soundwire runtime stream alloc
  ASoC: fsl_micfil: Add sample rate constraint
  ASoC: rt722-sdca: increase clk_stop_timeout to fix clock stop issue
  ALSA: hda/tas2781: select CRC32 instead of CRC32_SARWATE
  ALSA: hda/realtek: Add subwoofer quirk for Acer Predator G9-593
  ALSA: firewire-lib: Avoid division by zero in apply_constraint_to_size()
  ASoC: fsl_micfil: Add a flag to distinguish with different volume control types
  ASoC: codecs: lpass-rx-macro: fix RXn(rx,n) macro for DSM_CTL and SEC7 regs
  ASoC: Change my e-mail to gmail
  ASoC: Intel: soc-acpi: lnl: Add match entry for TM2 laptops
  ASoC: amd: yc: Fix non-functional mic on ASUS E1404FA
  ASoC: SOF: Intel: hda: Always clean up link DMA during stop
  soundwire: intel_ace2x: Send PDI stream number during prepare
  ASoC: SOF: Intel: hda: Handle prepare without close for non-HDA DAI's
  ASoC: SOF: ipc4-topology: Do not set ALH node_id for aggregated DAIs
  MAINTAINERS: Update maintainer list for MICROCHIP ASOC, SSC and MCP16502 drivers
  ASoC: qcom: Select missing common Soundwire module code on SDM845
  ASoC: fsl_esai: change dev_warn to dev_dbg in irq handler
  ASoC: rsnd: Fix probe failure on HiHope boards due to endpoint parsing
  ...

Merge tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu and xe, with minor bridge and an i915
  Kconfig fix. Nothing too scary and it seems to be pretty quiet.

  amdgpu:
   - ACPI method handling fixes
   - SMU 14.x fixes
   - Display idle optimization fix
   - DP link layer compliance fix
   - SDMA 7.x fix
   - PSR-SU fix
   - SWSMU fix

  i915:
   - Fix DRM_I915_GVT_KVMGT dependencies in Kconfig

  xe:
   - Increase invalidation timeout to avoid errors in some hosts
   - Flush worker on timeout
   - Better handling for force wake failure
   - Improve argument check on user fence creation
   - Don't restart parallel queues multiple times on GT reset

  bridge:
   - aux: Fix assignment of OF node
   - tc358767: Add missing of_node_put() in error path"

* tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe: Don't restart parallel queues multiple times on GT reset
  drm/xe/ufence: Prefetch ufence addr to catch bogus address
  drm/xe: Handle unreliable MMIO reads during forcewake
  drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout
  drm/xe: Enlarge the invalidation timeout from 150 to 500
  drm/amdgpu: handle default profile on on devices without fullscreen 3D
  drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too
  drm/amdgpu: fix random data corruption for sdma 7
  drm/amd/display: temp w/a for DP Link Layer compliance
  drm/amd/display: temp w/a for dGPU to enter idle optimizations
  drm/amd/pm: update deep sleep status on smu v14.0.2/3
  drm/amd/pm: update overdrive function on smu v14.0.2/3
  drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3
  drm/amd: Guard against bad data for ATIF ACPI method
  drm/bridge: tc358767: fix missing of_node_put() in for_each_endpoint_of_node()
  drm/bridge: Fix assignment of the of_node of the parent to aux bridge
  i915: fix DRM_I915_GVT_KVMGT dependencies

x86: fix whitespace in runtime-const assembler output

The x86 user pointer validation changes made me look at compiler output
a lot, and the wrong indentation for the ".popsection" in the generated
assembler triggered me.

Signed-off-by: Linus Torvalds <[email protected]>

x86: fix user address masking non-canonical speculation issue

It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical
accesses in kernel space. And so using just the high bit to decide
whether an access is in user space or kernel space ends up with the good
old "leak speculative data" if you have the right gadget using the
result:

CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“

Now, the kernel surrounds the access with a STAC/CLAC pair, and those
instructions end up serializing execution on older Zen architectures,
which closes the speculation window.

But that was true only up until Zen 5, which renames the AC bit [1].
That improves performance of STAC/CLAC a lot, but also means that the
speculation window is now open.

Note that this affects not just the new address masking, but also the
regular valid_user_address() check used by access_ok(), and the asm
version of the sign bit check in the get_user() helpers.

It does not affect put_user() or clear_user() variants, since there's no
speculative result to be used in a gadget for those operations.

Reported-by: Andrew Cooper <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html
Link: https://arxiv.org/pdf/2108.10771
Cc: Josh Poimboeuf <[email protected]>
Cc: Borislav Petkov <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]> # LAM case
Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional")
Fixes: 6014bc27561f ("x86-64: make access_ok() independent of LAM")
Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'pm-powercap'

Merge a dtpm_devfreq power capping driver fix for 6.12-rc5:

- Fix a dev_pm_qos_add_request() return value check in
   __dtpm_devfreq_setup() to prevent it from failing if
   a positive number is returned (Yuan Can).

* pm-powercap:
  powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request()

Merge branches 'acpi-resource' and 'acpi-button'

Merge new DMI quirks for 6.12-rc5:

- Add an ACPI IRQ override quirk for LG 16T90SP (Christian Heusel).

- Add a lid switch detection quirk for Samsung Galaxy Book2 (Shubham
   Panwar).

* acpi-resource:
  ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[]

* acpi-button:
  ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue

fuse: remove stray debug line

It wasn't there when the patch was posted for review, but somehow made it
into the pull.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: efad7153bf93 ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN")
Signed-off-by: Miklos Szeredi <[email protected]>

Merge tag 'drm-xe-fixes-2024-10-24-1' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Increase invalidation timeout to avoid errors in some hosts (Shuicheng)
- Flush worker on timeout (Badal)
- Better handling for force wake failure (Shuicheng)
- Improve argument check on user fence creation (Nirmoy)
- Don't restart parallel queues multiple times on GT reset (Nirmoy)

Signed-off-by: Dave Airlie <[email protected]>
From: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/trlkoiewtc4x2cyhsxmj3atayyq4zwto4iryea5pvya2ymc3yp@fdx5nhwmiyem

fgraph: Change the name of cpuhp state to "fgraph:online"

The cpuhp state name given to cpuhp_setup_state() is "fgraph_idle_init"
which doesn't really conform to the names that are used for cpu hotplug
setups. Instead rename it to "fgraph:online" to be in line with other
states.

Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/[email protected]
Suggested-by: Masami Hiramatsu <[email protected]>
Fixes: 2c02f7375e658 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks")
Signed-off-by: Steven Rostedt (Google) <[email protected]>

fgraph: Fix missing unlock in register_ftrace_graph()

Use guard(mutex)() to acquire and automatically release ftrace_lock,
fixing the issue of not unlocking when calling cpuhp_setup_state()
fails.

Fixes smatch warning:

kernel/trace/fgraph.c:1317 register_ftrace_graph() warn: inconsistent returns '&ftrace_lock'.

Link: https://lore.kernel.org/[email protected]
Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks")
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Li Huafei <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'drm-misc-fixes-2024-10-24' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

bridge:
- aux: Fix assignment of OF node
- tc358767: Add missing of_node_put() in error path

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

- Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF sockmap
   link file descriptors (Hou Tao)

- Fix BPF arm64 JIT's address emission with tag-based KASAN enabled
   reserving not enough size (Peter Collingbourne)

- Fix BPF verifier do_misc_fixups patching for inlining of the
   bpf_get_branch_snapshot BPF helper (Andrii Nakryiko)

- Fix a BPF verifier bug and reject BPF program write attempts into
   read-only marked BPF maps (Daniel Borkmann)

- Fix perf_event_detach_bpf_prog error handling by removing an invalid
   check which would skip BPF program release (Jiri Olsa)

- Fix memory leak when parsing mount options for the BPF filesystem
   (Hou Tao)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Check validity of link->type in bpf_link_show_fdinfo()
  bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
  bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()
  bpf,perf: Fix perf_event_detach_bpf_prog error handling
  selftests/bpf: Add test for passing in uninit mtu_len
  selftests/bpf: Add test for writes to .rodata
  bpf: Remove MEM_UNINIT from skb/xdp MTU helpers
  bpf: Fix overloading of MEM_UNINIT's meaning
  bpf: Add MEM_WRITE attribute
  bpf: Preserve param->string when parsing mount options
  bpf, arm64: Fix address emission with tag-based KASAN enabled

Merge tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfiler, xfrm and bluetooth.

  Oddly this includes a fix for a posix clock regression; in our
  previous PR we included a change there as a pre-requisite for
  networking one. That fix proved to be buggy and requires the follow-up
  included here. Thomas suggested we should send it, given we sent the
  buggy patch.

  Current release - regressions:

   - posix-clock: Fix unbalanced locking in pc_clock_settime()

   - netfilter: fix typo causing some targets not to load on IPv6

  Current release - new code bugs:

   - xfrm: policy: remove last remnants of pernet inexact list

  Previous releases - regressions:

   - core: fix races in netdev_tx_sent_queue()/dev_watchdog()

   - bluetooth: fix UAF on sco_sock_timeout

   - eth: hv_netvsc: fix VF namespace also in synthetic NIC
     NETDEV_REGISTER event

   - eth: usbnet: fix name regression

   - eth: be2net: fix potential memory leak in be_xmit()

   - eth: plip: fix transmit path breakage

  Previous releases - always broken:

   - sched: deny mismatched skip_sw/skip_hw flags for actions created by
     classifiers

   - netfilter: bpf: must hold reference on net namespace

   - eth: virtio_net: fix integer overflow in stats

   - eth: bnxt_en: replace ptp_lock with irqsave variant

   - eth: octeon_ep: add SKB allocation failures handling in
     __octep_oq_process_rx()

  Misc:

   - MAINTAINERS: add Simon as an official reviewer"

* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: dsa: mv88e6xxx: support 4000ps cycle counter period
  net: dsa: mv88e6xxx: read cycle counter period from hardware
  net: dsa: mv88e6xxx: group cycle counter coefficients
  net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
  hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
  net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
  posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
  r8169: avoid unsolicited interrupts
  net: sched: use RCU read-side critical section in taprio_dump()
  net: sched: fix use-after-free in taprio_change()
  net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
  net: usb: usbnet: fix name regression
  mlxsw: spectrum_router: fix xa_store() error checking
  virtio_net: fix integer overflow in stats
  net: fix races in netdev_tx_sent_queue()/dev_watchdog()
  net: wwan: fix global oob in wwan_rtnl_policy
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  ...

Merge tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:
"Device-specific functionality quirks for Thinkpad X1 Gen3, Logitech
  Bolt and some Goodix touchpads (Bartłomiej Maryńczak, Hans de Goede
  and Kenneth Albanowski)"

* tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard
  HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad
  HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad

Merge tag 'drm-intel-fixes-2024-10-24' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix DRM_I915_GVT_KVMGT dependencies in Kconfig

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Revert "fs/9p: simplify iget to remove unnecessary paths"

This reverts commit 724a08450f74b02bd89078a596fd24857827c012.

This code simplification introduced significant regressions on servers
that do not remap inode numbers when exporting multiple underlying
filesystems with colliding inodes, as can be illustrated with simple
tmpfs exports in qemu with remapping disabled:
```
# host side
cd /tmp/linux-test
mkdir m1 m2
mount -t tmpfs tmpfs m1
mount -t tmpfs tmpfs m2
mkdir m1/dir m2/dir
echo foo > m1/dir/foo
echo bar > m2/dir/bar

# guest side
# started with -virtfs local,path=/tmp/linux-test,mount_tag=tmp,security_model=mapped-file
mount -t 9p -o trans=virtio,debug=1 tmp /mnt/t

ls /mnt/t/m1/dir
# foo
ls /mnt/t/m2/dir
# bar (works ok if directry isn't open)

# cd to keep first dir's inode alive
cd /mnt/t/m1/dir
ls /mnt/t/m2/dir
# foo (should be bar)
```
Other examples can be crafted with regular files with fscache enabled,
in which case I/Os just happen to the wrong file leading to
corruptions, or guest failing to boot with:
| VFS: Lookup of 'com.android.runtime' in 9p 9p would have caused loop

In theory, we'd want the servers to be smart enough and ensure they
never send us two different files with the same 'qid.path', but while
qemu has an option to remap that is recommended (and qemu prints a
warning if this case happens), there are many other servers which do
not (kvmtool, nfs-ganesha, probably diod...), we should at least ensure
we don't cause regressions on this:
- assume servers can't be trusted and operations that should get a 'new'
inode properly do so. commit d05dcfdf5e16 (" fs/9p: mitigate inode
collisions") attempted to do this, but v9fs_fid_iget_dotl() was not
called so some higher level of caching got in the way; this needs to be
fixed properly before we can re-apply the patches.
- if we ever want to really simplify this code, we will need to add some
negotiation with the server at mount time where the server could claim
they handle this properly, at which point we could optimize this out.
(but that might not be needed at all if we properly handle the 'new'
check?)

Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
Reported-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/20240923100508.GA32066@willie-the-truck
Cc: [email protected] # v6.9+
Message-ID: <20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org>
Signed-off-by: Dominique Martinet <[email protected]>

Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"

This reverts commit 11763a8598f888dec631a8a903f7ada32181001f.

This is a requirement to revert commit 724a08450f74 ("fs/9p: simplify
iget to remove unnecessary paths"), see that revert for details.

Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
Reported-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/20240923100508.GA32066@willie-the-truck
Cc: [email protected] # v6.9+
Message-ID: <20241024-revert_iget-v1-3-4cac63d25f72@codewreck.org>
Signed-off-by: Dominique Martinet <[email protected]>

Revert "fs/9p: remove redundant pointer v9ses"

This reverts commit 10211b4a23cf4a3df5c11a10e5b3d371f16a906f.

This is a requirement to revert commit 724a08450f74 ("fs/9p: simplify
iget to remove unnecessary paths"), see that revert for details.

Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
Reported-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/20240923100508.GA32066@willie-the-truck
Cc: [email protected] # v6.9+
Message-ID: <20241024-revert_iget-v1-2-4cac63d25f72@codewreck.org>
Signed-off-by: Dominique Martinet <[email protected]>

Revert " fs/9p: mitigate inode collisions"

This reverts commit d05dcfdf5e1659b2949d13060284eff3888b644e.

This is a requirement to revert commit 724a08450f74 ("fs/9p: simplify
iget to remove unnecessary paths"), see that revert for details.

Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
Reported-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/20240923100508.GA32066@willie-the-truck
Cc: [email protected] # v6.9+
Message-ID: <20241024-revert_iget-v1-1-4cac63d25f72@codewreck.org>
Signed-off-by: Dominique Martinet <[email protected]>

Merge tag 'amd-drm-fixes-6.12-2024-10-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.12-2024-10-23:

amdgpu:
- ACPI method handling fixes
- SMU 14.x fixes
- Display idle optimization fix
- DP link layer compliance fix
- SDMA 7.x fix
- PSR-SU fix
- SWSMU fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Get correct cores_per_package for SMT systems, enable IRQ if do_ale()
  triggered in irq-enabled context, and fix some bugs about vDSO, memory
  managenent, hrtimer in KVM, etc"

* tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
  LoongArch: Make KASAN usable for variable cpu_vabits
  LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  LoongArch: Don't crash in stack_top() for tasks without vDSO
  LoongArch: Set correct size for vDSO code mapping
  LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
  LoongArch: Get correct cores_per_package for SMT systems
  LoongArch: Use "Exception return address" to comment ERA

Merge tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

- objpool: Fix choosing allocation for percpu slots

   Fixes to allocate objpool's percpu slots correctly according to the
   GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose
   the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag
   is set, because GFP_ATOMIC is a combined flag.

- tracing/probes: Fix MAX_TRACE_ARGS limit handling

   If more than MAX_TRACE_ARGS are passed for creating a probe event,
   the entries over MAX_TRACE_ARG in trace_arg array are not
   initialized. Thus if the kernel accesses those entries, it crashes.
   This rejects creating event if the number of arguments is over
   MAX_TRACE_ARGS.

- tracing: Consider the NUL character when validating the event length

   A strlen() is used when parsing the event name, and the original code
   does not consider the terminal null byte. Thus it can pass the name
   one byte longer than the buffer. This fixes to check it correctly.

* tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Consider the NULL character when validating the event length
  tracing/probes: Fix MAX_TRACE_ARGS limit handling
  objpool: fix choosing allocation for percpu slots

Merge tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- mount option fixes:
     - fix handling of compression mount options on remount
     - reject rw remount in case there are options that don't work
       in read-write mode (like rescue options)

- fix zone accounting of unusable space

- fix in-memory corruption when merging extent maps

- fix delalloc range locking for sector < page

- use more convenient default value of drop subtree threshold, clean
   more subvolumes without the fallback to marking quotas inconsistent

- fix smatch warning about incorrect value passed to ERR_PTR

* tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item()
  btrfs: reject ro->rw reconfiguration if there are hard ro requirements
  btrfs: fix read corruption due to race with extent map merging
  btrfs: fix the delalloc range locking if sector size < page size
  btrfs: qgroup: set a more sane default value for subtree drop threshold
  btrfs: clear force-compress on remount when compress mount option is given
  btrfs: zoned: fix zone unusable accounting for freed reserved extent

Merge tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggy

Pull jfs fix from David Kleikamp:
"Fix a regression introduced in 6.12-rc1"

* tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggy:
jfs: Fix sanity check in dbMount

Merge tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"Lots of hotfixes:

   - transaction restart injection has been shaking out a few things

   - fix a data corruption in the buffered write path on -ENOSPC, found
     by xfstests generic/299

   - Some small show_options fixes

   - Repair mismatches in inode hash type, seed: different snapshot
     versions of an inode must have the same hash/type seed, used for
     directory entries and xattrs. We were checking the hash seed, but
     not the type, and a user contributed a filesystem where the hash
     type on one inode had somehow been flipped; these fixes allow his
     filesystem to repair.

     Additionally, the hash type flip made some directory entries
     invisible, which were then recreated by userspace; so the hash
     check code now checks for duplicate non dangling dirents, and
     renames one of them if necessary.

   - Don't use wait_event_interruptible() in recovery: this fixes some
     filesystems failing to mount with -ERESTARTSYS

   - Workaround for kvmalloc not supporting > INT_MAX allocations,
     causing an -ENOMEM when allocating the sorted array of journal
     keys: this allows a 75 TB filesystem to mount

   - Make sure bch_inode_unpacked.bi_snapshot is set in the old inode
     compat path: this alllows Marcin's filesystem (in use since before
     6.7) to repair and mount"

* tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs: (26 commits)
  bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
  bcachefs: Mark more errors as AUTOFIX
  bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
  bcachefs: Don't use wait_event_interruptible() in recovery
  bcachefs: Fix __bch2_fsck_err() warning
  bcachefs: fsck: Improve hash_check_key()
  bcachefs: bch2_hash_set_or_get_in_snapshot()
  bcachefs: Repair mismatches in inode hash seed, type
  bcachefs: Add hash seed, type to inode_to_text()
  bcachefs: INODE_STR_HASH() for bch_inode_unpacked
  bcachefs: Run in-kernel offline fsck without ratelimit errors
  bcachefs: skip mount option handle for empty string.
  bcachefs: fix incorrect show_options results
  bcachefs: Fix data corruption on -ENOSPC in buffered write path
  bcachefs: bch2_folio_reservation_get_partial() is now better behaved
  bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
  bcachefS: ec: fix data type on stripe deletion
  bcachefs: Don't use commit_do() unnecessarily
  bcachefs: handle restarts in bch2_bucket_io_time_reset()
  bcachefs: fix restart handling in __bch2_resume_logged_op_finsert()
  ...

Revert "9p: Enable multipage folios"

This reverts commit 1325e4a91a405f88f1b18626904d37860a4f9069.

using multipage folios apparently break some madvise operations like
MADV_PAGEOUT which do not reliably unload the specified page anymore,

Revert the patch until that is figured out.

Reported-by: Andrii Nakryiko <[email protected]>
Fixes: 1325e4a91a40 ("9p: Enable multipage folios")
Signed-off-by: Dominique Martinet <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm/xe: Don't restart parallel queues multiple times on GT reset

In case of parallel submissions multiple GuC id will point to the
same exec queue and on GT reset such exec queues will get restarted
multiple times which is not desirable.

v2: don't use exec_queue_enabled() which could race,
do the same for xe_guc_submit_stop (Matt B)

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2295
Cc: Jonathan Cavitt <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Tejas Upadhyay <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
(cherry picked from commit c8b0acd6d8745fd7e6450f5acc38f0227bd253b3)
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/ufence: Prefetch ufence addr to catch bogus address

access_ok() only checks for addr overflow so also try to read the addr
to catch invalid addr sent from userspace.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
Cc: Francois Dugast <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
(cherry picked from commit 9408c4508483ffc60811e910a93d6425b8e63928)
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Handle unreliable MMIO reads during forcewake

In some cases, when the driver attempts to read an MMIO register,
the hardware may return 0xFFFFFFFF. The current force wake path
code treats this as a valid response, as it only checks the BIT.
However, 0xFFFFFFFF should be considered an invalid value, indicating
a potential issue. To address this, we should add a log entry to
highlight this condition and return failure.
The force wake failure log level is changed from notice to err
to match the failure return value.

v2 (Matt Brost):
  - set ret value (-EIO) to kick the error to upper layers
v3 (Rodrigo):
  - add commit message for the log level promotion from notice to err
v4:
  - update reviewed info

Suggested-by: Alex Zuo <[email protected]>
Signed-off-by: Shuicheng Lin <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Acked-by: Badal Nilawar <[email protected]>
Cc: Anshuman Gupta <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit a9fbeabe7226a3bf90f82d0e28a02c18e3c67447)
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout

In case if g2h worker doesn't get opportunity to within specified
timeout delay then flush the g2h worker explicitly.

v2:
  - Describe change in the comment and add TODO (Matt B/John H)
  - Add xe_gt_warn on fence done after G2H flush (John H)
v3:
  - Updated the comment with root cause
  - Clean up xe_gt_warn message (John H)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1620
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2902
Signed-off-by: Badal Nilawar <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Acked-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e5152723380404acb8175e0777b1cea57f319a01)
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Enlarge the invalidation timeout from 150 to 500

There are error messages like below that are occurring during stress
testing: "[   31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global
invalidation timeout". Previously it was hitting this 3 out of 1000
executions of warm reboot.  After raising it to 500, 1000 warm reboot
executions passed and it didn't fail.

Due to the way xe_mmio_wait32() is implemented, the timeout is able to
expire early when the register matches the expected value due to the
wait increments starting small. So, the larger timeout value should have
no effect during normal use cases.

v2 (Jonathan):
  - rework the commit message
v3 (Lucas):
  - add conclusive message for the fail rate and test case
v4:
  - add suggested-by

Suggested-by: Jia Yao <[email protected]>
Signed-off-by: Shuicheng Lin <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Nirmoy Das <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Tested-by: Zongyao Bai <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2eb460ab9f4bc5b575f52568d17936da0af681d8)
[ Fix conflict with gt->mmio ]
Signed-off-by: Lucas De Marchi <[email protected]>

Merge branch 'add-the-missing-bpf_link_type-invocation-for-sockmap'

Hou Tao says:

====================
Add the missing BPF_LINK_TYPE invocation for sockmap

From: Hou Tao <[email protected]>

Hi,

The tiny patch set fixes the out-of-bound read problem when reading the
fdinfo of sock map link fd. And in order to spot such omission early for
the newly-added link type in the future, it also checks the validity of
the link->type and adds a WARN_ONCE() for missed invocation.

Please see individual patches for more details. And comments are always
welcome.

v3:
* patch #2: check and warn the validity of link->type instead of
adding a static assertion for bpf_link_type_strs array.

v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andrii Nakryiko <[email protected]>

bpf: Check validity of link->type in bpf_link_show_fdinfo()

If a newly-added link type doesn't invoke BPF_LINK_TYPE(), accessing
bpf_link_type_strs[link->type] may result in an out-of-bounds access.

To spot such missed invocations early in the future, checking the
validity of link->type in bpf_link_show_fdinfo() and emitting a warning
when such invocations are missed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Add the missing BPF_LINK_TYPE invocation for sockmap

There is an out-of-bounds read in bpf_link_show_fdinfo() for the sockmap
link fd. Fix it by adding the missing BPF_LINK_TYPE invocation for
sockmap link

Also add comments for bpf_link_type to prevent missing updates in the
future.

Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs")
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

cpufreq: CPPC: fix perf_to_khz/khz_to_perf conversion exception

When the nominal_freq recorded by the kernel is equal to the lowest_freq,
and the frequency adjustment operation is triggered externally, there is
a logic error in cppc_perf_to_khz()/cppc_khz_to_perf(), resulting in perf
and khz conversion errors.

Fix this by adding a branch processing logic when nominal_freq is equal
to lowest_freq.

Fixes: ec1c7ad47664 ("cpufreq: CPPC: Fix performance/frequency conversion")
Signed-off-by: liwei <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: PRM: Clean up guid type in struct prm_handler_info

Clang 19 prints a warning when we pass &th->guid to efi_pa_va_lookup():

drivers/acpi/prmt.c:156:29: error: passing 1-byte aligned argument to
4-byte aligned parameter 1 of 'efi_pa_va_lookup' may result in an
unaligned pointer access [-Werror,-Walign-mismatch]
  156 |                         (void *)efi_pa_va_lookup(&th->guid, handler_info->handler_address);
      |                                                  ^

The problem is that efi_pa_va_lookup() takes a efi_guid_t and &th->guid
is a regular guid_t.  The difference between the two types is the
alignment.  efi_guid_t is a typedef.

typedef guid_t efi_guid_t __aligned(__alignof__(u32));

It's possible that this a bug in Clang 19.  Even though the alignment of
&th->guid is not explicitly specified, it will still end up being aligned
at 4 or 8 bytes.

Anyway, as Ard points out, it's cleaner to change guid to efi_guid_t type
and that also makes the warning go away.

Fixes: 088984c8d54c ("ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context")
Reported-by: Linux Kernel Functional Testing <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'net-dsa-mv88e6xxx-fix-mv88e6393x-phc-frequency-on-internal-clock'

Shenghao Yang says:

====================
net: dsa: mv88e6xxx: fix MV88E6393X PHC frequency on internal clock

The MV88E6393X family of switches can additionally run their cycle
counters using a 250MHz internal clock instead of the usual 125MHz
external clock [1].

The driver currently assumes all designs utilize that external clock,
but MikroTik's RB5009 uses the internal source - causing the PHC to be
seen running at 2x real time in userspace, making synchronization
with ptp4l impossible.

This series adds support for reading off the cycle counter frequency
known to the hardware in the TAI_CLOCK_PERIOD register and picking an
appropriate set of scaling coefficients instead of using a fixed set
for each switch family.

Patch 1 groups those cycle counter coefficients into a new structure to
make it easier to pass them around.

Patch 2 modifies PTP initialization to probe TAI_CLOCK_PERIOD and
use an appropriate set of coefficients.

Patch 3 adds support for 4000ps cycle counter periods.

Changes since v2 [2]:

- Patch 1: "net: dsa: mv88e6xxx: group cycle counter coefficients"
  - Moved declaration of mv88e6xxx_cc_coeffs to avoid moving that in
    Patch 2.

- Patch 2: "net: dsa: mv88e6xxx: read cycle counter period from hardware"
  - Removed move of mv88e6xxx_cc_coeffs declaration.

- Patch 3: "net: dsa: mv88e6xxx: support 4000ps cycle counter periods"
  - No change.

[1] https://lore.kernel.org/netdev/d6622575-bf1b-445a-b08f-2739e3642aae@lunn.ch/
[2] https://lore.kernel.org/netdev/20241006145951 [email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: mv88e6xxx: support 4000ps cycle counter period

The MV88E6393X family of devices can run its cycle counter off
an internal 250MHz clock instead of an external 125MHz one.

Add support for this cycle counter period by adding another set
of coefficients and lowering the periodic cycle counter read interval
to compensate for faster overflows at the increased frequency.

Otherwise, the PHC runs at 2x real time in userspace and cannot be
synchronized.

Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Signed-off-by: Shenghao Yang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: mv88e6xxx: read cycle counter period from hardware

Instead of relying on a fixed mapping of hardware family to cycle
counter frequency, pull this information from the
MV88E6XXX_TAI_CLOCK_PERIOD register.

This lets us support switches whose cycle counter frequencies depend on
board design.

Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Suggested-by: Andrew Lunn <[email protected]>
Signed-off-by: Shenghao Yang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: mv88e6xxx: group cycle counter coefficients

Instead of having them as individual fields in ptp_ops, wrap the
coefficients in a separate struct so they can be referenced together.

Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Signed-off-by: Shenghao Yang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition

Add Fibocom FG132 0x0112 composition:

T:  Bus=03 Lev=02 Prnt=06 Port=01 Cnt=02 Dev#= 10 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2cb7 ProdID=0112 Rev= 5.15
S:  Manufacturer=Fibocom Wireless Inc.
S:  Product=Fibocom Module
S:  SerialNumber=xxxxxxxx
C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=86(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms

Signed-off-by: Reinhard Speyerer <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event

The existing code moves VF to the same namespace as the synthetic NIC
during netvsc_register_vf(). But, if the synthetic device is moved to a
new namespace after the VF registration, the VF won't be moved together.

To make the behavior more consistent, add a namespace check for synthetic
NIC's NETDEV_REGISTER event (generated during its move), and move the VF
if it is not in the same namespace.

Cc: [email protected]
Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device")
Suggested-by: Stephen Hemminger <[email protected]>
Signed-off-by: Haiyang Zhang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x

The well-known errata regarding EEE not being functional on various KSZ
switches has been refactored a few times. Recently the refactoring has
excluded several switches that the errata should also apply to.

Disable EEE for additional switches with this errata and provide
additional comments referring to the public errata document.

The original workaround for the errata was applied with a register
write to manually disable the EEE feature in MMD 7:60 which was being
applied for KSZ9477/KSZ9897/KSZ9567 switch ID's.

Then came commit 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata
fixes to PHY driver") and commit 6068e6d7ba50 ("net: dsa: microchip:
remove KSZ9477 PHY errata handling") which moved the errata from the
switch driver to the PHY driver but only for PHY_ID_KSZ9477 (PHY ID)
however that PHY code was dead code because an entry was never added
for PHY_ID_KSZ9477 via MODULE_DEVICE_TABLE.

This was apparently realized much later and commit 54a4e5c16382 ("net:
phy: micrel: add Microchip KSZ 9477 to the device table") added the
PHY_ID_KSZ9477 to the PHY driver but as the errata was only being
applied to PHY_ID_KSZ9477 it's not completely clear what switches
that relates to.

Later commit 6149db4997f5 ("net: phy: micrel: fix KSZ9477 PHY issues
after suspend/resume") breaks this again for all but KSZ9897 by only
applying the errata for that PHY ID.

Following that this was affected with commit 08c6d8bae48c("net: phy:
Provide Module 4 KSZ9477 errata (DS80000754C)") which removes
the blatant register write to MMD 7:60 and replaces it by
setting phydev->eee_broken_modes = -1 so that the generic phy-c45 code
disables EEE but this is only done for the KSZ9477_CHIP_ID (Switch ID).

Lastly commit 0411f73c13af ("net: dsa: microchip: disable EEE for
KSZ8567/KSZ9567/KSZ9896/KSZ9897.") adds some additional switches
that were missing to the errata due to the previous changes.

This commit adds an additional set of switches.

Fixes: 0411f73c13af ("net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.")
Signed-off-by: Tim Harvey <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- hci_core: Disable works on hci_unregister_dev
- SCO: Fix UAF on sco_sock_timeout
- ISO: Fix UAF on iso_sock_timeout

* tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ata: libata: Set DID_TIME_OUT for commands that actually timed out

When ata_qc_complete() schedules a command for EH using
ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
req->q->mq_ops->timeout() / scsi_timeout() being called.

scsi_timeout(), if the LLDD has no abort handler (libata has no abort
handler), will set host byte to DID_TIME_OUT, and then call
scsi_eh_scmd_add() to add the command to EH.

Thus, when commands first enter libata's EH strategy_handler, all the
commands that have been added to EH will have DID_TIME_OUT set.

Commit e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands
with sense data") clears this bogus DID_TIME_OUT flag for all commands
that reached libata's EH strategy_handler.

libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
have not received a completion at the time of entering EH.

ata_eh_worth_retry() has no special handling for AC_ERR_TIMEOUT, so by
default timed out commands will get flag ATA_QCFLAG_RETRY set, and will be
retried after the port has been reset (ata_eh_link_autopsy() always
triggers a port reset if any command has AC_ERR_TIMEOUT set).

For a command that has ATA_QCFLAG_RETRY set, while also having an error
flag set (e.g. AC_ERR_TIMEOUT), ata_eh_finish() will not increment
scmd->allowed, so the command will at most be retried scmd->allowed number
of times (which by default is set to 3).

However, scsi_eh_flush_done_q() will only retry commands for which
scsi_noretry_cmd() returns false.

For a command that has DID_TIME_OUT set, while also having either the
FAILFAST flag set, or the command being a passthrough command,
scsi_noretry_cmd() will return true. Thus, such a command will never be
retried.

Thus, make sure that libata sets SCSI's DID_TIME_OUT flag for commands that
actually timed out (libata's AC_ERR_TIMEOUT flag), such that timed out
commands will once again not be retried if they are also a FAILFAST or
passthrough command.

Cc: [email protected]
Fixes: e5dd410acb34 ("ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data")
Reported-by: Lai, Yi <[email protected]>
Closes: https://lore.kernel.org/linux-ide/ZxYz871I3Blsi30F@ly-workstation/
Reviewed-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Niklas Cassel <[email protected]>

Merge tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2024-10-22

1) Fix routing behavior that relies on L4 information
   for xfrm encapsulated packets.
   From Eyal Birger.

2) Remove leftovers of pernet policy_inexact lists.
   From Florian Westphal.

3) Validate new SA's prefixlen when the selector family is
   not set from userspace.
   From Sabrina Dubroca.

4) Fix a kernel-infoleak when dumping an auth algorithm.
   From Petr Vaganov.

Please pull or let me know if there are problems.

ipsec-2024-10-22

* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: fix one more kernel-infoleak in algo dumping
  xfrm: validate new SA's prefixlen using SA family when sel.family is unset
  xfrm: policy: remove last remnants of pernet inexact list
  xfrm: respect ip protocols rules criteria when performing dst lookups
  xfrm: extract dst lookup parameters into a struct
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'asoc-fix-v6.12-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

An uncomfortably large set of fixes due to me not getting round to
sending them for longer than I should due to travel and illness.  This
is mostly smaller driver specific changes, but there are a couple of
generic changes:

- Bumping the minimal topology ABI we check for during validation, the
   code had support for v4 removed previously but the update of the
   define used for initial validation was missed.
- Fix the assumption that DAPM structs will be embedded in a component
   which isn't true for card widgets when doing name comparisons, though
   fortunately this is rarely triggered.

We've pulled in one Soundwire fix which was part of a larger series
fixing cleanup issues in on Intel Soundwire systems.

bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()

We need `goto next_insn;` at the end of patching instead of `continue;`.
It currently works by accident by making verifier re-process patched
instructions.

Reported-by: Shung-Hsi Yu <[email protected]>
Fixes: 314a53623cd4 ("bpf: inline bpf_get_branch_snapshot() helper")
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

block: fix sanity checks in blk_rq_map_user_bvec

blk_rq_map_user_bvec contains a check bytes + bv->bv_len > nr_iter which
causes unnecessary failures in NVMe passthrough I/O, reproducible as
follows:

- register a 2 page, page-aligned buffer against a ring
- use that buffer to do a 1 page io_uring NVMe passthrough read

The second (i = 1) iteration of the loop in blk_rq_map_user_bvec will
then have nr_iter == 1 page, bytes == 1 page, bv->bv_len == 1 page, so
the check bytes + bv->bv_len > nr_iter will succeed, causing the I/O to
fail. This failure is unnecessary, as when the check succeeds, it means
we've checked the entire buffer that will be used by the request - i.e.
blk_rq_map_user_bvec should complete successfully. Therefore, terminate
the loop early and return successfully when the check bytes + bv->bv_len
> nr_iter succeeds.

While we're at it, also remove the check that all segments in the bvec
are single-page. While this seems to be true for all users of the
function, it doesn't appear to be required anywhere downstream.

CC: [email protected]
Signed-off-by: Xinyu Zhang <[email protected]>
Co-developed-by: Uday Shankar <[email protected]>
Signed-off-by: Uday Shankar <[email protected]>
Fixes: 37987547932c ("block: extend functionality to map bvec iterator")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

bpf,perf: Fix perf_event_detach_bpf_prog error handling

Peter reported that perf_event_detach_bpf_prog might skip to release
the bpf program for -ENOENT error from bpf_prog_array_copy.

This can't happen because bpf program is stored in perf event and is
detached and released only when perf event is freed.

Let's drop the -ENOENT check and make sure the bpf program is released
in any case.

Fixes: 170a7e3ea070 ("bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found")
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Closes: https://lore.kernel.org/lkml/[email protected]/

PCI/pwrctl: Abandon QCom WCN probe on pre-pwrseq device-trees

Old device trees for some platforms already define wifi nodes for the WCN
family of chips since before power sequencing was added upstream.

These nodes don't consume the regulator outputs from the PMU, and if we
allow this driver to bind to one of such "incomplete" nodes, we'll see a
kernel log error about the infinite probe deferral.

Extend the driver by adding a platform data struct matched against the
compatible. This struct contains the pwrseq target string as well as a
validation function called right after entering probe().

For Qualcomm WCN models, check the existence of the regulator supply
property that indicates the DT is already using power sequencing and return
-ENODEV if it's not there, indicating to the driver model that the device
should not be bound to the pwrctl driver.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 6140d185a43d ("PCI/pwrctl: Add a PCI power control driver for power sequenced devices")
Reported-by: Johan Hovold <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Bluetooth: ISO: Fix UAF on iso_sock_timeout

conn->sk maybe have been unlinked/freed while waiting for iso_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
iso_sk_list.

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>

Bluetooth: SCO: Fix UAF on sco_sock_timeout

conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
sco_sk_list.

Reported-by: [email protected]
Tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465
Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>

Bluetooth: hci_core: Disable works on hci_unregister_dev

This make use of disable_work_* on hci_unregister_dev since the hci_dev is
about to be freed new submissions are not disarable.

Fixes: 0d151a103775 ("Bluetooth: hci_core: cancel all works upon hci_unregister_dev()")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>

LoongArch: KVM: Mark hrtimer to expire in hard interrupt context

Like commit 2c0d278f3293f ("KVM: LAPIC: Mark hrtimer to expire in hard
interrupt context") and commit 9090825fa9974 ("KVM: arm/arm64: Let the
timer expire in hardirq context on RT"), On PREEMPT_RT enabled kernels
unmarked hrtimers are moved into soft interrupt expiry mode by default.
Then the timers are canceled from an preempt-notifier which is invoked
with disabled preemption which is not allowed on PREEMPT_RT.

The timer callback is short so in could be invoked in hard-IRQ context.
So let the timer expire on hard-IRQ context even on -RT.

This fix a "scheduling while atomic" bug for PREEMPT_RT enabled kernels:

BUG: scheduling while atomic: qemu-system-loo/1011/0x00000002
Modules linked in: amdgpu rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ns
CPU: 1 UID: 0 PID: 1011 Comm: qemu-system-loo Tainted: G        W          6.12.0-rc2+ #1774
Tainted: [W]=WARN
Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022
Stack : ffffffffffffffff 0000000000000000 9000000004e3ea38 9000000116744000
         90000001167475a0 0000000000000000 90000001167475a8 9000000005644830
         90000000058dc000 90000000058dbff8 9000000116747420 0000000000000001
         0000000000000001 6a613fc938313980 000000000790c000 90000001001c1140
         00000000000003fe 0000000000000001 000000000000000d 0000000000000003
         0000000000000030 00000000000003f3 000000000790c000 9000000116747830
         90000000057ef000 0000000000000000 9000000005644830 0000000000000004
         0000000000000000 90000000057f4b58 0000000000000001 9000000116747868
         900000000451b600 9000000005644830 9000000003a13998 0000000010000020
         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
         ...
Call Trace:
[<9000000003a13998>] show_stack+0x38/0x180
[<9000000004e3ea34>] dump_stack_lvl+0x84/0xc0
[<9000000003a71708>] __schedule_bug+0x48/0x60
[<9000000004e45734>] __schedule+0x1114/0x1660
[<9000000004e46040>] schedule_rtlock+0x20/0x60
[<9000000004e4e330>] rtlock_slowlock_locked+0x3f0/0x10a0
[<9000000004e4f038>] rt_spin_lock+0x58/0x80
[<9000000003b02d68>] hrtimer_cancel_wait_running+0x68/0xc0
[<9000000003b02e30>] hrtimer_cancel+0x70/0x80
[<ffff80000235eb70>] kvm_restore_timer+0x50/0x1a0 [kvm]
[<ffff8000023616c8>] kvm_arch_vcpu_load+0x68/0x2a0 [kvm]
[<ffff80000234c2d4>] kvm_sched_in+0x34/0x60 [kvm]
[<9000000003a749a0>] finish_task_switch.isra.0+0x140/0x2e0
[<9000000004e44a70>] __schedule+0x450/0x1660
[<9000000004e45cb0>] schedule+0x30/0x180
[<ffff800002354c70>] kvm_vcpu_block+0x70/0x120 [kvm]
[<ffff800002354d80>] kvm_vcpu_halt+0x60/0x3e0 [kvm]
[<ffff80000235b194>] kvm_handle_gspr+0x3f4/0x4e0 [kvm]
[<ffff80000235f548>] kvm_handle_exit+0x1c8/0x260 [kvm]

Reviewed-by: Bibo Mao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Make KASAN usable for variable cpu_vabits

Currently, KASAN on LoongArch assume the CPU VA bits is 48, which is
true for Loongson-3 series, but not for Loongson-2 series (only 40 or
lower), this patch fix that issue and make KASAN usable for variable
cpu_vabits.

Solution is very simple: Just define XRANGE_SHADOW_SHIFT which means
valid address length from VA_BITS to min(cpu_vabits, VA_BITS).

Cc: [email protected]
Signed-off-by: Kanglong Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()

If get_clock_desc() succeeds, it calls fget() for the clockid's fd,
and get the clk->rwsem read lock, so the error path should release
the lock to make the lock balance and fput the clockid's fd to make
the refcount balance and release the fd related resource.

However the below commit left the error path locked behind resulting in
unbalanced locking. Check timespec64_valid_strict() before
get_clock_desc() to fix it, because the "ts" is not changed
after that.

Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()")
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Acked-by: Anna-Maria Behnsen <[email protected]>
[[email protected]: fixed commit message typo]
Signed-off-by: Paolo Abeni <[email protected]>

r8169: avoid unsolicited interrupts

It was reported that after resume from suspend a PCI error is logged
and connectivity is broken. Error message is:
PCI error (cmd = 0x0407, status_errs = 0x0000)
The message seems to be a red herring as none of the error bits is set,
and the PCI command register value also is normal. Exception handling
for a PCI error includes a chip reset what apparently brakes connectivity
here. The interrupt status bit triggering the PCI error handling isn't
actually used on PCIe chip versions, so it's not clear why this bit is
set by the chip. Fix this by ignoring this bit on PCIe chip versions.

Fixes: 0e4851502f84 ("r8169: merge with version 8.001.00 of Realtek's r8168 driver")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219388
Tested-by: Atlas Yu <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

cifs: fix warning when destroy 'cifs_io_request_pool'

There's a issue as follows:
WARNING: CPU: 1 PID: 27826 at mm/slub.c:4698 free_large_kmalloc+0xac/0xe0
RIP: 0010:free_large_kmalloc+0xac/0xe0
Call Trace:
<TASK>
? __warn+0xea/0x330
mempool_destroy+0x13f/0x1d0
init_cifs+0xa50/0xff0 [cifs]
do_one_initcall+0xdc/0x550
do_init_module+0x22d/0x6b0
load_module+0x4e96/0x5ff0
init_module_from_file+0xcd/0x130
idempotent_init_module+0x330/0x620
__x64_sys_finit_module+0xb3/0x110
do_syscall_64+0xc1/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Obviously, 'cifs_io_request_pool' is not created by mempool_create().
So just use mempool_exit() to revert 'cifs_io_request_pool'.

Fixes: edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
Signed-off-by: Ye Bin <[email protected]>
Acked-by: David Howells <[email protected]
Signed-off-by: Steve French <[email protected]>

smb: client: Handle kstrdup failures for passwords

In smb3_reconfigure(), after duplicating ctx->password and
ctx->password2 with kstrdup(), we need to check for allocation
failures.

If ses->password allocation fails, return -ENOMEM.
If ses->password2 allocation fails, free ses->password, set it
to NULL, and return -ENOMEM.

Fixes: c1eb537bf456 ("cifs: allow changing password during remount")
Reviewed-by: David Howells <[email protected]
Signed-off-by: Haoxiang Li <[email protected]>
Signed-off-by: Henrique Carvalho <[email protected]>
Signed-off-by: Steve French <[email protected]>

ALSA: hda/realtek: Update default depop procedure

Old procedure has a chance to meet Headphone no output.

Fixes: c2d6af53a43f ("ALSA: hda/realtek - Add default procedure for suspend and resume state")
Signed-off-by: Kailang Yang <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

net: sched: use RCU read-side critical section in taprio_dump()

Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: sched: fix use-after-free in taprio_change()

In 'taprio_change()', 'admin' pointer may become dangling due to sched
switch / removal caused by 'advance_sched()', and critical section
protected by 'q->current_entry_lock' is too small to prevent from such
a scenario (which causes use-after-free detected by KASAN). Fix this
by prefer 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update
'admin' immediately before an attempt to schedule freeing.

Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

x86/sev: Ensure that RMP table fixups are reserved

The BIOS reserves RMP table memory via e820 reservations. This can still lead
to RMP page faults during kexec if the host tries to access memory within the
same 2MB region.

Commit

  400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec"

adjusts the e820 reservations for the RMP table so that the entire 2MB range
at the start/end of the RMP table is marked reserved.

The e820 reservations are then passed to firmware via SNP_INIT where they get
marked HV-Fixed.

The RMP table fixups are done after the e820 ranges have been added to
memblock, allowing the fixup ranges to still be allocated and used by the
system.

The problem is that this memory range is now marked reserved in the e820
tables and during SNP initialization these reserved ranges are marked as
HV-Fixed.  This means that the pages cannot be used by an SNP guest, only by
the hypervisor.

However, the memory management subsystem does not make this distinction and
can allocate one of those pages to an SNP guest. This will ultimately result
in RMPUPDATE failures associated with the guest, causing it to fail to start
or terminate when accessing the HV-Fixed page.

The issue is captured below with memblock=debug:

  [    0.000000] SEV-SNP: *** DEBUG: snp_probe_rmptable_info:352 - rmp_base=0x280d4800000, rmp_end=0x28357efffff
  ...
  [    0.000000] BIOS-provided physical RAM map:
  ...
  [    0.000000] BIOS-e820: [mem 0x00000280d4800000-0x0000028357efffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000028357f00000-0x0000028357ffffff] usable
  ...
  ...
  [    0.183593] memblock add: [0x0000028357f00000-0x0000028357ffffff] e820__memblock_setup+0x74/0xb0
  ...
  [    0.203179] MEMBLOCK configuration:
  [    0.207057]  memory size = 0x0000027d0d194000 reserved size = 0x0000000009ed2c00
  [    0.215299]  memory.cnt  = 0xb
  ...
  [    0.311192]  memory[0x9]     [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes flags: 0x0
  ...
  ...
  [    0.419110] SEV-SNP: Reserving start/end of RMP table on a 2MB boundary [0x0000028357e00000]
  [    0.428514] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
  [    0.428517] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
  [    0.428520] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
  ...
  ...
  [    5.604051] MEMBLOCK configuration:
  [    5.607922]  memory size = 0x0000027d0d194000 reserved size = 0x0000000011faae02
  [    5.616163]  memory.cnt  = 0xe
  ...
  [    5.754525]  memory[0xc]     [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes on node 0 flags: 0x0
  ...
  ...
  [   10.080295] Early memory node ranges[   10.168065]
  ...
  node   0: [mem 0x0000028357f00000-0x0000028357ffffff]
  ...
  ...
  [ 8149.348948] SEV-SNP: RMPUPDATE failed for PFN 28357f7c, pg_level: 1, ret: 2

As shown above, the memblock allocations show 1MB after the end of the RMP as
available for allocation, which is what the RMP table fixups have reserved.
This memory range subsequently gets allocated as SNP guest memory, resulting
in an RMPUPDATE failure.

This can potentially be fixed by not reserving the memory range in the e820
table, but that causes kexec failures when using the KEXEC_FILE_LOAD syscall.

The solution is to use memblock_reserve() to mark the memory reserved for the
system, ensuring that it cannot be allocated to an SNP guest.

Since HV-Fixed memory is still readable/writable by the host, this only ends
up being a problem if the memory in this range requires a page state change,
which generally will only happen when allocating memory in this range to be
used for running SNP guests, which is now possible with the SNP hypervisor
support in kernel 6.11.

Backporter note:

Fixes tag points to a 6.9 change but as the last paragraph above explains,
this whole thing can happen after 6.11 received SNP HV support, therefore
backporting to 6.9 is not really necessary.

  [ bp: Massage commit message. ]

Fixes: 400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec")
Suggested-by: Thomas Lendacky <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Cc: <[email protected]> # 6.11, see Backporter note above.
Link: https://lore.kernel.org/r/[email protected]

net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers

tcf_action_init() has logic for checking mismatches between action and
filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run
on the transition between the new tc_act_bind(flags) returning true (aka
now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning
false (aka action was not bound to classifier before). Otherwise, the
check is skipped.

For the case where an action is not standalone, but rather it was
created by a classifier and is bound to it, tcf_action_init() skips the
check entirely, and this means it allows mismatched flags to occur.

Taking the matchall classifier code path as an example (with mirred as
an action), the reason is the following:

1 | mall_change()
2 | -> mall_replace_hw_filter()
3 |   -> tcf_exts_validate_ex()
4 |      -> flags |= TCA_ACT_FLAGS_BIND;
5 |      -> tcf_action_init()
6 |         -> tcf_action_init_1()
7 |            -> a_o->init()
8 |               -> tcf_mirred_init()
9 |                  -> tcf_idr_create_from_flags()
10 |                     -> tcf_idr_create()
11 |                        -> p->tcfa_flags = flags;
12 |         -> tc_act_bind(flags))
13 |         -> tc_act_bind(act->tcfa_flags)

When invoked from tcf_exts_validate_ex() like matchall does (but other
classifiers validate their extensions as well), tcf_action_init() runs
in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by
line 4). So line 12 is always true, and line 13 is always true as well.
No transition ever takes place, and the check is skipped.

The code was added in this form in commit c86e0209dc77 ("flow_offload:
validate flags of filter and actions"), but I'm attributing the blame
even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and
TCA_ACT_FLAGS_SKIP_SW were added to the UAPI.

Following the development process of this change, the check did not
always exist in this form. A change took place between v3 [1] and v4 [2],
AFAIU due to review feedback that it doesn't make sense for action flags
to be different than classifier flags. I think I agree with that
feedback, but it was translated into code that omits enforcing this for
"classic" actions created at the same time with the filters themselves.

There are 3 more important cases to discuss. First there is this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1

which should be allowed, because prior to the concept of dedicated
action flags, it used to work and it used to mean the action inherited
the skip_sw/skip_hw flags from the classifier. It's not a mismatch.

Then we have this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_hw

where there is a mismatch and it should be rejected.

Finally, we have:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_sw

where the offload flags coincide, and this should be treated the same as
the first command based on inheritance, and accepted.

[1]: https://lore.kernel.org/netdev/20211028110646 [email protected]/
[2]: https://lore.kernel.org/netdev/20211118130805 [email protected]/
Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

tracing: Consider the NULL character when validating the event length

strlen() returns a string length excluding the null byte. If the string
length equals to the maximum buffer length, the buffer will have no
space for the NULL terminating character.

This commit checks this condition and returns failure for it.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: dec65d79fd26 ("tracing/probe: Check event name length correctly")
Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

tracing/probes: Fix MAX_TRACE_ARGS limit handling

When creating a trace_probe we would set nr_args prior to truncating the
arguments to MAX_TRACE_ARGS. However, we would only initialize arguments
up to the limit.

This caused invalid memory access when attempting to set up probes with
more than 128 fetchargs.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 UID: 0 PID: 1769 Comm: cat Not tainted 6.11.0-rc7+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
  RIP: 0010:__set_print_fmt+0x134/0x330

Resolve the issue by applying the MAX_TRACE_ARGS limit earlier. Return
an error when there are too many arguments instead of silently
truncating.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: 035ba76014c0 ("tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init")
Signed-off-by: Mikel Rychliski <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

selftests/bpf: Add test for passing in uninit mtu_len

Add a small test to pass an uninitialized mtu_len to the bpf_check_mtu()
helper to probe whether the verifier rejects it under !CAP_PERFMON.

  # ./vmtest.sh -- ./test_progs -t verifier_mtu
  [...]
  ./test_progs -t verifier_mtu
  [    1.414712] tsc: Refined TSC clocksource calibration: 3407.993 MHz
  [    1.415327] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcd52370, max_idle_ns: 440795242006 ns
  [    1.416463] clocksource: Switched to clocksource tsc
  [    1.429842] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.430283] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #510/1   verifier_mtu/uninit/mtu: write rejected:OK
  #510     verifier_mtu:OK
  Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: Add test for writes to .rodata

Add a small test to write a (verification-time) fixed vs unknown but
bounded-sized buffer into .rodata BPF map and assert that both get
rejected.

  # ./vmtest.sh -- ./test_progs -t verifier_const
  [...]
  ./test_progs -t verifier_const
  [    1.418717] tsc: Refined TSC clocksource calibration: 3407.994 MHz
  [    1.419113] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcde90a1, max_idle_ns: 440795222066 ns
  [    1.419972] clocksource: Switched to clocksource tsc
  [    1.449596] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.449958] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #475/1   verifier_const/rodata/strtol: write rejected:OK
  #475/2   verifier_const/bss/strtol: write accepted:OK
  #475/3   verifier_const/data/strtol: write accepted:OK
  #475/4   verifier_const/rodata/mtu: write rejected:OK
  #475/5   verifier_const/bss/mtu: write accepted:OK
  #475/6   verifier_const/data/mtu: write accepted:OK
  #475/7   verifier_const/rodata/mark: write with unknown reg rejected:OK
  #475/8   verifier_const/rodata/mark: write with unknown reg rejected:OK
  #475     verifier_const:OK
  #476/1   verifier_const_or/constant register |= constant should keep constant type:OK
  #476/2   verifier_const_or/constant register |= constant should not bypass stack boundary checks:OK
  #476/3   verifier_const_or/constant register |= constant register should keep constant type:OK
  #476/4   verifier_const_or/constant register |= constant register should not bypass stack boundary checks:OK
  #476     verifier_const_or:OK
  Summary: 2/12 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>