]> Git Repo - linux.git/log
linux.git
5 months agosmb: client: fix UAF in async decryption
Enzo Matsumiya [Thu, 26 Sep 2024 17:46:13 +0000 (14:46 -0300)]
smb: client: fix UAF in async decryption

Doing an async decryption (large read) crashes with a
slab-use-after-free way down in the crypto API.

Reproducer:
    # mount.cifs -o ...,seal,esize=1 //srv/share /mnt
    # dd if=/mnt/largefile of=/dev/null
    ...
    [  194.196391] ==================================================================
    [  194.196844] BUG: KASAN: slab-use-after-free in gf128mul_4k_lle+0xc1/0x110
    [  194.197269] Read of size 8 at addr ffff888112bd0448 by task kworker/u77:2/899
    [  194.197707]
    [  194.197818] CPU: 12 UID: 0 PID: 899 Comm: kworker/u77:2 Not tainted 6.11.0-lku-00028-gfca3ca14a17a-dirty #43
    [  194.198400] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
    [  194.199046] Workqueue: smb3decryptd smb2_decrypt_offload [cifs]
    [  194.200032] Call Trace:
    [  194.200191]  <TASK>
    [  194.200327]  dump_stack_lvl+0x4e/0x70
    [  194.200558]  ? gf128mul_4k_lle+0xc1/0x110
    [  194.200809]  print_report+0x174/0x505
    [  194.201040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
    [  194.201352]  ? srso_return_thunk+0x5/0x5f
    [  194.201604]  ? __virt_addr_valid+0xdf/0x1c0
    [  194.201868]  ? gf128mul_4k_lle+0xc1/0x110
    [  194.202128]  kasan_report+0xc8/0x150
    [  194.202361]  ? gf128mul_4k_lle+0xc1/0x110
    [  194.202616]  gf128mul_4k_lle+0xc1/0x110
    [  194.202863]  ghash_update+0x184/0x210
    [  194.203103]  shash_ahash_update+0x184/0x2a0
    [  194.203377]  ? __pfx_shash_ahash_update+0x10/0x10
    [  194.203651]  ? srso_return_thunk+0x5/0x5f
    [  194.203877]  ? crypto_gcm_init_common+0x1ba/0x340
    [  194.204142]  gcm_hash_assoc_remain_continue+0x10a/0x140
    [  194.204434]  crypt_message+0xec1/0x10a0 [cifs]
    [  194.206489]  ? __pfx_crypt_message+0x10/0x10 [cifs]
    [  194.208507]  ? srso_return_thunk+0x5/0x5f
    [  194.209205]  ? srso_return_thunk+0x5/0x5f
    [  194.209925]  ? srso_return_thunk+0x5/0x5f
    [  194.210443]  ? srso_return_thunk+0x5/0x5f
    [  194.211037]  decrypt_raw_data+0x15f/0x250 [cifs]
    [  194.212906]  ? __pfx_decrypt_raw_data+0x10/0x10 [cifs]
    [  194.214670]  ? srso_return_thunk+0x5/0x5f
    [  194.215193]  smb2_decrypt_offload+0x12a/0x6c0 [cifs]

This is because TFM is being used in parallel.

Fix this by allocating a new AEAD TFM for async decryption, but keep
the existing one for synchronous READ cases (similar to what is done
in smb3_calc_signature()).

Also remove the calls to aead_request_set_callback() and
crypto_wait_req() since it's always going to be a synchronous operation.

Signed-off-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Steve French <[email protected]>
5 months agonetfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)
David Howells [Thu, 26 Sep 2024 13:58:30 +0000 (14:58 +0100)]
netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)

In netfslib, a buffered writeback operation has a 'write queue' of folios
that are being written, held in a linear sequence of folio_queue structs.
The 'issuer' adds new folio_queues on the leading edge of the queue and
populates each one progressively; the 'collector' pops them off the
trailing edge and discards them and the folios they point to as they are
consumed.

The queue is required to always retain at least one folio_queue structure.
This allows the queue to be accessed without locking and with just a bit of
barriering.

When a new subrequest is prepared, its ->io_iter iterator is pointed at the
current end of the write queue and then the iterator is extended as more
data is added to the queue until the subrequest is committed.

Now, the problem is that the folio_queue at the leading edge of the write
queue when a subrequest is prepared might have been entirely consumed - but
not yet removed from the queue as it is the only remaining one and is
preventing the queue from collapsing.

So, what happens is that subreq->io_iter is pointed at the spent
folio_queue, then a new folio_queue is added, and, at that point, the
collector is at entirely at liberty to immediately delete the spent
folio_queue.

This leaves the subreq->io_iter pointing at a freed object.  If the system
is lucky, iterate_folioq() sees ->io_iter, sees the as-yet uncorrupted
freed object and advances to the next folio_queue in the queue.

In the case seen, however, the freed object gets recycled and put back onto
the queue at the tail and filled to the end.  This confuses
iterate_folioq() and it tries to step ->next, which may be NULL - resulting
in an oops.

Fix this by the following means:

 (1) When preparing a write subrequest, make sure there's a folio_queue
     struct with space in it at the leading edge of the queue.  A function
     to make space is split out of the function to append a folio so that
     it can be called for this purpose.

 (2) If the request struct iterator is pointing to a completely spent
     folio_queue when we make space, then advance the iterator to the newly
     allocated folio_queue.  The subrequest's iterator will then be set
     from this.

The oops could be triggered using the generic/346 xfstest with a filesystem
on9P over TCP with cache=loose.  The oops looked something like:

 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 ...
 RIP: 0010:_copy_from_iter+0x2db/0x530
 ...
 Call Trace:
  <TASK>
 ...
  p9pdu_vwritef+0x3d8/0x5d0
  p9_client_prepare_req+0xa8/0x140
  p9_client_rpc+0x81/0x280
  p9_client_write+0xcf/0x1c0
  v9fs_issue_write+0x87/0xc0
  netfs_advance_write+0xa0/0xb0
  netfs_write_folio.isra.0+0x42d/0x500
  netfs_writepages+0x15a/0x1f0
  do_writepages+0xd1/0x220
  filemap_fdatawrite_wbc+0x5c/0x80
  v9fs_mmap_vm_close+0x7d/0xb0
  remove_vma+0x35/0x70
  vms_complete_munmap_vmas+0x11a/0x170
  do_vmi_align_munmap+0x17d/0x1c0
  do_vmi_munmap+0x13e/0x150
  __vm_munmap+0x92/0xd0
  __x64_sys_munmap+0x17/0x20
  do_syscall_64+0x80/0xe0
  entry_SYSCALL_64_after_hwframe+0x71/0x79

This also fixed a similar-looking issue with cifs and generic/074.

Fixes: cd0277ed0c18 ("netfs: Use new folio_queue data type and iterator instead of xarray iter")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: David Howells <[email protected]>
Tested-by: kernel test robot <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Paulo Alcantara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Steve French <[email protected]>
5 months agodrm/amd/pm: update workload mask after the setting
Kenneth Feng [Fri, 20 Sep 2024 11:05:37 +0000 (19:05 +0800)]
drm/amd/pm: update workload mask after the setting

update workload mask after the setting.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3625
Signed-off-by: Kenneth Feng <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
5 months agodrm/amdgpu: bump driver version for cleared VRAM
Alex Deucher [Fri, 6 Sep 2024 17:51:06 +0000 (13:51 -0400)]
drm/amdgpu: bump driver version for cleared VRAM

Driver now clears VRAM on allocation.  Bump the
driver version so mesa knows when it will get
cleared vram by default.

Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.11.x
5 months agodrm/amdgpu: fix vbios fetching for SR-IOV
Alex Deucher [Wed, 25 Sep 2024 18:17:53 +0000 (14:17 -0400)]
drm/amdgpu: fix vbios fetching for SR-IOV

SR-IOV fetches the vbios from VRAM in some cases.
Re-enable the VRAM path for dGPUs and rename the function
to make it clear that it is not IGP specific.

Fixes: 042658d17a54 ("drm/amdgpu: clean up vbios fetching code")
Reviewed-by: Yang Wang <[email protected]>
Tested-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
5 months agodrm/amdgpu: fix PTE copy corruption for sdma 7
Frank Min [Wed, 25 Sep 2024 03:39:06 +0000 (11:39 +0800)]
drm/amdgpu: fix PTE copy corruption for sdma 7

Without setting dcc bit, there is ramdon PTE copy corruption on sdma 7.

so add this bit and update the packet format accordingly.

Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Frank Min <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.11.x
5 months agoocfs2: fix uninit-value in ocfs2_get_block()
Joseph Qi [Wed, 25 Sep 2024 09:06:00 +0000 (17:06 +0800)]
ocfs2: fix uninit-value in ocfs2_get_block()

syzbot reported an uninit-value BUG:

BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized.  So the error log will trigger the above uninit-value
access.

The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.

Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <[email protected]>
Reported-by: [email protected]
Tested-by: [email protected]
Cc: Heming Zhao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agozram: don't free statically defined names
Andrey Skvortsov [Mon, 23 Sep 2024 16:48:43 +0000 (19:48 +0300)]
zram: don't free statically defined names

When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP, so
we need to make sure that we don't attempt to kfree() the statically
defined compressor name.

This is detected by KASAN.

==================================================================
  Call trace:
   kfree+0x60/0x3a0
   zram_destroy_comps+0x98/0x198 [zram]
   zram_reset_device+0x22c/0x4a8 [zram]
   reset_store+0x1bc/0x2d8 [zram]
   dev_attr_store+0x44/0x80
   sysfs_kf_write+0xfc/0x188
   kernfs_fop_write_iter+0x28c/0x428
   vfs_write+0x4dc/0x9b8
   ksys_write+0x100/0x1f8
   __arm64_sys_write+0x74/0xb8
   invoke_syscall+0xd8/0x260
   el0_svc_common.constprop.0+0xb4/0x240
   do_el0_svc+0x48/0x68
   el0_svc+0x40/0xc8
   el0t_64_sync_handler+0x120/0x130
   el0t_64_sync+0x190/0x198
==================================================================

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Signed-off-by: Andrey Skvortsov <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Reported-by: Venkat Rao Bagalkote <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Tested-by: Venkat Rao Bagalkote <[email protected]>
Cc: Christophe JAILLET <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Venkat Rao Bagalkote <[email protected]>
Cc: Chris Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomemory tiers: use default_dram_perf_ref_source in log message
Huang Ying [Fri, 20 Sep 2024 01:47:40 +0000 (09:47 +0800)]
memory tiers: use default_dram_perf_ref_source in log message

Commit 3718c02dbd4c ("acpi, hmat: calculate abstract distance with HMAT")
added a default_dram_perf_ref_source variable that was initialized but
never used.  This causes kmemleak to report the following memory leak:

unreferenced object 0xff11000225a47b60 (size 16):
  comm "swapper/0", pid 1, jiffies 4294761654
  hex dump (first 16 bytes):
    41 43 50 49 20 48 4d 41 54 00 c1 4b 7d b7 75 7c  ACPI HMAT..K}.u|
  backtrace (crc e6d0e7b2):
    [<ffffffff95d5afdb>] __kmalloc_node_track_caller_noprof+0x36b/0x440
    [<ffffffff95c276d6>] kstrdup+0x36/0x60
    [<ffffffff95dfabfa>] mt_set_default_dram_perf+0x23a/0x2c0
    [<ffffffff9ad64733>] hmat_init+0x2b3/0x660
    [<ffffffff95203cec>] do_one_initcall+0x11c/0x5c0
    [<ffffffff9ac9cfc4>] do_initcalls+0x1b4/0x1f0
    [<ffffffff9ac9d52e>] kernel_init_freeable+0x4ae/0x520
    [<ffffffff97c789cc>] kernel_init+0x1c/0x150
    [<ffffffff952aecd1>] ret_from_fork+0x31/0x70
    [<ffffffff9520b18a>] ret_from_fork_asm+0x1a/0x30

This reminds us that we forget to use the performance data source
information.  So, use the variable in the error log message to help
identify the root cause of inconsistent performance number.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3718c02dbd4c ("acpi, hmat: calculate abstract distance with HMAT")
Signed-off-by: "Huang, Ying" <[email protected]>
Reported-by: Waiman Long <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Dave Jiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agoRevert "list: test: fix tests for list_cut_position()"
Guenter Roeck [Sun, 22 Sep 2024 15:05:07 +0000 (08:05 -0700)]
Revert "list: test: fix tests for list_cut_position()"

This reverts commit e620799c414a035dea1208bcb51c869744931dbb.

The commit introduces unit test failures.

     Expected cur == &entries[i], but
         cur == 0000037fffadfd80
         &entries[i] == 0000037fffadfd60
     # list_test_list_cut_position: pass:0 fail:1 skip:0 total:1
     not ok 21 list_test_list_cut_position
     # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
     Expected cur == &entries[i], but
         cur == 0000037fffa9fd70
         &entries[i] == 0000037fffa9fd60
     # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
     Expected cur == &entries[i], but
         cur == 0000037fffa9fd80
         &entries[i] == 0000037fffa9fd70

Revert it.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e620799c414a ("list: test: fix tests for list_cut_position()")
Signed-off-by: Guenter Roeck <[email protected]>
Cc: I Hsin Cheng <[email protected]>
Cc: David Gow <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agokselftests: mm: fix wrong __NR_userfaultfd value
Muhammad Usama Anjum [Mon, 23 Sep 2024 05:38:36 +0000 (10:38 +0500)]
kselftests: mm: fix wrong __NR_userfaultfd value

grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282

The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323

The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included.  It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323.  Fix the header to asm/unistd.h.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agocompiler.h: specify correct attribute for .rodata..c_jump_table
Tiezhu Yang [Tue, 24 Sep 2024 06:27:10 +0000 (14:27 +0800)]
compiler.h: specify correct attribute for .rodata..c_jump_table

Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:

  Warning: setting incorrect section attributes for .rodata..c_jump_table

This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:

  $ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
  [34] .rodata..c_j[...] PROGBITS         0000000000000000  0000d2e0
       0000000000000800  0000000000000000  WA       0     0     8

There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.

Before:

  $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
   21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA

After:

  $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
   21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]> [6.9+]
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/damon/Kconfig: update DAMON doc URL
Diederik de Haas [Tue, 24 Sep 2024 08:21:46 +0000 (10:21 +0200)]
mm/damon/Kconfig: update DAMON doc URL

The old URL doesn't really work anymore and as the documentation has been
integrated in the main kernel documentation site, change the URL to point
to that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Diederik de Haas <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm: kfence: fix elapsed time for allocated/freed track
qiwu.chen [Tue, 24 Sep 2024 08:50:04 +0000 (16:50 +0800)]
mm: kfence: fix elapsed time for allocated/freed track

Fix elapsed time for the allocated/freed track introduced by commit
62e73fd85d7bf.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 62e73fd85d7b ("mm: kfence: print the elapsed time for allocated/freed track")
Signed-off-by: qiwu.chen <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agoocfs2: fix deadlock in ocfs2_get_system_file_inode
Mohammed Anees [Tue, 24 Sep 2024 09:32:57 +0000 (09:32 +0000)]
ocfs2: fix deadlock in ocfs2_get_system_file_inode

syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].

The scenario is depicted here,

CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
                               lock(&osb->system_file_mutex);
                               lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);

The function calls which could lead to this are:

CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);

CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);

This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.

[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mohammed Anees <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reported-by: <[email protected]>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: [email protected]
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agoocfs2: reserve space for inline xattr before attaching reflink tree
Gautham Ananthakrishna [Wed, 18 Sep 2024 06:38:44 +0000 (06:38 +0000)]
ocfs2: reserve space for inline xattr before attaching reflink tree

One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption.  Upon troubleshooting,
the fsck -fn output showed the below corruption

[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227.  Clamp the next record value? n

The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.

        Inode: 33080590   Mode: 0640   Generation: 2619713622 (0x9c25a856)
        FS Generation: 904309833 (0x35e6ac49)
        CRC32: 00000000   ECC: 0000
        Type: Regular   Attr: 0x0   Flags: Valid
        Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
        Extended Attributes Block: 0  Extended Attributes Inline Size: 256
        User: 0 (root)   Group: 0 (root)   Size: 281320357888
        Links: 1   Clusters: 141738
        ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
        atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
        mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
        dtime: 0x0 -- Wed Dec 31 17:00:00 1969
        Refcount Block: 2777346
        Last Extblk: 2886943   Orphan Slot: 0
        Sub Alloc Slot: 0   Sub Alloc Bit: 14
        Tree Depth: 1   Count: 227   Next Free Rec: 230
        ## Offset        Clusters       Block#
        0  0             2310           2776351
        1  2310          2139           2777375
        2  4449          1221           2778399
        3  5670          731            2779423
        4  6401          566            2780447
        .......          ....           .......
        .......          ....           .......

The issue was in the reflink workfow while reserving space for inline
xattr.  The problematic function is ocfs2_reflink_xattr_inline().  By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode.  At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block.  It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.

The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm: migrate: annotate data-race in migrate_folio_unmap()
Jeongjun Park [Tue, 24 Sep 2024 13:00:53 +0000 (22:00 +0900)]
mm: migrate: annotate data-race in migrate_folio_unmap()

I found a report from syzbot [1]

This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.

Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.

[1]

==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch

write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
 page_cache_delete mm/filemap.c:153 [inline]
 __filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
 filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
 truncate_inode_folio+0x42/0x50 mm/truncate.c:178
 shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
 shmem_truncate_range mm/shmem.c:1144 [inline]
 shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
 evict+0x2f0/0x580 fs/inode.c:731
 iput_final fs/inode.c:1883 [inline]
 iput+0x42a/0x5b0 fs/inode.c:1909
 dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
 __dentry_kill+0x18b/0x4c0 fs/dcache.c:615
 dput+0x5c/0xd0 fs/dcache.c:857
 __fput+0x3fb/0x6d0 fs/file_table.c:439
 ____fput+0x1c/0x30 fs/file_table.c:459
 task_work_run+0x13a/0x1a0 kernel/task_work.c:228
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
 do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
 __folio_test_movable include/linux/page-flags.h:699 [inline]
 migrate_folio_unmap mm/migrate.c:1199 [inline]
 migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
 migrate_pages_sync mm/migrate.c:1963 [inline]
 migrate_pages+0xff1/0x1820 mm/migrate.c:2072
 do_mbind mm/mempolicy.c:1390 [inline]
 kernel_mbind mm/mempolicy.c:1533 [inline]
 __do_sys_mbind mm/mempolicy.c:1607 [inline]
 __se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
 __x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
 x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0xffff888127601078 -> 0x0000000000000000

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/hugetlb: simplify refs in memfd_alloc_folio
Steve Sistare [Wed, 4 Sep 2024 19:41:08 +0000 (12:41 -0700)]
mm/hugetlb: simplify refs in memfd_alloc_folio

The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
delete the matching folio_put in memfd_pin_folios.  This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails.  That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio.
Delete the latter.

This is a continuation of the fix
  "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"

[[email protected]: remove explicit call to free_huge_folio(), per Matthew]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Suggested-by: Vivek Kasireddy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/gup: fix memfd_pin_folios alloc race panic
Steve Sistare [Tue, 3 Sep 2024 14:25:21 +0000 (07:25 -0700)]
mm/gup: fix memfd_pin_folios alloc race panic

If memfd_pin_folios tries to create a hugetlb page, but someone else
already did, then folio gets the value -EEXIST here:

        folio = memfd_alloc_folio(memfd, start_idx);
        if (IS_ERR(folio)) {
                ret = PTR_ERR(folio);
                if (ret != -EEXIST)
                        goto err;

then on the next trip through the "while start_idx" loop we panic here:

        if (folio) {
                folio_put(folio);

To fix, set the folio to NULL on error.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Vivek Kasireddy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/gup: fix memfd_pin_folios hugetlb page allocation
Steve Sistare [Tue, 3 Sep 2024 14:25:20 +0000 (07:25 -0700)]
mm/gup: fix memfd_pin_folios hugetlb page allocation

When memfd_pin_folios -> memfd_alloc_folio creates a hugetlb page, the
index is wrong.  The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever.  To fix,
adjust the index for the huge_page_order.

memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Vivek Kasireddy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
Steve Sistare [Tue, 3 Sep 2024 14:25:19 +0000 (07:25 -0700)]
mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak

memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
if the pages were not already faulted in.  During a normal page fault,
resv_huge_pages is consumed here:

hugetlb_fault()
  alloc_hugetlb_folio()
    dequeue_hugetlb_folio_vma()
      dequeue_hugetlb_folio_nodemask()
        dequeue_hugetlb_folio_node_exact()
          free_huge_pages--
      resv_huge_pages--

During memfd_pin_folios, the page is created by calling
alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
resv_huge_pages is not modified:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        free_huge_pages--

alloc_hugetlb_folio_nodemask has other callers that must not modify
resv_huge_pages.  Therefore, to fix, define an alternate version of
alloc_hugetlb_folio_nodemask for this call site that adjusts
resv_huge_pages.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Vivek Kasireddy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/hugetlb: fix memfd_pin_folios free_huge_pages leak
Steve Sistare [Tue, 3 Sep 2024 14:25:18 +0000 (07:25 -0700)]
mm/hugetlb: fix memfd_pin_folios free_huge_pages leak

memfd_pin_folios followed by unpin_folios fails to restore free_huge_pages
if the pages were not already faulted in, because the folio refcount for
pages created by memfd_alloc_folio never goes to 0.  memfd_pin_folios
needs another folio_put to undo the folio_try_get below:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        folio_ref_unfreeze(folio, 1);    ; adds 1 refcount
  folio_try_get()                        ; adds 1 refcount
  hugetlb_add_to_page_cache()            ; adds 512 refcount (on x86)

With the fix, after memfd_pin_folios + unpin_folios, the refcount for the
(unfaulted) page is 512, which is correct, as the refcount for a faulted
unpinned page is 513.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Acked-by: Vivek Kasireddy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm/filemap: fix filemap_get_folios_contig THP panic
Steve Sistare [Tue, 3 Sep 2024 14:25:17 +0000 (07:25 -0700)]
mm/filemap: fix filemap_get_folios_contig THP panic

Patch series "memfd-pin huge page fixes".

Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
pages and THP.  The hugetlb bugs only bite when the page is not yet
faulted in when memfd_pin_folios is called.  The THP bug bites when the
starting offset passed to memfd_pin_folios is not huge page aligned.  See
the commit messages for details.

This patch (of 5):

memfd_pin_folios on memory backed by THP panics if the requested start
offset is not huge page aligned:

BUG: kernel NULL pointer dereference, address: 0000000000000036
RIP: 0010:filemap_get_folios_contig+0xdf/0x290
RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002

The fault occurs here, because xas_load returns a folio with value 2:

    filemap_get_folios_contig()
        for (folio = xas_load(&xas); folio && xas.xa_index <= end;
                        folio = xas_next(&xas)) {
                ...
                if (!folio_try_get(folio))   <-- BOOM

"2" is an xarray sibling entry.  We get it because memfd_pin_folios does
not round the indices passed to filemap_get_folios_contig to huge page
boundaries for THP, so we load from the middle of a huge page range see a
sibling.  (It does round for hugetlbfs, at the is_file_hugepages test).

To fix, if the folio is a sibling, then return the next index as the
starting point for the next call to filemap_get_folios_contig.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vivek Kasireddy <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agomm: make SPLIT_PTE_PTLOCKS depend on SMP
Guenter Roeck [Tue, 24 Sep 2024 15:42:05 +0000 (08:42 -0700)]
mm: make SPLIT_PTE_PTLOCKS depend on SMP

SPLIT_PTE_PTLOCKS depends on "NR_CPUS >= 4".  Unfortunately, that
evaluates to true if there is no NR_CPUS configuration option.  This
results in CONFIG_SPLIT_PTE_PTLOCKS=y for mac_defconfig.  This in turn
causes the m68k "q800" and "virt" machines to crash in qemu if debugging
options are enabled.

Making CONFIG_SPLIT_PTE_PTLOCKS dependent on the existence of NR_CPUS does
not work since a dependency on the existence of a numeric Kconfig entry
always evaluates to false.  Example:

config HAVE_NO_NR_CPUS
       def_bool y
       depends on !NR_CPUS

After adding this to a Kconfig file, "make defconfig" includes:
$ grep NR_CPUS .config
CONFIG_NR_CPUS=64
CONFIG_HAVE_NO_NR_CPUS=y

Defining NR_CPUS for m68k does not help either since many architectures
define NR_CPUS only for SMP configurations.

Make SPLIT_PTE_PTLOCKS depend on SMP instead to solve the problem.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 394290cba966 ("mm: turn USE_SPLIT_PTE_PTLOCKS / USE_SPLIT_PTE_PTLOCKS into Kconfig options")
Signed-off-by: Guenter Roeck <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agotools: fix shared radix-tree build
Lorenzo Stoakes [Tue, 24 Sep 2024 18:07:24 +0000 (19:07 +0100)]
tools: fix shared radix-tree build

The shared radix-tree build is not correctly recompiling when
lib/maple_tree.c and lib/test_maple_tree.c are modified - fix this by
adding these core components to the SHARED_DEPS list.

Additionally, add missing header guards to shared header files.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 74579d8dab47 ("tools: separate out shared radix-tree components")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Tested-by: Sidhartha Kumar <[email protected]>
Cc: "Liam R. Howlett" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
5 months agoMerge tag 'drm-intel-next-fixes-2024-09-26' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Thu, 26 Sep 2024 20:30:21 +0000 (06:30 +1000)]
Merge tag 'drm-intel-next-fixes-2024-09-26' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

- Fix colorimetry detection for DP

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
5 months agoMerge tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Thu, 26 Sep 2024 19:00:25 +0000 (12:00 -0700)]
Merge tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC update from Arnd Bergmann:
 "Convert ep93xx to devicetree

  This concludes a long journey towards replacing the old board files
  with devictree description on the Cirrus Logic EP93xx platform.

  Nikita Shubin has been working on this for a long time, for details
  see the last post on

    https://lore.kernel.org/lkml/20240909-ep93xx-v12-0-e86ab2423d4b@maquefel.me/"

* tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  dt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples
  MAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer
  soc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config
  net: cirrus: use u8 for addr to calm down sparse
  dmaengine: cirrus: use snprintf() to calm down gcc 13.3.0
  dmaengine: ep93xx: Fix a NULL vs IS_ERR() check in probe()
  pinctrl: ep93xx: Fix raster pins typo
  spi: ep93xx: update kerneldoc comments for ep93xx_spi
  clk: ep93xx: Fix off by one in ep93xx_div_recalc_rate()
  clk: ep93xx: add module license
  dmaengine: cirrus: remove platform code
  ASoC: cirrus: edb93xx: Delete driver
  ARM: ep93xx: soc: drop defines
  ARM: ep93xx: delete all boardfiles
  ata: pata_ep93xx: remove legacy pinctrl use
  pwm: ep93xx: drop legacy pinctrl
  ARM: ep93xx: DT for the Cirrus ep93xx SoC platforms
  ARM: dts: ep93xx: Add EDB9302 DT
  ARM: dts: ep93xx: add ts7250 board
  ARM: dts: add Cirrus EP93XX SoC .dtsi
  ...

5 months agoMerge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd...
Linus Torvalds [Thu, 26 Sep 2024 18:54:40 +0000 (11:54 -0700)]
Merge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "These are only two small patches, one cleanup for arch/alpha and a
  preparation patch cleaning up the handling of runtime constants in the
  linker scripts"

* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  runtime constants: move list of constants to vmlinux.lds.h
  alpha: no need to include asm/xchg.h twice

5 months agoMerge tag 'efi-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Linus Torvalds [Thu, 26 Sep 2024 18:44:55 +0000 (11:44 -0700)]
Merge tag 'efi-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Not a lot happening in EFI land this cycle.

   - Prevent kexec from crashing on a corrupted TPM log by using a
     memory type that is reserved by default

   - Log correctable errors reported via CPER

   - A couple of cosmetic fixes"

* tag 'efi-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Remove redundant null pointer checks in efi_debugfs_init()
  efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption
  efi/cper: Print correctable AER information
  efi: Remove unused declaration efi_initialize_iomem_resources()

5 months agoRevert "binfmt_elf, coredump: Log the reason of the failed core dumps"
Linus Torvalds [Thu, 26 Sep 2024 18:39:02 +0000 (11:39 -0700)]
Revert "binfmt_elf, coredump: Log the reason of the failed core dumps"

This reverts commit fb97d2eb542faf19a8725afbd75cbc2518903210.

The logging was questionable to begin with, but it seems to actively
deadlock on the task lock.

 "On second thought, let's not log core dump failures. 'Tis a silly place"

because if you can't tell your core dump is truncated, maybe you should
just fix your debugger instead of adding bugs to the kernel.

Reported-by: Vegard Nossum <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>
5 months agodma-mapping: fix DMA API tracing for chained scatterlists
Christoph Hellwig [Thu, 26 Sep 2024 06:35:24 +0000 (08:35 +0200)]
dma-mapping: fix DMA API tracing for chained scatterlists

scatterlist allocations can be chained, and thus all iterations need to
use the chain-aware iterators.  Switch the newly added tracing to use the
proper iterators so that they work with chained scatterlists.

Fixes: 038eb433dc14 ("dma-mapping: add tracing for dma-mapping API calls")
Reported-by: [email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Tested-by: [email protected]
5 months agox86/cpu: Add two Intel CPU model numbers
Tony Luck [Mon, 23 Sep 2024 17:37:50 +0000 (10:37 -0700)]
x86/cpu: Add two Intel CPU model numbers

Pantherlake is a mobile CPU. Diamond Rapids next generation Xeon.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Link: https://lore.kernel.org/all/20240923173750.16874-1-tony.luck%40intel.com
5 months agoMerge tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 26 Sep 2024 17:27:10 +0000 (10:27 -0700)]
Merge tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  It looks like that most people are still traveling: both the ML volume
  and the processing capacity are low.

  Previous releases - regressions:

    - netfilter:
        - nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
        - nf_tables: keep deleted flowtable hooks until after RCU

    - tcp: check skb is non-NULL in tcp_rto_delta_us()

    - phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not
      present

    - eth: virtio_net: fix mismatched buf address when unmapping for
      small packets

    - eth: stmmac: fix zero-division error when disabling tc cbs

    - eth: bonding: fix unnecessary warnings and logs from
      bond_xdp_get_xmit_slave()

  Previous releases - always broken:

    - netfilter:
        - fix clash resolution for bidirectional flows
        - fix allocation with no memcg accounting

    - eth: r8169: add tally counter fields added with RTL8125

    - eth: ravb: fix rx and tx frame size limit"

* tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  selftests: netfilter: Avoid hanging ipvs.sh
  kselftest: add test for nfqueue induced conntrack race
  netfilter: nfnetlink_queue: remove old clash resolution logic
  netfilter: nf_tables: missing objects with no memcg accounting
  netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
  netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
  netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
  netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
  docs: tproxy: ignore non-transparent sockets in iptables
  netfilter: ctnetlink: Guard possible unused functions
  selftests: netfilter: nft_tproxy.sh: add tcp tests
  selftests: netfilter: add reverse-clash resolution test case
  netfilter: conntrack: add clash resolution for reverse collisions
  netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
  selftests/net: packetdrill: increase timing tolerance in debug mode
  usbnet: fix cyclical race on disconnect with work queue
  net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled
  virtio_net: Fix mismatched buf address when unmapping for small packets
  bonding: Fix unnecessary warnings and logs from bond_xdp_get_xmit_slave()
  r8169: add missing MODULE_FIRMWARE entry for RTL8126A rev.b
  ...

5 months agoMerge tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Thu, 26 Sep 2024 17:13:08 +0000 (10:13 -0700)]
Merge tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver updates from Greg KH:
 "Here is the "big" set of char/misc and other driver subsystem changes
  for 6.12-rc1.

  Lots of changes in here, primarily dominated by the usual IIO driver
  updates and additions, but there are also small driver subsystem
  updates all over the place. Included in here are:

   - lots and lots of new IIO drivers and updates to existing ones

   - interconnect subsystem updates and new drivers

   - nvmem subsystem updates and new drivers

   - mhi driver updates

   - power supply subsystem updates

   - kobj_type const work for many different small subsystems

   - comedi driver fix

   - coresight subsystem and driver updates

   - fpga subsystem improvements

   - slimbus fixups

   - binder new feature addition for "frozen" notifications

   - lots and lots of other small driver updates and cleanups

  All of these have been in linux-next for a long time with no reported
  problems"

* tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (354 commits)
  greybus: gb-beagleplay: Add firmware upload API
  arm64: dts: ti: k3-am625-beagleplay: Add bootloader-backdoor-gpios to cc1352p7
  dt-bindings: net: ti,cc1352p7: Add bootloader-backdoor-gpios
  MAINTAINERS: Update path for U-Boot environment variables YAML
  nvmem: layouts: add U-Boot env layout
  comedi: ni_routing: tools: Check when the file could not be opened
  ocxl: Remove the unused declarations in headr file
  hpet: Fix the wrong format specifier
  uio: Constify struct kobj_type
  cxl: Constify struct kobj_type
  binder: modify the comment for binder_proc_unlock
  iio: adc: axp20x_adc: add support for AXP717 ADC
  dt-bindings: iio: adc: Add AXP717 compatible
  iio: adc: axp20x_adc: Add adc_en1 and adc_en2 to axp_data
  w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0
  tools: iio: rm .*.cmd when make clean
  iio: adc: standardize on formatting for id match tables
  iio: proximity: aw96103: Add support for aw96103/aw96105 proximity sensor
  bus: mhi: host: pci_generic: Enable EDL trigger for Foxconn modems
  bus: mhi: host: pci_generic: Update EDL firmware path for Foxconn modems
  ...

5 months agoMerge tag 'staging-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Thu, 26 Sep 2024 17:04:35 +0000 (10:04 -0700)]
Merge tag 'staging-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
 "Here is the big set of staging driver cleanups and removals for
  6.12-rc1.

  Nothing exciting here, just slow, constant, forward progress in
  removing code and cleaning up some old drivers, along with removing
  one of them that was not being used anymore at all. In discussions
  with some developers this past week, even more deletions will be
  happening for the next major merge window, as we seems to have code
  here that obviously no one is using anymore.

  Along with the normal cleanups is the good vme_user code forward
  progress, the one major bright spot in the staging subsystem for code
  that people rely on, and is getting good development behind it.
  Hopefully it can graduate out of staging "soon".

  All of these changes have been in linux-next for a long time with no
  reported problems"

* tag 'staging-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (141 commits)
  staging: vt6655: Rename variable apTD1Rings
  staging: vt6655: Rename variable apTD0Rings
  staging: rtl8723bs: remove unused 'poll_cnt' from rtw_set_rpwm()
  staging: rtl8723bs: remove unused cnt from recv_func()
  staging: rtl8723bs: remove unused efuseValue from efuse_OneByteWrite()
  staging: rtl8712: remove unused drvinfo_sz from update_recvframe_attrib
  staging: vt6655: mac.h: Fix possible precedence issue in macros
  staging: rtl8723bs: include: Remove spaces before tabs in rtw_security.h
  staging: rtl8723bs: include: Fix trailing */ position in rtw_security.h
  staging: rtl8723bs: include: Fix indent for else block struct in rtw_security.h
  staging: rtl8723bs: include: Fix indent for struct _byte_ in rtw_security.h
  staging: rtl8723bs: include: Fix use of tabs for indent in rtw_security.h
  staging: rtl8723bs: include: Fix indent for switch block in rtw_security.h
  staging: rtl8723bs: include: Fix indent for switch case in rtw_security.h
  staging: rtl8723bs: include: Fix open brace position in rtw_security.h
  staging: nvec: Use IRQF_NO_AUTOEN flag in request_irq()
  staging: rtl8723bs: Remove unused file rtw_rf.c
  staging: rtl8723bs: Remove unused function rtw_ch2freq
  staging: rtl8723bs: Remove unused files rtw_debug.c and rtw_debug.h
  staging: rtl8723bs: Remove unused function dump_4_regs
  ...

5 months agoMerge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Thu, 26 Sep 2024 16:59:50 +0000 (09:59 -0700)]
Merge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver updates from Greg KH:
 "Here is the "big" set of tty/serial driver updates for 6.12-rc1.

  Nothing major in here, just nice forward progress in the slow cleanup
  of the serial apis, and lots of other driver updates and fixes.

  Included in here are:

   - serial api updates from Jiri to make things more uniform and sane

   - 8250_platform driver cleanups

   - samsung serial driver fixes and updates

   - qcom-geni serial driver fixes from Johan for the bizarre UART
     engine that that chip seems to have. Hopefully it's in a better
     state now, but hardware designers still seem to come up with more
     ways to make broken UARTS 40+ years after this all should have
     finished.

   - sc16is7xx driver updates

   - omap 8250 driver updates

   - 8250_bcm2835aux driver updates

   - a few new serial driver bindings added

   - other serial minor driver updates

  All of these have been in linux-next for a long time with no reported
  problems"

* tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
  tty: serial: samsung: Fix serial rx on Apple A7-A9
  tty: serial: samsung: Fix A7-A11 serial earlycon SError
  tty: serial: samsung: Use bit manipulation macros for APPLE_S5L_*
  tty: rp2: Fix reset with non forgiving PCIe host bridges
  serial: 8250_aspeed_vuart: Enable module autoloading
  serial: qcom-geni: fix polled console corruption
  serial: qcom-geni: disable interrupts during console writes
  serial: qcom-geni: fix console corruption
  serial: qcom-geni: introduce qcom_geni_serial_poll_bitfield()
  serial: qcom-geni: fix arg types for qcom_geni_serial_poll_bit()
  soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers
  serial: qcom-geni: fix false console tx restart
  serial: qcom-geni: fix fifo polling timeout
  tty: hvc: convert comma to semicolon
  mxser: convert comma to semicolon
  serial: 8250_bcm2835aux: Fix clock imbalance in PM resume
  serial: sc16is7xx: convert bitmask definitions to use BIT() macro
  serial: sc16is7xx: fix copy-paste errors in EFR_SWFLOWx_BIT constants
  serial: sc16is7xx: remove SC16IS7XX_MSR_DELTA_MASK
  serial: xilinx_uartps: Make cdns_rs485_supported static
  ...

5 months agoMerge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Thu, 26 Sep 2024 16:45:36 +0000 (09:45 -0700)]
Merge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt updates from Greg KH:
 "Here is the large set of USB and Thunderbolt changes for 6.12-rc1.

  Nothing "major" in here, except for a new 9p network gadget that has
  been worked on for a long time (all of the needed acks are here)

  Other than that, it's the usual set of:

   - Thunderbolt / USB4 driver updates and additions for new hardware

   - dwc3 driver updates and new features added

   - xhci driver updates

   - typec driver updates

   - USB gadget updates and api additions to make some gadgets more
     configurable by userspace

   - dwc2 driver updates

   - usb phy driver updates

   - usbip feature additions

   - other minor USB driver updates

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits)
  sub: cdns3: Use predefined PCI vendor ID constant
  sub: cdns2: Use predefined PCI vendor ID constant
  USB: misc: yurex: fix race between read and write
  USB: misc: cypress_cy7c63: check for short transfer
  USB: appledisplay: close race between probe and completion handler
  USB: class: CDC-ACM: fix race between get_serial and set_serial
  usb: r8a66597-hcd: make read-only const arrays static
  usb: typec: ucsi: Fix busy loop on ASUS VivoBooks
  usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed()
  usb: storage: ene_ub6250: Fix right shift warnings
  usb: roles: Improve the fix for a false positive recursive locking complaint
  locking/mutex: Introduce mutex_init_with_key()
  locking/mutex: Define mutex_init() once
  net/9p/usbg: fix CONFIG_USB_GADGET dependency
  usb: xhci: fix loss of data on Cadence xHC
  usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host
  usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep
  usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode
  usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk
  usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support
  ...

5 months agox86/tdx: Fix "in-kernel MMIO" check
Alexey Gladkov (Intel) [Fri, 13 Sep 2024 17:05:56 +0000 (19:05 +0200)]
x86/tdx: Fix "in-kernel MMIO" check

TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.

However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.

Ensure that the target MMIO address is within the kernel before decoding
instruction.

Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Cc:[email protected]
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.1726237595.git.legion%40kernel.org
5 months agoMerge tag 'hid-for-linus-2024092601' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 Sep 2024 16:25:28 +0000 (09:25 -0700)]
Merge tag 'hid-for-linus-2024092601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fix from Jiri Kosina:
 "A revert of Device Tree binding for Goodix SPI HID driver (while
  keeping ACPI still available), as it conflicted with already existing
  binding and the original submitter didn't respond in time with a fix.

  We will be looking into ways how to reintroduce it properly (we have
  to agree on a way how to handle cases where vendor uses the very same
  product ID for I2C and SPI parts, leading to this kind conflict). But
  before that is settled, let's revert the to unbreak everybody else
  (Krzysztof Kozlowski)"

* tag 'hid-for-linus-2024092601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  dt-bindings: input: Revert "dt-bindings: input: Goodix SPI HID Touchscreen"
  HID: hid-goodix: drop unsupported and undocumented DT part

5 months agofbcon: break earlier in search_fb_in_map and search_for_mapped_con
Qianqiang Liu [Thu, 26 Sep 2024 11:59:11 +0000 (19:59 +0800)]
fbcon: break earlier in search_fb_in_map and search_for_mapped_con

Break the for loop immediately upon finding the target, making the
process more efficient.

Signed-off-by: Qianqiang Liu <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
5 months agofbdev: omapfb: Call of_node_put(ep) only once in omapdss_of_find_source_for_first_ep()
Markus Elfring [Wed, 25 Sep 2024 19:12:36 +0000 (21:12 +0200)]
fbdev: omapfb: Call of_node_put(ep) only once in omapdss_of_find_source_for_first_ep()

An of_node_put(ep) call was immediately used after a pointer check
for a of_graph_get_remote_port() call in this function implementation.
Thus call such a function only once instead directly before the check.

This issue was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
5 months agofbcon: Fix a NULL pointer dereference issue in fbcon_putcs
Qianqiang Liu [Wed, 25 Sep 2024 05:29:36 +0000 (13:29 +0800)]
fbcon: Fix a NULL pointer dereference issue in fbcon_putcs

syzbot has found a NULL pointer dereference bug in fbcon.
Here is the simplified C reproducer:

struct param {
uint8_t type;
struct tiocl_selection ts;
};

int main()
{
struct fb_con2fbmap con2fb;
struct param param;

int fd = open("/dev/fb1", 0, 0);

con2fb.console = 0x19;
con2fb.framebuffer = 0;
ioctl(fd, FBIOPUT_CON2FBMAP, &con2fb);

param.type = 2;
param.ts.xs = 0; param.ts.ys = 0;
param.ts.xe = 0; param.ts.ye = 0;
param.ts.sel_mode = 0;

int fd1 = open("/dev/tty1", O_RDWR, 0);
ioctl(fd1, TIOCLINUX, &param);

con2fb.console = 1;
con2fb.framebuffer = 0;
ioctl(fd, FBIOPUT_CON2FBMAP, &con2fb);

return 0;
}

After calling ioctl(fd1, TIOCLINUX, &param), the subsequent ioctl(fd, FBIOPUT_CON2FBMAP, &con2fb)
causes the kernel to follow a different execution path:

 set_con2fb_map
  -> con2fb_init_display
   -> fbcon_set_disp
    -> redraw_screen
     -> hide_cursor
      -> clear_selection
       -> highlight
        -> invert_screen
         -> do_update_region
          -> fbcon_putcs
           -> ops->putcs

Since ops->putcs is a NULL pointer, this leads to a kernel panic.
To prevent this, we need to call set_blitting_type() within set_con2fb_map()
to properly initialize ops->putcs.

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=3d613ae53c031502687a
Tested-by: [email protected]
Signed-off-by: Qianqiang Liu <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
5 months agoMerge tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Thu, 26 Sep 2024 16:20:19 +0000 (09:20 -0700)]
Merge tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Most are from the recent SMB3.1.1 test event, and also an important
  netfs fix for a cifs mtime write regression

   - fix mode reported by stat of readonly directories and files

   - DFS (global namespace) related fixes

   - fixes for special file support via reparse points

   - mount improvement and reconnect fix

   - fix for noisy log message on umount

   - two netfs related fixes, one fixing a recent regression, and add
     new write tracepoint"

* tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  netfs, cifs: Fix mtime/ctime update for mmapped writes
  cifs: update internal version number
  smb: client: print failed session logoffs with FYI
  cifs: Fix reversion of the iter in cifs_readv_receive().
  smb3: fix incorrect mode displayed for read-only files
  smb: client: fix parsing of device numbers
  smb: client: set correct device number on nfs reparse points
  smb: client: propagate error from cifs_construct_tcon()
  smb: client: fix DFS failover in multiuser mounts
  cifs: Make the write_{enter,done,err} tracepoints display netfs info
  smb: client: fix DFS interlink failover
  smb: client: improve purging of cached referrals
  smb: client: avoid unnecessary reconnects when refreshing referrals

5 months agoMerge tag 'probes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux...
Linus Torvalds [Thu, 26 Sep 2024 15:55:36 +0000 (08:55 -0700)]
Merge tag 'probes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - uprobes: make trace_uprobe->nhit counter a per-CPU one

   This makes uprobe event's hit counter per-CPU for improving
   scalability on multi-core environment

 - kprobes: Remove obsoleted declaration for init_test_probes

   Remove unused init_test_probes() from header

 - Raw tracepoint probe supports raw tracepoint events on modules:
     - add a function for iterating over all tracepoints in all modules
     - add a function for iterating over tracepoints in a module
     - support raw tracepoint events on modules
     - support raw tracepoints on future loaded modules
     - add a test for tracepoint events on modules"

* tag 'probes-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  sefltests/tracing: Add a test for tracepoint events on modules
  tracing/fprobe: Support raw tracepoints on future loaded modules
  tracing/fprobe: Support raw tracepoint events on modules
  tracepoint: Support iterating tracepoints in a loading module
  tracepoint: Support iterating over tracepoints on modules
  kprobes: Remove obsoleted declaration for init_test_probes
  uprobes: turn trace_uprobe's nhit counter to be per-CPU one

5 months agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Thu, 26 Sep 2024 15:43:17 +0000 (08:43 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - virtio-balloon supports new stats

   - vdpa supports setting mac address

   - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster

   - virtio_fs supports new sysfs entries for queue info

   - virtio/vsock performance has been improved

  And fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  vsock/virtio: avoid queuing packets when intermediate queue is empty
  vsock/virtio: refactor virtio_transport_send_pkt_work
  fw_cfg: Constify struct kobj_type
  vdpa/mlx5: Postpone MR deletion
  vdpa/mlx5: Introduce init/destroy for MR resources
  vdpa/mlx5: Rename mr_mtx -> lock
  vdpa/mlx5: Extract mr members in own resource struct
  vdpa/mlx5: Rename function
  vdpa/mlx5: Delete direct MKEYs in parallel
  vdpa/mlx5: Create direct MKEYs in parallel
  MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section
  virtio_fs: add sysfs entries for queue information
  virtio_fs: introduce virtio_fs_put_locked helper
  vdpa: Remove unused declarations
  vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
  vdpa/mlx5: Small improvement for change_num_qps()
  vdpa/mlx5: Keep notifiers during suspend but ignore
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Use async API for vq modify commands
  ...

5 months agodm verity: fallback to platform keyring also if key in trusted keyring is rejected
Luca Boccassi [Sun, 22 Sep 2024 16:17:53 +0000 (18:17 +0200)]
dm verity: fallback to platform keyring also if key in trusted keyring is rejected

If enabled, we fallback to the platform keyring if the trusted keyring doesn't have
the key used to sign the roothash. But if pkcs7_verify() rejects the key for other
reasons, such as usage restrictions, we do not fallback. Do so.

Follow-up for 6fce1f40e95182ebbfe1ee3096b8fc0b37903269

Suggested-by: Serge Hallyn <[email protected]>
Signed-off-by: Luca Boccassi <[email protected]>
Acked-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
5 months agodm-verity: restart or panic on an I/O error
Mikulas Patocka [Tue, 24 Sep 2024 13:18:29 +0000 (15:18 +0200)]
dm-verity: restart or panic on an I/O error

Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
happens. In theory, this could be used to subvert security, because an
attacker can create sectors that return error with the Write Uncorrectable
command. Some programs may misbehave if they have to deal with EIO.

This commit fixes dm-verity, so that if "panic_on_corruption" or
"restart_on_corruption" was specified and an I/O error happens, the
machine will panic or restart.

This commit also changes kernel_restart to emergency_restart -
kernel_restart calls reboot notifiers and these reboot notifiers may wait
for the bio that failed. emergency_restart doesn't call the notifiers.

Reported-by: Maxim Suhanov <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
5 months agodm: fix spelling errors
Shen Lichuan [Tue, 24 Sep 2024 13:21:11 +0000 (15:21 +0200)]
dm: fix spelling errors

Fixed some confusing spelling errors that were currently identified,
the details are as follows:

-in the code comments:
        dm-cache-target.c: 1371:        exclussive      ==> exclusive
        dm-raid.c: 2522:                repective       ==> respective

Signed-off-by: Shen Lichuan <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
5 months agodm-cache: remove pointless error check
Dipendra Khadka [Sun, 22 Sep 2024 16:47:01 +0000 (16:47 +0000)]
dm-cache: remove pointless error check

Smatch reported following:
'''
drivers/md/dm-cache-target.c:3204 parse_cblock_range() warn: sscanf doesn't return error codes
drivers/md/dm-cache-target.c:3217 parse_cblock_range() warn: sscanf doesn't return error codes
'''

Sscanf doesn't return negative values at all.

Signed-off-by: Dipendra Khadka <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
5 months agosh: intc: Replace simple_strtoul() with kstrtoul()
Hongbo Li [Mon, 2 Sep 2024 02:45:34 +0000 (10:45 +0800)]
sh: intc: Replace simple_strtoul() with kstrtoul()

The function simple_strtoul() performs no error checking
in scenarios where the input value overflows the intended
output variable.

We can replace the use of simple_strtoul() with the safer
alternative kstrtoul(). This also allows us to print an
error message in case of failure.

Signed-off-by: Hongbo Li <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
5 months agosh: Remove unused declarations for make_maskreg_irq() and irq_mask_register
Gaosheng Cui [Sat, 24 Aug 2024 12:06:09 +0000 (20:06 +0800)]
sh: Remove unused declarations for make_maskreg_irq() and irq_mask_register

make_maskreg_irq() and irq_mask_register have been removed since
commit 5a4053b23262 ("sh: Kill off dead boards."), so remove the
unused declarations.

Signed-off-by: Gaosheng Cui <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
5 months agodt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples
Rob Herring [Wed, 25 Sep 2024 17:35:10 +0000 (12:35 -0500)]
dt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples

Enabling dtc interrupt_provider check reveals the examples are missing
the "#interrupt-cells" property as it is a dependency of
"interrupt-controller".

Some of the indentation is off, so fix that too.

Signed-off-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Nikita Shubin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
5 months agoMerge tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 26 Sep 2024 13:47:10 +0000 (15:47 +0200)]
Merge tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

v2: with kdoc fixes per Paolo Abeni.

The following patchset contains Netfilter fixes for net:

Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
packets to one another, two packets of the same flow in each direction
handled by different CPUs that result in two conntrack objects in NEW
state, where reply packet loses race. Then, patch #3 adds a testcase for
this scenario. Series from Florian Westphal.

1) NAT engine can falsely detect a port collision if it happens to pick
   up a reply packet as NEW rather than ESTABLISHED. Add extra code to
   detect this and suppress port reallocation in this case.

2) To complete the clash resolution in the reply direction, extend conntrack
   logic to detect clashing conntrack in the reply direction to existing entry.

3) Adds a test case.

Then, an assorted list of fixes follow:

4) Add a selftest for tproxy, from Antonio Ojea.

5) Guard ctnetlink_*_size() functions under
   #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
   From Andy Shevchenko.

6) Use -m socket --transparent in iptables tproxy documentation.
   From XIE Zhibang.

7) Call kfree_rcu() when releasing flowtable hooks to address race with
   netlink dump path, from Phil Sutter.

8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
   From Simon Horman.

9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
   is its only user, to address a compilation warning. From Simon Horman.

10) Use rcu-protected list iteration over basechain hooks from netlink
    dump path.

11) Fix memcg for nf_tables, use GFP_KERNEL_ACCOUNT is not complete.

12) Remove old nfqueue conntrack clash resolution. Instead trying to
    use same destination address consistently which requires double DNAT,
    use the existing clash resolution which allows clashing packets
    go through with different destination. Antonio Ojea originally
    reported an issue from the postrouting chain, I proposed a fix:
    https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
    which he reported it did not work for him.

13) Adds a selftest for patch 12.

14) Fixes ipvs.sh selftest.

netfilter pull request 24-09-26

* tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: Avoid hanging ipvs.sh
  kselftest: add test for nfqueue induced conntrack race
  netfilter: nfnetlink_queue: remove old clash resolution logic
  netfilter: nf_tables: missing objects with no memcg accounting
  netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
  netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
  netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
  netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
  docs: tproxy: ignore non-transparent sockets in iptables
  netfilter: ctnetlink: Guard possible unused functions
  selftests: netfilter: nft_tproxy.sh: add tcp tests
  selftests: netfilter: add reverse-clash resolution test case
  netfilter: conntrack: add clash resolution for reverse collisions
  netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoMAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer
Nikita Shubin [Tue, 17 Sep 2024 16:34:01 +0000 (19:34 +0300)]
MAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer

Add myself as maintainer of EP93XX ARCHITECTURE.

CC: Alexander Sverdlin <[email protected]>
CC: Arnd Bergmann <[email protected]>
Signed-off-by: Nikita Shubin <[email protected]>
Acked-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
5 months agosoc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config
Lukas Bulwahn [Tue, 24 Sep 2024 09:24:23 +0000 (11:24 +0200)]
soc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config

Commit 6eab0ce6e1c6 ("soc: Add SoC driver for Cirrus ep93xx") adds the
config EP93XX_SOC referring to the config EP93XX_SOC_COMMON.

Within the same patch series of the commit above, the commit 046322f1e1d9
("ARM: ep93xx: DT for the Cirrus ep93xx SoC platforms") then removes the
config EP93XX_SOC_COMMON. With that the reference to this config is
obsolete.

Simplify the expression in the EP93XX_SOC config definition.

Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Nikita Shubin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
5 months agoselftests: netfilter: Avoid hanging ipvs.sh
Phil Sutter [Thu, 19 Sep 2024 12:40:00 +0000 (14:40 +0200)]
selftests: netfilter: Avoid hanging ipvs.sh

If the client can't reach the server, the latter remains listening
forever. Kill it after 5s of waiting.

Fixes: 867d2190799a ("selftests: netfilter: add ipvs test script")
Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agokselftest: add test for nfqueue induced conntrack race
Florian Westphal [Wed, 18 Sep 2024 13:16:33 +0000 (15:16 +0200)]
kselftest: add test for nfqueue induced conntrack race

The netfilter race happens when two packets with the same tuple are DNATed
and enqueued with nfqueue in the postrouting hook.

Once one of the packet is reinjected it may be DNATed again to a different
destination, but the conntrack entry remains the same and the return packet
was dropped.

Based on earlier patch from Antonio Ojea.

Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Co-developed-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nfnetlink_queue: remove old clash resolution logic
Florian Westphal [Wed, 18 Sep 2024 13:13:39 +0000 (15:13 +0200)]
netfilter: nfnetlink_queue: remove old clash resolution logic

For historical reasons there are two clash resolution spots in
netfilter, one in nfnetlink_queue and one in conntrack core.

nfnetlink_queue one was added first: If a colliding entry is found, NAT
NAT transformation is reversed by calling nat engine again with altered
tuple.

See commit 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for
unconfirmed conntracks") for details.

One problem is that nf_reroute() won't take an action if the queueing
doesn't occur in the OUTPUT hook, i.e. when queueing in forward or
postrouting, packet will be sent via the wrong path.

Another problem is that the scenario addressed (2nd UDP packet sent with
identical addresses while first packet is still being processed) can also
occur without any nfqueue involvement due to threaded resolvers doing
A and AAAA requests back-to-back.

This lead us to add clash resolution logic to the conntrack core, see
commit 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing
entries").  Instead of fixing the nfqueue based logic, lets remove it
and let conntrack core handle this instead.

Retain the ->update hook for sake of nfqueue based conntrack helpers.
We could axe this hook completely but we'd have to split confirm and
helper logic again, see commit ee04805ff54a ("netfilter: conntrack: make
conntrack userspace helpers work again").

This SHOULD NOT be backported to kernels earlier than v5.6; they lack
adequate clash resolution handling.

Patch was originally written by Pablo Neira Ayuso.

Reported-by: Antonio Ojea <[email protected]>
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Signed-off-by: Florian Westphal <[email protected]>
Tested-by: Antonio Ojea <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nf_tables: missing objects with no memcg accounting
Pablo Neira Ayuso [Wed, 18 Sep 2024 12:19:45 +0000 (14:19 +0200)]
netfilter: nf_tables: missing objects with no memcg accounting

Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for
memory accounting, update them. This includes:

- catchall elements
- compat match large info area
- log prefix
- meta secctx
- numgen counters
- pipapo set backend datastructure
- tunnel private objects

Fixes: 33758c891479 ("memcg: enable accounting for nft objects")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
Pablo Neira Ayuso [Tue, 17 Sep 2024 21:07:46 +0000 (23:07 +0200)]
netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path

Lockless iteration over hook list is possible from netlink dump path,
use rcu variant to iterate over the hook list as is done with flowtable
hooks.

Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Reported-by: Phil Sutter <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
Simon Horman [Mon, 16 Sep 2024 15:14:41 +0000 (16:14 +0100)]
netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS

Only provide ctnetlink_label_size when it is used,
which is when CONFIG_NF_CONNTRACK_EVENTS is configured.

Flagged by clang-18 W=1 builds as:

.../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
  385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
      |                   ^~~~~~~~~~~~~~~~~~~~

The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
this patch guards compilation of non-trivial implementations
of ctnetlink_dump_labels() and ctnetlink_label_size().

However, this is not necessary as each of these functions
will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
as each function starts with the equivalent of:

struct nf_conn_labels *labels = nf_ct_labels_find(ct);

if (!labels)
return 0;

And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
is not enabled.  So I believe that the compiler optimises the code away
in such cases anyway.

Found by inspection.
Compile tested only.

Originally splitted in two patches, Pablo Neira Ayuso collapsed them and
added Fixes: tag.

Fixes: 0ceabd83875b ("netfilter: ctnetlink: deliver labels to userspace")
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
Simon Horman [Mon, 16 Sep 2024 09:50:34 +0000 (10:50 +0100)]
netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n

If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
using gcc-14 results in the following warnings, which are treated as
errors:

net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
  243 |         struct iphdr *niph;
      |                       ^~~~
cc1: all warnings being treated as errors
net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
  286 |         struct ipv6hdr *ip6h;
      |                         ^~~~
cc1: all warnings being treated as errors

Address this by reducing the scope of these local variables to where
they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
enabled.

Compile tested and run through netfilter selftests.

Reported-by: Andy Shevchenko <[email protected]>
Closes: https://lore.kernel.org/netfilter-devel/[email protected]/
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nf_tables: Keep deleted flowtable hooks until after RCU
Phil Sutter [Thu, 12 Sep 2024 12:21:33 +0000 (14:21 +0200)]
netfilter: nf_tables: Keep deleted flowtable hooks until after RCU

Documentation of list_del_rcu() warns callers to not immediately free
the deleted list item. While it seems not necessary to use the
RCU-variant of list_del() here in the first place, doing so seems to
require calling kfree_rcu() on the deleted item as well.

Fixes: 3f0465a9ef02 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agodocs: tproxy: ignore non-transparent sockets in iptables
谢致邦 (XIE Zhibang) [Thu, 12 Sep 2024 11:59:33 +0000 (11:59 +0000)]
docs: tproxy: ignore non-transparent sockets in iptables

The iptables example was added in commit d2f26037a38a (netfilter: Add
documentation for tproxy, 2008-10-08), but xt_socket 'transparent'
option was added in commit a31e1ffd2231 (netfilter: xt_socket: added new
revision of the 'socket' match supporting flags, 2009-06-09).

Now add the 'transparent' option to the iptables example to ignore
non-transparent sockets, which is also consistent with the nft example.

Signed-off-by: 谢致邦 (XIE Zhibang) <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: ctnetlink: Guard possible unused functions
Andy Shevchenko [Tue, 10 Sep 2024 08:35:33 +0000 (11:35 +0300)]
netfilter: ctnetlink: Guard possible unused functions

Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n
and CONFIG_NF_CONNTRACK_EVENTS=n), it prevents kernel builds with clang,
`make W=1` and CONFIG_WERROR=y:

net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function]
  657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
      |                      ^~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function]
  667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
      |                   ^~~~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function]
  683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~

Fix this by guarding possible unused functions with ifdeffery.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agoselftests: netfilter: nft_tproxy.sh: add tcp tests
Antonio Ojea [Thu, 12 Sep 2024 06:17:54 +0000 (06:17 +0000)]
selftests: netfilter: nft_tproxy.sh: add tcp tests

The TPROXY functionality is widely used, however, there are only mptcp
selftests covering this feature.

The selftests represent the most common scenarios and can also be used
as selfdocumentation of the feature.

UDP and TCP testcases are split in different files because of the
different nature of the protocols, specially due to the challenges that
present to reliable test UDP due to the connectionless nature of the
protocol. UDP only covers the scenarios involving the prerouting hook.

The UDP tests are signfinicantly slower than the TCP ones, hence they
use a larger timeout, it takes 20 seconds to run the full UDP suite
on a 48 vCPU Intel(R) Xeon(R) CPU @2.60GHz.

Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agoselftests: netfilter: add reverse-clash resolution test case
Florian Westphal [Tue, 10 Sep 2024 09:38:16 +0000 (11:38 +0200)]
selftests: netfilter: add reverse-clash resolution test case

Add test program that is sending UDP packets in both directions
and check that packets arrive without source port modification.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: conntrack: add clash resolution for reverse collisions
Florian Westphal [Tue, 10 Sep 2024 09:38:15 +0000 (11:38 +0200)]
netfilter: conntrack: add clash resolution for reverse collisions

Given existing entry:
ORIGIN: a:b -> c:d
REPLY:  c:d -> a:b

And colliding entry:
ORIGIN: c:d -> a:b
REPLY:  a:b -> c:d

The colliding ct (and the associated skb) get dropped on insert.
Permit this by checking if the colliding entry matches the reply
direction.

Happens when both ends send packets at same time, both requests are picked
up as NEW, rather than NEW for the 'first' and 'ESTABLISHED' for the
second packet.

This is an esoteric condition, as ruleset must permit NEW connections
in either direction and both peers must already have a bidirectional
traffic flow at the time conntrack gets enabled.

Allow the 'reverse' skb to pass and assign the existing (clashing)
entry.

While at it, also drop the extra 'dying' check, this is already
tested earlier by the calling function.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agonetfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
Florian Westphal [Tue, 10 Sep 2024 09:38:14 +0000 (11:38 +0200)]
netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash

A conntrack entry can be inserted to the connection tracking table if there
is no existing entry with an identical tuple in either direction.

Example:
INITIATOR -> NAT/PAT -> RESPONDER

Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite).
Then, later, NAT/PAT machine itself also wants to connect to RESPONDER.

This will not work if the SNAT done earlier has same IP:PORT source pair.

Conntrack table has:
ORIGINAL: $IP_INITATOR:$SPORT -> $IP_RESPONDER:$DPORT
REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT

and new locally originating connection wants:
ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT
REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT

This is handled by the NAT engine which will do a source port reallocation
for the locally originating connection that is colliding with an existing
tuple by attempting a source port rewrite.

This is done even if this new connection attempt did not go through a
masquerade/snat rule.

There is a rare race condition with connection-less protocols like UDP,
where we do the port reallocation even though its not needed.

This happens when new packets from the same, pre-existing flow are received
in both directions at the exact same time on different CPUs after the
conntrack table was flushed (or conntrack becomes active for first time).

With strict ordering/single cpu, the first packet creates new ct entry and
second packet is resolved as established reply packet.

With parallel processing, both packets are picked up as new and both get
their own ct entry.

In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by
NAT engine because a port collision is detected.

This change isn't enough to prevent a packet drop later during
nf_conntrack_confirm(), the existing clash resolution strategy will not
detect such reverse clash case.  This is resolved by a followup patch.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
5 months agoselftests/net: packetdrill: increase timing tolerance in debug mode
Willem de Bruijn [Thu, 19 Sep 2024 12:43:42 +0000 (08:43 -0400)]
selftests/net: packetdrill: increase timing tolerance in debug mode

Some packetdrill tests are flaky in debug mode. As discussed, increase
tolerance.

We have been doing this for debug builds outside ksft too.

Previous setting was 10000. A manual 50 runs in virtme-ng showed two
failures that needed 12000. To be on the safe side, Increase to 14000.

Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
Fixes: 1e42f73fd3c2 ("selftests/net: packetdrill: import tcp/zerocopy")
Reported-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agousbnet: fix cyclical race on disconnect with work queue
Oliver Neukum [Thu, 19 Sep 2024 12:33:42 +0000 (14:33 +0200)]
usbnet: fix cyclical race on disconnect with work queue

The work can submit URBs and the URBs can schedule the work.
This cycle needs to be broken, when a device is to be stopped.
Use a flag to do so.
This is a design issue as old as the driver.

Signed-off-by: Oliver Neukum <[email protected]>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: [email protected]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled
Furong Xu [Thu, 19 Sep 2024 12:10:28 +0000 (20:10 +0800)]
net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled

Commit 5fabb01207a2 ("net: stmmac: Add initial XDP support") sets
PP_FLAG_DMA_SYNC_DEV flag for page_pool unconditionally,
page_pool_recycle_direct() will call page_pool_dma_sync_for_device()
on every page even the page is not going to be reused by XDP program.

When XDP is not enabled, the page which holds the received buffer
will be recycled once the buffer is copied into new SKB by
skb_copy_to_linear_data(), then the MAC core will never reuse this
page any longer. Always setting PP_FLAG_DMA_SYNC_DEV wastes CPU cycles
on unnecessary calling of page_pool_dma_sync_for_device().

After this patch, up to 9% noticeable performance improvement was observed
on certain platforms.

Fixes: 5fabb01207a2 ("net: stmmac: Add initial XDP support")
Signed-off-by: Furong Xu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agovirtio_net: Fix mismatched buf address when unmapping for small packets
Wenbo Li [Thu, 19 Sep 2024 08:13:51 +0000 (16:13 +0800)]
virtio_net: Fix mismatched buf address when unmapping for small packets

Currently, the virtio-net driver will perform a pre-dma-mapping for
small or mergeable RX buffer. But for small packets, a mismatched address
without VIRTNET_RX_PAD and xdp_headroom is used for unmapping.

That will result in unsynchronized buffers when SWIOTLB is enabled, for
example, when running as a TDX guest.

This patch unifies the address passed to the virtio core as the address of
the virtnet header and fixes the mismatched buffer address.

Changes from v2: unify the buf that passed to the virtio core in small
and merge mode.
Changes from v1: Use ctx to get xdp_headroom.

Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Signed-off-by: Wenbo Li <[email protected]>
Signed-off-by: Jiahui Cen <[email protected]>
Signed-off-by: Ying Fang <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoksmbd: Correct typos in multiple comments across various files
Shen Lichuan [Wed, 25 Sep 2024 07:43:23 +0000 (15:43 +0800)]
ksmbd: Correct typos in multiple comments across various files

Fixed some confusing typos that were currently identified witch codespell,
the details are as follows:

-in the code comments:
fs/smb/common/smb2pdu.h:9: specfication ==> specification
fs/smb/common/smb2pdu.h:494: usally ==> usually
fs/smb/common/smb2pdu.h:1064: Attrubutes ==> Attributes
fs/smb/server/connection.c:28: cleand ==> cleaned
fs/smb/server/ksmbd_netlink.h:216: struture ==> structure
fs/smb/server/oplock.c:799: conains ==> contains
fs/smb/server/oplock.c:1487: containted ==> contained
fs/smb/server/server.c:282: proccessing ==> processing
fs/smb/server/smb_common.c:491: comforms ==> conforms
fs/smb/server/xattr.h:102: ATTRIBUITE ==> ATTRIBUTE

Signed-off-by: Shen Lichuan <[email protected]>
Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
5 months agoksmbd: fix open failure from block and char device file
Namjae Jeon [Tue, 24 Sep 2024 13:39:29 +0000 (22:39 +0900)]
ksmbd: fix open failure from block and char device file

char/block device file can't be opened with dentry_open() if device driver
is not loaded. Use O_PATH flags for fake opening file to handle it if file
is a block or char file.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
5 months agoksmbd: remove unsafe_memcpy use in session setup
Namjae Jeon [Mon, 23 Sep 2024 13:39:11 +0000 (22:39 +0900)]
ksmbd: remove unsafe_memcpy use in session setup

Kees pointed out to just use directly ->Buffer instead of pointing
->Buffer using offset not to use unsafe_memcpy().

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
5 months agoMerge tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux
Linus Torvalds [Wed, 25 Sep 2024 21:56:40 +0000 (14:56 -0700)]
Merge tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

 - Improve blk-integrity segment counting and merging (Keith)

 - NVMe pull request via Keith:
      - Multipath fixes (Hannes)
      - Sysfs attribute list NULL terminate fix (Shin'ichiro)
      - Remove problematic read-back (Keith)

 - Fix for a regression with the IO scheduler switching freezing from
   6.11 (Damien)

 - Use a raw spinlock for sbitmap, as it may get called from preempt
   disabled context (Ming)

 - Cleanup for bd_claiming waiting, using var_waitqueue() rather than
   the bit waitqueues, as that more accurately describes that it does
   (Neil)

 - Various cleanups (Kanchan, Qiu-ji, David)

* tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux:
  nvme: remove CC register read-back during enabling
  nvme: null terminate nvme_tls_attrs
  nvme-multipath: avoid hang on inaccessible namespaces
  nvme-multipath: system fails to create generic nvme device
  lib/sbitmap: define swap_lock as raw_spinlock_t
  block: Remove unused blk_limits_io_{min,opt}
  drbd: Fix atomicity violation in drbd_uuid_set_bm()
  block: Fix elv_iosched_local_module handling of "none" scheduler
  block: remove bogus union
  block: change wait on bd_claiming to use a var_waitqueue
  blk-integrity: improved sg segment mapping
  block: unexport blk_rq_count_integrity_sg
  nvme-rdma: use request to get integrity segments
  scsi: use request to get integrity segments
  block: provide a request helper for user integrity segments
  blk-integrity: consider entire bio list for merging
  blk-integrity: properly account for segments
  blk-mq: set the nr_integrity_segments from bio
  blk-mq: unconditional nr_integrity_segments

5 months agoMerge tag 'spi-fix-v6.12-merge-window' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 25 Sep 2024 21:49:34 +0000 (14:49 -0700)]
Merge tag 'spi-fix-v6.12-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "Some driver specific fixes that came in during the merge window.

  Lorenzo Bianconi did some extra testing on the recently added arioha
  driver and found some issues, Alexander Dahl fixed some issues with
  signal delays in the Atmel QSPI driver and Jinjie Ruan has been fixing
  some nits with runtime PM cleanup"

* tag 'spi-fix-v6.12-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: atmel-quadspi: Avoid overwriting delay register settings
  spi: airoha: remove read cache in airoha_snand_dirmap_read()
  spi: spi-fsl-lpspi: Undo runtime PM changes at driver exit time
  spi: atmel-quadspi: Undo runtime PM changes at driver exit time
  spi: airoha: fix airoha_snand_{write,read}_data data_len estimation
  spi: airoha: fix dirmap_{read,write} operations

5 months agoMerge tag 'rtc-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Wed, 25 Sep 2024 21:38:37 +0000 (14:38 -0700)]
Merge tag 'rtc-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "More conversions of DT bindings to yaml. There is one new driver, for
  the DFRobot SD2405AL and support for important features of the stm32
  RTC. Summary:

  New driver:
   - DFRobot SD2405AL

  Drivers:
   - stm32: add alarm A out and LSCO support
   - sun6i: disable automatic clock input switching
   - m48t59: set range"

* tag 'rtc-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: rc5t619: use proper module tables
  rtc: m48t59: set range
  dt-bindings: rtc: microcrystal,rv3028: add #clock-cells property
  rtc: m48t59: Remove division condition with direct comparison
  rtc: at91sam9: fix OF node leak in probe() error path
  rtc: sun6i: disable automatic clock input switching
  dt-bindings: rtc: Drop non-trivial duplicate compatibles
  dt-bindings: vendor-prefixes: Add DFRobot.
  dt-bindings: rtc: Add support for SD2405AL.
  rtc: Add driver for SD2405AL
  rtc: s35390a: Drop vendorless compatible string from match table
  rtc: twl: convert comma to semicolon
  dt-bindings: rtc: sprd,sc2731-rtc: convert to YAML
  rtc: stm32: add alarm A out feature
  rtc: stm32: add Low Speed Clock Output (LSCO) support
  rtc: stm32: add pinctrl and pinmux interfaces
  dt-bindings: rtc: stm32: describe pinmux nodes

5 months agodt-bindings: input: Revert "dt-bindings: input: Goodix SPI HID Touchscreen"
Krzysztof Kozlowski [Wed, 25 Sep 2024 19:49:21 +0000 (21:49 +0200)]
dt-bindings: input: Revert "dt-bindings: input: Goodix SPI HID Touchscreen"

This reverts commit 9184b17fbc23 ("dt-bindings: input: Goodix SPI HID
Touchscreen") because it duplicates existing binding leadings to errors:

  goodix,gt7986u.example.dtb:
  touchscreen@0: compatible: 'oneOf' conditional failed, one must be fixed:
        ['goodix,gt7986u'] is too short
        'goodix,gt7375p' was expected

This was reported on mailing list on 6th of September, but no reaction
happened from contributor or maintainer to fix it.

Therefore let's drop binding which breaks and duplicates existing one.

Fixes: 9184b17fbc23 ("dt-bindings: input: Goodix SPI HID Touchscreen")
Reported-by: Rob Herring <[email protected]>
Closes: https://lore.kernel.org/all/CAL_Jsq+QfTtRj_JCqXzktQ49H8VUnztVuaBjvvkg3fwEHniUHw@mail.gmail.com/
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
5 months agoHID: hid-goodix: drop unsupported and undocumented DT part
Krzysztof Kozlowski [Wed, 25 Sep 2024 19:49:20 +0000 (21:49 +0200)]
HID: hid-goodix: drop unsupported and undocumented DT part

Drop support for Devicetree from, because the binding is being reverted
(on basis of duplicating existing binding) and property was not added to
the original binding.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
5 months agointel_idle: fix ACPI _CST matching for newer Xeon platforms
Artem Bityutskiy [Fri, 13 Sep 2024 16:51:43 +0000 (19:51 +0300)]
intel_idle: fix ACPI _CST matching for newer Xeon platforms

Background
~~~~~~~~~~

The driver uses 'use_acpi = true' in C-state custom table for all Xeon
platforms. The meaning of this flag is as follows.

 1. If a C-state from the custom table is defined in ACPI _CST (matched
    by the mwait hint), then enable this C-state.

 2. Otherwise, disable this C-state, unless the C-sate definition in the
    custom table has the 'CPUIDLE_FLAG_ALWAYS_ENABLE' flag set, in which
    case enabled it.

The goal is to honor BIOS C6 settings - If BIOS disables C6, disable it
by default in the OS too (but it can be enabled via sysfs).

This works well on Xeons that expose only one flavor of C6. This are all
Xeons except for the newest Granite Rapids (GNR) and Sierra Forest (SRF).

The problem
~~~~~~~~~~~

GNR and SRF have 2 flavors of C6: C6/C6P on GNR, C6S/C6SP on SRF. The
the "P" flavor allows for the package C6, while the "non-P" flavor
allows only for core/module C6.

As far as this patch is concerned, both GNR and SRF platforms are
handled the same way. Therefore, further discussion is focused on GNR,
but it applies to SRF as well.

On Intel Xeon platforms, BIOS exposes only 2 ACPI C-states: C1 and C2.
Well, depending on BIOS settings, C2 may be named as C3. But there still
will be only 2 states - C1 and C3. But this is a non-essential detail,
so further discussion is focused on the ACPI C1 and C2 case.

On pre-GNR/SRF Xeon platforms, ACPI C1 is mapped to C1 or C1E, and ACPI
C2 is mapped to C6. The 'use_acpi' flag works just fine:

 * If ACPI C2 enabled, enable C6.
 * Otherwise, disable C6.

However, on GNR there are 2 flavors of C6, so BIOS maps ACPI C2 to
either C6 or C6P, depending on the user settings. As a result, due to
the 'use_acpi' flag, 'intel_idle' disables least one of the C6 flavors.

BIOS                   | OS                         | Verdict
----------------------------------------------------|---------
ACPI C2 disabled       | C6 disabled, C6P disabled  | OK
ACPI C2 mapped to C6   | C6 enabled,  C6P disabled  | Not OK
ACPI C2 mapped to C6P  | C6 disabled, C6P enabled   | Not OK

The goal of 'use_acpi' is to honor BIOS ACPI C2 disabled case, which
works fine. But if ACPI C2 is enabled, the goal is to enable all flavors
of C6, not just one of the flavors. This was overlooked when enabling
GNR/SRF platforms.

In other words, before GNR/SRF, the ACPI C2 status was binary - enabled
or disabled. But it is not binary on GNR/SRF, however the goal is to
continue treat it as binary.

The fix
~~~~~~~

Notice, that current algorithm matches ACPI and custom table C-states
by the mwait hint. However, mwait hint consists of the 'state' and
'sub-state' parts, and all C6 flavors have the same state value of 0x20,
but different sub-state values.

Introduce new C-state table flag - CPUIDLE_FLAG_PARTIAL_HINT_MATCH and
add it to both C6 flavors of the GNR/SRF platforms.

When matching ACPI _CST and custom table C-states, match only the start
part if the C-state has CPUIDLE_FLAG_PARTIAL_HINT_MATCH, other wise
match both state and sub-state parts (as before).

With this fix, GNR C-states enabled/disabled status looks like this.

BIOS                   | OS
----------------------------------------------------
ACPI C2 disabled       | C6 disabled, C6P disabled
ACPI C2 mapped to C6   | C6 enabled, C6P enabled
ACPI C2 mapped to C6P  | C6 enabled, C6P enabled

Possible alternative
~~~~~~~~~~~~~~~~~~~~

The alternative would be to remove 'use_acpi' flag for GNR and SRF.
This would be a simpler solution, but it would violate the principle of
least surprise - users of Xeon platforms are used to the fact that
intel_idle honors C6 enabled/disabled flag. It is more consistent user
experience if GNR/SRF continue doing so.

How tested
~~~~~~~~~~

Tested on GNR and SRF platform with all the 3 BIOS configurations: ACPI
C2 disabled, mapped to C6/C6S, mapped to C6P/C6SP.

Tested on Ice lake Xeon and Sapphire Rapids Xeon platforms with ACPI C2
enabled and disabled, just to verify that the patch does not break older
Xeons.

Fixes: 92813fd5b156 ("intel_idle: add Sierra Forest SoC support")
Fixes: 370406bf5738 ("intel_idle: add Granite Rapids Xeon support")
Cc: 6.8+ <[email protected]> # 6.8+
Signed-off-by: Artem Bityutskiy <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
5 months agoMerge tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Wed, 25 Sep 2024 18:35:19 +0000 (11:35 -0700)]
Merge tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:

 - new memblock_estimated_nr_free_pages() helper to replace
   totalram_pages() which is less accurate when
   CONFIG_DEFERRED_STRUCT_PAGE_INIT is set

 - fixes for memblock tests

* tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  s390/mm: get estimated free pages by memblock api
  kernel/fork.c: get estimated free pages by memblock api
  mm/memblock: introduce a new helper memblock_estimated_nr_free_pages()
  memblock test: fix implicit declaration of function 'strscpy'
  memblock test: fix implicit declaration of function 'isspace'
  memblock test: fix implicit declaration of function 'memparse'
  memblock test: add the definition of __setup()
  memblock test: fix implicit declaration of function 'virt_to_phys'
  tools/testing: abstract two init.h into common include directory
  memblock tests: include export.h in linkage.h as kernel dose
  memblock tests: include memory_hotplug.h in mmzone.h as kernel dose

5 months agoMerge tag 'sparc-for-6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 25 Sep 2024 18:21:06 +0000 (11:21 -0700)]
Merge tag 'sparc-for-6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc

Pull sparc32 update from Andreas Larsson:

 - Remove an unused variable for sparc32

* tag 'sparc-for-6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
  arch/sparc: remove unused varible paddrbase in function leon_swprobe()

5 months agoMerge tag 'powerpc-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Wed, 25 Sep 2024 18:17:25 +0000 (11:17 -0700)]
Merge tag 'powerpc-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix build error in vdso32 when building 64-bit with COMPAT=y and -Os

 - Fix build error in pseries EEH when CONFIG_DEBUG_FS is not set

Thanks to Christophe Leroy, Narayana Murty N, Christian Zigotzky, and
Ritesh Harjani.

* tag 'powerpc-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/eeh: move pseries_eeh_err_inject() outside CONFIG_DEBUG_FS block
  powerpc/vdso32: Fix use of crtsavres for PPC64

5 months agoMerge tag 'clang-format-6.12' of https://github.com/ojeda/linux
Linus Torvalds [Wed, 25 Sep 2024 18:10:39 +0000 (11:10 -0700)]
Merge tag 'clang-format-6.12' of https://github.com/ojeda/linux

Pull clang-format updates from Miguel Ojeda:
 "A routine update of the 'for_each' macro list"

* tag 'clang-format-6.12' of https://github.com/ojeda/linux:
  clang-format: Update with v6.11-rc1's `for_each` macro list

5 months agoKbuild: make MODVERSIONS support depend on not being a compile test build
Linus Torvalds [Wed, 25 Sep 2024 18:08:28 +0000 (11:08 -0700)]
Kbuild: make MODVERSIONS support depend on not being a compile test build

Currently the Rust support is gated on not having MODVERSIONS enabled,
and as a result an "allmodconfig" build will disable Rust build tests.

While MODVERSIONS configurations are worth build testing, the feature is
not actually meaningful unless you run the result, and I'd rather get
build coverage of Rust than MODVERSIONS.  So let's disable MODVERSIONS
for build testing until the Rust side clears up.

Signed-off-by: Linus Torvalds <[email protected]>
5 months agoMerge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux
Linus Torvalds [Wed, 25 Sep 2024 17:25:40 +0000 (10:25 -0700)]
Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux

Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up
     objtool warnings), teach objtool about 'noreturn' Rust symbols and
     mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we
     should be objtool-warning-free, so enable it to run for all Rust
     object files.

   - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.

   - Support 'RUSTC_VERSION', including re-config and re-build on
     change.

   - Split helpers file into several files in a folder, to avoid
     conflicts in it. Eventually those files will be moved to the right
     places with the new build system. In addition, remove the need to
     manually export the symbols defined there, reusing existing
     machinery for that.

   - Relax restriction on configurations with Rust + GCC plugins to just
     the RANDSTRUCT plugin.

  'kernel' crate:

   - New 'list' module: doubly-linked linked list for use with reference
     counted values, which is heavily used by the upcoming Rust Binder.

     This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
     unique for the given ID), 'AtomicTracker' (tracks whether a
     'ListArc' exists using an atomic), 'ListLinks' (the prev/next
     pointers for an item in a linked list), 'List' (the linked list
     itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor
     into a 'List' that allows to remove elements), 'ListArcField' (a
     field exclusively owned by a 'ListArc'), as well as support for
     heterogeneous lists.

   - New 'rbtree' module: red-black tree abstractions used by the
     upcoming Rust Binder.

     This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a
     node), 'RBTreeNodeReservation' (a memory reservation for a node),
     'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor'
     (bidirectional cursor that allows to remove elements), as well as
     an entry API similar to the Rust standard library one.

   - 'init' module: add 'write_[pin_]init' methods and the
     'InPlaceWrite' trait. Add the 'assert_pinned!' macro.

   - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
     introducing an associated type in the trait.

   - 'alloc' module: add 'drop_contents' method to 'BoxExt'.

   - 'types' module: implement the 'ForeignOwnable' trait for
     'Pin<Box<T>>' and improve the trait's documentation. In addition,
     add the 'into_raw' method to the 'ARef' type.

   - 'error' module: in preparation for the upcoming Rust support for
     32-bit architectures, like arm, locally allow Clippy lint for
     those.

  Documentation:

   - https://rust.docs.kernel.org has been announced, so link to it.

   - Enable rustdoc's "jump to definition" feature, making its output a
     bit closer to the experience in a cross-referencer.

   - Debian Testing now also provides recent Rust releases (outside of
     the freeze period), so add it to the list.

  MAINTAINERS:

   - Trevor is joining as reviewer of the "RUST" entry.

  And a few other small bits"

* tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits)
  kasan: rust: Add KASAN smoke test via UAF
  kbuild: rust: Enable KASAN support
  rust: kasan: Rust does not support KHWASAN
  kbuild: rust: Define probing macros for rustc
  kasan: simplify and clarify Makefile
  rust: cfi: add support for CFI_CLANG with Rust
  cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
  rust: support for shadow call stack sanitizer
  docs: rust: include other expressions in conditional compilation section
  kbuild: rust: replace proc macros dependency on `core.o` with the version text
  kbuild: rust: rebuild if the version text changes
  kbuild: rust: re-run Kconfig if the version text changes
  kbuild: rust: add `CONFIG_RUSTC_VERSION`
  rust: avoid `box_uninit_write` feature
  MAINTAINERS: add Trevor Gross as Rust reviewer
  rust: rbtree: add `RBTree::entry`
  rust: rbtree: add cursor
  rust: rbtree: add mutable iterator
  rust: rbtree: add iterator
  rust: rbtree: add red-black tree implementation backed by the C version
  ...

5 months agodrm/amdkfd: Add SDMA queue quantum support for GFX12
Sreekant Somasekharan [Fri, 20 Sep 2024 05:53:17 +0000 (01:53 -0400)]
drm/amdkfd: Add SDMA queue quantum support for GFX12

program SDMAx_QUEUEx_SCHEDULE_CNTL for context switch due to
quantum in KFD for GFX12.

Signed-off-by: Sreekant Somasekharan <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.11.x
5 months agodrm/amdgpu/vcn: enable AV1 on both instances
Saleemkhan Jamadar [Fri, 20 Sep 2024 13:10:18 +0000 (18:40 +0530)]
drm/amdgpu/vcn: enable AV1 on both instances

v1 - remove cs parse code (Christian)

On VCN v4_0_6 AV1 is supported on both the instances.
Remove cs IB parse code since explict handling of AV1 schedule is
not required.

Signed-off-by: Saleemkhan Jamadar <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
5 months agodrm/amdkfd: Fix CU occupancy for GFX 9.4.3
Mukul Joshi [Fri, 20 Sep 2024 18:59:29 +0000 (14:59 -0400)]
drm/amdkfd: Fix CU occupancy for GFX 9.4.3

Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.

Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
5 months agodrm/amdkfd: Update logic for CU occupancy calculations
Mukul Joshi [Mon, 16 Sep 2024 18:33:58 +0000 (14:33 -0400)]
drm/amdkfd: Update logic for CU occupancy calculations

Currently, the code uses the IH_VMID_X_LUT register to map
a queue's vmid to the corresponding PASID. This logic is racy
since CP can update the VMID-PASID mapping anytime especially
when there are more processes than number of vmids. Update the
logic to calculate CU occupancy by matching doorbell offset of
the queue with valid wave counts against the process's queues.

Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
5 months agodrm/amdgpu: skip coredump after job timeout in SRIOV
ZhenGuo Yin [Thu, 19 Sep 2024 03:38:04 +0000 (11:38 +0800)]
drm/amdgpu: skip coredump after job timeout in SRIOV

VF FLR will be triggered by host driver before job timeout,
hence the error status of GPU get cleared. Performing a
coredump here is unnecessary.

Signed-off-by: ZhenGuo Yin <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
5 months agodrm/amdgpu: sync to KFD fences before clearing PTEs
Christian König [Wed, 21 Aug 2024 11:55:41 +0000 (13:55 +0200)]
drm/amdgpu: sync to KFD fences before clearing PTEs

This patch tries to solve the basic problem we also need to sync to
the KFD fences of the BO because otherwise it can be that we clear
PTEs while the KFD queues are still running.

Signed-off-by: Christian König <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
5 months agodrm/amdgpu/mes12: set enable_level_process_quantum_check
Jack Xiao [Wed, 18 Sep 2024 09:07:13 +0000 (17:07 +0800)]
drm/amdgpu/mes12: set enable_level_process_quantum_check

enable_level_process_quantum_check is requried to enable process
quantum based scheduling.

Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.11.x
5 months agosefltests/tracing: Add a test for tracepoint events on modules
Masami Hiramatsu (Google) [Sun, 18 Aug 2024 10:43:35 +0000 (19:43 +0900)]
sefltests/tracing: Add a test for tracepoint events on modules

Add a test case for tracepoint events on modules. This checks if it can add
and remove the events correctly.

Link: https://lore.kernel.org/all/172397781494.286558.7581515061075998225.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
5 months agotracing/fprobe: Support raw tracepoints on future loaded modules
Masami Hiramatsu (Google) [Sun, 18 Aug 2024 10:43:26 +0000 (19:43 +0900)]
tracing/fprobe: Support raw tracepoints on future loaded modules

Support raw tracepoint events on future loaded (unloaded) modules.
This allows user to create raw tracepoint events which can be used from
module's __init functions.

Note: since the kernel does not have any information about the tracepoints
in the unloaded modules, fprobe events can not check whether the tracepoint
exists nor extend the BTF based arguments.

Link: https://lore.kernel.org/all/172397780593.286558.18360375226968537828.stgit@devnote2/
Suggested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
5 months agotracing/fprobe: Support raw tracepoint events on modules
Masami Hiramatsu (Google) [Sun, 18 Aug 2024 10:43:16 +0000 (19:43 +0900)]
tracing/fprobe: Support raw tracepoint events on modules

Support raw tracepoint event on module by fprobe events.
Since it only uses for_each_kernel_tracepoint() to find a tracepoint,
the tracepoints on modules are not handled. Thus if user specified a
tracepoint on a module, it shows an error.
This adds new for_each_module_tracepoint() API to tracepoint subsystem,
and uses it to find tracepoints on modules.

Link: https://lore.kernel.org/all/172397779651.286558.15903703620679186867.stgit@devnote2/
Reported-by: don <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
5 months agotracepoint: Support iterating tracepoints in a loading module
Masami Hiramatsu (Google) [Sun, 18 Aug 2024 10:43:07 +0000 (19:43 +0900)]
tracepoint: Support iterating tracepoints in a loading module

Add for_each_tracepoint_in_module() function to iterate tracepoints in
a module. This API is needed for handling tracepoints in a loading
module from tracepoint_module_notifier callback function.
This also update for_each_module_tracepoint() to pass the module to
callback function so that it can find module easily.

Link: https://lore.kernel.org/all/172397778740.286558.15781131277732977643.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
5 months agotracepoint: Support iterating over tracepoints on modules
Masami Hiramatsu (Google) [Sun, 18 Aug 2024 10:42:58 +0000 (19:42 +0900)]
tracepoint: Support iterating over tracepoints on modules

Add for_each_module_tracepoint() for iterating over tracepoints
on modules. This is similar to the for_each_kernel_tracepoint()
but only for the tracepoints on modules (not including kernel
built-in tracepoints).

Link: https://lore.kernel.org/all/172397777800.286558.14554748203446214056.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
This page took 0.155049 seconds and 4 git commands to generate.