Git Repo - linux.git/log

Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"This fixes one minor regression from the btree cache fixes (in the
  scan_for_btree_nodes repair path) - and the shutdown path fix is the
  big one here, in terms of bugs closed:

   - Assorted tiny syzbot fixes

   - Shutdown path fix: "bch2_btree_write_buffer_flush_going_ro()"

     The shutdown path wasn't flushing the btree write buffer, leading
     to shutting down while we still had operations in flight. This
     fixes a whole slew of syzbot bugs, and undoubtedly other strange
     heisenbugs.

* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix assertion pop in bch2_ptr_swab()
  bcachefs: Fix journal_entry_dev_usage_to_text() overrun
  bcachefs: Allow for unknown key types in backpointers fsck
  bcachefs: Fix assertion pop in topology repair
  bcachefs: Fix hidden btree errors when reading roots
  bcachefs: Fix validate_bset() repair path
  bcachefs: Fix missing validation for bch_backpointer.level
  bcachefs: Fix bch_member.btree_bitmap_shift validation
  bcachefs: bch2_btree_write_buffer_flush_going_ro()

Merge tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a locking issue in the asymmetric CPU capacity setup code in the
  intel_pstate driver that may lead to a deadlock if CPU online/offline
  runs in parallel with the code in question, which is unlikely but not
  impossible (Rafael Wysocki)"

* tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()

Merge tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes for TPM bus encryption (the remaining reported issues in
  the feature)"

* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Disable TPM on tpm2_create_primary() failure
  tpm: Opt-in in disable PCR integrity protection

tpm: Disable TPM on tpm2_create_primary() failure

The earlier bug fix misplaced the error-label when dealing with the
tpm2_create_primary() return value, which the original completely ignored.

Cc: [email protected]
Reported-by: Christoph Anton Mitterer <[email protected]>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087331
Fixes: cc7d8594342a ("tpm: Rollback tpm2_load_null()")
Signed-off-by: Jarkko Sakkinen <[email protected]>

tpm: Opt-in in disable PCR integrity protection

The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.

In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).

Cc: James Bottomley <[email protected]>
Link: https://lore.kernel.org/linux-integrity/[email protected]/
Fixes: 6519fea6fd37 ("tpm: add hmac checks to tpm2_pcr_extend()")
Tested-by: Mimi Zohar <[email protected]>
Co-developed-by: Roberto Sassu <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
Co-developed-by: Mimi Zohar <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)

- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
   context retrieval (Zijian Zhang)

- Fix BPF bits iterator selftest for s390x (Hou Tao)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
  bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
  selftests/bpf: Use -4095 as the bad address for bits iterator

Merge tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:

- fix possible CPUs setup logical-physical CPU mapping, in order to
   avoid CPU hotplug issue

- fix some KASAN bugs

- fix AP booting issue in VM mode

- some trivial cleanups

* tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Fix AP booting issue in VM mode
  LoongArch: Add WriteCombine shadow mapping in KASAN
  LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
  LoongArch: Make KASAN work with 5-level page-tables
  LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
  LoongArch: Fix early_numa_add_cpu() usage for FDT systems
  LoongArch: For all possible CPUs setup logical-physical CPU mapping

Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
  singletons"

* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: swapfile: fix cluster reclaim work crash on rotational devices
  selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
  mm/thp: fix deferred split queue not partially_mapped: fix
  mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
  nommu: pass NULL argument to vma_iter_prealloc()
  ocfs2: fix UBSAN warning in ocfs2_verify_volume()
  nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
  nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
  mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
  mm: count zeromap read and set for swapout and swapin

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
"A last minute mlx5 bugfix"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix PA offset with unaligned starting iotlb map

mm: swapfile: fix cluster reclaim work crash on rotational devices

syzbot and Daan report a NULL pointer crash in the new full swap cluster
reclaim work:

> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 1 UID: 0 PID: 51 Comm: kworker/1:1 Not tainted 6.12.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Workqueue: events swap_reclaim_work
> RIP: 0010:__list_del_entry_valid_or_report+0x20/0x1c0 lib/list_debug.c:49
> Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 83 c7 08 48 83 ec 18 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 19 01 00 00 48 89 f2 48 8b 4e 08 48 b8 00 00 00
> RSP: 0018:ffffc90000bb7c30 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88807b9ae078
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 000000000000004f R12: dffffc0000000000
> R13: ffffffffffffffb8 R14: ffff88807b9ae000 R15: ffffc90003af1000
> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffaca68fb8 CR3: 00000000791c8000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __list_del_entry_valid include/linux/list.h:124 [inline]
>  __list_del_entry include/linux/list.h:215 [inline]
>  list_move_tail include/linux/list.h:310 [inline]
>  swap_reclaim_full_clusters+0x109/0x460 mm/swapfile.c:748
>  swap_reclaim_work+0x2e/0x40 mm/swapfile.c:779

The syzbot console output indicates a virtual environment where swapfile
is on a rotational device.  In this case, clusters aren't actually used,
and si->full_clusters is not initialized.  Daan's report is from qemu, so
likely rotational too.

Make sure to only schedule the cluster reclaim work when clusters are
actually in use.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://github.com/systemd/systemd/issues/35044
Fixes: 5168a68eb78f ("mm, swap: avoid over reclaim of full clusters")
Reported-by: [email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Daan De Meyer <[email protected]>
Cc: Kairui Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

vdpa/mlx5: Fix PA offset with unaligned starting iotlb map

When calculating the physical address range based on the iotlb and mr
[start,end) ranges, the offset of mr->start relative to map->start
is not taken into account. This leads to some incorrect and duplicate
mappings.

For the case when mr->start < map->start the code is already correct:
the range in [mr->start, map->start) was handled by a different
iteration.

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Cc: [email protected]
Signed-off-by: Si-Wei Liu <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
Message-Id: <20241021134040 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"x86 and selftests fixes.

  x86:

   - When emulating a guest TLB flush for a nested guest, flush vpid01,
     not vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if
     L2 and L1 are sharing VPID '0' (from L1's perspective).

   - Fix a bug in the SNP initialization flow where KVM would return '0'
     to userspace instead of -errno on failure.

   - Move the Intel PT virtualization (i.e. outputting host trace to
     host buffer and guest trace to guest buffer) behind CONFIG_BROKEN.

   - Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START

   - Fix a bug where KVM fails to inject an interrupt from the IRR after
     KVM_SET_LAPIC.

  Selftests:

   - Increase the timeout for the memslot performance selftest to avoid
     false failures on arm64 and nested x86 platforms.

   - Fix a goof in the guest_memfd selftest where a for-loop initialized
     a bit mask to zero instead of BIT(0).

   - Disable strict aliasing when building KVM selftests to prevent the
     compiler from treating things like "u64 *" to "uint64_t *" cases as
     undefined behavior, which can lead to nasty, hard to debug
     failures.

   - Force -march=x86-64-v2 for KVM x86 selftests if and only if the
     uarch is supported by the compiler.

   - Fix broken compilation of kvm selftests after a header sync in
     tools/"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
  KVM: x86: Unconditionally set irr_pending when updating APICv state
  kvm: svm: Fix gctx page leak on invalid inputs
  KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB
  KVM: SVM: Propagate error from snp_guest_req_init() to userspace
  KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
  KVM: selftests: Don't force -march=x86-64-v2 if it's unsupported
  KVM: selftests: Disable strict aliasing
  KVM: selftests: fix unintentional noop test in guest_memfd_test.c
  KVM: selftests: memslot_perf_test: increase guest sync timeout

Merge tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mikulas Patocka:

- fix warnings about duplicate slab cache names

* tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-cache: fix warnings about duplicate slab caches
dm-bufio: fix warnings about duplicate slab caches

Merge tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fixes from Mimi Zohar:
"One bug fix, one performance improvement, and the use of
  static_assert:

   - The bug fix addresses "only a cosmetic change" commit, which didn't
     take into account the original 'ima' template definition.

  - The performance improvement limits the atomic_read()"

* tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: Use static_assert() to check struct sizes
  evm: stop avoidably reading i_writecount in evm_file_release
  ima: fix buffer overrun in ima_eventdigest_init_common

Merge tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
"This fixes issues in the Landlock's sandboxer sample and
  documentation, slightly refactors helpers (required for ongoing patch
  series), and improve/fix a feature merged in v6.12 (signal and
  abstract UNIX socket scoping)"

* tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Optimize scope enforcement
  landlock: Refactor network access mask management
  landlock: Refactor filesystem access mask management
  samples/landlock: Clarify option parsing behaviour
  samples/landlock: Refactor help message
  samples/landlock: Fix port parsing in sandboxer
  landlock: Fix grammar issues in documentation
  landlock: Improve documentation of previous limitations

selftests: hugetlb_dio: fixup check for initial conditions to skip in the start

This test verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping. To test this, we read the
count of free hugepages before and after the mmap, DIO, and munmap
operations, then check if the free hugepage count is the same.

Reading free hugepages before the test was removed by commit 0268d4579901
('selftests: hugetlb_dio: check for initial conditions to skip at the
start'), causing the test to always fail.

This patch adds back reading the free hugepages before starting the test.
With this patch, the tests are now passing.

Test results without this patch:

./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 1 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 2 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 3 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 4 : Huge pages not freed!
# Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0

Test results with this patch:

/tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 1 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 2 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 3 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 4 : Huge pages freed successfully !

# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0268d4579901 ("selftests: hugetlb_dio: check for initial conditions to skip in the start")
Signed-off-by: Donet Tom <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/thp: fix deferred split queue not partially_mapped: fix

Though even more elusive than before, list_del corruption has still been
seen on THP's deferred split queue.

The idea in commit e66f3185fa04 was right, but its implementation wrong.
The context omitted an important comment just before the critical test:
"split_folio() removes folio from list on success." In ignoring that
comment, when a THP split succeeded, the code went on to release the
preceding safe folio, preserving instead an irrelevant (formerly head)
folio: which gives no safety because it's not on the list. Fix the logic.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e66f3185fa04 ("mm/thp: fix deferred split queue not partially_mapped")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Usama Arif <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.

A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory anymore on x86_64.  That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an allocation
of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().

In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation.  The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.

Fix this by avoiding the new allocation entirely.  This is done by
referring to either the original page[i] within **pages, or to the
associated folio.  Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.

[[email protected]: whitespace tweak, per David]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: bypass pofs_get_folio(), per Oscar]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Signed-off-by: John Hubbard <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Vivek Kasireddy <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dongwon Kim <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Junxiao Chang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

bcachefs: Fix assertion pop in bch2_ptr_swab()

This runs on extents that haven't yet been validated, so we don't want
to assert that we have a valid entry type.

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix journal_entry_dev_usage_to_text() overrun

If the jset_entry_dev_usage is malformed, and too small, our nr_entries
calculation will be incorrect - just bail out.

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

LoongArch: Fix AP booting issue in VM mode

Native IPI is used for AP booting, because it is the booting interface
between OS and BIOS firmware. The paravirt IPI is only used inside OS,
and native IPI is necessary to boot AP.

When booting AP, we write the kernel entry address in the HW mailbox of
AP and send IPI interrupt to it. AP executes idle instruction and waits
for interrupts or SW events, then clears IPI interrupt and jumps to the
kernel entry from HW mailbox.

Between writing HW mailbox and sending IPI, AP can be woken up by SW
events and jumps to the kernel entry, so ACTION_BOOT_CPU IPI interrupt
will keep pending during AP booting. And native IPI interrupt handler
needs be registered so that it can clear pending native IPI, else there
will be endless interrupts during AP booting stage.

Here native IPI interrupt is initialized even if paravirt IPI is used.

Cc: [email protected]
Fixes: 74c16b2e2b0c ("LoongArch: KVM: Add PV IPI support on guest side")
Signed-off-by: Bibo Mao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add WriteCombine shadow mapping in KASAN

Currently, the kernel couldn't boot when ARCH_IOREMAP, ARCH_WRITECOMBINE
and KASAN are enabled together. Because DMW2 is used by kernel now which
is configured as 0xa000000000000000 for WriteCombine, but KASAN has no
segment mapping for it. This patch fix this issue.

Solution: Add the relevant definitions for WriteCombine (DMW2) in KASAN.

Cc: [email protected]
Fixes: 8e02c3b782ec ("LoongArch: Add writecombine support for DMW-based ioremap()")
Signed-off-by: Kanglong Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits

If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will
overflow UINTPTR_MAX because KASAN_SHADOW_START/KASAN_SHADOW_END are
aligned up by PGDIR_SIZE. And then the overflowed KASAN_SHADOW_END looks
like a user space address.

For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too large
for Loongson-2K series whose cpu_vabits = 39.

Since CONFIG_4KB_4LEVEL is completely legal for CPUs with cpu_vabits <=
39, we just disable KASAN via early return in kasan_init(). Otherwise we
get a boot failure.

Moreover, we change KASAN_SHADOW_END from the first address after KASAN
shadow area to the last address in KASAN shadow area, in order to avoid
the end address exactly overflow to 0 (which is a legal case). We don't
need to worry about alignment because pgd_addr_end() can handle it.

Cc: [email protected]
Reviewed-by: Jiaxun Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Make KASAN work with 5-level page-tables

Make KASAN work with 5-level page-tables, including:
1. Implement and use __pgd_none() and kasan_p4d_offset().
2. As done in kasan_pmd_populate() and kasan_pte_populate(), restrict
the loop conditions of kasan_p4d_populate() and kasan_pud_populate()
to avoid unnecessary population.

Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS

This is a trivial cleanup, commit c62da0c35d58518d ("mm/vma: define a
default value for VM_DATA_DEFAULT_FLAGS") has unified default values of
VM_DATA_DEFAULT_FLAGS across different platforms.

Apply the same consistency to LoongArch.

Suggested-by: Wentao Guan <[email protected]>
Signed-off-by: Yuli Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix early_numa_add_cpu() usage for FDT systems

early_numa_add_cpu() applies on physical CPU id rather than logical CPU
id, so use cpuid instead of cpu.

Cc: [email protected]
Fixes: 3de9c42d02a79a5 ("LoongArch: Add all CPUs enabled by fdt to NUMA node 0")
Reported-by: Bibo Mao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: For all possible CPUs setup logical-physical CPU mapping

In order to support ACPI-based physical CPU hotplug, we suppose for all
"possible" CPUs cpu_logical_map() can work. Because some drivers want to
use cpu_logical_map() for all "possible" CPUs, while currently we only
setup logical-physical mapping for "present" CPUs. This lack of mapping
also causes cpu_to_node() cannot work for hot-added CPUs.

All "possible" CPUs are listed in MADT, and the "present" subset is
marked as ACPI_MADT_ENABLED. To setup logical-physical CPU mapping for
all possible CPUs and keep present CPUs continuous in cpu_present_mask,
we parse MADT twice. The first pass handles CPUs with ACPI_MADT_ENABLED
and the second pass handles CPUs without ACPI_MADT_ENABLED.

The global flag (cpu_enumerated) is removed because acpi_map_cpu() calls
cpu_number_map() rather than set_processor_mask() now.

Reported-by: Bibo Mao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

nommu: pass NULL argument to vma_iter_prealloc()

When deleting a vma entry from a maple tree, it has to pass NULL to
vma_iter_prealloc() in order to calculate internal state of the tree, but
it passed a wrong argument. As a result, nommu kernels crashed upon
accessing a vma iterator, such as acct_collect() reading the size of vma
entries after do_munmap().

This commit fixes this issue by passing a right argument to the
preallocation call.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls")
Signed-off-by: Hajime Tazaki <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ocfs2: fix UBSAN warning in ocfs2_verify_volume()

Syzbot has reported the following splat triggered by UBSAN:

UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10
shift exponent 32768 is too large for 32-bit type 'int'
CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x241/0x360
? __pfx_dump_stack_lvl+0x10/0x10
? __pfx__printk+0x10/0x10
? __asan_memset+0x23/0x50
? lockdep_init_map_type+0xa1/0x910
__ubsan_handle_shift_out_of_bounds+0x3c8/0x420
ocfs2_fill_super+0xf9c/0x5750
? __pfx_ocfs2_fill_super+0x10/0x10
? __pfx_validate_chain+0x10/0x10
? __pfx_validate_chain+0x10/0x10
? validate_chain+0x11e/0x5920
? __lock_acquire+0x1384/0x2050
? __pfx_validate_chain+0x10/0x10
? string+0x26a/0x2b0
? widen_string+0x3a/0x310
? string+0x26a/0x2b0
? bdev_name+0x2b1/0x3c0
? pointer+0x703/0x1210
? __pfx_pointer+0x10/0x10
? __pfx_format_decode+0x10/0x10
? __lock_acquire+0x1384/0x2050
? vsnprintf+0x1ccd/0x1da0
? snprintf+0xda/0x120
? __pfx_lock_release+0x10/0x10
? do_raw_spin_lock+0x14f/0x370
? __pfx_snprintf+0x10/0x10
? set_blocksize+0x1f9/0x360
? sb_set_blocksize+0x98/0xf0
? setup_bdev_super+0x4e6/0x5d0
mount_bdev+0x20c/0x2d0
? __pfx_ocfs2_fill_super+0x10/0x10
? __pfx_mount_bdev+0x10/0x10
? vfs_parse_fs_string+0x190/0x230
? __pfx_vfs_parse_fs_string+0x10/0x10
legacy_get_tree+0xf0/0x190
? __pfx_ocfs2_mount+0x10/0x10
vfs_get_tree+0x92/0x2b0
do_new_mount+0x2be/0xb40
? __pfx_do_new_mount+0x10/0x10
__se_sys_mount+0x2d6/0x3c0
? __pfx___se_sys_mount+0x10/0x10
? do_syscall_64+0x100/0x230
? __x64_sys_mount+0x20/0xc0
do_syscall_64+0xf3/0x230
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f37cae96fda
Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda
RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240
RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000
R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0
R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000
</TASK>

For a really damaged superblock, the value of 'i_super.s_blocksize_bits'
may exceed the maximum possible shift for an underlying 'int'. So add an
extra check whether the aforementioned field represents the valid block
size, which is 512 bytes, 1K, 2K, or 4K.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Dmitry Antipov <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint

When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty()
may cause a NULL pointer dereference, or a general protection fault when
KASAN is enabled.

This happens because, since the tracepoint was added in
mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev
regardless of whether the buffer head has a pointer to a block_device
structure.

In the current implementation, nilfs_grab_buffer(), which grabs a buffer
to read (or create) a block of metadata, including b-tree node blocks,
does not set the block device, but instead does so only if the buffer is
not in the "uptodate" state for each of its caller block reading
functions. However, if the uptodate flag is set on a folio/page, and the
buffer heads are detached from it by try_to_free_buffers(), and new buffer
heads are then attached by create_empty_buffers(), the uptodate flag may
be restored to each buffer without the block device being set to
bh->b_bdev, and mark_buffer_dirty() may be called later in that state,
resulting in the bug mentioned above.

Fix this issue by making nilfs_grab_buffer() always set the block device
of the super block structure to the buffer head, regardless of the state
of the buffer's uptodate flag.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ubisectech Sirius <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint

Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".

This series fixes null pointer dereference bugs that occur when using
nilfs2 and two block-related tracepoints.

This patch (of 2):

It has been reported that when using "block:block_touch_buffer"
tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a
NULL pointer dereference, or a general protection fault when KASAN is
enabled.

This happens because since the tracepoint was added in touch_buffer(), it
references the dev_t member bh->b_bdev->bd_dev regardless of whether the
buffer head has a pointer to a block_device structure. In the current
implementation, the block_device structure is set after the function
returns to the caller.

Here, touch_buffer() is used to mark the folio/page that owns the buffer
head as accessed, but the common search helper for folio/page used by the
caller function was optimized to mark the folio/page as accessed when it
was reimplemented a long time ago, eliminating the need to call
touch_buffer() here in the first place.

So this solves the issue by eliminating the touch_buffer() call itself.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: Ubisectech Sirius <[email protected]>
Closes: https://lkml.kernel.org/r/[email protected]
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2
Tested-by: [email protected]
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: page_alloc: move mlocked flag clearance into free_pages_prepare()

Syzbot reported a bad page state problem caused by a page being freed
using free_page() still having a mlocked flag at free_pages_prepare()
stage:

  BUG: Bad page state in process syz.5.504  pfn:61f45
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45
  flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff)
  raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  page_owner tracks the page as allocated
  page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
   kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
   kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
   kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline]
   kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530
   __do_compat_sys_ioctl fs/ioctl.c:1007 [inline]
   __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  page last free pid 8399 tgid 8399 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686
   folios_put_refs+0x76c/0x860 mm/swap.c:1007
   free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335
   __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
   tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
   tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
   tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373
   tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465
   exit_mmap+0x496/0xc40 mm/mmap.c:1926
   __mmput+0x115/0x390 kernel/fork.c:1348
   exit_mm+0x220/0x310 kernel/exit.c:571
   do_exit+0x9b2/0x28e0 kernel/exit.c:926
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  Modules linked in:
  CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
   bad_page+0x176/0x1d0 mm/page_alloc.c:501
   free_page_is_bad mm/page_alloc.c:918 [inline]
   free_pages_prepare mm/page_alloc.c:1100 [inline]
   free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638
   kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline]
   kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386
   kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143
   __fput+0x23f/0x880 fs/file_table.c:431
   task_work_run+0x24f/0x310 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xa2f/0x28e0 kernel/exit.c:939
   do_group_exit+0x207/0x2c0 kernel/exit.c:1088
   __do_sys_exit_group kernel/exit.c:1099 [inline]
   __se_sys_exit_group kernel/exit.c:1097 [inline]
   __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
   ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf745d579
  Code: Unable to access opcode bytes at 0xf745d54f.
  RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4
  RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

The problem was originally introduced by commit b109b87050df ("mm/munlock:
replace clear_page_mlock() by final clearance"): it was focused on
handling pagecache and anonymous memory and wasn't suitable for lower
level get_page()/free_page() API's used for example by KVM, as with this
reproducer.

Fix it by moving the mlocked flag clearance down to free_page_prepare().

The bug itself if fairly old and harmless (aside from generating these
warnings), aside from a small memory leak - "bad" pages are stopped from
being allocated again.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]
Acked-by: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- The fair sched class currently has a bug where its balance() returns
   true telling the sched core that it has tasks to run but then NULL
   from pick_task(). This makes sched core call sched_ext's pick_task()
   without preceding balance() which can lead to stalls in partial mode.

   For now, work around by detecting the condition and forcing the CPU
   to go through another scheduling cycle.

- Add a missing newline to an error message and fix drgn introspection
   tool which went out of sync.

* tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
  sched_ext: Update scx_show_state.py to match scx_ops_bypass_depth's new type
  sched_ext: Add a missing newline at the end of an error message

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Several small bugfixes all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Fix error path during device add
  vp_vdpa: fix id_table array not null terminated error
  virtio_pci: Fix admin vq cleanup by using correct info pointer
  vDPA/ifcvf: Fix pci_read_config_byte() return code handling
  Fix typo in vringh_test.c
  vdpa: solidrun: Fix UB bug with devres
  vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans

dm-cache: fix warnings about duplicate slab caches

The commit 4c39529663b9 adds a warning about duplicate cache names if
CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-cache
code.

The dm-cache code allocates a slab cache for each device. This commit
changes it to allocate just one slab cache in the module init function.

Signed-off-by: Mikulas Patocka <[email protected]>
Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")

dm-bufio: fix warnings about duplicate slab caches

The commit 4c39529663b9 adds a warning about duplicate cache names if
CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-bufio
code. The dm-bufio code allocates a slab cache with each client. It is
not possible to preallocate the caches in the module init function
because the size of auxiliary per-buffer data is not known at this point.

So, this commit changes dm-bufio so that it appends a unique atomic value
to the cache name, to avoid the warnings.

Signed-off-by: Mikulas Patocka <[email protected]>
Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")

cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()

Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former. This allows to drop a local variable and a
label that are not needed any more.

Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.

Interestingly enough, this fixes a locking issue introduced by commit
929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.

Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/
Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reported-by: "Borah, Chaitanya Kumar" <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Changelog update ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

mm: count zeromap read and set for swapout and swapin

When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling.  However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in.  In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some app in Android
have more than 6% zero-filled pages.  Before commit 0ca0c24e3211 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM).  With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zero map as a standard
   backend.  This approach aligns with zRAM's current handling of
   same-page fills at the device level.  However, it would mean losing the
   optimized-out page counters previously available in zRAM and would not
   align with systems using zswap.  Additionally, as noted by Nhat Pham,
   pswpin/pswpout counters apply only to I/O done directly to the backend
   device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled.  However, this would not be consistent with zRAM, nor
   would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.

We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators.  Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222 [email protected]

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Usama Arif <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Hailong Liu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Chris Li <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

bcachefs: Allow for unknown key types in backpointers fsck

We can't assume that btrees only contain keys of a given type - even if
they only have a single key type listed in the allowed key types for
that btree; this is a forwards compatibility issue.

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix assertion pop in topology repair

Fixes: baefd3f849ed ("bcachefs: btree_cache.freeable list fixes")
Signed-off-by: Kent Overstreet <[email protected]>

Linux 6.12-rc7

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A handful of Qualcomm clk driver fixes:

   - Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks

   - Fix alpha PLL post_div mask for the cases where width is not
     specified

   - Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
     trigger feature on the video clocks"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
  clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
  clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
  clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs

Merge tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"i2c-host fixes for v6.12-rc7 (from Andi):

   - Fix designware incorrect behavior when concluding a transmission

   - Fix Mule multiplexer error value evaluation"

* tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
  i2c: muxes: Fix return value check in mule_i2c_mux_probe()

filemap: Fix bounds checking in filemap_read()

If the caller supplies an iocb->ki_pos value that is close to the
filesystem upper limit, and an iterator with a count that causes us to
overflow that limit, then filemap_read() enters an infinite loop.

This behaviour was discovered when testing xfstests generic/525 with the
"localio" optimisation for loopback NFS mounts.

Reported-by: Mike Snitzer <[email protected]>
Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Tested-by: Mike Snitzer <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Borislav Petkov:

- Make sure GICv3 controller interrupt activation doesn't race with a
   concurrent deactivation due to propagation delays of the register
   write

* tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Force propagation of the active state with a read-back

Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.

  Three affect DAMON. Lorenzo's five-patch series to address the
  mmap_region error handling is here also.

  Apart from that, various singletons"

* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add entry for Thorsten Blum
  ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
  signal: restore the override_rlimit logic
  fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
  ucounts: fix counter leak in inc_rlimit_get_ucounts()
  selftests: hugetlb_dio: check for initial conditions to skip in the start
  mm: fix docs for the kernel parameter ``thp_anon=``
  mm/damon/core: avoid overflow in damon_feed_loop_next_input()
  mm/damon/core: handle zero schemes apply interval
  mm/damon/core: handle zero {aggregation,ops_update} intervals
  mm/mlock: set the correct prev on failure
  objpool: fix to make percpu slot allocation more robust
  mm/page_alloc: keep track of free highatomic
  mm: resolve faulty mmap_region() error path behaviour
  mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
  mm: refactor map_deny_write_exec()
  mm: unconditionally close VMAs on error
  mm: avoid unsafe VMA hook invocation when error arises on mmap hook
  mm/thp: fix deferred split unqueue naming and locking
  mm/thp: fix deferred split queue not partially_mapped

Merge tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small remaining USB and Thunderbolt fixes and device ids
  for 6.12-rc7. Included in here are:

   - new USB serial driver device ids

   - thunderbolt driver fixes for reported problems

   - typec bugfixes

   - dwc3 driver fix

   - musb driver fix

  All of these have been in linux-next this past week with no reported
  issues"

* tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: qcserial: add support for Sierra Wireless EM86xx
  thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
  usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
  usb: dwc3: fix fault at system suspend if device was already runtime suspended
  usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier
  usb: musb: sunxi: Fix accessing an released usb phy
  USB: serial: io_edgeport: fix use after free in debug printk
  USB: serial: option: add Quectel RG650V
  USB: serial: option: add Fibocom FG132 0x0112 composition
  thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING

Merge tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are two small memory leak fixes for the vchiq_arm staging driver
  that have been sitting in my tree for weeks and should get merged for
  6.12-rc7 so that people don't keep tripping over them.

  They both have been in linux-next for a while with no reported
  problems"

* tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation
  staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation

Merge tag 'i2c-host-fixes-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host fixes for v6.12-rc7

In designware an incorrect behavior has been fixes when
concluding a transmission.

Fixed return error value evaluation in the Mule multiplexer.

Merge tag 'nfsd-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

- Fix a v6.12-rc regression when exporting ext4 filesystems with NFSD

* tag 'nfsd-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix READDIR on NFSv3 mounts of ext4 exports

Merge tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
"Fix net namespace refcount use after free issue"

* tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Fix use-after-free of network namespace.

Merge tag 'block-6.12-20241108' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
"Single fix for an issue triggered with PROVE_RCU=y, with nvme using
the wrong iterators for an SRCU protected list"

* tag 'block-6.12-20241108' of git://git.kernel.dk/linux:
nvme/host: Fix RCU list traversal to use SRCU primitive

sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()

sched_ext dispatches tasks from the BPF scheduler from balance_scx() and
thus every pick_task_scx() call must be preceded by balance_scx(). While
this usually holds, due to a bug, there are cases where the fair class's
balance() returns true indicating that it has tasks to run on the CPU and
thus terminating balance() calls but fails to actually find the next task to
run when pick_task() is called. In such cases, pick_task_scx() can be called
without preceding balance_scx().

Detect this condition using SCX_RQ_BAL_PENDING flags. If detected, keep
running the previous task if possible and avoid stalling from entering idle
without balancing.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

landlock: Optimize scope enforcement

Do not walk through the domain hierarchy when the required scope is not
supported by this domain. This is the same approach as for filesystem
and network restrictions.

Cc: Mikhail Ivanov <[email protected]>
Cc: Tahera Fahimi <[email protected]>
Reviewed-by: Günther Noack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mickaël Salaün <[email protected]>

landlock: Refactor network access mask management

Replace get_raw_handled_net_accesses() and get_current_net_domain() with
a call to landlock_get_applicable_domain().

Cc: Konstantin Meskhidze <[email protected]>
Cc: Mikhail Ivanov <[email protected]>
Reviewed-by: Günther Noack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mickaël Salaün <[email protected]>

landlock: Refactor filesystem access mask management

Replace get_raw_handled_fs_accesses() with a generic
landlock_union_access_masks(), and replace get_fs_domain() with a
generic landlock_get_applicable_domain(). These helpers will also be
useful for other types of access.

Cc: Mikhail Ivanov <[email protected]>
Reviewed-by: Günther Noack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mic: Slightly improve doc as suggested by Günther]
Signed-off-by: Mickaël Salaün <[email protected]>

Merge tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"These fix one issue in the qcom lmh thermal driver, a DT handling
  issue in the thermal core and two issues in the userspace thermal
  library:

   - Allow tripless thermal zones defined in a DT to be registered in
     accordance with the thermal DT bindings (Icenowy Zheng)

   - Annotate LMH IRQs with lockdep classes to prevent lockdep from
     reporting a possible recursive locking issue that cannot really
     occur (Dmitry Baryshkov)

   - Improve the thermal library "make clean" to remove a leftover
     symbolic link created during compilation and fix the sampling
     handler invocation in that library to pass the correct pointer to
     it (Emil Dahl Juhl, zhang jiao)"

* tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/of: support thermal zones w/o trips subnode
  tools/lib/thermal: Remove the thermal.h soft link when doing make clean
  tools/lib/thermal: Fix sampling handler context ptr
  thermal/drivers/qcom/lmh: Remove false lockdep backtrace

Merge tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix the asymmetric CPU capacity support code in the intel_pstate
  driver, added during this develompent cycle, to address a corner case
  in which the capacity of a CPU going online is not updated (Rafael
  Wysocki)"

* tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
  cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration

Merge tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Fix the ACPI processor driver initialization ordering after recent
  changes to avoid calling init_freq_invariance_cppc() too early on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Move arch_init_invariance_cppc() call later

Merge tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
"Four fixes, all also marked for stable:

   - fix two potential use after free issues

   - fix OOM issue with many simultaneous requests

   - fix missing error check in RPC pipe handling"

* tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: check outstanding simultaneous SMB operations
  ksmbd: fix slab-use-after-free in smb3_preauth_hash_rsp
  ksmbd: fix slab-use-after-free in ksmbd_smb2_session_create
  ksmbd: Fix the missing xa_store error check

bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6

In the bpf_out_neigh_v6 function, rcu_read_lock() is used to begin an RCU
read-side critical section. However, when unlocking, one branch
incorrectly uses a different RCU unlock flavour rcu_read_unlock_bh()
instead of rcu_read_unlock(). This mismatch in RCU locking flavours can
lead to unexpected behavior and potential concurrency issues.

This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.

This patch corrects the mismatched unlock flavour by replacing the
incorrect rcu_read_unlock_bh() with the appropriate rcu_read_unlock(),
ensuring that the RCU critical section is properly exited. This change
prevents potential synchronization issues and aligns with proper RCU
usage patterns.

Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh")
Signed-off-by: Jiawei Ye <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two small fixes, the drivers one in ufs simply delays running a work
  queue and the generic one in zoned storage switches to a more correct
  API that tries the standard buddy allocator first (for small
  allocations); this fixes an allocation problem with small allocations
  seen under memory pressure"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Start the RTC update work later
  scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer

Merge tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, usual leaders in amdgpu and xe, then a panel quirk, and
  some fixes to imagination and panthor drivers. Seems around the usual
  level for this time and don't know of any big problems.

  amdgpu:
   - Brightness fix
   - DC vbios parsing fix
   - ACPI fix
   - SMU 14.x fix
   - Power workload profile fix
   - GC partitioning fix
   - Debugfs fixes

  imagination:
   - Track PVR context per file
   - Break ref-counting cycle

  panel-orientation-quirks:
   - Fix matching Lenovo Yoga Tab 3 X90F

  panthor:
   - Lock VM array
   - Be strict about I/O mapping flags

  xe:
   - Fix ccs_mode setting for Xe2 and later
   - Synchronize ccs_mode setting with client creation
   - Apply scheduling WA for LNL in additional places as needed
   - Fix leak and lock handling in error paths of xe_exec ioctl
   - Fix GGTT allocation leak leading to eventual crash in SR-IOV
   - Move run_ticks update out of job handling to avoid synchronization
     with reader"

* tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
  drm/panthor: Be stricter about IO mapping flags
  drm/panthor: Lock XArray when getting entries for the VM
  drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strict
  drm/xe: Stop accumulating LRC timestamp on job_free
  drm/xe/pf: Fix potential GGTT allocation leak
  drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
  drm/xe: Fix possible exec queue leak in exec IOCTL
  drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
  drm/amdgpu: Adjust debugfs eviction and IB access permissions
  drm/amdgpu: Adjust debugfs register access permissions
  drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
  drm/amd/pm: correct the workload setting
  drm/amd/pm: always pick the pptable from IFWI
  drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
  drm/amd/display: parse umc_info or vram_info based on ASIC
  drm/amd/display: Fix brightness level not retained over reboot
  drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout
  drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout
  drm/xe: Move LNL scheduling WA to xe_device.h
  drm/xe: Use the filelist from drm for ccs_mode change
  ...

Merge tag 'drm-xe-fixes-2024-11-08' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix ccs_mode setting for Xe2 and later (Balasubramani)
- Synchronize ccs_mode setting with client creation (Balasubramani)
- Apply scheduling WA for LNL in additional places as needed
  (Nirmoy)
- Fix leak and lock handling in error paths of xe_exec ioctl
  (Matthew Brost)
- Fix GGTT allocation leak leading to eventual crash in SR-IOV
  (Michal Wajdeczko)
- Move run_ticks update out of job handling to avoid synchronization
  with reader (Lucas)

Signed-off-by: Dave Airlie <[email protected]>
From: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/4ffcebtluaaaohquxfyf5babpihmtscxwad3jjmt5nggwh2xpm@ztw67ucywttg

Merge tag 'drm-misc-fixes-2024-11-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

imagination:
- Track PVR context per file
- Break ref-counting cycle

panel-orientation-quirks:
- Fix matching Lenovo Yoga Tab 3 X90F

panthor:
- Lock VM array
- Be strict about I/O mapping flags

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

bcachefs: Fix hidden btree errors when reading roots

We silence btree errors in btree_node_scan, since it's probing and
errors are expected: add a fake pass so that btree_node_scan is no
longer recovery pass 0, and we don't think we're in btree node scan when
reading btree roots.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix validate_bset() repair path

When we truncate a bset (due to it extending past the end of the btree
node), we can't skip the rest of the validation for e.g. the packed
format (if it's the first bset in the node).

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set

When the Tx FIFO is empty and the last command has no STOP bit
set, the master holds SCL low. If I2C_DYNAMIC_TAR_UPDATE is not
set, BIT(13) MST_ON_HOLD of IC_RAW_INTR_STAT is not enabled,
causing the __i2c_dw_disable() timeout. This is quite similar to
commit 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in
case master is holding SCL low"). Also check BIT(7)
MST_HOLD_TX_FIFO_EMPTY in IC_STATUS, which is available when
IC_STAT_FOR_CLK_STRETCH is set.

Fixes: 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low")
Co-developed-by: Xiaowu Ding <[email protected]>
Signed-off-by: Xiaowu Ding <[email protected]>
Co-developed-by: Angus Chen <[email protected]>
Signed-off-by: Angus Chen <[email protected]>
Signed-off-by: Liu Peibao <[email protected]>
Acked-by: Jarkko Nikula <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>

Merge tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Still more changes floating than wished at this late stage, but all
  are small device-specific fixes, and look less troublesome.

  Including a few ASoC quirk / ID additoins, a series of ASoC STM fixes,
  HD-audio conexant codec regression fix, and other various quirks and
  device-specific fixes"

* tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits
  ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()
  ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
  ASoC: amd: yc: Support dmic on another model of Lenovo Thinkpad E14 Gen 6
  ASoC: SOF: amd: Fix for incorrect DMA ch status register offset
  ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022
  ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove
  MAINTAINERS: Generic Sound Card section
  ALSA: usb-audio: Add quirk for HP 320 FHD Webcam
  ASoC: tas2781: Add new driver version for tas2563 & tas2781 qfn chip
  ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
  ALSA: ump: Don't enumeration invalid groups for legacy rawmidi
  Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"

Merge tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- dvb-core fixes for vb2 check and device registration

- v4l2-core: fix an issue with error handling for VIDIOC_G_CTRL

- vb2 core: fix an issue with vb plane copy logic

- videobuf2-core: copy vb planes unconditionally

- vivid: fix buffer overwrite when using > 32 buffers

- vivid: fix a potential division by zero due to an issue at v4l2-tpg

- some spectre vulnerability fixes

- several OOM access fixes

- some buffer overflow fixes

* tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: videobuf2-core: copy vb planes unconditionally
  media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
  media: vivid: fix buffer overwrite when using > 32 buffers
  media: pulse8-cec: fix data timestamp at pulse8_setup()
  media: cec: extron-da-hd-4k-plus: don't use -1 as an error code
  media: stb0899_algo: initialize cfr before using it
  media: adv7604: prevent underflow condition when reporting colorspace
  media: cx24116: prevent overflows on SNR calculus
  media: ar0521: don't overflow when checking PLL values
  media: s5p-jpeg: prevent buffer overflows
  media: av7110: fix a spectre vulnerability
  media: mgb4: protect driver against spectre
  media: dvb_frontend: don't play tricks with underflow values
  media: dvbdev: prevent the risk of out of memory access
  media: v4l2-tpg: prevent the risk of a division by zero
  media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl()
  media: dvb-core: add missing buffer index check

Merge tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Fix for duplicate caches in some arm64 configurations with
CONFIG_SLAB_BUCKETS (Koichiro Den)

* tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create

Merge tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A few more one-liners that fix some user visible problems:

   - use correct range when clearing qgroup reservations after COW

   - properly reset freed delayed ref list head

   - fix ro/rw subvolume mounts to be backward compatible with old and
     new mount API"

* tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix the length of reserved qgroup to free
  btrfs: reinitialize delayed ref list after deleting it from the list
  btrfs: fix per-subvolume RO/RW flags with new mount API

Merge tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
"Some trivial syzbot fixes, two more serious btree fixes found by
  looping single_devices.ktest small_nodes:

   - Topology error on split after merge, where we accidentaly picked
     the node being deleted for the pivot, resulting in an assertion pop

   - New nodes being preallocated were left on the freedlist, unlocked,
     resulting in them sometimes being accidentally freed: this dated
     from pre-cycle detector, when we could leave them locked. This
     should have resulted in more explosions and fireworks, but turned
     out to be surprisingly hard to hit because the preallocated nodes
     were being used right away.

     The fix for this is bigger than we'd like - reworking btree list
     handling was a bit invasive - but we've now got more assertions and
     it's well tested.

   - Also another mishandled transaction restart fix (in
     btree_node_prefetch) - we're almost done with those"

* tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix UAF in __promote_alloc() error path
  bcachefs: Change OPT_STR max to be 1 less than the size of choices array
  bcachefs: btree_cache.freeable list fixes
  bcachefs: check the invalid parameter for perf test
  bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
  bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery
  bcachefs: Fix topology errors on split after merge
  bcachefs: Ancient versions with bad bkey_formats are no longer supported
  bcachefs: Fix error handling in bch2_btree_node_prefetch()
  bcachefs: Fix null ptr deref in bucket_gen_get()

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Here is a (hopefully) final round of arm64 fixes for 6.12 that address
  some user-visible floating point register corruption. Both of the
  Marks have been working on this for a couple of weeks and we've ended
  up in a position where SVE is solid but SME still has enough pending
  issues that the most pragmatic solution for the release and stable
  backports is to disable the feature. Yes, it's a shame, but the
  hardware is rare as hen's teeth at the moment and we're better off
  getting back to a known good state before fixing it all properly.
  We're also improving the selftests for 6.13 to help avoid merging
  broken code in the future.

  Anyway, the good news is that we're removing a lot more code than
  we're adding.

  Summary:

   - Fix handling of SVE traps from userspace on preemptible kernels
     when converting the saved floating point state into SVE state.

   - Remove broken support for the SMCCCv1.3 "SVE discard hint"
     optimisation.

   - Disable SME support, as the current support code suffers from
     numerous issues around signal delivery, ptrace access and
     context-switch which can lead to user-visible corruption of the
     register state"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Kconfig: Make SME depend on BROKEN for now
  arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
  arm64/sve: Discard stale CPU state when handling SVE traps

Merge tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Madhavan Srinivasan:

- Fix spurious interrupts in Book3S HV Nested KVM

Thanks to Gautam Menghani.

* tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts

KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN

Hide KVM's pt_mode module param behind CONFIG_BROKEN, i.e. disable support
for virtualizing Intel PT via guest/host mode unless BROKEN=y.  There are
myriad bugs in the implementation, some of which are fatal to the guest,
and others which put the stability and health of the host at risk.

For guest fatalities, the most glaring issue is that KVM fails to ensure
tracing is disabled, and *stays* disabled prior to VM-Enter, which is
necessary as hardware disallows loading (the guest's) RTIT_CTL if tracing
is enabled (enforced via a VMX consistency check).  Per the SDM:

  If the logical processor is operating with Intel PT enabled (if
  IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load
  IA32_RTIT_CTL" VM-entry control must be 0.

On the host side, KVM doesn't validate the guest CPUID configuration
provided by userspace, and even worse, uses the guest configuration to
decide what MSRs to save/load at VM-Enter and VM-Exit.  E.g. configuring
guest CPUID to enumerate more address ranges than are supported in hardware
will result in KVM trying to passthrough, save, and load non-existent MSRs,
which generates a variety of WARNs, ToPA ERRORs in the host, a potential
deadlock, etc.

Fixes: f99e3daf94ff ("KVM: x86: Add Intel PT virtualization work mode")
Cc: [email protected]
Cc: Adrian Hunter <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Tested-by: Adrian Hunter <[email protected]>
Message-ID: <20241101185031.1799556 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Unconditionally set irr_pending when updating APICv state

Always set irr_pending (to true) when updating APICv status to fix a bug
where KVM fails to set irr_pending when userspace sets APIC state and
APICv is disabled, which ultimate results in KVM failing to inject the
pending interrupt(s) that userspace stuffed into the vIRR, until another
interrupt happens to be emulated by KVM.

Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
be true if APICv is enabled, because not all vIRR updates will be visible
to KVM.

Hit the bug with a big hammer, even though strictly speaking KVM can scan
the vIRR and set/clear irr_pending as appropriate for this specific case.
The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
the shortlog suggests, deleted code that updated irr_pending.

Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
with the crucial difference that kvm_apic_update_apicv() did the scan even
when APICv was being *disabled*, e.g. due to an AVIC inhibition.

        struct kvm_lapic *apic = vcpu->arch.apic;

        if (vcpu->arch.apicv_active) {
                /* irr_pending is always true when apicv is activated. */
                apic->irr_pending = true;
                apic->isr_count = 1;
        } else {
                apic->irr_pending = (apic_search_irr(apic) != -1);
                apic->isr_count = count_vectors(apic->regs + APIC_ISR);
        }

And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
assumed that the new virtual APIC state could have a pending IRQ.

Furthermore, in addition to introducing this issue, commit 755c2bf87860
also papered over the underlying bug: KVM doesn't ensure CPUs and devices
see APICv as disabled prior to searching the IRR.  Waiting until KVM
emulates an EOI to update irr_pending "works", but only because KVM won't
emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
memory barriers in between.  I.e. leaving irr_pending set is basically
hacking around bad ordering.

So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
even though it's sub-optimal if no IRQs are pending, in order to provide a
minimal fix, but leave behind a FIXME to document the ugliness.  With luck,
the ordering issue will be fixed and the mess will be cleaned up in the
not-too-distant future.

Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
Cc: [email protected]
Cc: Maxim Levitsky <[email protected]>
Reported-by: Yong He <[email protected]>
Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <20241106015135.2462147 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: svm: Fix gctx page leak on invalid inputs

Ensure that snp gctx page allocation is adequately deallocated on
failure during snp_launch_start.

Fixes: 136d8bc931c8 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command")
CC: Sean Christopherson <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: Ashish Kalra <[email protected]>
CC: Tom Lendacky <[email protected]>
CC: John Allen <[email protected]>
CC: Herbert Xu <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Michael Roth <[email protected]>
CC: Luis Chamberlain <[email protected]>
CC: Russ Weight <[email protected]>
CC: Danilo Krummrich <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: "Rafael J. Wysocki" <[email protected]>
CC: Tianfei zhang <[email protected]>
CC: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Dionna Glaze <[email protected]>
Message-ID: <20241105010558.1266699 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB

In 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources"), VMX_BASIC_MEM_TYPE_WB was removed. Use X86_MEMTYPE_WB
instead.

Fixes: 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources")
Signed-off-by: John Sperbeck <[email protected]>
Message-ID: <20241106034031 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvm-x86-fixes-6.12-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 and selftests fixes for 6.12:

- Increase the timeout for the memslot performance selftest to avoid false
   failures on arm64 and nested x86 platforms.

- Fix a goof in the guest_memfd selftest where a for-loop initialized a
   bit mask to zero instead of BIT(0).

- Disable strict aliasing when building KVM selftests to prevent the
   compiler from treating things like "u64 *" to "uint64_t *" cases as
   undefined behavior, which can lead to nasty, hard to debug failures.

- Force -march=x86-64-v2 for KVM x86 selftests if and only if the uarch
   is supported by the compiler.

- When emulating a guest TLB flush for a nested guest, flush vpid01, not
   vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if L2 and
   L1 are sharing VPID '0' (from L1's perspective).

- Fix a bug in the SNP initialization flow where KVM would return '0' to
   userspace instead of -errno on failure.

Merge tag 'asoc-fix-v6.12-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A moderately large pile of small changes here, split fairly evenly
between fixes and ID additions/quirks and all of it driver specific.

Merge tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for 6.12-rc7

Here's a fix for a long-standing use-after-free in an io_edgeport debug
printk and some new modem device ids.

All have been in linux-next with no reported issues.

* tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: qcserial: add support for Sierra Wireless EM86xx
  USB: serial: io_edgeport: fix use after free in debug printk
  USB: serial: option: add Quectel RG650V
  USB: serial: option: add Fibocom FG132 0x0112 composition

bcachefs: Fix missing validation for bch_backpointer.level

This fixes an assertion pop where we try to navigate to the target of
the backpointer, and the path level isn't what we expect.

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix bch_member.btree_bitmap_shift validation

Needs to match the assert later when we resize...

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: bch2_btree_write_buffer_flush_going_ro()

The write buffer needs to be specifically flushed when going RO: keys in
the journal that haven't yet been moved to the write buffer don't have a
journal pin yet.

This fixes numerous syzbot bugs, all with symptoms of still doing writes
after we've got RO.

Signed-off-by: Kent Overstreet <[email protected]>

Merge tag 'amd-drm-fixes-6.12-2024-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.12-2024-11-07:

amdgpu:
- Brightness fix
- DC vbios parsing fix
- ACPI fix
- SMU 14.x fix
- Power workload profile fix
- GC partitioning fix
- Debugfs fixes

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
"An update for the maintainers of the AMD driver following some job
changes there"

* tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: update AMD SPI maintainer

Merge tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of small fixes for drivers, nothing particularly remarkable"

* tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rk808: Add apply_bit for BUCK3 on RK809
regulator: rtq2208: Fix uninitialized use of regulator_config

mailmap: add entry for Thorsten Blum

Map my previously used email address to my @linux.dev address.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Thorsten Blum <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Geliang Tang <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mathieu Othacehe <[email protected]>
Cc: Matthieu Baerts (NGI0) <[email protected]>
Cc: Matt Ranostay <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Quentin Monnet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()

Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove():

[   57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12
[   57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper.  Leaking 1 clusters and removing the entry
[   57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004
[...]
[   57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[...]
[   57.331328] Call Trace:
[   57.331477]  <TASK>
[...]
[   57.333511]  ? do_user_addr_fault+0x3e5/0x740
[   57.333778]  ? exc_page_fault+0x70/0x170
[   57.334016]  ? asm_exc_page_fault+0x2b/0x30
[   57.334263]  ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10
[   57.334596]  ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[   57.334913]  ocfs2_xa_remove_entry+0x23/0xc0
[   57.335164]  ocfs2_xa_set+0x704/0xcf0
[   57.335381]  ? _raw_spin_unlock+0x1a/0x40
[   57.335620]  ? ocfs2_inode_cache_unlock+0x16/0x20
[   57.335915]  ? trace_preempt_on+0x1e/0x70
[   57.336153]  ? start_this_handle+0x16c/0x500
[   57.336410]  ? preempt_count_sub+0x50/0x80
[   57.336656]  ? _raw_read_unlock+0x20/0x40
[   57.336906]  ? start_this_handle+0x16c/0x500
[   57.337162]  ocfs2_xattr_block_set+0xa6/0x1e0
[   57.337424]  __ocfs2_xattr_set_handle+0x1fd/0x5d0
[   57.337706]  ? ocfs2_start_trans+0x13d/0x290
[   57.337971]  ocfs2_xattr_set+0xb13/0xfb0
[   57.338207]  ? dput+0x46/0x1c0
[   57.338393]  ocfs2_xattr_trusted_set+0x28/0x30
[   57.338665]  ? ocfs2_xattr_trusted_set+0x28/0x30
[   57.338948]  __vfs_removexattr+0x92/0xc0
[   57.339182]  __vfs_removexattr_locked+0xd5/0x190
[   57.339456]  ? preempt_count_sub+0x50/0x80
[   57.339705]  vfs_removexattr+0x5f/0x100
[...]

Reproducer uses faultinject facility to fail ocfs2_xa_remove() ->
ocfs2_xa_value_truncate() with -ENOMEM.

In this case the comment mentions that we can return 0 if
ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
anyway. But the following 'rc' check is wrong and execution flow do
'ocfs2_xa_remove_entry(loc);' twice:
* 1st: in ocfs2_xa_cleanup_value_truncate();
* 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'.

Fix this by skipping the 2nd removal of the same entry and making
syzkaller repro happy.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.")
Signed-off-by: Andrew Kanner <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/T/
Tested-by: [email protected]
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

signal: restore the override_rlimit logic

Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals.  However now it's enforced unconditionally, even if
override_rlimit is set.  This behavior change caused production issues.

For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error.  From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'.  This can lead to unpredictable behavior and
crashes, as we observed with java applications.

Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set.  This effectively
restores the old behavior.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Acked-by: Alexey Gladkov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs/proc: fix compile warning about variable 'vmcore_mmap_ops'

When build with !CONFIG_MMU, the variable 'vmcore_mmap_ops'
is defined but not used:

>> fs/proc/vmcore.c:458:42: warning: unused variable 'vmcore_mmap_ops'
458 | static const struct vm_operations_struct vmcore_mmap_ops = {

Fix this by only defining it when CONFIG_MMU is enabled.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()")
Signed-off-by: Qi Xi <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Cc: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Michael Holzheu <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Wang ShaoBo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ucounts: fix counter leak in inc_rlimit_get_ucounts()

The inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit. If the value exceeds the limit, the function
returns an error without decrementing the counter.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting")
Signed-off-by: Andrei Vagin <[email protected]>
Co-developed-by: Roman Gushchin <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Tested-by: Roman Gushchin <[email protected]>
Acked-by: Alexey Gladkov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Alexey Gladkov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: hugetlb_dio: check for initial conditions to skip in the start

The test should be skipped if initial conditions aren't fulfilled in the
start instead of failing and outputting non-compliant TAP logs. This kind
of failure pollutes the results. The initial conditions are:

- The test should only execute if /tmp file can be allocated.
- The test should only execute if huge pages are free.

Before:
TAP version 13
1..4
Bail out! Error opening file
: Read-only file system (30)
# Planned tests != run tests (4 != 0)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0

After:
TAP version 13
1..0 # SKIP Unable to allocate file: Read-only file system

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()")
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Donet Tom <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix docs for the kernel parameter ``thp_anon=``

If we add ``thp_anon=32,64K:always`` to the kernel command line, we
will see the following error:

[ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting

This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```,
as [KMG] must follow each number to especify its unit. So, the correct
format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```.

Therefore, adjust the documentation to reflect the correct format of the
parameter ``thp_anon=``.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: dd4d30d1cdbe ("mm: override mTHP "enabled" defaults at kernel cmdline")
Signed-off-by: Maíra Canal <[email protected]>
Acked-by: Barry Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/core: avoid overflow in damon_feed_loop_next_input()

damon_feed_loop_next_input() is inefficient and fragile to overflows.
Specifically, 'score_goal_diff_bp' calculation can overflow when 'score'
is high.  The calculation is actually unnecessary at all because 'goal' is
a constant of value 10,000.  Calculation of 'compensation' is again
fragile to overflow.  Final calculation of return value for under-achiving
case is again fragile to overflow when the current score is
under-achieving the target.

Add two corner cases handling at the beginning of the function to make the
body easier to read, and rewrite the body of the function to avoid
overflows and the unnecessary bp value calcuation.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Closes: https://lore.kernel.org/[email protected]
Tested-by: Guenter Roeck <[email protected]>
Cc: <[email protected]> [6.8.x]
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/core: handle zero schemes apply interval

DAMON's logics to determine if this is the time to apply damos schemes
assumes next_apply_sis is always set larger than current
passed_sample_intervals.  And therefore assume continuously incrementing
passed_sample_intervals will make it reaches to the next_apply_sis in
future.  The logic hence does apply the scheme and update next_apply_sis
only if passed_sample_intervals is same to next_apply_sis.

If Schemes apply interval is set as zero, however, next_apply_sis is set
same to current passed_sample_intervals, respectively.  And
passed_sample_intervals is incremented before doing the next_apply_sis
check.  Hence, next_apply_sis becomes larger than next_apply_sis, and the
logic says it is not the time to apply schemes and update next_apply_sis.
In other words, DAMON stops applying schemes until passed_sample_intervals
overflows.

Based on the documents and the common sense, a reasonable behavior for
such inputs would be applying the schemes for every sampling interval.
Handle the case by removing the assumption.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [6.7.x]
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/core: handle zero {aggregation,ops_update} intervals

Patch series "mm/damon/core: fix handling of zero non-sampling intervals".

DAMON's internal intervals accounting logic is not correctly handling
non-sampling intervals of zero values for a wrong assumption.  This could
cause unexpected monitoring behavior, and even result in infinite hang of
DAMON sysfs interface user threads in case of zero aggregation interval.
Fix those by updating the intervals accounting logic.  For details of the
root case and solutions, please refer to commit messages of fixes.

This patch (of 2):

DAMON's logics to determine if this is the time to do aggregation and ops
update assumes next_{aggregation,ops_update}_sis are always set larger
than current passed_sample_intervals.  And therefore it further assumes
continuously incrementing passed_sample_intervals every sampling interval
will make it reaches to the next_{aggregation,ops_update}_sis in future.
The logic therefore make the action and update
next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same
to the counts, respectively.

If Aggregation interval or Ops update interval are zero, however,
next_aggregation_sis or next_ops_update_sis are set same to current
passed_sample_intervals, respectively.  And passed_sample_intervals is
incremented before doing the next_{aggregation,ops_update}_sis check.
Hence, passed_sample_intervals becomes larger than
next_{aggregation,ops_update}_sis, and the logic says it is not the time
to do the action and update next_{aggregation,ops_update}_sis forever,
until an overflow happens.  In other words, DAMON stops doing aggregations
or ops updates effectively forever, and users cannot get monitoring
results.

Based on the documents and the common sense, a reasonable behavior for
such inputs is doing an aggregation and an ops update for every sampling
interval.  Handle the case by removing the assumption.

Note that this could incur particular real issue for DAMON sysfs interface
users, in case of zero Aggregation interval.  When user starts DAMON with
zero Aggregation interval and asks online DAMON parameter tuning via DAMON
sysfs interface, the request is handled by the aggregation callback.
Until the callback finishes the work, the user who requested the online
tuning just waits.  Hence, the user will be stuck until the
passed_sample_intervals overflows.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [6.7.x]
Signed-off-by: Andrew Morton <[email protected]>

mm/mlock: set the correct prev on failure

After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma()
pattern for mprotect() et al."), if vma_modify_flags() return error, the
vma is set to an error code. This will lead to an invalid prev be
returned.

Generally this shouldn't matter as the caller should treat an error as
indicating state is now invalidated, however unfortunately
apply_mlockall_flags() does not check for errors and assumes that
mlock_fixup() correctly maintains prev even if an error were to occur.

This patch fixes that assumption.

[[email protected]: provide a better fix and rephrase the log]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.")
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

objpool: fix to make percpu slot allocation more robust

Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it
will use kmalloc if user specifies that combination.  Here the reason why
combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does
not support all GFP flag, especially GFP_ATOMIC.  So we should check if
gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first.  This
ensures caller can sleep.  And for the robustness, even if vmalloc fails,
it should retry with kmalloc to allocate it.

Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com
Fixes: aff1871bfc81 ("objpool: fix choosing allocation for percpu slots")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/
Cc: Leo Yan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Wu <[email protected]>
Cc: Mikel Rychliski <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Viktor Malik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>