Git Repo - linux.git/log

mm: memory-failure: make put_ref_page() more useful

Pass pfn/flags to put_ref_page(), then check MF_COUNT_INCREASED and drop
refcount to make the code look cleaner.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

compiler-gcc: document minimum version for `__no_sanitize_coverage__`

The attribute was added in GCC 12.1.

This will simplify future cleanups, and is closer to what we do
in `compiler_attributes.h`.

Link: https://godbolt.org/z/MGbT76j6G
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

compiler-gcc: remove attribute support check for `__no_sanitize_undefined__`

The attribute was added in GCC 4.9, while the minimum GCC version
supported by the kernel is GCC 5.1.

Therefore, remove the check.

Link: https://godbolt.org/z/GrMeo6fYr
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

compiler-gcc: remove attribute support check for `__no_sanitize_thread__`

The attribute was added in GCC 5.1, which matches the minimum GCC version
supported by the kernel.

Therefore, remove the check.

Link: https://godbolt.org/z/vbxKejxbx
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

compiler-gcc: remove attribute support check for `__no_sanitize_address__`

The attribute was added in GCC 4.8, while the minimum GCC version
supported by the kernel is GCC 5.1.

Therefore, remove the check.

Link: https://godbolt.org/z/84v56vcn8
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

compiler-gcc: be consistent with underscores use for `no_sanitize`

Patch series "compiler-gcc: be consistent with underscores use for
`no_sanitize`".

This patch (of 5):

Other macros that define shorthands for attributes in e.g.
`compiler_attributes.h` and elsewhere use underscores.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kumar Kartikeya Dwivedi <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Uros Bizjak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: unify clearing of RestoreReserve for private pages

A trivial cleanup to move clearing of RestoreReserve into adding anon rmap
of private hugetlb mappings. It matches with the shared mappings where we
only clear the bit when adding into page cache, rather than spreading it
around the code paths.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

amdgpu: use VM_ACCESS_FLAGS

Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: debug_vm_pgtable: use VM_ACCESS_FLAGS

Directly use VM_ACCESS_FLAGS instead VMFLAGS.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: mprotect: use VM_ACCESS_FLAGS

Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

x86/sgx: use VM_ACCESS_FLAGS

Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nios2: remove unused INIT_MMAP

Patch series "mm: cleanup with VM_ACCESS_FLAGS".

This patch (of 5):

It seems that INIT_MMAP is gone in 2.4.10, not sure, anyways, it is
useless now, kill it.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: remove FGP_HEAD

This is no longer used; all callers have been converted to use folios
instead. Somehow this manages to save 11 bytes of text.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: convert find_get_incore_page() to filemap_get_incore_folio()

Return the containing folio instead of the precise page. One of the
callers wants the folio and the other can do the folio->page conversion
itself. Nets 442 bytes of text size reduction, 478 bytes removed and 36
bytes added.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/swap: convert find_get_incore_page to use folios

Eliminates a use of FGP_HEAD and saves 35 bytes of text.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/huge_memory: convert split_huge_pages_in_file() to use a folio

Patch series "Remove FGP_HEAD flag".

We have just two users left of the FGP_HEAD flag and both of them are
better off; sometimes startlingly so as a result of conversion to use
folios.

This patch (of 4):

Removes a number of calls to compound_head() and a call to
pagecache_get_page().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: remove kern_addr_valid() completely

Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address. So as there is no need for kern_addr_valid(), let's
remove it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Acked-by: Heiko Carstens <[email protected]> [s390]
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Helge Deller <[email protected]> [parisc]
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Guo Ren <[email protected]> [csky]
Acked-by: Catalin Marinas <[email protected]> [arm64]
Cc: Alexander Gordeev <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xuerui Wang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

vmalloc: add reviewers for vmalloc code

Add myself and Christoph Hellwig as reviewers for vmalloc.

[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: use trace_free_vmap_area_noflush event

It is for debug purposes and is called when a vmap area gets freed. This
event gives some indication about:

- a start address of released area;
- a current number of outstanding pages;
- a maximum number of allowed outstanding pages.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: use trace_purge_vmap_area_lazy event

This is for debug purposes and is called when all outstanding areas are
removed back to the vmap space. It gives some extra information about:

- a start:end range where set of vmap ares were freed;
- a number of purged areas which were backed off.

[[email protected]: simplify return boolean expression]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: use trace_alloc_vmap_area event

This is for debug purpose and is called when an allocation attempt occurs.
This event gives some information about:

- start address of allocated area;
- size that is requested;
- alignment that is required;
- vstart/vend restriction;
- if an allocation fails.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: add free_vmap_area_noflush trace event

This event is used in order to validate/debug a start address of freed VA,
number of currently outstanding and maximum allowed areas.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: add purge_vmap_area_lazy trace event

It is for debug purposes to track number of freed vmap areas including a
range it occurs on.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmalloc: add alloc_vmap_area trace event

Patch series "Add basic trace events for vmap/vmalloc (v2)", v2.

This small series add some basic trace events for the vmap/vmalloc code.
Since currently we lack any, sometimes it is hard to start debuging vmap
code if an issue is reported or occured.

For example https://lore.kernel.org/linux-mm/Y0p8BZIiDXLQbde%2F@pc636/T/

The final patch adds two reviewers for vmalloc code.

This patch (of 7):

It is for debug purposes and for validation of passed parameters.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

memory: move hotplug memory notifier priority to same file for easy sorting

The priority of hotplug memory callback is defined in a different file.
And there are some callers using numbers directly. Collect them together
into include/linux/memory.h for easy reading. This allows us to sort
their priorities more intuitively without additional comments.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

memory: remove unused register_hotmemory_notifier()

Remove unused register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ACPI: HMAT: use hotplug_memory_notifier() directly

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mm_init.c: use hotplug_memory_notifier() directly

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mmap: use hotplug_memory_notifier() directly

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/slub.c: use hotplug_memory_notifier() directly

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs/proc/kcore.c: use hotplug_memory_notifier() directly

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

cgroup/cpuset: use hotplug_memory_notifier() directly

Patch series "mm: Use hotplug_memory_notifier() instead of
register_hotmemory_notifier()", v4.

Commit f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") introduced register_hotmemory_notifier()
to avoid a compile problem with gcc-4.4.4:

    When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier
    handlers to be included in the .o files, for space reasons.

    The existing hotplug_memory_notifier() tries to handle this but testing
    with gcc-4.4.4 shows that it doesn't work - the hotplug functions are
    still present in the .o files.

Since commit 76ae847497bc52 ("Documentation: raise minimum supported
version of GCC to 5.1") has already updated the minimum gcc version to
5.1.  The previous problem mentioned in f02c69680088 does not exist.  So
we can now revert to use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

In the last patch, we move all hotplug memory notifier priority to same
file for easy sorting.

This patch (of 8):

Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1.  So the problem
mentioned in f02c69680088 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist.  So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: zefan li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: rmap: rename page_not_mapped() to folio_not_mapped()

Since commit 2f031c6f042c ("mm/rmap: Convert rmap_walk() to take a
folio"), page_not_mapped() takes folio as parameter, rename it to be
consistent.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: anon_cow: add R/O longterm tests via gup_test

Let's trigger a R/O longterm pin on three cases of R/O mapped anonymous
pages:
* exclusive (never shared)
* shared (child still alive)
* previously shared (child no longer alive)

... and make sure that the pin is reliable: whatever we write via the page
tables has to be observable via the pin.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/gup_test: start/stop/read functionality for PIN LONGTERM test

We want an easy way to take a R/O or R/W longterm pin on a range and be
able to observe the content of the pinned pages, so we can properly test
how longterm puns interact with our COW logic.

[[email protected]: silence a warning on 32-bit]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: ./mm/gup_test.c:281:2-3: Unneeded semicolon]
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2455
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: anon_cow: add liburing test cases

io_uring provides a simple mechanism to test long-term, R/W GUP pins
-- via fixed buffers -- and can be used to verify that GUP pins stay
in sync with the pages in the page table even if a page would
temporarily get mapped R/O or concurrent fork() could accidentially
end up sharing pinned pages with the child.

Note that this essentially re-introduces local_config support that was
removed recently in commit 6f83d6c74ea5 ("Kselftests: remove support of
libhugetlbfs from kselftests").

[[email protected]: s/size_t/ssize_t/ on `cur', `total'.]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: anon_cow: hugetlb tests

Let's run all existing test cases with all hugetlb sizes we're able to
detect.

Note that some tests cases still fail. This will, for example, be fixed
once vmsplice properly uses FOLL_PIN instead of FOLL_GET for pinning.
With 2 MiB and 1 GiB hugetlb on x86_64, the expected failures are:

# [RUN] vmsplice() + unmap in child ... with hugetlb (2048 kB)
not ok 23 No leak from parent into child
# [RUN] vmsplice() + unmap in child ... with hugetlb (1048576 kB)
not ok 24 No leak from parent into child
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 35 No leak from child into parent
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 36 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 47 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 48 No leak from child into parent

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: anon_cow: THP tests

Let's add various THP variants that we'll run with our existing test
cases.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: factor out pagemap_is_populated() into vm_util

We'll reuse it in the anon_cow test next.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: anon_cow: test COW handling of anonymous memory

Patch series "selftests/vm: test COW handling of anonymous memory".

This is my current set of tests for testing COW handling of anonymous
memory, especially when interacting with GUP.  I developed these tests
while working on PageAnonExclusive and managed to clean them up just now.

On current upstream Linux, all tests pass except the hugetlb tests that
rely on vmsplice -- these tests should pass as soon as vmsplice properly
uses FOLL_PIN instead of FOLL_GET.

I'm working on additional tests for COW handling in private mappings,
focusing on long-term R/O pinning e.g., of the shared zeropage, pagecache
pages and KSM pages.  These tests, however, will go into a different file.
So this is everything I have regarding tests for anonymous memory.

This patch (of 7):

Let's start adding tests for our COW handling of anonymous memory.  We'll
focus on basic tests that we can achieve without additional libraries or
gup_test extensions.

We'll add THP and hugetlb tests separately.

[[email protected]: s/size_t/ssize_t/ on `cur', `total', `transferred';]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph von Recklinghausen <[email protected]>
Cc: Don Dutile <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kasan: migrate workqueue_uaf test to kunit

Migrate the workqueue_uaf test to the KUnit framework.

Initially, this test was intended to check that Generic KASAN prints
auxiliary stack traces for workqueues. Nevertheless, the test is enabled
for all modes to make that KASAN reports bad accesses in the tested
scenario.

The presence of auxiliary stack traces for the Generic mode needs to be
inspected manually.

Link: https://lkml.kernel.org/r/1d81b6cc2a58985126283d1e0de8e663716dd930.1664298455.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kasan: migrate kasan_rcu_uaf test to kunit

Migrate the kasan_rcu_uaf test to the KUnit framework.

Changes to the implementation of the test:

- Call rcu_barrier() after call_rcu() to make that the RCU callbacks get
triggered before the test is over.

- Cast pointer passed to rcu_dereference_protected as __rcu to get rid of
the Sparse warning.

- Check that KASAN prints a report via KUNIT_EXPECT_KASAN_FAIL.

Initially, this test was intended to check that Generic KASAN prints
auxiliary stack traces for RCU objects. Nevertheless, the test is enabled
for all modes to make that KASAN reports bad accesses in RCU callbacks.

The presence of auxiliary stack traces for the Generic mode needs to be
inspected manually.

Link: https://lkml.kernel.org/r/897ee08d6cd0ba7e8a4fbfd9d8502823a2f922e6.1664298455.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kasan: switch kunit tests to console tracepoints

Switch KUnit-compatible KASAN tests from using per-task KUnit resources to
console tracepoints.

This allows for two things:

1. Migrating tests that trigger a KASAN report in the context of a task
   other than current to KUnit framework.
   This is implemented in the patches that follow.

2. Parsing and matching the contents of KASAN reports.
   This is not yet implemented.

Link: https://lkml.kernel.org/r/9345acdd11e953b207b0ed4724ff780e63afeb36.1664298455.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

tmpfs: ensure O_LARGEFILE with generic_file_open()

Without this check open() will open large files on tmpfs although
O_LARGEFILE was not specified. This is inconsistent with other
filesystems. Also it will later result in EOVERFLOW on stat() or EFBIG on
write().

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Weißschuh <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: memcontrol: use mem_cgroup_is_root() helper

Replace the checks for memcg is root memcg, with mem_cgroup_is_root()
helper.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kamalesh Babulal <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kamalesh Babulal <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tom Hromatka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mincore.c: use vma_lookup() instead of find_vma()

Using vma_lookup() verifies the start address is contained in the found
vma. This results in easier to read the code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Deming Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/shmem: remove unneeded assignments in shmem_get_folio_gfp()

After the rework of shmem_get_folio_gfp() to use a folio, the local
variable hindex is only needed to be set once before passing it to
shmem_add_to_page_cache().

Remove the unneeded initialization and assignments of the variable hindex
before the actual effective assignment and first use.

No functional change. No change in object code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lukas Bulwahn <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix typo in struct vm_operations_struct comments

There is no eprotect(), so I assume this is about mprotect().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rolf Eike Beer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

zram: use try_cmpxchg in update_used_max

Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
update_used_max. x86 CMPXCHG instruction returns success in ZF flag, so
this change saves a compare after cmpxchg (and related move instruction in
front of cmpxchg).

Also, reorder code a bit to remove additional compare and conditional jump
from the assembly code. Together, hese two changes save 15 bytes from the
function when compiled for x86_64.

No functional change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uros Bizjak <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

filemap: find_get_entries() now updates start offset

Initially, find_get_entries() was being passed in the start offset as a
value.  That left the calculation of the offset to the callers.  This led
to complexity in the callers trying to keep track of the index.

Now find_get_entries() takes in a pointer to the start offset and updates
the value to be directly after the last entry found.  If no entry is
found, the offset is not changed.  This gets rid of multiple hacky
calculations that kept track of the start offset.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

filemap: find_lock_entries() now updates start offset

Patch series "Rework find_get_entries() and find_lock_entries()", v3.

Originally the callers of find_get_entries() and find_lock_entries() were
keeping track of the start index themselves as they traverse the search
range.

This resulted in hacky code such as in shmem_undo_range():

index = folio->index + folio_nr_pages(folio) - 1;

where the - 1 is only present to stay in the right spot after incrementing
index later.  This sort of calculation was also being done on every folio
despite not even using index later within that function.

These patches change find_get_entries() and find_lock_entries() to
calculate the new index instead of leaving it to the callers so we can
avoid all these complications.

This patch (of 2):

Initially, find_lock_entries() was being passed in the start offset as a
value.  That left the calculation of the offset to the callers.  This led
to complexity in the callers trying to keep track of the index.

Now find_lock_entries() takes in a pointer to the start offset and updates
the value to be directly after the last entry found.  If no entry is
found, the offset is not changed.  This gets rid of multiple hacky
calculations that kept track of the start offset.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/rmap: fix comment in anon_vma_clone()

Commit 2555283eb40d ("mm/rmap: Fix anon_vma->degree ambiguity leading to
double-reuse") use num_children and num_active_vmas to replace the origin
degree to fix anon_vma UAF problem. Update the comment in anon_vma_clone
to fit this change.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ma Wupeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: add folio_hstate()

Helper function to retrieve hstate information from a hugetlb folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

hugetlbfs: convert hugetlb_delete_from_page_cache() to use folios

Remove the last caller of delete_from_page_cache() by converting the code
to its folio equivalent.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: add hugetlb_folio_subpool() helpers

Allow hugetlbfs_migrate_folio to check and read subpool information by
passing in a folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: add private field of first tail to struct page and struct folio

Allow struct folio to store hugetlb metadata that is contained in the
private field of the first tail page. On 32-bit, _private_1 aligns with
page[1].private.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: add folio support to hugetlb specific flag macros

Patch series "begin converting hugetlb code to folios", v4.

This patch series starts the conversion of the hugetlb code to operate on
struct folios rather than struct pages.  This removes the ambiguitiy of
whether functions are operating on head pages, tail pages of compound
pages, or base pages.

This series passes the linux test project hugetlb test cases.

Patch 1 adds hugeltb specific page macros that can operate on folios.

Patch 2 adds the private field of the first tail page to struct page.  For
32-bit, _private_1 alinging with page[1].private was confirmed by using
pahole.

Patch 3 introduces hugetlb subpool helper functions which operate on
struct folios. These patches were tested using the hugepage-mmap.c
selftest along with the migratepages command.

Patch 4 converts hugetlb_delete_from_page_cache() to use folios.

Patch 5 adds a folio_hstate() function to get hstate information from a
folio and adds a user of folio_hstate().

Bpftrace was used to track time spent in the free_huge_pages function
during the ltp test cases as it is a caller of the hugetlb subpool
functions. From the histogram, the performance is similar before and
after the patch series.

Time spent in 'free_huge_page'

6.0.0-rc2.master.20220823
@nsecs:
[256, 512)         14770 |@@@@@@@@@@@@@@@@@@@@@@@@@@@
|@@@@@@@@@@@@@@@@@@@@@@@@@       |
[512, 1K)            155 |                                                    |
[1K, 2K)             169 |                                                    |
[2K, 4K)              50 |                                                    |
[4K, 8K)              14 |                                                    |
[8K, 16K)              3 |                                                    |
[16K, 32K)             3 |                                                    |

6.0.0-rc2.master.20220823 + patch series
@nsecs:
[256, 512)         13678 |@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
|@@@@@@@@@@@@@@@@@@@@@@@@@       |
[512, 1K)            142 |                                                    |
[1K, 2K)             199 |                                                    |
[2K, 4K)              44 |                                                    |
[4K, 8K)              13 |                                                    |
[8K, 16K)              4 |                                                    |
[16K, 32K)             1 |                                                    |

This patch (of 5):

Allow the macros which test, set, and clear hugetlb specific page flags to
take a hugetlb folio as an input.  The macrros are generated as
folio_{test, set, clear}_hugetlb_{restore_reserve, migratable, temporary,
freed, vmemmap_optimized, raw_hwp_unreliable}.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: David Howells <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: drop mnt point for hugetlb in run_vmtests.sh

After converting all the three relevant testcases (uffd, madvise, mremap)
to use memfd, no test will need the hugetlb mount point anymore. Drop the
code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: use memfd for hugepage-mremap test

For dropping the hugetlb mountpoint in run_vmtests.sh. Cleaned it up a
little bit around the changed codes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: use memfd for hugetlb-madvise test

For dropping the hugetlb mountpoint in run_vmtests.sh. Since no parameter
is needed, drop USAGE too.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/vm: use memfd for uffd hugetlb tests

Patch series "selftests/vm: Drop hugetlb mntpoint in run_vmtests.sh", v2.

Clean the code up so we can use the same memfd for both hugetlb and shmem
which is cleaner.

This patch (of 4):

We already used memfd for shmem test, move it forward with hugetlb too so
that we don't need user to specify the hugetlb file path explicitly when
running hugetlb shared tests.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: vmscan: make rotations a secondary factor in balancing anon vs file

We noticed a 2% webserver throughput regression after upgrading from 5.6.
This could be tracked down to a shift in the anon/file reclaim balance
(confirmed with swappiness) that resulted in worse reclaim efficiency and
thus more kswapd activity for the same outcome.

The change that exposed the problem is aae466b0052e ("mm/swap: implement
workingset detection for anonymous LRU").  By qualifying swapins based on
their refault distance, it lowered the cost of anon reclaim in this
workload, in turn causing (much) more anon scanning than before.  Scanning
the anon list is more expensive due to the higher ratio of mmapped pages
that may rotate during reclaim, and so the result was an increase in %sys
time.

Right now, rotations aren't considered a cost when balancing scan pressure
between LRUs.  We can end up with very few file refaults putting all the
scan pressure on hot anon pages that are rotated en masse, don't get
reclaimed, and never push back on the file LRU again.  We still only
reclaim file cache in that case, but we burn a lot CPU rotating anon
pages.  It's "fair" from an LRU age POV, but doesn't reflect the real cost
it imposes on the system.

Consider rotations as a secondary factor in balancing the LRUs.  This
doesn't attempt to make a precise comparison between IO cost and CPU cost,
it just says: if reloads are about comparable between the lists, or
rotations are overwhelmingly different, adjust for CPU work.

This fixed the regression on our webservers.  It has since been deployed
to the entire Meta fleet and hasn't caused any problems.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

hugetlb: simplify hugetlb handling in follow_page_mask

During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:

        if (p?d_huge())
                page = follow_huge_p?d();

the second check is of the form:

        if (is_hugepd())
                page = follow_huge_pd().

We can replace these checks, as well as the special handling routines such
as follow_huge_p?d() and follow_huge_pd() with a single routine to handle
hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] https://lore.kernel.org/linux-mm/cover.1661240170 [email protected]/

[[email protected]: remove vma (pmd sharing) per Peter]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: remove left over hugetlb_vma_unlock_read()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

docs: kmsan: fix formatting of "Example report"

Add a blank line to make the sentence before the list render as a separate
paragraph, not a definition.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 93858ae70cf4 ("kmsan: add ReST documentation")
Signed-off-by: Alexander Potapenko <[email protected]>
Suggested-by: Bagas Sanjaya <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/dbgfs: check if rm_contexts input is for a real context

A user could write a name of a file under 'damon/' debugfs directory,
which is not a user-created context, to 'rm_contexts' file.  In the case,
'dbgfs_rm_context()' just assumes it's the valid DAMON context directory
only if a file of the name exist.  As a result, invalid memory access
could happen as below.  Fix the bug by checking if the given input is for
a directory.  This check can filter out non-context inputs because
directories under 'damon/' debugfs directory can be created via only
'mk_contexts' file.

This bug has found by syzbot[1].

[1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]> [5.15.x]
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: don't set a new maximum on the node when not reusing nodes

In RCU mode, the node limits were being updated to the last pivot which
may not be correct and would cause the metadata to be set when it
shouldn't. Fix this by not setting a new limit in this case.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: fix depth tracking in maple_state

It is possible to confuse the depth tracking in the maple state by
searching the same node for values. Fix the depth tracking by moving
where the depth is incremented closer to where the node changes level.
Also change the initial depth setting when using the root node.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging

The following bug is reported to be triggered when starting X on x86-32
system with i915:

  [  225.777375] kernel BUG at mm/memory.c:2664!
  [  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
  [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
  [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
  [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
  [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
  [  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
  [  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
  [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
  [  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
  [  225.777479] Call Trace:
  [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777684]  apply_to_page_range+0x21/0x27
  [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
  [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.778220]  ? mutex_unlock+0xb/0xd
  [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
  [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
  [  225.778644]  ? lock_is_held_type+0x56/0xe7
  [  225.778655]  ? lock_is_held_type+0x7a/0xe7
  [  225.778663]  ? 0xc1000000
  [  225.778670]  __do_fault+0x21/0x6a
  [  225.778679]  handle_mm_fault+0x708/0xb21
  [  225.778686]  ? mt_find+0x21e/0x5ae
  [  225.778696]  exc_page_fault+0x185/0x705
  [  225.778704]  ? doublefault_shim+0x127/0x127
  [  225.778715]  handle_exception+0x130/0x130
  [  225.778723] EIP: 0xb700468a

Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0
("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present
pud entry") to handle some special states of gigantic page.  However, it's
overlooked that pud_none() always returns false when running with 2-level
paging, and as a result pud_huge() can return true pointlessly.

Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Ville Syrjälä <[email protected]>
Tested-by: Ville Syrjälä <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs: fix leaked psi pressure state

When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
faulty.  A pressure state, once entered, is never left.  This results in
incorrectly elevated pressure, which triggers OOM kills.

pflags record the *previous* memstall state when we enter a new one.  The
code tried to initialize pflags to 1, and then optimize the leave call
when we either didn't enter a memstall, or were already inside a nested
stall.  However, there can be multiple PageWorkingset pages in the bio, at
which point it's that path itself that enters repeatedly and overwrites
pflags.  This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

erofs has the same problem, fix that up too.  And move the memstall exit
past submit_bio() to restore submit accounting originally added by
b8e24a9300b0 ("block: annotate refault stalls from IO submission").

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Thorsten Leemhuis <[email protected]>
Tested-by: Thorsten Leemhuis <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix use-after-free bug of ns_writer on remount

If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.

In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:

Task1                               Task2
--------------------------------    ------------------------------
nilfs_construct_segment
   nilfs_segctor_sync
     init_wait
     init_waitqueue_entry
     add_wait_queue
     schedule
                                     nilfs_remount (R/W remount case)
       nilfs_attach_log_writer
                                         nilfs_detach_log_writer
                                           nilfs_segctor_destroy
                                             kfree
     finish_wait
       _raw_spin_lock_irqsave
         __raw_spin_lock_irqsave
           do_raw_spin_lock
             debug_spin_lock_before  <-- use-after-free

While Task1 is sleeping, nilfs->ns_writer is freed by Task2.  After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed.  This
scenario diagram is based on the Shigeru Yoshida's post [1].

This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen.  Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.

Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Reported-by: Shigeru Yoshida <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

x86/traps: avoid KMSAN bugs originating from handle_bug()

There is a case in exc_invalid_op handler that is executed outside the
irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to
encode a call to __warn().

In that case the `struct pt_regs` passed to the interrupt handler is never
unpoisoned by KMSAN (this is normally done in irqentry_enter()), which
leads to false positives inside handle_bug().

Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers
before using them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kmsan: make sure PREEMPT_RT is off

As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play
well with IRQ code when PREEMPT_RT is on, because in that mode even
GFP_ATOMIC allocations cannot be performed.

Fixing this would require making stackdepot completely lockless, which is
quite challenging and may be excessive for the time being.

Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug
configs are.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN

As pointed out by Masahiro Yamada, Kconfig picks up the first default
entry which has true 'if' condition. Hence, the previously added check
for KMSAN was never used, because it followed the checks for 64BIT and
!64BIT.

Put KMSAN check before others to ensure it is always applied.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://github.com/google/kmsan/issues/89
Link: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default")
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

x86/uaccess: instrument copy_from_user_nmi()

Make sure usercopy hooks from linux/instrumented.h are invoked for
copy_from_user_nmi(). This fixes KMSAN false positives reported when
dumping opcodes for a stack trace.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kmsan: core: kmsan_in_runtime() should return true in NMI context

Without that, every call to __msan_poison_alloca() in NMI may end up
allocating memory, which is NMI-unsafe.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: hugetlb_vmemmap: include missing linux/moduleparam.h

The kernel test robot reported build failures with a 'randconfig' on s390:
>> mm/hugetlb_vmemmap.c:421:11: error: a function declaration without a
prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);
^

Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/patch.git-296b83ca939b.your-ad-here.call-01667411912-ext-5073@work.hours
Fixes: 30152245c63b ("mm: hugetlb_vmemmap: replace early_param() with core_param()")
Signed-off-by: Vasily Gorbik <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/shmem: use page_mapping() to detect page cache for uffd continue

mfill_atomic_install_pte() checks page->mapping to detect whether one page
is used in the page cache.  However as pointed out by Matthew, the page
can logically be a tail page rather than always the head in the case of
uffd minor mode with UFFDIO_CONTINUE.  It means we could wrongly install
one pte with shmem thp tail page assuming it's an anonymous page.

It's not that clear even for anonymous page, since normally anonymous
pages also have page->mapping being setup with the anon vma.  It's safe
here only because the only such caller to mfill_atomic_install_pte() is
always passing in a newly allocated page (mcopy_atomic_pte()), whose
page->mapping is not yet setup.  However that's not extremely obvious
either.

For either of above, use page_mapping() instead.

Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n
Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Matthew Wilcox <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/memremap.c: map FS_DAX device memory as decrypted

virtio_pmem use devm_memremap_pages() to map the device memory. By
default this memory is mapped as encrypted with SEV. Guest reboot changes
the current encryption key and guest no longer properly decrypts the FSDAX
device meta data.

Mark the corresponding device memory region for FSDAX devices (mapped with
memremap_pages) as decrypted to retain the persistent memory property.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b7b3c01b19159 ("mm/memremap_pages: support multiple ranges per invocation")
Signed-off-by: Pankaj Gupta <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"

Anatoly Pugachev reported sparc64 breakage on the patch:

https://lore.kernel.org/r/20221021160603 [email protected]

The sparc64 impl of pte_mkdirty() is definitely slightly special in that
it leverages a code patching mechanism for sun4u/sun4v on relevant pgtable
entry operations.

Before having a clue of why the sparc64 is special and caused the patch to
SIGSEGV the processes, revert the patch for now. The swap path of dirty
bit inheritage is kept because that's using the swap shared code so we
assume it'll not be affected.

Link: https://lkml.kernel.org/r/Y1Wbi4yyVvDtg4zN@x1n
Fixes: 0ccf7f168e17 ("mm/thp: carry over dirty bit when thp splits on pmd")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Anatoly Pugachev <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix deadlock in nilfs_count_free_blocks()

A semaphore deadlock can occur if nilfs_get_block() detects metadata
corruption while locating data blocks and a superblock writeback occurs at
the same time:

task 1                               task 2
------                               ------
* A file operation *
nilfs_truncate()
  nilfs_get_block()
    down_read(rwsem A) <--
    nilfs_bmap_lookup_contig()
      ...                            generic_shutdown_super()
                                       nilfs_put_super()
                                         * Prepare to write superblock *
                                         down_write(rwsem B) <--
                                         nilfs_cleanup_super()
      * Detect b-tree corruption *         nilfs_set_log_cursor()
      nilfs_bmap_convert_error()             nilfs_count_free_blocks()
        __nilfs_error()                        down_read(rwsem A) <--
          nilfs_set_error()
            down_write(rwsem B) <--

                           *** DEADLOCK ***

Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
inside the lock section.

Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
read-only and nilfs_set_error() attempts to writelock rwsem B (=
nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
acquisition occurs in the order rwsem A -> rwsem B.

Now, if another task starts updating the superblock, it may writelock
rwsem B during the lock sequence above, and can deadlock trying to
readlock rwsem A in nilfs_count_free_blocks().

However, there is actually no need to take rwsem A in
nilfs_count_free_blocks() because it, within the lock section, only reads
a single integer data on a shared struct with
nilfs_sufile_get_ncleansegs().  This has been the case after commit
aa474a220180 ("nilfs2: add local variable to cache the number of clean
segments"), that is, even before this bug was introduced.

So, this resolves the deadlock problem by just not taking the semaphore in
nilfs_count_free_blocks().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]> [2.6.38+
Signed-off-by: Andrew Morton <[email protected]>

mm/mmap: fix memory leak in mmap_region()

There is a memory leak reported by kmemleak:

  unreferenced object 0xffff88817231ce40 (size 224):
    comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff  `..........B....
    backtrace:
      [<ffffffff81936171>] __alloc_file+0x21/0x250
      [<ffffffff81937051>] alloc_empty_file+0x41/0xf0
      [<ffffffff81937159>] alloc_file+0x59/0x710
      [<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
      [<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
      [<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
      [<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
      [<ffffffff817cd257>] do_mmap+0x727/0x1110
      [<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
      [<ffffffff83adf955>] do_syscall_64+0x35/0x80
      [<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause was traced to an error handing path in mmap_region() when
arch_validate_flags() or mas_preallocate() fails.  In the shared anonymous
mapping sence, vma will be setuped and mapped with a new shared anonymous
file via shmem_zero_setup().  So in this case, the file resource needs to
be released.

Fix it by calling fput(vma->vm_file) and unmap_region() when
arch_validate_flags() or mas_preallocate() returns an error in the shared
anonymous mapping sence.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Li Zetao <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

hugetlbfs: don't delete error page from pagecache

This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.

Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache.  That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned.  As [1] states,
this is effectively memory corruption.

The fix is to leave the page in the page cache.  If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses.  For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.

[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: James Houghton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Tested-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: reorganize testing to restore module testing

Along the development cycle, the testing code support for module/in-kernel
compiles was removed.  Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests.  Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection.  Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.

[[email protected]: fix module export]
[[email protected]: fix it some more]
[[email protected]: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: mas_anode_descend() clang-analyzer cleanup

clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon
inspection, there were a few clean ups that would make the code cleaner:

The count variable was set from the mt_slots array and then updated but
never used again. Just use the array reference directly.

Also stop updating the type since it isn't used after the update.

Stop setting the gaps pointer to NULL at the start since it is always
set before the loop begins.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Suggested-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: remove pointer to pointer use in mas_alloc_nodes()

There is a more direct and cleaner way of implementing the same functional
code. Remove the confusing and unnecessary use of pointers here.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Suggested-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Linux 6.1-rc4

Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.

  This is mainly fallout from the original implementation of dynamic CXL
  region creation (instantiate new physical memory pools) that arrived
  in v6.0-rc1.

  Given the theme of "failures in the presence of pass-through decoders"
  this also includes new regression test infrastructure for that case.

  Summary:

   - Fix region creation crash with pass-through decoders

   - Fix region creation crash when no decoder allocation fails

   - Fix region creation crash when scanning regions to enforce the
     increasing physical address order constraint that CXL mandates

   - Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
     1:1 memory-device-to-region associations.

   - Fix a memory leak for cxl_region objects when regions with active
     targets are deleted

   - Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
     emulated proximity domains.

   - Fix region creation failure for switch attached devices downstream
     of a single-port host-bridge

   - Fix false positive memory leak of cxl_region objects by recycling
     recently used region ids rather than freeing them

   - Add regression test infrastructure for a pass-through decoder
     configuration

   - Fix some mailbox payload handling corner cases"

* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/region: Recycle region ids
  cxl/region: Fix 'distance' calculation with passthrough ports
  tools/testing/cxl: Add a single-port host-bridge regression config
  tools/testing/cxl: Fix some error exits
  cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
  cxl/region: Fix cxl_region leak, cleanup targets at region delete
  cxl/region: Fix region HPA ordering validation
  cxl/pmem: Use size_add() against integer overflow
  cxl/region: Fix decoder allocation crash
  ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
  cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
  cxl/region: Fix null pointer dereference due to pass through decoder commit
  cxl/mbox: Add a check on input payload size

Merge tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Fix two regressions:

   - Commit 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into
     macro") resulted in regulator undercount when disabling regulators.
     Revert it.

   - The thermal subsystem rework caused the scmi driver to no longer
     register with the thermal subsystem because index values no longer
     match. To fix the problem, the scmi driver now directly registers
     with the thermal subsystem, no longer through the hwmon core"

* tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  Revert "hwmon: (pmbus) Add regulator supply into macro"
  hwmon: (scmi) Register explicitly with Thermal Framework

Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

- Add Cooper Lake's stepping to the PEBS guest/host events isolation
   fixed microcode revisions checking quirk

- Update Icelake and Sapphire Rapids events constraints

- Use the standard energy unit for Sapphire Rapids in RAPL

- Fix the hw_breakpoint test to fail more graciously on !SMP configs

* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
  perf/x86/intel: Fix pebs event constraints for SPR
  perf/x86/intel: Fix pebs event constraints for ICL
  perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
  perf/hw_breakpoint: test: Skip the test if dependencies unmet

Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Add new Intel CPU models

- Enforce that TDX guests are successfully loaded only on TDX hardware
   where virtualization exception (#VE) delivery on kernel memory is
   disabled because handling those in all possible cases is "essentially
   impossible"

- Add the proper include to the syscall wrappers so that BTF can see
   the real pt_regs definition and not only the forward declaration

* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add several Intel server CPU model numbers
  x86/tdx: Panic on bad configs that #VE on "private" memory access
  x86/tdx: Prepare for using "INFO" call for a second purpose
  x86/syscall: Include asm/ptrace.h in syscall_wrapper header

Merge tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Use POSIX-compatible grep options

- Document git-related tips for reproducible builds

- Fix a typo in the modpost rule

- Suppress SIGPIPE error message from gcc-ar and llvm-ar

- Fix segmentation fault in the menuconfig search

* tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: fix segmentation fault in menuconfig search
  kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
  kbuild: fix typo in modpost
  Documentation: kbuild: Add description of git for reproducible builds
  kbuild: use POSIX-compatible grep option

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Fix the pKVM stage-1 walker erronously using the stage-2 accessor

   - Correctly convert vcpu->kvm to a hyp pointer when generating an
     exception in a nVHE+MTE configuration

   - Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them

   - Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE

   - Document the boot requirements for FGT when entering the kernel at
     EL1

  x86:

   - Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()

   - Make argument order consistent for kvcalloc()

   - Userspace API fixes for DEBUGCTL and LBRs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix a typo about the usage of kvcalloc()
  KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
  KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
  KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
  KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
  arm64: booting: Document our requirements for fine grained traps with SME
  KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
  KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
  KVM: arm64: Fix bad dereference on MTE-enabled systems
  KVM: arm64: Use correct accessor to parse stage-1 PTEs

Merge tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"One fix for silencing a smatch warning, and a small cleanup patch"

* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: simplify sysenter and syscall setup
x86/xen: silence smatch warning in pmu_msr_chk_emulated()

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Fix a number of bugs, including some regressions, the most serious of
  which was one which would cause online resizes to fail with file
  systems with metadata checksums enabled.

  Also fix a warning caused by the newly added fortify string checker,
  plus some bugs that were found using fuzzed file systems"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
  ext4: fix wrong return err in ext4_load_and_init_journal()
  ext4: fix warning in 'ext4_da_release_space'
  ext4: fix BUG_ON() when directory entry has invalid rec_len
  ext4: update the backup superblock's at the end of the online resize

Merge tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"One symlink handling fix and two fixes foir multichannel issues with
  iterating channels, including for oplock breaks when leases are
  disabled"

* tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix use-after-free on the link name
  cifs: avoid unnecessary iteration of tcp sessions
  cifs: always iterate smb sessions using primary channel

Merge tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull `lTracing fixes for 6.1-rc3:

- Fixed NULL pointer dereference in the ring buffer wait-waiters code
   for machines that have less CPUs than what nr_cpu_ids returns.

   The buffer array is of size nr_cpu_ids, but only the online CPUs get
   initialized.

- Fixed use after free call in ftrace_shutdown.

- Fix accounting of if a kprobe is enabled

- Fix NULL pointer dereference on error path of fprobe rethook_alloc().

- Fix unregistering of fprobe_kprobe_handler

- Fix memory leak in kprobe test module

* tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
  tracing/fprobe: Fix to check whether fprobe is registered correctly
  fprobe: Check rethook_alloc() return in rethook initialization
  kprobe: reverse kp->flags when arm_kprobe failed
  ftrace: Fix use-after-free for dynamic ftrace_ops
  ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()

Merge tag 'kvmarm-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

* Fix the pKVM stage-1 walker erronously using the stage-2 accessor

* Correctly convert vcpu->kvm to a hyp pointer when generating
an exception in a nVHE+MTE configuration

* Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them

* Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE

* Document the boot requirements for FGT when entering the kernel
at EL1

Merge branch 'kvm-master' into HEAD

x86:
* Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()

* Make argument order consistent for kvcalloc()

* Userspace API fixes for DEBUGCTL and LBRs

ext4: fix fortify warning in fs/ext4/fast_commit.c:1551

With the new fortify string system, rework the memcpy to avoid this
warning:

memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)

Cc: [email protected]
Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()")
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix wrong return err in ext4_load_and_init_journal()

The return value is wrong in ext4_load_and_init_journal(). The local
variable 'err' need to be initialized before goto out. The original code
in __ext4_fill_super() is fine because it has two return values 'ret'
and 'err' and 'ret' is initialized as -EINVAL. After we factor out
ext4_load_and_init_journal(), this code is broken. So fix it by directly
returning -EINVAL in the error handler path.

Cc: [email protected]
Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()")
Signed-off-by: Jason Yan <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>