Git Repo - linux.git/log
2 years ago  parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:16 +0000 (18:10 +0100)]
parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused
_PAGE_ACCESSED location in the swap PTE.  Looking at pte_present() and
pte_none() checks, there seems to be no actual reason why we cannot use
it: we only have to make sure we're not using _PAGE_PRESENT.

Reusing this bit avoids having to steal one bit from the swap offset.
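
On the arch side this boils down to three small helpers that set, test and
clear the spare software bit in swap PTEs.  A minimal sketch of the pattern
(treat as illustrative; the real helpers live in the parisc pgtable.h):

    #define _PAGE_SWP_EXCLUSIVE	_PAGE_ACCESSED

    static inline int pte_swp_exclusive(pte_t pte)
    {
            return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
    }

    static inline pte_t pte_swp_mkexclusive(pte_t pte)
    {
            pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
            return pte;
    }

    static inline pte_t pte_swp_clear_exclusive(pte_t pte)
    {
            pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
            return pte;
    }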

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:15 +0000 (18:10 +0100)]
openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, mask the type in __swp_entry().
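
As a rough sketch of what the masking looks like (bit positions here are
illustrative, not openrisc's actual layout): with 5 type bits, __swp_entry()
masks the type so a stray sixth bit can never clobber the stolen exclusive
bit:

    /* illustrative layout: exclusive bit 0, type in bits 1..5, offset above */
    #define __swp_type(x)		(((x).val >> 1) & 0x1f)
    #define __swp_offset(x)		((x).val >> 6)
    #define __swp_entry(type, offset) \
            ((swp_entry_t) { (((type) & 0x1f) << 1) | \
                             ((unsigned long)(offset) << 6) })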

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Stafford Horne <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:14 +0000 (18:10 +0100)]
nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused bit
31.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  nios2/mm: refactor swap PTE layout
David Hildenbrand [Fri, 13 Jan 2023 17:10:13 +0000 (18:10 +0100)]
nios2/mm: refactor swap PTE layout

nios2 disables swap for a good reason: it doesn't even provide sufficient
type bits as required by core MM.  However, swap entries are nowadays also
used for other purposes (migration entries, PTE markers, HWPoison, ...),
and accidental use could be problematic.

Let's properly use 5 bits for the swap type and document the layout.  Bits
26--31 should get ignored by hardware completely, so they can be used.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:12 +0000 (18:10 +0100)]
mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE.

On 64bit, steal one bit from the type.  Generic MM currently only uses 5
bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively
unused.

On 32bit we're able to locate unused bits.  As the PTE layout for 32 bit
is very confusing, document it a bit better.

While at it, mask the type in __swp_entry()/mk_swap_pte().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:11 +0000 (18:10 +0100)]
microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

The shift by 2 when converting between PTE and arch-specific swap entry
makes the swap PTE layout a little bit harder to decipher.

While at it, drop the comment from paulus---copy-and-paste leftover from
powerpc where we actually have _PAGE_HASHPTE---and mask the type in
__swp_entry_to_pte() as well.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Michal Simek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:10 +0000 (18:10 +0100)]
m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, make sure for sun3 that the valid bit never gets set (by
properly masking it off), and mask the type in __swp_entry().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  m68k/mm: remove dummy __swp definitions for nommu
David Hildenbrand [Fri, 13 Jan 2023 17:10:09 +0000 (18:10 +0100)]
m68k/mm: remove dummy __swp definitions for nommu

The definitions are not required, let's remove them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:08 +0000 (18:10 +0100)]
loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, also mask the type in mk_swap_pte().

Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:07 +0000 (18:10 +0100)]
ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, also mask the type in __swp_entry().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:06 +0000 (18:10 +0100)]
hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset.  This reduces the maximum swap space per file to 16 GiB (was 32
GiB).

While at it, mask the type in __swp_entry().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Brian Cain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:05 +0000 (18:10 +0100)]
csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset.  This reduces the maximum swap space per file to 16 GiB (was 32
GiB).

We might actually be able to reuse one of the other software bits
(_PAGE_READ / _PAGE_WRITE) instead, because we only have to keep
pte_present(), pte_none() and HW happy.  For now, let's keep it simple
because there might be something non-obvious.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Guo Ren <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:04 +0000 (18:10 +0100)]
arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset.  This reduces the maximum swap space per file to 64 GiB (was 128
GiB).

While at it, drop the PTE_TYPE_FAULT from __swp_entry_to_pte(), which is
defined to be 0 and is rather confusing because we should be dealing with
"Linux PTEs" not "hardware PTEs".  Also, properly mask the type in
__swp_entry().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:03 +0000 (18:10 +0100)]
arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 5, which is yet
unused.  The only important part seems to be not to use _PAGE_PRESENT
(bit 9).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
David Hildenbrand [Fri, 13 Jan 2023 17:10:02 +0000 (18:10 +0100)]
alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type.  Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.

While at it, mask the type in mk_swap_pte() as well.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
David Hildenbrand [Fri, 13 Jan 2023 17:10:01 +0000 (18:10 +0100)]
mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks

Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
architectures with swap PTEs".

This is the follow-up on [1]:
[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
anonymous pages

After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.

This makes sure that exclusive anonymous pages will stay exclusive, even
after they were swapped out -- for example, making GUP R/W FOLL_GET of
anonymous pages reliable.  Details can be found in [1].

This primarily fixes remaining known O_DIRECT memory corruptions that can
happen on concurrent swapout, whereby we can lose DMA reads to a page
(modifying the user page by writing to it).

To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
    triggering a race condition.
(2) My vmsplice() test case [3] that tries to detect if the exclusive
    marker was lost during swapout, not relying on a race condition.

For example, on 32bit x86 (with and without PAE), my test case fails
without these patches:
$ ./test_swp_exclusive
FAIL: page was replaced during COW
But succeeds with these patches:
$ ./test_swp_exclusive
PASS: page was not replaced during COW

Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
the ones where swap support might be in a questionable state?  This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
(as suggested by Peter).  The only missing piece for that is
supporting pmd_swp_exclusive() on relevant architectures with THP
migration support.

As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.

I tried cross-compiling all relevant setups and tested on x86 and sparc64
so far.

CCing arch maintainers only on this cover letter and on the respective
patch(es).

[1] https://lkml.kernel.org/r/20220329164329[email protected]
[2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c

This patch (of 26):

We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures.
Let's extend our sanity checks, especially testing that our PTE bit does
not affect:

* is_swap_pte() -> pte_present() and pte_none()
* the swap entry + type
* pte_swp_soft_dirty()

In particular, using pfn_pte() is dodgy when the swap PTE layout differs
heavily from ordinary PTEs.  Let's properly construct a swap PTE from swap
type+offset.
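
A rough sketch of the added check (helper names are from the generic swap
PTE API; the real test lives in mm/debug_vm_pgtable.c and the function name
here is illustrative):

    static void pte_swap_exclusive_tests(void)
    {
            swp_entry_t entry = swp_entry(1, 100);  /* arbitrary type + offset */
            pte_t pte = swp_entry_to_pte(entry);    /* a real swap PTE, not pfn_pte() */

            WARN_ON(pte_swp_exclusive(pte));
            pte = pte_swp_mkexclusive(pte);
            WARN_ON(!pte_swp_exclusive(pte));
            WARN_ON(!is_swap_pte(pte));             /* pte_present()/pte_none() unaffected */
            WARN_ON(pte_to_swp_entry(pte).val != entry.val); /* type + offset preserved */
            pte = pte_swp_clear_exclusive(pte);
            WARN_ON(pte_swp_exclusive(pte));
    }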

[[email protected]: fix build]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: H. Peter Anvin (Intel) <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Xuerui Wang <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/khugepaged: convert release_pte_pages() to use folios
Vishal Moola (Oracle) [Sat, 14 Jan 2023 00:15:56 +0000 (16:15 -0800)]
mm/khugepaged: convert release_pte_pages() to use folios

Converts release_pte_pages() to use folios instead of pages.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/khugepaged: introduce release_pte_folio() to replace release_pte_page()
Vishal Moola (Oracle) [Sat, 14 Jan 2023 00:15:55 +0000 (16:15 -0800)]
mm/khugepaged: introduce release_pte_folio() to replace release_pte_page()

release_pte_page() is converted to be a wrapper for release_pte_folio() to
help facilitate the khugepaged conversion to folios.

This replaces 3 calls to compound_head() with 1, and saves 85 bytes of
kernel text.
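
A sketch of the resulting shape (the folio variant does the work, the page
variant converts exactly once; details are simplified):

    static void release_pte_folio(struct folio *folio)
    {
            node_stat_mod_folio(folio,
                            NR_ISOLATED_ANON + folio_is_file_lru(folio),
                            -folio_nr_pages(folio));
            folio_unlock(folio);
            folio_putback_lru(folio);
    }

    static void release_pte_page(struct page *page)
    {
            release_pte_folio(page_folio(page));
    }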

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  kmsan: silence -Wmissing-prototypes warnings
Alexander Potapenko [Thu, 12 Jan 2023 10:31:47 +0000 (11:31 +0100)]
kmsan: silence -Wmissing-prototypes warnings

When building the kernel with W=1, the compiler reports numerous warnings
about the missing prototypes for KMSAN instrumentation hooks.

Because these functions are not supposed to be called explicitly by the
kernel code (calls to them are emitted by the compiler), they do not have
to be declared in the headers.  Instead, we add forward declarations right
before the definitions to silence the warnings produced by
-Wmissing-prototypes.
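
The idiom is simply a prototype immediately preceding each definition; a
sketch with a hypothetical hook name for illustration:

    /* Not called from kernel code; calls are emitted by the compiler. */
    void __msan_example_hook(void *addr, uintptr_t size);
    void __msan_example_hook(void *addr, uintptr_t size)
    {
            /* instrumentation body; the prototype above silences
             * -Wmissing-prototypes without polluting any header */
    }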

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Suggested-by: Marco Elver <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/T/
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  Documentation/mm: update references to __m[un]lock_page() to *_folio()
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:32 +0000 (12:39 +0000)]
Documentation/mm: update references to __m[un]lock_page() to *_folio()

We now pass folios to these functions, so update the documentation
accordingly.

Additionally, correct the outdated reference to __pagevec_lru_add_fn() (the
referenced action now occurs directly in __munlock_folio()), replace the
reference to lru_cache_add_inactive_or_unevictable() with its modern folio
equivalent folio_add_lru_vma(), and refer to folio flags by flag name
rather than accessor.

Link: https://lkml.kernel.org/r/898c487169d98a7f09c1c1e57a7dfdc2b3f6bf0f.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: mlock: update the interface to use folios
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:31 +0000 (12:39 +0000)]
mm: mlock: update the interface to use folios

Update the mlock interface to accept folios rather than pages, bringing
the interface in line with the internal implementation.

munlock_vma_page() still requires a page_folio() conversion; however, this
is consistent with the existing mlock_vma_page() implementation and is a
product of rmap still dealing in pages rather than folios.
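
A sketch of what that boundary conversion looks like (signatures are
illustrative of the mlock internals, not authoritative):

    static inline void munlock_vma_page(struct page *page,
                    struct vm_area_struct *vma, bool compound)
    {
            /* rmap still hands us pages, so convert once at the boundary */
            munlock_vma_folio(page_folio(page), vma, compound);
    }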

Link: https://lkml.kernel.org/r/cba12777c5544305014bc0cbec56bb4cc71477d8.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  m68k/mm/motorola: specify pmd_page() type
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:30 +0000 (12:39 +0000)]
m68k/mm/motorola: specify pmd_page() type

Failing to specify a specific type here breaks anything that relies on the
type being explicitly known, such as page_folio().

Make explicit the type of null pointer returned here.
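
The change amounts to adding a cast so the macro has a known pointer type,
roughly:

    before:  #define pmd_page(pmd)  NULL
    after:   #define pmd_page(pmd)  ((struct page *)NULL)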

Link: https://lkml.kernel.org/r/ad6be2821bbd6af10966b3704568ff458b270d9c.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: mlock: use folios and a folio batch internally
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:29 +0000 (12:39 +0000)]
mm: mlock: use folios and a folio batch internally

This brings mlock in line with the folio batches declared in mm/swap.c and
makes the code more consistent across the two.

The existing mechanism for identifying which operation each folio in the
batch is undergoing is maintained, i.e.  using the lower 2 bits of the
struct folio address (previously struct page address).  This should
continue to function correctly as folios remain at least system
word-aligned.
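
A sketch of that tagging scheme (constant names as used by the mlock batch
code, values for illustration; untagged_folio() is a hypothetical helper
added here only to show the decode step):

    #define LRU_FOLIO	0x1
    #define NEW_FOLIO	0x2
    #define FOLIO_FLAGS	(LRU_FOLIO | NEW_FOLIO)

    /* tag the requested operation in the low bits of the word-aligned pointer */
    static inline struct folio *mlock_lru(struct folio *folio)
    {
            return (struct folio *)((unsigned long)folio + LRU_FOLIO);
    }

    /* hypothetical: recover the untagged folio when draining the batch */
    static inline struct folio *untagged_folio(struct folio *folio)
    {
            return (struct folio *)((unsigned long)folio & ~FOLIO_FLAGS);
    }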

All invocations of mlock() pass either a non-compound page or the head of
a THP-compound page, and no tail pages need updating, so this functionality
works with struct folios being used internally rather than struct pages.

In this patch the external interface is kept identical to before in order
to maintain separation between patches in the series, using a rather
awkward conversion from struct page to struct folio in relevant functions.

However, this maintenance of the existing interface is intended to be
temporary - the next patch in the series will update the interfaces to
accept folios directly.

Link: https://lkml.kernel.org/r/9f894d54d568773f4ed3cb0eef5f8932f62c95f4.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: pagevec: add folio_batch_reinit()
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:28 +0000 (12:39 +0000)]
mm: pagevec: add folio_batch_reinit()

Patch series "update mlock to use folios", v4.

This series updates mlock to use folios, converting the internal interface
to using folios exclusively and exposing the folio interface externally.

As a product of this we move to using a folio batch rather than a pagevec
for mlock folios, which brings it in line with the core folio batches
contained in mm/swap.c.

This patch (of 5):

This performs the same task as pagevec_reinit(), only modifying a folio
batch rather than a pagevec.
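
A minimal sketch of the helper, mirroring pagevec_reinit():

    static inline void folio_batch_reinit(struct folio_batch *fbatch)
    {
            fbatch->nr = 0;
    }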

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/9018cecacb39e34c883540f997f9be8281153613.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: madvise: use vm_normal_folio() in madvise_free_pte_range()
Kefeng Wang [Thu, 12 Jan 2023 12:40:28 +0000 (20:40 +0800)]
mm: madvise: use vm_normal_folio() in madvise_free_pte_range()

There is already a vm_normal_folio(); use it so that
madvise_free_pte_range() only uses a folio.
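
The change is essentially the following substitution (sketch, local
variable names illustrative):

    /* before */
    page = vm_normal_page(vma, addr, ptent);
    if (!page)
            continue;
    folio = page_folio(page);

    /* after */
    folio = vm_normal_folio(vma, addr, ptent);
    if (!folio)
            continue;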

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  shmem: convert shmem_write_end() to use a folio
Matthew Wilcox (Oracle) [Thu, 12 Jan 2023 13:10:31 +0000 (13:10 +0000)]
shmem: convert shmem_write_end() to use a folio

Use a folio internally in shmem_write_end(), which saves a number of calls
to compound_head(), lets us get rid of the custom code to zero out the rest
of a THP, and supports folios of arbitrary size.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert unpoison_memory() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:08 +0000 (14:46 -0600)]
mm/memory-failure: convert unpoison_memory() to folios

Use a folio inside unpoison_memory(), which replaces a compound_head() call
with a call to page_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert hugetlb_set_page_hwpoison() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:07 +0000 (14:46 -0600)]
mm/memory-failure: convert hugetlb_set_page_hwpoison() to folios

Change hugetlb_set_page_hwpoison() to folio_set_hugetlb_hwpoison() and use
a folio internally.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert __free_raw_hwp_pages() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:06 +0000 (14:46 -0600)]
mm/memory-failure: convert __free_raw_hwp_pages() to folios

Change __free_raw_hwp_pages() to __folio_free_raw_hwp() and modify its
callers to pass in a folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert raw_hwp_list_head() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:05 +0000 (14:46 -0600)]
mm/memory-failure: convert raw_hwp_list_head() to folios

Change raw_hwp_list_head() to take in a folio and modify its callers to
pass in a folio.  Also convert two users of hugetlb-specific page macros
to their folio equivalents.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert free_raw_hwp_pages() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:04 +0000 (14:46 -0600)]
mm/memory-failure: convert free_raw_hwp_pages() to folios

Change free_raw_hwp_pages() to folio_free_raw_hwp(); this converts two
users of hugetlb-specific page macros to their folio equivalents.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:03 +0000 (14:46 -0600)]
mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios

Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by
changing the function to take in a folio.  This converts one use of
ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert try_memory_failure_hugetlb() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:02 +0000 (14:46 -0600)]
mm/memory-failure: convert try_memory_failure_hugetlb() to folios

Use a struct folio rather than a head page in try_memory_failure_hugetlb().
This converts one user of SetHPageMigratable to the folio equivalent.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/memory-failure: convert __get_huge_page_for_hwpoison() to folios
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:01 +0000 (14:46 -0600)]
mm/memory-failure: convert __get_huge_page_for_hwpoison() to folios

Patch series "convert hugepage memory failure functions to folios".

This series contains a 1:1 straightforward page to folio conversion for
memory failure functions which deal with huge pages.  I renamed a few
functions to fit with how other folio operating functions are named.
These include:

hugetlb_clear_page_hwpoison -> folio_clear_hugetlb_hwpoison
free_raw_hwp_pages -> folio_free_raw_hwp
__free_raw_hwp_pages -> __folio_free_raw_hwp
hugetlb_set_page_hwpoison -> folio_set_hugetlb_hwpoison

The goal of this series was to reduce users of the hugetlb-specific page
flag macros which take in a page, so that the compiler can ensure callers
are operating on a head page.

This patch (of 8):

Use a folio throughout the function rather than using a head page.  This
also reduces the users of the page version of hugetlb specific page flags.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  Revert "x86: kmsan: sync metadata pages on page fault"
Alexander Potapenko [Wed, 11 Jan 2023 10:18:06 +0000 (11:18 +0100)]
Revert "x86: kmsan: sync metadata pages on page fault"

This reverts commit 3f1e2c7a9099c1ed32c67f12cdf432ba782cf51f.

As noticed by Qun-Wei Lin, arch_sync_kernel_mappings() in
arch/x86/mm/fault.c is only used with CONFIG_X86_32, whereas KMSAN is only
supported on x86_64, where this code is not compiled.

The patch in question dates back to the downstream KMSAN branch based on
v5.8-rc5; it sneaked into upstream unnoticed in v6.1.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Reported-by: Qun-Wei Lin <[email protected]>
Link: https://github.com/google/kmsan/issues/91
Cc: Andy Lutomirski <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/mmap: fix comment of unmapped_area{_topdown}
Vernon Yang [Wed, 11 Jan 2023 13:20:36 +0000 (21:20 +0800)]
mm/mmap: fix comment of unmapped_area{_topdown}

The low_limit of the unmapped area information is inclusive, and the
high_limit is not, so make the symbol [ instead of (.

Also replace hight_limit with high_limit.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}")
Signed-off-by: Vernon Yang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  maple_tree: fix comment of mte_destroy_walk
Vernon Yang [Wed, 11 Jan 2023 13:53:48 +0000 (21:53 +0800)]
maple_tree: fix comment of mte_destroy_walk

The maple tree parameter is named mt, so make the comment say mt instead
of mn, and make the separator between the parameter name and the
description : instead of -.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Vernon Yang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: remove the hugetlb field from struct page
Sidhartha Kumar [Wed, 11 Jan 2023 14:29:14 +0000 (14:29 +0000)]
mm: remove the hugetlb field from struct page

Patch series "Get rid of tail page fields".

Continue the shrinkage of the struct page definition by getting rid of the
'first tail page' and 'second tail page' fields.  I originally did this
patch set before Hugh's rewrite of the subpages_mapcount, so it needed
substantial updates; hope I didn't miss anything.

This patch (of 28):

commit dad6a5eb5556 ("mm,hugetlb: use folio fields in second tail page")
added a transitional hugetlb field to struct page and struct folio to make
room for another int in the first tail of a compound page.  Hugetlb folio
conversions have changed all page users of this field to use the fields
within the folio so struct page no longer needs this hugetlb specific
field.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert deferred_split_huge_page() to deferred_split_folio()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:13 +0000 (14:29 +0000)]
mm: convert deferred_split_huge_page() to deferred_split_folio()

Now that both callers use a folio, pass the folio in and save a call to
compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/huge_memory: convert get_deferred_split_queue() to take a folio
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:12 +0000 (14:29 +0000)]
mm/huge_memory: convert get_deferred_split_queue() to take a folio

Removes a few calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/huge_memory: remove page_deferred_list()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:11 +0000 (14:29 +0000)]
mm/huge_memory: remove page_deferred_list()

Use folio->_deferred_list directly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: move page->deferred_list to folio->_deferred_list
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:10 +0000 (14:29 +0000)]
mm: move page->deferred_list to folio->_deferred_list

Remove the entire block of definitions for the second tail page, and add
the deferred list to the struct folio.  This actually moves _deferred_list
to a different offset in struct folio because I don't see a need to
include the padding.

This lets us use list_for_each_entry_safe() in deferred_split_scan()
and avoid a number of calls to compound_head().
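
With the list head in struct folio, the scan loop can walk folios directly;
a simplified sketch of the loop (ds_queue and list come from the
surrounding deferred_split_scan() context):

    struct folio *folio, *next;

    list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
                             _deferred_list) {
            if (folio_try_get(folio))
                    list_move(&folio->_deferred_list, &list);
            else
                    /* lost the race against freeing, just drop it */
                    list_del_init(&folio->_deferred_list);
    }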

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  doc: correct struct folio kernel-doc
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:09 +0000 (14:29 +0000)]
doc: correct struct folio kernel-doc

Insert appropriate public: and private: markers to make the generated
kernel-doc look right.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: remove 'First tail page' members from struct page
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:08 +0000 (14:29 +0000)]
mm: remove 'First tail page' members from struct page

All former users now use the folio equivalents, so remove them from the
definition of struct page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  hugetlb: remove uses of compound_dtor and compound_nr
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:07 +0000 (14:29 +0000)]
hugetlb: remove uses of compound_dtor and compound_nr

Convert the entire file to use the folio equivalents.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert destroy_large_folio() to use folio_dtor
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:06 +0000 (14:29 +0000)]
mm: convert destroy_large_folio() to use folio_dtor

Replace a use of compound_dtor.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert is_transparent_hugepage() to use a folio
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:05 +0000 (14:29 +0000)]
mm: convert is_transparent_hugepage() to use a folio

Replace a use of page->compound_dtor with its folio equivalent.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert set_compound_page_dtor() and set_compound_order() to folios
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:04 +0000 (14:29 +0000)]
mm: convert set_compound_page_dtor() and set_compound_order() to folios

Replace uses of compound_dtor, compound_order and compound_nr by their
folio equivalents.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: reimplement compound_nr()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:03 +0000 (14:29 +0000)]
mm: reimplement compound_nr()

Turn compound_nr() into a wrapper around folio_nr_pages().  Similarly to
compound_order(), casting the struct page directly to struct folio
preserves the existing behaviour, while calling page_folio() would change
the behaviour.  Move thp_nr_pages() down in the file so that compound_nr()
can be after folio_nr_pages().
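
A sketch of the resulting wrapper (the cast, rather than page_folio(),
preserves the old behaviour of returning 1 for tail pages):

    static inline unsigned long compound_nr(struct page *page)
    {
            struct folio *folio = (struct folio *)page;

            if (!test_bit(PG_head, &folio->flags))
                    return 1;
            return folio_nr_pages(folio);
    }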

[[email protected]: fix assertion triggering]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: reimplement compound_order()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:02 +0000 (14:29 +0000)]
mm: reimplement compound_order()

Make compound_order() use struct folio.  It can't be turned into a wrapper
around folio_order() as a page can be turned into a tail page between a
check in compound_order() and the assertion in folio_test_large().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: remove head_compound_mapcount() and _ptr functions
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:01 +0000 (14:29 +0000)]
mm: remove head_compound_mapcount() and _ptr functions

folio_mapcount_ptr(), compound_mapcount_ptr() and subpages_mapcount_ptr()
are all now unused.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert page_mapcount() to use folio_entire_mapcount()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:00 +0000 (14:29 +0000)]
mm: convert page_mapcount() to use folio_entire_mapcount()

Remove a use of head_compound_mapcount().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  hugetlb: remove uses of folio_mapcount_ptr
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:59 +0000 (14:28 +0000)]
hugetlb: remove uses of folio_mapcount_ptr

Use the entire_mapcount field directly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/debug: remove call to head_compound_mapcount()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:58 +0000 (14:28 +0000)]
mm/debug: remove call to head_compound_mapcount()

Call folio_entire_mapcount() instead.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: use entire_mapcount in __page_dup_rmap()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:57 +0000 (14:28 +0000)]
mm: use entire_mapcount in __page_dup_rmap()

Remove the use of the compound_mapcount_ptr() wrapper, and add an
assertion that we're not passing a tail page if we're duplicating a PMD.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: use a folio in hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:56 +0000 (14:28 +0000)]
mm: use a folio in hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap()

Remove uses of compound_mapcount_ptr().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  page_alloc: use folio fields directly
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:55 +0000 (14:28 +0000)]
page_alloc: use folio fields directly

Remove the uses of compound_mapcount_ptr(), head_compound_mapcount() and
subpages_mapcount_ptr().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: add folio_add_new_anon_rmap()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:54 +0000 (14:28 +0000)]
mm: add folio_add_new_anon_rmap()

In contrast to other rmap functions, page_add_new_anon_rmap() is always
called with a freshly allocated page.  That means it can't be called with
a tail page.  Turn page_add_new_anon_rmap() into folio_add_new_anon_rmap()
and add a page_add_new_anon_rmap() wrapper.  Callers can be converted
individually.
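
A sketch of the compat wrapper (freshly allocated pages are never tail
pages, so the cast is safe):

    void page_add_new_anon_rmap(struct page *page, struct vm_area_struct *vma,
                    unsigned long address)
    {
            VM_BUG_ON_PAGE(PageTail(page), page);

            folio_add_new_anon_rmap((struct folio *)page, vma, address);
    }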

[[email protected]: fix NOMMU build.  page_add_new_anon_rmap() requires CONFIG_MMU]
[[email protected]: folio-compat.c needs rmap.h]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert page_add_file_rmap() to use a folio internally
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:53 +0000 (14:28 +0000)]
mm: convert page_add_file_rmap() to use a folio internally

The API for page_add_file_rmap() needs to be page-based, because we can
add mappings of individual pages.  But inside the function, we want to
only call compound_head() once and then use the folio APIs instead of the
page APIs that each call compound_head().
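
The shape of the change is a single conversion at the top of the function
(sketch; the rest of the body is elided):

    void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
                    bool compound)
    {
            struct folio *folio = page_folio(page);	/* one compound_head() */

            /* ... body now uses folio_test_large(folio), folio->flags, etc.,
             * instead of page helpers that each call compound_head() ... */
    }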

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert page_add_anon_rmap() to use a folio internally
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:52 +0000 (14:28 +0000)]
mm: convert page_add_anon_rmap() to use a folio internally

The API for page_add_anon_rmap() needs to be page-based, because we can
add mappings of individual pages.  But inside the function, we want to
only call compound_head() once and then use the folio APIs instead of the
page APIs that each call compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert page_remove_rmap() to use a folio internally
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:51 +0000 (14:28 +0000)]
mm: convert page_remove_rmap() to use a folio internally

The API for page_remove_rmap() needs to be page-based, because we can
remove mappings of pages individually.  But inside the function, we want
to only call compound_head() once and then use the folio APIs instead of
the page APIs that each call compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert total_compound_mapcount() to folio_total_mapcount()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:50 +0000 (14:28 +0000)]
mm: convert total_compound_mapcount() to folio_total_mapcount()

Instead of enforcing that the argument must be a head page by naming,
enforce it with the compiler by making it a folio.  Also rename the
counter in struct folio from _compound_mapcount to _entire_mapcount.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  doc: clarify refcount section by referring to folios & pages
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:49 +0000 (14:28 +0000)]
doc: clarify refcount section by referring to folios & pages

Include the rename of subpages_mapcount to _nr_pages_mapped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: convert head_subpages_mapcount() into folio_nr_pages_mapped()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:48 +0000 (14:28 +0000)]
mm: convert head_subpages_mapcount() into folio_nr_pages_mapped()

Calling this 'mapcount' is confusing since mapcount is usually the number
of times something is mapped; instead this is the number of mapped pages.
It's also better to enforce that this is a folio rather than a head page.

Move folio_nr_pages_mapped() into mm/internal.h since this is not
something we want device drivers or filesystems poking at.  Get rid of
folio_subpages_mapcount_ptr() and use folio->_nr_pages_mapped directly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: remove folio_pincount_ptr() and head_compound_pincount()
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:47 +0000 (14:28 +0000)]
mm: remove folio_pincount_ptr() and head_compound_pincount()

We can use folio->_pincount directly, since all users are guarded by tests
of compound/large.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
Alistair Popple [Tue, 10 Jan 2023 02:57:22 +0000 (13:57 +1100)]
mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export

mmu_notifier_range_update_to_read_only() was originally introduced in
commit c6d23413f81b ("mm/mmu_notifier:
mmu_notifier_range_update_to_read_only() helper") as an optimisation for
device drivers that know a range has only been mapped read-only.  However,
there are no users of this feature, so remove it.  As it is the only user
of the struct mmu_notifier_range.vma field, remove that also.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: compaction: avoid fragmentation score calculation for empty zones
Baolin Wang [Tue, 10 Jan 2023 13:36:22 +0000 (21:36 +0800)]
mm: compaction: avoid fragmentation score calculation for empty zones

There is no need to calculate the fragmentation score for empty zones.
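
The fix amounts to an early skip in the per-zone loop of
fragmentation_score_node(), roughly:

    for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
            struct zone *zone = &pgdat->node_zones[zoneid];

            if (!populated_zone(zone))
                    continue;
            score += fragmentation_score_zone_weighted(zone);
    }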

Link: https://lkml.kernel.org/r/100331ad9d274a9725e687b00d85d75d7e4a17c7.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: compaction: add missing kcompactd wakeup trace event
Baolin Wang [Tue, 10 Jan 2023 13:36:21 +0000 (21:36 +0800)]
mm: compaction: add missing kcompactd wakeup trace event

Add the missing kcompactd wakeup trace event for proactive compaction;
meanwhile, use order = -1 and the highest zone index of the pgdat for the
kcompactd wakeup trace event triggered by proactive compaction.

Link: https://lkml.kernel.org/r/cbf8097a2d8a1b6800991f2a21575550d3613ce6.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: compaction: count the migration scanned pages events for proactive compaction
Baolin Wang [Tue, 10 Jan 2023 13:36:20 +0000 (21:36 +0800)]
mm: compaction: count the migration scanned pages events for proactive compaction

The proactive compaction will reuse per-node kcompactd threads, so we
should also count the KCOMPACTD_MIGRATE_SCANNED and KCOMPACTD_FREE_SCANNED
events for proactive compaction.

Link: https://lkml.kernel.org/r/b7f1ece1adc17defa47e3667b5f9fd61f496517a.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: compaction: move list validation into compact_zone()
Baolin Wang [Tue, 10 Jan 2023 13:36:19 +0000 (21:36 +0800)]
mm: compaction: move list validation into compact_zone()

Move the cc.freepages and cc.migratepages list validation into compact_zone()
to remove some duplicate code.

Link: https://lkml.kernel.org/r/15cf54f7d762e87b04ac3cc74536f7d1ebbcd8cd.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm: compaction: remove redundant VM_BUG_ON() in compact_zone()
Baolin Wang [Tue, 10 Jan 2023 13:36:18 +0000 (21:36 +0800)]
mm: compaction: remove redundant VM_BUG_ON() in compact_zone()

Patch series "Some small improvements for compaction".

When I did some compaction testing, I found some small room for
improvement as well as some code cleanups.

This patch (of 5):

compaction_suitable() will never return values other than COMPACT_SUCCESS,
COMPACT_SKIPPED and COMPACT_CONTINUE, so after validating COMPACT_SUCCESS
and COMPACT_SKIPPED we will never hit any other unexpected case.  Thus
remove the redundant VM_BUG_ON() validation of the return values of
compaction_suitable().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/740a2396d9b98154dba76e326cba5e798b640ead.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/mmap: fix typo in comment
Vernon Yang [Tue, 10 Jan 2023 14:53:53 +0000 (22:53 +0800)]
mm/mmap: fix typo in comment

Replace "parital" with "partial".

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vernon Yang <[email protected]>
Cc: Liam Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  maple_tree: remove the parameter entry of mas_preallocate
Vernon Yang [Tue, 10 Jan 2023 15:42:11 +0000 (23:42 +0800)]
maple_tree: remove the parameter entry of mas_preallocate

The parameter entry of mas_preallocate is not used, so drop it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vernon Yang <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  selftests/damon/debugfs_rm_non_contexts: hide expected write error messages
SeongJae Park [Tue, 10 Jan 2023 19:04:00 +0000 (19:04 +0000)]
selftests/damon/debugfs_rm_non_contexts: hide expected write error messages

A selftest case for DAMON debugfs interface has a test for expected
failure.  To make the test output clean, hide the expected failure error
message.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  selftests/damon/sysfs: hide expected write failures
SeongJae Park [Tue, 10 Jan 2023 19:03:59 +0000 (19:03 +0000)]
selftests/damon/sysfs: hide expected write failures

The DAMON selftest for sysfs (sysfs.sh) tests whether some writes to DAMON
sysfs interface files fail as expected.  Because it tests a number of such
failures, it makes the test results noisy with failure error messages.
Redirect the expected failure error messages to /dev/null to make the
results clean.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  MAINTAINERS/DAMON: link maintainer profile, git trees, and website
SeongJae Park [Tue, 10 Jan 2023 19:03:58 +0000 (19:03 +0000)]
MAINTAINERS/DAMON: link maintainer profile, git trees, and website

Add links to the below DAMON development related resources to the DAMON
section in the MAINTAINERS file.

- The basic policies and expectations of DAMON development,
- DAMON development trees, and
- DAMON introduction website.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  Docs/mm/damon: add a maintainer-profile for DAMON
SeongJae Park [Tue, 10 Jan 2023 19:03:57 +0000 (19:03 +0000)]
Docs/mm/damon: add a maintainer-profile for DAMON

Document the basic policies and expectations for DAMON development.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  Docs/admin-guide/mm/damon/usage: update DAMOS actions/filters supports of each DAMON...
SeongJae Park [Tue, 10 Jan 2023 19:03:56 +0000 (19:03 +0000)]
Docs/admin-guide/mm/damon/usage: update DAMOS actions/filters supports of each DAMON operations set

Support for each DAMOS action and filter is up to the DAMON operations set
implementation, but this is not mentioned in detail in the documentation.
Update the information in the usage document.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  Docs/mm/damon/index: mention DAMOS on the intro
SeongJae Park [Tue, 10 Jan 2023 19:03:55 +0000 (19:03 +0000)]
Docs/mm/damon/index: mention DAMOS on the intro

What DAMON aims to do is not only access monitoring but also efficient and
effective access-aware system operations, and DAMON-based Operation Schemes
(DAMOS) is the important DAMON feature for that goal.  Make the intro of
the DAMON documentation emphasize the goal and mention DAMOS.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago  mm/damon/core: update kernel-doc comments for DAMOS filters supports of each DAMON...
SeongJae Park [Tue, 10 Jan 2023 19:03:54 +0000 (19:03 +0000)]
mm/damon/core: update kernel-doc comments for DAMOS filters supports of each DAMON operations set

Support for each DAMOS filter type depends on the DAMON operations set
implementation in use, but this is not well covered in the kernel-doc
comments.  Add the comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm/damon/core: update kernel-doc comments for DAMOS action supports of each DAMON...
SeongJae Park [Tue, 10 Jan 2023 19:03:53 +0000 (19:03 +0000)]
mm/damon/core: update kernel-doc comments for DAMOS action supports of each DAMON operations set

Patch series "mm/damon: trivial fixups".

This patchset contains patches for trivial fixups of DAMON's
documentation, MAINTAINERS section, and selftests.

This patch (of 8):

Support for each DAMOS action depends on the DAMON operations set
implementation in use, but this is not well covered in the kernel-doc
comments.  Add the comment.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm/highmem: add notes about conversions from kmap{,_atomic}()
Fabio M. De Francesco [Wed, 7 Dec 2022 22:53:08 +0000 (23:53 +0100)]
mm/highmem: add notes about conversions from kmap{,_atomic}()

kmap() and kmap_atomic() have been deprecated.  kmap_local_page() should
always be used in new code and the call sites of the two deprecated
functions should be converted.  This latter task can lead to errors if it
is not carried out with the necessary attention to the context around and
between the maps and unmaps.

Therefore, add further information to the highmem documentation to make it
clearer that (1) kmap() and kmap_atomic() must no longer be called in new
code and (2) developers doing conversions from kmap() and kmap_atomic() are
expected to take care of the context around and between the maps and unmaps,
in order to not break the code.
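
As a hedged illustration (not text from the patch itself) of the kind of
conversion the documentation covers, a typical call-site change looks
roughly like the sketch below; the function names are made up for the
example:

    /*
     * Before: kmap_atomic() also disables pagefaults and preemption, so
     * the code between the map and unmap may have silently relied on that.
     */
    static void copy_page_before(struct page *dst, struct page *src)
    {
            void *vdst = kmap_atomic(dst);
            void *vsrc = kmap_atomic(src);

            memcpy(vdst, vsrc, PAGE_SIZE);
            kunmap_atomic(vsrc);
            kunmap_atomic(vdst);
    }

    /*
     * After: kmap_local_page() keeps the mapping CPU-local but does not
     * disable pagefaults or preemption.  If the old code depended on those
     * side effects, that is exactly the "context around and between the
     * maps and unmaps" that must now be handled explicitly.
     */
    static void copy_page_after(struct page *dst, struct page *src)
    {
            void *vdst = kmap_local_page(dst);
            void *vsrc = kmap_local_page(src);

            memcpy(vdst, vsrc, PAGE_SIZE);
            kunmap_local(vsrc);
            kunmap_local(vdst);
    }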

Relevant parts of this patch have been taken from messages exchanged
privately with Ira Weiny (thanks!).

[[email protected]: merge two sentences into one, per Bagas]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Fabio M. De Francesco <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoSync mm-stable with mm-hotfixes-stable to pick up dependent patches
Andrew Morton [Wed, 1 Feb 2023 01:25:17 +0000 (17:25 -0800)]
Sync mm-stable with mm-hotfixes-stable to pick up dependent patches

Merge branch 'mm-hotfixes-stable' into mm-stable

2 years agomm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
Kefeng Wang [Sun, 29 Jan 2023 04:09:45 +0000 (12:09 +0800)]
mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()

Since commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharge a hwpoisoned LRU page, so folio_memcg() can
be NULL and mem_cgroup_track_foreign_dirty_slowpath() can then hit a NULL
pointer dereference.  Fix it by not recording foreign writeback in
mem_cgroup_track_foreign_dirty() when the folio memcg is NULL.
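
A minimal sketch of the kind of guard described above, assuming the usual
inline wrapper shape in include/linux/memcontrol.h (field and helper names
recalled from memory rather than taken from this patch):

    /* sketch only, not the exact hunk from this patch */
    static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
                                                      struct bdi_writeback *wb)
    {
            struct mem_cgroup *memcg;

            if (mem_cgroup_disabled())
                    return;

            /* hwpoison may have already uncharged the folio: nothing to track */
            memcg = folio_memcg(folio);
            if (unlikely(memcg && &memcg->css != wb->memcg_css))
                    mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
    }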

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <[email protected]>
Reported-by: Ma Wupeng <[email protected]>
Tested-by: Miko Larsson <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Ma Wupeng <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoKconfig.debug: fix the help description in SCHED_DEBUG
ye xingchen [Sun, 29 Jan 2023 02:13:57 +0000 (10:13 +0800)]
Kconfig.debug: fix the help description in SCHED_DEBUG

The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ye xingchen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm/swapfile: add cond_resched() in get_swap_pages()
Longlong Xia [Sat, 28 Jan 2023 09:47:57 +0000 (09:47 +0000)]
mm/swapfile: add cond_resched() in get_swap_pages()

A softlockup still occurs in get_swap_pages() under memory pressure.  The
test system has 64 CPU cores, 64GB of memory, and 28 zram devices, each with
a 50MB disksize and the same priority as si.  Using the stress-ng tool to
increase memory pressure causes the system to OOM frequently.

The plist_for_each_entry_safe() loop in get_swap_pages() can iterate tens of
thousands of times looking for available space (in the extreme case,
cond_resched() is never called from scan_swap_map_slots()).  Add
cond_resched() to get_swap_pages() when it fails to find available space, to
avoid the softlockup.
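
A minimal sketch of the pattern, not the actual mm/swapfile.c change; the
helper name below is hypothetical and only the use of cond_resched() is the
point:

    /*
     * scan_all_swap_devices() stands in for the real plist walk over the
     * available swap devices.  When a whole pass finds nothing, yield the
     * CPU before the caller retries, so repeated failed scans cannot trip
     * the softlockup detector.
     */
    n_ret = scan_all_swap_devices(n_goal, swp_entries);
    if (!n_ret)
            cond_resched();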

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Longlong Xia <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Chen Wandun <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm: use stack_depot_early_init for kmemleak
Zhaoyang Huang [Thu, 19 Jan 2023 01:22:25 +0000 (09:22 +0800)]
mm: use stack_depot_early_init for kmemleak

Mirsad reported the error below, which is caused by a stack_depot_init()
failure in kvcalloc().  Solve this by using stack_depot_early_init() for
kmemleak.

On 1/4/23 17:08, Mirsad Goran Todorovac wrote:
I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak:

[root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff951c118568b0 (size 16):
comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s)
hex dump (first 16 bytes):
    6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
    backtrace:
[root@pc-mtodorov ~]#

Apparently, backtrace of called functions on the stack is no longer
printed with the list of memory leaks.  This appeared on Lenovo desktop
10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and
6.2-rc1 and 6.2-rc2 builds.  This worked on 6.1 with the same
CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from
Mr.  Torvalds' tree.  I don't know if this is deliberate feature for some
reason or a bug.  Please find attached the config, lshw and kmemleak
output.

[[email protected]: remove stack_depot_init() call]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace")
Reported-by: Mirsad Todorovac <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Zhaoyang Huang <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Cc: ke.wang <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoSquashfs: fix handling and sanity checking of xattr_ids count
Phillip Lougher [Fri, 27 Jan 2023 06:18:42 +0000 (06:18 +0000)]
Squashfs: fix handling and sanity checking of xattr_ids count

A corrupted filesystem reported by syzbot [1] exposes two flaws in the
handling and sanity checking of the xattr_ids count in the filesystem.  Both
of these flaws cause computation overflow due to incorrect typing.

In the corrupted filesystem the xattr_ids value is 4294967071, which, stored
in a signed variable, becomes the negative number -225.

Flaw 1 (64-bit systems only):

The signed integer xattr_ids variable causes sign extension.

This causes variable overflow in the SQUASHFS_XATTR_*(A) macros.  The
variable is first multiplied by sizeof(struct squashfs_xattr_id) where the
type of the sizeof operator is "unsigned long".

On a 64-bit system this is 64-bits in size, and causes the negative number
to be sign extended and widened to 64-bits and then become unsigned.  This
produces the very large number 18446744073709548016 or 2^64 - 3600.  This
number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and
divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0
(stored in len).

Flaw 2 (32-bit systems only):

On a 32-bit system the integer variable is not widened by the unsigned long
type of the sizeof operator (32-bits), and the signedness of the variable has
no effect due to it always being treated as unsigned.

The above corrupted xattr_ids value of 4294967071, when multiplied
overflows and produces the number 4294963696 or 2^32 - 3400.  This number
when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by
SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
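
To make the two computations concrete, here is a standalone demonstration
using fixed-width types to model both cases; it assumes
sizeof(struct squashfs_xattr_id) == 16 and SQUASHFS_METADATA_SIZE == 8192,
as the numbers above imply:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define ID_SIZE        16U     /* assumed sizeof(struct squashfs_xattr_id) */
    #define METADATA_SIZE  8192U   /* assumed SQUASHFS_METADATA_SIZE */

    int main(void)
    {
            int xattr_ids = -225;   /* 4294967071 stored in a signed 32-bit int */

            /* Flaw 1: a 64-bit "unsigned long" sign-extends the negative value. */
            uint64_t bytes64 = (uint64_t)xattr_ids * ID_SIZE;   /* 2^64 - 3600 */
            uint64_t blocks64 = (bytes64 + METADATA_SIZE - 1) / METADATA_SIZE;

            /* Flaw 2: a 32-bit "unsigned long" simply wraps around. */
            uint32_t bytes32 = (uint32_t)xattr_ids * ID_SIZE;   /* 2^32 - 3600 */
            uint32_t blocks32 = (bytes32 + METADATA_SIZE - 1) / METADATA_SIZE;

            printf("64-bit: bytes=%" PRIu64 " blocks=%" PRIu64 "\n", bytes64, blocks64);
            printf("32-bit: bytes=%" PRIu32 " blocks=%" PRIu32 "\n", bytes32, blocks32);
            return 0;
    }

Both round-up-and-divide computations wrap to 0, matching the zero length
described above.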

The effect of the 0 length computation:

In conjunction with the corrupted xattr_ids field, the filesystem also has
a corrupted xattr_table_start value, where it matches the end of
filesystem value of 850.

This causes the following sanity check code to fail because the
incorrectly computed len of 0 matches the incorrect size of the table
reported by the superblock (0 bytes).

    len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
    indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);

    /*
     * The computed size of the index table (len bytes) should exactly
     * match the table start and end points
    */
    start = table_start + sizeof(*id_table);
    end = msblk->bytes_used;

    if (len != (end - start))
            return ERR_PTR(-EINVAL);

Changing the xattr_ids variable to be "unsigned int" fixes the flaw on a
64-bit system.  This relies on the fact that the computation is widened by
the unsigned long type of the sizeof operator.

Casting the variable to u64 in the above macro fixes this flaw on a 32-bit
system.

It also means 64-bit systems do not implicitly rely on the type of the
sizeof operator to widen the computation.

[1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup")
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Cc: Fedor Pchelkin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agosh: define RUNTIME_DISCARD_EXIT
Tom Saeger [Tue, 24 Jan 2023 00:09:35 +0000 (17:09 -0700)]
sh: define RUNTIME_DISCARD_EXIT

sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since
commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv").

This is similar to fixes for powerpc and s390:
commit 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT").
commit a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
with GNU ld < 2.36").

  $ sh4-linux-gnu-ld --version | head -n1
  GNU ld (GNU Binutils for Debian) 2.35.2

  $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig
  $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu-

  `.exit.text' referenced in section `__bug_table' of crypto/algboss.o:
  defined in discarded section `.exit.text' of crypto/algboss.o
  `.exit.text' referenced in section `__bug_table' of
  drivers/char/hw_random/core.o: defined in discarded section
  `.exit.text' of drivers/char/hw_random/core.o
  make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
  make[1]: *** [Makefile:1252: vmlinux] Error 2

arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT:

/*
 * .exit.text is discarded at runtime, not link time, to deal with
 * references from __bug_table
 */
.exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT }

However, EXIT_TEXT is thrown away by the DISCARD rule in
include/asm-generic/vmlinux.lds.h because sh does not define
RUNTIME_DISCARD_EXIT.
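
Presumably the fix mirrors the powerpc and s390 commits cited above: define
RUNTIME_DISCARD_EXIT in the sh linker script before the generic one is
included, so the generic DISCARD rule leaves .exit.text alone.  A sketch,
not the exact diff:

    /* arch/sh/kernel/vmlinux.lds.S (sketch) */
    #define RUNTIME_DISCARD_EXIT    /* .exit.text is discarded at runtime */
    #include <asm-generic/vmlinux.lds.h>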

GNU ld 2.40 does not have this issue and builds fine.
This corresponds with Masahiro's comments in a494398bde27:
"Nathan [Chancellor] also found that binutils
commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this
issue, so we cannot reproduce it with binutils 2.36+, but it is better
to not rely on it."

Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Tom Saeger <[email protected]>
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dennis Gilmore <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Naresh Kamboju <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agohighmem: round down the address passed to kunmap_flush_on_unmap()
Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:07:27 +0000 (20:07 +0000)]
highmem: round down the address passed to kunmap_flush_on_unmap()

We already round down the address in kunmap_local_indexed(), which is the
other implementation of __kunmap_local().  The only implementation of
kunmap_flush_on_unmap() is PA-RISC, which expects a page-aligned address.
This may currently be causing PA-RISC to flush the wrong addresses.
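
A minimal sketch of the intended rounding, using the usual PAGE_MASK idiom;
the wrapper shown is illustrative rather than the exact generic
__kunmap_local() code:

    static inline void __kunmap_local_sketch(const void *vaddr)
    {
            /* The arch hook expects a page-aligned address, so round down. */
            kunmap_flush_on_unmap((void *)((unsigned long)vaddr & PAGE_MASK));
    }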

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Fixes: 298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*")
Reviewed-by: Ira Weiny <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomigrate: hugetlb: check for hugetlb shared PMD in node migration
Mike Kravetz [Thu, 26 Jan 2023 22:27:21 +0000 (14:27 -0800)]
migrate: hugetlb: check for hugetlb shared PMD in node migration

migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node.  page_mapcount
> 1 is being used to determine if a hugetlb page is shared.  However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD.  As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.

To fix, check for a shared PMD if mapcount is 1.  If a shared PMD is found
consider the page shared.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Peter Xu <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
Mike Kravetz [Thu, 26 Jan 2023 22:27:20 +0000 (14:27 -0800)]
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps

Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".

This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1].  The following two patches address user visible behavior
caused by this issue.

[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/

This patch (of 2):

A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD.  This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.

page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps.  Pages referenced via a shared PMD were
incorrectly being counted as private.

To fix, check for a shared PMD if mapcount is 1.  If a shared PMD is found
count the hugetlb page as shared.  A new helper to check for a shared PMD
is added.

[[email protected]: simplification, per David]
[[email protected]: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
Zach O'Keefe [Wed, 25 Jan 2023 22:53:58 +0000 (14:53 -0800)]
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups

In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

-       if (!pmd_present(pmde))
-               return SCAN_PMD_NULL;
+       if (pmd_none(pmde))
+               return SCAN_PMD_NONE;

This was for-use by MADV_COLLAPSE file/shmem codepaths, where
MADV_COLLAPSE might identify a pte-mapped hugepage, only to have
khugepaged race-in, free the pte table, and clear the pmd.  Such codepaths
include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
   already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
   mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared to install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd or the none-pmd.

However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds.  One such case that could leak through is a swap
entry.  The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.
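
A sketch of the intended ordering of checks, simplified from khugepaged's
find_pmd_or_thp_or_none(); the exact return codes and surrounding code may
differ from the real function:

    /* pmde is the pmd value read (locklessly) from the page table */
    if (pmd_none(pmde))
            return SCAN_PMD_NONE;    /* empty slot: OK to collapse into */
    if (!pmd_present(pmde))
            return SCAN_PMD_NULL;    /* e.g. a swap entry: bail out */
    if (pmd_trans_huge(pmde))
            return SCAN_PMD_MAPPED;  /* already mapped by a huge-pmd */
    if (pmd_bad(pmde))
            return SCAN_PMD_NULL;
    return SCAN_SUCCEED;             /* genuine pte-table-mapping pmd */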

The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present() and
pmd_trans_huge() still need to return true while the pmd is in this
transitory state.  For example, x86's pmd_present() also checks the
_PAGE_PSE bit, riscv's version also checks the _PAGE_LEAF bit, and arm64
also checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge()
   All we actually know here is that the PSE bit is set. Either:
   a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
      is set.
      => huge-pmd
   b) We are currently racing with __split_huge_page().  The danger here
      is that we proceed as-if we have a huge-pmd, but really we are
      looking at a pte-mapping-pmd.  So, what is the risk of this
      danger?

      The only relevant path is:

madvise_collapse() -> collapse_pte_mapped_thp()

      Where we might just incorrectly report back "success", when really
      the memory isn't pmd-backed.  This is fine, since split could
      happen immediately after (actually) successful madvise_collapse().
      So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge()
   Either:
   a) PSE not set and either PRESENT or PROTNONE is.
      => pte-table-mapping pmd (or PROT_NONE)
   b) devmap.  This routine can be called immediately after
      unlocking/locking mmap_lock -- or called with no locks held (see
      khugepaged_scan_mm_slot()), so previous VMA checks have since been
      invalidated.

3) !pmd_present() && pmd_trans_huge()
  Not possible.

4) !pmd_present() && !pmd_trans_huge()
  Neither PRESENT nor PROTNONE set
  => not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, loongarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since
!pmd_present() always takes failure path).

Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how mmap_lock
is held (if at all).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Signed-off-by: Zach O'Keefe <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoRevert "mm: kmemleak: alloc gray object for reserved region with direct map"
Isaac J. Manjarres [Tue, 24 Jan 2023 23:02:54 +0000 (15:02 -0800)]
Revert "mm: kmemleak: alloc gray object for reserved region with direct map"

This reverts commit 972fa3a7c17c9d60212e32ecc0205dc585b1e769.

Kmemleak operates by periodically scanning memory regions for pointers to
allocated memory blocks to determine if they are leaked or not.  However,
reserved memory regions can be used for DMA transactions between a device
and a CPU, and thus, wouldn't contain pointers to allocated memory blocks,
making them inappropriate for kmemleak to scan.  Thus, revert this commit.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 972fa3a7c17c9 ("mm: kmemleak: alloc gray object for reserved region with direct map")
Signed-off-by: Isaac J. Manjarres <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Calvin Zhang <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: <[email protected]> [5.17+]
Signed-off-by: Andrew Morton <[email protected]>
2 years agofreevxfs: Kconfig: fix spelling
Randy Dunlap [Tue, 24 Jan 2023 18:16:38 +0000 (10:16 -0800)]
freevxfs: Kconfig: fix spelling

Fix a spello in freevxfs Kconfig.
(reported by codespell)

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomaple_tree: should get pivots boundary by type
Wei Yang [Sat, 12 Nov 2022 23:43:08 +0000 (23:43 +0000)]
maple_tree: should get pivots boundary by type

We should get the pivots boundary by node type.  This fixes a potential
over-indexing of mt_pivots[].

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years ago.mailmap: update e-mail address for Eugen Hristev
Eugen Hristev [Thu, 19 Jan 2023 07:22:29 +0000 (09:22 +0200)]
.mailmap: update e-mail address for Eugen Hristev

Update e-mail address.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eugen Hristev <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agomm, mremap: fix mremap() expanding for vma's with vm_ops->close()
Vlastimil Babka [Tue, 17 Jan 2023 10:19:39 +0000 (11:19 +0100)]
mm, mremap: fix mremap() expanding for vma's with vm_ops->close()

Fabian has reported another regression in 6.1 due to ca3d76b0aa80 ("mm:
add merging after mremap resize").  The problem is that vma_merge() can
fail when the vma has a vm_ops->close() method, causing the
is_mergeable_vma() test to be negative.  This was happening for a vma
mapping a file from
fuse-overlayfs, which does have the method.  But when we are simply
expanding the vma, we never remove it due to the "merge" with the added
area, so the test should not prevent the expansion.

As a quick fix, check for such vmas and expand them using vma_adjust()
directly as was done before commit ca3d76b0aa80.  For a more robust long
term solution we should try to limit the check for vm_ops->close() only to
cases that actually result in vma removal, so that no merge would be
prevented unnecessarily.

[[email protected]: fix indenting whitespace, reflow comment]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Fabian Vogt <[email protected]>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35
Tested-by: Fabian Vogt <[email protected]>
Cc: Jakub Matěna <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agosquashfs: harden sanity check in squashfs_read_xattr_id_table
Fedor Pchelkin [Tue, 17 Jan 2023 10:52:26 +0000 (13:52 +0300)]
squashfs: harden sanity check in squashfs_read_xattr_id_table

While mounting a corrupted filesystem, a signed integer '*xattr_ids' can
become less than zero.  This leads to the incorrect computation of 'len'
and 'indexes' values, which can cause a null-ptr-deref in copy_bio_to_actor()
or out-of-bounds accesses in the next sanity checks inside
squashfs_read_xattr_id_table().

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup")
Reported-by: <[email protected]>
Signed-off-by: Fedor Pchelkin <[email protected]>
Signed-off-by: Alexey Khoroshilov <[email protected]>
Cc: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoia64: fix build error due to switch case label appearing next to declaration
James Morse [Tue, 17 Jan 2023 15:16:32 +0000 (15:16 +0000)]
ia64: fix build error due to switch case label appearing next to declaration

Since commit aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to
report ITC frequency"), gcc 10.1.0 fails to build ia64 with the gnomic:
| ../arch/ia64/kernel/sys_ia64.c: In function 'ia64_clock_getres':
| ../arch/ia64/kernel/sys_ia64.c:189:3: error: a label can only be part of a statement and a declaration is not a statement
|   189 |   s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq);

This line appears immediately after a case label in a switch.

Move the declarations out of the case, to the top of the function.
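
For illustration, a minimal standalone example of the same diagnostic and
fix; the values are made up and this is not the actual sys_ia64.c code:

    #include <stdio.h>

    /*
     * Rejected by gcc before C23, because a case label must be followed by
     * a statement and a declaration is not one:
     *
     *     switch (which) {
     *     case 0:
     *             long tick_ns = 1000;
     *             return tick_ns;
     *     }
     */

    /* The fix: hoist the declaration to the top of the function. */
    static long getres_like(int which)
    {
            long tick_ns = 1000;    /* hypothetical value */

            switch (which) {
            case 0:
                    return tick_ns;
            default:
                    return -1;
            }
    }

    int main(void)
    {
            printf("%ld\n", getres_like(0));
            return 0;
    }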

Link: https://lkml.kernel.org/r/[email protected]
Fixes: aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency")
Signed-off-by: James Morse <[email protected]>
Reviewed-by: Sergei Trofimovich <[email protected]>
Cc: Émeric Maschino <[email protected]>
Cc: matoro <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>