Git Repo - linux.git/log

]> Git Repo - linux.git/log

Liam R. Howlett [Fri, 20 Jan 2023 16:26:13 +0000 (11:26 -0500)]

mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator

Start passing the vma iterator through the mm code. This will allow for
reuse of the state and cleaner invalidation if necessary.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:12 +0000 (11:26 -0500)]

mm/mmap: remove preallocation from do_mas_align_munmap()

In preparation of passing the vma state through split, the pre-allocation
that occurs before the split has to be moved to after. Since the
preallocation would then live right next to the store, just call store
instead of preallocating. This effectively restores the potential error
path of splitting and not munmap'ing which pre-dates the maple tree.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:11 +0000 (11:26 -0500)]

mmap: convert vma_link() vma iterator

Avoid using the maple tree interface directly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:10 +0000 (11:26 -0500)]

kernel/fork: convert forking to using the vmi iterator

Avoid using the maple tree interface directly. This gains type safety.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:09 +0000 (11:26 -0500)]

mm/mmap: convert brk to use vma iterator

Use the vma iterator API for the brk() system call. This will provide
type safety at compile time.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:08 +0000 (11:26 -0500)]

mm: expand vma iterator interface

Add wrappers for the maple tree to the vma iterator. This will provide
type safety at compile time.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:07 +0000 (11:26 -0500)]

maple_tree: fix mas_prev() and mas_find() state handling

When mas_prev() does not find anything, set the state to MAS_NONE.

Handle the MAS_NONE in mas_find() like a MAS_START.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:06 +0000 (11:26 -0500)]

maple_tree: fix handle of invalidated state in mas_wr_store_setup()

If an invalidated maple state is encountered during write, reset the maple
state to MAS_START. This will result in a re-walk of the tree to the
correct location for the write.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:05 +0000 (11:26 -0500)]

test_maple_tree: test modifications while iterating

Add a testcase to ensure the iterator detects bad states on modifications
and does what the user expects

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:04 +0000 (11:26 -0500)]

maple_tree: reduce user error potential

When iterating, a user may operate on the tree and cause the maple state
to be altered and left in an unintuitive state. Detect this scenario and
correct it by setting to the limit and invalidating the state.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:03 +0000 (11:26 -0500)]

maple_tree: fix potential rcu issue

Ensure the node isn't dead after reading the node end.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Liam R. Howlett [Fri, 20 Jan 2023 16:26:02 +0000 (11:26 -0500)]

maple_tree: add mas_init() function

Patch series "VMA tree type safety and remove __vma_adjust()", v4.

This patchset does two things: 1.  Clean up, including removal of
__vma_adjust() and 2.  Extends the VMA iterator API to provide type safety
to the VMA operations using the maple tree, as requested by Linus [1].

It also addresses another issue of usability brought up by Linus about
needing to modify the maple state within the loops.  The maple state has
been replaced by the VMA iterator and the iterator is now modified within
the MM code so the caller should not need to worry about doing the work
themselves when tree modifications occur.

This brought up a potential inconsistency of the iterator state and what
the user expects, so the inconsistency is addressed to keep the VMA
iterator safe for use after the looping over a VMA range.  This is
addressed in patch 3 ("maple_tree: Reduce user error potential") and 4
("test_maple_tree: Test modifications while iterating").

While cleaning up the state, the duplicate locking code in mm/mmap.c
introduced by the maple tree has been address by abstracting it to two
functions: vma_prepare() and vma_complete().  These abstractions allowed
for a much simpler __vma_adjust(), which eventually leads to the removal
of the __vma_adjust() function by placing the logic into the vma_merge()
function itself.

1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/

This patch (of 49):

Add a function that will zero out the maple state struct and set some
basic defaults.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 3 Feb 2023 21:28:40 +0000 (16:28 -0500)]

mm: fix memcpy_from_file_folio() integer underflow

If we have a HIGHMEM system with a large folio, 'offset' may be larger
than PAGE_SIZE, and so min_t will cap at 'len' instead of the intended
end-of-page. That can overflow into the next page which is likely to be
unmapped and fault, but could theoretically copy the wrong data.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 00cdf76012ab ("mm: add memcpy_from_file_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Ira Weiny <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

David Hildenbrand [Wed, 8 Feb 2023 14:08:01 +0000 (15:08 +0100)]

arm/mm: fix swp type masking in __swp_entry()

We're masking with the number of type bits instead of the type mask, which
is obviously wrong.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 20aae9eff5ac ("arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Mark Brown <[email protected]>
Tested-by: Mark Brown <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:12:55 +0000 (20:12 +0000)]

mpage: convert __mpage_writepage() to use a folio more fully

This is just a conversion to the folio API. While there are some nods
towards supporting multi-page folios in here, the blocks array is still
sized for one page's worth of blocks, and there are other assumptions such
as the blocks_per_page variable.

[[email protected]: fix accidentally-triggering WARN_ON_ONCE]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:12:54 +0000 (20:12 +0000)]

fs: convert writepage_t callback to pass a folio

Patch series "Convert writepage_t to use a folio".

More folioisation. I split out the mpage work from everything else
because it completely dominated the patch, but some implementations I just
converted outright.

This patch (of 2):

We always write back an entire folio, but that's currently passed as the
head page. Convert all filesystems that use write_cache_pages() to expect
a folio instead of a page.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:15:52 +0000 (20:15 +0000)]

mm: add memcpy_from_file_folio()

This is the equivalent of memcpy_from_page(). It differs in that it takes
the position in a file instead of offset in a folio, it accepts the total
number of bytes to be copied (instead of the number of bytes to be copied
from this folio) and it returns how many bytes were copied from the folio,
rather than making the caller calculate that and then checking if the
caller got it right.

[[email protected]: fix typo in comment]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Ira Weiny <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:36 +0000 (14:34 +0100)]

block: remove ->rw_page

The ->rw_page method is a special purpose bypass of the usual bio handling
path that is limited to single-page reads and writes and synchronous which
causes a lot of extra code in the drivers, callers and the block layer.

The only remaining user is the MM swap code. Switch that swap code to
simply submit a single-vec on-stack bio an synchronously wait on it based
on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that
currently implement ->rw_page instead. While this touches one extra cache
line and executes extra code, it simplifies the block layer and drivers
and ensures that all feastures are properly supported by all drivers, e.g.
right now ->rw_page bypassed cgroup writeback entirely.

[[email protected]: fix comment typo, per Dan]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:35 +0000 (14:34 +0100)]

mm: factor out a swap_writepage_bdev helper

Split the block device case from swap_readpage into a separate helper,
following the abstraction for file based swap.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:34 +0000 (14:34 +0100)]

mm: remove the __swap_writepage return value

__swap_writepage always returns 0.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:33 +0000 (14:34 +0100)]

mm: use an on-stack bio for synchronous swapin

Optimize the synchronous swap in case by using an on-stack bio instead of
allocating one using bio_alloc.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:32 +0000 (14:34 +0100)]

mm: factor out a swap_readpage_bdev helper

Split the block device case from swap_readpage into a separate helper,
following the abstraction for file based swap and frontswap.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:31 +0000 (14:34 +0100)]

mm: remove the swap_readpage return value

swap_readpage always returns 0, and no caller checks the return value.

[[email protected]: fix void-returning swap_readpage() stub, per Keith]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 25 Jan 2023 13:34:30 +0000 (14:34 +0100)]

mpage: stop using bdev_{read,write}_page

Patch series "remove ->rw_page".

This series removes the ->rw_page block_device_operation, which is an old
and clumsy attempt at a simple read/write fast path for the block layer.
It isn't actually used by the fastest block layer operations that we
support (polled I/O through io_uring), but only used by the mpage buffered
I/O helpers which are some of the slowest I/O we have and do not make any
difference there at all, and zram which is a block device abused to
duplicate the zram functionality.

Given that zram is heavily used we need to make sure there is a good
replacement for synchronous I/O, so this series adds a new flag for
drivers that complete I/O synchronously and uses that flag to use on-stack
bios and synchronous submission for them in the swap code.

This patch (of 7):

These are micro-optimizations for synchronous I/O, which do not matter
compared to all the other inefficiencies in the legacy buffer_head based
mpage code.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:51 +0000 (08:10 +0100)]

mm: refactor va_remove_mappings

Move the VM_FLUSH_RESET_PERMS to the caller and rename the function to
better describe what it is doing.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:50 +0000 (08:10 +0100)]

mm: split __vunmap

vunmap only needs to find and free the vmap_area and vm_strut, so open
code that there and merge the rest of the code into vfree.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:49 +0000 (08:10 +0100)]

mm: move debug checks from __vunmap to remove_vm_area

All these checks apply to the free_vm_area interface as well, so move them
to the common routine.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:48 +0000 (08:10 +0100)]

mm: use remove_vm_area in __vunmap

Use the common helper to find and remove a vmap_area instead of open
coding it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:47 +0000 (08:10 +0100)]

mm: move __remove_vm_area out of va_remove_mappings

__remove_vm_area is the only part of va_remove_mappings that requires a
vmap_area. Move the call out to the caller and only pass the vm_struct to
va_remove_mappings.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:46 +0000 (08:10 +0100)]

mm: call vfree instead of __vunmap from delayed_vfree_work

This adds an extra, never taken, in_interrupt() branch, but will allow to
cut down the maze of vfree helpers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:45 +0000 (08:10 +0100)]

mm: move vmalloc_init and free_work down in vmalloc.c

Move these two functions around a bit to avoid forward declarations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:44 +0000 (08:10 +0100)]

mm: remove __vfree_deferred

Fold __vfree_deferred into vfree_atomic, and call vfree_atomic early on
from vfree if called from interrupt context so that the extra low-level
helper can be avoided.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:43 +0000 (08:10 +0100)]

mm: remove __vfree

__vfree is a subset of vfree that just skips a few checks, and which is
only used by vfree and an error cleanup path. Fold __vfree into vfree and
switch the only other caller to call vfree() instead.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Sat, 21 Jan 2023 07:10:42 +0000 (08:10 +0100)]

mm: reject vmap with VM_FLUSH_RESET_PERMS

Patch series "cleanup vfree and vunmap".

This little series untangles the vfree and vunmap code path a bit.

This patch (of 10):

VM_FLUSH_RESET_PERMS is just for use with vmalloc as it is tied to freeing
the underlying pages.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Jan 2023 13:44:34 +0000 (13:44 +0000)]

mm, compaction: finish pageblocks on complete migration failure

Commit 7efc3b726103 ("mm/compaction: fix set skip in
fast_find_migrateblock") address an issue where a pageblock selected by
fast_find_migrateblock() was ignored.  Unfortunately, the same fix
resulted in numerous reports of khugepaged or kcompactd stalling for long
periods of time or consuming 100% of CPU.

Tracing showed that there was a lot of rescanning between a small subset
of pageblocks because the conditions for marking the block skip are not
met.  The scan is not reaching the end of the pageblock because enough
pages were isolated but none were migrated successfully.  Eventually it
circles back to the same block.

Pageblock skip tracking tries to minimise both latency and excessive
scanning but tracking exactly when a block is fully scanned requires an
excessive amount of state.  This patch forcibly rescans a pageblock when
all isolated pages fail to migrate even though it could be for transient
reasons such as page writeback or page dirty.  This will sometimes migrate
too many pages but pageblocks will be marked skip and forward progress
will be made.

"Usemen" from the mmtests configuration
workload-usemem-stress-numa-compact was used to stress compaction.  The
compaction trace events were recorded using a 6.2-rc5 kernel that includes
commit 7efc3b726103 and count of unique ranges were measured.  The top 5
ranges were

   3076 range=(0x10ca00-0x10cc00)
   3076 range=(0x110a00-0x110c00)
   3098 range=(0x13b600-0x13b800)
   3104 range=(0x141c00-0x141e00)
  11424 range=(0x11b600-0x11b800)

While this workload is very different than what the bugs reported, the
pattern of the same subset of blocks being repeatedly scanned is observed.
At one point, *only* the range range=(0x11b600 ~ 0x11b800) was scanned
for 2 seconds.  14 seconds passed between the first migration-related
event and the last.

With the series applied including this patch, the top 5 ranges were

      1 range=(0x11607e-0x116200)
      1 range=(0x116200-0x116278)
      1 range=(0x116278-0x116400)
      1 range=(0x116400-0x116424)
      1 range=(0x116424-0x116600)

Only unique ranges were scanned and the time between the first
migration-related event was 0.11 milliseconds.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Signed-off-by: Mel Gorman <[email protected]>
Cc: Chuyi Zhou <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Jan 2023 13:44:33 +0000 (13:44 +0000)]

mm, compaction: finish scanning the current pageblock if requested

cc->finish_pageblock is set when the current pageblock should be rescanned
but fast_find_migrateblock can select an alternative block. Disable
fast_find_migrateblock when the current pageblock scan should be
completed.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Chuyi Zhou <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Jan 2023 13:44:32 +0000 (13:44 +0000)]

mm, compaction: check if a page has been captured before draining PCP pages

If a page has been captured then draining is unnecssary so check first for
a captured page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Chuyi Zhou <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Mel Gorman [Wed, 25 Jan 2023 13:44:31 +0000 (13:44 +0000)]

mm, compaction: rename compact_control->rescan to finish_pageblock

Patch series "Fix excessive CPU usage during compaction".

Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
fixed a problem where pageblocks found by fast_find_migrateblock() were
ignored. Unfortunately there were numerous bug reports complaining about high
CPU usage and massive stalls once 6.1 was released. Due to the severity,
the patch was reverted by Vlastimil as a short-term fix[1] to -stable.

The underlying problem for each of the bugs is suspected to be the
repeated scanning of the same pageblocks.  This series should guarantee
forward progress even with commit 7efc3b726103.  More information is in
the changelog for patch 4.

[1] http://lore.kernel.org/r/20230113173345 [email protected]

This patch (of 4):

The rescan field was not well named albeit accurate at the time.  Rename
the field to finish_pageblock to indicate that the remainder of the
pageblock should be scanned regardless of COMPACT_CLUSTER_MAX.  The intent
is that pageblocks with transient failures get marked for skipping to
avoid revisiting the same pageblock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Chuyi Zhou <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Jongwoo Han [Wed, 25 Jan 2023 18:08:47 +0000 (03:08 +0900)]

mm/gup.c: fix typo in comments

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jongwoo Han <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Andrey Konovalov [Tue, 24 Jan 2023 20:35:26 +0000 (21:35 +0100)]

kasan: reset page tags properly with sampling

The implementation of page_alloc poisoning sampling assumed that
tag_clear_highpage resets page tags for __GFP_ZEROTAGS allocations.
However, this is no longer the case since commit 70c248aca9e7 ("mm: kasan:
Skip unpoisoning of user pages").

This leads to kernel crashes when MTE-enabled userspace mappings are used
with Hardware Tag-Based KASAN enabled.

Reset page tags for __GFP_ZEROTAGS allocations in post_alloc_hook().

Also clarify and fix related comments.

[[email protected]: update comment]
Link: https://lkml.kernel.org/r/5dbd866714b4839069e2d8469ac45b60953db290.1674592780.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/24ea20c1b19c2b4b56cf9f5b354915f8dbccfc77.1674592496.git.andreyknvl@google.com
Fixes: 44383cef54c0 ("kasan: allow sampling page_alloc allocations for HW_TAGS")
Signed-off-by: Andrey Konovalov <[email protected]>
Reported-by: Peter Collingbourne <[email protected]>
Tested-by: Peter Collingbourne <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Mike Rapoport [Sat, 21 Jan 2023 10:11:51 +0000 (12:11 +0200)]

mm/sparse: fix "unused function 'pgdat_to_phys'" warning

W=1 build with clangs complains:

mm/sparse.c:347:27: warning: unused function 'pgdat_to_phys' [-Wunused-function]
static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
^
1 warning generated.

pgdat_to_phys() is only used by functions defined when
CONFIG_MEMORY_HOTREMOVE=y.

Move pgdat_to_phys() under #ifdef CONFIG_MEMORY_HOTREMOVE
to make clang happy.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Cc: Miles Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Hyeonggon Yoo [Sat, 21 Jan 2023 16:50:54 +0000 (01:50 +0900)]

mm/page_owner: record single timestamp value for high order allocations

When allocating a high-order page, separate allocation timestamp is
recorded for each sub-page resulting in different timestamp values between
them.

This behavior is not consistent with the behavior when recording free
timestamp and caused confusion when analyzing memory dumps. Record single
timestamp for the entire allocation, aligning with the behavior for free
timestamps.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hyeonggon Yoo <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Jiaqi Yan [Fri, 20 Jan 2023 03:46:22 +0000 (03:46 +0000)]

mm: memory-failure: document memory failure stats

Add documentation for memory_failure's per NUMA node sysfs entries

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Jiaqi Yan [Fri, 20 Jan 2023 03:46:21 +0000 (03:46 +0000)]

mm: memory-failure: bump memory failure stats to pglist_data

Right before memory_failure finishes its handling, accumulate poisoned
page's resolution counters to pglist_data's memory_failure_stats, so as to
update the corresponding sysfs entries.

Tested:
1) Start an application to allocate memory buffer chunks
2) Convert random memory buffer addresses to physical addresses
3) Inject memory errors using EINJ at chosen physical addresses
4) Access poisoned memory buffer and recover from SIGBUS
5) Check counter values under
/sys/devices/system/node/node*/memory_failure/*

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Jiaqi Yan [Fri, 20 Jan 2023 03:46:20 +0000 (03:46 +0000)]

mm: memory-failure: add memory failure stats to sysfs

Patch series "Introduce per NUMA node memory error statistics", v2.

Background
==========

In the RFC for Kernel Support of Memory Error Detection [1], one advantage
of software-based scanning over hardware patrol scrubber is the ability to
make statistics visible to system administrators.  The statistics include
2 categories:

* Memory error statistics, for example, how many memory error are
  encountered, how many of them are recovered by the kernel.  Note these
  memory errors are non-fatal to kernel: during the machine check
  exception (MCE) handling kernel already classified MCE's severity to be
  unnecessary to panic (but either action required or optional).

* Scanner statistics, for example how many times the scanner have fully
  scanned a NUMA node, how many errors are first detected by the scanner.

The memory error statistics are useful to userspace and actually not
specific to scanner detected memory errors, and are the focus of this
patchset.

Motivation
==========

Memory error stats are important to userspace but insufficient in kernel
today.  Datacenter administrators can better monitor a machine's memory
health with the visible stats.  For example, while memory errors are
inevitable on servers with 10+ TB memory, starting server maintenance when
there are only 1~2 recovered memory errors could be overreacting; in cloud
production environment maintenance usually means live migrate all the
workload running on the server and this usually causes nontrivial
disruption to the customer.  Providing insight into the scope of memory
errors on a system helps to determine the appropriate follow-up action.
In addition, the kernel's existing memory error stats need to be
standardized so that userspace can reliably count on their usefulness.

Today kernel provides following memory error info to userspace, but they
are not sufficient or have disadvantages:
* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
  not per NUMA node stats though
* ras:memory_failure_event: only available after explicitly enabled
* /dev/mcelog provides many useful info about the MCEs, but doesn't
  capture how memory_failure recovered memory MCEs
* kernel logs: userspace needs to process log text

Exposing memory error stats is also a good start for the in-kernel memory
error detector.  Today the data source of memory error stats are either
direct memory error consumption, or hardware patrol scrubber detection
(either signaled as UCNA or SRAO).  Once in-kernel memory scanner is
implemented, it will be the main source as it is usually configured to
scan memory DIMMs constantly and faster than hardware patrol scrubber.

How Implemented
===============

As Naoya pointed out [2], exposing memory error statistics to userspace is
useful independent of software or hardware scanner.  Therefore we
implement the memory error statistics independent of the in-kernel memory
error detector.  It exposes the following per NUMA node memory error
counters:

  /sys/devices/system/node/node${X}/memory_failure/total
  /sys/devices/system/node/node${X}/memory_failure/recovered
  /sys/devices/system/node/node${X}/memory_failure/ignored
  /sys/devices/system/node/node${X}/memory_failure/failed
  /sys/devices/system/node/node${X}/memory_failure/delayed

These counters describe how many raw pages are poisoned and after the
attempted recoveries by the kernel, their resolutions: how many are
recovered, ignored, failed, or delayed respectively.  This approach can be
easier to extend for future use cases than /proc/meminfo, trace event, and
log.  The following math holds for the statistics:

* total = recovered + ignored + failed + delayed

These memory error stats are reset during machine boot.

The 1st commit introduces these sysfs entries.  The 2nd commit populates
memory error stats every time memory_failure attempts memory error
recovery.  The 3rd commit adds documentations for introduced stats.

[1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af
[2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6

This patch (of 3):

Today kernel provides following memory error info to userspace, but each
has its own disadvantage

* HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total,
  not per NUMA node stats though

* ras:memory_failure_event: only available after explicitly enabled

* /dev/mcelog provides many useful info about the MCEs, but
  doesn't capture how memory_failure recovered memory MCEs

* kernel logs: userspace needs to process log text

Exposes per NUMA node memory error stats as sysfs entries:

  /sys/devices/system/node/node${X}/memory_failure/total
  /sys/devices/system/node/node${X}/memory_failure/recovered
  /sys/devices/system/node/node${X}/memory_failure/ignored
  /sys/devices/system/node/node${X}/memory_failure/failed
  /sys/devices/system/node/node${X}/memory_failure/delayed

These counters describe how many raw pages are poisoned and after the
attempted recoveries by the kernel, their resolutions: how many are
recovered, ignored, failed, or delayed respectively.  The following math
holds for the statistics:

* total = recovered + ignored + failed + delayed

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiaqi Yan <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:27 +0000 (00:18 +0000)]

mm: multi-gen LRU: simplify lru_gen_look_around()

Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer, if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE.

This causes a measurable regression from the LRU lock contention during a
microbencmark. But a tiny regression is not worth the complexity.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:26 +0000 (00:18 +0000)]

mm: multi-gen LRU: improve walk_pmd_range()

Improve readability of walk_pmd_range() and walk_pmd_range_locked().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:25 +0000 (00:18 +0000)]

mm: multi-gen LRU: improve lru_gen_exit_memcg()

Add warnings and poison ->next.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:24 +0000 (00:18 +0000)]

mm: multi-gen LRU: section for memcg LRU

Move memcg LRU code into a dedicated section. Improve the design doc to
outline its architecture.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:23 +0000 (00:18 +0000)]

mm: multi-gen LRU: section for Bloom filters

Move Bloom filters code into a dedicated section. Improve the design doc
to explain Bloom filter usage and connection between aging and eviction in
their use.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:22 +0000 (00:18 +0000)]

mm: multi-gen LRU: section for rmap/PT walk feedback

Add a section for lru_gen_look_around() in the code and the design doc.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

T.J. Alumbaugh [Wed, 18 Jan 2023 00:18:21 +0000 (00:18 +0000)]

mm: multi-gen LRU: section for working set protection

Patch series "mm: multi-gen LRU: improve".

This patch series improves a few MGLRU functions, collects related
functions, and adds additional documentation.

This patch (of 7):

Add a section for working set protection in the code and the design doc.
The admin doc already contains its usage.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Zhaoyang Huang [Thu, 19 Jan 2023 01:22:24 +0000 (09:22 +0800)]

mm: move KMEMLEAK's Kconfig items from lib to mm

Have the kmemleak's source code and Kconfig items be in the same directory.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zhaoyang Huang <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: ke.wang <[email protected]>
Cc: Mirsad Goran Todorovac <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

SeongJae Park [Thu, 19 Jan 2023 01:38:31 +0000 (01:38 +0000)]

mm/damon/core-test: add a test for damon_update_monitoring_results()

Add a simple unit test for damon_update_monitoring_results() function.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Brendan Higgins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

SeongJae Park [Thu, 19 Jan 2023 01:38:30 +0000 (01:38 +0000)]

mm/damon/core: update monitoring results for new monitoring attributes

region->nr_accesses is the number of sampling intervals in the last
aggregation interval that access to the region has found, and region->age
is the number of aggregation intervals that its access pattern has
maintained. Hence, the real meaning of the two fields' values is
depending on current sampling and aggregation intervals.

This means the values need to be updated for every sampling and/or
aggregation intervals updates. As DAMON core doesn't, it is a duty of
in-kernel DAMON framework applications like DAMON sysfs interface, or the
userspace users.

Handling it in userspace or in-kernel DAMON application is complicated,
inefficient, and repetitive compared to doing the update in DAMON core.
Do the update in DAMON core.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Brendan Higgins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

SeongJae Park [Thu, 19 Jan 2023 01:38:29 +0000 (01:38 +0000)]

mm/damon: update comments in damon.h for damon_attrs

Patch series "mm/damon: misc fixes".

This patchset contains three miscellaneous simple fixes for DAMON online
tuning.

This patch (of 3):

Commit cbeaa77b0449 ("mm/damon/core: use a dedicated struct for monitoring
attributes") moved monitoring intervals from damon_ctx to a new struct,
damon_attrs, but a comment in the header file has not updated for the
change. Update it.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: cbeaa77b0449 ("mm/damon/core: use a dedicated struct for monitoring attributes")
Signed-off-by: SeongJae Park <[email protected]>
Cc: Brendan Higgins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Waiman Long [Thu, 19 Jan 2023 04:01:11 +0000 (23:01 -0500)]

mm/kmemleak: fix UAF bug in kmemleak_scan()

Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object
iteration loop of kmemleak_scan()") fixes soft lockup problem in
kmemleak_scan() by periodically doing a cond_resched().  It does take a
reference of the current object before doing it.  Unfortunately, if the
object has been deleted from the object_list, the next object pointed to
by its next pointer may no longer be valid after coming back from
cond_resched().  This can result in use-after-free and other nasty
problem.

Fix this problem by adding a del_state flag into kmemleak_object structure
to synchronize the object deletion process between kmemleak_cond_resched()
and __remove_object() to make sure that the object remained in the
object_list in the duration of the cond_resched() call.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()")
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Waiman Long [Thu, 19 Jan 2023 04:01:10 +0000 (23:01 -0500)]

mm/kmemleak: simplify kmemleak_cond_resched() usage

Patch series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF", v2.

It was found that a KASAN use-after-free error was reported in the
kmemleak_scan() function.  After further examination, it is believe that
even though a reference is taken from the current object, it does not
prevent the object pointed to by the next pointer from going away after a
cond_resched().

To fix that, additional flags are added to make sure that the current
object won't be removed from the object_list during the duration of the
cond_resched() to ensure the validity of the next pointer.

While making the change, I also simplify the current usage of
kmemleak_cond_resched() to make it easier to understand.

This patch (of 2):

The presence of a pinned argument and the 64k loop count make
kmemleak_cond_resched() a bit more complex to read.  The pinned argument
is used only by first kmemleak_scan() loop.

Simplify the usage of kmemleak_cond_resched() by removing the pinned
argument and always do a get_object()/put_object() sequence.  In addition,
the 64k loop is removed by using need_resched() to decide if
kmemleak_cond_resched() should be called.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Kees Cook [Thu, 19 Jan 2023 16:03:44 +0000 (16:03 +0000)]

kselftest: vm: add tests for memory-deny-write-execute

Add some tests to cover the new PR_SET_MDWE prctl.

Link: https://lkml.kernel.org/r/[email protected]
Co-developed-by: Joey Gouly <[email protected]>
Signed-off-by: Joey Gouly <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Jeremy Linton <[email protected]>
Cc: Lennart Poettering <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: nd <[email protected]>
Cc: Szabolcs Nagy <[email protected]>
Cc: Topi Miettinen <[email protected]>
Cc: Zbigniew Jędrzejewski-Szmek <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Joey Gouly [Thu, 19 Jan 2023 16:03:43 +0000 (16:03 +0000)]

mm: implement memory-deny-write-execute as a prctl

Patch series "mm: In-kernel support for memory-deny-write-execute (MDWE)",
v2.

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter.  Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable.  Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but subsequently
changed to read-only.  Therefore the filter simply rejects any
mprotect(PROT_EXEC).  The side-effect is that on arm64 with BTI support
(Branch Target Identification), the dynamic loader cannot change an ELF
section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().  For
libraries, it can resort to unmapping and re-mapping but for the main
executable it does not have a file descriptor.  The original bug report in
the Red Hat bugzilla - [3] - and subsequent glibc workaround for libraries
- [4].

This series adds in-kernel support for this feature as a prctl
PR_SET_MDWE, that is inherited on fork().  The prctl denies PROT_WRITE |
PROT_EXEC mappings.  Like the systemd BPF filter it also denies adding
PROT_EXEC to mappings.  However unlike the BPF filter it only denies it if
the mapping didn't previous have PROT_EXEC.  This allows to PROT_EXEC ->
PROT_EXEC | PROT_BTI with mprotect(), which is a problem with the BPF
filter.

This patch (of 2):

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joey Gouly <[email protected]>
Co-developed-by: Catalin Marinas <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Jeremy Linton <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Lennart Poettering <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: nd <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Szabolcs Nagy <[email protected]>
Cc: Topi Miettinen <[email protected]>
Cc: Zbigniew Jędrzejewski-Szmek <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Herton R. Krzesinski [Mon, 16 Jan 2023 22:49:21 +0000 (19:49 -0300)]

tools/mm: allow users to provide additional cflags/ldflags

Right now there is no way to provide additional cflags/ldflags when
building tools/vm binaries. And using eg. make CFLAGS=<options> will
override the CFLAGS being set in the Makefile, making the build fail since
it requires the include of the ../lib dir (for libapi).

This change then allows you to specify:
CFLAGS=<options> LDFLAGS=<options> make V=1 -C tools/vm

And the options will be correctly appended as can be seen from the
make output.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Herton R. Krzesinski <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Justin Forbes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Scott Weaver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Deming Wang [Wed, 18 Jan 2023 02:54:03 +0000 (21:54 -0500)]

Documentation: mm: use `s/higmem/highmem/` fix typo for highmem

We should use highmem replace higmem.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Deming Wang <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Levi Yun [Wed, 18 Jan 2023 08:05:23 +0000 (17:05 +0900)]

mm/cma: fix potential memory loss on cma_declare_contiguous_nid

Suppose memblock_alloc_range_nid() with highmem_start succeeds when
cma_declare_contiguous_nid is called with !fixed on a 32-bit system with
PHYS_ADDR_T_64BIT enabled with memblock.bottom_up == false.

But the next trial to memblock_alloc_range_nid() to allocate in [SIZE_4G,
limits) nullifies former successfully allocated addr and it retries
memblock_alloc_ragne_nid().

In this situation, the first successfully allocated address area is lost.

Change the order of allocation (SIZE_4G, high_memory and base) and check
whether the allocated succeeded to prevent potential memory loss.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Levi Yun <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Yang Yang [Wed, 18 Jan 2023 12:13:03 +0000 (20:13 +0800)]

swap_state: update shadow_nodes for anonymous page

Shadow_nodes is for shadow nodes reclaiming of workingset handling, it is
updated when page cache add or delete since long time ago workingset only
supported page cache. But when workingset supports anonymous page
detection, we missied updating shadow nodes for it. This caused that
shadow nodes of anonymous page will never be reclaimd by
scan_shadow_nodes() even they use much memory and system memory is tense.

So update shadow_nodes of anonymous page when swap cache is add or delete
by calling xas_set_update(..workingset_update_node).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
Signed-off-by: Yang Yang <[email protected]>
Reviewed-by: Ran Xiaokai <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Sidhartha Kumar [Wed, 18 Jan 2023 17:40:39 +0000 (09:40 -0800)]

mm/hugetlb: convert get_hwpoison_huge_page() to folios

Straightforward conversion of get_hwpoison_huge_page() to
get_hwpoison_hugetlb_folio(). Reduces two references to a head page in
memory-failure.c

[[email protected]: fix get_hwpoison_hugetlb_folio() stub]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Sergey Senozhatsky [Wed, 18 Jan 2023 00:52:10 +0000 (09:52 +0900)]

zsmalloc: set default zspage chain size to 8

This changes key characteristics (pages per-zspage and objects per-zspage)
of a number of size classes which in results in different pool
configuration. With zspage chain size of 8 we have more size clases
clusters (123) and higher huge size class watermark (3632 bytes).

Please read zsmalloc documentation for more details.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Sergey Senozhatsky [Wed, 18 Jan 2023 00:52:09 +0000 (09:52 +0900)]

zsmalloc: make zspage chain size configurable

Remove hard coded limit on the maximum number of physical pages
per-zspage.

This will allow tuning of zsmalloc pool as zspage chain size changes
`pages per-zspage` and `objects per-zspage` characteristics of size
classes which also affects size classes clustering (the way size classes
are merged).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Sergey Senozhatsky [Wed, 18 Jan 2023 00:52:08 +0000 (09:52 +0900)]

zsmalloc: skip chain size calculation for pow_of_2 classes

If a class size is power of 2 then it wastes no memory and the best
configuration is 1 physical page per-zspage.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Sergey Senozhatsky [Wed, 18 Jan 2023 00:52:07 +0000 (09:52 +0900)]

zsmalloc: rework zspage chain size selection

Patch series "zsmalloc: make zspage chain size configurable".

Computers are bad at division.  We currently decide the best zspage chain
size (max number of physical pages per-zspage) by looking at a `used
percentage` value.  This is not enough as we lose precision during usage
percentage calculations For example, let's look at size class 208:

pages per zspage       wasted bytes         used%
       1                   144               96
       2                    80               99
       3                    16               99
       4                   160               99

Current algorithm will select 2 page per zspage configuration, as it's the
first one to reach 99%.  However, 3 pages per zspage waste less memory.

Change algorithm and select zspage configuration that has lowest wasted
value.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Anshuman Khandual [Thu, 5 Jan 2023 08:25:06 +0000 (13:55 +0530)]

mm/page_alloc: use deferred_pages_enabled() wherever applicable

Instead of directly accessing static deferred_pages, replace such
instances with the helper deferred_pages_enabled(). No functional change
is intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Pasha Tatashin [Tue, 17 Jan 2023 20:46:17 +0000 (20:46 +0000)]

mm/page_ext: init page_ext early if there are no deferred struct pages

page_ext must be initialized after all struct pages are initialized.
Therefore, page_ext is initialized after page_alloc_init_late(), and can
optionally be initialized earlier via early_page_ext kernel parameter
which as a side effect also disables deferred struct pages.

Allow to automatically init page_ext early when there are no deferred
struct pages in order to be able to use page_ext during kernel boot and
track for example page allocations early.

[[email protected]: fix build with CONFIG_PAGE_EXTENSION=n]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pasha Tatashin <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Charan Teja Kalla <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Huaisheng Ye [Mon, 16 Jan 2023 06:23:47 +0000 (14:23 +0800)]

mm/damon/core: skip apply schemes if empty

Sometimes there is no scheme in damon's context, for example just use damo
record to monitor workload's data access pattern.

If current damon context doesn't have any scheme in the list, kdamond has
no need to iterate over list of all targets and regions but do nothing.

So, skip apply schemes when ctx->schemes is empty.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Huaisheng Ye <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Colin Ian King [Mon, 16 Jan 2023 16:43:32 +0000 (16:43 +0000)]

mm/secretmem: remove redundant initiialization of pointer file

The pointer file is being initialized with a value that is never read, it
is being re-assigned later on. Clean up code by removing the redundant
initialization.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:39:41 +0000 (19:39 +0000)]

readahead: convert readahead_expand() to use a folio

Replace the uses of page with a folio. Also add a missing test for
workingset in the leading edge expansion.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:39:40 +0000 (19:39 +0000)]

filemap: convert filemap_range_has_page() to use a folio

The folio isn't returned from this function, so this is an entirely
internal change.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:39:39 +0000 (19:39 +0000)]

filemap: convert filemap_map_pmd() to take a folio

Patch series "Some more filemap folio conversions".

Three more places which could easily be converted to folios.  The third
one fixes a minor bug in readahead_expand(), but it's only a performance
bug and there are few users of readahead_expand(), so I don't think it's
worth backporting.

This patch (of 3):

Save a few calls to compound_head().  We specify exactly which page from
the folio to use by passing in start_pgoff, which means this will work for
a folio which is larger than PMD size.  The rest of the VM isn't prepared
for that yet, but now this function is.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:29:59 +0000 (19:29 +0000)]

rmap: add folio parameter to __page_set_anon_rmap()

Avoid the compound_head() call in PageAnon() by passing in the folio that
all callers have. Also save me from wondering whether page->mapping can
ever be overwritten on a tail page (I don't think it can, but I'm not 100%
sure).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:28:27 +0000 (19:28 +0000)]

mm: clean up mlock_page / munlock_page references in comments

Change documentation and comments that refer to now-renamed functions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:28:26 +0000 (19:28 +0000)]

mm: remove munlock_vma_page()

All callers now have a folio and can call munlock_vma_folio(). Update the
documentation to refer to munlock_vma_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:28:25 +0000 (19:28 +0000)]

mm: remove mlock_vma_page()

All callers now have a folio and can call mlock_vma_folio(). Update the
documentation to refer to mlock_vma_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:28:24 +0000 (19:28 +0000)]

mm: remove page_evictable()

Patch series "Remove leftover mlock/munlock page wrappers".

We no longer need the various mlock page functions as all callers have
folios.

This patch (of 4):

This function now has no users. Also update the unevictable-lru
documentation to discuss folios instead of pages (mostly).

[[email protected]: fix Documentation/mm/unevictable-lru.rst underlining]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:25:07 +0000 (19:25 +0000)]

mm: convert mem_cgroup_css_from_page() to mem_cgroup_css_from_folio()

Only one caller doesn't have a folio, so move the page_folio() call to
that one caller from mem_cgroup_css_from_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:25:06 +0000 (19:25 +0000)]

mm/fs: convert inode_attach_wb() to take a folio

Patch series "Writeback folio conversions".

Remove more calls to compound_head() by passing folios around instead of
pages.

This patch (of 2):

The only caller of inode_attach_wb() which doesn't pass NULL already has a
folio, so convert the whole call-chain to take folios.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:18:13 +0000 (19:18 +0000)]

mm: use a folio in copy_present_pte()

We still have to keep the page around because we need to know which page
in the folio we're copying, but we can replace five implict calls to
compound_head() with one.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:18:12 +0000 (19:18 +0000)]

mm: use a folio in copy_pte_range()

Allocate an order-0 folio instead of a page and pass it all the way down
the call chain. Removes dozens of calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:18:11 +0000 (19:18 +0000)]

mm: convert wp_page_copy() to use folios

Use new_folio instead of new_page throughout, because we allocated it
and know it's an order-0 folio. Most old_page uses become old_folio,
but use vmf->page where we need the precise page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:18:10 +0000 (19:18 +0000)]

mm: convert do_anonymous_page() to use a folio

Removes six calls to compound_head(); some inline and some external.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Mon, 16 Jan 2023 19:18:09 +0000 (19:18 +0000)]

mm: add vma_alloc_zeroed_movable_folio()

Replace alloc_zeroed_user_highpage_movable(). The main difference is
returning a folio containing a single page instead of returning the page,
but take the opportunity to rename the function to match other allocation
functions a little better and rewrite the documentation to place more
emphasis on the zeroing rather than the highmem aspect.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:48 +0000 (13:14 -0800)]

filemap: remove find_get_pages_range_tag()

All callers to find_get_pages_range_tag(), find_get_pages_tag(),
pagevec_lookup_range_tag(), and pagevec_lookup_tag() have been removed.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:47 +0000 (13:14 -0800)]

nilfs2: convert nilfs_clear_dirty_pages() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 2 calls to
compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:46 +0000 (13:14 -0800)]

nilfs2: convert nilfs_copy_dirty_pages() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 8 calls to
compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:45 +0000 (13:14 -0800)]

nilfs2: convert nilfs_btree_lookup_dirty_buffers() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 1 call to
compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:44 +0000 (13:14 -0800)]

nilfs2: convert nilfs_lookup_dirty_node_buffers() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 1 call to
compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:43 +0000 (13:14 -0800)]

nilfs2: convert nilfs_lookup_dirty_data_buffers() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for
the removal of find_get_pages_range_tag(). This change removes 4 calls
to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:42 +0000 (13:14 -0800)]

gfs2: convert gfs2_write_cache_jdata() to use filemap_get_folios_tag()

Convert function to use folios throughout. This is in preparation for the
removal of find_get_pgaes_range_tag(). This change removes 8 calls to
compound_head().

Also had to modify and rename gfs2_write_jdata_pagevec() to take in and
utilize folio_batch rather than pagevec and use folios rather than pages.
gfs2_write_jdata_batch() now supports large folios.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:41 +0000 (13:14 -0800)]

f2fs: convert f2fs_sync_meta_pages() to use filemap_get_folios_tag()

Convert function to use folios throughout.  This is in preparation for the
removal of find_get_pages_range_tag().  This change removes 5 calls to
compound_head().

Initially the function was checking if the previous page index is truly
the previous page i.e.  1 index behind the current page.  To convert to
folios and maintain this check we need to make the check folio->index !=
prev + folio_nr_pages(previous folio) since we don't know how many pages
are in a folio.

At index i == 0 the check is guaranteed to succeed, so to workaround
indexing bounds we can simply ignore the check for that specific index.
This makes the initial assignment of prev trivial, so I removed that as
well.

Also modify a comment in commit_checkpoint for consistency.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:40 +0000 (13:14 -0800)]

f2fs: convert last_fsync_dnode() to use filemap_get_folios_tag()

Convert to use a folio_batch instead of pagevec. This is in preparation
for the removal of find_get_pages_range_tag().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:39 +0000 (13:14 -0800)]

f2fs: convert f2fs_write_cache_pages() to use filemap_get_folios_tag()

Convert the function to use a folio_batch instead of pagevec.  This is in
preparation for the removal of find_get_pages_range_tag().

Also modified f2fs_all_cluster_page_ready to take in a folio_batch instead
of pagevec.  This does NOT support large folios.  The function currently
only utilizes folios of size 1 so this shouldn't cause any issues right
now.

This version of the patch limits the number of pages fetched to
F2FS_ONSTACK_PAGES.  If that ever happens, update the start index here
since filemap_get_folios_tag() updates the index to be after the last
found folio, not necessarily the last used page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:38 +0000 (13:14 -0800)]

f2fs: convert f2fs_sync_node_pages() to use filemap_get_folios_tag()

Convert function to use a folio_batch instead of pagevec. This is in
preparation for the removal of find_get_pages_range_tag().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Vishal Moola (Oracle) [Wed, 4 Jan 2023 21:14:37 +0000 (13:14 -0800)]

f2fs: convert f2fs_flush_inline_data() to use filemap_get_folios_tag()

Convert function to use a folio_batch instead of pagevec. This is in
preparation for the removal of find_get_pages_tag().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Empty description

RSS Atom

This page took 0.150805 seconds and 4 git commands to generate.