Git Repo - linux.git/log

]> Git Repo - linux.git/log

projects / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Peter Xu [Fri, 5 Nov 2021 20:38:24 +0000 (13:38 -0700)]

mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte

Patch series "mm: A few cleanup patches around zap, shmem and uffd", v4.

IMHO all of them are very nice cleanups to existing code already,
they're all small and self-contained.  They'll be needed by uffd-wp
coming series.

This patch (of 4):

It was conditionally done previously, as there's one shmem special case
that we use SetPageDirty() instead.  However that's not necessary and it
should be easier and cleaner to do it unconditionally in
mfill_atomic_install_pte().

The most recent discussion about this is here, where Hugh explained the
history of SetPageDirty() and why it's possible that it's not required
at all:

https://lore.kernel.org/lkml/alpine.LSU.2.11.2104121657050 [email protected]/

Currently mfill_atomic_install_pte() has three callers:

        1. shmem_mfill_atomic_pte
        2. mcopy_atomic_pte
        3. mcontinue_atomic_pte

After the change: case (1) should have its SetPageDirty replaced by the
dirty bit on pte (so we unify them together, finally), case (2) should
have no functional change at all as it has page_in_cache==false, case
(3) may add a dirty bit to the pte.  However since case (3) is
UFFDIO_CONTINUE for shmem, it's merely 100% sure the page is dirty after
all because UFFDIO_CONTINUE normally requires another process to modify
the page cache and kick the faulted thread, so should not make a real
difference either.

This should make it much easier to follow on which case will set dirty
for uffd, as we'll simply set it all now for all uffd related ioctls.
Meanwhile, no special handling of SetPageDirty() if there's no need.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Amit Daniel Kachhap [Fri, 5 Nov 2021 20:38:18 +0000 (13:38 -0700)]

mm/memory.c: avoid unnecessary kernel/user pointer conversion

Annotating a pointer from __user to kernel and then back again might
confuse sparse. In copy_huge_page_from_user() it can be avoided by
removing the intermediate variable since it is never used.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Amit Daniel Kachhap <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Rolf Eike Beer [Fri, 5 Nov 2021 20:38:15 +0000 (13:38 -0700)]

mm: use __pfn_to_section() instead of open coding it

It is defined in the same file just a few lines above.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rolf Eike Beer <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peng Liu [Fri, 5 Nov 2021 20:38:12 +0000 (13:38 -0700)]

mm/mmap.c: fix a data race of mm->total_vm

The variable mm->total_vm could be accessed concurrently during mmaping
and system accounting as noticed by KCSAN,

  BUG: KCSAN: data-race in __acct_update_integrals / mmap_region

  read-write to 0xffffa40267bd14c8 of 8 bytes by task 15609 on cpu 3:
   mmap_region+0x6dc/0x1400
   do_mmap+0x794/0xca0
   vm_mmap_pgoff+0xdf/0x150
   ksys_mmap_pgoff+0xe1/0x380
   do_syscall_64+0x37/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  read to 0xffffa40267bd14c8 of 8 bytes by interrupt on cpu 2:
   __acct_update_integrals+0x187/0x1d0
   acct_account_cputime+0x3c/0x40
   update_process_times+0x5c/0x150
   tick_sched_timer+0x184/0x210
   __run_hrtimer+0x119/0x3b0
   hrtimer_interrupt+0x350/0xaa0
   __sysvec_apic_timer_interrupt+0x7b/0x220
   asm_call_irq_on_stack+0x12/0x20
   sysvec_apic_timer_interrupt+0x4d/0x80
   asm_sysvec_apic_timer_interrupt+0x12/0x20
   smp_call_function_single+0x192/0x2b0
   perf_install_in_context+0x29b/0x4a0
   __se_sys_perf_event_open+0x1a98/0x2550
   __x64_sys_perf_event_open+0x63/0x70
   do_syscall_64+0x37/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Reported by Kernel Concurrency Sanitizer on:
  CPU: 2 PID: 15610 Comm: syz-executor.3 Not tainted 5.10.0+ #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  Ubuntu-1.8.2-1ubuntu1 04/01/2014

In vm_stat_account which called by mmap_region, increase total_vm, and
__acct_update_integrals may read total_vm at the same time.  This will
cause a data race which lead to undefined behaviour.  To avoid potential
bad read/write, volatile property and barrier are both used to avoid
undefined behaviour.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Vasily Averin [Fri, 5 Nov 2021 20:38:09 +0000 (13:38 -0700)]

memcg: prohibit unconditional exceeding the limit of dying tasks

Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit.  It is assumed that the amount of the memory charged by those
tasks is bound and most of the memory will get released while the task
is exiting.  This is resembling a heuristic for the global OOM situation
when tasks get access to memory reserves.  There is no global memory
shortage at the memcg level so the memcg heuristic is more relieved.

The above assumption is overly optimistic though.  E.g.  vmalloc can
scale to really large requests and the heuristic would allow that.  We
used to have an early break in the vmalloc allocator for killed tasks
but this has been reverted by commit b8c8a338f75e ("Revert "vmalloc:
back off when the current task is killed"").  There are likely other
similar code paths which do not check for fatal signals in an
allocation&charge loop.  Also there are some kernel objects charged to a
memcg which are not bound to a process life time.

It has been observed that it is not really hard to trigger these
bypasses and cause global OOM situation.

One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves).  This
is certainly possible but it is not really clear how much of an excess
is desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.

This patch is addressing the problem by removing the heuristic
altogether.  Bypass is only allowed for requests which either cannot
fail or where the failure is not desirable while excess should be still
limited (e.g.  atomic requests).  Implementation wise a killed or dying
task fails to charge if it has passed the OOM killer stage.  That should
give all forms of reclaim chance to restore the limit before the failure
(ENOMEM) and tell the caller to back off.

In addition, this patch renames should_force_charge() helper to
task_is_dying() because now its use is not associated witch forced
charging.

This patch depends on pagefault_out_of_memory() to not trigger
out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
and cause a global OOM killer.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Michal Hocko [Fri, 5 Nov 2021 20:38:06 +0000 (13:38 -0700)]

mm, oom: do not trigger out_of_memory from the #PF

Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory.  This can happen for 2
different reasons.  a) Memcg is out of memory and we rely on
mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
normal allocation fails.

The latter is quite problematic because allocation paths already trigger
out_of_memory and the page allocator tries really hard to not fail
allocations.  Anyway, if the OOM killer has been already invoked there
is no reason to invoke it again from the #PF path.  Especially when the
OOM condition might be gone by that time and we have no way to find out
other than allocate.

Moreover if the allocation failed and the OOM killer hasn't been invoked
then we are unlikely to do the right thing from the #PF context because
we have already lost the allocation context and restictions and
therefore might oom kill a task from a different NUMA domain.

This all suggests that there is no legitimate reason to trigger
out_of_memory from pagefault_out_of_memory so drop it.  Just to be sure
that no #PF path returns with VM_FAULT_OOM without allocation print a
warning that this is happening before we restart the #PF.

[VvS: #PF allocation can hit into limit of cgroup v1 kmem controller.
This is a local problem related to memcg, however, it causes unnecessary
global OOM kills that are repeated over and over again and escalate into a
real disaster.  This has been broken since kmem accounting has been
introduced for cgroup v1 (3.8).  There was no kmem specific reclaim for
the separate limit so the only way to handle kmem hard limit was to return
with ENOMEM.  In upstream the problem will be fixed by removing the
outdated kmem limit, however stable and LTS kernels cannot do it and are
still affected.  This patch fixes the problem and should be backported
into stable/LTS.]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Vasily Averin [Fri, 5 Nov 2021 20:38:02 +0000 (13:38 -0700)]

mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks

Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3.

Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit.  It can be misused and allowed to trigger global OOM from inside
a memcg-limited container.  On the other hand if memcg fails allocation,
called from inside #PF handler it triggers global OOM from inside
pagefault_out_of_memory().

To prevent these problems this patchset:
(a) removes execution of out_of_memory() from
     pagefault_out_of_memory(), becasue nobody can explain why it is
     necessary.
(b) allow memcg to fail allocation of dying/killed tasks.

This patch (of 3):

Any allocation failure during the #PF path will return with VM_FAULT_OOM
which in turn results in pagefault_out_of_memory which in turn executes
out_out_memory() and can kill a random task.

An allocation might fail when the current task is the oom victim and
there are no memory reserves left.  The OOM killer is already handled at
the page allocator level for the global OOM and at the charging level
for the memcg one.  Both have much more information about the scope of
allocation/charge request.  This means that either the OOM killer has
been invoked properly and didn't lead to the allocation success or it
has been skipped because it couldn't have been invoked.  In both cases
triggering it from here is pointless and even harmful.

It makes much more sense to let the killed task die rather than to wake
up an eternally hungry oom-killer and send him to choose a fatter victim
for breakfast.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Fri, 5 Nov 2021 20:37:59 +0000 (13:37 -0700)]

mm: list_lru: only add memcg-aware lrus to the global lru list

The non-memcg-aware lru is always skiped when traversing the global lru
list, which is not efficient. We can only add the memcg-aware lru to
the global lru list instead to make traversing more efficient.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Fri, 5 Nov 2021 20:37:56 +0000 (13:37 -0700)]

mm: memcontrol: remove the kmem states

Now the kmem states is only used to indicate whether the kmem is
offline. However, we can set ->kmemcg_id to -1 to indicate whether the
kmem is offline. Finally, we can remove the kmem states to simplify the
code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Fri, 5 Nov 2021 20:37:53 +0000 (13:37 -0700)]

mm: memcontrol: remove kmemcg_id reparenting

Since slab objects and kmem pages are charged to object cgroup instead
of memory cgroup, memcg_reparent_objcgs() will reparent this cgroup and
all its descendants to its parent cgroup.  This already makes further
list_lru_add()'s add elements to the parent's list.  So it is
unnecessary to change kmemcg_id of an offline cgroup to its parent's id.
It just wastes CPU cycles.  Just remove the redundant code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Fri, 5 Nov 2021 20:37:50 +0000 (13:37 -0700)]

mm: list_lru: fix the return value of list_lru_count_one()

Since commit 2788cf0c401c ("memcg: reparent list_lrus and free kmemcg_id
on css offline"), ->nr_items can be negative during memory cgroup
reparenting.  In this case, list_lru_count_one() will return an unusual
and huge value, which can surprise users.  At least for now it hasn't
affected any users.  But it is better to let list_lru_count_ont()
returns zero when ->nr_items is negative.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Fri, 5 Nov 2021 20:37:47 +0000 (13:37 -0700)]

mm: list_lru: remove holding lru lock

Since commit e5bc3af7734f ("rcu: Consolidate PREEMPT and !PREEMPT
synchronize_rcu()"), the critical section of spin lock can serve as an
RCU read-side critical section which already allows readers that hold
nlru->lock to avoid taking rcu lock. So just remove holding lock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Shakeel Butt [Fri, 5 Nov 2021 20:37:44 +0000 (13:37 -0700)]

memcg, kmem: further deprecate kmem.limit_in_bytes

The deprecation process of kmem.limit_in_bytes started with the commit
0158115f702 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also
explains in detail the motivation behind the deprecation.  To summarize,
it is the unexpected behavior on hitting the kmem limit.  This patch
moves the deprecation process to the next stage by disallowing to set
the kmem limit.  In future we might just remove the kmem.limit_in_bytes
file completely.

[[email protected]: s/ENOTSUPP/EOPNOTSUPP/]
[[email protected]: mark cancel_charge() inline]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Vasily Averin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Len Baker [Fri, 5 Nov 2021 20:37:40 +0000 (13:37 -0700)]

mm/list_lru.c: prefer struct_size over open coded arithmetic

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing.

This could lead to values wrapping around and a smaller allocation being
made than the caller was expecting. Using those allocations could lead
to linear overflows of heap memory and other misbehaviors.

So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kvmalloc() functions.

Also, take the opportunity to refactor the memcpy() call to use the
flex_array_size() helper.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Len Baker <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Waiman Long [Fri, 5 Nov 2021 20:37:37 +0000 (13:37 -0700)]

mm/memcg: remove obsolete memcg_free_kmem()

Since commit d648bcc7fe65 ("mm: kmem: make memcg_kmem_enabled()
irreversible"), the only thing memcg_free_kmem() does is to call
memcg_offline_kmem() when the memcg is still online which can happen
when online_css() fails due to -ENOMEM.

However, the name memcg_free_kmem() is confusing and it is more clear
and straight forward to call memcg_offline_kmem() directly from
mem_cgroup_css_free().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Suggested-by: Roman Gushchin <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Shakeel Butt [Fri, 5 Nov 2021 20:37:34 +0000 (13:37 -0700)]

memcg: unify memcg stat flushing

The memcg stats can be flushed in multiple context and potentially in
parallel too.  For example multiple parallel user space readers for
memcg stats will contend on the rstat locks with each other.  There is
no need for that.  We just need one flusher and everyone else can
benefit.

In addition after aa48e47e3906 ("memcg: infrastructure to flush memcg
stats") the kernel periodically flush the memcg stats from the root, so,
the other flushers will potentially have much less work to do.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Michal Koutný" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Shakeel Butt [Fri, 5 Nov 2021 20:37:31 +0000 (13:37 -0700)]

memcg: flush stats only if updated

At the moment, the kernel flushes the memcg stats on every refault and
also on every reclaim iteration.  Although rstat maintains per-cpu
update tree but on the flush the kernel still has to go through all the
cpu rstat update tree to check if there is anything to flush.  This
patch adds the tracking on the stats update side to make flush side more
clever by skipping the flush if there is no update.

The stats update codepath is very sensitive performance wise for many
workloads and benchmarks.  So, we can not follow what the commit
aa48e47e3906 ("memcg: infrastructure to flush memcg stats") did which
was triggering async flush through queue_work() and caused a lot
performance regression reports.  That got reverted by the commit
1f828223b799 ("memcg: flush lruvec stats in the refault").

In this patch we kept the stats update codepath very minimal and let the
stats reader side to flush the stats only when the updates are over a
specific threshold.  For now the threshold is (nr_cpus * CHARGE_BATCH).

To evaluate the impact of this patch, an 8 GiB tmpfs file is created on
a system with swap-on-zram and the file was pushed to swap through
memory.force_empty interface.  On reading the whole file, the memcg stat
flush in the refault code path is triggered.  With this patch, we
observed 63% reduction in the read time of 8 GiB file.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Reviewed-by: "Michal Koutný" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peter Xu [Fri, 5 Nov 2021 20:37:28 +0000 (13:37 -0700)]

mm/memcg: drop swp_entry_t* in mc_handle_file_pte()

It is unused after the rework of commit f5df8635c5a3 ("mm: use
find_get_incore_page in memcontrol").

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 5 Nov 2021 20:37:25 +0000 (13:37 -0700)]

mm: optimise put_pages_list()

Instead of calling put_page() one page at a time, pop pages off the list
if their refcount was too high and pass the remainder to
put_unref_page_list(). This should be a speed improvement, but I have
no measurements to support that. Current callers do not care about
performance, but I hope to add some which do.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Anthony Yznaga <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Rafael Aquini [Fri, 5 Nov 2021 20:37:22 +0000 (13:37 -0700)]

mm/swapfile: fix an integer overflow in swap_show()

This one is just a minor nuisance for people going through /proc/swaps
if any of their swapareas is bigger than, or equal to 1073741824 pages
(4TB).

seq_printf() format string casts as uint the conversion from pages to
KB, and that will overflow in the aforementioned case.

Albeit being almost unthinkable that someone would actually set up such
big of a single swaparea, there is a ticket recently filed against RHEL:
https://bugzilla.redhat.com/show_bug.cgi?id=2008812

Given that all other codesites that use format strings for the same swap
pages-to-KB conversion do cast it as ulong, this patch just follows
suit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Xu Wang [Fri, 5 Nov 2021 20:37:19 +0000 (13:37 -0700)]

mm/swapfile: remove needless request_queue NULL pointer check

The request_queue pointer returned from bdev_get_queue() shall never be
NULL, so the null check is unnecessary, just remove it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xu Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

John Hubbard [Fri, 5 Nov 2021 20:37:16 +0000 (13:37 -0700)]

mm/gup: further simplify __gup_device_huge()

Commit 6401c4eb57f9 ("mm: gup: fix potential pgmap refcnt leak in
__gup_device_huge()") simplified the return paths, but didn't go quite
far enough, as discussed in [1].

Remove the "ret" variable entirely, because there is enough information
already available to provide the return value.

[1] https://lore.kernel.org/r/CAHk-=wgQTRX=5SkCmS+zfmpqubGHGJvXX_HgnPG8JSpHKHBMeg@mail.gmail.com

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jens Axboe [Fri, 5 Nov 2021 20:37:13 +0000 (13:37 -0700)]

mm: move more expensive part of XA setup out of mapping check

The fast path here is not needing any writeback, yet we spend time
setting up the xarray lookup data upfront. Move the part that actually
needs to iterate the address space mapping into a separate helper,
saving ~30% of the time here.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 5 Nov 2021 20:37:10 +0000 (13:37 -0700)]

mm/filemap.c: remove bogus VM_BUG_ON

It is not safe to check page->index without holding the page lock. It
can be changed if the page is moved between the swap cache and the page
cache for a shmem file, for example. There is a VM_BUG_ON below which
checks page->index is correct after taking the page lock.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jens Axboe [Fri, 5 Nov 2021 20:37:07 +0000 (13:37 -0700)]

mm: don't read i_size of inode unless we need it

We always go through i_size_read(), and we rarely end up needing it.
Push the read to down where we need to check it, which avoids it for
most cases.

It looks like we can even remove this check entirely, which might be
worth pursuing. But at least this takes it out of the hot path.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Acked-by: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:37:04 +0000 (13:37 -0700)]

mm: simplify bdi refcounting

Move grabbing and releasing the bdi refcount out of the common
wb_init/wb_exit helpers into code that is only used for the non-default
memcg driven bdi_writeback structures.

[[email protected]: add comment]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix typo]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:37:01 +0000 (13:37 -0700)]

mm: don't automatically unregister bdis

All BDI users now unregister explicitly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:36:58 +0000 (13:36 -0700)]

fs: explicitly unregister per-superblock BDIs

Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi
associated with them, and unregister it when the superblock is shut
down.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:36:55 +0000 (13:36 -0700)]

mtd: call bdi_unregister explicitly

Call bdi_unregister explicitly instead of relying on the automatic
unregistration.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:36:52 +0000 (13:36 -0700)]

mm: export bdi_unregister

Patch series "simplify bdi unregistation".

This series simplifies the BDI code to get rid of the magic
auto-unregister feature that hid a recent block layer refcounting bug.

This patch (of 5):

To wind down the magic auto-unregister semantics we'll need to push this
into modular code.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

David Howells [Fri, 5 Nov 2021 20:36:49 +0000 (13:36 -0700)]

mm: stop filemap_read() from grabbing a superfluous page

Under some circumstances, filemap_read() will allocate sufficient pages
to read to the end of the file, call readahead/readpages on them and
copy the data over - and then it will allocate another page at the EOF
and call readpage on that and then ignore it. This is unnecessary and a
waste of time and resources.

filemap_read() *does* check for this, but only after it has already done
the allocation and I/O. Fix this by checking before calling
filemap_get_pages() also.

Link: https://lkml.kernel.org/r/163472463105.3126792.7056099385135786492.stgit@warthog.procyon.org.uk
Link: https://lore.kernel.org/r/160588481358.3465195.16552616179674485179.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163456863216.2614702.6384850026368833133.stgit@warthog.procyon.org.uk/
Signed-off-by: David Howells <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Yinan Zhang [Fri, 5 Nov 2021 20:36:46 +0000 (13:36 -0700)]

mm/page_ext.c: fix a comment

I have noticed that the previous macro is #ifndef CONFIG_SPARSEMEM. I
think the comment of #else should be CONFIG_SPARSEMEM.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yinan Zhang <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:42 +0000 (13:36 -0700)]

percpu: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
appropriate percpu allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.

Note that due to the implementation of the percpu API, this is unlikely
to ever actually provide compile-time checking beyond very simple
non-SMP builds. But, since they are technically allocators, mark them
as such.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:38 +0000 (13:36 -0700)]

mm/page_alloc: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
appropriate page allocator interfaces, to provide additional hinting for
better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:34 +0000 (13:36 -0700)]

mm/vmalloc: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
appropriate vmalloc allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:31 +0000 (13:36 -0700)]

mm/kvmalloc: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
regular kvmalloc interfaces, to provide additional hinting for better
bounds checking, assisting CONFIG_FORTIFY_SOURCE and other compiler
optimizations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:27 +0000 (13:36 -0700)]

slab: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
regular kmalloc interfaces, to provide additional hinting for better
bounds checking, assisting CONFIG_FORTIFY_SOURCE and other compiler
optimizations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Daniel Micay <[email protected]>
Signed-off-by: Daniel Micay <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:23 +0000 (13:36 -0700)]

slab: clean up function prototypes

Based on feedback from Joe Perches and Linus Torvalds, regularize the
slab function prototypes before making attribute changes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Daniel Micay <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:19 +0000 (13:36 -0700)]

Compiler Attributes: add __alloc_size() for better bounds checking

GCC and Clang can use the "alloc_size" attribute to better inform the
results of __builtin_object_size() (for compile-time constant values).
Clang can additionally use alloc_size to inform the results of
__builtin_dynamic_object_size() (for run-time values).

Because GCC sees the frequent use of struct_size() as an allocator size
argument, and notices it can return SIZE_MAX (the overflow indication),
it complains about these call sites overflowing (since SIZE_MAX is
greater than the default -Walloc-size-larger-than=PTRDIFF_MAX).  This
isn't helpful since we already know a SIZE_MAX will be caught at
run-time (this was an intentional design).  To deal with this, we must
disable this check as it is both a false positive and redundant.  (Clang
does not have this warning option.)

Unfortunately, just checking the -Wno-alloc-size-larger-than is not
sufficient to make the __alloc_size attribute behave correctly under
older GCC versions.  The attribute itself must be disabled in those
situations too, as there appears to be no way to reliably silence the
SIZE_MAX constant expression cases for GCC versions less than 9.1:

   In file included from ./include/linux/resource_ext.h:11,
                    from ./include/linux/pci.h:40,
                    from drivers/net/ethernet/intel/ixgbe/ixgbe.h:9,
                    from drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c:4:
   In function 'kmalloc_node',
       inlined from 'ixgbe_alloc_q_vector' at ./include/linux/slab.h:743:9:
   ./include/linux/slab.h:618:9: error: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
     return __kmalloc_node(size, flags, node);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ./include/linux/slab.h: In function 'ixgbe_alloc_q_vector':
   ./include/linux/slab.h:455:7: note: in a call to allocation function '__kmalloc_node' declared here
    void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_slab_alignment __malloc;
          ^~~~~~~~~~~~~~

Specifically:
'-Wno-alloc-size-larger-than' is not correctly handled by GCC < 9.1
    https://godbolt.org/z/hqsfG7q84 (doesn't disable)
    https://godbolt.org/z/P9jdrPTYh (doesn't admit to not knowing about option)
    https://godbolt.org/z/465TPMWKb (only warns when other warnings appear)

'-Walloc-size-larger-than=18446744073709551615' is not handled by GCC < 8.2
    https://godbolt.org/z/73hh1EPxz (ignores numeric value)

Since anything marked with __alloc_size would also qualify for marking
with __malloc, just include __malloc along with it to avoid redundant
markings.  (Suggested by Linus Torvalds.)

Finally, make sure checkpatch.pl doesn't get confused about finding the
__alloc_size attribute on functions.  (Thanks to Joe Perches.)

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Micay <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Souptick Joarder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:15 +0000 (13:36 -0700)]

rapidio: avoid bogus __alloc_size warning

Patch series "Add __alloc_size()", v3.

GCC and Clang both use the "alloc_size" attribute to assist with bounds
checking around the use of allocation functions.  Add the attribute,
adjust the Makefile to silence needless warnings, and add the hints to
the allocators where possible.  These changes have been in use for a
while now in GrapheneOS.

This patch (of 8):

After adding __alloc_size attributes to the allocators, GCC 9.3 (but not
later) may incorrectly evaluate the arguments to check_copy_size(),
getting seemingly confused by the size being returned from array_size().
Instead, perform the calculation once, which both makes the code more
readable and avoids the bug in GCC.

   In file included from arch/x86/include/asm/preempt.h:7,
                    from include/linux/preempt.h:78,
                    from include/linux/spinlock.h:55,
                    from include/linux/mm_types.h:9,
                    from include/linux/buildid.h:5,
                    from include/linux/module.h:14,
                    from drivers/rapidio/devices/rio_mport_cdev.c:13:
   In function 'check_copy_size',
       inlined from 'copy_from_user' at include/linux/uaccess.h:191:6,
       inlined from 'rio_mport_transfer_ioctl' at drivers/rapidio/devices/rio_mport_cdev.c:983:6:
   include/linux/thread_info.h:213:4: error: call to '__bad_copy_to' declared with attribute error: copy destination size is too small
     213 |    __bad_copy_to();
         |    ^~~~~~~~~~~~~~~

But the allocation size and the copy size are identical:

transfer = vmalloc(array_size(sizeof(*transfer), transaction.count));
if (!transfer)
return -ENOMEM;

if (unlikely(copy_from_user(transfer,
    (void __user *)(uintptr_t)transaction.block,
    array_size(sizeof(*transfer), transaction.count)))) {

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Cc: Jing Xiangfeng <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Micay <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dwaipayan Ray <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kees Cook [Fri, 5 Nov 2021 20:36:12 +0000 (13:36 -0700)]

kasan: test: bypass __alloc_size checks

Intentional overflows, as performed by the KASAN tests, are detected at
compile time[1] (instead of only at run-time) with the addition of
__alloc_size. Fix this by forcing the compiler into not being able to
trust the size used following the kmalloc()s.

[1] https://lore.kernel.org/lkml/20211005184717.65c6d8eb39350395e387b71f@linux-foundation.org

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Guo Ren [Fri, 5 Nov 2021 20:36:09 +0000 (13:36 -0700)]

mm: debug_vm_pgtable: don't use __P000 directly

The __Pxxx/__Sxxx macros are only for protection_map[] init. All usage
of them in linux should come from protection_map array.

Because a lot of architectures would re-initilize protection_map[]
array, eg: x86-mem_encrypt, m68k-motorola, mips, arm, sparc.

Using __P000 is not rigorous.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peter Xu [Fri, 5 Nov 2021 20:36:05 +0000 (13:36 -0700)]

mm/smaps: simplify shmem handling of pte holes

Firstly, check_shmem_swap variable is actually not necessary, because
it's always set with pte_hole hook; checking each would work.

Meanwhile, the check within smaps_pte_entry is not easy to follow.
E.g., pte_none() check is not needed as "!pte_present && !is_swap_pte"
is the same. Since at it, use the pte_hole() helper rather than dup the
page cache lookup.

Still keep the CONFIG_SHMEM part so the code can be optimized to nop for
!SHMEM.

There will be a very slight functional change in smaps_pte_entry(), that
for !SHMEM we'll return early for pte_none (before checking page==NULL),
but that's even nicer.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peter Xu [Fri, 5 Nov 2021 20:36:02 +0000 (13:36 -0700)]

mm/smaps: use vma->vm_pgoff directly when counting partial swap

As it's trying to cover the whole vma anyways, use direct vm_pgoff value
and vma_pages() rather than linear_page_index.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peter Xu [Fri, 5 Nov 2021 20:35:59 +0000 (13:35 -0700)]

mm/smaps: fix shmem pte hole swap calculation

Patch series "mm/smaps: Fixes and optimizations on shmem swap handling".

This patch (of 3):

The shmem swap calculation on the privately writable mappings are using
wrong parameters as spotted by Vlastimil.  Fix them.  This was
introduced in commit 48131e03ca4e ("mm, proc: reduce cost of
/proc/pid/smaps for unpopulated shmem mappings"), when shmem_swap_usage
was reworked to shmem_partial_swap_usage.

Test program:

  void main(void)
  {
      char *buffer, *p;
      int i, fd;

      fd = memfd_create("test", 0);
      assert(fd > 0);

      /* isize==2M*3, fill in pages, swap them out */
      ftruncate(fd, SIZE_2M * 3);
      buffer = mmap(NULL, SIZE_2M * 3, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      assert(buffer);
      for (i = 0, p = buffer; i < SIZE_2M * 3 / 4096; i++) {
          *p = 1;
          p += 4096;
      }
      madvise(buffer, SIZE_2M * 3, MADV_PAGEOUT);
      munmap(buffer, SIZE_2M * 3);

      /*
       * Remap with private+writtable mappings on partial of the inode (<= 2M*3),
       * while the size must also be >= 2M*2 to make sure there's a none pmd so
       * smaps_pte_hole will be triggered.
       */
      buffer = mmap(NULL, SIZE_2M * 2, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
      printf("pid=%d, buffer=%p\n", getpid(), buffer);

      /* Check /proc/$PID/smap_rollup, should see 4MB swap */
      sleep(1000000);
  }

Before the patch, smaps_rollup shows <4MB swap and the number will be
random depending on the alignment of the buffer of mmap() allocated.
After this patch, it'll show 4MB.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 48131e03ca4e ("mm, proc: reduce cost of /proc/pid/smaps for unpopulated shmem mappings")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Peter Collingbourne [Fri, 5 Nov 2021 20:35:56 +0000 (13:35 -0700)]

kasan: test: add memcpy test that avoids out-of-bounds write

With HW tag-based KASAN, error checks are performed implicitly by the
load and store instructions in the memcpy implementation.  A failed
check results in tag checks being disabled and execution will keep
going.  As a result, under HW tag-based KASAN, prior to commit
1b0668be62cf ("kasan: test: disable kmalloc_memmove_invalid_size for
HW_TAGS"), this memcpy would end up corrupting memory until it hits an
inaccessible page and causes a kernel panic.

This is a pre-existing issue that was revealed by commit 285133040e6c
("arm64: Import latest memcpy()/memmove() implementation") which changed
the memcpy implementation from using signed comparisons (incorrectly,
resulting in the memcpy being terminated early for negative sizes) to
using unsigned comparisons.

It is unclear how this could be handled by memcpy itself in a reasonable
way.  One possibility would be to add an exception handler that would
force memcpy to return if a tag check fault is detected -- this would
make the behavior roughly similar to generic and SW tag-based KASAN.
However, this wouldn't solve the problem for asynchronous mode and also
makes memcpy behavior inconsistent with manually copying data.

This test was added as a part of a series that taught KASAN to detect
negative sizes in memory operations, see commit 8cceeff48f23 ("kasan:
detect negative size in memory operation function").  Therefore we
should keep testing for negative sizes with generic and SW tag-based
KASAN.  But there is some value in testing small memcpy overflows, so
let's add another test with memcpy that does not destabilize the kernel
by performing out-of-bounds writes, and run it in all modes.

Link: https://linux-review.googlesource.com/id/I048d1e6a9aff766c4a53f989fb0c83de68923882
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Collingbourne <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Evgenii Stepanov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 5 Nov 2021 20:35:53 +0000 (13:35 -0700)]

kasan: fix tag for large allocations when using CONFIG_SLAB

If an object is allocated on a tail page of a multi-page slab, kasan
will get the wrong tag because page->s_mem is NULL for tail pages. I'm
not quite sure what the user-visible effect of this might be.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:50 +0000 (13:35 -0700)]

workqueue, kasan: avoid alloc_pages() when recording stack

Shuah Khan reported:

| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
|     kblockd_mod_delayed_work_on()
|       mod_delayed_work_on()
|         __queue_delayed_work()
|           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
|             insert_work()
|               kasan_record_aux_stack()
|                 kasan_save_stack()
|                   stack_depot_save()
|                     alloc_pages()
|                       __alloc_pages()
|                         get_page_from_freelist()
|                           rm_queue()
|                             rm_queue_pcplist()
|                               local_lock_irqsave(&pagesets.lock, flags);
|                               [ BUG: Invalid wait context triggered ]

The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
nor GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and commmit ab00db216c9c7).

Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using
kasan_record_aux_stack_noalloc().

While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit).

For frequent calls from the same location, it therefore becomes
extremely unlikely that kasan_record_aux_stack_noalloc() fails.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reported-by: Shuah Khan <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:46 +0000 (13:35 -0700)]

kasan: generic: introduce kasan_record_aux_stack_noalloc()

Introduce a variant of kasan_record_aux_stack() that does not do any
memory allocation through stackdepot. This will permit using it in
contexts that cannot allocate any memory.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:43 +0000 (13:35 -0700)]

kasan: common: provide can_alloc in kasan_save_stack()

Add another argument, can_alloc, to kasan_save_stack() which is passed
as-is to __stack_depot_save().

No functional change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:39 +0000 (13:35 -0700)]

lib/stackdepot: introduce __stack_depot_save()

Add __stack_depot_save(), which provides more fine-grained control over
stackdepot's memory allocation behaviour, in case stackdepot runs out of
"stack slabs".

Normally stackdepot uses alloc_pages() in case it runs out of space;
passing can_alloc==false to __stack_depot_save() prohibits this, at the
cost of more likely failure to record a stack trace.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:36 +0000 (13:35 -0700)]

lib/stackdepot: remove unused function argument

alloc_flags in depot_alloc_stack() is no longer used; remove it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Marco Elver [Fri, 5 Nov 2021 20:35:33 +0000 (13:35 -0700)]

lib/stackdepot: include gfp.h

Patch series "stackdepot, kasan, workqueue: Avoid expanding stackdepot
slabs when holding raw_spin_lock", v2.

Shuah Khan reported [1]:

| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
|     kblockd_mod_delayed_work_on()
|       mod_delayed_work_on()
|         __queue_delayed_work()
|           __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
|             insert_work()
|               kasan_record_aux_stack()
|                 kasan_save_stack()
|                   stack_depot_save()
|                     alloc_pages()
|                       __alloc_pages()
|                         get_page_from_freelist()
|                           rm_queue()
|                             rm_queue_pcplist()
|                               local_lock_irqsave(&pagesets.lock, flags);
|                               [ BUG: Invalid wait context triggered ]

PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
rules are being violated.  More generally, memory is being allocated
from a non-preemptive context (raw_spin_lock'd c-s) where it is not
allowed.

To properly fix this, we must prevent stackdepot from replenishing its
"stack slab" pool if memory allocations cannot be done in the current
context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts, including raw_spin_locks (see gfp.h and commit
ab00db216c9c7).

The only downside is that saving a stack trace may fail if: stackdepot
runs out of space AND the same stack trace has not been recorded before.
I expect this to be unlikely, and a simple experiment (boot the kernel)
didn't result in any failure to record stack trace from insert_work().

The series includes a few minor fixes to stackdepot that I noticed in
preparing the series.  It then introduces __stack_depot_save(), which
exposes the option to force stackdepot to not allocate any memory.
Finally, KASAN is changed to use the new stackdepot interface and
provide kasan_record_aux_stack_noalloc(), which is then used by
workqueue code.

[1] https://lkml.kernel.org/r/20210902200134 [email protected]

This patch (of 6):

<linux/stackdepot.h> refers to gfp_t, but doesn't include gfp.h.

Fix it by including <linux/gfp.h>.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Acked-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Walter Wu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Fri, 5 Nov 2021 20:35:30 +0000 (13:35 -0700)]

mm: don't include <linux/dax.h> in <linux/mempolicy.h>

Not required at all, and having this causes a huge kernel rebuild as
soon as something in dax.h changes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Sebastian Andrzej Siewior [Fri, 5 Nov 2021 20:35:27 +0000 (13:35 -0700)]

mm: disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT

TRANSPARENT_HUGEPAGE:
  There are potential non-deterministic delays to an RT thread if a
  critical memory region is not THP-aligned and a non-RT buffer is
  located in the same hugepage-aligned region. It's also possible for an
  unrelated thread to migrate pages belonging to an RT task incurring
  unexpected page faults due to memory defragmentation even if
  khugepaged is disabled.

Regular HUGEPAGEs are not affected by this can be used.

NUMA_BALANCING:
  There is a non-deterministic delay to mark PTEs PROT_NONE to gather
  NUMA fault samples, increased page faults of regions even if mlocked
  and non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description].

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Hyeonggon Yoo [Fri, 5 Nov 2021 20:35:24 +0000 (13:35 -0700)]

mm, slub: use prefetchw instead of prefetch

Commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
slab_alloc()") introduced prefetch_freepointer() because when other
cpu(s) freed objects into a page that current cpu owns, the freelist
link is hot on cpu(s) which freed objects and possibly very cold on
current cpu.

But if freelist link chain is hot on cpu(s) which freed objects, it's
better to invalidate that chain because they're not going to access
again within a short time.

So use prefetchw instead of prefetch.  On supported architectures like
x86 and arm, it invalidates other copied instances of a cache line when
prefetching it.

Before:

Time: 91.677

Performance counter stats for 'hackbench -g 100 -l 10000':
        1462938.07 msec cpu-clock                 #   15.908 CPUs utilized
          18072550      context-switches          #   12.354 K/sec
           1018814      cpu-migrations            #  696.416 /sec
            104558      page-faults               #   71.471 /sec
     1580035699271      cycles                    #    1.080 GHz                      (54.51%)
     2003670016013      instructions              #    1.27  insn per cycle           (54.31%)
        5702204863      branch-misses                                                 (54.28%)
      643368500985      cache-references          #  439.778 M/sec                    (54.26%)
       18475582235      cache-misses              #    2.872 % of all cache refs      (54.28%)
      642206796636      L1-dcache-loads           #  438.984 M/sec                    (46.87%)
       18215813147      L1-dcache-load-misses     #    2.84% of all L1-dcache accesses  (46.83%)
      653842996501      dTLB-loads                #  446.938 M/sec                    (46.63%)
        3227179675      dTLB-load-misses          #    0.49% of all dTLB cache accesses  (46.85%)
      537531951350      iTLB-loads                #  367.433 M/sec                    (54.33%)
         114750630      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.37%)
      630135543177      L1-icache-loads           #  430.733 M/sec                    (46.80%)
       22923237620      L1-icache-load-misses     #    3.64% of all L1-icache accesses  (46.76%)

      91.964452802 seconds time elapsed

      43.416742000 seconds user
    1422.441123000 seconds sys

After:

Time: 90.220

Performance counter stats for 'hackbench -g 100 -l 10000':
        1437418.48 msec cpu-clock                 #   15.880 CPUs utilized
          17694068      context-switches          #   12.310 K/sec
            958257      cpu-migrations            #  666.651 /sec
            100604      page-faults               #   69.989 /sec
     1583259429428      cycles                    #    1.101 GHz                      (54.57%)
     2004002484935      instructions              #    1.27  insn per cycle           (54.37%)
        5594202389      branch-misses                                                 (54.36%)
      643113574524      cache-references          #  447.409 M/sec                    (54.39%)
       18233791870      cache-misses              #    2.835 % of all cache refs      (54.37%)
      640205852062      L1-dcache-loads           #  445.386 M/sec                    (46.75%)
       17968160377      L1-dcache-load-misses     #    2.81% of all L1-dcache accesses  (46.79%)
      651747432274      dTLB-loads                #  453.415 M/sec                    (46.59%)
        3127124271      dTLB-load-misses          #    0.48% of all dTLB cache accesses  (46.75%)
      535395273064      iTLB-loads                #  372.470 M/sec                    (54.38%)
         113500056      iTLB-load-misses          #    0.02% of all iTLB cache accesses  (54.35%)
      628871845924      L1-icache-loads           #  437.501 M/sec                    (46.80%)
       22585641203      L1-icache-load-misses     #    3.59% of all L1-icache accesses  (46.79%)

      90.514819303 seconds time elapsed

      43.877656000 seconds user
    1397.176001000 seconds sys

Link: https://lkml.org/lkml/2021/10/8/598=20
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hyeonggon Yoo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Vlastimil Babka [Fri, 5 Nov 2021 20:35:20 +0000 (13:35 -0700)]

mm/slub: increase default cpu partial list sizes

The defaults are determined based on object size and can go up to 30 for
objects smaller than 256 bytes.  Before the previous patch changed the
accounting, this could have made cpu partial list contain up to 30
pages.  After that patch, only up to 2 pages with default allocation
order.

Very short lists limit the usefulness of the whole concept of cpu
partial lists, so this patch aims at a more reasonable default under the
new accounting.  The defaults are quadrupled, except for object size >=
PAGE_SIZE where it's doubled.  This makes the lists grow up to 10 pages
in practice.

A quick test of booting a kernel under virtme with 4GB RAM and 8 vcpus
shows the following slab memory usage after boot:

Before previous patch (using page->pobjects):
  Slab:              36732 kB
  SReclaimable:      14836 kB
  SUnreclaim:        21896 kB

After previous patch (using page->pages):
  Slab:              34720 kB
  SReclaimable:      13716 kB
  SUnreclaim:        21004 kB

After this patch (using page->pages, higher defaults):
  Slab:              35252 kB
  SReclaimable:      13944 kB
  SUnreclaim:        21308 kB

In the same setup, I also ran 5 times:

    hackbench -l 16000 -g 16

Differences in time were in the noise, we can compare slub stats as
given by slabinfo -r skbuff_head_cache (the other cache heavily used by
hackbench, kmalloc-cg-512 looks similar).  Negligible stats left out for
brevity.

Before previous patch (using page->pobjects):

  Objects: 1408, Memory Total:  401408 Used :  304128

  Slab Perf Counter       Alloc     Free %Al %Fr
  --------------------------------------------------
  Fastpath             469952498  5946606  91   1
  Slowpath             42053573 506059465   8  98
  Page Alloc              41093    41044   0   0
  Add partial                18 21229327   0   4
  Remove partial       20039522    36051   3   0
  Cpu partial list      4686640 24767229   0   4
  RemoteObj/SlabFrozen       16 124027841   0  24
  Total                512006071 512006071
  Flushes       18

  Slab Deactivation             Occurrences %
  -------------------------------------------------
  Slab empty                       4993    0%
  Deactivation bypass           24767229   99%
  Refilled from foreign frees   21972674   88%

After previous patch (using page->pages):

  Objects: 480, Memory Total:  131072 Used :  103680

  Slab Perf Counter       Alloc     Free %Al %Fr
  --------------------------------------------------
  Fastpath             473016294  5405653  92   1
  Slowpath             38989777 506600418   7  98
  Page Alloc              32717    32701   0   0
  Add partial                 3 22749164   0   4
  Remove partial       11371127    32474   2   0
  Cpu partial list     11686226 23090059   2   4
  RemoteObj/SlabFrozen        2 67541803   0  13
  Total                512006071 512006071
  Flushes        3

  Slab Deactivation             Occurrences %
  -------------------------------------------------
  Slab empty                        227    0%
  Deactivation bypass           23090059   99%
  Refilled from foreign frees   27585695  119%

After this patch (using page->pages, higher defaults):

  Objects: 896, Memory Total:  229376 Used :  193536

  Slab Perf Counter       Alloc     Free %Al %Fr
  --------------------------------------------------
  Fastpath             473799295  4980278  92   0
  Slowpath             38206776 507025793   7  99
  Page Alloc              32295    32267   0   0
  Add partial                11 23291143   0   4
  Remove partial        5815764    31278   1   0
  Cpu partial list     18119280 23967320   3   4
  RemoteObj/SlabFrozen       10 76974794   0  15
  Total                512006071 512006071
  Flushes       11

  Slab Deactivation             Occurrences %
  -------------------------------------------------
  Slab empty                        989    0%
  Deactivation bypass           23967320   99%
  Refilled from foreign frees   32358473  135%

As expected, memory usage dropped significantly with change of
accounting, increasing the defaults increased it, but not as much.  The
number of page allocation/frees dropped significantly with the new
accounting, but didn't increase with the higher defaults.
Interestingly, the number of fasthpath allocations increased, as well as
allocations from the cpu partial list, even though it's shorter.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Vlastimil Babka [Fri, 5 Nov 2021 20:35:17 +0000 (13:35 -0700)]

mm, slub: change percpu partial accounting from objects to pages

With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of
partial slabs that can be promoted to cpu slab when the previous one is
depleted, without accessing the shared partial list.  A slab can be
added to this list by 1) refill of an empty list from get_partial_node()
- once we really have to access the shared partial list, we acquire
multiple slabs to amortize the cost of locking, and 2) first free to a
previously full slab - instead of putting the slab on a shared partial
list, we can more cheaply freeze it and put it on the per-cpu list.

To control how large a percpu partial list can grow for a kmem cache,
set_cpu_partial() calculates a target number of free objects on each
cpu's percpu partial list, and this can be also set by the sysfs file
cpu_partial.

However, the tracking of actual number of objects is imprecise, in order
to limit overhead from cpu X freeing an objects to a slab on percpu
partial list of cpu Y.  Basically, the percpu partial slabs form a
single linked list, and when we add a new slab to the list with current
head "oldpage", we set in the struct page of the slab we're adding:

    page->pages = oldpage->pages + 1; // this is precise
    page->pobjects = oldpage->pobjects + (page->objects - page->inuse);
    page->next = oldpage;

Thus the real number of free objects in the slab (objects - inuse) is
only determined at the moment of adding the slab to the percpu partial
list, and further freeing doesn't update the pobjects counter nor
propagate it to the current list head.  As Jann reports [1], this can
easily lead to large inaccuracies, where the target number of objects
(up to 30 by default) can translate to the same number of (empty) slab
pages on the list.  In case 2) above, we put a slab with 1 free object
on the list, thus only increase page->pobjects by 1, even if there are
subsequent frees on the same slab.  Jann has noticed this in practice
and so did we [2] when investigating significant increase of kmemcg
usage after switching from SLAB to SLUB.

While this is no longer a problem in kmemcg context thanks to the
accounting rewrite in 5.9, the memory waste is still not ideal and it's
questionable whether it makes sense to perform free object count based
control when object counts can easily become so much inaccurate.  So
this patch converts the accounting to be based on number of pages only
(which is precise) and removes the page->pobjects field completely.
This is also ultimately simpler.

To retain the existing set_cpu_partial() heuristic, first calculate the
target number of objects as previously, but then convert it to target
number of pages by assuming the pages will be half-filled on average.
This assumption might obviously also be inaccurate in practice, but
cannot degrade to actual number of pages being equal to the target
number of objects.

We could also skip the intermediate step with target number of objects
and rewrite the heuristic in terms of pages.  However we still have the
sysfs file cpu_partial which uses number of objects and could break
existing users if it suddenly becomes number of pages, so this patch
doesn't do that.

In practice, after this patch the heuristics limit the size of percpu
partial list up to 2 pages.  In case of a reported regression (which
would mean some workload has benefited from the previous imprecise
object based counting), we can tune the heuristics to get a better
compromise within the new scheme, while still avoid the unexpectedly
long percpu partial lists.

[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
[2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/

==========
Evaluation
==========

Mel was kind enough to run v1 through mmtests machinery for netperf
(localhost) and hackbench and, for most significant results see below.
So there are some apparent regressions, especially with hackbench, which
I think ultimately boils down to having shorter percpu partial lists on
average and some benchmarks benefiting from longer ones.  Monitoring
slab usage also indicated less memory usage by slab.  Based on that, the
following patch will bump the defaults to allow longer percpu partial
lists than after this patch.

However the goal is certainly not such that we would limit the percpu
partial lists to 30 pages just because previously a specific alloc/free
pattern could lead to the limit of 30 objects translate to a limit to 30
pages - that would make little sense.  This is a correctness patch, and
if a workload benefits from larger lists, the sysfs tuning knobs are
still there to allow that.

Netperf

  2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM
  TCP-RR:
    hmean before 127045.79 after 121092.94 (-4.69%, worse)
    stddev before  2634.37 after   1254.08
  UDP-RR:
    hmean before 166985.45 after 160668.94 ( -3.78%, worse)
    stddev before 4059.69 after 1943.63

  2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM
  TCP-RR:
    hmean before 84173.25 after 76914.72 ( -8.62%, worse)
  UDP-RR:
    hmean before 93571.12 after 96428.69 ( 3.05%, better)
    stddev before 23118.54 after 16828.14

  2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM
  TCP-RR:
    hmean before 49984.92 after 48922.27 ( -2.13%, worse)
    stddev before 6248.15 after 4740.51
  UDP-RR:
    hmean before 61854.31 after 68761.81 ( 11.17%, better)
    stddev before 4093.54 after 5898.91

  other machines - within 2%

Hackbench

  (results before and after the patch, negative % means worse)

  2-socket AMD EPYC 7713 (64 cores, 128 threads per core), 256GB RAM
  hackbench-process-sockets
  Amean 1 0.5380 0.5583 ( -3.78%)
  Amean 4 0.7510 0.8150 ( -8.52%)
  Amean 7 0.7930 0.9533 ( -20.22%)
  Amean 12 0.7853 1.1313 ( -44.06%)
  Amean 21 1.1520 1.4993 ( -30.15%)
  Amean 30 1.6223 1.9237 ( -18.57%)
  Amean 48 2.6767 2.9903 ( -11.72%)
  Amean 79 4.0257 5.1150 ( -27.06%)
  Amean 110 5.5193 7.4720 ( -35.38%)
  Amean 141 7.2207 9.9840 ( -38.27%)
  Amean 172 8.4770 12.1963 ( -43.88%)
  Amean 203 9.6473 14.3137 ( -48.37%)
  Amean 234 11.3960 18.7917 ( -64.90%)
  Amean 265 13.9627 22.4607 ( -60.86%)
  Amean 296 14.9163 26.0483 ( -74.63%)

  hackbench-thread-sockets
  Amean 1 0.5597 0.5877 ( -5.00%)
  Amean 4 0.7913 0.8960 ( -13.23%)
  Amean 7 0.8190 1.0017 ( -22.30%)
  Amean 12 0.9560 1.1727 ( -22.66%)
  Amean 21 1.7587 1.5660 ( 10.96%)
  Amean 30 2.4477 1.9807 ( 19.08%)
  Amean 48 3.4573 3.0630 ( 11.41%)
  Amean 79 4.7903 5.1733 ( -8.00%)
  Amean 110 6.1370 7.4220 ( -20.94%)
  Amean 141 7.5777 9.2617 ( -22.22%)
  Amean 172 9.2280 11.0907 ( -20.18%)
  Amean 203 10.2793 13.3470 ( -29.84%)
  Amean 234 11.2410 17.1070 ( -52.18%)
  Amean 265 12.5970 23.3323 ( -85.22%)
  Amean 296 17.1540 24.2857 ( -41.57%)

  2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads
  per socket), 384GB RAM
  hackbench-process-sockets
  Amean 1 0.5760 0.4793 ( 16.78%)
  Amean 4 0.9430 0.9707 ( -2.93%)
  Amean 7 1.5517 1.8843 ( -21.44%)
  Amean 12 2.4903 2.7267 ( -9.49%)
  Amean 21 3.9560 4.2877 ( -8.38%)
  Amean 30 5.4613 5.8343 ( -6.83%)
  Amean 48 8.5337 9.2937 ( -8.91%)
  Amean 79 14.0670 15.2630 ( -8.50%)
  Amean 110 19.2253 21.2467 ( -10.51%)
  Amean 141 23.7557 25.8550 ( -8.84%)
  Amean 172 28.4407 29.7603 ( -4.64%)
  Amean 203 33.3407 33.9927 ( -1.96%)
  Amean 234 38.3633 39.1150 ( -1.96%)
  Amean 265 43.4420 43.8470 ( -0.93%)
  Amean 296 48.3680 48.9300 ( -1.16%)

  hackbench-thread-sockets
  Amean 1 0.6080 0.6493 ( -6.80%)
  Amean 4 1.0000 1.0513 ( -5.13%)
  Amean 7 1.6607 2.0260 ( -22.00%)
  Amean 12 2.7637 2.9273 ( -5.92%)
  Amean 21 5.0613 4.5153 ( 10.79%)
  Amean 30 6.3340 6.1140 ( 3.47%)
  Amean 48 9.0567 9.5577 ( -5.53%)
  Amean 79 14.5657 15.7983 ( -8.46%)
  Amean 110 19.6213 21.6333 ( -10.25%)
  Amean 141 24.1563 26.2697 ( -8.75%)
  Amean 172 28.9687 30.2187 ( -4.32%)
  Amean 203 33.9763 34.6970 ( -2.12%)
  Amean 234 38.8647 39.3207 ( -1.17%)
  Amean 265 44.0813 44.1507 ( -0.16%)
  Amean 296 49.2040 49.4330 ( -0.47%)

  2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads
  per socket), 512GB RAM
  hackbench-process-sockets
  Amean 1 0.5027 0.5017 ( 0.20%)
  Amean 4 1.1053 1.2033 ( -8.87%)
  Amean 7 1.8760 2.1820 ( -16.31%)
  Amean 12 2.9053 3.1810 ( -9.49%)
  Amean 21 4.6777 4.9920 ( -6.72%)
  Amean 30 6.5180 6.7827 ( -4.06%)
  Amean 48 10.0710 10.5227 ( -4.48%)
  Amean 79 16.4250 17.5053 ( -6.58%)
  Amean 110 22.6203 24.4617 ( -8.14%)
  Amean 141 28.0967 31.0363 ( -10.46%)
  Amean 172 34.4030 36.9233 ( -7.33%)
  Amean 203 40.5933 43.0850 ( -6.14%)
  Amean 234 46.6477 48.7220 ( -4.45%)
  Amean 265 53.0530 53.9597 ( -1.71%)
  Amean 296 59.2760 59.9213 ( -1.09%)

  hackbench-thread-sockets
  Amean 1 0.5363 0.5330 ( 0.62%)
  Amean 4 1.1647 1.2157 ( -4.38%)
  Amean 7 1.9237 2.2833 ( -18.70%)
  Amean 12 2.9943 3.3110 ( -10.58%)
  Amean 21 4.9987 5.1880 ( -3.79%)
  Amean 30 6.7583 7.0043 ( -3.64%)
  Amean 48 10.4547 10.8353 ( -3.64%)
  Amean 79 16.6707 17.6790 ( -6.05%)
  Amean 110 22.8207 24.4403 ( -7.10%)
  Amean 141 28.7090 31.0533 ( -8.17%)
  Amean 172 34.9387 36.8260 ( -5.40%)
  Amean 203 41.1567 43.0450 ( -4.59%)
  Amean 234 47.3790 48.5307 ( -2.43%)
  Amean 265 53.9543 54.6987 ( -1.38%)
  Amean 296 60.0820 60.2163 ( -0.22%)

  1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads),
  32 GB RAM
  hackbench-process-sockets
  Amean 1 1.4760 1.5773 ( -6.87%)
  Amean 3 3.9370 4.0910 ( -3.91%)
  Amean 5 6.6797 6.9357 ( -3.83%)
  Amean 7 9.3367 9.7150 ( -4.05%)
  Amean 12 15.7627 16.1400 ( -2.39%)
  Amean 18 23.5360 23.6890 ( -0.65%)
  Amean 24 31.0663 31.3137 ( -0.80%)
  Amean 30 38.7283 39.0037 ( -0.71%)
  Amean 32 41.3417 41.6097 ( -0.65%)

  hackbench-thread-sockets
  Amean 1 1.5250 1.6043 ( -5.20%)
  Amean 3 4.0897 4.2603 ( -4.17%)
  Amean 5 6.7760 7.0933 ( -4.68%)
  Amean 7 9.4817 9.9157 ( -4.58%)
  Amean 12 15.9610 16.3937 ( -2.71%)
  Amean 18 23.9543 24.3417 ( -1.62%)
  Amean 24 31.4400 31.7217 ( -0.90%)
  Amean 30 39.2457 39.5467 ( -0.77%)
  Amean 32 41.8267 42.1230 ( -0.71%)

  2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads
  per socket), 64GB RAM
  hackbench-process-sockets
  Amean 1 1.0347 1.0880 ( -5.15%)
  Amean 4 1.7267 1.8527 ( -7.30%)
  Amean 7 2.6707 2.8110 ( -5.25%)
  Amean 12 4.1617 4.3383 ( -4.25%)
  Amean 21 7.0070 7.2600 ( -3.61%)
  Amean 30 9.9187 10.2397 ( -3.24%)
  Amean 48 15.6710 16.3923 ( -4.60%)
  Amean 79 24.7743 26.1247 ( -5.45%)
  Amean 110 34.3000 35.9307 ( -4.75%)
  Amean 141 44.2043 44.8010 ( -1.35%)
  Amean 172 54.2430 54.7260 ( -0.89%)
  Amean 192 60.6557 60.9777 ( -0.53%)

  hackbench-thread-sockets
  Amean 1 1.0610 1.1353 ( -7.01%)
  Amean 4 1.7543 1.9140 ( -9.10%)
  Amean 7 2.7840 2.9573 ( -6.23%)
  Amean 12 4.3813 4.4937 ( -2.56%)
  Amean 21 7.3460 7.5350 ( -2.57%)
  Amean 30 10.2313 10.5190 ( -2.81%)
  Amean 48 15.9700 16.5940 ( -3.91%)
  Amean 79 25.3973 26.6637 ( -4.99%)
  Amean 110 35.1087 36.4797 ( -3.91%)
  Amean 141 45.8220 46.3053 ( -1.05%)
  Amean 172 55.4917 55.7320 ( -0.43%)
  Amean 192 62.7490 62.5410 ( 0.33%)

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Jann Horn <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kefeng Wang [Fri, 5 Nov 2021 20:35:14 +0000 (13:35 -0700)]

slub: add back check for free nonslab objects

After commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk
free"), the check for free nonslab page is replaced by VM_BUG_ON_PAGE,
which only check with CONFIG_DEBUG_VM enabled, but this config may
impact performance, so it only for debug.

Commit 0937502af7c9 ("slub: Add check for kfree() of non slab objects.")
add the ability, which should be needed in any configs to catch the
invalid free, they even could be potential issue, eg, memory corruption,
use after free and double free, so replace VM_BUG_ON_PAGE to
WARN_ON_ONCE, add object address printing to help use to debug the
issue.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rienjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Shi Lei [Fri, 5 Nov 2021 20:35:10 +0000 (13:35 -0700)]

mm/slab.c: remove useless lines in enable_cpucache()

These lines are useless, so remove them.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Shi Lei <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 5 Nov 2021 20:35:07 +0000 (13:35 -0700)]

mm: move kvmalloc-related functions to slab.h

Not all files in the kernel should include mm.h. Migrating callers from
kmalloc to kvmalloc is easier if the kvmalloc functions are in slab.h.

[[email protected]: move the new kvrealloc() also]
[[email protected]: drivers/hwmon/occ/p9_sbe.c needs slab.h]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jia He [Fri, 5 Nov 2021 20:35:04 +0000 (13:35 -0700)]

d_path: fix Kernel doc validator complaining

Kernel doc validator complains:
Function parameter or member 'p' not described in 'prepend_name'
Excess function parameter 'buffer' description in 'prepend_name'

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ad08ae586586 ("d_path: introduce struct prepend_buffer")
Signed-off-by: Jia He <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Arnd Bergmann [Fri, 5 Nov 2021 20:35:01 +0000 (13:35 -0700)]

fs/posix_acl.c: avoid -Wempty-body warning

The fallthrough comment for an ignored cmpxchg() return value produces a
harmless warning with 'make W=1':

  fs/posix_acl.c: In function 'get_acl':
  fs/posix_acl.c:127:36: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
    127 |                 /* fall through */ ;
        |                                    ^

Simplify it as a step towards a clean W=1 build.  As all architectures
define cmpxchg() as a statement expression these days, it is no longer
necessary to evaluate its return code, and the if() can just be droped.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/20210322132103.qiun2rjilnlgztxe@wittgenstein/
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: James Morris <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jan Kara [Fri, 5 Nov 2021 20:34:58 +0000 (13:34 -0700)]

ocfs2: do not zero pages beyond i_size

ocfs2_zero_range_for_truncate() can try to zero pages beyond current
inode size despite the fact that underlying blocks should be already
zeroed out and writeback will skip writing such pages anyway. Avoid the
pointless work.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jan Kara [Fri, 5 Nov 2021 20:34:55 +0000 (13:34 -0700)]

ocfs2: fix data corruption on truncate

Patch series "ocfs2: Truncate data corruption fix".

As further testing has shown, commit 5314454ea3f ("ocfs2: fix data
corruption after conversion from inline format") didn't fix all the data
corruption issues the customer started observing after 6dbf7bb55598
("fs: Don't invalidate page buffers in block_write_full_page()") This
time I have tracked them down to two bugs in ocfs2 truncation code.

One bug (truncating page cache before clearing tail cluster and setting
i_size) could cause data corruption even before 6dbf7bb55598, but before
that commit it needed a race with page fault, after 6dbf7bb55598 it
started to be pretty deterministic.

Another bug (zeroing pages beyond old i_size) used to be harmless
inefficiency before commit 6dbf7bb55598.  But after commit 6dbf7bb55598
in combination with the first bug it resulted in deterministic data
corruption.

Although fixing only the first problem is needed to stop data
corruption, I've fixed both issues to make the code more robust.

This patch (of 2):

ocfs2_truncate_file() did unmap invalidate page cache pages before
zeroing partial tail cluster and setting i_size.  Thus some pages could
be left (and likely have left if the cluster zeroing happened) in the
page cache beyond i_size after truncate finished letting user possibly
see stale data once the file was extended again.  Also the tail cluster
zeroing was not guaranteed to finish before truncate finished causing
possible stale data exposure.  The problem started to be particularly
easy to hit after commit 6dbf7bb55598 "fs: Don't invalidate page buffers
in block_write_full_page()" stopped invalidation of pages beyond i_size
from page writeback path.

Fix these problems by unmapping and invalidating pages in the page cache
after the i_size is reduced and tail cluster is zeroed out.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Colin Ian King [Fri, 5 Nov 2021 20:34:52 +0000 (13:34 -0700)]

ocfs2/dlm: remove redundant assignment of variable ret

The variable ret is being assigned a value that is never read, it is
updated later on with a different value. The assignment is redundant
and can be removed.

Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Valentin Vidic [Fri, 5 Nov 2021 20:34:49 +0000 (13:34 -0700)]

ocfs2: cleanup journal init and shutdown

Allocate and free struct ocfs2_journal in ocfs2_journal_init and
ocfs2_journal_shutdown. Init and release of system inodes references
the journal so reorder calls to make sure they work correctly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Valentin Vidic <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Chenyuan Mi [Fri, 5 Nov 2021 20:34:45 +0000 (13:34 -0700)]

ocfs2: fix handle refcount leak in two exception handling paths

The reference counting issue happens in two exception handling paths of
ocfs2_replay_truncate_records(). When executing these two exception
handling paths, the function forgets to decrease the refcount of handle
increased by ocfs2_start_trans(), causing a refcount leak.

Fix this issue by using ocfs2_commit_trans() to decrease the refcount of
handle in two handling paths.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chenyuan Mi <[email protected]>
Signed-off-by: Xiyu Yang <[email protected]>
Signed-off-by: Xin Tan <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Wengang Wang <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

weidonghui [Fri, 5 Nov 2021 20:34:42 +0000 (13:34 -0700)]

scripts/decodecode: fix faulting instruction no print when opps.file is DOS format

If opps.file is in DOS format, faulting instruction cannot be printed:

  / # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
  / # ./scripts/decodecode < oops.file
  [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
  aarch64-linux-gnu-strip: '/tmp/tmp.5Y9eybnnSi.o': No such file
  aarch64-linux-gnu-objdump: '/tmp/tmp.5Y9eybnnSi.o': No such file
  All code
  ========
     0:   d0002881        adrp    x1, 0x512000
     4:   912f9c21        add     x1, x1, #0xbe7
     8:   94067e68        bl      0x19f9a8
     c:   d2800001        mov     x1, #0x0                        // #0
    10:   b900003f        str     wzr, [x1]

  Code starting with the faulting instruction
  ===========================================

Background: The compilation environment is Ubuntu, and the test
environment is Windows.  Most logs are generated in the Windows
environment.  In this way, CR (carriage return) will inevitably appear,
which will affect the use of decodecode in the Ubuntu environment.

The repaired effect is as follows:

  / # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
  / # ./scripts/decodecode < oops.file
  [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
  All code
  ========
     0:   d0002881        adrp    x1, 0x512000
     4:   912f9c21        add     x1, x1, #0xbe7
     8:   94067e68        bl      0x19f9a8
     c:   d2800001        mov     x1, #0x0                        // #0
    10:*  b900003f        str     wzr, [x1]               <-- trapping instruction

  Code starting with the faulting instruction
  ===========================================
     0:   b900003f        str     wzr, [x1]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: weidonghui <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Rabin Vincent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Sven Eckelmann [Fri, 5 Nov 2021 20:34:39 +0000 (13:34 -0700)]

scripts/spelling.txt: fix "mistake" version of "synchronization"

If both "mistake" version and "correction" version are the same, a
warning message is created by checkpatch which is impossible to fix.
But it was noticed that Colan Ian King created a commit e6c0a0889b80
("ALSA: aloop: Fix spelling mistake "synchronization" ->
"synchronization"") which suggests that this spelling mistake was fixed
by replacing the word "synchronization" with itself. But the actual
diff shows that the mistake in the code was "sychronization". It is
rather likely that the "mistake" in spelling.txt should have been the
latter.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2e74c9433ba8 ("scripts/spelling.txt: add more spellings to spelling.txt")
Signed-off-by: Sven Eckelmann <[email protected]>
Reviewed-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Colin Ian King [Fri, 5 Nov 2021 20:34:36 +0000 (13:34 -0700)]

scripts/spelling.txt: add more spellings to spelling.txt

Some of the more common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel in the past few months.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Sun, 31 Oct 2021 20:53:10 +0000 (13:53 -0700)]

Linux 5.15

commit | commitdiff | tree

Linus Torvalds [Sun, 31 Oct 2021 18:24:06 +0000 (11:24 -0700)]

Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix compilation of callchain related code on powerpc with gcc11+

- Fix PERF_SAMPLE_WEIGHT_STRUCT support in 'perf script'

- Check session->header.env.arch before using it, fixing a segmentation
   fault

- Suppress 'rm dlfilter' build messages

* tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support
  perf callchain: Fix compilation on powerpc with gcc11+
  perf script: Check session->header.env.arch before using it
  perf build: Suppress 'rm dlfilter' build message

commit | commitdiff | tree

Linus Torvalds [Sun, 31 Oct 2021 18:19:02 +0000 (11:19 -0700)]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

- Fixes for s390 interrupt delivery

- Fixes for Xen emulator bugs showing up as debug kernel WARNs

- Fix another issue with SEV/ES string I/O VMGEXITs

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Take srcu lock in post_kvm_run_save()
  KVM: SEV-ES: fix another issue with string I/O VMGEXITs
  KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
  KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
  KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
  KVM: s390: clear kicked_mask before sleeping again

commit | commitdiff | tree

Kan Liang [Wed, 29 Sep 2021 15:38:13 +0000 (08:38 -0700)]

perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support

-F weight in perf script is broken.

  # ./perf mem record
  # ./perf script -F weight
  Samples for 'dummy:HG' event do not have WEIGHT attribute set. Cannot
print 'weight' field.

The sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. They share the same space, weight. The
lower 32 bits are exactly the same for both sample type. The higher 32
bits may be different for different architecture. For a new kernel on
x86, the PERF_SAMPLE_WEIGHT_STRUCT is used. For an old kernel or other
ARCHs, the PERF_SAMPLE_WEIGHT is used.

With -F weight, current perf script will only check the input string
"weight" with the PERF_SAMPLE_WEIGHT sample type. Because the commit
ea8d0ed6eae3 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") didn't
update the PERF_SAMPLE_WEIGHT_STRUCT sample type for perf script. For a
new kernel on x86, the check fails.

Use PERF_SAMPLE_WEIGHT_TYPE, which supports both sample types, to
replace PERF_SAMPLE_WEIGHT

Fixes: ea8d0ed6eae37b01 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT")
Reported-by: Joe Mario <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Tested-by: Joe Mario <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Joe Mario <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

commit | commitdiff | tree

Jiri Olsa [Tue, 28 Sep 2021 19:52:53 +0000 (21:52 +0200)]

perf callchain: Fix compilation on powerpc with gcc11+

Got following build fail on powerpc:

    CC      arch/powerpc/util/skip-callchain-idx.o
  In function ‘check_return_reg’,
      inlined from ‘check_return_addr’ at arch/powerpc/util/skip-callchain-idx.c:213:7,
      inlined from ‘arch_skip_callchain_idx’ at arch/powerpc/util/skip-callchain-idx.c:265:7:
  arch/powerpc/util/skip-callchain-idx.c:54:18: error: ‘dwarf_frame_register’ accessing 96 bytes \
  in a region of size 64 [-Werror=stringop-overflow=]
     54 |         result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops);
        |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/powerpc/util/skip-callchain-idx.c: In function ‘arch_skip_callchain_idx’:
  arch/powerpc/util/skip-callchain-idx.c:54:18: note: referencing argument 3 of type ‘Dwarf_Op *’
  In file included from /usr/include/elfutils/libdwfl.h:32,
                   from arch/powerpc/util/skip-callchain-idx.c:10:
  /usr/include/elfutils/libdw.h:1069:12: note: in a call to function ‘dwarf_frame_register’
   1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno,
        |            ^~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

The dwarf_frame_register args changed with [1],
Updating ops_mem accordingly.

[1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=5621fe5443da23112170235dd5cac161e5c75e65

Reviewed-by: Kajol Jain <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Mark Wieelard <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

commit | commitdiff | tree

Song Liu [Mon, 4 Oct 2021 05:32:38 +0000 (22:32 -0700)]

perf script: Check session->header.env.arch before using it

When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see crash in 'perf script' for NULL
session->header.env.arch.

Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.

Committer notes:

If it is a pipe, we already assume is a native arch, so no need to check
session->header.env.arch.

Signed-off-by: Song Liu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

commit | commitdiff | tree

Adrian Hunter [Thu, 30 Sep 2021 06:28:49 +0000 (09:28 +0300)]

perf build: Suppress 'rm dlfilter' build message

The following build message:

rm dlfilters/dlfilter-test-api-v0.o

is unwanted.

The object file is being treated as an intermediate file and being
automatically removed. Mark the object file as .SECONDARY to prevent
removal and hence the message.

Requested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Sat, 30 Oct 2021 22:56:38 +0000 (15:56 -0700)]

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Three small fixes, all in drivers, and one sizeable update to the UFS
  driver to remove the HPB 2.0 feature that has been objected to by Jens
  and Christoph.

  Although the UFS patch is large and last minute, it's essentially the
  least intrusive way of resolving the objections in time for the 5.15
  release"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: ufshpb: Remove HPB2.0 flows
  scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT
  scsi: ufs: ufs-exynos: Correct timeout value setting registers
  scsi: ibmvfc: Fix up duplicate response detection

commit | commitdiff | tree

Linus Torvalds [Sat, 30 Oct 2021 16:55:46 +0000 (09:55 -0700)]

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
"One fix for the composite clk that broke when we changed this clk type
  to use the determine_rate instead of round_rate clk op by default.
  This caused lots of problems on Rockchip SoCs because they heavily use
  the composite clk code to model the clk tree"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: composite: Also consider .determine_rate for rate + mux composites

commit | commitdiff | tree

Linus Torvalds [Sat, 30 Oct 2021 16:28:24 +0000 (09:28 -0700)]

Merge tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
"These are pretty late, but they do fix concrete issues.

   - ensure the trap vector's address is aligned.

   - avoid re-populating the KASAN shadow memory.

   - allow kasan to build without warnings, which have recently become
     errors"

* tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix asan-stack clang build
  riscv: Do not re-populate shadow memory with kasan_populate_early_shadow
  riscv: fix misalgned trap vector base address

commit | commitdiff | tree

Avri Altman [Sat, 30 Oct 2021 06:23:01 +0000 (09:23 +0300)]

scsi: ufs: ufshpb: Remove HPB2.0 flows

The Host Performance Buffer feature allows UFS read commands to carry the
physical media addresses along with the LBAs, thus allowing less internal
L2P-table switches in the device. HPB1.0 allowed a single LBA, while
HPB2.0 increases this capacity up to 255 blocks.

Carrying more than a single record, the read operation is no longer purely
of type "read" but a "hybrid" command: Writing the physical address to the
device in one operation and reading back the required payload in another.

The JEDEC HPB spec defines two commands for this operation:
HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and
HPB-READ to read the payload.

With the current HPB design the UFS driver has no alternative but to divide
the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ.
This causes a great deal of aggravation to the block layer guys who
demanded that we completely revert the entire HPB driver regardless of the
huge amount of corporate effort already invested in it.

As a compromise, remove only the pieces that implement the 2.0
specification. This is done as a matter of urgency for the final 5.15
release.

Link: https://lore.kernel.org/r/[email protected]
Tested-by: Avri Altman <[email protected]>
Tested-by: Bean Huo <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Bean Huo <[email protected]>
Co-developed-by: James Bottomley <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Avri Altman <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Sat, 30 Oct 2021 00:35:56 +0000 (17:35 -0700)]

Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Three commits fixing some issues introduced with the recent IOMMU
  changes we merged.

  Thanks to Alexey Kardashevskiy"

* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
  powerpc/pseries/iommu: Check if the default window in use before removing it
  powerpc/pseries/iommu: Use correct vfree for it_map

commit | commitdiff | tree

Linus Torvalds [Sat, 30 Oct 2021 00:04:38 +0000 (17:04 -0700)]

Merge tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix the return value check when parsing the ngpios property in
   gpio-xgs-iproc

- check the return value of bgpio_init() in gpio-mlxbf2

* tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mlxbf2.c: Add check for bgpio_init failure
  gpio: xgs-iproc: fix parsing of ngpios property

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 18:10:29 +0000 (11:10 -0700)]

Merge tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request:
      - fix nvmet-tcp header digest verification (Amit Engel)
      - fix a memory leak in nvmet-tcp when releasing a queue (Maurizio
        Lombardi)
      - fix nvme-tcp H2CData PDU send accounting again (Sagi Grimberg)
      - fix digest pointer calculation in nvme-tcp and nvmet-tcp (Varun
        Prakash)
      - fix possible nvme-tcp req->offset corruption (Varun Prakash)

- Queue drain ordering fix (Ming)

- Partition check regression for zoned devices (Shin'ichiro)

- Zone queue restart fix (Naohiro)

* tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-block:
  block: Fix partition check for host-aware zoned block devices
  nvmet-tcp: fix header digest verification
  nvmet-tcp: fix data digest pointer calculation
  nvme-tcp: fix data digest pointer calculation
  nvme-tcp: fix possible req->offset corruption
  block: schedule queue restart after BLK_STS_ZONE_RESOURCE
  block: drain queue after disk is removed from sysfs
  nvme-tcp: fix H2CData PDU send accounting (again)
  nvmet-tcp: fix a memory leak when releasing a queue

commit | commitdiff | tree

Martin K. Petersen [Thu, 28 Oct 2021 03:42:02 +0000 (23:42 -0400)]

scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT

Testing revealed a problem with how the reference tag was handled for
a WRITE_INSERT operation. The SCSI_PROT_REF_CHECK flag is not set when
the controller is asked to generate the protection information
(i.e. not DIX). And as a result the initial reference tag would not be
set in the WRITE_INSERT case.

Separate handling of the REF_CHECK and REF_INCREMENT flags to align
with both the DIX spec and the MPI implementation.

Link: https://lore.kernel.org/r/[email protected]
Fixes: b3e2c72af1d5 ("scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI")
Signed-off-by: Martin K. Petersen <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 17:54:44 +0000 (10:54 -0700)]

Merge tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

- tmio: Re-enable card irqs after a reset

- mtk-sd: Fixup probing of cqhci for crypto

- cqhci: Fix support for suspend/resume

- vub300: Fix control-message timeouts

- dw_mmc-exynos: Fix support for tuning

- winbond: Silences build errors on M68K

- sdhci-esdhc-imx: Fix support for tuning

- sdhci-pci: Read card detect from ACPI for Intel Merrifield

- sdhci: Fix eMMC support for Thundercomm TurboX CM2290

* tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: tmio: reenable card irqs after the reset callback
  mmc: mediatek: Move cqhci init behind ungate clock
  mmc: cqhci: clear HALT state after CQE enable
  mmc: vub300: fix control-message timeouts
  mmc: dw_mmc: exynos: fix the finding clock sample value
  mmc: winbond: don't build on M68K
  mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit
  mmc: sdhci-pci: Read card detect from ACPI for Intel Merrifield
  mmc: sdhci: Map more voltage level to SDHCI_POWER_330

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 17:46:59 +0000 (10:46 -0700)]

Merge tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Last minute fixes for crash on 32bit architectures when compression is
  in use. It's a regression introduced in 5.15-rc and I'd really like
  not let this into the final release, fixes via stable trees would add
  unnecessary delay.

  The problem is on 32bit architectures with highmem enabled, the pages
  for compression may need to be kmapped, while the patches removed that
  as we don't use GFP_HIGHMEM allocations anymore. The pages that don't
  come from local allocation still may be from highmem. Despite being on
  32bit there's enough such ARM machines in use so it's not a marginal
  issue.

  I did full reverts of the patches one by one instead of a huge one.
  There's one exception for the "lzo" revert as there was an
  intermediate patch touching the same code to make it compatible with
  subpage. I can't revert that one too, so the revert in lzo.c is
  manual. Qu Wenruo has worked on that with me and verified the changes"

* tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: compression: drop kmap/kunmap from lzo"
  Revert "btrfs: compression: drop kmap/kunmap from zlib"
  Revert "btrfs: compression: drop kmap/kunmap from zstd"
  Revert "btrfs: compression: drop kmap/kunmap from generic helpers"

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 17:41:07 +0000 (10:41 -0700)]

Merge tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing comment fixes from Steven Rostedt:

- Some bots have informed me that some of the ftrace functions
   kernel-doc has formatting issues.

- Also, fix my snake instinct.

* tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix misspelling of "missing"
  ftrace: Fix kernel-doc formatting issues

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 17:17:08 +0000 (10:17 -0700)]

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a build-time warning in x86/sm4"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm4 - Fix invalid section entry size

commit | commitdiff | tree

Linus Torvalds [Fri, 29 Oct 2021 17:03:07 +0000 (10:03 -0700)]

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"11 patches.

  Subsystems affected by this patch series: mm (memcg, memory-failure,
  oom-kill, secretmem, vmalloc, hugetlb, damon, and tools), and ocfs2"

* emailed patches from Andrew Morton <[email protected]>:
  tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
  mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
  mm: khugepaged: skip huge page collapse for special files
  mm, thp: bail out early in collapse_file for writeback page
  mm/vmalloc: fix numa spreading for large hash tables
  mm/secretmem: avoid letting secretmem_users drop to zero
  ocfs2: fix race between searching chunks and release journal_head from buffer_head
  mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
  mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
  mm: hwpoison: remove the unnecessary THP check
  memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT

commit | commitdiff | tree

Alexandre Ghiti [Fri, 29 Oct 2021 04:59:27 +0000 (06:59 +0200)]

riscv: Fix asan-stack clang build

Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.

Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Alexandre Ghiti <[email protected]>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

commit | commitdiff | tree

Alexandre Ghiti [Fri, 29 Oct 2021 04:59:26 +0000 (06:59 +0200)]

riscv: Do not re-populate shadow memory with kasan_populate_early_shadow

When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.

We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.

Signed-off-by: Alexandre Ghiti <[email protected]>
Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

commit | commitdiff | tree

Steven Rostedt (VMware) [Fri, 29 Oct 2021 13:54:14 +0000 (09:54 -0400)]

tracing: Fix misspelling of "missing"

My snake instinct was on and I wrote "misssing" instead of "missing".

Signed-off-by: Steven Rostedt (VMware) <[email protected]>

commit | commitdiff | tree

Steven Rostedt (VMware) [Fri, 29 Oct 2021 13:52:23 +0000 (09:52 -0400)]

ftrace: Fix kernel-doc formatting issues

Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.

Also, the "ftrace_is_dead()" had an incomplete description.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 27 Oct 2021 08:44:21 +0000 (10:44 +0200)]

Revert "btrfs: compression: drop kmap/kunmap from lzo"

This reverts commit 8c945d32e60427cbc0859cf7045bbe6196bb03d8.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

The revert does not apply cleanly due to changes in a6e66e6f8c1b
("btrfs: rework lzo_decompress_bio() to make it subpage compatible")
that reworked the page iteration so the revert is done to be equivalent
to the original code.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Tested-by: Qu Wenruo <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 27 Oct 2021 08:42:43 +0000 (10:42 +0200)]

Revert "btrfs: compression: drop kmap/kunmap from zlib"

This reverts commit 696ab562e6df9fbafd6052d8ce4aafcb2ed16069.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Sterba [Wed, 27 Oct 2021 08:42:27 +0000 (10:42 +0200)]

Revert "btrfs: compression: drop kmap/kunmap from zstd"

This reverts commit bbaf9715f3f5b5ff0de71da91fcc34ee9c198ed8.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Example stacktrace with ZSTD on a 32bit ARM machine:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = c4159ed3
  [00000000] *pgd=00000000
  Internal error: Oops: 5 [#1] PREEMPT SMP ARM
  Modules linked in:
  CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12
  Hardware name: Allwinner sun4i/sun5i Families
  Workqueue: btrfs-delalloc btrfs_work_helper
  PC is at mmiocpy+0x48/0x330
  LR is at ZSTD_compressStream_generic+0x15c/0x28c

  (mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c)
  (ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0)
  (ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488)
  (zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c)
  (btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834)
  (compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28)
  (async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230)
  (btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418)
  (process_one_work) from [<c014f210>] (worker_thread+0x44/0x524)
  (worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0)
  (kthread) from [<c0100150>]

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

David Yang [Thu, 28 Oct 2021 21:36:36 +0000 (14:36 -0700)]

tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer

The coccinelle check report:

./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42:
ERROR: application of sizeof to pointer

Use "strlen" to fix it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Yang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

SeongJae Park [Thu, 28 Oct 2021 21:36:33 +0000 (14:36 -0700)]

mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'

Kunit test cases for 'damon_split_regions_of()' expects the number of
regions after calling the function will be same to their request
('nr_sub'). However, the requested number is just an upper-limit,
because the function randomly decides the size of each sub-region.

This fixes the wrong expectation.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Empty description

This page took 0.163303 seconds and 4 git commands to generate.